SOLVER Project

Despite their early successes, Diagnoser and Deducer lacked the clear, comprehensible
structure required for the kinds of experiments we wish to perform. Galen was built to
remedy this problem, taking advantage of the experience gained in the design of
Diagnoser and Deducer. Additional discussion of the structure of GALEN can be found
in prior annual reports and in the relevant publications.

To determine the generality of our model of expertise in diagnostic reasoning, we are
also investigating domains outside medicine. As with our work in congenital heart
disease, we have concentrated on the design of mechanisms for structuring
problem-specific knowledge and for focusing limited computational resources.

One of the Principal Investigators has published results of a study in Expertise in Trial
Advocacy, discussing the significance of current research on expertise for legal problem
solving [Johnson, Johnson, and Little, 1985]. Legal expertise in corporate acquisition
problems has also been investigated. The results of that research suggest that expert
corporate acquisition attorneys differ from novices in their greater reliance on
internalized norms, prototypes, and heuristics. Both expert and novice attorneys in the
study went beyond the information provided in task cues when interpreting and
predicting actions and situation scripts in the simulated problems. The subjects
reasoned heuristically as well as logically. Differences between attorneys in different
specialty areas were not large, suggesting that subjects within a problem-solving domain
such as legal reasoning acquire meta-level reasoning skills that apply to issues both
within and outside their areas of specialization.

Research is also being completed in a study of cognitive strategies used in making
strategic decisions in business. Corporate acquisitions were again used as the context in
which to examine expertise. Twenty-four executive subjects were asked to perform an
experimental task in which they evaluated companies as candidates for acquisition. The
goals of the research are to test for the existence of specialty-related reasoning
strategies and to determine the importance of strategic and financial information in
problem formulation, problem structuring, and choice of strategies in problem solving.

Research in Progress --

Since human experts are notoriously poor at describing their own knowledge, our work
requires the creation of problem solving tasks through which experts can reveal criteria
for initiating specific hypotheses and methods for investigating those hypotheses.

Current techniques of representing hypotheses and their expectations for diagnosis do
not, however, provide much detailed information about the control processes experts use
to guide their reasoning. Such control processes typically incorporate highly refined
heuristics of which the experts are almost wholly unaware. New research is being
proposed to investigate these control structures in legal reasoning, specifically
reasoning by analogy in appellate decision making. Reasoning by analogy appears to be
a fundamental inference tool used by experts in many domains. The ability to form
plausible analogies lies at the heart of much of the expert's ability to be generative
when faced with unfamiliar problems. This research will include the implementation of
a cognitive simulation of the reasoning-by-analogy process based upon data obtained by
observing experts solving problems. The results of the simulation will be validated by
comparison with human subject data.

We are also investigating several research questions relevant to the architecture of
Galen. We have designed an interface to Galen so that users who are unfamiliar with
the inner workings of the program can interactively enter case data. Designing the
interface raised questions about what forms of data are necessary to adequately and
completely represent all possible cases.

One project to test the extensibility of GALEN into other domains is being conducted
by a graduate student in the Graduate School of Management. His thesis, "Auditing
Internal Controls: A computational model of the review process," includes the
construction of a working expert system using GALEN. The objective of this study is
to formulate and test a model of the processes employed by audit managers and
partners in reviewing and evaluating internal accounting controls.

Privileged Communication 251 E. H. Shortliffe

Another project explores the extension of the GALEN architecture to a problem in
plant pathology. The main purpose of this research is to find out whether the basic
postulates about expert reasoning embodied in Galen hold in a second diagnostic domain.
In collaboration with Professor Paul Teng of the Plant Pathology Department of the
University of Minnesota, a prototype knowledge base has been implemented. Currently,
the knowledge base can diagnose ten potato diseases and has 124 rules. The system is
undergoing evaluation and fine tuning to bring it up to an expert performance level.
This system will be useful to the Extension Service of the Plant Pathology Department at
the University of Minnesota, which provides diagnostic information to farmers by telephone.

Dr. Spackman's thesis is entitled “Induction of classification rules under the guidance of
comprehensibility-enhancing logical structures and diagnostic performance goals.” The
purpose of this research is to study and implement methodologies for the automated
generation of comprehensible decision rules from empiric data, with emphasis upon
logic-based knowledge representation formats and upon problems drawn from the
domain of medicine. This work builds upon some of the machine learning
methodologies developed at the University of Illinois by R. S. Michalski and others.

This work addresses two shortcomings of previous work on induction of classification
rules. These are, first, lack of comprehensibility of the induced rules, and second, lack
of flexibility in specifying the diagnostic performance (sensitivity, specificity, or
efficiency) desired for the rules that are to be derived.
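
To make the three performance measures concrete, here is a minimal Python sketch that scores candidate Boolean rules against labeled cases and picks the best rule for a stated performance goal. It assumes that "efficiency" means overall accuracy, (TP + TN) / N, as the term is commonly used in clinical laboratory medicine; the function names, the toy cases, and the two example rules are hypothetical and are not taken from Dr. Spackman's system.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping, Sequence, Tuple

Case = Tuple[Mapping[str, bool], bool]  # (findings, whether the condition is present)
RuleFn = Callable[[Mapping[str, bool]], bool]


@dataclass
class Confusion:
    tp: int
    fp: int
    tn: int
    fn: int


def confusion(rule: RuleFn, cases: Iterable[Case]) -> Confusion:
    """Tally a rule's positive/negative calls against the known diagnoses."""
    c = Confusion(0, 0, 0, 0)
    for findings, actual in cases:
        predicted = rule(findings)
        if predicted and actual:
            c.tp += 1
        elif predicted:
            c.fp += 1
        elif actual:
            c.fn += 1
        else:
            c.tn += 1
    return c


def _ratio(num: int, den: int) -> float:
    return num / den if den else 0.0


def sensitivity(c: Confusion) -> float:
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: Confusion) -> float:
    return _ratio(c.tn, c.tn + c.fp)


def efficiency(c: Confusion) -> float:
    # Assumed definition: fraction of all cases classified correctly.
    return _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)


def best_rule(rules: Sequence[RuleFn], cases: Sequence[Case],
              metric: Callable[[Confusion], float]) -> RuleFn:
    """Pick the candidate rule that scores highest on the chosen performance goal."""
    return max(rules, key=lambda r: metric(confusion(r, cases)))


# Hypothetical toy data: two binary findings, "a" and "b".
CASES = [
    ({"a": True, "b": True}, True),
    ({"a": True, "b": False}, True),
    ({"a": False, "b": True}, False),
    ({"a": False, "b": False}, False),
    ({"a": True, "b": False}, False),
]


def rule_and(f):  # conservative rule: both findings required
    return f["a"] and f["b"]


def rule_or(f):  # liberal rule: either finding suffices
    return f["a"] or f["b"]
```

On these toy cases `rule_and` has sensitivity 0.5 and specificity 1.0, while `rule_or` has sensitivity 1.0 and specificity 1/3, so `best_rule` returns a different rule depending on whether the stated goal is sensitivity or specificity.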

Comprehensibility of the derived rules or descriptions can be enhanced by imposing
restrictions upon the format the rules may take. For example, restricting rules to a
unate Boolean function format allows the induction of rules that can often be
simplified to a “criteria table” type of representation. The type of diagnostic
performance a rule must have will depend upon its purpose, and specifying the purpose
may allow inductive inference algorithms to trade off small decrements in diagnostic
performance for large increments in comprehensibility, or to increase their robustness
in the face of noisy or uncertain data.
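
The relationship between unate functions and criteria tables can be illustrated with a small sketch. A Boolean function in disjunctive normal form (DNF) is unate when every variable appears with only one polarity (always asserted or always negated) across all terms; an "at least k of these n criteria" table is one such function. The representation below (each term a dict mapping a finding to its required value) and all names are illustrative assumptions, not the representation used in the research described here.

```python
from itertools import combinations
from typing import Dict, List, Mapping

Term = Dict[str, bool]  # finding -> required value (True = present, False = absent)


def is_unate(dnf: List[Term]) -> bool:
    """True if no finding appears both asserted and negated anywhere in the rule."""
    polarity: Dict[str, bool] = {}
    for term in dnf:
        for var, required in term.items():
            if polarity.setdefault(var, required) != required:
                return False
    return True


def evaluate_dnf(dnf: List[Term], case: Mapping[str, bool]) -> bool:
    """A DNF rule fires if every literal of at least one term is satisfied."""
    return any(all(case[v] == req for v, req in term.items()) for term in dnf)


def criteria_table(criteria: Term, k: int):
    """Build an 'at least k of the n criteria' rule."""
    def rule(case: Mapping[str, bool]) -> bool:
        return sum(case[v] == req for v, req in criteria.items()) >= k
    return rule


def criteria_table_as_dnf(criteria: Term, k: int) -> List[Term]:
    """Expand the table into the equivalent DNF: one term per k-subset of criteria."""
    return [dict(subset) for subset in combinations(criteria.items(), k)]
```

For three hypothetical findings and k = 2, the expansion has three terms, `is_unate` accepts it, and it agrees with the table on every case; a rule such as "(a and b) or (not a and c)" is rejected because `a` appears in both polarities. The one-line table is far easier to read than its expanded DNF, which is the kind of comprehensibility gain the paragraph describes.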

Successful development of these techniques will lead to enhanced capabilities for
deriving rule bases for expert classification systems from empiric data, and will provide
new methods for the conceptual analysis of data.

Preliminary results have been obtained for the problem of deriving rules for the
identification of bacteria based upon their biochemical profiles in the medical
microbiology lab. Other problem domains under investigation are the analysis and
interpretation of endocrine laboratory tests, and the induction of rules for the diagnosis
of congenital heart disease, for comparison with the rules used in GALEN.

Research is also under way on methods of automating knowledge acquisition in pediatric
cardiology. This is being done as thesis research by Paul Krueger. The objective of the
research is to design, implement, and test a computerized procedure to derive from
examples a nonmonotonic set of rules for an expert classification system. Systems
using such rules are generally more efficient than those using monotonic classification
processes and more closely approximate psychological models as well.

The research proposes a process for automated learning of preliminary rulebases subject
to a set of efficiency constraints which are consistent with a formally defined,
psychologically plausible model of classification. The constraints include an upper
bound on the amount of information required to explain observations not accounted
for by the current set of beliefs, and a lower bound on the degree of inconsistency
allowed in the knowledge base at any given time. It will be shown that these constraints
can be used to guide the automated determination of both the content and organization
of the rules of expert classification systems. The result is behavior that is more focused
and efficient, and more closely duplicates the lines of reasoning of domain experts.

A representational formalism for classification knowledge bases based upon a
nonmonotonic logic of belief called “autoepistemic logic” (Moore, 1985) is proposed.
Having thus defined a representation for the knowledge base, the research will propose a
methodology for instantiating its concepts within a given application domain. The
general approach is to use heuristics to identify, from a set of input examples, the various
contextual situations that occur and the types of rules to associate with them. The Rule
Acquisition Module (RAM) will then be tested in two different application domains. The
resulting expert systems will be evaluated for correctness of classification and similarity
of their lines of reasoning with those of human experts.
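
The key nonmonotonic behavior, in which a conclusion drawn by default is withdrawn when new evidence makes an exception believed, can be sketched as below. This is a deliberately simplified forward-chaining approximation, not an implementation of autoepistemic logic: an `unless` condition stands in for the autoepistemic "it is not believed that", defaults are applied in a fixed order, and no stable expansions are computed. The rule format and the textbook bird/penguin example are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Rule:
    premises: FrozenSet[str]
    conclusion: str
    # Default rule: blocked if any of these propositions is currently believed.
    unless: FrozenSet[str] = frozenset()


def _close(beliefs: Set[str], strict: List[Rule]) -> None:
    """Apply ordinary (monotonic) rules until nothing new can be added."""
    changed = True
    while changed:
        changed = False
        for r in strict:
            if r.premises <= beliefs and r.conclusion not in beliefs:
                beliefs.add(r.conclusion)
                changed = True


def believe(facts: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Recompute the belief set from scratch for the given facts."""
    rules = list(rules)
    strict = [r for r in rules if not r.unless]
    defaults = [r for r in rules if r.unless]
    beliefs = set(facts)
    _close(beliefs, strict)
    changed = True
    while changed:
        changed = False
        for r in defaults:
            if (r.premises <= beliefs and r.conclusion not in beliefs
                    and not (r.unless & beliefs)):
                beliefs.add(r.conclusion)
                _close(beliefs, strict)  # a default conclusion may enable strict rules
                changed = True
    return beliefs


RULES = [
    Rule(frozenset({"penguin"}), "bird"),
    Rule(frozenset({"penguin"}), "abnormal"),
    Rule(frozenset({"bird"}), "flies", unless=frozenset({"abnormal"})),
]
```

`believe({"bird"}, RULES)` includes `"flies"`, but `believe({"bird", "penguin"}, RULES)` does not: adding a fact removed a conclusion, which no purely monotonic rule set can do.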

The major conclusion of the research is that constraints similar to those observed in
expert human classification processes can be used to guide the empirical induction of
efficient expert system rulebases. Supporting this conclusion is the elucidation of a
formal nonmonotonic model of classification, and the design and subsequent testing of
the Rule Acquisition Module and expert systems derived by it.

D. List of Relevant Publications

1. Connelly, D. and Johnson, P.E.: Medical problem solving. Human Pathology,
11(5):412-419, 1980.

2. Elstein, A., Gorry, A., Johnson, P. and Kassirer, J.: Proposed Research
Efforts. IN D.C. Connelly, E. Benson and D. Burke (Eds.), CLINICAL
DECISION MAKING AND LABORATORY USE. University of Minnesota
Press, 1982, pp. 327-334.

3. Feltovich, P.J.: Knowledge based components of expertise in medical
diagnosis. Learning Research and Development Center Technical Report
PDS-2, University of Pittsburgh, September, 1981.

4. Feltovich, P.J., Johnson, P.E., Moller, J.H. and Swanson, D.B.: The Role and
Development of Medical Knowledge in Diagnostic Expertise. IN W. Clancey
and E.H. Shortliffe (Eds.), READINGS IN MEDICAL AI, Addison-Wesley,
1984, pp. 275-319.

5. Johnson, P.E.: Problem Solving. IN ENCYCLOPEDIA OF SCIENCE AND
TECHNOLOGY, McGraw-Hill (in press).

6. Johnson, P.E., Moen, J.B. and Thompson, W.B.: Garden Path Errors in
Medical Diagnosis. IN Bolc, L. and Coombs, M.J. (Eds.), COMPUTER
EXPERT SYSTEMS, Springer-Verlag (in press).

7. Johnson, P.E.: Cognitive Models of Medical Problem Solvers. IN D.C.
Connelly, E. Benson, D. Burke (Eds.), CLINICAL DECISION MAKING
AND LABORATORY USE. University of Minnesota Press, 1982, pp. 39-51.

8. Johnson, P.E.: What kind of expert should a system be? J. Medicine and
Philosophy, 8:77-97, 1983.

9. Johnson, P.E.: The Expert Mind: A new Challenge for the Information
Scientist. IN Th. M.A. Bemelmans (Ed.), INFORMATION SYSTEM
DEVELOPMENT FOR ORGANIZATIONAL EFFECTIVENESS, Elsevier
Science Publishers B. V. (North-Holland), 1984.

10. Johnson, P.E., Severance, D.G. and Feltovich, P.J.: Design of decision support
systems in medicine: Rationale and principles from the analysis of
physician expertise. Proc. Twelfth Hawaii International Conference on
System Science, Western Periodicals Co. 3:105-118, 1979.

11. Johnson, P.E., Duran, A., Hassebrock, F., Moller, J., Prietula, M., Feltovich,
P. and Swanson, D.: Expertise and error in diagnostic reasoning. Cognitive
Science 5:235-283, 1981.

12. Johnson, P.E. and Hassebrock, F.: Validating Computer Simulation Models
of Expert Reasoning. IN R. Trappl (Ed.), CYBERNETICS AND SYSTEMS
RESEARCH. North-Holland Publishing Co., 1982.

13. Johnson, P.E. and Thompson, W.B.: Strolling down the garden path:
Detection and recovery from error in expert problem solving. Proc. Seventh
IJCAI, Vancouver, B.C., August, 1981, pp. 214-217.

14. Johnson, P.E., Hassebrock, F. and Moller, J.H.: Multimethod study of clinical
judgement. Organizational Behavior and Human Performance 30:201-230,
1982.

15. Moller, J.H., Bass, G.M., Jr. and Johnson, P.E.: New techniques in the
construction of patient management problems. Medical Education 15:150-153,
1981.

16. Sedlmeyer, R.L., Thompson, W.B. and Johnson, P.E.: Knowledge-based fault
localization in debugging. The Journal of Systems and Software, vol. 3, no. 4
(Dec 83) pp. 301-307, Elsevier.

17. Sedlmeyer, R.L., Thompson, W.B. and Johnson, P.E.: Diagnostic reasoning in
software fault localization. Proc. Eighth IJCAI, Karlsruhe, West Germany,
August, 1983.

18. Smith, K.A., Farm, B., Johnson, P.E.: Surface: A prototype expert system for
selecting surface analysis techniques. Proceedings of IEEE Conference on
Computers and Comm., 1985.

19. Swanson, D.B.: Computer simulation of expert problem solving in medical
diagnosis. Unpublished Ph.D. dissertation, University of Minnesota, 1978.

20. Swanson, D.B., Feltovich, P.J. and Johnson, P.E.: Psychological Analysis of
Physician Expertise: Implications for The Design of Decision Support
Systems. IN D.B. Shires and H. Wold (Eds.), MEDINFO77, North-Holland
Publishing Co., Amsterdam, 1977, pp. 161-164.

21. Thompson, W.B., Johnson, P.E. and Moen, J.B.: Recognition-based diagnostic
reasoning. Proc. Eighth IJCAI, Karlsruhe, West Germany, August, 1983.

E. Funding and Support

Work on the SOLVER project is currently supported by a grant from the Control Data
Corporation to Paul Johnson ($90,000; 1983-85) and by a grant from the
Microelectronics and Information Sciences Center at the University of Minnesota to
Paul Johnson, William Thompson, James Slagle (Dept. of Computer Science), Harry
Wechsler (Electrical Engineering), and Albert Yonas (Institute for Child Development)
($500,000; 1984-85).

Research in medical informatics is supported, in part, by a training grant from the
National Library of Medicine, LM-00160, in the amount of $712,573 for the period
1984-1989. Dr. Connelly and Prof. Johnson are participants in this grant. The
postdoctoral fellowship of Dr. Spackman is funded by this grant.

“Expert system techniques for analyzing and evaluating internal accounting controls.”
McKnight Foundation, $13,000 (1984-5). Paul E. Johnson and Andrew D. Bailey.

Dwan Family Fund, University of Minnesota Medical School, $6,000 (1985) to Paul
Johnson for research assistant funding on the GALEN project.

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE
A. Medical Collaborations and Program Dissemination via SUMEX

Work in medical diagnosis is carried out with the cooperation of faculty and students
in the University of Minnesota Medical School and St. Paul Ramsey Medical Center.

B. Sharing and Interactions with Other SUMEX-AIM Projects

William Clancey, Stanford University, acted as a reviewer of the MEIS Intelligent
Systems Project in September, 1984 at the University of Minnesota. The Principal
Investigators in the SOLVER project are also principal investigators in that project.

Paul Johnson was a panel member at the SUMEX-AIM conference in Columbus, Ohio
in 1984. Dr. Connelly and two graduate students associated with the SOLVER
project also attended the conference.

III. RESEARCH PLANS
A. Project Goals and Plans

Near term -- Our research objectives in the near term can be divided into three parts.
First, we are committed to the design, implementation, and evaluation of Galen, as
described above. We have completed an interactive front end so that physicians can
directly enter patient data, and Galen's knowledge base is currently being "tuned" with
the help of Dr. James Moller, an expert physician collaborator from the University of
Minnesota Pediatric Cardiology Clinic, other expert physicians, and the Diagnoser
program. We believe that GALEN has passed through the phases of expertise assessment
and cognitive simulation and is now approaching a level of performance that
will qualify it as a true expert system. An objective now is to extend the explanation
capability of GALEN. We are initiating a new investigation into two aspects of expert
problem solving that relate to the interaction between a problem-solving system and its
environment: “query generation” and explanation. Some simple expert systems proceed
from a fixed set of input data to an evaluation of that data. For most problem
domains, however, the space of possibly relevant information is large, and some or all
of this information may have costs associated with its acquisition. Computational
and other costs can therefore be reduced by a mechanism that intelligently selects
queries designed to solicit information that is relevant and cost-effective for the
problem being solved. Expert systems for complex problem domains must
also be able to generate explanations for their actions. Unless the system operates
entirely autonomously, users must be apprised of the rationale for system
actions. There is a particular need for explanations tailored to system users rather
than system designers.

Experienced experts are typically quite proficient at asking relevant questions, even
when the criteria for relevance are difficult to specify. These experts use heuristics
capable of keying on selected aspects of data already examined and on the current
problem state in order to select the next needed query. We propose to incorporate
these heuristics into a “query generation knowledge base.” This knowledge base can
be thought of as a form of domain-specific meta-knowledge. It contains rules by
which the problem state can be efficiently evaluated in order to determine the next
course of action. By basing these rules on actual expert knowledge and experience, it
will often be possible to bypass the combinatorial complexity associated with either
blind search or optimization techniques.
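
A query generation knowledge base of this kind can be sketched as a set of condition/query rules evaluated against the current problem state, with the cheapest applicable unanswered query chosen next. The state layout, the `cost` field, the cheapest-first selection, and the hypothesis and finding names below are all hypothetical; in the proposed work the rules would encode expert heuristics rather than a fixed cost ordering.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]  # {"known": {finding: value}, "hypotheses": set of active hypotheses}


@dataclass
class QueryRule:
    applies: Callable[[State], bool]  # heuristic test on the current problem state
    query: str                        # finding to ask about if the test succeeds
    cost: float                       # relative cost of acquiring that finding


def next_query(state: State, rules: List[QueryRule]) -> Optional[str]:
    """Return the cheapest relevant query not yet answered, or None to stop asking."""
    candidates = [r for r in rules
                  if r.query not in state["known"] and r.applies(state)]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.cost).query


# Hypothetical rules keyed on which hypotheses are currently active.
RULES = [
    QueryRule(lambda s: "H1" in s["hypotheses"], "finding_a", cost=1.0),
    QueryRule(lambda s: "H1" in s["hypotheses"], "finding_b", cost=5.0),
    QueryRule(lambda s: "H2" in s["hypotheses"], "finding_c", cost=2.0),
]
```

With both `H1` and `H2` active and nothing yet known, `next_query` returns `"finding_a"`; once that finding is recorded in `state["known"]` it returns `"finding_c"`; with no active hypotheses it returns `None`, so questioning stops instead of searching blindly.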

Our approach to explanation starts from the premise that substantially different forms
of explanation are required within a single expert system. The type of explanation is
distinguished both by the level of sophistication of the person receiving the explanation
and by whether that person is principally interested in the specific problem being
solved or in the internal working of the expert system. Less sophisticated users of the
system are likely to have only a superficial understanding of the nature of the system
being diagnosed and will require explanations in terms of simplified system properties
with which they are familiar. Expert users will require information about significant
details of the state of the system being diagnosed and the causal relationships that
connect system state with observable symptoms. Designers and maintainers of the
expert system require explanations in terms of the actual lines of reasoning used to
arrive at a decision.

We will be focusing principally on providing explanations for system users rather than
system designers. Explanations for users must be phrased in terms of the system being
diagnosed. Descriptions of the system itself are more important than descriptions of the
reasoning strategies used to understand the system. For example, many diagnostic tasks
are efficiently approached with recognition-based reasoning strategies that use
knowledge arising from empirical association. Experts (or possibly automatic learning
systems) learn to associate particular interpretations with particular patterns in the data.
For many problem domains, knowledge of this sort is quite powerful, providing
accuracy without the complexity associated with causal reasoning. The user of such a
system, however, requires explanations in terms of causality. This suggests a two-step
process: problem solving is done using a recognition-based strategy, and explanations
are generated by combining the results of this process with additional, causally based
explanation knowledge.
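
The proposed two-step process can be sketched as follows: a recognition step matches findings against stored patterns (empirical associations), and a separate explanation step looks up causal knowledge only for the findings actually present. The dictionary layout and the toy car-electrical example are hypothetical, chosen only because the causal story is easy to check.

```python
from typing import Dict, List, Optional, Set, Tuple

Pattern = Tuple[Set[str], str]  # (findings that must all be present, interpretation)


def recognize(findings: Set[str], patterns: List[Pattern]) -> Optional[str]:
    """Step 1: recognition-based problem solving by pattern matching."""
    for required, interpretation in patterns:
        if required <= findings:
            return interpretation
    return None


def explain(interpretation: str, findings: Set[str],
            causal: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Step 2: justify the interpretation using separate causal explanation knowledge."""
    return [f"{finding}: {mechanism}"
            for finding, mechanism in causal.get(interpretation, [])
            if finding in findings]


# Hypothetical knowledge: one empirical pattern and its causal explanation links.
PATTERNS: List[Pattern] = [({"no_lights", "no_crank"}, "dead_battery")]
CAUSAL = {
    "dead_battery": [
        ("no_crank", "the starter motor draws its current from the battery"),
        ("no_lights", "the lights are powered by the battery"),
    ]
}
```

The recognizer never consults `CAUSAL`; causal links are brought in only when an explanation is requested, which keeps problem solving simple while still giving the user an account in terms of the system being diagnosed.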

Our second objective is to extend the knowledge-capturing strategies
developed in our original work in medical diagnosis. In the near term this work will
examine descriptive strategies in which experts attempt to use a formalized language to
express what they know (e.g., production rules), observational strategies in which experts
perform tasks designed to reveal information from which a theory of task specific
expertise can be built, and intuitive strategies in which either experts behave as
knowledge engineers or knowledge engineers attempt to perform as pseudo experts. The
research projects of Dr. Spackman and Paul Krueger which have been discussed
previously are both directed toward this objective.

Our third near-term objective will be to investigate one of the central problems of
recognition-based problem solving: how to classify problems when solving them.
Questions related to problem classification that we will be examining include: What
patterns do experts and novices detect in a problem that allow them to classify it as an
instance of a problem type that is already known? How does an expert make an initial
choice of the level of abstraction to be used in solving a problem? How can an expert
recover from an initial incorrect choice of levels? How can the difference between
causal and prototypic modes of reasoning be modeled as differences in levels of
abstraction, and how can a common model for these two types of reasoning be
constructed? We will be pursuing these questions in problem-solving areas such as
law, auditing, and management, as well as in medicine.

Long range -- Our long-range objective is to improve the methodology of the
“knowledge capturing” process that occurs in the early stages of the development of
expert systems, when problem decomposition and solution strategies are being specified.
Several related questions of interest include: What are the performance consequences of
different approaches, how can these consequences be evaluated, and what tools can assist
in making the best choice? How can we determine organizations that not only
perform well but are also structured so as to facilitate knowledge acquisition from human
experts? In the coming year we will be exploring these questions in the areas of design
and management as well as in law and medicine.

B. Justification and Requirements for Continued SUMEX Use

Our current model development takes advantage of the sophisticated Lisp programming
environment on SUMEX. Although much current work with Galen is done using a
version running on a local VAX 11/780, we continue to benefit from the interaction
with other researchers facilitated by the SUMEX system. We expect to use SUMEX to
allow other groups access to the Galen program. We also plan to continue use of the
knowledge engineering tools available on SUMEX.

We are working toward a Commonlisp implementation of the GALEN system and
expect to rely heavily on Commonlisp for future projects.

One of our students implemented a demonstration legal expert system in EMYCIN
using the SUMEX resource, and we still find that the resource is valuable for making
available major systems which we do not have locally, such as EMYCIN.

C. Needs and Plans for Other Computing Resources Beyond SUMEX-AIM

Our current grant from MEIS has permitted us to purchase four Perq 2 AI workstations
for our Artificial Intelligence laboratory. The availability of Commonlisp on these
machines is one reason why we expect to make use of that language in the future.

SUMEX will continue to be used for collaborative activities and for program
development requiring tools not available locally.

D. Recommendations for Future Community and Resource Development

As a remote site, we particularly appreciate the communication with other members of
the community that the SUMEX facility provides our researchers. We, too, are
moving toward a workstation-based development environment, but we hope that
SUMEX will continue to serve as a focal point for the medical AI community. In
addition to communication and sharing of programs, we are interested in the development
of Commonlisp-based knowledge engineering tools. The continued existence of the
SUMEX resource is very important to us.

6.3. Stanford Pilot Projects

Following are descriptions of the informal pilot projects currently using the Stanford
portion of the SUMEX-AIM resource, pending funding, full review, and authorization.

6.3.1. CAMDA Project

CAMDA Research Staff:

Prof. Samuel Holtzman, Co-PI Engineering-Economic Systems
Prof. Ronald A. Howard, Co-PI Engineering-Economic Systems
Prof. Ross Shachter Engineering-Economic Systems
Leonard Bertrand Engineering-Economic Systems
Jack Breese Engineering-Economic Systems
Kazuo Ezawa Engineering-Economic Systems
Keh-Shiou Leu Engineering-Economic Systems
Seok Hui Ng Engineering-Economic Systems
Emilio Navarro Engineering-Economic Systems
Dr. Adam Seiver Engineering-Economic Systems
Joseph Tatman Engineering-Economic Systems
Dr. Emmet Lamb School of Medicine
Dr. Robert Kessler School of Medicine
Dr. Frank Polansky School of Medicine

Associated faculty:

Prof. Edison Tse Engineering-Economic Systems

I. SUMMARY OF RESEARCH PROGRAM
A. Project Rationale

The Computer-Aided Medical Decision Analysis (CAMDA) project is an attempt to
develop intelligent medical decision systems by combining the descriptive generality of
expert-system technology with the normative power of decision analysis.

B. Medical Relevance and Collaboration

The primary effort of the CAMDA project during 1984 and early 1985 has been
focused on the design and implementation of RACHEL, an intelligent decision system
for infertile couples. This system is designed to help patients and physicians deal with
difficult medical treatment choices. RACHEL is being developed in close cooperation
with the Engineering-Economic Systems Department, the Obstetrics and Gynecology
Department, and the Surgery Department (Urology Division), all at Stanford.

In addition to the development of RACHEL, there are several active research programs
within the CAMDA project. One such program is aimed at developing a representation
for dynamic decision processes (such as those faced by cancer patients) that do not
necessarily satisfy the Markov assumption. Another is concentrating on the
development of fast algorithms for the solution of general decision problems.

A recent addition to our research project is a program to design cost-effective strategies
for monitoring the recurrence of bladder cancer.

C. Highlights of Research Progress

C.1 Accomplishments this past year

We have successfully implemented a pilot-level version of RACHEL. As we define it,
a pilot system is one where the essential algorithms work individually as well as
interactively with one another, operating with knowledge that is representative of the
system's domain. Such a system lacks two important elements that must exist within a
prototype-level implementation: an extensive knowledge base, and a front end usable by
trained users who may not be familiar with the details of the system.

As part of the development of RACHEL, we have developed a facility to construct
individualized models of the patient's preferences over the set of possible outcomes of
an infertility therapy. This facility operates in two consecutive stages. The first stage
constructs a parametric model from a library of plausible model elements. A typical
consideration at this stage is whether to explicitly account for the patient's lifetime.
For instance, a treatment strategy that involves surgery would warrant such explicit
consideration, whereas a therapy consisting strictly of drugs would not. The second
stage in the preference model development process involves the assessment of specific
parametric values. These values are obtained directly from the patient to ensure that
the overall preference model genuinely reflects his or her desires.
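
The two-stage construction can be illustrated with a small sketch: stage one selects model elements from a library according to features of the case (here, adding a length-of-life attribute only when surgery is involved, following the example above), and stage two turns the patient's answers into parameters. The additive utility form, the 0 to 10 rating scale, and the attribute names and questions are assumptions made for illustration; the text does not specify RACHEL's actual model forms.

```python
from dataclasses import dataclass
from typing import Dict, List, Mapping


@dataclass(frozen=True)
class ModelElement:
    name: str
    question: str  # question put to the patient in stage two


# Hypothetical library of plausible model elements.
LIBRARY = {
    "pregnancy": ModelElement("pregnancy", "How important is achieving a pregnancy to you (0-10)?"),
    "cost": ModelElement("cost", "How important is keeping treatment costs down (0-10)?"),
    "lifetime": ModelElement("lifetime", "How much weight do you place on length of life (0-10)?"),
}


def build_structure(case: Mapping[str, bool]) -> List[ModelElement]:
    """Stage 1: choose only the model elements this case calls for."""
    names = ["pregnancy", "cost"]
    if case.get("involves_surgery", False):
        names.append("lifetime")  # explicit lifetime consideration only for surgery
    return [LIBRARY[n] for n in names]


def assess_weights(elements: List[ModelElement],
                   ratings: Mapping[str, float]) -> Dict[str, float]:
    """Stage 2: normalise the patient's ratings into weights that sum to 1."""
    raw = {e.name: float(ratings[e.name]) for e in elements}
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one rating must be positive")
    return {name: r / total for name, r in raw.items()}


def utility(outcome: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Additive utility; each outcome attribute is scored on a 0-1 scale."""
    return sum(w * outcome[name] for name, w in weights.items())
```

A drug-only therapy yields a two-element model and therefore only two questions, while a surgical strategy adds a third. This mirrors the point that the interview stays short because only case-relevant questions are asked.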

It is important to note that since the preference model is built to fit the specific needs
of each case, the interaction between the patient and the system is short and well-
focused. In particular, the patient is only asked to respond to a few (about five to ten)
questions. These questions are selected so that their relevance to the case is intuitively
obvious from the patient's point of view.

Also as a part of RACHEL, we have developed a knowledge base dealing with the
decisions faced by the subset of infertile couples whose inability to conceive has been
traced to a blockage of the Fallopian tubes of the female partner. In particular, the
knowledge in RACHEL deals with the choice between two important procedures
pertinent to this condition: laparotomy and in-vitro fertilization.

Another accomplishment during this past research year has been the improvement of
our influence-diagram solution procedure. In its original form, this procedure
essentially took a brute-force approach to the solution of well-formed influence
diagrams. Although its solutions were mathematically correct, the program was
inefficient in terms of both computational time and storage requirements. In its
current implementation, the program is considerably more efficient and has an adequate
front end that makes it accessible to a fairly wide class of users. Empirical results
indicate that the size and complexity of problems that can be represented and solved
with the system not only exceed the bounds of its original design, but are comparable
and possibly superior to those of the best commercially available decision-analytic
software.
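
For readers unfamiliar with the underlying computation, the sketch below solves a small decision problem by exhaustive expected-utility rollback over an equivalent decision tree. It is offered only as an illustration of the brute-force style of evaluation; it is not the CAMDA procedure and does not operate on influence diagrams directly. Its cost grows exponentially with the number of variables, which is why more efficient influence-diagram methods (such as node removal and arc reversal) matter. The tree, probabilities, and utilities are made-up placeholders, not clinical data.

```python
from typing import Dict, List, Tuple, Union

# A node is ("value", utility), ("chance", [(probability, child), ...]),
# or ("decision", {action: child, ...}).
Node = Tuple[str, Union[float, List[Tuple[float, "Node"]], Dict[str, "Node"]]]


def expected_utility(node: Node) -> float:
    kind, body = node
    if kind == "value":
        return body
    if kind == "chance":
        if abs(sum(p for p, _ in body) - 1.0) > 1e-9:
            raise ValueError("chance-node probabilities must sum to 1")
        return sum(p * expected_utility(child) for p, child in body)
    if kind == "decision":
        # The decision maker takes the branch with the highest expected utility.
        return max(expected_utility(child) for child in body.values())
    raise ValueError(f"unknown node type: {kind!r}")


def best_action(node: Node) -> str:
    kind, body = node
    if kind != "decision":
        raise ValueError("best_action expects a decision node")
    return max(body, key=lambda action: expected_utility(body[action]))


# Placeholder numbers only.
TREE: Node = ("decision", {
    "option_1": ("chance", [(0.3, ("value", 1.0)), (0.7, ("value", 0.2))]),
    "option_2": ("chance", [(0.5, ("value", 0.6)), (0.5, ("value", 0.3))]),
})
```

Here `option_1` has expected utility 0.3(1.0) + 0.7(0.2) = 0.44 and `option_2` has 0.5(0.6) + 0.5(0.3) = 0.45, so `best_action(TREE)` selects `option_2`.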

Similarly, RACHEL's inference engine has been improved in several important ways.
Prominent among these are a means for attaching general procedures at any point in
the inference process, a variety of built-in procedures for the acquisition and display of
information coupled with a facility for controlling these procedures (i.e., for the control
of ASKability and TELLability), and a simple explanation mechanism.

C.2 Research in progress

The RACHEL system continues to be developed along four distinct directions: the
efficiency and flexibility of RACHEL's inference engine are being improved, its
explanation mechanism is being enhanced, RACHEL’s facility for the development of
patient preference models is being upgraded, and its knowledge base is being enlarged.

As it is currently implemented, the inference engine used by RACHEL is quite
inefficient. This inefficiency is, to some extent, a deliberate design choice since the
engine was designed to be very general and highly modular. Thus, there are many
procedural redundancies and much unnecessary baggage in the programs that implement
it. Now that we have a clearer idea of how the engine is to be used, we have redesigned
it by doing away with some of the original generality and modularity in favor of a
more efficient process. Furthermore, the new design emphasizes and enhances
particularly useful engine features such as its ASKability and its TELLability.

A further enhancement to RACHEL's inference engine concentrates on the system's
ability to explain its line of reasoning. The original design responded only to online
“why” queries by displaying its dynamic goal stack. In its new form, the engine allows
offline as well as online queries in both “why” and “how” formats.
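The difference between the two query types can be sketched in Python as follows, assuming a simple backward chainer (the `ExplainingChainer` class, its rule format and the fact names are hypothetical). An online “why” is answered from the live goal stack at the moment a question is asked; an offline “how” is answered after the run from a record of which premises established each conclusion.

```python
from typing import Callable, Dict, List, Tuple

Rule = Tuple[str, List[str]]  # hypothetical format: (conclusion, premises)


class ExplainingChainer:
    def __init__(self, rules: List[Rule],
                 ask: Callable[[str, List[str]], bool]):
        self.rules = rules
        self.ask = ask                               # called as ask(fact, why_chain)
        self.facts: Dict[str, bool] = {}
        self.goal_stack: List[str] = []              # live goals, for online "why"
        self.derivations: Dict[str, List[str]] = {}  # kept for offline "how"

    def why(self) -> List[str]:
        """Goals currently being pursued, innermost first."""
        return list(reversed(self.goal_stack))

    def how(self, fact: str) -> str:
        """Replay the recorded derivation after the consultation."""
        if fact in self.derivations:
            return f"{fact} was concluded from " + ", ".join(self.derivations[fact])
        if fact in self.facts:
            return f"{fact} was supplied by the user"
        return f"{fact} was not established"

    def prove(self, goal: str) -> bool:
        if goal in self.facts:
            return self.facts[goal]
        self.goal_stack.append(goal)
        try:
            for conclusion, premises in self.rules:
                if conclusion == goal and all(self.prove(p) for p in premises):
                    self.derivations[goal] = list(premises)
                    self.facts[goal] = True
                    return True
        finally:
            self.goal_stack.pop()
        # No rule established the goal, so fall back to asking the user;
        # the goals still on the stack explain why the question is asked.
        answer = self.ask(goal, self.why())
        self.facts[goal] = answer
        return answer
```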

Beyond traditional explanation capabilities, we are exploring possible means to explain
decision-theoretic inferences. In particular, we are trying to understand how to explain
decision recommendations that are based on the maximization of expected utility to
users unfamiliar with decision theory. Our current research indicates that a promising
way to do this is to break down large decision problems into smaller, more manageable
pieces whose formal solution can be checked against intuition. Although still at an
early stage, this line of research promises to help remove an important
barrier to the widespread use of normative decision techniques.
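To make the idea concrete, here is a minimal Python sketch of expected-utility foldback on a small decision tree, which also records, for every decision node, the expected utility of each option. Each such note is a small piece that a user can check against intuition. The tree, probabilities and utilities (on a 0 to 1 scale) are hypothetical, and the sketch is not CAMDA's explanation method.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Outcome:
    utility: float


@dataclass
class Chance:
    branches: List[Tuple[float, "Node"]]   # (probability, subtree)


@dataclass
class Decision:
    name: str
    options: List[Tuple[str, "Node"]]      # (option label, subtree)


Node = Union[Outcome, Chance, Decision]


def fold_back(node: Node, notes: List[str]) -> float:
    """Average out chance nodes, maximize at decision nodes, and log
    each decision's option values as a checkable explanation step."""
    if isinstance(node, Outcome):
        return node.utility
    if isinstance(node, Chance):
        return sum(p * fold_back(child, notes) for p, child in node.branches)
    values = {label: fold_back(child, notes) for label, child in node.options}
    best = max(values, key=values.get)
    notes.append(f"{node.name}: "
                 + ", ".join(f"{k}={v:.3f}" for k, v in values.items())
                 + f" -> {best}")
    return values[best]


# Hypothetical example: surgery is uncertain, waiting has a known utility.
tree = Decision("treatment", [
    ("surgery", Chance([(0.8, Outcome(0.9)), (0.2, Outcome(0.4))])),
    ("wait", Outcome(0.7)),
])
notes: List[str] = []
eu = fold_back(tree, notes)
```

Here `notes` holds the single line `treatment: surgery=0.800, wait=0.700 -> surgery`, which states the recommendation in terms a user can verify one comparison at a time.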

An exciting area of current interest is the improvement of RACHEL's facility for the
creation and assessment of parametric models of patient preferences. In particular, we
are trying to increase the generality of RACHEL’s model library to account for acute as
well as chronic conditions and to simplify the corresponding assessment process. This
simplification is based on the notion that a better understanding of the major concerns
of patients can help us redesign the questions asked by RACHEL so that they are closer
to the specific experiences of individual patients. As part of this effort, we expect to
have significant contact with actual patients to ensure the clinical relevance of our
research.

A fourth area where RACHEL is being enhanced is the expansion of its medical and
decision-analytic knowledge bases. Planned additions include further knowledge about
the treatment of tubal blockage (including more data on in-vitro fertilization
procedures and an ability to consider a wider class of patients) and a new packet of
knowledge dealing with deterministic sensitivity analysis.

In addition to the development of RACHEL, there are several active research programs
within the CAMDA project. One such program is aimed at developing a representation
for dynamic decision processes (such as those faced by cancer patients) that do not
necessarily satisfy the Markov assumption. This research has led to a generalization of
influence diagrams that allows multiple value nodes. This generalization makes it
possible for complex sequential decision processes (whose solution would otherwise be
infeasible) to be solved efficiently.

Another research program within the CAMDA project is the development of fast
algorithms for the solution of decision problems formulated as influence diagrams. In
general, the solution of an influence diagram (i.e., the calculation of a recommended
decision strategy) is obtained by the repeated application of an operation, known as
“removal”, to all nodes in the diagram other than the value node. The removal of a
node in the diagram is a generalization of the foldback operation needed to solve a
decision tree. With rare exceptions, the order in which nodes are removed from a
diagram is not unique. Current results indicate that significant reductions in the
computational burden of solution can be achieved by controlling the order in which
diagram nodes are selected for removal.
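Why removal order matters can be illustrated with a simplified analogue. In influence diagrams, node removal also involves arc reversal; the Python sketch below instead uses the closely related idea of eliminating nodes from an undirected graph, where removing a node connects all of its neighbours and the largest group of mutually connected nodes created is a common proxy for computational cost. The graph, the cost measure and the min-degree heuristic are swapped-in illustrations, not the CAMDA algorithms.

```python
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def _remove(g: Graph, v: str) -> int:
    """Remove v, connect its neighbours pairwise, and return the size of
    the clique formed (v plus its neighbours)."""
    nbrs = g.pop(v)
    for a, b in combinations(nbrs, 2):
        g[a].add(b)
        g[b].add(a)
    for n in nbrs:
        g[n].discard(v)
    return len(nbrs) + 1


def order_cost(adj: Graph, order: List[str]) -> int:
    """Largest clique created when nodes are removed in the given order."""
    g = {v: set(ns) for v, ns in adj.items()}
    return max(_remove(g, v) for v in order)


def min_degree_order(adj: Graph) -> List[str]:
    """Greedy heuristic: always remove a node with the fewest neighbours
    (ties broken alphabetically)."""
    g = {v: set(ns) for v, ns in adj.items()}
    order: List[str] = []
    while g:
        v = min(g, key=lambda x: (len(g[x]), x))
        _remove(g, v)
        order.append(v)
    return order


# Hypothetical star-shaped graph: hub H linked to four other nodes.
star: Graph = {"H": {"A", "B", "C", "D"},
               "A": {"H"}, "B": {"H"}, "C": {"H"}, "D": {"H"}}
```

On this graph, removing the hub first creates a five-node clique, while removing the leaves first never creates anything larger than two nodes; the greedy order finds the cheap sequence.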


At a more fundamental level, we are exploring the consolidation of the predicate
calculus with probabilistic logic. Of particular interest is the design of an integrated
inference engine that performs logical inferences within a probabilistic framework. A
central problem in this research is the definition of universal and existential
quantification in probabilistic terms.

A recent addition to our research project is a program to design cost-effective strategies
for monitoring the recurrence of bladder cancer. We expect this research to interact
with our ongoing search for more effective models of patient preferences.

D. Publications
1. Holtzman, S.: A Model of the Decision Analysis Process, Department of
Engineering-Economic Systems, Stanford University, Stanford, California,
1981.

2. Holtzman, S.: A Decision Aid for Patients with End-Stage Renal Disease,
Department of Engineering-Economic Systems, Stanford University,
Stanford, California, 1983.

3. Holtzman, S.: On the Use of Formal Models in Decision Making, Proc.
TIMS/ORSA Joint Nat. Mtg., San Francisco, May 1984.

4. (*) Holtzman, S.: Intelligent Decision Systems, Ph.D. Dissertation,
Department of Engineering-Economic Systems, Stanford University,
Stanford, California, 1985.

5. Shachter, R.: Evaluating Influence Diagrams, Department of Engineering-
Economic Systems, Stanford University, Stanford, California, 1984.

6. Shachter, R.: Automating Probabilistic Inference, Department of
Engineering-Economic Systems, Stanford University, Stanford, California,
1984.
E. Funding Support
E.I Principal Funding Source
E.I.1. Title of gift
“Research on Intelligent Decision Systems”
E.I.2. Principal investigator
Samuel Holtzman, Ph.D.
Consulting Assistant Professor
Department of Engineering-Economic Systems
Stanford University
E.I.3. Funding source
Olivetti Advanced Technology Center, Inc.
E.I.5. Funding amount

$33,400 (Direct Costs), unrestricted.


E.II Additional Funding Source
E.II.1. Title of gift

“Cost-effective strategies in monitoring for recurrence of
bladder cancer”

E.II.2. Principal Investigators

Ross Shachter, Ph.D. -- PI

Assistant Professor

Department of Engineering-Economic Systems
Stanford University

Linda Shortliffe, M.D. -- Co-PI
Palo Alto Veterans Administration Hospital

Dan Kent, M.D. -- Co-PI
Division of General Internal Medicine
Stanford University Medical Center

Samuel Holtzman, Ph.D. -- PI: CAMDA Project (SUMEX)
Consulting Assistant Professor

Department of Engineering-Economic Systems

Stanford University

E.II.3. Funding agency

Stanford’s American Cancer Society Institutional Research
Grant Committee

E.II.5. Total award
$4634 (Direct Costs), for the year starting April 1, 1985

E.III Other Funding

E.III.2 Donated Equipment

The CAMDA project has access to the facilities of the Decision Systems Laboratory
(DSL) in the Department of Engineering-Economic Systems, and constitutes the
laboratory's most active research project. The DSL maintains several terminals, printers
and personal computers for research on the development of computer-based decision
systems. The majority of the terminals and printers were donated to the DSL by Qume
Corporation. Olivetti Advanced Technology Center, Inc., has made four M24 personal
computers and two high-quality printers available to the DSL on a “Beta-test-site”
basis. MAD Computer, Inc., has also contributed to the support of the CAMDA project
through the consignment of a MAD-1 personal computer.

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE
II.A Medical Collaborations and Program Dissemination Via SUMEX


Since its inception, the CAMDA project has benefited from an active relationship
among decision analysts, computer scientists, and members of the Stanford medical
community. In particular, RACHEL is being developed in close cooperation with
physicians in the Infertility Clinic at Stanford. Other programs within the CAMDA
project, such as our research on the form and use of medical preference models, are
being carried out in cooperation with physicians at the Palo Alto Veterans Administration
Hospital and at El Camino Hospital.

II.B. Sharing and Interactions with other SUMEX-AIM Projects
II.B.1 SUMEX-AIM 1984 Workshop

Samuel Holtzman participated in the 1984 AIM workshop in Columbus, Ohio. In
addition to the presentation of a summary of CAMDA research, he had many
opportunities to interact with workshop participants on an informal basis. Of
particular interest were several discussions with members of the MIT/TUFTS group
interested in medical decision analysis which have led to an interchange of ideas that
continues to this date.

II.B.2 Decision Systems Laboratory Research Meetings

As part of the CAMDA project, we have instituted a weekly research meeting for those
interested in the design and implementation of computer-based decision systems. This
weekly meeting has become a very active forum for the presentation of research results.
The following topics of direct relevance to medical decision making were presented
during the last two academic quarters.

Date        Speaker            Topic

03-OCT-84   Ross Shachter      Probabilistic Inference
17-OCT-84   Jack Breese        Dempster-Shafer Theory
24-OCT-84   Kazuo Ezawa        Efficiency in Solving Influence Diagrams
07-NOV-84   Majid Khorram      Fuzzy Sets and Decision Making
14-NOV-84   Dan Kent           Utility Theory Underlying Physicians’ Treatment Thresholds: HELP!
21-NOV-84   Yann Bonduelle     Explanation in Decision Systems
09-JAN-85   Ross Shachter      What Do You Call the Offspring of SUPERID and INFLUENCE?
23-JAN-85   Doug Logan         The Value of Probability Assessment
06-FEB-85   Seok Hui Ng        Minimal Tumor Follow-up Examination Schedule for Recurrent Bladder Cancer Patients
13-FEB-85   Keh-Shiou Leu      TEIRESIAS’ Explanation Facility
06-MAR-85   Joe Tatman         Algorithm for Decision Processes Optimization
13-MAR-85   Gerald Liu (UC)    Knowledge Structure in Evidential Reasoning

II.B.3 Course in Medical Decision Analysis

A new course in medical decision analysis, taught by Prof. Samuel Holtzman, is being
offered for the first time during the Spring quarter of 1985. The course is offered
jointly by the Engineering-Economic Systems Department, the Medical Information
Sciences Program, and the Computer Science Department. The objective of the course
is to expose students to the practice of decision analysis for clinical purposes and to
introduce them to the design and use of computer-based medical decision tools.


II.C. Critique of Resource Management

The CAMDA project is heavily dependent upon the availability of the SUMEX
computing resource. The physical facility as well as the staff of SUMEX-AIM are
excellent. In particular, it has been a pleasure to deal with Ed Pattermann, who is
invariably courteous, responsive to our needs, and effective in his actions. We will
certainly miss him now that he has moved to industry. Pam Ryalls has also provided
much needed help in managing the CAMDA project in a manner that is friendly and
efficient.

As an update to last year's report, the previously reported Ethernet deficiencies have
been corrected. This improvement was part of a campus-wide effort to improve
Stanford's computer network which directly affected our campus connection to SUMEX.
The system load on SUMEX continues to be heavy, although it appears to be somewhat
lower than it was last year. Access to the DECSYSTEM-2020 machine operated by
SUMEX (referred to as TINY) has significantly improved our ability to demonstrate our
systems during normal business hours, further reducing our frustration with the main
system's load.

III. RESEARCH PLANS
III.A Project Goals and Plans

During the upcoming year, we intend to enhance four specific elements of the
RACHEL system: its inference mechanism, its explanation facility, its ability to model
patient preferences, and its medical and decision-analytic knowledge bases.
Furthermore, we intend to continue to improve our understanding of normative
decision methodologies, with particular emphasis on the use of these methodologies for
computer-based decision support. Section I.C.2 describes the near-term goals of the
CAMDA project in more detail. Our long-term goal remains that of designing and
implementing usable, fully-validated and documented systems for medical decision
support.

III.B Justification and Requirements for Continued SUMEX Use

The CAMDA project is truly interdisciplinary. It draws on elements of decision
analysis, artificial intelligence, and medical science. The project has the potential to
contribute to each of these disciplines in important ways.

In particular, the CAMDA project is likely to lead to the development of tools and
techniques that greatly improve the quality of decision making in medicine. For
instance, RACHEL explicitly considers uncertainty, decision alternatives, and patient
preferences in developing recommendations. In spite of its generality, RACHEL’s
interaction with the user is sufficiently terse and simple to support the claim that
systems based on its methodology can be effective clinical decision tools. Much of the
simplicity and terseness of RACHEL's operation is a direct consequence of the AI
foundations of the system's design.

The heavy reliance of the CAMDA effort on artificial intelligence technology makes
SUMEX-AIM an ideal environment in which to pursue this research.

III.C Needs and Plans for other Computing Resources beyond SUMEX-AIM

The CAMDA project has access to four Olivetti M24 and one MAD-1 personal
computers (IBM-PC type) as well as to one Apple Macintosh (128K) computer. In
addition, we continue to search for funds to acquire one or more state-of-the-art LISP
machines.

III.D Recommendations for Future Community and Resource Development


What would be the effect of imposing fees for using SUMEX resources (computing and
communications) if NIH were to require this?

A major benefit provided by the existing SUMEX-AIM facility is the availability of
very low-cost computing resources. Access to these resources is granted primarily on
the basis of an assessment of the value of the proposed research to the overall goal of
making artificial intelligence a useful medical tool. Imposing fees for using SUMEX
would prevent users with modest means from obtaining access to the facility on the
basis of merit alone.

Do you have plans to move your work to another machine or workstation and if so, when
and to what kind of system?

The CAMDA project has access to several personal computers for its research. These
machines include Olivetti M24's (marketed as the A.T.&T. personal computer in the
U.S.) and a MAD-1 personal computer -- all of which are compatible with the IBM-
PC. In addition, the project has purchased an Apple Macintosh. These machines are
used as a supplement to the SUMEX mainframe, and are not intended to replace it.


6.3.2. REFEREE Project

REFEREE Project

Bruce G. Buchanan, Ph.D.
Computer Science Department
Stanford University

Byron W. Brown, Ph.D.
Dept. of Biostatistics
Stanford University

Daniel E. Feldman, Ph.D., M.D.
Department of Medicine
Stanford University

I. SUMMARY OF RESEARCH PROGRAM
A. Project Rationale

The goal of this project is two-fold: (a) use existing AI methods to implement an
expert system that can critique medical journal articles on clinical trials, and (b) in the
long term, develop new AI methods that extract new medical knowledge from the
clinical trials literature. In order to accomplish (a) we are building the system in three
stages.

1. System I will assist in the evaluation of the quality of a single clinical trial.
The user will be imagined to be the editor of a journal reviewing a
manuscript for publication, but the program will be tested on a variety of
readers, including clinicians, medical scientists, medical and graduate
students, and clerical help.

2. System II will assist in the evaluation of the effectiveness of the treatment
or intervention examined in a single published clinical trial. The user will
be imagined to be a clinician interested in judging the efficacy of the
treatment being tested in the trial.

3. System III will assist in the evaluation of the effectiveness of a single
treatment examined in a number of published clinical trials.

B. Medical Relevance

The burden of "keeping up with the literature" is particularly onerous in the practice of
medicine and in medical research [62, 63]. Reading the abstracts in a few journals and
selecting several key articles for a rapid survey are the best that most clinicians can
hope to accomplish each week. The time and effort necessary for a thorough and
critical reading of even a few research reports are not available!¹ Sackett reports that
to keep up with the 10 leading journals in internal medicine a clinician must read 200
articles and 70 editorials per month [63]. It was also estimated that the biomedical
literature is expanding at a compound rate of 6% to 7% per year, or doubling every 10
to 15 years [63, 59]. Furthermore, even if more time were available, the statistical and
epidemiological skills necessary for critical reading are not part of most clinicians’
repertoires²; and yet decisions about which therapy to use, what intervention to adopt,
or what advice to give patients must be based on a combination of clinical experience
and published literature. But the existing literature is often confusing and
contradictory [42], and publication in the most prestigious medical journals does not
guarantee freedom from serious methodologic flaws and erroneous conclusions [44, 18].
Any assistance to the clinician must deal with both the vastness of the
literature and the quality of the research reports. Similar problems are faced by the
editors of medical journals, swamped with manuscripts to review and evaluate, and by
research scientists and academicians trying to stay abreast of the developments in their
fields. How can they cover more and yet evaluate better and more consistently?
Clearly any machine assistance would be welcome.

¹ In an informal check on this intuition, two of us with considerable training in analyzing
clinical trials (BWB and DEF) timed critical readings of a five-page article on a clinical trial
in the New England Journal of Medicine [4]. Our times were 30 and 120 minutes.

C. Highlights of Progress
This project is just getting started.

Preliminary work has been done on REFEREE [23], a prototype expert system for
determining the quality of a clinical trial report, and the efficacy of the intervention
evaluated in the trial. REFEREE is written in EMYCIN, a rule-based programming
language which allows rapid prototyping of a consultation system that gives advice to a
user. It presupposes that a knowledge base about the problem area has been
constructed, which usually involves codifying an expert's knowledge.

The basic format of a REFEREE session is fairly simple. The reader is asked a series
of questions pertaining to the paper and the study described. The answers given are
used to rate the overall quality of the paper and the probable efficacy of the treatment
described. (See sample dialogs below).

In the first version of REFEREE, after the program has finished with its chain of
questions and deductions, the quality of the paper and the efficacy of the drug are
given to the user as a “merit score", an integer between 0 and 10, with 10 indicating the
highest quality. Additionally, the user is provided with a series of English language
messages indicating the main flaws detected in the paper. The merit score was used
because the expert system makes its judgements by using a weighted average of values
assigned to each aspect of the paper being critiqued. As the user answers the
consultant's questions, the answers are given individual merit scores. For example, if
the user's answers indicate that experimental blinding was done correctly, the paper is
given a high score in the blinding category. When all merit score assignments have
been made, the total merit score is calculated as a weighted average of the categorical
merit scores, with those categories that are more crucial to a good paper or clinical trial
being given a higher weight.
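The first version's scoring scheme, together with the flaw messages it produced for weak categories, can be sketched in Python as below. The category names, their weights and the minimum cut-off are hypothetical, since the report gives neither REFEREE's categories in full nor its weights; only the structure (a weighted average of per-category integer scores, plus a note for each category below an arbitrary minimum) follows the description.

```python
from typing import Dict, List, Tuple

# Hypothetical weights: categories judged more crucial get a larger weight.
WEIGHTS: Dict[str, float] = {
    "randomization": 3.0,
    "blinding": 2.0,
    "stopping_rule": 1.0,
    "literature_review": 1.0,
}
MINIMUM = 5  # hypothetical cut-off below which a flaw message is produced


def merit_score(scores: Dict[str, int],
                weights: Dict[str, float] = WEIGHTS,
                minimum: int = MINIMUM) -> Tuple[int, List[str]]:
    """Weighted average of 0-10 category scores, rounded to an integer,
    plus the categories that fall below the minimum."""
    total_weight = sum(weights[c] for c in scores)
    average = sum(weights[c] * s for c, s in scores.items()) / total_weight
    flaws = [c for c, s in scores.items() if s < minimum]
    return round(average), flaws


paper = {"randomization": 2, "blinding": 4,
         "stopping_rule": 3, "literature_review": 8}
score, flaws = merit_score(paper)
```

With these made-up inputs the weighted average is 25/7, about 3.57, so the paper receives a merit score of 4, and randomization, blinding and the stopping rule are flagged. The single strong category barely moves the total, which illustrates why a bare integer says little without the accompanying messages.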

The final result of this calculation is a number between 1 and 10 which serves as a
quality measure for the paper or the treatment. A 1 indicates low quality; a 10 indicates
the highest quality. An integer as a final result, however, can be very cryptic. It is
usually quite difficult, given just an integer, to understand or believe the findings of the
consultant. It was discovered quite early that users, when presented with just the bare
merit score of the paper, would want to know why the paper was rated in the way it
was. For this reason, English language statements are given to the user, indicating the
nature of the main flaws of the paper. In each category, if the calculated merit score is
found to be less than an arbitrary minimum, this is noted in a sentence or two and
given to the user at the end of the consultation. In this way, the user not only gets an
overall picture of the quality of the paper, but also an indication of the general areas
in which the paper was found to be lacking.

² A recent survey of the statistical methods used by authors in the New England Journal of
Medicine indicated that 42 per cent of the articles surveyed relied on statistical analysis
beyond descriptive statistics [15].

Several problems were found in the original version of REFEREE. It was discovered
that the use of a weighted average precluded the use of EMYCIN's certainty factors.
Because of this, the user would often be forced to choose from a fairly limited set of
possible answers to the consultant's questions. The lack of versatility implied by this
constraint dictated that a new approach which could make full use of EMYCIN's
certainty factors should be used.

In order to do this, the old rule base was scrapped, and a new one was written. Instead
of deciding on a rating between one and ten to indicate quality, the new version simply
decides whether or not the paper in question is of “high academic and scholarly
quality”, with an EMYCIN certainty factor modifying the conclusion. For example, in
the case of a mediocre paper, the program would conclude that the paper was of “high
quality”, but only with a certainty of, say, .5 on a scale between -1 and 1. Though the
words “certainty factor" are used for historical reasons, our final number is the
equivalent of a merit score.

While at first glance the two approaches seem similar, the second approach was found
to be much more flexible and satisfying from the user's standpoint. Since the
conclusion is in terms of the program's certainty that the paper's quality is good, the
user may incorporate his or her own uncertainty into the dialogue with the program.
This was accomplished by asking mainly yes/no questions, and at all times allowing the
user to indicate his or her certainty in the answers given. Thus, if the program asks
the user if the quality of the paper's literature review was high, he or she can answer
simply “yes” or "no", indicating complete confidence in the answers, or modify a
yes/no answer with a certainty factor, indicating that he or she is not completely
certain. The user's answers, along with the uncertainty indicated by him or her, will be
combined by EMYCIN to give a final conclusion on the paper's quality.

As an example, one of the old-style rules might have been something like this: If the
user indicates that the literature review is of "poor quality", conclude that the merit of
the paper is 3 with a (built-in) weight of 2. After all the merit values had been
calculated, a weighted average (using built-in weights) would be taken to arrive at the
final merit score. In contrast, one of the new rules would be of the form: If the user
gives a “yes” answer to the question “Is the literature review thorough and balanced?”,
conclude that the paper is of good quality with a certainty of .3. While in the first
case the user was limited to a set of possible answers (e.g. excellent, good, poor), the
second rule gives the user the opportunity to answer either yes or no, and qualify that
answer with any degree of certainty desired. If, in the second rule, the user gives a
certainty of less than 1 that the literature review was of good quality, the inferred
conclusion about the quality of the paper will be automatically downgraded as well. In
other words, if the user expresses uncertainty, the conclusion about the quality of the
paper will be less certain.
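The downgrading described above can be sketched in Python using the certainty-factor arithmetic of MYCIN, on which EMYCIN is based: a rule's conclusion receives the rule's CF multiplied by the user's certainty in the premise (the rule does not fire if that certainty is 0.2 or less), and evidence from several rules is combined with the parallel-combination function. We assume here that REFEREE's rules use these standard conventions; the second rule's CF of .5 is hypothetical.

```python
def apply_rule(premise_cf: float, rule_cf: float,
               threshold: float = 0.2) -> float:
    """CF contributed by a one-premise rule: the rule's CF attenuated by
    the user's certainty in the premise; weak premises do not fire."""
    return rule_cf * premise_cf if premise_cf > threshold else 0.0


def combine(cf1: float, cf2: float) -> float:
    """MYCIN-style combination of two CFs in [-1, 1] bearing on the
    same conclusion."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)
    denom = 1 - min(abs(cf1), abs(cf2))
    if denom == 0:
        raise ValueError("combining certainties of 1 and -1 is undefined")
    return (cf1 + cf2) / denom


# "Is the literature review thorough and balanced?" -> good quality, CF .3
certain_yes = apply_rule(1.0, 0.3)   # user answers a plain "yes"
hedged_yes = apply_rule(0.6, 0.3)    # user answers "yes" with certainty .6
# A second, hypothetical rule supports good quality with CF .5.
overall = combine(hedged_yes, apply_rule(1.0, 0.5))
```

A plain “yes” yields .3, a hedged “yes” at .6 yields .18, and combining the hedged result with the second rule gives .18 + .5 × (1 − .18) = .59.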

The new approach, in addition to supplying the user with the ability to express varying
degrees of uncertainty, also allows for a hierarchical question structure. At any point,
if the user is unsure of the appropriate response, the program can prompt with further,
more detailed questions, until a conclusion about the original question can be provided.
Conversely, whenever a user is willing to give an answer, the program will refrain from
dwelling on the issue and omit its long series of sub-questions. In this manner the
amount of detail provided can be individualized.
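A minimal Python sketch of this hierarchical questioning follows. A question is asked first; only if the user declines to answer (modelled here as `None`) does the program descend into its sub-questions. We assume, for illustration, that sub-questions are combined as a conjunction using the MYCIN convention of taking the minimum CF; the `Question` class and the question texts are hypothetical, and the real system would express this through EMYCIN rules.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Question:
    text: str
    sub_questions: List["Question"] = field(default_factory=list)


def certainty(q: Question, answer: Callable[[str], Optional[float]]) -> float:
    """Return a CF for q, asking sub-questions only when the user is unsure."""
    cf = answer(q.text)
    if cf is not None:
        return cf               # user answered: skip the sub-questions
    if not q.sub_questions:
        return 0.0              # unknown and nothing further to ask
    # Unsure: drill down, treating sub-questions as a conjunction (min).
    return min(certainty(s, answer) for s in q.sub_questions)


review = Question("Is the literature review thorough and balanced?", [
    Question("Are the relevant prior trials cited?"),
    Question("Are conflicting results discussed?"),
])
```

A user who answers the top question directly is asked one question; a user who is unsure is taken through the two more detailed ones, so the amount of detail adapts to the reader.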

This current version of REFEREE has two hundred rules and has been tested by the
present research team on several papers. It is this program that will be expanded as
described in Section III-A. Part of a sample consultation is shown below.


-------- MEDICINE-1--------
The first paper of MEDICINE-1 will be referred to as:

-------- PAPER-1--------
-------- STATISTICS-1--------

1) What is the size of the control sample?
** 25

2) How many of the subjects in the control sample responded to
treatment?
** 14

3) What is the size of the test sample?
** 23

4) How many of the subjects in the test sample responded to
treatment?
** 23

-------- PLANNING-1--------

9) Was there an explicit stopping rule defined before the experiment
was run?
** N

-------- RANDOMIZATION-1--------

10) Was there any mention of the use of randomization in patient
assignment?
** Y

11) Was the assignment of subjects in the experiment performed blindly?
** UN

-------- BLINDING-1--------

16) Was the experiment double blinded, or was any mention made of
blinding in the experiment?
**

17) Was there any mention of an effort to make the placebo and
medication as similar as possible?
** N

...

The strength of the evidence indicating the efficacy of PAPER-1 is as
follows:

There is some evidence for efficacy, but further study is needed.

The general quality of the paper is as follows:
The current paper is of poor quality.

The flaws of the current paper are as follows:

A stopping rule was not defined or was not adhered to in the
experiment.

The measures taken to evaluate subject compliance were inadequate or
non-existent.

Subjects were not randomly assigned treatment groups, seriously
weakening the validity of the conclusions.

Though an effort was made to blind the experiment, the techniques
used were not effective.

The final calculated efficacy of the drug as indicated by the given clinical
trial (between 0 and 10, with a score of 10 being the highest) is as
follows:
5.

The final merit of the current paper is as follows:
3.

23) Are there any other papers on MEDICINE-1?
** N

24) Do you want the results of this consultation output to a file?
** N


E. Funding Support

Grant applications submitted to the NLM:

Title: Understanding and Critiquing Clinical Trials Literature

PI's: Bruce G. Buchanan, Byron W. Brown

Agency: National Library of Medicine (Pending)

Total Amount: $178,923.

Dates: July 1, 1985 - June 30, 1988

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE
A. Medical Collaborations

Dr. D. Feldman is a physician and epidemiologist at the Stanford Center for Disease
Prevention. Prof. B. Brown is currently teaching a Medical School class on reading
medical journal articles.

B. Interactions with other SUMEX-AIM projects

Our interactions have all been through the Knowledge Systems Laboratory where we
have discussed design and implementation issues.

C. Critique of Resource Management

The SUMEX staff has been most cooperative in helping get this project started. We
have tried to place few demands on the SUMEX staff, but have received prompt
answers to all questions.

III. RESEARCH PLANS
A. Goals & Plans

It is proposed to construct three computer-based expert systems to assist a variety of
different readers in the evaluation of an extensive but well defined area of the medical
literature, clinical trials. It is further proposed to test the hypothesis that such
programs will enable a variety of users to read the literature on clinical trials
more critically and more rapidly.

The expert systems will be developed using the EMYCIN programming environment
and the production rule approach followed successfully in previous expert systems
[24, 36, 43, 48, 6].

The three programs to be developed are separate, but closely related:

1. System I will assist in the evaluation of the quality of a single clinical trial.
The user will be imagined to be the editor of a journal reviewing a
manuscript for publication, but the program will be tested on a variety of
readers, including clinicians, medical scientists, medical and graduate
students, and clerical help.

2. System II will assist in the evaluation of the effectiveness of the treatment
or intervention examined in a single published clinical trial. The user will
be imagined to be a clinician interested in judging the efficacy of the
treatment being tested in the trial.

3. System III will assist in the evaluation of the effectiveness of a single
treatment examined in a number of published clinical trials.

Within the duration of this research it is also proposed to test the first two systems
against unassisted evaluations by the various categories of readers. The testing will
include a formal testing of the programs by comparing the speed and number of flaws
found in using the program with similar measurements on unassisted reading. In
addition there will be a more informal evaluation by questionnaire of the subjective
impressions of users of the program, ascertaining the likelihood of routine use and the
value of such a program to the user.

This proposal with its concentration on clinical trials is regarded as the initial step in a
more general research goal - building computer systems to help the clinician and
medical scientist read the medical literature more critically.

B. Justification for continued SUMEX use

We will continue to use SUMEX for developing the AI methods. We need EMYCIN at
the moment because it provides a good environment for building a rule-based system
that may grow to many hundreds of rules. EMYCIN is not available on other machines
without substantial cost.

C. Need for other computing resources

In the short term we will not need additional resources. Should we decide to
implement a new system in a framework other than EMYCIN, we might seek funding
to buy a LISP workstation.

D. Recommendations
Although our use has been small, we find the load average on SUMEX often precludes
running test cases during the day. We have no specific recommendation, but would like
to have access to small amounts of high-quality computer time.


6.4. National AIM Pilot Projects

Following is a description of the informal pilot projects currently using the national
AIM portion of the SUMEX-AIM resource, pending funding, full review, and
authorization.


6.4.1. PATHFINDER Project

PATHFINDER Project

Bharat Nathwani, M.D.
Department of Pathology
University of Southern California

Lawrence M. Fagan, M.D., Ph.D.
Department of Medicine
Stanford University

I. SUMMARY OF RESEARCH PROGRAM
A. Project Rationale

Our project addresses difficulties in the diagnosis of lymph node pathology. Five studies
from cooperative oncology groups have documented that, while experts show agreement
with one another, the diagnosis made by practicing pathologists may have to be changed
by expert hematopathologists in as many as 50% of the cases. Precise diagnoses are
crucial for the determination of optimal treatment. To make the knowledge and
diagnostic reasoning capabilities of experts available to the practicing pathologist, we
have developed a pilot computer-based diagnostic program called PATHFINDER. The
project is a collaborative effort of the University of Southern California and the
Stanford University Medical Computer Science Group. A pilot version of the program
provides diagnostic advice on 80 common benign and malignant diseases of the lymph
node based on 150 histologic features. Our research plans are to develop a full-scale
version of the computer program by substantially increasing the quantity and quality of
knowledge and to develop techniques for knowledge representation and manipulation
appropriate to this application area. The design of the program has been strongly
influenced by the INTERNIST/CADUCEUS program developed on the SUMEX
resource.

A group of expert pathologists from several centers in the U.S. has shown interest in
the program and helped to define the structure of the knowledge base for the
PATHFINDER system.

B. Medical Relevance and Collaboration

One of the most difficult areas in surgical pathology is the microscopic interpretation
of lymph node biopsies. Most pathologists have difficulty in accurately classifying
lymphomas. Several cooperative oncology group studies have documented that while
experts show agreement with one another, the diagnosis rendered by a “local”
pathologist may have to be changed by expert lymph node pathologists (expert
hematopathologists) in as many as 50% of the cases.

The National Cancer Institute recognized this problem in 1968 and created the
Lymphoma Task Force, which is now identified as the Repository Center and the
Pathology Panel for Lymphoma Clinical Studies. The main function of this expert
panel of pathologists is to confirm the diagnosis of the “local” pathologists and to
ensure that the pathologic diagnosis is made uniform from one center to another so
that the comparative results of clinical therapeutic trials on lymphoma patients are
valid. An expert panel approach is only a partial answer to this problem. The panel is
useful in only a small percentage (3%) of cases; the Pathology Panel annually reviews
only 1,000 cases, whereas more than 30,000 new cases of lymphoma are reported each
year. A panel approach to diagnosis is therefore not practical, and lymph node pathology
cannot be routinely practiced in this manner.
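
The 3% figure follows from the two annual counts given above. Taking 30,000 as the number of new cases (the text says "more than 30,000", so the true share is somewhat lower), the fraction the Panel can review is:

```latex
\frac{1{,}000 \text{ cases reviewed per year}}{30{,}000 \text{ new cases per year}} \approx 0.033 = 3.3\%
```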

We believe that practicing pathologists do not see enough case material to maintain a
high level of diagnostic accuracy. The disparity between the experience of expert
hematopathology teams and those in community hospitals is striking. An experienced
hematopathology team may review thousands of cases per year. In contrast, in a
community hospital, an average of only 10 new cases of malignant lymphomas are
diagnosed each year. Even in a university hospital, only approximately 100 new
patients are diagnosed every year.

Because of the limited numbers of cases seen, pathologists may not be conversant with
the differential diagnoses consistent with each of the histologic features of the lymph
node; they may lack familiarity with the complete spectrum of the histologic findings
associated with a wide range of diseases. In addition, pathologists may be unable to
fully comprehend the conflicting concepts and terminology of the different
classifications of non-Hodgkin's lymphomas, and may not be cognizant of the
significance of the immunologic, cell kinetic, cytogenetic, and immunogenetic data
associated with each of the subtypes of the non-Hodgkin's lymphomas.

To promote the accuracy of the knowledge base, we will have participants from
multiple institutions collaborating on the project. Dr. Nathwani will be joined by
experts from Stanford (Dr. Dorfman), St. Jude's Children's Research Center in
Memphis (Dr. Berard), and City of Hope (Dr. Burke).

C. Highlights of Research Progress
C.1 Accomplishments This Past Year

Since the project's inception in September 1983, we have constructed several versions of
PATHFINDER. The first several versions of the program were rule-based systems like
MYCIN and ONCOCIN, which were developed earlier by the Stanford group. We soon
discovered, however, that the large number of overlapping features in diseases of the
lymph node would make a rule-based system cumbersome to implement. We next
considered the construction of a hybrid system, consisting of a rule-based algorithm
that would pass control to an INTERNIST-like scoring algorithm if it could not
confirm the existence of classical sets of features. We finally decided that a modified
form of the INTERNIST program would be most appropriate. The original version of
PATHFINDER is written in the computer language Maclisp and runs on the SUMEX
DEC-20. This was transferred to Portable Standard Lisp (PSL) on the DEC-20, and
later transferred to PSL on the HP 9836 workstations. Two graduate students, David
Heckerman and Eric Horvitz, designed and implemented the program.
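
The hybrid design considered above (a rule-based layer that hands control to a scoring algorithm only when it cannot confirm a classical set of features) can be sketched as follows. This is a hypothetical Python illustration, not PATHFINDER's Maclisp or PSL code: the disease and feature names, the CLASSICAL_SETS table, and the placeholder scorer are all invented for the example, and the scorer stands in for, but does not implement, an INTERNIST-like ranking.

```python
from typing import Callable, Dict, List, Tuple

# A case maps each feature name to the single value observed for it.
Findings = Dict[str, str]

# Hypothetical "classical" feature sets (illustrative names only): a disease is
# confirmed outright when every listed feature-value pair is present in the case.
CLASSICAL_SETS: Dict[str, Findings] = {
    "disease_A": {"feature_1": "prominent", "feature_2": "present"},
    "disease_B": {"feature_3": "absent"},
}


def confirm_classical(findings: Findings) -> List[str]:
    """Rule-based layer: return every disease whose classical set is fully matched."""
    return [
        disease
        for disease, required in CLASSICAL_SETS.items()
        if all(findings.get(f) == v for f, v in required.items())
    ]


def hybrid_diagnose(
    findings: Findings,
    scorer: Callable[[Findings], List[str]],
) -> Tuple[str, List[str]]:
    """Try the rules first; pass control to the scorer only if no classical
    set is confirmed. Returns (layer that answered, diagnoses)."""
    confirmed = confirm_classical(findings)
    if confirmed:
        return "rules", confirmed
    return "scoring", scorer(findings)


if __name__ == "__main__":
    def placeholder_scorer(findings: Findings) -> List[str]:
        # Stand-in for an INTERNIST-like ranking; not PATHFINDER's algorithm.
        return ["disease_A", "disease_B"]

    print(hybrid_diagnose({"feature_1": "prominent", "feature_2": "present"}, placeholder_scorer))
    print(hybrid_diagnose({"feature_1": "slight"}, placeholder_scorer))
```

The first call is answered by the rules; the second matches no classical set and falls through to the scorer. The text notes that overlapping features made the rule layer cumbersome, which is why the final design relies on scoring alone.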

C.2 The PATHFINDER knowledge base

The basic building block of the PATHFINDER knowledge base is the disease profile or
frame. The disease frame consists of features useful for diagnosis of lymph node
diseases. Currently these features include histopathological findings seen at both
low- and high-power magnification. Each feature is associated with a list of
exhaustive and mutually exclusive values. For example, the feature pseudofollicularity
can take on any one of the values absent, slight, moderate, or prominent. These lists of
values give the program access to severity information. In addition, these lists
eliminate obvious interdependencies among the values for a given feature. For example,
if pseudofollicularity is moderate, it cannot also be absent.
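
A feature with an exhaustive, mutually exclusive value list can be modeled so that a case holds exactly one value per feature, which rules out "moderate and absent" by construction. The Python sketch below is a hypothetical illustration, not PATHFINDER's representation: the class names Feature and CaseFindings are invented, and it assumes (as the pseudofollicularity example suggests) that the value list is ordered from least to most severe, so a value's position can serve as its severity rank.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class Feature:
    """A histologic feature with an exhaustive, mutually exclusive value list,
    assumed here to be ordered from least to most severe."""
    name: str
    values: Tuple[str, ...]

    def severity(self, value: str) -> int:
        # Position in the ordered list doubles as a severity rank.
        if value not in self.values:
            raise ValueError(f"{value!r} is not a legal value of {self.name}")
        return self.values.index(value)


PSEUDOFOLLICULARITY = Feature(
    "pseudofollicularity", ("absent", "slight", "moderate", "prominent")
)


@dataclass
class CaseFindings:
    """Observed findings for one case: at most one value per feature."""
    vocabulary: Dict[str, Feature]
    observed: Dict[str, str] = field(default_factory=dict)

    def record(self, feature_name: str, value: str) -> None:
        feature = self.vocabulary[feature_name]
        feature.severity(value)  # raises ValueError if the value is not in the list
        # Keyed by feature name, so a new value replaces the old one: a case
        # cannot hold "moderate" and "absent" for the same feature at once.
        self.observed[feature_name] = value


if __name__ == "__main__":
    case = CaseFindings({PSEUDOFOLLICULARITY.name: PSEUDOFOLLICULARITY})
    case.record("pseudofollicularity", "moderate")
    print(case.observed, PSEUDOFOLLICULARITY.severity("moderate"))
```

Storing findings in a dictionary keyed by feature is one simple way to enforce mutual exclusivity; rejecting values outside the list enforces that the list is the complete vocabulary for that feature.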

Evoking strengths and frequencies are associated with each feature-value pair in a
