Knowledge Engineering for Medical Decision Making: A Review of Computer-Based Clinical Decision Aids

EDWARD H. SHORTLIFFE, BRUCE G. BUCHANAN, AND EDWARD A. FEIGENBAUM

Abstract—Computer-based models of medical decision making account for a large portion of clinical computing efforts. This article reviews representative examples from each of several major medical computing paradigms. These include 1) clinical algorithms, 2) clinical databanks that include analytic functions, 3) mathematical models of physical processes, 4) pattern recognition, 5) Bayesian statistics, 6) decision analysis, and 7) symbolic reasoning or artificial intelligence. Because the techniques used in the various systems cannot be examined exhaustively, the case studies in each category are used as a basis for studying general strengths and limitations. It is noted that no one method is best for all applications. However, emphasis is given to the limitations of early work that have made artificial intelligence techniques and knowledge engineering research particularly attractive. We stress that considerable basic research in medical computing remains to be done and that powerful new approaches may lie in the melding of two or more established techniques.

Manuscript received December 13, 1978; revised February 20, 1979. The authors are with the Heuristic Programming Project, Departments of Medicine and Computer Science, Stanford University, Stanford, CA 94305.

I. INTRODUCTION

AS EARLY as the 1950's, physicians and computer scientists recognized that computers could assist with clinical decision making [63] and began to analyze medical diagnosis with a view to the potential role of automated decision aids in that domain [61]. Since that time a variety of techniques have been applied, accounting for at least 800 references in the clinical computing literature [112]. In this article we review several medical decision making paradigms and discuss some issues that account for both the multiplicity of approaches and the limited clinical success of most systems developed to date. Because other authors have reviewed computer-aided diagnosis [47], [92], [114] and the potential impact of computers in medical care [93], our emphasis here is somewhat different. We will focus on the symbolic representation and use of knowledge, termed "knowledge engineering," and the inadequacies of data-intensive techniques which have led to the exploration of novel symbolic reasoning approaches during the last decade.

A. Reasons for Attempting Computer-Aided Medical Decision Making

Because of the accelerated growth in medical knowledge, physicians have tended to specialize and to become more dependent upon assistance from other experts when presented with a complex problem outside their own area of expertise. The primary care physician who first sees a patient has thousands of tests available with a wide range of costs (both fiscal and physical) and potential benefits (i.e., arrival at a correct diagnosis or optimal therapeutic management). Even the experts in a specialized field may reach very different decisions regarding the management of a specific case [131]. Diagnoses that are made, and upon which therapeutic decisions are based, have been shown to vary widely in their accuracy [26], [83], [89].
Furthermore, medical students usually learn about decision making in an unstructured way, largely through observation and by emulating the thought processes they perceive to be used by their clinical mentors [53].

Thus the motivations for attempts to understand and automate the process of clinical decision making have been numerous [114]. They are directed both at diagnostic models and at assisting with patient management decisions. Among the reasons for introducing computers into such work are the following:

1) to improve the accuracy of clinical diagnosis through approaches that are systematic, complete, and able to integrate data from diverse sources;
2) to improve the reliability of clinical decisions by avoiding unwarranted influences of similar but not identical cases (a common source of bias among physicians), and by making the criteria for decisions explicit, and hence reproducible;
3) to improve the cost efficiency of tests and therapies by balancing the expenses of time, inconvenience, or funds against benefits and risks of definitive actions;
4) to improve our understanding of the structure of medical knowledge, with the associated development of techniques for identifying inconsistencies and inadequacies in that knowledge; and
5) to improve our understanding of clinical decision making, in order to improve medical teaching and to make computer programs more effective and easier to understand.

B. The Distinction Between Data and Knowledge

The models on which computer systems base their clinical advice range from data-intensive to knowledge-intensive approaches. There are at least four types of knowledge that may be distinguished from pure statistical data:

1) knowledge derived from data analysis (largely numerical);
2) judgmental or subjective knowledge;
3) scientific or theoretical knowledge;
4) high-level strategic knowledge or "self-knowledge."

If there is a chronology to the field over the last 20 years, it is that there has been progressively less dependence on "pure" observational data and more emphasis on higher level symbolic knowledge inferred from primary data. We include with domain knowledge the category of "judgmental knowledge," which reflects the experience and opinions of an expert regarding an issue about which the formal data may be fragmentary or nonexistent. Since many decisions made in clinical medicine depend upon this kind of judgmental expertise, it is not surprising that investigators should begin to look for ways to capture and use the knowledge of experts in decision making programs. Another reason to move away from purely data-intensive programs is that in medicine the primary data available to decision makers are far from objective [20], [57]. They include subjective reports from patients, and error-prone observations [27]. Also, the terminology used in the reports is not standardized [7] and the classifications often overlap. Thus decision making aids must be knowledgeable about the unreliability of the data [57] as well as the uncertainty of the inference.

For example, data-intensive programs include medical record systems which accumulate large databanks to assist with decision making. There is little knowledge per se in the databank, but there are large amounts of data which can help with decisions and be analyzed to provide new knowledge.
A program that retrieves a patient's record for review, or even one that identifies and retrieves the records of similar patients (matching some set of descriptors), is performing a data management task with little reasoning involved [36], [86]. Although there is statistical "knowledge" contained in the conditional probabilities generated from such a databank and utilized for Bayesian analysis, it is all numeric. At the other extreme are systems that encode and use the kind of expert knowledge which cannot be easily gleaned from databanks or literature review [75], [102]. Systems that model human reasoning or emphasize education of users tend to fall towards this end of the data-knowledge continuum.

In addition to judgmental and statistical knowledge, there are other forms of information that can play an important role in computer-based clinical decision aids. For example, underlying scientific theories and relationships are often ignored by diagnostic programs but provide the foundation for decisions made by human experts. Consider, for example, the potential utility of techniques that could effectively represent and use the basic knowledge of biochemistry, biophysics, or detailed human physiology. Biomedical modeling research offers some mathematical techniques for encoding such knowledge in certain domains, but symbolic approaches and clinically useful applications are still largely unrealized.

Finally, there is another kind of knowledge used by human decision makers—an understanding of reasoning processes and strategies themselves. This kind of "high-level" or "meta-level" knowledge, if incorporated into computer programs, may not only heighten their decision making performance but also augment their acceptability to users by making them appear more aware of their own power, strategies, and limitations.

We use the term "knowledge engineering," then, to refer to computer-based symbolic reasoning issues such as knowledge representation, acquisition, explanation, and "self-awareness" or self-modification [19]. It is along these dimensions that knowledge-based programs differ most sharply from conventional calculations. For example, they can solve problems by pursuing a line of reasoning; the individual inference steps and the whole chain of reasoning may also form the basis for explanations of decisions. A major concern in knowledge engineering is clear separation of the medical knowledge in a program from the inference mechanism that applies that knowledge to the data of individual cases. One goal of this paper is to identify, in the strengths and weaknesses of earlier work, those issues which have motivated several current researchers to investigate the automation of clinical decision aids through knowledge engineering.
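The separation just described can be made concrete with a small sketch: domain knowledge is held as explicit rules, a generic inference engine chains them, and the resulting trace doubles as an explanation. The rules below are invented toy examples, not the content of any actual system.

```python
# Toy illustration of the knowledge engineering ideas above: medical
# knowledge (the rules) is kept separate from the generic inference
# engine, and the chain of inferences doubles as an explanation.
# The rules are fabricated examples, not real clinical logic.

RULES = [
    # (premises, conclusion, rationale)
    ({"fever", "stiff_neck"}, "suspect_meningitis",
     "fever with a stiff neck suggests meningitis"),
    ({"suspect_meningitis"}, "recommend_lumbar_puncture",
     "suspected meningitis warrants a lumbar puncture"),
]

def forward_chain(findings, rules):
    """Generic inference engine: apply rules until nothing new follows."""
    known, trace = set(findings), []
    changed = True
    while changed:
        changed = False
        for premises, conclusion, rationale in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                trace.append((premises, conclusion, rationale))
                changed = True
    return known, trace

known, trace = forward_chain({"fever", "stiff_neck"}, RULES)
for premises, conclusion, rationale in trace:  # the trace is the explanation
    print(f"Because {sorted(premises)} -> {conclusion}: {rationale}")
```

Replacing the rule set changes the program's medical knowledge without touching the inference mechanism; it is this property that the knowledge engineering approach seeks to exploit.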
C. Parameters for Assessing Work in the Field

Barriers to successful implementation of computer-based diagnostic systems have been analyzed on several occasions [7], [23], [106] and need not be reviewed here. However, in assessing programs it is pertinent to examine several parameters that affect the success and scope of a particular system in light of its intended users and application. Unfortunately, the medical computing literature has few descriptions of systems for which all the following issues can be assessed.

1) How accurate is the program?¹
2) What is the nature of the knowledge in the system and how is it generated or acquired?
3) How is the clinical knowledge represented, and how does it facilitate the performance goals of the system described?
4) How are knowledge and clinical data used and how does this impact system performance?
5) Is the system accepted by the users for whom it is intended? Is the interface with the user adequate? Does the system function outside of a research setting and is it suitable for dissemination?
6) What are the limitations of the approach?

¹ Although this is important, it is not the only measure of clinical effectiveness. For example, the effects on morbidity, mortality, and length of hospital stay may also be important parameters. As we shall show, few systems have reached a stage of implementation where these parameters could be assessed. Moreover, because of the complexity of the interacting influences that affect the usual measures of outcome, it may be difficult ever to define the marginal benefit of such systems.

An issue we have chosen not to address is the cost of a system, including the size of the required computing resource. Not only is information on this question scanty for most of the programs, but expenses generated in a research and development environment do not realistically reflect the costs one expects from a system once it is operating for service use.

D. Overview of this Paper

An exhaustive review of computer-aided diagnosis will not be attempted in light of the vastness of the field, and we have therefore chosen to present the prominent paradigms by discussing representative examples. In separate sections we give an overview, example, and discussion of 1) clinical algorithms, 2) databank analysis, 3) mathematical models, 4) pattern recognition, 5) Bayesian analysis, 6) decision theory, and 7) symbolic reasoning. We close each section by identifying the range of applications for which the approach appears most appropriate, the limitations of the approach, and the ways in which symbolic reasoning techniques may strengthen the approach by improving its performance or acceptability.

The seven principal examples we have selected are not necessarily the best nor the most successful; however, they illustrate the issues we wish to discuss within the major paradigms. We have also referenced other closely related systems, so the bibliography should guide the reader to more details on particular topics. Any attempt to categorize programs in this way is inherently fraught with problems in that several systems draw upon more than one paradigm. Thus we have occasionally felt obligated to simplify a topic for clarity in light of the overall purposes of this review and the limitations of the space available to us.

Because we are only interested here in decision making tools for use by clinicians, we have chosen to disregard systems that are designed primarily for use by researchers [39], [50], [65], [90]. Furthermore, we shall not discuss biomedical engineering applications of computers, such as advanced automated instrumentation techniques (e.g., computerized tomography²) or signal processing techniques (e.g., programs for EKG analysis [79] or patient monitoring [116]). Because they do not explicitly make inferences, we have also omitted programs designed largely for data storage and retrieval with the actual analysis and decision making left to the clinician [36], [58], [124]. We have also chosen to discuss working computer programs rather than unimplemented theories or early reports of work in progress.

² See Kak's article in this issue.
II. CLINICAL ALGORITHMS AND AUTOMATION

A. Overview

Clinical algorithms, or protocols, are flowcharts to which a diagnostician or therapist can refer when deciding how to manage a patient with a specific clinical problem [97]. Such protocols usually allow decisions to be made by carefully following the simple branching logic, although there are built-in safeguards whereby referrals to experts are made if a patient is unusually complex. The value of a protocol depends upon the infrequency with which such referrals are made, so it is important to design algorithms that reflect an appropriate balance between safety and efficiency. In general, algorithms have been designed by expert physicians for use by paramedical personnel who have been entrusted with the performance of certain routine clinical-care tasks.³ The methodology has been developed in part because of a desire to define basic medical logic concisely so that detailed training in pathophysiology would not be necessary for ancillary practitioners. Experience has shown that intelligent high school graduates, selected in large part because of poise and warmth of personality, can provide excellent care guided by protocols after only four to eight weeks of training. This care has been shown to be equivalent to that given by physicians for the same limited problems, and to be accepted by physicians and patients alike for such diverse clinical situations as diabetes management [56], [66], pharyngitis [38], headache [37], and other disease categories [104], [110].

³ Clinical algorithms have also been prepared for use by physicians themselves, but Grimm has found that they are generally less well-accepted by doctors [38]. He showed, however, that physician performance could improve when protocols were used in certain settings.

The role of the computer in such applications has been limited, however. In fact, several groups initially experimented with computer representation of the algorithms but have since abandoned the efforts and resorted to prepared paper forms [56], [110]. In these cases the computer had originally guided the physician assistant's collection of data and had specified precisely what decisions should be made or actions taken, in accordance with the clinical algorithm. However, since the algorithmic logic is generally simple, and can often be represented on a single sheet of paper, the advantages of an automated approach over a manual system have not been clearly demonstrated. In one study Vickery showed that supervising physicians could detect no significant difference between the performance of physicians' assistants using automated versus manual systems, although the computer system entirely eliminated errors in data collection (since it demanded all relevant data at the appropriate time) [110]. Furthermore, the computer could not, of course, decide whether the actual observations entered by the physicians' assistant were correct; yet this kind of inaccuracy was one of the most common reasons that supervisors found an assistant's performance unsatisfactory.
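To convey the flavor of such protocols, the following sketch encodes a fabricated sore-throat algorithm as simple branching logic with the referral safeguard described above; the thresholds and dispositions are illustrative and are not drawn from any published protocol.

```python
# Hypothetical protocol sketch: binary decision points plus the
# built-in safeguard of referral to a supervising physician.
# All criteria and actions are invented for illustration.
def pharyngitis_protocol(patient):
    if patient["age_years"] < 3 or patient["immunocompromised"]:
        return "REFER to supervising physician"      # safeguard branch
    if not patient["sore_throat"]:
        return "Protocol not applicable; reassess complaint"
    if patient["temp_f"] >= 101 and patient["exudate"]:
        return "Obtain throat culture; treat per standing orders"
    return "Symptomatic care; recheck in 48 hours"

print(pharyngitis_protocol(
    {"age_years": 24, "immunocompromised": False,
     "sore_throat": True, "temp_f": 102.1, "exudate": True}))
```

Because the whole decision structure is a short chain of yes-no branches, the same logic fits comfortably on a single printed sheet, which is precisely why automation has offered so little advantage here.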
There are two other ways in which the computer has been used in the setting of clinical algorithms. First, mathematical techniques have been used to analyze signs and symptoms of diseases and thereby to identify those that should most appropriately be referenced in corresponding clinical algorithms [30], [81], [113]. The process for distilling expert knowledge in the form of a clinical algorithm can be an arduous and imperfect one [97]; formal techniques to assist with this task may prove to be very valuable. Second, some researchers in this area also use computers to assist with clinical care audit, comparing actual actions taken by a physicians' assistant with those recommended by the algorithm itself. Sox et al. [104] have described a system in which the assistant's checklist for a patient encounter was sent to a central computer and analyzed for evidence of deviation from the accepted protocol. Computer-generated reports then served as feedback to the physicians' assistant and to the supervising physicians.

B. Example

We have selected for discussion a project that differs from those previously cited in that 1) computer techniques are still being used, and 2) the clinical algorithms are designed for use by primary care physicians themselves. This is the cancer chemotherapy system developed in Alabama by Mesel et al. [70]. The algorithms were developed to allow private practitioners, at a distance from the regional tertiary-care center, to manage the complex chemotherapy for their cancer patients without routinely referring them to the central oncologists.

Mesel et al. have described a "consultant-extender system" that enables the primary physician to treat patients with Hodgkin's Disease under the supervision of a regional specialist. Five oncologists developed a care protocol for the treatment of Hodgkin's Disease, and this algorithm was placed on-line. Once patients had agreed to participate in the study, their private physicians would prepare "encounter forms" at the time of each office visit. These forms would document pertinent interval history, physical findings, and lab data, as well as chemotherapy administered. The form would then be sent to the regional center where it was analyzed by the computer and a customized clinical algorithm was produced to assist the private physician with the management of that patient during the next appointment. Thus the computer program would take into account the ways in which the individual patient's disease might progress or improve and would prepare an appropriate clinical algorithm. This protocol was sent back to the physician in time for it to be available at the next office visit. The private practitioner was encouraged to call the regional specialist directly if the protocol seemed in some way inadequate or additional questions arose. The authors present data suggesting that their system was well-accepted by physicians and patients, and that excellent care was delivered.⁴ Retrospective review of cases that were treated at the referral center itself, but without the use of the protocols, showed a 16-percent rate of variance from the management guidelines specified in the algorithms; there was no such variance when the protocols were followed. Thus algorithms may be effective tools for the administration of complex specialized therapy in circumstances such as those described.⁵

⁴ This is an interesting result in light of Grimm's experience mentioned in footnote 3. One possible explanation is that physicians were more accepting of the algorithmic approach in Mesel's case because it allowed them to perform tasks that they would previously not have been able to undertake.

⁵ More recently the Alabama group has reported similar success implementing a consultant-extender system for adjuvant chemotherapy in breast carcinoma [129].

C. Discussion of the Methodology

Although clinical algorithms are among the most widespread and best accepted of the decision aids described in this article, the simplicity of their logic makes it clear why the technique cannot be effectively applied in most medical domains.
Decision points in the algorithm are generally binary (i.e., a given sign or symptom is either present or absent), and there tend to be many circumstances that can arise for which the user is advised to consult the supervising physician (or specialist). Thus the difficult decision tasks are left to experts, and there is generally no formal algorithm for managing the case from that point on. It is precisely the simplicity of the algorithmic logic, and the safeguard of the supervising expert, which have permitted many algorithms to be represented on one or two sheets of paper and have obviated the need for direct computer use in most of the systems. The contributions of clinical algorithms to the distribution and delivery of health care, to the training of paramedics, and to quality care audit have been impressive and substantial. However, the approach is not suitable for extension to the complex decision tasks to be discussed in the following sections.

III. DATABANK ANALYSIS FOR PROGNOSIS AND THERAPY SELECTION

A. Overview

Automation of medical record keeping and the development of computer-based patient databanks have been major research concerns since the earliest days of medical computing. Most such systems have attempted to avoid direct interaction between the computer and the physician recording the data, with the systems of Weed [123], [124] and Greenes [36] being notable exceptions. Although the earliest systems were designed merely as record-keeping devices, there have been several recent attempts to create programs that could also provide analyses of the information stored in the computer databank. Some early systems [36], [52] had retrieval modules that identified all patient records matching a Boolean combination of descriptors; however, further analysis of these records for decision making purposes was left to the investigator. Weed has not stressed an analytical component in his automated problem-oriented record [124], but others have developed decision aids which use medical record systems fashioned after his [103].

The systems for databank analysis all depend on the development of a complete and accurate medical record system. Once such a system is developed, a number of additional capabilities can be provided: 1) correlations among variables can be calculated, 2) prognostic indicators can be measured, and 3) the response to various therapies can be compared. A physician faced with a complex management decision can look to such a system for assistance in identifying patients in the past who had similar clinical problems and can then see how those patients responded to various therapies. A clinical investigator keeping the records of his study patients on such a system can use the program's statistical capabilities for data analysis. Hence, although these applications are inherently data-intensive, the kinds of "knowledge" generated by specialized retrieval and statistical routines can provide valuable assistance for clinical decision makers.
For example, they help avoid the inherent biases of anecdotal experience, such as occur when an individual practitioner bases decisions primarily on personal encounters with one or two patients having a rare disease or complex of symptoms.

There are many excellent programs in this category, one of which is discussed in some detail in the next section. Several others warrant mention, however. The HELP System at the University of Utah [117], [119], [120] uses a large data file on patients in the Latter-Day Saints Hospital. Clinical experts formulate specialized "HELP sectors" which are collections of logical rules that define the criteria for a particular medical decision. These sectors are developed by an interactive process; the expert proposes important criteria for a given decision and is provided with actual data regarding each criterion (based on relevant patients and controls from the computer databank). The criteria in the sector are thus adjusted by the expert until adequate discrimination is made to justify using the sector's logic as a decision tool.⁶ The sectors are then used for a variety of tasks throughout the hospital.

⁶ This process might be seen as a technique to assist with the formulation of clinical algorithms as discussed in the previous section. Another approach using databank analysis for algorithm development is described in [30].

Another system of interest is that of Feinstein et al. at Yale [21], in which physicians interact with the system to request assistance in estimating prognosis and guiding management for patients with lung cancer. Similarly, Rosati et al. have developed a system at Duke University which uses a large databank on patients who have undergone coronary arteriography [88]. New patients can be matched against those in the databank to help determine patient prognosis under a variety of management alternatives.

B. Example

One of the most successful projects in this category is the ARAMIS system of Fries at Stanford University [24]. The approach was designed originally for use in an outpatient rheumatology clinic, but then broadened to a general clinical database system, the time-oriented databank (TOD) [126], [127], so that it could be transferred to clinics in oncology, metabolic disease, cardiology, endocrinology, and certain pediatric subspecialties. All clinic records are kept in a tabular format in which a column in a large table indicates a specific clinic visit and the rows indicate the relevant clinical parameters that are being followed over time. These charts are maintained by the physicians seeing the patient in clinic, and the new column of data is later transferred to the computer databank by a transcriptionist; in this way time-oriented data on all patients are kept current. The defined database (clinical parameters to be followed) is determined by clinical experts, and in the case of rheumatic diseases has now been standardized on a national scale [41].

The information in the databank can be used to create a prose summary of the patient's current status, and there are graphical capabilities which can plot specific parameters for a patient over time [126]. However, it is in the analysis of stored clinical experience that the system has its greatest potential utility [25]. In addition to performing search and statistical functions such as those developed in databank systems for clinical investigation [50], [65], ARAMIS offers a prognostic analysis for a new patient when a management decision is to be made.
Using the consultative services of the Stanford Immunology Division, an individual practitioner may select clinical indices for his patient that he would like matched against other patients in the databank. It is imperative that such indices be selected wisely and hence with expert advice; the Stanford immunologists have found that the best descriptors for characterizing patients are often different from those that a novice chooses to use. Based on two to five such descriptors, the computer locates relevant prior patients and prepares a report outlining their prognosis with respect to a variety of endpoints (e.g., death, development of renal failure, arthritic status, pleurisy). Therapy recommendations are also generated on the basis of a response index that is calculated for the matched patients. A prose case analysis for the physician's patient can also be generated: this readable document summarizes the relevant data from the databank and explains the basis for the therapeutic recommendation.

The rheumatologic databank generated under ARAMIS has now been expanded to involve a national network of immunologists who are accumulating time-oriented data on their patients. This national project seeks in part to obtain enough data so that groups of retrieved patients will be sizable, thereby controlling for some observer variability and making the system's recommendations more statistically defensible.
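The essence of this prognostic matching can be suggested in a few lines: select prior patients who agree with the new patient on a handful of descriptors and tabulate their outcomes. The databank rows and descriptor names below are fabricated, and ARAMIS's actual matching and response-index calculations are considerably more elaborate.

```python
from collections import Counter

# Fabricated time-oriented databank rows; a real system holds far more.
DATABANK = [
    {"diagnosis": "SLE", "age_band": "20-39", "renal_involvement": True,  "outcome": "renal failure"},
    {"diagnosis": "SLE", "age_band": "20-39", "renal_involvement": True,  "outcome": "stable"},
    {"diagnosis": "SLE", "age_band": "40-59", "renal_involvement": False, "outcome": "stable"},
]

def prognosis_report(new_patient, descriptors, databank):
    """Match prior patients on the chosen descriptors and tally outcomes."""
    matches = [row for row in databank
               if all(row[d] == new_patient[d] for d in descriptors)]
    return len(matches), Counter(row["outcome"] for row in matches)

n, outcomes = prognosis_report(
    {"diagnosis": "SLE", "age_band": "20-39", "renal_involvement": True},
    ["diagnosis", "age_band", "renal_involvement"], DATABANK)
print(f"{n} similar prior patients; outcomes: {dict(outcomes)}")
```

The sketch also makes the national project's motivation plain: with too few rows in the databank, the matched group is too small for its outcome frequencies to mean anything.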
C. Discussion of the Methodology

Databank analysis systems have powerful capabilities to offer to the individual clinical decision maker. Furthermore, medical computing researchers recognize the potential value of large databanks in supporting many of the other decision making approaches discussed in subsequent sections. There are important additional issues regarding databank systems.

1) Data acquisition remains a major problem. Many systems have avoided direct physician-computer interaction but have then been faced with the expense and errors of transcription. The developers of one well-accepted record system still express their desire to implement a direct interface with the physician for these reasons, although they recognize the difficulties encountered in encouraging direct use of a computer system by doctors [107].

2) Analysis of data in the system can be complicated by missing values that frequently occur, outlying values, and poor reproducibility of data across time and among physicians. Conversely, the system can itself be used to identify questionable values of tests or observations.

3) The decision aids provided tend to emphasize patient management rather than diagnosis. Feinstein's system [21] is only useful for patients with lung cancer, for example, and the ARAMIS prognostic routines, which are designed for patient management, assume that the patient's rheumatologic diagnosis is already known.

4) There is no formal correlation between the way expert physicians approach patient management decisions and the way the programs arrive at recommendations. Feinstein and Koss felt that the acceptability of their system would be limited by a purely statistical approach, and they therefore chose to mimic human reasoning processes to a large extent [59], but their approach appears to be an exception.

5) Data storage space requirements can be large since the decision aids of course require a comprehensive medical record system as a basic component.

Slamecka has distinguished between structured and empirical approaches to clinical consulting systems [103], pointing out that databanks provide a largely empirical basis for advice, whereas structured approaches rely on judgmental knowledge elicited from the literature or from experts. It is important to note, however, that judgmental knowledge is itself based on empirical information. Even an expert's "intuitions" are based on observations and "data collection" over years of experience. Thus one might argue that large, complete, and flexible databanks could form the basis for large amounts of judgmental knowledge that we now have to elicit from other sources. Some researchers have indicated a desire to experiment with methods for the automatic generation of medical decision rules from databanks, and one component of the research on Slamecka's MARIS system is apparently pointed in that direction [103]. Indeed, some of the most exciting and practical uses of large databanks may be found precisely at the interface with those knowledge engineering tasks that have most confounded researchers in medical symbolic reasoning [5].

IV. MATHEMATICAL MODELS OF PHYSICAL PROCESSES

A. Overview

Pathophysiologic processes can be well-described by mathematical formulas in a limited number of clinical problem areas. Such domains have lent themselves well to the development of computer-based decision aids since the issues are generally well-defined. The actual techniques used by such programs tend to reflect the details of the individual applications, the most celebrated of which have been in pharmacokinetics (specifically digitalis dosing), acid-base/electrolyte disorders, and respiratory care [69].

It is important that cooperating experts assist with the definition of pertinent variables and the mathematical characterization of the relationships among them. The computer program requests the relevant data, makes the appropriate computations, and provides a clinical analysis or recommendation for therapy. Some of the programs have also involved branched-chain logic to guide decisions about what further data are needed for adequate analysis.⁷

⁷ "Branched-chain" logic refers to mechanisms by which portions of a decision network can be considered or ignored, depending upon the data on a given case. For example, in an acid-base program the anion gap might be calculated and a branch-point could then determine whether the pathway for analyzing an elevated anion gap would be required. If the gap were not elevated, that whole portion of the logic network could be skipped.

Programs to assist with digitalis dosing have gradually introduced broader medical knowledge over the last ten years. The earliest work was Jelliffe's [48] and was based upon his considerable experience studying the pharmacokinetics of the cardiac glycosides. His computer program used mathematical formulations based on parameters such as therapeutic goals (e.g., desired predicted blood levels), body weight, renal function, and route of administration. In one study he showed that computer recommendations reduced the frequency of adverse digitalis reactions from 35 percent to 12 percent [49]. Later, another group revised the Jelliffe model to permit a feedback loop in which the digitalis blood levels obtained with initial doses of the drug were considered in subsequent therapy recommendations [78], [96].
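The style of calculation involved can be suggested with textbook one-compartment pharmacokinetic arithmetic: a loading dose to achieve a target blood level, and a maintenance dose to replace daily elimination. The constants below are round placeholder values for illustration only, not parameters from Jelliffe's published model.

```python
# Illustrative one-compartment pharmacokinetic arithmetic; the numbers
# are textbook-style placeholders, not a clinical dosing model.
def loading_dose_ug(target_ng_per_ml, vd_l_per_kg, weight_kg):
    vd_liters = vd_l_per_kg * weight_kg   # volume of distribution
    return target_ng_per_ml * vd_liters   # ng/ml equals ug/L, so result is in ug

def maintenance_dose_ug(load_ug, fraction_eliminated_per_day):
    # Replace each day what elimination removes, holding the level steady.
    return load_ug * fraction_eliminated_per_day

load = loading_dose_ug(target_ng_per_ml=1.0, vd_l_per_kg=7.0, weight_kg=70.0)
daily = maintenance_dose_ug(load, fraction_eliminated_per_day=0.2)
print(f"loading dose ~{load:.0f} ug; maintenance ~{daily:.0f} ug/day")
```

Renal function enters such models by scaling the fraction eliminated per day, which is one reason the feedback loop of measured blood levels proved a natural extension.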
More recently, a third group in Boston, noting the insensitivity of the first two approaches to the kinds of nonnumerical observations that experts tend to use in modifying digitalis therapy, augmented the pharmacokinetic model with a patient-specific model of clinical status [35]. Running their system in a monitoring mode, in parallel with actual clinical practice on a cardiology service, they found that each patient in the trial in whom toxicity developed had received more digitalis than would have been recommended by their program.

B. Example

Perhaps the best known program in this category is the interactive system developed at Boston's Beth Israel Hospital by Bleich. Originally designed as a program for assessment of acid-base disorders [2], it was later expanded to consider electrolyte abnormalities as well [3], [4]. The knowledge in Bleich's program is a distillation of his own expertise regarding acid-base and electrolyte disorders. The system begins by collecting initial laboratory data from the physician seeking advice on a patient's management. Branched-chain logic is triggered by abnormalities in the initial data so that only the pertinent sections of the extensive decision pathways created by Bleich are explored. The approach is therefore similar to the flowcharting techniques used by the clinical algorithms of Section II, but it involves more complex mathematical relationships than algorithms typically do. Essentially all questions asked by the program are numerical laboratory values or "yes-no" questions (e.g., "Does the patient have pitting edema?"). Depending upon the complexity and severity of the case, the program eventually generates an evaluation note that may vary in length from a few lines to several pages. Included are suggestions regarding possible causes of the observed abnormalities and suggestions for correcting them. Literature references are also provided with the recommendations.
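A toy fragment in this style is sketched below: laboratory values trigger only the pertinent branch of the logic (here, the anion-gap pathway mentioned in footnote 7). The thresholds are rough textbook figures, and the rules are in no sense Bleich's actual decision network.

```python
# Toy branched-chain fragment (approximate textbook thresholds,
# illustrative only; not Bleich's logic network).
def evaluate_acid_base(na, cl, hco3, ph):
    notes = []
    if hco3 < 22 and ph < 7.35:            # branch entered only if abnormal
        notes.append("Metabolic acidosis present.")
        gap = na - (cl + hco3)             # anion gap
        if gap > 16:
            notes.append(f"Elevated anion gap ({gap}): consider lactate, "
                         "ketones, toxins, or uremia.")
        else:
            notes.append(f"Normal anion gap ({gap}): consider diarrhea "
                         "or renal tubular acidosis.")
    else:
        notes.append("No metabolic acidosis; this pathway skipped.")
    return notes

for line in evaluate_acid_base(na=140, cl=100, hco3=12, ph=7.21):
    print(line)
```

As in Bleich's system, the output is a short evaluation note assembled from whichever branches the abnormal data happened to activate.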
Although the program was made available at several East Coast institutions, few physicians accepted it as an ongoing clinical tool. Bleich points out that part of the reason for this was the system's inherent educational impact; physicians simply began to anticipate its analysis after they had used it a few times [3].⁸

⁸ More recently he has been experimenting with the program operating as a monitoring system, thereby avoiding direct interaction with the physician.

The system's lack of sustained acceptance by physicians is probably due to more than its educational impact, however. For example, there is no feedback in the system; every patient is seen as a new case and the program has no concept of following a patient's response to prior therapy. Furthermore, the program generates differential diagnosis lists but does not pursue specific etiologies; this can be particularly bothersome when there are multiple coexistent disturbances in a patient and the program simply suggests parallel lists of etiologies without noticing or pursuing the possible interrelationships. Finally, the system is highly individualized in that it contains only the parameters and relationships that Bleich specifically thought were important to include in the logic network. Of course human consultants also give personalized advice which may differ from that obtained from other experts. However, a group of researchers in Britain [85] who compared Bleich's program to four other acid-base/electrolyte systems found total agreement among the programs in only 20 percent of test cases when these systems were asked to define the acid-base disturbance and the degree of compensation present. Their analysis does not reveal which of the programs reached the correct decision, however, and it may be that the results are more an indictment of the other four programs than a valid criticism of the advice from Bleich's acid-base component.

C. Discussion of the Methodologies

The programs mentioned in this section differ from one another in several respects, and each tends to overlap with other paradigms we have discussed. Bleich's program, for example, is essentially a complicated clinical algorithm interfaced with mathematical formulations of electrolyte and acid-base pathophysiology. As such it suffers from the weaknesses of all algorithmic approaches, most importantly its highly structured and inflexible logic which is unable to contend with circumstances not specifically anticipated in the algorithm. The digitalis dosing programs all draw on mathematical techniques from the field of biomedical modeling [40], but have recently shown more reliance on methods from other areas as well. In particular these have included symbolic reasoning methods that allow clinical expertise to be encoded and used in conjunction with mathematical techniques [35]. The Boston group that developed this most recent digitalis program is interested in similarly developing an acid-base/electrolyte system so that judgmental knowledge of experts can be interfaced with the mathematical models of pathophysiology.⁹

⁹ This project was described by Professor Peter Szolovits, of MIT's clinical decision making group, during a workshop on artificial intelligence in medicine at the University of Tokyo, Tokyo, Japan, in November 1978.

There is also a large research community of mathematicians who attempt to understand and characterize physical processes by devising simulation models [40]. Although such models are largely empirical and have generally not found direct application in clinical medicine, their research role may eventually be broadened to provide practical decision aids through interfaces with the other paradigms described in this review.

The major strength of mathematical models is their ability to capture mathematically sound relationships in a concise and efficient computer program. However, the major limitation, as with most of the paradigms discussed here, is that few areas of medicine are amenable to firm, quantitative description. Because the accuracy of the results depends on correct identification of relevant parameters, the precision and certainty of the relationships among them, and the accuracy of the techniques for measuring them, mathematical models have limited applicability at present. Furthermore, those domains that do lend themselves to mathematical description may still benefit from interactions with symbolic reasoning techniques, as has been demonstrated in the digitalis therapy adviser [35].

V. STATISTICAL PATTERN-RECOGNITION TECHNIQUES

A. Overview

Pattern-recognition techniques define the mathematical relationship between measurable features and classification of objects [15], [51].
In medicine, the presence or absence of each of several signs and symptoms in a patient may be definitive for the classification of the patient as "abnormal" or into the category of a specific disease. The techniques are also used for prognosis [1], or predicting disease duration, time course, and outcomes. These techniques have been applied to a variety of medical domains, such as image processing and signal analysis, in addition to computer-assisted diagnosis.

In order to find the diagnostic pattern, or discriminant function, the method requires a training set of objects, for which the correct classification is already known, as well as reliable values for their measured features. If the form and parameters are not known for the statistical distributions underlying the features, then they must be estimated. Parametric techniques focus on learning the parameters of the probability density functions, while nonparametric (or "distribution-free") techniques make no assumptions about the form of the distributions. After training, then, the pattern can be compared to new, unclassified objects to aid in deciding the category to which the new object belongs.¹⁰

¹⁰ It is possible to detect patterns, even without a known classification for objects in the training set, with so-called "unsupervised" learning techniques. Also, it is possible to work with both numerical and nonnumerical measurements.

There are numerous variations on this general approach, most notably in the mathematical techniques used to extract characteristic measurements (the features) and to find and refine the pattern classifier during training. For example, linear regression analysis is a commonly used technique for finding the coefficients of an equation that defines a recurring pattern or category of diagnostic or prognostic interest. A class of patients can be described by a feature vector $X = [x_1, x_2, \cdots, x_n]$ (where $x_i$ is one of $n$ descriptive variables). The goal is to produce an equation relating the posterior probabilities¹¹ of each diagnostic class to the feature vector through a set of $n$ coefficients $(a_i)$¹²:

$$P(D_i \mid X) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n.$$

Recent work emphasizes structural relationships among sets of features more than statistical ones.

¹¹ The posterior probability of a diagnostic class, represented as $P(D_i \mid X)$, is the probability that a patient falls in diagnostic category $D_i$ given that the feature vector $X$ has been observed.

¹² See [62] for a study in which the coefficients are reported because of their medical import.

Three of the best known training criteria for the discriminant function are:

a) least squared error criterion: choose the function that minimizes the squared differences between predicted and observed measurement values;
b) clustering criterion: choose the function that produces the tightest clusters;
c) Bayes' criterion: choose the function that has the minimum cost associated with incorrect diagnoses.¹³

¹³ This is one of many uses of Bayes' Theorem, a definitional rule that relates posterior and prior probabilities. For an overview of its use as a diagnostic rule (as opposed to a training criterion) and a definition of the formula, see Section VI.

Ten commonly used mathematical models based on these criteria have been shown to produce remarkably similar diagnostic results for the same data [7].
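Criterion a) can be illustrated directly: fit the coefficients that minimize the squared error between the discriminant function's output and the known classifications, then threshold the fitted value for new cases. The training data below are randomly generated placeholders rather than clinical measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated training set: 40 "patients", 3 features, known class labels.
X = rng.normal(size=(40, 3))
y = (X @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=40) > 0).astype(float)

# Least squared error criterion: coefficients a minimizing ||Xa - y||^2.
Xb = np.hstack([X, np.ones((40, 1))])        # append an intercept column
a, *_ = np.linalg.lstsq(Xb, y, rcond=None)

def classify(features):
    score = np.append(features, 1.0) @ a     # discriminant function value
    return 1 if score > 0.5 else 0

print("coefficients:", np.round(a, 3), "-> class:", classify([0.8, -0.2, 0.1]))
```

The fitted coefficients play the role of the $a_i$ in the equation above; reporting them, as in [62], can itself carry medical meaning about the weight each feature deserves.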
B. Example

There are numerous papers on uses of pattern recognition methods in medicine. Armitage [11] discusses three examples of prognostic studies, with an emphasis on regression methods. Goldwyn et al. [31] discuss uses of cluster analysis. One recent diagnostic application by Patrick [73] uses Bayes' criterion to classify patients having chest pains into three categories: $D_1$: acute myocardial infarction (MI); $D_2$: coronary insufficiency; and $D_3$: noncardiac causes of chest pain. The need for early diagnosis of heart attacks without laboratory tests is a prevalent problem, yet physicians are known to misclassify about one third of the patients in categories $D_1$ and $D_2$ and about 80 percent of those in $D_3$. In order to determine the correct classification, each patient in the training set was classified after 3 days, based on laboratory data including electrocardiogram (ECG) and blood data (cardiac enzymes). There remained some uncertainty about several patients with "probable MI." Seventeen variables were selected from many: 9 features with continuous values (including age, heart rates, white blood count, and hemoglobin) and 8 features with discrete values (sex and 7 ECG features). The training data were measurements on 247 patients.

The decision rule was chosen using Bayes' Theorem to compute the posterior probabilities of each diagnostic class given the feature vector $X$ ($X = [x_1, x_2, \cdots, x_{17}]$). Then a decision rule was chosen to minimize the probability of error by adjusting the coefficients on the feature vector $X$ such that for the correct class $D_i$:

$$P(D_i \mid X) = \max\,[P(D_1 \mid X),\; P(D_2 \mid X),\; P(D_3 \mid X)].$$

The class conditional probability density functions must be estimated initially, and the performance of the decision rule depends on the accuracy of the assumed model.

Using the same 247 patients for testing the approach, the trained classifier averaged 80 percent correct diagnoses over the three classes, using only data available at the time of admission. Physicians, using more data than the computer, averaged only 50.5 percent correct over these three categories for the same patients. Training the classifier with a subset of the patients, and using the remainder for testing, produced nearly as good results.

C. Discussion of the Methodology

The number of reported medical applications of pattern recognition techniques is large, but there are also numerous problems associated with the approach. The most obvious difficulties are choosing the set of features in the first place, collecting reliable measurements on a large sample, and verifying the initial classifications among the training data. Current techniques are inadequate for problems in which trends or movement of features are important characteristics of the categories. Also, the problems for which existing techniques are accurate are those that are well characterized by a small number of features ("dimensions of the space").

As with all techniques based on statistics, the size of the sample used to define the categories is an important consideration. As the number of important features and the number of relevant categories increase, the required size of the training set also increases. In one test [7], pattern classifiers trained to discriminate among 20 disease categories from 50 symptoms were correct 51-64 percent of the time. The same methods were used to train classifiers to discriminate between 2 of the diseases, from the same 50 symptoms, and produced correct diagnoses 92-98 percent of the time.
The context in which a local pattern is identified raises problems related to the issue of utilizing medical knowledge. It is difficult to find and use classifiers that are best for a small decision, such as whether an area of an X-ray is inside or outside the heart, and to integrate those into a global classifier, such as one for abnormal heart volume.

Accurate application of a classifier in a hospital setting also requires that the measurements in that clinical environment be consistent with the measurements used to train the classifier initially. For example, if diseases and symptoms are defined differently in the new setting, or if lab test values are reported in different ranges, or different lab tests used, then decisions based on the classification are not reliable.

Pattern recognition techniques are often misapplied in medical domains in which the assumptions are violated. Some of the difficulties noted above are avoided in systems that integrate structural knowledge into the numerical methods and in systems that integrate human and machine capabilities into single interactive systems. These modifications will overcome one of the major difficulties seen in completely automated systems, that of providing the system with good "intuitions" based on an expert's a priori knowledge and experience [51].

VI. BAYESIAN STATISTICAL APPROACHES

A. Overview

More work has been done on Bayesian approaches to computer-based medical decision making than on any of the other paradigms we have discussed. The appeal of Bayes' Theorem¹⁴ is clear: it offers a potentially exact method for computing the probability of a disease based on observations and data regarding the frequency with which these observations are known to occur for specified diseases. In several domains the technique has been shown to be exceedingly accurate, but there are also several limitations to the approach which we discuss below.

In its simplest formulation, Bayes' Theorem can be seen as a mechanism to calculate the probability of a disease, in light of specified evidence, from the a priori probability of the disease and the conditional probabilities relating the observations to the diseases in which they may occur. For example, suppose disease $D_i$ is one of $n$ mutually exclusive diagnoses under consideration and $E$ is the evidence or observations supporting that diagnosis. Then if $P(D_i)$ is the a priori probability of the ith disease:¹⁵

$$P(D_i \mid E) = \frac{P(D_i)\,P(E \mid D_i)}{\sum_{j=1}^{n} P(D_j)\,P(E \mid D_j)}.$$

The theorem can also be represented or derived in a variety of other forms, including an odds/likelihood ratio formulation. We cannot include a full discussion here, but any introductory statistics book or Lusted's volume [64] presents the subject in considerable detail.

¹⁴ Also often referred to as Bayes' rule, discriminant, or criterion.

¹⁵ Here $P(D_i \mid E)$ is the probability of the ith disease given that evidence $E$ has been observed; $P(E \mid D_i)$ is the probability that evidence $E$ will be observed in the setting of the ith disease.
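The formula is easily exercised on invented numbers. Given a priori probabilities and the conditional probabilities of some evidence $E$ under each of three mutually exclusive diseases, the posteriors follow directly (all probabilities below are fabricated for illustration):

```python
# Invented a priori probabilities and likelihoods for three mutually
# exclusive diseases; E is a single piece of observed evidence.
priors      = {"D1": 0.10, "D2": 0.30, "D3": 0.60}   # P(D_i)
likelihoods = {"D1": 0.80, "D2": 0.20, "D3": 0.05}   # P(E | D_i)

def bayes_posteriors(priors, likelihoods):
    joint = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(joint.values())          # the denominator of Bayes' Theorem
    return {d: joint[d] / total for d in joint}

post = bayes_posteriors(priors, likelihoods)
for disease, p in post.items():
    print(disease, round(p, 3))          # D1 0.471, D2 0.353, D3 0.176
print("most probable:", max(post, key=post.get))
```

Note how evidence that is strongly more likely under D1 overturns a prior that favored D3 six to one; this interplay of priors and conditionals is the whole content of the theorem.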
Among the most commonly recognized problems with the utilization of a Bayesian approach is the large amount of data required to determine all the conditional probabilities needed in the rigorous application of the formula. Chart review or computer-based analysis of large databanks occasionally allows most of the necessary conditional probabilities to be obtained. A variety of additional assumptions must be made. For example: 1) the diseases under consideration are assumed mutually exclusive and exhaustive (i.e., the patient is assumed to have one of the n diseases), 2) the clinical observations are assumed to be conditionally independent over a given disease,¹⁶ and 3) the incidence of the symptoms of a disease is assumed to be stationary (i.e., the model does not allow for changes in disease patterns over time).

¹⁶ The purest form of Bayes' Theorem allows conditional dependencies and the order in which evidence is obtained to be explicitly considered in the analysis. However, the number of required conditional probabilities is so unwieldy that conditional independence of observations and nondependence on the order of observations are generally assumed [108].

One of the earliest Bayesian programs was Warner's system for the diagnosis of congenital heart disease [115]. He compiled data on 83 patients and generated a symptom-disease matrix consisting of 53 symptoms (attributes) and 35 disease entities. The diagnostic performance of the computer, based on the presence or absence of the 53 symptoms in a new patient, was then compared to that of two experienced physicians. The program was shown to reach diagnoses with an accuracy equal to that of the experts. Furthermore, system performance was shown to improve as the statistics in the symptom-disease matrix stabilized with the addition of increasing numbers of patients.

In 1968 Gorry and Barnett pointed out that Warner's program had required making all 53 observations for every patient to be diagnosed, a situation which would not be realistic for many clinical applications. They therefore used a modification of Bayes' Theorem in which observations are considered sequentially.¹⁷ Their computer program analyzed observations one at a time, suggested which test would be most useful if performed next, and included termination criteria so that a diagnosis could be reached, when appropriate, without needing to make all the observations [32]. Decisions regarding tests and termination were made on the basis of calculations of expected costs and benefits at each step in the logical process.¹⁸ Using the same symptom-disease matrix developed by Warner, they were able to attain equivalent diagnostic performance using only 6.9 tests on average.¹⁹ They pointed out that because the costs of medical tests may be significant (in terms of patient discomfort, time expended, and financial expense), the use of inefficient testing sequences should be regarded as ineffective diagnosis. Warner has also more recently included Gorry and Barnett's sequential diagnosis approach in an application regarding structured patient history-taking [118].

¹⁷ A similar approach was devised in Russia at approximately the same time by Vishnevskiy and associates. Their analyses and a summary of the impressive amount of statistical data they have amassed are contained in [111].

¹⁸ See the decision theory discussion in Section VII.

¹⁹ Tests for determining attributes were defined somewhat differently than they had been by Warner. Thus the maximum number of tests was 31 rather than the 53 observations used in the original study.
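The skeleton of this sequential idea can be sketched as follows: after each observation the posteriors are updated, and the next test chosen is the one whose expected result most reduces diagnostic uncertainty. Entropy is used below as a simple stand-in for Gorry and Barnett's fuller cost-benefit calculations, and all probabilities are invented.

```python
import math

# Invented numbers: P(test positive | disease) for two tests, two diseases.
P_POS = {"t1": {"D1": 0.9, "D2": 0.2}, "t2": {"D1": 0.6, "D2": 0.5}}
priors = {"D1": 0.5, "D2": 0.5}

def update(post, test, positive):
    """Bayesian update of the disease posteriors given one test result."""
    like = {d: P_POS[test][d] if positive else 1 - P_POS[test][d] for d in post}
    joint = {d: post[d] * like[d] for d in post}
    z = sum(joint.values())
    return {d: joint[d] / z for d in joint}

def entropy(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def best_next_test(post, tests):
    def expected_entropy(t):
        p_pos = sum(post[d] * P_POS[t][d] for d in post)   # P(test positive)
        return (p_pos * entropy(update(post, t, True))
                + (1 - p_pos) * entropy(update(post, t, False)))
    return min(tests, key=expected_entropy)

t = best_next_test(priors, ["t1", "t2"])
print("most informative next test:", t)        # t1: it discriminates better
print("posterior if positive:", update(priors, t, True))
```

A termination criterion of the kind Gorry and Barnett used amounts to stopping when one posterior is high enough, or when no remaining test is expected to be worth its cost.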
The medical computing literature now includes many examples of Bayesian diagnosis programs, most of which have used the nonsequential approach, in addition to the necessary assumptions of symptom independence and mutual exclusiveness of disease as discussed above. One particularly successful research effort has been chosen for discussion.

B. Example

Since the late 1960's deDombal and associates, at the University of Leeds, England, have been studying the diagnostic process and developing computer-based decision aids using Bayesian probability theory. Their area of investigation has been gastrointestinal diseases, originally acute abdominal pain [12] with more recent analyses of dyspepsia [44] and gastric carcinoma [134].

Their program for assessment of acute abdominal pain was evaluated in the emergency room of their affiliated hospital [12]. Emergency physicians filled out data sheets summarizing clinical and laboratory findings on 304 patients presenting with abdominal pain of acute onset. The data from these sheets became the attributes that were subjected to Bayesian analysis; the required conditional probabilities had been previously compiled from a large group of patients with one of seven possible diagnoses.²⁰ Thus the Bayesian formulation assumed each patient had one of these diseases and would select the most likely on the basis of recorded observations. Diagnostic suggestions were obtained in batch mode and did not require direct interaction between physician and computer; the program could generate results within 0.5 to 15 min depending upon the level of system use at the time of analysis [43]. Thus the computer output could have been made available to the emergency room physician, on average, within 5 min after the data form was completed and handed to the technician assisting with the study.

²⁰ Appendicitis, diverticulitis, perforated ulcer, cholecystitis, small bowel obstruction, pancreatitis, and nonspecific abdominal pain.

During the study [12], however, these computer-generated diagnoses were simply saved and later compared to (a) the diagnoses reached by the attending clinicians, and (b) the ultimate diagnosis verified at surgery or through appropriate tests. Although the clinicians reached the correct diagnosis in only 65-80 percent of the 304 cases (with accuracy depending upon an individual's training and experience), the program was correct in 91.8 percent of cases. Furthermore, in 6 of the 7 disease categories the computer proved more likely than the senior clinician in charge of a case to assign the patient to the correct disease category. Of particular interest was the program's accuracy regarding appendicitis, a diagnosis which is often made incorrectly. In no cases of appendicitis did the computer fail to make the correct diagnosis, and in only six cases were patients with nonspecific abdominal pain incorrectly classified as having appendicitis. Based on the actual clinical decisions, however, over 20 patients with nonspecific abdominal pain were unnecessarily taken to surgery for appendicitis, and in six cases patients with appendicitis were "watched" for over eight hours before they were finally taken to the operating room.

These investigators also performed a fascinating experiment in which they compared the program's performance based on data derived from 600 real patients with the accuracy the system achieved using "estimates" of conditional probabilities obtained from experts [60].²¹ As discussed above, the program
As discussed above, the program was significantly more effective than the unaided clinician when real-life data were used. However, it performed significantly less well than clinicians when expert estimates were used. The results supported what several other observers have found, namely that physicians often have very little idea of the "true" probabilities for symptom-disease relationships.

Another Leeds study of note was an analysis of the effect of the system on the performance of clinicians [13]. The trial we have mentioned that involved 304 patients was eventually extended to 552 before termination. Although the computer's accuracy remained in the range of 91 percent throughout this period, the performance of clinicians was noted to improve markedly over time. Fewer negative laparotomies were performed, for example, and the number of acute appendices that perforated (ruptured) also declined. However, these data slowly returned towards baseline after the study was terminated, suggesting that the constant awareness of computer monitoring and feedback regarding system performance had temporarily generated a heightened awareness of intellectual processes among the hospital surgeons.

C. Discussion of the Methodology

The ideal matching of the problem of acute abdominal pain and Bayesian analysis must be emphasized; the technique cannot necessarily be as effectively applied in other medical domains where the following limitations of the Bayesian approach may have a greater impact.

1) The assumption of conditional independence of symptoms usually does not apply and can lead to substantial errors in certain settings [72]. This has led some investigators to seek new numerical techniques that avoid the independence assumption [8]. If a pure Bayesian formulation is used without making the independence assumption, however, the number of required conditional probabilities becomes prohibitive for complex real world problems [108].

2) The assumption of mutual exclusiveness and exhaustiveness of disease categories is usually false. In actual practice concurrent and overlapping disease categories are common. In deDombal's system, for example, many of the abdominal pain diagnoses missed were outside the seven "recognized" possibilities; if a program starts with an assumption that it need only consider a small number of defined likely diagnoses, it will inevitably miss the rare or unexpected cases (precisely the ones with which the clinician is most apt to need assistance).

3) In many domains it may be inaccurate to assume that relevant conditional probabilities are stable over time (e.g., the likelihood that a particular bacterium will be sensitive to a specific antibiotic). Furthermore, diagnostic categories and definitions are constantly changing, as are physicians' observational techniques, thereby invalidating data previously accumulated (footnote 22). A similar problem results from variations in a priori probabilities depending upon the population from which a patient is drawn (footnote 23). Some observers feel that these are major limitations to the use of Bayesian techniques [16].

Footnote 22: Although gradual changes in definitions or observational techniques may be statistically detectable by database analysis, a Bayesian analysis that uses such data is inevitably prone to error.
Footnote 23: deDombal has examined such geographic and population-based variations in probabilities and has reported early results of his analysis [14].

In general, then, a purely Bayesian approach can so constrain problem formulation as to make a particular application unrealistic and hence unworkable. Furthermore, even when diagnostic performance is excellent, such as in deDombal's approach to abdominal pain evaluation, clinical implementation and system acceptance will generally be difficult. Forms of representation that allow explanation of system performance in familiar terms (i.e., a more congenial interface with physician users) will heighten clinical acceptance; it is at this level that Bayesian statistics and symbolic reasoning techniques may most beneficially interact.

VII. DECISION THEORETICAL APPROACHES

A. Overview

Bayes' Theorem is only one of several techniques used in the larger field of decision analysis, and there has recently been increasing interest in the ways in which decision theory might be applied to medicine and adapted for automation. Several excellent reviews of the field are available in basic reviews [45], textbooks [84], and medically oriented journal articles [67], [94], [109]. In general terms, decision analysis can be seen as any attempt to consider values associated with choices, as well as probabilities, in order to analyze the processes by which decisions are made or should be made. Schwartz identifies the calculation of "expected value" as central to formal decision analysis [94]. Ginsberg contrasts medical classification problems (e.g., diagnosis) with broader decision problems (e.g., "What should I do for this patient?"), and asserts that most important medical decisions fall in the latter category and are best approached through decision analysis [29]. The following topics are among the central issues in the field.

1) Decision Trees: The decision making process can be seen as a sequence of steps in which the clinician selects a path through a network of plausible events and actions. Nodes in this tree-shaped network are of two kinds: decision nodes, where the clinician must choose from a set of actions, and chance nodes, where the outcome is not directly controlled by the clinician but is a probabilistic response of the patient to some action taken. For example, a physician may choose to perform a certain test (decision node) but the occurrence or nonoccurrence of complications may be largely a matter of statistical likelihood (chance node). By analyzing a difficult decision process before taking any actions, it may be possible to delineate in advance all pertinent chance and decision nodes, all plausible outcomes, plus the paths by which these outcomes might be reached. Furthermore, data may exist to allow specific probabilities to be associated with each chance node in the tree.

2) Expected Values: In actual practice physicians make sequential decisions based on more than the probabilities associated with the chance node that follows.
For example, the best possible outcome is not necessarily sought if the costs associated with that "path" far outweigh those along alternate pathways (e.g., a definitive diagnosis may not be sought if the required testing procedure is expensive or painful and patient management will be unaffected; similarly, some patients prefer to "live with" an inguinal hernia rather than undergo a surgical repair procedure). Thus anticipated "costs" (financial, complications, discomfort, patient preference) can be associated with the decision nodes. Using the probabilities at chance nodes, the costs at decision nodes, and the "value" of the various outcomes, an "expected value" for each pathway through the tree (and in turn each node) can be calculated. The ideal pathway, then, is the one which maximizes the expected value; a short computational sketch of this fold-back calculation appears at the end of this overview.

3) Eliciting Values: Obtaining from physicians and patients the costs and values they associate with various tests and outcomes can be a formidable problem, particularly since formal analysis requires expressing the various costs in standardized units. One approach has been simply to ask for value ratings on a hypothetical scale, but it can be difficult to get the physician or patient to keep the values (footnote 24) separate from their knowledge of the probabilities linked to the associated chance nodes. An alternate approach has been the development of lottery games. Inferences regarding values can be made by identifying the odds, in a hypothetical lottery, at which the physician or patient is indifferent between taking a course of action with a certain outcome and betting on a course with a preferable outcome but with a finite chance of significant negative costs if the "bet" is lost. In certain settings this approach may be accepted and provide important guidelines in decision making [77].

Footnote 24: Also termed "utilities" in some references; hence, the term "utility theory" [84].

4) Test Evaluation: Since the tests which lie at decision nodes are central to clinical decision analysis, it is crucial to know the predictive value of tests that are available. This leads to consideration of test sensitivity, specificity, receiver operator characteristic curves, and sensitivity analysis. Such issues are discussed by Komaroff in this issue [57] and have also been summarized elsewhere in the clinical literature [68].

Many of the major studies of clinical decision analysis have not specifically involved computer implementations. Schwartz et al. examined the workup of renal vascular hypertension, developing arguments to show that for certain kinds of cases a purely qualitative theoretical approach was feasible and useful [94]. However, they showed that for more complex clinically challenging cases the decisions could not be adequately sorted out without the introduction of numerical techniques. Since it was impractical to assume that clinicians would ever take the time to carry out a detailed quantitative decision analysis by hand, they pointed out the logical role for the computer in assisting with such tasks and accordingly developed the system we discuss as an example below [33]. Other colleagues of Schwartz at Tufts have been similarly active in applying decision theory to clinical problems. Pauker and Kassirer have examined applications of formal cost-benefit analysis to therapy selection [74] and Pauker has also looked at possible applications of the theory to the management of patients with coronary artery disease [76]. An entire issue of the New England Journal of Medicine has also been devoted to papers on this methodology [46].
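The fold-back calculation referred to under 2) above can be sketched in a few lines of Python. The tree below (treat immediately versus test first, where the test itself carries a small risk of a serious complication) is hypothetical, with outcome utilities on an arbitrary 0-to-1 scale; it is an illustration of the technique, not a model from any of the studies cited.

    from dataclasses import dataclass

    @dataclass
    class Decision:
        options: dict    # action name -> subtree (or numeric outcome utility)

    @dataclass
    class Chance:
        branches: list   # list of (probability, subtree or numeric utility)

    def expected_value(node):
        """Fold the tree back: leaves are outcome utilities, chance nodes
        average over their branches, decision nodes take the best action."""
        if isinstance(node, (int, float)):
            return node
        if isinstance(node, Chance):
            return sum(p * expected_value(sub) for p, sub in node.branches)
        return max(expected_value(sub) for sub in node.options.values())

    # Treat immediately, or test first when the test itself can cause a
    # serious complication? (All probabilities and utilities hypothetical.)
    tree = Decision({
        "treat_now": Chance([(0.70, 1.0), (0.30, 0.2)]),
        "test_then_treat": Chance([
            (0.95, Chance([(0.85, 1.0), (0.15, 0.2)])),  # test tolerated
            (0.05, 0.0),                                 # test complication
        ]),
    })

    for action, subtree in tree.options.items():
        print(action, "->", round(expected_value(subtree), 3))

With these particular numbers the testing branch has the higher expected value; changing the complication rate or the outcome utilities can reverse the preference, which is precisely the sensitivity analysis mentioned under 4).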
B. Example

Computer implementations of clinical decision analysis have appeared with increasing frequency since the mid-1960's. Perhaps the earliest major work was that of Ginsberg at the Rand Corporation [28], with more recent systems reported by Pliskin and Beck [80] and Safran et al. [91]. We will briefly describe here the program of Gorry et al., developed for the management of acute renal failure [33].

Drawing upon Gorry's experience with the sequential Bayesian approach previously mentioned [32], the investigators recognized the need to incorporate some way of balancing the dangers and discomforts of a procedure against the value of the information to be gained. They divided their program into two parts: phase I considered only tests with minimal risk (e.g., history, examination, blood tests) and phase II considered procedures involving more risk and inconvenience. The phase I program considered 14 of the most common causes of renal failure and used a sequential test selection process based on Bayes' Theorem, omitting more advanced decision theoretical techniques [32]. The conditional probabilities used were subjective estimates obtained from an expert nephrologist and were therefore potentially as problematic as those discussed by Leaper et al. [60] (see Section VI-B). The researchers found that they had no choice but to use expert estimates, however, since detailed quantitative data were not available either in databanks or the literature.

It is in the phase II program that the methods of decision theory were employed, because it was in this portion of the decision process that the risks of procedures became important considerations. At each step in the decision process this program considers whether it is best to treat the patient immediately or to first carry out an additional diagnostic test. To make this decision the program identifies the treatment with the highest current expected value (in the absence of further testing), and compares this with the expected values of treatments that could be instituted if another diagnostic test were performed. Comparisons of the expected values are made in light of the risk of the test in order to determine whether the overall expected value of the test is greater than that of immediate treatment; a schematic numerical sketch of this comparison follows this subsection. The relevant values and probabilities of outcomes of treatment were obtained as subjective estimates from nephrologists in the same way that symptom-disease data had been obtained. All estimates were gradually refined as the investigators gained experience using the program, however.

The program was evaluated on 18 test cases in which the true diagnosis was uncertain but two expert nephrologists were willing to make management decisions. In 14 of the cases the program selected the same therapeutic plan or diagnostic test as was chosen by the experts. For three of the four remaining cases the program's decision was the physicians' second choice and was, they felt, a reasonable alternative plan of action. In the last case the physicians also accepted the program's decision as reasonable although it was not among their first two choices.
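The following Python sketch renders the phase II question schematically: is the expected value of testing first, net of the test's own risk, greater than that of treating now? The diseases, treatments, test characteristics, and utilities are hypothetical assumptions for illustration and are not taken from Gorry's actual model.

    # Hypothetical diseases, treatments, test characteristics, and
    # utilities (0-1 scale); the structure, not the numbers, is the point.
    PRIOR = {"obstruction": 0.4, "glomerulonephritis": 0.6}
    UTILITY = {("steroids", "glomerulonephritis"): 0.9,
               ("steroids", "obstruction"): 0.2,
               ("surgery", "glomerulonephritis"): 0.1,
               ("surgery", "obstruction"): 0.8}
    SENS, SPEC = 0.9, 0.8   # the imperfect test detects "obstruction"
    TEST_RISK = 0.03        # expected utility lost to the test itself

    def best_treatment_value(p):
        """Expected value of the best immediate treatment given P(disease)."""
        return max(sum(p[d] * UTILITY[(t, d)] for d in p)
                   for t in ("steroids", "surgery"))

    def value_of_testing(p):
        """Average the best post-test treatment values over the two
        possible test results, then charge the test's own risk."""
        def posterior(positive):
            like = {"obstruction": SENS if positive else 1 - SENS,
                    "glomerulonephritis": (1 - SPEC) if positive else SPEC}
            z = sum(p[d] * like[d] for d in p)
            return {d: p[d] * like[d] / z for d in p}
        p_pos = p["obstruction"] * SENS + p["glomerulonephritis"] * (1 - SPEC)
        return (p_pos * best_treatment_value(posterior(True))
                + (1 - p_pos) * best_treatment_value(posterior(False))
                - TEST_RISK)

    print("treat now: ", round(best_treatment_value(PRIOR), 3))
    print("test first:", round(value_of_testing(PRIOR), 3))

Here testing wins because the test result can change the treatment chosen; if no result could alter management, its expected value would fall below that of immediate treatment, the situation described earlier in which a definitive diagnosis is not worth pursuing.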
C. Discussion of the Methodology

The excellent performance of Gorry's program, despite its reliance on subjective estimates from experts, may serve to emphasize the importance of the clinical analysis that underlies the decision theoretical approach. The reasoning steps in managing clinical cases have been dissected in such detail that small errors in the probability estimates are apparently much less important than they were for deDombal's purely Bayesian approach [60]. Gorry suggests this may be simply because the decisions made by the program are based on the combination of large aggregates of such numbers, but this argument should apply equally to a Bayesian system. It seems to us more likely that distillation of the clinical domain into a formal decision tree gives the program so much more knowledge of the clinical problem that the quantitative details become somewhat less critical to overall system operation. The explicit decision network is a powerful knowledge structure; the "knowledge" in deDombal's system lies in conditional probabilities alone, and there is no larger scheme to override the propagation of error as these probabilities are mathematically manipulated by the Bayesian routines.

The decision theory approach is not without problems, however. Perhaps the most difficult problem is assigning numerical values (e.g., dollars) to a human life or a day of health. Some critics feel this is a major limitation of the methodology [120]. Overlapping or coincidental diseases are also not well managed unless specifically included in the analysis, and the Bayesian foundation for many of the calculations still assumes mutually exclusive and exhaustive disease categories. Problems of symptom conditional dependence still remain, and there is no easy way to include knowledge regarding the time course of diseases. Gorry points out that his program was also incapable of recognizing circumstances in which two or more actions should be carried out concurrently. Furthermore, decision theory per se does not provide the kind of focusing mechanisms that clinicians tend to use when they assume an initial diagnostic hypothesis in dealing with a patient and discard it only if subsequent data make that hypothesis no longer tenable. Other similar strategies of clinical reasoning are becoming increasingly well recognized [53] and account in large part for the applications of symbolic reasoning techniques to be discussed in the next section.

VIII. SYMBOLIC REASONING APPROACHES

A. Overview

In the early 1970's researchers at several institutions simultaneously began to investigate potential clinical applications of symbolic reasoning techniques drawn from the branch of computer science known as artificial intelligence (AI). The field is well reviewed in a recent book by Winston [128]. The term "artificial intelligence" is generally accepted to include those computer applications that involve symbolic inference rather than strictly numerical calculations. Examples include programs that reason about mineral exploration, organic chemistry, or molecular biology; programs that converse in English and understand spoken sentences; and programs that generate theories from observations. Such programs gain their power from qualitative, experiential judgments, codified in so-called "rules-of-thumb" or "heuristics," in contrast to numerical calculation programs whose power derives from the analytical equations used.
The heuristics focus the attention of the reasoning program on parts of the problem that seem most critical and parts of the knowledge base that seem most relevant. They also guide the application of the domain knowledge to an individual case by deleting items from consideration as well as by focusing on items. The result is that these programs pursue a line of reasoning as opposed to following a sequence of steps in a calculation.

Among the earliest symbolic inference programs in medicine was the diagnostic interviewing system of Kleinmuntz [54]. Other early work included Wortman's information processing system, the performance of which was largely motivated by a desire to understand and simulate the psychological processes of neurologists reaching diagnoses [130]. It was a landmark paper by Gorry in 1973, however, that first critically analyzed conventional approaches to computer-based clinical decision making and outlined his motivation for turning to newer symbolic techniques [34]. He used the acute renal failure program discussed in Section VII-B [33] as an example of the problems arising when decision analysis is used alone. In particular, he analyzed some of the cases on which the program had failed but the physicians considering the cases had performed well. His conclusions from these observations include the following four points.

1) Clinical judgment is based less on detailed knowledge of pathophysiology than it is on gross chunks of knowledge and a good deal of detailed experience from which rules of thumb are derived.

2) Clinicians know facts, of course, but their knowledge is also largely judgmental. The rules they learn allow them to focus attention and generate hypotheses quickly. Such heuristics permit them to avoid detailed search through the entire problem space.

3) Clinicians recognize levels of belief or certainty associated with many of the rules they use, but they do not routinely quantitate or use these certainty concepts in any formal statistical manner.

4) It is easier for experts to state their rules in response to perceived misconceptions in others than it is for them to generate such decision criteria a priori.

In the renal failure program medical knowledge had been embedded in the structure of the decision tree. This knowledge was never explicit, and additions to the experts' judgmental rules had generally required changes to the tree itself. Based on observations such as those above, Gorry identified at least three important problems for investigation.

1) Medical Concepts: Clinical decision aids had traditionally had no true "understanding" of medicine. Although explicit decision trees had given the decision theory programs a greater sense of the pertinent associations, medical knowledge and the heuristics for problem solving in the field had never been explicitly represented nor used. So-called "common sense" was often clearly lacking when the programs failed, and this was often what most alienated potential physician users.

2) Conversational Capabilities: Both for capturing knowledge from collaborating experts, and for communicating with physician users, Gorry argued that further research on the development of computer-based linguistic capabilities was crucial.

3) Explanation: Diagnostic programs had seldom emphasized an ability to explain the basis for their decisions in terms understandable to the physician.
System acceptability was therefore inevitably limited; the physician would often have no basis for deciding whether to accept the program's advice, and might therefore resent what could be perceived as an attempt to dictate the practice of medicine. Gorry's group at MIT and Tufts developed new approaches to the renal failure problem in light of these observations [75].

Due to the limitations of the older techniques, it was perhaps inevitable that some medical researchers would turn to the AI field for new techniques. Major research areas in AI include knowledge representation, heuristic search, natural language understanding and generation, and models of thought processes, all topics clearly pertinent to the problems we have been discussing. Furthermore, AI researchers were beginning to look for applications to which they could apply some of the techniques they had developed in theoretical domains. This community of researchers has grown in recent years, and a recent issue of Artificial Intelligence was devoted entirely to applications of AI to biology, medicine, and chemistry [105] (footnote 25).

Footnote 25: Many of the systems which use AI techniques for medical decision making were developed on the SUMEX-AIM computing resource, a nationally shared system devoted entirely to applications of AI to the biomedical sciences. The SUMEX-AIM computer is physically located at Stanford University but is used by researchers nationwide via connections to computer networks. The resource is funded by the Division of Research Resources, Biotechnology Branch, National Institutes of Health.

Among the programs using symbolic reasoning techniques are several systems that have been particularly novel and successful. At the University of Pittsburgh, Pople and Myers have developed a system called INTERNIST that assists with test selection for the diagnosis of all diseases in internal medicine [81]. This awesome task has been remarkably successful to date, with the program correctly diagnosing a large percentage of complex cases selected from clinical pathologic conferences in the major medical journals (footnote 26). The program uses a hierarchic disease categorization, an ad hoc scoring system for quantifying symptom-disease relationships, plus some clever heuristics for focusing attention, discriminating between competing hypotheses, and diagnosing concurrent diseases [82]. The system currently has a limited human interface, however, and is not yet implemented for clinical trials.

Footnote 26: Data communicated by Drs. Pople and Myers at the Fourth Annual A.I.M. Workshop, Rutgers University, June 1978.

Weiss, Kulikowski, and Amarel (Rutgers University) and Safir (Mt. Sinai Hospital, New York City) have developed a model of reasoning regarding disease processes in the eye, specifically glaucoma [125]. In this specialized application area it has been possible to map relationships between observations, pathophysiologic states, and disease categories. The resulting causal associational network (termed CASNET) forms the basis for a reasoning program that gives advice regarding disease states in glaucoma patients and generates management recommendations. The system is undergoing evaluation by a nationwide network of ophthalmologists but is not yet offered for routine clinical use.

For the AI researchers the question of how best to manage uncertainty in medical reasoning remains a central issue. The programs mentioned have developed ad hoc weighting systems and avoided formal statistical approaches. Others have turned to the work of statisticians and philosophers of science who have devised theories of approximate or inexact reasoning. For example, Wechsler [122] describes a program that is based upon Zadeh's fuzzy set theory [133], and Shortliffe and Buchanan [101] have turned to confirmation theory for their model of inexact reasoning.

B. Example

The symbolic reasoning program selected for discussion is the MYCIN system at Stanford University [102].
The researchers cited a variety of design considerations which motivated the selection of AI techniques for the consultation system they were developing [99]. They primarily wanted it to be useful to physicians and therefore emphasized the selection of a problem domain in which physicians had been shown to err frequently, namely the selection of antibiotics for patients with infections. They also cited human issues that they felt were crucial to make the system acceptable to physicians:

1) it should be able to explain its decisions in terms of a line of reasoning that a physician can understand;
2) it should be able to justify its performance by responding to questions expressed in simple English;
3) it should be able to "learn" new information rapidly by interacting directly with experts;
4) its knowledge should be easily modifiable so that perceived errors can be corrected rapidly before they recur in another case; and
5) the interaction should be engineered with the user in mind (in terms of prompts, answers, and information volunteered by the system as well as by the users).

All these design goals were based on the observation that previous computer decision aids had generally been poorly accepted by physicians, even when they were shown to perform well on the tasks for which they were designed. MYCIN's developers felt that barriers to acceptance were largely conceptual and could be counteracted in large part if a system were perceived as a clinical tool rather than a dogmatic replacement for the primary physician's own reasoning.

Knowledge of infectious diseases is represented in MYCIN as production rules, each containing a "packet" of knowledge obtained from collaborating experts [102] (footnote 27). A production rule is simply a conditional statement which relates observations to associated inferences that may be drawn. For example, a MYCIN rule might state that "if a bacterium is a gram-positive coccus growing in chains, then it is apt to be a streptococcus." MYCIN's power is derived from such rules in a variety of ways:

1) it is the program that determines which rules to use and how they should be chained together to make decisions about a specific case (footnote 28);
2) the rules can be stored in a machine-readable format but translated into English for display to physicians;
3) by removing, altering, or adding rules, the system's knowledge structures can be rapidly modified without explicitly restructuring the entire knowledge base; and
4) the rules themselves can often form a coherent explanation of system reasoning if the relevant ones are translated into English and displayed in response to a user's question.

Footnote 27: Production rules are a technique frequently employed in AI research [9] and effectively applied to other scientific problem domains [6].
Footnote 28: The control structure used is termed "goal-oriented" and is similar to the consequent-theorems used in Hewitt's PLANNER [42].

Associated with all rules and inferences are numerical weights reflecting the degree of certainty associated with them. These numbers, termed certainty factors, form the basis for the system's inexact reasoning [101]. They allow the judgmental knowledge of experts to be captured in rule form and then used in a consistent fashion. A small illustrative sketch of this rule-and-certainty-factor scheme appears below.
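The following sketch is our illustrative reconstruction in Python, not code from MYCIN itself: goal-directed chaining over two hypothetical production rules, with positive certainty factors for the same conclusion combined as cf1 + cf2(1 - cf1), roughly following the model of inexact reasoning described in [101]. The rules, findings, and numbers are invented for the example.

    RULES = [
        # (premises, conclusion, certainty factor of the rule itself)
        (("gram_positive", "coccus", "chains"), "streptococcus", 0.7),
        (("sore_throat", "streptococcus"), "strep_pharyngitis", 0.6),
    ]

    # Certainty attached to the raw findings for one hypothetical case.
    FINDINGS = {"gram_positive": 1.0, "coccus": 1.0,
                "chains": 0.8, "sore_throat": 1.0}

    def cf(goal):
        """Backward-chain from a goal: its certainty comes either from a
        recorded finding or from rules concluding it. A rule contributes
        rule_cf * min(premise certainties); successive positive
        contributions x and y combine as x + y*(1 - x)."""
        if goal in FINDINGS:
            return FINDINGS[goal]
        combined = 0.0
        for premises, conclusion, rule_cf in RULES:
            if conclusion == goal:
                support = min(cf(p) for p in premises)
                if support > 0.2:   # MYCIN-like cutoff on weak premises
                    combined += rule_cf * support * (1.0 - combined)
        return combined

    print("CF(strep_pharyngitis) =", round(cf("strep_pharyngitis"), 3))

Because the rules are data rather than program structure, adding, removing, or editing an entry in RULES changes the system's behavior without rewriting the interpreter, which is the modifiability property claimed in point 3) above; translating a rule tuple into an English sentence likewise supports points 2) and 4).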
The MYCIN system has been evaluated regarding its performance at therapy selection for patients with either septicemia [132] or meningitis [131]. The program performs comparably with experts in these two task domains, but as yet it has no rules regarding the other infectious disease problem areas. Further knowledge base development will therefore be required before MYCIN is made available for clinical use; hence, questions regarding its acceptability to physicians cannot yet be assessed. However, the required implementation stages have been delineated [100], attention has been paid to all the design criteria mentioned above, and the program does have a powerful explanation capability [95].

C. Discussion of the Methodology

Whereas the computations used by the other paradigms mostly involve straightforward application of well-developed computing techniques, artificial intelligence methods are largely experimental; new approaches to knowledge representation, language understanding, heuristic search, and the other symbolic reasoning problems we have mentioned are still needed. Thus the AI programs tend to be developed in research environments where short-term practical results are unlikely to be found. However, out of this research are emerging techniques for coping with many of the problems encountered by the other paradigms we have discussed. AI researchers have developed promising methods for handling concurrent diseases [82], [125], assessing the time course of disease [18], and acquiring adequate structured knowledge from experts [11]. Furthermore, inexact reasoning techniques have been developed and implemented [101] (although they tend to be justified largely on intuitive grounds). In addition, the techniques of artificial intelligence provide a way to respond to many of Gorry's observations regarding the three major inadequacies of prior paradigms as described in Section VIII-A: 1) the medical AI programs all tend to stress the representation of medical knowledge and a sense of understanding the underlying concepts; 2) many of them have conversational capabilities which draw on language processing research; and 3) explanation capabilities have been a primary focus of systems such as MYCIN.

Szolovits and Pauker have recently reviewed some applications of AI to medicine and have attempted to weigh the successes of this young field against the very real problems that lie ahead [108]. They identify several deficiencies of current systems. For example, termination criteria are still poorly understood. Although INTERNIST can diagnose simultaneous diseases, it also pursues all abnormal findings to completion, even though a clinician often ignores minor unexplained abnormalities if the rest of a patient's clinical status is well understood. In addition, although some of these programs now cleverly mimic the reasoning styles observed in experts [17], [53], it is less clear how to keep the systems from abandoning one hypothesis and turning to another as soon as new information suggests another possibility.
Programs that operate this way appear to digress from one topic to another, a characteristic that decidedly alienates a user regardless of the validity of the final diagnosis or advice.

Still largely untapped is the power of an AI program to understand its own knowledge base, i.e., the structure and content of the reasoning mechanisms as well as of the medical facts. In effect, AI programs have the ability to "know what they know," the best working example of which can be found in the prototype system named Teiresias [10]. Because such programs can reason about their own knowledge, they have the power to encode knowledge about strategies, e.g., when to use and when to ignore specific items of medical knowledge and which leads to follow up on. Such "meta-level" knowledge offers a new dimension to the design of "intelligent assistant" programs which we predict will be exploited in medical decision making systems of the future.

IX. CONCLUSIONS

This review has shown that there are two recurring questions regarding computer-based clinical decision making:

1) Performance: how can we design systems that reach better, more reliable decisions in a broad range of applications, and
2) Acceptability: how can we more effectively encourage the use of such systems by physicians or other intended users?

We shall summarize these points separately by reviewing many of the issues common to all the paradigms discussed in this paper.

A. Performance Issues

Central to assuring a program's adequate performance is a matching of the most appropriate technique with the problem domain. We have seen that the structured logic of clinical algorithms can be effectively applied to triage functions and other primary care problems, but they would be less naturally matched with complex tasks such as the diagnosis and management of acute renal failure. Good statistical data may support an effective Bayesian program in settings where diagnostic categories are small in number, nonoverlapping, and well-defined, but the inability to use qualitative medical knowledge limits the effectiveness of the Bayesian approach in more difficult patient management or diagnostic environments. Similarly, mathematical models may support decision making in certain well-described fields in which observations are typically quantified and related by functional expressions, but in which the knowledge is typically limited to numerical encoding. These examples, and others, demonstrate the need for thoughtful consideration of the technique most appropriate for managing a clinical problem. In general the simplest effective approach is to be preferred (footnote 29), but acceptability issues must also be considered as discussed below.

Footnote 29: It is also always appropriate to ask whether computer-based approaches are needed at all for a given decision making task. For all but the most complex clinical algorithms, for example, the developers have tended to discard computer programs. Similarly, Schwartz et al. pointed out that decision analyses can often be successfully accomplished in a qualitative manner using paper and pencil [94].

As researchers have ventured into more complex clinical domains, a number of difficult problems have tended to degrade the quality of performance of computer-based decision aids. Significant clinical problems require large knowledge bases that contain complex interrelationships including time and functional dependencies. The knowledge of such domains is inevitably open-ended and incomplete, so the knowledge base must be easily extensible. Not only does this require a flexible representation of knowledge, but it encourages the development of novel techniques for the acquisition and integration of new facts and judgments. Similarly, the inexactness of medical inference must somehow be represented and manipulated within effective consultation systems.
As we have discussed, all these performance issues are important knowledge engineering research problems for which artificial intelligence already offers promising new methods.

It is also important to consider the extent to which a program's "understanding" of its task domain will heighten its performance, particularly in settings where knowledge of the field tends to be highly judgmental and poorly quantified. We use the term "understanding" here to refer to a program's ability to reason about, as well as reason with, its medical knowledge base. This implies a substantial amount of judgmental or structural knowledge (in addition to data) contained within the program. Analyses of human clinical decision making [17], [53] suggest that as decisions move from simple to complex, a physician's reasoning style becomes less algorithmic and more heuristic, with qualitative judgmental knowledge and the conditions for invoking it coming increasingly into play. Furthermore, the performance of complex decision aids will also be heightened by the representation and utilization of high-level "meta-knowledge" that permits programs to understand their own limitations and reasoning strategies. In order to design medical computing programs with these capabilities, the designers themselves will have to become cognizant of "knowledge engineering" issues. It is especially important that they find effective ways to match the knowledge structures they use to the complexity of the tasks their programs are designed to undertake.

B. Acceptability Issues

A recurring observation as one reviews the literature of computer-based medical decision making is that essentially none of the systems has been effectively used outside of a research environment, even when its performance has been shown to be excellent! This suggests that it is an error to concentrate research primarily on methods for improving the computer's decision making performance when clinical impact depends on solving other problems of acceptance as well. There are some data [106] to support the extreme view that the biases of medical personnel against computers are so strong that systems will inevitably be rejected, regardless of performance. However, we are beginning to see examples of applications in which initial resistance to automated techniques has gradually been overcome through the incorporation of adequate system benefits [121].

Perhaps one of the most revealing lessons on this subject is an observation regarding the system of Mesel et al. [70] described in Section II-B. Despite documented physician resistance to clinical algorithms in other settings [38], the physicians in Mesel's study accepted the guidance of protocols for the management of chemotherapy in their cancer patients. It is likely that the key to acceptance in this instance is the fact that these physicians had previously had no choice but to refer their patients with cancer to the tertiary care center in Birmingham where all complex chemotherapy was administered.
The introduction of the protocols permitted these physicians to undertake tasks that they had previously been unable to do. It simultaneously allowed maintenance of close doctor-patient relationships and helped the patients avoid frequent long trips to the center. The motivation for the physician to use the system is clear in this case. It is reminiscent of Rosati's assertion that physicians will first welcome computer decision aids when they become aware that colleagues who are using them have a clear advantage in their practice [87].

A heightened awareness of "human engineering" issues among medical computing researchers will also make computers more acceptable to physicians by making the programs easier and more pleasant to use. Fox has recently reviewed this field in detail [22]. The issues range from the mechanics of interaction with the computer (e.g., using display terminals with such features as light pens, special keyboards, color, and graphics) to the features of the program that make it appear as a helpful tool rather than a complicating burden. Also involved, from both the mechanical and global design sides, is the development of flexible interfaces that tailor the style of the interaction to the needs and desires of individual physicians. Adequate attention must also be given to the severe time constraints perceived by physicians. Ideally they would like programs to take no more time than they currently spend when accomplishing the same task on their own. Time and schedule pressures are similarly likely to explain the greater resistance to automation among interns and residents than among medical students or practicing physicians in Startsman's study [106].

The issue of a program's "self-knowledge" impacts on the acceptance of consultation systems in much the same way as it does upon program performance. Decision makers in general, and physicians in particular, will place more trust in systems that appear to understand their own limitations and capabilities, and that know when to admit ignorance of a problem area or inability to support any conclusion regarding an individual patient. Moreover, physicians will have a means for checking up on these automated assistants if the programs have an ability to explain not only the reasoning chain leading to their decisions but their problem solving strategies as well. High-level knowledge, including a sense of scope and limitations, may thus allow a program to know enough about itself to prevent its own misuse. Furthermore, since systems that are not easily modifiable tend not to be accepted, meta-level knowledge about representation and interconnections within the knowledge base may help overcome the problem of programs becoming tied too closely to a store of knowledge that is regionally or temporally specific. It is therefore important to stress that considerations such as those we have mentioned here may argue in favor of using symbolic reasoning techniques even when a somewhat less complex approach might have been adequate for the decision task itself.

X. SUMMARY

In summary, the trend towards increased use of knowledge engineering techniques for clinical decision programs stems from the dual goals of improving the performance and increasing the acceptance of such systems. Both acceptability and performance issues must be considered from the outset in a system's design because they dictate the choice of methodology as much as the task domain itself does.
As greater experience is gained with these techniques, and as they become better known throughout the medical computing community, it is likely that we will see increasingly powerful unions between symbolic reasoning and the alternate paradigms we have discussed. One lesson to be drawn lies in the recognition that much basic research remains to be done in medical computing, and that the field is more than the application of established computing techniques to medical problems.

ACKNOWLEDGMENT

We wish to thank R. Blum, L. Fagan, J. King, J. Kunz, H. Sox, and G. Wiederhold for their thoughtful advice in reviewing earlier drafts of this paper. We are also grateful to Dr. Herbert Sherman and the reviewers for their constructive suggestions regarding revisions.

REFERENCES

[1] P. Armitage and E. A. Gehan, "Statistical methods for the identification and use of prognostic factors," Int. J. Cancer, vol. 13, pp. 16-36, 1974.
[2] H. L. Bleich, "Computer evaluation of acid-base disorders," J. Clin. Invest., vol. 48, pp. 1689-1696, 1969.
[3] —, "The computer as a consultant," New Eng. J. Med., vol. 284, pp. 141-147, 1971.
[4] —, "Computer-based consultation: Electrolyte and acid-base disorders," Amer. J. Med., vol. 53, pp. 285-291, 1972.
[5] R. L. Blum and G. Wiederhold, "Inferring knowledge from clinical data banks: Utilizing techniques from artificial intelligence," in Proc. 2nd Annu. Symp. Comput. Appl. Med. Care (IEEE, Washington, DC), pp. 303-307, Nov. 1978.
[6] B. G. Buchanan and E. A. Feigenbaum, "Dendral and meta-dendral: Their applications dimension," Artific. Intell., vol. 11, pp. 5-24, 1978.
[7] D. J. Croft, "Is computerized diagnosis possible?" Comput. Biomed. Res., vol. 5, pp. 351-367, 1972.
[8] J. Cumberpatch and H. S. Heaps, "A disease-conscious method for sequential diagnosis by use of disease probabilities without assumption of symptom independence," Int. J. Biomed. Comput., vol. 7, pp. 61-78, 1976.
[9] R. Davis and J. King, "An overview of production systems," in Machine Representation of Knowledge, E. W. Elcock and D. Michie, Eds. New York: Wiley, 1976.
[10] R. Davis, "Applications of meta-level knowledge to the construction, maintenance, and use of large knowledge bases," Heuristic Programming Project, Stanford Univ., Stanford, CA, Memo HPP-76-7, July 1976.
[11] —, "Interactive transfer of expertise: Acquisition of new inference rules," in Proc. 5th Int. Joint Conf. Artific. Intell. (Cambridge, MA), 1977.
[12] F. T. deDombal, D. J. Leaper, J. R. Staniland et al., "Computer-aided diagnosis of acute abdominal pain," Brit. Med. J., vol. 2, pp. 9-13, 1972.
[13] F. T. deDombal, D. J. Leaper, J. C. Horrocks et al., "Human and computer-aided diagnosis of abdominal pain: Further report with emphasis on performance of clinicians," Brit. Med. J., vol. 1, pp. 376-380, 1974.
[14] F. T. deDombal and F. Gremy, Eds., Decision Making and Medical Care: Can Information Science Help? Amsterdam, The Netherlands: North-Holland, 1976.
[15] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[16] W. Edwards, "N = 1: Diagnosis in unique cases," in Computer Diagnosis and Diagnostic Methods, J. A. Jacquez, Ed. Springfield, IL: Charles C. Thomas, 1972, pp. 139-151.
[17] A. S. Elstein, L. S. Shulman, and S. A. Sprafka, Medical Problem Solving: An Analysis of Clinical Reasoning. Cambridge, MA: Harvard Univ. Press, 1978.
[18] L. M. Fagan, "Knowledge engineering for dynamic clinical settings: Giving advice in the intensive care unit," doctoral dissertation, Heuristic Programming Project, Stanford Univ., Stanford, CA, 1979.
[19] E. A. Feigenbaum, "The art of artificial intelligence: Themes and case studies of knowledge engineering," in AFIPS Conf. Proc., NCC, 1978, vol. 47. Montvale, NJ: AFIPS Press, 1978, p. 227.
[20] A. R. Feinstein, "Quality of data in the medical record," Comput. Biomed. Res., vol. 3, pp. 426-435, 1970.
[21] A. R. Feinstein, J. F. Rubinstein, and W. A. Ramshaw, "Estimating prognosis with the aid of a conversational mode computer program," Ann. Intern. Med., vol. 76, pp. 911-921, 1972.
[22] J. Fox, "Medical computing and the user," Int. J. Man-Mach. Stud., vol. 9, pp. 669-686, 1977.
[23] R. B. Friedman and D. H. Gustafson, "Computers in clinical medicine: A critical review," Comput. Biomed. Res., vol. 8, pp. 199-204, 1977.
[24] J. F. Fries, "Time-oriented patient records and a computer databank," J. Amer. Med. Ass., vol. 222, pp. 1536-1542, 1972.
[25] —, "A data bank for the clinician?" (editorial), New Eng. J. Med., vol. 294, pp. 1400-1402, 1976.
[26] L. H. Garland, "Studies on the accuracy of diagnostic procedures," Amer. J. Roentgen., vol. 82, pp. 25-38, 1959.
[27] P. W. Gill, D. J. Leaper, P. J. Guillou et al., "Observer variation in clinical diagnosis: A computer-aided assessment of its magnitude and importance," Meth. Inform. Med., vol. 12, pp. 108-113, 1973.
[28] A. S. Ginsberg, "Decision analysis in clinical patient management with an application to the pleural effusion syndrome," The Rand Corp., Santa Monica, CA, R-751-RC/NLM, July 1971.
[29] —, "The diagnostic process viewed as a decision problem," in Computer Diagnosis and Diagnostic Methods, J. A. Jacquez, Ed. Springfield, IL: Charles C. Thomas, 1972.
[30] M. A. Gleser and M. F. Collen, "Towards automated medical decisions," Comput. Biomed. Res., vol. 5, pp. 180-189, 1972.
[31] R. M. Goldwyn, H. P. Friedman, and J. H. Siegel, "Iteration and interaction in computer data bank analysis: A case study in the physiologic classification and assessment of the critically ill," Comput. Biomed. Res., vol. 4, pp. 607-622, 1971.
[32] G. A. Gorry and G. O. Barnett, "Experience with a model of sequential diagnosis," Comput. Biomed. Res., vol. 1, pp. 490-507, 1968.
[33] G. A. Gorry, J. P. Kassirer, A. Essig, and W. B. Schwartz, "Decision analysis as the basis for computer-aided management of acute renal failure," Amer. J. Med., vol. 55, pp. 473-484, 1973.
[34] G. A. Gorry, "Computer-assisted clinical decision making," Meth. Inform. Med., vol. 12, pp. 45-51, 1973.
[35] G. A. Gorry, H. Silverman, and S. G. Pauker, "Capturing clinical expertise: A computer program that considers clinical responses to digitalis," Amer. J. Med., vol. 64, pp. 452-460, 1978.
[36] R. A. Greenes, G. O. Barnett, S. W. Klein et al., "Recording, retrieval, and review of medical data by physician-computer interaction," New Eng. J. Med., vol. 282, pp. 307-315, 1970.
[37] S. Greenfield, A. L. Komaroff, and H. Anderson, "A headache protocol for nurses: Effectiveness and efficiency," Arch. Intern. Med., vol. 136, pp. 1111-1116, 1976.
[38] R. H. Grimm, K. Shimoni, W. R. Harlan, and E. H. Estes, "Evaluation of patient-care protocol use by various providers," New Eng. J. Med., vol. 292, pp. 507-511, 1975.
[39] G. F. Groner, R. L. Clark, R. A. Berman, and E. C. De Land, "BIOMOD: An interactive computer graphics system for modeling," in Proc. Fall Joint Comput. Conf., pp. 369-378, 1971.
[40] T. Groth, "Biomedical modelling," in MEDINFO 77. Amsterdam, The Netherlands: North-Holland, 1977, pp. 775-784.
[41] E. V. Hess, "A uniform database for rheumatic diseases," Arthrit. Rheumat., vol. 19, pp. 645-648, 1976.
[42] C. Hewitt, "Description and theoretical analysis (using schemata) of PLANNER: A language for proving theorems and manipulating models in a robot," Ph.D. dissertation, Dep. Mathematics, Massachusetts Inst. Technol., Cambridge, MA, 1972.
[43] J. C. Horrocks, A. P. McCann, J. R. Staniland et al., "Computer-aided diagnosis: Description of an adaptable system, and operational experience with 2,034 cases," Brit. Med. J., vol. 2, pp. 5-9, 1972.
[44] J. C. Horrocks and F. T. deDombal, "Computer-aided diagnosis of dyspepsia," Amer. J. Diges. Dis., vol. 20, pp. 397-406, 1975.
[45] R. A. Howard, Ed., Special Issue on Decision Analysis, IEEE Trans. Syst. Sci. Cybern., vol. SSC-4, Sept. 1968.
[46] F. J. Inglefinger, "Decision in medicine" (editorial), New Eng. J. Med., vol. 293, pp. 254-255, 1975.
[47] J. A. Jacquez, Computer Diagnosis and Diagnostic Methods. Springfield, IL: Charles C. Thomas, 1972.
[48] R. W. Jelliffe, J. Buell, R. Kalaba et al., "A computer program for digitalis dosage regimens," Math. Biosci., vol. 9, pp. 179-193, 1970.
[49] R. W. Jelliffe, J. Buell, and R. Kalaba, "Reduction of digitalis toxicity by computer-assisted glycoside dosage regimens," Ann. Intern. Med., vol. 77, pp. 891-906, 1972.
[50] D. C. Johnson and G. O. Barnett, "MEDINFO: A medical information system," Comput. Prog. Biomed., vol. 7, pp. 191-201, 1977.
[51] L. N. Kanal, "Patterns in pattern recognition: 1968-1974," IEEE Trans. Inform. Theory, vol. IT-20, no. 6, 1974.
[52] R. H. S. Karpinski and H. L. Bleich, "MISAR: A miniature information storage and retrieval system," Comput. Biomed. Res., vol. 4, pp. 655-660, 1971.
[53] J. P. Kassirer and G. A. Gorry, "Clinical problem solving: A behavioral analysis," Ann. Intern. Med., vol. 89, pp. 245-255, 1978.
[54] B. Kleinmuntz and R. S. McLean, "Diagnostic interviewing by digital computer," Behav. Sci., vol. 13, pp. 75-80, 1968.
[55] R. G. Knapp, S. Levi, D. Lurie, and M. Westphal, "A computer-generated diagnostic decision guide: A comparison of statistical diagnosis and clinical diagnosis," Comput. Biol. Med., vol. 7, pp. 223-230, 1977.
[56] A. L. Komaroff, W. L. Black, M. Flatley et al., "Protocols for physician assistants: Management of diabetes and hypertension," New Eng. J. Med., vol. 290, pp. 307-312, 1974.
[57] A. L. Komaroff, "Medical data collection: Hard decisions from soft data," this issue, pp. 000-000.
[58] J. Korein, M. Lyman, and J. L. Tick, "The computerized medical record," Bull. NY Acad. Med., vol. 47, pp. 824-826, 1971.
[59] N. Koss and A. R. Feinstein, "Computer-aided prognosis: II. Development of a prognostic algorithm," Arch. Intern. Med., vol. 127, pp. 448-459, 1971.
[60] D. J. Leaper, J. C. Horrocks, J. R. Staniland, and F. T. deDombal, "Computer-assisted diagnosis of abdominal pain using estimates provided by clinicians," Brit. Med. J., vol. 4, pp. 350-354, 1972.
[61] R. S. Ledley and L. B. Lusted, "Reasoning foundations of medical diagnosis," Science, vol. 130, pp. 9-21, 1959.
[62] S. Levi, J. R. Frant, M. C. Westphal, and D. Lurie, "Development of a decision guide: Optimal discriminations for meningitis determined by statistical analysis," Meth. Inform. Med., vol. 15, pp. 87-90, 1976.
[63] M. Lipkin and J. D. Hardy, "Mechanical correlation of data in differential diagnosis of hematologic diseases," J. Amer. Med. Ass., vol. 166, pp. 113-125, 1958.
[64] L. B. Lusted, Introduction to Medical Decision Making. Springfield, IL: Charles C. Thomas, 1968.
[65] J. C. Mabry, H. K. Thompson, M. D. Hopwood, and W. R. Baker, "A prototype data management and analysis system (CLINFO): System description and user experience," in MEDINFO 77. Amsterdam, The Netherlands: North-Holland, 1977, pp. 71-75.
[66] C. McDonald, B. Bhargava, and D. Jeris, "A clinical information system (CIS) for ambulatory care," in Proc. 1975 NCC, AFIPS Press, vol. 44, 1975, pp. 749-756.
[67] B. J. McNeil, E. Keeler, and S. J. Adelstein, "Primer on certain elements of medical decision making," New Eng. J. Med., vol. 293, pp. 211-215, 1975.
[68] B. J. McNeil and S. J. Adelstein, "Determining the value of diagnostic and screening tests," J. Nucl. Med., vol. 17, pp. 439-448, 1977.
[69] S. J. Menn, G. O. Barnett, D. Schmechel et al., "A computer program to assist in the care of acute respiratory failure," J. Amer. Med. Ass., vol. 223, pp. 308-312, 1973.
[70] E. Mesel, D. D. Wirtschafter, J. T. Carpenter et al., "Clinical algorithms for cancer chemotherapy: Systems for community-based consultant-extenders and oncology centers," Meth. Inform. Med., vol. 15, pp. 168-173, 1976.
[71] R. A. Nordyke, C. A. Kulikowski, and C. W. Kulikowski, "A comparison of methods for the automated diagnosis of thyroid dysfunction," Comput. Biomed. Res., vol. 4, pp. 374-389, 1971.
[72] M. J. Norusis and J. A. Jacquez, "Diagnosis. I. Symptom nonindependence in mathematical models for diagnosis," Comput. Biomed. Res., vol. 8, pp. 156-172, 1975.
[73] E. A. Patrick, "Pattern recognition in medicine," Syst., Man, Cybern. Rev., vol. 6, p. 4, 1977.
[74] S. G. Pauker and J. P. Kassirer, "Therapeutic decision making: A cost-benefit analysis," New Eng. J. Med., vol. 293, pp. 229-234, 1975.
[75] S. G. Pauker, G. A. Gorry, J. P. Kassirer, and W. B. Schwartz, "Towards the simulation of clinical cognition: Taking a present illness by computer," Amer. J. Med., vol. 60, pp. 981-996, 1976.
[76] S. G. Pauker, "Coronary artery surgery: The use of decision analysis," Ann. Intern. Med., vol. 85, pp. 8-18, 1976.
[77] S. P. Pauker and S. G. Pauker, "Prenatal diagnosis: A directive approach to genetic counseling using decision analysis," Yale J. Biol. Med., vol. 50, pp. 275-289, 1977.
[78] C. C. Peck, L. B. Sheiner, C. M. Martin et al., "Computer-assisted digoxin therapy," New Eng. J. Med., vol. 289, pp. 441-446, 1973.
[79] H. V. Pipberger, "Clinical application of a second generation electrocardiographic computer program," Amer. J. Cardiol., vol. 35, pp. 597-608, 1975.
[80] J. S. Pliskin and C. H. Beck, "Decision analysis in individual clinical decision making: A real-world application in treatment of renal disease," Meth. Inform. Med., vol. 15, pp. 43-46, 1976.
[81] H. E. Pople, J. D. Myers, and R. A. Miller, "DIALOG: A model of diagnostic logic for internal medicine," in Proc. 4th Int. Joint Conf. Artific. Intell., MIT, Cambridge, MA, 1975.
[82] H. Pople, "The formation of composite hypotheses in diagnostic problem solving: An exercise in synthetic reasoning," in Proc. 5th Int. Joint Conf. Artific. Intell., Cambridge, MA, pp. 1030-1037, 1977.
[83] J. Prutting, "Lack of correlation between antemortem and postmortem diagnosis," NY J. Med., vol. 67, pp. 2081-2084, 1967.
[84] H. Raiffa, Decision Analysis: Introductory Lectures on Choices Under Uncertainty. Reading, MA: Addison-Wesley, 1968.
[85] B. Richards and A. E. S. Goh, "Computer assistance in the treatment of patients with acid-base and electrolyte disturbances," in MEDINFO 77. Amsterdam, The Netherlands: North-Holland, 1977, pp. 407-410.
[86] J. Rodnick and G. Wiederhold, "Review of automated ambulatory medical record systems: Charting services that are of essential benefit to the physician," in MEDINFO 77. Amsterdam, The Netherlands: North-Holland, 1977, pp. 957-961.
[87] R. A. Rosati, A. G. Wallace, and E. A. Stead, "The way of the future," Arch. Intern. Med., vol. 131, pp. 285-287, 1973.
[88] R. D. Rosati, J. F. McNeer, C. F. Starmer et al., "A new information system for medical practice," Arch. Intern. Med., vol. 135, pp. 1017-1024, 1975.
[89] M. B. Rosenblatt, P. K. Teng, and S. Kerpe, "Diagnostic accuracy in cancer as determined by post-mortem examination," Prog. Clin. Cancer, vol. 5, pp. 71-80, 1973.
[90] A. D. Rubin and J. F. Risley, "The PROPHET system: An experiment in providing a computer resource to scientists," in MEDINFO 77. Amsterdam, The Netherlands: North-Holland, 1977, pp. 77-81.
[91] C. Safran, P. N. Tsichlis, A. Z. Bluming, and J. F. Desforges, "Diagnostic planning using computer-assisted decision making for patients with Hodgkins disease," Cancer, vol. 39, pp. 2426-2434, 1977.
[92] H. Schoolman and L. Bernstein, "Computer use in diagnosis, prognosis, and therapy," Science, vol. 200, pp. 926-931, 1978.
[93] W. B. Schwartz, "Medicine and the computer: The promise and problems of change," New Eng. J. Med., vol. 283, pp. 1257-1264, 1970.
[94] W. B. Schwartz, G. A. Gorry, J. P. Kassirer, and A. Essig, "Decision analysis and clinical judgment," Amer. J. Med., vol. 55, pp. 459-472, 1973.
[95] A. C. Scott, W. Clancey, R. Davis, and E. H. Shortliffe, "Explanation capabilities of knowledge-based production systems," Amer. J. Comput. Ling., Microfiche 62, 1977.
[96] L. B. Sheiner, H. Halkin, C. Peck et al., "Improved computer-assisted digoxin therapy," Ann. Intern. Med., vol. 82, pp. 619-627, 1975.
[97] H. Sherman, B. Reiffen, and A. L. Komaroff, "Ambulatory care systems," in Problem-Directed and Medical Information Systems, M. F. Driggs, Ed. New York: Intercontinental Medical Book Corp., 1973, pp. 143-171.
[98] M. Shimura, "Learning procedures in pattern classifiers: Introduction and survey," in Proc. Int. Joint Conf. Pattern Recog., Kyoto, Japan, 1978, pp. 125-138.
[99] E. H. Shortliffe, S. G. Axline, B. G. Buchanan, and S. N. Cohen, "Design considerations for a program to provide consultations in clinical therapeutics," in Proc. 13th San Diego Biomed. Symp., San Diego, CA, Feb. 1974, pp. 311-319.
[100] E. H. Shortliffe and R. Davis, "Some considerations for the implementation of knowledge-based expert systems," SIGART Newsletter, no. 55, pp. 9-12, Dec. 1975.
[101] E. H. Shortliffe and B. G. Buchanan, "A model of inexact reasoning in medicine," Math. Biosci., vol. 23, pp. 351-379, 1975.
[102] E. H. Shortliffe, Computer-Based Medical Consultations: MYCIN. New York: Elsevier/North Holland, 1976.
[103] V. Slamecka, H. N. Camp, A. N. Badre, and W. D. Hall, "MARIS: A knowledge system for internal medicine," Inform. Process. Manag., vol. 13, pp. 273-276, 1977.
[104] H. C. Sox, C. H. Sox, and R. K. Tompkins, "The training of physicians' assistants: The use of a clinical algorithm system," New Eng. J. Med., vol. 288, pp. 818-824, 1973.
[105] N. S. Sridharan, Guest editorial, Artif. Intell., vol. 11, pp. 1-4, 1978.
[106] T. S. Startsman and R. E. Robinson, "The attitudes of medical and paramedical personnel towards computers," Comput. Biomed. Res., vol. 5, pp. 218-227, 1972.
[107] W. W. Stead, R. G. Brame, W. E. Hammond et al., "A computerized obstetric medical record," Obstet. Gyn., vol. 49, pp. 502-509, 1977.
[108] P. Szolovits and S. G. Pauker, "Categorical and probabilistic reasoning in medical diagnosis," Artif. Intell., vol. 11, pp. 115-144, 1978.
[109] T. R. Taylor, "Clinical decision analysis," Meth. Inform. Med., vol. 15, pp. 216-224, 1976.
[110] D. M. Vickery, "Computer support of paramedical personnel: The question of quality control," in MEDINFO 74. Amsterdam, The Netherlands: North-Holland, 1974, pp. 281-287.
[111] A. A. Vishnevskiy, I. I. Artobolevskiy, and M. L. Bykovskiy, Machine Diagnosis and Information Retrieval in Medicine in the USSR. DHEW Publication No. (NIH) 73-424, 1973.
[112] G. Wagner, P. Tautu, and U. Wolber, "Problems of medical diagnosis: A bibliography," Meth. Inform. Med., vol. 17, pp. 55-74, 1978.
[113] B. T. Walsh, W. W. Bookhein, R. C. Johnson et al., "Recognition of streptococcal pharyngitis in adults," Arch. Intern. Med., vol. 135, pp. 1493-1497, 1975.
[114] A. Wardle and L. Wardle, "Computer-aided diagnosis: A review of research," Meth. Inform. Med., vol. 17, pp. 15-28, 1978.
[115] H. R. Warner, A. F. Toronto, and L. G. Veasy, "Experience with Bayes' theorem for computer diagnosis of congenital heart disease," Ann. N.Y. Acad. Sci., vol. 115, pp. 558-567, 1964.
[116] H. R. Warner, "Experiences with computer-based patient monitoring," Anes. Analgesia Current Res., vol. 47, pp. 453-461, 1968.
[117] H. R. Warner, C. M. Olmsted, and B. D. Rutherford, "HELP—A program for medical decision-making," Comput. Biomed. Res., vol. 5, pp. 65-74, 1972.
[118] H. R. Warner, B. D. Rutherford, and B. Houtchens, "A sequential approach to history taking and diagnosis," Comput. Biomed. Res., vol. 5, pp. 256-262, 1972.
[119] H. R. Warner, J. D. Morgan, T. A. Pryor et al., "HELP—A self-improving system for medical decision making," in MEDINFO 74. Amsterdam, The Netherlands: North-Holland, 1974.
[120] H. R. Warner, "Knowledge sectors for logical processing of patient data in the HELP system," in Proc. 2nd Annu. Symp. Comput. Appl. Med. Care (IEEE, Washington, DC), 1978, pp. 401-404.
[121] R. J. Watson, "Medical staff response to a medical information system with direct physician-computer interface," in MEDINFO 74. Amsterdam, The Netherlands: North-Holland, 1974, pp. 299-302.
[122] H. Wechsler, "A fuzzy approach to medical diagnosis," Int. J. Biomed. Comput., vol. 7, pp. 191-203, 1976.
[123] L. L. Weed, "Medical records that guide and teach," New Eng. J. Med., vol. 278, pp. 593-599 and 652-657, 1968.
[124] ——, "Problem-oriented medical records," in Problem-Directed and Medical Information Systems, M. F. Driggs, Ed. New York: Intercontinental Medical Book Corp., 1973.
[125] S. M. Weiss, C. A. Kulikowski, S. Amarel, and A. Safir, "A model-based method for computer-aided medical decision-making," Artif. Intell., vol. 11, pp. 145-172, 1978.
[126] S. Weyl, J. Fries, G. Wiederhold, and F. Germano, "A modular self-describing clinical databank system," Comput. Biomed. Res., vol. 8, pp. 279-293, 1975.
[127] G. Wiederhold, J. F. Fries, and S. Weyl, "Structured organization of clinical data bases," in Proc. 1975 NCC, vol. 44. AFIPS Press, 1975, pp. 479-485.
[128] P. H. Winston, Artificial Intelligence. Reading, MA: Addison-Wesley, 1977.
[129] D. Wirtschafter, J. T. Carpenter, and E. Mesel, "A consultant-extender system for breast cancer adjuvant chemotherapy," Ann. Intern. Med., vol. 90, pp. 396-401, 1979.
[130] P. M. Wortman, "Medical diagnosis: An information processing approach," Comput. Biomed. Res., vol. 5, pp. 315-328, 1972.
[131] V. L. Yu, L. M. Fagan, S. M. Wraith et al., "Antimicrobial selection by a computerized consultant: A blinded evaluation by infectious disease experts," J. Amer. Med. Ass., vol. 241, 1979 (in press).
[132] V. L. Yu, B. G. Buchanan, E. H. Shortliffe et al., "An evaluation of the performance of a computer-based consultant," Comput. Prog. Biomed., vol. 9, pp. 95-102, 1979.
[133] L. A. Zadeh, "Fuzzy sets," Inform. Contr., vol. 8, pp. 338-353, 1965.
[134] N. Zoltie, J. C. Horrocks, and F. T. deDombal, "Computer-assisted diagnosis of dyspepsia—report on transferability of a system, with emphasis on early diagnosis of gastric cancer," Meth. Inform. Med., vol. 16, pp. 89-92, 1977.