Knowledge Engineering for Medical Decision Making: A Review of Computer-Based Clinical Decision Aids

EDWARD H. SHORTLIFFE, BRUCE G. BUCHANAN, AND EDWARD A. FEIGENBAUM

Abstract—Computer-based models of medical decision making account for a large portion of clinical computing efforts. This article reviews representative examples from each of several major medical computing paradigms. These include 1) clinical algorithms, 2) clinical databanks that include analytic functions, 3) mathematical models of physical processes, 4) pattern recognition, 5) Bayesian statistics, 6) decision analysis, and 7) symbolic reasoning or artificial intelligence. Because the techniques used in the various systems cannot be examined exhaustively, the case studies in each category are used as a basis for studying general strengths and limitations. It is noted that no one method is best for all applications. However, emphasis is given to the limitations of early work that have made artificial intelligence techniques and knowledge engineering research particularly attractive. We stress that considerable basic research in medical computing remains to be done and that powerful new approaches may lie in the melding of two or more established techniques.

Manuscript received December 13, 1978; revised February 20, 1979. The authors are with the Heuristic Programming Project, Departments of Medicine and Computer Science, Stanford University, Stanford, CA 94305.

I. INTRODUCTION

AS EARLY as the 1950's, physicians and computer scientists recognized that computers could assist with clinical decision making [63] and began to analyze medical diagnosis with a view to the potential role of automated decision aids in that domain [61]. Since that time a variety of techniques have been applied, accounting for at least 800 references in the clinical computing literature [112]. In this article we review several medical decision making paradigms and discuss some issues that account for both the multiplicity of approaches and the limited clinical success of most systems developed to date. Because other authors have reviewed computer-aided diagnosis [47], [92], [114] and the potential impact of computers in medical care [93], our emphasis here is somewhat different. We will focus on the symbolic representation and use of knowledge, termed "knowledge engineering," and the inadequacies of data-intensive techniques which have led to the exploration of novel symbolic reasoning approaches during the last decade.

A. Reasons for Attempting Computer-Aided Medical Decision Making

Because of the accelerated growth in medical knowledge, physicians have tended to specialize and to become more dependent upon assistance from other experts when presented with a complex problem outside their own area of expertise. The primary care physician who first sees a patient has thousands of tests available with a wide range of costs (both fiscal and physical) and potential benefits (i.e., arrival at a correct diagnosis or optimal therapeutic management). Even the experts in a specialized field may reach very different decisions regarding the management of a specific case [131]. Diagnoses that are made, and upon which therapeutic decisions are based, have been shown to vary widely in their accuracy [26], [83], [89].
Furthermore, medical students usually learn about decision making in an unstructured way, largely through observation and by emulating the thought processes they perceive to be used by their clinical mentors [53].

Thus the motivations for attempts to understand and automate the process of clinical decision making have been numerous [114]. They are directed both at diagnostic models and at assisting with patient management decisions. Among the reasons for introducing computers into such work are the following:

1) to improve the accuracy of clinical diagnosis through approaches that are systematic, complete, and able to integrate data from diverse sources;
2) to improve the reliability of clinical decisions by avoiding unwarranted influences of similar but not identical cases (a common source of bias among physicians), and by making the criteria for decisions explicit, and hence reproducible;
3) to improve the cost efficiency of tests and therapies by balancing the expenses of time, inconvenience, or funds against benefits and risks of definitive actions;
4) to improve our understanding of the structure of medical knowledge, with the associated development of techniques for identifying inconsistencies and inadequacies in that knowledge; and
5) to improve our understanding of clinical decision making, in order to improve medical teaching and to make computer programs more effective and easier to understand.

B. The Distinction Between Data and Knowledge

The models on which computer systems base their clinical advice range from data-intensive to knowledge-intensive approaches. There are at least four types of knowledge that may be distinguished from pure statistical data:

1) knowledge derived from data analysis (largely numerical);
2) judgmental or subjective knowledge;
3) scientific or theoretical knowledge;
4) high-level strategic knowledge or "self-knowledge."

If there is a chronology to the field over the last 20 years, it is that there has been progressively less dependence on "pure" observational data and more emphasis on higher level symbolic knowledge inferred from primary data. We include with domain knowledge the category of "judgmental knowledge," which reflects the experience and opinions of an expert regarding an issue about which the formal data may be fragmentary or nonexistent. Since many decisions made in clinical medicine depend upon this kind of judgmental expertise, it is not surprising that investigators should begin to look for ways to capture and use the knowledge of experts in decision making programs. Another reason to move away from purely data-intensive programs is that in medicine the primary data available to decision makers are far from objective [20], [57]. They include subjective reports from patients, and error-prone observations [27]. Also, the terminology used in the reports is not standardized [7] and the classifications often overlap. Thus decision making aids must be knowledgeable about the unreliability of the data [57] as well as the uncertainty of the inference.

For example, data-intensive programs include medical record systems which accumulate large databanks to assist with decision making. There is little knowledge per se in the databank, but there are large amounts of data which can help with decisions and be analyzed to provide new knowledge.
A program that retrieves a patient's record for review, or even one that identifies and retrieves the records of similar patients (matching some set of descriptors), is performing a data management task with little reasoning involved [36], [86]. Although there is statistical "knowledge" contained in the conditional probabilities generated from such a databank and utilized for Bayesian analysis, it is all numeric. At the other extreme are systems that encode and use the kind of expert knowledge which cannot be easily gleaned from databanks or literature review [75], [102]. Systems that model human reasoning or emphasize education of users tend to fall towards this end of the data-knowledge continuum.

In addition to judgmental and statistical knowledge, there are other forms of information that can play an important role in computer-based clinical decision aids. For example, underlying scientific theories and relationships are often ignored by diagnostic programs but provide the foundation for decisions made by human experts. Consider, for example, the potential utility of techniques that could effectively represent and use the basic knowledge of biochemistry, biophysics, or detailed human physiology. Biomedical modeling research offers some mathematical techniques for encoding such knowledge in certain domains, but symbolic approaches and clinically useful applications are still largely unrealized.

Finally, there is another kind of knowledge used by human decision makers—an understanding of reasoning processes and strategies themselves. This kind of "high-level" or "meta-level" knowledge, if incorporated into computer programs, may not only heighten their decision making performance but also augment their acceptability to users by making them appear more aware of their own power, strategies, and limitations.

We use the term "knowledge engineering," then, to refer to computer-based symbolic reasoning issues such as knowledge representation, acquisition, explanation, and "self-awareness" or self-modification [19]. It is along these dimensions that knowledge-based programs differ most sharply from conventional calculations. For example, they can solve problems by pursuing a line of reasoning; the individual inference steps and the whole chain of reasoning may also form the basis for explanations of decisions. A major concern in knowledge engineering is clear separation of the medical knowledge in a program from the inference mechanism that applies that knowledge to the data of individual cases. One goal of this paper is to identify, in the strengths and weaknesses of earlier work, those issues which have motivated several current researchers to investigate the automation of clinical decision aids through knowledge engineering.
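The separation just described can be made concrete with a small sketch: domain knowledge is held as explicit rules, a generic inference engine chains them, and the resulting trace doubles as an explanation. The rules below are invented toy examples, not the content of any actual system.

```python
# Toy illustration of the knowledge engineering ideas above: medical
# knowledge (the rules) is kept separate from the generic inference
# engine, and the chain of inferences doubles as an explanation.
# The rules are fabricated examples, not real clinical logic.

RULES = [
    # (premises, conclusion, rationale)
    ({"fever", "stiff_neck"}, "suspect_meningitis",
     "fever with a stiff neck suggests meningitis"),
    ({"suspect_meningitis"}, "recommend_lumbar_puncture",
     "suspected meningitis warrants a lumbar puncture"),
]

def forward_chain(findings, rules):
    """Generic inference engine: apply rules until nothing new follows."""
    known, trace = set(findings), []
    changed = True
    while changed:
        changed = False
        for premises, conclusion, rationale in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                trace.append((premises, conclusion, rationale))
                changed = True
    return known, trace

known, trace = forward_chain({"fever", "stiff_neck"}, RULES)
for premises, conclusion, rationale in trace:  # the trace is the explanation
    print(f"Because {sorted(premises)} -> {conclusion}: {rationale}")
```

Replacing the rule set changes the program's medical knowledge without touching the inference mechanism; it is this property that the knowledge engineering approach seeks to exploit.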
C. Parameters for Assessing Work in the Field

Barriers to successful implementation of computer-based diagnostic systems have been analyzed on several occasions [7], [23], [106] and need not be reviewed here. However, in assessing programs it is pertinent to examine several parameters that affect the success and scope of a particular system in light of its intended users and application. Unfortunately, the medical computing literature has few descriptions of systems for which all the following issues can be assessed.

1) How accurate is the program?¹
2) What is the nature of the knowledge in the system and how is it generated or acquired?
3) How is the clinical knowledge represented, and how does it facilitate the performance goals of the system described?
4) How are knowledge and clinical data used and how does this impact system performance?
5) Is the system accepted by the users for whom it is intended? Is the interface with the user adequate? Does the system function outside of a research setting and is it suitable for dissemination?
6) What are the limitations of the approach?

¹ Although this is important, it is not the only measure of clinical effectiveness. For example, the effects on morbidity, mortality, and length of hospital stay may also be important parameters. As we shall show, few systems have reached a stage of implementation where these parameters could be assessed. Moreover, because of the complexity of the interacting influences that affect the usual measures of outcome, it may be difficult ever to define the marginal benefit of such systems.

An issue we have chosen not to address is the cost of a system, including the size of the required computing resource. Not only is information on this question scanty for most of the programs, but expenses generated in a research and development environment do not realistically reflect the costs one expects from a system once it is operating for service use.

D. Overview of this Paper

An exhaustive review of computer-aided diagnosis will not be attempted in light of the vastness of the field, and we have therefore chosen to present the prominent paradigms by discussing representative examples. In separate sections we give an overview, example, and discussion of 1) clinical algorithms, 2) databank analysis, 3) mathematical models, 4) pattern recognition, 5) Bayesian analysis, 6) decision theory, and 7) symbolic reasoning. We close each section by identifying the range of applications for which the approach appears most appropriate, the limitations of the approach, and the ways in which symbolic reasoning techniques may strengthen the approach by improving its performance or acceptability.

The seven principal examples we have selected are not necessarily the best nor the most successful; however, they illustrate the issues we wish to discuss within the major paradigms. We have also referenced other closely related systems, so the bibliography should guide the reader to more details on particular topics. Any attempt to categorize programs in this way is inherently fraught with problems in that several systems draw upon more than one paradigm. Thus we have occasionally felt obligated to simplify a topic for clarity in light of the overall purposes of this review and the limitations of the space available to us.

Because we are only interested here in decision making tools for use by clinicians, we have chosen to disregard systems that are designed primarily for use by researchers [39], [50], [65], [90]. Furthermore, we shall not discuss biomedical engineering applications of computers, such as advanced automated instrumentation techniques (e.g., computerized tomography²) or signal processing techniques (e.g., programs for EKG analysis [79] or patient monitoring [116]). Because they do not explicitly make inferences, we have also omitted programs designed largely for data storage and retrieval with the actual analysis and decision making left to the clinician [36], [58], [124]. We have also chosen to discuss working computer programs rather than unimplemented theories or early reports of work in progress.

² See Kak's article in this issue.
II. CLINICAL ALGORITHMS AND AUTOMATION

A. Overview

Clinical algorithms, or protocols, are flowcharts to which a diagnostician or therapist can refer when deciding how to manage a patient with a specific clinical problem [97]. Such protocols usually allow decisions to be made by carefully following the simple branching logic, although there are built-in safeguards whereby referrals to experts are made if a patient is unusually complex. The value of a protocol depends upon the infrequency with which such referrals are made, so it is important to design algorithms that reflect an appropriate balance between safety and efficiency. In general, algorithms have been designed by expert physicians for use by paramedical personnel who have been entrusted with the performance of certain routine clinical-care tasks.³ The methodology has been developed in part because of a desire to define basic medical logic concisely so that detailed training in pathophysiology would not be necessary for ancillary practitioners. Experience has shown that intelligent high school graduates, selected in large part because of poise and warmth of personality, can provide excellent care guided by protocols after only four to eight weeks of training. This care has been shown to be equivalent to that given by physicians for the same limited problems, and to be accepted by physicians and patients alike for such diverse clinical situations as diabetes management [56], [66], pharyngitis [38], headache [37], and other disease categories [104], [110].

³ Clinical algorithms have also been prepared for use by physicians themselves, but Grimm has found that they are generally less well-accepted by doctors [38]. He showed, however, that physician performance could improve when protocols were used in certain settings.

The role of the computer in such applications has been limited, however. In fact, several groups initially experimented with computer representation of the algorithms but have since abandoned the efforts and resorted to prepared paper forms [56], [110]. In these cases the computer had originally guided the physician assistant's collection of data and had specified precisely what decisions should be made or actions taken, in accordance with the clinical algorithm. However, since the algorithmic logic is generally simple, and can often be represented on a single sheet of paper, the advantages of an automated approach over a manual system have not been clearly demonstrated. In one study Vickery showed that supervising physicians could detect no significant difference between the performance of physicians' assistants using automated versus manual systems, although the computer system entirely eliminated errors in data collection (since it demanded all relevant data at the appropriate time) [110]. Furthermore, the computer could not, of course, decide whether the actual observations entered by the physicians' assistant were correct; yet this kind of inaccuracy was one of the most common reasons that supervisors found an assistant's performance unsatisfactory.
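To convey the flavor of such protocols, the following sketch encodes a fabricated sore-throat algorithm as simple branching logic with the referral safeguard described above; the thresholds and dispositions are illustrative and are not drawn from any published protocol.

```python
# Hypothetical protocol sketch: binary decision points plus the
# built-in safeguard of referral to a supervising physician.
# All criteria and actions are invented for illustration.
def pharyngitis_protocol(patient):
    if patient["age_years"] < 3 or patient["immunocompromised"]:
        return "REFER to supervising physician"      # safeguard branch
    if not patient["sore_throat"]:
        return "Protocol not applicable; reassess complaint"
    if patient["temp_f"] >= 101 and patient["exudate"]:
        return "Obtain throat culture; treat per standing orders"
    return "Symptomatic care; recheck in 48 hours"

print(pharyngitis_protocol(
    {"age_years": 24, "immunocompromised": False,
     "sore_throat": True, "temp_f": 102.1, "exudate": True}))
```

Because the whole decision structure is a short chain of yes-no branches, the same logic fits comfortably on a single printed sheet, which is precisely why automation has offered so little advantage here.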
There are two other ways in which the computer has been used in the setting of clinical algorithms. First, mathematical techniques have been used to analyze signs and symptoms of diseases and thereby to identify those that should most appropriately be referenced in corresponding clinical algorithms [30], [81], [113]. The process for distilling expert knowledge in the form of a clinical algorithm can be an arduous and imperfect one [97]; formal techniques to assist with this task may prove to be very valuable. Second, some researchers in this area also use computers to assist with clinical care audit, comparing actual actions taken by a physicians' assistant with those recommended by the algorithm itself. Sox et al. [104] have described a system in which the assistant's checklist for a patient encounter was sent to a central computer and analyzed for evidence of deviation from the accepted protocol. Computer-generated reports then served as feedback to the physicians' assistant and to the supervising physicians.

B. Example

We have selected for discussion a project that differs from those previously cited in that 1) computer techniques are still being used, and 2) the clinical algorithms are designed for use by primary care physicians themselves. This is the cancer chemotherapy system developed in Alabama by Mesel et al. [70]. The algorithms were developed to allow private practitioners, at a distance from the regional tertiary-care center, to manage the complex chemotherapy for their cancer patients without routinely referring them to the central oncologists.

Mesel et al. have described a "consultant-extender system" that enables the primary physician to treat patients with Hodgkin's Disease under the supervision of a regional specialist. Five oncologists developed a care protocol for the treatment of Hodgkin's Disease, and this algorithm was placed on-line. Once patients had agreed to participate in the study, their private physicians would prepare "encounter forms" at the time of each office visit. These forms would document pertinent interval history, physical findings, and lab data, as well as chemotherapy administered. The form would then be sent to the regional center where it was analyzed by the computer and a customized clinical algorithm was produced to assist the private physician with the management of that patient during the next appointment. Thus the computer program would take into account the ways in which the individual patient's disease might progress or improve and would prepare an appropriate clinical algorithm. This protocol was sent back to the physician in time for it to be available at the next office visit. The private practitioner was encouraged to call the regional specialist directly if the protocol seemed in some way inadequate or additional questions arose. The authors present data suggesting that their system was well-accepted by physicians and patients, and that excellent care was delivered.⁴ Retrospective review of cases that were treated at the referral center itself, but without the use of the protocols, showed a 16-percent rate of variance from the management guidelines specified in the algorithms; there was no such variance when the protocols were followed. Thus algorithms may be effective tools for the administration of complex specialized therapy in circumstances such as those described.⁵

⁴ This is an interesting result in light of Grimm's experience mentioned in footnote 3. One possible explanation is that physicians were more accepting of the algorithmic approach in Mesel's case because it allowed them to perform tasks that they would previously not have been able to undertake.

⁵ More recently the Alabama group has reported similar success implementing a consultant-extender system for adjuvant chemotherapy in breast carcinoma [129].

C. Discussion of the Methodology

Although clinical algorithms are among the most widespread and best accepted of the decision aids described in this article, the simplicity of their logic makes it clear why the technique cannot be effectively applied in most medical domains.
Decision points in the algorithm are generally binary (i.e., a given sign or symptom is either present or absent), and there tend to be many circumstances that can arise for which the user is advised to consult the supervising physician (or specialist). Thus the difficult decision tasks are left to experts, and there is generally no formal algorithm for managing the case from that point on. It is precisely the simplicity of the algorithmic logic, and the safeguard of the supervising expert, which have permitted many algorithms to be represented on one or two sheets of paper and have obviated the need for direct computer use in most of the systems. The contributions of clinical algorithms to the distribution and delivery of health care, to the training of paramedics, and to quality care audit have been impressive and substantial. However, the approach is not suitable for extension to the complex decision tasks to be discussed in the following sections.

III. DATABANK ANALYSIS FOR PROGNOSIS AND THERAPY SELECTION

A. Overview

Automation of medical record keeping and the development of computer-based patient databanks have been major research concerns since the earliest days of medical computing. Most such systems have attempted to avoid direct interaction between the computer and the physician recording the data, with the systems of Weed [123], [124] and Greenes [36] being notable exceptions. Although the earliest systems were designed merely as record-keeping devices, there have been several recent attempts to create programs that could also provide analyses of the information stored in the computer databank. Some early systems [36], [52] had retrieval modules that identified all patient records matching a Boolean combination of descriptors; however, further analysis of these records for decision making purposes was left to the investigator. Weed has not stressed an analytical component in his automated problem-oriented record [124], but others have developed decision aids which use medical record systems fashioned after his [103].

The systems for databank analysis all depend on the development of a complete and accurate medical record system. Once such a system is developed, a number of additional capabilities can be provided: 1) correlations among variables can be calculated, 2) prognostic indicators can be measured, and 3) the response to various therapies can be compared. A physician faced with a complex management decision can look to such a system for assistance in identifying patients in the past who had similar clinical problems and can then see how those patients responded to various therapies. A clinical investigator keeping the records of his study patients on such a system can use the program's statistical capabilities for data analysis. Hence, although these applications are inherently data-intensive, the kinds of "knowledge" generated by specialized retrieval and statistical routines can provide valuable assistance for clinical decision makers.
For example, they help avoid the inherent biases of anecdotal experience, such as occur when an individual practitioner bases decisions primarily on personal encounters with one or two patients having a rare disease or complex of symptoms.

There are many excellent programs in this category, one of which is discussed in some detail in the next section. Several others warrant mention, however. The HELP System at the University of Utah [117], [119], [120] uses a large data file on patients in the Latter-Day Saints Hospital. Clinical experts formulate specialized "HELP sectors" which are collections of logical rules that define the criteria for a particular medical decision. These sectors are developed by an interactive process; the expert proposes important criteria for a given decision and is provided with actual data regarding each criterion (based on relevant patients and controls from the computer databank). The criteria in the sector are thus adjusted by the expert until adequate discrimination is made to justify using the sector's logic as a decision tool.⁶ The sectors are then used for a variety of tasks throughout the hospital.

⁶ This process might be seen as a technique to assist with the formulation of clinical algorithms as discussed in the previous section. Another approach using databank analysis for algorithm development is described in [30].

Another system of interest is that of Feinstein et al. at Yale [21], in which physicians interact with the system to request assistance in estimating prognosis and guiding management for patients with lung cancer. Similarly, Rosati et al. have developed a system at Duke University which uses a large databank on patients who have undergone coronary arteriography [88]. New patients can be matched against those in the databank to help determine patient prognosis under a variety of management alternatives.

B. Example

One of the most successful projects in this category is the ARAMIS system of Fries at Stanford University [24]. The approach was designed originally for use in an outpatient rheumatology clinic, but then broadened to a general clinical database system, the time-oriented databank (TOD) [126], [127], so that it could be transferred to clinics in oncology, metabolic disease, cardiology, endocrinology, and certain pediatric subspecialties. All clinic records are kept in a tabular format in which a column in a large table indicates a specific clinic visit and the rows indicate the relevant clinical parameters that are being followed over time. These charts are maintained by the physicians seeing the patient in clinic, and the new column of data is later transferred to the computer databank by a transcriptionist; in this way time-oriented data on all patients are kept current. The defined database (clinical parameters to be followed) is determined by clinical experts, and in the case of rheumatic diseases has now been standardized on a national scale [41].

The information in the databank can be used to create a prose summary of the patient's current status, and there are graphical capabilities which can plot specific parameters for a patient over time [126]. However, it is in the analysis of stored clinical experience that the system has its greatest potential utility [25]. In addition to performing search and statistical functions such as those developed in databank systems for clinical investigation [50], [65], ARAMIS offers a prognostic analysis for a new patient when a management decision is to be made.
Using the consultative services of the Stanford Immunology Division, an individual practitioner may select clinical indices for his patient that he would like matched against other patients in the databank. It is imperative that such indices be selected wisely and hence with expert advice; the Stanford immunologists have found that the best descriptors for characterizing patients are often different from those that a novice chooses to use. Based on two to five such descriptors, the computer locates relevant prior patients and prepares a report outlining their prognosis with respect to a variety of endpoints (e.g., death, development of renal failure, arthritic status, pleurisy). Therapy recommendations are also generated on the basis of a response index that is calculated for the matched patients. A prose case analysis for the physician's patient can also be generated: this readable document summarizes the relevant data from the databank and explains the basis for the therapeutic recommendation.

The rheumatologic databank generated under ARAMIS has now been expanded to involve a national network of immunologists who are accumulating time-oriented data on their patients. This national project seeks in part to obtain enough data so that groups of retrieved patients will be sizable, thereby controlling for some observer variability and making the system's recommendations more statistically defensible.
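The essence of this prognostic matching can be suggested in a few lines: select prior patients who agree with the new patient on a handful of descriptors and tabulate their outcomes. The databank rows and descriptor names below are fabricated, and ARAMIS's actual matching and response-index calculations are considerably more elaborate.

```python
from collections import Counter

# Fabricated time-oriented databank rows; a real system holds far more.
DATABANK = [
    {"diagnosis": "SLE", "age_band": "20-39", "renal_involvement": True,  "outcome": "renal failure"},
    {"diagnosis": "SLE", "age_band": "20-39", "renal_involvement": True,  "outcome": "stable"},
    {"diagnosis": "SLE", "age_band": "40-59", "renal_involvement": False, "outcome": "stable"},
]

def prognosis_report(new_patient, descriptors, databank):
    """Match prior patients on the chosen descriptors and tally outcomes."""
    matches = [row for row in databank
               if all(row[d] == new_patient[d] for d in descriptors)]
    return len(matches), Counter(row["outcome"] for row in matches)

n, outcomes = prognosis_report(
    {"diagnosis": "SLE", "age_band": "20-39", "renal_involvement": True},
    ["diagnosis", "age_band", "renal_involvement"], DATABANK)
print(f"{n} similar prior patients; outcomes: {dict(outcomes)}")
```

The sketch also makes the national project's motivation plain: with too few rows in the databank, the matched group is too small for its outcome frequencies to mean anything.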
C. Discussion of the Methodology

Databank analysis systems have powerful capabilities to offer to the individual clinical decision maker. Furthermore, medical computing researchers recognize the potential value of large databanks in supporting many of the other decision making approaches discussed in subsequent sections. There are important additional issues regarding databank systems.

1) Data acquisition remains a major problem. Many systems have avoided direct physician-computer interaction but have then been faced with the expense and errors of transcription. The developers of one well-accepted record system still express their desire to implement a direct interface with the physician for these reasons, although they recognize the difficulties encountered in encouraging direct use of a computer system by doctors [107].

2) Analysis of data in the system can be complicated by missing values that frequently occur, outlying values, and poor reproducibility of data across time and among physicians. Conversely, the system can itself be used to identify questionable values of tests or observations.

3) The decision aids provided tend to emphasize patient management rather than diagnosis. Feinstein's system [21] is only useful for patients with lung cancer, for example, and the ARAMIS prognostic routines, which are designed for patient management, assume that the patient's rheumatologic diagnosis is already known.

4) There is no formal correlation between the way expert physicians approach patient management decisions and the way the programs arrive at recommendations. Feinstein and Koss felt that the acceptability of their system would be limited by a purely statistical approach, and they therefore chose to mimic human reasoning processes to a large extent [59], but their approach appears to be an exception.

5) Data storage space requirements can be large since the decision aids of course require a comprehensive medical record system as a basic component.

Slamecka has distinguished between structured and empirical approaches to clinical consulting systems [103], pointing out that databanks provide a largely empirical basis for advice, whereas structured approaches rely on judgmental knowledge elicited from the literature or from experts. It is important to note, however, that judgmental knowledge is itself based on empirical information. Even an expert's "intuitions" are based on observations and "data collection" over years of experience. Thus one might argue that large, complete, and flexible databanks could form the basis for large amounts of judgmental knowledge that we now have to elicit from other sources. Some researchers have indicated a desire to experiment with methods for the automatic generation of medical decision rules from databanks, and one component of the research on Slamecka's MARIS system is apparently pointed in that direction [103]. Indeed, some of the most exciting and practical uses of large databanks may be found precisely at the interface with those knowledge engineering tasks that have most confounded researchers in medical symbolic reasoning [5].

IV. MATHEMATICAL MODELS OF PHYSICAL PROCESSES

A. Overview

Pathophysiologic processes can be well-described by mathematical formulas in a limited number of clinical problem areas. Such domains have lent themselves well to the development of computer-based decision aids since the issues are generally well-defined. The actual techniques used by such programs tend to reflect the details of the individual applications, the most celebrated of which have been in pharmacokinetics (specifically digitalis dosing), acid-base/electrolyte disorders, and respiratory care [69].

It is important that cooperating experts assist with the definition of pertinent variables and the mathematical characterization of the relationships among them. The computer program requests the relevant data, makes the appropriate computations, and provides a clinical analysis or recommendation for therapy. Some of the programs have also involved branched-chain logic to guide decisions about what further data are needed for adequate analysis.⁷

⁷ "Branched-chain" logic refers to mechanisms by which portions of a decision network can be considered or ignored, depending upon the data on a given case. For example, in an acid-base program the anion gap might be calculated and a branch-point could then determine whether the pathway for analyzing an elevated anion gap would be required. If the gap were not elevated, that whole portion of the logic network could be skipped.

Programs to assist with digitalis dosing have gradually introduced broader medical knowledge over the last ten years. The earliest work was Jelliffe's [48] and was based upon his considerable experience studying the pharmacokinetics of the cardiac glycosides. His computer program used mathematical formulations based on parameters such as therapeutic goals (e.g., desired predicted blood levels), body weight, renal function, and route of administration. In one study he showed that computer recommendations reduced the frequency of adverse digitalis reactions from 35 percent to 12 percent [49]. Later, another group revised the Jelliffe model to permit a feedback loop in which the digitalis blood levels obtained with initial doses of the drug were considered in subsequent therapy recommendations [78], [96].
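The style of calculation involved can be suggested with textbook one-compartment pharmacokinetic arithmetic: a loading dose to achieve a target blood level, and a maintenance dose to replace daily elimination. The constants below are round placeholder values for illustration only, not parameters from Jelliffe's published model.

```python
# Illustrative one-compartment pharmacokinetic arithmetic; the numbers
# are textbook-style placeholders, not a clinical dosing model.
def loading_dose_ug(target_ng_per_ml, vd_l_per_kg, weight_kg):
    vd_liters = vd_l_per_kg * weight_kg   # volume of distribution
    return target_ng_per_ml * vd_liters   # ng/ml equals ug/L, so result is in ug

def maintenance_dose_ug(load_ug, fraction_eliminated_per_day):
    # Replace each day what elimination removes, holding the level steady.
    return load_ug * fraction_eliminated_per_day

load = loading_dose_ug(target_ng_per_ml=1.0, vd_l_per_kg=7.0, weight_kg=70.0)
daily = maintenance_dose_ug(load, fraction_eliminated_per_day=0.2)
print(f"loading dose ~{load:.0f} ug; maintenance ~{daily:.0f} ug/day")
```

Renal function enters such models by scaling the fraction eliminated per day, which is one reason the feedback loop of measured blood levels proved a natural extension.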
More recently, a third group in Boston, noting the insensitivity of the first two approaches to the kinds of nonnumerical observations that experts tend to use in modifying digitalis therapy, augmented the pharmacokinetic model with a patient-specific model of clinical status [35]. Running their system in a monitoring mode, in parallel with actual clinical practice on a cardiology service, they found that each patient in the trial in whom toxicity developed had received more digitalis than would have been recommended by their program.

B. Example

Perhaps the best known program in this category is the interactive system developed at Boston's Beth Israel Hospital by Bleich. Originally designed as a program for assessment of acid-base disorders [2], it was later expanded to consider electrolyte abnormalities as well [3], [4]. The knowledge in Bleich's program is a distillation of his own expertise regarding acid-base and electrolyte disorders. The system begins by collecting initial laboratory data from the physician seeking advice on a patient's management. Branched-chain logic is triggered by abnormalities in the initial data so that only the pertinent sections of the extensive decision pathways created by Bleich are explored. The approach is therefore similar to the flowcharting techniques used by the clinical algorithms of Section II, but it involves more complex mathematical relationships than algorithms typically do. Essentially all questions asked by the program are numerical laboratory values or "yes-no" questions (e.g., "Does the patient have pitting edema?"). Depending upon the complexity and severity of the case, the program eventually generates an evaluation note that may vary in length from a few lines to several pages. Included are suggestions regarding possible causes of the observed abnormalities and suggestions for correcting them. Literature references are also provided with the recommendations.
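A toy fragment in this style is sketched below: laboratory values trigger only the pertinent branch of the logic (here, the anion-gap pathway mentioned in footnote 7). The thresholds are rough textbook figures, and the rules are in no sense Bleich's actual decision network.

```python
# Toy branched-chain fragment (approximate textbook thresholds,
# illustrative only; not Bleich's logic network).
def evaluate_acid_base(na, cl, hco3, ph):
    notes = []
    if hco3 < 22 and ph < 7.35:            # branch entered only if abnormal
        notes.append("Metabolic acidosis present.")
        gap = na - (cl + hco3)             # anion gap
        if gap > 16:
            notes.append(f"Elevated anion gap ({gap}): consider lactate, "
                         "ketones, toxins, or uremia.")
        else:
            notes.append(f"Normal anion gap ({gap}): consider diarrhea "
                         "or renal tubular acidosis.")
    else:
        notes.append("No metabolic acidosis; this pathway skipped.")
    return notes

for line in evaluate_acid_base(na=140, cl=100, hco3=12, ph=7.21):
    print(line)
```

As in Bleich's system, the output is a short evaluation note assembled from whichever branches the abnormal data happened to activate.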
Although the program was made available at several East Coast institutions, few physicians accepted it as an ongoing clinical tool. Bleich points out that part of the reason for this was the system's inherent educational impact; physicians simply began to anticipate its analysis after they had used it a few times [3].⁸

⁸ More recently he has been experimenting with the program operating as a monitoring system, thereby avoiding direct interaction with the physician.

The system's lack of sustained acceptance by physicians is probably due to more than its educational impact, however. For example, there is no feedback in the system; every patient is seen as a new case and the program has no concept of following a patient's response to prior therapy. Furthermore, the program generates differential diagnosis lists but does not pursue specific etiologies; this can be particularly bothersome when there are multiple coexistent disturbances in a patient and the program simply suggests parallel lists of etiologies without noticing or pursuing the possible interrelationships. Finally, the system is highly individualized in that it contains only the parameters and relationships that Bleich specifically thought were important to include in the logic network. Of course human consultants also give personalized advice which may differ from that obtained from other experts. However, a group of researchers in Britain [85] who compared Bleich's program to four other acid-base/electrolyte systems found total agreement among the programs in only 20 percent of test cases when these systems were asked to define the acid-base disturbance and the degree of compensation present. Their analysis does not reveal which of the programs reached the correct decision, however, and it may be that the results are more an indictment of the other four programs than a valid criticism of the advice from Bleich's acid-base component.

C. Discussion of the Methodologies

The programs mentioned in this section differ from one another in several respects, and each tends to overlap with other paradigms we have discussed. Bleich's program, for example, is essentially a complicated clinical algorithm interfaced with mathematical formulations of electrolyte and acid-base pathophysiology. As such it suffers from the weaknesses of all algorithmic approaches, most importantly its highly structured and inflexible logic which is unable to contend with circumstances not specifically anticipated in the algorithm. The digitalis dosing programs all draw on mathematical techniques from the field of biomedical modeling [40], but have recently shown more reliance on methods from other areas as well. In particular these have included symbolic reasoning methods that allow clinical expertise to be encoded and used in conjunction with mathematical techniques [35]. The Boston group that developed this most recent digitalis program is interested in similarly developing an acid-base/electrolyte system so that judgmental knowledge of experts can be interfaced with the mathematical models of pathophysiology.⁹

⁹ This project was described by Professor Peter Szolovits, of MIT's clinical decision making group, during a workshop on artificial intelligence in medicine at the University of Tokyo, Tokyo, Japan, in November 1978.

There is also a large research community of mathematicians who attempt to understand and characterize physical processes by devising simulation models [40]. Although such models are largely empirical and have generally not found direct application in clinical medicine, their research role may eventually be broadened to provide practical decision aids through interfaces with the other paradigms described in this review.

The major strength of mathematical models is their ability to capture mathematically sound relationships in a concise and efficient computer program. However, the major limitation, as with most of the paradigms discussed here, is that few areas of medicine are amenable to firm, quantitative description. Because the accuracy of the results depends on correct identification of relevant parameters, the precision and certainty of the relationships among them, and the accuracy of the techniques for measuring them, mathematical models have limited applicability at present. Furthermore, those domains that do lend themselves to mathematical description may still benefit from interactions with symbolic reasoning techniques, as has been demonstrated in the digitalis therapy adviser [35].

V. STATISTICAL PATTERN-RECOGNITION TECHNIQUES

A. Overview

Pattern-recognition techniques define the mathematical relationship between measurable features and classification of objects [15], [51].
In medicine, the presence or absence of each of several signs and symptoms in a patient may be definitive for the classification of the patient as "abnormal" or into the category of a specific disease. The techniques are also used for prognosis [1], or predicting disease duration, time course, and outcomes. These techniques have been applied to a variety of medical domains, such as image processing and signal analysis, in addition to computer-assisted diagnosis.

In order to find the diagnostic pattern, or discriminant function, the method requires a training set of objects, for which the correct classification is already known, as well as reliable values for their measured features. If the form and parameters are not known for the statistical distributions underlying the features, then they must be estimated. Parametric techniques focus on learning the parameters of the probability density functions, while nonparametric (or "distribution-free") techniques make no assumptions about the form of the distributions. After training, then, the pattern can be compared to new, unclassified objects to aid in deciding the category to which the new object belongs.¹⁰

¹⁰ It is possible to detect patterns, even without a known classification for objects in the training set, with so-called "unsupervised" learning techniques. Also, it is possible to work with both numerical and nonnumerical measurements.

There are numerous variations on this general approach, most notably in the mathematical techniques used to extract characteristic measurements (the features) and to find and refine the pattern classifier during training. For example, linear regression analysis is a commonly used technique for finding the coefficients of an equation that defines a recurring pattern or category of diagnostic or prognostic interest. A class of patients can be described by a feature vector $X = [x_1, x_2, \cdots, x_n]$ (where $x_i$ is one of $n$ descriptive variables). The goal is to produce an equation relating the posterior probabilities¹¹ of each diagnostic class to the feature vector through a set of $n$ coefficients $(a_i)$¹²:

$$P(D_i \mid X) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n.$$

Recent work emphasizes structural relationships among sets of features more than statistical ones.

¹¹ The posterior probability of a diagnostic class, represented as $P(D_i \mid X)$, is the probability that a patient falls in diagnostic category $D_i$ given that the feature vector $X$ has been observed.

¹² See [62] for a study in which the coefficients are reported because of their medical import.

Three of the best known training criteria for the discriminant function are:

a) least squared error criterion: choose the function that minimizes the squared differences between predicted and observed measurement values;
b) clustering criterion: choose the function that produces the tightest clusters;
c) Bayes' criterion: choose the function that has the minimum cost associated with incorrect diagnoses.¹³

¹³ This is one of many uses of Bayes' Theorem, a definitional rule that relates posterior and prior probabilities. For an overview of its use as a diagnostic rule (as opposed to a training criterion) and a definition of the formula, see Section VI.

Ten commonly used mathematical models based on these criteria have been shown to produce remarkably similar diagnostic results for the same data [7].
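Criterion a) can be illustrated directly: fit the coefficients that minimize the squared error between the discriminant function's output and the known classifications, then threshold the fitted value for new cases. The training data below are randomly generated placeholders rather than clinical measurements.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fabricated training set: 40 "patients", 3 features, known class labels.
X = rng.normal(size=(40, 3))
y = (X @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=40) > 0).astype(float)

# Least squared error criterion: coefficients a minimizing ||Xa - y||^2.
Xb = np.hstack([X, np.ones((40, 1))])        # append an intercept column
a, *_ = np.linalg.lstsq(Xb, y, rcond=None)

def classify(features):
    score = np.append(features, 1.0) @ a     # discriminant function value
    return 1 if score > 0.5 else 0

print("coefficients:", np.round(a, 3), "-> class:", classify([0.8, -0.2, 0.1]))
```

The fitted coefficients play the role of the $a_i$ in the equation above; reporting them, as in [62], can itself carry medical meaning about the weight each feature deserves.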
B. Example

There are numerous papers on uses of pattern recognition methods in medicine. Armitage [11] discusses three examples of prognostic studies, with an emphasis on regression methods. Goldwyn et al. [31] discuss uses of cluster analysis. One recent diagnostic application by Patrick [73] uses Bayes' criterion to classify patients having chest pains into three categories: $D_1$: acute myocardial infarction (MI); $D_2$: coronary insufficiency; and $D_3$: noncardiac causes of chest pain. The need for early diagnosis of heart attacks without laboratory tests is a prevalent problem, yet physicians are known to misclassify about one third of the patients in categories $D_1$ and $D_2$ and about 80 percent of those in $D_3$. In order to determine the correct classification, each patient in the training set was classified after 3 days, based on laboratory data including electrocardiogram (ECG) and blood data (cardiac enzymes). There remained some uncertainty about several patients with "probable MI." Seventeen variables were selected from many: 9 features with continuous values (including age, heart rates, white blood count, and hemoglobin) and 8 features with discrete values (sex and 7 ECG features). The training data were measurements on 247 patients.

The decision rule was chosen using Bayes' Theorem to compute the posterior probabilities of each diagnostic class given the feature vector $X$ ($X = [x_1, x_2, \cdots, x_{17}]$). Then a decision rule was chosen to minimize the probability of error by adjusting the coefficients on the feature vector $X$ such that for the correct class $D_i$:

$$P(D_i \mid X) = \max\,[P(D_1 \mid X),\; P(D_2 \mid X),\; P(D_3 \mid X)].$$

The class conditional probability density functions must be estimated initially, and the performance of the decision rule depends on the accuracy of the assumed model.

Using the same 247 patients for testing the approach, the trained classifier averaged 80 percent correct diagnoses over the three classes, using only data available at the time of admission. Physicians, using more data than the computer, averaged only 50.5 percent correct over these three categories for the same patients. Training the classifier with a subset of the patients, and using the remainder for testing, produced nearly as good results.

C. Discussion of the Methodology

The number of reported medical applications of pattern recognition techniques is large, but there are also numerous problems associated with the approach. The most obvious difficulties are choosing the set of features in the first place, collecting reliable measurements on a large sample, and verifying the initial classifications among the training data. Current techniques are inadequate for problems in which trends or movement of features are important characteristics of the categories. Also, the problems for which existing techniques are accurate are those that are well characterized by a small number of features ("dimensions of the space").

As with all techniques based on statistics, the size of the sample used to define the categories is an important consideration. As the number of important features and the number of relevant categories increase, the required size of the training set also increases. In one test [7], pattern classifiers trained to discriminate among 20 disease categories from 50 symptoms were correct 51-64 percent of the time. The same methods were used to train classifiers to discriminate between 2 of the diseases, from the same 50 symptoms, and produced correct diagnoses 92-98 percent of the time.
The context in which a local pattern is identified raises problems related to the issue of utilizing medical knowledge. It is difficult to find and use classifiers that are best for a small decision, such as whether an area of an X-ray is inside or outside the heart, and to integrate those into a global classifier, such as one for abnormal heart volume.

Accurate application of a classifier in a hospital setting also requires that the measurements in that clinical environment be consistent with the measurements used to train the classifier initially. For example, if diseases and symptoms are defined differently in the new setting, or if lab test values are reported in different ranges, or different lab tests used, then decisions based on the classification are not reliable.

Pattern recognition techniques are often misapplied in medical domains in which the assumptions are violated. Some of the difficulties noted above are avoided in systems that integrate structural knowledge into the numerical methods and in systems that integrate human and machine capabilities into single interactive systems. These modifications will overcome one of the major difficulties seen in completely automated systems, that of providing the system with good "intuitions" based on an expert's a priori knowledge and experience [51].

VI. BAYESIAN STATISTICAL APPROACHES

A. Overview

More work has been done on Bayesian approaches to computer-based medical decision making than on any of the other paradigms we have discussed. The appeal of Bayes' Theorem¹⁴ is clear: it offers a potentially exact method for computing the probability of a disease based on observations and data regarding the frequency with which these observations are known to occur for specified diseases. In several domains the technique has been shown to be exceedingly accurate, but there are also several limitations to the approach which we discuss below.

In its simplest formulation, Bayes' Theorem can be seen as a mechanism to calculate the probability of a disease, in light of specified evidence, from the a priori probability of the disease and the conditional probabilities relating the observations to the diseases in which they may occur. For example, suppose disease $D_i$ is one of $n$ mutually exclusive diagnoses under consideration and $E$ is the evidence or observations supporting that diagnosis. Then if $P(D_i)$ is the a priori probability of the ith disease:¹⁵

$$P(D_i \mid E) = \frac{P(D_i)\,P(E \mid D_i)}{\sum_{j=1}^{n} P(D_j)\,P(E \mid D_j)}.$$

The theorem can also be represented or derived in a variety of other forms, including an odds/likelihood ratio formulation. We cannot include a full discussion here, but any introductory statistics book or Lusted's volume [64] presents the subject in considerable detail.

¹⁴ Also often referred to as Bayes' rule, discriminant, or criterion.

¹⁵ Here $P(D_i \mid E)$ is the probability of the ith disease given that evidence $E$ has been observed; $P(E \mid D_i)$ is the probability that evidence $E$ will be observed in the setting of the ith disease.
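The formula is easily exercised on invented numbers. Given a priori probabilities and the conditional probabilities of some evidence $E$ under each of three mutually exclusive diseases, the posteriors follow directly (all probabilities below are fabricated for illustration):

```python
# Invented a priori probabilities and likelihoods for three mutually
# exclusive diseases; E is a single piece of observed evidence.
priors      = {"D1": 0.10, "D2": 0.30, "D3": 0.60}   # P(D_i)
likelihoods = {"D1": 0.80, "D2": 0.20, "D3": 0.05}   # P(E | D_i)

def bayes_posteriors(priors, likelihoods):
    joint = {d: priors[d] * likelihoods[d] for d in priors}
    total = sum(joint.values())          # the denominator of Bayes' Theorem
    return {d: joint[d] / total for d in joint}

post = bayes_posteriors(priors, likelihoods)
for disease, p in post.items():
    print(disease, round(p, 3))          # D1 0.471, D2 0.353, D3 0.176
print("most probable:", max(post, key=post.get))
```

Note how evidence that is strongly more likely under D1 overturns a prior that favored D3 six to one; this interplay of priors and conditionals is the whole content of the theorem.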
Among the most commonly recognized problems with the utilization of a Bayesian approach is the large amount of data required to determine all the conditional probabilities needed in the rigorous application of the formula. Chart review or computer-based analysis of large databanks occasionally allows most of the necessary conditional probabilities to be obtained. A variety of additional assumptions must be made. For example: 1) the diseases under consideration are assumed mutually exclusive and exhaustive (i.e., the patient is assumed to have one of the n diseases), 2) the clinical observations are assumed to be conditionally independent over a given disease,¹⁶ and 3) the incidence of the symptoms of a disease is assumed to be stationary (i.e., the model does not allow for changes in disease patterns over time).

¹⁶ The purest form of Bayes' Theorem allows conditional dependencies and the order in which evidence is obtained to be explicitly considered in the analysis. However, the number of required conditional probabilities is so unwieldy that conditional independence of observations and nondependence on the order of observations are generally assumed [108].

One of the earliest Bayesian programs was Warner's system for the diagnosis of congenital heart disease [115]. He compiled data on 83 patients and generated a symptom-disease matrix consisting of 53 symptoms (attributes) and 35 disease entities. The diagnostic performance of the computer, based on the presence or absence of the 53 symptoms in a new patient, was then compared to that of two experienced physicians. The program was shown to reach diagnoses with an accuracy equal to that of the experts. Furthermore, system performance was shown to improve as the statistics in the symptom-disease matrix stabilized with the addition of increasing numbers of patients.

In 1968 Gorry and Barnett pointed out that Warner's program had required making all 53 observations for every patient to be diagnosed, a situation which would not be realistic for many clinical applications. They therefore used a modification of Bayes' Theorem in which observations are considered sequentially.¹⁷ Their computer program analyzed observations one at a time, suggested which test would be most useful if performed next, and included termination criteria so that a diagnosis could be reached, when appropriate, without needing to make all the observations [32]. Decisions regarding tests and termination were made on the basis of calculations of expected costs and benefits at each step in the logical process.¹⁸ Using the same symptom-disease matrix developed by Warner, they were able to attain equivalent diagnostic performance using only 6.9 tests on average.¹⁹ They pointed out that because the costs of medical tests may be significant (in terms of patient discomfort, time expended, and financial expense), the use of inefficient testing sequences should be regarded as ineffective diagnosis. Warner has also more recently included Gorry and Barnett's sequential diagnosis approach in an application regarding structured patient history-taking [118].

¹⁷ A similar approach was devised in Russia at approximately the same time by Vishnevskiy and associates. Their analyses and a summary of the impressive amount of statistical data they have amassed are contained in [111].

¹⁸ See the decision theory discussion in Section VII.

¹⁹ Tests for determining attributes were defined somewhat differently than they had been by Warner. Thus the maximum number of tests was 31 rather than the 53 observations used in the original study.
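The skeleton of this sequential idea can be sketched as follows: after each observation the posteriors are updated, and the next test chosen is the one whose expected result most reduces diagnostic uncertainty. Entropy is used below as a simple stand-in for Gorry and Barnett's fuller cost-benefit calculations, and all probabilities are invented.

```python
import math

# Invented numbers: P(test positive | disease) for two tests, two diseases.
P_POS = {"t1": {"D1": 0.9, "D2": 0.2}, "t2": {"D1": 0.6, "D2": 0.5}}
priors = {"D1": 0.5, "D2": 0.5}

def update(post, test, positive):
    """Bayesian update of the disease posteriors given one test result."""
    like = {d: P_POS[test][d] if positive else 1 - P_POS[test][d] for d in post}
    joint = {d: post[d] * like[d] for d in post}
    z = sum(joint.values())
    return {d: joint[d] / z for d in joint}

def entropy(p):
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def best_next_test(post, tests):
    def expected_entropy(t):
        p_pos = sum(post[d] * P_POS[t][d] for d in post)   # P(test positive)
        return (p_pos * entropy(update(post, t, True))
                + (1 - p_pos) * entropy(update(post, t, False)))
    return min(tests, key=expected_entropy)

t = best_next_test(priors, ["t1", "t2"])
print("most informative next test:", t)        # t1: it discriminates better
print("posterior if positive:", update(priors, t, True))
```

A termination criterion of the kind Gorry and Barnett used amounts to stopping when one posterior is high enough, or when no remaining test is expected to be worth its cost.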
The medical computing literature now includes many examples of Bayesian diagnosis programs, most of which have used the nonsequential approach, in addition to the necessary assumptions of symptom independence and mutual exclusiveness of disease as discussed above. One particularly successful research effort has been chosen for discussion.

B. Example

Since the late 1960's deDombal and associates, at the University of Leeds, England, have been studying the diagnostic process and developing computer-based decision aids using Bayesian probability theory. Their area of investigation has been gastrointestinal diseases, originally acute abdominal pain [12] with more recent analyses of dyspepsia [44] and gastric carcinoma [134].

Their program for assessment of acute abdominal pain was evaluated in the emergency room of their affiliated hospital [12]. Emergency physicians filled out data sheets summarizing clinical and laboratory findings on 304 patients presenting with abdominal pain of acute onset. The data from these sheets became the attributes that were subjected to Bayesian analysis; the required conditional probabilities had been previously compiled from a large group of patients with one of seven possible diagnoses.²⁰ Thus the Bayesian formulation assumed each patient had one of these diseases and would select the most likely on the basis of recorded observations. Diagnostic suggestions were obtained in batch mode and did not require direct interaction between physician and computer; the program could generate results within 0.5 to 15 min depending upon the level of system use at the time of analysis [43]. Thus the computer output could have been made available to the emergency room physician, on average, within 5 min after the data form was completed and handed to the technician assisting with the study.

²⁰ Appendicitis, diverticulitis, perforated ulcer, cholecystitis, small bowel obstruction, pancreatitis, and nonspecific abdominal pain.

During the study [12], however, these computer-generated diagnoses were simply saved and later compared to (a) the diagnoses reached by the attending clinicians, and (b) the ultimate diagnosis verified at surgery or through appropriate tests. Although the clinicians reached the correct diagnosis in only 65-80 percent of the 304 cases (with accuracy depending upon an individual's training and experience), the program was correct in 91.8 percent of cases. Furthermore, in 6 of the 7 disease categories the computer proved more likely than the senior clinician in charge of a case to assign the patient to the correct disease category. Of particular interest was the program's accuracy regarding appendicitis, a diagnosis which is often made incorrectly. In no cases of appendicitis did the computer fail to make the correct diagnosis, and in only six cases were patients with nonspecific abdominal pain incorrectly classified as having appendicitis. Based on the actual clinical decisions, however, over 20 patients with nonspecific abdominal pain were unnecessarily taken to surgery for appendicitis, and in six cases patients with appendicitis were "watched" for over eight hours before they were finally taken to the operating room.

These investigators also performed a fascinating experiment in which they compared the program's performance based on data derived from 600 real patients with the accuracy the system achieved using "estimates" of conditional probabilities obtained from experts [60].²¹ As discussed above, the program
As discussed above, the program was significantly more effective than the unaided clinician when real-life data were used. However, it performed significantly less well than clinicians when expert estimates were used. The results supported what several other observers have found, namely that physicians often have very little idea of the "true" probabilities for symptom-disease relationships.

Another Leeds study of note was an analysis of the effect of the system on the performance of clinicians [13]. The trial we have mentioned that involved 304 patients was eventually extended to 552 before termination. Although the computer's accuracy remained in the range of 91 percent throughout this period, the performance of clinicians was noted to improve markedly over time. Fewer negative laparotomies were performed, for example, and the number of acute appendices that perforated (ruptured) also declined. However, these data slowly returned towards baseline after the study was terminated, suggesting that the constant awareness of computer monitoring and feedback regarding system performance had temporarily generated a heightened awareness of intellectual processes among the hospital surgeons.

C. Discussion of the Methodology

The ideal matching of the problem of acute abdominal pain and Bayesian analysis must be emphasized; the technique cannot necessarily be as effectively applied in other medical domains where the following limitations of the Bayesian approach may have a greater impact.

1) The assumption of conditional independence of symptoms usually does not apply and can lead to substantial errors in certain settings [72]. This has led some investigators to seek new numerical techniques that avoid the independence assumption [8]. If a pure Bayesian formulation is used without making the independence assumption, however, the number of required conditional probabilities becomes prohibitive for complex real world problems [108].

2) The assumption of mutual exclusiveness and exhaustiveness of disease categories is usually false. In actual practice concurrent and overlapping disease categories are common. In deDombal's system, for example, many of the abdominal pain diagnoses missed were outside the seven "recognized" possibilities; if a program starts with an assumption that it need only consider a small number of defined likely diagnoses, it will inevitably miss the rare or unexpected cases (precisely the ones with which the clinician is most apt to need assistance).

3) In many domains it may be inaccurate to assume that relevant conditional probabilities are stable over time (e.g., the likelihood that a particular bacterium will be sensitive to a specific antibiotic). Furthermore, diagnostic categories and definitions are constantly changing, as are physicians' observational techniques, thereby invalidating data previously accumulated (footnote 22). A similar problem results from variations in a priori probabilities depending upon the population from which a patient is drawn (footnote 23). Some observers feel that these are major limitations to the use of Bayesian techniques [16].

Footnote 22: Although gradual changes in definitions or observational techniques may be statistically detectable by database analysis, a Bayesian analysis that uses such data is inevitably prone to error.
Footnote 23: deDombal has examined such geographic and population-based variations in probabilities and has reported early results of his analysis [14].

In general, then, a purely Bayesian approach can so constrain problem formulation as to make a particular application unrealistic and hence unworkable. Furthermore, even when diagnostic performance is excellent, such as in deDombal's approach to abdominal pain evaluation, clinical implementation and system acceptance will generally be difficult. Forms of representation that allow explanation of system performance in familiar terms (i.e., a more congenial interface with physician users) will heighten clinical acceptance; it is at this level that Bayesian statistics and symbolic reasoning techniques may most beneficially interact.

VII. DECISION THEORETICAL APPROACHES

A. Overview

Bayes' Theorem is only one of several techniques used in the larger field of decision analysis, and there has recently been increasing interest in the ways in which decision theory might be applied to medicine and adapted for automation. Several excellent reviews of the field are available in basic reviews [45], textbooks [84], and medically oriented journal articles [67], [94], [109]. In general terms, decision analysis can be seen as any attempt to consider values associated with choices, as well as probabilities, in order to analyze the processes by which decisions are made or should be made. Schwartz identifies the calculation of "expected value" as central to formal decision analysis [94]. Ginsberg contrasts medical classification problems (e.g., diagnosis) with broader decision problems (e.g., "What should I do for this patient?"), and asserts that most important medical decisions fall in the latter category and are best approached through decision analysis [29]. The following topics are among the central issues in the field.

1) Decision Trees: The decision making process can be seen as a sequence of steps in which the clinician selects a path through a network of plausible events and actions. Nodes in this tree-shaped network are of two kinds: decision nodes, where the clinician must choose from a set of actions, and chance nodes, where the outcome is not directly controlled by the clinician but is a probabilistic response of the patient to some action taken. For example, a physician may choose to perform a certain test (decision node) but the occurrence or nonoccurrence of complications may be largely a matter of statistical likelihood (chance node). By analyzing a difficult decision process before taking any actions, it may be possible to delineate in advance all pertinent chance and decision nodes, all plausible outcomes, plus the paths by which these outcomes might be reached. Furthermore, data may exist to allow specific probabilities to be associated with each chance node in the tree.

2) Expected Values: In actual practice physicians make sequential decisions based on more than the probabilities associated with the chance node that follows.
For example, the best possible outcome is not necessarily sought if the costs associated with that "path" far outweigh those along alternate pathways (e.g., a definitive diagnosis may not be sought if the required testing procedure is expensive or painful and patient management will be unaffected; similarly, some patients prefer to "live with" an inguinal hernia rather than undergo a surgical repair procedure). Thus anticipated "costs" (financial, complications, discomfort, patient preference) can be associated with the decision nodes. Using the probabilities at chance nodes, the costs at decision nodes, and the "value" of the various outcomes, an "expected value" for each pathway through the tree (and in turn each node) can be calculated. The ideal pathway, then, is the one which maximizes the expected value; a short computational sketch of this fold-back calculation appears at the end of this overview.

3) Eliciting Values: Obtaining from physicians and patients the costs and values they associate with various tests and outcomes can be a formidable problem, particularly since formal analysis requires expressing the various costs in standardized units. One approach has been simply to ask for value ratings on a hypothetical scale, but it can be difficult to get the physician or patient to keep the values (footnote 24) separate from their knowledge of the probabilities linked to the associated chance nodes. An alternate approach has been the development of lottery games. Inferences regarding values can be made by identifying the odds, in a hypothetical lottery, at which the physician or patient is indifferent between taking a course of action with a certain outcome and betting on a course with a preferable outcome but with a finite chance of significant negative costs if the "bet" is lost. In certain settings this approach may be accepted and provide important guidelines in decision making [77].

Footnote 24: Also termed "utilities" in some references; hence, the term "utility theory" [84].

4) Test Evaluation: Since the tests which lie at decision nodes are central to clinical decision analysis, it is crucial to know the predictive value of tests that are available. This leads to consideration of test sensitivity, specificity, receiver operator characteristic curves, and sensitivity analysis. Such issues are discussed by Komaroff in this issue [57] and have also been summarized elsewhere in the clinical literature [68].

Many of the major studies of clinical decision analysis have not specifically involved computer implementations. Schwartz et al. examined the workup of renal vascular hypertension, developing arguments to show that for certain kinds of cases a purely qualitative theoretical approach was feasible and useful [94]. However, they showed that for more complex clinically challenging cases the decisions could not be adequately sorted out without the introduction of numerical techniques. Since it was impractical to assume that clinicians would ever take the time to carry out a detailed quantitative decision analysis by hand, they pointed out the logical role for the computer in assisting with such tasks and accordingly developed the system we discuss as an example below [33]. Other colleagues of Schwartz at Tufts have been similarly active in applying decision theory to clinical problems. Pauker and Kassirer have examined applications of formal cost-benefit analysis to therapy selection [74] and Pauker has also looked at possible applications of the theory to the management of patients with coronary artery disease [76]. An entire issue of the New England Journal of Medicine has also been devoted to papers on this methodology [46].
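The fold-back calculation referred to under 2) above can be sketched in a few lines of Python. The tree below (treat immediately versus test first, where the test itself carries a small risk of a serious complication) is hypothetical, with outcome utilities on an arbitrary 0-to-1 scale; it is an illustration of the technique, not a model from any of the studies cited.

    from dataclasses import dataclass

    @dataclass
    class Decision:
        options: dict    # action name -> subtree (or numeric outcome utility)

    @dataclass
    class Chance:
        branches: list   # list of (probability, subtree or numeric utility)

    def expected_value(node):
        """Fold the tree back: leaves are outcome utilities, chance nodes
        average over their branches, decision nodes take the best action."""
        if isinstance(node, (int, float)):
            return node
        if isinstance(node, Chance):
            return sum(p * expected_value(sub) for p, sub in node.branches)
        return max(expected_value(sub) for sub in node.options.values())

    # Treat immediately, or test first when the test itself can cause a
    # serious complication? (All probabilities and utilities hypothetical.)
    tree = Decision({
        "treat_now": Chance([(0.70, 1.0), (0.30, 0.2)]),
        "test_then_treat": Chance([
            (0.95, Chance([(0.85, 1.0), (0.15, 0.2)])),  # test tolerated
            (0.05, 0.0),                                 # test complication
        ]),
    })

    for action, subtree in tree.options.items():
        print(action, "->", round(expected_value(subtree), 3))

With these particular numbers the testing branch has the higher expected value; changing the complication rate or the outcome utilities can reverse the preference, which is precisely the sensitivity analysis mentioned under 4).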
B. Example

Computer implementations of clinical decision analysis have appeared with increasing frequency since the mid-1960's. Perhaps the earliest major work was that of Ginsberg at the Rand Corporation [28], with more recent systems reported by Pliskin and Beck [80] and Safran et al. [91]. We will briefly describe here the program of Gorry et al., developed for the management of acute renal failure [33].

Drawing upon Gorry's experience with the sequential Bayesian approach previously mentioned [32], the investigators recognized the need to incorporate some way of balancing the dangers and discomforts of a procedure against the value of the information to be gained. They divided their program into two parts: phase I considered only tests with minimal risk (e.g., history, examination, blood tests) and phase II considered procedures involving more risk and inconvenience. The phase I program considered 14 of the most common causes of renal failure and used a sequential test selection process based on Bayes' Theorem, omitting more advanced decision theoretical techniques [32]. The conditional probabilities used were subjective estimates obtained from an expert nephrologist and were therefore potentially as problematic as those discussed by Leaper et al. [60] (see Section VI-B). The researchers found that they had no choice but to use expert estimates, however, since detailed quantitative data were not available either in databanks or the literature.

It is in the phase II program that the methods of decision theory were employed, because it was in this portion of the decision process that the risks of procedures became important considerations. At each step in the decision process this program considers whether it is best to treat the patient immediately or to first carry out an additional diagnostic test. To make this decision the program identifies the treatment with the highest current expected value (in the absence of further testing), and compares this with the expected values of treatments that could be instituted if another diagnostic test were performed. Comparisons of the expected values are made in light of the risk of the test in order to determine whether the overall expected value of the test is greater than that of immediate treatment; a schematic numerical sketch of this comparison follows this subsection. The relevant values and probabilities of outcomes of treatment were obtained as subjective estimates from nephrologists in the same way that symptom-disease data had been obtained. All estimates were gradually refined as the investigators gained experience using the program, however.

The program was evaluated on 18 test cases in which the true diagnosis was uncertain but two expert nephrologists were willing to make management decisions. In 14 of the cases the program selected the same therapeutic plan or diagnostic test as was chosen by the experts. For three of the four remaining cases the program's decision was the physicians' second choice and was, they felt, a reasonable alternative plan of action. In the last case the physicians also accepted the program's decision as reasonable although it was not among their first two choices.
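The following Python sketch renders the phase II question schematically: is the expected value of testing first, net of the test's own risk, greater than that of treating now? The diseases, treatments, test characteristics, and utilities are hypothetical assumptions for illustration and are not taken from Gorry's actual model.

    # Hypothetical diseases, treatments, test characteristics, and
    # utilities (0-1 scale); the structure, not the numbers, is the point.
    PRIOR = {"obstruction": 0.4, "glomerulonephritis": 0.6}
    UTILITY = {("steroids", "glomerulonephritis"): 0.9,
               ("steroids", "obstruction"): 0.2,
               ("surgery", "glomerulonephritis"): 0.1,
               ("surgery", "obstruction"): 0.8}
    SENS, SPEC = 0.9, 0.8   # the imperfect test detects "obstruction"
    TEST_RISK = 0.03        # expected utility lost to the test itself

    def best_treatment_value(p):
        """Expected value of the best immediate treatment given P(disease)."""
        return max(sum(p[d] * UTILITY[(t, d)] for d in p)
                   for t in ("steroids", "surgery"))

    def value_of_testing(p):
        """Average the best post-test treatment values over the two
        possible test results, then charge the test's own risk."""
        def posterior(positive):
            like = {"obstruction": SENS if positive else 1 - SENS,
                    "glomerulonephritis": (1 - SPEC) if positive else SPEC}
            z = sum(p[d] * like[d] for d in p)
            return {d: p[d] * like[d] / z for d in p}
        p_pos = p["obstruction"] * SENS + p["glomerulonephritis"] * (1 - SPEC)
        return (p_pos * best_treatment_value(posterior(True))
                + (1 - p_pos) * best_treatment_value(posterior(False))
                - TEST_RISK)

    print("treat now: ", round(best_treatment_value(PRIOR), 3))
    print("test first:", round(value_of_testing(PRIOR), 3))

Here testing wins because the test result can change the treatment chosen; if no result could alter management, its expected value would fall below that of immediate treatment, the situation described earlier in which a definitive diagnosis is not worth pursuing.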
C. Discussion of the Methodology

The excellent performance of Gorry's program, despite its reliance on subjective estimates from experts, may serve to emphasize the importance of the clinical analysis that underlies the decision theoretical approach. The reasoning steps in managing clinical cases have been dissected in such detail that small errors in the probability estimates are apparently much less important than they were for deDombal's purely Bayesian approach [60]. Gorry suggests this may be simply because the decisions made by the program are based on the combination of large aggregates of such numbers, but this argument should apply equally to a Bayesian system. It seems to us more likely that distillation of the clinical domain into a formal decision tree gives the program so much more knowledge of the clinical problem that the quantitative details become somewhat less critical to overall system operation. The explicit decision network is a powerful knowledge structure; the "knowledge" in deDombal's system lies in conditional probabilities alone, and there is no larger scheme to override the propagation of error as these probabilities are mathematically manipulated by the Bayesian routines.

The decision theory approach is not without problems, however. Perhaps the most difficult problem is assigning numerical values (e.g., dollars) to a human life or a day of health. Some critics feel this is a major limitation of the methodology [120]. Overlapping or coincidental diseases are also not well managed unless specifically included in the analysis, and the Bayesian foundation for many of the calculations still assumes mutually exclusive and exhaustive disease categories. Problems of symptom conditional dependence still remain, and there is no easy way to include knowledge regarding the time course of diseases. Gorry points out that his program was also incapable of recognizing circumstances in which two or more actions should be carried out concurrently. Furthermore, decision theory per se does not provide the kind of focusing mechanisms that clinicians tend to use when they assume an initial diagnostic hypothesis in dealing with a patient and discard it only if subsequent data make that hypothesis no longer tenable. Other similar strategies of clinical reasoning are becoming increasingly well recognized [53] and account in large part for the applications of symbolic reasoning techniques to be discussed in the next section.

VIII. SYMBOLIC REASONING APPROACHES

A. Overview

In the early 1970's researchers at several institutions simultaneously began to investigate potential clinical applications of symbolic reasoning techniques drawn from the branch of computer science known as artificial intelligence (AI). The field is well reviewed in a recent book by Winston [128]. The term "artificial intelligence" is generally accepted to include those computer applications that involve symbolic inference rather than strictly numerical calculations. Examples include programs that reason about mineral exploration, organic chemistry, or molecular biology; programs that converse in English and understand spoken sentences; and programs that generate theories from observations. Such programs gain their power from qualitative, experiential judgments, codified in so-called "rules-of-thumb" or "heuristics," in contrast to numerical calculation programs whose power derives from the analytical equations used.
The heuristics focus the attention of the reasoning program on parts of the problem that seem most critical and parts of the knowledge base that seem most relevant. They also guide the application of the domain knowledge to an individual case by deleting items from consideration as well as by focusing on items. The result is that these programs pursue a line of reasoning as opposed to following a sequence of steps in a calculation.

Among the earliest symbolic inference programs in medicine was the diagnostic interviewing system of Kleinmuntz [54]. Other early work included Wortman's information processing system, the performance of which was largely motivated by a desire to understand and simulate the psychological processes of neurologists reaching diagnoses [130]. It was a landmark paper by Gorry in 1973, however, that first critically analyzed conventional approaches to computer-based clinical decision making and outlined his motivation for turning to newer symbolic techniques [34]. He used the acute renal failure program discussed in Section VII-B [33] as an example of the problems arising when decision analysis is used alone. In particular, he analyzed some of the cases on which the program had failed but the physicians considering the cases had performed well. His conclusions from these observations include the following four points.

1) Clinical judgment is based less on detailed knowledge of pathophysiology than it is on gross chunks of knowledge and a good deal of detailed experience from which rules of thumb are derived.

2) Clinicians know facts, of course, but their knowledge is also largely judgmental. The rules they learn allow them to focus attention and generate hypotheses quickly. Such heuristics permit them to avoid detailed search through the entire problem space.

3) Clinicians recognize levels of belief or certainty associated with many of the rules they use, but they do not routinely quantitate or use these certainty concepts in any formal statistical manner.

4) It is easier for experts to state their rules in response to perceived misconceptions in others than it is for them to generate such decision criteria a priori.

In the renal failure program medical knowledge had been embedded in the structure of the decision tree. This knowledge was never explicit, and additions to the experts' judgmental rules had generally required changes to the tree itself. Based on observations such as those above, Gorry identified at least three important problems for investigation.

1) Medical Concepts: Clinical decision aids had traditionally had no true "understanding" of medicine. Although explicit decision trees had given the decision theory programs a greater sense of the pertinent associations, medical knowledge and the heuristics for problem solving in the field had never been explicitly represented nor used. So-called "common sense" was often clearly lacking when the programs failed, and this was often what most alienated potential physician users.

2) Conversational Capabilities: Both for capturing knowledge from collaborating experts, and for communicating with physician users, Gorry argued that further research on the development of computer-based linguistic capabilities was crucial.

3) Explanation: Diagnostic programs had seldom emphasized an ability to explain the basis for their decisions in terms understandable to the physician.
System acceptability was therefore inevitably limited; the physician would often have no basis for deciding whether to accept the program's advice, and might therefore resent what could be perceived as an attempt to dictate the practice of medicine. Gorry's group at MIT and Tufts developed new approaches to the renal failure problem in light of these observations [75].

Due to the limitations of the older techniques, it was perhaps inevitable that some medical researchers would turn to the AI field for new techniques. Major research areas in AI include knowledge representation, heuristic search, natural language understanding and generation, and models of thought processes, all topics clearly pertinent to the problems we have been discussing. Furthermore, AI researchers were beginning to look for applications to which they could apply some of the techniques they had developed in theoretical domains. This community of researchers has grown in recent years, and a recent issue of Artificial Intelligence was devoted entirely to applications of AI to biology, medicine, and chemistry [105] (footnote 25).

Footnote 25: Many of the systems which use AI techniques for medical decision making were developed on the SUMEX-AIM computing resource, a nationally shared system devoted entirely to applications of AI to the biomedical sciences. The SUMEX-AIM computer is physically located at Stanford University but is used by researchers nationwide via connections to computer networks. The resource is funded by the Division of Research Resources, Biotechnology Branch, National Institutes of Health.

Among the programs using symbolic reasoning techniques are several systems that have been particularly novel and successful. At the University of Pittsburgh, Pople and Myers have developed a system called INTERNIST that assists with test selection for the diagnosis of all diseases in internal medicine [81]. This awesome task has been remarkably successful to date, with the program correctly diagnosing a large percentage of complex cases selected from clinical pathologic conferences in the major medical journals (footnote 26). The program uses a hierarchic disease categorization, an ad hoc scoring system for quantifying symptom-disease relationships, plus some clever heuristics for focusing attention, discriminating between competing hypotheses, and diagnosing concurrent diseases [82]. The system currently has a limited human interface, however, and is not yet implemented for clinical trials.

Footnote 26: Data communicated by Drs. Pople and Myers at the Fourth Annual A.I.M. Workshop, Rutgers University, June 1978.

Weiss, Kulikowski, and Amarel (Rutgers University) and Safir (Mt. Sinai Hospital, New York City) have developed a model of reasoning regarding disease processes in the eye, specifically glaucoma [125]. In this specialized application area it has been possible to map relationships between observations, pathophysiologic states, and disease categories. The resulting causal associational network (termed CASNET) forms the basis for a reasoning program that gives advice regarding disease states in glaucoma patients and generates management recommendations. The system is undergoing evaluation by a nationwide network of ophthalmologists but is not yet offered for routine clinical use.

For the AI researchers the question of how best to manage uncertainty in medical reasoning remains a central issue. The programs mentioned have developed ad hoc weighting systems and avoided formal statistical approaches. Others have turned to the work of statisticians and philosophers of science who have devised theories of approximate or inexact reasoning. For example, Wechsler [122] describes a program that is based upon Zadeh's fuzzy set theory [133], and Shortliffe and Buchanan [101] have turned to confirmation theory for their model of inexact reasoning.

B. Example

The symbolic reasoning program selected for discussion is the MYCIN system at Stanford University [102].
The researchers cited a variety of design considerations which motivated the selection of AI techniques for the consultation system they were developing [99]. They primarily wanted it to be useful to physicians and therefore emphasized the selection of a problem domain in which physicians had been shown to err frequently, namely the selection of antibiotics for patients with infections. They also cited human issues that they felt were crucial to make the system acceptable to physicians:

1) it should be able to explain its decisions in terms of a line of reasoning that a physician can understand;
2) it should be able to justify its performance by responding to questions expressed in simple English;
3) it should be able to "learn" new information rapidly by interacting directly with experts;
4) its knowledge should be easily modifiable so that perceived errors can be corrected rapidly before they recur in another case; and
5) the interaction should be engineered with the user in mind (in terms of prompts, answers, and information volunteered by the system as well as by the users).

All these design goals were based on the observation that previous computer decision aids had generally been poorly accepted by physicians, even when they were shown to perform well on the tasks for which they were designed. MYCIN's developers felt that barriers to acceptance were largely conceptual and could be counteracted in large part if a system were perceived as a clinical tool rather than a dogmatic replacement for the primary physician's own reasoning.

Knowledge of infectious diseases is represented in MYCIN as production rules, each containing a "packet" of knowledge obtained from collaborating experts [102] (footnote 27). A production rule is simply a conditional statement which relates observations to associated inferences that may be drawn. For example, a MYCIN rule might state that "if a bacterium is a gram-positive coccus growing in chains, then it is apt to be a streptococcus." MYCIN's power is derived from such rules in a variety of ways:

1) it is the program that determines which rules to use and how they should be chained together to make decisions about a specific case (footnote 28);
2) the rules can be stored in a machine-readable format but translated into English for display to physicians;
3) by removing, altering, or adding rules, the system's knowledge structures can be rapidly modified without explicitly restructuring the entire knowledge base; and
4) the rules themselves can often form a coherent explanation of system reasoning if the relevant ones are translated into English and displayed in response to a user's question.

Footnote 27: Production rules are a technique frequently employed in AI research [9] and effectively applied to other scientific problem domains [6].
Footnote 28: The control structure used is termed "goal-oriented" and is similar to the consequent-theorems used in Hewitt's PLANNER [42].

Associated with all rules and inferences are numerical weights reflecting the degree of certainty associated with them. These numbers, termed certainty factors, form the basis for the system's inexact reasoning [101]. They allow the judgmental knowledge of experts to be captured in rule form and then used in a consistent fashion. A small illustrative sketch of this rule-and-certainty-factor scheme appears below.
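The following sketch is our illustrative reconstruction in Python, not code from MYCIN itself: goal-directed chaining over two hypothetical production rules, with positive certainty factors for the same conclusion combined as cf1 + cf2(1 - cf1), roughly following the model of inexact reasoning described in [101]. The rules, findings, and numbers are invented for the example.

    RULES = [
        # (premises, conclusion, certainty factor of the rule itself)
        (("gram_positive", "coccus", "chains"), "streptococcus", 0.7),
        (("sore_throat", "streptococcus"), "strep_pharyngitis", 0.6),
    ]

    # Certainty attached to the raw findings for one hypothetical case.
    FINDINGS = {"gram_positive": 1.0, "coccus": 1.0,
                "chains": 0.8, "sore_throat": 1.0}

    def cf(goal):
        """Backward-chain from a goal: its certainty comes either from a
        recorded finding or from rules concluding it. A rule contributes
        rule_cf * min(premise certainties); successive positive
        contributions x and y combine as x + y*(1 - x)."""
        if goal in FINDINGS:
            return FINDINGS[goal]
        combined = 0.0
        for premises, conclusion, rule_cf in RULES:
            if conclusion == goal:
                support = min(cf(p) for p in premises)
                if support > 0.2:   # MYCIN-like cutoff on weak premises
                    combined += rule_cf * support * (1.0 - combined)
        return combined

    print("CF(strep_pharyngitis) =", round(cf("strep_pharyngitis"), 3))

Because the rules are data rather than program structure, adding, removing, or editing an entry in RULES changes the system's behavior without rewriting the interpreter, which is the modifiability property claimed in point 3) above; translating a rule tuple into an English sentence likewise supports points 2) and 4).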
The MYCIN system has been evaluated regarding its performance at therapy selection for patients with either septicemia [132] or meningitis [131]. The program performs comparably with experts in these two task domains, but as yet it has no rules regarding the other infectious disease problem areas. Further knowledge base development will therefore be required before MYCIN is made available for clinical use; hence, questions regarding its acceptability to physicians cannot yet be assessed. However, the required implementation stages have been delineated [100], attention has been paid to all the design criteria mentioned above, and the program does have a powerful explanation capability [95].

C. Discussion of the Methodology

Whereas the computations used by the other paradigms mostly involve straightforward application of well-developed computing techniques, artificial intelligence methods are largely experimental; new approaches to knowledge representation, language understanding, heuristic search, and the other symbolic reasoning problems we have mentioned are still needed. Thus the AI programs tend to be developed in research environments where short-term practical results are unlikely to be found. However, out of this research are emerging techniques for coping with many of the problems encountered by the other paradigms we have discussed. AI researchers have developed promising methods for handling concurrent diseases [82], [125], assessing the time course of disease [18], and acquiring adequate structured knowledge from experts [11]. Furthermore, inexact reasoning techniques have been developed and implemented [101] (although they tend to be justified largely on intuitive grounds). In addition, the techniques of artificial intelligence provide a way to respond to many of Gorry's observations regarding the three major inadequacies of prior paradigms as described in Section VIII-A: 1) the medical AI programs all tend to stress the representation of medical knowledge and a sense of understanding the underlying concepts; 2) many of them have conversational capabilities which draw on language processing research; and 3) explanation capabilities have been a primary focus of systems such as MYCIN.

Szolovits and Pauker have recently reviewed some applications of AI to medicine and have attempted to weigh the successes of this young field against the very real problems that lie ahead [108]. They identify several deficiencies of current systems. For example, termination criteria are still poorly understood. Although INTERNIST can diagnose simultaneous diseases, it also pursues all abnormal findings to completion, even though a clinician often ignores minor unexplained abnormalities if the rest of a patient's clinical status is well understood. In addition, although some of these programs now cleverly mimic the reasoning styles observed in experts [17], [53], it is less clear how to keep the systems from abandoning one hypothesis and turning to another as soon as new information suggests another possibility.
Programs that operate this way appear to digress from one topic to another, a characteristic that decidedly alienates a user regardless of the validity of the final diagnosis or advice.

Still largely untapped is the power of an AI program to understand its own knowledge base, i.e., the structure and content of the reasoning mechanisms as well as of the medical facts. In effect, AI programs have the ability to "know what they know," the best working example of which can be found in the prototype system named Teiresias [10]. Because such programs can reason about their own knowledge, they have the power to encode knowledge about strategies, e.g., when to use and when to ignore specific items of medical knowledge and which leads to follow up on. Such "meta-level" knowledge offers a new dimension to the design of "intelligent assistant" programs which we predict will be exploited in medical decision making systems of the future.

IX. CONCLUSIONS

This review has shown that there are two recurring questions regarding computer-based clinical decision making:

1) Performance: how can we design systems that reach better, more reliable decisions in a broad range of applications, and
2) Acceptability: how can we more effectively encourage the use of such systems by physicians or other intended users?

We shall summarize these points separately by reviewing many of the issues common to all the paradigms discussed in this paper.

A. Performance Issues

Central to assuring a program's adequate performance is a matching of the most appropriate technique with the problem domain. We have seen that the structured logic of clinical algorithms can be effectively applied to triage functions and other primary care problems, but they would be less naturally matched with complex tasks such as the diagnosis and management of acute renal failure. Good statistical data may support an effective Bayesian program in settings where diagnostic categories are small in number, nonoverlapping, and well-defined, but the inability to use qualitative medical knowledge limits the effectiveness of the Bayesian approach in more difficult patient management or diagnostic environments. Similarly, mathematical models may support decision making in certain well-described fields in which observations are typically quantified and related by functional expressions, but in which the knowledge is typically limited to numerical encoding. These examples, and others, demonstrate the need for thoughtful consideration of the technique most appropriate for managing a clinical problem. In general the simplest effective approach is to be preferred (footnote 29), but acceptability issues must also be considered as discussed below.

Footnote 29: It is also always appropriate to ask whether computer-based approaches are needed at all for a given decision making task. For all but the most complex clinical algorithms, for example, the developers have tended to discard computer programs. Similarly, Schwartz et al. pointed out that decision analyses can often be successfully accomplished in a qualitative manner using paper and pencil [94].

As researchers have ventured into more complex clinical domains, a number of difficult problems have tended to degrade the quality of performance of computer-based decision aids. Significant clinical problems require large knowledge bases that contain complex interrelationships including time and functional dependencies. The knowledge of such domains is inevitably open-ended and incomplete, so the knowledge base must be easily extensible. Not only does this require a flexible representation of knowledge, but it encourages the development of novel techniques for the acquisition and integration of new facts and judgments. Similarly, the inexactness of medical inference must somehow be represented and manipulated within effective consultation systems.
As we have discussed, all these performance issues are important knowledge engineering research problems for which artificial intelligence already offers promising new methods.

It is also important to consider the extent to which a program's "understanding" of its task domain will heighten its performance, particularly in settings where knowledge of the field tends to be highly judgmental and poorly quantified. We use the term "understanding" here to refer to a program's ability to reason about, as well as reason with, its medical knowledge base. This implies a substantial amount of judgmental or structural knowledge (in addition to data) contained within the program. Analyses of human clinical decision making [17], [53] suggest that as decisions move from simple to complex, a physician's reasoning style becomes less algorithmic and more heuristic, with qualitative judgmental knowledge and the conditions for invoking it coming increasingly into play. Furthermore, the performance of complex decision aids will also be heightened by the representation and utilization of high-level "meta-knowledge" that permits programs to understand their own limitations and reasoning strategies. In order to design medical computing programs with these capabilities, the designers themselves will have to become cognizant of "knowledge engineering" issues. It is especially important that they find effective ways to match the knowledge structures they use to the complexity of the tasks their programs are designed to undertake.

B. Acceptability Issues

A recurring observation as one reviews the literature of computer-based medical decision making is that essentially none of the systems has been effectively used outside of a research environment, even when its performance has been shown to be excellent! This suggests that it is an error to concentrate research primarily on methods for improving the computer's decision making performance when clinical impact depends on solving other problems of acceptance as well. There are some data [106] to support the extreme view that the biases of medical personnel against computers are so strong that systems will inevitably be rejected, regardless of performance. However, we are beginning to see examples of applications in which initial resistance to automated techniques has gradually been overcome through the incorporation of adequate system benefits [121].

Perhaps one of the most revealing lessons on this subject is an observation regarding the system of Mesel et al. [70] described in Section II-B. Despite documented physician resistance to clinical algorithms in other settings [38], the physicians in Mesel's study accepted the guidance of protocols for the management of chemotherapy in their cancer patients. It is likely that the key to acceptance in this instance is the fact that these physicians had previously had no choice but to refer their patients with cancer to the tertiary care center in Birmingham where all complex chemotherapy was administered.
The introduction of the protocols permitted these physicians to undertake tasks that they had previously been unable to do. It simultaneously allowed maintenance of close doctor-patient relationships and helped the patients avoid frequent long trips to the center. The motivation for the physician to use the system is clear in this case. It is reminiscent of Rosati's assertion that physicians will first welcome computer decision aids when they become aware that colleagues who are using them have a clear advantage in their practice [87].

A heightened awareness of "human engineering" issues among medical computing researchers will also make computers more acceptable to physicians by making the programs easier and more pleasant to use. Fox has recently reviewed this field in detail [22]. The issues range from the mechanics of interaction with the computer (e.g., using display terminals with such features as light pens, special keyboards, color, and graphics) to the features of the program that make it appear as a helpful tool rather than a complicating burden. Also involved, from both the mechanical and global design sides, is the development of flexible interfaces that tailor the style of the interaction to the needs and desires of individual physicians. Adequate attention must also be given to the severe time constraints perceived by physicians. Ideally they would like programs to take no more time than they currently spend when accomplishing the same task on their own. Time and schedule pressures are similarly likely to explain the greater resistance to automation among interns and residents than among medical students or practicing physicians in Startsman's study [106].

The issue of a program's "self-knowledge" impacts on the acceptance of consultation systems in much the same way as it does upon program performance. Decision makers in general, and physicians in particular, will place more trust in systems that appear to understand their own limitations and capabilities, and that know when to admit ignorance of a problem area or inability to support any conclusion regarding an individual patient. Moreover, physicians will have a means for checking up on these automated assistants if the programs have an ability to explain not only the reasoning chain leading to their decisions but their problem solving strategies as well. High-level knowledge, including a sense of scope and limitations, may thus allow a program to know enough about itself to prevent its own misuse. Furthermore, since systems that are not easily modifiable tend not to be accepted, meta-level knowledge about representation and interconnections within the knowledge base may help overcome the problem of programs becoming tied too closely to a store of knowledge that is regionally or temporally specific. It is therefore important to stress that considerations such as those we have mentioned here may argue in favor of using symbolic reasoning techniques even when a somewhat less complex approach might have been adequate for the decision task itself.

X. SUMMARY

In summary, the trend towards increased use of knowledge engineering techniques for clinical decision programs stems from the dual goals of improving the performance and increasing the acceptance of such systems. Both acceptability and performance issues must be considered from the outset in a system's design because they dictate the choice of methodology as much as the task domain itself does.
As greater experience is gained with these techniques, and as they become better known throughout the medical computing community, it is likely that we will see increasingly powerful unions between symbolic reasoning and the alternate paradigms we have discussed. One lesson to be drawn lies in the recognition that much basic research remains to be done in medical computing, and that the field is more than the application of established computing techniques to medical problems.

ACKNOWLEDGMENT

We wish to thank R. Blum, L. Fagan, J. King, J. Kunz, H. Sox, and G. Wiederhold for their thoughtful advice in reviewing earlier drafts of this paper. We are also grateful to Dr. Herbert Sherman and the reviewers for their constructive suggestions regarding revisions.

REFERENCES

[1] P. Armitage and E. A. Gehan, "Statistical methods for the identification and use of prognostic factors," Int. J. Cancer, vol. 13, pp. 16-36, 1974.
[2] H. L. Bleich, "Computer evaluation of acid-base disorders," J. Clin. Invest., vol. 48, pp. 1689-1696, 1969.
[3] —, "The computer as a consultant," New Eng. J. Med., vol. 284, pp. 141-147, 1971.
[4] —, "Computer-based consultation: Electrolyte and acid-base disorders," Amer. J. Med., vol. 53, pp. 285-291, 1972.
[5] R. L. Blum and G. Wiederhold, "Inferring knowledge from clinical data banks: Utilizing techniques from artificial intelligence," in Proc. 2nd Annu. Symp. Comput. Appl. Med. Care (IEEE, Washington, DC), pp. 303-307, Nov. 1978.
[6] B. G. Buchanan and E. A. Feigenbaum, "Dendral and meta-dendral: Their applications dimension," Artific. Intell., vol. 11, pp. 5-24, 1978.
[7] D. J. Croft, "Is computerized diagnosis possible?" Comput. Biomed. Res., vol. 5, pp. 351-367, 1972.
[8] J. Cumberpatch and H. S. Heaps, "A disease-conscious method for sequential diagnosis by use of disease probabilities without assumption of symptom independence," Int. J. Biomed. Comput., vol. 7, pp. 61-78, 1976.
[9] R. Davis and J. King, "An overview of production systems," in Machine Representation of Knowledge, E. W. Elcock and D. Michie, Eds. New York: Wiley, 1976.
[10] R. Davis, "Applications of meta-level knowledge to the construction, maintenance, and use of large knowledge bases," Heuristic Programming Project, Stanford Univ., Stanford, CA, Memo HPP-76-7, July 1976.
[11] —, "Interactive transfer of expertise: Acquisition of new inference rules," in Proc. 5th Int. Joint Conf. Artific. Intell. (Cambridge, MA), 1977.
[12] F. T. deDombal, D. J. Leaper, J. R. Staniland et al., "Computer-aided diagnosis of acute abdominal pain," Brit. Med. J., vol. 2, pp. 9-13, 1972.
[13] F. T. deDombal, D. J. Leaper, J. C. Horrocks et al., "Human and computer-aided diagnosis of abdominal pain: Further report with emphasis on performance of clinicians," Brit. Med. J., vol. 1, pp. 376-380, 1974.
[14] F. T. deDombal and F. Gremy, Eds., Decision Making and Medical Care: Can Information Science Help? Amsterdam, The Netherlands: North-Holland, 1976.
[15] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973.
[16] W. Edwards, "N = 1: Diagnosis in unique cases," in Computer Diagnosis and Diagnostic Methods, J. A. Jacquez, Ed. Springfield, IL: Charles C. Thomas, 1972, pp. 139-151.
[17] A. S. Elstein, L. S. Shulman, and S. A. Sprafka, Medical Problem Solving: An Analysis of Clinical Reasoning. Cambridge, MA: Harvard Univ. Press, 1978.
[18] L. M. Fagan, "Knowledge engineering for dynamic clinical settings: Giving advice in the intensive care unit," doctoral dissertation, Heuristic Programming Project, Stanford Univ., Stanford, CA, 1979.
[19] E. A. Feigenbaum, "The art of artificial intelligence: Themes and case studies of knowledge engineering," in AFIPS Conf. Proc., NCC, 1978, vol. 47. Montvale, NJ: AFIPS Press, 1978, p. 227.
[20] A. R. Feinstein, "Quality of data in the medical record," Comput. Biomed. Res., vol. 3, pp. 426-435, 1970.
[21] A. R. Feinstein, J. F. Rubinstein, and W. A. Ramshaw, "Estimating prognosis with the aid of a conversational mode computer program," Ann. Intern. Med., vol. 76, pp. 911-921, 1972.
[22] J. Fox, "Medical computing and the user," Int. J. Man-Mach. Stud., vol. 9, pp. 669-686, 1977.
[23] R. B. Friedman and D. H. Gustafson, "Computers in clinical medicine: A critical review," Comput. Biomed. Res., vol. 8, pp. 199-204, 1977.
[24] J. F. Fries, "Time-oriented patient records and a computer databank," J. Amer. Med. Ass., vol. 222, pp. 1536-1542, 1972.
[25] —, "A data bank for the clinician?" (editorial), New Eng. J. Med., vol. 294, pp. 1400-1402, 1976.
[26] L. H. Garland, "Studies on the accuracy of diagnostic procedures," Amer. J. Roentgen., vol. 82, pp. 25-38, 1959.
[27] P. W. Gill, D. J. Leaper, P. J. Guillou et al., "Observer variation in clinical diagnosis: A computer-aided assessment of its magnitude and importance," Meth. Inform. Med., vol. 12, pp. 108-113, 1973.
[28] A. S. Ginsberg, "Decision analysis in clinical patient management with an application to the pleural effusion syndrome," The Rand Corp., Santa Monica, CA, R-751-RC/NLM, July 1971.
[29] —, "The diagnostic process viewed as a decision problem," in Computer Diagnosis and Diagnostic Methods, J. A. Jacquez, Ed. Springfield, IL: Charles C. Thomas, 1972.
[30] M. A. Gleser and M. F. Collen, "Towards automated medical decisions," Comput. Biomed. Res., vol. 5, pp. 180-189, 1972.
[31] R. M. Goldwyn, H. P. Friedman, and J. H. Siegel, "Iteration and interaction in computer data bank analysis: A case study in the physiologic classification and assessment of the critically ill," Comput. Biomed. Res., vol. 4, pp. 607-622, 1971.
[32] G. A. Gorry and G. O. Barnett, "Experience with a model of sequential diagnosis," Comput. Biomed. Res., vol. 1, pp. 490-507, 1968.
[33] G. A. Gorry, J. P. Kassirer, A. Essig, and W. B. Schwartz, "Decision analysis as the basis for computer-aided management of acute renal failure," Amer. J. Med., vol. 55, pp. 473-484, 1973.
[34] G. A. Gorry, "Computer-assisted clinical decision making," Meth. Inform. Med., vol. 12, pp. 45-51, 1973.
[35] G. A. Gorry, H. Silverman, and S. G. Pauker, "Capturing clinical expertise: A computer program that considers clinical responses to digitalis," Amer. J. Med., vol. 64, pp. 452-460, 1978.
[36] R. A. Greenes, G. O. Barnett, S. W. Klein et al., "Recording, retrieval, and review of medical data by physician-computer interaction," New Eng. J. Med., vol. 282, pp. 307-315, 1970.
[37] S. Greenfield, A. L. Komaroff, and H. Anderson, "A headache protocol for nurses: Effectiveness and efficiency," Arch. Intern. Med., vol. 136, pp. 1111-1116, 1976.
[38] R. H. Grimm, K. Shimoni, W. R. Harlan, and E. H. Estes, "Evaluation of patient-care protocol use by various providers," New Eng. J. Med., vol. 292, pp. 507-511, 1975.
[39] G. F. Groner, R. L. Clark, R. A. Berman, and E. C. De Land, "BIOMOD: An interactive computer graphics system for modeling," in Proc. Fall Joint Comput. Conf., pp. 369-378, 1971.
[40] T. Groth, "Biomedical modelling," in MEDINFO 77. Amsterdam, The Netherlands: North-Holland, 1977, pp. 775-784.
[41] E. V. Hess, "A uniform database for rheumatic diseases," Arthrit. Rheumat., vol. 19, pp. 645-648, 1976.
[42] C. Hewitt, "Description and theoretical analysis (using schemata) of PLANNER: A language for proving theorems and manipulating models in a robot," Ph.D. dissertation, Dep. Mathematics, Massachusetts Inst. Technol., Cambridge, MA, 1972.
[43] J. C. Horrocks, A. P. McCann, J. R. Staniland et al., "Computer-aided diagnosis: Description of an adaptable system, and operational experience with 2,034 cases," Brit. Med. J., vol. 2, pp. 5-9, 1972.
[44] J. C. Horrocks and F. T. deDombal, "Computer-aided diagnosis of dyspepsia," Amer. J. Diges. Dis., vol. 20, pp. 397-406, 1975.
[45] R. A. Howard, Ed., Special Issue on Decision Analysis, IEEE Trans. Syst. Sci. Cybern., vol. SSC-4, Sept. 1968.
[46] F. J. Inglefinger, "Decision in medicine" (editorial), New Eng. J. Med., vol. 293, pp. 254-255, 1975.
[47] J. A. Jacquez, Computer Diagnosis and Diagnostic Methods. Springfield, IL: Charles C. Thomas, 1972.
[48] R. W. Jelliffe, J. Buell, R. Kalaba et al., "A computer program for digitalis dosage regimens," Math. Biosci., vol. 9, pp. 179-193, 1970.
[49] R. W. Jelliffe, J. Buell, and R. Kalaba, "Reduction of digitalis toxicity by computer-assisted glycoside dosage regimens," Ann. Intern. Med., vol. 77, pp. 891-906, 1972.
[50] D. C. Johnson and G. O. Barnett, "MEDINFO: A medical information system," Comput. Prog. Biomed., vol. 7, pp. 191-201, 1977.
[51] L. N. Kanal, "Patterns in pattern recognition: 1968-1974," IEEE Trans. Inform. Theory, vol. IT-20, no. 6, 1974.
[52] R. H. S. Karpinski and H. L. Bleich, "MISAR: A miniature information storage and retrieval system," Comput. Biomed. Res., vol. 4, pp. 655-660, 1971.
[53] J. P. Kassirer and G. A. Gorry, "Clinical problem solving: A behavioral analysis," Ann. Intern. Med., vol. 89, pp. 245-255, 1978.
[54] B. Kleinmuntz and R. S. McLean, "Diagnostic interviewing by digital computer," Behav. Sci., vol. 13, pp. 75-80, 1968.
[55] R. G. Knapp, S. Levi, D. Lurie, and M. Westphal, "A computer-generated diagnostic decision guide: A comparison of statistical diagnosis and clinical diagnosis," Comput. Biol. Med., vol. 7, pp. 223-230, 1977.
[56] A. L. Komaroff, W. L. Black, M. Flatley et al., "Protocols for physician assistants: Management of diabetes and hypertension," New Eng. J. Med., vol. 290, pp. 307-312, 1974.
[57] A. L. Komaroff, "Medical data collection: Hard decisions from soft data," this issue, pp. 000-000.
[58] J. Korein, M. Lyman, and J. L. Tick, "The computerized medical record," Bull. NY Acad. Med., vol. 47, pp. 824-826, 1971.
[59] N. Koss and A. R. Feinstein, "Computer-aided prognosis: II. Development of a prognostic algorithm," Arch. Intern. Med., vol. 127, pp. 448-459, 1971.
[60] D. J. Leaper, J. C. Horrocks, J. R. Staniland, and F. T. deDombal, "Computer-assisted diagnosis of abdominal pain using estimates provided by clinicians," Brit. Med. J., vol. 4, pp. 350-354, 1972.
[61] R. S. Ledley and L. B. Lusted, "Reasoning foundations of medical diagnosis," Science, vol. 130, pp. 9-21, 1959.
[62] S. Levi, J. R. Frant, M. C. Westphal, and D. Lurie, "Development of a decision guide: Optimal discriminations for meningitis determined by statistical analysis," Meth. Inform. Med., vol. 15, pp. 87-90, 1976.
[63] M. Lipkin and J. D. Hardy, "Mechanical correlation of data in differential diagnosis of hematologic diseases," J. Amer. Med. Ass., vol. 166, pp. 113-125, 1958.
[64] L. B. Lusted, Introduction to Medical Decision Making. Springfield, IL: Charles C. Thomas, 1968.
[65] J. C. Mabry, H. K. Thompson, M. D. Hopwood, and W. R. Baker, "A prototype data management and analysis system (CLINFO): System description and user experience," in MEDINFO 77. Amsterdam, The Netherlands: North-Holland, 1977, pp. 71-75.
[66] C. McDonald, B. Bhargava, and D. Jeris, "A clinical information system (CIS) for ambulatory care," in Proc. 1975 NCC, AFIPS Press, vol. 44, 1975, pp. 749-756.
[67] B. J. McNeil, E. Keeler, and S. J. Adelstein, "Primer on certain elements of medical decision making," New Eng. J. Med., vol. 293, pp. 211-215, 1975.
[68] B. J. McNeil and S. J. Adelstein, "Determining the value of diagnostic and screening tests," J. Nucl. Med., vol. 17, pp. 439-448, 1977.
[69] S. J. Menn, G. O. Barnett, D. Schmechel et al., "A computer program to assist in the care of acute respiratory failure," J. Amer. Med. Ass., vol. 223, pp. 308-312, 1973.
[70] E. Mesel, D. D. Wirtschafter, J. T. Carpenter et al., "Clinical algorithms for cancer chemotherapy: Systems for community-based consultant-extenders and oncology centers," Meth. Inform. Med., vol. 15, pp. 168-173, 1976.
[71] R. A. Nordyke, C. A. Kulikowski, and C. W. Kulikowski, "A comparison of methods for the automated diagnosis of thyroid dysfunction," Comput. Biomed. Res., vol. 4, pp. 374-389, 1971.
[72] M. J. Norusis and J. A. Jacquez, "Diagnosis. I. Symptom nonindependence in mathematical models for diagnosis," Comput. Biomed. Res., vol. 8, pp. 156-172, 1975.
[73] E. A. Patrick, "Pattern recognition in medicine," Syst., Man, Cybern. Rev., vol. 6, p. 4, 1977.
[74] S. G. Pauker and J. P. Kassirer, "Therapeutic decision making: A cost-benefit analysis," New Eng. J. Med., vol. 293, pp. 229-234, 1975.
[75] S. G. Pauker, G. A. Gorry, J. P. Kassirer, and W. B. Schwartz, "Towards the simulation of clinical cognition: Taking a present illness by computer," Amer. J. Med., vol. 60, pp. 981-996, 1976.
[76] S. G. Pauker, "Coronary artery surgery: The use of decision analysis," Ann. Intern. Med., vol. 85, pp. 8-18, 1976.
[77] S. P. Pauker and S. G. Pauker, "Prenatal diagnosis: A directive approach to genetic counseling using decision analysis," Yale J. Biol. Med., vol. 50, pp. 275-289, 1977.
[78] C. C. Peck, L. B. Sheiner, C. M. Martin et al., "Computer-assisted digoxin therapy," New Eng. J. Med., vol. 289, pp. 441-446, 1973.
[79] H. V. Pipberger, "Clinical application of a second generation electrocardiographic computer program," Amer. J. Cardiol., vol. 35, pp. 597-608, 1975.
[80] J. S. Pliskin and C. H. Beck, "Decision analysis in individual clinical decision making: A real-world application in treatment of renal disease," Meth. Inform. Med., vol. 15, pp. 43-46, 1976.
[81] H. E. Pople, J. D. Myers, and R. A. Miller, "DIALOG: A model of diagnostic logic for internal medicine," in Proc. 4th Int. Joint Conf. Artific. Intell., MIT, Cambridge, MA, 1975.
[82] H. Pople, "The formation of composite hypotheses in diagnostic problem solving: An exercise in synthetic reasoning," in Proc. 5th Int. Joint Conf. Artific. Intell., Cambridge, MA, pp. 1030-1037, 1977.
[83] J. Prutting, "Lack of correlation between antemortem and postmortem diagnosis," NY J. Med., vol. 67, pp. 2081-2084, 1967.
[84] H. Raiffa, Decision Analysis: Introductory Lectures on Choices Under Uncertainty. Reading, MA: Addison-Wesley, 1968.
[85] B. Richards and A. E. S. Goh, "Computer assistance in the treatment of patients with acid-base and electrolyte disturbances," in MEDINFO 77. Amsterdam, The Netherlands: North-Holland, 1977, pp. 407-410.
[86] J. Rodnick and G. Wiederhold, "Review of automated ambulatory medical record systems: Charting services that are of essential benefit to the physician," in MEDINFO 77. Amsterdam, The Netherlands: North-Holland, 1977, pp. 957-961.
[87] R. A. Rosati, A. G. Wallace, and E. A. Stead, "The way of the future," Arch. Intern. Med., vol. 131, pp. 285-287, 1973.
[88] R. D. Rosati, J. F. McNeer, C. F. Starmer et al., "A new information system for medical practice," Arch. Intern. Med., vol. 135, pp. 1017-1024, 1975.
[89] M. B. Rosenblatt, P. K. Teng, and S. Kerpe, "Diagnostic accuracy in cancer as determined by post-mortem examination," Prog. Clin. Cancer, vol. 5, pp. 71-80, 1973.
[90] A. D. Rubin and J. F. Risley, "The PROPHET system: An experiment in providing a computer resource to scientists," in MEDINFO 77. Amsterdam, The Netherlands: North-Holland, 1977, pp. 77-81.
[91] C. Safran, P. N. Tsichlis, A. Z. Bluming, and J. F. Desforges, "Diagnostic planning using computer-assisted decision making for patients with Hodgkins disease," Cancer, vol. 39, pp. 2426-2434, 1977.
[92] H. Schoolman and L. Bernstein, "Computer use in diagnosis, prognosis, and therapy," Science, vol. 200, pp. 926-931, 1978.
[93] W. B. Schwartz, "Medicine and the computer: The promise and problems of change," New Eng. J. Med., vol. 283, pp. 1257-1264, 1970.
[94] W. B. Schwartz, G. A. Gorry, J. P. Kassirer, and A. Essig, "Decision analysis and clinical judgment," Amer. J. Med., vol. 55, pp. 459-472, 1973.
[95] A. C. Scott, W. Clancey, R. Davis, and E. H. Shortliffe, "Explanation capabilities of knowledge-based production systems," Amer. J. Comput. Ling., Microfiche 62, 1977.
[96] L. B. Sheiner, H. Halkin, C. Peck et al., "Improved computer-assisted digoxin therapy," Ann. Intern. Med., vol. 82, pp. 619-627, 1975.
[97] H. Sherman, B. Reiffen, and A. L. Komaroff, "Ambulatory care systems," in Problem-Directed and Medical Information Systems, M. F. Driggs, Ed. New York: Intercontinental Medical Book Corp., 1973, pp. 143-171.
[98] M. Shimura, "Learning procedures in pattern classifiers: Introduction and survey," in Proc. Int. Joint Conf. Pattern Recog., Kyoto, Japan, 1978, pp. 125-138.
[99] E. H. Shortliffe, S. G. Axline, B. G. Buchanan, and S. N. Cohen, "Design considerations for a program to provide consultations in clinical therapeutics," in Proc. 13th San Diego Biomed. Symp., San Diego, CA, Feb. 1974, pp. 311-319.
[100] E. H. Shortliffe and R. Davis, "Some considerations for the implementation of knowledge-based expert systems," SIGART Newsletter, no. 55, pp. 9-12, Dec. 1975.
[101] E. H. Shortliffe and B. G. Buchanan, "A model of inexact reasoning in medicine," Math. Biosci., vol. 23, pp. 351-379, 1975.
[102] E. H. Shortliffe, Computer-Based Medical Consultations: MYCIN. New York: Elsevier/North Holland, 1976.
[103] V. Slamecka, H. N. Camp, A. N. Badre, and W. D. Hall, "MARIS: A knowledge system for internal medicine," Inform. Process. Manag., vol. 13, pp. 273-276, 1977.
[104] H. C. Sox, C. H. Sox, and R. K. Tompkins, "The training of physicians' assistants: The use of a clinical algorithm system," New Eng. J. Med., vol. 288, pp. 818-824, 1973.
[105] N. S. Sridharan, Guest editorial, Artif. Intell., vol. 11, pp. 1-4, 1978.
[106] T. S. Startsman and R. E. Robinson, "The attitudes of medical and paramedical personnel towards computers," Comput. Biomed. Res., vol. 5, pp. 218-227, 1972.
[107] W. W. Stead, R. G. Brame, W. E. Hammond et al., "A computerized obstetric medical record," Obstet. Gyn., vol. 49, pp. 502-509, 1977.
[108] P. Szolovits and S. G. Pauker, "Categorical and probabilistic reasoning in medical diagnosis," Artif. Intell., vol. 11, pp. 115-144, 1978.
[109] T. R. Taylor, "Clinical decision analysis," Meth. Inform. Med., vol. 15, pp. 216-224, 1976.
[110] D. M. Vickery, "Computer support of paramedical personnel: The question of quality control," in MEDINFO 74. Amsterdam, The Netherlands: North-Holland, 1974, pp. 281-287.
[111] A. A. Vishnevskiy, I. I. Artobolevskiy, and M. L. Bykovskiy, Machine Diagnosis and Information Retrieval in Medicine in the USSR. DHEW Publication No. (NIH) 73-424, 1973.
[112] G. Wagner, P. Tautu, and U. Wolber, "Problems of medical diagnosis: A bibliography," Meth. Inform. Med., vol. 17, pp. 55-74, 1978.
[113] B. T. Walsh, W. W. Bookhein, R. C. Johnson et al., "Recognition of streptococcal pharyngitis in adults," Arch. Intern. Med., vol. 135, pp. 1493-1497, 1975.
[114] A. Wardle and L. Wardle, "Computer-aided diagnosis: A review of research," Meth. Inform. Med., vol. 17, pp. 15-28, 1978.
[115] H. R. Warner, A. F. Toronto, and L. G. Veasy, "Experience with Bayes' theorem for computer diagnosis of congenital heart disease," Ann. N.Y. Acad. Sci., vol. 115, pp. 558-567, 1964.
[116] H. R. Warner, "Experiences with computer-based patient monitoring," Anes. Analgesia Current Res., vol. 47, pp. 453-461, 1968.
[117] H. R. Warner, C. M. Olmsted, and B. D. Rutherford, "HELP—A program for medical decision-making," Comput. Biomed. Res., vol. 5, pp. 65-74, 1972.
[118] H. R. Warner, B. D. Rutherford, and B. Houtchens, "A sequential approach to history taking and diagnosis," Comput. Biomed. Res., vol. 5, pp. 256-262, 1972.
[119] H. R. Warner, J. D. Morgan, T. A. Pryor et al., "HELP—A self-improving system for medical decision making," in MEDINFO 74. Amsterdam, The Netherlands: North-Holland, 1974.
[120] H. R. Warner, "Knowledge sectors for logical processing of patient data in the HELP system," in Proc. 2nd Annu. Symp. Comput. Appl. Med. Care (IEEE, Washington, DC), 1978, pp. 401-404.
[121] R. J. Watson, "Medical staff response to a medical information system with direct physician-computer interface," in MEDINFO 74. Amsterdam, The Netherlands: North-Holland, 1974, pp. 299-302.
[122] H. Wechsler, "A fuzzy approach to medical diagnosis," Int. J. Biomed. Comput., vol. 7, pp. 191-203, 1976.
[123] L. L. Weed, "Medical records that guide and teach," New Eng. J. Med., vol. 278, pp. 593-599 and 652-657, 1968.
[124] ——, "Problem-oriented medical records," in Problem-Directed and Medical Information Systems, M. F. Driggs, Ed. New York: Intercontinental Medical Book Corp., 1973.
[125] S. M. Weiss, C. A. Kulikowski, S. Amarel, and A. Safir, "A model-based method for computer-aided medical decision-making," Artif. Intell., vol. 11, pp. 145-172, 1978.
[126] S. Weyl, J. Fries, G. Wiederhold, and F. Germano, "A modular self-describing clinical databank system," Comput. Biomed. Res., vol. 8, pp. 279-293, 1975.
[127] G. Wiederhold, J. F. Fries, and S. Weyl, "Structured organization of clinical data bases," in Proc. 1975 NCC, vol. 44. AFIPS Press, 1975, pp. 479-485.
[128] P. H. Winston, Artificial Intelligence. Reading, MA: Addison-Wesley, 1977.
[129] D. Wirtschafter, J. T. Carpenter, and E. Mesel, "A consultant-extender system for breast cancer adjuvant chemotherapy," Ann. Intern. Med., vol. 90, pp. 396-401, 1979.
[130] P. M. Wortman, "Medical diagnosis: An information processing approach," Comput. Biomed. Res., vol. 5, pp. 315-328, 1972.
[131] V. L. Yu, L. M. Fagan, S. M. Wraith et al., "Antimicrobial selection by a computerized consultant: A blinded evaluation by infectious disease experts," J. Amer. Med. Ass., vol. 241, 1979 (in press).
[132] V. L. Yu, B. G. Buchanan, E. H. Shortliffe et al., "An evaluation of the performance of a computer-based consultant," Comput. Prog. Biomed., vol. 9, pp. 95-102, 1979.
[133] L. A. Zadeh, "Fuzzy sets," Inform. Contr., vol. 8, pp. 338-353, 1965.
[134] N. Zoltie, J. C. Horrocks, and F. T. deDombal, "Computer-assisted diagnosis of dyspepsia—report on transferability of a system, with emphasis on early diagnosis of gastric cancer," Meth. Inform. Med., vol. 16, pp. 89-92, 1977.