BIOMEDICAL KNOWLEDGE ENGINEERING AND INFECTIOUS DISEASES

Submitted to the National Institutes of Health
May, 1977

Stanford University Medical School
Computer Science Department
Stanford University

S. N. Cohen

Table of Contents

SECTION II
  Detailed Budget
  Budget Estimates and Justification
  Biographical Sketches
  Research Plan
    1. Objectives
    2. Background and Rationale
       2.1 The Knowledge Engineering Problem
       2.2 The Medical Problem
       2.3 Our Work to Date
       2.4 Other Approaches
    3. Previous Work Done
    4. Specific Goals
       4.1 Competence
       4.2 Knowledge Engineering Support Tools
       4.3 Human Engineering and Clinical Capabilities
       4.4 Exportability of the System
       4.5 Performance Evaluation
    5. Significance of the Research
    6. Facilities Available
    7. Collaborative Arrangements
    8. Appendix A: Progress Report Submitted to BHSRE
       8.1 Summary
       8.2 Detailed Report
       8.3 Clinical Capabilities
       8.4 User Oriented Features
       8.5 Knowledge Acquisition
       8.6 Technical Issues
       8.7 Evaluation Activities
    9. Appendix B: Hardware Announcement
    10. REFERENCES
       10.1 MYCIN PUBLICATIONS
       10.2 OTHER REFERENCES

PRIVILEGED COMMUNICATION

SECTION I

Form Approved O.M.B. 68-R0249

DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
PUBLIC HEALTH SERVICE
GRANT APPLICATION

TO BE COMPLETED BY PRINCIPAL INVESTIGATOR (Items 1 through 7 and 15A)

1. TITLE OF PROPOSAL (Do not exceed 53 typewriter spaces): Biomedical Knowledge Engineering and Infectious Diseases
2. PRINCIPAL INVESTIGATOR
   2A. NAME (Last, First, Initial): COHEN, Stanley N.
   2B. TITLE OF POSITION: Professor of Medicine and Head, Division of Clinical Pharmacology
   2C. MAILING ADDRESS (Street, City, State, Zip Code): Clinical Pharmacology, Medical Center, Stanford University
   2D. DEGREE: Ph.D.
   2E. SOCIAL SECURITY NO.:
   2F. TELEPHONE DATA: 497-5315
   2G. DEPARTMENT, SERVICE, LABORATORY OR EQUIVALENT (See Instructions): Clinical Pharmacology
   2H. MAJOR SUBDIVISION (See Instructions): Department of Humanities and Sciences
3. DATES OF ENTIRE PROPOSED PROJECT PERIOD (This application): FROM 4/1978 THROUGH 3/1983
4. TOTAL DIRECT COSTS REQUESTED FOR PERIOD IN ITEM 3: $1426444
5. DIRECT COSTS REQUESTED FOR FIRST 12-MONTH PERIOD: $170425
6. PERFORMANCE SITE(S) (See Instructions): Clinical Pharmacology; Department of Computer Science; Stanford University, Stanford, California 94305
7. Research Involving Human Subjects (See Instructions): A. NO   B. YES Approved   C. YES - Pending Review
8. Inventions (Renewal Applicants Only - See Instructions): A. NO   B. YES - Not previously reported   C. YES - Previously reported

TO BE COMPLETED BY RESPONSIBLE ADMINISTRATIVE AUTHORITY (Items 8 through 13 and 15B)

9. APPLICANT ORGANIZATION(S) (See Instructions): Stanford University, Clinical Pharmacology, Stanford, California 94305
10. NAME, TITLE, AND TELEPHONE NUMBER OF OFFICIAL(S) SIGNING FOR APPLICANT ORGANIZATION(S): c/o Sponsored Projects Office; Telephone Number(s): (415) 497-2883
11. TYPE OF ORGANIZATION (Check applicable item): [ ] FEDERAL   [ ] STATE   [ ] LOCAL   [X] OTHER (Specify): Private, non-profit University
NAME, TITLE, ADDRESS, AND TELEPHONE NUMBER OF OFFICIAL IN BUSINESS OFFICE WHO SHOULD ALSO BE NOTIFIED IF AN AWARD IS MADE: K.D. Creighton, Controller, Stanford University, Stanford, California 94305
ORGANIZATIONAL COMPONENT TO RECEIVE CREDIT FOR INSTITUTIONAL GRANT PURPOSES (See Instructions): 20 School of Humanities and Sciences
14. ENTITY NUMBER (Formerly PHS Account Number): 1941156365A1
15. CERTIFICATION AND ACCEPTANCE. We, the undersigned, certify that the statements herein are true and complete to the best of our knowledge and accept, as to any grant awarded, the obligation to comply with Public Health Service terms and conditions in effect at the time of the award.

SIGNATURES (Signatures required on original copy only. Use ink. "Per" signatures not acceptable.)
A. SIGNATURE OF PERSON NAMED IN ITEM 2A            DATE
B. SIGNATURE(S) OF PERSON(S) NAMED IN ITEM 10      DATE

NIH 398 (FORMERLY PHS 398) Rev. 1/73

SECTION I

RESEARCH OBJECTIVES

NAME AND ADDRESS OF APPLICANT ORGANIZATION: Stanford University, Stanford, California 94305

NAME, SOCIAL SECURITY NUMBER, OFFICIAL TITLE, AND DEPARTMENT OF ALL PROFESSIONAL PERSONNEL ENGAGED ON PROJECT, BEGINNING WITH PRINCIPAL INVESTIGATOR: See attached list.

TITLE OF PROJECT: Biomedical Knowledge Engineering and Infectious Diseases

USE THIS SPACE TO ABSTRACT YOUR PROPOSED RESEARCH. OUTLINE OBJECTIVES AND METHODS. UNDERSCORE THE KEY WORDS (NOT TO EXCEED 10) IN YOUR ABSTRACT.

Knowledge about medical disciplines, such as infectious diseases, changes rapidly. To assist researchers and practitioners codify, access, and reason with the knowledge of their domain, we propose developing knowledge-based computer programs as "knowledge engineering" aids. We base the proposed work on the MYCIN program, which we developed over the last three years. MYCIN stores facts and relations about infectious diseases in a set of inference rules, or production rules. It reasons about complex case histories using this knowledge, and it can explain its reasoning. We have also developed a prototype knowledge acquisition system to aid the experts who modify and extend the knowledge base.

Attached list of personnel:

Name         Soc. Sec. No.   Title                Department
Cohen                        Professor            Medicine
Buchanan                     Adjunct Professor    Computer Science
Davis                        Research Associate   Computer Science
Shortliffe                   Professor            Medicine
Axline                       Professor            Medicine
Wraith                       Research Associate   Medicine
Scott                        Research Associate   Computer Science

SECTION II - PRIVILEGED COMMUNICATION

DETAILED BUDGET FOR FIRST 12-MONTH PERIOD, FROM 4/1978 THROUGH 4/1979

DESCRIPTION (Itemize)                            AMOUNT REQUESTED (Omit cents)
PERSONNEL (See attached list)
  Salary                                         125859
  Fringe benefits                                24667
  Total                                          150525
CONSULTANT COSTS
  Clinical Consultant                            3000
EQUIPMENT
  2 Datamedia display terminals                  4800
SUPPLIES
  Office supplies                                4500
TRAVEL
  Domestic                                       2000
  Foreign
PATIENT COSTS (See instructions)
ALTERATIONS AND RENOVATIONS                      --
OTHER EXPENSES (Itemize)
  Telephone                                      2000
  Maintenance Contracts                          600
  Postage/publications/miscellaneous             3000
TOTAL DIRECT COST (Enter on Page 1, Item 5)

EDUCATION (Begin with baccalaureate training and include postdoctoral)

Institution and Location        Degree   Year Conferred   Scientific Field
College of William and Mary     B.S.     1973             Mathematics
Stanford University             M.S.     1974             Computer Science

HONORS: Phi Beta Kappa
MAJOR RESEARCH INTEREST: Artificial Intelligence
ROLE IN PROPOSED PROJECT: Scientific Programmer
RESEARCH SUPPORT (See instructions)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)

1974-present   Scientific Programmer, MYCIN project, Division of Clinical Pharmacology, Department of Medicine, Stanford University

Publication: Scott, AC, Clancey, W, Davis, R and Shortliffe, EH; Explanation capabilities of knowledge-based production systems. (submitted to Amer. J. Linguistics)

RESEARCH PLAN

1 Objectives

The overall objective of the proposed research is the development and evaluation of a computer based system for codifying judgmental knowledge of experts in order to improve the effectiveness of medical research and clinical decision making. Our work to date has concentrated on codifying knowledge for the diagnosis and selection of therapy for infectious diseases, and has produced a system (called "MYCIN") capable of offering consultative advice for certain classes of infections. The development of this system over the past few years has provided a "laboratory" for the elucidation of the informal judgmental criteria used by experts in the field.
In codifying that knowledge and testing it on real cases, we have encouraged the formal specification of what was previously informal knowledge, and have provided an arena in which conflicting judgments from different experts can be tested.

We are requesting support for continued research and development in order to demonstrate MYCIN's effectiveness as a research tool for biomedical scientists working with infectious diseases, and eventually as a general methodology for "knowledge engineering" in related disciplines. Proposed steps toward that end include:

(a) expand the clinical knowledge base of the system to increase the range of clinical cases for which MYCIN can aid physicians and researchers.

(b) systematize and organize knowledge and decision processes on a rigorous basis using MYCIN techniques so that one researcher can build on another's research results.

(c) improve the system's interaction with physicians and researchers.

(d) transfer the system to a dedicated mini-computer to improve response time and make it exportable.

(e) evaluate the research and clinical utility of the system, in part by showing that the system offers an effective forum that encourages experts in the field to reach a medical or technical consensus in their view of the domain.

2 Background and Rationale

2.1 The Knowledge Engineering Problem

Computer programs can provide assistance to working scientists in several different ways. For a number of years, computers were used almost exclusively as numeric problem solvers. They acted as mathematical assistants, performing calculations that were complex, tedious or repetitious. They have been used for manipulating symbolic expressions as well. For example, hospitals and businesses have stored massive amounts of symbolic information in computer files, and have developed intricate programs for retrieval and display of the stored information.
It is also possible to extend the metaphor of the problem solving assistant into the realm of symbolic information, as demonstrated by several artificial intelligence (AI) programs. For example, the DENDRAL programs (1) assist research chemists with both the combinatorial and inferential aspects of chemical reasoning, both of which can be demanding and tedious for human scientists. The MYCIN program is an outgrowth of nearly a decade of work on DENDRAL. We are building on, and improving, many of the ideas from DENDRAL about representing large amounts of domain-specific knowledge for computer aided problem solving.

The representation, use and acquisition of knowledge for computer programs has been called "knowledge engineering" [D. Michie, On Machine Intelligence, New York: Wiley, 1974]. The MYCIN and DENDRAL programs are important examples of this branch of AI work. One of the central ideas in this work is the belief that high performance in solving problems arises from a large store of task-specific knowledge -- that is, a "knowledge base" containing information specific to the task at hand. We represent that body of knowledge as a collection of decision rules -- in the case at hand, rules about diagnosis and therapy selection in infectious diseases. These conditional sentences are called "production rules".

The production rule formalism provides an easily understood representation of facts and relations. However, our experience has shown that writing new rules and integrating them into an existing knowledge base is not as simple as we had hoped. Thus we must provide more tools for the experts who write rules so that they can see the relationships of new rules to old ones and easily determine the consequences of adding new rules to the program.

(1) See, for example, E. Feigenbaum, B. Buchanan, & J. Lederberg, "On Generality and Problem Solving: A Case Study Using the DENDRAL Program". Machine Intelligence 6 (eds. Meltzer & Michie), 165-90. Edinburgh: Edinburgh University Press.

We have already developed primitive mechanisms for checking the syntax of new rules and some aspects of their semantics. For example, the rule models developed by Davis [14] give the system the ability to check the similarities of a new rule with other rules of the same type in order to comment on (and ask about) the differences noticed. We will need to build on these ideas, in effect, to make the system smarter about what it notices.

It would be premature to suggest that a computer program could arbitrate the scientific disagreements among experts and reach a consensus smoothly. This is a super-human task. However, we believe that a program that is able to keep track of the different ways experts express their knowledge can be an important aid to those experts in coming to an agreement. For example, the program can select case histories that highlight the consequences of using different facts and relations.

Our past work has emphasized the use of judgmental knowledge in a high performance program that provides inferential assistance to physicians. We now propose to build on that work to provide knowledge engineering assistance to research scientists, with two long-range goals in mind: (a) using infectious disease as a case study to develop a methodology of knowledge engineering that will be applicable to building high performance systems in a range of disciplines, (b) developing techniques for using such systems to provide a forum for formal specification of previously informal knowledge, as a means of encouraging consensus among experts in the field.

2.2 The Medical Problem

A number of recent studies indicate a major need to improve the quality of antimicrobial therapy.
Almost one-half of the total cost of drugs used in treating hospitalized patients is spent on antibiotics [1,2], and if results of a number of recent studies are to be believed, a significant part of this therapy is associated with serious misuse [2,3,4,5]. Some of the inappropriate therapy involves incorrect selection of a therapeutic regimen [4], while another serious problem is the incorrect decision to administer any antibiotic [2,4,5]. One recent study concluded that one out of every four people in the United States was given penicillin during a recent year, and nearly 90% of these prescriptions were unnecessary [6].

Other studies have shown that physicians will often reach therapeutic decisions that differ significantly from the decisions that would have been suggested by experts in infectious disease therapy practicing at the same institution. Nonexperts sometimes choose a drug regimen designed to cover for all possibilities, prescribing either several drugs or one of the so-called "broad spectrum" antibiotics, even though appropriate use of clinical data might have led to more rational and less toxic therapy.

Within a hospital environment in which professional resources are often overburdened, and in environments where expert sources are not readily available, a computer-based consultant will be highly useful. Such a system will also have broad fringe benefits in its educational impact on staff physicians and in providing a framework for quality control and peer-review evaluations.

Antimicrobial therapy appears to be an especially suitable area for the initial development of a computer-based system to assist physicians with decisions in clinical therapeutics. The components of the decision making process in antimicrobial therapy are more readily definable than in many other areas of medicine, and the consequences of the physician's decision can usually be assessed in terms of direct therapeutic action.
Nevertheless, the general approach used here is applicable to other areas of clinical decision making.

The basis of rational antimicrobial therapy decisions is identification of the microorganisms causing the infectious disease. Accurate identification is important because of the specificity of antibiotic action: drugs that are highly effective against certain organisms are often useless against others. The patient's clinical status and history (including information such as prior infections and treatments) provide data that may be valuable to the physician in identifying the disease-causing organisms. However, bacteriological cultures that use specimens taken from the site of the patient's infection usually provide the most definitive identifying information.

Initial culture reports from a microbiological laboratory may become available within 12 hours from the time a clinical specimen is obtained from the patient. While the information in these early reports often serves to classify the organism in general terms, it does not often permit precise identification. It may be clinically unwise to postpone therapy until such identification can be made with certainty, a process that usually requires 24 to 48 hours, or longer. Thus it is commonly necessary for the physician to estimate the range of possible infecting organisms, and to start appropriate therapy even before the laboratory is able to identify the offending organism and its antibiotic sensitivities.

In this setting MYCIN plays two roles: (a) providing consultative advice that will assist the physician in making the best therapeutic decision that can be made on the basis of available information, and (b) by its questioning of the physician, pinpointing the items of clinical data that are necessary to increase the validity of the clinical decision.

2.3 Our Work to Date

A comprehensive review of our work appears in Section 3 of this proposal.
Briefly, we have developed a computer program capable of offering consultative advice on the diagnosis and therapy selection for bacteremia and meningitis, two areas central to the management of infectious disease. This work has been guided by three fundamental objectives.

(1) A major objective of the MYCIN system has been to provide a computer-based therapeutic tool designed to be useful in both clinical and research environments. This requires development of a system that has a medically and scientifically sound knowledge base, and that displays a high level of competence in its field. The program must first convince clinicians of the quality of the information it is providing before they will be willing to use it.

(2) We believe it is important for the computer system to have the ability to explain the reasoning behind its decisions. It should be able to do so in terms that suggest to the physician that the program approaches the problem in much the same way that he does. This permits the user to validate the program's reasoning, and modify (or reject) the advice if he believes that some step in the decision process is not justified. It also gives the program an inherent instructional capability that allows the physician to learn from each consultation session.

(3) A third major objective is to provide the program with capabilities that enable augmentation or modification of the knowledge base by experts in infectious disease therapy, in order to codify knowledge in the domain, as well as to improve the validity of future consultations. The system therefore requires some capability for acquiring knowledge by interacting with experts in the field, and for incorporating this knowledge into its knowledge base.

Three separate parts of the MYCIN system accomplish these objectives. The consultation system uses the knowledge base, along with patient-related data entered by the physician, to generate therapeutic advice.
The explanation system has the ability to explain the reasoning used during the consultation, and to document the motivation for questions asked or the rationale for conclusions reached. Finally, the knowledge acquisition system enables experts in antimicrobial therapy to update MYCIN's knowledge base, without requiring that they know how to program a computer.

A principal feature of MYCIN central to these objectives is the format in which its knowledge is encoded. Knowledge used by MYCIN is contained in diagnostic and therapeutic decision rules formulated during extensive discussions of clinical case histories. The MYCIN knowledge base currently consists of approximately 400 such rules. Each rule consists of a set of preconditions (called the "premise") which, if true, justifies the conclusion made in the "action" part of the rule (an example is shown below).

    If   1) the gram stain of the organism is gram negative, and
         2) the morphology of the organism is rod, and
         3) the aerobicity of the organism is anaerobic,
    then there is suggestive evidence (.6) that the identity of the organism is Bacteroides.

Many of the system's unique and important capabilities are made possible by encoding knowledge in rules like the one above. Such rules form modular "chunks" of knowledge about the domain, represented in a form that is comprehensible to clinicians and researchers.

The consultation system uses its collection of rules to make conclusions about the patient. If, for instance, it is attempting to determine the identity of an organism responsible for a particular infection, it retrieves the entire list of rules which, like the one above, conclude about identity. It then attempts to ascertain whether the conclusion of the first rule is valid, by evaluating in turn each of the clauses of the premise. Thus, for the rule above, the first thing to find out is the gram stain of the organism.
If this information is already available in the data base, the program retrieves it. If not, determination of gram stain becomes a new goal, and the program retrieves all rules which conclude about it, and tries to use each of them to obtain the value of gram stain. If, after trying all the relevant rules, the answer still has not been discovered, the program asks the user for the relevant clinical information which will permit it to establish the validity of the premise clause. Thus, the rules "unwind" to produce a succession of goals, and it is the attempt to achieve each goal that drives the consultation.

The use of a rule-based representation of knowledge makes it possible for the system to explain the basis for its recommendations. For example, if asked "How did you determine the identity of the organism?" the program answers by displaying the rules which were actually used, and explaining, if requested, how each of the premises of the rules was established. This is something which people readily understand, and it provides a far more comprehensible and acceptable explanation than would be possible if the program were to use a simple statistical approach to diagnosis.

As work proceeds to expand the program's knowledge base, new "chunks" are added in much the same way that a clinician in training learns new pieces of knowledge about his field. This rule-based representation of knowledge means that the expert himself can offer new "chunks" of knowledge by expressing them in the same rule-based format. He can thus help make the program more competent, without having to know anything about computer programming.
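
The goal-directed use of rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the MYCIN implementation: the names Rule, RULES, find_out and ask are hypothetical, the sketch accepts the first rule whose premise holds rather than weighing evidence from every relevant rule, and it merely stores the rule's strength (the .6 of the sample rule) without combining it with anything.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Clause = Tuple[str, str]  # (attribute, value), e.g. ("morphology", "rod")


@dataclass(frozen=True)
class Rule:
    """Hypothetical encoding of one production rule."""
    premise: Tuple[Clause, ...]   # every clause must hold
    conclusion: Clause            # what the rule concludes about
    strength: float               # stored only; not combined in this sketch


# The sample rule from the text, written in the assumed encoding.
RULES: List[Rule] = [
    Rule(
        premise=(("gram stain", "gram negative"),
                 ("morphology", "rod"),
                 ("aerobicity", "anaerobic")),
        conclusion=("identity", "Bacteroides"),
        strength=0.6,
    ),
]


def find_out(goal: str,
             facts: Dict[str, Optional[str]],
             rules: List[Rule],
             ask: Callable[[str], Optional[str]],
             trace: List[Rule]) -> Optional[str]:
    """Establish `goal` from stored data, then from rules, then from the user."""
    if goal in facts:                       # already in the data base
        return facts[goal]
    for rule in rules:
        attribute, value = rule.conclusion
        if attribute != goal:               # rule does not conclude about goal
            continue
        # Evaluate the premise clauses in turn; each clause becomes a new goal.
        # all() stops at the first clause that fails.
        if all(find_out(a, facts, rules, ask, trace) == v
               for a, v in rule.premise):
            facts[goal] = value
            trace.append(rule)              # kept so the result can be explained
            return value
    facts[goal] = ask(goal)                 # no rule succeeded: ask the user
    return facts[goal]


# Usage: the "user" is simulated by a dictionary of answers.
answers = {"gram stain": "gram negative",
           "morphology": "rod",
           "aerobicity": "anaerobic"}
facts: Dict[str, Optional[str]] = {}
used: List[Rule] = []
identity = find_out("identity", facts, RULES, answers.get, used)
print(identity)  # prints "Bacteroides"
```

The list of rules recorded in `used` is the kind of record from which a question such as "How did you determine the identity of the organism?" could be answered, and adding a rule means appending one more entry to RULES without touching the others.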
In addition, since the rules are largely independent of one another, and are used by the program as necessary in order to deal with the particular consultation underway, the addition of a new rule or modification of an existing rule requires little alteration of other items in the knowledge base, unlike systems using the decision-tree methodology. Other benefits gained from this approach have been explained in more detail in the references.

2.4 Other Approaches

There are three other approaches to the problem of encoding medical decision making knowledge that have received extensive attention in the literature:

(i) decision trees - as in [7], in which a sequence of decisions is structured in the form of a tree. Each node represents a particular question, and the answer determines which branch of the tree to follow to get to the next question. Final results are obtained by descending all the way to a leaf of the tree.

(ii) Bayesian techniques - as in [8], in which extensive frequency data make it possible to use Bayes' theorem as a basis for diagnosis.

(iii) decision analysis and utility theory - as in [9], in which there is associated with each piece of information a likely cost of obtaining it, and a measure of the benefit to be derived from having it. Information is requested until the projected cost of asking another question (perhaps requiring another lab test or operative procedure) outweighs the benefit (in terms of a more precise diagnosis) to be obtained.

Each of these has a number of attractive aspects, but also encounters some limitations which provided the motivation for our investigation of a rule-based system. Decision trees, for example, offer simple, readily understandable procedures for diagnosing specific ailments. Problems occur, however, if they encounter unexpected data or if test results are unavailable.
The representation of knowledge they offer can be somewhat inflexible, as well, since the attempt to make changes deep down in the tree often requires consideration of all previous decisions made further up the tree.

The Bayesian technique offers an appealing generality and precision, since it is a domain independent technique based on exact principles. Limitations here arise from the need for extensive amounts of frequency data concerning a priori and conditional probabilities. Where these data exist, the technique can be used quite effectively, but such figures may not often be available [10].

Techniques based on utility theory can present a well-motivated sequence of questions that appears to "zero in" on the underlying ailment. Like the Bayesian approach, however, they require extensive data on conditional probabilities of symptoms and disease.

Since none of these is intended to be a model of the reasoning process typically employed by clinicians, it can at times prove difficult for a clinician to discover the basis for the conclusions drawn by any of them. While they each present a compact encoding of knowledge that can provide an appealing efficiency to programs based on them, there is an unavoidable loss of comprehensibility to the physician using them. Reasoning which requires several distinct inferential steps by a clinician, for instance, might be expressed in a single value of a conditional probability in the Bayesian method.

One additional technique has received some attention lately, as other researchers (e.g., [11] and [12]) have developed sophisticated models of physiological processes. Where the system involved is sufficiently well-understood and isolatable (e.g., the glaucoma model in [12]), this can be a powerful approach.
But this is not often true. Infectious disease diagnosis and therapy selection (like many other areas) involves a broad range of processes, many of which are only very imperfectly understood.

Finally, we place great emphasis on the flexibility of the knowledge base. A substantial amount of knowledge is required to support a high level of performance, and this means that modification and augmentation of the knowledge base will continue for an extended period. Each modification must therefore be a reasonable task, or the program will soon begin to stagnate. A flexible knowledge base also means that the system is inherently dynamic in character. It is easily modified to take into account regional variations in practice, new results which arise from progress in medical research, and changes in drug resistance patterns.

Our experience to date suggests that our current approach of codifying individual decision rules offers a large number of advantages, including flexibility and ready comprehensibility. It can provide the basis for a formalism capable of functioning in domains where little statistical data is available, or where information is uncertain or incomplete, and can thus offer a useful extension to existing techniques.

3 Previous Work Done

See Appendix A for a review of work performed under BHSRE funding (grant no. HS-01544).

4 Specific Goals

The primary goals of our proposed work over the next five years are (i) to increase MYCIN's abilities to integrate large collections of facts and relations. The content of this knowledge base will be specific to infectious diseases. However, we view this as a case study of the larger problem of developing a methodology of knowledge engineering applicable to a range of disciplines; (ii) to develop techniques for using this methodology to provide a forum for formal specification of previously informal knowledge, as a means of encouraging consensus among experts in the field.
In keeping with these goals, five main foci of attention for our work will be:

(a) increase the system's competence, i.e., both the breadth and depth of the knowledge base

(b) provide knowledge engineering support tools to aid experts codify and test their inferential rules about the domain

(c) provide a number of human engineering features to ensure that the program is faster, easier, and, in general, more attractive to users

(d) transfer the system to a small, dedicated mini-computer to improve response time and enable exportability

(e) establish an on-going evaluation program to monitor the growth and convergence of the knowledge base, with the assistance of collaborating clinicians on the wards.

4.1 Competence

4.1.1 Breadth

The work to be done in the future development of MYCIN is illustrative of the expected needs of knowledge engineering programs in general. Our previous work has resulted in a program that is currently capable of dealing with bacteremia and meningitis, but for several reasons this is too narrow a range if it is to be useful in a research or clinical setting. One problem, for example, is that the physician must decide whether the patient is suffering from either of these before he can determine if MYCIN would be an appropriate source of advice. But a significant part of the diagnostic task is this determination of infection etiology. Requiring the physician to make this decision thus presents a significant barrier to use of the program.

A second problem arises from the interactions of multiple infections. Cases are often complicated by the presence of more than one infection, and it is not in general possible to consider each infection independently. To select precise therapy, MYCIN must be able to sort out the various sources of infection, and determine their influence on one another. In complex situations such as these, printed textbooks usually fail to cover all combinations of dependencies.
Thus a program that reasons about these situations can provide intelligent assistance to the researcher or clinician who wants to have expert-level advice.

Finally, our experience with new users of the system suggests that they can at times overlook explicit instructions concerning the program's capabilities, and present it with medical problems outside of its competence. It will prove very important for the eventual unsupervised use of the program, then, that MYCIN be able to recognize the limits of its capabilities, and respond appropriately. That is, like the human consultant, the program must be able to say "I don't know".

In response to these problems, we intend to work on three specific issues. First, we will extend the system's range of competence to cover both urinary tract and pulmonary infections. Based on analyses of infection frequencies seen at our medical center, the inclusion of urinary tract and pulmonary infections should permit MYCIN to handle 76% of hospital acquired bacterial infections and 64% of all bacterial infections. Strategies similar to those employed in our approach to bacteremia will be used to expand the system to include these important areas of infectious disease. In addition, it will be necessary to develop the ability to identify the underlying foci of infection, so that the program can bring to bear the appropriate subset of its knowledge of the field.

Second, we will extend the system's knowledge base to cover the prophylactic use of antimicrobial agents. This will be an especially useful area, since prophylaxis (defined as the use of antimicrobial agents before disease due to an infectious agent is present or before infection or colonization with an organism has occurred) represents one of the largest categories of use and abuse of antimicrobials.
There are circumstances, such as prevention of endocarditis in patients with underlying heart disease, in which such treatment can be justified. In most cases, however, financial costs and potential drug toxicity exceed the marginal benefits to be achieved, and prophylactic therapy is thus unwise. Kunin [13] reported that 58% of surgical patients in a major university medical center received prophylaxis, but such therapy was judged appropriate in only 38% of these cases. Thus, prophylactic use of antimicrobial agents represents a substantial fraction of antimicrobial misuse, and the inclusion in MYCIN of knowledge about this area would greatly enhance its clinical utility.

The final issue is the further development of the system's ability to recognize and convey its limitations. The current system has something of this already, and can recognize (in cases of bacteremia and meningitis) those situations when there is too little clinical or lab data available to draw any substantive conclusions about therapy. This will have to be extended to enable the system to recognize the situation in which the problem is not insufficient data, but insufficient knowledge about a medical problem outside of its domain of competence. Such a capability will also increase physician confidence in the system, since the physician knows that the system is capable of indicating its inability to advise.

Once MYCIN has this broader range of medical knowledge, along with the ability to select the applicable part of its knowledge base and the ability to recognize its own limitations, the system can be used with confidence. It can integrate a large amount of judgmental knowledge from experts and advise other researchers and clinicians about specific problems on the basis of that knowledge.
This offers a much greater assurance that MYCIN will be playing an effective role in health care research.

4.1.2 Depth

Experience with new users has also suggested that some of the questions asked by the system during the course of the consultation require too much judgment and sophistication on the part of the user. One question, for instance, inquires whether the patient "is febrile due to the infection". Since determining the source of a fever can be a difficult and subtle problem, this question presumes a great deal of the user. In addition, it was the shortage of exactly this sort of expertise among non-experts that motivated the choice of infectious disease as a domain and the design of MYCIN as a clinical consultant. If the program is to be useful, it should focus on objective data, and be able to rely on its set of rules to supply the judgmental knowledge necessary to make the difficult, subjective judgments.

In practice, this means that concepts like "febrile due to the infection" must be further decomposed to discover the grounds on which such decisions are made, and new rules written to embody those decisions. Each of those rules should be examined in turn, to insure that they do not require unreasonable levels of expertise from the user. In this fashion, the basis on which the program makes its conclusions (and hence the questions which it asks) will move away from "softer", subjective information, and toward more easily quantified objective data.

The point here is not to reduce the physician's role to that of simply entering data, since some of these subjective judgments are best performed by the physician.
We intend rather simply to increase the system's judgmental capacity and level of sophistication, as our fundamental aim is to create an effective symbiosis between physician and computer, making the best use of the talents of both. (This movement toward objective bases for decisions would also provide an effective solution to the problem of variations between users of the program. Especially where questions of judgment are concerned, there can be some variation in the answers to MYCIN's questions given by two clinicians running a consultation about the same patient. We expect that if the program were to request less subjective data, this variation would be much reduced.)

4.1.3 Disease Models

One important capability of a human consultant is the ability to detect and take appropriate action in response to inconsistent information. This appears to be based on a knowledge of what constitutes a "normal" constellation of symptoms for a particular pathology, i.e., a model of the disease. For example, consider the case of a 24 year old military recruit presenting with meningitis. History taken from the patient reveals that he has been recently exposed to other recruits with meningococcal disease, while physical examination shows areas of purpura over his entire body. However, gram stain of the CSF is interpreted as showing gram negative rods. An infectious disease expert would have the gram stain of the CSF of this particular patient reexamined to ensure that there had been no misinterpretation.

MYCIN currently has a very simple version of consistency checking, in that each individual answer given during a consultation is checked for validity. For instance, the system will challenge a response indicating an age of more than 100 years, or a white blood cell count of more than 30,000. But each of these is an independent test based on the entire possible range of each piece of data.
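The per-answer validity check described above can be illustrated with a minimal sketch. It is not MYCIN's actual code: the parameter names, the lower bounds, and the wording of the challenge are assumptions made for illustration; only the two upper limits (an age of 100 years and a white blood cell count of 30,000) come from the text.

```python
# Illustrative sketch only. Parameter names and lower bounds are assumed;
# the upper limits (100 years, 30,000) are the ones quoted in the text.
PLAUSIBLE_RANGES = {
    "age_years": (0, 100),
    "wbc_count": (0, 30000),
}


def challenge(parameter, value):
    """Return a challenge message if the value lies outside the full
    possible range for that parameter, otherwise None."""
    low, high = PLAUSIBLE_RANGES[parameter]
    if low <= value <= high:
        return None
    return (f"{parameter} = {value} is outside the expected range "
            f"{low} to {high}; please confirm.")


if __name__ == "__main__":
    print(challenge("age_years", 24))      # within range: no challenge
    print(challenge("wbc_count", 45000))   # outside range: challenged
```

Because each test looks at one answer in isolation, a check of this kind has no way to notice that a gram stain report of gram negative rods is surprising for the recruit in the example; that requires knowledge of the likely pathology.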
The program should have the same sort of disease models that human consultants seem to employ, to allow it to test the plausible validity of each piece of information in the context of the likely pathology. This would add an important capability to the program, if it were faced with a situation in which some particular piece of information seemed to be at variance with the current hypothesis about disease etiology. It could suggest to the clinician the possibility of a technical or clerical error in the lab report, and indicate that the test should be re-run. Where this was impractical due to considerations of time or expense, the inconsistent datum could justifiably be ignored in the remainder of the consultation. This ability to judge the likely validity of information within the context of the clinical situation is an important part of human performance on the task, and will make a significant contribution to MYCIN's competence.

4.1.4 Sensitivity Analysis

Extensive testing of the program on real cases has suggested two other types of reasoning ability that will markedly enhance the program's performance as an intelligent assistant. We noted above the program's ability to recognize the situation in which it has insufficient data to make a recommendation. In a similar situation, a human consultant does not simply indicate the lack of data, but goes on to suggest additional tests to run, and indicates exactly which pieces of information will be required before a conclusion can be reached. This is the first of the additional forms of reasoning the program should have -- it should be able to indicate the source of its inability to reach a conclusion, and determine what information is necessary before it can proceed. There is a large body of work in the field of decision analysis (see, e.g., [15]) that will provide a useful foundation for this.
Second, a human consultant may at times offer a recommendation with the warning that the evidence was contradictory, and even a slight change in the data might make a large difference in the final result. That is, he can indicate how sensitive his final answer is to small changes in the information on which it is based.

The fundamental mechanism on which MYCIN is based is particularly well suited to implementing both of these abilities. Since the system performs a step-by-step analysis of the case, with each decision expressed by one of the rules in the knowledge base (rather than a one-step probabilistic computation, for instance), MYCIN is capable of reviewing its own reasoning process, re-examining it, and making further decisions about it. Thus, if unable to reach a conclusion, it might re-examine the reasoning used to see what missing information prevented it from reaching an answer. Similarly, it might routinely re-examine its results at the conclusion of a consultation, to determine if any are sensitive to slight changes in the information about the case. If so, it might offer the physician a very specific warning, indicating exactly what changes should be made to its current recommendation in response to specific changes in information about the patient.

In the example given above, for instance, the system might indicate that the meningitis may, indeed, be of gram negative etiology, but that the validity of this diagnosis is based solely on the results of the gram stain of the CSF. The system would also note that an abundance of clinical data suggests the diagnosis may be meningococcal meningitis and that antibiotic coverage for neisseria-meningitidis should also be considered.
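The two forms of re-examination described in this section can be sketched on a toy rule base. Everything in the sketch is hypothetical: the rule format (a list of parameter/value premises and a single conclusion), the two rules, and the parameter names are invented to mirror the meningitis example and are not MYCIN's actual representation or medical content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    premises: tuple   # (parameter, required value) pairs, all must hold
    conclusion: str


# Two hypothetical rules mirroring the recruit example in the text.
RULES = [
    Rule((("infection", "meningitis"),
          ("csf_gram_stain", "gram-negative rods")),
         "gram-negative etiology"),
    Rule((("infection", "meningitis"),
          ("exposure", "meningococcal disease"),
          ("purpura", "yes")),
         "neisseria-meningitidis"),
]


def conclusions(facts):
    """Conclusions of every rule whose premises are all satisfied."""
    return {r.conclusion for r in RULES
            if all(facts.get(p) == v for p, v in r.premises)}


def missing_data(facts):
    """Parameters still unknown in rules that known data do not contradict."""
    needed = set()
    for r in RULES:
        if any(p in facts and facts[p] != v for p, v in r.premises):
            continue  # a known answer already rules this rule out
        needed.update(p for p, _ in r.premises if p not in facts)
    return sorted(needed)


def sensitive_data(facts):
    """For each known datum, the conclusions lost if it were withdrawn."""
    base = conclusions(facts)
    report = {}
    for p in facts:
        reduced = {k: v for k, v in facts.items() if k != p}
        lost = base - conclusions(reduced)
        if lost:
            report[p] = sorted(lost)
    return report


if __name__ == "__main__":
    case = {"infection": "meningitis",
            "csf_gram_stain": "gram-negative rods",
            "exposure": "meningococcal disease",
            "purpura": "yes"}
    print(missing_data({"infection": "meningitis"}))
    print(sensitive_data(case))
```

In this toy case the report shows that the gram-negative conclusion depends on the single gram stain datum, which is the kind of specific warning the text proposes. A real implementation would also have to weigh the strength of evidence rather than treat premises as simply true or false.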
4.2 Knowledge Engineering Support Tools

As is clear from the preceding discussion, much of the knowledge engineering work of increasing the system's competence involves ongoing development of the knowledge base, and requires constant re-testing and evaluation on real cases. We intend to develop several types of support facilities designed to speed this task.

4.2.1 Patient Library

An on-line patient library, for instance, will provide many useful features. It can offer a standard set of cases against which the knowledge base can be tested periodically, to insure that modifications and extensions to improve performance in one area do not inadvertently degrade performance in other areas. It can also offer a ready source of examples on which newly added rules can be tested.

The first step will be to provide efficient cataloging and access facilities, so that library contents are easily surveyed and retrieved. More sophisticated features would include automatic case selection. Since most changes to the knowledge base will have no effect on the majority of cases in the library, appropriate selection of test cases gains importance as the library size increases. With an automatic selection ability, the program would choose a range of relevant cases on which to test the modification, basing its choice on the nature of the particular modification made.

4.2.2 Knowledge Acquisition

A second important tool is the further development of the existing knowledge acquisition capability. The primary aim here is to provide a mechanism to allow the infectious disease expert to "educate" the program directly, and to build a large collection of rules without undue effort. Currently, most changes to the knowledge base are suggested by our clinical experts and effected by the programming staff. There is thus often a delay of a few days between the discovery of a problem and its repair.
By bridging the gap between the clinical expert (who communicates his ideas in English) and the system (which "understands" only programming languages), it becomes possible for the expert to make changes in the knowledge base by himself. He can thus make and test his changes in a few minutes, and see immediately if they improve performance. A system designed to do this has been constructed, and has demonstrated the utility of acquiring new knowledge directly from the expert, in the context of an existing shortcoming in the knowledge base [14].

But further development of these features is necessary. For instance, we intend to improve the existing record keeping facilities, to include extensive background information about all rules, giving such things as the name of the expert who wrote the rule, the motivation for adding it to the system, references to published literature which corroborate the conclusions it draws, and a history of modifications made to it. For the expert extending the knowledge base, this provides a "scratch pad" of sorts, making the ongoing task of knowledge base development a good deal easier. For the clinician using the system, it means increased confidence in the advice offered, since not only can that advice be explained, but there will be literature references available for each step in that explanation.

A second improvement would be a more powerful "rule editor" that would make it easy for the clinical expert to make any of a number of common changes to rules in the knowledge base. He could then make small changes without going through the more extensive routines necessary to re-write the rule.

4.2.3 Testing the Effect of Adding a New Rule

As the knowledge base grows significantly larger, we will encounter new problems of modifying and using it. Testing the effect of a new rule, for instance, is currently done with empirical techniques, as indicated above, by running the new system on a large number of cases.
However, it may eventually become impractical to do this as the knowledge base gets very large, since too many cases may have to be tried. Hence the empirical techniques should be supplemented with analytical techniques, in which the system examines its own knowledge base to determine what effect adding the new rule may have. This is, once again, made feasible by the particular rule-based representation of knowledge that we use.

4.2.4 Strategies

Problems in the use of a very large knowledge base may arise because, currently, the system tests every rule for relevance to the patient being discussed. This may eventually become impractical as the knowledge base gets quite large. It may then prove necessary to add to the system a number of strategies which allow it to apply its rules more selectively. We have developed a mechanism for the expression and use of these strategies, and plan to begin assembling and testing a number of them to improve MYCIN's performance.

4.3 Human Engineering and Clinical Capabilities

The common reluctance of researchers and physicians to accept computers as intelligent assistants presents a challenging design problem. It means that a high level of performance alone is insufficient to assure that a program will have an impact on health care research and practice. We must present the physician with a program that is similar in some respects to the source of advice he is used to, the human consultant. It was this that motivated the explanation facilities in MYCIN, since we recognized early in the program's development that physicians were unlikely to accept dogmatic advice from a program without further explanation of its basis. We will continue developments of this sort, to insure that the system is not only a competent consultant, but one that is "friendly" and easy to use.
4.3.1 Dose Modification in Renal Failure

As one example of a new development to increase the system's utility, we will be developing new uses for the routines which modify drug dose in renal failure. They are currently invoked when the therapeutic regimen is printed, near the end of the consultation. But the problem of dose modification in renal failure is a common one, and the computation required is a complex operation. Thus a physician may be reluctant to undertake the necessary computation for a regimen he may have selected on his own. In response, we intend to make the dose modification routines available as a separate option in MYCIN. A physician would be able to request a "mini-consultation" concerned solely with renal failure and dose modification.

This is one example of a more general idea: providing a number of small, but highly useful auxiliary routines that can assist the clinician with many of the necessary tasks he must perform in administering antimicrobial therapy. We believe that the physician's bias against computers may not be so strong where straightforward mathematical computations are concerned. These simple-to-use utility routines can provide the initial inducement to the physician to use the computer, and may eventually encourage him to view our entire system as a useful tool in patient care and disease management.

4.3.2 Improvements in Data Collection

Our experience with new users of the system also indicates that physicians tend to become impatient with the system's current approach to data collection. They are used to offering the consultant a brief summary of the case that compacts a great deal of important information into a few sentences. Since the problem of having a computer understand ordinary English is well known to be very difficult, we have instead settled for having the program request each piece of information individually.
We will be making several changes in this process to speed it up. The Progress Report section above mentioned a revision to the organization of the consultation that offers several advantages, including faster, more uniform data entry. This process will be simplified still further, by tabularizing it. That is, instead of answering each of a number of individual questions, the physician will be presented with a table he can complete, one that will have room for the necessary information. This should make the process even easier. The remainder of the consultation will then consist of a relatively few questions that are specific to the case under consideration.

4.3.3 Facilitating Communication

These innovations will speed the process considerably, but the problem of typing ability remains a barrier to convenient use of the system. In response, we have begun to explore new forms of data entry. One possibility is the use of a customized keyboard that would make it possible to enter an answer like pseudomonas-aeruginosa with a single keystroke. Another is the use of a "response completion" feature. Using this, the physician need only type enough of his response to make it unambiguous, and can then indicate that the system should finish it. With this feature, he may only have to type pseu, and can leave the remainder to the system. There is also the possibility of using a more sophisticated type of terminal, perhaps one equipped with a "light pen", a pointer-like device that allows the user to point to items displayed on the terminal screen. Any or all of these facilities will insure that unfamiliarity with a computer terminal, or lack of typing ability, will not present a problem for persons who use the system.

Techniques like these help to facilitate the communication from the physician to the system. There is an analogous problem of ease and clarity of communication in the other direction, from the computer to the physician.
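As an aside on the "response completion" feature mentioned above, the idea can be sketched as a unique-prefix lookup over the set of acceptable answers. The vocabulary below is a small hypothetical list chosen for illustration (only pseudomonas-aeruginosa and neisseria-meningitidis appear in this proposal), and the function name is an assumption, not part of the existing system.

```python
# Hypothetical list of acceptable answers to an organism-identity question.
ORGANISMS = [
    "pseudomonas-aeruginosa",
    "proteus-mirabilis",
    "neisseria-meningitidis",
    "klebsiella-pneumoniae",
]


def complete_response(typed, vocabulary=ORGANISMS):
    """Return the single vocabulary entry that starts with what was typed,
    or None if the prefix matches no entry or more than one."""
    prefix = typed.strip().lower()
    if not prefix:
        return None
    matches = [term for term in vocabulary if term.startswith(prefix)]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    print(complete_response("pseu"))  # unambiguous prefix
    print(complete_response("p"))     # ambiguous: two entries start with "p"
```

With this vocabulary, "pseu" is enough to select pseudomonas-aeruginosa, while "p" alone is ambiguous and would require more typing. How ambiguity is reported back to the user is a design choice left open here.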
We have found that some of the explanations the system offers to validate its conclusions extend to several lines of text. These are occasionally long and verbose enough that reading them can interrupt the flow of the consultation. We will explore the possibility of replacing these text-based responses with answers oriented around graphics capabilities. Given the natural interpretation of the use of rules during a consultation as the exploration of a reasoning tree, we believe this can provide an especially effective means of communication. An answer that requires several lines of text at present could easily be expressed with a simple diagram that made quite clear the system's motivation for asking a particular question, or the foundation for any particular piece of advice it offered. This would be a significant improvement in the clarity of communication between the system and expert.

In the past we have purposely avoided the use of any specially equipped computer terminals (e.g., those with light pens or graphics capabilities), in order to insure that the final version of MYCIN was easily exportable to a wide range of physician communities. With the progress in technology, however, it has become clear that many advanced features are becoming routinely available on inexpensive terminals that can be used over standard phone lines. We can take advantage of these new developments to make major improvements in the speed and ease of communication with the program, without requiring that each user make any large investment in specialized equipment.

4.4 Exportability of the System

We see, within the five-year time scale of this proposal, a change in the character of our work, resulting from the growing importance of "hands-on" involvement by clinical experts.
In the initial phases we have concentrated on building the basic methodology -- the production rule encoding of decision criteria, along with techniques involved in using them (the consultation, explanation, and knowledge acquisition systems). We are still involved in this phase, and as noted above, will continue to develop these ideas and programs. There is of necessity, therefore, a very close connection between our work and that of the experts who are codifying their knowledge of the field.

But our framework has begun to converge on a solid foundation, as changes to the basic methodology have become far fewer and further between. By year 04 of this proposal we foresee having a solid enough foundation that it can be adopted by outside experts as a basis for codifying their knowledge of the field, largely independent of our own continued development work.

This added dimension of the research -- clinicians and researchers working directly with our system to develop agreed-upon collections of judgmental decision criteria -- will put a significant strain on the available resources. We are presently running on the computer at the SUMEX-AIM research facility supported by the Biotechnology Resource Program (under Grant RR-00785), and although the system is loaded close to capacity, we find it an effective facility for our program development research. The clinical experts, however, will need to use the results of our work (i.e., the programs we develop) as the medium for their own research. If their work is to be effective, and their continued involvement assured, they must be given high performance tools that provide a speedy response. We do not believe that SUMEX can now provide that additional level of research support, nor do we believe it is within the scope of the SUMEX-AIM charter to support widespread use of programs in a service mode.
In addition, the long-term impact of our research is not likely to be very widespread if our system is available only on a very large and expensive computer. Given the potentially wide range of applicability of the proposed work, we believe it important in the long run to provide a relatively inexpensive, exportable system.

Finally, we are currently relying on the SUMEX facility for both aspects of our work (methodology development by the computer scientists and knowledge base construction by the clinicians). Even now the latter computational load is large enough that moving it to a separate system would be an important contribution to reducing the burden on SUMEX.

As a result of these problems, we recognize the need for some additional means of exporting the system to the community for whom it is intended. There are two alternatives we are exploring in conjunction with the SUMEX facility staff: machine independent implementation of the programs, and moving the programs to a satellite computer with many of the capabilities of a PDP-10.

The MAINSAIL language is currently under development at the SUMEX facility as a machine independent programming language which will make possible wide dissemination of programs. Programs coded in this language will require little conversion effort to run on other computers. As a practical matter, however, this approach seems best suited to the design of new program systems. It does not now appear to be a desirable solution for exporting programs the size and complexity of MYCIN, due to the magnitude of the reprogramming task.

Another alternative, which is still consistent with MAINSAIL implementation, is the use of mini-computers that could be added to SUMEX as satellites. In this approach, one of the "large minis" currently under development by DEC would be set up as a peripheral to the main system, sharing the file system and other I/O devices, but with its own memory and CPU to provide additional computing power.
With the cost of such a system currently projected in the range of $250,000, it presents an adequate solution to the problem at a much smaller investment.

If the satellite machine were capable of running INTERLISP (or a close dialect), we would realize several other advantages as well. With a direct hardware connection between machines, and minimal software conversion necessary, the two phases of our work (research on methodology and development of knowledge bases) can proceed in parallel. This also provides an effective mechanism for feedback from the experts on their experience in building the knowledge base, and means we can more quickly incorporate their suggestions and ideas into our research work.

There is of course an unavoidable degree of uncertainty in these plans. There is not now on the market an "off the shelf" mini-system that meets our needs. However, several current and projected developments combine to make this a reasonable prospect. First, work is currently underway at Bolt, Beranek and Newman (Cambridge, MA) on a version of INTERLISP for the PDP-11. This software development is underwritten by the Advanced Research Projects Agency (ARPA) of the Department of Defense, and will be available to the ARPA research community, which includes several of the projects at Stanford. In addition, the BBN work includes the development of an augmented PDP-11/45 as a hardware facility for running their system. We believe that an off the shelf system would be more desirable in the long run than the BBN hardware, which is in part "home grown". But the existence of such a system, and the availability of the software to run it, is an important demonstration of the feasibility of our plans.
In addition, the work underway at MIT on a "LISP machine" (a computer designed specifically to run LISP code, and intended to be price competitive with existing mini-machines) is another demonstration of the practicality of our plans, and another possible source for the facility.

Finally, recent reports in the trade press (see Appendix B) indicate that commercial manufacturers will soon be offering a machine of the size and architecture needed to run our system, at a price in the range quoted above. Thus while we cannot now specify the precise piece of hardware which will provide the facility required, the developments noted indicate that it should be commercially available at about the same time as our projected need for it. We thus feel that, despite the uncertainty of projecting four years ahead, the necessary hardware and software will be available at an attractive price.

4.5 Performance Evaluation

In order to demonstrate the effectiveness of the MYCIN framework for codifying knowledge for research scientists, we will need to demonstrate that a disparate group of scientists can communicate with a growing knowledge base, find their points of disagreement, and reach consensus on a common expression of their knowledge. As a check on whether the experts converge on a correct set of rules, we will also need to demonstrate that the resulting system comes to expert-quality decisions on difficult cases.

Dr. Axline will be our initial outside collaborator. Because he understands the system and was instrumental in developing the bacteremia knowledge base, long distance collaboration will be less difficult than with any other infectious disease expert. He will be a major source of knowledge about prophylactic uses of antibiotics, for which the Stanford group will act as critics. The Stanford group, on the other hand, will be the primary source of rules about urinary tract and pulmonary infections, which Dr. Axline will then criticize. The knowledge engineering tools that we now have for examining the knowledge base constitute the minimal capabilities we need for long distance interaction. As soon as the University of Arizona and Stanford groups converge on these three initial rule sets, we will extend the collaborative community to experts at other institutions.

Our evaluation activities will concern two areas, paralleling our two central goals.

4.5.1 Evaluation of MYCIN's Performance in Infectious Disease

During the next three years we will implement a formal program of performance evaluation, to insure the maintenance of a high level of performance in areas currently within MYCIN's expertise, and to aid in extending that performance.

Maintenance of existing performance levels will depend primarily on the patient library mechanisms described earlier. Using some of the advanced facilities we will be developing, it will be possible to have the program running unattended in the evenings, testing modifications to the knowledge base by selecting cases from the library, and comparing the new answers with the expected results. The system will be able to run and test a number of cases overnight, and make available in the morning a detailed report of the results.

Extensions to the system will be made with the help of infectious disease fellows. By making the program available to them, we hope to profit by their extensive use and testing of it, to suggest new additions to the system and to uncover weak points.

4.5.2 Evaluation of MYCIN's Impact on Consensus Among Experts in the Field

We will also measure several different factors over an extended period to determine the effect of our system on consensus among experts in the field.
First, we will keep track of the rate of changes made to the knowledge base, in terms of both the growth (addition of new material) and modifications (changes to existing material). Our premise is that a decrease in changes (perhaps even to zero) indicates that the experts using the system have come to an agreement on the basic decision criteria to be used and the appropriate answer for each case. We will measure the completeness of the knowledge base by the number of counterexamples (proposed either by the experts, or perhaps ultimately by the system itself) that force addition of new rules or changes to existing rules.

While a decrease in changes to the knowledge base and number of counterexamples may suggest a consensus has been reached, it is important to verify that the agreed-upon set of decision criteria is in fact correct. For this reason, we will also monitor the correctness of the knowledge base by evaluating the quality of MYCIN's conclusions. This will be done by asking other experts to rate the appropriateness of MYCIN's conclusions and recommendations.

This will also help us to measure the variability between experts. In infectious diseases, as in any other growing discipline, there is still some disagreement among experts as to what the "best" recommendations should be. We will measure this variability by proposing several cases to a panel of experts and asking for their opinions about MYCIN's and each others' recommendations. This measure will be important in determining the level of consensus before and after interaction with our programs. A decrease in this inter-expert variability will provide an indication that interacting with our system compels the expert to recognize explicitly the criteria that should be employed in reaching a decision, and hence provides an effective forum for discovering variations in those criteria among experts.
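Two of the measures described above lend themselves to a simple illustration: the rate of changes to the knowledge base per period, and the variability between experts rating the same cases. The sketch below is one possible bookkeeping scheme under stated assumptions. The log format, the rating labels, and the use of pairwise disagreement as the variability measure are all hypothetical choices, not the evaluation protocol itself.

```python
from collections import Counter
from itertools import combinations


def changes_per_period(change_log):
    """Count additions and modifications in each period.

    change_log is a list of (period, kind) pairs, where kind is
    "addition" or "modification" (an assumed log format).
    """
    counts = Counter(change_log)
    periods = sorted({period for period, _ in change_log})
    return {p: {"addition": counts[(p, "addition")],
                "modification": counts[(p, "modification")]}
            for p in periods}


def pairwise_disagreement(ratings):
    """Fraction of expert pairs giving different ratings, over all cases.

    ratings maps a case name to a dict of expert -> rating label.
    Returns None if no case was rated by at least two experts.
    """
    differing = 0
    total = 0
    for by_expert in ratings.values():
        for a, b in combinations(sorted(by_expert), 2):
            total += 1
            if by_expert[a] != by_expert[b]:
                differing += 1
    return differing / total if total else None


if __name__ == "__main__":
    log = [(1, "addition"), (1, "addition"), (1, "modification"),
           (2, "modification")]
    print(changes_per_period(log))
    panel = {"case-1": {"A": "appropriate", "B": "appropriate",
                        "C": "inappropriate"},
             "case-2": {"A": "appropriate", "B": "appropriate",
                        "C": "appropriate"}}
    print(pairwise_disagreement(panel))
```

Under this scheme, a falling change count from one period to the next, together with a falling disagreement fraction measured before and after experts interact with the system, would be read as evidence of convergence. As the text notes, the correctness of the agreed-upon criteria still has to be judged separately.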
5 Significance of the Research

By assembling the program's knowledge base of rules, we will arrive at a compilation and systematization of the current knowledge of infectious disease diagnosis and therapy. While any one expert may be able to supply only a part of that entire collection, by calling on the services of many experts it should be possible to construct what may become a unique reference source for currently accepted practice.

A system such as MYCIN can provide a source of consistent, up-to-date consultative advice, available at all hours to any physician with a computer terminal and a telephone. It can be systematically modified to reflect regional differences in clinical practice, and quickly updated to take advantage of progress in medical research. We believe that in the long run it can favorably affect the prescribing habits of physicians, resulting in better medical care.

In addition, the system may have a significant educational impact. It is prepared to offer a detailed explanation for every step in its diagnostic process, and can also answer more general questions about its knowledge of the field. These explanation and question answering capabilities not only assure the clinician that the program reaches its conclusions by a reasoning process similar to his own, but can provide a strong instructive influence for the student.

Finally, where most attempts at quality assurance are retrospective and involve mechanisms like chart review, MYCIN offers the possibility of prospective assistance. This is not only effective in maintaining quality, but by offering assistance before treatment is initiated, can have a more immediate impact on health care practice. Prospective intervention is also likely to meet with greater physician acceptance, since it offers him an opportunity to obtain advice before acting, encouraging him to avoid making mistakes rather than pointing them out after the fact.
MYCIN may also be useful in situations where chart review remains the preferred technique for quality assurance. A common problem with the standard approach is that it requires either subjective judgments and a significant time investment by the very specialists whose expertise is in short supply, or the use of a single set of global criteria by which to evaluate performance, promoting what has been called ‘stereotyped medicine’. The existence of a program whose performance was known to be of high quality would provide an effective solution. House staff could conduct the chart review (freeing the specialist), and the system would provide a perfectly repeatable, objective standard by which to judge performance. Note that MYCIN is currently capable both of making specific conclusions on the basis of each case individually, and of offering an assessment of the range of possible causative organisms and therapeutic regimens. It thus becomes possible to evaluate performance on individual cases, rather than setting global (usually statistical) standards, and to judge the accuracy of a range of answers.

6 Facilities Available

The Stanford University Medical Experimental Computer (the SUMEX system) is a dual-processor, time-shared Digital Equipment Corporation PDP-10 available via both a number of direct-dial phone lines and the TYMSHARE national network of telephone lines. The system is a National Biotechnology Resource for applications of Artificial Intelligence to Medicine (AIM). MYCIN is one of the research projects accepted as part of the national AIM community, and given access to the system at no cost. Since all of MYCIN’s development for the past three years has been on the SUMEX system, this represents a significant saving.
The Stanford University Medical Center and Computer Science Department and the University of Arizona Medical Center are both involved in this work as a result of the participation of the co-principal investigators and Dr. Stanton Axline and associated clinical fellows. As noted above, we have used the Stanford Center as a source of both cases on which to test the system and physicians who can evaluate its performance, and will involve the faculty and fellows of both Centers in ongoing development and evaluation programs.

7 Collaborative Arrangements

Dr. Axline has been a part of the project since its earliest days, and along with the principal investigator functioned as co-principal investigator during the initial three years of our work. He will continue to direct the University of Arizona portion of the project, acting as a primary source of infectious disease expertise, to improve the performance of the system. In addition, he will help design and carry out our evaluation program. This will offer the added benefit of giving us two different clinical groups contributing to knowledge base development, as well as a new patient population for program evaluation.

The grant to SUMEX makes explicit the importance of collaborative scientific work, and to further this the SUMEX staff have provided a number of support facilities that make joint work more feasible. One of them is a collection of message handling programs which make communication from remote sites quite easy. Other facilities make it possible for one user to run a program while another user (anywhere else in the country) “watches over his shoulder,” perhaps offering advice and evaluation. We expect as a result of all these factors that continued collaboration with Dr. Axline will offer significant advantages.
8 Appendix A: Progress Report Submitted to BHSRE

8.1 Summary

Over the past three years we have designed, built and partially evaluated a computer program capable of diagnosis and therapy selection for certain varieties of infectious diseases. The program is intended to function as a consultant, and “interviews” a doctor about his patient, requesting information on clinical findings and results of laboratory tests. It relies on a store of judgmental knowledge (obtained from experts in infectious disease) to determine the conclusions which can be drawn from the answers it receives. This judgmental knowledge is in the form of some 400 decision rules dealing with the wide range of topics that must be considered in determining the likely identity of causative organisms and selecting appropriate antimicrobials. MYCIN is composed of the three systems described earlier (the consultation, explanation, and knowledge acquisition systems), all of which reference the knowledge base of decision rules.

The program is currently capable of dealing with bacteremia and meningitis infections. It can diagnose the likely presence of more than 35 different organisms and can recommend therapy for 100 organisms, selecting drugs from a ‘pharmacopoeia’ of 30 antimicrobials. The system can tailor its therapy recommendations to a specific organism and infection, can adjust dosage levels and durations in response to impaired renal status, and can combine drugs to create combination therapies, giving it a wide range of clinical applicability.

8.2 Detailed Report

Our work in the past several years has been organized around five main areas of investigation.
We have

a) increased the system’s competence in existing areas of clinical expertise while expanding its scope

b) developed a number of user-oriented features to increase the program’s attractiveness to clinicians

c) developed a range of knowledge acquisition capabilities to speed the process of expanding the system’s clinical competence

d) solved a number of technical problems to insure that the program does not outgrow the computer resources available to it

e) evaluated the system’s level of expertise.

8.3 Clinical Capabilities

Since the primary qualification for any clinical consultant is competence in the domain, we have devoted significant effort to expanding MYCIN’s knowledge base and widening its scope of competence. For instance, while the system was directed initially at patients with positive blood cultures, the basic methodology was generalized to support a much broader approach to the problem. MYCIN has now gained the ability to deal with infections from which the causative pathogen hasn’t been isolated (e.g., pneumonia), or which haven’t even been cultured (e.g., brain abscess). With this broadening of scope, it has also become necessary to be able to evaluate the meaningfulness of isolates for cultures taken from sites other than blood. For urine and sputum isolates, for example, the system gained the ability to base its evaluation of sterility of an isolate on both the method of collection and the user’s estimation of conscientiousness of collection.

An extensive review of the program’s approach to drug selection has led to a major revision in the basis for therapy selection during the course of program development. The program was given the ability to consider both the infectious disease diagnosis and the significance of the organism as further determinants of therapy, in addition to organism identity.
These three together have become the primary factors in drug selection, with drug toxicity and ecological factors as secondary considerations. The result is a more appropriate, more sharply focussed drug selection that also includes dose, route, and duration.

While the initial development of the knowledge base focussed on rules concerned with the diagnosis and therapy for blood infections (bacteremia), the complexity of infectious disease therapy and the frequent occurrence of multiple infections in a single patient require a broader knowledge if the system is to be clinically useful. In response we have extended MYCIN’s knowledge base, while at the same time improving the degree of sophistication with which the system deals with bacteremia.

The second major area has been the diagnosis and treatment of meningitis, and more than 100 rules were added to provide the ability to deal with it. In the process, the program was also extended beyond bacteria, as it gained the ability to consider and treat both fungi and viruses. This area has proved to be an especially useful domain because it has presented several new challenges. In particular, meningitis requires the ability to deal with a disease that is often diagnosed on clinical grounds alone, before any specific microbiological evidence is available. (By comparison, the diagnosis of bacteremia on clinical grounds alone is far less certain, and usually requires establishment of the fact that bacterial growth has occurred in blood cultures.) For this reason, extension of the project into the meningitis area has made it necessary for MYCIN to consider a larger range of clinical factors, and has resulted in a system which has a broader picture of the whole patient.
Other contributions to the system’s competence have come from expansion of the knowledge base to include information about normal bacteriological flora for a wide range of culture sites. This enables the program to distinguish between normal and pathological flora, and it can as a result decide more precisely on whether to treat.

8.4 User Oriented Features

Clinicians traditionally shun computer programs, and we believe this is in large measure due to insufficient attention paid to user oriented features. As a result, we have devoted significant effort to insuring that MYCIN is responsive to its users in a number of unique ways.

The development of the explanation and question answering capabilities has been essential for this work, and both have grown extensively in power. The system’s ability to explain the motivations for its questions, for instance, underwent a major design revision. It is now based on a more powerful approach that relies on the program’s knowledge of its own control structure and ability to examine its own rules. The user can now fully explore the system’s current line of reasoning, rather than just a single level, as initially implemented.

The language understanding capabilities of the question answering system have also been extensively revised. They now allow a broader range of questions to be asked and offer more precise answers. The use of this feature was also simplified so that the user no longer needs to classify his questions.

A comprehensive review of the kinds of questions asked by users of the system has led to a number of important features. MYCIN can now answer a much wider range of questions, and can, in particular, explain why it did not take a specific action, as well as why positive conclusions were reached. It is our feeling that capabilities such as these are of great importance in enabling the project’s staff and clinical experts to understand
the program’s rationale for its actions in instances where its recommendations do not appear to be the most appropriate and most correct. Thus, the line of reasoning of the program can be evaluated, and requirements for new or modified rules can be uncovered. These kinds of capabilities are also important in optimizing user acceptance of the system.

A substantial addition to the question-answering facility enables the system to explain the process of therapy selection. In comparison to the diagnostic process, therapy selection is complicated somewhat by the need to consider a range of different factors simultaneously, such as the total number of drugs recommended, the degree of sickness of the patient, possible interactions between drugs, toxicity and other side effects, etc. Despite this complexity, explanations of therapy selection are phrased at a conceptual level that makes them comprehensible to the physician. As before, this makes it possible for the physician to verify the validity of the system’s decisions, and makes it clear to him that the system reaches its results in much the same way that he does.

The explanation consists of a step-by-step review of the reasoning which led to recommending a particular drug for a specific organism. It considers such issues as why a drug was first considered for an organism, why a drug may have been chosen as the best therapy for that organism, how the total number of drugs was reduced by considering common drug classes among the candidates, and which contraindications were considered on the basis of the patient’s allergies, age, and other factors. By characterizing each drug according to this scheme, the program can explain why a drug was or wasn’t prescribed, as well as why one drug is to be preferred over another. This offers an important explanatory capability that will make the system more attractive and acceptable to clinicians.

Several capabilities have been added to make the program easy to use.
The system is now more tolerant of erroneous or inappropriate responses, and is able to provide a reworded question, along with a list of acceptable answers. In addition, it has the ability to recognize responses which are not sufficiently precise, and can rephrase its questions accordingly.

We have recently added to the system the ability to modify drug dosage in cases of renal failure. Where, previously, the system only issued a warning to modify doses, it is now able to use either creatinine clearance or serum creatinine levels to compute the level of renal function. The program then uses drug-specific information (e.g., half-life, percent loss of the drug via renal excretion, etc.) to adjust the regimen. It can either (a) adjust dose levels downward and leave the dosing interval unchanged, or (b) increase the dosing interval and leave levels unchanged, or (c) allow the physician to select a dose interval, for which it chooses an appropriate dose level. Since the problem of determining renal status and the proper adjustment of drug dose is important in the use of aminoglycoside antibiotics, cephalosporins, and other antimicrobial agents, the customization of drug dosage recommendations will be an important addition to the power of the system.

We have found, in addition, that there is a substantial amount of information that is routinely collected in every consultation, like the date and site of each of the cultures, Gram stain and morphology results for each of the organisms that grew out, etc. Currently, the program exhaustively analyzes each culture and all of its organisms in turn. Some users of the program appear to be impatient with this method, and would much prefer to enter all the relevant data on all the cultures and organisms at once. This is faster and easier, since the information can be gathered in a single review of the chart, instead of having to review it several times as each culture is processed.
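The three renal adjustment options (a), (b) and (c) described earlier in this section can be pictured with a short sketch. It is not MYCIN’s actual computation, which the text does not give. It assumes a common textbook proportional rule, in which the fraction of the normal dosing rate is Q = 1 - f(1 - CrCl/CrCl_normal), where f is the fraction of the drug lost by renal excretion. The reference clearance of 100 ml/min, the `Regimen` type and all function names are assumptions made for illustration, and the creatinine clearance is taken as given rather than estimated from serum creatinine.

```python
from dataclasses import dataclass

# Assumed reference value for normal creatinine clearance (ml/min).
NORMAL_CRCL_ML_MIN = 100.0


@dataclass
class Regimen:
    dose_mg: float
    interval_h: float


def dosing_rate_fraction(crcl_ml_min, fraction_renal):
    """Q = 1 - f * (1 - CrCl / normal): share of the normal dosing rate to give."""
    if not 0.0 <= fraction_renal <= 1.0:
        raise ValueError("fraction_renal must lie between 0 and 1")
    kidney_function = min(max(crcl_ml_min / NORMAL_CRCL_ML_MIN, 0.0), 1.0)
    return 1.0 - fraction_renal * (1.0 - kidney_function)


def reduce_dose(normal, q):
    """Option (a): lower the dose, keep the dosing interval."""
    return Regimen(normal.dose_mg * q, normal.interval_h)


def extend_interval(normal, q):
    """Option (b): keep the dose, lengthen the dosing interval."""
    if q <= 0.0:
        raise ValueError("no finite interval when the dosing rate fraction is zero")
    return Regimen(normal.dose_mg, normal.interval_h / q)


def dose_for_interval(normal, q, chosen_interval_h):
    """Option (c): the physician picks the interval; pick the matching dose."""
    adjusted_rate_mg_per_h = normal.dose_mg / normal.interval_h * q
    return Regimen(adjusted_rate_mg_per_h * chosen_interval_h, chosen_interval_h)


if __name__ == "__main__":
    # Hypothetical drug: 100 mg every 8 h, 80% renally excreted, CrCl 50 ml/min.
    usual = Regimen(dose_mg=100.0, interval_h=8.0)
    q = dosing_rate_fraction(crcl_ml_min=50.0, fraction_renal=0.8)
    print(reduce_dose(usual, q))
    print(extend_interval(usual, q))
    print(dose_for_interval(usual, q, chosen_interval_h=12.0))
```

All three options deliver the same average amount of drug per hour; they differ only in how that amount is split between dose size and dosing frequency, which is why the choice can be left to the physician.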
In response to this preference, we have reorganized the consultation slightly, so that it is possible to enter all of this data at once, at the beginning. This offers two other advantages in addition to improving the program’s acceptability to its users. First, it provides a basis for our future efforts to write rules which deal with interactions between infections (see below, “Specific Aims”), and second, it suggests a mechanism for eventually merging our work with the product of existing efforts to organize and automate the recording and handling of medical record data. This latter development may in time make it possible for MYCIN to obtain a large part of the information it requires directly from such automated records, sharply reducing the number of questions it has to ask, and speeding up the consultation considerably.

Finally, several new capabilities make the system convenient to use, in anticipation of its evaluation in the clinical setting. Among these is the option for the user to type a comment about system performance at any time during the consultation. His comment is recorded in a special file which is reviewed periodically by our medical staff, and provides an ongoing opportunity for users to offer feedback aimed at improving the usefulness of the system. The user can also indicate his belief that the system has “broken down” in some way, and he is invited to describe the problem. His description is saved along with information about the current state of the program, so that our systems programmers can deal with the problem later.

8.5 Knowledge Acquisition

A preliminary knowledge acquisition program was completed in the middle of 1974, and demonstrated the feasibility of having a physician teach the system new rules using a rather stylized subset of English.
Building on the experience gained here, work began on a revised program designed to allow the user to examine and modify the program’s knowledge and behavior as a single, unified action. This program was designed to make the explanation and knowledge acquisition capabilities available together, to make use of the fact that the nature of the explanations requested can give a clear hint about the content of a new rule. The program was also designed to advise the user about the effect of his rule on the original deficiency, indicating, for instance, whether or not it corrects the problem he noticed.

Work on a preliminary version of this new program was completed in 1976, making available a broad range of useful features enabling our clinical experts to add rules to the system without requiring that they have a knowledge of programming. If the expert finds that MYCIN’s handling of a particular problem is at variance with his own expert knowledge, he can use the explanation capabilities to discuss the line of reasoning in use at that time, can add or modify rules in the knowledge base, and can determine the effects of the changes on MYCIN’s subsequent performance. (Quality control is maintained on the overall system by regular meetings of our clinical and pharmacological experts who determine the “official” MYCIN knowledge base.)

8.6 Technical Issues

As MYCIN’s clinical capabilities have expanded, efficiency has improved as a result of a number of modifications to the system’s technical capabilities. Early in our work, for instance, a comprehensive review and modification of the control structure was undertaken to improve efficiency and generality. The resulting program was both more direct and faster. More recently, modifications have been made so that the large English dictionary can be kept on the disk and accessed only as needed, rather than being held in core, which slowed the system’s response.
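The idea of keeping the dictionary on disk and reading entries only as needed can be illustrated with a small modern analogue. The sketch below uses Python’s standard `shelve` module as a stand-in for whatever mechanism the original system used on the PDP-10, which the text does not describe. The class name, function names and sample entries are hypothetical.

```python
import os
import shelve
import tempfile


def build_dictionary(path, entries):
    """Write word -> entry pairs to an on-disk shelf (done once, ahead of time)."""
    with shelve.open(path, flag="n") as shelf:
        for word, info in entries.items():
            shelf[word.lower()] = info


class DiskDictionary:
    """Looks up one word at a time from disk instead of loading the whole table."""

    def __init__(self, path):
        # Open read-only; nothing is read into memory until a lookup is made.
        self._shelf = shelve.open(path, flag="r")

    def lookup(self, word):
        # Only the requested entry is fetched; unknown words return None.
        return self._shelf.get(word.lower())

    def close(self):
        self._shelf.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "english_dictionary")
    build_dictionary(path, {"organism": "noun", "culture": "noun"})
    dictionary = DiskDictionary(path)
    print(dictionary.lookup("Organism"))
    dictionary.close()
```

The trade-off is the one the text points to: each lookup costs a disk access, but main memory is not tied up by a large table that is consulted only occasionally.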
The self documenting features of the program have also been improved to make them faster, and the system’s interaction with the terminal has been made more uniform, to prepare for the time when different users of the system may have various different kinds of terminals.

8.7 Evaluation Activities

Since clinicians are likely to require documentation of MYCIN’s competence and utility before seeking its advice, considerable time has been spent on evaluating the system and on implementing a range of program features to support these efforts.

In the past two years we have obtained many useful suggestions from clinicians when the system was presented to several different conferences. In February 1975 it was presented to the Western Society for Clinical Research, in September 1975 to the International Symposium on Clinical Pharmacy and Clinical Pharmacology, and more recently (June 1976), it was presented to the Drug Information Association.

A large scale formal study and evaluation of MYCIN’s performance was begun in January 1976. The same set of clinical data was provided to both MYCIN and a set of experts in infectious disease therapy. [Five of the experts were nationally recognized authorities in the field; the other five were clinical fellows in the Infectious Disease Division at Stanford. A complete list of names, titles and affiliations is found in the list of evaluators at the end of this report.] The judgments of the program and the experts were compared, and the experts were asked to evaluate MYCIN’s performance.

To do this, we first designed a form to allow us to separate the variables requiring analysis. The parameters evaluated include

A. the “quality” of the interaction - were any questions irrelevant or missing

B. the program’s ability to determine organism identity

C. the program’s ability to determine organism significance

D. the program’s ability to select proper therapy

E. overall performance evaluation

F. potential impact as a clinical tool or teaching facility

The evaluation form was designed to be informative yet simple to complete. It was tested in a pre-evaluation trial run, then used for the formal study. Consecutive patients with positive blood samples were evaluated for inclusion in the study by project personnel, until we obtained at least 10 patients for which MYCIN recommended therapy, and 15 patients overall. (Patients were rejected if they were outpatients when the sample was drawn, if they had a previous blood culture in the preceding seven days, or if they had a diagnosis of meningitis or infectious endocarditis.) For each of the patients accepted, a one to two page clinical summary was prepared and combined with a summary of the laboratory test data as of the time when the first blood culture was obtained. This information was then used to obtain a therapeutic evaluation from MYCIN.

Each of the participating experts received a set of fifteen evaluation forms (one for each patient). Each form contained: (a) the clinical summary and lab data; (b) space for the expert to record his conclusions about the nature of the infection, likely causative organisms, and appropriate therapy; and (c) a transcript of the MYCIN consultation along with space for the expert to record his opinion of various aspects of MYCIN’s performance. By presenting the information in this order, we obtained a therapeutic regimen from the expert based on the same information supplied to MYCIN. This allowed us to compare the expert’s answers to MYCIN’s, and also gave us the expert’s opinion of the system’s performance.

In the past few months a sufficient number of the forms have been returned that we were able to do a preliminary analysis. The figures below are based on the nine (out of ten) which have been returned.
Since it is difficult to select a single number which summarizes performance, we have in general measured each of the parameters listed above in three ways: (i) the percent of instances in which the program was judged exactly correct, (ii) the percent of instances in which the program’s performance was judged exactly correct or an acceptable alternative, and (iii) the percent of cases in which a majority of the experts judged its performance exactly correct or an acceptable alternative. By using all three measures, we obtain a range of figures which give a good picture of the program’s performance.

All of these attempts to evaluate performance are complicated by the fact that (as expected) the experts’ own choices about each patient were not unanimous. Thus, we cannot ask whether MYCIN’s answers were “correct” in any absolute sense, since there was no agreement on what constitutes “correct”. Instead, we ask how often each individual expert rated the program’s responses as correct. But given the variation among experts themselves, the program can never be expected to reach 100%, and depending on the extent of the intra-group variation, the absolute limit may in fact be much lower. Thus the ideal question to ask is

Do experts rate MYCIN’s performance correct at least as often as they rate each other’s performance correct?

This would give a good indication of how close the system’s performance was to that of the group of experts as a whole. We have been able to do this in a few isolated cases, but in general it requires more information than we were able to collect. This is discussed in more detail below, but in general terms the problem is that we were able to ask each expert for his choices for each patient, and ask him to rate MYCIN’s choices. But, without a second round of questionnaires, which would ask
each expert to rate the acceptability of the other 9 experts’ responses, we lack direct information about inter-expert variability. The figures below should be reviewed with this caveat in mind.

A. “Quality” of the interaction

To measure the first item, the experts were instructed to mark any questions in the consultation which they felt were irrelevant, and to note any questions which they felt were omitted by the system. Overall MYCIN did quite well, as there were no consultations in which a majority of the experts felt that any particular question was irrelevant or omitted. On the average, there were 0.53 questions judged irrelevant and 0.55 indicated as omitted.

Table I summarizes the next four measurements.

                 % of instances       % of instances MYCIN’s     % of cases MYCIN’s
                 MYCIN’s first        first choice was           first choice was
                 choice was           identical to, or was       identical to, or was
                 identical to an      judged an acceptable       judged an acceptable
                 expert’s first       alternative to an          alternative by a
                 choice               expert’s first choice      majority of experts
--------------------------------------------------------------------------------------
ORGANISM         56.3%                75.6%                      81.8%
IDENTITY         N = 414              N = 414                    N = 11
--------------------------------------------------------------------------------------
ORGANISM         91.7%                NA                         100%
SIGNIFICANCE     N = 36                                          N = 4
--------------------------------------------------------------------------------------
THERAPY          12%                  75%                        91%
SELECTION        N = 99               N = 99                     N = 11
--------------------------------------------------------------------------------------
OVERALL          17.0%                59.3%                      60.0%
PERFORMANCE      N = 135              N = 135                    N = 15
--------------------------------------------------------------------------------------

Table I. Summary of nine experts’ responses to MYCIN’s performance on 15 cases

B. Organism Identity

For organism identity, the experts were asked to rate each of MYCIN’s selections as exactly correct (they agreed that the organism was likely to be present), an acceptable alternative (they had not chosen that organism, but agreed it might be present), or an unacceptable choice (they disagreed with its selection). Since 11 of the cases were not contaminants, and there was a total of 46 organisms chosen by the system, with 9 experts rating each of those choices we have an N of 414 for the first two columns and 11 for the third. In 56% of the instances the system’s choices were identical to the experts’, 75% of them were either identical or acceptable alternatives, and in 82% of the cases, its results were acceptable to a majority of the experts.

In addition, the experts were asked to indicate which organisms they felt MYCIN had overlooked in its diagnosis. For the 11 non-contaminant cases, the experts indicated an average of only 0.35 organism identities that were overlooked by the system. In no case did a majority of experts feel that any particular organism had been overlooked, suggesting that even the 0.35 figure is a result of inter-expert variation.

C. Organism Significance

The first question on the evaluation form gave the expert a chance to indicate that he felt the patient did not need to be treated. The first column of the second row indicates how often the expert indicated no treatment was necessary for a case in which MYCIN also judged the organism to be a contaminant. (There is no number in the second column since we did not ask about a “close call” on whether or not to treat. In addition, the measurement is based only on the contaminant cases, since in many of the cases where both MYCIN and the expert determined that treatment was necessary, they based that decision on different organisms. We felt that it would be misrepresentative to call these situations “agreements”.)
As the figures show, in only three out of 36 instances was there any disagreement with the system’s decision on whether or not to treat.

D. Therapy Selection

The expert was asked to select therapy for the organisms which he felt were likely to be present before looking at MYCIN’s therapy recommendation. He was then asked to judge MYCIN’s choice of therapy for that patient. Since MYCIN was selecting therapy for the organisms which it felt were present (which may have differed from those chosen by the expert), this provides a fundamental comparison of performance: it compares therapy selection performance of the two when they are faced with the same clinical situation.

This comparison is a difficult one to make, since it is complicated by the difficulty noted above, of variability in the experts’ performance and the need to judge MYCIN with respect to that variability. Looking only at exact agreements (i.e., two identical therapies) produces the figure in the first column, which indicates that 12% of the time MYCIN’s recommendation was identical to that of an expert. Comparing each expert’s therapy choice with the other 8 indicates that 35% of the time (N = 396) any pair of experts chose identical regimens.

The experts were also asked to judge whether MYCIN’s therapy was an acceptable alternative (if it was not identical to their own), producing the figure in the second column. This indicates that it was either identical, or they felt it was an acceptable alternative, 75% of the time. (Unfortunately, we have no reliable way of judging the inter-expert variability here, without a second round of questionnaires which asked each expert to rate the acceptability of the other experts’ choices.) [As an alternative, we have attempted to develop a measure of how “far apart” two non-identical regimens are.
But the problem is difficult: for example, for gram negative rods with Salmonella most likely, is gentamicin and chloramphenicol 'very different' from gentamicin and ampicillin? We have been working on a 'drug metric' to solve this problem, attempting to base the difference between two drugs on factors like organism susceptibility, toxicity, and drug efficacy, but this work is still in progress.]

The figure in the third column gives a crude overall measure of therapy selection performance, and indicates that in 91% of the cases (10 out of 11), a majority of the experts rated MYCIN's regimen as either identical to their own or an acceptable alternative.

[The evaluation form also asked each expert to choose a regimen for the organisms which MYCIN had selected. The intent here was to compare the system's performance against the expert when both were faced with the same set of organisms (rather than with the same clinical situation, as above). Unfortunately, inconsistent answers on the part of the experts indicated that they were not answering the question according to the instructions. It appeared that they were not able to suspend their own judgments about organism identity sufficiently to select a regimen based on MYCIN's organisms alone. For this reason, we believe the data to be unreliable, and have not included it here.]

E. Overall Performance

At the end of each evaluation form, the expert was asked to rate the system's overall performance as either excellent, good, fair, or poor. The first two columns of the last row indicate that 17% of these evaluations were 'excellent', and almost 60% were either 'excellent' or 'good' (only 13% were 'poor'). In 60% of the cases (9 out of 15), a majority of the experts felt that MYCIN's overall performance was either 'excellent' or 'good'.

F. Present Utility and Future Potential

Finally, after completing the entire set of 15 patients, each expert was asked to rate MYCIN's present utility and future potential as a clinical tool and as an educational tool, rating it as having 'considerable', 'some', or 'no' potential. The table below summarizes their responses.

Evaluation of Present Utility

                   | 'considerable' | 'some' | 'none' |
-------------------+----------------+--------+--------+
clinical tool      |      11%       |  67%   |  22%   |
-------------------+----------------+--------+--------+
educational tool   |      11%       |  89%   |   0%   |
-------------------+----------------+--------+--------+

Evaluation of Future Potential

                   | 'considerable' | 'some' | 'none' |
-------------------+----------------+--------+--------+
clinical tool      |      11%       |  89%   |   0%   |
-------------------+----------------+--------+--------+
educational tool   |      67%       |  33%   |   0%   |
-------------------+----------------+--------+--------+

Table II. Opinions of 9 experts on MYCIN's present utility and future potential

To aid these evaluation efforts, we have also implemented a number of useful features in the system. For instance, MYCIN now keeps continuing statistics of the use of rules in its knowledge base. This will help us to monitor its long term performance, to study the interrelationship between rules, and perhaps to detect automatically any inconsistencies or gaps in the knowledge base.

We have also designed and implemented a mechanism for 'on-line' evaluation. At the end of each consultation, the system asks the clinicians who are using it a few questions about the quality of its performance. This interchange will be brief to avoid being a burden to the user, but it is expected to represent an important addition to the other evaluation efforts. It will, for instance, make possible a new form of evaluation of the system.
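As an aside on the rule-usage statistics mentioned above, bookkeeping of that kind might look roughly like the following. This is only a minimal sketch in Python and not MYCIN's actual mechanism: the class `RuleUsageStats`, its methods, and the rule identifiers are all hypothetical names introduced for illustration.

```python
from collections import Counter


class RuleUsageStats:
    """Running tally of how often each rule is tried and how often it succeeds.

    Hypothetical illustration only; not MYCIN's actual bookkeeping.
    """

    def __init__(self):
        self.tried = Counter()      # rule id -> times the rule was tried
        self.succeeded = Counter()  # rule id -> times the rule succeeded

    def record(self, rule_id, succeeded):
        """Log one attempt to use a rule during a consultation."""
        self.tried[rule_id] += 1
        if succeeded:
            self.succeeded[rule_id] += 1

    def success_rate(self, rule_id):
        """Fraction of attempts that succeeded, or None if never tried."""
        tried = self.tried[rule_id]
        return self.succeeded[rule_id] / tried if tried else None

    def never_tried(self, all_rule_ids):
        """Rules in the knowledge base that no consultation has used yet."""
        return sorted(r for r in all_rule_ids if self.tried[r] == 0)


# Made-up rule identifiers, for illustration only.
stats = RuleUsageStats()
for rule_id, ok in [("RULE_A", True), ("RULE_A", False), ("RULE_B", True)]:
    stats.record(rule_id, ok)

print(stats.success_rate("RULE_A"))
print(stats.never_tried(["RULE_A", "RULE_B", "RULE_C"]))
```

Under these assumptions, rules that are never tried, or that are tried often but rarely succeed, would be natural candidates for review when looking for gaps or inconsistencies in the knowledge base.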
Rather than using a series of 'prepackaged' cases as was done in our initial evaluation, the next stage will be carried out using information entered at a terminal by the evaluator. The participating panel of experts will be selecting patients in areas covered by the MYCIN knowledge base, and will engage in a dialogue with the system about those patients. Following completion of the session, the on-line evaluation feature will ask questions about system performance, and the responses will be tabulated and evaluated on-line by appropriate biostatistical programs. Specific recommendations which may point out problem areas in the consultation will be reviewed by our staff. By this process we expect to be able to maintain a continuing evaluation of MYCIN's capabilities in various areas, and pinpoint specific areas where performance is suboptimal.

STAFFING

Infectious Disease
  Dr. Stanton Axline, MD            6/74 to present    co-prin. invest.
  Dr. Victor Yu, MD                 9/75 to present    research affiliate
  Dr. Frank Rhame, PhD              9/74 to 9/75       research affiliate
  Dr. Edward Shortliffe, PhD, MD    6/74 to 6/76       research assistant

Clinical Pharmacology
  Dr. Stanley Cohen, MD             6/74 to present    prin. investigator
  Dr. Robert Blum, MD               6/76 to present    research affiliate
  Ms. Sharon Wraith, BS Pharm       6/75 to present    research associate
  Dr. M. Goldberg, MD               9/75 to 9/76       research affiliate
  Dr. Rudolfo Chavez-Pardo, MD      9/74 to 9/75       research affiliate

Computer Science
  Dr. Bruce Buchanan, PhD           6/74 to present    investigator
  Dr. Randall Davis, PhD            6/74 to present    research associate
  Ms. A. Carlisle Scott, MS         6/74 to present    sci. programmer
  Mr. William van Melle, MS         6/74 to present    research assistant
  Dr. Cordell Green, PhD            6/74 to 6/75       asst. professor

Panel of Experts Participating in the 1976 Evaluation

National Experts
  Dr. Dennis Maki, Chief of Infectious Disease, University of Wisconsin Hospital
  Dr. John McGowan, Assistant Professor of Medicine, Infectious Disease Division, Grady Memorial Hospital, Atlanta, Ga.
  Dr. Allan Kaiser, Chief of Infectious Disease, Vanderbilt Hospital
  Dr. William Schaffner, II, Associate Professor of Medicine, Vanderbilt Hospital
  Dr. Harvey Elder, Chief of Infectious Disease, Associate Professor of Medicine, Loma Linda University

Local Experts (and their current positions)
  Dr. John Galgiani, Postdoctoral Fellow in Infectious Disease, Stanford Medical Center
  Dr. Larry Lutwick, Postdoctoral Fellow in Infectious Disease, Stanford Medical Center
  Dr. Rudy Johnson, Assistant Professor of Medicine, Vanderbilt University
  Dr. Jerome Hruska, Assistant Professor of Infectious Disease, University of Rochester
  Dr. Stanley Deresinski, Assistant Professor of Infectious Disease, University of South Florida

9 Appendix B: Hardware Announcement

Say DEC Readies 1st Unit Of A 32-Bit Computer Line
By RON ROSENBERG
(Electronic News, Monday, April 25, 1977)

MAYNARD, Mass. - Digital Equipment is reportedly readying the first of an expected new family of 32-bit computers that will be software-compatible to its 16-bit high-end PDP-11/70 and will utilize many of the performance features of its more expensive DECsystem 20 large computer.

Code named "VAX," the new computer could be introduced as early as October, according to sources, and be priced in the PDP-11/70 range but below the large DECsystem 20 starting price of $250,000. The system would initially compete against Interdata's 8/32 and System Engineering Laboratories' 32/75 machines. Both firms are currently the major suppliers of 32-bit systems.

Sources claim the key to the new DEC machine is its ability to run PDP-11 software using an emulation mode slightly slower than the PDP-11/70.
They also note that VAX will utilize many of the DECsystem 20 features, such as a mass bus with five unibus ports. Machine throughput is reportedly between 10 and 25 megabytes per second. The system is said to use emitter coupled logic (ECL) to achieve speeds approaching the DECsystem 10, DEC's largest and most expensive system.

DEC reportedly has launched maintenance, manufacturing and test training at the company's leased Salem, N.H., facilities, not far from a major manufacturing center DEC is constructing. The new machines, sources claim, will not have DEC in-house developed 32-bit software at the VAX introduction this fall. However, the main features of the instruction set are expected to be similar to IBM's 360 approach.

While Digital Equipment declined to comment on the new 32-bit system, it has been learned that DEC has made several presentations of the new system to Bell Laboratories' Holmdel switching center and, reportedly, several other large customers.

The new machine, according to several Wall Street sources, could be DEC's next generation of small computers designed to compete against the expected inroads of IBM's Series/1 and, to some extent, its mainframes. They cite how the small computer industry is approaching capacity performance with 16-bit architecture.

Industry sources said that 32-bit machines offer more direct memory addressing, doubling of the instruction length and a dramatic increase in peripherals and other input/output devices on a large 32-bit system. Interdata's 8/32 can directly address one megabyte of memory, considerably larger than the PDP-11/70.

The smallest DECsystem 20 is priced at $250,000 and the biggest is $400,000. The basic VAX system is expected to be considerably less than the former.
Industry sources noted that DECsystem 20, introduced less than 16 months ago, was designed to "bridge the gap" between the 16-bit PDP-11 and the DECsystem 10 (EN, Jan. 19, 1976). It is aimed to expand DEC's mainframe business with a machine that has a CPU with 370/145 performance, but the price range of a 370/115-125. The 2040 model also effectively replaces the earlier DECsystem 1040.

The move to 32-bit architecture has been rooted in the PDP-11/70 which has data transfer paths that would also be employed in a new machine, one source noted, adding that the PDP-11/70 is designed with a mass bus architecture.

10 REFERENCES

10.1 MYCIN PUBLICATIONS

Shortliffe E H, Axline S G, Buchanan B G, Merigan T C, Cohen S N, An artificial intelligence program to advise physicians regarding antimicrobial therapy, Computers and Biomedical Research, 6:544-560 (1973).

Shortliffe E H, Axline S G, Buchanan B G, Cohen S N, Design considerations for a program to provide consultations in clinical therapeutics, Presented at San Diego Biomedical Symposium 1974 (February 6-8, 1974).

Shortliffe E H, MYCIN: A rule-based computer program for advising physicians regarding antimicrobial therapy selection, Thesis: Ph.D. in Medical Information Sciences, Stanford University, Stanford CA, 409 pages, October 1974. Also, Computer-Based Medical Consultations: MYCIN, American Elsevier, New York, 1976.

Shortliffe E H, MYCIN: A rule-based computer program for advising physicians regarding antimicrobial therapy selection (abstract only), Proceedings of the ACM National Congress (SIGBIO Session), p. 739, November 1974. Reproduced in Computing Reviews 16:331 (1975).

Shortliffe E H, Rhame F S, Axline S G, Cohen S N, Buchanan B G, Davis R, Scott A C, Chavez-Pardo R, and van Melle W J, MYCIN: A computer program providing antimicrobial therapy recommendations (abstract only). Presented at the 28th Annual Meeting, Western Society For Clinical Research, Carmel, CA, 6 Feb 1975. Clin. Res. 23:107a (1975). Reproduced in Clinical Medicine, p. 34, August 1975.

Shortliffe E H and Buchanan B G, A Model of Inexact Reasoning in Medicine, Mathematical Biosciences 23:351-379, 1975.

Shortliffe E H, Davis R, Axline S G, Buchanan B G, Green C C, Cohen S N, Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system, Computers and Biomedical Research, 8:303-320 (August 1975).

Shortliffe E H, Axline S, Buchanan B G, Davis R, Cohen S, A computer-based approach to the promotion of rational clinical use of antimicrobials, International Symposium on Clinical Pharmacy and Clinical Pharmacology, Sept 1975, Boston, Mass. (invited paper)

Shortliffe E H, Judgmental knowledge as a basis for computer-assisted clinical decision making, Proceedings of the 1975 International Conference on Cybernetics and Society, pp 256-7, September 1975.

Davis R, King J J, An Overview of Production Systems, Machine Intelligence 8: Machine Representations of Knowledge (eds E W Elcock and D Michie), John Wiley, April 1977. (Also Memo HPP-75-7, Stanford University, October 1975).

Davis R, Buchanan B G, Shortliffe E H, Production rules as a representation for a knowledge-based consultation system, Artificial Intelligence, Vol 8, No 1 (February 1977). (Also Memo HPP-75-6, Stanford University, October 1975).

Shortliffe E H, Davis R, Some considerations for the implementation of knowledge-based expert systems, SIGART Newsletter, 55:9-12, December 1975.

Scott A C, Clancey W, Davis R, Shortliffe E H, Explanation capabilities of knowledge based production systems, American Journal of Computational Linguistics, Microfiche 62, 1977. (Also Memo HPP-77-1, Stanford University, February 1977).
Wraith S, Aikins J, Buchanan B G, Clancey W, Davis R, Fagan L, Scott A C, van Melle W, Yu V, Axline S, Cohen S, Computerized consultation system for the selection of antimicrobial therapy, American Journal of Hospital Pharmacy 33:1304-1308 (December 1976).

Buchanan B G, Davis R, Yu V, Cohen S, Rule Based Medical Decision Making by Computer (to appear in Proceedings of MEDINFO 77, 1977).

Davis R, Applications of Meta Level Knowledge to the Construction, Maintenance, and Use of Large Knowledge Bases, Memo HPP-76-7, Stanford University, June 1976.

Davis R, Meta rules: content directed invocation, to appear, Proc. ACM Conf. on AI and Programming Languages, August 1977.

Davis R, Knowledge acquisition in rule-based systems: knowledge about representations as a basis for system construction and maintenance, to appear, Proc. Conf. on Pattern-directed Inference Systems, May 1977.

Davis R, Interactive transfer of expertise: acquisition of new inference rules, to appear, Proc. Fifth IJCAI, August 1977.

Davis R, A decision support system for medical diagnosis and therapy selection, in "Data Base" (SIGBDP Newsletter), 8:58 (Winter 1977).

Davis R, Buchanan B G, Meta level knowledge: overview and applications, to appear, Proc. Fifth IJCAI, August 1977.

10.2 OTHER REFERENCES

[1] Reiman H H, D'ambola J, The use and cost of antimicrobials in hospitals, Arch Environ Health, 13:631-636 (1966).

[2] Kunin C M, et al., Use of antibiotics: a brief exposition of the problem and some tentative solutions, Anns Int Med, 79:555-560 (1973).

[3] Sheckler W E, Bennett J V, Antibiotic usage in seven community hospitals, J Amer Med Assoc, 213:264-267 (1970).

[4] Roberts A W, Visconti J A, The rational and irrational use of systemic antimicrobial drugs, Amer J Hosp Pharm, 29:828-834 (1972).

[5] Simmons H E, Stolley P D, This is medical progress? Trends and consequences of antibiotic use in the United States, J Amer Med Assoc, 227:1023-1026 (1974).

[6] Kagan B M, Fanin S L, Bardie F, Spotlight on antimicrobial agents, JAMA, 226:306-310 (1973).

[7] Meyer A U, Weissman W K, Computer analysis of the clinical neurological exam, Computers and Biomedical Research, 3:111-117 (1973).

[8] Warner H R, Toronto A F, Veasy L G, Experience with Bayes's Theorem for computer diagnosis of congenital heart disease, Anns NY Acad Sci, 115:558-567 (1964).

[9] Gorry G A, Barnett G O, Experience with a model of sequential diagnosis, Computers and Biomedical Research, 1:490-507 (1968).

[10] Edwards W, N = 1, diagnosis in unique cases, Computer Diagnosis and Diagnostic Methods (Jacquez, ed.), pp 139-151, C C Thomas, Springfield, Illinois (1972).

[11] Silverman H, A Digitalis Therapy Advisor, MAC-TR-143, EE Department, Mass. Inst. Tech. (1974).

[12] Kulikowski C A, Weiss S, Safir A, Glaucoma diagnosis and therapy by computer, Proc Annual Meeting of Assn for Research in Vision and Ophthalmology (May 1973).

[13] Kunin C M, Tupasi T, Craig W A, Use of Antibiotics: a brief exposition of the problem and some tentative solutions, Annals Int Med, 79:555-560, Oct 1973.

[14] Davis R, Applications of Meta Level Knowledge to the Construction, Maintenance, and Use of Large Knowledge Bases, Memo HPP-76-7, Stanford University, July 1976.

[15] Raiffa H, Decision analysis: introductory lectures on choices under uncertainty, Addison Wesley, 1968.