MOLGEN Project Section 9.1.4

Martin N., Friedland P., King J., Stefik M.J., Knowledge Base Management for Experiment Planning in Molecular Genetics, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, 882-887 (August 1977)

Stefik M., Friedland P., Machine Inference for Molecular Genetics: Methods and Applications, Proceedings of the National Computer Conference (June 1978)

Stefik M.J., Martin N., A Review of Knowledge Based Problem Solving As a Basis for a Genetics Experiment Designing System, Stanford Computer Science Department Report STAN-CS-77-596 (March 1977)

Stefik M., Inferring DNA Structures From Segmentation Data: A Case Study, Artificial Intelligence 11, 85-114 (December 1977)

Stefik M., An Examination of a Frame-Structured Representation System, Proceedings of the Sixth International Joint Conference on Artificial Intelligence, 844-852 (August 1979)

Stefik M., Planning with Constraints, Ph.D. Thesis, Stanford CS Report CS80-784 (March 1980)

E. Funding Support

The MOLGEN grant is titled "MOLGEN: A Computer Science Application to Molecular Biology." It is NSF Grant MCS 78-02777. The current Principal Investigators are Edward A. Feigenbaum, Professor of Computer Science, and Laurence H. Kedes, Investigator, Howard Hughes Medical Institute, and Associate Professor of Medicine. The new grant (September 1980) will add Bruce G. Buchanan, Adjunct Professor of Computer Science, and Douglas Brutlag, Associate Professor of Biochemistry, as Co-PIs. MOLGEN is currently funded from 12/79 to 11/80 at $153,959, including indirect costs, and has received total funding of $294,476, including indirect costs, for 6/78 to 3/81.

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

All system development has taken place on the SUMEX-AIM facility. The facility has not only provided excellent support for our programming efforts but has also served as a major communication link among members of the project.
Systems available on SUMEX-AIM, such as INTERLISP, TV-EDIT, and BULLETIN BOARD, have made possible the project's programming, documentation, and communication efforts. The interactive environment of the facility is especially important for this type of project development. We have drawn on the collective expertise of the other SUMEX-AIM projects in medically oriented knowledge-based systems. In addition to especially close ties with other projects at Stanford, we have benefited greatly from interaction with other projects at the yearly meetings and from the exchange of working papers and ideas over the system. The combination of excellent computing facilities and instant communication with a large number of experts in this field has been a determining factor in the success of the MOLGEN project. It has made possible the nearly instantaneous dissemination of MOLGEN systems to a host of experimental users in laboratories across the country, and the wide-ranging input from these users has greatly improved the general utility of our work.

E. A. Feigenbaum 176 Privileged Communication

We find it very difficult to find fault with any aspect of SUMEX resource management. It has made it easy for us to expand our user group, to give demonstrations (through the 20/20 adjunct system), and to disseminate software to non-SUMEX users overseas. We do find that we are running moderately close to machine capacity, in both size and speed, because our user group has expanded rapidly during the last year.

III. RESEARCH PLANS

A. Project goals and plans

We have proposed further MOLGEN research in several broad categories: representation, planning, knowledge base development, and immediate applications to molecular biology. As would be expected, there will be much interaction among these general areas.

Representation

As part of the MOLGEN effort, a new representation package, the Units System, has been developed and tested.
Its basis was mainly theoretical; we now have the opportunity to improve it on the basis of the practical demands of a large knowledge base containing many different types of information. We expect to learn which features are important and which are window dressing. These findings will grow in importance as other problem-solving systems that use large domain-specific knowledge bases are developed. The MOLGEN knowledge base will serve as a laboratory for this research. Among the issues we would like to explore are:

1. MOLGEN currently uses the hierarchy representation features of the Units System for both acquisition and design. Will this remain practical as the knowledge base grows, or will the two representation functions have to be separated?
2. The Units System allows different types of knowledge, e.g. numbers and nucleic acid sequences, to be described and stored in different ways. How much diversity is useful, both from the viewpoint of the representation system and from the viewpoint of the user?
3. Will new features become necessary to make large knowledge bases "perusable" by the human expert describing his domain? Is there some point at which graphics are needed for the expert to have a good grasp of what the system already knows?

Planning

Both problem-solving methods developed in MOLGEN have shown promise. We plan to continue developing them until we know their respective limitations and until a practical laboratory tool results. As previously mentioned, we will combine the two planning methods into a system that we expect to perform substantially better than either of its components alone. The current experiment design systems are not designed to take an existing laboratory plan and determine whether it will satisfy a stated goal.
We have proposed using the knowledge base to simulate the result of applying each step of a plan in succession, to see whether the experiment goal would really be achieved. Such a plan verifier would take scientist-designed plans and indicate whether a plan will work before it is actually tried in the laboratory. The plan verifier will be extended first into a plan optimizer and then into a plan debugger. Plan optimization will involve both domain-specific heuristics about how particular steps interact and domain-independent heuristics about what good experiment designs should look like. The plan optimizer will make minor changes and introduce subgoals in order to take an already working experiment design and make it more efficient, convenient, reliable, or inexpensive. The knowledge base already contains most of the raw information humans use to make optimization decisions; the research lies in developing methods to use this knowledge automatically. Plan debugging means taking a partially working experiment design and finding and fixing its errors. This involves aspects of both verification and optimization, as well as new error-correction heuristics. According to Feitelson and Stefik, the serendipity of the experimental laboratory also contributes greatly to plan debugging. Extending the MOLGEN design systems into execution monitoring systems that can notice and take advantage of this serendipity will be a major research effort, roughly comparable in magnitude to a thesis.

Knowledge Base Acquisition and Development

The current MOLGEN knowledge base is the result of over a man-year of effort by Professors Douglas Brutlag and Laurence Kedes and Drs. Peter Friedland and John Sninsky. It will continue to grow and improve throughout the term of the new proposal with the full-time work of Dr. René Bach.
By the end of the period covered by the proposal, the knowledge base will itself be a useful tool for teaching, information retrieval, and sequence analysis. It will be expert in some of the most important areas of molecular biology, and especially proficient in the judgmental heuristics that guide technique selection as an experiment is being designed.

A major new research goal is to provide a facility for self-improvement of the knowledge base. When the design system produces a plan that is especially efficient or innovative, it would be useful to generalize and save that plan so that it can drive future problem solving without having to be reinvented. This generalization and learning process has roots in the MACROPS work in STRIPS. With such a capability, the experiment design system would be a learning system, able to improve its knowledge base continuously. There are two main research questions inherent in the problem: how to recognize when a plan is worth saving, and how general to make it while still retaining its utility. There are several possible measures of plan "worthiness." One would be whether the plan performed dramatically better than previous plans (e.g. it decreased the time needed to perform an experiment by an order of magnitude). Another would be how difficult it was for the system to create the plan; in other words, the plan should be saved because it would take a long time to find it again. The question is an experimental one: the research will involve trying many heuristics and balancing the improvement in system planning performance against the growth of an unwieldy and overly constrained knowledge base. The question of how general to make the plan and how to parameterize it should also be settled experimentally.
There will be trade-offs between how frequently a saved plan is used and what percentage of the time it leads to a useful instantiated experiment design.

Another research goal is to use the knowledge base and experiment design system as a testbed for an automated performance evaluation system. The goals of such a system are quite general: to determine exactly how well the system is using the knowledge base, and how suitable the knowledge base is for the task at hand. Among the specific questions a performance evaluation system for MOLGEN might answer are:

1. Is the system overlooking skeletal plans that it should find?
2. Is it needlessly considering many poor alternative plans?
3. Is it poorly modelling the consequences of plan steps?
4. In what areas of the knowledge base are decision heuristics weak or missing?
5. What types of knowledge are hardly ever used?

All of these questions should be generalizable to many other knowledge-based problem-solving systems. Since constructing large, expert knowledge bases is so difficult, feedback from evaluating how these knowledge bases are used will be invaluable to future system builders.

Applications to Molecular Biology

The direct applications of MOLGEN to molecular biology fall into three categories: knowledge base development and experiment design, analysis of nucleic acid sequences, and miscellaneous tools.

Knowledge Base Development and Experiment Design

The original and principal goal of the MOLGEN project is to provide a sophisticated experiment planning program containing an extensive knowledge base in the domain of molecular biology. As described above, our progress toward this goal has produced an extensive outline of this broad domain, with emphasis on the myriad analytical laboratory techniques that exist in this field.
Using this knowledge base, MOLGEN is now capable of designing a number of sophisticated analytical experimental procedures. The procedures designed by the system are ones already used in the laboratory, indicating that the knowledge base contains the right sorts of heuristics to produce at least competent experiment designs. The limited scope of the current knowledge base constrains the originality of the plans it can produce; the most novel plans designed by humans are those that draw on many different, perhaps unrelated, knowledge sources. Another success of the knowledge base concerns the organization of the information about each experimental technique. Because of the great flexibility of the Units System, it is easy for the domain experts to modify and expand the existing information about each entity. We are continuously fine-tuning the information contained in the knowledge base, in both content and organization, during the actual knowledge acquisition phase.

We now propose to attack problems in synthetic molecular biology. We feel that by focusing our efforts on this subject we can ensure an extensive repertoire of knowledge for that particular type of problem. This will also allow the planning algorithms to develop more sophisticated plans in that area. We have chosen to develop a knowledge base dedicated to the problem of cloning specific genes by recombinant DNA techniques. We chose this problem for four reasons: it is one of the most widely used methods in molecular biology today; most of our existing knowledge base is relevant to it; both of our current planning algorithms have been successful either on this problem (Stefik's thesis) or on the closely related problem of analyzing recombinant DNAs (Friedland's thesis); and the method can be readily divided into four limited subdomains.
These are the choice of vectors, the method of linking foreign DNA to the vector, the transformation of host cells with the recombinant DNAs, and the selection of the recombinant DNA containing the gene of interest. We will describe current methods for cloning genes in both eukaryotes and prokaryotes, using methods in which one can select for either the vector or the inserted gene, and we will describe all the known methods of selecting for genes, including direct functional selection, hybridization methods, and expression of specific gene products. In addition to specifying the starting population or DNA sample and the ultimate goal, the user will be able to specify certain subgoals or substrategies.

Analysis of Nucleic Acid Sequences

Our goal is to provide powerful but easy-to-use programs for recognizing biologically significant patterns within nucleotide sequences. To be both powerful and easy for a novice to use, such programs must be interactive, self-documenting, and have easy-to-understand output formats. It also helps tremendously if they are very fast, so that they can be used online with nearly instantaneous feedback on the progress of a comparison. For this reason we have chosen to use the search algorithm developed by Korn and Queen and to convert it to an interactive form. This program was originally designed for fast comparison of very long nucleotide sequences while still allowing some sophistication in the matching procedure. The algorithm compares two sequences beginning at every position where they share at least a dinucleotide, but carries each comparison only as far as certain matching criteria continue to be met.
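The seed-and-extend strategy just described can be illustrated with a short Python sketch. It is not the Korn and Queen program: the function names, the mismatch budget (`max_mismatches`), and the minimum reported length (`min_length`) are assumptions chosen for illustration. Every position pair where the two sequences share a dinucleotide starts a comparison along that diagonal, which is extended until the mismatch budget is exceeded; mismatches in a reported pair are marked with stars. The chance-expectation helper uses a simple composition-based approximation that ignores overlapping matches, so it is only a rough stand-in for the program's statistics.

```python
from collections import Counter
from typing import List, Tuple

Hit = Tuple[int, int, int]  # (start in a, start in b, length)


def seeded_matches(a: str, b: str, max_mismatches: int = 2,
                   min_length: int = 8) -> List[Hit]:
    """Extend every shared dinucleotide along its diagonal (hypothetical helper)."""
    hits: List[Hit] = []
    covered = set()  # (diagonal, position in a) already inside a reported hit
    for i in range(len(a) - 1):
        for j in range(len(b) - 1):
            if a[i:i + 2] != b[j:j + 2] or (i - j, i) in covered:
                continue
            k = mismatches = matched_to = 0
            while i + k < len(a) and j + k < len(b):
                if a[i + k] != b[j + k]:
                    mismatches += 1
                    if mismatches > max_mismatches:
                        break
                else:
                    matched_to = k + 1  # end of the last matching base
                k += 1
            if matched_to >= min_length:  # trailing mismatches are trimmed
                hits.append((i, j, matched_to))
                covered.update((i - j, i + t) for t in range(matched_to))
    return hits


def show(a: str, b: str, hit: Hit) -> str:
    """Format the two segments with a star under each mismatch."""
    i, j, n = hit
    top, bottom = a[i:i + n], b[j:j + n]
    stars = "".join(" " if x == y else "*" for x, y in zip(top, bottom))
    return "\n".join([top, bottom, stars])


def match_probability(a: str, b: str) -> float:
    """Chance that random bases drawn from a and b are identical."""
    fa, fb = Counter(a), Counter(b)
    return sum((fa[x] / len(a)) * (fb[x] / len(b)) for x in fa)


def expected_exact_runs(a: str, b: str, length: int) -> float:
    """Rough number of exact matches of `length` expected by chance alone."""
    p = match_probability(a, b)
    return (len(a) - length + 1) * (len(b) - length + 1) * p ** length
```

For example, comparing `"ACGTTGCAAC"` with `"ACGATGCAAC"` reports one 10-base region containing a single starred mismatch; with `max_mismatches=0` the same pair yields no region long enough to report.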
The Korn and Queen method lacks the sophistication of algorithms that simulate evolutionary steps in the divergence of two sequences or the energetics of base pairing in single-stranded regions of dyad symmetry, but it can detect all statistically significant homologies or dyad symmetries at any desired level of significance. Unfortunately, it cannot compare more than two sequences at a time, nor can it give a quantitative measure of the divergence or relatedness of those two sequences. It merely describes the probability of each homology relative to that expected for a random sequence of a given length and base composition.

Our improvements to the program have included converting it into SAIL and making it interactive. Whenever a user is in doubt about the next step, he simply enters "?" and his options at that point are explained. We have also considerably improved the statistical calculations, so that the probabilities and expected frequencies determined for a homologous region are based not only on the lengths of the sequences being compared, but also on their base composition and on the exact search algorithm being used. Finally, we have markedly improved the output displays: mismatches are indicated with stars, and base pairs in dyad symmetries with bars. We have done all of this without adding execution-time overhead, so the program runs almost without delay in a time-sharing environment.

We propose to improve our current sequence analysis capabilities by implementing more sophisticated algorithms within the interactive framework. For instance, the pattern recognition algorithm of Sellers is currently being implemented in C at Rockefeller University by Dr. Bruce Erickson. We believe that this program would be a useful addition to our current armory because it would give us an accurate metric of the relatedness of two sequences, which is essential for building phylogenetic trees.
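To make the idea of a relatedness metric concrete, the sketch below computes an evolutionary distance between two sequences by dynamic programming: the minimum number of substitutions, insertions, and deletions that turns one into the other. It assumes a unit cost for every operation, which is a simplification; it is not the Sellers implementation mentioned above, and the function name is hypothetical. With non-negative, symmetric costs like these, the result behaves as a true distance (zero only for identical sequences, symmetric, and obeying the triangle inequality), which is what makes it usable for tree building.

```python
def evolutionary_distance(a: str, b: str) -> int:
    """Minimum number of unit-cost substitutions, insertions and deletions."""
    # prev[j] holds the distance between the prefix of a seen so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into the empty string takes i deletions
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x from a
                cur[j - 1] + 1,          # insert y into a
                prev[j - 1] + (x != y),  # substitute, or free if equal
            ))
        prev = cur
    return prev[-1]


# Usage: two short hypothetical sequences differing by one substitution.
print(evolutionary_distance("GATTACA", "GATCACA"))  # prints 1
```

Only one row of the table is kept at a time, so memory grows with the length of the second sequence rather than with the product of both lengths.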
Such a metric would be the first step toward comparing more than two sequences at a time.

We would also like to develop methods for determining the secondary structure of single-stranded RNAs. The most commonly used methods are often limited to short nucleotide regions because of the complexity of the energy calculations for large numbers of comparisons. By first using a rapid method for finding homologous sequences or dyad symmetries, perhaps guided by a very low-stringency test of statistical significance, one might be able to eliminate most of the fruitless comparisons quickly. By then examining the culled homologies with a set of heuristics concerning their additivity, extension, or exclusiveness, we could rank them by biological significance. This would automate some of the tedious cutting and patching of homologies and dyad symmetries that molecular biologists now do even after they have made comparisons with a computer. For calculations of the thermal stability of symmetric regions, it would reduce the total calculation time by orders of magnitude. In other words, we would use a comparison algorithm based more on biological intuition than on calculation to find the most profitable regions to which to apply the more quantitative methods of biophysics.

We would further hope to automate the construction of phylogenetic trees using these sequence comparison algorithms. Once quantitative measures of relatedness are obtained for all pairwise combinations, the matrix methods for generating the trees and the lengths of their branches are rather straightforward. These calculations are not likely to need any intelligent heuristics, since they are defined analytically and are rapid compared with the calculations involved in determining the relatedness of the sequences in the first place.
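As an illustration of the matrix step, the sketch below builds a rooted tree with UPGMA (unweighted pair-group averaging), one standard matrix method. The text does not say which method MOLGEN would use, so this choice, the tuple format for the tree, and the sample distances are assumptions. At each step the two closest clusters are joined, the new node is placed at half their distance, and distances to the merged cluster are size-weighted averages. UPGMA assumes a roughly constant rate of change along all branches, which real sequences may not obey.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple, Union

# A leaf is a name; an inner node is (left, left_branch, right, right_branch).
Tree = Union[str, Tuple["Tree", float, "Tree", float]]


def upgma(names: List[str], dist: Dict[Tuple[str, str], float]) -> Tree:
    """Join the closest clusters until one tree remains (UPGMA)."""
    def lookup(x: str, y: str) -> float:
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    # label -> (subtree, number of leaves, height of the subtree's root)
    clusters = {n: (n, 1, 0.0) for n in names}
    table: Dict[FrozenSet[str], float] = {
        frozenset(p): lookup(*p) for p in combinations(names, 2)
    }
    merges = 0
    while len(clusters) > 1:
        pair = min(table, key=table.get)
        x, y = sorted(pair)
        height = table[pair] / 2
        tx, sx, hx = clusters.pop(x)
        ty, sy, hy = clusters.pop(y)
        merges += 1
        label = f"#{merges}"  # internal label; assumes no leaf uses this form
        for z in clusters:
            # Size-weighted average of the old distances to z.
            table[frozenset((label, z))] = (
                sx * table[frozenset((x, z))] + sy * table[frozenset((y, z))]
            ) / (sx + sy)
        table = {k: v for k, v in table.items() if x not in k and y not in k}
        clusters[label] = ((tx, height - hx, ty, height - hy), sx + sy, height)
    tree, _, _ = next(iter(clusters.values()))
    return tree
```

With hypothetical distances A-B = 2, C-D = 4, and 8 between every other pair, the sketch returns `(("A", 1.0, "B", 1.0), 3.0, ("C", 2.0, "D", 2.0), 2.0)`: A and B join first, then C and D, and the two pairs meet at height 4.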
Miscellaneous Tools

Restriction Digest Analysis

One of the best examples of the usefulness of heuristics and production rules for problems in molecular biology is the GA1 program, developed in this project, for the analysis of restriction endonuclease digests. Determining restriction maps of even simple DNA structures from restriction enzyme digest data can require consideration of millions of possible structures. Heuristic methods simplify the analysis by orders of magnitude, allowing solutions to complex problems and even reducing the amount of data that must be collected to ensure a unique solution. These methods have even led to the proposal of a new experimental method for the analysis of restriction data.

GA1 is a program that determines all possible arrangements of restriction fragments based on restriction endonuclease digests with single, double, and triple combinations of enzymes. The program contains an intelligent hypothesis generator and a set of production rules that allow it to generate and evaluate hypothetical restriction maps consistent with all of the data. These rules dramatically reduce the total number of candidate structures that must be generated and evaluated.

Modern laboratory methods for determining restriction maps include end-labeling procedures and two-dimensional cross-hybridization procedures. To extend GA1 to this kind of data, we propose to let the user set up initial constraints on the locations of all restriction sites in certain local regions of the hypothetical restriction map. Such initial conditions (regional constraints) would be useful not only for entering data obtained from partial digestion of end-labeled DNA segments, but also when the complete nucleotide sequence is known for a particular region.
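A toy version of generate-and-test restriction mapping is sketched below for a linear DNA cut by two enzymes. It is deliberately naive and is not GA1: instead of an intelligent hypothesis generator and pruning rules, it enumerates every ordering of the single-digest fragments and keeps the maps whose predicted double digest matches the observed one. Fragment lengths are assumed to be exact integers, and the optional `fixed_a` argument is a hypothetical stand-in for the regional constraints proposed above: it pins known cut positions for the first enzyme, so inconsistent orderings are discarded before any double digest is predicted. Mirror-image maps of a linear molecule are both reported.

```python
from itertools import permutations
from typing import List, Optional, Set, Tuple

Map = Tuple[Tuple[int, ...], Tuple[int, ...]]  # (enzyme A sites, enzyme B sites)


def cut_sites(order: Tuple[int, ...]) -> List[int]:
    """Internal cut positions when fragments are laid end to end."""
    sites, pos = [], 0
    for length in order[:-1]:
        pos += length
        sites.append(pos)
    return sites


def digest(sites: List[int], total: int) -> List[int]:
    """Sorted fragment lengths produced by cutting at all given sites."""
    bounds = [0] + sorted(set(sites)) + [total]
    return sorted(right - left for left, right in zip(bounds, bounds[1:]))


def consistent_maps(a_frags: List[int], b_frags: List[int], ab_frags: List[int],
                    fixed_a: Optional[Set[int]] = None) -> List[Map]:
    """All two-enzyme maps consistent with single and double digests."""
    total = sum(a_frags)
    if not total == sum(b_frags) == sum(ab_frags):
        raise ValueError("digests must add up to the same DNA length")
    observed = sorted(ab_frags)
    maps = set()
    for a_order in set(permutations(a_frags)):  # set() drops duplicate orders
        a_sites = cut_sites(a_order)
        if fixed_a and not fixed_a.issubset(a_sites):
            continue  # violates the regional constraint; prune early
        for b_order in set(permutations(b_frags)):
            b_sites = cut_sites(b_order)
            if digest(a_sites + b_sites, total) == observed:
                maps.add((tuple(a_sites), tuple(b_sites)))
    return sorted(maps)
```

For a hypothetical 10-unit DNA with enzyme A fragments 3 and 7, enzyme B fragments 4 and 6, and a double digest of 3, 3, and 4, the sketch finds two mirror-image maps; pinning an A site at position 3 leaves only one. The number of orderings grows factorially with fragment count, which is exactly why GA1's hypothesis generator and production rules matter.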
Recombinant DNAs often contain regions of known sequence, since the nucleotide sequence of the vector is completely known. Another improvement to GA1 that would both simplify and extend its use would be to allow the user to describe the complete restriction map previously determined for a limited number of restriction enzymes, and then to enter digestion data for new enzymes, singly and in combination with the previously analyzed sites. These initial conditions would impose global constraints over the entire map. Global constraints will not be as readily implemented as the regional constraints described above.

If sufficient programming support is available, we would also like to apply the hypothesis-generation and production-rule pruning approach to the analysis of two-dimensional restriction data. In this method, radioactively labeled DNA segments generated from a DNA by one restriction enzyme are hybridized to nonradioactive fragments generated by a second restriction enzyme, indicating which pairs of fragments are homologous and hence overlapping. Currently the typical analysis is a data-driven approach: finding a continuous path through all the overlapping DNA fragments cataloged by this experimental procedure. A model-driven approach should extend this already powerful method. While the two-dimensional cross-hybridization method only generates maps for two enzymes at a time, maps can be generated from all pairwise combinations of any set of enzymes, by analogy with the standard one-dimensional method. Furthermore, by alternately labeling the fragments from either restriction enzyme and hybridizing them to unlabeled fragments from the other enzyme, in both directions, it should be possible to obtain enough data to overcome most of the mapping ambiguities that are usually the downfall of this method.
Using the model-driven approach with the cross-hybridization procedure will also allow the generation of restriction maps of much longer DNAs than is currently possible.

Synthesis of Specific Nucleic Acid Molecules

The MOLGEN knowledge base contains complete sequence information for all published and many unpublished nucleic acid molecules. It also contains knowledge about restriction endonucleases and their cutting sites, and about ligation methods for rejoining nucleic acid fragments. We see potential use for this knowledge in designing synthetic pathways for the in vitro production of specific target molecules. This may be considered part of the main experiment design effort, but the problem is important enough to make an independent specialized system desirable.

Currently, molecular biologists use three major methods to select specific sequences of interest from a recombinant DNA "library." The most widely used method uses isolated messenger RNA as a radiolabeled probe to detect complementary DNA sequences in the recombinant molecules. This requires prior isolation of the mRNA, which unfortunately is not always easily obtained. The second method, perhaps with the most long-term potential, selects clones through expression of the sought-for function in the host cell. Such an approach will necessarily be limited to genes that can be made to supplement or rescue host functions. The problems of expressing eukaryotic genes in prokaryotic hosts may never be solvable because of the gene-splicing dichotomy. The utility of eukaryotic host-vector systems is now established, but selection will still depend on prior creation of host mutants or on immunological colony (or plaque) screening techniques still to be developed. A third approach has been to use relatively short, chemically synthesized oligonucleotide segments that are complementary to the gene of interest.
The probe is used to select genomic clones of recombinants containing specific protein-coding sequences. In theory, if the amino acid sequence is known, appropriate probes can be constructed, but the techniques for chemical oligonucleotide synthesis are difficult and laborious.

We propose a different approach that exploits the recombinatorics of the computer-stored and computer-generated nucleotide sequences of all known DNA molecules. If the amino acid sequence of the protein whose gene is desired is known, a computer-assisted search through those sequences will attempt to locate oligonucleotides that could code for a short segment of that protein. By taking advantage of third-base degeneracy and knowledge of restriction endonuclease cutting and splicing, the system will suggest constructions of natural oligonucleotides. An intelligent algorithm might locate more than one or two short segments capable of forming molecular hybrids with the DNA sequences being sought, and these might be linked in a spaced-out manner to provide a more powerful probe.

B. Justification and requirements for continued SUMEX use

The MOLGEN project is dependent on the SUMEX facility. We have already developed several useful tools on the facility and are continuing research toward applying the methods of artificial intelligence to the field of molecular biology. The community of potential users is growing nearly exponentially as researchers from most of the biomedical fields become interested in the technology of recombinant DNA. We believe the MOLGEN work is already important to this growing community and will continue to be important; the evidence for this is the already large list of pilot exo-MOLGEN users on SUMEX.

SUMEX is currently meeting the research needs of the MOLGEN project adequately. We expect to need more file space as our knowledge bases grow, perhaps an additional 5000 disk blocks in the next few years for that work.
Our real difficulties will come in the applications testing of MOLGEN tools. We support with great enthusiasm the acquisition of satellite computers for technology transfer, and hope that the SUMEX staff will continue to develop and support these systems. One of the oft-mentioned problems of artificial intelligence research is precisely that of taking prototype systems and applying them to real problems. SUMEX gives the MOLGEN project a chance to overcome that problem and potentially to supply scientific computing resources to a national audience of biomedical research scientists.

MYCIN Project Section 9.1.5

9.1.5 MYCIN Project

Edward H. Shortliffe, M.D., Ph.D.
Department of Medicine
Stanford University Medical School

Bruce G. Buchanan, Ph.D.
Computer Science Department
Stanford University

I. Summary of Research Program

A. Project Rationale

The MYCIN Project is a set of subprojects, each devoted to developing knowledge-based expert systems for medicine and the allied sciences. The project retains the name of our first system, the MYCIN program, but has grown to involve five interrelated subprojects (MYCIN, EMYCIN, CENTAUR, GUIDON, and ONCOCIN), each of which is discussed in the sections below. Our first system, MYCIN, is an interactive consultation program that gives physicians antimicrobial therapy recommendations for patients with infectious diseases. The system must often decide whether and how to treat a patient before definitive laboratory results are available. It must recommend a therapeutic regimen that minimizes the risk of toxic side effects while covering for all organisms likely to be causing the infection. The relevant knowledge is stored in production rules, and the system currently has rules for treating bacteremias (blood infections) and meningitis.
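For readers unfamiliar with production rules, the sketch below shows the general shape of a MYCIN-style rule base and the backward chaining used to consult it: to establish a conclusion, the system looks for rules that conclude it and recursively evaluates their premises against the patient data. The parameters, values, and rules are hypothetical placeholders, not rules from MYCIN. The certainty arithmetic follows the certainty-factor scheme published for MYCIN in simplified form (a rule's strength scaled by its weakest premise, and positive evidence combined as CF1 + CF2(1 - CF1)); negative evidence and MYCIN's premise threshold are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Fact = Tuple[str, str]  # (clinical parameter, value); names are hypothetical


@dataclass
class Rule:
    premises: List[Fact]  # all must hold (a conjunction); never empty
    conclusion: Fact
    cf: float  # rule strength, 0 < cf <= 1


def combine(cf1: float, cf2: float) -> float:
    """Combine two positive certainty factors supporting the same fact."""
    return cf1 + cf2 * (1 - cf1)


def certainty(goal: Fact, rules: List[Rule], data: Dict[Fact, float],
              cache: Optional[Dict[Fact, float]] = None) -> float:
    """Backward chaining: how strongly the rules and data support `goal`."""
    if goal in data:
        return data[goal]  # observed directly, e.g. a laboratory result
    cache = {} if cache is None else cache
    if goal in cache:
        return cache[goal]
    cache[goal] = 0.0  # provisional value; stops infinite loops on cyclic rules
    total = 0.0
    for rule in rules:
        if rule.conclusion != goal:
            continue
        # A conjunction is only as certain as its weakest premise.
        premise_cf = min(certainty(p, rules, data, cache) for p in rule.premises)
        if premise_cf > 0:
            total = combine(total, rule.cf * premise_cf)
    cache[goal] = total
    return total


rules = [
    Rule([("stain", "gram-negative"), ("morphology", "rod")], ("organism", "X"), 0.6),
    Rule([("site", "blood")], ("organism", "X"), 0.5),
    Rule([("organism", "X")], ("therapy", "drug-Y"), 0.9),
]
data = {("stain", "gram-negative"): 1.0, ("morphology", "rod"): 0.8, ("site", "blood"): 1.0}
print(round(certainty(("therapy", "drug-Y"), rules, data), 3))  # prints 0.666
```

Here two rules independently support the hypothetical organism (0.48 and 0.5, combining to 0.74), and a third rule chains from that conclusion to a therapy suggestion. Keeping knowledge in separate rules like these is what lets a system explain which rules led to a recommendation.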
Early work has also been done on codifying knowledge about cystitis. The primary goal of the project has been to develop a program that can provide advice similar in quality to that given by a human infectious disease consultant. Formal evaluations of the program's recommendations for patients with bacteremia or meningitis have shown that this goal has been achieved. We have also sought to develop a system that is easy to use and acceptable to physicians. To accomplish this, numerous human engineering features have been incorporated into the consultation. There is also an extensive explanation facility that enables the system to explain its reasoning and justify its recommendations.

The success of the MYCIN program has led us to generalize and expand the methods employed in that program toward a number of ends: (1) to develop consultation systems for other domains (our generalized system-building tool, known as "Essential MYCIN" or EMYCIN, has been applied in several new areas); (2) to explore other uses of the knowledge base (our tutoring system, GUIDON, uses the infectious disease knowledge in MYCIN to teach medical students about the diagnosis and management of infections); (3) to continue to improve the interactive process, both for the developer of a knowledge-based system and for its user (both EMYCIN and our newest system, ONCOCIN, have stressed simplified techniques for interacting with a knowledge base and entering data); and (4) to experiment with using other knowledge representations in conjunction with the production rules used in MYCIN (our CENTAUR system is a modification of EMYCIN that uses prototypical descriptors of situations or disease states to guide and focus a consultative session).

B. Medical Relevance and Collaboration

The MYCIN program was designed to help alleviate the well-documented problem of antimicrobial misuse.
We felt that MYCIN would be clinically useful once it could handle all the major infections likely to be encountered in a hospital. Our success in developing a high-performance program for meningitis and bacteremia has been documented in two articles by Dr. Yu listed in the publications section below. However, the system is not ready for clinical use because it lacks rules for the other areas of infectious disease. A very large investment of time and human resources is required to develop, test, and formally evaluate a rule set for each major infection area. By using our EMYCIN system in a collaboration to build the PUFF program, however, we learned that it is possible to develop a clinically useful consultation system in a short period of time using the domain-independent parts of MYCIN. EMYCIN has since been applied in a number of additional medical domains, outlined below. Although EMYCIN was not used to build our new ONCOCIN program, the lessons learned in building earlier production rule systems have allowed us to create a large oncology protocol management system in only eight months. Furthermore, we expect ONCOCIN to be in use by Stanford oncologists before the end of 1980. Finally, there is a growing realization that medical knowledge originally codified for computer-based consultation may be used in additional, medically relevant ways. Using the knowledge to teach medical students is perhaps foremost among these, and GUIDON continues to focus on methods for augmenting clinical knowledge to facilitate its use in a tutorial setting.

C. Highlights of Research Progress

MYCIN

Because of the departure of Dr. Victor Yu, the infectious disease expert who worked with us until recently, it has not been possible to expand the rule set into new areas of infectious disease. The 500 rules relating to bacteremia and meningitis are sufficiently rich and complex, however, that they serve as a particularly challenging vehicle for testing the new computational methods we are developing. MYCIN is now entirely implemented as an EMYCIN system, so our active work on EMYCIN has been thoroughly tested using MYCIN and our extensive library of patient cases. Ongoing efforts to expand MYCIN or prepare it for clinical implementation, however, have been temporarily set aside to allow us to concentrate on the projects below.

EMYCIN

Much of the work in the past year has been devoted to improving EMYCIN's facilities for helping a system builder construct and debug the knowledge base of a consultation system. This has included extensive documentation of the concepts used in EMYCIN consultation systems, of the support programs for developing the knowledge base, and of the features of a working consultation system. A knowledge-base debugging package was developed to assist the system builder in testing, refining, and validating the knowledge base. This package includes:

1) the EMYCIN explanation facility;
2) a program that automatically explains how the system arrived at the results of a consultation;
3) a program that reviews each result of a consultation, letting the user judge whether the result is correct and helping the user refine the knowledge base to correct any errors noted in the result or in intermediate conclusions; and
4) a program that automatically compares the results of a consultation with stored "correct" results for the same case and explains any errors in the conclusions.

An additional development in the last year is the EMYCIN "rule compiler." Once a consultation program is built, it becomes important that it run efficiently. This is most noticeable in large programs such as MYCIN. Production rules, while convenient in their modularity, are not the best representation for speedy execution.
We have therefore developed, as part of EMYCIN, a rule compiler that transforms a program's production rules into a decision tree, eliminating the redundant computation inherent in a rule interpreter, and compiles the resulting tree into machine code. The program can thus use an efficient deductive mechanism for running the actual consultation, while the flexible rule format remains available for knowledge acquisition, explanation, and debugging.
Finally, an extensive EMYCIN user's manual has been drafted. It is intended for system builders who are creating a consultation system, not for the eventual users of that consultation system.
EMYCIN Applications
Several consultation systems have been written in EMYCIN. All but the most recent were developed in parallel with EMYCIN itself, and thus served to focus attention on features and shortcomings of the program and to guide its development. The brief descriptions below are intended to indicate the range of potential applications of EMYCIN.
PUFF
The PUFF system interprets measurements from the pulmonary function laboratory. The project is a collaboration among a pulmonary physiologist, biomedical engineers, and Stanford computer scientists with previous experience of the MYCIN program. The data from over 1090 cases were used to create some 60 rules for diagnosing the presence of pulmonary disease. These rules are used to produce a complete report that includes the input measurements, other patient data, and the interpretation of the measurements. PUFF is now a separate SUMEX project and is described in full elsewhere in this document.
HEADMED
The HEADMED program is an application of EMYCIN to clinical psychopharmacology. The system diagnoses a range of psychiatric disorders and can recommend drug treatment if indicated. Like PUFF, HEADMED is now a separate SUMEX project.
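The EMYCIN rule compiler described earlier in this section transforms production rules into a decision tree so that no premise is evaluated more often than necessary. The Python sketch below illustrates that transformation under assumptions of our own, not taken from EMYCIN. Every premise clause is an equality test on a parameter with a small, known set of values. Rules do not chain, so no rule's conclusion appears in another rule's premise. The tree is walked directly instead of being compiled into machine code. The names `Rule`, `compile_rules`, and `run`, the parameter-ordering heuristic, and the toy organism rules are all hypothetical illustrations, not EMYCIN's code or MYCIN's knowledge base.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple, Union

# Hypothetical rule format (an assumption, not EMYCIN's): each premise clause
# tests "parameter == value"; the conclusion asserts one (parameter, value) pair.
@dataclass(frozen=True)
class Rule:
    name: str
    premises: Tuple[Tuple[str, str], ...]
    conclusion: Tuple[str, str]

@dataclass
class Leaf:
    fired: List[Rule]  # rules whose premises all held on the path to this leaf

@dataclass
class Node:
    param: str  # the parameter tested at this node
    branches: Dict[str, "Union[Node, Leaf]"] = field(default_factory=dict)

Pending = List[Tuple[Rule, Dict[str, str]]]  # a rule plus its not-yet-tested clauses

def compile_rules(rules: List[Rule], domains: Dict[str, List[str]]) -> Union[Node, Leaf]:
    """Build a decision tree that tests each parameter at most once per path."""
    return _build([(r, dict(r.premises)) for r in rules], domains, [])

def _build(pending: Pending, domains: Dict[str, List[str]], fired: List[Rule]) -> Union[Node, Leaf]:
    # A rule with no untested clauses left is satisfied on this path.
    fired = fired + [r for r, rest in pending if not rest]
    pending = [(r, rest) for r, rest in pending if rest]
    if not pending:
        return Leaf(fired)
    # Heuristic: test the parameter named by the most pending rules first
    # (ties broken alphabetically, so the tree is deterministic).
    counts = Counter(p for _, rest in pending for p in rest)
    param = min(counts, key=lambda p: (-counts[p], p))
    node = Node(param)
    for value in domains[param]:
        branch: Pending = []
        for r, rest in pending:
            if param not in rest:
                branch.append((r, rest))  # rule does not mention this parameter
            elif rest[param] == value:
                # Clause satisfied: drop it so it is never tested again.
                branch.append((r, {p: v for p, v in rest.items() if p != param}))
            # Otherwise the clause fails and the rule is pruned from this branch.
        node.branches[value] = _build(branch, domains, fired)
    return node

def run(tree: Union[Node, Leaf], case: Dict[str, str]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Walk the tree for one case; return the conclusions and the parameters consulted."""
    asked: List[str] = []
    while isinstance(tree, Node):
        asked.append(tree.param)
        tree = tree.branches[case[tree.param]]
    return [r.conclusion for r in tree.fired], asked

if __name__ == "__main__":
    # Toy rules loosely in the style of MYCIN organism identification (illustrative, not clinical).
    domains = {"gram": ["pos", "neg"], "morph": ["rod", "coccus"], "aerobic": ["yes", "no"]}
    rules = [
        Rule("R1", (("gram", "neg"), ("morph", "rod"), ("aerobic", "no")), ("identity", "bacteroides")),
        Rule("R2", (("gram", "pos"), ("morph", "coccus")), ("identity", "staphylococcus")),
        Rule("R3", (("gram", "neg"), ("morph", "rod"), ("aerobic", "yes")), ("identity", "enterobacteriaceae")),
    ]
    tree = compile_rules(rules, domains)
    print(run(tree, {"gram": "pos", "morph": "coccus", "aerobic": "yes"}))
    # prints ([('identity', 'staphylococcus')], ['gram', 'morph'])
```

Along any path through the tree, each parameter is tested at most once, and a clause shared by several rules is evaluated once rather than once per rule. That repeated evaluation is the redundancy the compiler is meant to remove. In the example, aerobicity is never requested once the organism is known to be a gram-positive coccus, because no remaining rule depends on it.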
SACON
As a stronger test of domain independence, EMYCIN was applied to the completely non-medical domain of structural analysis. SACON (Structural Analysis CONsultation) advises structural engineers on the use of a large structural analysis program called Marc, which uses finite-element analysis techniques to simulate the mechanical behavior of objects. Engineers typically know what they want the Marc program to do, e.g., examine the behavior of a specific structure under expected loading conditions, but they do not know how the simulation program should be set up to do it. SACON therefore recommends an analysis strategy; this advice can then guide the Marc user in choosing specific input data, numerical methods, and material properties. For the limited domain of structural analysis problems initially selected, SACON's performance matches that of a human consultant. Bringing SACON to its present level of performance required about two man-months of the experts' time to analyze their task as consultants and formulate the knowledge base, and about the same amount of time to implement and test the rules.
CLOT
A recent application of EMYCIN is CLOT, a system designed to diagnose disorders of a patient's blood coagulation system. It requests clinical evidence regarding an episode of bleeding, facts from the patient's general medical history, and the results of a battery of coagulation screening tests. From these data CLOT infers the presence and type of coagulation defect (if any) in the patient and then proceeds to make a refined diagnosis of any particular enzymatic deficiency or platelet defect.
These diagnoses can be used by a physician to estimate the severity and cause of a particular episode of bleeding, to evaluate the effects of various anticoagulation therapies on a patient, or to estimate a patient's pre-operative risk of serious bleeding during surgery. CLOT was constructed by David Goldman, a medical student at the University of Missouri, with the help of James Bennett, a member of our Stanford group who is very familiar with EMYCIN. After approximately 10 hours of discussion about the contents of the knowledge base, they entered and debugged a preliminary knowledge base of some 60 rules in another 10 hours. CLOT is now an ongoing project at the University of Missouri.
GUIDON
Bill Clancey's thesis (August 1979) marked the completion of the first version of the program. Key results include: (1) A language was developed for representing teaching expertise in the form of "Discourse Procedures," sequences of rules that reflect dialogue patterns and are independent of the subject material to be taught. This representation was found to be suitable and convenient for incrementally developing a tutorial program. (2) Various teaching methods were demonstrated for carrying on a case-method dialogue with a student who is solving a complex diagnostic problem. Meta-knowledge about the representation of the subject material made it possible to express these capabilities in a domain-independent way. (3) The representation of subject material as modular production rules was studied and found wanting. Although rules conveniently separate relationships into readily accessible associations, an adequate knowledge base for teaching requires the addition of structural knowledge (clusters and patterns), support knowledge (underlying causal mechanisms), and strategical knowledge (managerial approaches).
Ongoing GUIDON research focuses on a number of issues:
The Student Model.
A revised student model has been designed to address the following questions:
(1) Can the student USE the program, i.e., is he able to enter recognizable input?
(2) Is the dialogue with the student COHERENT, i.e., are there recognizable patterns of student input and meaningful transitions between segments of behavior?
(3) Is the student PASSIVE OR ACTIVE, i.e., does he use his own knowledge to solve the problem, or does he rely on the tutor's initiative and ability to provide help?
(4) Does the student have a STRATEGY for solving the problem, i.e., is there some plan that organizes the student's data measurements and hypothesis selection?
Representation of Problem-Solving Strategies.
One of the few formalized methods for teaching diagnostic strategies to medical students is a printed outline of data to collect. This outline is woefully inadequate as a teaching tool: it does not in itself convey the meaning or logic of the diagnostic process. Informal experiments with physicians have enabled us to formalize an ideal model of medical diagnostic strategy appropriate to our present domain of investigation (infectious meningitis). Work is under way to incorporate this model in MYCIN so that it "thinks like a clinician" and can thus be used to teach not only diagnostic rules but also human-usable methods for applying them. Some surprising findings of this investigation include the following: (1) The hypothesis space is established by considering causal links that might be enabled in this patient (called "risk factors"). This can be viewed as a process of determining the topology of the problem, that is, the causal connections that may have a bearing on the disorder. (2) "Dropping back" is important to human problem solvers. In fact, hypothesis formation as we have observed it might be described as a process of maintaining a sense of the differential.
Focusing and delving deeper are only temporary phenomena. Acquisition of this strategical knowledge was greatly helped by analyzing protocols according to the structure/support/strategy framework we have established; this is one of the "knowledge engineering" results of our research.
CENTAUR
During the past year we have completed an implementation of PUFF using the augmented EMYCIN system known as CENTAUR. In this work, largely the effort of Jan Aikins, we have sought to strengthen EMYCIN's pure production rule representation with the additional focusing power provided by hypothesis "frames," or prototypes. CENTAUR now includes 24 prototypes and about 160 rules dealing with pulmonary disease. The system was tested on 100 cases from the files at Pacific Medical Center, where it agreed with the diagnoses of pulmonary disease made by two pulmonary physiologists 84 and 91 percent of the time, respectively. (This was an improvement over PUFF, which agreed with the two physiologists 74 and 85 percent of the time.)
Basic AI research issues were also explored, such as the representation of control knowledge for computer consultations and the explicit representation of the context in which knowledge is applied. In addition, the MYCIN explanation facility was expanded to explain control processes and to give explanations of the prototypes as well as the rules. Current CENTAUR research concentrates on polishing and fine-tuning the PUFF implementation described above. Additional studies are contemplated to better define the precise reasons why CENTAUR performed more accurately than PUFF on these 100 cases. One expert collaborator, Dr. R. Fallat, feels that PUFF performed less well because of the significant difficulties he has had in adding more rules while keeping the knowledge base consistent.
This was less difficult with the CENTAUR representation scheme. Other research drawing on the CENTAUR work will include the creation of additional application systems using the CENTAUR prototype representation mechanism. One challenge will be to interface CENTAUR with the "context tree" provided in EMYCIN, a problem that was not addressed in PUFF because PUFF uses only a single context.
ONCOCIN
The oncology protocol management system, named ONCOCIN after its domain of expertise and its historical debt to the MYCIN program, has achieved many of its early goals since work on the project began in July 1979. We are developing an interactive system to be used by oncology faculty and fellows in the Debbie Probst Oncology Day Care Center at Stanford University Medical Center. Our overall goals are: (1) to demonstrate that a rule-based consultation system with explanation capabilities can be usefully applied and gain acceptance in a busy clinical environment; (2) to improve the tools currently available, and to develop new tools, for building knowledge-based expert systems for medical consultation; and (3) to establish both an effective relationship with a specific group of physicians and a scientific foundation that will together facilitate future research on, and implementation of, computer-based tools for clinical decision making.
The ONCOCIN research goals are directed both toward the basic science of artificial intelligence and toward the development of clinically useful oncology consultation tools. We have undertaken AI research with the following aims: (1) to implement and evaluate recently developed techniques designed to make computer technology more natural and acceptable to physicians; (2) to extend the methods of rule-based consultation systems to interact with a large database of clinical information; and (3) to continue basic research into the following problem areas: mechanisms for handling time relationships, techniques for quantifying uncertainty and interfacing such measures with a production rule methodology, approaches to acquiring knowledge interactively from clinical experts, and assessment of knowledge base completeness and consistency.
Our parallel clinical goal is to develop and implement a protocol management system for use in the oncology day care center, with the following capabilities: (1) to assist with identifying current protocols that may apply to a given patient; (2) to assist with determining a patient's eligibility for a given protocol; (3) to provide detailed information on protocols in response to questions from clinic personnel; (4) to assist with chemotherapy dose selection and attenuation for a given patient; (5) to provide reminders, at appropriate intervals, of the follow-up tests and films required by the protocol in which a given patient is enrolled; and (6) to reason about managing current patients in light of stored data from previous visits of (a) the individual patient or (b) the aggregate of all similar patients.
During the first year of our research, our aim has been to develop a prototype of the ONCOCIN consultation system, drawing on the programs and capabilities of EMYCIN. We have also carefully analyzed the day-to-day activities of the Stanford oncology clinic in order to determine how to introduce ONCOCIN with minimal disruption of an operation that is already running smoothly. Finally, we have spent much of our time considering the most appropriate mode of interaction with physicians, in order to maximize the chances that ONCOCIN will become a useful and accepted tool in this specialized clinical environment.
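Among the ONCOCIN research areas listed above is quantifying uncertainty and interfacing such measures with a production rule methodology. As a point of reference, the Python sketch below shows the certainty-factor (CF) arithmetic commonly described for MYCIN and EMYCIN. A conjunctive premise takes the minimum CF of its clauses, and a rule fires only when that value exceeds 0.2. The rule's own CF is then scaled by the premise CF, and evidence from several rules bearing on the same conclusion is merged with a combining function. This is not a description of ONCOCIN's method, which that research has yet to settle; the function names and the example rule CFs are hypothetical.

```python
from typing import Iterable, Sequence

def premise_cf(clause_cfs: Sequence[float]) -> float:
    """CF of a conjunctive premise: the weakest clause determines it."""
    return min(clause_cfs)

def apply_rule(clause_cfs: Sequence[float], rule_cf: float, threshold: float = 0.2) -> float:
    """CF contributed by one rule; the rule does not fire unless its premise CF exceeds the threshold."""
    p = premise_cf(clause_cfs)
    return rule_cf * p if p > threshold else 0.0

def combine(x: float, y: float) -> float:
    """Combine two CFs bearing on the same conclusion (EMYCIN-style combining function)."""
    for cf in (x, y):
        if not -1.0 <= cf <= 1.0:
            raise ValueError(f"certainty factor out of range: {cf}")
    if x >= 0 and y >= 0:
        return x + y * (1 - x)        # two confirming pieces of evidence
    if x < 0 and y < 0:
        return x + y * (1 + x)        # two disconfirming pieces of evidence
    denom = 1 - min(abs(x), abs(y))   # conflicting evidence
    if denom == 0:
        raise ValueError("cannot combine +1 with -1")
    return (x + y) / denom

def combine_all(cfs: Iterable[float]) -> float:
    """Fold the combining function over the CFs reported by several rules."""
    total = 0.0
    for cf in cfs:
        total = combine(total, cf)
    return total

if __name__ == "__main__":
    # Two hypothetical rules concluding the same hypothesis.
    a = apply_rule([0.8, 0.9], rule_cf=0.6)  # 0.6 * 0.8 = 0.48
    b = apply_rule([1.0], rule_cf=0.4)       # 0.4 * 1.0 = 0.4
    print(round(combine_all([a, b]), 3))     # prints 0.688
```

In the example, the two rules contribute 0.48 and 0.4. Their combined CF of 0.688 (0.48 + 0.4 × 0.52) exceeds either contribution alone but stays below 1, as the combining function guarantees for confirming evidence.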
We chose the series of protocols for Hodgkin's and non-Hodgkin's lymphoma as the first detailed knowledge to be encoded in the ONCOCIN system. These were selected because they were developed at Stanford, because they are among our most commonly used protocols given our position as a major lymphoma treatment center, and because they are complicated, with many subtle details depending on the stage of disease, concomitant or preceding radiotherapy, and evidence of drug toxicity.
Although the program will eventually be used on a high-speed terminal with a specially designed interface (see below), we decided that the initial prototype should be a self-contained consultation system modeled on the form of interaction used in EMYCIN consultation systems. We chose not to use EMYCIN itself to build the system, however, because we quickly encountered several special needs that were better handled by alternative representation and control schemes. Therefore, although we have been able to borrow portions of the EMYCIN code, ONCOCIN is an entirely new program in which production rules are only one of several types of knowledge representation used.
Both our own experience and evidence in the medical computing literature suggest that physicians are unlikely to use consultation systems that fail to fit smoothly into their normal daily routine. With this in mind, we have carefully studied the current organization and flow of information within Stanford's oncology clinic. A detailed document has been prepared that describes the current clinic organization and the ways in which our system will interact with the current routine.
Two principal concerns have been: (1) that ONCOCIN should initially have minimal impact on the current daily routine: record-keeping systems should not be altered, patient flow within the clinic should be unchanged, and the physicians working there should not be forced to depend on an operational computer system in order to get their work done; and (2) that using ONCOCIN should not take any EXTRA effort on the physicians' part (other than the initial time required to learn how to use it), which implies that ONCOCIN should replace some task the physicians currently perform. At present, the clinic physicians are asked to fill out, by hand, the time-oriented flowsheets kept in the patients' clinic records. These sheets are the basis for data analysis in all the clinical research based on chemotherapy protocols in the oncology clinic, and all the information needed by ONCOCIN is entered on them. We therefore intend to capture the data needed for an ONCOCIN consultation by having the physician fill out the flowsheet at a computer terminal rather than by hand.
The actual mechanics of terminal interaction are as important to a clinical system's acceptance as the quality of the program's advice: if a system is slow or cumbersome, physicians will tend to reject it. With this in mind, we have sought the most effective interaction mechanism we can devise without unreasonably taxing the project's budget. First, we have decided to use high-speed CRT terminals (approximately 9600 baud) with auxiliary hard-copy devices. This will permit almost instantaneous screen filling and allow greater flexibility in the design of what is actually displayed. However, a program written in a powerful but slow language like INTERLISP cannot service a high-speed terminal adequately.
For this reason, our interface program will be written in a faster compiled language (we are using PASCAL), and this program will in turn need to communicate with the INTERLISP reasoning program that makes up the rest of ONCOCIN. The design of this interprogram interaction is largely complete, but its implementation is just beginning.
Second, we want to minimize typing by the physician. EMYCIN systems have required a typewriter-compatible keyboard, but we do not feel this is reasonable if ONCOCIN is to be used daily by a large number of oncologists. We initially examined light-pen and touch-screen technologies, but concluded that they are either too expensive or too unreliable. Ultimately, working closely with experts in human factors, we developed a customized 21-character keypad, which has been interfaced with a Datamedia terminal similar to those we have used for other development work. The physician can use this keypad to fill out the patient's flowsheet (which will be displayed on the screen at high speed), and there should be little if any need to use the terminal keyboard itself.
Finally, we want to maintain the explanation and justification capabilities that we have argued are crucial to the acceptance of clinical consultation systems. A specialized split-screen display has been designed that will let the physician enter patient data in one region while pertinent explanations are displayed in another.
D. Publications Since January 1979
Kunz, J.C., Fallat, R.J., McClung, D.H., Votteri, B.A., Aikins, J.S., Nii, H.P., Fagan, L.M., Feigenbaum, E.A. Physiological rule-based system for interpreting pulmonary function test results. Memo HPP-78-154, Stanford Heuristic Programming Project, 1978. Also Proceedings of Computers in Critical Care and Pulmonary Medicine, IEEE Press, 1979.
Yu, V.L., Buchanan, B.G., Shortliffe, E.H., Wraith, S.M., Davis, R., Scott, A.C., Cohen, S.N. Evaluating the performance of a computer-based consultant. Comput. Prog. Biomed. 9:95-102 (1979).
Clancey, W.J. Tutoring rules for guiding a case method dialogue. Int. J. of Man-Machine Studies 11:25-49 (1979).
Clancey, W.J. Dialogue management for rule-based tutorials. Proceedings of the 6th Intl. Joint Conf. on Artificial Intelligence, pp. 155-161, August 1979.
Aikins, J.S. Prototypes and production rules: an approach to knowledge representation for hypothesis formation. Proceedings of the 6th Intl. Joint Conf. on Artificial Intelligence, Tokyo, Japan, August 1979.
Fagan, L.M., Kunz, J.C., Feigenbaum, E.A., Osborn, J.J. Representation of dynamic clinical knowledge: measurement interpretation in the intensive care unit. Proceedings of the 6th Intl. Joint Conf. on Artificial Intelligence, Tokyo, Japan, August 1979.
van Melle, W. A domain-independent production-rule system for consultation programs. Proceedings of the 6th Intl. Joint Conf. on Artificial Intelligence, August 1979.
Shortliffe, E.H., Buchanan, B.G., and Feigenbaum, E.A. Knowledge engineering for medical decision making: a review of computer-based clinical decision aids. Proceedings of the IEEE 67:1207-1224 (1979).
Yu, V.L., Fagan, L.M., Wraith, S.M., Clancey, W.J., Scott, A.C., Hannigan, J.F., Blum, R.L., Buchanan, B.G., Cohen, S.N. Antimicrobial selection by a computer: a blinded evaluation by infectious disease experts. J. Amer. Med. Assoc. 242:1279-1282 (1979).
Shortliffe, E.H. Medical consultation systems: designing for doctors. To appear in Communication With Computers (M. Sime and M. Fitter, eds.), London: Academic Press, 1980.
Shortliffe, E.H. The computer as clinical consultant (editorial). Arch. Int. Med. 140:313-314 (1980).
Fagan, L.M., Shortliffe, E.H., and Buchanan, B.G. Computer-based medical decision making: from MYCIN to VM. Automedica, March 1980 (in press).
Shortliffe, E.H. Clinical knowledge engineering: the MYCIN Project. Proceedings of the First Japanese Conference on Artificial Intelligence in Medicine, pp. 1-8, Tokyo, Japan, August 1979.
Clancey, W.J. Transfer of Rule-Based Expertise through a Tutorial Dialogue. Computer Science Doctoral Dissertation, Stanford University, August 1979.
Shortliffe, E.H., Buchanan, B.G., and Feigenbaum, E.A. Knowledge engineering for infectious disease therapy selection. Proceedings of the Intl. Conf. on Cybernetics and Society, Denver, Colorado, October 1979.
Clancey, W.J., Shortliffe, E.H., and Buchanan, B.G. Intelligent computer-aided instruction for medical diagnosis. Proceedings of the Third Annual Symposium on Computer Applications in Medical Care, Silver Spring, Maryland, October 1979.
Fagan, L.M., Kunz, J.C., and Feigenbaum, E.A. Representation of dynamic clinical knowledge: measurement interpretation in the intensive care unit. Proceedings of the Third Annual Symposium on Computer Applications in Medical Care, Silver Spring, Maryland, October 1979.
Bennett, S.W., and Scott, A.C. Computer-assisted customized antimicrobial dosages. Amer. J. Hosp. Pharm. 37:523-529 (1980).
Shortliffe, E.H. Consultation systems for physicians: the role of artificial intelligence techniques (invited paper). Proceedings of the 3rd Annual Meeting of the Canadian Society for the Computer Simulation of Intelligence, Victoria, British Columbia, May 1980.
E. Funding Support
Grant Title: "Research Program: Biomedical Knowledge Representation"
Principal Investigator: Edward A. Feigenbaum
Co-Principal Investigator (ONCOCIN Project): Edward H. Shortliffe
Agency: National Library of Medicine
ID Number: 1 P01 LM 03395
Term: July 1979 to June 1984
Total award: $497,420
Current award (1979-1980): $99,484
Grant Title: "Knowledge-Based Consultation Systems"
Principal Investigator: Bruce G. Buchanan
Agency: National Science Foundation
ID Number: MCS-7903753
Term: July 1979 to June 1980 (plus 6 months)
Total award: $146,152
Current award (1979-1980): $73,659
Contract Title: "Exploration of Tutoring and Problem-Solving Strategies"
Principal Investigator: Bruce G. Buchanan
Agency: Office of Naval Research and Advanced Research Projects Agency (joint)
ID Number: N00014-79-C-0302
Term: March 1979 to March 1982
Total award: $396,326
Grant Title: "Symbolic Computation Methods For Clinical Reasoning" (RCDA)
Principal Investigator: Edward H. Shortliffe
Agency: National Library of Medicine
ID Number: NIH 1K04 LM00048
Term: July 1979 to June 1984
Total award: Dollar amount negotiated annually
Current award (1979-1980): $39,285
Grant Title: "Explanatory Patterns In Clinical Medicine"
Principal Investigator: Edward H. Shortliffe
Agency: Kaiser Family Foundation
Term: July 1979 to December 1980
Total award: $20,000
II. Interaction With the SUMEX-AIM Resource
A. Medical Collaborations and Program Dissemination Via SUMEX
Both MYCIN and EMYCIN have attracted a great deal of interest from the medical and academic communities. For two years in succession we have been invited by the American College of Physicians to demonstrate MYCIN at the organization's annual meeting (San Francisco, March 1979, and New Orleans, April 1980). The physicians have uniformly been enthusiastic about the program's potential and about what it reveals of one current approach to computer-based medical decision making. In both cases, the demonstrations were performed on-line using network access to the SUMEX computer. There has also been significant and growing interest in medical AI and MYCIN among colleagues in Japan. We were asked to demonstrate MYCIN from Tokyo during the 6th International Joint Conference on Artificial Intelligence, held in August 1979.
Access to SUMEX via a trans-Pacific TYMNET link worked very well and permitted large numbers of Japanese and other conference attendees to observe MYCIN demonstrations and experiment with the program themselves. Then, for three weeks in November 1979, Dr. Shortliffe returned to Japan as a visitor at the Tokyo Metropolitan Institute of Medical Sciences. This visit permitted an intensive period of exchange regarding MYCIN, EMYCIN, and related work being done in Japan.
Several teachers have also asked to use MYCIN in their computer science or medical computing courses. For example, Prof. Carl Page of Michigan State University, Dr. Peter Szolovits of MIT, and Dr. Steven Zucker of McGill University in Montreal have demonstrated the MYCIN program in their university classes. Dr. Harold Goldberger of MIT made extensive use of the MYCIN program in his study of medical AI programs. Dr. Ves Morinov of the Norwegian Computing Center has used the MYCIN program to demonstrate the benefits of a rule-based representation for consultation systems. Dr. Martin Epstein used MYCIN as one of the representative systems he demonstrated to students in the clinical elective on medical computing at the NIH during the summer of 1979. GUEST users who have recently requested access to MYCIN have come from such diverse locations around the country as the Brain Research Institute (UCLA), the University of Texas, Stevens Institute of Technology, the University of New Mexico, Columbia University, the Systems Science Institute (Louisville), the Naval Postgraduate School (Monterey, Ca.), Texas Woman's University, IBM Scientific Labs, and Alta Bates Hospital (Oakland, Ca.).
EMYCIN has also generated a great deal of interest in the academic and business communities. We have been in frequent contact with Bud Frawley and Philippe Lacour-Gayet of Schlumberger, Chuck Brodnax and Milt Waxman of the Hughes Aircraft Corporation, and Harry Reinstein of the IBM Scientific Research Center.
Two students at the Naval Postgraduate School in Monterey, working under the direction of Colonel Ronald J. Roland, have been developing an EMYCIN system for selecting decision aids to solve problems in business organizations. The CLOT system mentioned earlier was a joint effort involving members of our group, with the idea and domain expertise coming from members of Don Lindberg's group at the University of Missouri. At the University of Illinois, students working under Donald Michie and Alan Levy have used EMYCIN in two ways: one group developed a new EMYCIN application in tax advising, and the other developed a PASCAL implementation of the ideas used in EMYCIN. The latter program is now being used experimentally in an application involving emergency responses on offshore drilling rigs. Finally, David Stodolsky at the Systems Science Institute at the University of Louisville has begun to experiment with EMYCIN in an application involving the psychology of interactions in large-group conferencing.
B. Sharing and Interaction with Other SUMEX-AIM Projects
We have continued our collaboration with the EMYCIN-based projects RX, HEADMED, and PUFF. Our development of a domain-independent system is facilitated by having a number of very different working systems on which to test our additions and modifications to EMYCIN, and all of these projects have provided us with useful comments and suggestions. We have also interacted with members of the SECS project on SUMEX, who have considered developing a question-answering system for SECS similar to the one in MYCIN.
The community created on the SUMEX resource has other benefits that go beyond actual shared computing.
Because we are able to experiment with other developing systems, such as INTERNIST, and because we frequently interact with other workers (at the AIM Workshop or at other meetings around the country), many of us have found the scientific exchange and stimulation to be heightened. Several of us have visited workers at other sites, sometimes for extended periods, to pursue issues that arose through SUMEX- or Workshop-based interactions. In this regard, the ability to exchange messages with other workers, both on SUMEX and at other sites, has been crucial to the rapid and efficient exchange of ideas. For example, most of the invitations and planning for the 6th AIM Workshop, to be held at Stanford in August 1980, have been handled via SUMEX or ARPANET mail. It is certainly unusual for a small community of researchers with similar scholarly interests to have such powerful and efficient communication mechanisms at their disposal, even between members on opposite coasts of the country.
C. Critique of Resource Management
The SUMEX facility has maintained the high standards that we have praised in the past. The staff members are always helpful and friendly, and work as hard to please the SUMEX community as to please themselves. As a result, the computer is as accessible and easy to use as they can make it. More importantly, it is a reliable and convenient research tool. We extend special thanks to Tom Rindfleisch for maintaining high professional standards in all aspects of the facility.
Because of the introduction of our ONCOCIN work, with its special hardware and communication needs, we are aware that we have taxed the limited resources of SUMEX with regard to technical hardware support. It has been next to impossible for one technical specialist (Nick Veizades) to balance the numerous diverse demands on his time. This is not a problem with the management of the Resource, but a reflection of the need for additional technical personnel associated with SUMEX.
We perceive this to be a particularly important requirement in the future if the Resource undertakes an expanded role in the implementation and testing of new hardware. Special mention should be made of the remarkable role played by Tom Rindfleisch and his staff in helping to organize remote demonstrations of MYCIN and INTERNIST. In March 1979, when the American College of Physicians met in San Francisco, they rented a truck and drove to the City Privileged Communication 199 E. A. Feigenbaum MYCIN Project Section 9.1.5 with terminals and monitors. The installation they arranged worked well and provided a superb demonstration environment for the physicians who attended. In New Orleans in 1980, the greater distance prevented us from installing the equipment ourselves. SUMEX nevertheless kindly offered to help orchestrate the New Orleans arrangements, and literally hours were spent locating terminals, arranging for telephone hookups, and finding the right kind of slave monitors. We salute SUMEX for their uncomplaining assistance in this regard, but would also like to note the need for a less ad hoc mechanism for demonstrating SUMEX systems from remote locations. Finally, we continue to feel the need for more computing power. Most of our research and development takes place between 7 p.m. and 10 a.m., but it is unreasonable to expect all our collaborators to adjust their schedules around a computer. The 20/20 has been helpful in permitting demonstrations with good response time, and it will also allow us to introduce ONCOCIN in a real clinical environment within the next several months, but ongoing R&D on the main machine remains difficult much of the time. Even the evening hours now see higher load averages than was once the case.

III. Research Plans (8/80-7/886)

A. Project Goals and Plans

EMYCIN

Our current plans call for four principal efforts related to EMYCIN.
First, the knowledge acquisition component of the program, derived from the TEIRESIAS work of Davis, is being modified and expanded. Our concerns relate both to the inefficiency and to the limited power of the current capabilities. The meetings during which the CLOT knowledge base was developed were recorded on tape, and these recordings form the basis of an analysis of the knowledge acquisition process. Some early work implementing the ideas derived from those tapes is already under way. We are also planning to prepare EMYCIN for "export" during the coming year. This will involve tightening up the code, improving its efficiency in space and time, and improving the system's documentation. We do not intend to recode EMYCIN in a language other than INTERLISP, but we do want to make it a stand-alone system that can be used for system building in a number of LISP environments. A key element of the documentation will be a clearer definition of the environments in which EMYCIN can be applied most effectively. Now that the design and capabilities of EMYCIN are essentially fixed, we are also planning to develop a new application. Other EMYCIN systems have been developed in parallel with EMYCIN itself and have therefore affected the program's design, but it is now appropriate to see how effectively a new system can be built within the current system