GUIDON/NEOMYCIN Project

D. Requirements for Additional Computing Resources

With the addition of two new D-machines for this work, our computing needs will be adequately met for at least the coming 1-2 years. The D-machine's large address space permits development of the large programs that complex computer-aided instruction requires. Graphics enable us to develop new methods for presenting material to naive users. We also plan to use the D-machine as a reliable, constant "load-average" machine for running experiments with physicians and students. The development of GUIDON2 on the D-machine will demonstrate the feasibility of running intelligent consultation or tutoring systems on small, affordable machines in physicians' offices, schools, and other remote sites.

E. Recommendations for Future Community and Resource Development

As we shift our development of systems to personal Lisp machines, such as the Dolphin, it becomes more difficult to access these programs remotely, both from our homes (so that we may work conveniently during the evenings and weekends) and from remote sites for collaboration and demonstration. This problem will be partly ameliorated by "dial-up" (modem) access to these machines, but bitmapped displays require high bandwidth, which makes the phone lines inadequate for our purposes. Further technological development of networks, probably involving access over cables, will be necessary.

As computer resources become more distributed, the need for a central machine does not diminish. Programs and knowledge bases continue to be shared, requiring high-speed network connections among computers and file servers. SUMEX-AIM's role will shift slightly over the next few years to accommodate these needs, but its identity as a central resource will change only in kind, not in importance. Moreover, sophisticated printing devices, such as the Xerox RAVEN, must necessarily be shared, again using a network.
Maintenance of this network and its shared devices will become a key activity for the SUMEX staff. Thus, while computing resources will be provided by the "outboard engines" of personal machines, the community will remain intricately linked and dependent on common, but peripheral, resources. From this perspective, future resource development should focus on improving the capabilities of networks, file servers, and attached devices to respond to individual requests. Multi-processing becomes a necessity in such an environment, so that a request can be honored while the user returns to programming or editing.

Privileged Communication 201 E. H. Shortliffe

6.1.2. MOLGEN Project

MOLGEN - Applications of Artificial Intelligence to Molecular Biology: Research in Theory Formation, Testing, and Modification

Prof. E. Feigenbaum and Dr. P. Friedland
Department of Computer Science
Stanford University

Prof. Charles Yanofsky
Department of Biology
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

A. Project Rationale

The MOLGEN project has focused on research into the applications of symbolic computation and inference to the field of molecular biology. This has taken the specific form of systems that assist the experimental scientist in various tasks, the most important of which have been the design of complex experiment plans and the analysis of nucleic acid sequences. Our current research concentrates on scientific discovery within the subdomain of regulatory genetics. We want to explore the methodologies scientists use to modify, extend, and test theories of genetic regulation, and then to emulate that process within a computational system.

Theory or model formation is a fundamental part of scientific research. Scientists both use and form such models dynamically. Models are used to predict results (and therefore to suggest experiments to test the model) and also to explain experimental results.
Models are extended and revised both as a result of logical conclusions from existing premises and as a result of new experimental evidence.

Theory formation is a difficult cognitive task, and one in which there is substantial scope for intelligent computational assistance. Our research is directed toward building a system that can form theories to explain experimental evidence, can interact with a scientist to help suggest experiments that discriminate among competing hypotheses, and can then revise and extend the growing model based upon the results of the experiments.

The MOLGEN project has continuing computer science goals of exploring issues of knowledge representation, problem-solving, discovery, and planning within a real and complex domain. The project operates in a framework of collaboration between the Heuristic Programming Project (HPP) in the Computer Science Department and various domain experts in the departments of Biochemistry, Medicine, and Biology. It draws from the experience of several other projects in the HPP which deal with applications of artificial intelligence to medicine, organic chemistry, and engineering.

B. Medical Relevance and Collaboration

The field of molecular biology is nearing the point where the results of current research will have immediate and important application to the pharmaceutical and chemical industries. Already, clinical testing has begun with synthetic interferon and human growth hormone produced by recombinant DNA technology. Governmental reports estimate that more than 200 new and established industrial firms are already undertaking product development using these new genetic tools.

The programs being developed in the MOLGEN project have already proven useful and important to a considerable number of molecular biologists. Currently several dozen researchers in various laboratories at Stanford (Prof. Paul Berg's, Prof. Stanley Cohen's, Prof.
Laurence Kedes', Prof. Douglas Brutlag's, Prof. Henry Kaplan's, and Prof. Douglas Wallace's) and over 400 others throughout the country have used MOLGEN programs over the SUMEX-AIM facility. We have exported some of our programs to users outside the range of our computer network (University of Geneva [Switzerland], Imperial Cancer Research Fund [England], and European Molecular Biology Institute [Heidelberg] are examples). The pioneering work on SUMEX has led to the establishment of a separate NIH-supported facility, BIONET, to serve the academic molecular biology research community with MOLGEN-like software. BIONET is now serving many of the computational needs of over 1000 academic molecular biologists in the United States.

C. Highlights of Research Progress

C.1 Accomplishments

The current year has seen the completion of our initial study of the Yanofsky project on genetic regulation in the trp operon. In addition, we have tested several models of qualitative simulation of biological systems and begun our design of a theory discovery system. Finally, a new application program for DNA sequence analysis was developed by one of our research collaborators. The highlights of this work are summarized in several categories below.

C.1.1 The Scientific Process of Theory Formation, Modification, and Testing

The first goal of our work in scientific theory discovery was to study an existing example of the process extensively. Professor Charles Yanofsky's work in elucidating the structure and function of regulation in the trp operon of E. coli provided us with an excellent subject that spanned twelve years of research, dozens of collaborators, and almost one hundred research papers. We have conducted extensive interviews with Professor Yanofsky and many of his former students and collaborators. We have examined most of the relevant research papers.
We believe we now have a good understanding of the three major classes of knowledge that were important in the discovery of the theory of regulation in the trp operon: knowledge about the relevant biological objects, knowledge about the techniques used to elicit new information, and discovery heuristics used to build new models. In addition, we have developed an initial model for the inference mechanisms used during the discovery process. This model includes at least four different types of reasoning: data-driven, theory-driven, analogy to closely related biological systems, and analogy to other systems (railroad engines and tracks, for example).

C.1.2 Knowledge-Based Simulation of the Trp Operon

The first major programming task of our project was to build a knowledge base representing the initial state of knowledge about the tryptophan operon system at the beginning of the Yanofsky research. This initial knowledge base contains information relevant to genetic regulation in general and to the trp operon system in particular. The information relates both to structure, i.e., the physical characteristics of the biological objects, and to function, i.e., the operational characteristics of the biological objects. In addition, the procedural knowledge needed to relate structure to function plays an important part in the knowledge base.

The goal was to have a knowledge base that can be used "actively" to simulate the result of various possible changes in the underlying regulatory model. For example, a common experimental method for studying a biological system is to introduce a mutation which destroys the functionality of some piece of the system. The regulatory knowledge base should be able to simulate and describe the results of such a "deletion mutation." As a first experiment, we built the knowledge base using the Unit System (developed under previous MOLGEN work).
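The idea of using a knowledge base "actively" to simulate a deletion mutation can be illustrated with a very small sketch. This is not the Unit System representation or the MOLGEN knowledge base; the component names (`promoter`, `operator`, `repressor_gene`), the functions `transcribes` and `phenotype`, and the single hard-coded repression rule are all assumptions made for illustration, and the biology is reduced to a textbook caricature of Jacob-Monod repression.

```python
# Hypothetical component names; not taken from the MOLGEN knowledge base.
COMPONENTS = frozenset({"promoter", "operator", "repressor_gene"})


def transcribes(deleted=frozenset(), tryptophan=False):
    """Return True if the structural genes are transcribed.

    Toy Jacob-Monod rule: transcription is blocked only when a functional
    repressor, activated by tryptophan, can bind an intact operator.
    """
    deleted = frozenset(deleted)
    unknown = deleted - COMPONENTS
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    if "promoter" in deleted:
        return False  # no site left for the polymerase to start from
    repressor_active = "repressor_gene" not in deleted and tryptophan
    blocked = repressor_active and "operator" not in deleted
    return not blocked


def phenotype(deleted=frozenset()):
    """Describe a deletion mutant by simulating both tryptophan levels."""
    with_trp = transcribes(deleted, tryptophan=True)
    without_trp = transcribes(deleted, tryptophan=False)
    if with_trp and without_trp:
        return "constitutive"
    if not with_trp and not without_trp:
        return "silent"
    return "repressible"


# Simulate the wild type and three single-deletion mutants.
for mutant in (set(), {"repressor_gene"}, {"operator"}, {"promoter"}):
    label = ", ".join(sorted(mutant)) or "wild type"
    print(f"{label}: {phenotype(mutant)}")
```

In this sketch, deleting either the repressor gene or the operator yields a "constitutive" description, while the wild type is reported as "repressible". A real knowledge base would derive such behavior from structural and procedural knowledge rather than from one fixed rule.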
We were able to successfully model most of the important processes of Jacob-Monod repression, the initial model of genetic regulation used in the Yanofsky research.

C.1.3 A Model for Theory Discovery

In parallel with our work on knowledge base construction, we designed an initial architecture for theory proposal, extension, and correction. In human scientists we have observed at least four major types of reasoning during the cognitive process. The first is data-driven reasoning, in which the major goal is to explain individual experimental results. The second is theory-driven reasoning, which occurs when a partial theory or model drives its own extension. The third type of reasoning involves looking at closely related biological systems (e.g., noticing a similar behavior in the his operon system). The final type of reasoning relates to more distant analogies, such as thinking of DNA polymerase moving along a nucleotide sequence as similar to a railroad engine moving along a set of tracks. Our discovery system embraces all of these reasoning types within a blackboard-style hybrid architecture.

In addition, we have fit our overall model of simulation and discovery into a framework of research on machine learning. This framework involves interacting performance and learning elements. The performance element, here the knowledge-based system for qualitative simulation of regulatory genetics, is asked to explain observations from the real world. The learning element, here the discovery architecture described above, is able to evaluate the explanations and "tune" the performance element by changing its model (or theory) of the world.

C.1.4 Simultaneous alignment of DNA sequences--MULTAN

Previously, MOLGEN researchers have developed numerous programs to aid in the symbolic analysis of DNA sequences. During the last year Dr.
William Bains (a postdoctoral scholar in Professor Kedes' laboratory) completed a program called MULTAN which allows the facile alignment of three or more DNA sequences. This was a major unsolved problem in sequence analysis, and the program is now undergoing final testing on the BIONET resource. In the future, we expect that BIONET will support development of application-oriented programs of this type, while MOLGEN and SUMEX will focus on research-oriented systems with major AI goals.

C.2 Research in Progress

We have two major goals over the next several months. The first is to convert and enhance our knowledge-based simulation model within the KEE tool from IntelliCorp, Inc. KEE will be a significant improvement over the Unit System in three areas: speed, functionality, and support. IntelliCorp is providing KEE for use in our research without charge. Studies have indicated that using KEE will enable us to produce a reasonable prototype of our discovery system in about half the time required using the Unit System. Our second goal is to define the learning element of our discovery system more formally and to build a first test system that operates upon the simulation system knowledge base.

D. Publications

1. Bach, R., Friedland, P., Brutlag, D. and Kedes, L.: MAXIMIZE, a DNA sequencing strategy advisor. Nucleic Acids Res. 10(1):295-304, January, 1982.
2. Bach, R., Friedland, P., and Iwasaki, Y.: Intelligent computational assistance for experiment design. Nucleic Acids Res. 12(1):11-29, January, 1984.
3. Brutlag, D., Clayton, J., Friedland, P. and Kedes, L.: SEQ: A nucleotide sequence analysis and recombination system. Nucleic Acids Res. 10(1):279-294, January, 1982.
4. Clayton, J. and Kedes, L.: GEL, a DNA sequencing project management system. Nucleic Acids Res. 10(1):305-321, January, 1982.
5. Feitelson, J. and Stefik, M.J.: A case study of the reasoning in a genetics experiment.
Heuristic Programming Project Report HPP-77-18 (Working Paper), May, 1977.
6. Friedland, P.: Knowledge-based experiment design in molecular genetics. Proc. Sixth IJCAI, August, 1979, pp. 285-287.
7. Friedland, P.: Knowledge-based experiment design in molecular genetics. Stanford Computer Science Report STAN-CS-79-760 (Ph.D. thesis), December, 1979.
8. Friedland, P., Kedes, L. and Brutlag, D.: MOLGEN--Applications of symbolic computation and artificial intelligence to molecular biology. Proc. Battelle Conference on Genetic Engineering, April, 1981.
9. Friedland, P.: Acquisition of procedural knowledge from domain experts. Proc. Seventh IJCAI, August, 1981, pp. 856-861.
10. Friedland, P., Kedes, L., Brutlag, D., Iwasaki, Y. and Bach, R.: GENESIS, a knowledge-based genetic engineering simulation system for representation of genetic data and experiment planning. Nucleic Acids Res. 10(1):323-340, January, 1982.
11. Friedland, P., and Kedes, L.: Discovering the secrets of DNA. (To appear in a ea issue of Communications of the ACM and IEEE/Computer, October, 1985).
12. Friedland, P. and Iwasaki, Y.: The concept and implementation of skeletal plans. (To appear in Journal of Automated Reasoning, Vol. 1, No. 2, 1985).
13. Friedland, P., Armstrong, P., and Kehler, T.: The role of computers in biotechnology. BIO/TECHNOLOGY 565-575, September, 1983.
14. Iwasaki, Y. and Friedland, P.: SPEX: A second-generation experiment design system. Proc. of Second National Conference on Artificial Intelligence, August, 1982, pp. 341-344.
15. Martin, N., Friedland, P., King, J. and Stefik, M.J.: Knowledge base management for experiment planning in molecular genetics. Proc. Fifth IJCAI, August, 1977, pp. 882-887.
16. Meyers, S. and Friedland, P.: Knowledge-based simulation of regulatory genetics in bacteriophage Lambda. Nucleic Acids Res. 12(1):1-9, January, 1984.
17. Stefik, M.
and Friedland, P.: Machine inference for molecular genetics: Methods and applications. Proc. of NCC, June, 1978.
18. Stefik, M.J. and Martin, N.: A review of knowledge based problem solving as a basis for a genetics experiment designing system. Stanford Computer Science Report STAN-CS-77-596, March, 1977.
19. Stefik, M.: Inferring DNA structures from segmentation data: A case study. Artificial Intelligence 11:85-114, December, 1977.
20. Stefik, M.: An examination of a frame-structured representation system. Proc. Sixth IJCAI, August, 1979, pp. 844-852.
21. Stefik, M.: Planning with constraints. Stanford Computer Science Report STAN-CS-80-784 (Ph.D. thesis), March, 1980.

E. Funding Support

The MOLGEN grant is titled: MOLGEN: Applications of Artificial Intelligence to Molecular Biology: Research in Theory Formation, Testing, and Modification. It is NSF Grant MCS-8310236. Current Principal Investigators are Edward A. Feigenbaum, Professor of Computer Science, and Charles Yanofsky, Professor of Biology. MOLGEN is currently funded from 11/84 to 10/85 at $131,621, including indirect costs, as the first year of a three-year grant.

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

SUMEX-AIM continues to provide the bulk of our computing resources. The facility has not only provided excellent support for our programming efforts but has served as a major communication link among members of the project. Systems available on SUMEX-AIM such as INTERLISP, TV-EDIT, and BULLETIN BOARD have made possible the project's programming, documentation, and communication efforts. The interactive environment of the facility is especially important in this type of project development.

We strongly approve of the network-oriented approach to a programming environment toward which SUMEX has begun to evolve. The ability to use LISP workstations for intensive computing while still communicating with all of the other SUMEX resources has been very valuable to our work.
We see a satisfactory mode of operation in which most programming takes place on the workstations and most electronic communication, information sharing, and document preparation take place within the mature TOPS-20 environment. The evolution of SUMEX has alleviated most of our previous problems with resource loading and file space. Our current workstations are not quite fast or sophisticated enough, but we are encouraged by the progress that has been made.

We have taken advantage of the collective expertise on medically oriented knowledge-based systems of the other SUMEX-AIM projects. In addition to especially close ties with other projects at Stanford, we have greatly benefited from interaction with other projects at yearly meetings and through exchange of working papers and ideas over the system.

The ability to communicate instantly with a large number of experts in this field has been a determining factor in the success of the MOLGEN project. It has made possible the near-instantaneous dissemination of MOLGEN systems to a host of experimental users in laboratories across the country. The wide-ranging input from these users has greatly improved the general utility of our project.

We find it very difficult to find fault with any aspect of the SUMEX resource management. It has made it easy for us to expand our user group, to give demonstrations (through the 20/20 adjunct system as well as the LISP workstations), and to disseminate software to non-SUMEX users overseas.

III. RESEARCH PLANS

A. Project Goals And Plans

Our current work has the following major goals:

1. Use the knowledge base to explain observations that are indeed explainable without changes to the current model. For example, "I have observed a mutation that causes constitutive (uncontrolled) production of tryptophan. How can that be explained within the Jacob-Monod model?"
This process will be accomplished by some combination of forward simulation and backward rule-chaining.

2. Begin to recognize when observations are "interesting." Interesting here has one of the following broad meanings:
   a. A seeming direct contradiction to the existing theory.
   b. A statistically rare occurrence (one that is understandable by the current theory, but should not occur very often).
   c. A dramatic confirmation of the existing model.
   d. An observation currently unpredictable by the current model because the model is either not detailed enough or incomplete. The observation in this case must have a relation to the model, either because an important object of the model is involved or because it relates to an effect predicted by the model.

3. Build a mechanism for postulating extensions or corrections to the current theory: a constrained regulatory theory generator. The overall approach to this mechanism is perhaps the most interesting problem in our work. In discussions with other computer scientists, two notions have emerged: "or" reasoning, in which the theory construction process consists of hierarchical refinement of abstract ideas into more detailed ones, and "and" reasoning, in which the theory is built up in little pieces at many different levels simultaneously. We see strong evidence for both types of reasoning within Yanofsky's project. In fact, as stated above, the global model of Yanofsky's laboratory is a hybrid one. Individual graduate students performed "and" tasks--filling in details of seemingly unrelated pieces of the model. Yanofsky was the master "or" reasoner, slowly building a hierarchical model of the new regulatory mechanism. It is in this area of our research that the greatest discussion with AI colleagues is needed and that may produce the most significant AI benefits.

4. Build a mechanism for evaluating alternative theories. This would include rating the theories based on plausibility, selectability, completeness, significance, and so on.
We hope the evaluation process produces information useful in discriminating among the possible theories.

5. Test the entire structure on the evolving trp operon regulatory system. Experiment with different initial knowledge bases to see how the discovery process is altered by the availability of new techniques, analogous systems, etc.

B. Justification and Requirements for Continued SUMEX Use

The MOLGEN project depends heavily on the SUMEX facility. We have already developed several useful tools on the facility and are continuing research toward applying the methods of artificial intelligence to the field of molecular biology. The community of potential users is growing nearly exponentially as researchers from most of the biomedical-medical fields become interested in the technology of recombinant DNA. We believe the MOLGEN work is already important to this growing community and will continue to be important. The evidence for this is an already large list of pilot exo-MOLGEN users on SUMEX.

We support with great enthusiasm the acquisition of satellite computers for technology transfer and hope that the SUMEX staff continues to develop and support these systems. One of the oft-mentioned problems of artificial intelligence research is exactly the problem of taking prototypical systems and applying them to real problems. SUMEX gives the MOLGEN project a chance to conquer that problem and potentially supply scientific computing resources to a national audience of biomedical-medical research scientists.

6.1.3. ONCOCIN Project

ONCOCIN Project

Edward H. Shortliffe, M.D., Ph.D.
Departments of Medicine and Computer Science
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

A.
Project Rationale

The ONCOCIN Project is one of many Stanford research programs devoted to the development of knowledge-based expert systems for application to medicine and the allied sciences. The central issue in this work has been to develop a program that can provide advice similar in quality to that given by human experts, and to ensure that the system is easy to use and acceptable to physicians. The work seeks to improve the interactive process, both for the developer of a knowledge-based system and for the intended end user. In addition, we have emphasized clinical implementation of the developing tool so that we can ascertain the effectiveness of the program's interactive capabilities when it is used by physicians who are caring for patients and are uninvolved in the computer-based research activity.

B. Medical Relevance and Collaboration

The lessons learned in building prior production rule systems have allowed us to create a large oncology protocol management system much more rapidly than was the case when we started to build MYCIN. We introduced ONCOCIN for use by Stanford oncologists in May 1981. This would not have been possible without the active collaboration of Stanford oncologists who helped with the construction of the knowledge base and also kept project computer scientists aware of the psychological and logistical issues related to the operation of a busy outpatient clinic.

C. Highlights of Research Progress

C.1 Background and Overview of Accomplishments

The ONCOCIN Project is a large interdisciplinary effort that has involved over 35 individuals since the project's inception in July 1979. With the work currently in its sixth year, we summarize here the milestones that have occurred in the research to date:

• Year 1: The project began with two programmers (Carli Scott and Miriam Bischoff), a Clinical Specialist (Dr. Bruce Campbell), and students under the direction of Dr. Shortliffe and Dr. Charlotte Jacobs from the Division of Oncology.
During the first year of this research (1979-1980), we developed a prototype of the ONCOCIN consultation system, drawing from programs and capabilities developed for the EMYCIN system-building project. During that year, we also undertook a detailed analysis of the day-to-day activities of the Stanford Oncology Clinic in order to determine how to introduce ONCOCIN with minimal disruption of an operation which is already running smoothly. We also spent much of our time in the first year giving careful consideration to the most appropriate mode of interaction with physicians in order to optimize the chances for ONCOCIN to become a useful and accepted tool in this specialized clinical environment.

• Year 2: The following year (1980-1981) we completed the development of a special interface program that responds to commands from a customized keypad. We also encoded the rules for one more chemotherapy protocol (oat cell carcinoma of the lung) and updated the Hodgkin's Disease protocols when new versions were released late in 1980; these exercises demonstrated the generality and flexibility of the representation scheme we had devised. Software protocols were developed for achieving communication between the interface program and the reasoning program, and we coordinated the printing routines needed to produce hard copy flow sheets, patient summaries, and encounter sheets. Finally, lines were installed in the Stanford Oncology Day Care Center, and, beginning in May 1981, eight fellows in oncology began using the system three mornings per week for management of their patients enrolled in lymphoma chemotherapy protocols.

• Year 3: During our third year (1981-1982) the results of our early experience with physician users guided both our basic and applied work. We designed and began to collect data for three formal studies to evaluate the impact of ONCOCIN in the clinic.
This latter task required special software development to generate special flow sheets and to maintain the records needed for the data analysis. Towards the end of 1982 we also began new research into a critiquing model for ONCOCIN that involves "hypothesis assessment" rather than formal advice giving. Finally, in 1982 we began to develop a query system to allow system builders as well as end users to examine the growing and complex knowledge base of the program.

• Year 4: Our fourth year (1982-1983) saw the departure of Carli Scott, a key figure in the initial design and implementation of ONCOCIN, the promotion of Miriam Bischoff to Chief Programmer, and the arrival of Christopher Lane as our second scientific programmer. At this time we began exploring the possibility of running ONCOCIN on a single-user professional workstation and experimented with different options for data entry using a "mouse" pointing device. Christopher Lane became an expert on the Xerox workstations that we are using. In addition, since ONCOCIN had grown into such a large program with many different facets, we spent much of our fourth year documenting the system. During that year we also modified the clinic system based upon feedback from the physician-users, made some modifications to the rules for Hodgkin's disease based upon changes to the protocols, and completed several evaluation studies.

• Year 5: The project's fifth year (1983-1984) was characterized by growth in the size of our staff (three new full-time staff members and a new oncologist joined the group). The increased size resulted from a DRR grant that permitted us to begin a major effort to rewrite ONCOCIN to run on professional workstations. Dr. Robert Carlson, who had been our Clinical Specialist for the previous two years, was replaced by Dr. Joel Bernstein, while Dr.
Carlson assumed a position with the nearby Northern California Oncology Group; this appointment permitted him to continue his affiliation both with Stanford and with our research group. In August of 1983, Larry Fagan joined the project to take over the duties of the ONCOCIN Project Director while also becoming the Co-Director of the newly formed Medical Information Sciences Program. Dr. Fagan continues to be in charge of the day-to-day efforts of our research. An additional programmer, Jay Ferguson, joined the group in the fall to assist with the effort required to transfer ONCOCIN from SUMEX to the 1108 workstation. A fourth programmer, Joan Differding, joined the staff to work on our protocol acquisition effort (OPAL).

• Year 6: During our sixth year (1984-1985) we have further increased the size of our programming staff to help in the major workstation conversion effort. The ONCOCIN and OPAL efforts were greatly facilitated by a successful application for an equipment grant from Xerox Corporation. With a total of 15 Xerox LISP machines now available for our group's research, all full-time programmers have dedicated machines, as do several of the senior graduate students working on the project. Christopher Lane took on full-time responsibility for the integration and maintenance of the group's equipment and associated software. Two of our programming staff (Bischoff and Ferguson) moved on to jobs in industry, and three new programmers (David Combs, Cliff Wulfman, and Samson Tu) were hired to fill the void created by their departure and by the reassignment of Christopher Lane.

With daily coordination by the project's data manager, Janice Rohn, the DEC-20 version of ONCOCIN continues to be used on a limited basis in the Stanford Oncology Clinic.
The continued dependence on this time-shared computer, however, has prevented us from using ONCOCIN in many clinical problem areas (other than the lymphomas, where clinics are held three mornings per week, and breast cancer, where clinic is held one day per week) because of our inability to assure the system's availability with reasonable response time. It is this latter point that has accounted for our decision not to spend a great deal of time developing new protocols to run on the DEC-20 ONCOCIN prototype. Instead we have pressed our effort to adapt ONCOCIN to run on professional workstations which can eventually be dedicated to full-time clinic use. We envision these workstations as the model for eventual dissemination of this kind of technology.

In addition to funding from DRR for the workstation conversion effort, we have funding from the National Library of Medicine (NLM) for our more basic research activities regarding biomedical knowledge representation, knowledge acquisition, therapy planning, and explanation as they relate to the ONCOCIN task domain. A grant from the NLM to study the therapy planning process was received, and this work (led by Dr. Fagan) is in its second year. This research is investigating how to represent the therapy planning strategies used to decide treatment for patients on the oat cell carcinoma protocol who run into serious problems requiring consultation with the protocol study chairman. Dr. Branimir Sikic, a faculty member from the Stanford University Department of Medicine and the Study Chairman for the oat cell protocol, is collaborating on this project.

C.2 Research in Progress

The efforts of the ONCOCIN project over the last year have fallen into three major categories: (1) conversion of ONCOCIN to run on workstations, (2) development of a knowledge acquisition interface (OPAL) for entering new protocols, and (3) research on modeling the strategic therapy selection process (ONYX).
Efforts are also in progress to evaluate the system, to document the results of the research, and to disseminate the technology to sites beyond Stanford. We summarize these ongoing research efforts below.

C.2.1 Transfer of the ONCOCIN system from the DEC-20 to the Xerox 1108

In an effort to improve the efficiency of the reimplemented system (and thereby to improve its response time and make it more acceptable to physicians), we have undertaken a substantial system redesign while transferring it to the new machines. An additional commitment in time and programming effort has resulted, but we are confident that the resulting system will be a substantial improvement over the prototype. There have been several aspects to the system's reimplementation during the current year:

• Reorganization and recoding of existing programs for improved efficiency. In last year's report, we discussed our first steps in reorganizing the program. A further analysis during the year suggested that we should consider a redesign of the program to take advantage of our experience with the existing program and to respond to advances in artificial intelligence representation methods since ONCOCIN was first designed. In addition, our work during the year on new methods for entering knowledge into the system suggested corresponding improvements in the ways to represent oncologic knowledge in the system (see paper by Musen, et al. for more details on the redesign of the ONCOCIN system).

• Redesign of the reasoning component. As a major part of the redesign of the system, we decided to concentrate on methods that would allow for a more efficient search of the knowledge base during the running of a case. We have implemented and are currently debugging a reasoning program that uses a discrimination network to process the cancer protocols.
This network allows for a compact representation of information that is shared among multiple protocols, but does not require the program to consider and then disregard information related to protocols that are irrelevant to a particular patient.

• Development of a temporal network. The ability to represent temporal information is a key element of programs that must reason about treatment protocols. The earlier version of the ONCOCIN system did not have an explicit structure for reasoning about time-oriented events (see the paper by Kahn, et al. for a more detailed description of the temporal network).

• Extensions to the user interface. The user interface has been extended so that it can read patient data files of the type that are created by the original ONCOCIN system. This will allow us to transfer currently active patients to the new version of the ONCOCIN system. A detailed description of the user interface is available in the paper by Lane, et al.

• Connecting the components of the ONCOCIN system. The reasoning component, user interface, and knowledge acquisition program (described below) have been developed as separate programs. In the final version of the system, the knowledge acquisition program must be able to automatically translate from the graphical input forms into the knowledge base. The reasoner and user interface components are independent programs that run in parallel while communicating with each other. Each of these connections between components has been tested on a limited basis and will continue to be exercised during the next several months.

• Knowledge engineering tools. The challenge of coordinating a large software development project, with multiple programmers working in parallel, has necessitated the development of specialized tools to facilitate the process of system construction and maintenance.
One area of particular concern has been the need for tools to assist with knowledge base maintenance (see paper by Tsuji and Shortliffe for a discussion of our initial work in this area).

• System support for the reorganization. The LISP language that we used to build the first version of ONCOCIN does not explicitly support basic knowledge manipulation techniques (viz. message passing, inheritance techniques, or other object-oriented programming structures). These facilities are available in some commercial products, but none of the existing commercial implementations provides the reliability, speed, size, or special memory-manipulation techniques that are needed for our project. We have accordingly developed a "minimal" object-oriented system to meet these specifications. The object system is currently in use by each component of the new version of ONCOCIN and in the software used to connect the components. In addition, several student projects are now able to use this programming environment.

C.2.2 Interactive Entry of Chemotherapy Protocols by Oncologists (OPAL)

A major effort in this grant year has been the development of software (termed the OPAL system) that will permit physicians who are not computer programmers to enter protocol information into a structured set of forms on a graphical display. Most early expert systems required tedious (and occasionally erroneous) entry of the system's medical knowledge. Each segment of knowledge was transferred from physician to programmer and then entered into the program by the computer expert. Although many programs allowed for specification of a structure within which to organize the information, only minimal attempts were made to define a description that would be generic enough to provide a basis for a series of related knowledge bases in one medical area.
We have taken advantage of the generally well-structured nature of cancer treatment plans to design a knowledge entry program that can be used directly by clinicians. The structure of cancer treatment plans includes: multiple protocols (that may be related to each other), experimental research arms in each protocol, drug combinations, individual drugs, and drug modifications. Using the graphically oriented workstations, this information is presented to the user as computer-generated forms that appear on the screen. As the protocol is described, new forms are added to the computer display to allow for the specification of the special cases that make the protocols so complicated. Although this design appears to be organized specifically for cancer treatment plans, we believe that the technique can be extended to other clinical trials, and eventually to other structured decision tasks. The key factor is to exploit the regularities in the structure of the task (e.g., this interface has an extensive notion of how chemotherapy regimens are constructed) rather than to try to build a knowledge entry system that could accept any possible problem specification. Using this program we have entered several versions of a small cell lung cancer protocol, and a complicated lymphoma protocol with several different therapies. We are currently implementing the changes suggested by entering these protocols.

C.2.3 Strategic Therapy Planning (ONYX)

As mentioned above, we have begun a new research project to study the therapy planning process, and how strategies which are used to plan therapy in difficult cases might be represented on a computer. This project, which we call the ONYX project, has as its goals: to conduct basic research into the possible representations of the therapy planning process; to develop a computer program to represent this process; and eventually to interface the planning program with ONCOCIN.
The project members (Fagan, Tu, Langlotz, and Williams) have spent many hours meeting with Dr. Sikic trying to understand how he plans therapy for patients whose special clinical situation precludes following the standard therapeutic plan described in the protocol document. In March of last year, the group spent two days at Xerox Palo Alto Research Center (PARC), working with Mark Stefik, Daniel Bobrow and Sanjay Mittal of PARC on possible representations for the knowledge structures and how such a program might run using the LOOPS knowledge programming system. A prototype version of this program is currently being tested.

The prototype program has been designed as two components: the strategic planning program and the qualitative simulation builder. The strategic planning program is capable of turning the patient's medical data and knowledge of the intent of the protocol into a small number of plausible protocol modifications for the current point in time, and conditional modifications for the near future. The second component, the qualitative simulation builder, is capable of building simulation models using the graphical abilities of the 1108 workstation. The first test of this component is the construction of a model of the effects of chemotherapy drugs on the bone marrow of the patient. During the next year of research this type of qualitative simulation model will be integrated into the strategic planning program.

C.2.4 Evaluations of ONCOCIN's performance

We have completed our first three formal studies of ONCOCIN's DEC-20 version (see papers by Kent et al. and Hickam et al. for results of two of these; a written report on the third is in preparation). Lessons learned in these initial studies have led to revisions both in the design of ONCOCIN and in our plans for evaluation studies of the 1108 version of the system when it is implemented at non-Stanford sites in later years.
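The discrimination network described in section C.2.1 can be illustrated with a small sketch. The Python below is not the ONCOCIN reasoner (which is written in LISP); the node layout, the attribute names `disease` and `protocol`, the protocol labels P1 to P3, and the placeholder rule strings are all assumptions made for illustration. It shows only the two properties the text claims: knowledge shared by several protocols is stored once, and branches for protocols that do not apply to a patient are never visited.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node of a hypothetical discrimination network."""
    rules: list = field(default_factory=list)     # knowledge stored once at this node
    attribute: Optional[str] = None               # patient attribute tested here
    branches: dict = field(default_factory=dict)  # attribute value -> child Node


def add_rule(root, path, rule):
    """Store `rule` at the node reached by following (attribute, value) pairs."""
    node = root
    for attribute, value in path:
        if node.attribute is None:
            node.attribute = attribute
        elif node.attribute != attribute:
            raise ValueError("each node tests exactly one attribute")
        node = node.branches.setdefault(value, Node())
    node.rules.append(rule)


def rules_for(root, patient):
    """Walk a single path through the network; other branches are never examined."""
    found, node = [], root
    while node is not None:
        found.extend(node.rules)
        if node.attribute is None:
            break
        node = node.branches.get(patient.get(node.attribute))
    return found


# Placeholder rules: the strings stand in for real protocol knowledge.
network = Node()
add_rule(network, [], "rule shared by every protocol")
add_rule(network, [("disease", "lymphoma")], "rule shared by all lymphoma protocols")
add_rule(network, [("disease", "lymphoma"), ("protocol", "P1")], "rule for protocol P1 only")
add_rule(network, [("disease", "lymphoma"), ("protocol", "P2")], "rule for protocol P2 only")
add_rule(network, [("disease", "breast"), ("protocol", "P3")], "rule for protocol P3 only")

patient = {"disease": "lymphoma", "protocol": "P1"}
selected = rules_for(network, patient)
print(selected)
```

In this sketch the rule shared by the lymphoma protocols is stored at one node rather than copied into P1 and P2, and the lookup for a P1 patient never touches the P2 or P3 nodes.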
C.2.5 Documentation

We have developed a videotape that discusses and demonstrates our research on the workstation version of our system. This tape has been shown at national meetings and has been extensively distributed to researchers internationally who have shown an interest in our work. The publication list that accompanies this report further documents the design decisions we have made in developing the new version of ONCOCIN.

C.2.6 Dissemination

In anticipation of completion of the workstation version of ONCOCIN, we are beginning to plan for an experiment in which we will install ONCOCIN workstations in private oncology offices in San Jose and Fresno. An application proposing this work is currently under review.

D. Publications Since January 1984

1. (*) Buchanan, B.G. and Shortliffe, E.H.: Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Addison-Wesley, Reading, MA, 1984. [book]
2. (*) Clancey, W.J. and Shortliffe, E.H.: Readings in Medical Artificial Intelligence: The First Decade. Addison-Wesley, Reading, MA, 1984. [book]
3. Clancey, W.J. and Shortliffe, E.H.: Strategies for medical knowledge engineering: Lessons from the first decade. To appear in the Proceedings of the AAMSI Congress 85, San Francisco, CA, May 1985.
4. Differding, J.C.: The OPAL interface: General Overview. Working paper, August 1984.
5. (*) Fagan, L.: New Directions for Expert Systems: Examples from the ONCOCIN Project. To appear in the Proceedings of AAMSI Congress 85, San Francisco, CA, May 1985.
6. (*) Hickam, D.H., Shortliffe, E.H., Bischoff, M.B., Scott, A.C., Jacobs, C.D.: A study of the treatment advice of a computer-based cancer chemotherapy protocol advisor. Submitted for publication, May 1985.
7. (*) Kahn, M.G., Ferguson, J., Shortliffe, E.H., Fagan, L.: An approach for structuring temporal information in the ONCOCIN system. To appear in the Proceedings of the Symposium on Computer Applications in Medical Care, Baltimore, MD, November 1985.
8. (*) Kent, D.L., Shortliffe, E.H., Carlson, R.W., Bischoff, M.B., Jacobs, C.D.: Improvements in data collection through physician use of a computer-based chemotherapy treatment consultant. Submitted for publication, March 1985.
9. (*) Lane, C.D., Differding, J.C., Shortliffe, E.H.: Design of a graphic interface for a medical expert system (Memo KSL-85-15). Working paper.
10. (*) Langlotz, C., Fagan, L., Tu, S., Williams, J., Sikic, B.: ONYX: An architecture for planning in uncertain environments. To appear in the Proceedings of International Joint Conference on Artificial Intelligence, Los Angeles, CA, August 1985.
11. (*) Langlotz, C.P. and Shortliffe, E.H.: Adapting a consultation system to critique user plans. In Developments in Expert Systems (M. Coombs, ed.), pp. 77-94, London: Academic Press, 1984.
12. (*) Musen, M., Langlotz, C., Fagan, L., Shortliffe, E.H.: Rationale for knowledge base redesign in a medical advice system. To appear in the Proceedings of AAMSI Congress 85, San Francisco, CA, May 1985.
13. Shortliffe, E.H.: The science of biomedical computing. Medical Informatics, Vol. 9, Nos. 3/4, 185-193 (1984).
14. (*) Shortliffe, E.H.: Reasoning methods in medical consultation systems: artificial intelligence approaches (tutorial). Computer Programs in Biomedicine 18:5-14 (1984).
15. Shortliffe, E.H.: Explanation capabilities for medical consultation systems (tutorial). Proceedings of AAMSI Congress 84 (D. Lindberg and M. Collen, eds.), pp. 193-197, San Francisco, May 1984.
16. Shortliffe, E.H. and Fagan, L.M.: Artificial intelligence: the expert systems approach to medical consultation. Proceedings of the 6th Annual International Symposium on Computers in Critical Care and Pulmonary Medicine, Heidelberg, Germany, June 1984.
17. (*) Shortliffe, E.H.:
Update on ONCOCIN: A chemotherapy advisor for clinical oncology. Proceedings of the Symposium on Computer Applications in Medical Care, November 1984.
18. (*) Tsuji, S. and Shortliffe, E.H.: Graphics for knowledge engineers: a window on knowledge base management (Memo KSL-85-11). Submitted for publication, April 1985.

E. Funding Support

Grant Title: "Studies in the Dissemination of Consultation Systems"
Principal Investigator: Edward H. Shortliffe
Agency: Biomedical Research Technology Program, Division of Research Resources
ID Number: RR 01613
Term: July 1983 to June 1986
Total award: $624,455
Current award (7/84-6/85): $222,511 (Direct costs)

Grant Title: "Therapy-planning strategies for consultation by computer"
Principal Investigator: Edward H. Shortliffe
Agency: National Library of Medicine
ID Number: LM-04136
Term: August 1983 to July 1986
Total award: $211,851
Current award (8/84-7/85): $69,875 (Direct costs)

Grant Title: "Postdoctoral Training in Medical Information Science"
Principal Investigator: Edward H. Shortliffe
Agency: National Library of Medicine
ID Number: 1 T32 LM07033
Term: July 1, 1984 - June 30, 1989
Total award: $903,718
Current award (7/84-6/85): $79,059 (Direct costs)

Grant Title: "Explanation of Computer-Assisted Therapy Plans"
Principal Investigator: Lawrence M. Fagan
Agency: National Library of Medicine (New Investigator Grant)
ID Number: 1 R23 LM04316
Term: February 1985 - January 1988
Total award: $107,441
Current award (2/85-1/86): $37,500 (Direct costs)

Grant Title: Henry J. Kaiser Faculty Scholar in General Internal Medicine
Principal Investigator: Edward H. Shortliffe
Agency: Henry J. Kaiser Family Foundation
Term: July 1983 to June 1986, renewable until June 1988
Total award: $150,000 ($50,000 annually)

Grant Title: Information structure and use in knowledge-based expert systems
Principal Investigator: Bruce G. Buchanan
Co-Principal Investigator: Edward H.
Shortliffe
Agency: National Science Foundation - IST83-12148
Term: March 1, 1984 - February 28, 1987
Total award: $330,000 (includes indirects)

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

A. Medical Collaborations and Program Dissemination via SUMEX

A great deal of interest in ONCOCIN has been shown by the medical, computer science, and lay communities. We are frequently asked to demonstrate the program to Stanford visitors (both the prototype system running in the clinic and the newer work transferring the system to professional workstations). We also demonstrated our developing workstation code in the Xerox exhibit in the trade show associated with AAAI-84 in Austin, Texas. Physicians have generally been enthusiastic about ONCOCIN's potential. The interest of the lay community is reflected in the frequent requests for magazine interviews and television coverage of the work. Articles about MYCIN and ONCOCIN have appeared in such diverse publications as Time and Fortune, whereas ONCOCIN has been featured on the "NBC Nightly News," the PBS "Health Notes" series, and "The MacNeil-Lehrer Report." Due to the frequent requests for ONCOCIN demonstrations, we have produced a videotape about the ONCOCIN research which includes demonstrations of our professional workstation research projects and the 2020-based clinic system. The tape has been shown at several national meetings, including the 1984 Workshop on Artificial Intelligence in Medicine, the 1984 meeting of the Society for Medical Decision Making, and the 1985 meeting of the Society for Research and Education in Primary Care Internal Medicine. The tape has also been shown to both national and international researchers in biomedical computing.

Our group also continues to oversee the MYCIN program (not an active research project since 1978) and the EMYCIN program. Both systems continue to be in demand as demonstrations of expert systems technology.
MYCIN has been demonstrated via networks at both national and international meetings in the past, and several medical school and computer science teachers continue to use the program in their computer science or medical computing courses. Researchers who visit our laboratory often start out by experimenting with the MYCIN/EMYCIN systems. We also have made the MYCIN program available to researchers around the world who access SUMEX using the GUEST account. EMYCIN has been made available to interested researchers developing expert systems who access SUMEX via the CONSULT account. One such consultation system for psychopharmacological treatment of depression, called Blue-Box, developed by two French medical students, Benoit Mulsant and David Servan-Schreiber, was reported on in July of 1983 in Computers and Biomedical Research.

B. Sharing and Interaction with Other SUMEX-AIM Projects

The community created on the SUMEX resource has other benefits that go beyond actual shared computing. Because we are able to experiment with other developing systems, such as INTERNIST/CADUCEUS, and because we frequently interact with other workers (at AIM Workshops or at other meetings), many of us have found the scientific exchange and stimulation to be heightened. Several of us have visited workers at other sites, sometimes for extended periods, in order to pursue further issues which have arisen through SUMEX- or Workshop-based interactions. In this regard, the ability to exchange messages with other workers, both on SUMEX and at other sites, has been crucial to rapid and efficient exchange of ideas. Certainly it is unusual for a small community of researchers with similar scholarly interests to have at their disposal such powerful and efficient communication mechanisms, even among those on opposite coasts of the country.

C.
Critique of Resource Management

Our community of researchers has been extremely fortunate to work on a facility that has continued to maintain the high standards that we have praised in the past. The staff members are always helpful and friendly, and work as hard to please the SUMEX community as to please themselves. As a result, the computer is as accessible and easy to use as they can make it. More importantly, it is a reliable and convenient research tool. We extend special thanks to Tom Rindfleisch for maintaining such high professional standards. As our computing needs grow, we have increased our dependence on special SUMEX skills such as networking and communication protocols.

III. RESEARCH PLANS

A. Project Goals and Plans

In the coming year, there are several areas in which we expect to expend our efforts on the ONCOCIN System:

1. To transfer the oncology prototype from its current research computer to a professional workstation that provides a model for cost-effective dissemination of clinical consultation systems. To meet this specific aim we will continue the basic and applied programming efforts (ONCOCIN, OPAL, and ONYX) described earlier in this report.

2. To encode and implement for use by ONCOCIN the commonly used chemotherapy protocols from our oncology clinic. In the coming year, we will:
• Complete our OPAL protocol entry system
• Continue entry of additional protocols, hopefully at the rate of one protocol/month (including testing)
• Place a version of the OPAL protocol entry system into the clinic for use by physicians as a graphical reference guide to the protocols.

3. To introduce ONCOCIN gradually for ongoing use so that by mid-1986 two professional workstations will be available in the oncology clinic to assist in the management of cancer patients.
During the next year, we will:
• Implement the first workstation-based ONCOCIN system for use by physicians in the oncology clinic by the end of the calendar year 1985, adding a second workstation within a few months thereafter
• Continue to operate the DEC-2020 version to maintain continuity of support in the clinic setting until the workstation version is fully operational.

B. Justification and Requirements for Continued SUMEX Use

All the work we are doing (ONCOCIN plus continued use of the original MYCIN program) continues to be dependent on daily use of the SUMEX resource. Although much of the ONCOCIN work is shifting to Xerox workstations, the SUMEX 2060 and the 2020 continue to be key elements in our research plan. The programs all make assumptions regarding the computing environment in which they operate, and the ONCOCIN prototype currently used in the clinic depends upon proximity to the DEC 2020, which enables us to use a 9600 baud interface.

In addition, we have long appreciated the benefits of GUEST and network access to the programs we are developing. SUMEX greatly enhances our ability to obtain feedback from interested physicians and computer scientists around the country. Network access has also permitted high-quality formal demonstrations of our work both from around the United States and from sites abroad (e.g., Finland, Japan, Sweden, Switzerland).

The main development of our project will continue to take place on Dandelion LISP machines that we have purchased or that have been donated by Xerox Corporation. We also have special needs for more computing power for our ONYX therapy planning research, and have been able to share an upgraded Dandelion loaned by SUMEX for this work.

C. Requirements for Additional Computing Resources

The acquisition of the DEC 2020 by SUMEX was crucial to the growth of our research work.
It has ensured high-quality demonstrations and has enabled us to develop a system (ONCOCIN) for real-world use in a clinical setting. As we have begun to develop systems that are potentially useful as stand-alone packages (i.e., an exportable ONCOCIN), the addition of personal workstations has provided particularly valuable new resources. We have made a commitment to the smaller Interlisp-D machines (Dandelions) produced by Xerox, and our work will increasingly transfer to them over the next several years. Our current funding supports our effort to implement ONCOCIN on workstations in the Stanford oncology clinic (and eventually to move the program to non-Stanford environments), but we will simultaneously continue to require access to Interlisp on upgraded workstations for extremely CPU-intensive tasks.

Although our dependence on SUMEX for workstations has decreased due to a recent gift from Xerox, our requirements for network support of the machines have drastically increased. Individual machines do not provide sufficient space to store all of the software used in our project, nor to provide backup or long-term storage of work in progress. It is the networks, file storage devices, protocol converters, and other parts of the SUMEX network that hold our project together. In addition, with a research group of about 20 people, we are taking advantage of file sharing, electronic mail, and other information coordinating activities provided by the DEC 2060. We hope that with systems support and research by SUMEX staff, we will be able to gradually move away from a need for the central coordinating machine over the next five years.

The acquisition of the DEC 2060, coupled with our increasing use of workstations, has greatly helped with the problems in SUMEX response time that we had described in previous annual reports.
We are extremely grateful for access both to the central machine and to the research workstations on which we are currently building the new ONCOCIN prototype. The D-machine's address space is permitting development of the large knowledge base that ONCOCIN requires. The graphics capability of the workstations has also enabled us to develop new methods for presenting material to naive users. In addition, the D-machines have provided a reliable, constant "load-average" machine for running experiments with physicians and doing development work. The development of ONCOCIN on the Dandelion will demonstrate the feasibility of running intelligent consultation systems on small, affordable machines in physicians' offices and other remote sites.

D. Recommendations for Future Community and Resource Development

SUMEX is providing an excellent research environment and we are delighted with the help that SUMEX staff have provided implementing enhanced system features on the 2060 and on the workstations. We feel that we have a highly acceptable research environment in which to undertake our work. Workstation availability is becoming increasingly crucial to our research, and we have found over the past year that workstation access is at a premium. The SUMEX staff has been very helpful and understanding about our needs for workstation access, allowing us Dandelion use wherever possible, and providing us with systems-level support when needed. We look forward to the arrival of additional advanced workstations and the development of a more distributed computing environment through SUMEX-AIM.

6.1.4. PROTEAN Project

PROTEAN Project

Oleg Jardetzky
Nuclear Magnetic Resonance Lab, School of Medicine
Stanford University

Bruce Buchanan, Ph.D.
Computer Science Department
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

A.
Project Rationale

The goals of this project are related both to biochemistry and artificial intelligence: (a) use existing AI methods to aid in the determination of the 3-dimensional structure of proteins in solution (rather than of crystallized proteins studied by X-ray crystallography), and (b) use protein structure determination as a test problem for experiments with the AI problem-solving structure known as the Blackboard Model. Empirical data from nuclear magnetic resonance (NMR) and other sources may provide enough constraints on structural descriptions to allow protein chemists to bypass the laborious methods of crystallizing a protein and using X-ray crystallography to determine its structure. This problem exhibits considerable complexity. Yet there is reason to believe that AI programs can be written that reason much as experts do to resolve these difficulties [34].

B. Medical Relevance

Knowledge of the molecular structure of proteins is essential for understanding many problems of medicine at the molecular level, such as the mechanisms of drug action. Using NMR data from proteins in solution will speed up the determination of these structures.

C. Highlights of Progress

We have constructed a prototype of such a program, called PROTEAN, designed on the blackboard model [16], [26]. It is implemented in BB1 [27], a framework system for building blackboard systems that control their own problem-solving behavior [28] (see discussion of BB1 above). We have coupled the reasoning program with an IRIS graphics terminal (shared with SUMEX) which displays protein structures at different levels of detail. This provides a visual understanding of how the program is behaving, which is essential for this problem.

PROTEAN embodies the following experimental techniques for coping with the complexities of constraint satisfaction:

1. The problem-solver partitions each problem into a network of loosely-coupled sub-problems.
PROTEAN partitions the problem of positioning all of a protein's constituent structures within a global coordinate system into sub-problems of positioning individual pieces of structures and their immediate neighbors within local coordinate systems. It subsequently composes the most constrained partial solutions developed for these sub-problems into a complete solution for the entire protein. This partitioning and composition technique reduces the combinatorics of search. It also introduces additional constraints in the global characteristics of internally constrained partial solutions. For example, the conformations of partial protein solutions constrain their composability with other partial solutions.

2. The problem-solver attempts to solve sub-problems and coordinate solutions at multiple levels of abstraction, where lower levels of abstraction partition solution elements with finer granularity. For example, PROTEAN operates at three levels of abstraction. At the "Solid" level, it positions elements of the protein's secondary structure: alpha-helices, beta-sheets, and random coils. At the "Blob" level, it positions elements of the protein's primary structure of amino acids: peptide units and side-chains. At the "Atom" level, it positions the protein's individual atoms. Partial solutions at higher levels of abstraction reduce the combinatorics of search at lower levels. Conversely, tightly constrained partial solutions at lower levels introduce new constraints on higher-level solutions.

3. The problem-solver forbears hypothesizing specific partial solutions for a sub-problem in favor of preserving the "family" of solutions consistent with all constraints applied thus far. For example, in positioning a helix within a partial solution, PROTEAN does not attempt to identify a unique spatial position for the helix.
Instead, it identifies the entire spatial volume within which the helix might lie, given the constraints applied thus far. Preserving the family of legal solutions accommodates problems with incomplete constraints; the solution is only as constrained as the data are constraining. It also accommodates incompatible constraints by permitting disjunctive sub-families. For PROTEAN, disjunctive sub-volumes imply that the associated structure lies within any one of the sub-volumes or, if the structure is mobile, that it may move from one sub-volume to another.

4. The problem-solver applies constraints one at a time, successively restricting the family of solutions hypothesized for different sub-problems. PROTEAN successively applies constraints on the positions of protein structures, successively restricting the spatial volumes within which they may lie. Independent application of different constraints finesses the problem of integrating qualitatively different kinds of constraints by simply integrating their results. In addition, successive restriction of the family of solutions obviates guessing which specific solutions within a family are likely to be consistent with subsequently applied constraints and the otherwise inevitable back-tracking.

5. The problem-solver tolerates overlapping solutions for different sub-problems. For example, in identifying the volume within which structure-a might lie in partial solution 1, PROTEAN may include part of the volume identified for structure-b. Toleration of overlapping partial solutions is another accommodation of incomplete or incompatible constraints and potentially dynamic solutions.
For PROTEAN, overlapping volumes for two protein structures indicate either: (a) that the two structures actually occupy disjoint sub-volumes that cannot be distinguished within the larger, overlapping volumes identified for them because the constraints are incomplete; or (b) that the two structures are mobile and alternately occupy the shared volume.

6. The problem-solver reasons explicitly about control of its own problem-solving actions: which sub-problems it will attack, which partial solutions it will expand, and which constraints it will apply. Control reasoning guides the problem-solver to perform actions that minimize computation, while maximizing progress toward a complete solution (see section 3.2.1). It also provides a foundation for the problem-solver's explanation of problem-solving activities and intermediate partial solutions (see section 3.2.2) and for its learning of new control heuristics (see section 5.5).

The current version of PROTEAN has six knowledge sources that demonstrate the reasoning techniques described above. These knowledge sources develop partial solutions that position multiple helices at the Solid level and refine those helices at the Blob level. Proposed work will introduce knowledge sources that operate on other protein structures at the Solid level, as well as knowledge sources that apply the reasoning techniques at the Blob and Atom levels. We also will investigate emergent constraints entailed in reliable partial solutions, composition of partial solutions into complete solutions, and intelligent control.

D. Relevant Publications

1. Erman, L.D., Hayes-Roth, B., Lesser, V.R., Reddy, D.R.: The HEARSAY-II Speech Understanding System: Integrating Knowledge to Resolve Uncertainty. ACM Computing Surveys 12(2):213-254, June, 1980.

2. Hayes-Roth, B.: The Blackboard Architecture: A General Framework for Problem Solving?
Report HPP-83-30, Department of Computer Science, Stanford University, 1983.

3. Hayes-Roth, B.: BB1: An Environment for Building Blackboard Systems that Control, Explain, and Learn about their own Behavior. Report HPP-84-16, Department of Computer Science, Stanford University, 1984.

4. Hayes-Roth, B.: A Blackboard Architecture for Control. Artificial Intelligence, in press, 1985.

5. Hayes-Roth, B. and Hewett, M.: Learning Control Heuristics in BB1. Report HPP-85-2, Department of Computer Science, 1985.

6. Jardetzky, O.: A Method for the Definition of the Solution Structure of Proteins from NMR and Other Physical Measurements: The LAC-Repressor Headpiece. Proceedings of the International Conference on the Frontiers of Biochemistry and Molecular Biology, Alma Ata, June 17-24, 1984, October, 1984.

E. Funding Support

Title: Interpretation of NMR Data from Proteins Using AI Methods
PIs: Oleg Jardetzky and Bruce G. Buchanan
Agency: National Science Foundation
Total Amount: $100,000
Dates: Nov 1, 1984 - Oct 31, 1986

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

A. Medical Collaborations

Several members of Prof. Jardetzky's research group are involved in this research.

B. Interactions with other SUMEX-AIM projects

Robert Langridge was visiting at Stanford last year, and informal discussions with him and his group have continued this year.

C. Critique of Resource Management

The SUMEX staff has continued to be most cooperative in getting this project started. Without their persistence, we would not have been able to obtain Ethernet software for the IRIS graphics terminal from Xerox.

III. RESEARCH PLANS

A. Goals & Plans

Our long-range goal is to build an automatic interpretation system similar to CRYSALIS (which worked with x-ray crystallography data). In the shorter term, we are building interactive programs that aid in the interpretation of NMR data on small proteins.
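As an illustration only, the successive restriction of a family of candidate positions (the reasoning technique described for PROTEAN in the summary of research) can be sketched in Python. This is not PROTEAN's implementation, which is written in Interlisp. The discretized grid, the distance-range constraints, and every function name below are assumptions made for the sketch.

```python
from itertools import product
from math import dist


def candidate_grid(extent, step=1):
    """Hypothetical initial family: every grid point in the cube [-extent, extent]^3."""
    axis = range(-extent, extent + 1, step)
    return set(product(axis, axis, axis))


def distance_constraint(anchor, lo, hi):
    """Hypothetical constraint: a position must lie between lo and hi from the anchor."""
    def satisfied(point):
        return lo <= dist(point, anchor) <= hi
    return satisfied


def restrict(family, constraint):
    """Apply a single constraint, keeping every position still consistent with it."""
    return {point for point in family if constraint(point)}


def apply_constraints(family, constraints):
    """Apply constraints one at a time; the family can only shrink, never grow."""
    for constraint in constraints:
        family = restrict(family, constraint)
    return family


# Example: one structure constrained relative to two already-placed anchors.
family = candidate_grid(4)
constraints = [
    distance_constraint((0, 0, 0), 2.0, 3.0),  # between 2 and 3 units from the first anchor
    distance_constraint((4, 0, 0), 0.0, 3.0),  # within 3 units of the second anchor
]
remaining = apply_constraints(family, constraints)
```

The result is a set of positions rather than a single answer, so the family stays only as constrained as the data are constraining, and no position is guessed and later retracted. Because each constraint acts as an independent filter, the order of application does not change the final family in this sketch. Disjunctive sub-volumes and overlapping volumes for different structures are not modeled here.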
The current version of PROTEAN has six knowledge sources that demonstrate the reasoning techniques described above. These knowledge sources develop partial solutions that position multiple helices at the Solid level and refine those helices at the Blob level. The proposed research would expand PROTEAN to include knowledge sources that:

1. construct partial solutions combining helices, beta sheets, and random coils at the Solid level;
2. merge highly constrained partial solutions at the Solid level;
3. refine Solid level solutions in terms of the relative positions of constituent peptide units and side chains at the Blob level;
4. further restrict the locations of peptide units and side chains relative to one another at the Blob level;
5. propagate emergent constraints at the Blob level back up to the Solid level to further restrict the relative positions of superordinate helices, beta sheets, and random coils;
6. refine Blob level solutions at the Atom level;
7. further restrict the locations of atoms relative to one another;
8. propagate emergent constraints at the Atom level back up to the Blob level to further restrict the relative positions of superordinate peptide units and side chains.

The research will also develop a set of control knowledge sources to guide PROTEAN's application of constraints to identify the family of legal protein conformations as efficiently as possible. We also expect to improve the graphics interface to provide more functionality and options for viewing partial structures.

B. Justification for continued SUMEX use

We will continue to use SUMEX for developing parts of the program before integrating them with the whole system. We are using Interlisp to implement the Blackboard model and knowledge structures most flexibly and quickly.

C. Need for other computing resources

In this stage of development we need more computer cycles and hope to have access to additional D-machines. We expect to upgrade the Silicon Graphics IRIS terminal to a workstation for more efficiency in the subprograms doing computational geometry.

6.1.5. RADIX Project

The RADIX Project: Deriving Medical Knowledge from Time-Oriented Clinical Databases

Robert L. Blum, M.D., Ph.D.
Department of Computer Science
Stanford University

Gio C. M. Wiederhold, Ph.D.
Departments of Computer Science and Medicine
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

A. Technical Goals - Introduction

Medical and Computer Science Goals -- The long-range objectives of our project, called RADIX (formerly RX), are 1) to increase the validity of medical knowledge derived from large time-oriented databases containing routine, non-randomized clinical data, 2) to provide knowledgeable assistance to a research investigator in studying medical hypotheses on large databases, and 3) to fully automate the process of hypothesis generation and exploratory confirmation. For system development we have used a subset of the ARAMIS database.

Computerized clinical databases and automated medical records systems have been under development throughout the world for at least a decade. Among the earliest of these endeavors was the ARAMIS Project (American Rheumatism Association Medical Information System), under development since 1969 in the Stanford Department of Medicine. ARAMIS contains records of over 17,000 patients with a variety of rheumatologic diagnoses. Over 62,000 patient visits have been recorded, accounting for 50,000 patient-years of observation. The ARAMIS Project has now been generalized to include databases for many chronic diseases other than arthritis.
The fundamental objective of the ARAMIS Project and many other clinical database projects is to use the data that have been gathered by clinical observation in order to study the evolution and medical management of chronic diseases. Unfortunately, the process of reliably deriving knowledge has proven to be exceedingly difficult. Numerous problems arise, stemming from the complexity of disease, therapy, and outcome definitions, from the complexity of causal relationships, from errors introduced by bias, and from frequently missing and outlying data. A major objective of the RADIX Project is to explore the utility of symbolic computational methods and knowledge-based techniques in solving some of these problems.

The RADIX computer program is designed to examine a time-oriented clinical database such as ARAMIS and to produce a set of (possibly) causal relationships. The algorithm exploits three properties of causal relationships: time precedence, correlation, and nonspuriousness. First, a Discovery Module uses lagged, nonparametric correlations to generate an ordered list of tentative relationships. Second, a Study Module uses a knowledge base (KB) of medicine and statistics to try to establish nonspuriousness by controlling for known confounders. The principal innovations of RADIX are the Study Module and the KB. The Study