ONCOCIN Project 5P41-RR00785-14

Thus, in addition to providing an information resource about protocols, the use of a graphically-oriented program provided a way to learn about the software style and hardware used in the workstation version of ONCOCIN. We discontinued the mainframe version of ONCOCIN and began using the workstation version exclusively. The performance of the mainframe version of ONCOCIN was documented in two evaluation papers that appeared in clinical journals (see Hickam and Kent's papers).

We continued our basic research in the design of advanced therapy-planning programs: the ONYX project. We developed a model for planning that includes techniques from the fields of artificial intelligence, simulation, and decision analysis. Artificial intelligence techniques are used to create a small number of possible plans given the ideal therapy and the patient's past treatment history. Simulation techniques and decision analysis are used to examine and order the most promising plans. Our goal is to allow ONCOCIN to give advice in a wider range of situations; in particular, the system should be able to recommend plans for patients who have an unusual response to chemotherapy.

During this year, Stephen Rappaport, M.D., joined us as a programmer on the therapy planning research. Clinical expertise for ONCOCIN was provided by Richard Lenon, M.D., and Robert Carlson, M.D.

- Year 8: This year's work (1986-87) concentrated on two diverse tasks: (1) scaling up the use of the workstation version of ONCOCIN in the clinic, and (2) generalization of each of the components. The latter task is described in the core research sections of this report (see page 19). In 1986, we placed the workstation version of ONCOCIN into the Oncology Day Care clinic.
This version is a completely different program from the version of ONCOCIN that ran on the DECsystem 20: it uses protocols entered through the OPAL program, a new graphical data entry interface, and a revised knowledge representation and reasoning component. One of the Oncology Clinical Fellows (Andy Zelenetz) became responsible for verifying how well our design goals for ONCOCIN had been accomplished. His suggestions have included the addition of key protocols and the ability to use the program as a data management tool when the complete treatment protocol has not yet been entered into the system. Both of these suggestions were carried out during this year, and the program has achieved wider use in the clinic setting. In addition, laser-printed flowsheets and progress notes have been added to the clinic system.

The process of entering a large number of treatment protocols in a short period of time led to other research topics, including: design of an automated system for producing meaningful test cases for each knowledge base, modification of the design and access methods for the time-oriented database, and the development of methods for graphically viewing multiple protocols that are combined into one large knowledge base. These research efforts will continue into the next year. In addition, some of the treatment regimens developed for the original mainframe version are still in use and can be transferred to the new version of ONCOCIN. The process of converting this knowledge will also be undertaken in the next year. As the knowledge base grows, additional mechanisms will be needed for the incremental update and retraction of protocols. Additional changes in the reasoning and interface components of the system are described below.

E. H. Shortliffe 126

A new research project related to ONCOCIN was started during the past year.
We are exploring the use of continuous speech recognition as an alternate entry method for communicating with ONCOCIN. This project requires the connection of speech recognition equipment produced by Speech Systems, Inc. of Tarzana to the ONCOCIN interface module. Christopher Lane has already developed a prototype network connection and command interpreter between the speech module (running on a Sun with special hardware added) and the Xerox 1186 computer that runs ONCOCIN. Clifford Wulfman has designed a series of modifications to the ONCOCIN user interface to allow for verbal commands. Graduate student Danielle Fafchamps has helped to design experiments to elicit how clinicians would like to phrase their requests to ONCOCIN.

Janice Rohn is creating a new version of the Librarian program, which facilitates the physician's initial communication with the ONCOCIN system (based on the original version by Cliff Wulfman). We continue to collaborate with Andy Zelenetz, Richard Lenon, Robert Carlson, and Charlotte Jacobs on the design and implementation of ONCOCIN in the clinic. Stephen Rappaport has started a residency program to continue his medical education.

C.2 Research in Progress

Our research in the ONCOCIN project over the last year comprised three major categories: (1) conversion of ONCOCIN to the workstation version, (2) development of a knowledge acquisition interface (OPAL) for entering new protocols, and (3) modeling of the strategic therapy selection process (ONYX). We are now able to explore ways to test the system beyond the Stanford environment. A summary of our current research endeavors follows.

C.2.1 Transfer of the ONCOCIN system from the DEC-20 to the Xerox 1100 Series machines

During the process of converting to the workstation version of ONCOCIN, we redesigned segments of the program.
We have completed the major portion of that work, and our experience with the new version has suggested additional areas for improving the reasoning techniques and knowledge representation of ONCOCIN.

- Redesign of the reasoning component. A major impetus for the redesign of the system was to develop more efficient methods to search the knowledge base during the running of a case. We have implemented a reasoning program that uses a discrimination network to process the cancer protocols. This network provides a compact representation of information that is common to many protocols, but it does not require the program to consider and then disregard information related to protocols that are irrelevant to a particular patient. We continue to improve portions of the reasoning component that are associated with reasoning over time; e.g., modeling the appropriate timing for ordering tests and identifying the information that needs to be gathered before the next clinic visit. In general, we are concentrating on improving the representation of the knowledge regarding sequences of therapy actions specified by the protocol.

Our experience with adding a large number of protocols has led to the evaluation of the design of the internal structure of the knowledge base (e.g., the way we describe the relationships between chemotherapies, drugs, and treatment visits). We will continue to improve the method for traversing the plan structure in the knowledge base, and consider alternative arrangements for representing the structure of chemotherapy plans. Currently, the knowledge base of treatment guidelines and the patient database are separated. We propose to tie these two structures closer together. Additional work is anticipated on turning ONCOCIN into a critiquing system, where the physician enters their therapy and ONCOCIN provides suggestions about possible alternatives to the entered therapy.
Although we have concentrated our review of the ONCOCIN design primarily on the data provided by additional protocols, we know that non-cancer therapy problems may also raise similar issues. The E-ONCOCIN effort is designed to produce a domain-independent therapy planning system that includes the lessons learned from our oncology research. Samson Tu is primarily responsible for continued improvement of the reasoning component of ONCOCIN.

- Development of a temporal network. The ability to represent temporal information is a key element of programs that must reason about treatment protocols. The earlier version of the ONCOCIN system did not have an explicit structure for reasoning about time-oriented events. We are experimenting with different configurations of the temporal network, and with the syntax for querying the network. We are also adapting this network so that it can interface with the ONYX therapy-planning systems. This research on temporal reasoning is part of Michael Kahn's Ph.D. thesis. Michael is a student in the Medical Information Sciences Program at the University of California at San Francisco.

- Extensions to the user interface. We continue to experiment with various configurations of the user interface. Many of the changes have been in response to requests for a more flexible data management environment. We are occasionally faced with data that becomes available for a time before the current visit. This can happen if a laboratory result is delayed, or if a patient's electronic flowsheet is started in the middle of the treatment. We have added the ability to create new columns of data, and are designing the changes to the temporal processing components of ONCOCIN to allow for data that is inserted out of order. We have also extended the flowsheet to allow for patient-specific parameters (e.g., special test results or symptoms) that the physician wishes to follow over time.
The flowsheet layouts have been modified to create protocol-specific flowsheets; e.g., lymphoma flowsheets have a different configuration than lung cancer flowsheets. The basic structure of the interface has been modified to use object-oriented methods, which allows for more flexible interaction between different components of the flowsheet and the operations performed on the flowsheet.

A continuing area of research concerns how to guide the user to the most appropriate items to enter (based on the needs of the reasoning program) without disrupting the fixed layout of the flowsheet. The mainframe version of ONCOCIN modified the order of items on the flowsheet to extract necessary information from the user. In the workstation version, we have developed a guidance mechanism that alerts the user to items that are needed by the reasoning program. The user is not required to deviate from a preferred order of entry, nor required to respond to a question for which no current answer is available. Cliff Wulfman is primarily responsible for improvements to the user interface of ONCOCIN.

- System support for the reorganization. The LISP language, which we used to build the first version of ONCOCIN, does not explicitly support basic knowledge manipulation techniques (such as message passing, inheritance techniques, or other object-oriented programming structures). These facilities are available in some commercial products, but none of the existing commercial implementations provide the reliability, speed, size, or special memory-manipulation techniques that are needed for our project. We have therefore developed a "minimal" object-oriented system to meet our specifications. The object system is currently in use by each component of the new version of ONCOCIN and in the software used to connect these components. In addition, all ONCOCIN student projects are now based on this programming environment.
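
To make the two facilities named above (message passing and inheritance) concrete, the following is a minimal sketch in Python rather than LISP. It is not the project's object system: the function names (make_object, lookup, send) and the flowsheet example are hypothetical, chosen only to show how little machinery a "minimal" object layer needs.

```python
# Hypothetical sketch of a "minimal" object system. Objects are plain
# dictionaries, inheritance is a parent link, and behaviour is invoked by
# sending a named message. Illustrative only; not the project's LISP code.

def make_object(parent=None, **slots):
    """Create an object with an optional parent and some initial slots."""
    obj = {"parent": parent}
    obj.update(slots)
    return obj

def lookup(obj, name):
    """Find a slot on the object or, failing that, on its ancestors."""
    while obj is not None:
        if name in obj:
            return obj[name]
        obj = obj["parent"]
    raise AttributeError(name)

def send(obj, message, *args):
    """Message passing: find the handler by name and call it on the receiver."""
    handler = lookup(obj, message)
    return handler(obj, *args)

def describe(self):
    """Handler shared by every object that inherits from `item`."""
    return "%s = %s" % (lookup(self, "label"), lookup(self, "value"))

# A generic flowsheet item, and a specialised item that inherits its handler.
item = make_object(label="item", value=None, describe=describe)
lab_value = make_object(parent=item, label="WBC", value=4.2)

print(send(lab_value, "describe"))
```

Here lab_value defines no describe handler of its own; the message is resolved through the parent link, which is the inheritance behaviour the paragraph refers to.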
Christopher Lane created and is responsible for modifications to the object-oriented system.

C.2.2 Interactive Entry of Chemotherapy Protocols by Oncologists (OPAL)

A major effort in this grant year has been the continued development and testing of software (the OPAL system) that will permit physicians who are not computer programmers to enter protocol information on a structured set of forms presented on a graphics display. Most expert systems require tedious entry of the system's knowledge. In many other medical expert systems, each segment of knowledge is transferred from the physician to the programmer, who then enters the knowledge into the expert system. We have taken advantage of the generally well-structured nature of cancer treatment plans to design a knowledge entry program that can be used directly by clinicians. The structure of cancer treatment plans includes:

- choosing among multiple protocols (that may be related to each other);
- describing experimental research arms in each protocol;
- specifying individual drugs and drug combinations;
- setting the drug dosage level;
- and modifying either the choice of drugs or their dosage.

Using the graphics-oriented workstations, this information is presented to the user as computer-generated forms that appear on the screen. After the user fills in the blanks on the forms, the program generates the rules used to drive the reasoning process. As the user describes more detailed aspects of the protocol, new forms are added to the computer display; these allow the user to specify the special cases that make the protocols so complicated.

Although the user is unaware of the creation of the knowledge base from the interaction with OPAL, a complex set of translations is taking place. The user's entries are mapped into an intermediate data structure (IDS) that is common for all protocols.
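
The path from form entries to an IDS record and on to a rule can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the form fields, the layout of the IDS record, the rule format, and the placeholder names (EXAMPLE-PROTOCOL, DRUG-A) are invented and do not reproduce OPAL's actual structures, and the numbers carry no clinical meaning.

```python
# Hypothetical illustration of a forms -> IDS -> rules pipeline.
# Field names, the IDS layout, and the rule format are invented here.

def form_to_ids(form):
    """Map one filled-in dose-modification form to a protocol-independent record."""
    return {
        "protocol": form["protocol"],
        "drug": form["drug"],
        "parameter": form["lab test"],
        "threshold": float(form["below"]),
        "dose_fraction": float(form["give percent"]) / 100.0,
    }

def ids_to_rule(record):
    """Translate an IDS record into a simple if-then rule, stored as a dict."""
    return {
        "context": record["protocol"],
        "if": (record["parameter"], "<", record["threshold"]),
        "then": ("set_dose_fraction", record["drug"], record["dose_fraction"]),
    }

def rule_applies(rule, patient):
    """Check the rule's single less-than condition against patient data."""
    parameter, _operator, threshold = rule["if"]
    return patient[parameter] < threshold

# Values as a clinician might type them into blanks on a form (all strings).
form = {
    "protocol": "EXAMPLE-PROTOCOL",
    "drug": "DRUG-A",
    "lab test": "WBC",
    "below": "3.0",
    "give percent": "50",
}

rule = ids_to_rule(form_to_ids(form))
print(rule)
print(rule_applies(rule, {"WBC": 2.0}))
```

The point of the intermediate record is that the rule translator never sees the form itself, so the forms and the rule format can change independently.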
From the IDS, a translation program generates rules for creating and modifying treatment, and integrates them with the existing ONCOCIN knowledge base. Improving the design of the IDS and the rule translation programs will be a major research effort of this year.

Although the "forms" were specifically designed for cancer treatment plans, the techniques used to organize data can be extended to other clinical trials, and eventually to other structured decision tasks. The key factor is to exploit the regularities in the structure of the task (e.g., this interface has an extensive notion of how chemotherapy regimens are constructed) rather than to try to build a knowledge-entry system that can accept any possible problem specification. The OPAL program is based upon a domain-independent forms creation package designed and implemented by David Combs. This program will provide the basis for our extension of OPAL to other application areas.

We have now entered thirty-five protocols covering many different organ systems and styles of protocol design (increased from six in last year's annual report). Based on this experience, we are modifying OPAL to increase the percentage of the protocol that can be entered directly by our clinical collaborators. One direction in which we have extended the OPAL program is in providing a graphical interface of nodes and arcs to specify the procedural knowledge about the order of treatments and important decision points within the treatments. This work is described in several papers by Musen.

C.2.3 Strategic Therapy Planning (ONYX)

As mentioned above, we have continued our research project (ONYX) to study the therapy-planning process and to determine how clinical strategies are used to plan therapy in unusual situations.
Our goals for ONYX are: (1) to conduct basic research into the possible representations of the therapy-planning process, (2) to develop a computer program to represent this process, and (3) eventually to interface the planning program with ONCOCIN. We have worked with our clinical collaborators to determine how to create therapy plans for patients whose special clinical situations preclude following the standard therapeutic plan described in the protocol document. The prototype program design has four components: (1) to review the patient's past record and recognize emerging problems, (2) to formulate a small number of revised therapy plans based on existing problems, (3) to determine the results of the generated plans by using simulation, and (4) to weight the results of the simulation and rank order the plans by performing decision analysis. This model is described in the papers by Langlotz.

We have built an expert system based on decision analytic techniques as part of the solution to the fourth step of the ONYX planning problem. The program carries out a dialogue with the user concerning the particular treatment choices to be compared, potential problems with the treatments, and the patient-specific utilities corresponding to the possible outcomes. A decision tree is automatically created, displayed on the screen, and solved. The solution is presented to the user, and is compatible with an explanation program for decision trees being developed as part of the Ph.D. research of Curtis Langlotz.

C.2.4 Documentation

In 1986, we videotaped a lecture and demonstration of the ONCOCIN and OPAL systems at the XEROX Palo Alto Research Center. This videotape is available for loan from our offices. Our previous videotapes have been shown at scientific meetings and have been distributed to many researchers in other countries. The publications described below further document our recent work on ONCOCIN.
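
As a supplementary illustration of the fourth ONYX step in section C.2.3 (weighting simulated outcomes and rank-ordering plans by decision analysis), the sketch below solves a one-level decision tree by expected utility. It is a generic textbook calculation written in Python, not the ONYX program; the plan names, probabilities, and utilities are invented and have no clinical meaning.

```python
# Hypothetical sketch: rank candidate plans by expected utility.
# Each plan is a chance node whose branches are (probability, utility) pairs.

def expected_utility(outcomes):
    """Return the probability-weighted utility of one plan's outcomes."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)

def rank_plans(plans):
    """Order plan names from highest to lowest expected utility."""
    return sorted(plans, key=lambda name: expected_utility(plans[name]), reverse=True)

# Invented numbers: utilities would come from the patient-specific dialogue,
# probabilities from the simulation step.
plans = {
    "standard dose": [(0.6, 0.9), (0.4, 0.2)],
    "reduced dose": [(0.8, 0.7), (0.2, 0.4)],
    "delay one week": [(0.5, 0.6), (0.5, 0.5)],
}

for name in rank_plans(plans):
    print(name, round(expected_utility(plans[name]), 2))
```

With these made-up figures the reduced-dose plan ranks first (0.64), ahead of the standard dose (0.62) and the delay (0.55); changing the patient's utilities can reverse that order, which is why they are elicited per patient.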
C.2.5 Dissemination

We are planning experimental installation of ONCOCIN workstations in private oncology offices in San Jose and San Francisco. An application proposing this project is currently under review.

D. Publications Since January, 1986

1. Musen, M.A., Rohn, J.A., Fagan, L.M., and Shortliffe, E.H. Knowledge engineering for a clinical trial advice system: Uncovering errors in protocol specification. Memo KSL-85-51. Proceedings of AAMSI Congress 86 (A. Levy and B. Williams, eds.), pp. 24-27, Anaheim, 8-10 May 1986.

2. Langlotz, C.P., Fagan, L.M., and Shortliffe, E.H. Overcoming limitations of artificial intelligence planning techniques. Memo KSL-85-52. Proceedings of AAMSI Congress 86 (A. Levy and B. Williams, eds.), pp. 92-96, Anaheim, 8-10 May 1986.

3. Musen, M.A., Fagan, L.M., and Shortliffe, E.H. Graphical specification of procedural knowledge for an expert system. Memo KSL-85-53. Presented at the Second IEEE Computer Society Workshop on Visual Languages, pp. 167-178, Dallas, TX, June 1986. Reprinted in Expert Systems: The User Interface (J. Hendler, ed.). Norwood, NJ: Ablex Publishing Company, 1987.

4. Langlotz, C.P., Fagan, L.M., Tu, S.W., Sikic, B.I., and Shortliffe, E.H. A therapy planning architecture that combines decision theory and artificial intelligence techniques. KSL-85-55. Submitted for publication, November 1986.

5. Combs, D.M., Musen, M.A., Fagan, L.M., and Shortliffe, E.H. Graphical entry of procedural and inferential knowledge. Memo KSL-85-56. Proceedings of AAMSI Congress 86 (A. Levy and B. Williams, eds.), pp. 298-302, Anaheim, 8-10 May 1986.

6. Lane, C.D., Frisse, M.E., Fagan, L.M., and Shortliffe, E.H. Object-oriented graphics in medical interface design. Memo KSL-85-58. Proceedings of AAMSI Congress 86 (A. Levy and B. Williams, eds.), pp. 293-297, Anaheim, 8-10 May 1986.

7. Musen, M.A., Fagan, L.M., Combs, D.M., and Shortliffe, E.H.
Facilitating knowledge entry for an oncology therapy advisor using a model of the application area. Memo KSL-86-1. Proceedings of MEDINFO-86, pp. 46-50, Washington, D.C., October 1986.

8. Langlotz, C.P., Fagan, L.M., Tu, S.W., Sikic, B.I., and Shortliffe, E.H. Combining artificial intelligence and decision analysis for automated therapy planning assistance. Memo KSL-86-3. Proceedings of MEDINFO-86, pp. 794-798, Washington, D.C., October 1986.

9. Kahn, M.G., Fagan, L.M., and Shortliffe, E.H. Context-specific interpretation of patient records for a therapy advice system. Memo KSL-86-4. Proceedings of MEDINFO-86, pp. 175-179, Washington, D.C., October 1986.

10. Musen, M.A., Fagan, L.M., Combs, D.M., and Shortliffe, E.H. Use of a domain model to drive an interactive knowledge-editing tool. Memo KSL-86-24. To appear in the International Journal of Man-Machine Studies, 1987.

11. Langlotz, C.P., Shortliffe, E.H., and Fagan, L.M. Using decision theory to justify heuristics. Memo KSL-86-26. Proceedings of AAAI-86, pp. 215-219, Philadelphia, August 1986.

12. Shortliffe, E.H. Artificial Intelligence in Management Decisions: ONCOCIN. Memo KSL-86-39. Proceedings of a Conference on Medical Information Sciences, University of Texas Health Sciences Center at San Antonio, July 1985. To appear in Frontiers of Medical Information Sciences, Praeger Publishing, 1986.

13. Lane, C. The Ozone (O3) Reference Manual. KSL-86-40, July 1986.

14. Musen, M.A., Combs, D.M., Walton, J.D., Shortliffe, E.H., and Fagan, L.M. OPAL: Toward the computer-aided design of oncology advice systems. Memo KSL-86-49. Proceedings of the Tenth Annual Symposium on Computer Applications in Medical Care, pp. 43-52, Washington, D.C., October 1986. Reprinted in Topics in Medical Artificial Intelligence (P.L. Miller, ed.), New York: Springer-Verlag, 1987.

15. Shortliffe, E.H. Medical expert systems: Knowledge tools for physicians. Memo KSL-86-52.
Special issue on Medical Informatics, West. J. Med. 145:830-839, 1986.

16. Shortliffe, E.H. Medical expert systems research at Stanford University. Memo KSL-86-53. Presented at the Twentieth IBM Computer Science Symposium, Shizuoka, Japan, October 1986.

17. Langlotz, C.P., Shortliffe, E.H., and Fagan, L.M. A methodology for computer-based explanation of decision analysis. Working paper KSL-86-57, November 1986.

18. Shortliffe, E.H. Computers in support of clinical decision making. Memo KSL-87-25, 1986. To appear in Lippincott's forthcoming Textbook of Internal Medicine (W.N. Kelley, ed.).

19. Langlotz, C.P. and Shortliffe, E.H. The relationship between decision theory and default reasoning. Working paper KSL-87-17, 1987.

20. Shortliffe, E.H. Computer programs to support clinical decision making. Memo KSL-87-30. To appear in JAMA, July 1987.

E. Funding Support

Grant Title: "Therapy-planning strategies for consultation by computer"
Principal Investigator: Edward H. Shortliffe
Project Management: Lawrence M. Fagan
Agency: National Library of Medicine
ID Number: LM-04136
Term: April 1987 to March 1990
Total award: $380,123

Grant Title: "Knowledge Management for Clinical Trial Advice Systems"
Principal Investigator: Edward H. Shortliffe
Project Management: Lawrence M. Fagan
Agency: National Library of Medicine
ID Number: 1 R01 LM04420-01
Term: September 1985 through August 1988
Total award: $314,707

Grant Title: Postdoctoral Training in Medical Information Science
Principal Investigator: Edward H. Shortliffe
Project Management: Edward H. Shortliffe
Agency: National Library of Medicine
ID Number: 1 T32 LM07033
Term: July 1, 1984 - June 30, 1989
Total award: $903,718

Grant Title: Henry J. Kaiser Faculty Scholar in General Internal Medicine
Principal Investigator: Edward H. Shortliffe
Agency: Henry J. Kaiser Family Foundation
Term: July 1983 to June 1988
Total award: $250,000 ($50,000 annually)

Grant Title: Explanation of Computer-assisted therapy plans
Principal Investigator: Lawrence M. Fagan
Agency: National Institutes of Health
ID Number: 1 R23 LM04316
Term: 2/1985-1/1988
Total award: $107,441

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

A. Medical Collaborations and Program Dissemination via SUMEX

A great deal of interest in ONCOCIN has been shown by the medical, computer science, and lay communities. We are frequently asked to demonstrate the program to Stanford visitors. We also demonstrated our developing workstation code in the Xerox exhibits at the trade shows associated with AAAI-84 in Austin, Texas; IJCAI-85 in Los Angeles; AAAI-86 in Philadelphia; and Medinfo 86. Physicians have generally been enthusiastic about ONCOCIN's potential. The interest of the lay community is reflected in the frequent requests for magazine interviews and television coverage of the work. Articles about MYCIN and ONCOCIN have appeared in such diverse publications as Time and Fortune, and ONCOCIN has been featured on the "NBC Nightly News," the PBS "Health Notes" series, and "The MacNeil-Lehrer Report." Most recently it appeared in a special on Artificial Intelligence for TV Ontario (Canadian PBS station).

Due to the frequent requests for ONCOCIN demonstrations, we have produced a videotape about the ONCOCIN research that includes demonstrations of our professional workstation research projects and the 2020-based clinic system. The tape has been shown at several national meetings, including the 1984 Workshop on Artificial Intelligence in Medicine, the 1984 meeting of the Society for Medical Decision Making, and the 1985 meeting of the Society for Research and Education in Primary Care Internal Medicine. The tape has also been shown to both national and international researchers in biomedical computing. We have also completed an updated tape.

Our group also continues to oversee the MYCIN program (not an active research project since 1978) and the EMYCIN program.
Both systems continue to be in demand as demonstrations of expert systems technology. MYCIN has been demonstrated via networks at both national and international meetings in the past, and several medical school and computer science teachers continue to use the program in their computer science or medical computing courses. Researchers who visit our laboratory often begin their introduction by experimenting with the MYCIN/EMYCIN systems. We also have made the MYCIN program available to researchers around the world who access SUMEX using the GUEST account. EMYCIN has been made available to interested researchers developing expert systems who access SUMEX via the CONSULT account. One such consultation system for psychopharmacological treatment of depression, called Blue-Box (developed by two French medical students, Benoit Mulsant and David Servan-Schreiber), was reported in July of 1983 in Computers and Biomedical Research.

B. Sharing and Interaction with Other SUMEX-AIM Projects

The community created on the SUMEX resource has other benefits that go beyond actual shared computing. Because we are able to experiment with other developing systems, such as INTERNIST/CADUCEUS, and because we frequently interact with other workers (at AIM Workshops or at other meetings), many of us have found the scientific exchange and stimulation to be heightened. Several of us have visited workers at other sites, sometimes for extended periods, in order to pursue further issues that have arisen through SUMEX- or workshop-based interactions. In this regard, the ability to exchange messages with other workers, both on SUMEX and at other sites, has been crucial to rapid and efficient dissemination of ideas.
Certainly it is unusual for a small community of researchers with similar scholarly interests to have at their disposal such powerful and efficient communication mechanisms, even among those researchers on opposite coasts of the country.

During the past two years, we have had extensive interactions with Randy Miller at Pittsburgh. Via floppy disks and SUMEX, we have experimented with several versions of the QMR program. The interaction was very much facilitated by the availability of SUMEX for communication and data transmission.

C. Critique of Resource Management

Our community of researchers has been extremely fortunate to work on a facility that has continued to maintain the high standards that we have praised in the past. The staff members are always helpful and friendly, and work as diligently to please the SUMEX community as to please themselves. As a result, the computer is as accessible and easy to use as they can make it. More importantly, it is a reliable and convenient research tool. We extend special thanks to Tom Rindfleisch for maintaining such high professional standards. As our computing needs grow, we have increased our dependence on special SUMEX skills such as networking and communication protocols.

III. RESEARCH PLANS

A. Project Goals and Plans

In the coming year, there are several areas in which we expect to expend our efforts on the ONCOCIN System:

1. Development of a workstation model for cost-effective dissemination of clinical consultation systems. To meet this specific aim, we will continue the basic and applied programming efforts (ONCOCIN, OPAL, and ONYX) described earlier in this report.

2. To encode and implement for use by ONCOCIN the commonly used chemotherapy protocols from our oncology clinic. In the upcoming year, we will:

- Extend the OPAL protocol entry system
- Continue entry of additional protocols at the rate of one protocol/month (including testing)

3. To continue testing of the workstation version of ONCOCIN.

4.
To generalize the reasoning and interaction components of the ONCOCIN system for other applications.

B. Justification and Requirements for Continued SUMEX Use

All the work we are doing (ONCOCIN plus continued use of the original MYCIN program) continues to be dependent on daily use of the SUMEX resource. Although much of the ONCOCIN work has shifted to Xerox workstations, the SUMEX 2060 and the 2020 continue to be key elements in our research plan. The programs all make assumptions regarding the computing environment in which they operate. In addition, we have long appreciated the benefits of GUEST and network access to the programs we are developing. SUMEX greatly enhances our ability to obtain feedback from interested physicians and computer scientists around the country. Network access has also permitted high quality formal demonstrations of our work both from around the United States and from sites abroad (e.g., Finland, Japan, Sweden, Switzerland). The main development of our project will continue to take place on LISP machines that we have purchased or that have been donated by the XEROX Corporation.

C. Requirements for Additional Computing Resources

The acquisition of the DEC 2020 by SUMEX was crucial to the growth of our research work. It ensured high quality demonstrations and has enabled us to develop a system (ONCOCIN) for real-world use in a clinical setting. As we have begun to develop systems that are potentially useful as stand-alone packages (i.e., an exportable ONCOCIN), the addition of personal workstations has provided particularly valuable new resources. We have made a commitment to the smaller Interlisp-D machines ("D-machines") produced by Xerox, and our work will increasingly transfer to them over the next several years.
Our current funding supports our effort to implement ONCOCIN on workstations in the Stanford oncology clinic (and eventually to move the program to non-Stanford environments), but we will simultaneously continue to require access to Interlisp on upgraded workstations for extremely CPU-intensive tasks.

Although our dependence on SUMEX for workstations has decreased due to a recent gift from XEROX, our requirements for network support of the machines have drastically increased. Individual machines do not provide sufficient space to store all of the software used in our project, nor to provide backup or long-term storage of work in progress. It is the networks, file storage devices, protocol converters, and other parts of the SUMEX network that hold our project together. In addition, with a research group of about 20 people, we are taking advantage of file sharing, electronic mail, and other information coordinating activities provided by the DEC 2060. We hope that with systems support and research by SUMEX staff, we will be able to gradually move away from a need for the central coordinating machine over the next five years.

The acquisition of the DEC 2060, coupled with our increasing use of workstations, has greatly helped with the problems in SUMEX response time that we had described in previous annual reports. We are extremely grateful for access both to the central machine and to the research workstations on which we are currently building the new ONCOCIN prototype. The D-machine's greater address space is permitting development of the large knowledge base that ONCOCIN requires. The graphics capability of the workstations has also enabled us to develop new methods for presenting material to naive users. In addition, the workstations have provided a reliable, constant "load-average" machine for running experiments with physicians and for development work.
The development of ONCOCIN on the D-machine will demonstrate the feasibility of running intelligent consultation systems on small, affordable machines in physicians' offices and other remote sites.

D. Recommendations for Future Community and Resource Development

SUMEX is providing an excellent research environment and we are delighted with the help that SUMEX staff have provided in implementing enhanced system features on the 2060 and on the workstations. We feel that we have a highly acceptable research environment in which to undertake our work. Workstation availability is becoming increasingly crucial to our research, and we have found over the past year that workstation access is at a premium. The SUMEX staff has been very helpful and understanding about our needs for workstation access, allowing us D-machine use wherever possible, and providing us with systems-level support when needed. We look forward to the arrival of additional advanced workstations and the development of a more distributed computing environment through SUMEX-AIM.

E. Responses to Questions Regarding Resource Future

1. "What do you think the role of the SUMEX-AIM resource should be for the period after 7/86, e.g., continue like it is, discontinue support of the central machine, act as a communications crossroads, develop software for user community workstations, etc.?"

We believe that the trend towards distributed computing that characterized the early 1980's will continue during the second half of the decade. Although we have begun this process by moving much of our research activity to LISP machines, the SUMEX DEC-20 continues to be a major source of support for all communication, collaboration, and administrative functions. It also continues to provide a quality LISP environment for rapid prototyping, student projects in the early stages before workstations are made available, and for demonstrating system features to people at a distance.
These latter functions are still not well handled by distributed machines, and we believe that a logical role for the resource in the future is to develop software and communications techniques that will allow us to further decrease our dependence on the large central machine.

2. "Will you require continued access to the SUMEX-AIM 2060 and if so, for how long?"

As indicated above, our needs could still be met with a gradual phaseout of the 2060 over the next 3-5 years, provided that current services such as file handling and backup, mail, document preparation, and advanced network support are available from other machines (e.g., SAFE file server plus the Medical Computer Science file server). This implies maintenance of an ARPANET connection, connections to other campus machines, and facilities for linking together the heterogeneous collection of computing equipment upon which our research group depends. SUMEX would need to concentrate on providing software support for networks and systems software for workstations if it were to provide the same level of service we now experience while moving to a fully distributed environment.

3. "What would be the effect of imposing fees for using SUMEX resources (computing and communications) if NIH were to require this?"

Since all our research is NIH-supported, we see nothing but administrative headaches without benefits if there were to be a move to require fee-for-service billing for access to shared SUMEX resources. The net effect would simply be a transfer of funds from one arm of NIH to another (assuming that the agencies that currently fund our work could supplement our grants to cover SUMEX charges), and there would be a simultaneous restraining effect on the research environment. The current scheme permits experimentation and flexibility in use that would be severely inhibited if all access incurred an incremental charge.

4. "Do you have plans to move your work to another machine or workstation and if so, when and to what kind of system?"

As mentioned above, and described in greater detail in our annual report, we are making a major effort to move much of our research activity to LISP machines (currently Xerox 1108's, 1186's and HP-9836's). Our familiarity with this technology, and our commitment to it, have resulted solely from the foresight of the SUMEX resource in anticipating the technology and providing for it at the time of their last renewal. However, for the reasons mentioned above, we continue to depend upon the central communication node for many aspects of our activities and could effectively adapt to its demise only if the phaseout were gradual and accompanied by improved support for a totally distributed computing environment.

IV.A.4. PROTEAN Project

Oleg Jardetzky
Nuclear Magnetic Resonance Lab, School of Medicine
Stanford University

Bruce Buchanan, Ph.D.
Computer Science Department
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

A. Project Rationale

The goals of this project are related both to biochemistry and artificial intelligence: (a) use existing AI methods to aid in the determination of the 3-dimensional structure of proteins in solution (not from x-ray crystallography), and (b) use protein structure determination as a test problem for experiments with the AI problem solving structure known as the Blackboard Model. Empirical data from nuclear magnetic resonance (NMR) and other sources may provide enough constraints on structural descriptions to allow protein chemists to bypass the laborious methods of crystallizing a protein and using X-ray crystallography to determine its structure.
This problem exhibits considerable complexity, yet there is reason to believe that AI programs can be written that reason much as experts do to resolve these difficulties [12].

B. Medical Relevance

The molecular structure of proteins is essential for understanding many problems of medicine at the molecular level, such as the mechanisms of drug action. Using NMR data from proteins in solution will allow the study of proteins whose structure cannot be determined with other techniques, and will decrease the time needed for the determination.

C. Highlights of Progress

During the past year, we have expanded our initial prototype program, called PROTEAN, designed on the blackboard model. It is implemented in BB1 (discussed in the Core AI Research section of this report), a framework system for building blackboard systems that control their own problem-solving behavior. The reasoning component of PROTEAN directs the actions of the Geometry System (GS), a set of programs that performs the computationally intensive task of positioning portions of a molecule with respect to each other in three dimensions. The GS runs in the UNIX environment on a Silicon Graphics IRIS 3020 graphics workstation, which provides computing performance comparable to a VAX 11/780 for our task. The reasoning program (in Lisp in BB1) is coupled to the GS by a local area computer network, maintained by SUMEX.

Pictures of the results of GS computations are displayed on the graphics screen of the IRIS workstation, using a locally developed program called DISPLAY to draw the evolving protein structures at several levels of detail. The DISPLAY program can be used to view structures generated by the GS either under the direct control of the user or as directed by the reasoning system running in BB1. MIDAS and MMS are two other molecular modeling and display systems used to manipulate protein structures, particularly those obtained from crystallographic techniques as found in the Protein Data Bank. The ability to observe structures in three dimensions is essential to understanding the behavior of PROTEAN's reasoning and geometry systems and provides essential insights on the problem solving process.

PROTEAN embodies the following experimental techniques for coping with the complexities of constraint satisfaction:

1. The problem-solver partitions each problem into a network of loosely-coupled sub-problems. PROTEAN first positions individual pieces of structures and their immediate neighbors within local coordinate systems. It subsequently composes the most constrained partial solutions developed for these sub-problems into a complete solution for the entire protein. This partitioning and composition technique reduces the combinatorics of search.

2. The problem-solver attempts to solve sub-problems and coordinate solutions at multiple levels of abstraction. For example, PROTEAN operates at two levels of abstraction. At the "Solid" level, it positions elements of the protein's secondary structure: alpha-helices, beta-sheets, and coils. At the "Atom" level, it positions the protein's individual atoms. Partial solutions at the solid level reduce the combinatorics of search at the lower level. Conversely, tightly constrained partial solutions at the lower level introduce new constraints on solid level solutions.

3. The problem-solver preserves the "family" of solutions consistent with all constraints applied thus far. For example, in positioning a helix within a partial solution, PROTEAN does not attempt to identify a unique spatial position for the helix. Instead, it identifies the entire spatial volume within which the helix might lie, given the constraints applied thus far.
Preserving the family of legal solutions accommodates problems with incomplete constraints; the solution is constrained only as the data indicate. It also accommodates incompatible constraints by permitting disjunctive sub-families, which may be necessary for flexible proteins.

4. The problem-solver applies constraints one at a time, successively restricting the family of solutions hypothesized for different sub-problems. PROTEAN successively applies constraints on the positions of protein structures, restricting spatial volumes within which they may lie. This allows the different kinds of constraints to be applied by integrating their effects on a family of solutions.

5. The problem-solver tolerates overlapping solutions for different sub-problems. For example, in identifying the volume within which structure-a might lie in partial solution 1, PROTEAN may include part of the volume identified for structure-b. Overlapping volumes for two structures indicate either: (a) that the two structures actually occupy disjoint sub-volumes that cannot be distinguished within the larger, overlapping volumes identified for them because the constraints are incomplete; or (b) that the two structures are mobile and alternately occupy the shared volume.

6. The problem-solver reasons explicitly about control of its own problem-solving actions: which sub-problems it will attack, which partial solutions it will expand, and which constraints it will apply. Control reasoning guides the problem-solver to perform actions that minimize computation, while maximizing progress toward a complete solution. It also provides a foundation for the problem-solver's explanation of problem-solving activities and intermediate partial solutions and for its learning of new control heuristics.

Multiple blackboards in PROTEAN allow several sets of knowledge to be used.
A biochemical knowledge base stores information about proteins and secondary structures, amino acids, and atoms. A concept blackboard describes a concept hierarchy of natural types, object types, role types, contexts, constraint types, and problem solving methods. The ACCORD language blackboard explicitly represents the actions that can be taken in the language for arrangement assembly problems. The problem blackboard describes the protein to be solved and all experimental data observed for the molecule. Finally, the evolving solution of the protein structure is built on a third solution blackboard.

PROTEAN determines the structure of a protein by assembling the protein from components at several levels of detail. Initially, the major secondary structures of the protein are positioned relative to each other by considering them as solid structures, ignoring the side chains of the amino acids and representing constraints with respect to atoms of the protein backbone. This solid level approximation is sufficient to determine the overall shape of the molecule, but leaves details of the structure indistinct. Second, an atomic level representation of the protein including side chains is used with more precise distance, bond length, and bond angle constraints to remove chemically infeasible structures generated at the solid level. The atomic level description allows a more detailed description of the structure, at the cost of larger numbers of components to consider and increased computation time.

The reasoning component of PROTEAN includes domain and control knowledge sources for the assembly of a protein. Each domain knowledge source directs a small portion of the construction of the molecule. These knowledge sources develop partial solutions that position alpha helices, beta strands, and coils at the solid level and refine the resulting state families using all available distance constraints.
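The idea of refining a family of positions by applying distance constraints one at a time can be illustrated with a minimal sketch. This is not the PROTEAN Geometry System: the grid sampling, the function names, and the anchor points and bounds below are all hypothetical, chosen only to show how each constraint shrinks the set of positions still considered legal.

```python
import itertools
import math


def candidate_grid(extent, step):
    """Enumerate candidate positions on a cubic grid centred on the origin."""
    n = int(extent / step)
    axis = [i * step for i in range(-n, n + 1)]
    return list(itertools.product(axis, axis, axis))


def apply_distance_constraint(family, anchor, lower, upper):
    """Keep only positions whose distance to the anchor lies in [lower, upper]."""
    return [p for p in family if lower <= math.dist(p, anchor) <= upper]


# Hypothetical data: two already-placed anchor points and distance bounds
# (in angstroms) of the kind that might be derived from NMR measurements.
family = candidate_grid(extent=10.0, step=2.0)
constraints = [
    ((0.0, 0.0, 0.0), 4.0, 8.0),
    ((6.0, 0.0, 0.0), 0.0, 5.0),
]

# Apply the constraints one at a time, recording how the family shrinks.
sizes = [len(family)]
for anchor, lower, upper in constraints:
    family = apply_distance_constraint(family, anchor, lower, upper)
    sizes.append(len(family))

print(sizes)
```

Whatever remains in `family` after the loop is the set of sampled positions consistent with every constraint applied so far; no single position is singled out as the answer.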
Control knowledge sources determine which of the possible assembly actions is the best to perform at each stage of the problem solving.

We have built a first extension to PROTEAN that assembles a protein at the level of the atomic backbone. The facilities available include programs to manipulate protein data bank files and generate test data automatically, use atomic level constraints to prune solid level solutions, generate example instances of the protein backbone from the solid level structures, and generate candidate structures for unstructured coil segments of a protein. Work is in progress to combine the atomic level of assembly with the solid level to provide additional constraints at the more abstract level of assembly.

The PROTEAN system has been used to construct a complete solution at the solid level of detail for the Lac-repressor headpiece, a protein with fifty-one amino acids consisting of four coil sections and three alpha helices. In this work, the constraints were determined experimentally from NMR studies. In addition to the Lac-repressor headpiece protein, we have applied PROTEAN to sperm whale myoglobin, T4 lysozyme, and cytochrome B. Each of these latter proteins has a known crystal structure. In each case, we extracted features of the protein structure and distance constraints from the crystal structure to build data sets for PROTEAN. We then applied the PROTEAN system to the resulting data sets to determine the behavior of the system with different kinds of input.

To determine the correctness and capabilities of the PROTEAN method, we have applied PROTEAN to sperm whale myoglobin, a molecule whose crystal structure is known. In this test, we used distance constraints that would be measured as NOEs, overall size information, and the interaction between the heme group and the amino acids. We also systematically explored the dependence of the precision and accuracy of the solutions on the quality of the input data available. In all cases, the solutions obtained from PROTEAN enclose the actual structure of the molecule, with the best results coming from data that includes many short range constraints. We have also defined representations for structures such as the heme group in myoglobin and other cofactors that can be used in constraint satisfaction operations to further restrict the positions of the secondary structures in the protein.

The PROTEAN system takes the secondary structure as input. For molecules in solution, the extent of the helical, sheet, and unstructured coil segments of a protein is derived largely from NMR data between backbone and side chain hydrogen atoms. We have developed a knowledge-based system called ABC that uses heuristic knowledge and NMR data to automate this important step in protein structure determination. ABC is implemented using the BB1 blackboard architecture. In addition to solving the secondary structure classification problem, ABC provides a flexible and extensible framework for experimenting with identification methods for secondary structures as well as for data interpretation and pattern recognition techniques.

Work is proceeding on several aspects of the protein structure problem, including assembly of several partial arrangements and integration of these pieces of solution into larger structures, using atomic level volume exclusion of atoms and information on sidechain packing to produce more precise atomic level solutions, and developing more appropriate representations for unstructured coil sections of proteins.

D. Relevant Publications

1. Altman, R. and Jardetzky, O.: New strategies for the determination of macromolecular structures in solution. Journal of Biochemistry (Tokyo), Vol. 100, No. 6, p. 1403-1423, 1986.
2. Altman, R. and Buchanan, B.G.: Partial Compilation of Control Knowledge. To appear in Proceedings of the AAAI 1987.
3.
Brinkley, J., Cornelius, C., Altman, R., Hayes-Roth, B., Lichtarge, O., Duncan, B., Buchanan, B.G., Jardetzky, O.: Application of Constraint Satisfaction Techniques to the Determination of Protein Tertiary Structure. Report KSL-86-28, Department of Computer Science, 1986.
4. Brinkley, James F., Buchanan, Bruce G., Altman, Russ B., Duncan, Bruce S., Cornelius, Craig W.: A Heuristic Refinement Method for Spatial Constraint Satisfaction Problems. Report KSL 87-05, Department of Computer Science.
5. Buchanan, B.G., Hayes-Roth, B., Lichtarge, O., Altman, A., Brinkley, J., Hewett, M., Cornelius, C., Duncan, B., Jardetzky, O.: The Heuristic Refinement Method for Deriving Solution Structures of Proteins. Report KSL-85-41, October 1985.
6. Garvey, Alan, Cornelius, Craig, and Hayes-Roth, Barbara: Computational Costs versus Benefits of Control Reasoning. Report KSL 87-11, Department of Computer Science.
7. Hayes-Roth, B.: The Blackboard Architecture: A General Framework for Problem Solving? Report HPP-83-30, Department of Computer Science, Stanford University, 1983.
8. Hayes-Roth, B.: BB1: An Environment for Building Blackboard Systems that Control, Explain, and Learn about their own Behavior. Report HPP-84-16, Department of Computer Science, Stanford University, 1984.
9. Hayes-Roth, B.: A Blackboard Architecture for Control. Artificial Intelligence 26:251-321, 1985.
10. Hayes-Roth, B. and Hewett, M.: Learning Control Heuristics in BB1. Report HPP-85-2, Department of Computer Science, 1985.
11. Hayes-Roth, B., Buchanan, B.G., Lichtarge, O., Hewett, M., Altman, R., Brinkley, J., Cornelius, C., Duncan, B., and Jardetzky, O.: PROTEAN: Deriving protein structure from constraints. Proceedings of the AAAI, 1986, p. 904-909.
12. Jardetzky, O.: A Method for the Definition of the Solution Structure of Proteins from NMR and Other Physical Measurements: The LAC-Repressor Headpiece.
Proceedings of the International Conference on the Frontiers of Biochemistry and Molecular Biology, Alma Ata, June 17-24, 1984, October, 1984.
13. Lichtarge, Olivier: Structure determination of proteins in solution by NMR. Ph.D. Thesis, Stanford University, November, 1986.
14. Lichtarge, Olivier, Cornelius, Craig W., Buchanan, Bruce G., Jardetzky, Oleg: Validation of the First Step of the Heuristic Refinement Method for the Derivation of Solution Structures of Proteins from NMR Data. April 1987. Submitted to Proteins: Structure, Function, and Genetics.

E. Funding Support

Title: Interpretation of NMR Data from Proteins Using AI Methods
PI's: Oleg Jardetzky and Bruce G. Buchanan
Agency: National Science Foundation
Grant identification number: DMB-8402348
Total award period and amount: 2/1/87 - 9/30/89, $120,000 (includes direct and indirect costs)
Current award period and amount: 2/1/87 - 9/30/89, $120,000 (includes direct and indirect costs)

The following grants and contracts each provide partial funding for PROTEAN personnel.

Title: Modeling Exper Control
PI: Bruce G. Buchanan
Agency: Office of Naval Research
Grant identification number: ONR N00014-86-K-0652
Total award period and amount: 6/1/85 - 5/31/85, $96,879 (direct and indirect)
Current award period and amount: 6/1/85 - 5/31/85, $96,879 (direct and indirect)
PROTEAN component is $48,440 (direct & indirect) or 50% of grant

Title: Research on Blackboard Problem-Solving Systems
PI's: Edward A. Feigenbaum and Bruce G. Buchanan
Agency: Boeing Computer Services Corporation
Grant identification number: W-271799
Total award period and amount: 8/1/86 - 7/31/87, $245,432 (direct and indirect)
Current award period and amount: 8/1/86 - 7/31/87, $245,432 (direct and indirect)
PROTEAN component is $12,730 (direct & indirect) or 5% of grant

Title: Knowledge-Based Systems Research
PI: Edward A. Feigenbaum
Agency: Defense Advanced Research Projects Agency
Grant identification number: N00039-86-0033
Total award period and amount: 10/1/85 - 9/30/88, $4,130,230 (in negotiation) (direct and indirect)
Current award period and amount: 10/1/86 - 9/30/87, $1,549,539 (direct and indirect)
PROTEAN component is $29,031, or 1.9% of grant total

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE

A. Medical Collaborations

Several members of Prof. Jardetzky's research group are involved in this research.

B. Interactions with other SUMEX-AIM projects

We are occasionally in contact with researchers at Robert Langridge's laboratory at the University of California, San Francisco.

C. Critique of Resource Management

The SUMEX staff has continued to be most cooperative in supporting PROTEAN research. The SUMEX computer facility is well maintained and managed for effective support of our work. The computer network and Lisp workstations are supported very effectively by the SUMEX staff.

III. RESEARCH PLANS

A. Goals & Plans

Our long-range goal is to build an automatic interpretation system similar to CRYSALIS (which worked with x-ray crystallography data). In the shorter term, we are building interactive programs that aid in the interpretation of NMR data on small proteins. The current version of PROTEAN has domain and control knowledge sources that implement the reasoning techniques described above to build a solution using a dynamically created strategic plan. These knowledge sources develop partial solutions that position multiple alpha helices, coils, and beta structures at the Solid level and refine those helices using distance, surface, and volume constraints. PROTEAN also includes programs that use atomic level representations of the amino acid backbone and side chains. These routines use more precise atomic level distance constraints to prune the solutions obtained by the more abstract solid level geometry computations.
Programs are also available to find acceptable backbone segments for unstructured coil segments between alpha helices and beta structures.

The proposed research would expand PROTEAN to include knowledge sources that:

1. merge highly constrained partial solutions at the Solid level.
2. propagate emergent constraints at the atomic level back up to the solid level to further restrict the relative positions of superordinate helices, beta sheets, and coils.
3. further restrict the locations of atoms relative to one another.
4. select instances of structures to be used as starting points for other kinds of refinement procedures, such as the solution of the Bloch equations, which define the NMR spectrum that can possibly arise from a given structure. These equations provide a very strong test of the correctness of our method, as well as providing an additional constraint on proposed structures.
5. develop efficient and effective control strategies for the solution of intermediate and large molecules.
6. reason about mobility of structures when the data indicate that mobility is possible.

We have built an effective strategy for automatically determining the families of solid level solutions for small proteins, such as the Lac-repressor headpiece. We will extend the current work to develop control strategies to guide PROTEAN's constraint satisfaction in medium and large proteins to identify the family of legal protein conformations as efficiently as possible.

B. Justification for continued SUMEX use

We will continue to use SUMEX for developing parts of the program before integrating them with the whole system. We are using Interlisp to implement PROTEAN within the Blackboard model flexibly and quickly. In addition, the local area network that SUMEX maintains is crucial to the communications between our reasoning system in BB1, running on Xerox Lisp machines, and our geometry programs and display systems, running on the IRIS 3020 workstation.

C. Need for other computing resources

At this time our computational resources are almost adequate. However, access to Lisp machines for program development is often a limiting factor in our ability to continue the research. In addition, faster computation of the operations of the GS would be facilitated by a special-purpose array processor or an additional workstation for computing.

IV.A.5. RADIX Project

The RADIX Project: Deriving Medical Knowledge from Time-Oriented Clinical Databases

Robert L. Blum, M.D., Ph.D.
Department of Computer Science
Stanford University

Gio C. M. Wiederhold, Ph.D.
Departments of Computer Science and Medicine
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

A. Technical Goals - Introduction

Medical and Computer Science Goals: The objectives of the RADIX project are 1) Discovery: to provide knowledgeable assistance to a research investigator in studying medical hypotheses on large databases, and to automate the process of hypothesis generation and exploratory confirmation, 2) Summarization: to develop a program and set of techniques for automated summarization of patient records, and 3) Peer Review: to develop a program to assist physician reviewers in examining case databases for medical peer review and quality assurance. For system development we have used a subset of the ARAMIS database. We will first describe our work on discovery, followed by summarization and peer review.

RADIX Discovery Module

Computerized clinical databases and automated medical records systems have been under development throughout the world for at least a decade. Among the earliest of these endeavors was the ARAMIS Project (American Rheumatism Association Medical Information System), under development since 1969 in the Stanford Department of Medicine. ARAMIS contains records of over 17,000 patients with a variety of rheumatologic diagnoses.
Over 62,000 patient visits have been recorded, accounting for 50,000 patient-years of observation. The ARAMIS Project has now been generalized to include databases for many chronic diseases other than arthritis.

The fundamental objective of the ARAMIS Project and many other clinical database projects is to use the data that have been gathered by clinical observation in order to study the evolution and medical management of chronic diseases. Unfortunately, the process of reliably deriving knowledge has proven to be exceedingly difficult. Numerous problems arise stemming from the complexity of disease, therapy, and outcome definitions, from the complexity of causal relationships, from errors introduced by bias, and from frequently missing and outlying data. A major objective of the RADIX Project is to explore the utility of symbolic computational methods and knowledge-based techniques at solving some of these problems.

The RADIX computer program is designed to examine a time-oriented clinical database such as ARAMIS and to produce a set of (possibly) causal relationships. The algorithm exploits three properties of causal relationships: time precedence, correlation, and nonspuriousness. First, a Discovery Module uses lagged, nonparametric correlations to generate an ordered list of tentative relationships. Second, a Study Module uses a knowledge base (KB) of medicine and statistics to try to establish nonspuriousness by controlling for known confounders.

The principal innovations of RADIX are the Study Module and the KB. The Study Module takes a causal hypothesis obtained from the Discovery Module and produces a comprehensive study design, using knowledge from the KB. The study design is then executed by an on-line statistical package, and the results are automatically incorporated into the KB.
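As an illustration of the lagged, nonparametric correlations used by the Discovery Module, the following sketch computes a Spearman rank correlation between a candidate cause at one visit and a candidate effect some visits later, then orders the lags by strength. It is only a sketch under stated assumptions: the report does not say which nonparametric statistic RADIX uses, so Spearman's coefficient is one plausible choice, and the function names and the toy visit series are hypothetical.

```python
import math


def ranks(values):
    """Return 1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def lagged_spearman(cause, effect, lag):
    """Spearman correlation between cause[t] and effect[t + lag]."""
    x = cause[:len(effect) - lag]
    y = effect[lag:]
    return pearson(ranks(x), ranks(y))


# Hypothetical per-visit series for one patient: the effect follows the
# cause by exactly one visit.
cause = [1, 3, 2, 5, 4, 6, 8, 7]
effect = [0] + [2 * c for c in cause[:-1]]

# Order candidate lags by the strength of the rank correlation.
by_lag = {lag: lagged_spearman(cause, effect, lag) for lag in range(4)}
ordered = sorted(by_lag, key=lambda lag: abs(by_lag[lag]), reverse=True)
print(ordered[0], round(by_lag[ordered[0]], 3))
```

A strong correlation at a positive lag satisfies only the time-precedence and correlation properties; establishing nonspuriousness is left to the Study Module, as the text describes.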
Each new causal relationship is incorporated as a machine-readable record specifying its intensity, distribution across patients, functional form, clinical setting, validity, and evidence. In determining the confounders of a new hypothesis the Study Module uses previously "learned" causal relationships.

In creating a study design the Study Module follows accepted principles of epidemiological research. It determines study feasibility and study design: cross-sectional versus longitudinal. It uses the KB to determine the confounders of a given hypothesis, and it selects methods for controlling their influence: elimination of patient records, elimination of confounding time intervals, or statistical control. The Study Module then determines an appropriate statistical method, using knowledge stored as production rules. Most studies have used a longitudinal design involving a multiple regression model applied to individual patient records. Results across patients are combined using weights based on the precision of the estimated regression coefficient for each patient.

More recently, we have undertaken a new component to the RADIX program: a knowledge-based discovery module. The goal of the knowledge-based discovery module is to overcome some of the limitations of the original, statistics-based, RX discovery module. In creating disease hypotheses, researchers make extensive use of notions of causation, mechanism of action, tempo, and quantitative sufficiency, as well as detailed knowledge of pathophysiology. We are seeking to automate this process of hypothesis formation by replicating selected discoveries in rheumatology using data from the ARAMIS database.

RADIX Summarization Module

The management of inpatients and outpatients is often complicated by the size and disorganization of patient charts. The current paper chart is ill-suited to serve as the major means of communication among health care providers.
In recognition of this problem, computerized patient records are becoming increasingly available. While computerization of records at least renders them legible and available, it does not solve the problem of information overload. The ability to automatically create patient summaries would represent a useful adjunct to a patient record for rapid review of a case, for clinical decision making and patient monitoring, and for surveillance of quality of care. The goal of the RADIX summarization program is to infer a summary of a patient's clinical history from lengthy on-line medical records.

The RADIX summarization program is a knowledge-based system which produces intelligent summaries from a time-oriented data base of Systemic Lupus Erythematosus patients. Medical concepts in the system are represented by three entities of increasing complexity: abnormal primary attributes, abnormal states and diseases. Abnormal states and diseases are derived from the abnormal primary attributes by the Reasoner using a combination of model-driven and data-driven algorithms. Uncertainty associated with the derived states is handled with a Bayesian approach supplemented by boolean predicates, using likelihood ratios obtained from a transformation of the INTERNIST knowledge base. After summarizing the data, the system generates interactive, graphical displays with optional explanation windows.

The prototypes we have implemented have shown that intelligent summarization of medical records is feasible and that interactive graphical display is of great help in conveying complex medical information. However, the system is still under development and has not been formally evaluated. There is much work remaining to be done in the process of creating a complete, clinically useful summary. The knowledge base must be tested and enlarged, the temporal aspect of the reasoning must be improved and more sophisticated displays must be developed.
Finally, although our program currently works only with the ARAMIS data base, we hope to extend it and produce a General Summarization System that could be interfaced with any time-oriented medical data base. This general system would include other data base dictionaries and would allow the user to enter medical knowledge tailored to his data base.

RADIX Peer Review Program

We have begun design of a program to assist physician reviewers with medical peer review and quality assurance. This work builds on the Summarization module and extends it with a new Screening module. The Summarization module, described above, will allow a reviewer to rapidly scan a detailed, longitudinal record. It will summarize major events in the record by displaying them as labels on a time line. The new Screening module will take as input a reviewer's specification of the rules of practice that he is interested in checking in the records. The module will transform these rules into an internal form in which they will be matched against the patient records. The output will be a set of episodes in the patient record in which apparent violations of the rules of practice have occurred. The reviewer will then be able to examine each of these episodes interactively, using the Summarization module, to determine whether a violation was substantiated by the context in which the medical decision was made.

B. Medical Relevance and Collaboration

As a test bed for system development, our focus of attention has been on the records of patients with systemic lupus erythematosus (SLE) contained in the Stanford portion of the ARAMIS Data Bank. SLE is a chronic rheumatologic disease with a broad spectrum of manifestations. Occasionally the disease can cause profound renal failure and lead to an early death. With many perplexing diagnostic and therapeutic dilemmas, it is a disease of considerable medical interest.
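The Screening module described under the RADIX Peer Review Program is still at the design stage, so the following Python sketch shows only one hypothetical way its matching step could work. It assumes that each visit is a dictionary of attribute values and that each rule of practice is a named predicate over a single visit. The names `Rule` and `screen`, the sample rule, and the sample data are illustrative assumptions and are not part of RADIX.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Visit = Dict[str, object]  # hypothetical record layout: attribute name -> value


@dataclass
class Rule:
    """A rule of practice in a hypothetical internal form."""
    name: str
    violated: Callable[[Visit], bool]  # True when the visit appears to break the rule


def screen(visits: List[Visit], rules: List[Rule]) -> List[Tuple[int, str]]:
    """Return (visit index, rule name) for every apparent violation."""
    flagged = []
    for index, visit in enumerate(visits):
        for rule in rules:
            if rule.violated(visit):
                flagged.append((index, rule.name))
    return flagged


# Hypothetical rule, invented for illustration only: a visit with a high
# steroid dose should also have a recorded blood pressure.
rules = [
    Rule(
        "high-dose steroid without blood pressure",
        lambda v: v.get("prednisone_mg", 0) > 30 and v.get("systolic_bp") is None,
    )
]

# Hypothetical visits, not taken from any data bank.
visits = [
    {"date": "1986-01-10", "prednisone_mg": 40, "systolic_bp": 130},
    {"date": "1986-03-02", "prednisone_mg": 60},
    {"date": "1986-05-20", "prednisone_mg": 10},
]

flagged = screen(visits, rules)
print(flagged)
```

In this sketch the flagged episodes are only candidates. As the text notes, the reviewer would still examine each one in context before concluding that a violation occurred.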
In the future we anticipate possible collaborations with other project users of the TOD System such as the National Stroke Data Bank, the Northern California Oncology Group, and the Stanford Divisions of Oncology and of Radiation Therapy.

We believe that this research project is broadly applicable to the entire gamut of chronic diseases that constitute the bulk of morbidity and mortality in the United States. Consider five major diagnostic categories responsible for approximately two thirds of the two million deaths per year in the United States: myocardial infarction, stroke, cancer, hypertension, and diabetes. Therapy for each of these diagnoses is fraught with controversy concerning the balance of benefits versus costs.

1. Myocardial Infarction: Indications for and efficacy of coronary artery bypass graft vs. medical management alone. Indications for long-term antiarrhythmics ... long-term anticoagulants. Benefits of cholesterol-lowering diets, exercise, and so forth.

2. Stroke: Efficacy of long-term anti-platelet agents, long-term anticoagulation. Indications for revascularization.

3. Cancer: Relative efficacy of radiation therapy, chemotherapy, surgical excision - singly or in combination. Optimal frequency of screening procedures. Prophylactic therapy.

4. Hypertension: Indications for therapy. Efficacy versus adverse effects of chronic antihypertensive drugs. Role of various diagnostic tests such as renal arteriography in work-up.

5. Diabetes: Influence of insulin administration on microvascular complications. Role of oral hypoglycemics.

Despite the expenditure of billions of dollars over recent years for randomized controlled trials (RCT's) designed to answer these and other questions, answers have been slow in coming. RCT's are expensive in terms of funds and personnel. The therapeutic questions in clinical medicine are too numerous for each to be addressed by its own series of RCT's.
On the other hand, the data regularly gathered in patient records in the course of the normal performance of health care delivery are a rich and largely underutilized resource. The ease of accessibility and manipulation of these data afforded by computerized clinical databases holds out the possibility of a major new resource for acquiring knowledge on the evolution and therapy of chronic diseases.

The goal of the research that we are pursuing on SUMEX is to increase the reliability of knowledge derived from clinical data banks, with the hope of providing a new tool for augmenting knowledge of diseases and therapies as a supplement to knowledge derived from formal prospective clinical trials. Furthermore, the incorporation of knowledge from both clinical data banks and other sources into a uniform knowledge base should increase the ease of access by individual clinicians to this knowledge and thereby facilitate both the practice of medicine and the investigation of human disease processes.

The medical relevance of the automated summarization program is readily apparent. A practicing physician or medical researcher, faced with a patient chart, often with dozens of visits and scores of attributes, rarely has time to read the entire chart. He (or she) would like a succinct summary of the important events in that patient's record to assist in decision making. The use of computerized medical records improves the quality of information but does not solve the problem of information overload. For this reason, it would be useful to have the ability to automatically summarize patient records into meaningful clinical events.

C. Highlights of Research Progress

C.1 April 1986 to April 1987

Our primary accomplishments in this period have been the following:

1) Design and implementation of a second generation of the automated summarization program.

2) Design and implementation of a bit-mapped display program for chronic patient data.
3) Development of algorithms for transforming the INTERNIST knowledge base into standard Bayes form.

4) Design of a Peer Review program based on the Summarization program.

5) Publication of papers on automated discovery and automated summarization, and presentation of results at medical conferences.

6) Training post-doctoral researchers, participants in RADIX, in methods of medical artificial intelligence research.

C.1.1 Design and implementation of a second generation of the prototype automated Summarization program

We have designed and implemented a second generation of our prototype automated summarization program. This work is described in Dezegher-Geets, 1987, noted in the publications section. The current program improves upon a prototype implemented by Downs (Downs 1986): the knowledge base has been substantially enlarged, the inference mechanisms have been refined and enhanced for temporal reasoning, and the graphical display capability has been expanded.

The summarization program produces intelligent summaries from a time-oriented data base of Systemic Lupus Erythematosus patients. Medical concepts in the system are represented by three entities of increasing complexity: abnormal primary attributes, abnormal states, and diseases. Abnormal states and diseases are derived from the abnormal primary attributes by the Reasoner using a combination of model-driven and data-driven algorithms. Uncertainty associated with the derived states is handled with a Bayesian approach supplemented by Boolean predicates, using likelihood ratios obtained from a transformation of the INTERNIST knowledge base. After summarizing the data, the system generates interactive, graphical displays with optional explanation windows.

C.1.2 Design and implementation of a bit-mapped display program for chronic patient data

The new display program provides graphic, synoptic, intelligent displays of chronic patient data.
The goals of our implementation are:

1) Provide a good approximation of what each user actually wants and needs to see, without excess data.

2) Provide “intelligent” grouping of attributes based on knowledge of groups of related attributes, for example related to organ system, differential diagnoses, manifestations, and evidence.

3) Provide “intelligent” selection of attributes by prioritizing and selecting attributes by their clinical importance for the patient.

4) Provide interactive, editable displays, with choices available immediately through menus for the common displays.

The architecture is designed so that the Display Module sits “on top” of the AI components. It is designed to interact with a separate knowledge base or “expert system”. The Display is separated from the knowledge base specifically to make it transportable and generalizable. The knowledge-based component contains knowledge of diseases, disease hierarchies, causal relations, equivalence relationships (e.g., proteinuria is part of nephrotic syndrome), and so on. The display module has information that such relationships exist in medicine, and when to request specific information from the knowledge base. The Display module's knowledge of general medical concepts that are relevant for display includes the severity, belief, import, differential of a manifestation, complications of a disease, manifestations, organ system or user-specified attribute groupings, causal relationships, and equivalence relationships.

C.1.3 Development of algorithms for transforming the INTERNIST knowledge base into standard Bayes form

INTERNIST-1 is an expert system for diagnosis across a broad spectrum of disease. Over twenty man-years of effort have gone into the construction of its knowledge base, which contains relationships between approximately 600 diseases and 4,000 manifestations of disease. A major limitation of INTERNIST-1 is that the quantities