5 P41 RR00785-16 Description of Program Activities

II. Description of Program Activities

This section corresponds to the predefined forms required by the Division of Research Resources to provide information about our resource activities for their computerized retrieval system. These forms have been submitted separately and are not reproduced here, to avoid redundancy with the more extensive narrative information about our resource and progress provided in this report.

II.A. Scientific Subprojects

Our core research and development activities are described starting in section III.A.2, our training activities are summarized in section III.A.2.7, and the progress of our collaborating projects is detailed starting in section IV.

II.B. Books, Papers, and Abstracts

The list of recent publications for our core research and development work is given in section III.A.2.5, and those for the collaborating projects are in the individual reports starting in section IV.

II.C. Resource Summary Table

The details of resource usage, including a breakdown by the various subprojects, are given in the tables starting in section III.A.2.8.

E. H. Shortliffe

III. Narrative Description

III.A. Summary of Research Progress

III.A.1 Resource Overview

This is an annual report for year 16 of the SUMEX-AIM resource (grant RR-00785), the third year of a 5-year renewal period to support further research on applications of artificial intelligence in biomedicine. For the technical and administrative reasons discussed in earlier reports, the SUMEX project now includes the continuation of work on the development and dissemination of medical consultation systems (ONCOCIN) that had been supported before 1986 as resource-related research under grant RR-01631. Progress on core ONCOCIN research is therefore now reported here as well.
The originally proposed research program (June 1985 renewal application) included an ambitious plan to:

- Continue our long-range core research efforts on knowledge-based systems, aimed at developing new concepts and methodologies needed for biomedical applications.
- Substantially extend ONCOCIN research on developing and disseminating clinical decision support systems.
- Develop the core systems technology to move the national SUMEX-AIM community from a dependence on the central SUMEX DEC 2060 to a fully distributed, workstation-based computing environment.
- Introduce these systems technologies into the SUMEX-AIM community with appropriate communications and managerial assistance to responsibly phase out the central resource and DEC 2060 mainframe in a manner that will support community efforts to become self-sustaining and to continue scientific interactions through fully distributed means.
- Maintain our aggressive efforts at training and dissemination to help exploit the research potential of this field.

III.A.1.1 SUMEX-AIM as a Resource

SUMEX and the AIM Community

Since the SUMEX-AIM resource was established in late 1973, computing technology and biomedical artificial intelligence research have undergone a remarkable evolution, and SUMEX has both influenced and responded to these changing technologies. It is widely recognized that our resource has fostered highly influential work in biomedical AI, work from which much of the expert systems field emerged, and that it has simultaneously helped define the technological base of applied AI research.

The focus of the SUMEX-AIM resource continues to emphasize research on artificial intelligence techniques that guide the design of computer programs that can help with the acquisition, representation, management, and
utilization of the many forms of medical knowledge in diverse biomedical research and clinical care settings, ranging from biomolecular structure determination and analysis, to molecular biology, to clinical decision support, to medical education. Nevertheless, we have long recognized that the ultimate impact of this work in biomedicine will be realized through its synthesis with the full range of methodologies of medical informatics, such as data bases, biostatistics, human-computer interfaces, complex instrument control, and modeling.

From the start, SUMEX-AIM work has been grounded in real-world applications, such as systems for the interpretation of mass spectral information about biomolecular structures, chemical synthesis, interpretation of x-ray diffraction data on crystals, cognitive modeling, infectious disease diagnosis and therapy, DNA sequence analysis, experiment planning and interpretation in molecular biology, and medical instruction. Our current work extends this emphasis in application domains such as oncology protocol management, clinical decision support, protein structure analysis, and data base information retrieval and analysis. All of these research efforts have demanded close collaborations with diverse parts of the biomedical research community and the integration of many computational methods from those domains with knowledge-based approaches. Although the "AI-in-medicine" community was quite small in the beginning, it is perforce no longer limited and easily defined; rather, it is spreading and is inextricably linked with the many biomedical applications communities we have collaborated with over the years.
Driven both by the on-going diffusion of AI and by the development of personal computer workstations that signal the practical decentralization of computing resources, we must develop new resource communication and distributed computing technologies that will continue to facilitate wider intra- and inter-community communication, collaboration, and sharing of biomedical information.

The SUMEX Project has demonstrated that it is possible to operate a computing research resource with a national charter and that the services that can be provided over networks are those that facilitate the growth of AI-in-Medicine. SUMEX now has a reputation as a model national resource, pulling together the best available interactive computing technology, software, and computer communications in the service of a national scientific community. Planning groups for national facilities in cognitive science, computer science, and biomathematical modeling have discussed and studied the SUMEX model, and new resources, like the BIONET resource for molecular biologists, are closely patterned after the SUMEX example.

The projects SUMEX supports have generally required substantial computing resources with excellent interaction. Today, with the dramatic explosion of high-performance workstations that are more and more generally available, the need for a central source of raw computing cycles has significantly diminished. Instead of being a distributor of CPU cycles, SUMEX has become a communications cross-roads and a source of AI and computer systems software and expertise.

SUMEX has demonstrated that a computer resource is a useful "linking mechanism" for electronically bringing together teams of experts from different disciplines who share a common problem focus. AI concepts and software are among the most complex products of computer science. Historically it has not been easy for scientists in other fields to gain access to and mastery of them.
Yet the collaborative outreach and dissemination efforts of SUMEX have been able to bridge the gap in numerous cases. Over 40 biomedical AI application projects have developed in our national community and have been supported directly by SUMEX computing resources over the years; many more have benefited indirectly through access to the software, information, and advice offered by the SUMEX resource.

The integration of AI ideas with other parts of medical informatics and their dissemination into biomedicine is happening largely because of the development in the 1970's and early 1980's of methods and tools for the application of AI concepts to difficult professional-level problem solving. Their impact was heightened by the demonstration in various areas of medicine and other life sciences that these methods and tools really work. Here SUMEX has played a key role, so much so that it is regarded as "the home of applied AI."

SUMEX has been the home of such well-known AI systems as DENDRAL (chemical structure elucidation), MYCIN (infectious disease diagnosis and therapy), INTERNIST (differential diagnosis), ACT (human memory organization), MOLGEN/BIONET (tools for DNA sequence analysis and molecular biology experiment planning), ONCOCIN (cancer chemotherapy protocol advice), SECS (chemical synthesis), EMYCIN (rule-based expert system tool), and AGE (blackboard-based expert system tool). Since 1980, our community has published fifteen books that give a scholarly perspective on the scientific experiments we have been performing. These volumes, and other work done at SUMEX, have played a seminal role in structuring modern AI paradigms and methodology.

The Future of SUMEX-AIM

Given this background, what is the future need and course for SUMEX as a resource, especially in view of the on-going revolution in computer technology and costs and the emergence of powerful single-user workstations and local area networking? The answers remain clear.
Basic Research on AI in Biomedicine

At the deepest research level, despite our considerable success in working on medical and biological applications, the problems we can attack are still sharply limited. Our current ideas fall short in many ways against today's important health care and biomedical research problems, which have been brought on by the explosion in medical knowledge and for which AI should be of assistance. Just as the research work of the 70's and 80's in the SUMEX-AIM community fuels the current practical and commercial applications, our work of the late 80's will be the basis for the next decade's systems.

The report of the panel on medical informatics¹, convened late in 1985 by the National Library of Medicine to review and recommend twenty-year goals for the NLM, listed among its highest priority recommendations the need to greatly expand and aggressively pursue an interdisciplinary research program to develop computational methods for acquiring, representing, managing, and using biomedical knowledge of all sorts for health care and biomedical research. Similar recommendations have been made recently by the panel on Information Technology and the Conduct of Research of the National Academy of Sciences². These are precisely the problems on which the SUMEX-AIM community has been working so successfully and which will require work well beyond the five-year funding period we have requested. It is essential that this line of research in the SUMEX-AIM community, represented by our core AI research, the ONCOCIN research, and our collaborative research groups, be continued.

The Changing Role of the Central Resource

At the resource level, there are changing, but still growing, needs for computing resources for the active AIM research community to continue its work over the next five years.
The workstations to which we directed our attention in 1980 have now demonstrated their practicality as research tools and, increasingly, as mechanisms for disseminating AI systems as cost-effective decision aids in clinical settings such as private offices. The era of highly centralized general machines for AI research is nearly at an end and is being replaced by networks of distributed but heterogeneous single-user machines sharing common information resources and communication paths among members of the biomedical research community. Most of our community groups have been able to take advantage of local computing facilities, with SUMEX-AIM providing a central cross-roads for communications and the sharing of programs and knowledge.

In its core research and development role, SUMEX-AIM has its sights set on the hardware and software systems of the next decade. We expect major changes in the distributed computing environments that are just now emerging in order to make effective use of their power and to adapt them to the development and dissemination of biomedical AI systems for professional user communities.

In its training role, SUMEX is a crucial resource for the education of badly needed new researchers and professionals to continue the development of the biomedical AI field. The "critical mass" of the existing physical SUMEX resource, its development staff, and its intellectual ties with the Stanford Knowledge Systems Laboratory (KSL; see Appendix A for a summary of current KSL research activities) make this an ideal setting to integrate, experiment with, and export these methodologies for the rest of the AIM community. We will continue our experimental approach to distributed systems, learning to build and exploit distributed networks of these machines and to build and manage graceful software for these systems. Since decentralization is central to our future, we must learn its technical characteristics.

¹ Long Range Plan. Report of the Board of Regents, National Library of Medicine. National Institutes of Health. January 1987.
² Information Technology and the Conduct of Research: The User's View. Report of the Panel on Information Technology and the Conduct of Research, National Academy of Sciences. National Academy Press. 1989.

Resource Sharing

An equally important function of the SUMEX-AIM resource is an exploration of the use of computer communications as a means for interactions and sharing between geographically remote research groups engaged in biomedical computer science research and for the dissemination of AI technology. This facet of scientific interaction is becoming increasingly important with the explosion of complex information sources and the regional specialization of groups and facilities that might be shared by remote researchers¹. Another of the key recommendations of the NLM medical informatics planning panel² was that high-speed network communication links be established throughout the biomedical research community so that knowledge and information can be shared across diverse research groups and that the required interdisciplinary collaborations can take place. Recent efforts to establish a national NSF Net³, largely to support the supercomputer projects funded by NSF but also to replace and upgrade part of the national research community linkage that the now aging ARPANET has supported, have made important progress.
Still, these efforts do not encompass the broad range of biomedical research groups that need national network access and, to date, the NIH has not played an aggressive role in the interagency Research Internet coordination efforts. We must work to build stronger institutional support for a National Research Network.

SUMEX continues to be an important pathfinder to develop the technology and community interaction tools needed to expand community system and communication resources. Our community building effort is based upon the developing state of distributed computing and communications technology, and we have therefore turned our core systems research to actively supporting the development of distributed computing and communications resources to facilitate collaborative project research and continued inter-group communications.

¹ Lederberg, J. "Digital Communications and the Conduct of Science: the New Literacy." Proc. IEEE, 66(11):1314-1319, 1978. Coulter, C. L. "Research Instrument Sharing." Science, 201(4854), 1978. Newell, A., and Sproull, R. F. "Computer Networks - Prospects for Scientists." Science, 215(4534):843, 1982.
² NLM Long Range Plan: Medical Informatics. NLM Planning Panel 4. National Library of Medicine, National Institutes of Health. January 1987.
³ Marshal, E. "NSF Opens High-Speed Computer Network." Science. 248: 22-23, 1989.

Summary of Long-term Goals

- Maintain the synergistic relationship between SUMEX core system development, core AI research, our experimental efforts at disseminating clinical decision-making aids, and new applications efforts.
- Continue to serve the national AIM research community, less and less as a source of raw computing cycles and more and more as a transfer point for new technologies important for community research and communication.
We will also continue our coordinating role within the community through electronic media and periodic AIM workshops.
- Maintain our connections to national networks (e.g., NSFNet, ARPANET, and TELENET) and our local Ethernet, and assist other community members to establish similar links by example, by integrating and providing enabling software, and by offering advice and support within our resources.
- Focus new computing resource developments on more effective exploitation of distributed workstations through better communication and cooperative computing tools, using transparent digital networking schemes. Enhance the computing environments of workstations so that only minimal dependency on central, general-purpose computing hosts remains and these mainframe time-sharing systems can eventually be phased out. Remaining central resources will include servers for communications, community information resources, and special computing architectures (e.g., shared- or distributed-memory symbolic multiprocessors) justified by cost-effectiveness and unique functionality.
- Incrementally phase in, disseminate, and evaluate those aspects of the local distributed computing resource that are necessary for continuing national AIM community support within this distributed paradigm. This will ultimately point the way towards the distributed computing resource model that we believe will interlink this community well into the next decade.
- Responsibly phase out the existing DEC 2060 machine as effective distributed computing alternatives become widely available. Because of severe budget pressures, the 2060 was taken out of routine service during this past year, much more rapidly than was planned or than was comfortable for AIM users acclimating to the new UNIX operating system environment. We are still finishing a number of interim systems alternatives to discontinued 2060 services that are not available in standard UNIX environments.
- Continue the central staff and management structure, essentially unchanged in function during the five-year transition period, except for the merging of the core part of the ONCOCIN research with the SUMEX resource.

III.A.1.2 Significance and Impact in Biomedicine

Artificial intelligence is the computer science of representations of symbolic knowledge and its use in symbolic inference and problem-solving processes. Projects in the SUMEX-AIM community are concerned in some way with the application of AI to biomedical research, and the resource has given strong impetus and support to knowledge-based system research in biomedicine. For computer applications in medicine and biology, this research path is crucial. Medicine and biology are not presently mathematically based sciences; unlike physics and engineering, they are seldom capable of exploiting the mathematical characteristics of computation. They are essentially inferential, not calculational, sciences. If the computer revolution is to affect biomedical scientists, computers will be used as inferential aids.

The growth in medical knowledge has far surpassed the ability of a single practitioner to master it all, and the computer's superior information processing capacity thereby offers a natural appeal. Furthermore, the reasoning processes of medical experts are poorly understood; attempts to model expert decision-making necessarily require a degree of introspection and a structured experimentation that may, in turn, improve the quality of the physician's own clinical decisions, making them more reproducible and defensible. New insights that result may also allow us to teach medical students and house staff the techniques for reaching good decisions more adequately, rather than merely to offer a collection of facts which they must independently learn to utilize coherently.
Perhaps the larger impact on medicine and biology will be the exposure and refinement of the hitherto largely private heuristic knowledge of the experts of the various fields studied. The ethic of science that calls for the public exposure and criticism of knowledge has traditionally been flawed for want of a methodology to evoke and give form to the heuristic knowledge of scientists. AI methodology is beginning to fill that need. Heuristic knowledge can be elicited, studied, critiqued by peers, and taught to students.

The importance of AI research and its applications is increasing in general, without regard for the specific areas of biomedical interest. AI has been one of the principal fronts along which university computer science groups are expanding. The pressure from student career-line choices is great. Federal and industrial support for AI research and applications is vigorous, although support specifically for biomedical applications continues to be limited. All of the major computer manufacturers (e.g., IBM, DEC, TI, UNISYS, HP, Apple, and others) are using and marketing AI technology aggressively, and many software companies are putting more and more products on the market. Many other parts of industry are also actively pursuing AI applications in their own contexts, including defense and aerospace companies, manufacturing companies, financial companies, and others¹.

Despite the limited research funding available, there is also an explosion of interest in medical AI. The American Association for Artificial Intelligence (AAAI), the principal scientific membership organization for the AI field, has over 7000 members, several thousand of whom are members of the medical special interest group known as the AAAI-M.
Speakers on medical AI are prominently featured at professional medical meetings, such as the American College of Pathology and American College of Physicians meetings; a decade ago, the words artificial intelligence were never heard at such conferences. And at medical computing meetings, such as the annual Symposium on Computer Applications in Medical Care (SCAMC) and the international MEDINFO conferences, the growing interest in AI and the rapid increase in papers on AI and expert systems are further testimony to the impact that the field is having.

AI is beginning to have a similar effect on medical education. Such diverse organizations as the National Library of Medicine, the American College of Physicians, the Association of American Medical Colleges, and the Medical Library Association have all called for sweeping changes in medical education, increased educational use of computing technology, enhanced research in medical computer science, and career development for people working at the interface between medicine and computing. They all cite evolving computing technology and (SUMEX-AIM) AI research as key motivators. At Stanford, we have a vigorous special graduate program in Medical Information Sciences for student training and research in AI. This program has many more applicants than available slots. Demand for these graduates, in both academic and industrial settings, is so high that students typically begin to receive solicitations one or two years before completing their degrees.

III.A.1.3 Summary of Current Resource Goals

The following outlines the specific objectives of the SUMEX-AIM resource during the current five-year award period, which began in August 1986. It provides an overall research plan for the resource and the backdrop against which specific progress is reported. Note that these objectives cover only the resource nucleus; objectives for individual collaborating projects are discussed in their respective reports in Section IV.
Specific aims are broken into five categories: 1) Technological Research and Development, 2) Collaborative Research, 3) Service and Resource Operations, 4) Training and Education, and 5) Dissemination.

¹ Feigenbaum, E. A., McCorduck, P., and Nii, H. P. The Rise of the Expert Company: How Visionary Companies are Using Artificial Intelligence to Achieve Higher Productivity and Profits. Times Books, New York, NY, 1988. Winston, P. H., and Prendergast, K. A. The AI Business: Commercial Uses of Artificial Intelligence. The MIT Press, Cambridge, MA, 1984.

Technological Research and Development

NIH-based SUMEX funding and computational support for core research is complementary to similar funding from other agencies (including DARPA, NASA, NSF, NLM, private foundations, and industry) and contributes to the long-standing interdisciplinary effort at Stanford in basic AI research and expert system design. We expect this work to provide the underpinnings for increasingly effective consultative programs in medicine and for more practical adaptations of this work within emerging microelectronic technologies. Specific aims include:

- Basic research on AI techniques applicable to biomedical problems. Over the next term we will emphasize work on very large multi-use knowledge bases, blackboard problem-solving frameworks and architectures, knowledge acquisition or learning, constraint satisfaction, and qualitative simulation.
- Investigate methodologies for disseminating application systems such as clinical decision-making advisors into user groups. This will include generalized systems for acquiring, representing, and reasoning about complex treatment protocols such as are used in cancer chemotherapy and which might be used for clinical trials in other domains.
- Support community efforts to organize and generalize AI tools and architectures that have been developed in the context of individual application projects.
This will include retrospective evaluations of systems like the AGE blackboard experiment and work on new systems such as BB1, CARE, E-ONCOCIN, E-OPAL, Meta-ONYX, and architectures for concurrent symbolic computing. The objective is to evolve a body of software tools that can be used to build future knowledge-based systems more effectively and to explore other biomedical AI applications.
- Develop more effective workstation systems to serve as the basis for research, biomedical application development, and dissemination. We seek to coordinate basic research, application work, and system development so that the AI software we develop for the next 5-10 years will be appropriate to the hardware and system software environments we expect to be practical by then. Our purchases of new hardware will be limited to experimentation with state-of-the-art workstations as they become available for our system developments.

Collaborative Research

- Encourage the exploration of new applications of AI to biomedical research and improve mechanisms for inter- and intra-group collaborations and communications. While AI is our defining theme, we may consider exceptional applications justified by some other unique feature of SUMEX-AIM essential for important biomedical research. We will continue to exploit community expertise and sharing in software development.
- Minimize administrative barriers to the community-oriented goals of SUMEX-AIM and direct our resources toward purely scientific goals. We will retain the current user funding arrangements for projects working on SUMEX facilities. User projects will fund their own manpower and local needs; actively contribute their special expertise to the SUMEX-AIM community; and receive an allocation of system resources under the control of the AIM management committees.
We will progressively charge core SUMEX-AIM operations costs to Stanford users as DRR support for the central system (initially a DEC 2060) is phased out. Fees to national users will be delayed as long as financially possible.
- Provide effective and geographically accessible communication facilities to the SUMEX-AIM community for remote collaborations, communications among distributed computing nodes, and experimental testing of AI programs. We will retain the current ARPANET and TELENET connections for at least the initial term and will actively explore other advantageous connections to new communications networks and to dedicated links.

Service and Resource Operations

SUMEX-AIM does not have the computing or manpower capacity to provide routine service to the large community of mature projects that has developed over the years. Rather, their computing needs are better met by the appropriate development of their own computing resources when justified. Thus, SUMEX-AIM has the primary focus of assisting new start-up or pilot projects in biomedical AI applications, in addition to its core research in the setting of a sizable number of collaborative projects. We do offer continuing support, when appropriate, for projects through the lengthy process of obtaining funding to establish their own computing base.

Training and Education

- Provide documentation and assistance to interface users to resource facilities and systems.
- Exploit particular areas of expertise within the community for assisting in the development of pilot efforts in new application areas.
- Accept visitors in Stanford research groups within limits of manpower, space, and computing resources.
- Support the Medical Information Science and other student programs at Stanford to increase the number of research personnel available to work on biomedical AI applications.
- Support workshop activities, including collaboration with other community groups on the AIM community workshop and with individual projects for more specialized workshops covering specific research, application, or system dissemination topics.

Dissemination

While collaborating projects are responsible for the development and dissemination of their own AI systems and results, the SUMEX resource will work to provide community-wide support for dissemination efforts in areas such as the following:

- Encourage, contribute to, and support the on-going export of software systems and tools within the AIM community and for commercial development.
- Assist in the production of video tapes and films depicting aspects of AIM community research.
- Promote the publication of books, review papers, and basic research articles on all aspects of SUMEX-AIM research.

III.A.2 Details of Technical Progress

This section gives an overview of progress for the nucleus of the SUMEX-AIM resource. A more detailed discussion of our progress in specific areas and related plans for further work are presented beginning in section III.A.2.2. Objectives and progress for individual collaborating projects are discussed in their respective reports in section IV. These collaborative projects collectively provide much of the scientific basis for SUMEX as a resource, and our role in assisting them has been a continuation of the role that evolved in the past. Collaborating projects are autonomous in their management and provide their own manpower and expertise for the development and dissemination of their AI programs.

III.A.2.1 Key Areas of Progress

In this section we summarize highlights of SUMEX-AIM resource activities over the past year (May 1988 - April 1989), focusing on the resource nucleus.
We have continued to make significant progress in all of our areas of core research, including the ONCOCIN research on dissemination of clinical trial management tools, basic AI research, and distributed systems development.

Core ONCOCIN Research

- Our work has proceeded well along three main lines of research: 1) ONCOCIN, the therapy planning program and its graphical interface; 2) OPAL, a graphical knowledge entry system for ONCOCIN; and 3) ONYX, a strategic planning program designed to give advice in complex therapy situations. Each of these research components has in turn split into two parts: continued development of the cancer therapy versions of the system, and a generalization of each of the components for use in other areas of medicine (the prefix "E-" is added to the program names for the generalized versions). In addition, we have continued development of a generalized knowledge acquisition tool, named PROTEGE, designed to encode descriptions of clinical trials. The system was the Ph.D. thesis work of Mark Musen (who joined our faculty this year). The output of PROTEGE is an OPAL-like input system designed for a target clinical area such as hypertension.
- Based on the success of our earlier ONCOCIN work, strong interest has developed, from such diverse quarters as the National Cancer Institute and the Stanford Hospital, in developing a fully operational version of ONCOCIN that can be broadly used in oncology clinics outside our research laboratory. This past year, the Stanford Hospital started a program to assist in the transfer of innovative medical technology out of the laboratory to patient care, committing approximately $750K per year to seed this effort. ONCOCIN was selected as one of 10 projects to be funded from a large group of competing proposals.
This presents a dilemma for the project that is still unresolved, namely, how to maintain cohesiveness between the ongoing research work to extend the various parts of ONCOCIN and generalize it for applications to other domains and, at the same time, meet the operational needs of a widely disseminated practical system. Much thought has gone into this problem this past year, including issues such as which of the modern workstation alternatives to select (Lisp machine, IBM PC, Apple Macintosh, SUN or NeXT UNIX workstation, ...), what language to pick (C, Lisp, ...), and whether the research and operational systems can really be consistent versions of a single system. In order to understand the scope and practical issues involved, during the last six months we have begun an experiment to port ONCOCIN to a TI microExplorer running inside a Mac II. We have completed the translation of the Ozone object-oriented system, the temporal network, and most of the reasoner. We will next approach the design of the user interface, which must be rewritten, since the current interface depends heavily on the graphical capabilities of the Xerox workstations. We are also starting a study of the overall design and specification of an "integrated" oncologist's workstation, under NCI sponsorship, that will lead to an attempt to coordinate federal, academic, and industrial efforts to implement such a system.

- Our E-ONCOCIN research has concentrated on understanding how protocols in medicine vary across subspecialties. We are examining several application areas: the intensive care unit, insulin treatment for diabetes, hypertension protocols, and both standard and complex cancer treatment problems. Diagnosis and therapy selection for patients in the intensive care unit is a natural application area because it is based on changing data and the need to determine the response to therapy interventions.
We also felt that insulin treatment for diabetes would be a good area to explore. Like cancer chemotherapy, treatment for diabetes continues over a long period of time and has been an area of intensive protocol development. Unlike cancer chemotherapy, the treatment plan must handle multiple treatments in one day and de-emphasizes the use of multiple drugs (although there are a variety of types of insulin). Our initial experiments have shown that many of the elements of the ONCOCIN design are sufficiently general for other application areas, but that some specific elements (particularly the representation of temporal events) will have to be redesigned or extended. Another extension is to modify the framework so that it can work with established data base tools instead of the hand-tailored data base currently in use. In this work, we must be able to describe the changing clinical context and event intervals that show up in many diverse application areas.

An example of a new area that we are exploring is the treatment of AIDS patients on clinical protocols. AIDS patients do not always follow the type of strict temporal schedules (e.g., regular visits to outpatient clinics) seen with oncology patients. They have a chronic disease with acute exacerbations of opportunistic infections. Furthermore, the medication schedule is interrupted by frequent hospitalizations and confounded by the taking of drugs not on the protocol. Together, these factors will require a much more flexible model of the temporal dimension of treatment planning.

- We continued development of the OPAL system for graphical knowledge acquisition to facilitate protocol definition and knowledge base entry for the ONCOCIN oncology application area.
A major accomplishment of this last year was to experimentally combine the OPAL and ONCOCIN programs into one working program, and to completely enter knowledge from OPAL using both the high level tools and lower level rule editors, but without needing to make changes at the ONCOCIN side of the system. Our experiments with OPAL, and our intention to generalize OPAL use outside of oncology protocols, suggest that we reorganize the OPAL program to use a relational data base to store its knowledge. We continue to explore the appropriate avenue for the connection of our knowledge acquisition systems to data bases, and have concentrated on using the SQL query language to access a relational data base under the client-server model (e.g., the physical data base may reside on a different machine from the knowledge acquisition tool, with the query and the response transmitted over the network).

- With the current uncertainties about what workstation environment to use for future work, we began to explore alternative platforms for developing the interface for OPAL-like systems. We have begun experiments using HyperCard on the Mac II and Interface Builder on the NeXT machine. In order to build experience with each of these possible platforms, we have re-implemented portions of the OPAL system and are analyzing the results. It is particularly hard to determine the best platform since the NeXT machine software is still in a rudimentary stage, and HyperCard on the Mac II has significant limitations, including small "card size" and the inability to display multiple cards simultaneously.

We continue to work on the integration of speech-recognition technology into the interface to ONCOCIN. The project uses a commercially available continuous speech recognition product and a prototype ONCOCIN adaptation. The system uses the location of the cursor on the screen to provide a context for choosing candidate grammars with which to attempt recognition of a user's utterance.
The system dynamically re-orders the list of candidate recognition grammars based on the dialog history. Although the legal grammars are limited, it is now possible to carry out most of the ONCOCIN data acquisition steps using speech alone or speech plus pointing with the mouse. We are also exploring a second medical record-keeping task: the creation of portions of a progress note that describes in textual form the changes in the patient's status from week to week. We have also mounted the CMU SPHINX speech understanding system on our NeXT machines and are comparing its performance against the SSI hardware-based system we have been using.

Core AI Research

In the last year, research has progressed on several fundamental issues of AI. As in the past, our research methodology is experimental, concentrating on building and analyzing actual systems.

We have continued to explore the design and use of very large, multi-use knowledge bases with the hypothesis that the problems of both brittleness and over-specialization in current knowledge-based systems can be overcome. Some of the key directions for this work include knowledge representation, knowledge compilation, knowledge justification, model-based reasoning, and case-based reasoning. During the past year we have been exploring a variety of representations and the systems which employ them, including CYC from MCC, CLASS from Schlumberger, and QPE from the University of Illinois.

In the study of knowledge compilation techniques, we note that effective problem solving is not typically carried out at the level of first principles, but rather at the level of more compact, efficient forms of knowledge, compiled from experience with specific tasks. We are developing an integrated scheme for using "first principles" knowledge of the physical world for simulation.
Given a description of the structure of a device in terms of its constituent objects and their relations, the system identifies applicable physical laws, processes, types of matter, etc., and produces a set of equations to describe the behavior of the device. The equation model is then analyzed using the method of causal ordering to produce a model that reveals the dependency relations among the parameters of the model.

Research has also progressed on our study of blackboard frameworks, especially as they relate to adaptive intelligent systems. Important questions for this work include: How can we design flexible control structures for powerful problem solving programs? How can we use these structures effectively in many problem domains? How can we represent processes and reason about their behavior, and perform intelligent actions under real-time requirements? This past year, we have begun or continued work on five domain-independent BB1 modules: the Focus module (provides a dynamic focus of attention); the ReAct module (provides time-sensitive problem detection and response capabilities); the ICE module (provides reasoning from first principles to handle complex or unfamiliar problems); the TPlan module (provides time-sensitive planning of coherent courses of action); and the TDB module (provides a temporally organized database of observed, expected, and intended models of external entities, and associated temporal reasoning functions).

We have built upon earlier results in our parallel symbolic computing architectures project, including the SIMPLE CAD (Computer Aided Design) system for hierarchical, multiple level specification of computer architectures and the CARE parameterized, multiprocessor array emulator (specified in SIMPLE's specification languages and running on SIMPLE's simulator). These systems are in use by several research groups at Stanford and have been ported to several external sites, including NASA Ames Research Center. A videotaped tutorial was held in June 1988, attended by representatives from industry and government, which described the CARE/SIMPLE system as well as the LAMINA programming interface. The attendees received instruction in the use of the system for making measurements of the performance of various simulated multiprocessor applications. Due to rapidly growing interest in the SIMPLE/CARE system, a major effort is now underway to port it to a wider class of hardware platforms. The system is currently being reimplemented in Common Lisp and the X window system, with the Sun workstation as the initial target. During the past year, the research effort associated with SIMPLE/CARE has largely focussed on investigations of communication protocols and techniques for monitoring concurrent object-based applications.

In other areas of our parallel architectures work, we have studied the measured speed-up of two different expert system applications, ELINT (a system for interpreting electronic intelligence signals) and AIRTRAC (a system for identifying and tracking aircraft based on diverse radar data). Our preliminary conclusions are that for relatively simple and well-structured applications such as ELINT, two (or possibly more) orders of magnitude speedup via parallel execution are possible. However, for complex and ill-structured applications such as AIRTRAC Path Association, speedup over a well-tuned serial program by using parallel execution is probably limited, at best, to an order of magnitude. Experiments are continuing to verify this preliminary conclusion.

The machine learning work has focussed this past year on explanation-based generalization and chunking work in the SOAR framework and on inductive rule learning. This area of research is winding down due to the departures of Profs. Buchanan and Rosenbloom.
During the past year, finishing students extended the RL induction program to learn incrementally, that is, from small sets of examples presented in sequence without the benefit of looking at them all together. A front-end program was written to assist in the definition of RL's starting knowledge, the so-called "half-order theory". In our SOAR research, we completed a set of experiments evaluating a representational restriction on productions that guarantees an absence of expensive chunks, with encouraging results. We have applied our domain-independent abstraction mechanism to a set of problems in two domains (mobile robot and computer configuration), and evaluated its ability to reduce problem solving time, reduce learning time, and increase the generality of the rules learned. We have run a set of experiments which evaluate the ability of rules learned in medical diagnosis to transfer to related problems (done in a reduced-size version of NEOMYCIN-SOAR). In the area of theoretical developments and system building, we have extended our work on declarative learning to allow indexing off of arbitrary features, but in the process uncovered a new issue concerned with how to deal with multiple retrieval and discrimination.

Core System Development

- Because of budget cuts in our award, this has been a particularly busy and chaotic year in terms of changes to the orderly progression we had planned for the transition to a distributed environment. There were two immediate consequences of this cut: a) reducing our systems staff by two people and b) taking the DEC 2060 off contract maintenance early in the grant year, thereby forcing us to close it down for routine use. These steps have had substantial impacts, forcing us to devote full energy to the mechanics of the 2060-to-SUN-4 transition approximately a year before we expected to be ready for it, and diverting staff from work on longer-term distributed computing problems.
In spite of all this unplanned redirection of our energies, we have made substantial progress this past year, as summarized below.

- Because of the necessary preoccupation of most of our staff with the premature 2060 transition this past year, we were not able to convene the visiting advisory group recommended by BRTP to help guide our long-term research efforts. As we finally close out the 2060 chapter this summer, we plan to assemble such a group in the early fall (September or October) to reassess our plans for the coming two years.

- As detailed in our report last year, we have chosen Apple Macintosh II workstations as the general computing environment for researchers and staff, TI Explorer Lisp machines (including the microExplorer Macintosh coprocessor) as the near-term high-performance Lisp research environment, and a SUN-4 as the central network server replacement for the DEC 2060. We outlined there the many tasks facing us in making the transition from the central 2060 environment to the new distributed model, including selecting and integrating tools for text processing (editing, graphics, formatting, and bibliographic references), presentation graphics, printing, help facilities and distributed information access, interpersonal communication tools (EMail and BBoards), file management (storage, access, backup, and archiving), and system building tools (languages, development environments, and integration tools).

Because of the high maintenance cost of the DEC 2060, we could not afford to continue its coverage in light of the large budget cut. Since this old-technology machine quickly becomes unreliable without regular maintenance, this forced us to transfer nearly all of our AIM community usage to the SUN-4/280 in October and November of 1988. The DECSystem-20 had been our major computer resource since February of 1983, and this machine, in turn, had replaced a KI-TENEX system that had been in use for the preceding nine years.
Thus, our conversion to the UNIX-based SUN-4/280 represented a major departure from a long-established approach to computing, and for many, converting to the use of UNIX was a difficult transition. A significant and urgent effort went into developing a UNIX Users Guide for TOPS-20 Users, which has provided substantial help in navigating through the most common commands. In the process of converting, we had to transfer about 400 user accounts. Most of the immediately needed working files from the 2060 system were dumped to tape and loaded into the SUN-4. Most of this transfer was done during a four-week period of intensive work. In addition, we had to orchestrate the transfer of the "SUMEX-AIM" name from the 2060 to the SUN-4, and provide effective continuation of facilities such as EMail, BBoard, text processing, etc. Continuous and nearly compatible AIM community mail services were maintained through the transition by installing the Columbia University MM-C mail program on the SUN-4. This program closely duplicates the functions of the TOPS-20 COMAND JSYS under UNIX and presents the user with a mail reader/composer interface very similar to that of TOPS-20 MM. This system, coupled with a UNIX version of the EMACS text editor, called GNUEMACS, provided a relatively familiar setting for the most common computing functions used by AIM community members. In the succeeding months, we added bulletin board functionality to MM-C so that, from the user's perspective, mail access was nearly identical to the former system.

- Another major issue in the 2060-to-SUN-4 transition has been the need to provide our users with continued access to their large collection of archived files and to a set of permanent annual backup dumps (done in January of each year) which have been collected and maintained since 1975.
This has required very careful planning, as the directory information for these tapes resides in Archive-Directory files (for TENEX) and the on-line File Descriptor Blocks (FDB's) of the TOPS-20 file system. These two sets of information must be converted to simple UNIX-compatible text files to provide users with continued facilities to review and access their collections of archived files. This work has been a major undertaking and is still in progress.

- In the move from TOPS-20 to UNIX, we have had to ensure continued access to "standard" services, such as file backup, archiving, a flexible and intuitive naming facility, and data interchange services (e.g., file transfer). UNIX has many of the needed facilities, e.g., backup, long names, hierarchical directory structure, some file property attributes, data conversion, and limited archival tools. We have worked on adapting a commercial system developed by UniTech to allow users to manage large file collections by moving files not needed on-line to and from off-line tape storage. This system also maintains a historical archive of files. The system is in beta test now and will be released to the entire community early this summer.

- Electronic mail continues as a primary means of communication for the widely spread SUMEX-AIM community. As reported last year, the move to workstations has forced a significant rethinking of the mechanisms employed to manage such mail in order to ensure reliable access, to make user addressing understandable and manageable, and to keep the mail software distributed to workstations as simple, stable, and maintainable as possible. We are following a strategy of having a shared mail server machine which handles mail transactions with mail clients running on individual user workstations. The mail server can be used from clients at arbitrary locations, allowing users to read mail across campus, town, or country.
We have made significant progress this year in developing a Mac II version of the graphics-based MM-D/IMAP mail client reported on last year, including a complete rewrite of the InterLisp system in C and modification of the user reading and composing interface to be compatible with the Mac "look and feel". This system is nearly ready for alpha test starting early this summer.

One of the key issues in selecting the systems for our distributed computing environment was the performance of Common Lisp. To help make this evaluation, we have continued to expand an informal survey of the performance of two KSL AI software packages, SOAR and BB1, on a wide variety of machines. This study was completed this past year and a "final" report written (see Appendix B), recognizing that each month new workstations are announced that deserve additional evaluation. A considerable range of workstations, based on stock microprocessor chips as well as specially microprogrammed Lisp chips, perform within a factor of two of the best performance. Even though performance gaps between microprogrammed Lisp systems and stock workstation implementations are narrowing, there still remains a significant difference in the quality of the development environments. We have attempted to distill the key features of the Lisp machine environments that would be needed in stock machine implementations in order to make them attractive in a development setting.

- This past year we acquired two NeXT workstations, primarily to understand and evaluate the power of the NeXT Interface Builder for AI software development. Integrating these prototype systems took a significant effort, and they are now being used in several of our core research and applications projects. In addition, we have continued to support a limited number of other "standard" workstations for our work, including Mac II's, TI Explorers, and SUN's.
We have continued to work toward a complete phase-out of our old Xerox and Symbolics Lisp machines.

- As reported last year, major changes in ARPANET service have been underway as ARPA has responded to its own budget pressures to reduce its operating subsidy of the ARPANET. Starting late last spring, sections of the ARPANET serving university users were being shut down and replaced by connection to the NSFNET. Our own connection is just in the process of being decommissioned, with our IMP scheduled to be removed sometime this summer. Our Internet access is now implemented through the Bay Area Regional Research Network (BARRNet) and the NSFNET. We continue to operate the Develcon gateway between our Ethernet environment and the TELENET network by which many AIM users gain access to SUMEX.

Other Resource Activities

We have continued the dissemination of SUMEX-AIM technology through various media. The distribution system for our AI software tools (EMYCIN, AGE, and BB1) to academic, industrial, and federal research laboratories continues to work effectively. We have also continued to distribute to outside groups the video tapes of some of our research projects, including ONCOCIN, and an overview tape of Knowledge Systems Laboratory work. Our group has continued to publish actively on the results of our research, including more than 45 research papers per year in the AI literature and a dozen books in the past 7 years on various aspects of SUMEX-AIM AI research. We assisted and participated actively in the AIM Workshop sponsored by AAAI and held at Stanford in 1988, and hosted a number of AIM community visitors at our Stanford research laboratory. Members of the Medical Computer Science group are participating in the early organization phases of another workshop during the spring of 1990.
The Medical Information Sciences program, begun at Stanford in 1983 under Professor Shortliffe as Director, has continued its strong development over the past year. The specialized curriculum offered by the MIS program focuses on the development of a new generation of researchers able to support the development of improved computer-based solutions to biomedical needs. The feasibility of this program resulted in large part from the prior work and research computing environment provided by the SUMEX-AIM resource. As already reported, it has recently received enthusiastic endorsement from the Stanford Faculty Senate for an additional five years and has been awarded renewed postdoctoral training support from the National Library of Medicine, with high praise from the reviewing study section for the training and contributions of the SUMEX-AIM environment. This past year, MIS students have published many papers, including several that have won conference awards.

We have continued to recruit new user projects and collaborators to explore further biomedical areas for applying AI. A number of these projects are built around the communications network facilities we have assembled, bringing together medical and computer science collaborators from remote institutions and making their research programs available to still other remote users. At the same time, we have encouraged older, mature projects to build their own computing environments, thereby facilitating the transition to a distributed AIM community. A substantial number of projects have already moved to their own computing resources.

SUMEX user projects have made good progress in developing and disseminating effective consultative computer programs for biomedical research. These systems provide expertise in areas like cancer chemotherapy protocol management, clinical diagnosis and decision-making, and molecular biology.
We have worked hard to meet their needs and are grateful for their expressed appreciation (see Section IV).