Dissemination Efforts    P41 RR00785-09

The remaining users rarely use the system. They have logged in a few times, but for one reason or another they never became regular users. Quite often this is because a lab group settles on having one or two graduate students or post-doctoral associates become the "computer experts" of the group, and as a result computer use by the other people in the lab drops to an almost non-existent level. Unfortunately, an equally prevalent reason for users to stop using the GENET account is a lack of resource time. Probably the major complaint that we get from GENET users concerns the lack of compute time and the availability of the system. One account is simply not enough for that many people to share, especially when it is restricted to two jobs at one time. We constantly remind the GENET users to use their resources wisely. We encourage them to use the BATCH system to run jobs in the wee hours of the morning, and we remind them to be prepared to do their work quickly when they log in to the system, but these efforts do not seem to help the problem very much.

Most GENET users use only a small set of programs. These consist of text editors, which are used to set up the data files for the MOLGEN analysis programs; XSEARCH, which GENET users use to search our database for sequences that can assist them in their research; and the electronic mail facilities. Very few of our GENET users actually feel comfortable using programs other than the ones that we maintain, not because the other programs would not be useful, but because the users do not have the computer time to experiment with what is available.

There are three noteworthy programs that we provide for GENET users that are used extensively. SEQ, a DNA-RNA sequence analysis program, is the most widely used.
MAP, a program that assists in the construction of restriction maps from restriction enzyme digest data, is also used a great deal. Finally, a new program, MAPPER (written and maintained by William Pearson of Johns Hopkins University), is a simplified version of the MOLGEN MAP program that is somewhat more efficient than the MOLGEN version.

The MOLGEN UE program and special molecular genetics knowledge bases are not available to the general GENET user at this time for two reasons. First, the UE program is quite costly to use (in terms of computer cycles), and second, we feel that the knowledge base is not quite ready for the computer novice to learn and use without a significant amount of initial assistance. A few GENET users (mostly Stanford associates) who have had a significant interest in the knowledge base have become EXO-MOLGEN users and are developing knowledge bases of their own, which we hope will eventually be added to the ones that MOLGEN is developing and maintaining.

GENET Usage Statistics

Following are plots of the monthly GENET CPU usage, connect time, and file usage data. As seen in Figure 14, the consumption of CPU time has continued to grow despite our rather stringent controls. In fact, the cumulative GENET usage this past year is approximately 50% higher than that of the largest AI research project consumer, as seen in Figure 11.

E. A. Feigenbaum

[Figure 14. Monthly Resource Usage by GENET Community. Three plots, with the time axis marked at January 1980, 1981, and 1982: GENET CPU Usage (hours/month), GENET Connect Time (hours/month), and GENET File Usage (disk pages).]
I.F Comments on the Biotechnology Resources Program

Resource Organization

We continue to believe that the Biotechnology Resources Program is one of the most effective vehicles for developing and disseminating technological tools for biomedical research. The goals and methods of the program are well designed to encourage the building of the necessary multidisciplinary groups and the merging of appropriate technological and medical disciplines. In our experience with the SUMEX-AIM resource, several elements of this approach emerge as key to the development and management of an effective resource:

1) Effective Management Framework - there needs to be an explicit agreement between the BRP and the resource principal investigator that sets out a clear mandate for the resource and its allocation, provides worthwhile incentives for the host institution and investigator to invest the substantial professional career time needed to develop and manage the resource, and ensures equitable distribution of resource services to its target community.

2) Close Working Relationship with NIH - a resource is a major and often long-term investment of money and human energy. A close and mutually supportive working relationship between resource management, its advisory committees, and the NIH administration is essential to assure healthy development of the resource and its relationship to its user community. We at SUMEX-AIM have benefited immensely from such a relationship with Dr. William R. Baker, Jr. in the evolution of the SUMEX-AIM community.

3) Freedom to Explore Resource Potential - a resource, by its nature, operates at the "cutting edge" in developing its characteristic technology and learning how to disseminate it effectively to the biomedical community at large.
BRP should not impose artificial constraints on the resource for commercializing its efforts (fees for service) or developing its potential (funding duration limits or annual budget ceilings). Such artificial policy impositions can undermine the very goals central to BRP's reason for existence. Satisfactory policies in this regard have been worked out recently and should be retained.

Electronic Communications

SUMEX-AIM has pioneered the development of more effective methods for facilitating scientific communication. Although face-to-face contacts continue to play a key role, we feel that in the longer term computer-based communications will become increasingly important to NIH and the biomedical community. We would like to see BRP take a more active role in promoting these tools within NIH and its grantee community. A concrete step would be to become a sponsoring agency for the ARPANET, which remains the most effective means of providing a very broad spectrum of services that promote good communications. This could serve as a base for interconnecting sponsored machines, offering a broader range of services, and promoting broader collaboration among the biomedical community at large.

II Description of Scientific Subprojects

II.A Scientific Subprojects

The following subsections report on the AIM community of projects and "pilot" efforts, including local and national users of the SUMEX-AIM facility at Stanford. Those using the Rutgers-AIM facility are annotated with "[Rutgers-AIM]". In addition to these detailed progress reports, we have included briefer summary abstracts of the fully authorized projects in Appendix C on page 277.

The collaborative project reports and comments are the result of a solicitation for contributions sent to each of the project Principal Investigators requesting the following information:

I.
SUMMARY OF RESEARCH PROGRAM
    A. Project rationale
    B. Medical relevance and collaboration
    C. Highlights of research progress
       --Accomplishments this past year
       --Research in progress
    D. List of relevant publications
    E. Funding support (see details below)

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE
    A. Medical collaborations and program dissemination via SUMEX
    B. Sharing and interactions with other SUMEX-AIM projects (via computing facilities, workshops, personal contacts, etc.)
    C. Critique of resource management (community facilitation, computer services, communications services, capacity, etc.)

III. RESEARCH PLANS (8/80-7/86)
    A. Project goals and plans
       --Near-term
       --Long-range
    B. Justification and requirements for continued SUMEX use
    C. Needs and plans for other computing resources beyond SUMEX-AIM
    D. Recommendations for future community and resource development

We believe that the reports of the individual projects speak for themselves as rationales for participation; in any case the reports are recorded as submitted and are the responsibility of the indicated project leaders.

II.A.1 Stanford Projects

The following group of projects is formally approved for access to the Stanford aliquot of the SUMEX-AIM resource. Their access is based on review by the Stanford Advisory Group and approval by Professor Feigenbaum as Principal Investigator.

II.A.1.1 AGE - Attempt to Generalize

AGE - Attempt to Generalize

H. Penny Nii and Edward A.
Feigenbaum
Computer Science Department
Stanford University

ABSTRACT: Isolate inference, control, and representation techniques from previous knowledge-based programs; reprogram them for domain independence; write an interface that will help a user understand what the package offers and how to use the modules; and make the package available to other members of the AIM community, to labs developing knowledge-based programs, and to the general scientific community.

I. SUMMARY OF RESEARCH PROGRAM

A. Project Rationale

The general goal of the AGE project is to demystify and make explicit the art of knowledge engineering. It is an attempt to formulate the knowledge that knowledge engineers use in constructing knowledge-based programs and to put it at the disposal of others in the form of a software laboratory.

The design and implementation of the AGE program is based primarily on the experience gained in building knowledge-based programs at the Stanford Heuristic Programming Project in the last decade. The programs that have been, or are being, built are: DENDRAL, meta-DENDRAL, MYCIN, HASP, AM, MOLGEN, CRYSALIS [Feigenbaum 1977, 1980], and SACON [Bennett 1978]. Initially, the AGE program will embody the artificial intelligence methods and techniques used in these programs. However, the long-range aspiration is to integrate those developed at other AI laboratories. The final product is to be a collection of building-block programs combined with an "intelligent front-end" that will assist the user in constructing knowledge-based programs. It is hoped that AGE will speed up the process of building knowledge-based programs and facilitate the dissemination of AI techniques by: (1) packaging common AI software tools so that they need not be reprogrammed for every problem; and (2) helping people who are not knowledge engineering specialists write knowledge-based programs.

B.
Medical Relevance and Collaboration

AGE is relevant to the SUMEX-AIM community in two ways: as a vehicle for disseminating cumulated knowledge about the methodologies of knowledge engineering, and as a tool for reducing the amount of time needed to develop knowledge-based programs.

(1) Dissemination of Knowledge: The primary strategy for conducting AI research at the Stanford Heuristic Programming Project is to build complex programs to solve carefully chosen problems and to allow the problems to condition the choice of scientific paths to be explored. The historical context in which this methodology arose and summaries of the programs that have been built over the last decade at HPP are discussed in [Feigenbaum 1977, 1980]. While the programs serve as case studies in building a field of "knowledge engineering," they also contribute to a cumulation of theory in representation and control paradigms and of methods in the construction of knowledge-based programs.

The cumulation and concomitant dissemination of theory occur through scientific papers. Over the past decade we have also cumulated and disseminated methodological knowledge. In Computer Science, one effective method of disseminating knowledge is in the form of software packages. Statistical packages, though not related to AI, are one example of software packages containing cumulated knowledge. AGE is an attempt to make yesterday's "experimental technique" into tomorrow's "tool" in the field of knowledge engineering.

(2) Speeding up the Process of Building Knowledge-based Programs: Many of the programs built at HPP are intelligent agents to assist human problem solving in tasks of significance to medicine and biology (see separate sections for discussions of work and relevance). Without exception the programs were handcrafted. This process often takes many years, both for the AI scientists and for the experts in the field of collaboration.
AGE will reduce this time by providing a set of preprogrammed inference mechanisms and representational forms that can be used for a variety of tasks. Close collaboration is still necessary to provide the knowledge base, but the system design and programming time of the AI scientists can be significantly reduced. Since knowledge engineering is an empirical science, in which many programming experiments are conducted before programs suitable for a task are produced, reducing the programming and experimenting time would significantly reduce the time required to build knowledge-based programs.

C. Highlights of Research

Summary

The plans made in 1976 for the AGE project included the construction of two systems. The development of the first of these systems, AGE-1, was officially concluded on October 31, 1981. The system, together with documentation, is now available for use.

Much of the year was spent in activities related to releasing AGE-1. The most time-consuming activity was finishing the documentation. Most of the knowledge specification editors and debugging facilities were rewritten to improve the user interface. In addition, several new features were added in the area of data input protocols and focussing mechanisms.

The current user interface is directed at teletype-like terminals; that is, terminals where information is presented linearly in a single window. For a complex system that has many inter-related components that need to be specified and manipulated (such as AGE-1), this mode of interaction makes the system appear more complex than it is. We rewrote most of the user interface to alleviate many of the problems encountered by users in the past, but there are many problems that cannot be solved without moving to another medium of communication. With this motivation, we began our experiment in using multiple windows on the bit-map display of the Dolphin.
The version of AGE-1 on the Dolphin is called AGE-1.5 -- the only difference from AGE-1 is the user interface protocols. (It should be noted that moving AGE, which was optimized for a time-sharing system, to a personal computer took several weeks.)

Our plans to begin the design of AGE-2 were postponed until 1982 (see Future Research section).

D. Publications

Nii, H. Penny and Aiello, Nelleke, "AGE: a knowledge-based program for building knowledge-based programs," Proc. of IJCAI-6, pp. 645-655, vol. 2, 1979.

Nii, H. Penny, "An Introduction to Knowledge Engineering, Blackboard Model, and AGE," HPP Working Paper, HPP-80-29.

Aiello, N. and Nii, H.P., "The Joy of AGE-ing: A User's Guide to AGE-1," October 31, 1981.

Aiello, N., Boeck, C., Nii, H.P., White, W., "AGE Reference Manual," October 31, 1981.

AGE Example Series 1: "BOWL: A Beginner's Program."

AGE Example Series 2: "AGEPUFF: A Simple Event-Driven Program."

II. INTERACTION WITH THE SUMEX-AIM RESOURCES

AGE Availability

Currently AGE-1 is available on the PDP-10 at the SUMEX-AIM Computing Facility and on the PDP-20/60 at the SCORE Facility of the Computer Science Department. A tape of the compiled system that will run with the Tenex or Tops-20 operating systems is available for a taping fee. The current implementation is described briefly in a later section.

Summary Description of AGE-1

Currently Implemented Tools: AGE-1 provides the user with a set of preprogrammed modules called "components". Using different combinations of these program modules, the user can build a variety of programs that display different problem-solving behavior. AGE-1 also provides user interface modules that help the user in constructing and specifying the details of the components.

A component is a collection of functions and variables that support conceptual entities in program form.
For example, the production rule, as a component, consists of: (1) a rule interpreter that supports the syntactic and semantic description of production-rule representation as defined in AGE, and (2) various strategies for rule selection and execution. The components in AGE-1 have been carefully selected and modularly programmed to be usable in combinations.

For those users not familiar enough with the system to experiment with combining the components, AGE-1 provides two predefined configurations of components -- each configuration is called a "framework". One framework, called the Blackboard framework, is for building programs that are based on the Blackboard model [Lesser 77]. The Blackboard model uses the concepts of a globally accessible data structure called a "blackboard" and independent sources of knowledge which cooperate to form hypotheses. The Blackboard model has been modified to allow flexibility in representation, selection, and utilization of knowledge. The other framework, called the Backchain framework, is for building programs that use backward-chained production rules as their primary mechanism for generating inferences.

The Front-End: To support the user in the selection, specification, and use of the components, AGE-1 is organized around four major subsystems that interact in various ways. Surrounding them is a system executive that allows the user access to these subsystems, as well as to other user facilities, through menu selection. Figure 1 shows the general interrelationship among these subsystems. The Design subsystem helps to familiarize the user with AGE-1 and to guide the user in the construction of his programs through the use of predefined frameworks. The second subsystem is a collection of interface modules that help the user specify the various components of the framework. The other subsystems are designed for testing and refining the user program.
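As a rough illustration of the backward-chained production rules on which the Backchain framework described above is built, the following minimal sketch shows goal-directed rule evaluation. It is written in Python purely for readability; AGE-1 itself runs under Interlisp, and nothing here reproduces its rule syntax, its rule-selection strategies, or its handling of uncertainty. The names `Rule`, `backchain`, `RULES`, and `FACTS`, and the single-letter symbols in the toy rule set, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A production rule: if every premise holds, the conclusion holds."""
    premises: tuple
    conclusion: str


def backchain(goal, rules, facts, pending=frozenset()):
    """Return True if `goal` is a known fact or can be derived from `rules`.

    Works backward from the goal: find a rule that concludes it, then try
    to establish each premise of that rule as a subgoal.
    """
    if goal in facts:
        return True
    if goal in pending:  # already trying to prove this goal: circular chain
        return False
    pending = pending | {goal}
    return any(
        rule.conclusion == goal
        and all(backchain(p, rules, facts, pending) for p in rule.premises)
        for rule in rules
    )


# Hypothetical toy rule set, invented for illustration only.
RULES = [
    Rule(("b", "c"), "a"),  # a follows from b and c
    Rule(("d",), "b"),      # b follows from d
    Rule(("a",), "e"),      # e follows from a
    Rule(("g",), "f"),      # f and g depend only on each other,
    Rule(("f",), "g"),      # so neither can be established
]
FACTS = {"c", "d"}

print(backchain("e", RULES, FACTS))
print(backchain("f", RULES, FACTS))
```

The sketch fixes one strategy (try rules in listed order, stop at the first success); the point of a component-based system such as the one described here is that such choices can be varied without rewriting the interpreter.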
Each of the subsystems is described in more detail below:

DESIGN: The function of the DESIGN subsystem is to guide the user in the design and construction of his program through the use of a predefined configuration of components, or framework. Each framework is defined in DESIGN-SCHEMA, a data structure in the form of an AND/OR tree that, on the one hand, represents all the possible configurations of components within the framework and, on the other hand, represents the decisions the user must make in order to design the details of the user program. Using this schema, the DESIGN subsystem guides the user from one design decision point to another. At each decision point, the user has access to the "help" file and also to advice regarding design decisions at that point. An appropriate ACQUISITION module can be invoked from the DESIGN subsystem so that general design and implementation specifications can be accomplished simultaneously.

ACQUISITION: For each component that the user must specify, there is a corresponding specification editor module that queries the user for task-specific information. The calling sequence of the acquisition modules is guided by DESIGN-SCHEMA when the user is using the DESIGN subsystem. The modules can also be accessed directly from the system executive or from Interlisp.

INTERPRETER: This subsystem contains several modules that help the user run and debug his program. The Check module checks the completeness and correctness of the specification for an entire framework. For any error found, the system can call an appropriate editor to fix the error. The Interpreter executes the user program. The Trace and Break modules are run-time debugging aids. The Editor, Check, Trace, Break, and Explanation (described below) modules are designed to complement each other, to help the user observe the workings of his program, and to make corrections as necessary.
EXPLANATION: AGE-1 has enough information to replay its execution steps, and it has reasonable justifications for the actions taken within the various frameworks. AGE-1 provides a trace-back explanation facility whereby questions related to the execution history can be answered by the system interactively. However, AGE is totally ignorant of the user's task domain and has no means of conducting a dialogue about the specifics of the domain. A detailed history of the execution steps is available to the user to build his own domain-specific explanation, if necessary.

[Figure 1. AGE System Organization (... = data flow; --- = control flow). The diagram has three columns: SYSTEM KNOWLEDGE, SUBSYSTEM, and RESULT. DESIGN SCHEMA ....> DESIGN ....> USER SYSTEM DESIGN; COMPONENTS ....> ACQUISITION EDITOR ....> USER SYSTEM; INTERPRETER ....> EXECUTION HISTORY LIST ....> EXPLANATION.]

III. RESEARCH PLAN

The primary objective of the AGE Project was, and continues to be, to see whether a software laboratory can be built to speed up the process of building Expert Systems. This task was subdivided into two major subtasks:

--tool building: to isolate inference, control, and representation techniques used in other Expert Systems and reprogram them for domain independence; and

--user interface: to build an intelligent front-end to guide the user in the use of the tools.

The strategy for the tool-building task was to take paradigms with a history of successful applications, decompose them into more or less independent parts, and reprogram them. The first paradigm to be thus decomposed and reimplemented was the Blackboard model as used in HASP and CRYSALIS.
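The Blackboard model can be pictured with a very small sketch: a shared data structure that any knowledge source may read and write, and a control loop that lets each independent knowledge source contribute when it has something to add. The Python below is only an illustration under stated assumptions; it is not the HASP, CRYSALIS, or AGE-1 design, and the names `Blackboard`, `ks_sum`, `ks_classify`, and `run`, the entries `x`, `y`, `total`, and `label`, and the threshold of 10 are all hypothetical. The control strategy (poll every knowledge source in order on each cycle) is likewise an assumption made for brevity.

```python
class Blackboard:
    """Globally accessible store of hypotheses, keyed by name."""

    def __init__(self, **initial):
        self.entries = dict(initial)


def ks_sum(bb):
    """Knowledge source: posts 'total' once both inputs are on the blackboard."""
    e = bb.entries
    if "x" in e and "y" in e and "total" not in e:
        e["total"] = e["x"] + e["y"]
        return True
    return False


def ks_classify(bb):
    """Knowledge source: builds on another source's result to post 'label'."""
    e = bb.entries
    if "total" in e and "label" not in e:
        e["label"] = "large" if e["total"] > 10 else "small"
        return True
    return False


def run(bb, knowledge_sources, max_cycles=100):
    """Poll the knowledge sources until none of them can contribute.

    Returns the names of the sources that fired, in order, as a simple
    execution history.
    """
    history = []
    for _ in range(max_cycles):
        fired = [ks.__name__ for ks in knowledge_sources if ks(bb)]
        if not fired:
            break
        history.extend(fired)
    return history


bb = Blackboard(x=4, y=9)
history = run(bb, [ks_classify, ks_sum])
print(history, bb.entries)
```

Neither knowledge source calls the other; they cooperate only through what they find on the blackboard, which is why the order in which they are listed does not change the final result in this toy case.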
Currently, AGE-1 contains components for building programs that use a rule-based blackboard model, backward-chained rules, Units, and any combination of the three. For each decomposed component we tried to extend and/or generalize it; for example, in AGE-1 the user can define separate methods to deal with uncertainty for different kinds of knowledge.

The task of building an intelligent front-end for AGE was further broken down into two stages based on different types of users:

--Stage 1 Task: Build a system usable by an AI scientist or knowledge engineer who knows Lisp and the production rule representation of knowledge; who is familiar with methods of building knowledge-based systems; and who wants to use AGE to avoid coding basic system components and to try different problem-solving techniques with minimum recoding.

--Stage 2 Task: Build a system for a person who has a good working knowledge of AI, but who is not familiar with building Expert Systems and needs guidance on what techniques to use.

AGE-1 is a Stage 1 system. AGE-2 is to be a system directed at novice knowledge engineers. Determining the shape of AGE-2 and implementing it involve a variety of research tasks described below.

FUTURE WORK

There are many difficulties encountered by the users of AGE-1. Ninety per cent of the difficulties can be attributed to the complexity of the system. It contains many design options, the consequences of which the average user does not understand, and does not need to understand for many applications. (It should be noted that most people interested in systems like AGE at this point in time are novice knowledge engineers.) This difficulty is compounded by the need to specify and view many interrelated parts of the user program in a linear presentation.

Solving the first problem involves research in the design aspects of Expert Systems. Although AGE-1 can be used to design many different kinds of Expert Systems, the current Design module is minimal.
It is minimal in two respects: (1) it only keeps track of the parts that need to be specified and what has been specified, with some suggestion on what part to work on next; and (2) it only knows about the design of programs based on the blackboard model and on backward-chained rules. This Design module needs to be replaced by one that can help the user match his problem characteristics with an appropriate combination of AGE component codes and concepts. This leads to the difficult task of doing knowledge engineering on knowledge engineers. To do this, the current components will have to be re-represented in a uniform manner, and rules written to match aspects of the user problem with AGE facilities. The immediate bottleneck is in representing the parts of AGE that involve the description of interrelated processes.

The more immediately doable task is to replace the linear aspect of program specification (including knowledge acquisition) by using the graphic facilities available on the Dolphins. Before any changes can be made to AGE-1, it must be brought up on a Dolphin and various system maintenance facilities implemented. This is currently being done. The system on the Dolphin with the AGE-1 internals and a multi-dimensional user interface will be called AGE-1.5.

AGE-1.5

In the current AGE system, as in most other expert system building tools, knowledge is acquired from the user serially. The user is asked questions and types in the answers. In general the user works on one part of his system at a time, and must exit the current editor or acquisition module to examine another part of his system. The questions in AGE are friendly and mostly self-explanatory, requiring minimum intervention or aid from the knowledge engineer. However, the serial nature of the questions does cause a bottleneck. For experienced users the questions and prompts frequently become annoying and seem to get in the way of productive work.
This is true in spite of the fact that the experienced user sees an abbreviated version of explanations, comments, questions, and prompts.

To ease this problem and speed up the knowledge acquisition process, our plan is to add graphic capabilities to AGE. Using Interlisp-D on a Dolphin, we will implement menu selection, windows and screen editors, and possibly a graphic display of blackboard contents. For information which can best be acquired using serial questions, menu selection will allow the user to select the proper answer with the touch of a button. (This is much faster than typing in enough of the correct response to disambiguate it from other possible answers.) We intend to display windows showing part of each component of the user's system. Again, by a touch of a (mouse) button, the user will be able to scroll a particular window to look at specific information, if it is not already visible. These windows will also be available to trace the execution of the user's program, with the relevant, changing information in each component visible in the window at any particular time. Finally, a screen editor will be implemented to allow the user to edit information in a window or move information between windows.

The window package in AGE will be designed to allow for experimentation with the sizes and locations of various windows. Tests will be conducted to compare AGE-1 and the graphic AGE-1.5 to show that the graphics capabilities significantly improve the knowledge acquisition process. By improve we mean both to shorten the duration of the acquisition process and to improve its palatability. Similar experiments will be used to determine the most efficient layout of the windows in the graphic AGE system.

AGE-2

AGE-2 will try to address the second of the research tasks described above.
Although the current Design subsystem provides specification functions that allow the user to interactively specify the knowledge of the domain and the control structure, it does not (aside from simple advice) provide the user any help in the actual design process. For example, AGE should be able to advise the user on what kinds of inference mechanisms and representations are appropriate for his application problem. We have stated this problem in our previous reports without any promising ideas on how we might attack it. With the variety of feedback we have received from our experimental users, we now understand a few of the problems that inexperienced users face. With these in mind, we have begun, and will continue, to explore ways in which we can redesign and add facilities that will help users who are not familiar with knowledge engineering techniques and methodologies.

One of the major obstacles in the way of AGE-2 development is the way in which AGE-1 is implemented. Although the syntax of AGE-1 is clearly defined (see the Reference Manual), the semantics are not well defined. They are defined in ad hoc fashion in the Editor, the Interpreter, and the Check modules. In order for AGE-2 to be able to conduct a dialogue about itself with the user, its semantics, as well as its syntax, must be uniformly represented. Since very few research results are available in the area of representing the semantics of systems (one exception is in automatic programming research), we need to experiment with a variety of approaches. We have already begun to look into some alternative representations. In changing the representation of the AGE system, no new components will be added, and a minimum of changes will be made to the definition of the existing components.
Concurrent with re-representing the AGE system, we will identify a dozen or so frameworks, in addition to the existing two, that have simpler constructs and are easier for novice users to understand. The simplicity will be achieved by providing fewer options for the user -- options which, because of their nature, are confusing to new users. Limiting the degrees of freedom for the user has the side benefit of allowing AGE to provide more specific descriptions and aids. For example, in a very constrained framework we can provide a library of "standard" predicates for the users, which can have English translations associated with them; with such texts available, the rules and the back-trace explanation can be printed in English-like form. Once the user is comfortable with the simpler frameworks, he can add complexity simply by replacing the predefined options selected for the frameworks.

AGE-2 design will not begin until AGE-1.5 is further along and until we have more data points on how AGE-1 and AGE-1.5 are used.

Computing Resources and Management

We believe the computing and communication resources provided by the SUMEX Facility are among the best in the country. The management is responsive to the needs of the research community and provides superb services. However, the system is getting to a point where no serious research and development is possible, because of the lack of computing cycles due to overcrowding. It is a compliment to the facility that there are so many users. On the other hand, our productivity has gone down in recent months because of the heavy load on the system. It would appear that the situation will not improve on its own, since many of the projects that were small a few years ago are maturing into larger, more complex systems, which is the way it should be. The environment in which the work is done also needs to grow.
In short, without augmentation to the current computing power and storage space (which have never been generous), our ability to make research progress at SUMEX will be drastically curtailed.

II.A.1.2 AI Handbook Project

Handbook of Artificial Intelligence

E. A. Feigenbaum, A. Barr, and P. Cohen
Stanford Computer Science Department

I. SUMMARY OF RESEARCH PROGRAM

A. Technical Goals

The AI Handbook is a compendium of knowledge about the field of Artificial Intelligence. It has been edited by Avron Barr, Paul Cohen, and Edward Feigenbaum, with textual contributions from students and investigators at several research facilities across the nation. The scope of the work is broad: hundreds of articles cover most of the important ideas, techniques, and systems developed during 26 years of research in AI. Each short article is a description written for non-AI specialists and students of AI. Additional articles serve as Overviews, which discuss the various approaches within a subfield, the issues, and the problems.

There is no comparable resource for AI researchers and other scientists and technologists who need access to descriptions of AI techniques and concepts. The research literature in AI is not very accessible, and the elementary textbooks are not nearly broad enough in scope to be useful to a scientist working primarily in another discipline who wants to do something requiring knowledge of AI. Furthermore, we feel that some of the Overview articles are the best critical discussions of activity in the field available anywhere.

To indicate the scope of the Handbook, we have included an outline of the articles as an appendix to this report (see page 269).

B. Medical Relevance and Collaboration

The AI Handbook Project was undertaken as a core activity by SUMEX in the spirit of community building that is the fundamental concern of the facility.
We feel that the organization and propagation of this kind of information to the AIM community, as well as to other fields where AI is being applied, is a valuable service that we are uniquely qualified to support.

C. Progress Summary

The major work of this project is now finished. The Handbook material was completed in April 1982 and has been published in three volumes totaling over 1500 pages. The chapters are also appearing as Stanford Computer Science Department Technical Reports, available through the National Technical Information Service. Work continues on developing a convenient mechanism for on-line access to the Handbook material. When that access software is completed, the Handbook text will be available for browsing by the SUMEX community.

Both the first and second volumes of the Handbook have been selected by the Library of Science Book Club as main selections.

D. List of Relevant Publications

"The Handbook of Artificial Intelligence, Volume I," Avron Barr and Edward A. Feigenbaum, Eds., William Kaufmann, Inc., Los Altos, California, May 1981.

"The Handbook of Artificial Intelligence, Volume II," Avron Barr and Edward A. Feigenbaum, Eds., William Kaufmann, Inc., Los Altos, California, June 1982.

"The Handbook of Artificial Intelligence, Volume III," Paul Cohen and Edward A. Feigenbaum, Eds., William Kaufmann, Inc., Los Altos, California, June 1982.

Many of the chapters of Volumes I and II of the AI Handbook have already appeared in preliminary form as Stanford Computer Science Technical Reports, authored by the respective chapter-editors. References follow. Chapters from Volume III will appear as Technical Reports in the summer and fall of 1982.

HPP-79-12 (STAN-CS-79-726) Anne Gardner. Search.

HPP-79-17 (STAN-CS-79-749) William Clancey, James Bennett, and Paul Cohen. Applications-oriented AI Research: Education.

HPP-79-21 (STAN-CS-79-754) Anne Gardner, James Davidson, and Terry Winograd. Natural Language Understanding.

HPP-79-22 (STAN-CS-79-756) James S. Bennett, Bruce G. Buchanan, and Paul R. Cohen. Applications-oriented AI Research: Science and Mathematics.

HPP-79-23 (STAN-CS-79-757) Victor Ciesielski, James S. Bennett, and Paul R. Cohen. Applications-oriented AI Research: Medicine.

HPP-79-24 (STAN-CS-79-758) Robert Elschlager and Jorge Phillips. Automatic Programming.

HPP-80-3 (STAN-CS-80-793) Avron Barr and James Davidson. Representation of Knowledge.

E. Funding Support Status

The Handbook Project is partially supported under the Heuristic Programming Project contract with the Advanced Research Projects Agency of the DOD, contract number MDA903-80-C-0107, E. A. Feigenbaum, Principal Investigator, and under the core research activities of the SUMEX-AIM resource.

II. INTERACTIONS WITH SUMEX-AIM RESOURCE

A. Collaborations and Medical Use of Programs via SUMEX

We have had a modest level of collaboration with a group of students and staff at the Rutgers resource, as well as occasional collaboration with individuals at other ARPANET sites.

B. Sharing and Interactions with Other SUMEX-AIM Projects

As described above, we have had moderate levels of interaction with other members of the SUMEX-AIM community, in the form of writing and reviewing Handbook material. During the development of this material, arrangements were made for sharing the emerging text. The published material will also be made available to the community as an on-line resource.

C. Critique of Resource Management

Our requests of the SUMEX management and systems staff (for additional file space, directories, systems support, or program changes) have been answered promptly, courteously, and competently on every occasion.

III. RESEARCH PLANS (8/80 - 7/83)

A. Long Range Project Goals

During 1982, all material will be published. The on-line access program will continue under development.

B. Justifications and Requirements for Continued SUMEX Use

The AI Handbook Project is a good example of community collaboration using the SUMEX-AIM communication facilities to prepare, review, and disseminate this reference work on AI techniques. The Handbook articles currently exist as computer files at the SUMEX facility. All of our authors and reviewers have had access to these files via the network facilities and have used the document-editing and formatting programs available at SUMEX. This relatively small investment of resources has resulted in what we feel is a seminal publication in the field of AI, of particular value to researchers who want quick access to AI ideas and techniques for application in other areas.

C. Needs and Plans for Other Computational Resources

We use document preparation programs at SUMEX and on the Computer Science Department's SCORE machine. We have used and will continue to use a Computer Science Department phototypesetting machine, the Alphatype, to produce the final copy of the AI Handbook. The phototypesetting software called TEX, developed at Stanford, is the vehicle for this production. The on-line access program will be written as a SUMEX systems resource.

D. Recommendations for Future Community and Resource Development

None.

II.A.1.3 DENDRAL Project

The DENDRAL Project
Resource-Related Research: Computers in Chemistry

Prof. Carl Djerassi
Department of Chemistry
Stanford University

I. SUMMARY OF RESEARCH PROGRAM

The DENDRAL Project is a resource-related research project. The resource to which it is related is SUMEX-AIM, which provides DENDRAL its sole computational resource for program development and dissemination to the biomedical community.

A. Project Rationale

The DENDRAL project is concerned with the application of state-of-the-art computational techniques to several aspects of structural chemistry.
The overall goals of our research are (1) to develop and apply computational techniques to the procedures of structural analysis of known and unknown organic compounds, based on structural information obtained from physical and chemical methods, and (2) to place these techniques in the hands of a wide community of collaborators to help them solve questions of structure of important biomolecules. These techniques are embodied in interactive computer programs which place structural analysis under the complete control of the scientist working on his or her own structural problem. Thus, we stress the word assisted when we characterize our research effort as computer-assisted structure elucidation or analysis.

Our principal objective is to extend our existing techniques for computer assistance in the representation and manipulation of chemical structures along two complementary, interdigitated lines. We are developing a comprehensive, interactive system to assist scientists in all phases of structural analysis (SASES, or Semi-Automated Structure Elucidation System), from data interpretation through structure generation to data prediction. This system will act as a computer-based laboratory in which complex structural questions can be posed and answered quickly, thereby conserving time and sample. In a complementary effort we are extending our techniques from the current emphasis on topological, or constitutional, representations of structure to detailed treatment of conformational and configurational stereochemical aspects of structure.

By meeting our objectives we will fill in the "missing link" in computer assistance in structural analysis. A capability for structural analysis based on the three-dimensional nature of molecules is an absolute necessity for relating structural characteristics of molecules to their observed biological, chemical, or spectroscopic behavior. These capabilities will represent a quantum leap beyond our current techniques and open new vistas in applications of our programs, both of which will attract new applications among a broad community of structural chemists and biochemists who will have access to our techniques. This access depends entirely on our access to and the continued availability of SUMEX-AIM. These issues are discussed in detail in the subsequent section, Interactions with the SUMEX-AIM Resource.

The primary rationale for our research effort is that structure determination of unknown structures and the relationship of known structures to observed spectroscopic or biological activity are complex and time-consuming tasks. We know from past experience that computer programs can complement the biochemist's knowledge and reasoning power, thereby acting as valuable assistants in solving important biomedical problems. By meeting our objectives we feel strongly that our programs will become essential tools in the repertoire of techniques available to the structural biochemist.

We are currently beginning the third year of our three-year grant. This period represents a transition in the sense that we have pushed our research efforts in techniques for spectral interpretation, structure generation (e.g., CONGEN), and spectral prediction to their limits within the confines of topological representations of molecular structure. At this time, these techniques are perceived to be of significant utility in the scientific community, as evidenced by our workshops, the demand for the exportable version of CONGEN, and the number of persons requesting collaborative or guest access to our programs at Stanford (see Interactions with the SUMEX-AIM Resource). These existing techniques will, for some years to come, remain as important first steps in solving structural problems.
However, in order to anticipate the future needs of the community for programs which are more generally applicable to biological structure problems and more easily accessible, we must address squarely the limitations inherent in existing approaches and search for ways to overcome them.

Our major objectives are based on the following rationale. None of our techniques (or the techniques of any other investigators) for computer-assisted structure elucidation of unknown molecular structures make full use of stereochemical information. As existing programs were being developed this limitation was less important. The first step in many structure determinations is to establish the constitution of the structure, or the topological structure, and that is what CONGEN, for example, was designed to accomplish. However, most spectroscopic behavior and certainly most biological activities of molecules are due to their three-dimensional nature. For example, some programs for prediction of the number of resonances observed in carbon-13 magnetic resonance spectra use the topological symmetry group of a molecule for prediction. However, in reality it is the symmetry group of the stereoisomer that must be used. This group reflects the usually lower symmetry of molecules that possess chiral centers and that generally exist in fewer than the total possible number of conformations. The lower symmetry will increase the number of carbon resonances observed over that predicted by the topological symmetry group alone. More generally, few of the techniques in the area of computer-assisted structure elucidation can be used in accurate prediction of structure/property relationships, whether the properties be spectral resonances or biological activities. A structure is not, in fact, considered to be established until its configuration, at least, has been determined. Its conformational behavior may then be important to determine its spectroscopic or biological behavior.
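The difference between the two symmetry groups can be illustrated with a small sketch. The Python below is not DENDRAL code. It partitions the carbon atoms of a connection table into equivalence classes by iterative neighborhood refinement, a simple stand-in for a true symmetry-group computation that is adequate only for small, simple cases. The molecule (1,3,5-trimethylcyclohexane), the atom names, and the "up"/"down" labels used to mark relative configuration are all assumptions chosen for illustration.

```python
def symmetry_classes(labels, bonds):
    """Partition atoms into classes by iterative neighborhood refinement.

    labels: dict mapping atom name -> initial label (str)
    bonds:  list of (atom, atom) pairs
    This approximates symmetry classes; it is not a full
    automorphism-group computation.
    """
    neighbors = {atom: [] for atom in labels}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    colors = dict(labels)
    while True:
        # An atom's signature is its own color plus its neighbors' colors.
        signature = {
            atom: (colors[atom], tuple(sorted(colors[n] for n in neighbors[atom])))
            for atom in labels
        }
        rank = {sig: i for i, sig in enumerate(sorted(set(signature.values())))}
        refined = {atom: rank[signature[atom]] for atom in labels}
        if len(set(refined.values())) == len(set(colors.values())):
            return refined  # no class was split, so the partition is stable
        colors = refined


def count_classes(labels, bonds):
    """Number of distinct atom classes found by the refinement."""
    return len(set(symmetry_classes(labels, bonds).values()))


# Heavy-atom skeleton of 1,3,5-trimethylcyclohexane (hypothetical atom names).
RING = ["C1", "C2", "C3", "C4", "C5", "C6"]
BONDS = [(RING[i], RING[(i + 1) % 6]) for i in range(6)]
BONDS += [("C1", "M1"), ("C3", "M3"), ("C5", "M5")]
ATOMS = RING + ["M1", "M3", "M5"]

# Constitution only: every atom is simply a carbon.
topological = {atom: "C" for atom in ATOMS}

# One stereoisomer (assumed labels): the methyl on C1 lies on the
# opposite face of the ring from the methyls on C3 and C5.
stereo = dict(topological, C1="C/down", C3="C/up", C5="C/up")

print(count_classes(topological, BONDS))  # classes from constitution alone
print(count_classes(stereo, BONDS))       # classes once configuration is marked
```

In this sketch the constitution alone yields three classes of carbon atoms, while marking the configuration yields six. That mirrors the point made above: a count based on topological symmetry can understate the number of distinct carbon environments in a particular stereoisomer.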
For these reasons we are emphasizing in our current grant period development of stereochemical extensions to CONGEN, to our newly-developed structure generator, GENOA (see References 17, 18), and to related programs such as the C-13 Nuclear Magnetic Resonance (NMR) programs (see References 15, 16, 19-23). These extensions include machine representations and manipulations of configuration (see References 1, 10) and conformation (see References 19, 24, 26), and constrained generators for both aspects of stereochemistry (see References 6, 9, 11, 12).

None of the existing techniques for computer-assisted structure elucidation of unknown molecules, excepting very recent developments in our own laboratory, are capable of structure generation based on inferred partial structures which may overlap to any extent. Such a capability is a critical element in a computer-based system, such as we propose, for automated inference of substructures and subsequent structure generation based on what is frequently highly redundant structural information, including many overlapping part structures. Important elements of our research are concerned with further developments of such a capability for structure generation (the GENOA program; see Reference 17).

Given the above tools for structure representation and generation, we can consider new interpretive and predictive techniques for relating spectroscopic data (or other properties) to molecular structure (see References 2, 3, 7, 8, 14, 15, 16, 19-23). The capability for representation of stereochemistry is required for any comprehensive treatment of: 1) interpretation of spectroscopic data (see References 15, 16, 19-23); 2) prediction of spectroscopic data (see References 15, 16, 19-23); 3) induction of rules relating known molecular structures to observed chemical or biological properties (see References 19, 24, 26).
These elements, taken together, will yield a general system for computer-aided structural analysis (the SASES system) with potential for applications far beyond the specific task of structure elucidation.

Parallel to our program development we have embarked on a concerted effort to extend to the scientific community access to our programs, and critical parts of our research effort are devoted to methods for promoting this resource sharing. Our rationale for this effort is that the techniques must be readily accessible in order to be used, and that development of useful programs can only be accomplished by an extended period of testing and refinement based on results obtained in analysis of a variety of structural problems, analyzed by those scientists actively involved in solutions to those problems. Our efforts in this area are summarized in Section II.A (Scientific Collaboration and Program Dissemination).

B. Medical Relevance and Collaboration

The medical relevance of our research lies in the direct relationship between molecular structure and biological activity. The sciences of chemistry and biochemistry rest on a firm foundation of the past history of well-characterized chemical structures. Indeed, structure elucidation of unknown compounds and the detailed investigation of stereochemical configurations and conformations of known compounds are absolutely essential steps in understanding the physiological role played by structures of demonstrated biological activity. Our research is focused on providing computational assistance in several areas of structural chemistry and biochemistry, with primary attention directed to those aspects of the problem which are most difficult to solve by strictly manual methods.
These aspects include exhaustive and irredundant generation of constitutional isomers, and of configurational and conformational stereoisomers, under chemical, biological, and spectroscopic constraints, with a guarantee that no plausible stereoisomer has been overlooked.

Although our programs can be applied to a variety of structural problems, in fact most applications by our group and by our collaborators are in the area of natural products, antibiotics, pheromones, and other biomolecules which play important biochemical roles. In discussions of collaborative investigations involved with actual applications of our programs we have always stressed the importance of strong links between the structures under investigation and the importance of such structures to health-related research. This emphasis can be seen by examination of the affiliations of current DENDRAL-related investigators and the brief description of current collaborative efforts in Interactions with the SUMEX-AIM Resource.

C. Highlights of Research Progress

In this section we discuss briefly some major highlights of the past year and research currently in progress.

1. Past Year

1.1 Programs for Interpretation and Prediction of Spectral Data.

We are actively pursuing several novel approaches to the automated interpretation of spectral data, concentrating on carbon-13 magnetic resonance (CMR), proton magnetic resonance (PMR), and mass spectral (MS) data. These approaches utilize large data bases of correlations between substructural features of a molecule and spectral signatures of such features. Our approaches are unique in that: 1) we can incorporate stereochemical features of substructures into the data bases; and 2) we can use the same data bases for both interpretation and prediction of data. For either interpretation or prediction of magnetic resonance data, stereochemical substructure descriptors are absolutely essential.
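A minimal sketch of the second point, that one table of substructure/spectrum correlations can serve both interpretation and prediction, is given below in Python. The descriptors and shift ranges are invented placeholders, not entries from the DENDRAL data bases, and the function names are hypothetical.

```python
# Toy correlation table: substructure descriptor -> (low ppm, high ppm).
# Placeholder values for illustration only; not taken from any real data base.
TOY_CORRELATIONS = {
    "CH3 on cyclohexane, equatorial": (21.0, 25.0),
    "CH3 on cyclohexane, axial": (16.0, 20.0),
    "C=O, ketone": (200.0, 220.0),
}


def predict(substructures):
    """Prediction: known substructures -> expected resonance ranges."""
    return {name: TOY_CORRELATIONS[name] for name in substructures}


def interpret(observed_shift):
    """Interpretation: an observed resonance -> candidate substructures."""
    return sorted(
        name
        for name, (low, high) in TOY_CORRELATIONS.items()
        if low <= observed_shift <= high
    )


print(predict(["C=O, ketone"]))
print(interpret(18.0))
```

The two methyl entries in this toy table differ only in a stereochemical feature, so a descriptor without stereochemistry could not tell them apart.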
Resonance positions are a strong function of the local environment of a resonating atom, including position in space relative to other neighboring atoms. Descriptors which include the three-dimensional relationships among atoms in a substructure are required in order to obtain meaningful correlations.

We now have programs for the interpretation (see References 15, 19, 21, 23), prediction (see References 15, 19-21), and assignment (see Reference 22) of carbon-13 nuclear magnetic resonance spectra which make use of an expanded carbon-13 NMR data base. We have completed preliminary work on a program for prediction of proton NMR spectra. All these programs use a structure and substructure representation which incorporates configurational stereochemistry (see Reference 16) and make use of data bases. Such data bases can be used to interpret spectral data to obtain substructures to be used in CONGEN and GENOA, the structure generating programs (see References 15, 17). Continued automation of this aspect of structure elucidation will significantly ease the burden on the structural biochemist because the computer-based files are much more comprehensive and easier to use than correlation tables or diffuse literature sources. The same data bases can be used to predict spectral signatures in the context of a set of complete molecular structures. Comparison of predicted and observed spectra allows a rank-ordering of candidates and will be very useful in directing the attention of the investigator to the most plausible alternatives (see References 7, 8, 15, 20).

1.2 Improvement in the GENOA program for structure generation with overlapping atoms.

A significant improvement has been made to experimental versions of GENOA which removes the requirement that a fixed molecular formula be used. This allows a researcher to investigate problems in which this information is unknown.
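To illustrate what working without a fixed molecular formula involves, the Python sketch below enumerates every composition within per-element bounds and keeps those with a non-negative, whole number of rings plus double bonds. This is an assumed illustration, not the method used in GENOA. The element bounds and function names are hypothetical, and the filter considers only C, H, N, and O.

```python
from itertools import product


def compositions(bounds):
    """Yield every composition within inclusive per-element bounds."""
    elements = sorted(bounds)
    spans = [range(bounds[e][0], bounds[e][1] + 1) for e in elements]
    for counts in product(*spans):
        yield dict(zip(elements, counts))


def is_plausible(formula):
    """Require a non-negative integer count of rings plus double bonds.

    For C, H, N, O the count is (2C + 2 + N - H) / 2; oxygen does not
    affect it.
    """
    doubled = (
        2 * formula.get("C", 0) + 2 + formula.get("N", 0) - formula.get("H", 0)
    )
    return doubled >= 0 and doubled % 2 == 0


# Hypothetical bounds: 9-10 carbons, 14-16 hydrogens, 1-2 oxygens.
bounds = {"C": (9, 10), "H": (14, 16), "O": (1, 2)}
candidates = [f for f in compositions(bounds) if is_plausible(f)]
print(len(candidates))
```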
Instead of a fixed composition, a range of compositions is allowed. This program, named RANGEN, will be useful for less fully specified problems and in particular for the problem of designing molecules (see section 2.1 below).

1.3 Extension of treatment of stereochemistry to include conformations.

A computer program has been designed and written which provides a unique (canonical) designation for a conformation of a chemical structure or substructure, based on earlier theoretical work (see Reference 24). This program takes as input a structure with three-dimensional coordinates or cruder conformation designations and gives as output a structure or substructure with a unique conformation designation. Each rotatable bond can have discrete designations which represent a range of torsional angle positions of arbitrary fineness. This program represents a significant step in filling the final "missing link" in our structure representation.

1.4 Molecular Modelling and Graphics.

In the past year we have purchased a Megatek Whizzard 7210 vector refresh graphics system and the software necessary to interface it with our programs and the SUMEX computer system. We have written a program called BUILD3D (see Reference 25) which takes as input a connection table representation of a chemical structure, augmented with configurational stereochemistry, and gives as output three-dimensional coordinates. These are converted to a picture on the screen of the Megatek. This picture can be rotated, translated, etc. locally on the terminal with no need of the host computer, therefore not adding