9.2.2 SECS - Simulation and Evaluation of Chemical Synthesis

PI: W. Todd Wipke
Board of Studies in Chemistry
University of California
Santa Cruz, CA 95064

Coworkers:
D. Dolata (Grad Student)
R. Lasater (Grad Student)
D. Rogers (Grad Student)
J. Chou (Postdoctoral)
P. Condran (Postdoctoral)
T. Moock (Postdoctoral)
T. Blume (Programmer)

I. Summary of Research Program

A. Technical Goals.

The long range goal of this project is to develop the logical principles of molecular construction and to use these in developing practical computer programs to assist investigators in designing stereospecific syntheses of complex bio-organic molecules. Our specific goals this past year focused on basic research into representation of strategies, facilities for user-defined transforms, revision of our ALCHEM language for better debugging of transforms, and extension of capabilities for representing complex reactions. In addition we hoped to improve capabilities for remote teletype usage of SECS and to initiate the formation of a world-wide SECS Users Group for sharing chemical transforms.

B. Medical Relevance and Collaboration.

The development of new drugs and the study of how drug structure is related to biological activity depend upon the chemist's ability to synthesize new molecules as well as his ability to modify existing structures, e.g., incorporating isotopic labels or other substituents into biomolecular substrates. The Simulation and Evaluation of Chemical Synthesis (SECS) project aims at assisting the synthetic chemist in designing stereospecific syntheses of biologically important molecules.
The advantages of this computer approach over normal manual approaches are many: 1) greater speed in designing a synthesis; 2) freedom from bias of past experience and past solutions; 3) thorough consideration of all possible syntheses using a more extensive library of chemical reactions than any individual person can remember; 4) greater capability of the computer to deal with the many structures which result; and 5) capability of the computer to see molecules in a graph-theoretical sense, free from the bias of a 2-D projection.

E. A. Feigenbaum    Privileged Communication

The objective of using XENO (a spinoff of SECS) in metabolism is to predict the plausible metabolites of a given xenobiotic in order that they may be analyzed for possible carcinogenicity. Metabolism researchers may also find XENO useful in the identification of metabolites, in that it suggests what to look for. Finally, the technique may even find application in problem domains where one wishes to alter molecules so that certain types of metabolism will be blocked.

C. Progress and Accomplishments.

RESEARCH ENVIRONMENT: At the University of California, Santa Cruz, we have a GT40 and a GT46 graphics terminal connected to the SUMEX-AIM resource by 1200 and 2400 baud leased lines (one leased line supported by SUMEX). We also have a T1725, T1745, CDI-1030, DIABLO 1620, and an ADM-3A terminal used over 300 baud leased lines to SUMEX. UCSC has only a small IBM 370/145, a PDP-11/45, a PDP-11/70, and a VAX 11/780 (the 11's are restricted to running small jobs for student time-sharing), all of which are unsuitable for this research. The SECS laboratory is in the process of moving to a newly renovated room with raised floor in the same building and on the same floor as the synthetic organic laboratories at Santa Cruz, so the environment is excellent.

I.C. Highlights of Research Progress

1.
SECS Program Developments

The Simulation and Evaluation of Chemical Synthesis (SECS) program has undergone many additions to improve its capabilities and usefulness to synthetic chemists. The CONGEN layout program of Carhart has been modified and incorporated in SECS for clean teletype output and simplified teletype input for users without graphics terminals. The synthesis tree plotting program for hard copy has been rewritten to give more compact trees which are faster to plot on the plotter. This generates better plots in less time and can also be used with XENO.

The ALCHEM language which we developed for representing chemical reactions has undergone extensive revision to make it easier to represent absolute stereochemistry and some of the complex reactions in heterocyclic chemistry. Part of this revision now enables SECS to explain to the chemist which ALCHEM statements are being used and the results of their interpretation via a new decompiler for ALCHEM. A complete manual on ALCHEM and a manuscript on the revisions have been written.

A User Defined Transform (UDT) module has been added to bridge the gap between program knowledge and user knowledge. This allows the chemist, during a synthetic analysis, to graphically specify a reaction which SECS doesn't know, and continue without interrupting the analysis. The SECS database is also still expanding as a result of contributions from our group and from the SECS Users Group.

A META-SECS top-level plan generator has been outlined to reason using synthetic principles and conclude plans which will then be used to guide the existing SECS program in synthetic analysis. The First Order Predicate Calculus is being used to represent the synthetic strategies, and an inference processor is currently in the design stages.
The explicit representation of synthetic strategies will be an interesting exploration which we feel other synthetic chemists will benefit from, even through manual use of these strategies. Hand simulation of this program is in progress.

2. XENO - A Program to Predict Plausible Metabolites

The XENO program was developed to assist metabolism researchers in predicting plausible metabolites of compounds foreign to an organism, and in evaluating the potential biological activity of the resulting metabolites. The knowledge base of XENO has been revised completely and now includes 110 types of metabolic processes. We have specialized in rat and mouse systems to date. The XENO program takes graphical input of a compound to be metabolized and stepwise generates a tree of metabolite structures which might result. The program is operational, but both the program and the data base need improvement for field use. The teletype input and output has been improved by incorporating a modified version of Carhart's teletype plot module from CONGEN, so the program can be accessed remotely via teletype or graphics terminal.

The second phase of XENO, which evaluates potential biological activity, is currently being developed. Currently XENO can check each metabolite generated by exact match against a library of compounds and, if a match is found, pull out the biological activities. Our plans, however, are to allow extrapolations beyond known compounds, and for that we are pursuing several approaches using chemical pattern recognition and chemical similarity.

Collaborations with experimental metabolism researchers have begun in order that XENO can make predictions for compounds actively being studied in the laboratory. We hope to get feedback regarding the usefulness of this methodology and to accumulate a list of verified predictions for publication. These collaborators include scientists from NIH, FDA, EPA, ICI Pharmaceutical, Upjohn Co., and UCSF Medical School.
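The two XENO steps described above (stepwise generation of a metabolite tree, followed by an exact-match check of each metabolite against a library of known compounds) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: structures are stood in for by plain strings, the two transform functions and the library contents are hypothetical, and nothing here reflects the actual XENO knowledge base or its ALCHEM-style transforms.

```python
from collections import deque

# Hypothetical transforms: each maps a structure (a plain string here) to
# zero or more product structures. Real transforms would act on chemical
# graphs; these stand-ins only illustrate the control flow.
def hydroxylate(s):
    return [s + "-OH"] if not s.endswith("-OH") else []

def demethylate(s):
    return [s[:-4]] if s.endswith("-CH3") else []

TRANSFORMS = {"hydroxylation": hydroxylate, "demethylation": demethylate}

# Hypothetical library of known compounds and their recorded activities.
LIBRARY = {"R-OH": ["activity-A"]}

def metabolite_tree(parent, max_depth=2):
    """Generate the tree level by level.

    Returns a list of edges (depth, precursor, transform name, product).
    """
    edges = []
    seen = {parent}
    queue = deque([(parent, 0)])
    while queue:
        structure, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for name, transform in TRANSFORMS.items():
            for product in transform(structure):
                edges.append((depth + 1, structure, name, product))
                if product not in seen:  # expand each structure only once
                    seen.add(product)
                    queue.append((product, depth + 1))
    return edges

def known_activities(edges):
    """Exact-match lookup of every generated metabolite in the library."""
    return {p: LIBRARY[p] for _, _, _, p in edges if p in LIBRARY}

tree = metabolite_tree("R-CH3")
matches = known_activities(tree)
```

Exact matching only recovers activities for compounds already in the library, which is why the text goes on to consider pattern recognition and similarity for extrapolation beyond known compounds.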
This work is sponsored by the National Cancer Institute.

D. List of Current Project Publications

M.L. Spann, K.C. Chu, W.T. Wipke, and G. Ouchi, "Use of Computerized Methods to Predict Metabolic Pathways and Metabolites," J. of Env. Pathology and Toxicology, 2, 123 (1978); also reprinted in "Hazards from Toxic Chemicals," ed. M.A. Mehlman, R.E. Shapiro, M.F. Cranmer and M.J. Norvell, Pathotox Publishers, Inc., Park Forest South, Ill., 1978, pp. 123-121.

J.D. Andose, E.J.J. Grabowski, P. Gund, J.B. Rhodes, G.M. Smith, and W.T. Wipke, "Computer-Assisted Synthetic Analysis: The Merck Experience," in Computer-Assisted Drug Design, ed. Olson and Christoffersen, ACS Symposium Series 112, pp. 527-552, 1979.

S.A. Godleski, P.v.R. Schleyer, E. Osawa, and W.T. Wipke, "The Systematic Prediction of the Most Stable Neutral Hydrocarbon Isomer," Progress in Physical Organic Chemistry, in press.

Manuscripts describing our work on symmetry, similarity, and ALCHEM are currently in the review process.

E. Funding Status

1. Resource-Related Research: Biomolecular Synthesis
   PI: W. Todd Wipke, Associate Professor, UCSC
   Agency: NIH, Research Resources
   No: RR01059-03S1
   7/1/80-2/28/81  $36,949 TDC

2. Computer-Aided Prediction of Metabolites for Carcinogenicity Studies
   PI: W. Todd Wipke
   Agency: NIH, National Cancer Institute
   No: NO1-CP-75816
   1/1/80-12/31/80  $74,394 TDC

II. Interactions with SUMEX-AIM Resource

A. Medical Collaborations and Program Dissemination via SUMEX.

SECS is available in the GUEST area of SUMEX for casual users, and in the SECS DEMO area for serious collaborators who plan to use a significant amount of time and need to save the synthesis tree generated. Much of the access by others has been through the terminal equipment at Santa Cruz because graphic terminals make it so much more convenient for structure input and output.
A complete synthesis tree for isocomene was generated for Prof. William Dauben, UC Berkeley, and was analyzed in detail by his students. They were impressed by the number of synthetic approaches and by the fact that all known syntheses were found by the computer. Similarly, an analysis of several insect pheromones was done and sent to Prof. A.C. Oehlschlager, Dept. of Chemistry, Simon Fraser University, British Columbia, Canada. Other visitors for whom we have done analyses include Dr. M. Onozuka, A. Tomonaga and H. Itoh, Kureha Chemical Co., Tokyo, Japan, and Dr. Rhyner, Director of Research, Ciba-Geigy, Basel. A synthesis of vellerolactone, a substance found to be toxic and teratogenic, was generated for Prof. R.E. Carter, Univ. Lund, Sweden. A conformational study of substituted hydroazulenes was performed for Clayton Heathcock, Berkeley (Synthesis of Isoprenoid Antitumor Lactones, NIH CA 12617). The XENO project is working on metabolism of diallylmelamine N-oxide, a hypotensive compound, in collaboration with Dr. John M. McCall of Cardiovascular Diseases Research, The Upjohn Co.

Dr. Wipke has also used several SUMEX programs such as CONGEN in his course on Computers and Information Processing in Chemistry. Testing and collaboration on the XENO project with researchers at the NCI depend on having access through SUMEX and TYMNET.

B. Examples of Sharing, Contacts and Cross-fertilization with other SUMEX-AIM projects:

This year the SECS and XENO projects have made use of the teletype plot program which Ray Carhart of the CONGEN project wrote at Stanford. We modified the program to fit the needs of our projects. This was facilitated by being able to transfer the programs between areas on the same computer system at SUMEX.
We continue to have intellectual interactions with the DENDRAL and MOLGEN projects in areas where we have common interests, and have had people from those projects speak at our group seminars. SUMEX also is used for discussions with others in the area of artificial intelligence on the ARPANET. We developed a local print capability through SUMEX with the help of the SUMEX staff, which has facilitated our work greatly.

C. Critique of Resource Services.

We find the SUMEX-AIM network very well human engineered and the staff very friendly and helpful. The SECS project is probably one of the few on the AIM network which must depend exclusively on remote computers, and we have been able to work rather effectively via SUMEX. Basically we have found that SUMEX-AIM provides a productive and scientifically stimulating environment, and we are thankful that we are able to access the resource and participate in its activities. SUMEX-AIM gives us at UCSC, a small university, the advantages of a larger group of colleagues and interaction with people all over the country. We especially thank SUMEX for support of the leased line for our GT40, and for helping develop our remote print capability.

SUMEX, however, has fallen short of our goals and desires: the load average on SUMEX has increased and reduced my group's efficiency greatly; the system is too overloaded. We also have not been able to utilize the 4800 baud high speed line we purchased because SUMEX limitations forced running at 2400 baud. We had hoped to be able to write tapes locally with the 4800 baud line, but at 2400 baud it is too slow to be practical. We would like to see some of their local lines slowed down so those remote people doing graphics can run at a higher speed. We have found that when a FORTRAN program is overlaid, the symbol table is lost, making symbolic debugging with DDT impossible; we wish that could be corrected.
Lastly, our disk space (8000 pages) is too small for our current research projects and staff.

D. Collaborations and Medical Use of Programs via Computers other than SUMEX.

Arrangements are currently being made to place SECS 2.7 on several computer networks so anyone can access it without having to convert code for their machine. This has proved very useful in the past as a method of getting people to try this new technology. SECS 2.0 has resided on the First Data network since 1974 and has been used extensively in the US and abroad.

III. Research Plans (8/80-7/86)

A. Long Range Project Goals and Plans.

The SECS project now consists of two major efforts, computer synthesis and metabolism, the latter being a very young project. Our plans for SECS for the next year include adding a high level reasoning module for proposing strategies and goals, and providing control which continues over several steps. This reasoning module also will be able to trace the derivation of goals and thus explain some of its reasoning. We also plan to focus on bringing the transform library up in sophistication to improve the performance and capabilities of SECS. In particular we plan to allow a transform to have access to the precursors generated as well as the product; this will allow much greater control and more natural transform writing, but it requires extensive changes in the SECS control structure. Currently the similarity module requires a special version of SECS. We plan in the next year to incorporate this module into the standard version of SECS, so that bonds which, if broken, could lead to identical or similar fragments can be used to create a goal to guide SECS toward such efficient syntheses, even though there may not be a reaction capable of doing that rejoining step.
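To make concrete the idea of finding bonds which, if broken, lead to identical or similar fragments, the following is a minimal sketch under loud assumptions. The molecule is a hypothetical heavy-atom graph, and "similar" is approximated crudely by equal element counts in the two fragments; the report does not describe the measure the SECS similarity module actually uses, so this stand-in should not be read as its method.

```python
from collections import Counter

def fragments_after_cut(atoms, bonds, cut):
    """Connected components (sets of atom indices) after removing bond `cut`."""
    adjacency = {i: set() for i in atoms}
    for a, b in bonds:
        if (a, b) != cut:
            adjacency[a].add(b)
            adjacency[b].add(a)
    visited, parts = set(), []
    for start in atoms:
        if start in visited:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        visited |= component
        parts.append(component)
    return parts

def candidate_cuts(atoms, bonds):
    """Bonds whose removal yields exactly two fragments with equal element counts.

    Equal element counts is an assumed, simplified proxy for similarity.
    Ring bonds never qualify here because cutting one leaves a single fragment.
    """
    hits = []
    for bond in bonds:
        parts = fragments_after_cut(atoms, bonds, bond)
        if len(parts) == 2:
            first, second = (Counter(atoms[i] for i in part) for part in parts)
            if first == second:
                hits.append(bond)
    return hits

# Hypothetical heavy-atom skeleton O-C-C-O (hydrogens omitted).
atoms = {0: "O", 1: "C", 2: "C", 3: "O"}
bonds = [(0, 1), (1, 2), (2, 3)]
cuts = candidate_cuts(atoms, bonds)
```

In this toy case only the central C-C bond is reported, since cutting it leaves two fragments of one carbon and one oxygen each. A goal built from such a bond could then steer the analysis toward joining two equivalent pieces, as the text describes.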
We will incorporate the Aldrich catalog of available chemicals, both to recognize when a precursor is available and to explore strategies based on available starting materials. The process must be efficient, for the library contains 20,000 compounds.

We now have a PDP-10, a Univac, and an IBM version of SECS. We hope to compare these and create one version which will run on these and other machines to facilitate sharing of new modules among collaborators.

The XENO metabolism project will be expanding the data base to cover more metabolic transforms, including species differences, sequences of transforms, and stereochemical specificities of enzymatic systems. Development of the second phase, which assesses the biological activity of the metabolites, will continue, as will efforts to simulate excretion and incorporation, the endpoints of metabolism. Finally, application of the current program to the molecules actively being investigated by metabolism researchers will occur concurrently to test and verify the work done to date on XENO and provide examples for publication.

In the next five years we foresee the SECS and XENO projects reaching a stage of maturity where they will find much application in other research groups. Our research will continue in these areas, but turn to some new programs that approach the problems from different viewpoints and allow us an opportunity to begin fresh, taking advantage of what we have learned from the building of SECS and XENO.

B. Justification and Requirements for Continued use of SUMEX.

The SECS and XENO projects require a large interactive time-sharing capability with high level languages and support programs. I am on the campus computing advisory committee and am the campus representative to the UC
Systemwide computing advisory committee and know that the UCSC campus is not likely in the future to be able to provide this kind of resource. Further, there does not appear to be in the offing anywhere in the UC system a computer which would be able to offer the capabilities we need. Thus from a practical standpoint, the SECS and XENO projects still need access to SUMEX for survival.

Scientifically, interaction with the SUMEX community is still extremely important to my research, and will continue to be so because of the direction and orientation of our projects. Collaborations on the metabolism project and the synthesis project need the networking capability of SUMEX-AIM, for we are and will continue to be interacting with synthetic chemists at distant sites and metabolism experts at the National Cancer Institute. Our requirements are for good support of FORTRAN.

Our needs for SUMEX include an expansion of our disk allocation from 8000 pages to 10000 pages for the growth of our programs, databases, and personnel. We are currently tightly constrained spacewise and are hampered in research because of inability to keep needed files. We also would like to have the overlay loader fixed so that an overlaid program can retain its symbol table and permit symbolic use of DDT. This is a serious problem we hope can be fixed by SUMEX staff, because without symbols debugging is very difficult and time-consuming, and we must run SECS and XENO overlaid.

C. Needs beyond SUMEX-AIM.

We do plan to acquire a virtual memory minicomputer like a VAX or PRIME in the future to offload some of our processing from SUMEX. Such a machine would enable us to do some production and development work locally and would let us explore the feasibility of those types of machines as hosts for SECS and XENO. A local machine would also free us from the problems we have experienced in the winter when the telephone lines to Stanford get wet and are too noisy to use.
Even if we had such a machine we would still need to use SUMEX, because we plan to continue to develop and maintain the PDP-10 version of SECS and we need SUMEX for its networking capabilities. In the future, if we had a mini at UCSC, we would lighten our load on SUMEX, but currently we see our load increasing as our group grows and as we start new projects yet must maintain existing large programs. We especially need the local capability to read and write magnetic tape because we receive and send many tapes between our collaborators. Driving to SUMEX to write a tape is not efficient for our personnel and hinders communication with collaborators via tape. The problem will worsen because the SECS Users Group will be sending UCSC tapes of chemical transforms on a regular basis.

D. Recommendations for Community and Resource Development.

The AIM Workshops have been excellent in the past and should be continued. We feel the SUMEX resource is heavily utilized, too heavily utilized at times to get any productive work done. SUMEX staff could lighten the load on the machine by reducing the speed of text terminals at Stanford from 2400 baud and above down to 1200 baud, which is plenty fast for humans to read, and giving remote users faster capabilities, say 4800 baud. We feel the community would benefit if remote users such as we had a virtual memory minicomputer, so the load could be distributed more and not have everything go through Stanford, which is highly congested and quite expensive for multiple leased lines. We further feel that it would be worthwhile if discussions regarding the future expansion of SUMEX and the community could include the remote users who depend on SUMEX. SUMEX cannot currently handle additional people from the outside community using SECS or XENO for testing.
The response time guests and outside collaborators see is not a good reflection on the actual efficiency of the programs. A trivial but important suggestion is that TV-EDIT be improved so that it does not leave null characters in files; these cause problems with compilers both at SUMEX and at other sites when the files are sent to another machine. This suggestion has been made many times by many people, but the situation still exists.

9.2.3 Hierarchical Models of Human Cognition

Hierarchical Models of Human Cognition (CLIPR Project)

Walter Kintsch and Peter G. Polson
University of Colorado
Boulder, Colorado

I. Summary of Research Program

The two CLIPR projects have made substantial progress in their research in this past year. This progress is almost completely due to our access to the SUMEX facility. The prose comprehension group has completed one major project, and is currently interacting with other SUMEX projects with the goal of building a prose comprehension model that reflects state-of-the-art knowledge from psychology and artificial intelligence. The main activity of the planning group during the last year has been the detailed analysis of thinking-out-loud protocols collected from both expert and novice software designers. SUMEX facilities have been used to store, edit, and reformat the raw protocols to facilitate later analysis. Results of successive analyses are then input to SUMEX, and SUMEX facilities are used to collate the various results.

Technical Goals

The CLIPR project consists of two subprojects. The first, the text comprehension project, is headed by Walter Kintsch and is a continuation of work on understanding of connected discourse that has been underway in Kintsch's laboratory for over seven years.
The second, the planning project, is headed by Peter Polson of the University of Colorado and Michael Atwood of Science Applications Incorporated, Denver, and is studying the processes of planning using software design tasks.

The goal of the prose comprehension project is to develop a computer system capable of the meaningful processing of prose. This work has been generally guided by the prose comprehension model discussed by Kintsch and van Dijk (1978), although our programming efforts have identified necessary clarifications and modifications in that model (Miller & Kintsch, 1980a). Our more recent research (Miller & Kintsch, 1980b) has emphasized the importance of knowledge and knowledge-based processes in comprehension, and we are accordingly working with the AGE and UNITS groups at SUMEX toward the development of a knowledge-based, blackboard model of prose comprehension. We hope to be able to merge the substantial artificial intelligence research on these systems with psychological interpretations of prose comprehension, resulting in a computational model that is also psychologically respectable.

The primary goal of the planning project is the development of a model of human performance on software design tasks. We intend to begin by modeling protocols of experts on solving a particular problem, eventually extending the model to other levels of experience and problems. We propose a two-pronged attack on the process of developing a model. The first is to develop a deeper understanding of our protocol data, to increase our knowledge of the details of the planning processes and the knowledge structures that experts use in the process of planning. We have developed a method of protocol analysis that essentially involves transforming the protocol into a low level theoretical description of the processes used to solve the design problem.
We have assumed a very simplified version of a blackboard model that is described in Atwood and Jeffries (1980). We currently carry out our analysis by hand, developing a form of this low level model for each protocol. However, many of the activities involved in developing this model are clerical in nature and involve the categorization of segments of a verbal protocol and then the reorganization of the categorized information. Much of this work can be automated, and we propose to develop a program that will facilitate our protocol analysis and the development of the low level models that we use to describe the behavior of individual subjects.

Our second and much longer term objective is the development of a substantive model in AGE that can simulate the design processes. We feel that the software tools that are being developed at SUMEX -- in particular AGE and the UNITS package -- will dramatically facilitate our ability to develop this substantive model. Furthermore, current theoretical ideas about both the process of design and the representation of knowledge involved in developing a design have been strongly influenced by the MOLGEN project at SUMEX (Stefik, 1980).

Medical Relevance and Collaboration

The text comprehension project impacts indirectly on medicine, as the medical profession is no stranger to the problems of the information glut. Research on how people understand prose can only help medicine, by adding to the research on how computer systems might understand and summarize texts and by determining ways in which the readability of texts can be improved. Development of a more thorough understanding of the various processes responsible for different types of learning problems in children, and the corresponding development of a successful remediation strategy, would also be facilitated by an explicit theory of the normal comprehension process.
Note that our goal of a blackboard model is particularly relevant to the understanding of learning difficulties. One important aspect of a blackboard model is the separation of cognitive processes into a set of interacting subprocesses. Once such subprocesses have been identified and constructed, it would be instructive to observe the model's performance when certain of these processes are facilitated or inhibited. Many researchers have shown that there are a variety of cognitive deficits (insufficient short-term memory capacity, poor long-term memory retrieval, and such) that can lead to reading problems. Having a blackboard model in which the power of individual components could be manipulated would be a significant step in determining the nature of such reading problems.

The planning project is attempting to gain understanding of the cognitive mechanisms involved in design and planning tasks. The knowledge gained in such research should be directly relevant to a better understanding of the processes involved in medical policy making and in the design of complex experiments. We are currently using the task of software design to describe the processes underlying more general planning mechanisms that are also used in a large number of task oriented environments like policy making.

Both the text comprehension project and the planning project involve the development of explicit models of complex cognitive processes; cognitive modelling is a stated goal of both SUMEX and research supported by NIMH.

The on-going development of the prose comprehension model would not be possible without our collaboration with the AGE and UNITS research groups. We look forward to a continued collaboration with, we hope, mutually beneficial results.
Several other psychologists have either used or shown an interest in using an early version of the prose comprehension model; these people include Alan Lesgold of SUMEX's SCP project. Needless to say, all of this interaction has been greatly facilitated by the local and network-wide communication systems supported by SUMEX. There has been considerable communication between members of the prose comprehension and AGE/UNITS groups as program bugs have been discovered and corrected; the presence of a mail system has made this process far easier than if telephone or surface mail messages were required.

Progress Summary

The prose comprehension project has completed an early version of a comprehension model that has now been used by several different researchers (Miller & Kintsch, 1980a). This model has been applied to twenty different texts, and has yielded quite reasonable predictions of recall and readability. We are currently expanding on the premises of this model toward a system that can make use of world knowledge in its analyses.

The planning group has completed the detailed analysis of several long thinking-out-loud protocols collected from both expert and novice software designers. These analyses involved the development of a lower level model for each of the protocols. See Atwood and Jeffries (1980) for details and examples. We are about to start development of a program to partially automate this modelling process.

List of Relevant Publications

Atwood, M. E., & Jeffries, R. Studies in plan construction I: Analysis of an extended protocol. Technical Report SAI-80-028-DEN, Science Applications, Incorporated, Denver, Co. March, 1980.

Polson, P. G., Jeffries, R., Turner, A., & Atwood, M. E. The process of designing software. To appear in J. R. Anderson (Ed.), Learning and Cognition. Hillsdale, N.J.: Erlbaum.

Atwood, M. E., Polson, P.
G., Jeffries, R., and Ramsey, H. R. Planning as a process of synthesis. Technical Report SAI-78-144-DEN, Science Applications, Incorporated, Denver, Co. December, 1978.

Kintsch, W. On modelling comprehension. Invited address at the American Educational Research Association convention. San Francisco, April 10, 1979.

Kintsch, W. and van Dijk, T. A. Toward a model of text comprehension and production. Psychological Review, 1978, 85, 363-394.

Miller, J. R., & Kintsch, W. Readability and recall of short prose passages: A theoretical analysis. Journal of Experimental Psychology: Human Learning and Memory, 1980, in press.

Miller, J. R., & Kintsch, W. Readability and recall of short prose passages. Paper presented at the American Educational Research Association meetings, April, 1980.

Funding Support Status

1. Readability and Comprehension.
   Walter Kintsch, Professor, University of Colorado
   National Institute of Education
   NIE-G-78-0172
   9/1/78 - 8/31/81: $96,627
   9/1/79 - 8/31/80: $46,537

2. Text Comprehension and Memory
   Walter Kintsch, Professor, University of Colorado
   National Institute of Mental Health
   5 R01 MH15872-9-13
   6/1/76 - 5/31/81: $159,060
   6/1/79 - 5/31/80: $32,880

3. Comprehension and Analysis of Information in Text
   Walter Kintsch, Professor, University of Colorado, and Lyle E. Bourne, Jr., Professor, University of Colorado
   Office of Naval Research, Personnel and Training Programs
   ONR N00014-78-C-0433
   6/1/78 - 5/31/80: $68,315
   6/1/80 - 5/31/81: $60,000

4. Procedural Net Theories of Human Planning and Problem Solving
   Michael Atwood, Research Psychologist, Science Applications, Incorporated; Denver, Colorado
   Office of Naval Research, Personnel and Training Programs
   ONR N0014-78-C-0165
   1/25/78 - 12/31/80: $230,000
   1/1/80 - 12/31/80: $85,000

II.
Interactions with the SUMEX-AIM Resource

Sharing and Interactions with other SUMEX-AIM Projects

Our primary interaction with the SUMEX community has been the work of the prose comprehension group with the AGE and UNITS projects at SUMEX. Feigenbaum and Nii have visited Colorado, and one of us (Miller) recently attended the AGE workshop at SUMEX. Both of these meetings have been very valuable in increasing our understanding of how our problems might best be solved by the various systems available at SUMEX. We also hope that our experiments with the AGE and UNITS packages have been helpful to the development of those projects.

We should also mention theoretical and experimental insights that we have received from Alan Lesgold and other members of the SUMEX SCP project. It is likely that the initial comprehension model (Miller & Kintsch, 1980a) will be used by Dr. Lesgold and other researchers at the University of Pittsburgh, as well as researchers at Carnegie-Mellon University and the University of Manitoba.

Critique of Resource Management

The SUMEX-AIM resource is clearly suitable for the current and future needs of our project. We have found the staff of SUMEX to be cooperative and effective in dealing with special requirements and responding to our questions. The facilities for communication on the ARPANET have also facilitated collaborative work with investigators throughout the country.

III. Research Plans (8/79 - 7/81)

Long Range Projects Goals and Plans

The primary long-term goal of the prose comprehension group is the development of a blackboard-based model of prose comprehension. Correspondingly, we anticipate continued use of the AGE and UNITS packages.
These packages allow us to model the knowledge structures possessed by people and the inferential processes that operate upon those structures, and are essential to our work.

The primary goal of the planning project is the development of a model, or a series of models, of human performance on the software design task. We intend to begin by modeling the protocols of experts on a particular task, eventually extending the model to other levels of experience and other tasks. To do this we will have to become more familiar with AGE and work on articulating our theory in a way that is compatible with the AGE framework. This will involve two parallel lines of effort. One is a deeper analysis of our protocol data, to increase our knowledge of the detailed planning processes and knowledge structures experts are using to solve these problems. The second is the development of a model in AGE that can simulate these processes. We have to date been using SUMEX only for the latter activity, but we are beginning to discover that both objectives are so intertwined that it is counter-productive for us to be using separate computer systems. We have transferred much of our protocol analysis activity to SUMEX, making it easier for us to share this very rich data source with other investigators.

Justification and Requirements for Continued SUMEX Use

The research of the prose comprehension project is clearly tied to continued access to the AGE and UNITS packages, which are simply not available elsewhere. We hope that our continued use of these systems will be offset by the input we have been providing and will continue to provide to those projects: our relationship has been symbiotic, and we look forward to its continuation.

Needs and Plans for Other Computational Resources

We currently use three other computing systems, two of which are local to the University of Colorado.
One is the Department of Psychology's CLIPR system, which is a Xerox Sigma 3 used primarily for the real-time running of experiments to be modeled on SUMEX. The second is the University of Colorado's CDC 6400, which is used for various types of statistical analysis. Thirdly, the planning group has been using a PRIME computer located at Science Applications, Incorporated for the storage and analysis of protocols.

CLIPR is about to replace the Sigma 3 with a VAX 11/780. When the ARPA-sponsored VAX/Interlisp project is completed, we would be most interested in experimenting with becoming a remote AGE/UNITS site. It would seem that this sort of development is the ultimate goal of the package projects, and this type of interaction, once it becomes feasible, would be a logical extension of our association with the SUMEX facility.

Recommendations for Future Community and Resource Development

Our primary recommendation for future development within SUMEX involves (a) the continued support of INTERLISP, which is needed for AGE and for other work we have underway on SUMEX, and (b) the continued development of the AGE and UNITS projects. In particular, we would like to see an extension of AGE to include a wider variety of control structures so that our psychological models would not be confined to one particular view of knowledge-based processing.

Given our imminent acquisition of a VAX, we would particularly support the ongoing and continued development of INTERLISP for the VAX, so that local use of AGE and UNITS would be possible. Since we, as well as other psychologists, need the real-time capability of VAX/VMS to run on-line experiments, we hope that the INTERLISP system to be developed will be compatible with VMS. Note that this need for real-time work coincides with real-world applications of SUMEX programs, in which a VAX might be devoted to both real-time patient monitoring and diagnostic systems such as PUFF or MYCIN.

9.2.4 HMF - Higher Mental Functions

Higher Mental Functions Project

Kenneth Mark Colby, M.D.
Professor of Psychiatry and Computer Science
Neuropsychiatric Institute
University of California at Los Angeles

I. Summary of Research Program

A. Project rationale

The rationale of this project is to contribute new knowledge and instruments to the fields of psychiatry, neurology, and communication disorders using the concepts and methods of artificial intelligence. The project is involved in studies of paranoid conditions, psychiatric taxonomy, intelligent speech prostheses, ideographics for language generation, and computer enhancement of patient outcomes in large mental hospitals.

B. Medical relevance and collaboration.

As can be seen from the above description, the project has clear medical relevance. The project collaborates with psychiatrists, neurologists, speech pathologists and biomedical engineers. Besides working at the UCLA Neuropsychiatric Institute, the project collaborates with the Northridge Hospital Foundation, Northridge, California.

C. Highlights of research progress.

In collaboration with three psychiatrists and four psychologists we are working out a new taxonomy for the "neuroses", a category which is notoriously unreliable in the psychiatric classification scheme. In this pilot study we are collecting data on 50 patients and 70 controls. One segment of data is provided by the subjects' self-accounts which are analyzed by a large program run on the SUMEX facility. This program finds the key ideas in the subject's account and assigns him a profile. The profiles will be clustered into groups and the groups compared to those formed on the basis of the other data-collections in the study.
During the past year, the project has developed intelligent speech prostheses (ISPs) which (a) utilize a lexical-semantic word-finding algorithm for anomic aphasias and (b) utilize ocular control for the generation of synthesized speech. These devices serve as aids to nonvocal patients handicapped by strokes, tumors, cerebral palsy, and tracheostomies.

The word-finding algorithm is dynamically re-organized by the user's selection of words. It is currently being tested on a 54-year-old man with an almost complete anomia due to a stroke in the left hemisphere. The algorithm needs a larger memory to accommodate at least 5,000 English words. The large dictionary on the SUMEX facility is of great help in constructing the lexical-semantic memory.

We have just begun to test the use of ocular control of an ISP. The patient wears specially designed spectacles which can detect where the eye is directed on a small TV screen. Thus the patient spells out words by looking at letters on the screen. Signals from the spectacles are sent to the ISP which generates the utterance of the words thus spelled.

Although we have ceased to work on the paranoid PARRY program, due to lack of funding, it is available for demonstration and study by those interested in modelling psychiatric syndromes.

We are in the planning stages of developing a computer ideographic writing system for language generation by nonspeaking patients who cannot spell. If they can learn ideographic symbols which stand for certain concepts and construct the symbols on a graphics terminal by pressing keys, a translating program will convert the symbols into English words which in turn will be spoken by an ISP.
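The report says only that the word-finding algorithm is "dynamically re-organized by the user's selection of words"; it does not say how. As an illustration of one way such re-organization could work, the sketch below uses a simple move-to-front rule. The class `LexicalMemory`, its methods, the cue-to-word layout, and the sample words are all hypothetical and are not taken from the project's program.

```python
from collections import defaultdict


class LexicalMemory:
    """Toy lexical-semantic memory (hypothetical layout, not the project's).

    Each cue, such as a semantic category, keeps an ordered list of
    candidate words, with the most recently selected word first.
    """

    def __init__(self):
        self._candidates = defaultdict(list)

    def add(self, cue, word):
        """Store a word under a cue, ignoring duplicates."""
        if word not in self._candidates[cue]:
            self._candidates[cue].append(word)

    def suggest(self, cue, limit=3):
        """Return up to `limit` candidate words for the cue, best first."""
        return self._candidates.get(cue, [])[:limit]

    def select(self, cue, word):
        """Record the user's choice by moving the word to the front."""
        words = self._candidates[cue]
        if word in words:
            words.remove(word)
        words.insert(0, word)


if __name__ == "__main__":
    memory = LexicalMemory()
    for w in ("apple", "banana", "cherry"):
        memory.add("fruit", w)
    memory.select("fruit", "cherry")
    print(memory.suggest("fruit", limit=2))
```

Under this assumed rule, words the patient chooses often drift toward the top of the list, so later searches on the same cue offer them first.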
We are also beginning to design a type of computerized "recreational-educative" therapy for patients in large mental hospitals with such a shortage of professional manpower that the patients' treatment is limited mainly to custodial care.

D. List of Relevant Publications.

Colby, K. M., Christinaz, D., Graham, S. 1978. A computer-driven personal, portable, and intelligent speech prosthesis. Computers and Biomedical Research, 11: 337-343.

Colby, K. M. 1979. Computer simulation and artificial intelligence in psychiatry. In Methods of Biobehavioral Research, E. A. Serafetinides (Ed.). New York: Grune and Stratton.

Colby, K. M. 1980. Computer psychotherapists. In Technology in Mental Health Care Delivery Systems, J. B. Sidowski, J. H. Johnson, T. A. Williams (Eds.). Norwood, New Jersey: Ablex Publishing Corporation.

Heiser, J. F., Colby, K. M., Faught, W. S., Parkison, R. C. 1980. Can psychiatrists distinguish a computer simulation of paranoia from the real thing? The limitations of Turing-like tests as measures of the adequacy of simulations. Journal of Psychiatric Research, Vol. 15, No. 3.

Parkison, R. C. 1980. An effective computational approach to the comprehension of purposeful English dialogue. Stanford University, Ph.D. dissertation (forthcoming).

Colby, K. M., Christinaz, D., Graham, S., Parkison, R. C. A word-finding algorithm using a dynamic lexical-semantic memory for patients with anomia. (In press)

E. Funding Support.

1. Titles of grants
   a) Intelligent Speech Prosthesis
   b) Ocular control of Intelligent Speech Prosthesis.

2. Principal Investigator
   Kenneth Mark Colby, M.D.
   Professor of Psychiatry and Computer Science
   Neuropsychiatric Institute
   University of California at Los Angeles

3. Funding agencies
   a) Intelligent Systems Program, Division of Mathematics and Computer Science, National Science Foundation.
   b) Science and Technology to Aid the Handicapped Program, National Science Foundation.

4. Grant numbers
   a) NSF-MCS 78-09900
   b) NSF PFR - 17358

5. Total award period
   a) 6/1/78 - 11/30/80 $135,260.
   b) 10/1/79 - 3/31/81 $318,368.

6. Current period (see 5. above)

II. Interactions with the SUMEX-AIM Resource

A. The project communicates and collaborates with the Communication Enhancement Project at Michigan State University, John Eulenberg, Principal Investigator.

B. The project communicates with the SUMEX project at the University of Texas at Galveston, John F. Heiser, M.D., Principal Investigator, who experiments with and demonstrates the PARRY program.

C. Critique of resource management.

The SUMEX staff is still excellent and responsive to our needs. Our only problems are with the telephone company portion of our communications link with SUMEX.

III. Research Plans (8/80 - 7/86)

A. Project goals and plans

1. Near-term

We plan to continue to work on the problems described above. Further clinical experience is necessary in testing and developing the word-finding algorithm and the ocularly-controlled ISP. These efforts should be completed in about two years.

2. Long-range

It will take years to solve the problems of psychiatric taxonomy, computer ideographic writing systems, and computer enhancement of hospitalized patient outcome. Our work in these areas will depend upon obtaining the requisite funding.

B. Justification for continued SUMEX use.

All the problems we work on involve natural language in some form or other. We analyze natural language input and generate natural language output. These efforts require large dictionaries and large LISP programs which run at SUMEX. No comparable facilities are available at UCLA. Hence we are heavily dependent upon SUMEX for the continuation of this research.

C. Needs and plans for other computer resources.
An ISP consists of a microprocessor interfaced with a speech synthesizer. We have constructed 3 ISPs, building two of the microprocessors ourselves. We expect to purchase another microprocessor and a graphics terminal.

D. Recommendations for future development.

The SUMEX system is often heavily loaded during daytime hours. The batch facility permits us to run some large production jobs overnight unattended, but the daytime loading is often so great that it discourages even small interactive jobs, such as text editing. It would be very helpful to have more computing power during the daytime, if funding is available.

9.2.5 INTERNIST Project

INTERNIST Project

J. D. Myers, M.D. and H. Pople, Ph.D.
University of Pittsburgh
Pittsburgh, Pennsylvania

I. Summary of Research Program

A. Medical Rationale

The principal objective of this project is the development of a high-level computer diagnostic program in the broad field of internal medicine as an aid in the solution of complex and complicated diagnostic problems. To be effective, the program must be capable of multiple diagnoses (related or independent) in a given patient.

A major achievement of this research undertaking has been the design of a program called INTERNIST, along with an extensive medical data base now encompassing almost 500 diseases and more than 3,000 manifestations of disease.

Although this consultative program is designed primarily to aid skilled internists in complicated medical problems, the program may have spin-off as a diagnostic and triage aid to physicians' assistants, rural health clinics, military medicine and space travel.

Development of the system which we now call INTERNIST-I was begun about eight years ago. The system was successfully demonstrated for the first time in 1974 and has been used since that time in the analysis of hundreds of clinical problems.
A major point of departure for the design of the original INTERNIST program was the realization that the task of clinical decision making in internal medicine is an ill-structured problem. In other domains, the task of diagnosis is often viewed as one of pattern recognition or discrimination: there is available a predefined collection of possible classifications (characterizing disease entities or clinical states), one and only one of which is considered possible in the case being studied. A diagnostic problem solver dealing with such a well structured domain has the fairly straightforward task of selecting that one of this fixed set of alternatives which best fits the facts of the case. Many statistical, pattern recognition, and algorithmic techniques have been employed successfully in performing computer aided diagnosis in these well structured clinical problem domains.

Primarily because complex cases often involve two or more concurrently active disease processes, no set of exhaustive and mutually exclusive classifications can be developed to structure the diagnostic problem in internal medicine. In principle, it might be argued that this more complex problem domain could be reduced to a simple discrimination task if, in addition to the individual disease entities, one includes appropriate multiple disease complexes in the set of allowable patient descriptors. However, since our experience indicates that as many as ten or twelve individual descriptors may apply in a complex clinical problem, and considering that there are a thousand or more individual descriptors of interest in Internal Medicine, explicitly recording all possible multiple disease classifications is clearly infeasible.
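The infeasibility claim can be checked with a short calculation. The sketch below counts the distinct descriptor sets that would have to be recorded if up to twelve of a thousand descriptors may apply at once; the figures 1000 and 12 come from the text, while the function name and the assumption that any combination of descriptors is admissible are illustrative only.

```python
from math import comb


def count_disease_complexes(n_descriptors: int, max_concurrent: int) -> int:
    """Count the non-empty descriptor sets of size 1..max_concurrent.

    Assumes, for illustration, that every combination of descriptors is a
    candidate classification; a real knowledge base would exclude many.
    """
    return sum(comb(n_descriptors, k) for k in range(1, max_concurrent + 1))


if __name__ == "__main__":
    total = count_disease_complexes(1000, 12)
    print(f"{total:.3e} candidate classifications")
```

With these figures the count is on the order of 10^27, far more than could be enumerated or stored, which is the point of the argument above.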
Our thesis is that, in the absence of explicit structure derived from the problem domain, the successful clinician engages in heuristic imposition of structure so that effective problem solving strategies might be selected and employed for decision making relative to the postulated problem structure.

In INTERNIST-I, this concept of heuristic imposition of structure is expressed primarily by means of a novel "problem-formation" heuristic. In effect, the program composes dynamically, on the basis of evidence provided, what in context constitutes a presumed exhaustive and mutually exclusive subset of disease entities that can explain, more or less equally well, some significant subset of the observed findings in a clinical case. This heuristic problem structuring procedure is invoked repeatedly during the course of a diagnostic consultation in order to deal sequentially with the component parts of a complex clinical problem.

Because INTERNIST is intended to serve a consulting role in medical diagnosis, it has been challenged with a wide variety of difficult clinical problems: cases published in the medical journals, CPCs, and other interesting and unusual problems arising in the local teaching hospitals. In the great majority of these test cases, the problem-formation strategy of INTERNIST has proved to be effective in sorting out the pieces of the puzzle and coming to a correct diagnosis, involving in some cases as many as a dozen disease entities.

On the basis of this extensive test of the initial INTERNIST system, it has become clear that many aspects of the system's performance could be significantly enhanced if it were possible to deal with the various component problems and their interrelationships simultaneously.
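The sequential problem-formation loop described above can be illustrated with a deliberately simplified sketch. It is not the INTERNIST-I program: the scoring rule (a plain count of explained findings), the competitor test (a disease competes with the leader if it explains nothing the leader does not), the function names, and the toy knowledge base are all assumptions made for this example.

```python
def form_problem(knowledge_base, unexplained):
    """Pick a leading disease and the competitors grouped with it.

    knowledge_base maps a disease name to the set of findings it can
    explain. Scoring by a simple count is an assumption of this sketch.
    """
    covered = {d: f & unexplained for d, f in knowledge_base.items()
               if f & unexplained}
    if not covered:
        return None, []
    # Ties go to the alphabetically first disease, for determinism.
    leader = max(sorted(covered), key=lambda d: len(covered[d]))
    competitors = [d for d in sorted(covered)
                   if d != leader and covered[d] <= covered[leader]]
    return leader, competitors


def diagnose(knowledge_base, findings):
    """Form and resolve one component problem at a time."""
    unexplained = set(findings)
    diagnoses = []
    while unexplained:
        leader, _competitors = form_problem(knowledge_base, unexplained)
        if leader is None:
            break  # nothing in the knowledge base explains what is left
        diagnoses.append(leader)
        unexplained -= knowledge_base[leader]
    return diagnoses


if __name__ == "__main__":
    # Toy data invented for illustration; not from the INTERNIST data base.
    kb = {
        "hepatitis": {"jaundice", "fatigue", "nausea"},
        "gallstones": {"jaundice", "nausea"},
        "pneumonia": {"cough", "fever"},
        "bronchitis": {"cough"},
    }
    print(diagnose(kb, {"jaundice", "fatigue", "nausea", "cough", "fever"}))
```

Each pass groups the leader with the diseases that compete to explain the same findings, concludes one of them, and sets aside the findings it explains before the next pass. The component problems are therefore handled strictly one after another.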
This has led to the design of INTERNIST-II, a system embodying strategies of concurrent problem-formation which we expect will yield more rapid convergence to the correct diagnosis in many cases, and in at least some cases provide more acceptable diagnostic behavior.

B. Medical relevance and collaboration

The program inherently has direct and substantial medical relevance. The institution of collaborative studies with other institutions has been deferred pending completion of the programs and knowledge base enhancements required for INTERNIST-II.

C. Highlights of research progress

Accomplishments this past year

During the past year, the R & D activities of the INTERNIST project have concentrated on three major problem areas associated with the original implementation of INTERNIST. These areas are:

a) restructuring of the underlying diagnostic logic of INTERNIST to conform more closely to the expectations of clinician users of the system. The primary goal in developing a new model of diagnostic reasoning is to achieve a concurrent problem formation capability in order that improved scoring methods and attention to the principle of parsimony might be exploited in focusing the attention of INTERNIST on regions of the problem space having the greatest potential for yielding a solution. Moreover, the new approach has the potential for improved modes of interaction with the user, as it can reveal at any point in its analysis the multiple partial characterizations that have been postulated, and expose the space of alternative complex descriptions that can be generated by combining these partial characterizations. The potential for providing justification and explanation of the system's behavior is thereby greatly enhanced.

b) development of a friendlier user interface, enabling use of the system by clinicians unfamiliar with the specifics of the INTERNIST vocabulary.
One of the barriers to successful implementation of the original INTERNIST system in a ward setting is the language of discourse used in that system for specifying the positive and negative findings in a clinical case. The possible findings that might be entered now number more than three thousand; thus some means for convenient browsing among these possible entries, and some convenient means for communicating the selected items to INTERNIST, had to be found. We have developed for this purpose a menu-selection front-end system that comprises a network of approximately 1000 frames designed to permit selection of pertinent facts that might be revealed by any of a host of information acquisition procedures. Convenient escape mechanisms have been provided to permit the user to alternate between the interactive data entry and analytical components of the system.

c) incorporation of additional disease profiles and related medical information in the INTERNIST knowledge base, to approach the critical mass required for effective field tests of the system.

Research in Progress

There are five major components to the continuation of this research project:

1) The completion, continued updating, refinement and testing of the extensive medical knowledge base required for the operation of INTERNIST.

2) The completion and implementation of the improved diagnostic consulting program, which has been designed to overcome certain performance problems identified during the past four years' experience with the original INTERNIST program.

3) Institution of field trials of INTERNIST on the clinical services in internal medicine at the Health Center of the University of Pittsburgh.

4) Expansion of the clinical field trials to other university health centers which have expressed interest in working with the system.
5) Adaptation of the diagnostic program and data base of INTERNIST to subserve educational purposes and the evaluation of clinical performance and competence.

D. List of relevant publications

1. Pople, H.E. "The Formation of Composite Hypotheses in Diagnostic Problem Solving: An Exercise in Synthetic Reasoning", Proceedings of the Fifth International Joint Conference on Artificial Intelligence, Boston, August 1977.

2. Pople, H.E. "On the Knowledge Acquisition Process in Applied A.I. Systems", Report of Panel on Applications of A.I., Proceedings of Fifth International Joint Conference on Artificial Intelligence, 1977.

3. Pople, H.E., Myers, J. D. & Miller, R.A. "The DIALOG Model of Diagnostic Logic and its Use in Internal Medicine", Proceedings of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, USSR, September 1975.

4. Pople, H.E. "Artificial Intelligence Approaches to Computer-Based Medical Consultation", Proceedings IEEE Intercon, New York, 1975.

E. Funding support

1. Title of grant. Clinical Decision Systems Research Resource.

2. Harry E. Pople, Jr., Ph.D. - Associate Professor of Business
   Jack D. Myers, M.D. - University Professor (Medicine)
   University of Pittsburgh

3. Division of Research Resources
   National Institutes of Health

4. 5 R24 RR01101-03

5. 07/01/77-06/30/78 $160,414
   07/01/78-06/30/79 $178,414

6. 07/01/79-06/30/80 $200,414

II. Interactions with the SUMEX-AIM Resource

A, B. Collaborations and Medical Use of Program Via SUMEX

INTERNIST remains in a stage of research and development. As noted above, we are continuing to develop better computer programs to operate the diagnostic system, and the knowledge base cannot be used very effectively for collaborative purposes until it has reached a critical stage of completion. These factors have stifled collaboration via SUMEX up to this point and will continue to do so for the next year or two.
In the meanwhile, through the SUMEX community there continues to be an exchange of information and states of progress. Such interactions particularly take place at the annual AIM Workshop.

C. Critique of Resource Management

SUMEX has been an excellent resource for the development of INTERNIST. Our large program is handled efficiently, effectively and accurately. The staff at SUMEX have been uniformly supportive, cooperative, and innovative in connection with our project's needs.

III. Research Plans (8/80-7/86)

A. Project Goals and Plans

We expect that the conversion of INTERNIST knowledge structures to the form required by INTERNIST-II will be reasonably complete by the end of the next fiscal year (June 30, 1981). Shortly thereafter, provided adequate hardware resources are available, we intend to commence formal field trials of INTERNIST at the Presbyterian-University Hospital of Pittsburgh. This local phase of the clinical evaluation will continue for approximately one year. Beginning in July 1982, we intend to extend the clinical trials to collaborating institutions, adding one user group approximately every six months through June 1984.

B. Justification and Requirements for Continued SUMEX Use

In order to provide the level of computer services required by the expanded level of R & D activity in the near term, and to support the schedule of field trial studies envisioned during the current five year planning horizon, we have requested NIH support for a dedicated INTERNIST machine to be acquired during the next fiscal year. If this hardware support becomes available, we would not expect to make additional demands on SUMEX-AIM for computing services. However, we would continue to look to SUMEX for software support and for the communications network that so effectively bridges the far-flung AIM community.
Until such dedicated resources are in place, we would expect to make use of the SUMEX-AIM facilities at a moderately increased level of utilization.

9.2.6 PUFF/VM Project

PUFF/VM: Biomedical Knowledge Engineering in Clinical Medicine

John J. Osborn, M.D.
The Institutes of Medical Sciences (San Francisco)
Pacific Medical Center

and

Edward A. Feigenbaum, Ph.D.
Computer Science Department
Stanford University

The immediate goal of this project is the development of knowledge-based programs to interpret physiological measurements made in clinical medicine. The interpretations are intended to be used to aid in diagnostic decision making and in therapeutic actions. The programs will operate within medical domains which have well developed measurement technologies and reasonably well understood procedures for interpretation of measured results. The programs are: (1) PUFF: the interpretation of standard pulmonary function laboratory data which include measured flows, lung volumes, pulmonary diffusion capacity and pulmonary mechanics, and (2) VM: management of respiratory insufficiency in the intensive care unit. The second, but equally important, goal of this project is the dissemination of Artificial Intelligence techniques and methodologies to medical communities that are involved in computer aided medical diagnosis and interpretation of patient data.

Funding support: PUFF/VM is supported by NIH grant GM24669 for $164,000 from 1 September 1978 - 30 August 1981. Some indirect costs are included in this total. A proposal for supplemental funding, submitted 1 February 1979, is pending.

I. Summary Of Research Program

PUFF

A. Technical Goals

The task of the PUFF program is to interpret standard measures of pulmonary function. It is intended that PUFF produce a report for the patient record, explaining the clinical significance of measured test results.
PUFF also must provide a diagnosis of the presence and severity