SUMEX
STANFORD UNIVERSITY MEDICAL EXPERIMENTAL COMPUTER RESOURCE

COMPETING RENEWAL APPLICATION RR-00785

BOOK II
COLLABORATIVE PROJECTS AND APPENDIXES

Submitted to
BIOTECHNOLOGY RESOURCES PROGRAM
NATIONAL INSTITUTES OF HEALTH

June 1, 1977

DEPARTMENT OF GENETICS
STANFORD UNIVERSITY SCHOOL OF MEDICINE
Joshua Lederberg, Principal Investigator

SECTION I
Form Approved, O.M.B. 68-R0249
DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
PUBLIC HEALTH SERVICE
GRANT APPLICATION
LEAVE BLANK: TYPE, PROGRAM, NUMBER, REVIEW GROUP, FORMERLY, COUNCIL (Month, Year), DATE RECEIVED

TO BE COMPLETED BY PRINCIPAL INVESTIGATOR (Items 1 through 7 and 15A)
1. TITLE OF PROPOSAL (Do not exceed 53 typewriter spaces): S U Medical EXperimental Computer Resource (SUMEX)
2. PRINCIPAL INVESTIGATOR
   2A. NAME (Last, First, Initial): LEDERBERG, Joshua
   2B. TITLE OF POSITION: Professor and Chairman
   2C. MAILING ADDRESS (Street, City, State, Zip Code): Department of Genetics, Stanford University Medical Center, Stanford, California 94305
   2D. DEGREE: Ph.D.
   2F. TELEPHONE: Area Code 415, Number 497-5801
   2G. DEPARTMENT, SERVICE, LABORATORY OR EQUIVALENT (See Instructions): Department of Genetics
   2H. MAJOR SUBDIVISION (See Instructions): School of Medicine
3. DATES OF ENTIRE PROPOSED PROJECT PERIOD (This application): FROM 8/1/78 THROUGH 7/31/83
4. TOTAL DIRECT COSTS REQUESTED FOR PERIOD IN ITEM 3: $5,155,655
5. DIRECT COSTS REQUESTED FOR FIRST 12-MONTH PERIOD: $744,300
6. PERFORMANCE SITE(S) (See Instructions): Stanford University
7. Research Involving Human Subjects (See Instructions): A. NO; B. YES, Approved (Date); C. YES, Pending Review
8. Inventions (Renewal Applicants Only - See Instructions): A. NO; B. YES, Not previously reported; C. YES, Previously reported

TO BE COMPLETED BY RESPONSIBLE ADMINISTRATIVE AUTHORITY (Items 8 through 13 and 15B)
9. APPLICANT ORGANIZATION(S) (See Instructions): Stanford University, Stanford, California 94305
   IRS No. 94-1156365
   Congressional District No.
12
11. TYPE OF ORGANIZATION (Check applicable item): Federal / State / Local / Other (Specify): Private Non-Profit University
12. NAME, TITLE, ADDRESS, AND TELEPHONE NUMBER OF OFFICIAL IN BUSINESS OFFICE WHO SHOULD ALSO BE NOTIFIED IF AN AWARD IS MADE: K. D. Creighton, Associate Vice President - Controller, Stanford University, Stanford, California 94305
10. NAME, TITLE, AND TELEPHONE NUMBER OF OFFICIAL(S) SIGNING FOR APPLICANT ORGANIZATION(S): D'Ann B. Downey, Sponsored Projects Officer, Sponsored Projects Office
    Telephone Number(s): (415) 497-2883
    Telephone Number: (415) 497-2251
13. COMPONENT TO RECEIVE CREDIT FOR INSTITUTIONAL GRANT PURPOSES (See Instructions): 01 School of Medicine
14. ENTITY NUMBER (Formerly PHS Account Number): IRS No. 94-1156365
15. CERTIFICATION AND ACCEPTANCE. We, the undersigned, certify that the statements herein are true and complete to the best of our knowledge and accept, as to any grant awarded, the obligation to comply with Public Health Service terms and conditions in effect at the time of the award.
    SIGNATURES (Signatures required on original copy only. Use ink. "Per" signatures not acceptable.)
    A. SIGNATURE OF PERSON NAMED IN ITEM 2A, DATE
    B. SIGNATURE(S) OF PERSON(S) NAMED IN ITEM 10, DATE

NIH 398 (FORMERLY PHS 398) Rev. 1/73

The undersigned agrees to accept responsibility for the scientific and technical conduct of the project and for the provision of required progress reports if a grant is awarded as the result of this application.

Date: 5/24/77
Principal Investigator

Table of Contents
BOOK II

Section
5. BIOGRAPHICAL SKETCHES
6. COLLABORATIVE PROJECT PROGRESS AND OBJECTIVES
  6.1 STANFORD PROJECTS
    6.1.1 DENDRAL PROJECT
    6.1.2 HYDROID PROJECT
    6.1.3 MOLGEN PROJECT
    6.1.4 MYCIN PROJECT
    6.1.5 PROTEIN STRUCTURE PROJECT
  6.2 NATIONAL AIM PROJECTS
    6.2.1 ACQUISITION OF COGNITIVE PROCEDURES (ACT)
    6.2.2 CHEMICAL SYNTHESIS PROJECT (SECS)
    6.2.3 HIGHER MENTAL FUNCTIONS PROJECT
    6.2.4 INTERNIST PROJECT
    6.2.5 MEDICAL INFORMATION SYSTEMS LABORATORY
    6.2.6 RUTGERS COMPUTERS IN BIOMEDICINE
  6.3 PILOT STANFORD PROJECTS
    6.3.1 GENETICS APPLICATIONS PROJECT
    6.3.2 BAYLOR-METHODIST CEREBROVASCULAR PROJECT
    6.3.3 COMPUTER ANALYSIS OF CORONARY ARTERIOGRAMS
    6.3.4 QUANTUM CHEMICAL INVESTIGATIONS
  6.4 PILOT AIM PROJECTS
    6.4.1 COMMUNICATION ENHANCEMENT PROJECT
    6.4.2 AI IN PSYCHOPHARMACOLOGY
    6.4.3 ORGAN CULTURE PROJECT
    6.4.4 NEUROPROSTHESES PROJECT
    6.4.5 MATHEMATICAL MODELING OF PHYSIOLOGICAL SYSTEMS
    6.4.6 PUFF/VM PROJECT
Appendix I    OVERVIEW OF ARTIFICIAL INTELLIGENCE RESEARCH
Appendix II   AI HANDBOOK OUTLINE
Appendix III  SUMMARY OF MAINSAIL LANGUAGE FEATURES
Appendix IV   MICROPROGRAMMED MAINSAIL PLANS
Appendix V    AIM MANAGEMENT COMMITTEE MEMBERSHIP
Appendix VI   USER INFORMATION - GENERAL BROCHURE
Appendix VII  GUIDELINES FOR PROSPECTIVE USERS

Privileged Communication    J. Lederberg

II. RESEARCH PLAN - BOOK II

This is an application for renewal of a grant supporting the Stanford University Medical EXperimental computer (SUMEX) research resource for applications of Artificial Intelligence in Medicine (AIM). The research plan has been divided into several logical parts:

1) Book I - Resource research objectives and rationale, progress report, and detailed research plans.
2) Book II - Biographical sketches, collaborating project reports and plans, and supporting appendixes.
3) Budget - First year budget detail, five-year budget summary, and budget explanation and justification.

5 BIOGRAPHICAL SKETCHES

The following are biographical sketches for all professional personnel contributing to the SUMEX-AIM resource project. These do not include sketches for individual collaborating project investigators.

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH
(Give the following information for all professional personnel listed on page 3, beginning with the Principal Investigator. Use continuation pages and follow the same general format for each person.)

NAME: FEIGENBAUM, Edward A.
TITLE: Professor and Chairman, Computer Science Department
BIRTHDATE (Mo., Day, Yr.): January 20, 1936
PLACE OF BIRTH (City, State, Country): Weehawken, New Jersey, U.S.A.
PRESENT NATIONALITY (If non-U.S. citizen, indicate kind of visa and expiration date): U.S. citizen
SEX: Male / Female

EDUCATION (Begin with baccalaureate training and include postdoctoral)
INSTITUTION AND LOCATION                                           DEGREE   YEAR CONFERRED   SCIENTIFIC FIELD
Carnegie Institute of Technology, Pittsburgh, Pennsylvania         B.S.     1956             Electrical Engineering
Carnegie Institute of Technology, Pittsburgh, Pennsylvania         Ph.D.    1959             Industrial Administration

HONORS:
MAJOR RESEARCH INTEREST: Artificial Intelligence
ROLE IN PROPOSED PROJECT: Co-Investigator
RESEARCH SUPPORT (See instructions): (See continuation page.)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)
1976 - present   Professor (by Courtesy), Department of Psychology, Stanford University
1976 - present   Chairman, Department of Computer Science, Stanford University
1969 - present   Professor of Computer Science, Stanford University
1965 - 1968      Associate Professor of Computer Science, Stanford University
1965 - 1968      Director, Stanford Computation Center, Stanford University
1964 - 1965      Associate Professor, School of Business Administration, University of California, Berkeley
1960 - 1963      Assistant Professor, School of Business Administration, University of California, Berkeley
1961 - 1964      Research Appointment, Center for Human Learning, University of California, Berkeley
1960 - 1964      Research Appointment, Center for Research in Management Science, University of California, Berkeley
1965 - present   Editor, Computer Science Series, McGraw-Hill Book Company, New York
1968 - 1972      Member, Computer and Biomathematical Sciences Study Section, NIH

Professional Societies: American Psychological Association, American Association for the Advancement of Science, Association for Computing Machinery (member, National Council of ACM, 1966-68)

Consultantships: Information Sciences Institute, University of Southern California; The RAND Corporation; System Development Corporation (knowledge-based systems project); Systems Control, Inc. (HASP project)

PUBLICATIONS (See continuation page.)

BIOGRAPHICAL SKETCH - FEIGENBAUM, Edward A.

RESEARCH SUPPORT

1. Contract No.: DAHC-15-73-C-0435
   Title of Project: Heuristic Programming Project
   Grant Agency: ARPA
   a. Project Period: 1/73 - 7/77
      Annual Funding: $225,762
      % of Effort: 90% summer, 49% academic year
   b. Proposed Renewal: 8/77 - 9/79
      Annual Funding: $375,000 (8/77-9/78), $350,000 (10/78-9/79)
      % of Effort: 33% summer 1977, 17% academic year 1977-78, 100% summer 1978, 18% academic year 1978-79

2. Grant No.: RR-00612
   Title of Project: Resource Related Research - Computers and Chemistry (DENDRAL)
   Project Period: 5/77 - 4/80
   Annual Funding: $218,580 (5/77-4/78) (Direct Costs)
   % of Effort: 5% (no salary)
   Grant Agency: NIH

3. Grant No.: MCS 74-23451
   Title of Project: Automation of Scientific Inference: Heuristic Computing Applied to Protein Crystallography
   Project Period: 2/75 - 4/79 + 6 mos.
   Annual Funding: $75,000
   % of Effort: 5% (no salary)
   Grant Agency: NSF

4. Grant No.: MCS 76-11649
   Title of Project: MOLGEN: A Computer Science Application to Molecular Genetics
   Project Period: 6/76 - 5/78 + 5 mos.
   Annual Funding: $55,350
   % of Effort: 10% academic year, 100% summer (2 mos. 1977)
   Grant Agency: NSF

5. Proposal Submitted
   Title of Project: Biomedical Knowledge Engineering in Clinical Medicine
   Project Period: 1/78 - 12/80
   Annual Funding: $170,879 (Direct Costs)
   % of Effort: 10% (no salary)
   Grant Agency: NIH (subcontract)

PUBLICATIONS

Books and Monographs:

Computers and Thought, co-editor with Julian Feldman, McGraw-Hill, 1963.

Information Processing Language V Manual, Englewood Cliffs, N.J., Prentice-Hall, 1961 (with A. Newell, F. Tonge, G. Mealy, et al.).

An Information Processing Theory of Verbal Learning, Santa Monica, The RAND Corporation Paper P-1817, October 1959 (Monograph).

Papers (1965-present): (List organized by topic)

Heuristic DENDRAL Project:

(1) J. Lederberg and E. A. Feigenbaum, "Mechanization of Inductive Inference in Organic Chemistry", in B. Kleinmuntz (ed), Formal Representations for Human Judgment, (Wiley, 1968).
(Also Stanford Artificial Intelligence Project Memo No. 54, August 1967).

(2) E. A. Feigenbaum and B. G. Buchanan, "Heuristic DENDRAL: A Program for Generating Explanatory Hypotheses in Organic Chemistry", in Proceedings, Hawaii International Conference on System Sciences, B. K. Kinariwala and F. F. Kuo (eds), University of Hawaii Press, 1968.

(3) B. G. Buchanan, G. L. Sutherland, and E. A. Feigenbaum, "Heuristic DENDRAL: A Program for Generating Explanatory Hypotheses in Organic Chemistry". In Machine Intelligence 4 (B. Meltzer and D. Michie, eds) Edinburgh University Press (1969). (Also Stanford Artificial Intelligence Project Memo No. 62, July 1968.)

(4) E. A. Feigenbaum, "Artificial Intelligence: Themes in the Second Decade". In Final Supplement to Proceedings of the IFIP 68 International Congress, Edinburgh, August 1968. (Also Stanford Artificial Intelligence Project Memo No. 67, August 1968.)

(5) J. Lederberg, G. L. Sutherland, B. G. Buchanan, E. A. Feigenbaum, A. V. Robertson, A. M. Duffield, and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference I. The Number of Possible Organic Compounds: Acyclic Structures Containing C, H, O and N", Journal of the American Chemical Society, 91:11 (May 21, 1969).

(6) A. M. Duffield, A. V. Robertson, C. Djerassi, B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum, and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference II. Interpretation of Low Resolution Mass Spectra of Ketones". Journal of the American Chemical Society, 91:11 (May 21, 1969).

(7) B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum, "Toward an Understanding of Information Processes of Scientific Inference in the Context of Organic Chemistry", in Machine Intelligence 5, (B. Meltzer and D. Michie, eds) Edinburgh University Press (1970). (Also Stanford Artificial Intelligence Project Memo No.
99, September 1969.)

(8) J. Lederberg, G. L. Sutherland, B. G. Buchanan, and E. A. Feigenbaum, "A Heuristic Program for Solving a Scientific Inference Problem: Summary of Motivation and Implementation", Stanford Artificial Intelligence Project Memo No. 104, November, 1969.

(9) G. Schroll, A. M. Duffield, C. Djerassi, B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum, and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference III. Aliphatic Ethers Diagnosed by Their Low Resolution Mass Spectra and NMR Data". Journal of the American Chemical Society, 91:26 (December 17, 1969).

(10) A. Buchs, A. M. Duffield, G. Schroll, C. Djerassi, A. B. Delfino, B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum, and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference IV. Saturated Amines Diagnosed by Their Low Resolution Mass Spectra and Nuclear Magnetic Resonance Spectra", Journal of the American Chemical Society, 92 (1970), 6831.

(11) Y. M. Sheikh, A. Buchs, A. B. Delfino, G. Schroll, A. M. Duffield, C. Djerassi, B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference V. An Approach to the Computer Generation of Cyclic Structures. Differentiation Between All the Possible Isomeric Ketones of Composition C6H10O", Organic Mass Spectrometry, 4 (1970), 493.

(12) A. Buchs, A. B. Delfino, A. M. Duffield, C. Djerassi, B. G. Buchanan, E. A. Feigenbaum and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference VI. Approach to a General Method of Interpreting Low Resolution Mass Spectra with a Computer", Helvetica Chimica Acta, 53 (1970), 1394.

(13) E. A. Feigenbaum, B. G. Buchanan, and J. Lederberg, "On Generality and Problem Solving: A Case Study Using the DENDRAL Program". In Machine Intelligence 6 (B. Meltzer and D. Michie, eds.) Edinburgh University Press (1971). (Also Stanford Artificial Intelligence Project Memo No. 131.)

(14) A. Buchs, A.
B. Delfino, C. Djerassi, A. M. Duffield, B. G. Buchanan, E. A. Feigenbaum, J. Lederberg, G. Schroll, and G. L. Sutherland, "The Application of Artificial Intelligence in the Interpretation of Low-Resolution Mass Spectra", Advances in Mass Spectrometry, 5, 314.

(15) B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, "A Heuristic Programming Study of Theory Formation in Science." In Proceedings of the Second International Joint Conference on Artificial Intelligence, Imperial College, London (September, 1971). (Also Stanford Artificial Intelligence Project Memo No. 145.)

(16) D. H. Smith, B. G. Buchanan, R. S. Engelmore, A. M. Duffield, A. Yeo, E. A. Feigenbaum, J. Lederberg, and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference VIII. An Approach to the Computer Interpretation of the High Resolution Mass Spectra of Complex Molecules. Structure Elucidation of Estrogenic Steroids", Journal of the American Chemical Society, 94 (1972), 5962-5971.

(17) B. G. Buchanan, E. A. Feigenbaum, and N. S. Sridharan, "Heuristic Theory Formation: Data Interpretation and Rule Formation". In Machine Intelligence 7, Edinburgh University Press (1973).

(18) D. H. Smith, B. G. Buchanan, W. C. White, E. A. Feigenbaum, C. Djerassi and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference X. Intsum. A Data Interpretation Program as Applied to the Collected Mass Spectra of Estrogenic Steroids", Tetrahedron, 29, 3117 (1973).

(19) E. A. Feigenbaum, "Computer Applications: Introductory Remarks," in "Proceedings of Federation of American Societies for Experimental Biology," 33, 2331 (1974).

(20) B. G. Buchanan, D. H. Smith, W. C. White, R. Gritter, E. A. Feigenbaum, J. Lederberg and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference. XXII.
Automatic Rule Formation in Mass Spectrometry by Means of the Meta-DENDRAL Program." Journal of the American Chemical Society, 98:6168 (1976).

(21) E. A. Feigenbaum, R. S. Engelmore, and C. K. Johnson, "A Correlation between Crystallographic Computing and Artificial Intelligence Research," Acta Cryst. A33 (Jan 1):13-18, (1977).

(22) H. Penny Nii and Edward A. Feigenbaum, "Rule-based Understanding of Signals," to be presented at Workshop on Pattern-directed Inference Systems, (May 1977).

Information Processing Model Building in Psychology:

(1) "Information Processing" in Readiness to Remember: Proceedings of the Third Conference on Remembering, Learning, and Forgetting, Gordon and Breach (1972).

(2) "Information Processing and Memory," Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 4 (Biology and Health), University of California Press, 1967. Reprinted in Norman, D. (ed.) Models for Memory, Academic Press (1971).

IFIP Congresses:

(1) Invited speech: "Artificial Intelligence: Themes in the Second Decade." In Final Supplement to Proceedings of the IFIP 68 Congress, Edinburgh, August, 1968. Also available as A.I. Project Working Paper No. 67, August 1968.

(2) Report on Panel on the Mechanization of Creative Processes. In Kalenich, W. (ed.), Proceedings of IFIP Congress 65, Volume 2, Spartan Books, 1966, pp. 600-601.

Stanford Computation Center:

"Computers at Stanford," (with N. Nielsen). In Stanford Annual Financial Report Summary, Stanford University, November 1967. Reprinted in IBM Computing Report, Vol. IV, No. 3 (May, 1968), 15-18.

Other:

"Soviet Computer Science, Revisited." Proceedings of the 20th ACM National Conference, August, 1965, pp. 225-226.

Papers, Pre-1965: Available upon request.
BIOGRAPHICAL SKETCH

NAME: JIRAK, Gregory A.
TITLE: System Programmer
BIRTHDATE: April 24, 1951
PLACE OF BIRTH: Flagstaff, Arizona, U.S.A.
PRESENT NATIONALITY: U.S. citizen
SEX: Male / Female

EDUCATION
INSTITUTION AND LOCATION                             DEGREE   YEAR CONFERRED   SCIENTIFIC FIELD
California Institute of Technology, Pasadena         B.S.     1974             Mathematics (Information Science)

HONORS:
MAJOR RESEARCH INTEREST: Computer systems design
ROLE IN PROPOSED PROJECT: System Programmer, MAINSAIL
RESEARCH SUPPORT:

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1976 - present   System Programmer, MAINSAIL, SUMEX Computer Project, Department of Genetics, Stanford University
1974 - 1976      Scientific Programmer, DENDRAL Project, Instrumentation Research Laboratories, Department of Genetics, Stanford University
1973 - 1974      Software Engineer, Image Processing Laboratory, Jet Propulsion Laboratory, California Institute of Technology
1971 - 1973      Systems Programmer, Rapidly Extensible Language Project, California Institute of Technology
1970 - 1971      Electronic Technician, Sound Master, Inc., Tempe, Arizona
1970             System Programmer, IBM 360/25 DOS, Curtis, Woodman & Roach, Inc., Yuma, Arizona
1968 - 1969      Junior Programmer, IBM 1401, Data Processing Center, Inc., Yuma, Arizona

PUBLICATIONS (none)
BIOGRAPHICAL SKETCH

NAME: JOHNSON, Suzanne M.
TITLE: Scientific Programmer
BIRTHDATE: November 26, 1944
PLACE OF BIRTH: Pleasantville, New York, U.S.A.
PRESENT NATIONALITY: U.S. citizen
SEX: Male / Female

EDUCATION
INSTITUTION AND LOCATION             DEGREE   YEAR CONFERRED   SCIENTIFIC FIELD
University of Arizona, Tucson        B.S.     1966             Chemistry

HONORS:
MAJOR RESEARCH INTEREST: Computer applications in medicine and chemistry
ROLE IN PROPOSED PROJECT: Applications Programmer
RESEARCH SUPPORT:

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1974 - present   Scientific Programmer, SUMEX Computer Project, Department of Genetics, Stanford University
1973 - 1974      Scientific Programmer, Center for Radar Astronomy, Stanford Electronics Laboratories, Stanford University
1971 - 1973      Research Assistant (crystallographic studies/computer data reduction), Department of Chemistry, University of Iowa, Iowa City
1970 - 1971      Engineer, Geochemistry Section, Lockheed Electronics, Houston, Texas
1966 - 1969      Research Assistant (x-ray crystallographer), Department of Chemistry, University of Illinois, Urbana

PUBLICATIONS (See continuation page.)

BIOGRAPHICAL SKETCH - JOHNSON, Suzanne M.

PUBLICATIONS
1. Johnson, S.M., Newton, M.G., Paul, I.C., Beer, R.J.S. and Cartwright, D.: The Molecular Structure of an Unsymmetrical 6a-Thiathiophthen. Chem. Commun., 1170, 1967.

2. Johnson, S.M., McKechnie, J.S., Lin, B. T-S. and Paul, I.C.: Crystal Structure of Bullvalene at 25°. J. Am. Chem. Soc. 89:7123, 1967.

3. Johnson, S.M., Paul, I.C., Rinehart, K.L., Jr. and Srinivasan, R.: The Molecular Configuration of Caldariomycin. J. Am. Chem. Soc. 90:136, 1968.

4. Paul, I.C., Johnson, S.M., Paquette, L.A., Barrett, J.H. and Haluska, R.J.: The Molecular Geometry of Derivatives of 1H-Azepine in the Free and Complexed State. J. Am. Chem. Soc. 90:5023, 1968.

5. Johnson, S.M. and Paul, I.C.: Crystal and Molecular Structure of [16] Annulene. J. Am. Chem. Soc. 90:5555, 1968.

6. Johnson, S.M., Newton, M.G. and Paul, I.C.: Crystal and Molecular Structure of an Unsymmetrical 6a-Thiathiophthen: Single-crystal X-ray Analysis of 3-Benzoyl-5-p-bromophenyl-2-methylthio-6a-thiathiophthen. J. Chem. Soc. (B), 986, 1969.

7. Paul, I.C., Johnson, S.M., Barrett, J.H. and Paquette, L.A.: The Thermal (6 + 4)π Cycloaddition of N-alkoxycarbonylazepines: Crystal Structure Analysis of a Derived Monomethiodide. Chem. Commun., 6, 1969.

8. Coates, R.M., Parney, R.F., Johnson, S.M. and Paul, I.C.: The Crystal Structure of Khusimol p-Bromobenzoate. Chem. Commun., 999, 1969.

9. Johnson, S.M. and Paul, I.C.: The Crystal and Molecular Structure of the Perhydromethiodide of an Unsymmetrical N-alkoxycarbonylazepine Dimer. J. Chem. Soc. (B), 1244, 1969.

10. Johnson, S.M. and Paul, I.C.: Crystal and Molecular Structure of 1-Acetonyl-1-thionia-5-thiacyclooctane Perchlorate. Tet. Letters, 177, 1969.

11. Leonard, N.J., Golankiewicz, K., McCredie, R.S., Johnson, S.M. and Paul, I.C.: Synthetic Spectroscopic Models Related to Coenzymes and Base Pairs. III. A 1,1'-Trimethylene-Linked Thymine Photodimer of cis-syn Structure. J. Am. Chem. Soc. 91:5855, 1969.

12. Sabacky, M.J., Johnson, S.M., Martin, J.C.
and Paul, I.C.: Steric Effects in ortho-Substituted Triarylmethanes. J. Am. Chem. Soc. 91:7542, 1969.

13. Johnson, S.M., Paul, I.C. and King, G.S.D.: [16] Annulene: The Crystal and Molecular Structure. J. Chem. Soc. (B), 643, 1970.

14. Johnson, S.M., Herrin, J., Liu, S.J. and Paul, I.C.: Crystal Structure of a Barium Complex of Antibiotic X-537A, Ba(C34H53O8)2·H2O. Chem. Commun. 72, 1970.

15. Johnson, S.M., Herrin, J., Liu, S.J. and Paul, I.C.: The Crystal and Molecular Structure of the Barium Salt of an Antibiotic Containing a High Proportion of Oxygen. J. Am. Chem. Soc. 92:4428, 1970.

16. Gibson, E.K. and Johnson, S.M.: Thermal Analysis-Inorganic Gas Release Studies of Lunar Samples. Proc. Second Lunar Science Conference 2:1351, 1971.

17. Gibson, E.K. and Johnson, S.M.: Thermogravimetric-Quadrupole Mass-Spectrometric Analysis of Geochemical Samples. Thermochimica Acta 4:49, 1972.

18. Carhart, R.E., Johnson, S.M., Smith, D.H., Buchanan, B.G., Dromey, R.G. and Lederberg, J.: Networking and a Collaborative Research Community: A Case Study Using the DENDRAL Programs. In Computer Networking and Chemistry (Ed. Peter Lykos), American Chemical Society Symposium Series, No. 19, 1975.

19. Levinthal, E.C., Carhart, R.E., Johnson, S.M. and Lederberg, J.: When Computers Talk to Computers. Industrial Research, November, 1975.

BIOGRAPHICAL SKETCH

NAME: KAHLER, Richard Q.
TITLE: Scientific Programmer
BIRTHDATE: November 4, 1952
PLACE OF BIRTH: Los Angeles, California, U.S.A.
PRESENT NATIONALITY: U.S.
citizen
SEX: Male / Female

EDUCATION
INSTITUTION AND LOCATION             DEGREE   YEAR CONFERRED   SCIENTIFIC FIELD
Stanford University (1969-72)        None     --               Electrical Engineering, Computer Science

HONORS:
MAJOR RESEARCH INTEREST: Subsystem software development, human engineering of user programs, user/project communications
ROLE IN PROPOSED PROJECT: User Consultant
RESEARCH SUPPORT:

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1975 - present   Scientific Programmer, SUMEX Computer Project, Department of Genetics, Stanford University
1975             Computer Programmer, Institute for Mathematical Studies in the Social Sciences (IMSSS), Stanford University

PUBLICATIONS (none)

BIOGRAPHICAL SKETCH

NAME: LEDERBERG, Joshua
TITLE: Professor and Chairman, Department of Genetics
BIRTHDATE: May 23, 1925
PLACE OF BIRTH: Montclair, New Jersey, U.S.A.
PRESENT NATIONALITY: U.S. citizen
SEX: [X] Male  [ ] Female

EDUCATION
INSTITUTION AND LOCATION                                                        DEGREE   YEAR CONFERRED   SCIENTIFIC FIELD
Columbia College, New York                                                      B.A.     1944
College of Physicians and Surgeons, Columbia Univ., New York (1944-46)
Yale University                                                                 Ph.D.    1947             Microbiology

HONORS:
1957 - National Academy of Sciences
1958 - Nobel Prize in Medicine

MAJOR RESEARCH INTEREST: Molecular Genetics, Artificial Intelligence
ROLE IN PROPOSED PROJECT: Principal Investigator
RESEARCH SUPPORT: (See continuation page.)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1959 - present   Professor and Chairman, Department of Genetics, Stanford University School of Medicine
1957 - 1959      Chairman, Department of Medical Genetics, University of Wisconsin
1947 - 1957      Professor of Genetics, University of Wisconsin

SELECTED PUBLICATIONS (See continuation page.)

BIOGRAPHICAL SKETCH - LEDERBERG, Joshua

RESEARCH SUPPORT

PERSONAL RESEARCH COMMITMENTS:

Grant No.: CA16896
Title of Project: Genetics of Bacteria
Funding, Current Year: $80,000 (5/77-4/78, pending)
Funding, Project Period: $464,669 (5/77-4/82)
% of Effort: 15
Grant Agency: NIH

Grant No.: NAS1-9692
Title of Project: Viking Mission participation
Funding, Current Year: $20,000 (5/77-9/78)
Funding, Project Period: $82,572 (4/70-9/78) (incl. Indirect Costs)
% of Effort: 10
Grant Agency: NASA

PRINCIPAL INVESTIGATOR EX OFFICIO:

Grant No.: GM00295
Title of Project: Genetics Training Grant (graduate research training)
Funding, Current Year: $111,000 (7/77-6/78)
Funding, Project Period: $536,363 (7/7%-5/79)
% of Effort: 20
Grant Agency: NIH

Grant No.: GM20832
Title of Project: Genetics Research Project
Funding, Current Year: $2656 ,587 (5/77-4/78)
Funding, Project Period: $1,292,113 (5/74-4/79)
% of Effort: 10
Grant Agency: NIH

Grant No.: NGR-05-020-004
Title of Project: Cytochemical Studies of Planetary Microorganisms
Funding: $137,500 (9/76-12/77) (incl. Indirect Costs)
% of Effort: 3
Grant Agency: NASA

SELECTED PUBLICATIONS

1. Lederberg, J.: Topology of Molecules. In The Mathematical Sciences (Ed. Committee on Support of Research in the Mathematical Sciences (COSRIMS) with George A.W. Boehm), MIT Press, p. 37-51, 1969.

2. Lederberg, J., Sutherland, G.L., Buchanan, B.G., Feigenbaum, E.A., Robertson, A.V., Duffield, A.M. and Djerassi, C.: Applications of artificial intelligence for chemical inference. I.
The number of possible organic compounds. Acyclic structures containing C, H, O, and N. J. Am. Chem. Soc. 91:2973-76, May 21, 1969.

3. Buchs, A., Delfino, A.B., Duffield, A.M., Djerassi, C., Buchanan, B.G., Feigenbaum, E.A. and Lederberg, J.: Applications of artificial intelligence for chemical inference. VI. Approach to a general method of interpreting low resolution mass spectra with a computer. Helvetica Chimica Acta 53:1394-1417, 1970.

4. Lederberg, J.: Use of Computer to Identify Unknown Compounds: The Automation of Scientific Inference. In Biochemical Applications of Mass Spectrometry (Ed. G.R. Waller), John Wiley and Sons, New York, p. 193-207, 1972.

5. Lederberg, J.: The freedoms and the control of science - notes from the ivory tower. Southern California Law Review 45:596-614, 1972.

6. Lederberg, J.: The control of chemical and biological weapons. Stanford J. International Studies 7:22-44, 1972.

7. Lederberg, J.: The genetics of human nature. Social Res. 40:375-406, 1973.

8. Lederberg, J.: A System-analytic Viewpoint. In How Safe is Safe? - The Design of Policy on Drugs and Food Additives, National Academy of Sciences, Washington, D.C., p. 66-94, 1974.

9. Masinter, L., Sridharan, N., Lederberg, J. and Smith, D.H.: Applications of artificial intelligence for chemical inference. XII. Exhaustive generation of cyclic and acyclic isomers. J. Am. Chem. Soc. 96:7702-7714, 1974.

10. Harris-Warrick, R.M., Elkana, Y., Ehrlich, S.D. and Lederberg, J.: Electrophoretic separation of B. subtilis genes (EcoRI/agarose gel electrophoresis). Proc. Nat. Acad. Sci. U.S.A. 72:2207-2211, 1975.

11. Carhart, R.E., Johnson, S.M., Smith, D.H., Buchanan, B.G., Dromey, R.G. and Lederberg, J.: Networking and a Collaborative Research Community: A Case Study using the DENDRAL Programs. In Computer Networking and Chemistry (Ed.
Peter Lykos), ACS Symposium Series, No. 19, p. 192-217, 1975.
12. Buchanan, B.G., Smith, D.H., White, W.C., Gritter, R., Feigenbaum, E.A., Lederberg, J. and Djerassi, C.: Applications of artificial intelligence for chemical inference. XXII. Automatic rule formation in mass spectrometry by means of the meta-DENDRAL program. J. Am. Chem. Soc. 98:6168-6178, 1976.
13. Sagan, C. and Lederberg, J.: The prospects for life on Mars: A pre-Viking assessment. Icarus 28:291-300, 1976.
14. Ehrlich, S.D., Bursztyn-Pettegrew, H., Stroynowski, I. and Lederberg, J.: Expression of the thymidylate synthetase gene of the B. subtilis bacteriophage phi-3-T in E. coli. Proc. Nat. Acad. Sci. 73:4145-4149, 1976.
15. Klein, H.P., Lederberg, J., Rich, A., Oyama, V.I. and Levin, G.V.: The Viking Mission search for life on Mars. Nature 262:24-27, 1976.
16. Klein, H.P., Horowitz, N.H., Levin, G.V., Oyama, V.I., Lederberg, J., Rich, A., Hubbard, J.S., Hobby, G.L., Straat, P.A., Berdahl, B.J., Carle, G.C., Brown, F.S. and Johnson, R.D.: The Viking biological investigation: Preliminary results. Science 194:99-105, 1976.
17. Chi, N-Y. W., Ehrlich, S.D. and Lederberg, J.: Functional expression of two Bacillus subtilis chromosomal genes in Escherichia coli. J. Bact., 1977. (In press)

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH

NAME: LEVINTHAL, Elliott C.
TITLE: Adjunct Professor of Genetics; Dir., Instrumentation Res. Lab.
BIRTHDATE: April 13, 1922
PLACE OF BIRTH: Brooklyn, New York, U.S.A.
PRESENT NATIONALITY: U.S.
citizen

EDUCATION
Columbia College, New York: B.A., 1942, Physics
Massachusetts Institute of Technology: M.S., 1943, Physics and Math
Stanford University: Ph.D., 1949, Physics and Math

HONORS
Public Service Medal, awarded by NASA, April, 1977, for exceptional contributions to the success of the Viking project

MAJOR RESEARCH INTEREST: Medical instrumentation research
ROLE IN PROPOSED PROJECT: AIM Liaison

RESEARCH SUPPORT (See continuation page.)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1974 - present  Adjunct Professor, Department of Genetics, Stanford University; Director, Instrumentation Research Laboratory, Department of Genetics, Stanford University
1970 - 1973     Associate Dean for Research Affairs, Stanford University School of Medicine
1961 - 1974     Senior Scientist/Director, Instrumentation Research Laboratories, Department of Genetics, Stanford University
1953 - 1961     President, Levinthal Electronic Products
1952 - 1953     Chief Engineer, Century Electronics
1950 - 1952     Research Director/Member of Board of Directors, Varian Associates
1949 - 1950     Research Physicist, Varian Associates
1946 - 1948     Research Associate, Nuclear Physics, Stanford University
1943 - 1946     Project Engineer, Sperry Gyroscope Company, New York
1943            Teaching Fellow in Physics, Massachusetts Institute of Technology

PUBLICATIONS (See continuation page.)

BIOGRAPHICAL SKETCH - LEVINTHAL, Elliott C.

RESEARCH SUPPORT
NAS1-9682  Viking Mission participation. Current year: $85,000 (11/76-9/77). Project period: $175,552 (11/75-9/78). Effort: 50%. Agency: NASA.
NGR-05-020-004  Cytochemical Studies of Planetary Microorganisms. Funding: $137,500 (9/76-12/77). Effort: 11%. Agency: NASA.
GM20832  Genetics Research Project. Current year: $266,587 (5/77-4/78). Project period: $1,292,113 (5/74-4/79). Effort: 8%. Agency: NIH.
RR-00612  Resource Related Research - Computers and Chemistry (DENDRAL). Current year: $218,580 (5/77-4/78). Project period: $698,399 (5/77-4/80). Effort: 6%. Agency: NIH.

SELECTED PUBLICATIONS AND PAPERS
1. Levinthal, E.C.: Detection of Extraterrestrial Life. Professional and Technical Group of Instrumentation and Measurements of IEEE, April, 1963.
2. Levinthal, E.C.: The Detection of Life within our Planetary System. Presented at WESCON, August, 1963.
3. Levinthal, E.C.: The Biological Exploration of Mars. Presented at the Space Technology Laboratory's Invited Lecture Series, November 6, 1963.
4. Levinthal, E.C.: The Biological Exploration of Mars. Presented at Moffett Field, Fullerton, Los Angeles and San Diego, as part of the University of California Extension Series Lectures - Horizons in Space Biosciences: Exobiology, April 27-30, 1964.
5. Levinthal, E.C., Lederberg, J. and Hundley, L.: Multivator - A Biochemical Laboratory for Martian Experiments. Life Sciences and Space Research II, COSPAR (Committee on Space Research), 1964.
6. Halpern, B., Westley, J.W., Levinthal, E.C. and Lederberg, J.: The Pasteur Probe: An Assay for Molecular Asymmetry. Life Sciences and Space Research, COSPAR (Committee on Space Research), 1966.
7. Levinthal, E.C.: Space Vehicles for Planetary Missions. In Biology and the Exploration of Mars, Nat. Acad. Sci., National Research Council.
8. Levinthal, E.C.: Prospects for Manned Mars Missions. In Biology and the Exploration of Mars, Nat. Acad. Sci., National Research Council.
9. Reynolds, O., Levinthal, E. and Soffen, G.: The Role of the Scientist in Automated Laboratory Systems. AIAA Paper No. 67-632, 1967.
10. Levinthal, E.C., Lederberg, J. and Sagan, C.: Relationship of Planetary Quarantine to Biological Search Strategy. Presented at COSPAR Meeting (Committee on Space Research), London, 1967.
11. Sagan, C., Levinthal, E.C. and Lederberg, J.: Contamination of Mars. Science 159:1191-1196, 1968.
12. Levinthal, E.C.: The Role of Molecular Asymmetry in Planetary Biological Exploration. Presented at Gordon Research Conferences, Nuclear Chemistry Section, 1968.
13. Kriss, J.P., Bonner, W.A. and Levinthal, E.C.: Variable Time-Lapse Videoscintiscope: A Modification of the Scintillation Camera Designed for Rapid Flow Studies. J. Nuclear Med. 10:249, 1969.
14. Reynolds, W.E., Bacon, V.A., Bridges, J.C., Coburn, T.C., Halpern, B., Lederberg, J., Levinthal, E.C. and Steed, E.: A Computer Operated Mass Spectrometer System. Anal. Chem. 42:1122, 1970.
15. Masursky, H., Batson, R., Borgeson, W., Carr, M., McCauley, J., Milton, D., Wildey, R. and Wilhelms, D., Murray, B., Horowitz, N., Leighton, R. and Sharp, R., Thompson, W., Briggs, G., Chandeysson, P. and Shipley, E., Sagan, C. and Pollack, J., Lederberg, J., Levinthal, E., Hartmann, W., McCord, T., Smith, B., Davies, M., de Vaucouleurs, G., Leovy, C.: Television Experiment for Mariner Mars 1971. Icarus 12:10-45, 1970.
16. Masursky, H., Batson, R.M., McCauley, J.F., Soderblom, L.A., Wildey, R.L., Carr, M.H., Milton, D.J., Wilhelms, D.E., Smith, B.A., Kirby, T.B., Robinson, J.C., Leovy, C.B., Briggs, G.A., Duxbury, T.C., Acton, C.H., Jr., Murray, B.C., Cutts, J.A., Sharp, R.P., Smith, Susan, Leighton, R.B., Sagan, C., Veverka, J., Noland, M., Lederberg, J., Levinthal, E., Pollack, J.B., Moore, J.T., Jr., Hartmann, W.K., Shipley, E.N., de Vaucouleurs, G., Davies, M.E.: Mariner 9 Television Reconnaissance of Mars and Its Satellites: Preliminary Results. Science 175(4019):294, 1972.
17. Mutch, T.A., Binder, A.B., Huck, F.O., Levinthal, E.C., Morris, E.C., Sagan, Carl and Young, A.T.: Imaging Experiment. Icarus 16:92, 1972.
18. Sagan, Carl, Veverka, Joseph, Fox, Paul, Dubisch, Russel, Lederberg, Joshua, Levinthal, Elliott, Quam, Lynn, Tucker, Robert, Pollack, James B. and Smith, Bradford A.: Variable Features on Mars: Preliminary Mariner 9 Television Results. Icarus 17:346, 1972.
19. Levinthal, E.C., Green, W.B., Cutts, J.A., Janelka, E.D., Johnsen, R.A., Sander, M.J., Seidman, J.B., Young, A.T. and Soderblom, L.A.: Mariner 9 - Image Processing and Products. Icarus 18:1088, 1973.
20. Sagan, C., Veverka, J., Fox, P., Dubisch, R., French, R., Gierasch, P., Quam, L., Lederberg, J., Levinthal, E., Tucker, R., Eross, B. and Pollack, J.B.: Variable Features on Mars, 2, Mariner 9 Global Results. J. Geophysical Research 78, No. 20, p. 4163-4196, 1973.
21. Lederberg, J., Feigenbaum, E., Levinthal, E. and Rindfleisch, T.: SUMEX - A Resource for Application of Artificial Intelligence in Medicine. Proc. Ann. Conference, Association for Computing Machinery, November, 1974.
22. Levinthal, E.C., Carhart, R.E., Johnson, S.M. and Lederberg, J.: When Computers Talk to Each Other. Industrial Research 17(12):35-42, 1975.
23. Mutch, T.A., Binder, A.B., Huck, F.O., Levinthal, E.C., Liebes, S., Morris, E.C., Patterson, W.R., Pollack, J.B., Sagan, C. and Taylor, G.R.: The Surface of Mars: The View from the Viking I Lander. Science 193(4255):791-801, 1976.
24. Mutch, T.A., Arvidson, R.E., Binder, A.B., Huck, F.O., Levinthal, E.C., Liebes, S., Jr., Morris, E.C., Nummedal, D., Pollack, J.B. and Sagan, C.: Fine Particles on Mars: Observations with the Viking I Lander Cameras. Science 194(4260):87-91, 1976.
25. Levinthal, E.C. and Huck, F.O.: Multispectral and Stereo Imaging on Mars. In Astronautical Research 1976 - A New Era of Space Transportation, Pergamon Press, 1976. Proc.
of the XXVII International Astronautical Congress, Anaheim, California, 1976.
26. Mutch, T.A., Arvidson, R.E., Aurin, P., Binder, A.B., Huck, F.O., Levinthal, E.C., Liebes, S., Jr., Morris, E.C., Pollack, J.B., Sagan, C. and Saunders, R.: The Surface of Mars: The View from Lander 2. Science 194(4271):1277-1283, 1976.

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH

NAME: NII, H. Penny
TITLE: Scientific Programmer
BIRTHDATE: October 6, 1939
PLACE OF BIRTH: Tokyo, Japan
PRESENT NATIONALITY: U.S. citizen

EDUCATION
Tufts University, Jackson College, Medford, Massachusetts: B.S., 1962, Mathematics
Stanford University: M.A., 1973, Computer Science

HONORS

MAJOR RESEARCH INTEREST: Knowledge-based computer systems design
ROLE IN PROPOSED PROJECT: Scientific Programmer - AI tool generalization

RESEARCH SUPPORT (See continuation page.)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1976 - present  Scientific Programmer, Heuristic Programming Project, Department of Computer Science, Stanford University
1973 - 1975     Associate Investigator for Computer Science, HASP Project, Systems Control, Inc., Palo Alto, California
1967 - 1968     Systems Engineering Advisor, International Business Machines World Trade Asia Corporation, Tokyo, Japan
1962 - 1967     Research Staff Programmer, International Business Machines Corporation, Thomas J.
Watson Research Center,
    1965-67  Project Leader, Electronic Coding Pad (ECP) System
    1965-66  Assistant Manager, Man-Computer Interaction Group
    1963-64  Programmer, World's Fair Lexical Processing System
    1962-63  Programmer, applications ranging from text processing to linear programming problems

RECENT PUBLICATIONS (See continuation page.)

BIOGRAPHICAL SKETCH - NII, H. Penny

RESEARCH SUPPORT
DAHC-15-73-C-0435  Heuristic Programming Project. Agency: ARPA (incl. Indirect Costs).
    Current: $225,762 (7/76-7/77). Project period: $ -- (7/73-7/77). Effort: 100%.
    Proposed renewal: $375,000 (8/77-9/78). Project period: $725,000 (8/77-9/79). Effort: 80%.
MCS 74-23461  Automation of Scientific Inference: Heuristic Computing Applied to Protein Crystallography. Current year: $75,000 (5/77-4/78). Project period: $150,200 (5/77-4/79 + 6 mos.). Effort: 20% (eff. 6/77). Agency: NSF (incl. Indirect Costs).

RECENT PUBLICATIONS
1. Feigenbaum, E.A., Nii, H.P., et al.: HASP (Heuristic Adaptive Surveillance Program) Final Report, Vol. I-IV, Technical Report under ARPA Contract M66314-74-C-1235, Systems Control, Inc., Palo Alto, California, 1975. (Classified document)
2. Engelmore, R.A. and Nii, H.P.: A Knowledge-based System for the Interpretation of Protein X-ray Crystallographic Data. Heuristic Programming Project Memo, HPP-77-2 (also STAN-CS-77-589), January, 1977.
3. Nii, H.P. and Feigenbaum, E.A.: Knowledge-based Understanding of Signals. Proc. Workshop on Pattern-Directed Inference Systems, May, 1977.

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH

NAME: RINDFLEISCH, Thomas C.
TITLE: Senior Research Associate
BIRTHDATE: December 10, 1941
PLACE OF BIRTH: Oshkosh, Wisconsin, U.S.A.
PRESENT NATIONALITY: U.S. citizen

EDUCATION
Purdue University, Lafayette, Indiana: B.S., 1962, Physics
California Institute of Technology, Pasadena: M.S., 1965, Physics
Ph.D. Thesis to be completed; all course work and examinations completed.

HONORS
Graduated with Highest Honors, Purdue University
NSF Fellowship, Caltech
Sigma Xi

MAJOR RESEARCH INTEREST: Computer science applications in medical research; image processing and artificial intelligence
ROLE IN PROPOSED PROJECT: Facility Manager

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
Department of Genetics, Stanford University School of Medicine:
1976 - present  Senior Research Associate/Director, SUMEX Computer Project
1974 - 1976     Research Associate/Director, SUMEX Computer Project
1971 - 1976     Research Associate - Mass Spectrometry, Instrumentation Research
Jet Propulsion Laboratory, California Institute of Technology, Pasadena:
1969 - 1971     Supervisor of Image Processing Development and Applications Group
1968 - 1969     Mariner Mars 1969 Cognizant Engineer for Image Processing
1962 - 1968     Engineer, design and implement image processing computer software

PUBLICATIONS (See continuation page.)

BIOGRAPHICAL SKETCH - RINDFLEISCH, Thomas C.

PUBLICATIONS
1. Rindfleisch, T. and Willingham, D.: A Figure of Merit Measuring Picture Resolution.
JPL Technical Report 32-666, September, 1965.
2. Rindfleisch, T.: A Photometric Method for Deriving Lunar Topographic Information. JPL Technical Report 32-786, September, 1965.
3. Rindfleisch, T. and Willingham, D.: A Figure of Merit Measuring Picture Resolution. Advances in Electronics and Electron Physics, Vol. 22A, Photo-Electronic Image Devices, Academic Press, 1966.
4. Rindfleisch, T.: Photometric Method for Lunar Topography. Photogrammetric Engineering, March, 1966.
5. Rindfleisch, T.: Generalizations and Limitations of Photoclinometry. JPL Space Science Summary, Vol. III, 1967.
6. Rindfleisch, T.: The Digital Removal of Noise from Imagery. JPL Space Science Summary 37-62, Vol. III, 1970.
7. Rindfleisch, T.: Digital Image Processing for the Rectification of Television Camera Distortions. Astronomical Use of Television-Type Image Sensors. NASA Special Publication SP-256, 1971.
8. Rindfleisch, T., Dunne, J., Frieden, H., Stromberg, W. and Ruiz, R.: Digital Processing of the Mariner 6 and 7 Pictures. J. Geophysical Research, Vol. 76, No. 2, January, 1971.
9. Pereira, W.E., Summons, R.E., Reynolds, W.E., Rindfleisch, T.C. and Duffield, A.M.: The Quantitation of Beta-Aminoisobutyric Acid in Urine by Mass Fragmentography. Clinica Chimica Acta, 49, 1973.
10. Summons, R.E., Pereira, W.E., Reynolds, W.E., Rindfleisch, T.C. and Duffield, A.M.: Analysis of Twelve Amino Acids in Biological Fluids by Mass Fragmentography. Analytical Chemistry, Vol. 46, No. 4, April, 1974.
11. Pereira, W.E., Summons, R.E., Rindfleisch, T.C. and Duffield, A.M.: The Determination of Ethanol in Blood and Urine by Mass Fragmentography. Clin. Chim. Acta, 51, 1974.
12. Pereira, W.E., Summons, R.E., Rindfleisch, T.C., Duffield, A.M., Zeitman, B. and Lawless, J.G.: Stable Isotope Mass Fragmentography: Quantitation and Hydrogen-Deuterium Exchange Studies of Eight Murchison Meteorite Amino Acids. Geochim. et Cosmochim. Acta, 39, 153, 1975.
13. Dromey, R.G., Stefik, M.J., Rindfleisch, T.C. and Duffield, A.M.:
Extraction of Mass Spectra Free of Background and Neighboring Component Contributions from Gas Chromatography/Mass Spectrometry Data. Analytical Chemistry, 48, 1358, 1976.
14. Smith, D.H., Yeager, W.J., Anderson, P.J., Fitch, W., Rindfleisch, T.C. and Achenbach, M.: Historical Library Search. An Approach to Quantitative Comparison of GC/MS Profiles of Complex Mixtures. (Submitted for publication)

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH

NAME: SCHULZ, Rainer W.
TITLE: Computer Systems Specialist
BIRTHDATE: January 29, 1942
PLACE OF BIRTH: Berlin, Germany
PRESENT NATIONALITY: U.S. citizen

EDUCATION
California State University, San Jose: B.A., 1964, Mathematics, Engineering

HONORS
Graduated Summa Cum Laude, California State University

MAJOR RESEARCH INTEREST: Computer systems design
ROLE IN PROPOSED PROJECT: System Programmer

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (See continuation page.)

PUBLICATIONS (none)

BIOGRAPHICAL SKETCH - SCHULZ, Rainer W.

RESEARCH AND/OR PROFESSIONAL EXPERIENCE

Work Experience:
1971 - present  Institute for Mathematical Studies in the Social Sciences (IMSSS), Stanford University: System Manager.
Responsible for operations of large-scale PDP-10 timesharing system. Manager, system software. Technical evaluation responsibility of software and computer hardware. System design and systems development.
1970 - 1971  Computer Operations, Inc., Costa Mesa, California: Design of operating system for computer to be built by COI.
1969 - 1970  Berkeley Computer Corporation, Berkeley, California: Project leader of BCC timesharing software. Guided monitor and peripheral processor software design and implementation. Coded approximately 50% of basic system. Wrote some microcode for peripheral processors.
1967 - 1969  Scientific Control Corporation, Dallas, Texas: Assisted Project Genie at the University of California, Berkeley, refining XDS 940 timesharing system. Involved in design of SCC 6700 timesharing software and hardware, particularly resource allocation and memory management.
1965 - 1967  Xerox Data Systems, El Segundo, California: Diagnostic programming for I/O channels. Design of peripheral hardware simulators. Design/implementation of multi-programmed system evaluation and diagnostic test for all Sigma computers.
1964 - 1965  IBM, San Jose, California: Wrote an assembler and loader for IBM 1800 and 1130 systems. Assembler ran on a 1401. Wrote diagnostic programs for process control equipment. Assisted engineering in debugging prototype 1800 and 1130 machines.

Professional Activities:
1975            Intel Corporation, Santa Clara, California: Data processing administrative consultant. System performance and hardware evaluation. System improvement proposals.
1974            Systems Control, Inc., Palo Alto, California: Secure system design. Consultant in system design and computer system evaluation.
1974 - 1975     University of Southern California (USC-ECL, USC-ISI), Los Angeles: Consultant in system and administrative area regarding computer operations and system development.
1974 - 1975     Digital Equipment Corporation, Marlboro, Massachusetts: Consultant in system development area and marketing decisions for large-scale systems.
1973 - present  National Science Foundation, Washington, D.C.: Consultant in technological innovations. Evaluating proposals for technical feasibility. Reviewing highly technical projects in computer science area.
1973 - present  Computer Curriculum Corporation, Palo Alto, California: System consultant and software management of programming staff for small computer systems.
1973 - 1974     University of Hawaii, Honolulu: Lecturer in Computer System Design and Computer-Assisted Instruction.
1971 - 1976     Ames Research Center, Mountain View, California: Consultant in System Design and Development of timesharing systems for the ILLIAC IV Project.
1971 - 1973     Institute for the Future, Menlo Park, California: Consultant in Computer System Design for Information Retrieval Systems.

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH

NAME: SWEER, Andrew J.
TITLE: System Programmer
BIRTHDATE: March 12, 1945
PLACE OF BIRTH: Washington, D.C., U.S.A.
PRESENT NATIONALITY: U.S. citizen

EDUCATION
University of Pittsburgh, Pennsylvania: B.S.,
1965, Mathematics
University of Pittsburgh, graduate school (1965-66): no degree, Mathematics, Computer Science

HONORS

MAJOR RESEARCH INTEREST: Operating systems
ROLE IN PROPOSED PROJECT: System Programmer

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1976 - present  Head System Programmer, SUMEX Computer Project, Department of Genetics, Stanford University
1974 - 1975     Senior Systems Designer, ILLIAC IV Project, Evans and Sutherland
1970 - 1974     Systems Analyst Supervisor, Computer Center, University of Pittsburgh
1968 - 1969     Computer Specialist, Office of Personnel Operations, Department of the Army, Headquarters, the Pentagon
1966 - 1968     Systems Programmer/Analyst, Computer Center, University of Pittsburgh

PUBLICATIONS (none)

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH

NAME: VEIZADES, Nicholas
TITLE: R&D Engineer, Instrumentation Research Labs.
BIRTHDATE: August 25, 1932
PLACE OF BIRTH: Larissa, Greece
PRESENT NATIONALITY: U.S. citizen

EDUCATION
City College of San Francisco, California (1954-55)
University of California, Berkeley: B.S., 1958, Electrical Engineering
Stanford University: M.S.,
1961, Engineering Science

HONORS

MAJOR RESEARCH INTEREST: Electronic circuit design
ROLE IN PROPOSED PROJECT: Electronics Engineer

RESEARCH SUPPORT (See continuation page.)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1962 - present  Electronics Engineer, Instrumentation Research Laboratories, Department of Genetics, Stanford University
1961 - 1962     Project Engineer, Fairchild Semiconductor (Instrumentation), Division of Fairchild Instrument and Camera Company, Palo Alto, Ca.
1958 - 1961     Senior Engineer, Link Division, General Precision, Inc., Palo Alto, Ca.

PUBLICATIONS (none)

BIOGRAPHICAL SKETCH - VEIZADES, Nicholas

RESEARCH SUPPORT
RR-00612  Resource Related Research - Computers and Chemistry (DENDRAL). Current year: $213,530 (5/77-4/78). Project period: $698,399 (5/77-4/80). Effort: 25%. Agency: NIH.
GM20832  Genetics Research Project. Current year: $265,587 (5/77-4/78). Project period: $1,292,113 (5/74-4/79). Effort: 18%. Agency: NIH.
NGR-05-020-004  Cytochemical Studies of Planetary Microorganisms. Funding: $137,500 (9/76-12/77). Effort: 1%. Agency: NASA.

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH

NAME: WILCOX, Clark R.
TITLE: Student Research Assistant
BIRTHDATE: May 3, 1948
PLACE OF BIRTH: Winston-Salem, North Carolina
PRESENT NATIONALITY: U.S.
citizen

EDUCATION
Duke University, Durham, North Carolina: B.S., 1970, Mathematics
Stanford University: M.S., 1973, Computer Science
Stanford University (1973-present): Ph.D. (in progress), Computer Science

HONORS
Phi Beta Kappa, Duke University
Graduated Magna Cum Laude, Duke University

MAJOR RESEARCH INTEREST: Software portability
ROLE IN PROPOSED PROJECT: System Programmer

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1974 - present  Student Research Assistant (MAINSAIL design/implementation), SUMEX Computer Project, Department of Genetics, Stanford University
1970 - present  Ph.D. Candidate, Department of Computer Science, Stanford University:
    1973-present  Research in software portability and directly executable languages under Dr. Michael Flynn
    1972-73       Research in complexity theory under Dr. Robert Floyd
1969 - 1970     Undergraduate student, Duke University:
    1969-70  Research in symbolic computation under Dr. Robert Caviness, Math.
    1969-70  Design/implementation of medical information system under Dr. William Hammond, Medicine
    1969     Programmer, Computer Center

PUBLICATIONS
Wilcox, C.R.: MAINSAIL - A Machine Independent Programming System. Proc. Digital Equipment Computer Users Society (DECUS), 2(4):975-979, Spring, 1976.

COLLABORATIVE PROJECTS

6 COLLABORATIVE PROJECT PROGRESS AND OBJECTIVES

The following subsections report on the collaborative use of the SUMEX facility, including the formally authorized projects within the Stanford and AIM aliquots and the various "pilot" efforts currently under way.
These project descriptions and comments are the result of a solicitation for contributions sent to each of the project Principal Investigators requesting the following information:

I) Summary of research program
   A) Technical goals
   B) Medical relevance and collaboration
   C) Progress summary
   D) Up-to-date list of publications
   E) Funding status
      1) Current funding
      2) Pending applications and renewals

II) Interactions with the SUMEX-AIM resource
   A) Examples of collaborations and medical use of programs via SUMEX
   B) Examples of sharing, contacts and cross-fertilization with other SUMEX-AIM projects (via workshops, system facilities, personal contact, etc.)
   C) Critique of resource services

III) Follow-on SUMEX grant period (8/78 - 7/83)
   A) Long-range user project goals and plans
   B) Justification for continued use of SUMEX by your project
   C) Comments and suggestions for future resource goals, development efforts, etc.

We believe that the reports of the individual projects speak for themselves as rationales for participation; in any case the reports are recorded as submitted and are the responsibility of the indicated project leaders.

6.1 STANFORD PROJECTS

The following group of projects is formally approved for access to the Stanford aliquot of the SUMEX-AIM resource. Their access is based on review by the Stanford Advisory Group and approval by Professor Lederberg as Principal Investigator. As noted previously, the DENDRAL project was the historical core application of SUMEX. Although this is described as a "Stanford project," a significant part of the development effort and of the computer usage is dedicated to national collaborator-users of the DENDRAL programs.

6.1.1 DENDRAL PROJECT

DENDRAL - Resource Related Research - Computers & Chemistry

Carl Djerassi, Principal Investigator
Professor of Chemistry
Stanford University

I. OVERVIEW OF RESEARCH ACTIVITIES

Technical Goals

Our research, development and future plans focus on both the question of structure elucidation in general and the problem of providing computer assistance to scientists engaged in specific aspects of this important activity. A simplified representation of major milestones in solving unknown biomolecular structures by manual methods is presented in Figure 1.

[Figure 1: flowchart. Legible box labels include: unknown compound; spectroscopy; spectra; chemical/biological/physical history; other data; structural inferences and constraints; known compounds; structure interpretation and assembly; candidate structures; eliminate inconsistent structures; new structural inferences and constraints; more spectroscopy; plan and examine experiments; unique structural features; chemical transforms; reaction sequences; final structures.]

Figure 1. Important steps in manual solution of structures of unknown chemical compounds. These steps, indicated as separate boxes, may be performed explicitly or implicitly.

When structures are actually solved, the relationships among the boxes of Fig. 1 are considerably more complex than indicated. Nevertheless, the Figure provides a good introduction to both our recent work and our future directions. We describe briefly each of the milestones in the following paragraphs. More detailed discussions of each topic follow in subsequent sections.

The first step in identification of an unknown structure is to separate it from other components in a potentially complex mixture and to isolate it in reasonably pure form. These steps are performed by scientists, frequently with the assistance of various instruments.
Although our research is not directed toward any part of this separation and isolation procedure (except insofar as these procedures also yield data which are subject to computer-assisted interpretation), information about the chemical and physical characteristics of the compound may be crucial to further efforts to determine its structure.

Depending on the quantity of sample available and its characteristics, various spectroscopic and additional chemical data are then collected on the unknown. A mass spectrum is frequently obtained, e.g., from a combined gas chromatograph/mass spectrometer (GC/MS) system. An important part of our recent proposal to the NIH is directed toward automation of combined GC/MS systems operated at high mass spectrometer resolving powers. Data on elemental compositions and relative ion abundances are then available in computer-readable form for further analysis (see MSRANK). The chemist possesses an armamentarium of spectroscopic techniques which can be brought to bear on a structure. One advantage of our work is that any data so obtained can be used to help solve the structure as long as it can be expressed, manually or by computer, in substructural statements about the unknown.

The next important phase in structure elucidation is interpretation of the available data (Fig. 1) in terms of structural features of the molecule. These interpretations may be in terms of known structural units ("superatoms", polyatomic aggregates of atoms in known configurations), or in terms of structural units, ring sizes, proton or carbon distributions. The latter set of features represents constraints on the kinds of structures which are possible.

Our efforts in the area of computer-assisted data interpretation are focussed on mass spectral and carbon-13 nuclear magnetic resonance (13CMR) data. We are developing general approaches to automated analysis of these data in terms of structural features of unknowns.
Our recent efforts are summarized in Figure 2, and discussed in detail subsequently. We have been concerned with use of these data from two points of view, planning and prediction (Fig. 2). During planning, experimental data are examined in order to extract specific structural information to be used in assembling candidate structures. In prediction, each candidate structure is tested to determine how closely its predicted spectrum agrees with the observed spectrum. The candidates can be ranked accordingly. The Meta-DENDRAL research is directed toward determination of rules of spectroscopic data which can be used either for planning or prediction (see below).

DATA INTERPRETATION

PLANNING: EXTRACTION OF STRUCTURAL INFORMATION DIRECTLY FROM SPECTROSCOPIC DATA.
   1. Mass Spectra - MDGGEN
   2. 13CNMR

PREDICTION: USE OF SPECTROSCOPIC DATA TO RANK CANDIDATE STRUCTURES.
   1. MSPRUNE, MSPRED
   2. 13CNMR

Meta-DENDRAL: FORMATION OF RULES TO BE USED FOR BOTH PLANNING AND PREDICTION.

Figure 2. Relationship between use of rules in either planning or prediction. Both approaches are used in utilizing data for structure elucidation.

Given possible structural fragments of the complete molecule and constraints on how these fragments may be assembled into complete molecules, a process of structural assembly follows (Fig. 1). No proven algorithm for solving this problem existed before earlier work supported by the current grant. Traditionally, this process has been left to manual, pencil and paper work. Our CONGEN program, which was designed to solve this problem, is the farthest advanced of programs designed to assist in various aspects of structure elucidation. It performs the structural assembly process, under constraints, and allows the scientist using the program to examine structural candidates and remove those deemed implausible (Fig.
1). A large portion of our recent and future work is directed toward improving the CONGEN program and building other facilities around it (see later sections). We have demonstrated the utility of CONGEN in structural studies, and subsequent sections discuss our recent developments and applications of CONGEN as well as our interactions with other scientists desiring access to our programs.

Given a set of structural candidates, the experimenter examines them to determine what experiments might be performed to focus on the correct structure by stepwise rejection of alternative hypotheses. When there are only a small number of possibilities under consideration, manual methods suffice. But CONGEN provides the capability for exhaustive enumeration of structural possibilities at a point in a structural problem when there may be many hundreds of possibilities. It is very difficult to examine that many structures and plan experiments by hand. We have begun exploring ways to provide computer assistance to this important aspect of structure elucidation. We refer to this research area as the Experiment Planner, discussed in more detail below.

When new experiments have been planned, the researcher carries them out and uses the results as additional constraints on the structural candidates (Fig. 1). New experiments may include collection of additional spectroscopic data or performing a sequence of chemical reactions on the unknown. The latter experiments may be chosen to convert the unknown into a related compound which possesses physical or chemical properties more amenable to analysis. During the past year we have developed a program to assist scientists in carrying out representations of chemical reactions in the computer and eliminating undesired structural candidates based on constraints exercised on the products of the reaction. This work is described in two subsequent sections.
One section describes use of the program, which we call REACT, to explore structural possibilities exactly as outlined above. A later section describes recent progress in increasing the power of REACT.

Medical Relevance

Structure elucidation is a fundamental problem for medical practice and biomedical research. For example, we are collaborating with physicians in the Department of Pediatrics who monitor the body fluids of newborn infants in order to detect abnormal compounds. Much of the research leading to new drugs and new methods for synthesizing drugs also depends on careful analysis and identification of molecular structures of compounds. The computer tools that we are developing will aid in the determination of molecular structures by giving working scientists help with data collection, data interpretation, hypothesis testing and, most important, systematic consideration of all molecular structures that are consistent with the interpretations of the available data.

PROGRESS SUMMARY

Experiment Planner

We have begun preliminary considerations of design and implementation of an experiment planner. This program will assist chemists in designing the most effective set of experiments to perform to solve the structure. Although the experiment planner will be a future activity of our group, we are developing and using other structure manipulation functions which will provide groundwork for future developments.

One important aspect of experiment planning is the ability to examine in some way the set of candidate structures. Although many can be drawn for visual review, drawing is impractical when dozens or hundreds of structures are involved. To assist persons using CONGEN in reviewing their structures we have developed a function auxiliary to CONGEN which we call SURVEY.
SURVEY FUNCTION: AIDS IN PERCEPTION OF ANY OF A PRE-SPECIFIED SET OF STRUCTURAL FEATURES IN A GROUP OF STRUCTURAL CANDIDATES, E.G.
   A) FUNCTIONAL GROUPS
   B) TERPENOID SKELETONS
   C) AMINO ACID SKELETONS

Figure 3. Function of the SURVEY program and examples of recent application areas.

The function of SURVEY is summarized in Figure 3. SURVEY simply acts as a reminder to the scientist of the presence or absence of certain structures or structural features. During the past year we have used SURVEY extensively. For example, we have used it to detect implausible functional groups in a set of candidate structures, using a file of substructures representing a wide variety of functionalities. In many problems, implausible functional groups are forgotten and CONGEN is never constrained to remove them.

Another example of use of SURVEY is in conjunction with collaborative work with persons in the Department of Genetics. In an analysis of serum or urinary metabolites in patients at high risk of metabolic disorder, we have had occasion to use CONGEN in exploration of unknown structures [Report HPP-77-11]. Some of these structures could formally be conjugates of amino acids with organic acids. If so, such structures will possess backbones of naturally-occurring amino acids. SURVEY was used to provide a summary of which structural candidates possessed such amino acid skeletons.

We have recently used SURVEY in a related application involving the structure of "polyalthenol", discussed by LeBoeuf, et al. (Figure 4). Superatoms and constraints supplied to CONGEN to derive structural candidates are summarized in Fig. 4. We summarize in Figure 5 the structural possibilities which resulted. There are five structures possessing a bicyclo[2.1.1] system, and six which possess a bicyclo[4.3.1] system (Fig. 5, top). These structures are energetically less favorable.
For example, several possess a double bond at a bridgehead atom, which violates Bredt’s Rule. There remain, however, 11 structures which are not formally excluded by data presented by LeBoeuf, et al. Because these workers based their structural assignment on biogenetic grounds, we used SURVEY and REACT to test their hypothesis. We have, in computer-accessible libraries, known terpenoid ring systems which can be used within SURVEY to test sets of structures for known skeletons. None of the 22 structural candidates possesses a previously known skeleton. Because the authors postulated a relationship to a known skeleton via a single methyl shift, we used REACT to exercise a single methyl shift in all possible ways on each of the 22 candidates. SURVEY was then used to test the results for the presence of known terpenoid systems, and the drimane skeleton, the postulated precursor of polyalthenol, was the only known skeleton which resulted. This does not prove the hypothesis of LeBoeuf, et al., but certainly helps strengthen it.

SURVEY is, however, only the barest beginning of an experiment planner, even though it has proven useful. We plan to build from this beginning toward a much more powerful system.

M. LeBoeuf, M. Hamonniere, A. Cave, H. Gottleib, N. Kunesch, and E. Wenkert, Tet. Lett., 3559 (1976).

[Figure 4: the superatom structure drawings for "POLYALTHENOL" are not recoverable from the reproduction. The legible text is:]

SUPERATOMS (ARBITRARY NAME, NUMBER): structure drawings with free valences (FV), including BI (1) and ME (1)

CONSTRAINTS
   1) ALL FREE VALENCES BONDED TO NON-HYDROGEN ATOMS
   2) GOODLIST
         IN-CH2-BI, 1 TO ANY (EVENTUALLY [illegible])
         ME-(BI CH), 1 TO ANY (EVENTUALLY [illegible], EXACTLY 1)
   3) GOODRINGS 2 EXACTLY 5
   4) BADRINGS 3

Figure 4. Superatoms and constraints supplied to CONGEN in investigations of plausible structural alternatives to the proposed structure of Polyalthenol.
[Figure 5: structure drawings of the candidates, not recoverable from the reproduction.]

Figure 5. Structural candidates for polyalthenol based on data given in Figure 4.

REACTION CHEMISTRY DEVELOPMENTS
   1. SEPARATION FROM CONGEN - COMMUNICATION VIA FILES OF STRUCTURES.
   2. ADDING CONSTRAINTS - SITE- AND TRANSFORM-SPECIFIC.
   3. CONTROL STRUCTURE - RAMIFICATION
      A. ESTABLISH RELATIONSHIPS AMONG PRODUCTS AND REACTANTS
      B. DEAL PROPERLY WITH RANGES OF NUMBERS OF PRODUCTS
   4. INTERACTION - DEVELOP MANIPULATION COMMANDS WHICH PARALLEL LABORATORY OPERATIONS, E.G. SEPARATE INTO FLASKS, TEST CONTENTS OF VARIOUS FLASKS, INCOMPLETE SEPARATIONS, ETC.
   5. REPRESENTATION OF REACTIONS
   6. PROSPECTIVE DETECTION OF DUPLICATE PRODUCTS BASED ON SYMMETRY PROPERTIES OF: A) STARTING MATERIAL; AND B) TRANSFORMATION.

Figure 6. Current and future direction for improvement and extension of REACT, a program for exploration of applications of reaction chemistry to structure elucidation problems.

Applications of REACT to Structure Elucidation Problems

We have recently described our initial efforts toward representation of chemical reactions and their use in structure elucidation problems [Report HPP-76-5]. These efforts provided the framework for carrying out reactions within the computer which emulate actual laboratory reactions performed on an unknown. Constraints on the numbers and identities of the products are used to constrain the reaction products and, implicitly, the starting materials. Based on the results of that work we drew up a set of steps to be carried out to provide a truly useful tool for the chemist. Although the current program can be used in applications to real problems, it has some fundamental limitations which we have been working to solve.
The developments we have undertaken to improve REACT are summarized in Figure 6. We first undertook to separate REACT from CONGEN, for two reasons. One reason was program size. Many functions of CONGEN are not needed when only REACT is being exercised. The procedures of structure generation (CONGEN) and REACT are sequential, so a separate program introduces no problems. A second reason was the different uses of certain CONGEN functions in REACT. For example, the graph matcher is used differently in the two programs, so keeping the programs together meant keeping two different versions of it. The separation has been accomplished. The current version of REACT is now a separate program. It communicates structural information with CONGEN via files. All interactive portions are consistent with the structural manipulation functions of CONGEN so that learning the structural language of CONGEN is sufficient to use either program.

We have also added new constraint types to the reaction to expand greatly the ways in which reactions can be defined and constrained. An example of new extensions to reaction definitions illustrates some of the new features (Figures 7-10). The reaction defined here is one which will perform a dehydration of an alcohol; the site of the reaction is defined in Fig. 7. The transform is defined as cleavage and loss of the oxygen, resulting in formation of a double bond between the two carbon atoms of the original site (Fig. 7). In this particular dehydration the chemist wished to specify a site-specific constraint. It was known that a tertiary butyl group was part of the structure, and the dehydration will be prevented if that group is in close proximity to the reaction site (i.e., in a position alpha to the carbinol carbon). The definition of this constraint is given in Figure 8.
Subsequently, this constraint ("HINDERED") is placed on BADLIST for constraints specific to the site as shown in Fig. 9. The completed definition of the reaction is summarized in Figure 10.

    :EDITREACT
    NAME : DEHYDRATION
    (NEW REACTION)
    >ATNAME 1 O
    >HRANGE 1 1 1 3 1 3
    >ADRAW
    DEHYDRATION: (HRANGES NOT INDICATED)
    O-C-C
    >DONE
    *TRANSFORM
    >UNJOIN 1 2
    >JOIN 2 3
    >DELATS 1
    >ADRAW
    DEHYDRATION: (HRANGES NOT INDICATED)
    C=C
    >DONE

Figure 7. Definition of reaction site and chemical transform in REACT.

    *DEFINE-CONSTRAINTS
    :?
    PLEASE ENTER ONE OF:
    GRIPE
    BUGOUT
    GENERAL (G)
    SITESPECIFIC (S)
    TRANSFORMSPECIFIC (T)
    DONE
    HALT
    :S
    SITESPECIFIC
    NAME: HINDERED
    (NEW CONSTRAINT)
    (WARNING: THE FINAL CONSTRAINTS MUST HAVE AT LEAST ONE ATOM OF THE SITE)
    >NDRAW
    HINDERED: (HRANGES NOT INDICATED)
    NON-C ATOMS: 1 O
    1-2-3
    >BRANCH 3 2 4 1 4 1
    >ADRAW
    HINDERED: (HRANGES NOT INDICATED)
          C
          |
    O-C-C-C-C
          |
          C
    >DONE

Figure 8. Definition of a site-specific constraint to be applied to the reaction DEHYDRATION.

    *CONSTRAINTS
    :?
    PLEASE ENTER ONE OF:
    GRIPE
    BUGOUT
    ST FOR CONSTRAINTS ON STARTING MATERIAL
    S FOR SITESPECIFIC CONSTRAINTS
    T FOR TRANSFORMSPECIFIC CONSTRAINTS
    PR FOR CONSTRAINTS ON PRODUCTS
    DONE
    HALT
    :S
    >BADLIST
    BADLIST CONSTRAINTS
    CONSTRAINT NAME:HINDERED
    CONSTRAINT NAME:

Figure 9. Specification of constraint named HINDERED as a BADLIST constraint for the reaction.
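The reaction built up in Figures 7 to 9 can be pictured with a small sketch. This is not the REACT implementation; it is a minimal illustration in Python, assuming a hydrogen-suppressed graph in which hydrogens are inferred from standard valences. All names (`find_sites`, `hindered`, `dehydrate`, `apply_dehydration`, `make`) are hypothetical, and the HINDERED test encodes one reading of Figure 8: a tertiary butyl group attached to site atom 3.

```python
from copy import deepcopy

VALENCE = {"C": 4, "O": 2}  # hydrogens are implicit (assumption of this sketch)


def bond(a, b):
    return frozenset((a, b))


def make(atoms, bonds):
    """Build a molecule from {id: element} and a list of single bonds."""
    return {"atoms": dict(atoms), "bonds": {bond(a, b): 1 for a, b in bonds}}


def neighbors(mol, atom):
    return [next(iter(b - {atom})) for b in mol["bonds"] if atom in b]


def hydrogens(mol, atom):
    used = sum(order for b, order in mol["bonds"].items() if atom in b)
    return VALENCE[mol["atoms"][atom]] - used


def find_sites(mol):
    """Yield (o, c2, c3) matches of the site O-C-C.

    Mirrors the HRANGEs of Figure 10: exactly 1 H on atom 1, 1 to 3 H on atom 3.
    """
    for o, elem in mol["atoms"].items():
        if elem != "O" or hydrogens(mol, o) != 1:
            continue
        for c2 in neighbors(mol, o):
            if mol["atoms"][c2] != "C":
                continue
            for c3 in neighbors(mol, c2):
                if mol["atoms"][c3] == "C" and 1 <= hydrogens(mol, c3) <= 3:
                    yield (o, c2, c3)


def hindered(mol, site):
    """Assumed reading of HINDERED: a tert-butyl group bonded to site atom 3."""
    _, c2, c3 = site
    for q in neighbors(mol, c3):
        if q == c2 or mol["atoms"][q] != "C":
            continue
        methyls = [m for m in neighbors(mol, q)
                   if m != c3 and mol["atoms"][m] == "C" and hydrogens(mol, m) == 3]
        if len(methyls) == 3:
            return True
    return False


def dehydrate(mol, site):
    o, c2, c3 = site
    product = deepcopy(mol)
    del product["bonds"][bond(o, c2)]    # UNJOIN 1 2
    product["bonds"][bond(c2, c3)] += 1  # JOIN 2 3
    del product["atoms"][o]              # DELATS 1
    return product


def apply_dehydration(mol):
    """Apply the transform at every site not excluded by the BADLIST constraint."""
    return [dehydrate(mol, s) for s in find_sites(mol) if not hindered(mol, s)]


propan_2_ol = make({1: "C", 2: "C", 3: "C", 4: "O"}, [(1, 2), (2, 3), (2, 4)])
blocked = make({1: "O", 2: "C", 3: "C", 4: "C", 5: "C", 6: "C", 7: "C"},
               [(1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (4, 7)])

print(len(apply_dehydration(propan_2_ol)))  # one product per site match
print(len(apply_dehydration(blocked)))      # site exists but is on the BADLIST
```

In this sketch the symmetric alcohol matches the site twice and yields the same alkene twice, which is the kind of duplication that item 6 of Figure 6 aims to detect in advance.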
    SITE:
    NAME=DEHYDRATION
    ATOM#  TYPE  ARTYPE  NEIGHBORS  HRANGE
    1      O     NON-AR  2          1-1
    2      C     NON-AR  1 3
    3      C     NON-AR  2          1-3
    DEHYDRATION: (HRANGES NOT INDICATED)
    NON-C ATOMS: 1 O
    1-2-3
    TRANSFORM:
    UNJOIN 1 2
    JOIN 2 3
    DELATS 1
    DEHYDRATION: (HRANGES NOT INDICATED)
    2=3
    CONSTRAINTS:
    CONSTRAINTS ON STARTING MATERIAL: NO CONSTRAINTS
    SITE-SPECIFIC CONSTRAINTS:
    BADLIST CONSTRAINTS
    NAME HINDERED
    TRANSFORM-SPECIFIC CONSTRAINTS: NO CONSTRAINTS
    CONSTRAINTS ON PRODUCTS: NO CONSTRAINTS
    >DONE
    (DEHYDRATION DEFINED)
    (DEHYDRATION ADDED TO THE REACTION LIST)

Figure 10. Summary of the completed definition of the DEHYDRATION reaction.

The remaining items summarized in Figure 6 are currently under development. We are redesigning the control structure so that the scientist using the program can use intuitive concepts, such as separation, as commands. To carry this out, important parts of the current mechanism have to be redesigned. Although the current program can be used effectively, its non-intuitive approach to dealing with reactions yielding multiple products and subsequent separation (within the computer) and analysis of each product presents a barrier to use by a wider community. We are continuing to develop our capabilities for representing reactions to ensure that the user of REACT has a complete descriptive language with which to specify reactions. We continue to study ways to avoid duplication in carrying out reactions. We know how to implement certain of the symmetry-related constraints and will do so shortly.

CONGEN Developments

The problem solving paradigm that has emerged from DENDRAL work is the so-called "plan-generate-test" paradigm. It is based on heuristic search of a space of possible hypotheses with planning before generation of hypotheses and testing of each generated candidate.
The generator for DENDRAL, named CONGEN, is a general-purpose graph generator which produces a list of all possible graphs containing specified numbers of nodes of various types. The most important features of the generator are that the list of graphs is guaranteed to be complete and non-redundant and, equally important, that the list need not be exhaustively generated. The generator can be constrained to produce only graphs that meet specified criteria that are inferred from the initial problem data.

During the past year, CONGEN has developed along two major lines: 1) tools have been developed which will allow more efficient and "intelligent" use of substructural information supplied by the chemist; and 2) data from chemical reactions and from observed mass spectra can be used to eliminate unlikely structural candidates from a set produced by a CONGEN generation. These extensions will be discussed below.

1) Intelligent use of substructural information as constraints

There is sometimes a significant conceptual gap between the intuitive chemical phrasing of a CONGEN problem and the phrasing which is most efficient, in both computer time and storage requirements, for the program. CONGEN provides a rich language for stating structure elucidation problems in precise substructural terms. However, there are usually many ways of defining a given problem and different definitions can place widely different demands upon the program. We have a continuing interest in reducing this conceptual gap by making CONGEN responsible for rephrasing a problem in the most efficient way, thus freeing the chemist to concentrate upon the chemical, rather than the algorithmic, aspects of a given case.

One distinction which is frequently puzzling to new CONGEN users is the one between superatoms and GOODLIST items. A superatom is a polyatomic "building block" which CONGEN joins with other superatoms and single atoms to form full
structures. GOODLIST items are substructures which are required to be present in those full structures, but they are not incorporated directly into the initial phrasing of a problem as are superatoms. Rather, their presence or absence is tested by a graph-matching routine after the structures are produced. Frequently, a great many structures produced by the structure generator are discarded by this final test and a significant amount of the program’s time can be spent "shooting blanks". The concepts behind these two types of constraints (that specified substructural features must be present) are similar, but their implementations differ substantially in efficiency.

GOODLIST items cannot simply be transferred to the superatom list, though, because GOODLIST items are allowed to share atoms and bonds with other GOODLIST items or with superatoms. For example, if two substructures which are benzene rings are placed on GOODLIST, then a naphthalene derivative will be an acceptable structure even though the two occurrences of the ring have two atoms and one aromatic bond in common. Because of the building-block nature of superatoms, they may be joined to one another by additional bonds in CONGEN, but never "merged" (i.e., overlapped). Thus the price of efficiency is a more restricted interpretation of structural possibilities for superatoms.

We have developed a new procedure which captures the best of both situations. In order to incorporate a GOODLIST substructure into the problem at the earliest stage, it is necessary to find all unique ways that the given substructure can be created using parts of the existing building blocks (atoms and superatoms). This produces a set of new CONGEN problems with more or larger superatoms, each of which is easier to solve than the original one because the GOODLIST item is built in and need not be tested.
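The difference between testing a required feature afterwards and building it in first can be illustrated with a deliberately small toy. The sketch below is not CONGEN's algorithm: it assumes a set of typed free valences that must all be paired off by bonds, and a requirement (standing in for a GOODLIST item) that at least one bond joins given types. The function names and the valence types are hypothetical.

```python
def matchings(items):
    """Yield every way of pairing off all items, each as a frozenset of bonds."""
    if not items:
        yield frozenset()
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for m in matchings(remaining):
            yield m | {frozenset((first, partner))}


def has_required_bond(types, matching, want):
    return any(sorted(types[i] for i in b) == sorted(want) for b in matching)


def post_test(types, want):
    """Generate everything, then discard candidates lacking the required bond."""
    generated = list(matchings(tuple(range(len(types)))))
    kept = {m for m in generated if has_required_bond(types, m, want)}
    return kept, len(generated)


def constructive(types, want):
    """Create the required bond first, in every possible way, then complete it."""
    idx = tuple(range(len(types)))
    solutions, generated = set(), 0
    for a in idx:
        for b in idx:
            if a < b and sorted((types[a], types[b])) == sorted(want):
                rest = tuple(i for i in idx if i not in (a, b))
                for m in matchings(rest):
                    generated += 1
                    solutions.add(m | {frozenset((a, b))})  # set removes duplicates
    return solutions, generated


types = "AABCCCCC"  # eight free valences of three hypothetical types
kept, n_post = post_test(types, ("A", "B"))
built, n_built = constructive(types, ("A", "B"))
print(n_post, len(kept))
print(n_built, len(built))
print(kept == built)
```

Both routes return the same solution set; the constructive route simply never produces the candidates that the post-test would have thrown away.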
Figure 11 shows schematically some of the ways this construction might occur: a) by bonding together two (or more) existing superatoms to create one larger one; b) by bonding additional atoms to a superatom to create a larger one; and c) by constructing a copy of the substructure from single atoms, creating a new superatom.

Figure 12 summarizes a CONGEN problem which was attempted but which could not be completed because of the unintelligent use of GOODLIST. The problem amounts to finding all ways of allocating three new bonds to the free valences (the bonds with unspecified termini) in the superatom CEMB such that the three indicated substructures are present in the final molecules. There are perhaps 10,000 unique allocations of those three new bonds, but only 7 pass the GOODLIST tests. Using GOODLIST as a post-test only, CONGEN would generate all 10,000 and discard nearly all of them, a process so lengthy that it was never completed. The constructive graph-matching routine approaches the problem in a much more efficient and chemically intuitive way: 1) there are only three places in which the first GOODLIST item can be constructed; 2) for each of these, there are four ways of constructing the second; and 3) for each of these, there are 0, 1 or 2 ways of incorporating the third. At most 3 x 4 x 2 = 24 constructions are therefore examined, and the routine quickly arrives at the correct set of solutions. Most CONGEN problems contain one or more GOODLIST items which can be processed in this way, and when the constructive graph-matcher is fully integrated into CONGEN, it will make a substantial difference in its ability to use this structural information effectively.

[Figure 11: structure drawings of the superatom CEMB and of the GOODLIST substructures, not recoverable from the reproduction.]

Figure 11. Example of breaking one GOODLIST substructure into several subproblems for CONGEN, each with different superatoms.
[Figure 12: schematic, not recoverable from the reproduction. Legible labels: SUPERATOMS; ATOMS; CONGEN PROBLEM; GOODLIST ENTRY; CONSTRUCTIVE SUBSTRUCTURE SEARCH; NEW CONGEN PROBLEMS.]

Figure 12. Example showing the inefficiency of specifying a constraint as a GOODLIST item instead of analyzing its implications for constructing allowable chemical graphs.

2) New tools for post-pruning CONGEN structures.

From an algorithmic standpoint, CONGEN is successful if it can, in a reasonable amount of time and without exhausting storage resources, produce a list of candidate structures satisfying the chemist’s constraints. However, this list is often quite large, perhaps several hundred structures, and from a chemical standpoint the problem may be far from complete. It remains for the chemist to discriminate among the candidates, eventually reducing the possibilities to just one structure. A SURVEY function is available for classifying the list into groups of chemically related structures using either pre-defined or user-defined libraries of substructural features, and this process can help the chemist perceive groups which might easily be ruled out by additional experiments. Also, the graph-matching (pruning) mechanism of CONGEN allows him to express, in terms of substructural tests on the candidates, new data which he gathers on the unknown.
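The classification step described above can be sketched as a substructure survey over a candidate list. This is only an illustration, not the SURVEY program: it assumes atom-labelled graphs that ignore hydrogens and bond orders, uses a brute-force matcher that is practical only for very small graphs, and all names and the tiny feature library are hypothetical.

```python
from itertools import permutations


def graph(atoms, bonds):
    """Atom labels by position, plus a set of undirected bonds."""
    return list(atoms), {frozenset(b) for b in bonds}


def contains(mol, sub):
    """True if the labelled graph `sub` can be embedded in `mol` (brute force)."""
    m_atoms, m_bonds = mol
    s_atoms, s_bonds = sub
    for image in permutations(range(len(m_atoms)), len(s_atoms)):
        labels_ok = all(m_atoms[image[i]] == s for i, s in enumerate(s_atoms))
        bonds_ok = all(frozenset(image[i] for i in b) in m_bonds for b in s_bonds)
        if labels_ok and bonds_ok:
            return True
    return False


def survey(candidates, library):
    """Map each library feature to the names of the candidates that contain it."""
    return {feature: [name for name, mol in candidates.items() if contains(mol, sub)]
            for feature, sub in library.items()}


candidates = {
    "ethanol skeleton": graph("CCO", [(0, 1), (1, 2)]),
    "acetic acid skeleton": graph("CCOO", [(0, 1), (1, 2), (1, 3)]),
    "dimethyl ether skeleton": graph("COC", [(0, 1), (1, 2)]),
}
library = {
    "C-C-O": graph("CCO", [(0, 1), (1, 2)]),
    "O-C-O": graph("OCO", [(0, 1), (1, 2)]),
}

for feature, names in survey(candidates, library).items():
    print(feature, names)
```

The resulting table groups candidates by the features they share, which is the kind of summary a chemist could use to decide which group an additional experiment might rule out.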
SURVEY and graph-matching pruning are both important aids in dealing with a list of candidates, but they are restricted to tests which can easily be phrased purely in terms of structural features of the candidates themselves. There are two informative sources of data which cannot always be phrased in this way: 1) structural features observed in products of the unknown when it undergoes simple chemical reactions; and 2) empirical spectroscopic measurements on the unknown which cannot be interpreted unambiguously in precise structural terms. During the past year, we have made progress in utilizing such information. The program REACT addresses the first problem while MSRANK concerns the second, in the context of mass spectrometric observations.

2.1 REACT

This program [see Report HPP-76-5] has two basic goals: 1) to provide the chemist with a computerized language for defining graph transformations and applying them to structures, thus simulating chemical reactions; and 2) to automatically keep track of the interrelationships between structures in a complex sequence of reactions so that whenever structural claims are made ruling out structures at one level, the implications in terms of structures at other levels can be traced. During the last year some progress has been made toward both of these goals.

EDITREACT, the reaction-editing language, has been extended to allow the user to define subgraph constraints which apply relative to a potential reaction site rather than to the molecule as a whole. For example, in the present version of REACT, we can say either that a hydroxyl group (OH), if present anywhere in the reactant molecule, would inhibit the reaction, or that such inhibition would take place only if the OH group is adjacent to the reaction site. Such site-specific constraints, applied either before or after the transformation (i.e., reaction) has been carried out on the site, are critical to the detailed description of real chemical reactions.
The inclusion of this facility in REACT substantially increases its usefulness in real-world chemical problems.

The bookkeeping problem has undergone a complete reconceptualization in the past year, the purpose being to mimic more closely the actual steps taken by a chemist in the laboratory. In the initial implementation, a set of products arising from the application of a given reaction to a given starting structure could be subjected to a multi-level classification which grouped the products based upon user-defined substructural constraints. Each of these classes had an associated minimum and maximum number, representing the numbers of products which were allowed to be members of the class. Any starting materials whose products could not satisfy these conditions were removed from the list of candidates. Structures in any class could be further reacted, their products classified, and so on.

This treatment of bookkeeping was sufficient for stating many chemical problems. For example, suppose a chemist knew that a particular reaction on an unknown compound yielded two carbonyl compounds (i.e., containing C=O), at least one of which was an ester (-O-C=O). He could define a product class CARBONYL using the C=O substructure with a minimum and maximum of two products. He could then define a sub-class of CARBONYL called ESTERS using the substructure -O-C=O with a minimum of one and a maximum of two products. The program would automatically use this information to eliminate candidate starting structures which could not give the indicated product distribution with the given reaction.

There are chemical problems, though, for which the above scheme is too rigid. For example, suppose a reaction gives several products, two of which are isolated and labelled P1 and P2. Suppose that only a small amount of P1 is available so only mass spectroscopic measurements are practical.
Suppose also that a deuterium-exchange experiment shows that P1 has two exchangeable protons (say, either N-H or O-H). P2 shows a strong carbonyl absorption in the IR. P1 might also contain a carbonyl group, but that was never determined, and neither was the number of exchangeable protons in P2, which could be two. No matter how one attempts to use the above-described classification system, one cannot express this information accurately.

In the new approach, for which the algorithmic design has been completed, one is allowed to express data in a much more natural sequence which parallels the experimental steps. The first experimental step after a reaction is usually the separation and purification of products. An analogous step is to be included in REACT, in which the separation amounts to the setting up of a specified number of labelled "flasks" (analogous to the labels P1 and P2 in the above example), each of which is ultimately to contain a specified number (usually 1) of the products. As experimental data are gathered on each real product, corresponding substructure constraints are attached to the corresponding flask in the program. As each such assertion is made, the bookkeeping mechanism verifies that, for a set of reaction products from a given starting material, there is at least one way of distributing them among the flasks such that each product satisfies the constraints for its flask. If this test is ever violated, the starting material is removed as a candidate structure.

Flasks containing more than one product may be further separated into "subflasks" to any level, and the contents of any flask may be made to undergo further reactions. This capability, the reacting of flask contents, is analogous to common laboratory procedures in which incomplete separations of products are encountered.
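The flask-allocation test described above (at least one way of distributing products among flasks so that each product satisfies its flask's constraints) can be viewed as a bipartite matching problem. The sketch below uses a standard augmenting-path matching, chosen here for illustration; the source does not say which algorithm REACT uses. It assumes one product per flask, represents products by hypothetical feature dictionaries instead of structures, and the function name and example data are invented.

```python
def can_fill_flasks(products, flasks):
    """True if distinct products can be assigned to flasks, one per flask,
    so that each product satisfies the constraint (predicate) of its flask."""
    holder = {}  # product index -> index of the flask currently holding it

    def place(flask, seen):
        for p, product in enumerate(products):
            if p in seen or not flasks[flask](product):
                continue
            seen.add(p)
            # Take a free product, or evict one whose flask can be refilled.
            if p not in holder or place(holder[p], seen):
                holder[p] = flask
                return True
        return False

    return all(place(f, set()) for f in range(len(flasks)))


# Flask constraints from the P1/P2 example in the text.
flasks = [
    lambda p: p["exch_H"] == 2,  # P1: two exchangeable protons
    lambda p: p["carbonyl"],     # P2: strong carbonyl absorption
]

# Hypothetical product sets for three candidate starting materials.
candidates = {
    "A": [{"exch_H": 2, "carbonyl": True}, {"exch_H": 0, "carbonyl": False}],
    "B": [{"exch_H": 2, "carbonyl": False}, {"exch_H": 0, "carbonyl": True}],
    "C": [{"exch_H": 2, "carbonyl": True}, {"exch_H": 2, "carbonyl": False}],
}

survivors = [name for name, prods in candidates.items()
             if can_fill_flasks(prods, flasks)]
print(survivors)
```

Candidate A is removed because its only product that fits either flask cannot occupy both; candidate C survives only after the first product is moved from P1 to P2, which is the reallocation the augmenting step performs.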
Dealing with such situations adds considerable complexity to the bookkeeping mechanism, because the contents of a flask may be ambiguous to the program when the reaction is applied. REACT must keep track of all possible structures which might, based on the current flask constraints, occupy the reacting flask. If such a reaction fails (because the products did not satisfy the constraints specified for them), REACT does not eliminate the starting structure entirely, but notes that the structure may not occupy that flask in future flask-allocation tests.

2.2 MSRANK

This program is an outgrowth of MSPRUNE described in last year’s annual report. It is a combination of a predictor which uses a very simple theory of mass spectrometry to predict the spectra of candidate structures, and an evaluation function which compares the predictions with the observed spectrum of the unknown, assigning a goodness-of-fit score to each candidate. The candidates are then sorted based upon how well they match the observations. The basic concept here is not a new one to the DENDRAL project [see, for example, Buchanan, et al. in Machine Intelligence 4 (Meltzer & Michie, eds., Edinburgh Univ. Press, 1969)], but there are some new aspects to the problem when viewed in the overall CONGEN context.

Because of the wide variety of structural types which can be produced by CONGEN, it is necessary for MSRANK to use a very general model of mass spectrometry. The best predictive theories of mass spectrometry are limited to families of closely related structures (i.e., class specific theories), and the Meta-DENDRAL program is designed to help in discovering such theories. There are very few general principles upon which to draw in predicting mass spectra, though, so MSRANK is limited to only the most approximate kinds of evaluation functions.
One principle which we noticed being used by practicing mass spectrometrists was: of two candidate structures for an unknown, the most likely structure is the one which explains the observations most "simply" - i.e., with the fewest complex explanations involving many bond cleavages and the transfer of many hydrogen atoms. The evaluation function used by MSRANK is based on a quantitation of this principle.

MSRANK is quite new and we have not yet had sufficient experience with it to evaluate its overall usefulness. By using only unit plausibilities for selected characteristics of the mass-spectral cleavages, we are able to duplicate earlier results obtained with the predictor/comparator functions applied to mono- and di-ketoandrostanes. These tests serve to check the accuracy of the MSRANK program. We are now doing a systematic study of various classes of compounds by ranking the spectrum of a known structure against a CONGEN-generated list of structures which contains the correct one among several which are closely related.

Stereochemistry in CONGEN

We have started the complex task of giving CONGEN the capability of recognizing stereochemical features of molecules and using stereochemical information in structure determination. The ability to recognize stereochemical features would allow, for example, the generation of all stereoisomers of a given topological structure with or without constraints. The ability to use stereochemical information would allow the determination of constraints on stereoisomer (and topological isomer) generation caused by, for example, partial knowledge of relative or absolute stereochemistry of structural fragments, knowledge of overall molecular chirality (or lack of it), absolute and relative stereochemistry from circular dichroism measurements, and so forth. Thus far, only the topological information (constitution) has been recognized and used by CONGEN.
The first stage of this development is to produce a program which generates all the stereoisomers of a given topological structure. This program will be placed at the end of the existing CONGEN program. The present report describes the development of the theory and algorithm for stereoisomer generation and the progress on the programming of this algorithm.

THE GC/HRMS DATA SYSTEM

New Developments

In addition to upgrading old versions of the high resolution system, work is being done on creating a low resolution system for the MAT 711. The ultimate aim is to collect data that can be run through CLEANUP, a program that resolves multiple spectra under a single GC peak, and cleans up the final spectra. The problem with the current system is that we cannot scan fast enough to provide CLEANUP the data it needs. The high resolution system requires resolution good enough to separate sample peaks from the reference peaks. If the scan is sped up past a certain point, SAMRUN can no longer separate the peaks, and therefore cannot calibrate the run. At the same time, CLEANUP requires that at least 7 spectra be taken across a GC peak to ensure resolution of multiple spectra.

The fundamental problem then is that an alternate method of calibrating the mass spectrum, without using known calibration peaks, must be found before the scan speeds required by CLEANUP can be achieved. The most direct solution to this is to measure the magnetic field strength of the instrument directly, and to use it to calculate the mass that is being observed. To do this we inserted a Hall probe between the poles of the magnet, and connected it to the data acquisition system on the PDP-11/20.
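To illustrate how a field reading could stand in for calibration peaks, the sketch below assumes an idealized magnetic sector at fixed accelerating voltage, where m/z is proportional to the square of the field strength, and a Hall reading that is linear in the field with zero offset. Under those assumptions mass = k * reading**2, and k can be fitted once from a few known (reading, mass) points. The function names and the numbers are hypothetical; a real instrument would likely need empirical corrections that this sketch ignores.

```python
from typing import Sequence


def fit_scale(readings: Sequence[float], masses: Sequence[float]) -> float:
    """Least-squares k for the model mass = k * reading**2 (through the origin)."""
    if len(readings) != len(masses) or not readings:
        raise ValueError("need matching, non-empty reading and mass lists")
    numerator = sum(m * h * h for h, m in zip(readings, masses))
    denominator = sum((h * h) ** 2 for h in readings)
    return numerator / denominator


def mass_from_hall(reading: float, k: float) -> float:
    """Convert one Hall probe reading to a mass using the fitted scale k."""
    return k * reading * reading


# Made-up reference points that follow the model exactly with k = 2.
k = fit_scale([1.0, 2.0, 3.0], [2.0, 8.0, 18.0])
print(k)                       # 2.0
print(mass_from_hall(4.0, k))  # 32.0
```

Whether the probe is fast and reproducible enough for such a conversion to hold during a scan is exactly what the reproducibility tests described in this section are meant to establish.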
The main problems with the Hall probe are as follows:

1) to make sure that the ion reading and the Hall probe reading are simultaneous;
2) to ensure that the correct Hall reading can be assigned to the correct ion reading;
3) to determine the reproducibility of Hall readings versus mass being observed in both dynamic (scanning) and static situations; and
4) to decide if the probe has the speed and accuracy to calibrate the instrument.

The first two problems are a matter of hardware. The configuration of the original data collection system is as follows: the ion detector goes to an A/D converter, which is connected to a DMA. The DMA is on an 11/20, which has a data collection system, SAQMON, running. This performs various low level filtering and buffering operations. The DMA is actually a low level processor which counts the number of samples taken, stores them into successive memory locations, and interrupts the central processor when a block of data has been collected. The timing of the sample collection is controlled by a quartz crystal clock. On each timing pulse, a signal is sent to the A/D on the ion detector to convert that value to a digital number. To accommodate the Hall probe, the DMA was modified so that on the timing pulse, the start signal is sent simultaneously to both the A/D on the ion detector and the A/D on the Hall probe. The DMA then services both of the A/D’s, and stores the readings in successive memory locations. The net result is that when the DMA interrupts the central processor, the block of data is a set of pairs of readings, an ion reading and the Hall reading for that time. This solves both of the first two problems, since we now have the ion reading and the Hall reading connected both in time and location.

The last two problems, testing the reliability and reproducibility of the Hall probe, require new software.
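The paired block layout described above can be unpacked with very little code. The sketch below is illustrative only and is not the SAQMON software: it assumes each block arrives as a flat sequence in which the ion reading precedes its Hall reading (the order within a pair is an assumption), and the name `split_pairs` is hypothetical.

```python
from typing import List, Sequence, Tuple


def split_pairs(block: Sequence[int]) -> List[Tuple[int, int]]:
    """Split an interleaved block [ion0, hall0, ion1, hall1, ...]
    into a list of (ion, hall) pairs taken on the same timing pulse."""
    if len(block) % 2:
        raise ValueError("block must contain whole (ion, hall) pairs")
    return list(zip(block[0::2], block[1::2]))


# Made-up readings for two timing pulses.
print(split_pairs([10, 501, 12, 502]))  # [(10, 501), (12, 502)]
```

Because both converters are started by the same pulse and written to adjacent locations, position in the block is enough to associate each ion reading with its field reading.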
We are currently modifying portions of the calibration mechanism of the high resolution system to calculate masses for a large number of Hall readings.

META-DENDRAL

The success of any reasoning program is strongly dependent on the amount of domain-specific knowledge it contains. This is now almost universally accepted within AI, partly because of DENDRAL’s success. Because of the difficulty of extracting specific knowledge from experts to put into the program, many years ago we began to explore the problems of efficiently transferring knowledge into a program. We have looked at two alternatives to "hand-crafting" each new knowledge base: interactive knowledge transfer programs and automatic theory formation programs. In this enterprise the separation of domain-specific knowledge from the computer programs themselves has been a critical component of our success.

One of the stumbling blocks with the interactive knowledge transfer programs is that for some domains there are no experts with enough specific knowledge to make a high performance problem solving program. We were looking for ways to avoid forcing an expert to focus on original data in order to codify the rules explaining those data because that is such a time-consuming process. Therefore we began working on an automatic rule formation program (called Meta-DENDRAL) that examines the original data itself in order to discover the inference rules for that part of the domain.

The problem solving paradigm for Meta-DENDRAL is also the plan-generate-test paradigm used in Heuristic DENDRAL. In this case one part of the program (RULEGEN) generates plausible rules within syntactic and semantic constraints and within desired limits of evidential support. The model used to guide the generation of rules is particularly important since the space of rules is enormous. The planning part of the program (INTSUM) collects and summarizes the evidential support.
The testing part (RULEMOD) looks for counterexamples to rules and makes modifications to the rules in order to increase their generality and simplicity and to decrease the total number of rules.

Meta-DENDRAL successfully formulated rules of mass spectrometry that were new to the science. These rules, along with a discussion of the methodology, were published in the scientific literature [Report HPP-76-4]. The program was tested to see if it could rediscover the rules of mass spectrometry for two classes of chemical compounds that were already well understood (amines and estrogenic steroids). Then it was applied to three classes of compounds whose mass spectrometry was not as well known (mono-, di-, and tri-ketoandrostanes). The program produced three sets of rules that explained much of the significant data for these classes. The time for manual rule formation for these data was estimated to be several months.

Progress was made on generalizing the Meta-DENDRAL program, and rules for a new domain were successfully discovered by the program. A scientific paper on this application was submitted for publication [Report HPP-77-4]. The new application was learning rules for interpreting signals from C13-NMR spectroscopy. The instrument produces data points in a bar graph in response to the resonance of each carbon-13 nucleus in the sample. The rules describe an environment of a C13 atom and predict a resonating frequency range for every atom that matches the description. The Meta-DENDRAL program needed some modification because the rules are predicting ranges of data points, and not precise processes, as for the mass spectrometry version.

The RULEGEN component of Meta-DENDRAL was demonstrated to work with its heuristic search paradigm. Guidance from a model of mass spectrometry is an important feature of RULEGEN.
Also, the program uses problem data for pruning possible rules (and all more specific rules formed from those). The amount of data examined during the search is very large and the space of rules is immense, so the search needs to be rather coarse in order to produce plausible, but not necessarily optimal, rules.

The RULEMOD program for "fine-tuning" Meta-DENDRAL’s newly-discovered rules was finished. This program provides a number of important subtasks, including merging similar rules, making rules more specific or more general, and filtering out the weakest rules. RULEMOD checks for counterexamples to rules and uses this information in all of the named tasks. Because of the expense of computing counterexamples to possible rules, this computation is delayed until Meta-DENDRAL has a set of plausible rules, rather than computing counterexamples on each possible rule examined in the search of the rule space.

A report was written on the AI methodology underlying Meta-DENDRAL. The major idea developed in this report is that knowledge of the domain can be used effectively to guide a learning program. The major difference between Meta-DENDRAL and statistical learning programs is that Meta-DENDRAL uses a strong model of mass spectrometry, including any assumptions the user cares to make about the domain, to guide the formation of explanatory rules.

C13 NMR SPECTROMETRY

13C NMR was selected as a new application area for the rule formation program, Meta-DENDRAL. The algorithms used for mass spectrometry rule formation were extended to 13C NMR and used to obtain a set of rules for paraffins and acyclic amines. These two classes were chosen since compounds in these classes are known to show a strong correlation between structural environment and shift. Thus, the programs could be tested knowing that the underlying basis for the form of the rule was valid.
The form of the rule is substructure ---> shift range. A sample rule generated is C-C*-C-X- ---> 19.85 <= (delta sub C) <= 21.3. The asterisk in the substructure description denotes the atom for which the shift is predicted. Only topological descriptors were used to construct the substructures. The addition of stereochemical terms is a topic of current work.

It was necessary to change RULEGEN so that the left-hand sides of rules were expanded outward from a carbon atom rather than from a bond. The right-hand side of the rule is associated with a range rather than a precise mass as in the mass spectrometry program. This modification also required changes in the rule search procedure. The user sets two parameters which guide the rule search. These parameters are MINIMUM-EXAMPLES, which requires each rule to explain a given number of peaks in the training set, and MAXIMUM-RANGE, which defines the acceptable shift range for a rule. These parameters regulate the degree of specificity or generality of the rules.

From the set of rules generated a subset is selected corresponding to the "best" set which still covers all the training set data. The best rule is selected by calculating (number of peaks predicted/(range ** 2)). Data which are predicted by the best rule are removed and the next best rule is found for the remaining data using the criterion given above. This process is repeated until all data are explained.

In order to test the informational content of the rules generated, a second program was written which applied the rules to a list of candidate molecules and ranked the molecules. First, all possible structural isomers for a given empirical formula were generated using CONGEN. The rules were applied to each of the possible isomers and spectra were predicted. The predicted spectra were compared to that of a known spectrum from a compound with the same empirical formula.
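The rule-selection procedure described above (pick the rule with the highest value of peaks predicted divided by range squared, remove the peaks it predicts, repeat) can be sketched as a greedy cover. This is an illustration under stated assumptions rather than the Meta-DENDRAL code: rules are reduced to a shift range plus the set of training-peak identifiers they predict, every range is assumed to have positive width, and the names `Rule`, `score`, and `select_rules` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Rule:
    name: str
    low: float              # lower bound of predicted shift range
    high: float             # upper bound of predicted shift range
    peaks: FrozenSet[int]   # ids of training peaks the rule predicts


def score(rule: Rule, remaining: Set[int]) -> float:
    """(number of still-unexplained peaks predicted) / (range ** 2)."""
    width = rule.high - rule.low
    if width <= 0:
        raise ValueError("shift range must have positive width")
    return len(rule.peaks & remaining) / width ** 2


def select_rules(rules: List[Rule], peaks: Iterable[int]) -> List[Rule]:
    """Greedily pick best-scoring rules until all peaks are explained."""
    remaining = set(peaks)
    chosen: List[Rule] = []
    while remaining:
        best = max(rules, key=lambda r: score(r, remaining))
        if not (best.peaks & remaining):
            break  # no rule explains any of the peaks that are left
        chosen.append(best)
        remaining -= best.peaks
    return chosen


rules = [
    Rule("A", 20.0, 21.0, frozenset({1, 2, 3})),  # 3 peaks over a 1.0 range
    Rule("B", 30.0, 30.5, frozenset({3, 4})),     # 2 peaks over a 0.5 range
    Rule("C", 10.0, 12.0, frozenset({1, 2})),     # 2 peaks over a 2.0 range
]
print([r.name for r in select_rules(rules, {1, 2, 3, 4})])  # ['B', 'A']
```

Squaring the range in the denominator favors narrow rules strongly: rule B wins the first round with two peaks because its range is half as wide as rule A's.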
The structural isomers were ranked according to a comparison score to determine how well the correct compound was distinguished from its isomers, on the basis of the predictive rules. The details of the generation of rules and the use of rules for structure selection can be found in a paper recently submitted for publication [Report HPP-77-4].

The 13C NMR rule formation program was applied to a set of paraffins and acyclic amines. The program generated 138 rules to cover 435 data peaks. The rules generated were applied in a structure selection test for the structural isomers of C9H20 and C6H15N. No structures with these empirical formulas were included in the training set. Twenty-four C9H20 and eleven C6H15N 13C NMR spectra were available to act as unknowns in the structure selection test. The results of the structure ranking applied to these spectra are shown below.

EMPIRICAL    NUMBER OF            NUMBER OF CANDIDATES RANKING
FORMULA      CANDIDATE ISOMERS    1st  2nd ..... 6th ..... 9th

C9H20        35                   20/24    3/24    1/24
C6H15N       39                   8/11     2/11    1/11

The performance of the rules in discriminating among similar structures not included in the training set data demonstrated the informational content of the rules.

FUNDING STATUS

Renewal of funding for three years was just received for NIH Grant RR-00612 from the Biotechnology Resources Program (May, 1977 - April, 1980). The award for 1977-78 is approximately $193,000. In addition, support for the basic artificial intelligence research on which this work is grounded is provided by the Advanced Research Projects Agency of the Department of Defense (ARPA Contract DAHC-15-73-C-0435). A new two-year contract was just negotiated for the period July, 1977 - June, 1979.

RECENT PUBLICATIONS

(Only publications related to computers in chemistry are shown.)

HPP-76-1 D.H. Smith, J.P. Konopelski and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference. XIX.
Computer Generation of Ion Structures", Organic Mass Spectrometry, 11: 86, (1976).

HPP-76-2 Raymond E. Carhart and Dennis H. Smith, "Applications of Artificial Intelligence for Chemical Inference XX. Intelligent Use of Constraints in Computer-Assisted Structure Elucidation", Computers In Chemistry (in press).

HPP-76-3 C.J. Cheer, D.H. Smith, C. Djerassi, B. Tursch, J.C. Braekman and D. Daloze, "Applications of Artificial Intelligence for Chemical Inference XXI. Chemical Studies of Marine Invertebrates - XVII. The Computer-Assisted Identification of [+]-Palustrol in the Marine Organism Cespitularia sp., aff. subviridis", Tetrahedron, 32:1807, Pergamon Press, (1976).

HPP-76-4 B.G. Buchanan, D.H. Smith, W.C. White, R.J. Gritter, E.A. Feigenbaum, J. Lederberg, and Carl Djerassi, "Application of Artificial Intelligence for Chemical Inference XXII. Automatic Rule Formation in Mass Spectrometry by Means of the Meta-DENDRAL Program", Journal of the American Chemical Society, 98: 6168 (1976).

HPP-76-5 T.H. Varkony, R.E. Carhart and D.H. Smith, "Applications of Artificial Intelligence for Chemical Inference XXIII. Computer-Assisted Structure Elucidation. Modelling Chemical Reaction Sequences Used in Molecular Structure Problems", in "Computer-Assisted Organic Synthesis", W.T. Wipke, Ed., American Chemical Society, Washington, D.C., in press.

HPP-76-6 D.H. Smith and R.E. Carhart, "Applications of Artificial Intelligence for Chemical Inference XXIV. Structural Isomerism of Mono and Sesquiterpenoid Skeletons 1,2-", Tetrahedron, 32:2513, Pergamon Press (May 1976).

HPP-76-10 Bruce G. Buchanan and Dennis Smith, "Computer Assisted Chemical Reasoning", in Proceedings of the III International Conference on Computers in Chemical Research, Education and Technology, Plenum Publishing, (1976).

HPP-77-4 T.M. Mitchell and G.M. Schwenzer, "Applications of Artificial Intelligence for Chemical Inference. XXV.
A Computer Program For Automated Empirical 13C NMR Rule Formation", (Submitted to JACS, January 1977).

HPP-77-65 Bruce G. Buchanan and Tom Mitchell, "Model-Directed Learning of Production Rules", Submitted to the Proceedings for the Workshop on Pattern-Directed Inference Systems in Hawaii, (February, 1977). (STAN-CS-77-597)

HPP-77-11 Dennis H. Smith and Raymond E. Carhart, "Structure Elucidation Based on Computer Analysis of High and Low Resolution Mass Spectral Data", Proceedings of the Symposium on Chemical Applications of High Performance Spectrometry, University of Nebraska, Lincoln, (in press).

II. INTERACTION WITH THE SUMEX-AIM RESOURCE

The number of persons experimenting with CONGEN has grown as a result of both the continuing practice of issuing an "invitation for program trial use" at the conclusion of publications, and continuing personal contact between Dendral project members and potential program users. Three categories of users make up this group:

Chemists Using Exported Programs

The part of CONGEN responsible for teletype output of chemical structures (the DRAW program) is coded in Fortran. Since the paper describing this program appeared in print [R. Carhart, JACS, 16:82, 1976], we have exported the program to half a dozen sites, ranging from Japan, across North America, to England. Similarly, the entire CONGEN program, which is largely coded in Interlisp and SAIL, has been exported to a collaborator in England who is very interested in the methods and programming techniques employed in coding the program.

Another program which we have exported for use by other chemists is the PDP-11 CLEANUP program which was described in ANALYTICAL CHEMISTRY [48:1368, 1976]. This program "cleans up" new GC/MS data to eliminate noise peaks and to separate the data associated with components in the mixture.
In each case, the requestors were provided with an initial choice of format options from which they could select the one most suitable for their computer installation. They were asked to send a 2400 foot reel of magnetic tape appropriate to the selected format option. The programs were written on the tape and returned to them along with a brief written explanation of program organization. Accurate records are kept of who has received the programs, so that omissions and errors can be corrected by mail at a later date, if ever necessary.

1. Dr. James F. Elder, Dow Chemical U.S.A., Midland, Michigan.
2. Dr. Robert M. Supnik, Massachusetts Computer Associates, Inc., Wakefield, Massachusetts.
3. Mr. Dan Pearce, Orange County Sheriff-Coroner Department, Santa Ana, California 92702.
4. Dr. H. J. Stoklosa, Central Research & Development Department, E. I. du Pont de Nemours & Company, Wilmington, Delaware.
5. Dr. Douglas W. Kuehl, Environmental Research Laboratory-Duluth, Duluth, Minnesota.
6. Dr. Richard A. Graham, Food Sciences Laboratory, U. S. Army Natick Laboratories, Natick, Massachusetts.
7. Dr. Walter M. Shackelford, United States Environmental Protection Agency, Environmental Research Laboratory, Athens, Georgia.
8. Dr. Richard Gans, Chemical Research Division, American Cyanamid Company, Bound Brook, New Jersey.
9. Dr. John C. Marshall, Department of Chemistry, the University of North Carolina, Chapel Hill, North Carolina.
10. Dr. Graham S. King, Department of Chemical Pathology, Queen Charlotte’s Hospital for Women, London, England.
11. Dr. J. Wyatt, Chemistry Division, Naval Research Laboratory, Washington, D. C.
12. Dr. Gareth Templeman, Research and Development Laboratories, The Pillsbury Company, Minneapolis, Minnesota.
13. Dr. J. B. Justice, Department of Chemistry, Emory University, Atlanta, Georgia.
14. Dr.
Thomas Knudsen, Northrop Services, Environmental Sciences Group, Research Triangle Park, North Carolina.
15. Dr. Ingolf Meineke, Fachbereich Chemie, Philipps Universitaet, Lahnberge, West Germany.
16. Dr. M.A. Shaw, Unilever Research, Port Sunlight Laboratory, Wirral, Merseyside, England.
17. Dr. Ernst Weber, Varian MAT, Bremen, West Germany.
18. Paul V. Fennessey, Department of Pediatrics, University of Colorado Medical Center, Denver, Colorado.
19. R. G. A. R. Maclagan, Department of Chemistry, University of Canterbury, Christchurch, New Zealand.
20. James E. Oberholtzer, Arthur D. Little, Inc., Cambridge, Massachusetts.
21. F. Street, AEI Scientific Apparatus Limited, Manchester, England.

Remote Users of SUMEX

Because the SUMEX computer is available via both the TYMNET and ARPANET communication networks, scientists in many parts of the world can directly access the Dendral programs on SUMEX. Primary usage is centered on CONGEN, although INTSUM is also beginning to gain a following. Although access points to SUMEX are widespread, they frequently are not diverse enough to accommodate the dispersed group of scientists who have expressed an interest in using one of the Dendral programs. For example, Dr. Joseph Baker of the Roche Institute of Marine Pharmacology in Dee Why, Australia, is looking at the possibility of accessing SUMEX by using International Direct Distance Dialing (IDDD).

Chemists Communicating by Mail

Many scientists interested in using DENDRAL programs in their own work are not located near a network access point. Users of this type choose to use the mail to send details of their structure elucidation problem to a Dendral Project collaborator at Stanford.

Chemical Problems Posed to CONGEN

Following is a list of CONGEN users, and a brief summary of their program interests during the past year.

1. Dr. Roger Hahn, Syracuse University.
While at Stanford he used CONGEN to help solve the structures of photoproducts by obtaining all possibilities under available constraints and designing NMR experiments to differentiate the possibilities. This work will be published soon.

2. Dr. William Epstein, University of Utah. During a demonstration of CONGEN, he posed a problem to verify that the structural possibilities he determined for an unknown were in fact all possibilities. The structure of methyl santolinate has been published (see Epstein, et al., J.C.S. Chem. Commun., 590 (1975)).

3. Dr. Clair Cheer, University of Rhode Island. While on sabbatical at Stanford, Dr. Cheer has worked on a number of structure elucidation problems using CONGEN including Briareine D and [+]-Palustrol (Cheer et al., Tetrahedron Letters, 1807 (1976)). Work is continuing on the structure of another marine natural product, presumably a cembrenolide, for which there are currently seven possibilities.

4. Dr. Jerrold Karliner, Ciba-Geigy Corporation. Dr. Karliner has solved several structural problems using CONGEN, including material with flame retardant properties, an impurity in a production sample and nitrogen heterocycles being investigated for pharmacological activity. CONGEN enabled reduction of the number of possibilities to the point where subsequent experiments led to unambiguous structural assignment.

5. Dr. Gino Marco, Ciba-Geigy Corporation. He has used CONGEN to help solve structures of conjugates of pesticides with sugars and amino acids.

6. Dr. Milton Levenberg, Abbott Laboratories. He has worked on the structure of a compound with mild antibiotic activity, isolated from a fermentation broth. There are currently ten structural possibilities; additional experimental data reduced the number from the 33 initially determined using CONGEN.

7. Dr. David Pensak, DuPont. He is currently learning to use CONGEN and plans to evaluate its utility for structural problems of some of his coworkers.

8. Dr. Douglas Dorman, Eli-Lilly.
He is using CONGEN to assist in structure elucidation of metabolites of microorganisms shown to have pharmacological activity. He has worked on five such problems, including a current one where the developing MSPRUNE capabilities are being used.

9. Dr. L. Minale, Napoli, Italy. We have worked with him by sending him structural alternatives for proposed structures for some marine natural products (Pallescensins, Tetrahedron Letters, 1417 (1975)) and cyclic diethers from the lipid fraction of a thermophilic bacterium (J. C. S. Chem. Commun., 543 (1974)).

10. Dr. K. Nakanishi, Columbia University. We have worked with him by sending him structural possibilities for termite defense compounds (structure finally solved by X-ray crystallography). This trial plus a live demonstration to one of his students has resulted in efforts toward continued collaboration on other insect defense secretions and exploration of the possibility of his direct access to SUMEX.

11. Dr. L. Dunham, Zoecon Corporation. We have collaborated with him on the use of INTSUM for mass spectral fragmentation studies of insect juvenile hormones.

12. Dr. A. G. Gonzales, Tenerife, Spain. We have recently sent him structural alternatives for constituents of Laurencia perforata (Tetrahedron Letters, 2499 (1975)), and expect to continue discussions on the structures of these compounds.

13. Dr. T. Irie, Sapporo, Japan. We have recently sent him structural alternatives to published structures on constituents of Laurencia glandulifera (Tetrahedron Letters, 821 (1974)) and expect to continue discussions on this problem.

14. Dr. C. J. Persoons, Delft. We have corresponded with him on structural alternatives for cockroach sex pheromones (Periplanone-B (Tetrahedron Letters, 2055 (1976))), and he has agreed to further collaboration on new problems.

15. Dr. F. Schmitz, University of Oklahoma.
We explored for him structural alternatives for an unknown diterpenoid hydrocarbon. We obtained 25 possibilities, of which only four obeyed the isoprene rule.

16. Dr. J. Baker, Roche Institute of Marine Pharmacology, Australia. We plan collaboration with Dr. Baker on the sterol fractions of various marine organisms and are exploring ways for him to access CONGEN.

17. Dr. E. VanTamelen, Stanford University. We have used the developing reaction features of CONGEN to explore structural possibilities for both chemical and biogenetic cyclization products of squalene-oxide congeners. We have suggested alternatives to proposed structures and helped to design experiments to differentiate them.

18. Dr. J. C. Braekman, Brussels. Dr. Braekman visited Stanford as a part of continuing collaboration in marine chemistry with Dr. Tursch’s group. While at Stanford he explored use of CONGEN for current problems in marine natural products, and worked on the problems of Drs. Irie and Gonzales (see above). He is currently exploring access to CONGEN from Brussels, via TYMNET.

Use of CONGEN by working scientists has turned up one major area in which additional information to the user was thought to be necessary. CONGEN users unanimously indicated their desire for a method of determining what percentage of the whole problem was solved at any moment, i.e., what fraction of the total number of possible structures is represented by the number already generated. In a prototype system we have implemented the Cntrl-I and Cntrl-S user information interrupts, to show how far CONGEN has progressed. If, for example, someone who has generated 357 structures is told that this indicates that they have generated 1 percent of the total possible structures, they immediately know that they do not want to finish generating all the structures. Even if there were enough space, 40,000 structures would be far more than they would want to see.
We implemented another user-oriented facility for an invited paper presented at the 172nd American Chemical Society meeting, in August of 1976. Special features were added for character-oriented, screen-addressable CRT terminals to give users an informative visual interface to CONGEN, an otherwise complex program. The dynamic field of view provided by this type of terminal was used to advantage to give the chemist-user a continuous, graphic summary of both the information he has supplied to the program and the dynamic use of that information by the program.

INTERACTION WITH OTHER SUMEX-AIM PROJECTS

We have had numerous discussions with Prof. Todd Wipke’s research group in meetings of our combined groups. Because the problems of manipulating chemical graphs are much the same for both groups, frequent discussions are mutually advantageous.

Almost daily contact with other Stanford-based projects provides new ideas and programming assistance. In particular, there is considerable interaction with members of the MYCIN, MOLGEN and Protein Crystallography projects. Many of our experiment planning ideas have come from discussions with the MOLGEN group. Our ideas about explaining a program’s reasoning are derived from the success of MYCIN’s explanation package. And our ideas about integrating multiple sources of knowledge in data interpretation have been enhanced through discussions with the Protein Crystallography group. The large number of excellent INTERLISP programmers in all these groups also provides a pool of programming expertise that we draw on frequently.

We are collaborating with Dr. Robert Lindsay on a monograph about the DENDRAL programs, with most of our interaction and all our text preparation taking place over the SUMEX system. We have also discussed helping Dr. Lindsay with a knowledge-based reasoning program to help pathologists at the University of Michigan.
CRITIQUE OF RESOURCE SERVICES

Some problems have arisen as a result of the Dendral commitment to working with outside chemist users. The primary difficulty is that the Dendral project, as one of the many projects which use the SUMEX facility, is allocated a certain portion of system resources. Therefore, support of an extensive body of outside users means that resources to support these users must be diverted from the research goals of the project. In encouraging new users, Dendral must be careful to state that access to Dendral programs might have to be restricted in the future if system loading becomes extensive. Understandably then, some scientists are reluctant to invest time in learning to use a complicated, although potentially useful, program which they may well be able to use only on a temporary basis. One solution to this problem is to make the available programs as efficient as possible, and/or to make it possible to distribute copies of the program to other sites.

The interactive computing environment provided by the SUMEX-AIM resource and the power of the INTERLISP language give us the capability of building and debugging complex programs rapidly. These are the best tools currently available for AI research. Because these tools are almost always available on command, our researchers are working at the frontier of applied artificial intelligence. The SUMEX staff does an outstanding job of keeping the computer and peripheral devices running reliably: without this professional support we would not be able to build, enlarge, and test programs as complex as the DENDRAL programs.

The large number of persons who use the resource is our single biggest source of frustration. Several of the DENDRAL programmers work frequently from midnight to 8:00 a.m. just to avoid computing during the day.
Although this minimizes their interaction with the rest of the research group, it allows them to work on large, cycle-intensive programs without competing for resources during "prime-time" hours.

III. USE OF SUMEX DURING THE FOLLOW-ON GRANT PERIOD (8/78-7/83)

LONG-RANGE GOALS

Our primary goal is to build reliable, useful tools for biomolecular structure characterization and make them available for widespread use. The CONGEN program is farthest along in this respect. We will extend its scope and add features to make it easier to use, while working on the problems of increasing its availability. By building onto CONGEN we will develop a broader set of tools with capabilities for helping biomedical scientists in many ways. By increasing the generality of Meta-DENDRAL we intend to provide tools for model-directed learning from empirical data that will complement purely statistical tools.

At the same time as we are building tools, we are also exploring basic AI issues of knowledge representation, use, and acquisition in complex reasoning programs. These are fundamental issues for knowledge-based programs, such as those currently running on SUMEX.

JUSTIFICATION FOR CONTINUED USE OF SUMEX

The research goals and methods of the DENDRAL project fit well within the stated AIM criteria. We are building knowledge-based programs, and extending the art of applying AI to medicine to the benefit of both working biomedical scientists and other groups building similar tools.

We need the SUMEX-AIM resource for our work because of its excellent environment for symbolic computing. The interactive computing facilities and the features of the INTERLISP language on SUMEX give us a several-fold increase in productivity over our previous batch computing environment using LISP-360.
6.1.2 HYDROID PROJECT

HYDROID - Studies in Distributed Processing and Problem Solving

Prof. Gio Wiederhold
Computer Science and Electrical Engineering
Stanford University

I. Summary of Research Program

A. Technical Goals

The objective of this research is the development of a methodology for the analysis and implementation of alternatives in distributed processing and problem solving. One of the primary reasons for interest in this area is its potential to break through the speed limitation barriers imposed by uniprocessing systems. If such a breakthrough can be achieved, then the viability of the methods being developed by other projects using the SUMEX-AIM resource will be enhanced.

The rapid development of microprocessor and communications technology has given rise to a large number of proposed implementations of networks employing multiple processors. The computations to which these distributed systems are to be applied include heuristic decision-making problems, mathematical modelling, data reduction, and database search, as well as general purpose multi-access computing. There is, however, a lack of an adequate global understanding of the computational tradeoffs implied by network architectures.

In order to complement the experimental results of other investigators and broaden their applicability to the system-design decision-making process, we are developing a general framework for the study of processor interaction in distributed processing systems. The framework consists of rules to obtain parameters from programs which specify the computations, rules to parameterize descriptions of networks of processors, and procedures to calculate expected system performance from these parameter sets.
The framework is to be sufficiently powerful so that, when it is validated, the methods will be able to assist in the a priori assessment of the potential performance of new system alternatives or of systems with improved system components.

One of the primary tools we are using to analyze the interaction between computations and distributed processor networks is simulation. The behaviour of processor network nodes, interprocessor control and task flow, and problem decomposition all require simulation at different levels of abstraction. Analytic queuing models may provide insight into relationships in networks, but are not adequate to provide quantitative results. Simulation is not seen as the end product of the study, but as a means to develop and assess the validity of our model of the interaction of computations and processor network architecture. Where possible, mathematical results will be used to assess the validity of model simulations.

A number of large computational applications are being analyzed in order to assess their potential for decomposition into modules for distributed processing. The current candidate applications are:

a) Programs which use heuristic methods in decision-making. Heuristic programs frequently employ recursive decomposition of problems into subsidiary problems which themselves may be suitable for distributed processing.

b) Programs which use multi-faceted databases to retrieve and abstract information. The process of intelligent data retrieval and analysis often depends on data or knowledge sources which are being maintained at geographically distributed processing sites.

c) Programs which acquire data from multiple, possibly dissimilar, sensors and attempt to reduce this data to simpler hypotheses.

d) Programs which solve large numerical problems, such as those found in image processing applications.
Parameters which describe the computations to be simulated include:

The computational kernel size: the cycle and memory demand of a computational unit between interprocessor reference requirements.

The computation definition message size: the amount of data required to transmit sufficient information to initiate a computational kernel.

The database size: the amount of data or program text required to sustain a computational kernel, and its availability and residence in the network.

The behaviour of the system can be varied through the adjustment of other parameters. These parameters may be set to reflect the architecture of specific hardware systems, or may be varied to obtain optimum performance. In addition to obvious parameters (such as the number and power of the processors), we expect the following parameter types to be important in developing an understanding of the spectrum of distributed processor architectures:

a) Interconnection density. As the density decreases, the message delay and congestion increase. This parameter will provide a high level abstraction of multi-processor connectivity schemes. Geographical distribution will increase message delay and transmission cost.

b) Computational locality. A high degree of locality (of database or procedural information in the network) will enhance the probability that relevant knowledge exists in closely linked nodes, thus counteracting the effects of a low interconnection density.

c) Database viscosity. A database, including the programs required to carry out the computations at a node, may be more or less fixed to one specific node. This therefore encourages the use of certain nodes for specific functions. Many current processor networks are completely rigid in this sense, and for these networks optimal initial program and database allocations may be determined.
However, we hypothesize that a greater degree of dynamic resource allocation is desirable to cope with changing loads and to enhance reliability. For this reason this parameter needs to be included.

d) Redundancy. In order to assess the costs and benefits in terms of responsiveness and reliability, the redundancy of database and computations will also be made a parameter. In order to utilize the redundancy well, the computational resources (programs or data) which affect system performance most must be identifiable.

e) Error rate. In order to test the effectiveness of reliability strategies, node and communications channel failures will be simulated.

An important aspect of this model is that we intend to keep the abstractions at a sufficiently high level to allow analytic and intuitive verification of the model behaviour when applied to well understood computations. Computations have been mapped into specific parallel machines, but these results are not easily transferred to new architectures. The distributed processor systems now being built may have characteristics with unpredicted effects on system behaviour. We expect to be able to use the model to find potential bottlenecks, which then will define areas where extra design attention has a high payoff.

We do not intend to build hardware which is based literally on the abstract model. We hope to verify results obtained from the model using existing distributed processor systems and, assuming that our model (with appropriate parameters describing the load and architecture) matches the given system, be able to advise on system utilization or development aspects. A local resource of this type may be the Stanford I processor, now being built under ERDA sponsorship. In addition, if we determine that a certain, yet untried, architecture is promising, we would like to encourage and participate in its implementation.

B. Medical Relevance and Collaboration

Many applications at SUMEX consume large quantities of computational resources. The use of multiple distributed processors may provide a means to gain the required processing capabilities in an economic manner. In this sense the medical relevance of this study is indirect. We are attempting to develop tools which will be of use in medical computation problems.

Our studies in distributed data base applications have a more direct medical relevance. To this end, we are maintaining contact with Dr. Jim Fries, whose ARAMIS database network collects data for the analysis of disease progress and treatment efficacy in rheumatoid arthritis from a variety of institutions. Sharing of data to provide a broader base for analysis is also a feature of programs in cardiology and oncology in which physicians at Stanford participate. In each of these instances the distributed nature of the data resources leads to differences in the meaning of data items, so that simple aggregation of the data may not be valid. Distributed processing may provide a powerful alternative.

C. Progress Summary

The HYDROID project got underway in the fall of 1975. We have been involved since that time in developing a basic understanding of important problem areas in distributed processing and problem solving. A weekly research seminar, begun in Dec. 1976, has brought together members of the faculty and students from a variety of disciplines, and has included several speakers from application areas where distributed processing may be beneficial.

We have developed a formalism in which to express the control of distributed problem solving in loosely-coupled processor networks. This CONTRACT NET protocol makes the cost of interprocessor interactions explicit.
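The idea of making interaction cost explicit can be illustrated with a toy negotiation in which every message is counted. This is a minimal sketch under stated assumptions, not the formalism of HPP-77-12: the text does not spell out the protocol's message types, so the announce/bid/award cycle, the class names Node and Manager, and the use of current load as a bid are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """A processor node that can bid for work (hypothetical model)."""
    name: str
    load: int = 0  # assumed measure of how busy the node already is

    def bid(self, task: str) -> Optional[int]:
        # Assumption: a node bids its current load; lower bids are better.
        # Returning None would mean the node declines the task.
        return self.load


@dataclass
class Manager:
    """Hands out tasks and counts every message exchanged."""
    nodes: List[Node]
    messages_sent: int = 0

    def contract(self, task: str) -> Optional[Node]:
        # 1. Announcement: one message to each node in the network.
        self.messages_sent += len(self.nodes)

        # 2. Bidding: each willing node answers with one message.
        bids: Dict[str, int] = {}
        for node in self.nodes:
            offer = node.bid(task)
            if offer is not None:
                bids[node.name] = offer
                self.messages_sent += 1
        if not bids:
            return None

        # 3. Award: one message to the node with the lowest bid.
        winner_name = min(bids, key=bids.get)
        winner = next(n for n in self.nodes if n.name == winner_name)
        self.messages_sent += 1
        winner.load += 1
        return winner


manager = Manager([Node("A", load=2), Node("B", load=0), Node("C", load=1)])
winner = manager.contract("search subtree 1")
print(winner.name, manager.messages_sent)
```

With three nodes, placing one task costs three announcements, three bids and one award, seven messages in all. Counting messages this way is what lets a simulation weigh communication overhead against the computation gained by distributing the task.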
It is this cost which appears to generate one of the performance boundaries for distributed processor systems.

We have written a basic simulator with which to investigate the merits of the formalism together with problem solving methods applicable in the distributed processing environment. To this end the simulator is currently being tested with small search problems, as a means of determining the information that must be transferred from node to node in a distributed processor system for such problems, together with the advantages to be gained from a distributed approach. The simulator is being developed to cover a greater variety of computational interactions.

D. Publications

1) H. Garcia-Molina and Gio Wiederhold, "Application of the Contract Net Protocol to Distributed Data Bases", HPP-77-21, Heuristic Programming Project, Stanford University, April 1977.

2) R. G. Smith, "The Contract Net: A Formalism for the Control of Distributed Problem Solving", HPP-77-12, Heuristic Programming Project, Stanford University, February 1977 (also submitted to the Fifth International Joint Conference on Artificial Intelligence).

E. Funding

The HYDROID project is currently funded as part of ARPA Contract DAHC 15-73-C-0435. Other potential funding sources are currently being contacted for support of the specific areas of Hydroid application and interest.

II. Interactions with SUMEX-AIM

SUMEX-AIM currently provides all computing resources for the project. We thus enjoy a high degree of interaction with other projects involved in the problems which result from construction of large programs. Other points of contact are related to the use of the same programming languages as well as the abundance of AI expertise residing around the resource. This latter point is
especially important considering that one of our aims is discovery of suitable mappings of well understood AI methods onto highly parallel asynchronous processor networks.

SUMEX-AIM is also an excellent medium for informal transmission of reports, recent results and bulletins to users with related interests and problems. The powerful screen-oriented editors available greatly enhance our capabilities for writing both text and programs. Finally, the development of simulation programs generally requires a highly interactive computing environment, the sort of environment we feel is provided by SUMEX-AIM.

6.1.3 MOLGEN PROJECT

MOLGEN - An Experiment Planning System for Molecular Genetics

Prof. J. Lederberg (Genetics, Stanford)
Prof. N. Martin (Computer Science, U. of New Mexico)
Prof. E. Feigenbaum (Computer Science, Stanford)

I. Summary of Research Program

A. Technical Goals

The goal of the MOLGEN project is to develop an experiment planning system for the domain of molecular genetics. In order to accomplish this, we hope to create and apply innovative methods of knowledge management and hierarchical planning.

Experiments in molecular genetics are concerned with the study and manipulation of DNA molecules. The MOLGEN knowledge base will include both declarative and procedural information about such structures and the laboratory tools and techniques which experimental geneticists use. Also represented will be much of the strategic information required to join individual experimental steps into a meaningful whole.

We are using the uniform method of schemata for representation of all types of knowledge within MOLGEN. We believe this will facilitate knowledge acquisition and explanation and provide a consistent means of storing hierarchical and other relations among objects and rules in the system.
We hope to make the underlying knowledge base flexible enough to allow for experimentation with a wide variety of specific planning strategies.

B. Medical relevance and collaboration

Molecular genetics has at least two major connections to medical research. Learning about the basic mechanisms which control the operation and transmission of genetic information is necessary to understand and treat the wide range of diseases (and health conditions like aging) which are genetically controlled. Also, recent developments in molecular genetics offer the promise of using genetic mechanisms to produce essentially limitless amounts of drugs and other biomedical substances. The MOLGEN project will develop a system designed to aid the molecular geneticist in planning experiments of these types.

The MOLGEN project is a joint effort of the Computer Science Departments of Stanford and the University of New Mexico and the Genetics Department of Stanford. Major participants are Professor Nancy Martin of the University of New Mexico; Professor Edward Feigenbaum, Peter Friedland, Jonathan King, and Mark Stefik of Stanford Computer Science; and Professor Joshua Lederberg and Jerry Feitelson of Stanford Genetics.

C. Accomplishments

MOLGEN is in the first year of formal funding as an independent entity. We have devoted this year to learning and analyzing the basic knowledge of experimental molecular genetics and to building part of the central structure of the knowledge base management system. A wide variety of experiments have been studied with the aim of extracting knowledge about the genetic objects and operators used, as well as the higher-level knowledge used to form the overall experimental plan. The object level knowledge is currently being organized into the schemata formalism for an initial attempt at a molecular genetics knowledge base.
A representation method for DNA structures and an interactive structure editing and entry system (EDNA) have been built and tested successfully with geneticist users. Work is proceeding on the schemata storage and access routines and on routines for acquiring and editing the rules which describe the procedural knowledge of the domain. We plan to have the basic MOLGEN system operational for the purpose of testing object and operator knowledge (the practical goal of experiment checking) by the end of July 1977.

D. Publications

1) N. Martin, P. Friedland, J. King, M. Stefik, "Knowledge Base Management for Experiment Planning in Molecular Genetics," submitted to Fifth International Joint Conference on Artificial Intelligence.

2) M. Stefik and N. Martin, "A Review of Knowledge Based Systems as a Basis for a Genetics Experiment Designing System," Feb. 1977, Stanford CS Report STAN-CS-77-596, HPP-77-5.

3) N. Martin, P. Friedland, M. Stefik, "MOLGEN Knowledge Base I: Object System," to appear as HPP Working Paper.

4) N. Martin, P. Friedland, M. Stefik, "MOLGEN Knowledge Base II: Rule System," to appear as HPP Working Paper.

E. Funding

MOLGEN research is supported by NSF grants MCS76-11649 and MCS76-11935 for the two year period from June 1975 - June 1978.

II. Interactions with SUMEX-AIM

All system development has taken place on the SUMEX-AIM facility. We have used the system not only for programming, but also as a major aid in writing and transmitting among ourselves the wide variety of formal and informal reports which are necessary in the MOLGEN design phase. We believe the availability of good interactive text editing facilities like TV-Edit increases our productivity significantly.

Active collaboration with remote users at the University of New Mexico will begin in September 1977 (Prof. Nancy Martin has been visiting at Stanford this year). We expect this collaboration to occur over the ARPA network.
We hope also to maintain a collaboration with Dusko Ehrlich, formerly a Stanford geneticist and now doing research at The Institut de Biologie Moleculaire Faculte de Science in Paris, over a TYMNET link to SUMEX.

We have benefited enormously from the collected expertise in both knowledge-based systems and general programming and design problems available from other SUMEX-AIM projects. We have especially strong ties to the knowledge management expertise of the MYCIN project, but we also share common objectives with parts of the DENDRAL, SECS, and protein crystallography projects. We have also benefited from the intense interaction with many other projects at the AIM conferences.

Finally, we have provided small amounts of SUMEX resources to geneticist users as part of a quid pro quo relationship for helping us understand that subset of genetic knowledge necessary for our initial knowledge base. The most outstanding example of this sort of collaboration occurred with Prof. Larry Kedes’ group at the VA hospital in Palo Alto, who are using SUMEX to determine the feasibility of automated assistance in analyzing complex DNA base sequences.

6.1.4 MYCIN PROJECT

MYCIN - Computer-based Consultation in Clinical Therapeutics

S. N. Cohen, M.D. (Pharmacology) and B. G. Buchanan, Ph.D. (Computer Science)
Stanford University

I) Summary of research

Technical goals

The Mycin project is aimed at the development of a computer program capable of functioning as an expert consultant on a range of medical decision making problems. In particular, we have been working on the construction of a system that provides consultative advice on the diagnosis and therapy selection for a number of infectious diseases. Current areas of competence of the system include bacteremia and meningitis, and work is currently underway to extend this to urinary tract infections, pulmonary infections, and prophylactic use of antibiotics.
Our work has been guided by three fundamental objectives:

(1) A major goal of the MYCIN system has been to provide a computer-based therapeutic tool designed to be clinically useful, one that would be used eventually in the clinical setting. This goal requires development of a system that has a medically sound knowledge base, and that displays a high level of clinical competence in its field. The program must first convince clinicians of the quality of the information it is providing before they will be willing to use it.

(2) Since many clinicians are not likely to accept the advice provided by a computer-based system unless they can understand why the recommended therapy has been selected, the system has to do more than just give advice dogmatically. It should have the ability to explain the reasoning behind its decisions, and should be able to do so in terms that suggest to the physician that the program approaches the problem in much the same way that he does. This permits the user to validate the program’s reasoning, and modify (or reject) the advice if he believes that some step in the decision process is not justified. It also gives the program an inherent instructional capability that allows the physician to learn from each consultation session.

(3) A third major goal is to provide the program with capabilities that enable augmentation or modification of the knowledge base by clinical experts in infectious disease therapy, in order to improve the validity of future consultations. The system therefore requires some capability for acquiring knowledge by interacting with experts in the field, and for incorporating this knowledge into its knowledge base.

Three separate parts of the MYCIN system accomplish these goals. The consultation system uses the knowledge base, along with patient-related data entered by the physician, to generate therapeutic advice.
The explanation system has the ability to explain the reasoning used during the consultation, and to document the motivation for questions asked or the rationale for conclusions reached. Finally, the knowledge acquisition system enables experts in antimicrobial therapy to update MYCIN’s knowledge base, without requiring that they know how to program a computer.

We have also sought to use Mycin as a framework for understanding the process of medical decision making and the nature of clinical judgment. Physicians are constantly faced with the necessity of making decisions based on information that is both incomplete (missing historical data or test results not yet available) and inexact (results are rarely definitive). In addition, those decisions are often based on rules that are only approximate (e.g., "a gram-negative anaerobic rod in the blood is probably a Bacteroides"). But decisions are made despite these problems, and the results are often proven later to be valid. We have attempted to understand how this is done by developing in our system a parallel set of capabilities.

We have relied on the "production rule" encoding of information, in which individual decision rules are specified in an "if/then" format. For example, the rule indicated just above is encoded in the system as:

If 1) the gram stain of the organism is gram negative, and
   2) the morphology of the organism is rod, and
   3) the aerobicity of the organism is anaerobic,
Then there is suggestive evidence (.6) that the identity of the organism is Bacteroides.

This encoding of knowledge offers a number of advantages over some of the more traditional approaches to diagnosis like decision trees, Bayesian analysis, and utility theory. Unlike decision trees, it can deal with both inexact and incomplete information. Unlike the Bayesian and utility theory approaches, it does not need extensive amounts of conditional probability data.
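The if/then encoding described above can be sketched as a small data structure. This is an illustrative sketch only, written in Python rather than the INTERLISP used on SUMEX; the names Rule, applies, explain and consult are hypothetical, the premises are matched by simple equality, and MYCIN's actual handling of uncertain or partial evidence is not reproduced. Only the rule content (three premises, conclusion, and the .6 strength) comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """One independent decision rule: premises, a conclusion, a strength."""
    premises: Tuple[Tuple[str, str], ...]  # (attribute, required value)
    conclusion: Tuple[str, str]            # (attribute, concluded value)
    certainty: float                       # strength of the evidence

    def applies(self, facts: Dict[str, str]) -> bool:
        # Simplification: a premise holds only if the fact is known
        # and matches exactly; unknown facts make the rule not fire.
        return all(facts.get(attr) == value for attr, value in self.premises)

    def explain(self) -> str:
        # The rule itself doubles as the explanation of a conclusion.
        conditions = " and ".join(
            f"the {attr} of the organism is {value}"
            for attr, value in self.premises
        )
        attr, value = self.conclusion
        return (f"IF {conditions} THEN there is suggestive evidence "
                f"({self.certainty}) that the {attr} of the organism is {value}")


# The rule quoted in the text.
BACTEROIDES_RULE = Rule(
    premises=(("gram stain", "gram negative"),
              ("morphology", "rod"),
              ("aerobicity", "anaerobic")),
    conclusion=("identity", "Bacteroides"),
    certainty=0.6,
)


def consult(rules: List[Rule], facts: Dict[str, str]):
    """Return (conclusion, certainty, explanation) for every rule that fires."""
    return [(r.conclusion, r.certainty, r.explain())
            for r in rules if r.applies(facts)]


organism = {"gram stain": "gram negative",
            "morphology": "rod",
            "aerobicity": "anaerobic"}
for conclusion, certainty, why in consult([BACTEROIDES_RULE], organism):
    print(conclusion, certainty)
    print(why)
```

Because each rule is a self-contained value, adding knowledge means appending another Rule to the list, and the explanation for any conclusion is simply the text of the rule that produced it.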
A collection of independent rules is also far easier to augment than a complex decision tree; the rules thus provide a much more flexible body of knowledge to which new information is more easily added. The rules also make possible an explanatory capability: the system can justify any of its actions or decisions by displaying the relevant rules it invoked in reaching that decision. This provides an explanation that is far more comprehensible than any we might be able to provide by recapping the actions of a program based solely on statistical considerations.

A more specific goal of our research involves understanding the process of infectious disease diagnosis and therapy selection. This process is not as yet well understood, and we believe that by dissecting it down to individual decision rules, we can gain insight into how it works. In addition, the resulting set of rules may prove to be a useful compendium of knowledge about the task. Since we believe this set of rules will also be quite large, we are studying the problems of accumulating, managing, and using large stores of such task-specific knowledge. We are working on a range of techniques to provide capabilities like ensuring the consistency of the set of rules and making it easy to modify existing rules or add new ones.

Finally, since computer consultants are designed for use by people who might not otherwise make use of computers, we have devoted a great deal of attention to the issue of human engineering, and the "habitability" of the system. This ranges from such minor items as the automatic correction of misspelled answers, to the range of sophisticated explanation capabilities available.

Medical relevance and collaboration

A number of recent studies indicate a major need to improve the quality of antimicrobial therapy.
Almost one-half of the total cost of drugs spent in treating hospitalized patients is spent on antibiotics [1,2], and if results of a number of recent studies are to be believed, a significant part of this therapy is associated with serious misuse [2,3,4,5]. Some of the inappropriate therapy involves incorrect selection of a therapeutic regimen [4], while another serious problem is the incorrect decision to administer any antibiotic [2,4,5]. One recent study concluded that one out of every four people in the United States was given penicillin during a recent year, and nearly 90% of these prescriptions were unnecessary [6].

Other studies have shown that physicians will often reach therapeutic decisions that differ significantly from the decisions that would have been suggested by experts in infectious disease therapy practicing at the same institution. Nonexperts sometimes choose a drug regimen designed to cover for all possibilities, prescribing either several drugs or one of the so-called "broad spectrum" antibiotics, even though appropriate use of clinical data might have led to more rational and less toxic therapy.

Within a hospital environment in which professional resources are often overburdened, and in environments where expert sources are not readily available, a computer-based consultant will be highly useful. Such a system will also have broad fringe benefits in its educational impact on staff physicians and in providing a framework for quality control and peer-review evaluations.

Antimicrobial therapy appears to be an especially suitable area for the initial development of a computer-based system to assist physicians with decisions in clinical therapeutics. The components of the decision making process in antimicrobial therapy are more readily definable than in many other areas of medicine, and the consequences of the physician’s decision can usually be assessed in terms of the direct therapeutic action.
Nevertheless, the general approach used here is applicable to other areas of clinical decision making.

The basis of rational antimicrobial therapy decisions is identification of the microorganisms causing the infectious disease. Accurate identification is important because of the specificity of antibiotic action: drugs that are highly effective against certain organisms are often useless against others. The patient’s clinical status and history (including information such as prior infections and treatments) provide data that may be valuable to the physician in identifying the disease-causing organisms. However, bacteriological cultures that use specimens taken from the site of the patient’s infection usually provide the most definitive identifying information.

Initial culture reports from a microbiological laboratory may become available within 12 hours from the time a clinical specimen is obtained from the patient. While the information in these early reports often serves to classify the organism in general terms, it does not often permit precise identification. It may be clinically unwise to postpone therapy until such identification can be made with certainty, a process that usually requires 24 to 48 hours, or longer. Thus it is commonly necessary for the physician to estimate the range of possible infecting organisms, and to start appropriate therapy even before the laboratory is able to identify the offending organism and its antibiotic sensitivities. In this setting MYCIN plays two roles: (a) providing consultative advice that will assist the physician in making the best therapeutic decision that can be made on the basis of available information, and (b) by its questioning of the physician, pinpointing the items of clinical data that are necessary to increase the validity of the clinical decision.
Our project is an interdisciplinary effort involving computer scientists from the Stanford Computer Science Department, and clinicians from both the Department of Clinical Pharmacology at Stanford and the Department of Infectious Disease at the University of Arizona. The task of the clinicians has been to specify the decision rules necessary for diagnosis and therapy selection, while the computer scientists have been devising ways to represent and use this information in the computer. The system is then tested by the clinicians using real cases obtained from journals and medical records. A complete listing of the staff is given below.

Stanley N. Cohen, MD, Clinical Pharmacology
Bruce G. Buchanan, PhD, Computer Science
Stanton Axline, MD, Infectious Disease (now at University of Arizona)
Randall Davis, PhD, Computer Science
Frank Rhame, MD (to 9/75), Infectious Disease
Edward Shortliffe, MD PhD (to 6/76, returning 6/77), Infectious Disease
Victor Yu, MD, Infectious Disease
Rudolpho Chavez-Pardo, MD (to 9/75), Clinical Pharmacology
A. Carlisle Scott, MS, Computer Science
Sharon Wraith, BS, Clinical Pharmacology
Jan Aikins, BS, Computer Science
Robert Blum, MD, presently in Computer Science
William Clancey, AB, Computer Science
Larry Fagan, AB, Computer Science
William van Melle, AB, Computer Science

Progress Report

Period covered: June 1, 1974 through September 30, 1976

Summary

Over the past three years we have designed, built and partially evaluated a computer program capable of diagnosis and therapy selection for certain varieties of infectious diseases. The program is intended to function as a consultant, and "interviews" a doctor about his patient, requesting information on clinical findings and results of laboratory tests. It relies on a store of judgmental knowledge (obtained from experts in infectious disease) to determine the conclusions which can be drawn from the answers it receives. This judgmental knowledge is in the form of some 400 decision rules dealing with the wide range of topics that must be considered in determining the likely identity of causative organisms and selecting appropriate antimicrobials. MYCIN is composed of the three systems described earlier (the consultation, explanation, and knowledge acquisition systems), all of which reference the knowledge base of decision rules.

The program is currently capable of dealing with bacteremia and meningitis infections. It can diagnose the likely presence of more than 35 different organisms and can recommend therapy for 190 organisms, selecting drugs from a "pharmacopoeia" of 30 antimicrobials. The system can tailor its therapy recommendations to a specific organism and infection, can adjust dosage levels and durations in response to impaired renal status, and can combine drugs to create combination therapies, giving it a wide range of clinical applicability.

Detailed Report

Our work in the past several years has been organized around five main areas of investigation. We have

a) increased the system's competence in existing areas of clinical expertise while expanding its scope
b) developed a number of user-oriented features to increase the program's attractiveness to clinicians
c) developed a range of knowledge acquisition capabilities to speed the process of expanding the system's clinical competence
d) solved a number of technical problems to ensure that the program does not outgrow the computer resources available to it
e) evaluated the system's level of expertise.

Clinical Capabilities

Since the primary qualification for any clinical consultant is competence in the domain, we have devoted significant effort to expanding MYCIN's knowledge base and widening its scope of competence.
For instance, while the system was directed initially at patients with positive blood cultures, the basic methodology was generalized to support a much broader approach to the problem. MYCIN has now gained the ability to deal with infections from which the causative pathogen has not been isolated (e.g., pneumonia), or which have not even been cultured (e.g., brain abscess). With this broadening of scope, it has also become necessary to be able to evaluate the meaningfulness of isolates for cultures taken from sites other than blood. For urine and sputum isolates, for example, the system gained the ability to base its evaluation of sterility of an isolate on both the method of collection and the user's estimation of conscientiousness of collection.

An extensive review of the program's approach to drug selection has led to a major revision in the basis for therapy selection during the course of program development. The program was given the ability to consider both the infectious disease diagnosis and the significance of the organism as further determinants of therapy, in addition to organism identity. These three together have become the primary factors in drug selection, with drug toxicity and ecological factors as secondary considerations. The result is a more appropriate, more sharply focussed drug selection that also includes dose, route, and duration.

While the initial development of the knowledge base focussed on rules concerned with the diagnosis and therapy for blood infections (bacteremia), the complexity of infectious disease therapy and the frequent occurrence of multiple infections in a single patient require a broader knowledge if the system is to be clinically useful. In response we have extended MYCIN's knowledge base, while at the same time improving the degree of sophistication with which the system deals with bacteremia.
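The ordering of considerations in drug selection described above (primary factors first, toxicity and ecological factors second) can be pictured as a lexicographic sort. The following is only an illustrative sketch: the `Candidate` fields, the integer scores, and the drug names are hypothetical placeholders, and nothing here is taken from MYCIN's actual rules or data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical candidate drug for one organism (illustrative fields only)."""
    name: str
    primary_fit: int      # assumed score: fit to organism identity, diagnosis, significance
    toxicity: int         # assumed score: lower means less toxic
    ecological_cost: int  # assumed score: lower means less ecological impact


def rank_candidates(candidates):
    """Order candidates by primary fit first, then by the secondary factors."""
    return sorted(
        candidates,
        key=lambda c: (-c.primary_fit, c.toxicity, c.ecological_cost),
    )


if __name__ == "__main__":
    drugs = [
        Candidate("drug-A", primary_fit=2, toxicity=1, ecological_cost=1),
        Candidate("drug-B", primary_fit=3, toxicity=3, ecological_cost=2),
        Candidate("drug-C", primary_fit=3, toxicity=1, ecological_cost=2),
    ]
    for drug in rank_candidates(drugs):
        print(drug.name)
```

In this sketch the secondary factors only break ties among candidates with the same primary fit, which is one simple reading of "secondary considerations"; a weighted combination would be another.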
The second major area has been the diagnosis and treatment of meningitis, and more than 100 rules were added to provide the ability to deal with it. In the process the program was also extended beyond bacteria, as it gained the ability to consider and treat both fungi and viruses. This area has proved to be an especially useful domain because it has presented several new challenges. In particular, meningitis requires the ability to deal with a disease that is often diagnosed on clinical grounds alone, before any specific microbiological evidence is available. (By comparison, the diagnosis of bacteremia on clinical grounds alone is far less certain, and usually requires establishment of the fact that bacterial growth has occurred in blood cultures.) For this reason, extension of the project into the meningitis area has made it necessary for MYCIN to consider a larger range of clinical factors, and has resulted in a system which has a broader picture of the whole patient.

Other contributions to the system's competence have come from expansion of the knowledge base to include information about normal bacteriological flora for a wide range of culture sites. This enables the program to distinguish between normal and pathological flora, and it can as a result decide more precisely on whether to treat.

User Oriented Features

Clinicians traditionally shun computer programs, and we believe this is in large measure due to insufficient attention paid to user oriented features. As a result, we have devoted significant effort to ensuring that MYCIN is responsive to its users in a number of unique ways.

The development of the explanation and question answering capabilities has been essential for this work, and both have grown extensively in power. The system's ability to explain the motivations for its questions, for instance, underwent a major design revision.
It is now based on a more powerful approach that relies on the program's knowledge of its own control structure and ability to examine its own rules. The user can now fully explore the system's current line of reasoning, rather than just a single level, as initially implemented.

The language understanding capabilities of the question answering system have also been extensively revised. They now allow a broader range of questions to be asked and offer more precise answers. The use of this feature was also simplified so that the user no longer needs to classify his questions.

A comprehensive review of the kinds of questions asked by users of the system has led to a number of important features. MYCIN can now answer a much wider range of questions, and can, in particular, explain why it did not take a specific action, as well as why positive conclusions were reached. It is our feeling that capabilities such as these are of great importance in enabling the project's staff and clinical experts to understand the program's rationale for its actions in instances where its recommendations do not appear to be the most appropriate and most correct. Thus, the line of reasoning of the program can be evaluated, and requirements for new or modified rules can be uncovered. These kinds of capabilities are also important in optimizing user acceptance of the system.

A substantial addition to the question-answering facility enables the system to explain the process of therapy selection. In comparison to the diagnostic process, therapy selection is complicated somewhat by the need to consider a range of different factors simultaneously, such as the total number of drugs recommended, the degree of sickness of the patient, possible interactions between drugs, toxicity and other side effects, etc.
Despite this complexity, explanations of therapy selection are phrased at a conceptual level that makes them comprehensible to the physician. As before, this makes it possible for the physician to verify the validity of the system's decisions, and makes it clear to him that the system reaches its results in much the same way that he does.

The explanation consists of a step-by-step review of the reasoning which led to recommending a particular drug for a specific organism. It considers such issues as why a drug was first considered for an organism, why a drug may have been chosen as the best therapy for that organism, how the total number of drugs was reduced by considering common drug classes among the candidates, and whether there were possible contraindications based on the patient's allergies, age, and other factors. By characterizing each drug according to this scheme, the program can explain why a drug was or was not prescribed, as well as why one drug is to be preferred over another. This offers an important explanatory capability that will make the system more attractive and acceptable to clinicians.

Several capabilities have been added to make the program easy to use. The system is now more tolerant of erroneous or inappropriate responses, and is able to provide a reworded question, along with a list of acceptable answers. In addition, it has the ability to recognize responses which are not sufficiently precise, and can rephrase its questions accordingly.

We have recently added to the system the ability to modify drug dosage in cases of renal failure. Where, previously, the system only issued a warning to modify doses, it is now able to use either creatinine clearance or serum creatinine levels to compute the level of renal function. The program then uses drug-specific information (e.g., half-life, percent loss of the drug via renal excretion, etc.) to adjust the regimen.
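This report does not give the formulas MYCIN uses for renal adjustment, so the following is only a minimal sketch of the kind of computation involved, built from two standard textbook relations that are assumptions here: the Cockcroft-Gault estimate of creatinine clearance from serum creatinine, and a proportional adjustment factor Q = 1 - fe(1 - KF), where fe is the fraction of the drug excreted by the kidney and KF is the patient's clearance relative to normal. The function names and the normal clearance value of 120 mL/min are illustrative choices.

```python
NORMAL_CRCL_ML_MIN = 120.0  # assumed reference value for normal creatinine clearance


def estimate_creatinine_clearance(age_years, weight_kg, serum_creatinine_mg_dl,
                                  female=False):
    """Cockcroft-Gault estimate of creatinine clearance in mL/min (an assumption,
    not necessarily the estimate used by MYCIN)."""
    crcl = (140.0 - age_years) * weight_kg / (72.0 * serum_creatinine_mg_dl)
    return crcl * 0.85 if female else crcl


def adjustment_factor(crcl_ml_min, fraction_renally_excreted):
    """Fraction of the normal dosing rate suited to this level of renal function."""
    kf = min(crcl_ml_min / NORMAL_CRCL_ML_MIN, 1.0)
    return 1.0 - fraction_renally_excreted * (1.0 - kf)


def _check(factor):
    if factor <= 0.0:
        raise ValueError("adjustment factor must be positive")


def reduce_dose(normal_dose, factor):
    """Lower the dose and keep the normal dosing interval."""
    _check(factor)
    return normal_dose * factor


def extend_interval(normal_interval_hours, factor):
    """Keep the normal dose and lengthen the dosing interval."""
    _check(factor)
    return normal_interval_hours / factor


def dose_for_interval(normal_dose, normal_interval_hours, chosen_interval_hours, factor):
    """Pick the dose that gives the adjusted average dosing rate at a chosen interval."""
    _check(factor)
    return normal_dose / normal_interval_hours * factor * chosen_interval_hours


if __name__ == "__main__":
    crcl = estimate_creatinine_clearance(age_years=60, weight_kg=72,
                                         serum_creatinine_mg_dl=1.0)
    q = adjustment_factor(crcl, fraction_renally_excreted=0.9)
    print(crcl, q, reduce_dose(100.0, q), extend_interval(8.0, q))
```

The three helper functions correspond to three common ways of applying the same factor: lowering the dose, lengthening the interval, or fixing an interval and solving for the dose. The numeric inputs in the usage lines are made up for illustration and are not clinical guidance.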
It can either (a) adjust dose levels downward and leave dosing interval unchanged, or (b) increase dosing interval and leave levels unchanged, or (c) allow the physician to select a dose interval, for which it chooses an appropriate dose level. Since the problem of determining renal status and the proper adjustment of drug dose is important in the use of aminoglycoside antibiotics, cephalosporins, and other antimicrobial agents, the customization of drug dosage recommendations will be an important addition to the power of the system.

We have found, in addition, that there is a substantial amount of information that is routinely collected in every consultation, such as the date and site of each of the cultures, Gram stain and morphology results for each of the organisms that grew out, etc. Currently, the program exhaustively analyzes each culture and all of its organisms in turn. Some users of the program appear to be impatient with this method, and would much prefer to enter all the relevant data on all the cultures and organisms at once. This is faster and easier, since the information can be gathered in a single review of the chart, instead of having to review it several times as each culture is processed. In response to this, we have reorganized the consultation slightly, so that it is possible to enter all of this data at once, at the beginning. This offers two other advantages in addition to improving the program's acceptability to its users. First, it provides a basis for our future efforts to write rules which deal with interactions between infections (see below, "Specific Aims"), and second, it suggests a mechanism for eventually merging our work with the product of existing efforts to organize and automate the recording and handling of medical record data.
This latter development may in time make it possible for MYCIN to obtain a large part of the information it requires directly from such automated records, sharply reducing the number of questions it has to ask, and speeding up the consultation considerably.

Finally, several new capabilities make the system convenient to use, in anticipation of its evaluation in the clinical setting. Among these is the option for the user to type a comment about system performance at any time during the consultation. His comment is recorded in a special file which is reviewed periodically by our medical staff, and provides an on-going opportunity for users to offer feedback aimed at improving the usefulness of the system. The user can also indicate his belief that the system has "broken down" in some way, and he is invited to describe the problem. His description is saved along with information about the current state of the program, so that our systems programmers can deal with the problem later.

Knowledge Acquisition

A preliminary knowledge acquisition program was completed in the middle of 1974, and demonstrated the feasibility of having a physician teach the system new rules using a rather stylized subset of English. Building on the experience gained here, work began on a revised program designed to allow the user to examine and modify the program's knowledge and behavior as a single, unified action. This program was designed to make the explanation and knowledge acquisition capabilities available together, to make use of the fact that the nature of the explanations requested can give a clear hint about the content of a new rule. The program was also designed to advise the user about the effect of his rule on the original deficiency, indicating, for instance, whether or not it corrects the problem he noticed.

Work on a preliminary version of this new program was completed in 1976, making available a broad range of useful features enabling our clinical experts to add rules to the system without requiring that they have a knowledge of programming. If the expert finds that MYCIN's handling of a particular problem is at variance with his own expert knowledge, he can use the explanation capabilities to discuss the line of reasoning in use at that time, can add or modify rules in the knowledge base, and can determine the effects of the changes on MYCIN's subsequent performance. (Quality control is maintained on the overall system by regular meetings of our clinical and pharmacological experts, who determine the "official" MYCIN knowledge base.)

Technical Issues

As MYCIN's clinical capabilities have expanded, efficiency has improved as a result of a number of modifications to the system's technical capabilities. Early in our work, for instance, a comprehensive review and modification of the control structure was undertaken to improve efficiency and generality. The resulting program was both more direct and faster. More recently, modifications have been made so that the large English dictionary can be kept on the disk and accessed only as needed, rather than being kept in core, where it slowed the system's response. The self documenting features of the program have also been improved to make them faster, and the system's interaction with the terminal has been made more uniform, to prepare for the time when different users of the system may have different kinds of terminals.

Evaluation Activities

Since clinicians are likely to require documentation of MYCIN's competence and utility before seeking its advice, considerable time has been spent on evaluating the system and on implementing a range of program features to support these efforts.
In the past two years we have obtained many useful suggestions from clinicians when the system was presented at several different conferences. In February 1975 it was presented to the Western Society for Clinical Research, in September 1975 to the International Symposium on Clinical Pharmacy and Clinical Pharmacology, and more recently (June 1976) to the Drug Information Association.

A large scale formal study and evaluation of MYCIN's performance was begun in January 1976. The same set of clinical data was provided to both MYCIN and a set of experts in infectious disease therapy. [Five of the experts were nationally recognized authorities in the field; the other five were clinical fellows in the Infectious Disease Division at Stanford. A complete list of names, titles and affiliations is found in Appendix B.] The judgments of the program and the experts were compared, and the experts were asked to evaluate MYCIN's performance.

To do this, we first designed a form to allow us to separate the variables requiring analysis. The parameters evaluated include

A. the "quality" of the interaction: were any questions irrelevant or missing
B. the program's ability to determine organism identity
C. the program's ability to determine organism significance
D. the program's ability to select proper therapy
E. overall performance evaluation
F. potential impact as a clinical tool or teaching facility

The evaluation form was designed to be informative yet simple to complete. It was tested in a pre-evaluation trial run, then used for the formal study.
Consecutive patients with positive blood samples were evaluated for inclusion in the study by project personnel, until we obtained at least 10 patients for which MYCIN recommended therapy, and 15 patients overall. (Patients were rejected if they were outpatients when the sample was drawn, if they had a previous blood culture in the preceding seven days, or if they had a diagnosis of meningitis or infectious endocarditis.) For each of the patients accepted, a one to two page clinical summary was prepared and combined with a summary of the laboratory test data as of the time when the first blood culture was obtained. This information was then used to obtain a therapeutic evaluation from MYCIN.

Each of the participating experts received a set of fifteen evaluation forms (one for each patient). Each form contained: (a) the clinical summary and lab data; (b) space for the expert to record his conclusions about the nature of the infection, likely causative organisms, and appropriate therapy; and (c) a transcript of the MYCIN consultation along with space for the expert to record his opinion of various aspects of MYCIN's performance. By presenting the information in this order, we obtained a therapeutic regimen from the expert based on the same information supplied to MYCIN. This allowed us to compare the expert's answers to MYCIN's, and also gave us the expert's opinion of the system's performance.

In the past few months a sufficient number of the forms have been returned that we were able to do a preliminary analysis. The figures below are based on the nine (out of ten) which have been returned.
Since it is difficult to select a single number which summarizes performance, we have in general measured each of the parameters listed above in three ways: (i) the percent of instances in which the program was judged exactly correct, (ii) the percent of instances in which the program's performance was judged exactly correct or an acceptable alternative, and (iii) the percent of cases in which a majority of the experts judged its performance exactly correct or an acceptable alternative. By using all three measures, we obtain a range of figures which give a good picture of the program's performance.

All of these attempts to evaluate performance are complicated by the fact that (as expected) the experts' own choices about each patient were not unanimous. Thus, we cannot ask whether MYCIN's answers were "correct" in any absolute sense, since there was no agreement on what constitutes "correct". Instead, we ask how often each individual expert rated the program's responses as correct. But given the variation among experts themselves, the program can never be expected to reach 100%, and depending on the extent of the intra-group variation, the absolute limit may in fact be much lower. Thus the ideal question to ask is "Do experts rate MYCIN's performance correct at least as often as they rate each other's performance correct?" This would give a good indication of how close the system's performance was to that of the group of experts as a whole.

We have been able to do this in a few isolated cases, but in general it requires more information than we were able to collect. This is discussed in more detail below, but in general terms the problem is that we were able to ask each expert for his choices for each patient, and ask him to rate MYCIN's choices.
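The three measures (i) through (iii) defined above can be sketched in a few lines. This is a simplified illustration, not the study's actual tabulation: it assumes one judgment per expert per case, whereas some parameters in the study (organism identity, for example) involve several rated choices per case. The label strings and the `summarize` function are hypothetical names.

```python
from collections import defaultdict

EXACT, ACCEPTABLE, UNACCEPTABLE = "exact", "acceptable", "unacceptable"


def summarize(ratings):
    """Compute the three performance measures from expert judgments.

    ratings: list of (case_id, expert_id, judgment) tuples.
    Returns (pct_exact, pct_exact_or_acceptable, pct_cases_majority_ok).
    """
    total = len(ratings)
    exact = sum(1 for _, _, j in ratings if j == EXACT)
    ok = sum(1 for _, _, j in ratings if j in (EXACT, ACCEPTABLE))

    # Group judgments by case so a per-case majority can be taken.
    votes_by_case = defaultdict(list)
    for case_id, _, j in ratings:
        votes_by_case[case_id].append(j in (EXACT, ACCEPTABLE))
    majority_cases = sum(
        1 for votes in votes_by_case.values() if 2 * sum(votes) > len(votes)
    )

    return (
        100.0 * exact / total,
        100.0 * ok / total,
        100.0 * majority_cases / len(votes_by_case),
    )


if __name__ == "__main__":
    sample = [
        ("case1", "e1", EXACT), ("case1", "e2", ACCEPTABLE), ("case1", "e3", UNACCEPTABLE),
        ("case2", "e1", UNACCEPTABLE), ("case2", "e2", UNACCEPTABLE), ("case2", "e3", ACCEPTABLE),
    ]
    print(summarize(sample))
```

Note that the first two measures are taken over individual expert judgments, while the third is taken over cases, which is why the N for the third measure is so much smaller.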
But, without a second round of questionnaires, which would ask each expert to rate the acceptability of the other 9 experts' responses, we lack direct information about intra-expert variability. The figures below should be reviewed with this caveat in mind.

A. "Quality" of the interaction

To measure the first item, the experts were instructed to mark any questions in the consultation which they felt were irrelevant, and to note any questions which they felt were omitted by the system. Overall MYCIN did quite well, as there were no consultations in which a majority of the experts felt that any particular question was irrelevant or omitted. On the average, there were 0.53 questions judged irrelevant and 0.55 indicated as omitted.

Table I summarizes the next four measurements.

              | MYCIN 1st choice    | MYCIN 1st choice       | MYCIN 1st choice
              | identical to an     | identical to or an     | identical to or an
              | expert's 1st choice | acceptable alternative | acceptable alternative
              |                     | to an expert's 1st     | judged by a majority
              |                     | choice                 | of experts
--------------+---------------------+------------------------+-----------------------
ORGANISM      | 56.3%               | 75.6%                  | 81.8%
IDENTITY      | N = 414             | N = 414                | N = 11
--------------+---------------------+------------------------+-----------------------
ORGANISM      | 91.7%               | NA                     | 100%
SIGNIFICANCE  | N = 36              |                        | N = 4
--------------+---------------------+------------------------+-----------------------
THERAPY       | 12%                 | 75%                    | 91%
SELECTION     | N = 99              | N = 99                 | N = 11
--------------+---------------------+------------------------+-----------------------
OVERALL       | 17.0%               | 59.3%                  | 60.0%
PERFORMANCE   | N = 135             | N = 135                | N = 15

Table I
Summary of nine experts' responses to MYCIN's performance on 15 cases

B. Organism Identity

For organism identity, the experts were asked to rate each of MYCIN's selections as exactly correct (they agreed that the organism was likely to be present), an acceptable alternative (they had not chosen that organism, but agreed it might be present), or an unacceptable choice (they disagreed with its selection). Since 11 of the cases were not contaminants, and there was a total of 46 organisms chosen by the system, with 9 experts rating each of those choices, we have an N of 414 (46 x 9) for the first two columns and 11 for the third. In 56% of the instances the system's choices were identical to the experts', 75% of them were either identical or acceptable alternatives, and in 82% of the cases its results were acceptable to a majority of the experts.

In addition, the experts were asked to indicate which organisms they felt MYCIN had overlooked in its diagnosis. For the 11 non-contaminant cases, the experts indicated an average of only 0.35 organism identities that were overlooked by the system. In no case did a majority of experts feel that any particular organism had been overlooked, suggesting that even the 0.35 figure is a result of intra-expert variation.

C. Organism Significance

The first question on the evaluation form gave the expert a chance to indicate that he felt the patient did not need to be treated. The first column of the second row indicates the number of times the expert indicated no treatment was necessary for a case in which MYCIN also judged the organism to be a contaminant. (There is no number in the second column since we did not ask about a "close call" on whether or not to treat. In addition, the measurement is based only on the contaminant cases, since in many of the cases where both MYCIN and the expert determined that treatment was necessary, they based that decision on different organisms.
We felt that it would be misrepresentative to call these situations "agreements".) As the figures show, in only three out of 36 instances was there any disagreement with the system's decision on whether or not to treat.

D. Therapy Selection

The expert was asked to select therapy for the organisms which he felt were likely to be present before looking at MYCIN's therapy recommendation. He was then asked to judge MYCIN's choice of therapy for that patient. Since MYCIN was selecting therapy for the organisms which it felt were present (which may have differed from those chosen by the expert), this provides a fundamental comparison of performance: it compares the therapy selection performance of the two when they are faced with the same clinical situation.

This comparison is a difficult one to make, since it is complicated by the difficulty noted above, of variability in the experts' performance and the need to judge MYCIN with respect to that variability. Looking only at exact agreements (i.e., two identical therapies) produces the figure in the first column, which indicates that 12% of the time MYCIN's recommendation was identical to that of an expert. Comparing each expert's therapy choice with the other 8 indicates that 35% of the time (N = 396) any pair of experts chose identical regimens.

The experts were also asked to judge whether MYCIN's therapy was an acceptable alternative (if it was not identical to their own), producing the figure in the second column. This indicates that it was either identical, or they felt it was an acceptable alternative, 75% of the time. (Unfortunately, we have no reliable way of judging the intra-expert variability here, without a second round of questionnaires which asked each expert to rate the acceptability of the other experts' choices.) [As an alternative, we have attempted to develop a measure of how "far apart" two non-identical regimens are.
But the problem is difficult: for example, for gram negative rods with salmonella most likely, is gentamicin and chloramphenicol "very different" from gentamicin and ampicillin? We have been working on a "drug metric" to solve this problem, attempting to base the "difference" between two drugs on factors like organism susceptibility, toxicity, and drug efficacy, but this work is still in progress.]

The figure in the third column gives a crude overall measure of therapy selection performance, and indicates that in 91% of the cases (10 out of 11), a majority of the experts rated MYCIN's regimen as either identical to their own or an acceptable alternative.

[The evaluation form also asked each expert to choose a regimen for the organisms which MYCIN had selected. The intent here was to compare the system's performance against the expert when both were faced with the same set of organisms (rather than with the same clinical situation, as above). Unfortunately, inconsistent answers on the part of the experts indicated that they were not answering the question according to the instructions. It appeared that they were not able to suspend their own judgments about organism identity sufficiently to select a regimen based on MYCIN's organisms alone. For this reason, we believe the data to be unreliable, and have not included it here.]

E. Overall Performance

At the end of each evaluation form, the expert was asked to rate the system's overall performance as either excellent, good, fair, or poor. The first two columns of the last row indicate that 17% of these evaluations were "excellent", and almost 60% were either "excellent" or "good" (only 13% were "poor"). In 60% of the cases (9 out of 15), a majority of the experts felt that MYCIN's overall performance was either "excellent" or "good".

F. Present Utility and Future Potential

Finally, after completing the entire set of 15 patients, each expert was asked to rate MYCIN's present utility and future potential as a clinical tool and as an educational tool, rating it as having "considerable", "some", or "no" potential. The table below summarizes their response.

Evaluation of Present Utility

                  | "considerable" | "some" | "none"
------------------+----------------+--------+-------
clinical tool     | 11%            | 67%    | 22%
educational tool  | 11%            | 89%    | 0%

Evaluation of Future Potential

                  | "considerable" | "some" | "none"
------------------+----------------+--------+-------
clinical tool     | 11%            | 89%    | 0%
educational tool  | 67%            | 33%    | 0%

Table II
Opinions of 9 experts on MYCIN's present utility and future potential

To aid these evaluation efforts, we have also implemented a number of useful features in the system. For instance, MYCIN now keeps continuing statistics of the use of rules in its knowledge base. This will help us to monitor its long term performance, to study the interrelationship between rules, and perhaps detect automatically any inconsistencies or gaps in the knowledge base.

We have also designed and implemented a mechanism for "on-line" evaluation. At the end of each consultation, the system asks the clinicians who are using it a few questions about the quality of its performance. This interchange will be brief to avoid being a burden to the user, but it is expected to represent an important addition to the other evaluation efforts. It will, for instance, make possible a new form of evaluation of the system. Rather than using a series of "prepackaged" cases as was done in our initial evaluation, the next stage will be carried out using information entered at a terminal by the evaluator.
The participating panel of experts will be selecting patients in areas covered by the MYCIN knowledge base, and will engage in a dialogue with the system about those patients. Following completion of the session, the on-line evaluation feature will ask questions about system performance, and the responses will be tabulated and evaluated on-line by appropriate biostatistical programs. Specific recommendations which may point out problem areas in the consultation will be reviewed by our staff. By this process we expect to be able to maintain a continuing evaluation of MYCIN's capabilities in various areas, and pinpoint specific areas where performance is suboptimal.

MYCIN Project Publications

THESES

Davis R, Applications of meta level knowledge to the construction, maintenance, and use of large knowledge bases, Thesis: PhD in Computer Science, AI Memo 283, 304 pp, Stanford University, July 1976.

Shortliffe EH, MYCIN: A rule-based computer program for advising physicians regarding antimicrobial therapy selection, Thesis: PhD in Medical Information Sciences, Stanford University, Stanford CA, 409 pp, October 1974. Also, Computer-Based Medical Consultations: MYCIN, American Elsevier, New York, 1976.

PAPERS

Buchanan BG, Davis R, Yu V, Cohen SN, Rule-based medical decision making by computer, Proc. MEDINFO 1977, to appear.

Clancey W, Chronicler: an explanation system based on set-predicate representation of computational processes, submitted to 5th IJCAI.

Aikins JS, Use of models in a rule-based consultation system, short paper submitted to 5th IJCAI.

Davis R, Interactive transfer of expertise: acquisition of new inference rules, submitted to 5th IJCAI.

Davis R, Knowledge acquisition in rule-based systems: knowledge about representations as a basis for system construction and maintenance, to appear in Pattern Directed Inference Systems, Waterman and Hayes-Roth (eds.), Academic Press, in press.
Also to be presented at Pattern Directed Inference Systems Workshop, Honolulu, May 1977.

Davis R, Buchanan BG, Meta-level knowledge: overview and applications, submitted to 5th IJCAI, Cambridge, MA, August 1977.

Davis R, A decision support system for medical diagnosis and therapy selection, Data Base (SIGBDP newsletter), 8 (Winter 1977) pp 58-72.

Wraith S, Aikins J, Buchanan BG, Clancey W, Davis R, Fagan L, Scott AC, van Melle W, Yu V, Axline S, Cohen S, Computerized consultation system for selection of antimicrobial therapy, American Journal of Hospital Pharmacy, 33 (December 1976) pp 1304-1308.

Scott AC, Clancey W, Davis R, Shortliffe EH, Explanation capabilities of knowledge based production systems, American Journal of Computational Linguistics, Microfiche 62, 1977. Also, HPP Memo 77-1, Stanford Computer Science Department, February 1977.

Shortliffe EH, Davis R, Some considerations for the implementation of knowledge-based expert systems, SIGART Newsletter, 55:9-12, December 1975.

Davis R, Buchanan B, Shortliffe EH, Production rules as a representation for a knowledge-based consultation system, Artificial Intelligence, 8 (Spring 1977) pp 15-45. (Also, AI Memo 266, Stanford University, October 1975).

Davis R, King J J, An overview of production systems, in Elcock and Michie (Eds.), Machine Intelligence 8: Machine Representations of Knowledge, John Wiley, to appear, 1977. (Also AI Memo 271, Stanford University, October 1975).

Shortliffe E H, Judgmental knowledge as a basis for computer-assisted clinical decision making, Proceedings of the 1975 International Conference on Cybernetics and Society, pp 256-7, September 1975.

Shortliffe E H, Axline S, Buchanan BG, Davis R, Cohen S, A computer-based approach to the promotion of rational clinical use of antimicrobials, in Gouveia, Tognoni and Van der Kleijn (Eds.), Clinical Pharmacy and Clinical Pharmacology, pp 259-274, Elsevier/North Holland Biomedical Press, 1976.

E H Shortliffe, R Davis, S G Axline, BG Buchanan, C C Green, S N Cohen, Computer-based consultations in clinical therapeutics: explanation and rule acquisition capabilities of the MYCIN system, Computers and Biomedical Research, 8:303-320 (August 1975).

E H Shortliffe and B G Buchanan, A Model of Inexact Reasoning in Medicine, Mathematical Biosciences 23:351-379, 1975.

Shortliffe EH, Rhame F S, Axline S G, Cohen SN, Buchanan BG, Davis R, Scott A C, Chavez-Pardo R, and van Melle W J, MYCIN: A computer program providing antimicrobial therapy recommendations (abstract only). Presented at the 28th Annual Meeting, Western Society For Clinical Research, Carmel, CA, 6 Feb 1975. Clin. Res. 23:107a (1975). Reproduced in Clinical Medicine, p. 34, August 1975.

Shortliffe EH, MYCIN: A rule-based computer program for advising physicians regarding antimicrobial therapy selection (abstract only), Proceedings of the ACM National Congress (SIGBIO Session), p. 739, November 1974. Reproduced in Computing Reviews 16:331 (1975).

E H Shortliffe, S G Axline, BG Buchanan, S N Cohen, Design considerations for a program to provide consultations in clinical therapeutics, Presented at San Diego Biomedical Symposium 1974 (February 6-8, 1974).

E H Shortliffe, S G Axline, B G Buchanan, T C Merigan, S N Cohen, An artificial intelligence program to advise physicians regarding antimicrobial therapy, Computers and Biomedical Research, 6:544-560 (1973).

Articles About MYCIN

"Which Antibiotic?" Emergency Medicine, January 1977, pp 152-162.

Current Funding

Mycin is currently in the last year of a three year grant (HS-01544, Dr. Stanley Cohen, principal investigator) from the Bureau of Health Sciences Research and Evaluation. The grant is for $149,982, and expires May 30, 1977.

Applications pending

A two year renewal of HS-01544 has been submitted to begin June 1, 1977, for $140,000 (direct costs) for the first year. A site visit has been held and the proposal approved, but a decision on funding is still pending.

A grant from NSF (Dr. Bruce Buchanan, PI) has been approved for two years, to begin June 1, 1977, for $50,000 a year (direct costs).

A joint application (with Dr. Jon Heiser of UC Irvine) is currently pending with the Biomedical Engineering Division of NIH. The Stanford part of the grant (Dr. Bruce Buchanan, PI) requests a total of $145,751 over 3 years ($46,609 in the first year), to begin June 1, 1977. Dr. Heiser’s budget requests $147,655 over 3 years ($46,423 in the first year), to begin July 1, 1977.

A 5-year proposal to the Biotechnology Resources Program is being prepared for submission by June 1, 1977.

II) Interactions with Sumex-AIM resource

Collaborations and medical use of programs

Dr. Jon Heiser

We have been working with Dr. Jon Heiser of the Department of Psychiatry of the University of California at Irvine, in an effort to create a consultant for the use of psychoactive drugs. We began by creating a version of Mycin that had all of the infectious disease knowledge removed from it, and showed Dr. Heiser how to build up the required base of knowledge about the new field. He has, with his students, developed a small but functional system that demonstrates encouraging performance on the task.
Work has now begun in earnest to extend the competence of this pilot system, to produce a consultant with a useful level of performance. It is interesting to note that the explanation capabilities required no modification whatever, and worked in the new system exactly as designed for the original system, despite the change in domains.

INTERNIST Project

The Sumex computer has made possible a valuable interaction between researchers on the MYCIN project at Stanford University and those working on the INTERNIST project at the University of Pittsburgh. These researchers are studying the possible representations and uses for disease models in a medical diagnosis system. Both research groups have been able to run each other’s programs and to study the medical knowledge bases which are stored on the Sumex computer. Communication between project members has also been greatly facilitated through use of the Sumex system.

Stanford Infectious Disease Faculty

Dr. Victor Yu of our group has been actively soliciting the involvement of the Stanford ID faculty in the development and evaluation of Mycin. He recently presented the system to the faculty and fellows of the Department, and has been seeking ways to involve the system in the Department’s educational activities. For instance, medical students under his supervision have used the system during their ID rotation, comparing its results and reasoning process with their own on problems encountered in patients on the wards.

The Pulmonary Function Facility

Members of the Mycin project have also been collaborating with Dr. John Osborn and his co-workers of the Presbyterian Hospital/Pacific Medical Center in San Francisco on the development of a program to interpret the results of standard pulmonary function tests.
The program is designed to perform a range of tasks, including: identifying the need to repeat tests because of poor patient effort; identifying the need for additional information in order to make a more definitive diagnosis; reporting and explaining the reasons for primary and secondary diagnoses and severity of any disease state; identifying the relation between diagnosis and any referral diagnosis; and interpreting any change from previous tests, or limitations on the interpretation because of the test methodology and the patient effort.

Sharing with other projects

Groups at Rutgers University, the University of Pittsburgh, Rochester University, and the University of Virginia Medical School have all been involved in varying degrees with running Mycin and evaluating its performance. They have suggested improvements in its design and stock of medical knowledge, and made useful contributions to its development.

In addition, we have made use of the programs developed at both Rutgers and Pittsburgh. The former has been instructive to us in its handling of dynamically changing situations, while the latter has helped us to develop our own ideas about the modelling and use of prototypical descriptions of disease states.

The Molgen group at Stanford has also profited from much of our experience in acquiring knowledge and building large knowledge bases. Several of their techniques for accumulating knowledge about genetics are based on extensions to ideas first suggested in some of our work.

In all of these cases, the use of Sumex as a national resource has clearly been a critical factor in making possible this sort of interaction.

Critique of resource services

Local management of the existing resources has been carried out in exemplary fashion.
The utility of the facilities has consistently increased, as a direct result of the staff's efforts to identify and respond to needs of the user community. They have actively sought out user comments on current and future services and developed programs to support the research work of the community. In particular, the numerous programs for file editing, searching, manipulation, and storage allocation have helped both in data and program management, and in making the best use of available disk storage.

There are, however, additions to the existing resources that would help overcome shortcomings in the available services. In particular, we feel that the addition of more main memory to the system would be an important investment with a significant payoff. First, with the increasing size of the user community, the typical daytime load on the system has increased to the point where running anything but the smallest program requires substantial patience. Second, our project, like several others, is LISP-based, and uses a large address space. Such programs receive lower priority from the scheduler, and especially with the recently changed scheduling algorithm, our effective service level has decreased significantly. The addition of more main memory would ease both of these problems considerably for a number of users.

The addition of more disk space would also be an important improvement in the existing facilities. While it is typically true that disk usage can expand to meet the storage available, we feel that once again the growth of the user community has put a strain on the available resources. We have made extensive use of the archiving facilities, and feel that additional disk space would contribute to the system’s utility.

As noted above, the recently revised scheduling algorithm has also made its impact felt. We have seen our effective service level on the system decrease, as compared to the amount of service we had been getting at a given load average.
While we recognize the national scope of the Sumex charter, and the importance of providing adequate service to the whole community, there are a number of major projects located at Stanford. The majority of large projects are thus competing for the same share of the system. It seems unreasonable for, say, three sizable LISP programs to be competing for the same part of the machine, just because they are at Stanford, while a single remote user is receiving nearly all the remaining resource. We recognize the desirability of keeping Sumex a national resource, but wonder if there is a way this can be done without penalizing systems just because they originate at Stanford.

Finally, there is a smaller scale project which would also make a substantive contribution to the utility of the resource. Currently a program called PUB is the major text formatting ("word processing") program in use. It is something of an historical relic, and is quite large, not totally reliable, and rather difficult to use. It is remarkably powerful, but most users make relatively little use of its more impressive powers. Since preparing technical reports, progress reports, and thought-pieces on proposed or in-progress work are all an integral part of doing research, facilities that ease the task can make an important contribution to the progress of work. A new program, designed along the lines of PUB, but much smaller and of proven reliability, would be an important contribution to the research efforts of the community. It would require on the order of one man-year to create, but given the anticipated drain on system resources presented by the amount of technical writing done by the community, this investment would quickly be paid back many times over.

III) Follow-on

Long range project goals

The long-term goals of our project center around further development of our ideas on computer-based medical consultants.
We intend, for instance, to extend both the depth and breadth of the system’s range of competence. The extension in breadth will be an important demonstration of the power of the approach we have taken, since the problem of scale is a traditional pitfall that has trapped a number of other efforts in AI. We believe that our techniques provide the basis for continued effective performance, even with a much larger knowledge base that handles a wider scope of medical problems. This can only be tested, of course, by actually enlarging the knowledge base and widening the program’s scope.

By extending the "depth" of the program’s competence, we mean dissecting still further the concepts on which its judgments are based. The current system, for instance, asks the doctor if the patient is "febrile due to the infection". In practice, this is a difficult judgment to make, and it is precisely on such difficult judgmental issues that Mycin should be able to offer assistance. By asking our clinicians to specify how they decide that a patient is febrile due to an infection, we can break down this vague notion into a number of distinct decision rules. The resulting program will make fewer demands on the user, and hence will offer a more effective source of consultative advice.

We also believe that the best hardware for many AI research efforts lies in the direction of independent minicomputers arranged as satellites to a central system, and capable of running high level languages (like LISP). A second of our long-term goals, then, is to develop a version of our program capable of running on such a system. Since there are currently a number of efforts aimed at developing both high level languages for mini-machines, and minicomputer architectures capable of running high level languages, Sumex could benefit substantially from this work if the AIM Committee begins now to plan to take advantage of these developments.
We also plan to extend the generality of the system we have developed, to make it possible for experts in other medical (and medically-related) areas to use it as a framework for assembling their own set of decision rules, to create consultants for their own specialties. We have already attempted several pilot studies along these lines (the work with Dr. Heiser on psychopharmacology, and with Dr. Osborn on pulmonary function). Each of these has demonstrated to us a number of generalizations that our current techniques require. We plan to make these changes, and continue to develop a system usable by a wide range of specialists, as part of our interest in the art of building expert systems.

A necessary parallel development to this will be improvements in the rule-based representation of knowledge and a better understanding of the process of clinical decision making. While our decision rules offer a number of advantages, we have also seen some drawbacks in them, and plan to work on overcoming the problems without losing the advantages they offer. Our present model of decision making under uncertainty is still elementary and intuitive -- further work is needed to make it more formal and ground it firmly in well understood principles. This will also facilitate work on other problems, such as checking the internal consistency of the entire set of rules.

Justification

Our project is concerned with a range of problems that are central to both medical care and AI research. Earlier sections of this report covered the significance of the specific problem of antibiotic misuse. More generally, the problem of medical decision making is one that has received much attention, and has not yet yielded to a definitive solution. The availability of computer-based advisors for difficult clinical problems would be a useful step in combatting the current maldistribution of specialists.
With network links to centralized machines, or mini-machines inexpensive enough to be exported as a unit, hospitals in outlying rural areas might have available a sophisticated source of medical advice.

The development of computer-based consultants is a mainstream issue in AI research. Its specific goals are to produce expert performance on a "real world" problem, and to make that expertise available to users who might not normally be involved with computers. Producing a system that both offers high performance and presents a reasonable interface to the user means solving a difficult problem with a number of constraints. High performance alone is not enough, since the system must be usable by a computer-naive audience. This means more than simply reasonable I/O facilities, and implies the need for such things as the explanatory capabilities currently a part of Mycin.

More generally, the issue of accumulating, representing, and using large stores of task-specific knowledge is an important thrust of current AI research. Ever since the failure of the original GPS-type approach to problem solving (in which problem solving power comes from a single, domain-independent paradigm), interest has been focussed on the use of large stores of domain-specific knowledge as a source of high performance. This has been a primary theme of the work on Mycin from the outset, and our efforts have produced a number of insights about the design and construction of such systems. We have emphasized, for instance, the importance of keeping a sharp distinction between the base of task-specific knowledge and the interpreter which uses that information to solve problems. This design pays off both by easing the task of building the knowledge base, and by increasing the range of applicability of the underlying system (i.e., different knowledge bases can be "plugged in" to the same underlying system).
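The separation between knowledge base and interpreter described above can be illustrated with a minimal sketch. It is not Mycin's implementation: Mycin reasons backward from goals and attaches certainty factors to its rules, whereas this sketch uses plain forward chaining for brevity. The names `Rule`, `run_interpreter`, `INFECTION_KB` and `PSYCH_KB`, and the toy rules themselves, are hypothetical placeholders and carry no medical content.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set


class Rule(NamedTuple):
    premises: FrozenSet[str]  # facts that must all be known for the rule to fire
    conclusion: str           # fact added when the rule fires


def run_interpreter(rules: List[Rule], facts: Iterable[str]) -> Set[str]:
    """Apply rules repeatedly until no rule adds a new fact (forward chaining)."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.premises <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known


# Two toy knowledge bases (hypothetical placeholders, not medical knowledge).
INFECTION_KB = [
    Rule(frozenset({"fever", "positive_culture"}), "infection_suspected"),
    Rule(frozenset({"infection_suspected"}), "recommend_therapy_review"),
]
PSYCH_KB = [
    Rule(frozenset({"symptom_a", "symptom_b"}), "condition_x"),
    Rule(frozenset({"condition_x"}), "recommend_drug_review"),
]

# The same interpreter runs unchanged over either knowledge base.
infection_result = run_interpreter(INFECTION_KB, {"fever", "positive_culture"})
psych_result = run_interpreter(PSYCH_KB, {"symptom_a", "symptom_b"})
```

The interpreter never mentions a domain term; moving to a new specialty means supplying a different list of rules, which is the sense in which a knowledge base is "plugged in".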
Finally, a number of other projects have been "spun off" as a direct result of ours. The pulmonary function work and the work by Dr. Heiser’s group are both outgrowths of Mycin, and have both begun to produce their own substantive results.

Future resource goals

As noted earlier, we see the development of minicomputers that run high level languages as an important future trend that will affect much of the work in AI. We believe it will be especially advantageous for Sumex to take advantage of these developments. Adding a small number of these minicomputers as satellites to the main system would present a number of important advances. First, many of the research efforts currently underway involve large, LISP-based programs that significantly impact the system load. By providing satellite machines to which those large systems could be shifted, the system load would lighten considerably and the large systems would themselves run much faster. Second, it would mean more efficient use of resources, since adding these satellite systems would require little or no additional tapes, disks, printers, etc.

Finally, many projects are in a situation parallel to ours, in that work proceeds on two fronts simultaneously. On one hand, new ideas are being generated about how a program should work, or what tasks it might perform. These are implemented and tried out in a test version of the program. On the other hand, once those ideas prove practical, there is often an extensive period of development that requires a more stable version of the program. The architecture suggested here, of a main system with satellite machines, offers an excellent environment for this work, since smaller test versions of a program can be used as a "proving ground" on the main machine, while the larger, stabilized versions are further developed by running them on the satellite machines.
This sort of arrangement is most effective when transition between systems is almost invisible -- that is, when little or nothing need be done to shift from the central machine to a satellite. This is easiest to do when there are high-bandwidth data links between machines, and satellite machines capable of running the same programming language as the central machine.

We believe it would be important to provide Sumex support for both the software and the hardware problems involved in creating this sort of environment. One effort in this direction (Mainsail) is currently underway, and parallel efforts at other locations are involved in producing a version of LISP that will run on small machines. While there is no need to duplicate these latter efforts, we feel it would be important for Sumex to stay closely coupled to them, so that their results can easily and quickly be implemented here. Given the number of projects which could make significant use of these results, and the impact those projects currently have on the system, we believe the investment in time and effort would pay off quite well.

References

[1] Reiman H H, D’Ambola J, The use and cost of antimicrobials in hospitals, Arch Environ Health, 13:631-636 (1966).

[2] Kunin C M, et al., Use of antibiotics: a brief exposition of the problem and some tentative solutions, Ann Int Med, 79:555-560 (1973).

[3] Sheckler WE, Bennett J V, Antibiotic usage in seven community hospitals, J Amer Med Assoc, 213:264-267 (1970).

[4] Roberts A W, Visconti J A, The rational and irrational use of systemic antimicrobial drugs, Amer J Hosp Pharm, 29:828-834 (1972).

[5] Simmons H E, Stolley P D, This is medical progress? Trends and consequences of antibiotic use in the United States, J Amer Med Assoc, 227:1023-1026 (1974).

[6] Kagan BM, Fanin S L, Bardie F, Spotlight on antimicrobial agents, JAMA, 226:306-310 (1973).

6.1.5 PROTEIN STRUCTURE PROJECT

Protein Structure Modeling Project

Prof. J. Kraut and Dr. S. Freer (Chemistry, U. C. San Diego) and Prof. E. Feigenbaum and Dr. R. Engelmore (Computer Science, Stanford)

I. Summary of research program

A. Technical goals

The goals of the protein structure modeling project are to 1) identify critical tasks in protein structure elucidation which may benefit by the application of AI problem-solving techniques, and 2) design and implement programs to perform those tasks. We have identified two principal areas which have both practical and theoretical interest to both protein crystallographers and computer scientists working in AI. The first is the problem of interpreting a three-dimensional electron density map. The second is the problem of determining a plausible structure in the absence of phase information normally inferred from experimental isomorphous replacement data. Current emphasis is on the implementation of a program for interpreting electron density (e.d.) maps.

B. Medical relevance and collaboration

The biomedical relevance of protein crystallography has been well stated in a recent textbook on the subject (Blundell & Johnson, Protein Crystallography, Academic Press, 1975):

"Protein Crystallography is the application of the techniques of X-ray diffraction ... to crystals of one of the most important classes of biological molecules, the proteins. ... It is known that the diverse biological functions of these complex molecules are determined by and are dependent upon their three-dimensional structure and upon the ability of these structures to respond to other molecules by changes in shape. At the present time X-ray analysis of protein crystals forms the only method by which detailed structural information (in terms of the spatial coordinates of the atoms) may be obtained.
The results of these analyses have provided firm structural evidence which, together with biochemical and chemical studies, immediately suggests proposals concerning the molecular basis of biological activity."

The project is a collaboration of computer scientists at Stanford University and crystallographers at the University of California at San Diego (under the direction of Prof. Joseph Kraut) and at Oak Ridge National Laboratories (Dr. Carroll Johnson).

C. Progress summary

During the past year we have been designing and implementing a system of programs for interpreting three-dimensional e.d. maps. Progress has been made by attacking the problem from two directions: working upward from the primary data (i.e. the array of e.d. values) to higher level symbolic abstractions, and working downward from the given amino acid sequence and other experimental information to generate candidate structures which can then be confirmed by the abstracted data.

In the "bottom-up" area of research we have developed and implemented programs for analyzing topological features of the skeletonized e.d. map in terms of protein structural elements (e.g., side chains, chain ends, bridges, etc.), for finding local maxima, and, recently, for generating a critical point network, i.e. a three-dimensional spanning tree which connects all critical points (peaks, saddle points) found in the map.

In the "top-down" area we have designed and implemented, in INTERLISP, a structure inference program which generates structural hypotheses at several levels of detail. At present the program can infer, from the amino acid sequence and other chemical information, and the symbolic abstractions of the e.d. map, the location of heavy atoms, cofactors and chain ends. Those features provide toeholds, i.e. islands of certainty, from which additional structure is inferred by extension.
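As a rough illustration of the critical point network mentioned in the bottom-up work above, the sketch below builds a spanning tree over a handful of three-dimensional points. It rests on assumptions the text does not state: edges are weighted by Euclidean distance and the tree is a minimum spanning tree built with Prim's algorithm. The project's own programs may well have chosen edges differently, for example by following ridges of electron density. The function name `spanning_tree` and the sample coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def spanning_tree(points: Sequence[Point]) -> List[Tuple[int, int]]:
    """Return edges (i, j) of a minimum spanning tree over 3-D points (Prim)."""
    n = len(points)
    if n == 0:
        return []
    in_tree = {0}                      # indices already connected
    edges: List[Tuple[int, int]] = []
    while len(in_tree) < n:
        best = None                    # (distance, tree index, new index)
        for i in in_tree:
            for j in range(n):
                if j in in_tree:
                    continue
                d = math.dist(points[i], points[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        in_tree.add(j)
        edges.append((i, j))
    return edges


# Hypothetical critical points (peaks and saddle points) in map coordinates.
critical_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
network = spanning_tree(critical_points)
```

A tree over n points always has n - 1 edges and contains no cycles, which is what makes it a convenient skeleton for later symbolic interpretation.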
Work is currently in progress on identification of the main chain, disambiguation of multiply connected regions and classification of side chain regions.

The system under development is knowledge-based. Both the corpus of knowledge of the task domain and the problem-solving strategy knowledge are incorporated as production-like rules.

D. List of Publications

1) Robert S. Engelmore and H. Penny Nii, "A Knowledge-Based System for the Interpretation of Protein X-Ray Crystallographic Data," Heuristic Programming Project Memo HPP-77-2, January, 1977. (Alternate identification: STAN-CS-77-589)

2) E.A. Feigenbaum, R.S. Engelmore, C.K. Johnson, "A Correlation Between Crystallographic Computing and Artificial Intelligence," in Acta Crystallographica, A33:13, (1977). (Alternate identification: HPP-77-25)

E. Funding status

The project recently received a renewal of its funding from the National Science Foundation. The new research period began on May 1, 1977, and is for a two year period at a funding level of $75,000 per year. No other applications are pending.

II. Interaction with the SUMEX-AIM resource

A. Collaborations

The protein structure modeling project has been a collaborative effort since its inception, involving co-workers at Stanford and UCSD (and, more recently, at Oak Ridge). The SUMEX facility has provided a focus for the communication of knowledge, programs and data. Without the special facilities provided by SUMEX the research would be seriously impeded. Computer networking has been especially effective in facilitating the transfer of information. For example, the more traditional computational analyses of the UCSD crystallographic data are made at the CDC 7600 facility at Berkeley. As the processed data, specifically the e.d. maps and their Fourier transforms, become available, they are transferred to SUMEX via the FTP facility of the ARPA net, with a minimum of fuss.
(Unfortunately, other methods of data transfer are often necessary as well -- see below.) Programs developed at SUMEX, or transferred to SUMEX from other laboratories, are shared directly among the collaborators. Indeed, with some of the programs which have originated at UCSD and elsewhere, our off-campus collaborators frequently find it easier to use the SUMEX versions because of the interactive computing environment and ease of access. Advice, progress reports, new ideas, general information, etc. are communicated via the message and/or bulletin board facilities.

B. Interaction with other SUMEX-AIM projects

Our interactions with other SUMEX-AIM projects have been mostly in the form of personal contacts. We have strong ties to the DENDRAL, Meta-DENDRAL and MOLGEN projects and keep abreast of research in those areas on a regular basis through informal discussions. The SUMEX-AIM workshop in June, 1976 provided an excellent opportunity to survey all the projects in the community. Common research themes, e.g. knowledge-based systems, as well as alternate problem-solving methodologies were particularly valuable to share. (That workshop was very likely the most significant conference for applied AI to be held in 1976.)

C. Critique of Resource services

On the whole the services provided by SUMEX have been excellent, considering the large demand on its resources. With the important exception of high peaks in the weekday prime-time load average, the ratio of CPU time to total wait time during program execution is usually acceptable. The facility provides a wide spectrum of computing services which are genuinely useful to our project -- message handling, file management, Interlisp, Fortran and text editors come immediately to mind. Moreover, the staff, particularly the operators, are to be commended for their willingness to help solve special problems (e.g., reading tapes) or provide extra service (e.g., an immediate retrieval of an archived file).
Such cooperative behavior is rare in computer centers.

A serious fault in the system is the lack of reliable tape drives, and the paucity of the present software for handling tape files. Much of our data from the outside world is received on magnetic tape, and almost never in the unusual PDP-10 format. We urge that the existing tape drives be replaced, and software be provided to facilitate the input of data in non-standard formats. (At the present time there is not even a program to provide a byte-by-byte dump when all else fails.)

III. Use of SUMEX during the follow-on grant period (8/78 - 7/83)

A. Long-range goals

Our current research grant extends through April, 1979. During that time we intend to bring the structure modeling system to a level of performance that permits reliable qualitative interpretation of high resolution e.d. maps, derived from real data and a correct amino acid sequence. We also plan to exploit the flexibility of the rule-based control structure to permit investigation of alternate problem-solving strategies and modes of explanation of the program’s reasoning steps. Beyond the next two years, emphasis will be placed on expanding and generalizing the system to relax the constraints of resolution and accuracy in the input data.

B. Justification for continued use of SUMEX

The biomedical relevance of the protein structure modeling project, coupled with the need for building a computational system with a significant component of symbolic inference, qualifies the project as an AIM-relevant endeavor. SUMEX provides an excellent computing environment for creating and debugging programs (in a variety of languages), for sharing and distributing information among geographically dispersed co-workers, and for keeping up with current research in other AIM areas.
Our project is clearly too small to justify an independent computing facility, and other large computer centers that are conveniently accessible do not fulfill our requisites. Consequently SUMEX has been and hopefully will continue to be an integral research tool in this project.

C. Comments and suggestions

Two improvements to the system, though not critical, would appreciably upgrade the service provided:

1. Connection of SUMEX to a non-military network which permits file transfer at a reasonably high rate (at least 4800 baud). The restrictions imposed on the use of the ARPA network prohibit using it to transmit large program and/or data files between SUMEX and the UCSD computing facilities. The availability of such a connection would, for example, permit us to use their E&S interactive graphics system to display and visually examine the structures hypothesized by our automated modeling system.

2. Addition of 256K of main memory, to give more rapid response during the peak hours. This would seem to be a natural extension to the system, to complement the second KI-10 installed last year, and would more fully realize the potential of the second CPU.

6.2 NATIONAL AIM PROJECTS

The following group of projects is formally approved for access to the AIM aliquot of the SUMEX-AIM resource. Their access is based on review by the AIM Advisory Group and approval by the AIM Executive Committee.

6.2.1 ACQUISITION OF COGNITIVE PROCEDURES (ACT)

Acquisition of Cognitive Procedures (ACT)
Dr. John Anderson
Yale University
(Grant NIMH MH29353 $25,000 this year)
(Contract ONR N00014-77-C-0242 $74,000 this year)

I. Summary of Research Program

A. Technical goals: To develop a production system that will serve as an interpreter of the active portion of an associative network.
To model a range of cognitive tasks including memory tasks, inferential reasoning, language processing, and problem solving. To develop an induction system capable of acquiring cognitive procedures with a special emphasis on language acquisition.

B. Medical relevance and collaboration:

1. The ACT model is a general model of cognition. It provides a useful model of the development and performance of the sorts of decision making that occur in medicine.

2. The ACT model also represents basic work in AI. It is in part an attempt to develop a self-organizing intelligent system. As such it is relevant to the goal of development of intelligent artificial aids in medicine.

We have been evolving a collaborative relationship with Dr. James Greeno and Allan Lesgold at the University of Pittsburgh. They are applying ACT to modeling the acquisition of reading and problem solving skills. We plan to make ACT a guest system within SUMEX. ACT is currently at the state where it can be shipped to other INTERLISP facilities. We have received a number of inquiries about the ACT system. ACT is a system in a continual state of development, but we periodically freeze versions of ACT which we maintain and make available to the national AI community.

C. Progress and accomplishments:

ACT provides a uniform set of theoretical mechanisms to model such aspects of human cognition as memory, inferential processes, language processing, and problem solving. ACT’s knowledge base consists of two components, a propositional component and a procedural component. The propositional component is provided by an associative network encoding a set of facts known about the world. This provides the system’s semantic memory. The procedural component consists of a set of productions which operate on the associative network.
ACT’s production system is considerably different from many of the other currently available systems (e.g., Newell’s PSG). These differences have been introduced in order to create a system that will operate on an associative network and in order to accurately model certain aspects of human cognition. A small portion of the semantic network is active at any point in time. Productions can only inspect that portion of the network which is active at the particular time. This restriction to the active portion of the network provides a means to focus the ACT system in a large data base of facts. Activation can spread down network paths from active nodes to activate new nodes and links. To prevent activation from growing continuously there is a dampening process which periodically deactivates all but a select few nodes. The condition of a production specifies that certain features be true of the active portion of the network. The action of a production specifies that certain changes be made to the network.

Each production can be conceived of as an independent "demon." Its purpose is to see if the network configuration specified in its condition is satisfied in the active portion. If it is, the production will execute and cause changes to memory. In so doing it can allow or disallow other productions which are looking for their conditions to be satisfied. Both the spread of activation and the selection of productions are parallel processes whose rates are controlled by "strengths" of network links and individual productions. An important aspect of this parallelism is that it is possible for multiple productions to be applied in a cycle through the set of productions. Much of the early work on the ACT system was focused on developing computational devices to reflect the operation of parallel, strength-controlled processes and working out the logic for creating functioning systems in such a computational medium.
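The mechanism described above can be illustrated with a toy sketch. This is not the ACT implementation (which was written in INTERLISP and is not reproduced in this report); the class names, the max-based spreading rule, and the "keep the N strongest nodes" dampening rule are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    """Toy associative network (hypothetical structure, for illustration only)."""
    links: dict                                   # node -> list of (neighbour, link strength)
    active: dict = field(default_factory=dict)    # node -> activation level

    def spread(self):
        """One step of spreading: each active node activates its neighbours,
        scaled by link strength (an assumed rule)."""
        new = dict(self.active)
        for node, level in self.active.items():
            for neighbour, strength in self.links.get(node, []):
                new[neighbour] = max(new.get(neighbour, 0.0), level * strength)
        self.active = new

    def dampen(self, keep):
        """Deactivate all but the `keep` most active nodes."""
        ranked = sorted(self.active.items(), key=lambda kv: (-kv[1], kv[0]))
        self.active = dict(ranked[:keep])


@dataclass
class Production:
    name: str
    condition: frozenset    # nodes that must all be in the active portion
    action: tuple           # (source, target, strength): link to add to memory
    strength: float = 1.0


def cycle(net, productions):
    """Fire every production whose condition holds in the active portion.
    Several productions may fire in one cycle; stronger ones are tried first."""
    fired = []
    for p in sorted(productions, key=lambda p: -p.strength):
        if p.condition.issubset(net.active):
            source, target, strength = p.action
            net.links.setdefault(source, []).append((target, strength))
            fired.append(p.name)
    return fired


net = Network({"dog": [("animal", 0.8)], "animal": [("living", 0.5)]}, {"dog": 1.0})
net.spread()
net.spread()          # activation has now reached "living"
net.dampen(keep=2)    # only "dog" and "animal" stay active
rule = Production("is-animal", frozenset({"dog", "animal"}), ("dog", "pet", 0.9))
fired = cycle(net, [rule])
```

In this sketch the production can only see the two nodes left active after dampening, which is the focusing effect the text attributes to the restriction to the active portion of the network.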
We have successfully implemented a number of small-scale systems that model various psychological tasks in the domain of memory, language processing, and inferential reasoning. A larger scale effort is underway to model the language processing mechanisms of a young child. This includes implementation of a production system to analyze linguistic input, make inferences, ask and answer questions, etc. Also a great deal of effort is being given to developing learning mechanisms that will acquire and organize the productions for this language processing. This learning program attempts to acquire procedures from examples of the computations desired of the procedures. For instance, the program learns to comprehend and generate sentences by being given sentences and picture representations of the meaning of the sentences (actually hand encodings of the pictures). Although this effort is focused on induction of linguistic procedures, the hope is to develop a general model of induction of cognitive procedures and not to place any language-specificity into the induction procedures.

At the time of this report, we have completed the F version of ACT, which is the system with learning capabilities. We are currently testing and tuning the system on a number of linguistic examples. Other projects which are progressing in earlier versions of ACT include use of spreading activation to model semantic disambiguation, modeling of the reading process, and modeling of solutions to word arithmetic problems.

D. Current list of project publications:

[1] Anderson, J.R. Computer simulation of a language acquisition system: A second report. In D. LaBerge and S.J. Samuels (Eds.). Perception and Comprehension. Hillsdale, N.J.: L. Erlbaum Assoc., 1975.

[2] Anderson, J.R. Language, Memory, and Thought. Hillsdale, N.J.: L. Erlbaum Assoc., 1976.

[3] Anderson, J.R.
Induction of augmented transition networks. Cognitive Science, 1977, in press.

[4] Anderson, J.R. & Kline, P. Design of a production system. Paper to be presented at the Workshop on Pattern-Directed Inference Systems, Hawaii, May 23-27, 1977.

[5] Anderson, J.R., Kline, P. & Lewis, C. Language processing by production systems. To appear in P. Carpenter and M. Just (Eds.). Cognitive Processes in Comprehension. L. Erlbaum Assoc., 1977.

[6] Kline, P.J. & Anderson, J.R. The ACTE User’s Manual, 1976.

II. Interaction With the SUMEX-AIM Resource

The SUMEX-AIM resource is superbly suited for the needs of our project. We have made the most extensive use of the INTERLISP facilities and the facilities for communication on the ARPANET. We have found the SUMEX personnel extremely helpful both in terms of responding to our immediate emergencies and in providing advice helpful to the long-range progress of the project. Despite the fact that we are on the other side of the continent, we have felt almost no degradation in our ability to do research. We find we can easily list on the terminal a small portion of programs under modification. The willingness of SUMEX to mail listings has also meant we can keep relatively up-to-date records of all programs under development. A unique east coast advantage of working with SUMEX is the low loading of the system during the mornings. We have been able to get a great deal of work done during these hours and try to save our computer-intensive work for these hours. We have found our one AIM workshop so far (1976) a very useful opportunity to meet with colleagues and exchange ideas.

A particularly striking example of the utility of the SUMEX resource was illustrated in the move from Michigan. In the summer of 1975 Anderson moved to Yale and Greeno to Pittsburgh. There was no loss at all associated with having to transfer programs from one system to another. At Yale we were programming the day after we arrived.
The SUMEX link has also permitted continued collaboration with Greeno.

From our point of view, the only stress in the SUMEX resource involves the issue of the tight file space. We are managing with some difficulty to stay within our allocation. We do not feel that our allocation is unfair (in light of overall availability and number of users) but we do feel that all users would be able to work more comfortably with the system if there were more file space available. While we recognize the need for purges when projects exceed their allocation, we feel that it would be useful if this purge could be made more intelligent, perhaps purging according to a user-defined priority.

III. Follow-On SUMEX Grant Period (8/78-7/83)

A. Long-range user project goals and plans:

Our long-range goals are: (1) continued development of the ACT system; (2) application of the system to modeling of various cognitive processes; (3) dissemination of the ACT system to the national AI community.

1. System Development: We are just completing the F version of the ACT system. We fully anticipate that its design will undergo considerable change after we have explored and tested its empirical consequences. We are currently applying or intend to apply the ACT system to modeling the acquisition and/or performance of cognitive skills in the areas of language comprehension and generation, inferential reasoning, reading skills, problem solving, and memory retrieval. It is hard to anticipate now the impact of these explorations for design decisions in later versions of ACT. However, it is clear now that we will want to make ACT more appropriate as a language for programming cognitive skills.
This will involve such things as development of more powerful control conventions, simplification of syntax, and introduction of direct programming features (such as comparison of quantity magnitudes) that can only be obtained indirectly now. We would also like to introduce more efficient implementation techniques to replace some of the simple devices that were used to enable us to rapidly complete the system. These rearchitecture efforts have to be done within the constraints of psychological plausibility, but we have a theoretical commitment to the conjecture that good implementation design is predictive of good psychological mechanisms.

2. Application of Modeling Cognitive Processes: We anticipate a gradual decrease in the amount of effort that will go into system development and an increase in the amount of effort that will go into application of the system for modeling. We mentioned above the modeling efforts that we are using to assess the suitability of the ACTF system. We have long-range commitments to apply the ACT learning model to the following three topics: acquisition of language (both first and second language acquisition); acquisition of reasoning and memory-management skills for geography; acquisition of problem solving skills in the domain of geometry. We find each of these topics to be of considerable interest in and of itself, but they also will serve as strong tests of the learning model.

We are hopeful that the systems that are acquired by ACT will satisfy computational standards of good artificial intelligence. Therefore, in future years we would also be interested in applying the ACT model to acquisition of cognitive skills in medically related domains such as diagnosis or scientific inference. SUMEX would be an ideal location for collaboration on such a project.

3.
Dissemination of the ACT Project: We have a commitment to making the ACT system available to anyone in the national community who has access to the necessary computer resources. This is partially to provide a service, in that ACT is a medium for psychological modeling. However, it is also self-serving, in that the use other people make of ACT provides important feedback in assessing design decisions. In light of limitations on the SUMEX resource, we have decided not to allow extensive use of ACT by other researchers through our SUMEX account. We feel that extensive use of the ACT system in SUMEX by another researcher must have the status of an independent project and must be able to justify independently its use of the SUMEX-AIM resource.

B. Justification for continued use of SUMEX:

We feel that the justification for our use of SUMEX has only been strengthened since the time of our original application for user status. The project meets a number of criteria for SUMEX relevance: Project support comes from NIMH. The project is concerned with cognitive modeling, which is a SUMEX goal. The project is also developing an AI tool which can be used to help automate various medically-relevant tasks.

We also think we represent the type of need that the SUMEX facility was designed to meet. That is, we do not have nearly as powerful computing facilities locally at Yale; we are a non-local user; we are using SUMEX as a base for collaborating with scientists in other parts of the country; and we are trying to develop a system that will be of general use.

C. Comments and suggestions for future resource goals:

We would, of course, be delighted if the computational capacity of the SUMEX facility could be increased. We suffer most severely from the file space limitation. The other limitation is the slowness of the system at peak hours. This problem is perhaps less grievous for us than for Stanford-based users because of our ability to use morning hours.
We do not feel any urgent need for development of new software.

Our work is growing to such a size that we would find it useful to have a local ARPANET TIP. We are currently discussing this possibility with our ONR officials. Such a TIP might be justifiable given additional needs of other AI people at Yale. The consequence of such a TIP for the future planning of SUMEX resources is that we would then change our access to SUMEX from the TYMNET to the ARPANET, thus relieving SUMEX of the need to support our TYMSHARE costs.

6.2.2 CHEMICAL SYNTHESIS PROJECT (SECS)

SECS - Simulation and Evaluation of Chemical Synthesis
W. Todd Wipke
Department of Chemistry
University of California at Santa Cruz

I. Summary of Research Program

A. Technical Goals.

The long range goal of this project is to develop the logical principles of molecular construction and to use these in developing practical computer programs to assist investigators in designing stereospecific syntheses of complex bio-organic molecules. Our specific goals this past year focused on improvement of the library of chemical transforms, completion of the perception of molecular symmetry, and integration of symmetry information throughout SECS, including the strategy module. We also wanted to improve the execution speed of SECS, and the speed of graphical interaction over remote communication lines. We planned to simplify the program from the user’s viewpoint by including automatic file failsafing, improvement of HELP commands, and non-fatal handling of all errors, as well as production of user’s manuals for operation of the program and the writing of chemical transforms. Additionally we intended to initiate applications of SECS to the areas of biosynthesis and metabolism of compounds, as well as phosphorus chemistry.
Finally we hoped to improve the strategic constraints and controls that guide SECS in growing a synthesis tree.

B. Medical Relevance and Collaboration.

The development of new drugs and the study of how drug structure is related to biological activity depend upon the chemist’s ability to synthesize new molecules as well as his ability to modify existing structures, e.g., incorporating isotopic labels into biomolecular substrates. The Simulation and Evaluation of Chemical Synthesis (SECS) project aims at assisting the chemist in designing stereospecific syntheses of biologically important molecules. The advantages of this computer approach over manual approaches are manifold: 1) greater speed in designing a synthesis; 2) freedom from bias of past experience and past solutions; 3) thorough consideration of all possible syntheses using a more extensive library of chemical reactions than any individual person can remember; 4) greater capability of the computer to deal with the many structures which result; and 5) capability of the computer to see molecules in a graph theoretical sense, free from bias of 2-D projection.

SECS was designed to be able to apply any kind of chemical transformation, and because of this generality we see SECS finding application in biogenesis and metabolism (see section II A below). The objective of using SECS in biogenesis is to predict possible biogenetic pathways for a given natural product and also to predict related compounds which might also co-occur in nature. This can be a great aid in searching for new natural products and in structure elucidation. The objective of using SECS in metabolism is to predict the plausible metabolites of a given xenobiotic in order that they may be analyzed for possible carcinogenicity.
Metabolism research may also find this useful in the identification of metabolites, in that it suggests what to look for, and in the identification of possible metabolic pathways connecting a metabolite to a xenobiotic.

C. Progress and Accomplishments.

RESEARCH ENVIRONMENT: At the University of California, Santa Cruz, we have a GT-40 graphics terminal connected to the SUMEX-AIM resource by a 1200 baud leased line and a TI 725 thermal printing teletype connected via TYMNET at 300 baud. UCSC has only a small IBM 370/145 and a PDP-11/45 (limit of 12 K words per user) available, both of which are unsuitable for this research. From July until December our research group had to occupy temporary space during renovation, but is now finally in permanent space in Thimann Laboratories, where we have close collaboration with other organic chemists.

CHEMICAL TRANSFORMS: The library of chemical transforms has been reorganized and reevaluated during the past year by Mr. Dolata, a student of Professor D.A. Evans of Cal Tech. New reactions were added, the scope and limitations of others were updated, and leading references were provided. Additionally, Merck, Sharp, and Dohme Research Laboratories provided revisions of any transforms which a group of 25 synthetic chemists had carefully researched.

SYMMETRY: An efficient algorithm for recognizing molecular symmetry was developed last year. This year that algorithm has been tested against all possible molecular point groups and a few problems which developed were corrected. The algorithm has been documented and initial studies begun on actually determining the point group of a molecule. The symmetry group is now utilized in conjunction with the symmetry of a chemical transform so the transform is applied in all possible unique ways, to generate a non-redundant set of precursors. This symmetry of course takes into account stereochemistry of saturated centers and double bonds.
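The idea of applying a transform only in symmetry-unique ways can be sketched as follows. This is an illustrative simplification, not the SECS algorithm: the molecule's symmetry is assumed to be given as a list of atom permutations (including the identity), reaction sites are taken to be bonds written as atom pairs, and stereochemistry is ignored. The function name `unique_sites` is hypothetical.

```python
def unique_sites(sites, symmetry_ops):
    """Keep one representative bond from each symmetry-equivalence class.

    sites        -- list of bonds, each a tuple of two atom indices
    symmetry_ops -- list of dicts mapping atom index -> atom index
                    (assumed to include the identity permutation)
    """
    seen = set()
    unique = []
    for site in sites:
        # The orbit of a bond is the set of its images under every symmetry operation.
        orbit = {tuple(sorted(op[atom] for atom in site)) for op in symmetry_ops}
        canonical = min(orbit)          # one fixed label per equivalence class
        if canonical not in seen:
            seen.add(canonical)
            unique.append(site)
    return unique


# A four-atom chain 0-1-2-3 with a mirror plane through its centre.
identity = {0: 0, 1: 1, 2: 2, 3: 3}
mirror = {0: 3, 1: 2, 2: 1, 3: 0}
bonds = [(0, 1), (1, 2), (2, 3)]
representatives = unique_sites(bonds, [identity, mirror])
```

Here the two terminal bonds are equivalent under the mirror operation, so a bond-breaking transform would be applied to only one of them plus the central bond, giving a non-redundant set of precursors.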
We have surveyed literature syntheses for examples of existing heuristics based on symmetry which can be used for automatically generating high level strategies. This information has never been pulled together before and should also make an interesting contribution to organic synthesis.

STRATEGIC CONTROL: Last year we began developing an implementation of strategic control for SECS, and a simple language for expressing strategies independent of chemical transforms. Since these strategies contain expressions which refer to the molecular structure, it was also necessary to incorporate symmetry here too. For example, if a particular bond is designated as strategic to break, but a transform breaks another bond, the strategy is still satisfied if the two bonds are equivalent by symmetry. This problem becomes more complex when pairs of bonds are specified and when there are logical connectives (AND, OR, XOR, and NOT) involved. This has however been solved. Other changes since last year include a completely new user interface to strategy to allow error correction and very easy modification of goals. Finally, quantitative experiments have been performed to measure the effect of developing a synthesis tree with various types of strategic constraints. The net result of this work is that the user can more easily constrain SECS now to work only in areas which the user decides are worthwhile; consequently fewer precursors are generated which the user would delete.

USER INTERFACES: Users of SECS had difficulty understanding how to copy files into work areas in order to save or restore synthesis trees. Now SECS does all file manipulation, eliminating the problem. Further, SECS now automatically failsafes the synthesis tree at key points so that in the event of machine or communication failure the user can automatically restart his analysis from the last key point.
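Returning to the strategic-control point above (a strategy that names one bond is still satisfied when a symmetry-equivalent bond is broken), a minimal sketch of such a test is given here. The nested-tuple strategy notation, the `BREAK` keyword, and the `sym_class` table are assumptions made for illustration; they are not the SECS strategy language. The sketch also sidesteps the harder case mentioned in the text, where a pair of bonds must be mapped by the same symmetry operation.

```python
def satisfied(expr, broken, sym_class):
    """Evaluate a strategy expression against the bonds a transform breaks.

    expr      -- ("BREAK", bond) or ("NOT", e) or ("AND"|"OR"|"XOR", e1, e2)
    broken    -- set of bond labels broken by the transform
    sym_class -- dict mapping bond label -> symmetry-equivalence class id
    """
    op = expr[0]
    if op == "BREAK":
        target = sym_class[expr[1]]
        # Satisfied if any broken bond is symmetry-equivalent to the named bond.
        return any(sym_class[b] == target for b in broken)
    if op == "NOT":
        return not satisfied(expr[1], broken, sym_class)
    left = satisfied(expr[1], broken, sym_class)
    right = satisfied(expr[2], broken, sym_class)
    if op == "AND":
        return left and right
    if op == "OR":
        return left or right
    if op == "XOR":
        return left != right
    raise ValueError(f"unknown connective: {op}")


# Bonds b1 and b2 are equivalent by symmetry; b3 is distinct.
classes = {"b1": 0, "b2": 0, "b3": 1}
strategy = ("AND", ("BREAK", "b1"), ("NOT", ("BREAK", "b3")))
ok = satisfied(strategy, {"b2"}, classes)   # transform breaks b2, not b1
```

A transform that breaks b2 therefore passes a strategy written in terms of b1, while one that also breaks b3 would be rejected by the NOT clause.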
Considerable modifications were made to the graphical interface for increasing readability and speed of interaction. Over long slow communication lines (which happens to be the way most SECS users are accessing the program) interactive graphics must be done with care, minimizing the amount and frequency of picture transmission, in order to achieve even tolerable man-machine communication. Lastly, we have implemented appropriate input procedures to eliminate the possibility of a fatal crash from user input errors. According to user reports this was a major problem.

PHOSPHORUS CHEMISTRY: Graphical input and output procedures were developed for entering the stereochemical configuration of a trigonal bipyramid (TBP) phosphorus atom and for producing a correct structural diagram from the machine’s internal representation. The SEMA algorithm for generating a stereochemically unique name was extended to deal with the 20 possible configurations for a TBP center, including the ability to recognize enantiomers. The ALCHEM language for representing chemical transforms was extended to facilitate manipulation of TBP’s, including changes from trigonal and tetrahedral configurations to square base pyramid and TBP. Queries may deal with apicophilicity, and axial or equatorial orientation. The fine details of phosphorus chemistry, such as the fact that groups entering or leaving the phosphorus coordination sphere normally do so from the apical position, pseudorotation, apicophilicity, and strain energy, are considered in evaluating the stable TBP configurations and in checking for identical structures. A library of phosphorus chemistry is now being prepared in collaboration with a group at the University of Strasbourg.

COMPUTER-AIDED ELUCIDATION OF BIOGENETIC PATHWAYS: Although a great amount of effort has been spent on various areas of biogenesis, there have been few attempts to develop general techniques for the elucidation of biogenetic schemes.
As a result, the formulation of biogenetic schemes has often been criticized for its lack of rigor and explicit criteria. Our approach is to develop general techniques which lead to the postulation of plausible biogenetic pathways, using SECS as an aid in obtaining and analyzing solutions to this complex problem. It is our hope that this application of computer problem solving techniques will not only uncover new ways of recognizing and evaluating biogenetic pathways but also provide added support to deductions made from biogenetic schemes, such as the generality of a scheme which may be tested in only a few species.

With the proper input information and goals well defined there may be explicit rules to guide the chemist to plausible biogenetic pathways for a particular natural product. Unfortunately, the vast majority of solutions to this problem are determined by a combination of the experienced natural products chemist’s ability to consider the most important rules involved and his unique set of experience-based prejudices. There may be some means to represent and utilize all of the known relevant rules, data and possibly even experience-based prejudices to arrive at the best plausible pathways. The most precise method for representing, developing and testing such a theory is in the form of a computer program.

To implement such a computer program, known rules and constraints must be clearly defined; then those that are applicable can be applied at each step of the analysis toward the desired goal. This will keep the solution pathways logically pure and ensure that all alternatives which satisfy the rules and constraints are considered. This guarantee of completeness simply cannot be made using hand analysis.
A new reaction library containing biogenetic transformations has been written. After a natural product is input, the program will apply the biogenetic transforms which fit the natural product. This generates a set of plausible biogenetic precursors to the target natural product. By continuing this process with the precursors generated, the plausible biogenetic pathways for the natural product can be discovered. The structures of marine natural products were entered into the program and the plausible biogenetic pathways for these compounds were generated and analysed. Biogenetic pathways which had been proposed in the literature were among the pathways discovered, as were other plausible pathways which would now have to be considered. The success we attained in this research effort verified the applicability of the SECS program as an aid in the analysis of metabolic pathways.

COMPUTER-AIDED PREDICTION OF METABOLITES FOR CARCINOGENICITY STUDIES: We have initiated a research project in collaboration with the Chemical Carcinogenesis group at the National Cancer Institute. The objective of this research is to establish a computer program by which a biochemist or metabolism expert can explore the metabolism of a chemical compound.

The investigator enters the substrate molecule by interacting with an input and structure editing module. Then the program will apply the biological transforms which "fit" the structure, taking into consideration all the context information (2-D, 3-D, and electronic) available about the transform and all perceived information about the structure. This will generate a set of metabolites which are one step away from the substrate structure. The metabolites will be ranked according to expected probability or yield. The exact parameters which should be monitored will be determined during the course of this research.
An evaluation module may then screen these metabolites according to criteria specified by the investigator. Duplicate metabolites arising from different pathways will be labelled to indicate that fact. Finally the investigator will be shown the set of metabolites together with data about the transform which produced each one and the values of the parameters being monitored. The investigator may select one metabolite for further metabolism or may request that all be processed for a specified number of steps. In this way a "tree" of metabolites is produced and displayed. The entire state of the user’s tree may be saved to permit continuation of the analysis at another time.

Exploration of the metabolism tree will be predominantly guided interactively by the expert investigator. We feel that at this stage of development of the field of metabolism and carcinogenicity, interactive guidance by the expert is necessary. There are many areas where the theory is very thin and a given biological transformation may have been observed for only a few substrates. When this transform is applied to a new substrate, some unrealistic metabolites may be generated owing to the deficiency of contextual information and constraints. An expert is necessary to prune the tree and prevent the automatic processing of those unreasonable intermediates. It is much more efficient for the expert to do this pruning as the tree is being grown, rather than later after an enormous tree has been completed.

At some point, either during tree generation or at the end, the metabolites will be passed to another program which will identify those metabolites which are identical or "similar" to known carcinogens. Those will be so marked in the tree. Presently, the major task is the acquisition of the metabolism knowledge base, i.e. the writing of the transformation library to be utilized.
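The tree-growing procedure described above (apply every fitting transform, label duplicates reached by different pathways, prune as the tree grows, stop after a specified number of steps) can be sketched in a few lines. Everything concrete here is hypothetical: structures are plain strings, the two "transforms" are toy string rewrites rather than entries from the metabolism library, and the `keep` callback merely stands in for the expert's interactive pruning.

```python
from collections import namedtuple

Node = namedtuple("Node", "parent transform metabolite depth duplicate")


def grow_tree(substrate, transforms, steps, keep=lambda m: True):
    """Grow a metabolite tree breadth-first for a fixed number of steps.

    transforms -- dict: name -> function(structure) returning a list of products
    keep       -- predicate standing in for pruning by the expert investigator
    """
    tree = []
    seen = {substrate}
    frontier = [substrate]
    for depth in range(1, steps + 1):
        next_frontier = []
        for parent in frontier:
            for name, apply_transform in transforms.items():
                for product in apply_transform(parent):
                    if not keep(product):
                        continue            # pruned while the tree is being grown
                    duplicate = product in seen
                    tree.append(Node(parent, name, product, depth, duplicate))
                    if not duplicate:       # duplicates are labelled, not re-expanded
                        seen.add(product)
                        next_frontier.append(product)
        frontier = next_frontier
    return tree


# Toy transforms on string "structures" (illustrative only).
toy_transforms = {
    "hydroxylation": lambda s: [s + "-OH"] if s.count("-OH") < 1 else [],
    "epoxidation": lambda s: [s.replace("C=C", "C1OC1", 1)] if "C=C" in s else [],
}
tree = grow_tree("C=C-R", toy_transforms, steps=2)
```

In this toy run the hydroxylated epoxide is reached by two different pathways; the second occurrence is recorded with `duplicate=True` rather than expanded again, mirroring the labelling of duplicate metabolites described in the text.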
Metabolism experts at the National Institutes of Health are gleaning this information from both their own research and the metabolism literature. This information will be encoded and the first testing of this new application for the SECS program will begin in June 1977.

E. Funding Status.

Sandoz Unrestricted Grant to support Computer Synthesis, $2590.

National Cancer Institute Contract N01-CP-75815, "Computer-Aided Prediction of Metabolites for Carcinogenicity Studies," $56,328 for 18 months.

Proposal RR-01059 submitted 1 March 1975 to Division of Research Resources, "Resource-Related Research: Biomolecular Synthesis," $227,816 for 3 years, approved 1 Oct 76, but still awaiting funding.

Note: Were it not for the leased line and computer access granted to us by SUMEX, the entire SECS project would not have been able to continue for the past 18 months.

D. Current List of Project Publications

W.T. Wipke and P. Gund, "Simulation and Evaluation of Chemical Synthesis. Congestion: A Conformation Dependent Function of Steric Environment at a Reaction Center. Application with Torsional Terms to Stereoselectivity of Nucleophilic Additions to Ketones," J. Am. Chem. Soc., 98, 8107 (1975).

W.T. Wipke, G. Smith and H. Braun, "SECS -- Simulation and Evaluation of Chemical Syntheses: Strategy and Planning," ACS Symposium Proceedings, 1977.

W.T. Wipke, Computer Planning of Research in Organic Chemistry, Proceedings of the Third International Symposium on Computers in Chemical Education, Research, and Technology, Caracas, Venezuela, 1976.

S.A. Godleski, P.v.R. Schleyer, E. Osawa, and W.T. Wipke, "The Systematic Prediction of the Most Stable Neutral Hydrocarbon Isomer," J. Am. Chem. Soc., 99, 0000 (1977).

F. Choplin, R. Marc, G. Kaufmann, and W.T. Wipke, "Computer Design of Synthesis in Phosphorus Chemistry. Automatic Treatment of Stereochemistry," J. Am. Chem. Soc., 99, 0000 (1977).
Manuals: SECS Users Manual, June 1976. SECS Users Guide, Aug 24, 1976. ALCHEM Tutorial, Sep 21, 1976.

II. Interactions with SUMEX-AIM Resource

A. Examples of Collaborations and Medical Use of Programs via SUMEX

SECS is available in the GUEST area of SUMEX and has been accessed experimentally by many others as well. Professor R. V. Stevens (UCLA) explored some syntheses of lycopodine while visiting Santa Cruz and as a result has requested UCLA to obtain a graphics terminal so that he and others at UCLA can access SECS via SUMEX. Professor W. G. Dauben's group (Berkeley) has utilized the SECS model builder on SUMEX and is now extending the capabilities of that module of SECS. Mr. Mel Spann of the National Library of Medicine Toxicology program is collaborating with us in developing a metabolism library for the metabolism of catecholamines. Also collaborating with us on metabolism are Drs. Ted Gram from Guarino's lab, Harry Gelboin, Dhiren Thakken and Harukiko Hagi from Jerina's lab, Lance Pohl from Gillette's lab, Sidney Nelson from Mitchell's lab, Lionel Poirier from Weisburger's lab, and Ken Chu and Sidney Siegel, all of whom are from the National Cancer Institute. Dr. Steve Heller of the EPA and Dr. G.A. Milne of the National Heart and Lung Institute have expressed interest in putting SECS on the Cyphernetics network as a part of the NIH chemical information system. Restrictions on the allowed core image on that system have so far held up the negotiations.

For the past two years SECS has been available over TELENET from First Data Corporation and has been accessed by industry: Squibb; Merck, Sharp and Dohme; Pfizer; Searle; Lederle Labs; FMC; and recently 3M Corporation and Stauffer. Dr.
Beryl Dominy of Pfizer recently presented a paper before the Pharmaceutical Manufacturer's Association entitled "SECS and the Information Scientist" in which he describes his experiences with SECS, including an example in which a synthetic chemist who was having difficulty with a particular synthesis went to SECS for possible solutions. SECS suggested another route as being better, and indeed that is what he found when he tried it later in the lab.

The availability of SECS on SUMEX-AIM has also served health-related research at the University of California, Santa Cruz. Model building using the SECS model builder is being performed for Professor Edward Dratz (UCSC) to generate conformations of fatty acids isolated from visual membranes ("Structure and Function of Visual Photoreceptors," EI00175), and for Professor Howard Wang (UCSC) to study how conformations of steroids may affect the local anesthetic - membrane interaction ("Role of Membrane Proteins in Local Anesthetic Action," GM2 2242).

We have assisted Professor J. E. McMurry in his synthetic work towards Aphidicolin and Digitoxigenin by using the model builder for predicting possible reaction pathways. An example is given below, where the conformation of the epoxy-ylide was calculated along with the strain energies of the two possible closure products.

[Structure diagram: the epoxy-ylide and its two possible closure products]

Utilizing the SECS model builder, we have shown that attack on the epoxide to form the fused system should be much more favorable than attack to form the bicyclo compound.
Similar studies have been undertaken to predict the stereochemistry resulting from the acid catalyzed cyclization of McMurry's Digitoxigenin precursor (HL-18118, "Total Synthesis of Cardiac Aglycones"): application of SECS using a special library of cationic sigmatropic rearrangement transforms generated the possible products, which facilitated identification of some of the side products in the early cyclization experiments.

We have also collaborated with Professor Phil Crews (UCSC) on marine natural product biogenesis. Dr. Wipke has also used several SUMEX programs such as CONGEN in his course on Computers and Information Processing in Chemistry.

B. Examples of Sharing, Contacts and Cross-fertilization with other SUMEX-AIM projects

In collaboration with Dr. Ray Carhart and Dr. Dennis Smith of the DENDRAL/CONGEN Project, a Computers in Chemistry Workshop was held at U.C. Santa Cruz on the weekend prior to the Fall 1976 American Chemical Society National Meeting held in San Francisco. The workshop attracted participants representing all parts of the chemical community: academia, industry and government. Morning lecture/discussion sessions introduced the SECS and CONGEN programs running on SUMEX, and the afternoon and evening sessions allowed "hands-on" experience for the participants. The response of the workshop participants was very positive, with many showing so much interest that future collaboration and/or use of the powerful non-numerical computing tools available on SUMEX was discussed.

The SECS project has held joint research group meetings at Stanford with the DENDRAL and AI groups to discuss common problems and research goals. This has been very rewarding since the groups are complementary in orientation. These joint meetings also let the members meet in person after having met on-line on the network.
Last year's AIM Conference at Rutgers was also a valuable experience, which allowed us to meet people interested in similar problems in different disciplines. It was particularly useful to have the opportunity to talk with experts designing new languages for knowledge representation and to hear them compare their systems.

C. Critique of Resource Services

We find the SUMEX-AIM network very well human engineered. The ability to leave messages on the network, and to LINK to other users on-line for advice, has been extremely useful to us since we have only the network to keep us educated about what is changing on the system, etc. The fact that we have been able to get productive research accomplished remote from Stanford speaks well for the SUMEX-AIM concept.

The SECS project finds the SUMEX-AIM staff and community extremely helpful, and anxious to extend themselves to meet our needs. SUMEX provided a leased line and modems to us and provided TYMNET access as well. Were it not for SUMEX, this research effort would have perished since there is no adequate computer facility on the UCSC campus or even in the UC system. The only problems we have experienced are 1) until recently we were short of disk space, and 2) response time during the day can become poor at times, particularly when using interactive graphics, so most interactive graphics work is done at off hours.

Basically we have found that SUMEX-AIM provides a productive and scientifically stimulating environment, and we are thankful that we are able to access the resource and participate in its activities.

III. Follow-on SUMEX Grant Period (8/78-7/83)

A. Long-range User Project Goals and Plans

Over this period of time the SECS project will continue research aimed at synthesis design and planning.
Areas of interest include the formation of high level plans to guide the detailed chemical analysis, the capability for depth-first analysis, the evaluation of proposed synthetic pathways by forward simulation, and bidirectional search from target to key intermediate. At some time during this period the SECS program should be reimplemented in MAINSAIL to allow renovation of the SECS control structure and to allow more machine independence. We also hope to incorporate an explanation system to justify the decisions made by the program, which we feel is important for the same reasons MYCIN needed this capability. A new model builder will also be implemented to increase the speed and generality of 3-D model building.

The metabolism project development will parallel the SECS project, but has special requirements for ALCHEM and aromatic chemistry, as well as for a pattern recognition module. A major problem here is how to develop and maintain such a complicated data base on metabolism. We expect to benefit from the experience gained by others in the medical diagnosis programs.

We hope at UCSC to have some local data handling capability such as printing, plotting, and tape handling to facilitate our work. Of course interactive graphics will continue to be our method of man-machine communication, and we plan to add a GT-44 graphics terminal in the near future to expand current capability. Another graphics terminal is planned for the more distant future. We would continue to depend on SUMEX for host computing and file storage. We would hope that higher speed communication lines might become possible in the future.

B. Justification for Continued Use of SUMEX by Our Project

The SECS project requires a large interactive timesharing capability with high level languages and support programs. UCSC is not likely in the future to be able to provide this kind of resource.
Thus from a practical standpoint, the SECS project really needs access to SUMEX for survival.

Scientifically, interaction with the SUMEX community has been extremely important to the SECS project. Many of our future goals involve incorporation of ideas from other AIM projects into the chemical synthesis project. We would like to believe some of the ideas from the SECS project are also influencing other AIM projects.

Our metabolism project requires collaboration with the metabolism experts at the National Cancer Institute 3000 miles away. The networking aspects of SUMEX-AIM will be very valuable to this important project. Several collaborations for development of strategies in SECS are also being planned and would require networking.

C. Comments and Suggestions for Future Resource Goals, Development Efforts, etc.

From our standpoint, multiplexing to Stanford might give us higher speed communication for graphics and file transport.

Development of MAINSAIL seems important, but until that materializes, support of FORTRAN and standard DEC compatibility is crucial to the SECS project. FORTRAN-10 and LINK-10 are becoming the DEC standard and provide overlay capabilities which are needed in moving programs from machines with virtual memory to ones with limited memory. It would be useful if there were a good file transfer program--the standard DEC FAILSAFE should be implemented so we can send out files and have their names, versions, etc. preserved. It would also be convenient to have a way to send files over TYMNET and TELENET to other machines. We could use this in updating programs at First Data Corporation.

The SUMEX-AIM resource should have an annual workshop for the individuals actually implementing and building systems on SUMEX--the students, postdocs, etc.
The purpose of this would be to spread innovation and techniques as well as actual sharing of programs among users of SUMEX. It would also be an opportunity to discuss collaborations, software development, and plans for SUMEX. Importantly, it would also develop personal contacts to complement network contacts. This could be in conjunction with or in addition to the current annual AIM workshop. The current AIM workshop should alternate between coasts.

6.2.3 HIGHER MENTAL FUNCTIONS PROJECT

Modeling of Higher Mental Functions
Kenneth M. Colby, M.D.
Professor of Psychiatry and Biobehavioral Sciences
University of California at Los Angeles

I. Summary of Research Program

A. Technical Goals:

There are three technical goals of the Higher Mental Functions Project:

(1) To improve and "therapeutically" experiment with a computer simulation of paranoid processes in order to make treatment recommendations to clinicians based on experience with the model.

(2) To develop a new taxonomy of psychiatric patients based on the conceptual patterns appearing in accounts of their illnesses.

(3) To develop an intelligent speech prosthesis for patients suffering from communication disorders.

B. Medical Relevance and Collaboration:

The Higher Mental Functions Project is located in the Neuropsychiatric Institute at UCLA. The medical relevance of its research concerns the fields of psychiatry and neurology. The Project collaborates with clinicians and investigators in psychiatry, neurology, the neural sciences and neurolinguistics.

C. Progress Summary:

We have improved the paranoid model to the point where it can be utilized for therapy experiments. (The model has now passed a true Turing Test in which it cannot be distinguished from real patients.)

The taxonomy effort is just under way, using the language recognition program which serves as the front end of the paranoid model.
This program will have to be extended and modified to serve the purpose of finding and classifying the conceptual patterns appearing in patients' accounts of their illnesses.

We have interfaced a micro-processor with a voice-synthesizer to provide a speech prosthesis for patients unable to speak. The next step is to write an "intelligent" algorithm which attempts to figure out what the patient is trying to say from his partial input information.

D. Funding Status:

(1) Current funding. This project is currently funded by research grant NIMH MH 27132-02 and by a General Research Support Grant from the UCLA Neuropsychiatric Institute.

(2) Pending applications and renewals. Four additional grant applications have been submitted and are pending at the NIH for support of the above-described research.

II. Interactions with the SUMEX-AIM Resource

A. Collaborations:

The project collaborated with Professor Jon Heiser, Department of Psychiatry, University of California, Irvine, and consulted with Professor Robert K. Lindsay, Department of Psychology, University of Michigan, in conducting a Turing Test of the paranoid model. Other users of SUMEX have received advice and suggestions regarding their problems as well as opportunities to contrast their simulations with ours. We have benefitted greatly from others' comments on the adequacy and inadequacy of our paranoid model.

B. Sharing, etc.:

Members of the project have participated in two workshops held at Rutgers, presenting several papers, chairing panels, and conducting discussion groups. Informal discussions with large numbers of workers in Artificial Intelligence in Medicine have led to a helpful sharing of ideas and techniques.

SUMEX is valuable to us as a communication channel combining the advantages of a telephone and the U.S. mail without the disadvantages of either.
For widely scattered researchers, it facilitates the intimate, low-level communication which is normally accomplished in hallways or around water coolers. The individual discussions are not very profound, but the cumulative effect subtly improves our research.

The existence of SUMEX as an independent project naturally relieves numerous researchers of the burden of separately financing and staffing a large computer facility.

C. Critique of Resources:

The few complaints we had regarding difficulties of network access have been remedied. The computer system performance is admirable, and the staff are most receptive to suggestions.

III. Follow-on Period

A. Long-range Project Goals:

We anticipate working on the above-described projects for at least 5 years. The problems are highly complex and will require years of sustained effort to solve.

B. Justification for Continued Use of SUMEX:

The paranoid model and the conceptual pattern recognizer require a large time-shared computer because of the large size (100K) of these programs, which are written in a high-level programming language (MLISP-UCI LISP).

The speech prosthesis effort does not require a large system in itself because it stands as an independent unit. However, constructing and developing dictionaries for the various types of speech prostheses is done most efficiently on a large and fast system such as SUMEX.

C. Comments and Suggestions for Future Research Goals:

It seems that the resource fulfills all of its stated goals of facilitating research in the field. The only drawback is that there isn't more of a good thing. Doubling the computing power and memory storage capabilities would not be unreasonable.

D. Up-to-date List of Publications:

Colby, K.M., Parkison, R.C. and Faught, B. Pattern-matching Rules for the Recognition of Natural Language Dialogue Expressions. Am. J.
Computational Linguistics, Microfiche 5, Sept., 1974.

Colby, K.M. Clinical Implications of A Simulation Model of Paranoid Processes. Archives of General Psychiatry, 33, 854-857, 1976.

Faught, W., Colby, K.M. and Parkison, R.C. Inferences, Affects and Intentions in A Model of Paranoia. Cognitive Psychology, 9, 153-187, 1977.

Colby, K.M. An Appraisal of Four Psychological Theories of Paranoid Phenomena. J. of Abnormal Psychology, 85, 54-59, 1977.

Parkison, R.C., Colby, K.M. and Faught, W.S. Conversational Language Comprehension Using Integrated Pattern Matching and Parsing. Artificial Intelligence (In Press) 1977.

Colby, K.M., Christinaz, D. and Graham, S. A Computer-driven, Personal, Portable and Intelligent Speech Prosthesis for Aphasic Disorders. Brain and Language (In Press) 1977.

Colby, K.M. On the Way People and Models Do It. Perspectives in Biology and Medicine (In Press) 1977.

Heiser, J., Colby, K.M., Faught, W. and Parkison, R.C. Testing Turing Test (Forthcoming).

Faught, W.S. Conversational Action Patterns in Dialogs. Proceedings of the Workshop on Pattern-directed Inference Systems, May, 1977.

6.2.4 INTERNIST PROJECT

INTERNIST - Diagnostic Logic Project
J. Myers, M.D. and H. Pople, Ph.D.
University of Pittsburgh

I. SUMMARY OF RESEARCH PROGRAM

A. Objectives

The principal objective of this research project has been and continues to be the development, evaluation, and implementation of a computer-based diagnostic consultation system for internal medicine. This work, which was initiated at the University of Pittsburgh approximately six years ago, has been supported for the past three years by a grant from the Bureau of Health Resources Development.
A heuristic diagnostic program called INTERNIST has been developed, along with an extensive medical database now comprising more than four hundred disease categories and two thousand manifestations of disease. The system has been tested with a wide variety of difficult clinical problems: cases published in the medical journals, CPC's, and other interesting and unusual problems arising in the local teaching hospitals. In the great majority of these test cases, the heuristic INTERNIST program has proved to be effective in sorting out the pieces of the puzzle and coming to a correct diagnosis. In some cases, as many as six distinct disease entities have been identified correctly.

We believe that by the time of the expiration of the BHRD grant in June, 1977, our original objective, which was to develop a system providing expert diagnostic capability with regard to the major diseases of internal medicine, will have been accomplished to the extent possible in the current laboratory framework. At that time, we propose to initiate a broader collaboration, which will invite the participation of remote users in (a) further evaluation of the INTERNIST programs and database, (b) development of specialized data-bases and procedures for various medical subspecialties, (c) refinement of the user interface, and (d) investigation of alternate uses of the INTERNIST data-base. We believe that the expansion of the experience base of INTERNIST users, which will result from this type of collaboration, will significantly enhance the further course of INTERNIST development.

B. Progress Summary

Expansion of the medical data-base to encompass new areas of disease is an on-going activity of the project. Much of this work is carried out by medical students who elect to take part in the project as part of their fourth year clinical rotation, with the period of participation varying from 6 to 18 weeks.
Each student is assigned a group of diseases, usually in a specific clinical area, for study. The literature on a disease is studied exhaustively for all quantitative data available. Frequently clinical experts on the faculty are consulted, particularly about controversial data. The student compiles a complex list of the manifestations of the disease under study and assigns tentative measures of strength of association. The clinical principal investigator, together with any other clinicians working on the project, then reviews the data exhaustively in order to assure the appropriateness and completeness of the disease profile. The profile is then entered into the computer and tested for completeness and reliability against typical or "textbook" examples of clinical cases. If available, other cases of the disease from the floors of our university hospital and from published cases, such as the clinical-pathological conferences from the New England Journal of Medicine and the American Journal of Medicine, are also used. Further refinement occurs in the course of the continued use of the data-base.

In addition to this data-base development, work on a refined diagnostic program has also been an on-going activity during this period. The present INTERNIST process employs a 'problem-formation' heuristic, which identifies one of perhaps several problems in a clinical case as its initial focus of problem-solving attention. Although only one problem is considered at a time, the process recycles after each problem is solved, thereby uncovering the entire complex of diseases present. In the great majority of clinical cases tested, this strategy of iterative problem formation and solution has proved to be effective in sorting out the complexities of a case and rendering a correct diagnosis.
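The cycle of forming one problem, solving it, and recycling on whatever remains unexplained can be sketched as follows. This is a simplified illustration, not the INTERNIST program: the function `diagnose`, the `PROFILES` table, the additive scoring of strength-of-association weights, and the `threshold` parameter are all assumptions introduced here, and the disease and finding labels are placeholders.

```python
# Hypothetical disease profiles: manifestation -> strength of association.
PROFILES = {
    "D1": {"f1": 2, "f2": 3},
    "D2": {"f3": 4, "f5": 1},
    "D3": {"f2": 1},
}


def diagnose(findings, profiles, threshold=1):
    """Repeatedly pick the disease that best explains the remaining findings."""
    unexplained = set(findings)
    concluded = []
    while unexplained:
        # Score each candidate only on findings not yet accounted for.
        scores = {
            disease: sum(w for m, w in profile.items() if m in unexplained)
            for disease, profile in profiles.items()
            if disease not in concluded
        }
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] < threshold:      # nothing left worth pursuing
            break
        concluded.append(best)            # one problem solved ...
        unexplained -= set(profiles[best])  # ... then recycle on the remainder
    return concluded, unexplained


concluded, leftover = diagnose({"f1", "f2", "f3", "f4"}, PROFILES)
```

Because each pass considers only the findings still unexplained, a case with several concurrent diseases is resolved one disease at a time, and any finding that no profile accounts for is returned separately.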
In many respects, however, it seems clear that performance could be significantly enhanced if the program were to attend to the various component problems and their inter-relationships simultaneously. Use of a more global problem-formation strategy could be expected to yield more rapid convergence on the correct diagnosis in many cases, and in at least some cases to prevent missed diagnoses. Alternative problem-formation strategies that exploit the type of pseudoparallel processing facilitated by the INTERLISP 'spaghetti stack' are presently being investigated.

We believe that this research will also set the stage for subsequent development of a therapeutic management component of the INTERNIST consultation facility; however, at the present time it is not possible to project a precise timetable for the development of these additional capabilities.

C. Publications

1. Pople, H.E., Myers, J.D., & Miller, R.A., "The DIALOG Model of Diagnostic Logic and its Use in Internal Medicine". Proceedings of the Fourth International Joint Conference on Artificial Intelligence, Tbilisi, USSR, September 1975.

2. Pople, H.E., "Artificial-Intelligence Approaches to Computer-based Medical Consultation". Proceedings of IEEE Intercon, New York, 1975.

3. Pople, H.E., "The Syntheses of Composite Hypotheses in Diagnostic Problem Solving: An Exercise in Hypothetical Reasoning". Proceedings of the Fifth International Joint Conference on Artificial Intelligence, August 1977 (forthcoming).

D. Funding Status

1. Current Funding:
Granting agency - BHRD; Number: 1 R01 MB 00144-0903
Total period of the award - 3 years (6-30-74 to 6-29-77)
Current year of the award - 1977
Current annual funding - 148,636

2. Pending Applications:
1. Granting agency - NIH; Title: Clinical Decision Systems Research Resource
First year request - 1,023,883
2.
Granting agency - BHRD; Title: DIALOG: A Computer Model of Diagnostic Logic
Fourth year request - 190,176

II. INTERACTION WITH SUMEX-AIM RESOURCE

A. Medical Use of Programs and Collaborations

Because of the research and development nature of our work on the INTERNIST system over the past several years, we have been somewhat limited in our ability to establish wide-spread collaborations. However, members of the medical house staff in the local hospitals having some prior experience with the project have continued to work with INTERNIST while pursuing their medical training. In addition, project staff often have occasion for interaction with individuals and groups who have interest in the characteristics of the diagnostic system from both medical and computer science perspectives. Future plans for more extensive collaboration are discussed in section III.

B. AIM Interactions

We have benefitted considerably from interactions with other members of the SUMEX-AIM community. In June '76 we participated in the AIM workshop at Rutgers, which provided an excellent perspective as to what else is going on in the field. During the past several months we have had useful exchanges with Randy Davis, Victor Yu, and John Foy, three individuals participating in the MYCIN project. In addition, we rather routinely interact with SUMEX staff regarding fine points and problems relating to our use of system facilities. The opportunity to keep abreast of developments in a fast changing field is one of the principal benefits to be derived from the collegial environment fostered by SUMEX-AIM.

C. Critique of Services

We have found the SUMEX-AIM resource to be a superb facility for the conduct of research and development activities related to the INTERNIST project.
The generally high level of user services, documentation, staff support and reliable operation, which characterizes this unique resource, has contributed significantly to the rate of progress our project has achieved.

III. FOLLOW-ON SUMEX GRANT PERIOD (8/78 - 7/83)

A. Long-Range User Project Goals and Plans

Continued research and development of the medical data base and diagnostic programs, characteristic of our past and current work at SUMEX, is anticipated. We estimate that two to three years will be required to complete the medical data-base presently envisioned for INTERNIST. However, by the end of this grant period (June 30, 1977) we expect that the knowledge base should have reached "a critical mass" sufficient to allow initial clinical trial on a routine basis.

Sometime in mid-1977, we intend to begin limited field trials of the INTERNIST system by installing terminals in selected wards of Presbyterian University Hospital in Pittsburgh. A number of the members of the house staff have indicated their desire to participate in the evaluation studies, and several have expressed willingness for all cases entering their service to be run and rerun as necessary, in order to enhance our understanding of the strengths and weaknesses of the INTERNIST system.

As we move from the R&D stage to this more production-oriented phase of activity, it seems inevitable that the requirements for support of INTERNIST activities will become increasingly incongruous with the general purpose nature of the facility provided by SUMEX.

Our expectation is that on the services initially supported at Presbyterian University Hospital, there will be as many as 20 INTERNIST case analyses run each day.
Based on our experience operating INTERNIST at SUMEX, we would anticipate that each of these studies would require 3 to 5 minutes of CPU time and entail an elapsed time on the order of 30 to 50 minutes during lightly loaded periods on the system. We have also found, however, that the only feasible time to perform such studies is in the early morning hours, and that by 11:00 or 12:00 Eastern time the response provided by SUMEX is unacceptable for such activities.

While marginally capable of supporting the heavy case load anticipated in the local evaluation studies, SUMEX-AIM will clearly not serve the more extensive collaboration, involving up to 6 remote user sites, which is presently contemplated for the second stage of field evaluation and which we hope to have underway before January 1978. We believe it to be critically important during these field trials that highest priority be given to providing a responsive system, scheduled for the convenience of those clinical personnel asked to participate in the project. This suggests that dedicated hardware facilities, which can be optimized to support this central user service, be made available for the exclusive use of INTERNIST staff and collaborators.

For this purpose, we have proposed to NIH the establishment of a Clinical Decision Systems Research Resource, which would be a node in the AIM network having DECSYSTEM-20 hardware and software, a TYMNET interface, and the specialized mission described above. Our hope is that this new facility can be in operation by January 1, 1978.

B. Justification for Continued Use of SUMEX by the INTERNIST Project

SUMEX will be used in the initial field trials of INTERNIST, which we hope can be accomplished without overload and interference with the work of other users. With establishment of a dedicated INTERNIST resource, this production case load will be removed from SUMEX, but at present it is not possible to define precisely when this changeover will take place.
In any case, a continuing research effort requiring SUMEX facilities can be expected to require approximately the same level of resource utilization as in the past.

C. Comments and Suggestions

1) The members of the INTERNIST project agree that the plans to augment the SUMEX resource by the addition of more core memory and disk storage and retrieval facilities can be expected to provide quite tangible improvement in system performance.

2) In the experience of developing program access to the large INTERNIST data base, project members have perceived the potential value of a general system designed to facilitate the interface of user programs and structured data-bases. We would be interested in collaborating with the SUMEX staff in such a development, which might prove beneficial for the user community at large.

3) Another potentially valuable research area would be the investigation of methods to provide support for a project's efforts to improve real-time performance of its programs. While the design of program specific algorithms must be the concern of project staff, it is in the interest of the SUMEX community that users be provided with information and tools to enable efficient use of SUMEX languages and operating system. This is one of few areas in which we have found documentation of system features and facilities to be less than adequate. Perhaps special performance workshops, involving systems personnel from the various AIM sites, could be convened to address these issues.

6.2.5 MEDICAL INFORMATION SYSTEMS LABORATORY

MISL - Medical Information Systems Laboratory
M. Goldberg, M.D. and B. McCormick, Ph.D.
University of Illinois at Chicago Circle

I) SUMMARY OF RESEARCH PROGRAM

A) TECHNICAL GOALS

The Medical Information Systems Laboratory (MISL) was established under grant HM-0114 in Chicago to pursue three activities:

i) Construction of a database in ophthalmology,
ii) Clinical knowledge system support, and
iii) Network-compatible database design.

Priorities in year 04 of MISL's operation are the same as in previous years: investigations into how to construct a database in ophthalmology, and into distributed database design, are ancillary to the exploration of a clinical knowledge system to support clinical decision making. We are developing ways to get reliable clinical information into the ophthalmic database primarily because we are interested in getting out significant clinical decision support.

B) APPROACH AND MEDICAL RELEVANCE

B.1) Construction of the database in ophthalmology

A specific aim of this project is to construct a workable database in ophthalmology, using the outpatient population of the Illinois Eye and Ear Infirmary. We view this database as a testbed for developing clinical decision support systems. The Ophthalmology Department of the Illinois Eye and Ear Infirmary provides an excellent environment for evaluating new techniques for capturing and using clinical information.

B.2) Clinical knowledge support system

The goals for clinical knowledge system development are to provide a flexible user interface for a prototype relational database system, to devise means of accessing alphanumeric and pictorial information stored in the database system, and to provide efficient means for logically restructuring a database so that it can be adapted to different operating environments in a network-compatible distributed medical information network.

No clinical database, however, has intrinsic significance beyond its ability to support the diagnosis and management of disease.
Additional goals for the clinical knowledge system are therefore to devise computer-based consultation systems for glaucoma and selected retinal/choroidal diseases, and to provide formal models which permit the relational development and evaluation of rule-based consultation systems containing 2,000 to 10,000 rules. In recognition that a continuum exists between physician-guided decision support and computer-based consultation, we choose to describe these services as a Clinical Knowledge System: a consortium of a clinical database and rules for its interpretation.

C) PROGRESS SUMMARY (INCLUDING ITEMS OF INTEREST TO SUMEX-AIM COMMUNITY ONLY)

C.1) The database in ophthalmology

Physician terminals and interfaces to ophthalmic instruments have been positioned in the general eye clinic and several key ophthalmic subspecialty clinics. Systematic, modular hardware and software for clinical source data acquisition have been established. The clinical support system computer will shortly be transferred to the newly dedicated Goldberg Research Center, adjacent to the Illinois Eye and Ear Infirmary. We look forward to stabilizing the hardware configuration, telecommunication linkages and software support.

C.2) Clinical knowledge system support

C.2.a) Development of the relational database includes the following:

- A user interface through which unsophisticated users communicate with the database.
- An intelligent coupler that serves as an intermediary between the end user and the distributed database system. The coupler listens to the user's retrieval requests; helps the user formulate his requests correctly; efficiently translates the user's retrieval requests into a network-compatible retrieval command language; and obtains authorization from the system for data retrieval and/or update.
- Tools for picture data management.
Graphical indexing techniques are provided so that the clinical researcher and physician can easily retrieve pictorial/graphical information from the medical database.
- Means for logical database synthesis. This involves conversion of the user's view of the database into a logically coherent physical organization.

C.2.b) Development of a computer-based consultation system for diagnosis and management of glaucoma. This involves on-going collaboration between Dr. Jacob Wilensky at MISL and, through SUMEX-AIM, other investigators around the United States. Included are the original investigators in glaucoma consultation: Dr. Casimir Kulikowski (Rutgers), Dr. Shalom Weiss (Mt. Sinai Hospital, NY), and Dr. Aaron Safir (Mt. Sinai Hospital).

C.2.c) Development of a consultation system for diagnosis and management of retinal/choroidal diseases. A design has been proposed (in Walser and McCormick, see below) for MEDICO, a consultation system that advises non-expert physicians in the management of chorioretinal diseases. In addition, a major subsystem of MEDICO, responsible for mediating the acquisition and organization of rules, has been implemented.

C.2.d) Formal models for consultation systems. Petri nets have been studied, primarily by Murata (see below), as a formal representation for interacting parallel processes. Petri nets are similar to causal networks, as described by Kulikowski and Weiss at Rutgers, except that, with Petri nets, cyclic activity is easily represented. The similarity between Petri nets and inference nets has also been noted (Walser and McCormick). The utility of the Petri net framework for modelling physical processes was explored by Walser, with the construction of a simulated coffee maker. Further studies are planned.

D.) LIST OF MISL PUBLICATIONS

Chang S. K., Donato N., McCormick B. H., Reuss J., and Roochetti R.
(1977) A relational database system for pictures. Proc. IEEE Workshop on Picture Data Description and Management, April 20-22, 1977, Chicago, Illinois.
Chang S. K. and Cheng W. H. (1975) A database skeleton and its application to logical database synthesis. MISL report M.D.C. 1.1.17.
Chang S. K. and McCormick B. H. (1975) An intelligent coupler for distributed database systems. MISL report M.D.C. 1.1.7.
Malone J. E. (1976) Interval generalization of structure representation. MISL report M.D.C. 1.1.22.
Malone J. E. (1975) User's guide to uniclass cover synthesis. MISL report M.D.C. Aud,
Malone J. E. (1975) Addendum to AQVAL/1 (AQ7), part 1: User's guide and program description. MISL report M.D.C. 4.4.1.
Manacher G. K. (1977) The case for strong loops and selection structures in ordinary computer languages. MISL report M.D.C. 1.1.21.
Manacher G. K. (1975) On the feasibility of implementing a large relational data base with optimal performance on a minicomputer. Proc. International Conference on Very Large Data Bases, Framingham, Mass.
McCormick B. H. and Nordmann B. J. Jr. (1977) Modular asynchronous control design. Forthcoming in IEEE Transactions on Computers. Also MISL report M.D.C. 1.1.25.
McCormick B. H. and Amendola R. C. (1977) Cytospectrometers for subcellular particles and macromolecules: design considerations. Presented at Workshop on Theory, Design and Biomedical Applications of Solid State Chemical Sensors, Case Western Reserve University, March 28-30, 1977. Also MISL report M.D.C. 1.1.24.
McCormick B. H. and Wilensky J. (1975) Clinical knowledge acquisition: design of a relational data base in ophthalmology. Proc. Second Annual Medical Information Systems Conference, Urbana, Ill.
McCormick B. H., Goldberg M. F., and Read J. S. (1974) Clinical decision-making: design of a data base in ophthalmology. Proc.
First Annual Medical Information Systems Conference, Urbana, Ill.
Michalski R. S. and Chang S. K. (1976) A self-model for a relational database. MISL report M.D.C. 1.1.15.
Michalski R. S. (1975) On the selection of representative samples from large relational tables for inductive inference. MISL report M.D.C. 1.1.9.
Murata T. (1975) On liveness and other properties of E-Nets. MISL report M.D.C. 1.1.15.
Murata T. (1975) Bibliography on Petri nets and related topics. MISL report M.D.C. 1.1.20.
Murata T. (1976) A method for synthesizing marked graphs from given markings. Presented at 17th Annual Symposium on Foundations of Computer Science, October 25-27, Houston, Texas.
Murata T. (1976) On deadlock and the liveness of E-nets. Presented at the 17th Annual Symposium on Foundations of Computer Science, October 25-27, Houston, Texas.
Murata T. (1975) State equation, controllability, and maximal matchings of Petri nets. MISL report M.D.C. 1.1.10.
Murata T. and Church R. W. (1975) Analysis of marked graphs and Petri nets by matrix equations. MISL report M.D.C. 1.1.8.
Vere S. A. (1975) Induction of concepts in the predicate calculus. Proc. Fourth IJCAI.
Vere S. A. (1975) Relational production systems. Forthcoming in Artificial Intelligence. Also MISL report M.D.C. 1.1.5.
Walser R. L. and McCormick B. H. (1976) Organization of clinical knowledge in MEDICO. Proc. Third Illinois Conference on Medical Information Systems, Urbana, Ill.
Walser R. L. and McCormick B. H. (1977) A system for priming a clinical knowledge base. Forthcoming in Proc. 1977 National Computer Conference, June 13-16, Dallas, Texas.

E.) FUNDING STATUS

Year 03 -- 6/30/76 - 6/30/77: $228,000.
Year 04 (projected, pending renewal) -- 7/1/77 - 6/30/78: $278,109.

II) INTERACTION WITH SUMEX-AIM RESOURCE

A.) COLLABORATION

Major collaboration at present is through the ONET, involving the ophthalmology departments of five medical schools. Dr. Jacob Wilensky is actively engaged in evaluating and modifying the Glaucoma Consultation Program, written originally by Shalom Weiss.

B.) CRITIQUE OF RESOURCE SERVICES

Users at MISL are pleased with SUMEX-AIM services. The availability of up-to-date on-line documentation makes it easy to learn how to use the system and stay abreast of new developments. The on-line bulletin board is especially commendable. Since documentation is so readily available, consultation with SUMEX staff has rarely been necessary.

III) FOLLOW-ON SUMEX GRANT PERIOD

A.) LONG RANGE USER PROJECTS AND GOALS

In the future, we expect to become more involved in the development of software for decision support. We also anticipate more extensive collaboration, especially sharing of databases, with investigators at other sites.

B.) SPECIFIC PROJECTS AND JUSTIFICATION FOR CONTINUED USE OF SUMEX

While much of our development to date has been conducted in a minicomputer environment, we have now reached a stage at which we can benefit greatly from software available from SUMEX. Access by our staff to SUMEX facilities and opportunity for inter-institutional collaboration will be enhanced by a SUMEX (PDP-10) - MISL (PDP-11) phone connection, which we plan to implement shortly. This connection will be valuable to our decision support group, since it will be possible to develop and test programs in INTERLISP at SUMEX, then to translate them into the lower level HARVARD LISP, which is available on our UNIX (PDP-11) operating system. It will also be possible to edit programs on our machine (which is an advantage for us since we can operate at 9600 baud), then execute the programs on the SUMEX PDP-10.

Also, using SUMEX, we have recently implemented the planning system described by Earl Sacerdoti in his thesis "A structure for plans and behavior" (Stanford, 1975).
We are impressed by the potential power of the system and are considering it as a basis for our consultation system for managing chorioretinal diseases. Since our version has only been tested in a blocks world, further development is necessary, and we would, of course, require continued access to SUMEX and INTERLISP.

It has also been proposed that the planning system be used to construct sequences of database retrieval statements in RAIN, a relational algebraic interpreter developed by Dr. S. K. Chang at MISL. This could benefit our user interface, since physicians' requests could be phrased at a high level, and then translated into appropriate RAIN commands. The planning system provides a convenient, procedural representation for the database semantics necessary to make the translation from a high level language.

INTERLISP is also being used by Dr. Brian Phillips and his students to code a model of knowledge developed over a period of years at the State University of New York at Buffalo, and later in the Department of Information Engineering and MISL in Chicago. While the model of knowledge is well-developed, and has been implemented at another site in SNOBOL, the INTERLISP version requires further work. It is anticipated that the implementation, when complete, will be useful to the decision support group.

C.) SUGGESTIONS FOR FUTURE RESOURCE DEVELOPMENT EFFORTS

As mentioned above, we are very interested in coupling our PDP-11 based UNIX operating system with the SUMEX-AIM network, and would like to encourage similar connections at other sites. There are several advantages. Maintaining voluminous patient-related data on minicomputer systems would provide for local security, and help to keep SUMEX secondary storage free for service and development programs and documentation.
The enhanced opportunity for inter-site collaboration and database sharing is obvious, and would be beneficial to the SUMEX-AIM community as a whole.

6.2.6 RUTGERS COMPUTERS IN BIOMEDICINE

Rutgers Research Resource -- Computers in Biomedicine

Principal Investigator: Saul Amarel
Rutgers University, New Brunswick, New Jersey

I) SUMMARY OF RESEARCH PROGRAM

A) Goals and Approach

The fundamental objective of the Rutgers Resource is to develop a computer based framework for significant research in the biomedical sciences and for the application of research results to the solution of important problems in health care. The focal concept is to introduce advanced methods of computer science -- particularly in artificial intelligence -- into specific areas of biomedical inquiry. The computer is used as an integral part of the inquiry process, both for the development and organization of knowledge in a domain and for its utilization in problem solving and in processes of experimentation and theory formation.

The Resource community includes 48 researchers - 30 members, 8 associates and 10 collaborators. Members are mainly located at Rutgers. Collaborators are located in several distant sites and they interact, via SUMEX-AIM, with Resource members on a variety of projects, ranging from system design/improvement to clinical data gathering and system testing. At present, collaborators are located at the Mt. Sinai School of Medicine, N.Y.; Washington University School of Medicine, St. Louis, Mo.; Johns Hopkins Medical Center, Baltimore, Md.; Illinois Eye and Ear Infirmary, Chicago, Ill.; and the University of Miami.

Research in the Rutgers Resource is oriented to "discipline-oriented" projects in medicine and psychology, and to "core" projects in computer science that are closely coupled with the "discipline-oriented" studies.
Work in the Resource is organized in three AREAS OF STUDY; in each area there are several projects. The areas of study and the senior investigators in each of them are:

(1) Medical Modeling and Decision Making (C. Kulikowski, A. Safir).
(2) Modeling Belief Systems and Common-sense Reasoning (C.F. Schmidt, N.S. Sridharan).
(3) Artificial Intelligence: Representations, Reasoning and System Development (S. Amarel).

In addition, the Rutgers Resource is sponsoring an Annual National AIM Workshop, whose main objective is to strengthen interactions between AIM activities, to disseminate research methodologies and results, and to stimulate collaborations and imaginative resource sharing within the framework of AIM. The second AIM Workshop was held near the New Brunswick Rutgers Campus on June 1-4, 1976. The third Workshop is scheduled for July 6-8, 1977.

B) Medical Relevance; Collaborations

A major part of our research is focusing on the development of computer based medical consultation systems. We are using artificial intelligence approaches in problems of: knowledge acquisition from experts in a medical specialty and from their clinical experience; the representation and management of these complex and changing data bases of medical knowledge within the computer; and the development of a sufficiently rich repertoire of reasoning strategies for diagnosis, prognosis, therapy selection, explanation and teaching. By linking such a system to a data base of prospectively chosen cases, we are in the position to provide a powerful tool for clinical research with built-in interpretative capabilities.
Our approach emphasizes the development and application of clinically useful models that describe the pathophysiology and dysfunction of diseases in a variety of tasks:

a) Consultation embodying expert knowledge, which is expressed in terms acceptable to the clinician;

b) Clinical research aid, assisting the investigator to:
i) Summarize and incorporate his knowledge, experience, and opinions into a computer system;
ii) Analyze his data, check it against that of other investigators, pooling it when appropriate to draw stronger conclusions based on the large sample of cases;
iii) Test, evaluate and modify the data base of models and decision strategies to produce an up-to-date summary of experience in his specialty.

c) Screening and diagnosis, to aid nursing or paramedical personnel in performing routine decision procedures within restricted medical environments;

d) Instruction to provide practitioners and support personnel with appropriate explanation and guidance in clinical decision-making.

A unique and novel aspect of our work is the creation of a network of clinical investigators to collaborate on the testing and continued development of the computer programs needed to accomplish the above tasks. During 1975, the ophthalmological network (ONET) of glaucoma investigators has grown and established itself, with several significant collaborative research projects currently underway. The consultation program for glaucoma using the causal associational network (CASNET) model, developed within the Rutgers Resource, was jointly presented by the ONET members at the 1976 meeting of the Association for Research in Vision and Ophthalmology. An important new emphasis has been the incorporation into the consultation program of alternative expert opinions on subjects currently under debate. Dr.
Douglas Anderson of the Bascom Palmer Eye Institute at the University of Miami has joined ONET to provide such alternatives and strengthen the glaucoma model in certain important areas. The SUMEX-AIM shared computer resource has been essential to the activities of ONET.

The knowledge base and the strategies of our CASNET glaucoma consultation system are being strengthened and refined continuously in the ONET environment. The system is now at a point where it is considered by leading ophthalmologists as "highly competent to expert" in several subspecialties of glaucoma. The ONET group was confident enough about the system to demonstrate it at the October 1976 meeting of the American Academy of Ophthalmology and Otolaryngology. The reactions to the system were most favorable. The response of an independent sample of ophthalmologists taken at this meeting strongly emphasized the importance of the system for glaucoma research.

In addition to the main glaucoma research activities, the Resource has collaborated with the Mt. Sinai-Rutgers Health Care Computer Laboratory in the development of models for refraction and visual fields. These will be used by clinical prototype programs for guiding paramedical personnel in data acquisition and decision-making. These programs run on the PDP-11 computers of the clinical ophthalmological system at Mt. Sinai, which are to be linked to the PDP-10 at Rutgers for accessing the more complex models of disease when they are needed. The activities in conjunction with the Health Care Computer Laboratory reflect the more applied aspects of our work in the medical area.

The collaboration with Dr. R. Nordyke of the Straub Clinic on thyroid disease consultation systems has continued at a low level of activity during 1976.
In the area of Belief Systems, collaboration has continued with Professor Andrea Sedlak and her group at the University of North Carolina. This collaboration is focusing on developmental aspects of action perception.

In the AI Area we had extensive interactions with researchers in several institutions on problems of representation, problem solving systems, natural language processing, automatic programming, data base systems, and interactive systems. Contacts continued with the natural language group at BBN (Woods, Bruce) on the design of natural language processors for medical systems. Also, we had contacts with the Stanford-Xerox group (Winograd, Bobrow) which is involved in the development of KRL (Knowledge Representation Language). Following the Rand Workshop on Biomedical Modeling (February 18-20, 1975), in which S. Amarel participated, preliminary contacts started with Dr. D. Garfinkel from the University of Pennsylvania in connection with possible applications of AI methods to the modeling of metabolic processes.

Our close contacts with the Stanford projects on Heuristic Programming (Drs. Buchanan, Feigenbaum, Lederberg) are continuing. The orientation and approach of these Stanford projects are very similar to ours. We continue to share with the investigators in DENDRAL and METADENDRAL a strong interest in computer-based methods of scientific inference and in AI ideas and techniques for representation of knowledge in computers, diagnostic problem solving and theory formation.

One of the significant collaborative developments this period was the joint work of Ed Feigenbaum and his students at Stanford, and Saul Amarel and his students at Rutgers, on the development of an AI Handbook. This handbook is being prepared on the SUMEX-AIM and RUTGERS-10 computers, and it is intended to provide a network-accessible encyclopedic coverage of the AI field for the AIM community and AIM guests.

C) Progress Summary

1. Areas of Study and Projects

a) Medical Modeling and Decision-Making

The consolidation of the ophthalmological network (ONET) of collaborating glaucoma investigators using the SUMEX-AIM shared resource facility, the testing and improvement of the CASNET consultation system with the help of the collaborators, the design and implementation of a time-oriented database system and a set of analysis programs for aiding joint clinical research activities within ONET, and the development of a new knowledge-based consultation system (IRIS), represent the main achievements in the last year.

The network of investigators in glaucoma is designed to foster development of consultation systems that embody sufficient depth of knowledge and expert opinion in a variety of subareas to be useful as research and teaching tools. The collaborative activities, coordinated by Dr. A. Safir at Mt. Sinai, bring together selected scientist-users with complementary interests and strengths in different aspects of glaucoma, and Resource investigators who are concentrating on the development of new computer science methodologies in modeling and problem solving.

During this period, there has been more extensive testing of the CASNET glaucoma consultation program. The collaborators had several meetings to discuss the structure of the glaucoma model and suggested many improvements and additions. A significant new capability of the program is the inclusion of alternative interpretations that capture differences of opinion among the experts on aspects of the model that are currently under debate.
A new development during this period has been the implementation of a time-sequenced data base for glaucoma, which has the dual purpose of aiding the clinical research of ONET collaborators and of providing a systematic means for evaluating and improving the performance of the consultation programs.

In the area of general methods and systems we have developed a multilevel semantic network representation for characterizing disease processes, their anatomical descriptions and their taxonomic identification. This is used by a set of normative rules for diagnostic, prognostic and therapeutic reasoning, which results in a very general and flexible system for clinical consultation. A prototype model called IRIS is being developed using the glaucoma knowledge-base.

We have also continued our investigations of other representation paradigms: a frame-based approach and the relationship to mathematical models of optics and refraction. Another subproject is concerned with developing methods of inference over network structures that will permit us to incorporate the results of clinical experience with different groupings of case-types into the models of consultation, aiding at the same time in the evaluation of the programs.

b) Modeling of Belief Systems and Common-Sense Reasoning

During this period a major achievement was the development and implementation of the AIMDS system. This is an MDS-based system that is specialized and augmented for use in modeling reasoning about actions. A noteworthy aspect of the system is the use of the MDS concepts of Consistency Conditions and Residues to guide frame instantiations and the drawing of further inferences from such frame instantiations.

The BELIEVER theory is a psychological model of the processes involved in the interpretation and common-sense reasoning about observed human actions.
The AIMDS system is being constructed to provide a framework for formulating, studying and testing the BELIEVER theory. The computer system and the psychological theory are growing together, and they are strongly influencing each other's development.

The domain of common-sense reasoning about actions represents a prototypical example of knowledge based reasoning. The richness of the psychological data that this theory must explain, namely, persons' linguistic descriptions and summarizations of everyday behavior, has forced us to think very carefully about how knowledge is to be represented and used. Out of this has emerged a general scheme that not only seems psychologically plausible but also appears to provide a useful framework for viewing a wide variety of problems of interpretation including medical diagnosis and theory-based interpretive problems involved in organic chemistry.

Along with the implementation of the system, we have developed the representation of the central knowledge components of the BELIEVER theory. The central common-sense concepts of Person, Plan and Act have been represented as frames. These frames are highly articulated structures which express the core assumptions of the common-sense psychological theory. By expressing these concepts as frames we have been able to provide a representation of these assumptions that can be used to guide and control the overall processes of reasoning about particular persons, plans and actions. The procedural components of the theory have been defined and are closely linked to these frames. This interplay and association between processes and highly articulated structures promises to provide a basis for strongly decomposing the knowledge of the domain.
Since the interdependencies of these concepts are represented structurally rather than procedurally, the active database of our MDS-based system provides the basis for communication and cooperation between the processes that monitor these person, plan and act frames.

The definition of these central structural components together with the general system components have also provided a competence theory within which detailed predictions of the BELIEVER theory were specified. These predictions about the structure of summary protocols were tested and borne out by the data. This provides one of the few examples of the verification of predictions derived from work on the development of psychological theory using AI concepts in the process of theory formation.

c) Artificial Intelligence: Representations, Reasonings and Systems Development

Our work in this area continues to be oriented to collaboration with investigators in other Resource projects and to study of basic AI problems that are related to Resource applications. The collaborations involve adaptation and augmentation of existing AI methods and techniques to handle specific key problems identified in the application projects.

The close collaboration with investigators in the Belief Systems area has resulted this year in the development of the AIMDS System for handling problems of action interpretation of the type encountered in the domain of the BELIEVER theory. This system has provided one of the first examples of a working frame-based AI system. In addition, it has led to several important AI results, such as elucidation of the "frame problem" and unification of previous approaches to planning in heuristic problem solving.

Our research in language processing has led this period to two important applications - in Medical Systems and in Belief Systems.
In one project, the PEDAGLOT system is being adapted to provide a natural language interface for communicating patient case histories to our glaucoma system. In a second project, PEDAGLOT is providing the basis for implementing the experimental component of a competence theory within which the BELIEVER theory can be evaluated. Empirical work in this area requires the ability to process summaries and other natural language data.

In the basic component of our work on language processing, we continued to develop a language inference system based on a "developmental paradigm" for grammar acquisition. We made progress in the area of coalescing rules of hypothesized grammars, and we started to look into ways of using semantic information to guide the hypothesis formation process.

In another project, which is also focusing on hypothesis formation, we are studying processes of computer assisted acquisition of domain knowledge from empirical data, where knowledge is in the form of weighted production rules. This type of knowledge can be represented as a stochastic graph. This year we obtained several new results in this area. We explored the implications of these results with the help of an experimental program which constructs a stochastic graph from empirical data. Also, we wrote a program which makes use of a file of graph-structured knowledge to make decisions about a domain.

In our work on theory formation in programming, we developed a formation strategy which combines a global, model-guided approach with a local analysis of special cases. In order to study this strategy experimentally, we are now developing a system for acquiring and handling information about programs in various stages of specification, as well as other knowledge which is relevant to the formation task.

During this period we made important progress in building a strong basis of AI languages for our work.
The UCI-LISP and FUZZY programming languages were adapted to the RUTGERS-10 and they were further improved. The availability of these languages made possible the implementation of major parts of AIMDS over a relatively short period of time. Work has now started on exploring the use of FUZZY (including its features for effective use of incomplete and/or uncertain knowledge) and AIMDS in certain problems of medical decision making.

The Second AIM Workshop took place June 1 to 4, 1976 near the Rutgers campus, and it was attended by about 150 participants. The program included reviews of recent AI developments in Medicine, Biochemistry and Psychology; lectures and panel discussions on knowledge representation and AI system design; papers summarizing recent AI work in other application areas (outside AIM); and presentations of current research on computer-based biomathematical models. The Workshop included panels on networking and shared resources; in addition, there were a number of informal meetings in which specific projects or issues were discussed in depth. Hands-on experimentation and demonstration of AI systems (which were accessed via TYMNET and ARPANET) were an important feature of the Workshop. All indications are that the Workshop was very effective in stimulating scientific interactions and in disseminating work being done in the area of AIM.

In support of the AIM Workshop series we devoted considerable effort this period to systems development, to related computer and networking enhancements, to preparation of proceedings for the first Workshop, and to comprehensive supporting documentation for the second.

A panel on Applications of AI to Science and Medicine was organized for the week following the Second AIM Workshop at the National Computer Conference in New York.
It was intended to further augment the dissemination activities of AIM by bringing to a wide audience of professionals in the computer field recent developments in the AIM community.

D) Up-to-Date List of Publications

Amarel, S. and Kulikowski, C. (1972) "Medical Decision Making and Computer Modeling", Proc. of 5th International Conference on Systems Science, Honolulu, January 1972.
Amarel, S. (1974) "Inference of Programs from Sample Computations", Proc. of NATO Advanced Study Institute on Computer Oriented Learning Processes, 1974, Bonas, France.
Amarel, S. (1974) "Computer-Based Modeling and Interpretation in Medicine and Psychology: The Rutgers Research Resource", Proc. of Conference on the Computer as a Research Tool in the Life Sciences, June 1974, Aspen, by FASEB; also appears as Computers in Biomedicine TR-29, June 1974, Rutgers University; also in Computers in Life Sciences, W. Siler and D. Lindberg (eds.), FASEB and Plenum, 1975.
Amarel, S. (1975) Abstract of Panel on "AI Applications in Science and Medicine" in 1976 National Computer Conference Program, N.Y., June 7-10, 1975.
Bruce, B. (1972) "A Model for Temporal Reference and its Application in a Question Answering Program", in "Artificial Intelligence", Vol. 3, Spring 1972.
Bruce, B. (1973) "A Logic for Unknown Outcomes", Notre Dame Journal of Formal Logic; also appears as Computers in Biomedicine, TM-35, Nov. 1973, Rutgers University.
Bruce, B. (1973) "Case Structure Systems", Proc. 3rd International Joint Conference on Artificial Intelligence (IJCAI), August 1973.
Bruce, B. (1975) "Belief Systems and Language Understanding", Current Trends in the Language Sciences, Sedelow and Sedelow (eds.), Mouton, in press.
Chokhani, S. and Kulikowski, C.A. (1973) "Process Control Model for the Regulation of Intraocular Pressure and Glaucoma", Proc. IEEE Systems, Man & Cybernetics Conf., Boston, November 1973.
Chokhani, S.
(1975) "On the Interpretation of Biomathematical Models Within a Class of Decision-Making Procedures", Ph.D. Thesis, Rutgers University; also Computers in Biomedicine TR+43, May 1973. Fabens, W. (1972) "PEDAGLOT. A Teaching Learning System for Programming Language", Proc, ACM Sigplan Symposium on Pedagogic Languages, January 1972. Fabens, W. (1975) "“PEDAGLOT and Understanding Natural Language Processing". Proc. of the 13th Annual Meeting of the Asso. of Computational Linguistics, October 30 ~ Nov. 1, 1975. Kulikowski, C.A. and Weiss, S. (1972) "Strategies for Data Base Utilization in Sequential Pattern Recognition”, Proc. IEEE Conf. on Decision and Control, Symp. on Adaptive Processes, December 1972. Kulikowski C.A. and Weiss, S. (1973) "An Interactive Facility for the Inferential Modeling of Disease", Proc. 7JTth Annual Princeton Conf. on Information Sciences and Systems, March 1973. Kulikowski C.A. (1973) "Theory Formation in Medicine: A Network Structure for Inference", Proc. International Conference on Systems Science, January 1973. Kulikowski, C.A. Weiss S. and Safir, A. (1973) "Glaucoma Diagnosis and Therapy by Computer", Proc. Annual Meeting of the Asso. for Research in Vision and Ophthalmology, May 1973. Kulikowski, C.A. (1973) "Medical Decision-rlaxing and the Modeling of Disease", Proc. First Interntl. Conf. on Pattern Recognition, October 1973. Kulikowski, C.A. (1974) “Computer-Based Medical Consultation ~ A Representation of Treatment Strategies", Proc. Hawaii Interntl. Conf. on Systems Science, Jan. 1974. Kulikowski, C.A. (1974) "A System for Computer-Based Medical Consultation" Natl. Computer Conf., Chicago, May 1974. » Proc. Kulikowski, C.A. and Safir, A. (1975) "Computer-Based Systems Vision Care", Proceedings IEEE Interecon, April 1975. Privileged Communication 151 J. Lederberg Section 6.2.6 RUTGERS COMPUTERS IN BIOMEDICINE Kulikowski C.A. and Trigoboff, M. (1975) "A Multiple Hypothesis Selection System for Medical Decision-Making", Proc. 
8th Hawaii Internatl. Conf. on Systems. Kulikowski, C. &N.S. Sridharan, (1975) "Report on the First Annual AIM Workshop on Artificial Intelligence in Medicine. Sigart Newsletter No. 55, December 1975. Kulikowski C. (1976) "Computer-Based Consultation Systems as a Teaching Tool in Higher Education, 3rd Annual N.J. Conf. on the use of Coaputers in Higher Education, March 1976. Kulikowski, C., Weiss S., Safir, A. et al (1976) "Glaucoma Diagnosis & Therapy by Computer: A Collaborative Network Approach" Proc. of ARVO, April 1976. Kulikowski, C. Weiss, S. Trigoboff, M. Safir, A., (1976) "Clinical Consultation and the Representation of Disease Processes", Some AI Approaches, ATSB Conferences, Edinburgh, July 19756. LeFaivre, R. and Walker, A. (1975) "Rutgers Research Resource on Computers in Biomedicine, H", Sigart Newsletter No. 54, October 1975. LeFaivre, R., (1976) "Procedural Representation in a Fuzzy Problem-Solving System", Proc. Natl. Computer Conf., New York, June 1975. LeFaivre,R. (1977) "Fuzzy Representation and Approximate Reasoning", submitted to IJCAI-77, MIT. Mathew, R., Kulikowski, C. and Kaplan, %. (1977) "A Multileveled presentation for Knowledge Acquisition in Medical Consultation stems", Proc. MEDINFO 77 (in press). Mauriello, D. (1974) "Simulation of Interaction Between Populations in Freshwater ' Phytoplankton", Ph.D. Thesis, Rutgers University 1974. Sehmidt, C. (1972) "A comparison of source unidimensional, multidimensional and set theoretic models for the prediction of judgements of trail implication", Proc. Eastern Psych. Asso. Meeting, Boston, April 1972. Schmidt, C.F. and D’Addamio, J. (1973) "A Model of the Common Sense Theory of Intension and Personal Causation", Proc. of the 3rd IJCAI, August 1973. Schmidt, C.F. and Sedlak, A. (1973) "An Understanding of Social Episodes", Proc. of Symposium on Social Cognition, American Psych. Asso. Convention, Montreal, August 1973. Schmidt, C.F. (1975) "Understanding Human Action", Proc. 
Theoretical Issues in Natural Language Processing: An Interdisciplinary Workshop in Computational Linguistics, Psychology, Artificial Intelligence, Cambridge, Mass., June 1975. Also appears as Computers in Biomedicine, TM-47, June 1975, Rutgers University. J. Lederberg 152 Privileged Communication RUTGERS COMPUTERS IN BIOMEDICINE Section 6.2.6 Schmidt C. (1975) “Understanding Human Action: Recognizing the Motives", Cognition and Social Behavior, J.S. Carroll and J. Payne (eds.), New York: Lawrence Earlbaum Associates, in press. Also appears as Computers in Biomedicine, TR-45, Juhe 1975, Rutgers University. Senmidt C.F., Sridharan, N.S., and Goodson, J.L. (1975) Recognizing plans and Sumuarizing actions. Proceedings of the Artificial Intelligence and Simulation of Benavior Conference, University of Edinburgh, Scotland, July 1976. Schmidt C. (1975) Understanding human action: Recognizing the plans and motives of other persons. In (eds. J. Carrol and J. Payne) Cognition and Social Behavior, Potomac, Maryland: Lawrence Earlbaum Associates, 1976. Schmidt, C.F. and Goodson, J.L. (1975) The Subjective Organization of Summaries of Action Sequences, 17th Annual Meeting of the Psychonomic Society, St. Louis, 1976. Sedlak, A.J. (1974) "An Investigation of the Development of the Child’s Understanding and Evaluation of the Actions of Others", Ph.D. Thesis, Rutgers University. Sridnaran, N.S. (1976) "The Frame and Focus Problems in AI: Decision in Relation to the BELIEVER System. Proceedings of the Conference on Artificial Intelligence & the Simulation of Human Benavior, Edinburgh, July 1976. Sridharan, N.S. (1976) “An Artificial Intelligence System to Model and Guide Organic Chemical Synthesis, Planning in Chemical Synthesis by Computer, American Chemical Society Press, September 1976. Sridharan, N.S. and Schmidt,C.F. (1977), Knowledze-Directed Inference in BELIEVER, Workshop on Pattern-Directed Inference Systems, Hawaii, May 1977. Srinivasan, C.V. 
(1973) "Tae Architecture of a Coherent Information System: A General Problem Solving System", Proc. of the 3rd IJCAI, August 1973. Trigoboff, MH. (1976) Propagation of Information in a Semantic Net", Proc. of the Conference on Artificial Intelligence and the Simulation of Behaviour, Edinburgn, Scotland, July 1976; updated version appears in CBM-~TM-57, Dept. of Computer Science, Rutgers University, 1977. Tucker, S.S. (1974) Cobalt Kinetics in Aquatic Microcosms", Ph.D. thesis, Rutgers University. Van der Mude, A. and Walker, A. (1976) "Some Results on the Inference of Stochastic Grammars", abstract in Proc. Symposium on New Directions and Recent Results in Algorithms and Complexity. Dept. of Computer Science, Carnegie-Mellon University. Vichnevetsky, R. (1973) "Physical Criteria in tne Evaluation of Computer Methods for Partial Differential Equations", Proc. 7th Internatl. AICA Congress, Prague, Sept. 1973; reprinted in Proc. of ATCA, Vol. XVI, No. 1, Jan. 1974, European Academic Press, Brussels, Belsiun. Privileged Communication 153 J. Lederberg Section 6.2.6 RUTGERS COMPUTERS IN BIOMEDICINE Vichnevetsky, R., Tu, K.W., Steen, J.A. (1974), "Quantitative Error Analysis of Numerical Methods for Partial Differential Equations", Proe. 8th Annual Princeton Conference on Information Science and Systems, Princeton University, March 1974. Walxer, A. (1975) "Formal Grammars and the Regeneration Capability of Biological systems”, Journal Comp. and Syst. Sciences, Yol. 11,No. 2, 252-261. tieiss, S. (1974) "A System for Model-Based Computer-Aided Diagnosis and Therapy", Parts I and II, Ph.D. Thesis, Rutgers University; also Computer in Biomedicine TR-27, Feb. 1974. Weiss, S., Kulikowski, C. and Safir, A. (1977) "Glaucoma Consultation Computer”, Computers in Biology and Medicine (in press). E) Funding Status 1) Granting Agency: Biotechnology Resources Program, DRR, NIH. 2) Grant number: RR-643. 
3) Period of award: This is the 3rd year of the second 3-year period of the Resource.
4) Direct cost funds for the period September 1, 1976 to August 31, 1977: $336,314.
5) A proposal for a five-year extension of the Rutgers Resource was submitted in October 1976. The proposal is currently being evaluated by NIH. In our proposal we are requesting a substantially higher level of funding in order to cover increased levels of effort in all areas of the Rutgers Resource, and also to support the acquisition/enhancement of the RUTGERS-10 computer which we propose to use, in coordination with the SUMEX-AIM facility, as a shared resource for the national AIM community.

II) INTERACTIONS WITH THE SUMEX-AIM RESOURCE

During the past year we have continued to use the SUMEX-AIM resource for program development and testing, for communications between collaborators distributed in different parts of the country, and for preparation and running of the AIM Workshop. We continue to access SUMEX-AIM via TYMNET, and to a smaller extent via ARPANET.

SUMEX-AIM played a key role in consolidating our network of collaborators in ophthalmology (ONET) and in providing the support needed for establishing a productive collaboration among the ONET investigators. Also, it has been most useful in communicating, planning and helping to set up the information pool for the Second AIM Workshop.

Computing in the Rutgers Research Resource continues to be distributed between SUMEX-AIM and the RUTGERS-10. The two computers are providing complementary resources for our research and for our national collaborations. At present, the distribution of our computing is about 3 to 1 between RUTGERS-10 and SUMEX-AIM.
Our total demand at SUMEX-AIM is estimated at about 5000 connect hours for the current year, with most of the work done in INTERLISP (about 80% of our total connect hours) and the rest devoted mainly to communications and to limited program testing within ONET.

The SUMEX-AIM facility was used for demonstrations of AIM programs in first year classes and in second year seminars at the Rutgers Medical School, CMDNJ; CASNET, MYCIN, INTERNIST and PARRY were interactively accessed in these classes and seminars.

Another innovative use of SUMEX-AIM has been the collaborative development of the AI HANDBOOK, which is intended to provide a computer-based and network accessible encyclopedic coverage of the AI field for the AIM community and AIM guests. The AI HANDBOOK was initiated by Dr. E. Feigenbaum and his students at Stanford. During the year, a graduate class at Rutgers, given by Dr. S. Amarel, worked on the AI HANDBOOK and contributed several articles.

We find that the SUMEX-AIM bulletin board plays an important role in communicating ideas and information on services among users. Since the MYCIN group at Stanford regularly posts summaries of meetings, and other technical information, on the MYCIN bulletin board, we have been able to keep track of their progress and problems. This was particularly useful for our work on IRIS, where concepts close to the MYCIN CF formalism are being studied.

System support at SUMEX-AIM has been more than good; it has been friendly. Problems or questions concerning the system are consistently handled quickly and competently by SUMEX-AIM staff. Service is simply outstanding.

The system is under heavy usage for most of the day, which causes painfully slow response times for large jobs; thus, it is usable for Rutgers users only in the early morning or in the late evening. On most days the load average stays over 7 from noon EST to about 7 p.m.
During these hours, the computer is only marginally useful for work with a large LISP system such as IRIS (currently this system has 245 pages of an INTERLISP core image). For relatively small jobs (about 70 pages), the response time has improved consequent to the changes in the scheduler in early Spring.

Access to SUMEX-AIM via TYMNET has improved considerably. Occasionally, however, problems persist with spurious characters and with broken connections.

In the last year, several new areas of collaboration between Rutgers and SUMEX-AIM have developed, mostly along the lines of systems and support software. These include the following specific efforts:

1. MAINSAIL. During the past year, the design of the MAINSAIL system has been stabilized to a great degree, and Rutgers has followed the development of the MAINSAIL effort in order to be in a position to apply it to Rutgers' AIM activities, particularly in the ophthalmology area. We have made several passes over the MAINSAIL design during this period, with particular interest in the issues of memory allocation and the possibilities of doing list processing in MAINSAIL. During April, Clark Wilcox and others from the Stanford group installed a prototype MAINSAIL system on the Rutgers PDP-10, and it is presently being used by a group from NIH who are interested in evaluating MAINSAIL for their own work.

2. SOFTWARE. Two text processing programs, TVEDIT and PUB, were brought over from SUMEX and installed at Rutgers, and are now being used on the RUTGERS-10. These tools, which were developed at Stanford's IMSSS and AI Laboratories, reduce the overhead in program and document preparation and maintenance.

3. ALLOCATION and ARCHIVING. The design of the allocation and archiving systems that have been in use at SUMEX has been adopted, with some modification, for use at Rutgers.
One of the important products of the SUMEX research has been the models for interaction between a variety of collaborators; the ways in which the allocation of system file space and the archiving of unneeded files have been accomplished at Stanford have been adopted at Rutgers.

4. CG: A Program for Explanation of an AI System. In a somewhat different area, Prof. David Levine of the Rutgers faculty collaborated with Dr. Ray Carhart of the Stanford Heuristic Programming Project to produce a program that provides a dynamic, display-oriented interface to the CONGEN program. CONGEN examines the chemical formulas that are possible from a particular empirical formula, under a set of constraints on the generation of formulas. CG, the program that effects this interface, was written at Rutgers, and can run either at Rutgers or at SUMEX-AIM; CONGEN, which is currently written in INTERLISP, runs only at SUMEX-AIM.

5. SYSTEM MODEL. The SUMEX staff has continued to be a model of cooperation and support for research. More importantly, the protocols that the SUMEX staff have developed for solving problems of system/user and user/user interaction continue to be models that we find it possible to apply to the Rutgers environment.

III) FUTURE PLANS OF THE RUTGERS RESOURCE; RELATIONS TO SUMEX-AIM

Our plans for the future are to continue along the main lines of our current research. We expect our computing needs to grow at a rate of about 20% per year. About a quarter of our total computing will be done at SUMEX-AIM; most of this work will be concerned with large program development (mainly in INTERLISP).
In our application for renewal of the Resource grant (which is currently being reviewed at NIH) we propose to acquire and augment the RUTGERS-10 computer in order to provide sufficient capacity to satisfy the projected computing demand of the Rutgers Resource, and also to provide added computing capacity for the national AIM community and to enlarge the scope of the AIM resource sharing activities. We are proposing a KL-10 configuration with TOPS-20 software, which promises compatible operation with the TENEX system at SUMEX-AIM. We expect the configuration to have 50% more capacity than the present RUTGERS-10 in the first year of the renewal period. Two thirds of the enhanced system capacity will be allocated to the Resource; this capacity share will be evenly divided between internal Resource projects and the national AIM community. We expect the RUTGERS-10 to be operated in close coordination with the SUMEX-AIM facility, within a common management framework.

This plan will provide an additional node to the AIM network. We envision a move towards specialization and differentiation of functions among the nodes in the network. We propose to use the Rutgers AIM center for promotion of AI applications in clinical medicine (and in related biological modeling) with special emphasis on collaborative network-based projects of the type that have developed within our Resource to date.

In addition to our computing plans, we propose to increase our AIM dissemination and training efforts (AIM Workshops, conferences, post doctoral programs), and to continue our system development activities with the aim of enhancing scientific communications within the AIM community and between AIM researchers and other interested scientists. We expect increased collaboration with SUMEX-AIM in these areas.
6.3 PILOT STANFORD PROJECTS

The following are descriptions of the informal pilot projects currently using the Stanford portion of the SUMEX-AIM resource pending funding, and full review and authorization.

6.3.1 GENETICS APPLICATIONS PROJECT

Computer Science Applications in Genetics

Prof. L. L. Cavalli-Sforza
Department of Genetics
Stanford University School of Medicine

We have been quite satisfied with the use of programs such as REDUCE, MLAB and SPSS. REDUCE has been used by graduate student D. Wagener to check algebra, and also by L. Cavalli-Sforza, and has been of great help in circumstances in which algebraic manipulations were too lengthy for hand verification. Unfortunately REDUCE has a maximum length of algebraic expansions that can be manipulated by computer, which is not always generous enough for our purposes; the maximum allowed was increased, but there is now no warning when the length of an expression overruns the new limits. The penalty is the total loss of the information. If this could be mended, the program would be much more useful. MLAB is very useful for least square fitting of complex systems of equations. SPSS is widely used and well known; it is working fine in the system.

Special modelling efforts involved:

1) A program of information storage and retrieval which may be useful also for analysis of multi-dimensional contingency tables. The material to which it was applied derives from anthropological and archeological survey and excavation data in Calabria, Italy by A. Ammerman. The information collected on coordinates of sites, material found, elevation, land form, soil, ecological and geological data etc. refers to hundreds of sites and will eventually be subject to analysis according to models of growth and spread of Neolithic populations.
It is eventually hoped to investigate the power of new techniques of statistical analysis, employing spectral analysis of the matrices representing the data.

2) Similar situations, on the basis of other data available from the literature, are also being investigated by means of simulations of the population growth and spread, e.g. for the Bandkeramik populations in Central Europe. It is thus hoped to obtain, eventually, an explanation of the geographic distribution of genes in Europe, the Middle East and nearby areas, based on the hypothesis that the present distribution reflects predominantly a major radiation of a population of farmers which took place with the spread of agriculture from the Middle East, from 9000 to 5000 years ago.

3) The geographic distribution of genes, as observed today, is analyzed by means of gene frequency maps. We have developed many methods of interpolation of data for map construction, and many methods of graphical display of the maps obtained. We are currently comparing the methods of construction of maps. Some of the methods of construction are fairly sophisticated, but more work will be necessary to develop our programs further so that they can be considered to interpolate intelligently. Our tests of validity are based on eliminating each observation in turn, computing its expected value from the remaining observations, and comparing it with the observed one (a sort of jackknifing). It is clear that results could be improved if this procedure could be carried out simultaneously for several genes and alleles; at the moment it is done for one allele at a time. The simultaneous analysis is an ambitious program but would considerably improve present results. At the moment, for instance, we have no way to make gene frequencies of all alleles at a locus sum to 100% (except approximately, because we cannot consider more than one allele at a time).
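The validity test described above (drop each observation, predict it from the others, compare with the observed value) can be sketched as follows. This is only an illustration under stated assumptions: the report does not name its interpolation methods, so inverse-distance weighting is swapped in as a stand-in, and the function names `idw_estimate`, `leave_one_out_errors` and `normalize_alleles` are hypothetical. The last function shows one simple way to force independently interpolated allele frequencies at a locus to sum to 1, which is a plain rescaling and not the simultaneous analysis the text hopes for.

```python
import math

def idw_estimate(sites, x, y, power=2.0):
    """Estimate an allele frequency at (x, y) from (x, y, freq) samples.

    Inverse-distance weighting is an assumed stand-in for the project's
    unspecified interpolation methods.
    """
    num = 0.0
    den = 0.0
    for sx, sy, freq in sites:
        d = math.hypot(sx - x, sy - y)
        if d == 0.0:
            return freq  # exactly on a sampled site
        w = 1.0 / d ** power
        num += w * freq
        den += w
    return num / den

def leave_one_out_errors(sites, power=2.0):
    """Eliminate each observation in turn and predict it from the rest."""
    errors = []
    for i, (x, y, observed) in enumerate(sites):
        rest = sites[:i] + sites[i + 1:]
        expected = idw_estimate(rest, x, y, power)
        errors.append(observed - expected)
    return errors

def normalize_alleles(freqs):
    """Rescale per-allele estimates at one point so they sum to 1."""
    total = sum(freqs)
    return [f / total for f in freqs]

# Hypothetical sample: four sites on a unit square, one allele.
sites = [(0, 0, 0.1), (1, 0, 0.3), (0, 1, 0.3), (1, 1, 0.5)]
errors = leave_one_out_errors(sites)
rms_error = math.sqrt(sum(e * e for e in errors) / len(errors))
```

Comparing `rms_error` across interpolation methods gives a single figure for ranking the map construction methods against each other on the same data.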
In addition, other information on the populations (whether they are isolates, etc.) could be introduced, and verified by the program. Also, specific hypotheses on the evolutionary factors affecting the gene frequencies could be tested more directly. At the moment, the major limitation to these more sophisticated analyses is the availability of computer space.

6.3.2 BAYLOR-METHODIST CEREBROVASCULAR PROJECT

Baylor-Methodist Cerebrovascular Project

John L. Gedye, M.D.
Data Services Research Laboratory
Department of Neurology, Baylor College of Medicine

During the year the Data Services Research Laboratory has had a total of about 2,500 hours of man-effort available, of which about 5% has been devoted to activities directly related to the SUMEX pilot study.

I) Summary of research program

A) Technical goals

The general goal of the laboratory - the creation of a computer-based system for the support of clinical research in neurology, as described in the 1975-76 annual report - remains unchanged.

In spite of the limited manpower available during the year, good progress has been made toward the specific goal of developing the PDP11/35-based clinical research system "CLINSYS" to a point where it can begin to give real support to Departmental projects.

We have made good progress in recent weeks with the development of software which will allow easier access to the resources of SUMEX for users of our local system. It is now possible to give the command "SUMEX" to our local system executive and have the entire login procedure, through to receipt of the "final" SUMEX "@", carried out automatically. Control characters allow the user's terminal to be switched between SUMEX and the local system, and these have been chosen to be compatible with the BANANARD control characters, so that this can be operated without interference.
Facilities have been provided which allow ASCII files to be created on either system and transferred to the other. These facilities will operate under our local PDP11/35 batch system, and we have tested them by creating a test data file of about 1,000 ASCII characters on an account on the PDP11/35, and submitting a batch job (to run at a specified time) which logs into SUMEX, transfers the test data file, copies it back again onto the PDP11/35 account, and logs out. It then logs in again and repeats the whole process with the latest copy of the file. In this way we hope to estimate the reliability of this form of data transmission - at present it looks as if the error rate will be less than 1 in 16,000 characters - and to lay the foundations for a system that will allow us to make maximum use of SUMEX off-peak time in the projects described below.

B) Medical relevance and collaboration

The development of CLINSYS has continued on the general lines described in the 1975-76 annual report. Specific data acquisition procedures have been designed and implemented for:

clinical psychology - both conventional and automated testing techniques have been accommodated;

clinical physiology - facilities for the manual entry of Xe-133 inhalation regional cerebral blood flow measurements have been provided, and work is now in progress on a system for direct transmission of data to the PDP11/35 from the integral PDP11/05 which is part of the equipment; and

hematology - provision has been made for the acquisition of data from tests of platelet function.

Because of its central importance, a major emphasis has been placed on making provision for the acquisition of suitably summarised CT scan data, and a number of exploratory studies have been carried out, with the result that we hope to have the first edition of a "CT scan system" working in the near future.
This will have an important part to play in future projects.

No further progress has been made with the implementation of a work station incorporating the hand-held OCR wand developed by Recognition Equipment Incorporated - which was described in the 1975-76 report - but we intend to make use of such a "wand" work station in the context of a system for acquiring data from the radiologist's "CT scan report" as part of the "CT" record.

C) Progress summary

The aim of our "pilot study" remains unchanged - to formulate a project relevant to the activities of the Department which will provide an acceptable and legitimate "point of entry" for artificial intelligence research, and which will allow the systematic formulation of objectives for the future.

Work has continued along the lines discussed in the 1975-76 report, using, as test data, results from 69 demented patients and 15 controls who had had regional cerebral blood flow measurements. This work has led to a promising "AI" approach which is now being applied to CT scan data, and when the feasibility of this has been demonstrated the way will be open for work to go ahead on the implementation of a general purpose program.

D) Publications

There are as yet no publications dealing with the "pilot study" as such. Certain aspects of the work referred to in this report have been mentioned in publications but these are all currently "in press". Details are available on request.

E) Funding status

1) Current funding

The work is currently supported by a section of the 3-year grant for the Center for Cerebrovascular Research, but at the present time this is only approved up to January 31st, 1977.

2) Pending applications and renewals

Work is currently in progress on a grant application for submission by July 1st for support for the laboratory from April 1st, 1978.
This will concentrate on the use of CLINSYS to support the study of brain-behavior relations in demented patients using CT scan data and the results of automated behavioural assessment.

II) Interactions with the SUMEX-AIM resource

A) Little has so far been achieved by way of collaborations through the network, although the SNDMSG facility has been useful for keeping in touch with contacts made at the 1975 workshop. It is hoped, though, that in the future we may be able to test out the concept of a CT scan archive created by the joint efforts of a dispersed community of users.

B) For some reason I did not hear about the 1975 workshop until it was over, and so far have heard nothing about a 1977 one. I found the 1975 workshop very useful, and would strongly support the continuation of the workshops in some form - particularly if one could get down to fundamentals with people working on similar problems. I have kept in close contact with Paul Blackwell at Columbia, Missouri since the 1975 workshop, and we last met at an N.S.F. Conference on "MATHEMATICAL STRUCTURES IN THE HUMAN SCIENCES" at Penn State in March.

C) I have no criticisms of resource services beyond the usual one of slowness of response time at peak periods.

III) Follow on SUMEX grant (8/78 - 7/83)

A) The main long range user goal of relevance is the establishment of a demonstration CT scan reference archive using the resources of SUMEX. It is not clear just what resources this will need, but at the present time it looks as if the feasibility of the approach could be established with an allocation of 500 pages of storage, and possibly less.

B) The main justification for continued use of SUMEX is that it provides a unique opportunity to explore the possibility of setting up a dispersed CT scan research community with a reasonably high chance of being able to demonstrate something of potential clinical value in the relatively short term.

C) I would like to see attention given to the communications potential of the SUMEX resource. We have not been able to make full use of this in the last two years because of a lack of local resources, but now that we have our local system interfaced we are beginning to get a real feel for the potentialities. We have also found that visitors to our laboratory are very impressed with the ease of setting up the interface, and many - including computer company representatives - have confessed to being unaware of the possibilities provided by the existing technology. In particular we have found little experience of the use of autodiallers.

6.3.3 COMPUTER ANALYSIS OF CORONARY ARTERIOGRAMS

Computer Analysis of Coronary Arteriograms

Donald C. Harrison, M.D., Edwin L. Alderman, M.D., and Lynn Quam, Ph.D.
Division of Cardiology, Stanford University Medical School

The goal of this project is to develop computer techniques for automatic acquisition of the anatomic distribution of coronary arteries and a quantitation of the degree of narrowing of these vessels. In order to do this, two different types of image processing techniques will be developed. First, a three-dimensional representation of the coronary arterial tree will be automatically constructed from coronary arteriograms taken sequentially from several different views. Second, the amount of stenosis will be measured by combining information from multiple sequential frames in order to improve resolution and reduce radiographic noise.

BACKGROUND:

Coronary arteriography is the definitive test for the evaluation of patients with coronary artery disease.
There is no other test currently available which provides information concerning the location and severity of coronary narrowings and the distribution of coronary blood vessels in the myocardium. Numerous studies document that prognosis in patients with coronary disease reflects the severity of anatomic disease. Coronary vascular anatomy and the extent of lesions are, in an epidemiologic sense, more precise indicators of prognosis than are clinical symptoms. At the present time, categorization of the extent of coronary vascular disease is based somewhat simplistically on the number of major coronary vessels involved and a rough estimate of the percentage obstruction. Computer representation of the coronary tree, coupled with either interactive or automatic entry of degree of stenosis, will permit the development of more precise indices of anatomic disease of the myocardium. Computer image processing techniques offer the possibility of objectively measuring the severity of coronary stenosis, both at the point of maximal narrowing and averaged over a segment of the vessel.

APPROACH:

An extensive set of image processing functions has been developed and applied to detect the regions of the arteriograms which correspond to the arterial tree. These regions are then transformed to a "skeleton" which roughly corresponds to the midlines of the vessels in the arterial tree. This skeleton is then transformed to a graph representation which can be topologically and geometrically analyzed to distinguish vessel intersections (in the 2-d projection, not real 3-space intersections) from vessel bifurcations. The result is a graph structure interpretation of the arterial tree with quantitation of the locations (2-d) of bifurcations, and for each vessel segment the path of the vessel midline and the vessel diameter.
The computer algorithms are described in more detail in the following sections.

Data Acquisition: We have digitized a number of 35 mm cine frames from three subjects using both an Optronics film scanner and a Dicomed film digitizer operating at 25 and 50 micron pixel resolution. For each subject, frames are manually selected to provide good contrast in the proximal vessels from both LAO and RAO projections and to be approximately synchronized within the cardiac cycle.

Pre-processing: The digitized frames are computer enhanced using high frequency filtering to eliminate the x-ray exposure gradient and emphasize sharp edges, which tend to correspond to the vessels. High contrast areas in the enhanced frames are detected by a simple threshold region detector. Currently, many regions are detected which do not correspond to the arterial tree but are caused by background features such as vertebrae. We are in the process of digitizing another set of frames which have been chosen to include time synchronized pre-injection frames in order to permit background subtraction. The result of this step is a binary image corresponding to high density areas in the frame.

The root of the arterial tree is manually specified by the operator, and a connected point region grower finds all points connected to the root. This usually finds all medium and large sized vessels, and some smaller vessels. Unconnected background is totally eliminated. Sometimes, substantial pieces of the arterial tree are not connected to the root. When this occurs, the operator can run the region grower from new starting points. The result of this step is a binary image corresponding to most of the arterial tree.

We expect that by using background subtraction we can very reliably detect the arterial tree and eliminate most of the manual "hand-holding" in the previous steps.
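
The threshold region detector and the connected point region grower described under Pre-processing can be illustrated with a short sketch. This is only an illustration in Python, not the project's own program: the function names, the fixed threshold level, and the choice of 8-connectivity are assumptions made for the example.

```python
from collections import deque


def threshold(image, level):
    """Binary image: 1 where the enhanced density exceeds `level` (assumed fixed)."""
    return [[1 if value > level else 0 for value in row] for row in image]


def grow_region(binary, seeds):
    """Keep only the points of `binary` connected to one of the seed points.

    `seeds` stands in for the root (and any extra starting points) chosen by
    the operator. Neighbours are taken with 8-connectivity (an assumption).
    """
    rows, cols = len(binary), len(binary[0])
    grown = [[0] * cols for _ in range(rows)]
    queue = deque()
    for r, c in seeds:
        if binary[r][c] and not grown[r][c]:
            grown[r][c] = 1
            queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and binary[nr][nc] and not grown[nr][nc]):
                    grown[nr][nc] = 1
                    queue.append((nr, nc))
    return grown
```

Running the region grower a second time with an additional seed corresponds to the operator restarting it from a new starting point when a piece of the tree is not connected to the root.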
Arterial Tree Graph Formation: The binary image of the arterial tree is "skeletonized" by computing the distance transform of the image and connecting peaks and ridges in distance. The distance transform computes, for each point in the image, the Euclidean distance to the nearest zero (point not in region). Points at vessel midlines are easily detected because they are local maxima (ridges) in distance from their vessel walls.

The 2-dimensional array of ridge-peak information is next processed to form a graph structure describing the connectivity of vessel segments (distance ridges) to nodes (points where 3 or more ridges converge). The graph is simplified by detecting and eliminating insignificant terminal segments, which are usually the result of noise in the image.

We have now accomplished a significant simplification of the data from the original 2-dimensional array of x-ray density data to an essentially 1-dimensional description of the vessel midlines and points of bifurcation and intersection. This data (when vessel width is included) is sufficient to completely reconstruct the binary image of the arterial tree.

Topologic and Geometric Graph Analysis: The graph is next analyzed to determine the proximal-distal orientation of each vessel segment. Starting at the distal node of a vessel segment, all segments which are attached to that node must be within 90 degrees in pointing direction. Any segment violating this rule is identified as an intersection. Starting from the root of the arterial tree, all segments are classified by this procedure.

Nodes which have been identified as intersections are now analyzed in order to match distal segments with proximal segments according to a set of rules about arterial topology and geometry.
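
Two of the steps above lend themselves to a small sketch: the distance transform with ridge detection, and the 90 degree pointing-direction rule. The Python below is illustrative only and makes several assumptions not stated in the text: the brute-force search, the function names, the use of 8 neighbours for the local-maximum test, and the reading of the 90 degree rule as a non-negative dot product between the proximal segment's direction and a candidate segment's direction.

```python
import math


def distance_transform(binary):
    """Euclidean distance from each region point to the nearest zero point.

    Brute force over all background points; adequate only for small images.
    """
    rows, cols = len(binary), len(binary[0])
    zeros = [(r, c) for r in range(rows) for c in range(cols) if not binary[r][c]]
    if not zeros:
        raise ValueError("image has no background (zero) point")
    dist = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if binary[r][c]:
                dist[r][c] = min(math.hypot(r - zr, c - zc) for zr, zc in zeros)
    return dist


def ridge_points(dist):
    """Region points whose distance is not exceeded by any of their neighbours."""
    rows, cols = len(dist), len(dist[0])
    ridge = []
    for r in range(rows):
        for c in range(cols):
            if dist[r][c] <= 0:
                continue
            neighbours = [dist[nr][nc]
                          for nr in range(max(r - 1, 0), min(r + 2, rows))
                          for nc in range(max(c - 1, 0), min(c + 2, cols))
                          if (nr, nc) != (r, c)]
            if all(dist[r][c] >= n for n in neighbours):
                ridge.append((r, c))
    return ridge


def within_90_degrees(proximal_dir, candidate_dir):
    """True if the two 2-d direction vectors differ by at most 90 degrees."""
    dot = proximal_dir[0] * candidate_dir[0] + proximal_dir[1] * candidate_dir[1]
    return dot >= 0
```

For a straight band three points wide, the ridge is the centre row, and the distance value there is roughly the local half-width of the vessel. A segment for which `within_90_degrees` is false would be flagged as a suspected intersection rather than a bifurcation branch.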
Having resolved vessel intersections, we now transform the graph to a simple tree structure which corresponds topologically to the arterial tree.

Future Directions: The above computer algorithms have been successfully applied to the images in a few sets of digitized data. We plan to digitize frames prior to injection to enable background subtraction, which we believe will greatly improve the reliability and accuracy of the initial vessel detection. The algorithms have not yet been tried on cases with abnormal angiograms, and we expect that as more cases are incorporated into our image library, it will be necessary to develop more rules and analytical techniques in order to properly interpret the 2-dimensional images.

Based on the encouraging progress which has been made in processing coronary arteriograms and on other areas of expertise in image processing within the Stanford University Medical Center, we developed and submitted to the NHLBI on November 1, 1976 a new grant proposal titled "Computerized Medical Image Processing Laboratory". This proposal contains a detailed report of the progress that had been made up to that time and details the further steps which we propose to pursue.

USE OF SUMEX RESOURCE: Work on this project has been dependent on the SUMEX facility for several reasons. First, this project has not been funded to provide its own computer facilities. Second, although the Stanford Division of Cardiology does have minicomputer systems which could be used for this project, it is considerably easier to develop image processing and artificial intelligence techniques on a larger scale system in which many powerful tools already exist. It is important in the research phase of this project to be able to perform experiments easily and quickly, without the difficulties of fitting the experimental programs into the small computer memory environment.
We believe that our use of the SUMEX facility is completely within the guidelines for SUMEX use, since our primary purpose is to develop image analysis and understanding techniques for the quantitation of coronary artery disease. A secondary result of this research project is the development of general purpose image analysis and modelling algorithms.

SPONSORSHIP:
Granting agency: NIH
Grant: 5 R01 HL188790-02
Period of award: 06/01/76 - 05/31/78
Current annual funding: $20,807 + indirect costs

6.3.4 QUANTUM CHEMICAL INVESTIGATIONS

Theoretical Investigations of Heme Proteins and Opiate Narcotics

Dr. Gilda Loew
Department of Genetics
Stanford University
(Grant PCH 76 07324, 2 years, $20,500 this year)

SUMEX is used for the calculation of various one-electron electronic properties of iron containing compounds. The programs were formulated and written by David Steinberg, Michael Chadwick and David Lo. David Lo was responsible for converting the program for interactive use on the PDP system. Slight improvements were made by Robert Kirchner, and Sheldon Aronowitz has expanded the formulation to include additional spin and oxidation states of the iron atom.

The properties that are calculated include the electric field gradient at the iron nucleus, quadrupole splitting, isotropic and anisotropic hyperfine interaction, spin-orbit coupling and zero field splitting, g values and temperature dependent effective magnetic moments. The calculated values are compared directly to experimental results obtained from published Mossbauer resonance and electron spin resonance spectra. Such a comparison not only determines the reliability with which these properties can be calculated but also gives an indication of the ability of the model of the iron active site to mimic the actual environment found in a particular compound or iron containing protein.
The major input to these properties programs is a description of the electron distribution of the compound under consideration. This description is obtained using a semi-empirical molecular orbital method employing the iterative extended Huckel procedure. Such a calculation requires up to 660K core and is performed elsewhere. When the calculated electron distribution yields a set of calculated properties in agreement with observation, we have increased faith in the description of the model of the active site and can carry the model one step further to make qualitative inferences about certain properties relevant to the biological functioning of the compound.

We are currently performing a systematic study of heme proteins. The electromagnetic properties of these proteins and of synthesized model compounds which mimic the observed behavior of the proteins have been well studied experimentally. Specifically, we have addressed the following problems:

(1) Cooperativity of oxygen binding to hemoglobin. Calculations have been made for high and low affinity forms of deoxyhemoglobin. This work has been submitted to Nature (Loew and Kirchner).

(2) The nature of oxygen binding to the heme unit. Calculations were made of model oxyheme compounds with varying oxygen geometry and electron configuration. This work is now in press in the Journal of the American Chemical Society (Kirchner and Loew).

(3) The enzymatic cycle of an oxidative metabolizing heme enzyme called cytochrome P-450. This enzyme is responsible for drug metabolism and toxicity and for activation of many chemical carcinogens. Preliminary characterization of the enzymatically active state has been made. This work is in press in the Journal of the American Chemical Society (Loew, Kert, Hjelmeland and Kirchner).
In a completely different context, we have been using SUMEX to calculate the conformation of pentapeptides (enkephalins) which have recently been found to be endogenous opiates. The aim of this study is to determine in what way, if any, they can mimic the structure of prototype opiates such as morphine and meperidine. For this work, we use a protein conformation program with empirical interaction potentials. Quantum mechanical conformational calculations of the same peptides are being performed by us elsewhere, and the results of the two methods are being compared.

6.4 PILOT AIM PROJECTS

The following are descriptions of the informal pilot projects currently using the AIM portion of the SUMEX-AIM resource pending funding, and full review and authorization.

6.4.1 COMMUNICATION ENHANCEMENT PROJECT

Communication Enhancement Project

John B. Eulenberg, Ph.D. and Carl V. Page, Ph.D.
Department of Computer Science
Michigan State University

I) Summary of research program.

A) Technical goals.

The major goal of this research is the design of intelligent speech prostheses for persons who experience severe communication handicaps. Essential subgoals are:

(1) Design of input devices for persons with greatly restricted movement.
(2) Development of software for text-to-speech translation.
(3) Research in knowledge representations for syntax and semantics of spoken English in restricted real world domains.
(4) Development of micro-computer based portable speech prostheses.

B) Medical Relevance and Collaboration.

We have exchanged visits and had many conversations with Dr. Kenneth Colby of UCLA, who is working on similar problems for a domain of people who have aphasia.

The need for such technology in the medical area is very great.
Millions of people around the world lead isolated existences, unable to communicate because of stroke, traumatic brain injury, cerebral palsy, and other causes. The emergence of inexpensive micro-processors and sound synthesizers makes it possible to develop devices now that can be the prototypes for widespread use.

We have organized institutes to bring together the many professionals who have an interest in this area. Together with the Tufts New England Medical Center, the TRACE Center of the U. of Wisconsin, and the Children's Hospital at Stanford, we have begun the first newsletter for dissemination in this area. Dr. John B. Eulenberg helped to organize the first Federal workshop for governmental agencies who have some interest in funding work in these areas. Represented were the Bureau of Education for the Handicapped, the Veterans Administration, NIMH, NINCDS, NSF, and others. We have also been in touch with United Cerebral Palsy associations at the state and national levels. There is much interest in this area from medical, educational, and governmental communities, but no traditional means of supporting it.

C) Progress summary.

Although some facets of the research have been underway at MSU for several years, we have been using SUMEX-AIM for only six weeks at this time, having received our password in March, 1977.

During the last six weeks, we have:

1) Designed and built hardware and software allowing us to transmit files to SUMEX from our Nova 2/10 at 300 baud.

2) Organized a research team of 4 students possessing background in artificial intelligence, led by Dr. Carl V. Page, to develop a semantics-based speech generator. We expect to have a prototype running in June (written in SAIL). To this end we are concentrating on semantics associated with personal needs, small talk (weather, etc.), and perhaps obtaining geographic directions.
3) Begun conversion of ORTHOPHONE, MSU's large English text-to-speech program, from its CDC 6500 Fortran implementation to a SAIL version.

4) Obtained temporary local support for terminals and tie-lines to use the SUMEX-AIM facility. We requested these in our original proposal but were not granted them. We have to share our tie-lines and terminals with others. At present the lack of a dedicated tie-line from East Lansing to Tymshare in Ann Arbor or Detroit is a problem for us during 0600 to 0900 PST.

During the past few months, Dr. Richard Reid of our project has:

5) Developed a personal communication system for a 10-year-old person who has cerebral palsy. It is micro-computer-based and can accept inputs via an adaptive switch from a series of menus displayed on a TV screen, via Morse code, or by a keyboard. Its outputs can be TV display, hard copy, Morse code, spoken English, or musical sounds. We expect to use knowledge gained from the SUMEX-AIM semantics project to specify the content and connection of the choice menus for this project.

During the past three months, we have:

6) Begun to experiment with the interaction of knowledge sources (letter and word frequencies, syntactics, semantics and pragmatics) as a means of anticipating likely inputs and displaying them for a person to choose from.

7) Built and tested a myoelectric interface and used it (together with a miniature FM transmitter) for input of changing muscle potentials into a computer. There is reason to believe that this means of input may provide a higher bit rate than any other known means for those people who experience severe motoric problems due to cerebral palsy.

D) Up-to-date list of publications (1974 to date).

For John B. Eulenberg:

"Technical Systems Development, Headend", Interim Report, April, 1975, Experimental Applications of Two-way Cable Delivery, NSF Grant No.
APR 75-142586.

"Interactive New Hired Information Access System with Both Voice and Hard Copy Output: User's Guide to NHQUZRRY", April 11, 1976, with Steven Kludt and Jerome Jackson (Artificial Language Laboratory Report AEB 041176).

"Language Individualization in a Computer-Based Speech Prosthesis System", National Computer Conference, New York, June 9, 1976.

"Individualization in a Speech Prosthesis System", Proceedings of 1976 Conference on Systems and Devices for the Disabled, June 19, 1976.

"The LEAF Language", Interim Report, September, 1975, NSF Grant No. APR 75-14286.

"A Programmable Multi-Channel Modem Output Switch", September 22, 1976, with Joseph C. Gehman and Juha Koljonen (Artificial Language Laboratory Report AEB 092275).

"SMPTE Time Code Interface and Computer-Controlled Video Switcher", with Michael Gorbutt and Dennis Phillips, Interim Report, March, 1977, NSF Grant APR 75-14285.

For Carl V. Page:

"Heuristics for Signature Table Analysis as a Pattern Recognition Technique", IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-7, No. 2, February 1977.

"Discriminant Grammars, an Alternative to Parsing", with Alan Filipski, Proceedings of the IEE Workshop on Picture Processing, Computer Graphics, and Pattern Recognition, April 22, 1977.

"Pattern Recognition and Data structures", chapter in "Data Structures in Computer Graphics and Pattern Recognition", edited by Allen Klinger, Academic Press, 1977.

During 1976 Dr. Eulenberg presented 15 lectures around the country on his research, was interviewed for TV eight times and was on radio five times.

E) Funding Status.

1) Current funding.

Wayne County (Detroit) Intermediate School District. $230,000 (second year).
Jackson County Intermediate School District. $21,500 (second year).

Both of these are on a one year at a time basis. Some of this money is being used to purchase equipment which is the property of WCISD or JCISD for use in demonstration classrooms in the schools. Very little of it can be used to support the research goals which we have communicated to SUMEX-AIM because of other commitments in the grant. However, the special communication devices, students, and other research facilities provide the critical mass which will allow us to do the work that we have proposed.

2) Pending applications and renewals.

State of Michigan Vocational Rehabilitation Services. $30,000 (application).
United Cerebral Palsy Association of Michigan. $50,900 (application).
United Cerebral Palsy Association (National). $60,000 (for study of control by myoelectric inputs) (application).
Oakland County Intermediate School District. $200,000 (application).
Genesee County Intermediate School District. $200,000 (being written).

As one can see from this list of sources, there is a lot of interest in this area from agencies which are not experienced in funding high-technology and research, since a mandatory special education act has become law in Michigan.

II) Interactions with the SUMEX-AIM resource.

Again we point out that we have been a part of this community for only about 6 weeks and we will have more to say next year.

A) Examples of medical collaboration and medical use of programs via SUMEX.

The faculty in the MSU College of Human Medicine who teach medical decision making were shown a demonstration of the SUMEX system, MYCIN and PARRY. We plan to present a demonstration to advanced medical students and faculty at the Medical School in the near future. A member of our Medical School faculty, Dr. Richard Ropple, an expert on myoelectronics, is a member of our research group. The Dean of our College of Human Medicine visited our laboratory in April, 1977 and we expect encouragement and collaboration.
B) Examples of sharing, contacts, and cross-fertilization with other SUMEX-AIM projects.

1. We have met with Dr. Kenneth Colby on many occasions, including the SUMEX-AIM workshop in June, 1976. Our work in many ways complements his and we have had several worthwhile interchanges of information. We are converting our major software programs for speech generation and adaptive inputs to the SUMEX-AIM system in part so that they can be used by Dr. Colby and his group.

2. Mr. Douglas Appelt, a doctoral student at SU-AI, was our principal systems programmer last summer. He is currently doing research in the same area as ours with Dr. Gary Hendrix of SRI. We have used his knowledge of your system (via the message sending routines) to assist us in starting our project. Mr. Appelt will be working with us at MSU again this summer (June-Sept., 1977), and he will be using the SUMEX-AIM system.

C) Critique of resource services.

We have found the HELP files to be a lot of help. We are beginning to understand our own needs and your services to the extent that it may be helpful to meet with one of your staff. Dr. Eulenberg will be in California in early June and plans to visit your facility. However, we have found that your system is easy to use and do not feel more distant from you than from other computer installations on our own campus.

III) Follow-on SUMEX grant period (8/78-7/83).

A) Long-range user project goals and plans.

We want to do fundamental research in artificial intelligence in the context of the generation of speech from very minimal amounts of input. This problem seems closely related to the understanding of speech. It seems that the methods of representation of knowledge used for speech or vision understanding can be used in a natural way for fluent generation of speech. Our area seems almost unique in AI in that it is socially desirable (without question).
Even relatively primitive systems can improve the quality of life for hundreds of thousands of people.

Major long range goals are:

1) To do research in transposing the vocal tract to another region of the body in which an individual has suitable myoelectric control for the generation of speech.

2) To define a suitable system of semantics and to encode world knowledge in that system that would be useful for the fluent generation of speech.

3) To discover primitive operations on semantics which allow new and appropriate combinations of speech to be generated (using other sources of knowledge).

4) To develop means for individuals who are physically unable to use standard input devices to program and personalize their own speech and environmental control system.

5) To study means of using speech output to aid blind persons, both through experiments with simplified text to speech devices and through means of training blind persons to write in cursive and manuscript.

6) To study the educational consequences of communication aid systems for individuals who, because of previous misdiagnoses as mentally impaired, have been excluded from the mainstream education system.

7) To improve the prosodic qualities of generated speech, using its semantic aspects.

8) To design portable speech prostheses which allow maximum use of state of the art knowledge in speech generation.

9) To develop an experimental base for studying how the concepts which are articulated in speech are manipulated by individuals at differing states of mental organization.

10) To study the potential for speech generation systems as a means of stimulating autistic children.

11) To develop voice recognition systems which will aid individuals with limited speech to develop their full potential.

(We don't expect to finish all of these by 1983.)

B) Justification for continued use of SUMEX by your project.
1) We need to use many sources of knowledge represented in computers to do our work, similar to many SUMEX users.

2) We know kindred spirits in the AI community for many of our long range goals.

3) We have substantial hardware and software expertise which we are willing to share.

4) We are making a substantial effort in the practical application of such research and would expect to benefit society.

5) This area does not have a traditional means of support for research separate from development, which makes your support vital at this time.

6) Our area is very interdisciplinary and the communication aspects of SUMEX-AIM will be increasingly valuable to us.

C) Comments and suggestions for future resource goals, etc.

In view of the fact that we are new members of the community, we do not have any special suggestions for new resource goals at this time.

6.4.2 AI IN PSYCHOPHARMACOLOGY

Artificial Intelligence in Psychopharmacology

Jon F. Heiser, M.D.
Dept. of Psychiatry and Human Behavior
University of California at Irvine

I. Summary Research Program

A. Technical Goals

1. We propose to construct a computer based system embodying some of the knowledge of an expert in clinical psychopharmacology. Such a system could greatly assist physicians and students who are not specialists in the chemotherapy of mental disorders in choosing the best psychopharmacological treatment for patients for whom such treatment is indicated. The system could also serve as a teaching source of psychodiagnostic and psychopharmacological knowledge.

The specific aims of this project are:

o To develop a set of MYCIN type rules which are a model of expert clinical teaching, consulting and decision-making for clinical psychopharmacology.
o To implement this set of rules in the MYCIN system, and
o To evaluate the performance of the resulting system as a teaching and consulting aid.

No system currently available or under development approaches the goals of the project in the field of clinical psychopharmacology.

It is anticipated that the research will fall into two distinct phases, each of approximately 18 months duration. The first and current phase involves evaluating the relevance of the structure of the MYCIN system for use in clinical psychopharmacology by replacing the current infectious disease diagnosis and therapy rules and parameters with psychopharmacology rules and parameters. The second phase will involve accumulating a large body of rules, entering them into the MYCIN system, and evaluating their performance. Toward the end of this phase, the behavior of the system will be compared with the behavior of recognized experts working on the Adult Inpatient Psychiatric Service of the UCI Medical Center. This evaluation will focus on the adequacy of the system for representing the knowledge of a skilled psychopharmacologist rather than on actual system performance in the clinical framework.

B. Medical Relevance and Collaboration

1. Medical Relevance

a. For many years it has been well recognized that potent, effective psychopharmacological agents are frequently used in an unsystematic and irrational manner. The most prescribed medication in the United States today is diazepam (Valium), a minor tranquilizer. The six most prescribed medications are all psychoactive agents. In California, instances of repetitive use of psychotropic drugs have been reported by 70% of a random sample of adults. About 30% of the sample had used psychotropic drugs in the preceding twelve months. Another study showed that 20% of a medical population was taking psychoactive agents at any given time.
These figures do not include alcoholic beverages or non-prescription and illicit drugs with psychoactive properties. Many persons are advised to ingest a daily pharmacologic stew consisting of one or more neuroleptic agents, an antidepressant, an anti-parkinsonian agent, one or more tranquilizers, a hypnotic and possibly a psychostimulant. These regimens are often complicated by non-prescription remedies, alcoholic beverages and illicit drugs. The inevitable drug-drug interactions affect absorption, distribution, binding, metabolism and excretion of many drugs.

Each year Americans spend over $700,000,000 for psychotropic drugs. In a recent year $150,000,000 was spent on the anti-anxiety agent chlordiazepoxide (Librium). Between 20 and 25 million prescriptions are written each year for diazepam. It is estimated that 170,000,000 prescriptions for psychotropic drugs were written in 1967, and that 202,000,000 prescriptions were written in 1970, more than one for every person in the United States. About 17% of all prescriptions written are for psychoactive drugs. If we include medications in which a psychotropic drug is combined with an antispasmodic, vasodilator, or other agent, probably 25% of all prescriptions contain psychotropic drugs. The vast majority of these prescriptions are written by physicians who are not psychiatrists.

Many physicians, including psychiatrists, who are practicing today completed their formal medical training prior to the 1950's, when modern psychopharmacological agents first became available. Their training typically included no instruction in modern clinical psychopharmacology. Even physicians trained since the mid-1950's cannot be expected to keep abreast of the expanding and changing field of psychopharmacology. The principles and practices recommended a few years ago are rapidly becoming obsolete.
A recent study showed that the general knowledge of the pharmacology, physiology, and side effects of psychoactive medications was low in both psychiatrists and non-psychiatrists: less than 29% of the physician subjects were able to devise a psychopharmacologically rational dosage schedule for benzodiazepines. Fifty percent of the non-psychiatrist medical staff felt that doses up to one gram per day of a tricyclic antidepressant, more than three times the recommended maximum and a potentially fatal amount, might be prescribed for depressive symptoms.

d. We estimate that there are at least 25 discrete syndromes currently identified in clinical psychiatry, each of which has a unique hierarchy of pharmacological treatment. Each treatment in each section has its own set of potential side effects, adverse reactions and drug-drug, drug-host, drug-age and drug-state of health interactions. In addition, for each therapeutic regimen in each hierarchy, there are several classes of drugs which typically consist of more than one agent or combination of agents which are potentially beneficial and which can be preferentially ranked dependent on several other factors in the clinical situation.

2. Medical Collaboration

1. The principal investigator, Jon F. Heiser, M.D., is a physician who is board certified in psychiatry and in full time teaching, research and University service.

2. Three medical students have participated in this project to date: Clifford Risk, Dana W. Ludwig, and Sue A. Clear.

3. Two resident physicians have participated in this project: Bronco R. Radisavljevic, M.D., and Steven J. Smith, M.D.

4. A Doctor of Pharmacy participates in the research: Pierre J. Menard, Pharm. D.

C. Progress Summary

Our initial endeavors to extend a MYCIN-like system to clinical psychopharmacology have been successful.
None of the envisioned or predicted problems related to the vagaries of psychiatric terninology and diagnosis have materialized. The system apoears to work promptly and accurately to diagnose psychiatric conditions, and in a very rudimentary form, to suggest appropriate pharmacological treatment. The current system consists of about 20 clinical parameters and about 50 decision rules aimed chiefly at a small number of diagnosed psychiatric conditions. J. Lederberg section 6.4.2 AI IN PSYCHOPHARMACOLOGY The following is an example of a sanple consultation but realistic case: sample Consultation with HEADMED, a Psychopoharmacology Advisory Systea Patient History A 37 year old man was brought into the emergency room by his distraught family members. They described how the patient, a well-liked, successful and cheerful but reserved history professor, had become increasingly outspoken, short-tempered, and inattentive to accepted norms of social behavior over the preceding three weeks. An unprovoked argument and physical assault upon a long-time friend motivated the family to seek professional nelp. Results of a physical exam and routine laboratory studies were within normal limits. Consultation with the psychiatric service was requested. A first year psychiatric resident re-interviewed the patient and his family, and completed a formal mental status exam prior to obtaining the following consultation from the system. {Questions are generated by the system and appear in mixed upper and lower case, user responses are in boldface capitals and follow a double asterisk. Added commentary is indented in italics.] HHA RARE REAR Special options (type for help): #* AFB Hello, Dr. Bowen Do you wish to run the consultation progran # EYES 29-OCT-76 00:47:26 wee enn PATIENT~1--~------ 1) PATIENT’S NAME [The consultation begins with a few standard questions. 
]
** JOHN DOE
2) PATIENT’S AGE IN YEARS
** 37
3) SEX:
** MALE
4) IS John Doe ORIENTED
** N
[Since the patient is disoriented, the system begins to probe a little deeper.]
5) IS John Doe DISORIENTED-TO-PERSON
** Y
6) IS John Doe DISORIENTED-TO-PLACE
** N
7) IS John Doe DISORIENTED-TO-TIME
** Y
8) PLEASE TYPE IN THE ONE WORD WHICH BEST DESCRIBES THE THOUGHT PATTERNS OR THOUGHT DISORDER OF John Doe
** CONFUSED
CONFUSED not a recognized response
What is THE THOUGHT PATTERN OR THOUGHT DISORDER OF John Doe
Expected responses are: NORMAL PROLIXIC TANGENTIAL CONFABULATION ETC.
Enter HELP for list of user options.
[The initial response was incorrect, so the user was reminded of some of the correct responses. He then requests a display of the entire list. Note that both of these capabilities are part of the core of MYCIN itself. Applying it to this new domain required no more than supplying the relevant list of acceptable answers.]
** ?
Expected responses are: NORMAL PROLIXIC TANGENTIAL CONFABULATION RETARDED LOOSE CONCRETE AUTISTIC DELUSIONAL WORD-SALAD MANICY MUTE
Enter HELP for list of user options.
** TANGENTIAL
9) Has any biological relative of John Doe been evaluated or treated for a mental or psychiatric problem
** N
10) What is the single word which best describes John Doe’s behavior
** ?
What is the best single word description of John Doe’s behavior
Expected responses are: NORMAL AVERAGE UNREMARKABLE CATATONIC-EXCITEMENT etc.
Enter HELP for list of user options.
** HYPERACTIVE
11) What is the single word which best describes John Doe’s mood
** LABILE
12) WHAT IS THE SINGLE WORD WHICH BEST DESCRIBES THE EMOTIONS OR FEELINGS OF John Doe
** ELATED
13) Does John Doe tend to feel worse or be more symptomatic for the first several hours after awakening for the day
** WHY
[i.e.
WHY is it important to determine whether or not PATIENT-1 tends to feel worse or be more symptomatic for the first several hours after awakening for the day]
[4.0] This will aid in determining THE DIAGNOSIS OF PATIENT-1.
if
[4.1] John Doe tends to feel worse or be more symptomatic for the first several hours after awakening for the day
then
there is weakly suggestive evidence (.45) that THE DIAGNOSIS OF John Doe is depression
[RULE017]
[All the standard explanatory capabilities of the core system extend to the new domain without additional programming effort.]
[back to question 13...]
** N
14) Does John Doe have early morning insomnia
** N
15) Please specify the number of pounds that John Doe lost or gained in the 2 months prior to seeking medical or psychiatric help. Estimate if necessary. Precede weight LOSS with a MINUS sign (for example: -8 = a weight loss of (approximately) 8 pounds within the 2 month period prior to seeking professional help)
EHS
16) Has psychological testing been ordered on John Doe
** N
17) ARE THE PERCEPTIONS OF John Doe NORMAL, i.e., NEITHER DISTORTED NOR DISORDERED
** N
18) DOES John Doe HAVE DEPERSONALIZATION
** N
19) DOES John Doe HAVE EXPERIENCES OF ALIENATION and/or EXPERIENCES OF INFLUENCE
** N
20) DOES John Doe HAVE INCOMPLETE AUDITORY HALLUCINATIONS and/or OTHER HALLUCINATIONS
** Y
21) DOES John Doe HAVE COMPLETE AUDITORY HALLUCINATIONS and/or THOUGHT BROADCASTING
** N
22) What is John Doe’s level of consciousness
** ?
What is THE LEVEL OF CONSCIOUSNESS OF John Doe
Expected responses are: HYPERALERT VIGILANT ALERT NORMAL etc.
Enter HELP for list of user options.
** VIGILANT
23) Is this the first episode of significant mental disturbance that John Doe has had
** WHY
[i.e.
WHY is it important to determine whether or not this is PATIENT-1’s first episode of significant mental disturbance]
[8.0] This will aid in determining THE TREATMENT NAME OF PATIENT-1.
if
[8.1] THE DIAGNOSIS OF John Doe is mania, and
[8.2] this is not John Doe’s first episode of significant mental disturbance
then
there is strongly suggestive evidence (.9) that THE TREATMENT NAME OF John Doe is lithium
[RULE071]
[back to question 23...]
[The system has finished with the diagnostic phase, and asks a question to help determine the appropriate therapy.]
** Y
[And finally the system presents its diagnosis and therapy recommendations. The numbers are used to give relative weightings to the conclusions: negative numbers indicate that the evidence tends to rule out that possibility.]
The diagnosis for patient 1 is:
ORGANIC-BRAIN-SYNDROME (.84)
DEPRESSION (.37)
MANIA (.35)
SCHIZOPHRENIA (.2)
PERSONALITY-DISORDER (-.4)
NEUROSIS (-.4)
So the treatment should be
EVALUATION (.84)
ANTIDEPRESSANT (.3)
NEUROLEPTIC (.28)

Recently, work has been directed toward expanding the system and revising the representation of psychiatric diagnosis and treatment recommendation. We have also begun development of a small system to score the Minnesota Multiphasic Personality Inventory (MMPI) psychological test using empirically well established rules easily coded into the MYCIN system.

D. Up to date list of publications

No reports of this work have been published to date.

Heiser, J.F. Computer-Aided Diagnosis of Psychiatric Patients. Presented to the Research Meeting, School of Engineering, University of California, Irvine, 7 October 1976.

Brooks, R. E. and Heiser, J.F. An Application of Artificial Intelligence to Psychiatry. Presented to: (a) Indian Institute of Technology, Madras, India, 28 September 1976, and (b) Madras Christian College, Madras, India, 3 October 1976.

Heiser, J.F.
and Brooks, R. E. Artificial Intelligence in Psychopharmacology. Accepted for presentation at the VI World Congress of Psychiatry, Honolulu, Hawaii, 23 August - 3 October 1976.

E. Funding Status

1. Current Funding

a. Personnel

i. The principal investigator, Jon F. Heiser, M.D., co-investigator Ruven E. Brooks, Ph.D., and Pharmacist, Pierre J. Menard, Pharm. D., are full time employees of the University of California, Irvine.

ii. Resident physicians are employees of the University of California Irvine and the Long Beach Veterans Administration Hospital and have worked on this project during elective periods of their psychiatric residency for which they received training credit.

iii. Medical students working on this project either participated for academic credit during elective periods or were supported by National Institute of Mental Health fellowships for medical student research.

iv. Two undergraduate students (Thomas E. Holthus and Darryl Hansen) are also working on this project for academic credit during elective periods.

v. Additional supporting staff such as secretaries are supplied by the University of California Irvine.

Office space, supplies and equipment, including several data terminals with acoustic couplers, are supplied by the University of California Irvine.

No other sources of funds are currently being used.

2. Pending applications and renewals

a. A joint grant application (in collaboration with Dr. Bruce Buchanan, Stanford University) has been submitted to the Department of Health, Education, and Welfare; Public Health Service. The University of California, Irvine (Dr. Jon Heiser, Principal Investigator) part of the application requests a total budget of $147,655 over three years to begin July 1, 1977, with $46,423 requested for the first year.

Additional financial support for undergraduate student Thomas E.
Holthus has been requested through funds allocated to University of California Irvine by the National Science Foundation (NSF) to assist in the development of new research workers.

II. Interactions with the SUMEX-AIM resource

A. Examples of collaboration and medical use of programs via SUMEX

1. As explained fully in the attached research grant application, the MYCIN group has been working informally with Dr. Heiser on the development of a knowledge base of decision criteria for psychopharmacology over the past two years.

B. Examples of sharing, contacts, and cross-fertilization with other SUMEX-AIM projects (via workshops, system facilities, personal contacts, etc.)

1. Dr. Heiser’s introduction to the SUMEX-AIM project occurred at the first AIM workshop, held at Rutgers in June 1975.

2. Although Dr. Heiser had previously heard of the MYCIN project, his official collaboration with MYCIN resulted from discussions originating at the first AIM Workshop.

3. A collaborative experiment with Kenneth Mark Colby, M.D., and members of the PARRY project was developed, implemented and analyzed completely on SUMEX-AIM. Enclosed is a rough draft of a paper reporting this "Turing Test", which was performed on-line on SUMEX, with the psychiatrist-judges located at Irvine, the patient-person at UCLA and PARRY at SUMEX.

4. Much technical support has been received freely and continuously from the SUMEX staff and members of the MYCIN team, including basic instruction in the use of SUMEX, TENEX, and MYCIN, principles of knowledge representation in MYCIN, and on-going consultation for details of implementing HEADMED in MYCIN. Much information has been obtained during three visits to SUMEX and MYCIN, but daily work in this project would be impossible without the ability to converse via links, messages, and telephone conversations with members of the SUMEX and MYCIN staffs.

C.
Critique of User Services

It is difficult for naive users to acquire the necessary knowledge and skills to function effectively in SUMEX without making a site visit to SUMEX.

III. Follow-on SUMEX grant period (8/78 - 7/83)

A. Long range user project goals and plans

It will probably take at least five years to achieve the aims mentioned in "II.A." above. If limiting conditions in the application of MYCIN to the domain of clinical psychopharmacology are encountered, alternative systems for achieving the same goals may be developed. If progress is straightforward and completed in less than five years, attempts will be made to enrich the system with deeper models of the nature of psychiatric disorders and the action of psychopharmacological substances. Also, if the system works well using UCI psychiatrists and patients, consultation with a panel of national experts will be developed to increase the generality and power of the rule base.

B. Justification for continued use of SUMEX by your project

We believe that this collaboration between the Stanford University MYCIN group, the University of Arizona infectious disease group, and the University of California at Irvine psychopharmacology group offers a unique opportunity to study the decision-making process in two domains: the chemotherapy of infectious diseases and the chemotherapy of psychiatric disorders. We believe that this methodological approach will markedly increase the potential range of applicability of our work.

C. Comments and Suggestions for Future Resource Goals and Development Effort

For those not geographically located at the SUMEX site, a stress on additional aid to users in the form of increasing the staff of user consultants, documentation writers, etc., would be preferable to an exclusive stress on additional hardware acquisition.
6.4.3 ORGAN CULTURE PROJECT

Application of Computer Science to Organ Culture

Professor Robert K. Lindsay and Dr. Maija Kibens
The University of Michigan, Ann Arbor

I) Summary of research program

The goal of this research project is to develop new methods for the design and analysis of organ culture experiments, using techniques of artificial intelligence.

The cultivation of organ fragments is an important method for the study of disease processes. In contrast to cell culture, organ culture is designed to inhibit outgrowth of cells and to deal with normal tissue relationships as they exist in the body, divorced from the complexities of organ interaction. The technique involves the maintenance of differentiated cells as a group within their normally associated tissues. With an ability to maintain differentiated tissues in culture, a direct histologic and biochemical assessment of factors influencing an organ is possible. Such a biologic model would permit investigation of the structural and functional effects of various substances directly on the target organ. With a chemically defined medium, the technique would allow a simultaneous evaluation of metabolites or hormones released by the organ fragments.

The research is being done in collaboration with Professors Raymond Kahn, Theodore Fischer, and William Burkel of the Department of Anatomy, the University of Michigan Medical School.

We have been working on methods of image analysis of microscope slides. This has been approached from two directions. On the one hand, we are writing programs for special image analysis hardware. These programs will calculate various indices of the condition of the cultivated organ fragments based upon measured morphological features.
The second approach is to translate the biologist’s verbal descriptions of microscope slides into computer data structures which encode conditions not detectable by our image analysis programs, though readily seen and reported by trained human observers. We have developed a dictionary of anatomical terms and programs for morphological analysis. At present we are working on the syntactic analysis of the scientist’s verbal descriptions.

A grant application titled "Application of Computer Science to Organ Culture" has been written and will be submitted to the National Institutes of Health on June 1, 1977. Current support is from the University of Michigan with computer services supplied by SUMEX-AIM.

II) Interactions with the SUMEX-AIM resource

We have had valuable contacts with members of the DENDRAL project and the MOLGEN project, which share certain goals and methods with our own work. The resource services received from SUMEX-AIM continue to be excellent. The staff is very helpful, and the system is well-maintained and reliable. The only serious difficulties which arise are due to system saturation and limited file space.

III) Follow-on SUMEX grant period

Our proposal, if funded, would commit us to expanding our efforts to develop a histology knowledge base and methods to rationalize the design of organ culture experiments. This would involve heavier use of the SUMEX-AIM resource by a larger group. Our work to date, though of limited scope, is encouraging. The work is dependent upon continued availability of the SUMEX-AIM system, which we would like to see expanded not only to provide more services for present projects, but to include a wider range of relevant bio-medical and artificial intelligence research. The commonality of resource and the opportunities for communication which SUMEX-AIM provides are extremely valuable in our view.
Given the community of resource consumers attracted by SUMEX-AIM, we think it would be an excellent focus for the encouragement of new techniques, new ideas in programming languages, and increased variety of input and output media.

6.4.4 NEUROPROSTHESES PROJECT

Neuroprostheses Project

M. G. Mladejovsky, Ph.D., Director
Division of Artificial Organs
University of Utah Medical Center
Salt Lake City, Utah 84112

I. Research Summary

Our research involves the investigation of artificial vision by electrical stimulation of visual cortex and artificial hearing by electrical stimulation of the cochlea. This effort has involved the collaboration of several people from many disciplines, not only from the University of Utah, but also from the Ear Research Institute, Los Angeles; University of Western Ontario, London, Ontario; and Columbia University, New York.

The instrumentation involved is controlled by a minicomputer system consisting of a PDP-8 and a PDP-11/05. Experimental protocols are implemented by programs running in the PDP-11. We sought access to SUMEX in order to use the BLISS-11 compiler, which runs on the PDP-10. We are using BLISS-11 as the implementation language for an interactive programming system which will enable more flexible control and variation of our experiments.

The base language we are using is BALM (Malcolm Harrison, "BALM Programmer’s Manual", Courant Institute, NYU, 1974). This language is defined in terms of an abstract machine called the MBALM machine. The plan of attack is as follows:

1) implement the MBALM machine in BLISS-11
2) bring up BALM, using a dummy garbage collector and no virtual memory
3) implement garbage collection and virtual memory
4) add floating point operations
5) add a graphics package
6) add real-time capabilities
7) provide an interface to PDP-11 machine language

The project has progressed to the point that step 2 is almost complete.
This has involved installing a new version of BLISS-11 at SUMEX, writing software to allow file transfers between SUMEX and our PDP-11 (which is connected to the Utah-TIP as a terminal), writing MBALM and various support routines in BLISS-11, implementing an I/O package for BALM in assembly language, and performing a bootstrapping process with the BALM self-definition. Our schedule calls for completing steps 3, 4, and 5 by 1 July 1977. Steps 6 and 7 have not been planned in detail at this time.

We are planning to run the resulting programming system on our PDP-11/05 with 28K core and a GT-40 graphics system, running the RT-11 operating system. Modifying the system to run under a different operating system should be straightforward. However, whether the system will run efficiently on a machine with less than 20K core is questionable. It is too early now to say.

There have been no new publications by our group since our application was filed last year. Currently several papers are in progress but have not yet been submitted for publication. A partial list of previous publications is attached. When the BALM system has reached a stable state, we will be happy to provide documentation and sources for it to anyone who requests them.

The support for our human experiments is provided by a grant from the Max C. Fleischmann Foundation. This grant expires 30 June 1977, and a renewal proposal is now being prepared.

II. Interactions with SUMEX

We have been perfectly satisfied with our use of SUMEX. By far our greatest use of the system has been of text editors and the BLISS-11 compiler. We have also become acquainted through SUMEX with the OMNIGRAPH graphics package available from NIH and have obtained a copy of the OMNIGRAPH manual. We have not used OMNIGRAPH yet but may wish to in the future.
We are considering the features of OMNIGRAPH in the design of the graphics package for our interactive system. We are quite interested in using the MAINSAIL system being developed at SUMEX and have been told that RT-11 is one of the first operating systems under which it will be available.

III. Long-range Plans

Our plans for the period beyond July 1978 will depend to a large extent on results of experiments which have not yet been performed. Our use of SUMEX for the purpose of developing an interactive programming system will presumably be complete sometime in 1977. It is possible that future needs will require non-real-time access to a machine of greater capabilities than our PDP-11/05 and PDP-8.

IV. Publications

Dobelle, W. H., Mladejovsky, M. G., and Girvin, J. P. Artificial vision for the blind: electrical stimulation of visual cortex offers hope for a functional prosthesis. Science, 183, 1 February 1974, 440-444.

Dobelle, W. H., and Mladejovsky, M. G. Phosphenes produced by electrical stimulation of human occipital cortex and their application to the development of a prosthesis for the blind. J. Physiol., 243, 1974, 553-576.

Dobelle, W. H., Mladejovsky, M. G., Evans, J. R., Roberts, T. S., and Girvin, J. P. "Braille" reading by a blind volunteer by visual cortex stimulation. Nature, 259, 15 January 1976, 111-112.

Mladejovsky, M. G., Eddington, D. K., Evans, J. R., and Dobelle, W. H. A computer-based brain stimulation system to investigate sensory prostheses for the blind and deaf. IEEE Trans. Biomed. Eng., BME-23, 4, July 1976, 286-296.

Mladejovsky, M. G., Eddington, D. K., Dobelle, W. H., and Brackmann, D. E. Artificial hearing for the deaf by cochlear stimulation: pitch modulation and some parametric thresholds. Transactions of ASAIO, 21, 1975, 1-6.
6.4.5 MATHEMATICAL MODELING OF PHYSIOLOGICAL SYSTEMS

Mathematical Modeling of Physiological Systems

John J. Osborn, M.D., Director
Research Data Facility
The Institutes of Medical Sciences
San Francisco, California 94115

The overall goal of the Institutes of Medical Sciences’s collaboration with SUMEX is the application of computer technology to clinical medicine. Our efforts during the past year have been in the fields of knowledge based engineering and mathematical modeling. We are using our available computer based physiological measurement systems to provide the basis on which physiological interpretation is being developed using knowledge engineering, and to provide the data with which mathematical models are being developed using the SUMEX modeling facility.

Project support:

Granting agency: NIH
Grant number: MB00134
Total period of the award: 3 years
Current year: 3
Current funding: $45,570

Granting agency: NIH
Grant number: HR42917
Total period of the award: 3 years
Current year: 3
Current funding: $198,839

BIOMEDICAL KNOWLEDGE ENGINEERING IN CLINICAL MEDICINE (KEMED)

The KEMED system is conceived as an application of the discipline of heuristic based programming to the interpretation of measurements made in clinical medicine. The long range goal of the project is to do research on a biomedical knowledge-based system for interpreting the clinical significance of physiological data. This interpretation will be used to aid in diagnostic decision making and the selection of therapeutic action. Even the best measurements often go unused because of the reasonable reluctance of clinical staff to make measurements whose results they only poorly understand and whose relation to clinical management is ambiguous. We will use techniques of biomedical knowledge engineering to extract and systematize the heuristic knowledge used by experts in the practice of their clinical art.
These techniques will be used to construct and utilize a knowledge base to guide inference making by computer programs.

The first program in the KEMED system is designed for interpretation of standard pulmonary function laboratory test data. A knowledge base was developed for interpreting the relationship between measured flows, lung volumes, pulmonary diffusion capacity and pulmonary mechanics and the standard diagnoses of pulmonary function. The knowledge base includes interpretation of measured test results and diagnosis of the type and severity of any pulmonary disease which may be present. The program is being developed as an extension to the MYCIN formalism, and it makes extensive use of the MYCIN structures and programming system. Funding has been requested to continue this work.

MATHEMATICAL MODELING OF PHYSIOLOGICAL SYSTEMS

Mathematical models of the cardio-pulmonary system are being developed to extract clinical physiological information from data acquired by the patient monitoring system. Two approaches are being taken: 1) parsimonious models of the dynamic behavior of CO2 following an increase in inspired oxygen concentration are being developed for automated patient monitoring application, and 2) a detailed model of the regional behavior of radioactive tracers in the lung is being used as a standard for evaluation of the previous models. The MLAB (Modelling Laboratory) program, available on SUMEX, is being used extensively for model development by simulating hypothesized models and for data analysis, i.e., identification of model parameters from experimental data. The CO2 dilution method has been applied successfully in the ICU and additional funding requested. Two new methods for measuring regional lung function with radioactive tracers have been developed where MLAB was essential, and further funding has been requested.
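The two uses of MLAB named above, simulating a hypothesized model and identifying its parameters from data, can be illustrated with a deliberately simple stand-in. The sketch below is Python, not MLAB, and the single-compartment exponential washout model, the function names, and the numbers are all assumptions made for illustration; the models actually used in this project are not specified in this report.

```python
import math


def simulate_washout(c0, k, times):
    """Simulate the hypothesized model c(t) = c0 * exp(-k * t)."""
    return [c0 * math.exp(-k * t) for t in times]


def fit_washout(times, concentrations):
    """Estimate (c0, k) by ordinary least squares on log(concentration).

    Taking logs turns the model into a straight line,
    log c = log c0 - k * t, so the slope gives -k and the
    intercept gives log c0. All concentrations must be positive.
    """
    if len(times) != len(concentrations) or len(times) < 2:
        raise ValueError("need at least two paired samples")
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -slope


if __name__ == "__main__":
    # Hypothetical sample times and parameters, chosen only for the demo.
    sample_times = [0.0, 1.0, 2.0, 3.0, 4.0]
    synthetic = simulate_washout(c0=5.0, k=0.3, times=sample_times)
    c0_hat, k_hat = fit_washout(sample_times, synthetic)
    print(f"estimated c0={c0_hat:.3f}, k={k_hat:.3f}")
```

Fitting noise-free data generated by the same model is only a consistency check; with real measurements the same routine would return estimates whose uncertainty has to be assessed separately.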
MLAB was used to perform an error analysis of the method for measuring regional pulmonary shunt fraction. Also, using MLAB model simulation to understand the complex dynamics of 133-Xenon in the lung-tissue system, a method for measuring intraregional ventilation/perfusion ratio maldistribution has been developed which significantly extends the sensitivity of previous methods. A model of the oculomotor system is presently being developed on MLAB in collaboration with the Smith-Kettlewell Institute of the Visual Sciences. We anticipate that their model will be used in the future for treatment of patients with strabismus.

Interface with SUMEX

We use SUMEX through the Tymshare network using a terminal. The text facilities of SUMEX, including both text editing and message sending, are excellent additions to our in-house facilities (PDP-11 based system). The message system is particularly useful for communicating ideas and questions with other colleagues using the SUMEX system.

Our principal difficulty with SUMEX is turn-around time. Both the MYCIN and MLAB systems are interactive, and the 30-59 second response times associated with MYCIN and MLAB jobs are at best discouraging.

We have a strong desire to develop in-house capabilities in artificial intelligence. We have already invested significant numbers of hours in developing competence with the MYCIN system, and we are confident of developing an extremely capable staff in heuristic programming. An in-house AI computational capability is a more difficult capability to conceive. Developing an artificial intelligence programming facility on a PDP-11 based system remains a significant long-term interest. The satellite capability offers both the potential of not continuing to provide additional load on SUMEX and the potential of more rapid interaction with the user.
The SUMEX facility contributed to the following grant applications and articles:

Requested Funding:

1) Biomedical Knowledge Engineering in Clinical Medicine (NIH)
2) Pulmonary Function in Acute Illness (NIH)
3) Computer Laboratory for Clinical Support (NIH)
4) Improvement in Regional VA/Q Resolution (NIH, USAF, USN)

Bibliography

1) Simulation to Relate Measured Gas Concentrations at the Mouth to Pulmonary Mechanics and Perfusion. J.C. Kunz, R.R. Mitchell, D.H. McClung, J.J. Osborn. Submitted to the 1977 ACEMB.

2) Identifiability of Pulmonary and Recirculation Parameters Following Sequential Bolus Inputs of 133 Xe. R.R. Mitchell, R.J. Fallat. Submitted to the 1977 ACEMB.

3) Simulation of Intraregional Ventilation-Perfusion Ratio Maldistribution. J.C. Glaub, R.R. Mitchell, R.J. Fallat. Submitted to the 1977 ACEMB.

4) Measurement of Residual Volume and Ventilation Distribution Using Helium and a Five Vital Capacity Breath Maneuver. R.R. Mitchell, Technical Report 32, Institutes of Medical Sciences, Feb. 1977.

5) Identification of Human Oculomotor System Parameters with Application to Strabismus. N.K. Gupta, A.V. Phatak, Systems Control; R.R. Mitchell, Heart Research Institute and Carter Collins, Smith-Kettlewell Institute, Institutes of Medical Sciences. Submitted to Joint Automatic Control Conference, 197..

6.4.6 PUFF/VM PROJECT

PUFF/VM - Pulmonary Function and Ventilator Management Project

John J. Osborn, M.D.
The Institutes of Medical Sciences (San Francisco)
and
E. A. Feigenbaum
Computer Science Department, Stanford University

Note: The PUFF/VM project is the outgrowth of the efforts of Prof. Feigenbaum’s group at Stanford to establish new applications areas for AI in medical research. It represents a collaboration with Dr. Osborn’s group, which has been working on another AIM pilot project titled "Mathematical Modeling of Physiological Systems".
A PUFF/VM proposal is currently pending with NIH, and PUFF/VM is being reviewed in parallel by the AIM Executive Committee for separate pilot status.

1. General Problem

Measurements of patient physiology have become universally accepted as important parts of the delivery of clinical medicine. Good, useful measurements often go unused, however, because of the legitimate resistance of attending staff to using measurements which they poorly understand. Thus, technology contributes to clinical medicine if:

-- It’s so useful, economical and easy to use that everyone can use it (e.g.: SMA-12, brain scanner, Paps)
-- It’s so useful, economical and has been around long enough that many people have been trained to use it (e.g.: ECG in ICU).

The dissemination of new technology in clinical medicine is limited by the ability of the system of medical care delivery to accept and assimilate the interpretation of the results of the technology. Given that the technology is useful in knowledgeable hands, this rate of assimilation is related somewhat to cost, but more to the rate at which education progresses. The new computer axial tomography systems have been accepted rapidly (two neighboring hospitals near San Francisco made headlines when each tried to purchase $209,900 devices) because the measurements they make are useful, and they are readily interpreted by staff.

A system of medical technology should:

-- Make clinically important physiological measurements;
-- Get data automatically, accurately [done often];
-- Recognize irrelevant data, poor data and artifact [rarely done];
-- Interpret clinical significance of data in light of limitations of the data collection and analysis [almost never done];
-- Operate economically.
Systematic interpretation of test data is both possible (if the problem has a restricted domain) and desirable (because interpretation will be consistent for all and usable without direct supervision of a specialist).

2. Objectives

2.1. Overall Objectives: Our immediate objective is to develop a computer programming system for interpreting the clinical significance of measures of pulmonary function. We hope to develop this system for diagnostic use in the pulmonary function laboratory and to aid diagnosis and ventilator management of respiratory insufficiency in the intensive care unit. We hope to demonstrate the clinical effectiveness of such a system for improving the accuracy and timeliness of diagnosis.

Our long range goal is to develop an integrated system for making and interpreting measures of pulmonary function. We believe that this is possible because of the present and potential contribution of instrumentation and data analysis systems to the diagnosis and clinical management of pulmonary distress. We believe, in addition, that the discipline of knowledge-based heuristic programming is potentially the best basis on which to develop a system for automatically interpreting the results of the measures of pulmonary function. We aim, in the long run, to develop an implementation inexpensive enough that the system will find wide acceptability in the delivery of clinical care.

2.2. Pulmonary Laboratory: Our objective for this project is to develop a heuristic program for interpreting the results of standard pulmonary function tests.
The program will identify the need for repeated measurements because of poor patient effort; identify the need for additional information in order to make a more definitive diagnosis; report and explain the reasons for primary and secondary diagnoses and severity of any disease state; identify the relation between diagnosis and any referral diagnosis; interpret any change from previous tests or limitations on the interpretation because of the test methodology and the patient effort. We propose to: implement the system using a significant extension of an existing system of heuristic methods; extend the existing system to add new pulmonary disease diagnosis decision rules; develop models for directing program execution, achieving faster performance, and detecting and interpreting the clinical situation in terms of any inconsistent data; facilitate model acquisition.

2.3. Intensive Care Unit (ICU): Our objective for this project is to develop computer programs for a system to interpret results of tests of pulmonary function in the hospital Intensive Care Unit. The program will interpret and explain the results of test measurements used to diagnose respiratory insufficiency; suggest initial settings for a ventilator for the patient with respiratory insufficiency; diagnose need for change in ventilation for the patient on a mechanical ventilator; and diagnose appropriateness of moving forward or back in the process of weaning the patient from the ventilator. We will implement the system using a new heuristic-based interpretation system capable of interpreting continuous data from the changing patient situation. The system will allow goal-oriented and data-driven invocation of interpretation rules from the knowledge base.

2.4. Progress Evaluation: Our objective for this project is to conduct major evaluations of the direction and schedule of the above projects.
These evaluations will be conducted near the end of the first and second years of the project. The evaluations will help assure the soundness of the computer science and the clinical investigations. Outside experts in clinical medicine and computer science will participate in the evaluation process.

2.5. Advantages of Collaborative Effort between IMS and the Stanford Heuristic Programming Project

The collaboration offers a complementary blend of medical and computer science knowledge:

-- Clinically important problems: interpretation of pulmonary measurements, both in lab and ICU.
-- Automatic data collection and analysis in pulmonary lab and in ICU using computer. Data has demonstrated value in clinical medicine; well understood procedures for collection, interpretation, use of data.
-- Having computer data collection, automated interpretation is the logical next step.
-- Use all the power computer science has available; discard excess in application-specific implementation after designing into the implementation the important features.
-- The SUMEX charter from NIH includes exporting artificial intelligence (AI) techniques to a larger community, and IMS is an excellent potential colleague. IMS has real clinical problems which can use AI effectively; biomedical engineering, statistics, and mathematical formulation of problems to contribute to AI; strong clinical orientation to give AI practical use.

3. A. Specific Aims

Develop an integrated knowledge-based system for interpreting standard pulmonary function test results.

Develop an integrated knowledge-based system for interpreting tests and observations used for diagnosis and treatment of respiratory insufficiency in the ICU.

Conduct major project evaluations, using outside experts in clinical medicine and computer science, to review progress to date and to help identify promising directions for continuing research.

To these ends, we will:

1. Develop a knowledge base for pulmonary function laboratory test interpretation, including rules to:

-- Interpret results from spirometry, body plethysmography and measurement of diffusion capacity for CO;
-- Diagnose the presence and severity of obstruction, restriction and diffusion defects;
-- Diagnose the presence and severity of obstructive subtypes (asthma, bronchitis, emphysema); and
-- Identify poor test results and the need for new information to make a more definitive diagnosis.

2. Implement rules for pulmonary function test interpretation using a significant extension of the existing MYCIN formalism. Heuristic and mathematical models of "prototype" disease states will be used to:

-- Identify the presence of supporting and conflicting evidence for a primary interpretation;
-- Interpret the clinical significance of measured data in terms of the measured data, the patient history, and expected values for the typical case;
-- Recognize and interpret the significance of inconsistent data; and
-- Direct the invocation of rules, thereby speeding program operation.

3. Develop a knowledge base for interpreting tests and observations relevant to diagnosis and ventilator management of respiratory insufficiency:

-- Interpret results of measurements of vital capacity, blood gases, respiratory pressures, volumes, gas concentrations, hemodynamics;
-- Recommend procedure for setting up a ventilator for a patient;
-- Diagnose need for change in patient ventilation;
-- Identify indications for proceeding forward or back in the process of weaning from a ventilator; and
-- Make interpretations in light of measured test results, patient history, record of therapies and results of therapies and observations.

4.
Implement rules for interpretation of respiratory insufficiency data with a new heuristic interpretation system including the following major features:

-- Forms time-dependent hypotheses about the patient state;
-- Infers desired courses of action based on measured patient state, observations, and expectations of future course;
-- Uses models, both heuristic and mathematical, for generating an expectation of the immediate patient course.

5. Create an advisory committee, including outside experts in clinical medicine and computer science, to review the progress to date. They will review conceptual formulations, system design, scope and detail of the clinical knowledge and system operation. The advisory group will be asked to help to identify additional important considerations for the clinical knowledge base and the computer implementation, suggest improved ways to conceptualize or implement problems, and evaluate the soundness of the results to date.

4. Significance

Science advances by quantitation and development of general theories. The practice of medicine advances along one path by integrating quantitative measurements and general theories into the routine of existing clinical practice. The world of clinical medicine includes a complicated interaction among human patients, complex physiology, and proud, human clinical staff. This project is based on the assertion that good, quantitative measurements of physiological state are useful if effectively related to the human and physiological complexities of the clinical world. The best possibility we see for making new quantitative measurements far more generally useful in clinical medicine lies in knowledge-based interpretation of well understood, physiologically relevant measurements. The improved care of the sick patient is our objective. This project, if successful, will directly improve the ability of the clinical staff to properly diagnose and manage the patient with respiratory insufficiency.
It will lay the foundation for extension of successful methodologies of interpretation to the general problem of interpreting measurements of physiological state.

Appendix I

OVERVIEW OF ARTIFICIAL INTELLIGENCE RESEARCH

ARTIFICIAL INTELLIGENCE RESEARCH
What is it? What has it achieved? Where is it going?

Excerpt from a report by Professor Edward A. Feigenbaum
Stanford University
May 1975

INTRODUCTION

In this briefing, these questions will be discussed as succinctly as possible:

I. What is the scientific field of artificial intelligence research, as seen from various viewpoints? What are the general goals of the field?
II. What are its practical working goals? What are some achievements relative to these goals (circa 1973)?
III. What steps (new goals, problems, potential achievements) seem to lie ahead, within a five year horizon?

ARTIFICIAL INTELLIGENCE (alias INTELLIGENT COMPUTER SYSTEMS): General View

Artificial Intelligence research is that part of Computer Science that is concerned with the symbol-manipulation processes that produce intelligent action. By "intelligent action" is meant an act or decision that is goal-oriented, arrived at by an understandable chain of symbolic analysis and reasoning steps, and is one in which knowledge of the world informs and guides the reasoning. Some scientists view the performance of complex symbolic reasoning acts by computer programs as the sine qua non for artificial intelligence programs, but this is necessarily a limited view.

Yet another view unifies AI research with the rest of Computer Science. It is an oversimplified view, but worthy of consideration. The potential uses of computers by people to accomplish tasks can be "one-dimensionalized" into a spectrum representing the nature of instruction that must be given the computer to do its job. Call it the WHAT-TO-HOW spectrum. At one extreme of the spectrum, the user supplies his intelligence to instruct the machine with precision exactly HOW to do his job, step-by-step. Progress in Computer Science can be seen as steps away from that extreme "HOW" point on the spectrum: the familiar panoply of assembly languages, subroutine libraries, compilers, extensible languages, etc. At the other extreme of the spectrum is the user with his real problem (WHAT he wishes the computer, as his instrument, to do for him). He aspires to communicate WHAT he wants done in a language that is comfortable to him (perhaps English); via communication modes that are convenient for him (including perhaps, speech or pictures); with some generality, some abstractness, perhaps some vagueness, imprecision, even error; without having to lay out in detail all necessary subgoals for adequate performance -- with reasonable assurance that he is addressing an intelligent agent that is using knowledge of his world to understand his intent, to fill in his vagueness, to make specific his abstractions, to correct his errors, to discover appropriate subgoals, and ultimately to translate WHAT he really wants done into processing steps that define HOW it shall be done by a real computer. The research activity aimed at creating computer programs that act as "intelligent agents" near the WHAT end of the WHAT-TO-HOW spectrum can be viewed as the long-range goal of AI research. Historically, AI research has always been the primary vehicle for progress toward this end, though science as a whole is largely unaware of the role, the goals, and the progress.
HISTORICAL TRACE

The Working Goals of the Science; Progress Toward Those Goals

The root concepts of AI as a science are 1) the conception of the digital computer as a symbol-processing device (rather than as merely a number calculator); 2) the conception that all intelligent activity can be precisely described as symbol-manipulation. (The latter is the fundamental working hypothesis of the AI field, but is controversial outside of the field.) The first inference to be drawn therefrom is that the symbol-manipulations which constitute intelligent activity can be modeled in the medium of the symbol-processing capabilities of the digital computer. This intellectual advance--which gives realization in a physical system, the digital computer, to the complex symbolic processes of intelligent action and decision--with detailed case studies of how the realization can be accomplished, and with bodies of methods and techniques for creating new demonstrations--ranks as one of the great intellectual achievements of Science, allowing us finally to understand how a physical system can also embody mind. The fact that large segments of the intellectual community do not yet understand that this advance has been made does not change its truth or its fundamental nature.

Three global "working goals" have dominated the AI field for the 17 years of its existence. These are:

1. Understanding heuristic search as a processing scheme sufficient to account for much intelligent problem solving behavior; and exploring the scope and pervasiveness of heuristic problem solving.

2. Semantic information processing: developing precise formulations of "understanding" by programs, and "meaning" of symbols that are input or stored; the acquisition, storage, and deployment of knowledge of the world in the service of symbolic problem solving.

3.
Information Processing Psychology: developing precise models of human behavior in symbolic-processing tasks.

The first two goals represent the fundamental paradigms that have dominated the field. The third cuts across these orthogonally, and involves intense interdisciplinary contact with Psychology and Linguistics.

GOAL 1. HEURISTIC SEARCH, HEURISTIC PROGRAMMING, SYMBOLIC PROBLEM SOLVING PROGRAMS

In the first decade, the dominant paradigm of AI research was heuristic search. In this paradigm, problem solving is conceived as follows: A tree of "tries" (aliases: subproblems, reductions, candidates, solution attempts, alternatives-and-consequences, etc.) is sprouted (or sproutable) by a generator. Solutions (variously defined) exist at particular (unknown) depths along particular (unknown) paths. To find one is a "problem". For any task regarded as nontrivial, the search space is very large. Rules and procedures called heuristics are applied to direct search, to limit search, to constrain the sprouting of the tree, etc. While some of this tree-searching machinery is entirely task-specific, other parts can be made quite general over the domain of designs employing the heuristic search paradigm.

Two notions are critical. The first is that problem solvers generally face a "maze" of alternative courses of decision and action that is huge compared with their processing resources. The second is the use of heuristic knowledge to steer carefully through large mazes toward a solution, seeking the plausible and potentially fruitful avenues, avoiding the absurdities and the high-risk paths. Heuristic knowledge is usually informal knowledge--to be distinguished from formal knowledge that is assertable with the rigor of proof. Polya, the famous mathematician who wrote Patterns of Plausible Inference and other books on problem solving, calls heuristic reasoning "the art of good guessing."
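The heuristic search paradigm described above can be illustrated with a minimal sketch in Python. It is not taken from any program named in this report. The toy task (reach a target integer from 1 using "double" and "add 3"), the function names, and the distance-to-goal heuristic are all assumptions chosen for illustration. The generator sprouts the tree of tries, and the heuristic decides which try to expand next.

```python
import heapq

def best_first_search(start, is_goal, successors, heuristic):
    """Greedy best-first search: always expand the try that looks closest to a solution.

    Returns (path, number_of_states_expanded), or (None, count) if no solution exists.
    """
    frontier = [(heuristic(start), start, [start])]  # priority queue ordered by heuristic value
    seen = {start}                                   # states already generated (limits search)
    expanded = 0
    while frontier:
        _, state, path = heapq.heappop(frontier)
        expanded += 1
        if is_goal(state):
            return path, expanded
        for nxt in successors(state):                # the "generator" sprouting the tree
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (heuristic(nxt), nxt, path + [nxt]))
    return None, expanded

# Hypothetical toy task: reach GOAL from 1 using the moves "double" and "add 3".
GOAL = 22

def successors(n):
    # A task-specific constraint on sprouting: never overshoot the goal.
    return [m for m in (n * 2, n + 3) if m <= GOAL]

def heuristic(n):
    # Informal knowledge: states numerically nearer the goal look more promising.
    return abs(GOAL - n)

path, expanded = best_first_search(1, lambda n: n == GOAL, successors, heuristic)
print(path, expanded)
```

The search returns some valid path rather than a guaranteed shortest one, which matches the informal, "good guessing" character of heuristic knowledge described in the text.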
Heuristic knowledge is often "common sense" knowledge of the world, rules-of-thumb for generally acceptable performance, or rules of good practice in specific situations. When we speak of the "expertise" of an expert, and the "good judgment" he brings to bear on complex problems in his domain, we often are speaking of the heuristics he has developed to search effectively.

Provocative essays by Polya notwithstanding, the first serious and detailed studies of heuristic problem solving ever done by Science were done as AI research in its first decade. As with any other science, progress came by the detailed examination of specific cases, from which gradually emerged both a broad picture of the nature of the phenomena being studied and, within this, more formal theories for specific parts. Three sub-goals of heuristic programming are discernible.

SUBGOAL 1A. Demonstrate sufficiency of heuristic search for tasks of intellectual difficulty.

These heuristic programming efforts dealt with almost "pure" symbolic reasoning tasks (i.e., tasks not requiring much coupling to real-world knowledge), and used inference schemes that were either ad-hoc or of limited scope.
Notable successes during this "prove-the-concept" phase were: the Logic Theory Program, that proved theorems in Whitehead & Russell’s propositional calculus; the Geometry Theorem Proving program, that proved theorems in Euclidean geometry at a level of competence exceeding that of the excellent high school geometry student; the Symbolic Integration program, that solved college freshman symbolic integration problems about as well as MIT freshmen; chess-playing programs that play respectable "club player" C or B Class chess; a checker playing program that was virtually unbeatable, except by the country’s top few players (notable also for remarkable self-improvement in performance by analysis of its own play and "book-move" good play); and a number of competent management science applications (assembly-line balancing, warehouse location, job-shop scheduling, etc.).

To recapitulate briefly: the key concepts are search in problem solving, and the use of generally informal knowledge to guide search effectively. The AI community was the first to devote serious scientific effort to developing the idea of the use of informal knowledge in problem solving, with notable successes. Few in Science recognize that this achievement has been made and is ready for exploitation.

SUBGOAL 1B. Generality in Problem Solving Programs

Generality here means the use of a small set of problem solving methods of wide applicability to solve problems of many different types. Each of the problems posed is stated to the program in a particular representation (or framework) which the set of methods is constructed to handle.
The subgoal of generality arises first as a reaction to the array of "specialty" programs mentioned above; second, from the general observation that the ability to do a wide range of tasks is a special touchstone of intelligence; third, from a direct assessment that as the diversity and heterogeneity of the tasks handled by an agent increases, the likelihood that it can do them all without intelligent action decreases; and fourth, from the argument that any ultimate intelligent agent must have wide generality, since it must take the world and its problems as they come without any intermediary, making generality an important independent desideratum.

This subgoal was pursued with vigor for ten years in a number of projects, was important for its feedback value in clarifying issues for the AI field, and has temporarily (at least) been put back on the shelf as the field begins to explore knowledge-based problem solvers and issues in the representation of knowledge.

There were two discernible subthemes. The first was an attempt to create abstract heuristic search methods that were divorced from any particular content. Examples were: the General Problem Solver, which used a variant of heuristic search known as means-ends analysis; MULTIPLE, which introduced adaptivity in the selection of what subproblem to choose "next" in a search; and REF-ARF, which extended the generality of ordinary procedural programming languages to include the embedding of non-procedural problems of constraint satisfaction.

The second subtheme was the construction of theorem provers that take problems expressed as theorems to be proved in the first-order predicate calculus.
This line of work was motivated by the (correct) observation that the scope for representing real-world facts and situations in first-order predicate calculus is very great; and by the invention of the resolution method, a computational method for finding proofs for theorems in this calculus. There has been continuous improvement on the basic method, taking the form of proposing more powerful inference techniques, rather than the form of specific ways for programs to adapt to particular problems. The very strength of the formulation in terms of generality, namely its complete homogenization of the particular task (all tasks are seen and dealt with in the same logical formalism), turns effort away from how to exploit the particularities of special classes of tasks. But it appears that only by exploiting the particularities can significant reduction in search be achieved. From a practical point of view the only proofs produced by such problem solvers were "shallow" proofs. Much of this line of research has been temporarily "shelved", awaiting further knowledge on how best to represent knowledge for computer processing. Problems that are essentially simple when represented in their "natural" representation appear extraordinarily complicated when translated into first-order predicate calculus. The current search for theorem provers using higher-order logics is based not on the attempt to increase the raw expressive power, so to speak, of first-order logic, but on the belief that naturalness of expression will ultimately pay off.
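To make the resolution method mentioned above concrete, here is a minimal Python sketch of resolution refutation. It is restricted to propositional clauses as a simplifying assumption: the first-order version discussed in the text also requires unification of variables, which is omitted here. The clause encoding (frozensets of literal strings, with "~" marking negation) and all function names are illustrative choices, not the design of any prover cited in this report.

```python
from itertools import combinations

def negate(lit):
    """Return the complementary literal: P <-> ~P."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return every resolvent of two clauses (one per complementary literal pair)."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return resolvents

def refutes(clauses):
    """Return True if the empty clause is derivable, i.e. the clause set is unsatisfiable."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolve(c1, c2):
                if not r:              # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= known:               # saturation: nothing new can be derived
            return False
        known |= new

# To prove Q from the facts P and (P implies Q), add the negated goal ~Q
# and look for a contradiction.
axioms = [frozenset({"P"}), frozenset({"~P", "Q"})]
proved = refutes(axioms + [frozenset({"~Q"})])
consistent = not refutes(axioms)
print(proved, consistent)
```

Note that the procedure treats every clause uniformly and tries all pairs, with no task-specific guidance. That uniformity is the "complete homogenization" the text identifies as both the strength and the practical weakness of the approach.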
SUBGOAL 1C. High-Performance Programs that perform at near-human level in specialized areas

As the heuristic programming area matured to the point where the practitioners felt comfortable with their tools, and adventuresome in their use; as the need to explore the varieties of problems posed by the real world was more keenly felt; and as the concern with knowledge-driven programs (to be discussed later) intensified, specific projects arose which aimed at and achieved levels of problem solving performance that equalled, and in some cases exceeded, the best human performance in the tasks being studied. The example of such a program most often cited is the Heuristic DENDRAL program, which solves the scientific induction problem of analyzing the mass spectrum of an organic molecule to produce a hypothesis about the molecule’s total structure. This is a serious and difficult problem in a relatively new area of analytical chemistry. The program’s performance has been generally very competent and in "world’s champion" class for certain specialized families of molecules.

Similar levels of successful performance have been achieved by some of the MATHLAB programs that assist scientists in doing symbolic mathematics. The effectiveness of MATHLAB’s procedures for doing symbolic integration in calculus is virtually unexcelled.

Yet another example, with great potential economic significance, involves a program for planning complex organic chemical syntheses from substances available in chemical catalogs. The program is currently being used as an "intelligent assistant" in a new and complex organic synthesis.

GOAL 2. SEMANTIC INFORMATION PROCESSING (S.I.P.)

The use of the term "semantic" above is intended to connote, in familiar terms, something like: "What is the meaning of..." or "How is that to be understood..."
or "What knowledge about the world must be brought to bear to solve the particular problem that has just come up?" The research deals with the problem of extracting the meaning of: utterances in English; spoken versions of these; visual scenes; and other real-world symbolic and signal data. It aims toward the computer understanding of these as evidenced by the computer’s subsequent linguistic, decision-making, question-answering, or motor behavior. Thus, for example, we will know that our "intelligent agent" understood the meaning of the English command we spoke to it if: a) the command was in itself ambiguous; b) but was not ambiguous in context; and c) the agent performed under the appropriate interpretation and ignored the interpretation that was irrelevant in context.

In this goal of AI research, there are foci upon the encoding of knowledge about the world in symbolic expressions so that this knowledge can be manipulated by programs; and the retrieval of these symbolic expressions, as appropriate, in response to demands of various tasks. S.I.P. has sometimes been called "applied epistemology" or "knowledge engineering".

To summarize: the AI field has come increasingly to view as its main line of endeavor knowledge representation and use, and an exploration of understanding (how symbols inside a computer, which are in themselves essentially abstract and contentless, come to acquire a meaning).

To classify all of the current work into a relatively simple set of subgoals is a formidable and hazardous undertaking. Nevertheless, here is one rough cut (stated for convenience as questions).

A. How is the knowledge acquired, that is needed for understanding and problem solving; and how can it be most effectively used?

B. How is knowledge of the world to be represented symbolically in the memory of a computer?

B1.
What symbolic data structures in memory make the retrieval of this information in response to task demands easy?

C. How is knowledge to be put at the service of programs for understanding English?

D. How is sensory knowledge, particularly visual and speech, to be acquired and understood? How is knowledge to be applied to intelligent action of effectors, such as arms, wheels, instrument controls, etc.?

Significant advances on all of these fronts have been made in the last decade. The area has a rather remarkable coherence--with individual projects threading through a number of the goals stated above (this makes excellent science and difficult exposition!).

GOAL 2A. Knowledge Acquisition and Deployment for Understanding and Problem Solving

The paradigm for this goal is, very generally sketched, as follows:

a. A situation is to be described or understood; a signal input is to be interpreted; or a decision in a problem-solution path is to be made.

Examples: A speech signal is received and the question is, "What was said?" The TV camera system sends a quarter-million bits to the computer and the question is, "What is out there on that table and in what configuration?" The molecule structure-generator must choose a chemical functional group for the "active center" of the molecular structure it is trying to hypothesize, and the question is, "What does the mass spectrum indicate is the 'best guess'?"

b. Specialized collections of facts about the various particular task domains, suitably represented in the computer memory (call these Experts), can recognize situations, analyze situations, and make decisions or take actions within the domain of their specialized knowledge.

Examples: In the CMU Hear-Say Speech Understanding System, currently the Experts that contribute to the Current Best Hypothesis are an Acoustic-Phonetic Expert, a Grammar Expert, and a Chess Expert (since chess-playing is the semantic domain of discourse). In Heuristic DENDRAL, the Experts are those that know about stability of organic molecules in general, mass spectrometer fragmentation processes in particular, nuclear magnetic resonance phenomena, etc.

For each of the sources of knowledge that can be delineated, schemes must be created for bringing that knowledge to bear at some place in the on-going analysis or understanding process. The view is held that programs should take advantage of a wide range of knowledge, creating islands of certainty as targets of opportunity arise, and using these as anchors for further uncertainty reduction. It is an expectation that always some different aspect provides the toe-hold for making headway--that is, that unless a rather large amount of knowledge is available and ready for application, this paradigmatic scheme will not work at all.

Within this paradigm lie a number of important problems to which AI research has addressed itself:

a. Since it is now widely recognized that detailed specific knowledge of task domains is necessary for power in problem solving programs, how is this knowledge to be imparted to, or acquired by, the programs?

a1. By interaction between human expert and program, made ever more smooth by careful design of interaction techniques, languages "tuned" to the task domain, and flexible internal representations. The considerable effort invested by the AI community on interactive time-sharing and interactive graphic display was aimed toward this end. So is the current work on situation-action tableaus (production systems) for flexibly transmitting from expert to machine details of a body of knowledge.

a2. By "custom-crafting" the knowledge in a field through the painstaking day-after-day process of an AI scientist working together with an expert in another field, eliciting from that expert the theories, facts, rules, and heuristics applicable to reasoning in his field. This was the process by which Heuristic DENDRAL’s "Expert" knowledge was built. It is being used successfully in AI application programs for: diagnosis of glaucoma eye disease, treatment planning for infectious disease using antibiotics, protein structure determination using X-ray crystallography, organic chemical synthesis planning, a military application involving sonar signals, perhaps other areas, and of course chess.

a3. By inductive inference done by programs to extract facts, regularities, and good heuristics directly from naturally-occurring data. This is obviously the path to pursue if AI research is not to spend all of its effort, well into the 21st Century, building knowledge-bases in the various fields of human endeavor in the custom-crafted manner referred to above. The most notable successes in this area have been:

-- the Meta-DENDRAL program which, for example, has discovered the mass spectrum fragmentation rules for aromatic acids from observation of numerous spectra of these molecules--rules previously not explicated by the DENDRAL chemists.
-- a draw-poker playing program that inferred the heuristics of good play in the game by induction (as well as by other modes, including the aforementioned interaction with experts).

a4. By processes of analogical reasoning, by which knowledge acquired about one area can be used to solve problems in another area if a suitable analogy can be drawn. Our human experience tells us that this approach is rich in possibilities. One successful project can be cited (and that is a limited success): a program that discovers an analogy (in full-blown detail) between a theorem-to-be-proved in modern algebra and another theorem in algebra whose proof is known. The analogy is used to pinpoint from a large set of facts those few that will indeed be relevant to proving the new theorem.

GOAL 2B. Representation of Knowledge

The problem of representation of knowledge for AI systems is this: if the user has a fact about the world, or a problem to be stated, in what form does this become represented symbolically in the computer for immediate or later use? Three approaches are being pursued:

B1. The approach via formal logic. As mentioned before, first-order predicate calculus was tried, but was found to be too cumbersome to represent ordinary situations and common-sense knowledge. Set theory and higher-order logics are currently under examination as better candidates to be a medium for homogeneous representation.

B2. The ad-hoc approach. Most problem domains have a "natural" representation that human experts use when operating in the domain. Translate that representation fairly directly for the computer, and tailor the information processes to work with it. This is the approach commonly taken: in DENDRAL, MATHLAB, in chess playing programs, visual scene analysis, and so on (almost everywhere). Though it gets the job done, it creates serious problems for the cumulation of knowledge, techniques, and programs in the science because of the inhomogeneity that arises therefrom throughout the collection of AI projects undertaken. One way out of the dilemma is to do research on the problem of translation (by program) from one ad-hoc representation into another (the so-called "shift of representation" problem).
Little work has been done on this problem, except one excellent "pencil-and-paper" exercise in connection with a simple puzzle, and one subprogram in DENDRAL (the Planning Rule Generator, that translates mass spectral knowledge from its form as fragmentation processes to a form useful for pattern matching).

B3. The approach via a "computable" semantic theory. In this approach, computational linguists attempt to analyze the full range of actions, actors, objects, and their relations, of which the common-sense world is composed; then refine and formalize these into a usable computational theory for representing facts, utterances, problems, etc. The most successful of these efforts is the Conceptual Parser (and its follow-on, MARGIE, which successfully accomplishes English paraphrase and common-sense inference). In lieu of a tight, parsimonious computable semantic theory, other more ad-hoc systems, known as semantic-net-memory models, have developed, experimenting with various sorts of actor-action-object-relation data structures. Semantic-net-memory models have a ten year history relating particularly to intelligent question-answering. Perhaps most successful of these is the HAM program which combines ideas from semantic theory, semantic-net-memory structures, and more traditional linguistic analysis (all in the context of a rather good model of human sentence comprehension, validated with dozens of careful laboratory experiments).

GOAL 2C. Programs for Understanding English

One can readily observe that it will be almost impossible to disentangle the skein of research on understanding natural language (English) from the coordinate efforts in representation and deployment of knowledge. Most of the state-of-the-art programs for understanding English employ, in one form or another, the basic S.I.P. paradigm outlined previously.
These systems have substantial linguistic components that are highly sophisticated compared with anything done in the past. All of them incorporate linguistic theory that has an intimate and continuous tie-in between grammar "Experts" and domain-dependent "Experts". Although the domains about which they admit discourse are still modest and discrete, they are many times richer than anything done previously. The state-of-the-art is represented by the SHRDLU program for conducting a dialogue with a simulated robot about a world of blocks, boxes, and pyramids on a table; and the Lunar Rocks program for conducting a dialogue about properties of and transformations upon NASA moon-rock samples. The SHRDLU program, for example, will carry out commands, answer questions, and generally be aware of what it was doing, so as to answer "how" and "why" questions about its behavior.

The internal structure of these systems exhibits an interesting evolution over the semantic-net-memory systems, and they appear to be a long way from the heuristic search schemes mentioned earlier. They are essentially large programs written within a programming system that provides search and matching capability. There is no factorization between a data base (i.e., semantic net) and a small set of methods that process the data base. Rather, the entire system appears to be a large collection of special purpose programs for dealing with a multitude of special cases. They give the appearance of being a highly distributed system, in which the intelligent action resides throughout the entire program.

GOAL 2D. Acquiring and Understanding Sensory Data.

The goal here is to discover broadly applicable methods for extracting from sensory data (chiefly visual and aural) the information that is specifically responsive to users' needs.
Two classes of needs may be noted: the need to facilitate communication between man and machine; and the need to apply computers to intrinsically perceptual tasks. The former is exemplified by the desire to talk, rather than type, to computers; the latter is illustrated by the task of automatically guiding an effector on the basis of visual data. To satisfy either (or both) of these needs, it is necessary to move from well-understood problems of sensing data to much more difficult problems of interpretation.

SUBGOAL 2D1. Visual Scene Analysis.

Computer-based analysis of visual scenes has its roots in work on optical character recognition (early to mid-Fifties) and in work on automatic photoreconnaissance. These tasks are essentially two-dimensional. Little is lost by disregarding dimensions of objects in a direction orthogonal to the picture plane.

AI research on scene analysis began in the early sixties with the work of Roberts on pictures of polyhedra. This work (and its intellectual descendants) differs from the earlier two-dimensional work in two major respects: first, it explicitly considers, and capitalizes on, the three-dimensional properties of objects and their perspective representations; second, it utilizes a variety of special processing steps and decision-making criteria, in contrast to the earlier template-match/classify paradigm.

Roberts' work spawned five years of intensive research on pictures of collections of polyhedra. One theme, centered on the archetypical question "Is an edge present in a given (small) region of the picture?", led to the development of edge detecting, contour following, and region finding programs. A second theme, centered on teasing out the properties of polyhedra and their representations, led to an elegant theory of permissible representations of edges and vertices, and their relations to three-dimensional polyhedra - a theory not previously discovered by projective or descriptive geometers.
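The archetypical question quoted above, "Is an edge present in a given (small) region of the picture?", can be illustrated with a minimal sketch. This is not the method of any program named in this overview; it assumes a region is a small grid of brightness values and treats a large step between neighboring picture points as evidence of an edge. The function names, the threshold of 50, and the sample regions are hypothetical, chosen only for illustration.

```python
def edge_strength(region):
    """Largest brightness step between horizontally or vertically adjacent points."""
    rows, cols = len(region), len(region[0])
    best = 0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:  # compare with the right-hand neighbor
                best = max(best, abs(region[r][c + 1] - region[r][c]))
            if r + 1 < rows:  # compare with the neighbor below
                best = max(best, abs(region[r + 1][c] - region[r][c]))
    return best


def edge_present(region, threshold=50):
    """Answer the yes/no edge question for one small region (threshold is an assumption)."""
    return edge_strength(region) >= threshold


# Made-up 3x3 regions: one nearly uniform, one with a sharp vertical boundary.
flat = [[10, 12, 11], [11, 10, 12], [12, 11, 10]]
step = [[10, 10, 200], [10, 10, 200], [10, 10, 200]]

print(edge_present(flat), edge_present(step))
```

A fixed threshold of this kind is sensitive to noise and lighting, which is one reason such a primitive test would be combined with contour following and region finding rather than used alone.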
Work in the polyhedral objects domain culminated in several programs capable of describing, in more or less complete detail, pictures of complicated collections of polyhedra, even taking into account shadows cast by these objects. At the same time, more complicated types of scenes began to be seriously studied. This has led to current interest in the use of color, texture, and range data, and has stimulated interest in program organizations capable of capitalizing on these multiple perceptual modalities. For example, in one paradigm perception is viewed as a problem-solving process that uses many varieties of knowledge to select perceptual operators, to guide their application to sensory data, and to evaluate the results obtained therefrom.

SUBGOAL 2D2. Speech Understanding.

Research on computer recognition of speech signal data began in the Fifties with work on the recognition of isolated words. Some observations will be made here on the relation between speech understanding research and the on-going body of AI research.

The fundamental idea driving research on speech understanding is that "recognition" is impossible (in flowing natural speech) without understanding, and that understanding is impossible without extensive knowledge about the domain of discourse. This view arises in part from the observation that ambiguities and omissions at both the acoustic and semantic level do not arise as bizarre or pathological exceptions but instead are commonplace events. Speech understanding research thus relies heavily on progress in the basic AI research problems of knowledge acquisition, representation, and deployment. This situation is unlikely to change regardless of advances in processing acoustic signals.

GOAL 2E. Intelligent Control of Effectors

This goal concerns the creation of devices and control programs for bringing about specified changes in the physical world.
The effectors that have attracted the most attention have been mechanical manipulators and mobile vehicles but this has been largely a matter of experimental convenience. In principle, they could as easily have been subsystems of spaceships or manufacturing tools.

Early work in "intelligent" effectors dates back two decades, but systematic work did not begin until about 1965, at which time some progress had already been made in developing symbolic problem solving programs to control effectors. Since then there has been considerable interest in computer-controlled effectors because problems of effector control excite a set of important issues for AI research. The following is a rough characterization of the subgoals of work on effector control:

SUBGOAL E1. Monitoring Real-World Execution of Problem Solutions: The special touchstone of effector control research is that a problem is never "solved" until the real, physical world has been altered in a fashion that satisfies the task specification (in contrast to other problem solving programs whose responsibility ends with the symbolic presentation of a good solution). Thus, an effector control program should ideally be prepared to deal with any eventuality that affects the execution of a theoretically correct solution, be it initial misinformation, accidental dynamic effects, etc. These demands strongly influence all levels of program organization and strategy. Problem solving and execution monitoring must be made to interact intimately. The most advanced work of this type is probably the STRIPS-PLANEX system (for the control of a mobile vehicle) that can detect and gracefully recover from a wide variety of execution difficulties.

SUBGOAL E2. Modelling "Everyday" Worlds: To control effectors by computer requires that the computer have adequate models of everyday situations.
It has become important to model occlusion, obstruction, relative location, etc., and this has been done to the extent necessary to handle various simple manipulation and locomotion problems.

SUBGOAL E3. Planning in the Face of Uncertainty. Problem-solving programs for the control of effectors that operate on the physical world must be able to work routinely with incomplete and inaccurate information. This creates a need to do research on programs that can form contingency plans, can plan to acquire information, can decide when to execute actions in the physical world, even if the plan is incomplete, and so forth. Some research of this type has been done.

SUBGOAL E4. Low-Level Control. By low-level control is meant: programs that interact more-or-less directly with the effector mechanism, and that do not engage in global planning or problem solving. Research on this topic is producing a new and potentially important branch of classical automatic control. Although little has been formalized to date, enough experience has been acquired to permit the construction of interesting demonstrations. Among the most impressive of these is an arm control program that can drive the arm in partially constrained ways; for example, the arm can be made to turn a crank by dynamically constraining the necessary degrees of freedom.

SUBGOAL E5. Hardware Development. The manipulators available in 1966, whether based on prosthetic limbs or industrial put-and-take machinery, were generally too primitive to be of long-term value for AI research. This situation fostered a fairly significant hardware development effort that produced a useful arm-hand device. Similarly, sensing devices received some development efforts. Examples of this work are newly developed optical range finders, and special tactile, force, and torque sensors.

GOAL 3. INFORMATION PROCESSING PSYCHOLOGY: DEVELOPING DETAILED SCIENTIFIC MODELS OF HUMAN SYMBOLIC PROCESSING BEHAVIOR.

Since its inception, one focus of AI research has been the study of the symbol manipulation processes capable of explaining and predicting human behavior in a wide range of cognitive tasks. As science, the endeavor is entirely classical in intent and method, employing model construction and validation. Empirical data from well-controlled laboratory experiments is obtained from psychologists or generated by the researchers in their own laboratories. Induction from this data leads to the formulation of a symbol-processing model which purports to explain the observed phenomena. This model is given a precise form as computer programs and data structures (since the computer as a general symbol-processing device is capable of carrying out any precisely specified symbol-manipulation process; this step is entirely analogous to the model-implementation step taken by the physicist when he translates his physical model into the form of a set of differential equations). A computer is then used to generate the complex and remote consequences of the symbol-processing postulates of the model for the particular laboratory situations and stimuli being studied. These consequences and predictions are tested against empirical data; differences are noted and analyzed; the model is refined and run again; iterations continue until a satisfactory state of agreement between the model's predictions and empirical data is achieved.

From one point of view, the endeavor is to be seen as Theoretical Psychology. From another point of view, it can be seen as a systematic attempt by AI research to understand intellectual activity as it occurs in nature (i.e., in humans) so that artifacts capable of performing such intellectual activity can be constructed upon the principles discovered. The interplay between these two views has been very strong.
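The cycle described above (run the model, test its predictions against empirical data, note the differences, refine, and run again until agreement is satisfactory) can be sketched in a few lines. This is only an illustration of the loop, not a model from the research surveyed here: the one-parameter "scan time" model, the refinement rule, the tolerance, and the data values are all hypothetical.

```python
def fit_by_iteration(predict, refine, params, observed, tolerance, max_rounds=100):
    """Run the model, compare with data, refine, and repeat until agreement."""
    for round_number in range(max_rounds):
        # Each error is prediction minus observation, one per laboratory trial.
        errors = [predict(params, stimulus) - measured
                  for stimulus, measured in observed]
        if max(abs(e) for e in errors) <= tolerance:
            return params, round_number  # satisfactory agreement reached
        params = refine(params, observed, errors)
    return params, max_rounds  # gave up without reaching agreement


# Hypothetical toy model: scanning a list of n items takes rate * n milliseconds.
def predict_scan_time(rate, list_length):
    return rate * list_length


def refine_rate(rate, observed, errors):
    # Shift the rate by the mean per-item error (an assumed refinement rule).
    per_item = [e / n for e, (n, _) in zip(errors, observed)]
    return rate - sum(per_item) / len(per_item)


# Made-up (list length, milliseconds) pairs, for illustration only.
OBSERVED = [(1, 40.0), (2, 80.0), (4, 160.0)]

final_rate, rounds = fit_by_iteration(
    predict_scan_time, refine_rate, 10.0, OBSERVED, tolerance=1.0)
print(final_rate, rounds)
```

In the work described in this section the "parameters" are whole programs and data structures and the refinement step is carried out by the researcher, so the sketch captures only the control flow of the method.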
Information Processing Psychologists have usually chosen their problems in areas that have been of "classical" concern to Psychology, though some of these areas have been reopened to serious investigation because of the successes of the information processing approaches.

The following are brief sketches of some subgoals of the effort in Information Processing Psychology.

SUBGOAL 3A. Functional reasoning. Analysis and modeling has been done for human behavior in solving logic problems, complex crypt-arithmetic puzzles, and chess-play problems. The models, and the predictions derived from them, are so detailed that no comparison with previous work on the psychology of problem solving is meaningful. The work is a scientific revolution, and has had a great paradigmatic and methodological impact upon Psychology. The principal innovators, Newell and Simon, have had their contributions recognized by election to the National Academy of Sciences; Simon was awarded the Distinguished Scientific Contribution Award of the American Psychological Association, more or less the "Nobel Prize" of Psychology.

SUBGOAL 3B. Rote Memory and Short-term Memory phenomena: Storage and retrieval processes for short-term memory. Rote memorization effects. Discrimination and association learning for verbal materials. These and related phenomena of verbal learning and memory have been studied intensely by experimental psychologists in this century. A few dozen solid empirical generalizations are known. A set of closely related information processing models is capable of explaining many of these (roughly speaking, 15-20 of the "classical" phenomena).

SUBGOAL 3C. Long-term associative memory: Associative retrieval from associative memory nets of several hundred to a few thousand symbols. Interaction of English sentence processing and memory.
The symbolic representation of knowledge (i.e., facts about the world) in memory. The work is currently very active, highly promising, and is causing a mini-revolution in thinking among psychologists who study memory.

SUBGOAL 3D. Pattern induction/concept formation. Induction of models of pattern regularities in strings of symbols. Induction of the "generating rule" from the exhibition of instances of the rule.

SUBGOAL 3E. Phenomena of neurosis. The behavior studied is neurotic symbol-processing behavior, viewed as processing distortions of otherwise "normal" linguistic and problem-solving processes. A highly successful model of paranoid behavior has been developed, incorporating some English language processing.

These examples are but pieces of a bigger picture, which looks something like this:

1. It is no surprise that Psychology has been strongly affected by the information processing concepts and tools of AI research since both sciences are concerned with the study of cognition. The magnitude of the impact is the big surprise. It is probably fair to say that the dominant paradigm currently structuring Experimental Psychology in this country is the information processing paradigm. Upon no other area of science has AI research had such a strong impact.

2. The scientific study of human thought has been accelerated greatly during the last fifteen years because of the AI impact. It is not much of an overstatement to say that the AI impact has revitalized the study of thinking by Psychology, making this scientific enterprise tractable, fruitful, and respectable.

VIEW OF THE FUTURE: What lies within a five year horizon?

An extrapolation of the research directions previously described into the future faces at least two problems. First, there are the usual uncertainties that loom because of unpredictable advances and wishful thinking.
Second, the imposition by ARPA of research priorities upon the course of events that would "normally" ensue will have a large effect. Thus, the question of "what should happen" is as big a question as "what will happen."

This exposition is made difficult by the fact that the structure of the field, as outlined above in terms of Goals, will show strong confluences during the future period. Any simple presentation goal-by-goal would be misleading, and was not attempted. Instead, each identifiable focus is stated and then given an extended discussion.

The main thrusts of the Artificial Intelligence community in the next five year period will be:

1. Development of applications programs that represent and use knowledge of carefully delimited portions of the real-world for high-performance problem solving, hypothesis induction, and signal data interpretation.

The next period is likely to be a period of consolidation of AI's previous gains into meaningful real-world applications. High levels of competence in the performance of difficult tasks will be the hallmark. In addition to growing attitudes toward becoming more relevant, the AI community's current major interest in knowledge structuring and use will naturally lead it to bodies of real-world knowledge that are rich in structure and challenges. An extrapolation indicates applications to domains in science (much as the DENDRAL and MATHLAB programs were developed); and in medicine (current activity includes programs that deal with Infectious Diseases and with Glaucoma); perhaps more routine aspects of architecture (e.g., space layout and design); perhaps design in electronics (e.g., layout of IC and PC electronics, actual circuit design to functional specs); management science applications (e.g., logistics management and control, crew scheduling for aircraft fleets). The most significant application will be to computer science itself, namely the automation of many programming functions (to be discussed later).
Application to some of the less routine aspects of office document processing is a likely event (discussed later). With appropriate stimulus from ARPA, or other service agencies, these application priorities could be shifted toward defense problems, particularly those related to signal processing (e.g., application to seismic or sonar signal interpretation). In such applications, interpretation of what the signal means is made in terms of knowledge about the signal-generating source and the environment in which the signal occurs.

All of these applications will be characterized by careful choice of domain, careful delimiting of the extent of knowledge necessary to do the job, and close coupling with human experts to gain the knowledge necessary. None of these programs will be "general problem solvers" of the old genre. Characteristic of some of these applications will be on-line interaction with human experts, not only to "tune" the knowledge used by the program, but also to intervene in decisions for which human expertise dominates that of the program, or where the relevant knowledge has not been made explicit and formalized for computer processing.

2. The development, in particular, of that area of application involving the synthesis of computer programs (the so-called "automatic programming" problem).

The particular application of AI techniques to the task of synthesizing computer programs from imprecise and non-procedural descriptions of what a user wants a computer to do for him is the AI problem area whose time has come. This area will be the subject of a separate and detailed program plan. It is an AI application of tremendous economic and industrial importance, since computer programming is today a major bottleneck in the application of computers to technological and business problems.
What is worse, virtually no advances of substantial impact upon this problem have been made in the last decade in other areas of computer science (with the possible exception of the interactive editing, debugging, and running of programs).

The automatic programming problem is, furthermore, the quintessential problem that fits the WHAT-TO-HOW characterization of the nature of the science of Artificial Intelligence. It is the meeting ground of many of the tributaries of AI research: problem solving, theorem proving, heuristic knowledge and search, understanding of English (perhaps even speech), and advanced systems work. It is an ideal problem from the viewpoint of knowledge-based systems--the main line of current AI research. The essential activity in building such systems is the extraction and formalization of knowledge of the specific task domain. In the art of programming, computer scientists are their own best experts, and for years have been engaged in formalizing what is known about programming, mathematically and in other forms. Following this line of reasoning, the programming task that may be best suited is systems programming. An example of a specific systems programming task that may be accomplishable within the period is: development of an automatic programming system that will produce operating system code for a minicomputer like the PDP11/45, in response to functional specifications for instrument control and data-handling, where the specs are given in functional terms by a scientist putting together the instrument-computer package, not his (until now inevitable) programmer.

3. The extension of current ideas about the processing and understanding of English to more extensive domains of discourse and with greater flexibility, to the point of practical front-end processors for large applications programs.
In the coming period, programs for understanding English in limited "universes of discourse" will achieve practicality, and will be made available as the linguistic interaction vehicle in some of the larger AI applications programs, e.g., the automatic programming systems mentioned above. Since these applications programs will be domain-limited anyway, it will not be an extraordinarily difficult task to construct for them front-end processors that understand English in that domain.

Since currently the field has only "demonstration programs" that exhibit (limited) understanding of English, much more research will be undertaken in these directions: examining how well current techniques extrapolate to broader domains of knowledge; developing techniques for establishing context of an interaction and maintaining that context throughout the conversation; and extending methods for drawing inferences from the continually updated context. Research on semantic theory, previously mentioned in connection with representation of knowledge, will be applied to specific problems of linguistic interaction involving actors, actions, objects, and common-sense knowledge. The area of language understanding is so rich in possibilities and implications that it is not unreasonable to consider developing a separate program plan for it within the next two years.

4. Initial exploration of office-work tasks as an area of development and application; the careful choosing and shaping of specific tasks in this enormous arena of human endeavor; and some limited applications progress on these tasks.

The AI research community has been searching for problem domains of significance to science, technology, or industry that would provide an integrating theme for the various subareas of AI work.
These subareas have a considerable coherence of concepts and techniques, but the centripetal force of a real-world theme is necessary to make this coherence a practical reality. Production assembly by combinations of vision, manipulation and problem solving programs is an attempt to establish such a theme. Increasingly the feeling is growing in the AI community that the development of "intelligent assistant" programs for ordinary office work is a useful and important focus. There are two reasons for this. First, much of current AI research fits the task area well (e.g., semantic-net-memory structures, question answering programs, natural language understanding, "intelligent assistant" interaction programs, etc.). Second, the explosion of use of the ARPA network for "office work" tasks quite apart from computation (uses such as message processing, message and document filing, information retrieval from large data bases, composing and editing of documents, etc.) provides an excellent medium in which to do the work.

The AI community, perhaps with a push from ARPA, has the capability to do significant work on the office automation problem in the next period. A carefully thought-through program plan will probably be the first output of the field in this area (it should be organized and completed within the next two years), followed by initial exploratory ventures along the lines laid down in the plan.

Again, as with all the knowledge-based systems of this decade, the specific tasks worked upon will of necessity be carefully delimited. The general "intelligent office assistant" is well beyond the horizon, but specific assistant-programs for handling some of the office-work flow of information on the ARPA network can be realized within five years.

5. Intensive developmental work on the speech-understanding problem.
6. Expansion of computer vision research to: knowledge-based program organizations; development of a repertoire of low-level perceptual operators for color, range and texture, and exploitation of these modalities; first practical applications of scene analysis to selected tasks in industrial and biomedical settings; and use of interactive scene analysis for both research and application purposes.

Scene analysis programs consist of a combination of sensing-and-measuring primitive perceptual operators (like line-finders) and higher-level knowledge-based procedures (like line-proposers). Because of general awareness of the limitations of current primitive operators (at least as they are applied to monochrome pictures), the research will place increased emphasis on the acquisition and low-level analysis of color and range data. Higher level procedures will use knowledge of: three-dimensional properties of objects other than polyhedral objects; perceptual properties of objects; many varieties of contextual constraints among objects; and properties of the primitive operators (like computational cost, reliability, and domain of applicability).

Practical applications will probably focus on industrial tasks like work-piece identification and location, inspection, and manipulator control. The scene analysis research issues in these applications may turn out to be pedestrian, but concerns about cost, reliability, and reprogrammability will become prominent. Biomedical scene analysis problems will continue to stimulate research; application to medical mass-screening tasks may occur.

Interactive scene analysis will be an important focus. In research settings, interactive scene analysis will be used to construct large scene-analysis systems through the incremental accumulation of knowledge; in application settings it will be used to achieve flexible scene analysis systems that can be easily "re-programmed" by users who are not computer scientists.
7. Expansion of arm-hand effector technology and associated program control, with some practical applications of simple forms of this technology in industrial settings.

There will be considerable activity in the transfer of ARPA-initiated work on effector control to industrial settings. Hardware realizations of a rich variety of mechanical effectors, with their tactile, force, and torque sensors, will appear. Visual feedback in controlling effectors will be a feature of many of the applications.

Basic research on the hardware and software technology of effector control will continue, if support from ARPA or other agencies is forthcoming. More broadly based research on effector control is likely to be stimulated by the appearance of relatively inexpensive experimental hardware. Researchers who are currently unable to develop one-of-a-kind devices because of their cost will enter the field.

8. Expanded basic research on acquisition, deployment and representation of knowledge to support knowledge-based systems development.

Though the main thrust of AI research is in the direction of knowledge-based programs, the fundamental research support for this thrust is currently thin. This is a critical "bottleneck" area of the science, since (as was pointed out earlier) it is inconceivable that the AI field will proceed from one knowledge-based program to the next painstakingly custom-crafting the knowledge/expertise necessary for high levels of performance by the programs. In the next period, the following kinds of fundamental explorations must be pursued and strongly encouraged:

a. Additional case-study programs of hypothesis discovery and theory formation (i.e., induction programs) in domains of knowledge that are reasonably rich and complex.
It is essential for the science to see some more examples that discover regularities in empirical data, and generalize over these to form sets of rules that can explain the data and predict future states. It is likely that only after more case-studies are available will AI researchers be able to organize, unify and refine their ideas concerning computer-assisted induction of knowledge.

b. Development of interactive interrogative techniques, coupling a program to a human expert, by means of which the program systematically elicits from the expert particular facts, useful heuristics, and generalizations (or models) in the domain of the human's expertise. Again, specific case-studies are desirable. Their development need not await the arrival of English language understanding programs to facilitate the interaction and interrogation. (Stylized languages designed for the specific case-study domains will serve for now.)

c. Exploration of a variety of methods for bringing together disparate bodies of knowledge held by a program to assist in the solution of specific problems that the program is called upon to solve. The nature of this problem was discussed earlier under Goal 2A. If there are to be a number of Experts (i.e., specialized knowledge bases) interacting in the solution of a problem, how should their interaction be arranged? Is there an Executive Program "in charge" of sequencing the activity of the Experts? If so, what is the nature of the Executive Program's knowledge about each Expert, and the appropriateness of calling that Expert to assist at a specific point in the process? Should the Experts be relatively independent, each with its own situation-recognizer to trigger its activity?
These particular questions are posed here not in an effort to characterize the problem completely (or even adequately), but to give the flavor of the experimental inquiry that needs to be pursued in the coming period - a period in which major AI programming efforts will be directed toward knowledge-based systems with multiple sources of knowledge.

d. Theoretical and experimental studies of representation of knowledge. This basic and difficult problem is not one that is likely to have a "solution" in a five year period. Theoretical studies will continue to search for a logical calculus in terms of which to formalize and store knowledge in a fairly "natural" way and for logical processors that will [...]lism. Experimental studies will attempt to deal with the usual nonhomogeneity of representation among different bodies of knowledge directly, by programming translations of representations from one "natural" representation to another as necessary in those situations requiring communication between Experts for joint problem-solving.

9. Continuing basic research on various mathematical-logical problems such as formal models for heuristic search, theorem proving methods, and mathematical theory of computation.

Because heuristic search has been a central theme of AI problem solving research, it is likely that attempts at mathematical formulation and analysis of heuristic search methods will continue. No existing research thrusts indicate that this work should have high priority at this time. However, the situation is unstable in the sense that a few key results (e.g., new theorems or, more likely, new formulations of heuristic search) might cause a rush of activity along lines of formal analysis.

A similar situation attends theorem-proving research. There are currently no critical ideas acting as a forcing function, but nonetheless the problem appears to some scientists to be central for progress in the long run.
In their view, to state that a computer can be used as a "symbolic inference engine" is equivalent to saying that it is a "logic engine"; and what makes a "logic engine" turn over is a theorem prover over the domain of some logical calculus. The search for appropriate logical calculi and associated theorem provers will therefore continue.

The work in mathematical theory of computation has been peripheral to the AI mainstream, but recently has been gaining momentum and importance, and will enter the mainstream as basic research for automatic programming efforts. To write programs capable of synthesizing programs obviously requires a thorough understanding of the nature of programs. One kind of understanding is gained by formal description and mathematical analysis (the kind of understanding we take so much for granted in some physical sciences and engineering). To the extent that useful formal descriptions of how programs are put together and what programs do can be discovered, and to the extent that powerful theorems can be proved within the formalism, the work on mathematical theory of computation could aid significantly in the practical work of constructing automatic program synthesizers and verifiers. Thus, there are noteworthy "breakthrough" possibilities in this area. A prediction of the most likely course of events in these tasks of formal analysis is that they will be low-key, low cost, high risk/high payoff.

10. Continuing research on modeling of human cognitive processes using information processing techniques.

At the interface between AI and the psychology of human perception and thought, the research tempo has been increasing for some time. In the coming period it is likely that new methodology, new conceptual insights, and new models will have a continuing dramatic impact on Psychology.
The feedback to on-going AI research will continue to be important, particularly in the areas of perception and memory. The principal developments are likely to be these:

a. Methodological: analysis by program of the thinking-aloud protocols of humans solving complex problems (i.e., "data reduction" that requires some language understanding and complex inductive inference), resulting in a speed-up in this critical empirical procedure of perhaps a factor of 100. A typical complete protocol analysis of human data in a puzzle-solving task currently takes, without computer assistance, 1990 hours.

b. Short-term memory. The processes of human short-term memory will be so well modeled and understood as a result of research in this period that the topic will cease to be of major theoretical interest to psychologists.

c. Long-term memory. A very good model of human long-term associative memory will be developed. The program which realizes this model will be given a great deal of "garden variety" knowledge of the everyday world, as the basis for empirical testing. Such a model will undoubtedly prove to be an important subsystem in larger programs that attempt language understanding in contexts involving common-sense knowledge. Only the beginnings of such a memory model exist today.

d. Visual perception. The most important impact of AI on Psychology in the coming period may be the initial formulation of an information processing theory of human visual perception of common 3-D forms, along the lines of the visual processing concepts and operators developed by AI vision research. AI vision research stands on the threshold of Psychology awaiting an intellectual push like the one given to problem solving in the late Fifties. If the push is made, and is successful, it will noticeably dent the theory of visual perception in five years and totally capture it within ten years.
Appendix II

AI HANDBOOK OUTLINE

NOTE: The following material is a tentative outline of a handbook on artificial intelligence planned for publication. It is not to be cited or quoted out of the context of this report without the express permission of Professor E. A. Feigenbaum of Stanford University.

This handbook is intended for two kinds of audience: computer science students interested in learning more about artificial intelligence, and engineers in search of techniques and ideas that might prove useful in applications programs. Articles in the first seven sections are expected to appear in the first volume to be published in preliminary form by September 1977. The remaining articles are expected to appear in the second volume to be published in preliminary form by June 1978.

The following is a brief checklist that was used to guide the computer science students engaged in writing articles for the handbook. It is, of course, only a suggested list.

i) Start with 1-2 paragraphs on the central idea or concept of the article. Answer the question "What is the key idea?"
ii) Give a brief history of the invention of the idea, and its use in A.I.
iii) Give a more detailed technical description of the idea, its implementations in the past, and the results of any experiments with it. Try to answer the question "How to do it?"
iv) Make tentative conclusions about the utility and limitations of the idea if appropriate.
v) Give a list of suitable references.
vi) Give a small set of pointers to related concepts (general/overview articles, specific applications, etc.)
vii) When referring in the text of an article to a term which is the subject of another handbook article, surround the term by +'s; e.g. +Production Systems+.

AI Handbook Articles

I. INTRODUCTION
   A. Philosophy
   B. Relationship to Society
   C. History
   D. Conferences and Publications
II. HEURISTIC SEARCH
   A. Heuristic Search Overview
   B. Search Spaces
      0. Overview
      1. State-space representation
      2. State-space search
      3. Problem-reduction representation
      4. AND-OR trees and graphs
   C. "Blind" Search Strategies
      1. Overview
      2. Breadth-first searching
      3. Depth-first searching
      4. Bi-directional searching
      5. Minimaxing
      6. Alpha-Beta searching
   D. Using Heuristics to Improve the Search
      1. Overview
      2. Best-first searching
      3. Hill climbing
      4. Means-ends analysis
      5. Hierarchical search, planning in abstract spaces
      6. Branch and bound searching
      7. Band-width searching
   E. Programs employing (based on) heuristic search
      1. Overview
      2. Historically important problem solvers
         a) GPS
         b) Strips
         c) Gelernter's Geom. Program

III. AI Languages
   A. Early list-processing languages
   B. Language/system features
      0. Overview of current LP languages
      1. Control structures
      2. Data Structures (lists, associations,
      3. Pattern Matching in AI languages
      4. Deductive mechanisms
   C. Current languages/systems
      1. LISP, the basic idea
      2. INTERLISP
      3. QLISP (mention QA)
      4. SAIL/LEAP
      5. PLANNER
      6. CONNIVER
      7. SLIP
      8. POP-2
      9. SNOBOL
      10. QA3/PROLOGUE

IV. Representation of Knowledge
   A. Overviews
      1. Survey of representation techniques
      2. Issues and problems in representation theory
   B. Representation Schemes
      1. Predicate calculus
      2. Semantic nets -- Quillian, Hendrix, LNR
      3. Production rules
      4. MERLIN
      5. Procedures (SHRDLU, actors, demons)
      6. Frames
      7. Componential analysis
      8. Scripts
      9. KRL
      10. Multiple Knowledge sources - Blackboard
      11. Query languages
      12. FOL

V. SPEECH UNDERSTANDING SYSTEMS
   A. Overview (include a mention of ac. proc.)
   B. Integration of Multiple Sources of Knowledge
   C. The ARPA speech systems
      1. HEARSAY I
      2. HEARSAY II
      3. SPEECHLIS
      4. SDC-SRI System (VDMS)
      5. DRAGON

VI. Natural Language
   A. Overview - History & Issues
   B. Representation of Meaning
   C. Grammars and Parsing
      1. Review of formal grammars
      2. Extended grammars
         a. Transformational grammars
         b. Systemic grammars
         c. Case Grammars
      3. Parsing techniques
         a. Overview of parsing techniques
         b. Augmented transition nets, Woods
         c. CHARTS - GSP
   D. Text Generating systems
   E. Machine Translation
      1. Overview & history
      2. Wilks' machine translation work
   F. Famous Natural Language systems
      1. Early NL systems (SAD-SAM through ELIZA)
      2. PARRY
      3. MARGIE
      4. LUNAR
      5. SHRDLU, Winograd

VII. Applications-oriented AI research (overview)
   A. Chemistry
      1. Mass spectrometry - DENDRAL
      2. Organic Synthesis - overview
   B. Medicine
      1. MYCIN
      2. Others
   C. Psychology and Psychiatry
      1. Protocol Analysis (Waterman and Newell)
   D. Math systems
      1. REDUCE
      2. MACSYMA (mention SAINT)
   E. Business and Management Science Applications
      1. Assembly line/ power distrib.
   F. Miscellaneous
      1. LUNAR
      2. Education - SCHOLAR
      3. SOPHIE
      4. SRI computer-based consultation
      5. RAND--RITA production rule system
      6. Randevous - Query languages

VIII. AUTOMATIC PROGRAMMING
   A. Overview
   B. Program Specification Techniques
   C. Program Synthesis techniques
      0. Overview
      1. Traces
      2. Examples
      3. Problem solving applications to AP
         a. Sussman's Hacker
         b. Program Synthesis by Theorem Proving
      4. Codification of Programming Knowledge
      5. Integrated AP Systems
   D. Program optimization techniques
   E. Programmer's aids
   F. Program verification

IX. THEOREM PROVING
   A. Overview
   B. Resolution Theorem Proving
      1. Basic resolution method
      2. Syntactic ordering strategies
      3. Semantic & syntactic refinement
   C. Non-resolution theorem proving
      0. Overview
      1. Natural deduction
      2. Boyer-Moore
      3. LCF
   D. Uses of theorem proving
      1. Use in question answering
      2. Use in problem solving
      3. Theorem Proving Languages
      4. Man-machine theorem proving
   E. Predicate Calculus
   F. Proof checkers

X. Human Information Processing -- Psychology
   A. Perception
   B. Memory and Learning
      1. Basic structures and processes in IPP
      2. Memory Models
         a. Semantic net memory models
         b. HAM (Anderson & Bower)
         c. EPAM
         d. Productions (HPS)
         e. Conceptual Dependency
   C. Psycholinguistics
   D. Human Problem Solving
      0. Overview
      1. PBG's
      2. Human chess problem solving
   E. Behavioral Modeling
      1. Belief Systems
      2. Conversational Postulates (Grice, TW)
      3. PARRY

XI. VISION
   A. Overview
   B. Polyhedral or Blocks World Vision
      1. Overview
      2. Guzman
      3. Falk
      4. Waltz
   C. Scene Analysis
      1. Overview
      2. Template Matching
      3. Edge Detection
      4. Homogeneous Coordinates
      5. Line Description
      6. Noise Removal
      7. Shape Description
      8. Region Growing (Yakamovsky, Olander)
      9. Contour Following
      10. Spatial Filtering
      11. Front End Particulars
      12. Syntactic Methods
      13. Descriptive Methods
   D. Robot and Industrial Vision Systems
      1. Overview and State of the Art
      2. Hardware
   E. Pattern Recognition
      1. Overview
      2. Statistical Methods and Applications
      3. Descriptive Methods and Applications
   F. Miscellaneous
      1. Multisensory Images
      2. Perceptrons

XII. ROBOTICS
   A. Overview
   B. Robot Planning and Problem Solving
   C. Arms
   D. Present Day Industrial Robots
   E. Robotics Programming Languages

XIII. Learning and Inductive Inference
   A. Overview
   B. Samuel Checker program
   C. Winston -- concept formation
   D. Pattern extrapolation problems--Simon,
   E. Overview of Induction
   F. AQVAL (Michalski at U.Ill)
   G. Parameter adjustment of linear functions
   H. Rote learning
   I. D.A. Waterman's machine learning of heuristics
   J. Learning by debugging
   K. Learning by parameter Adaptation
   L. Signature & move phase tables

XIV. Reasoning and Planning
   A. Reasoning by analogy
      1. Overview
      2. ZORBA
   B. Planning
      1. NOAH
      2. ABSTRIPS

Appendix III

SUMMARY OF MAINSAIL LANGUAGE FEATURES

MAINSAIL LANGUAGE FEATURES

Clark R. Wilcox
Stanford University

Portable ALGOL-like language with dynamic memory support

MAINSAIL is an ALGOL-like language with dynamic memory support for strings, arrays, records, modules and files. The driving force behind its design is that it provide for the development of portable software. At the same time, low-level features allow the programmer to deal with the underlying representation of data aggregates. These low-level features have made it possible for most of the runtime system to be written as MAINSAIL modules.

Intended applications

MAINSAIL is not oriented toward any particular application. The flexible use of memory makes it suitable for tasks with memory requirements which are difficult to predict prior to execution, as is often the case with knowledge representation. The string capabilities facilitate word processing applications such as compilers, text editors and document preparation, and "friendly" interactive programming. These same facilities require runtime support, so that a MAINSAIL program is not a stand-alone body of code, and thus may not be appropriate for some primitive system utilities.

Portability

A primary goal is that compatible implementations be provided on a variety of computer systems. Programs which are written for portability should be able to execute on any of the implementations with the same effect. Such programs must adhere to reasonable constraints with regard to data and memory ranges, as described in the language manual. Programs which violate these constraints are not considered portable, and thus may behave differently on different implementations.

This design for portability raises a number of questions with regard to how well MAINSAIL will fit any particular machine. It is too early to provide a conclusive answer to such concerns, though it appears that many machines will efficiently support MAINSAIL implementations.
Modularity

In addition to the more obvious effects the machine-independent design has on data types and operations, it also necessitates a model of runtime interactions which can be supported on a broad range of computers. In particular, MAINSAIL must be able to execute in a limited address space, which means that programs must be broken into pieces (modules) which need be in memory only when executing. The inability to characterize linkage and overlay systems in a machine-independent manner has forced MAINSAIL to take over these functions, and thus assume duties often considered part of the operating system.

A MAINSAIL program consists of an open-ended collection of modules, i.e., the programmer need not specify what modules make up a program. The modules may originate from many files at execution, as contrasted to the common approach of having a single "save file" or "load module" which may contain an overlay structure. The modules are compiled separately and assembled into a form which does not require linkage prior to execution. MAINSAIL resolves all inter-module references at runtime.

Modules are automatically brought into memory as needed. If there is insufficient room in memory for an incoming module, MAINSAIL automatically swaps out one or more resident modules to make room. This swapping could involve i/o to an external device or memory mapping. Modules are position-independent, i.e., they do not contain references to fixed memory locations. Thus they may be moved about during execution, and need not be swapped into the same memory locations from which they were swapped out. This generalization of the traditional overlay structure will make possible the implementation of sizeable programs in a limited address space, while at the same time utilizing the minimum possible memory on larger systems.
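The demand-loading behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MAINSAIL's actual mechanism: the text does not say which resident modules are chosen for swapping, so the least-recently-used policy here is an assumption, and the module names, sizes and the `ModuleStore` class are hypothetical.

```python
from collections import OrderedDict


class ModuleStore:
    """Toy model of on-demand module loading in a limited address space."""

    def __init__(self, capacity, sizes):
        self.capacity = capacity        # memory available for modules (hypothetical units)
        self.sizes = dict(sizes)        # module name -> size
        self.resident = OrderedDict()   # resident modules, least recently used first
        self.swap_ins = 0               # how many times a module had to be brought in

    def used(self):
        return sum(self.resident.values())

    def call(self, name):
        """Ensure `name` is resident, swapping other modules out if necessary."""
        if name in self.resident:
            self.resident.move_to_end(name)   # already in memory: just mark as recently used
            return
        size = self.sizes[name]
        if size > self.capacity:
            raise MemoryError(f"module {name!r} can never fit")
        # Assumed policy: swap out least recently used modules until there is room.
        while self.used() + size > self.capacity:
            self.resident.popitem(last=False)
        self.resident[name] = size
        self.swap_ins += 1


store = ModuleStore(capacity=100, sizes={"parser": 60, "editor": 50, "io": 30})
for name in ["parser", "io", "parser", "editor", "io"]:
    store.call(name)
print(list(store.resident), store.swap_ins)
```

Because the caller never names a load address, the sketch also reflects the position-independence point: a module that is swapped back in simply occupies whatever room is available.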
Range of data types

In order to allow efficient operation on machines with a small word size, yet access to large values when necessary, MAINSAIL offers both short and "long" data types: integer, long integer, real, long real, bits and long bits. In practice the long forms are used much less frequently than the short forms, and thus can be simulated if necessary with no major degradation in efficiency. These data ranges have been chosen to fit the range of machines for which MAINSAIL is intended.

Strings

A MAINSAIL string is a variable length sequence of characters. The programmer does not need to specify a maximum length for a string as is common in many languages. Instead, MAINSAIL keeps track of the current number of characters in a string and automatically handles storage allocation.

Most existing general-purpose languages have omitted a full implementation of strings, apparently under the assumption that they could not be efficiently implemented, and were dispensable. However, the hardware design trend is toward microprogrammed instruction sets which support string operations, in view of the increasing acceptance of computers for word-processing.

Classes, records and pointers

MAINSAIL employs a general notion of "class" as a collection of data and procedure fields. Classes serve two purposes: they specify the interfaces through which modules communicate with one another; and they are used as templates for the creation of and access to records.

A record is a dynamically allocated memory area which contains data corresponding to the fields of the class to which it belongs. The fields of a record are accessed by means of a pointer to the record, combined with the name of the field. The pointer must have been associated with the record's class when it was declared.
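The relationship between a class, its records, and a pointer declared with that class can be illustrated with a small Python sketch. The names `RecordClass`, `Record` and `Pointer` are hypothetical, and the checks are done at run time purely for illustration; nothing here is taken from the MAINSAIL manual.

```python
class RecordClass:
    """A class in the sense above: a named template listing field names."""

    def __init__(self, name, fields):
        self.name = name
        self.fields = tuple(fields)


class Record:
    """A dynamically allocated area with one slot per field of its class."""

    def __init__(self, record_class):
        self.record_class = record_class
        self.values = {field: None for field in record_class.fields}


class Pointer:
    """A pointer associated with one class when it is declared."""

    def __init__(self, record_class):
        self.record_class = record_class
        self.record = None                      # null until assigned

    def assign(self, record):
        if record.record_class is not self.record_class:
            raise TypeError("pointer was declared with a different class")
        self.record = record

    def _check(self, field):
        if self.record is None:
            raise ValueError("null pointer")
        if field not in self.record_class.fields:
            raise AttributeError(f"class {self.record_class.name!r} has no field {field!r}")

    def get(self, field):
        self._check(field)
        return self.record.values[field]

    def set(self, field, value):
        self._check(field)
        self.record.values[field] = value


point = RecordClass("point", ["x", "y"])
p = Pointer(point)             # pointer declared with class "point"
p.assign(Record(point))        # allocate a record and point at it
p.set("x", 3)
p.set("y", 4)
print(p.get("x"), p.get("y"))
```

The sketch only covers the single-class case described so far; it makes no attempt to model any relationship between different classes.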
The notion of "prefix class" was introduced to provide for a hierarchy of classes. A class which is declared with a prefix class is conceptually made a member of the prefix class, and inherits the fields of the prefix class as its initial fields. For example, the concept "doubly-linked list" may be represented as a class with two pointer fields, say "left" and "right". Any other class will automatically inherit these two fields if it is defined as a doubly-linked-list class.

The language contains rules which govern the use of pointers according to the relationships between classes and prefix classes. MAINSAIL provides for secure use of pointers in the majority of cases, but allows insecure operations if desired.

Arrays

MAINSAIL's implementation of arrays is quite flexible in that it allows the programmer full control over the creation and disposal of arrays. This is to be contrasted with classical ALGOL, where array allocation is tied to block structure. An array is actually a pointer to a record, and thus is allowed many of the same constructs provided for pointers, such as assignment, equality comparison, and parameter passing. An array may be a field of a class, so that any number of records may be allocated which contain array fields. This capability is particularly useful in image processing, where flexible array allocation can significantly simplify program logic.

Procedures

Procedures play a major role in MAINSAIL. Procedures may be typed for use in expressions. There are three simple parameter passing mechanisms: USES passes the value; PRODUCES passes a value back to the caller; and MODIFIES passes and returns a value. Optional arguments, repeatable arguments, and generic procedures provide useful syntactic constructs. Any procedure may be invoked recursively.
Other procedure characteristics are COMPILETIME (if all arguments are constants, the procedure is evaluated during compilation), INLINE (produces "in-line" code), and CODED (supports assembly language coding).

Embedded assembly language

A number of facilities support the use of assembly language within a MAINSAIL program: CODED procedures, the Code statement, and the various forms of encoding variable offsets. Of course assembly language cannot appear within a machine-independent program, but nevertheless there are many instances when the target machine is known. The MAINSAIL interface to each operating system makes extensive use of the assembly language facilities.

Compiletime support

Most present-day compilers were designed to work in a sequential access mode, and suffer from the resulting limitations. The MAINSAIL compiler was designed with the understanding that the source files would be on random-access devices, so that it need not progress through the file in a strictly linear fashion. Any number of nested input files are allowed; in fact the same file may be scanned several times during compilation (contrast this with a compiler designed for input from punched card decks).

Compilation involves interaction with the user in that the programmer can put messages in a source file which are displayed during compilation. The user can affect the course of the compilation by specifying the names of files to be compiled as requested by directives within the file being compiled, and by defining values which govern the scanning of the source text.

The compiler has the ability to quickly search through a file for the text to be compiled as specified either by earlier source text, or interactively by the user.
This allows a single file to be made a repository of fragments of source text needed during many different compilations, and quickly searched during a particular compilation.

Conditional compilation allows an arbitrarily complicated expression (ultimately made up of constant operands) to be evaluated by the compiler to determine whether a particular segment of the source file is to be ignored. In general, the compiler will evaluate all expressions involving only constant operands (of type boolean, (long) integer, (long) bits, and string) and compiletime procedures. These facilities are quite important when building a large parameterized system.

A save and restore facility allows the current state of the symbol table to be saved. It may be restored during a later compilation to avoid recompiling unchanged text. This is particularly useful for the development of a collection of modules all of which utilize one or more common "header" files.

A comprehensive macro facility provides for the definition of constants, arbitrary text, and arbitrary text with parameters. Many commonly used constants are predefined, especially as needed by the system procedures to simplify passing of bits parameters consisting of predefined "flags".

File system

A simple yet powerful file system has been designed which, like all features of MAINSAIL, is guaranteed on every implementation. When a file is opened for use, the program specifies whether it contains text or data (binary), and whether access is sequential or random.

A fundamental assumption is the ability to communicate with a controlling terminal, called the tty ("teletype"). For example, error messages are output to tty, and a response is expected.

Appendix IV

MICROPROGRAMMED MAINSAIL PLANS

Plans for a Microprogrammed Implementation of MAINSAIL

Clark R. Wilcox
Stanford University

In this appendix we shall discuss our plans for a microprogrammed implementation of MAINSAIL. The goal of this research is to determine the feasibility of distributing a cost-effective integrated hardware-software programming environment. A computer which operates under the control of a microprogrammable control store offers a new approach to efficient program execution which we summarize below. We feel this approach could offer the means of developing reasonably-priced computing resources with the capability of executing programs which are too demanding for present mini-computers. It appears that such machines may be widely available within a few years. We propose to purchase the necessary hardware to enable us to develop a microprogrammed MAINSAIL implementation.

The emulation approach to high-level language implementation

Traditional implementations of high-level languages involve translation to the fixed machine languages of the target machines. Such machine languages have not been designed for the efficient representation of high-level languages, with the result that an excessive number of overhead instructions are required to map the high-level language into its directly-executable machine code "surrogate". With the advent of microprogrammable computers with writable control stores, a different approach appears to have great promise for the efficient execution of high-level languages.

A micro-coded computer executes the instructions in main memory under control of the micro program. Thus the machine code may be viewed as data which is interpreted, or emulated, by the micro program, rather than as direct signals to the hardware. The micro program is written in a more primitive machine code called micro code, which (usually) directly controls the hardware.
Most micro-coded computers have been designed for the emulation of a particular machine code, and thus the micro-code is simply a means of reducing the complexity of the hardware while perhaps providing a "higher-level" machine code. The micro-code is placed into a high-speed memory (relative to main memory), so that many micro instructions can be executed in the time it takes to fetch a single instruction from main memory.

The same technique of interpreting a particular machine code with a micro program can be broadened to the ability to interpret an arbitrary machine code. Such a micro computer is called a "soft" machine, or "universal host", since it is not oriented toward any particular machine code. Instead, the language implementor chooses a suitable machine-code representation. A compiler is constructed which translates into this representation, and a micro program is written which interprets the representation. This approach is known as a "directly executable language", or DEL, since the high-level language has been translated into a form tailor-made for it. The unnecessary overhead instructions are eliminated, with a resulting decrease in the size of the program representation and increase in execution speed. There is evidence [3,4,6] that this approach can provide substantial dividends.

A MAINSAIL Directly Executable Language (DEL)

We propose to design a MAINSAIL DEL and implement it on a microprogrammable computer. The goal is to evaluate the economic and technical advantages of exporting a combined hardware-software environment for program development and distribution. In particular, we want to orient MAINSAIL's design and implementation toward such an emulation approach and compare the resulting "MAINSAIL machine" with conventional implementations.
We are interested in determining whether a "soft" machine of this sort can be provided cheaply enough to serve as a basis for the distribution of software which presently requires expensive hardware facilities. Hardware which can be specifically tailored for high-level language execution may provide the quickest route to the economically viable distribution of programs which exceed the limits of present general-purpose mini-computers.

This work will complement the on-going implementations of MAINSAIL on conventional hardware. Thus we will be in a unique position to compare the two approaches. We expect the MAINSAIL DEL to outperform other MAINSAIL implementations in much the same way that DELtran (a DEL for FORTRAN II) outperforms FORTRAN II [3]. Initial measurements show that the DELtran representation is less than one fifth the size of the code generated by the FORTRAN-H optimizing compiler, and executes about five times faster. MAINSAIL is perhaps better suited to the emulation approach than FORTRAN because of the locality of reference provided by procedures, records and modules.

A preliminary DEL has already been designed for MAINSAIL, but further work is necessary before we can predict (or demonstrate) size and execution comparisons with standard implementations. There is much work to be done in determining the efficient representation of ALGOL-like languages for the purpose of emulation, and providing data from actual implementations.

A MAINSAIL DEL could provide facilities which are impossible to provide in an efficient manner on conventional machines. These facilities relate to the monitoring of the program during execution. Since the emulator is simply a program written in micro code, it can be made to perform any kind of execution-time checks with no need to alter the DEL. By contrast, the MAINSAIL compiler must generate different code depending on the amount of checking to be performed.
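The point that checking and monitoring can live in the emulator rather than in the compiled code can be sketched with a toy interpreter in Python. Everything here is hypothetical: the three-instruction "DEL" (`PUSH`, `LOAD`, `ADD`), the data array, and the emulator functions are invented for illustration and say nothing about the actual MAINSAIL DEL or its micro-coded emulators. The same program is run unchanged by an unchecked emulator, a checking emulator, and a profiling emulator.

```python
from collections import Counter

ARRAY = [10, 20, 30]   # hypothetical data that the toy program subscripts


def step(op, arg, stack, check):
    """Interpret one instruction of the toy representation."""
    if op == "PUSH":
        stack.append(arg)
    elif op == "ADD":
        b = stack.pop()
        a = stack.pop()
        stack.append(a + b)
    elif op == "LOAD":                       # replace subscript on stack by ARRAY[subscript]
        i = stack.pop()
        if check and not 0 <= i < len(ARRAY):
            raise IndexError(f"subscript {i} out of range")
        stack.append(ARRAY[i])
    else:
        raise ValueError(f"unknown instruction {op!r}")


def fast_emulator(program):
    """No execution-time checks."""
    stack = []
    for op, arg in program:
        step(op, arg, stack, check=False)
    return stack[-1]


def careful_emulator(program):
    """Same program, with subscript checking added by the emulator."""
    stack = []
    for op, arg in program:
        step(op, arg, stack, check=True)
    return stack[-1]


def profiling_emulator(program):
    """Same program, with an execution profile gathered by the emulator."""
    stack, counts = [], Counter()
    for op, arg in program:
        counts[op] += 1
        step(op, arg, stack, check=False)
    return stack[-1], counts


PROGRAM = [("PUSH", 1), ("LOAD", None), ("PUSH", 2), ("LOAD", None), ("ADD", None)]
BAD_PROGRAM = [("PUSH", -1), ("LOAD", None)]   # subscript error the fast emulator will not notice

print(fast_emulator(PROGRAM), careful_emulator(PROGRAM))
print(profiling_emulator(PROGRAM)[1])
```

In this sketch the unchecked emulator silently accepts the bad subscript in `BAD_PROGRAM`, while the checking emulator rejects it, and in neither case is the program representation itself altered.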
The emulator can also provide execution profiles and comprehensive debugging facilities such as instruction traps and single stepping. We expect to provide several emulators which are oriented toward particular types of execution, e.g. a "fast" emulator which maximizes execution speed, a "careful" emulator which provides comprehensive runtime checking, a "performance monitoring" emulator which gathers information concerning program execution, and a "debugging" emulator which allows interactive debugging.

Another advantage of the emulation approach is the simplification of the compiler. Since the compiler will translate MAINSAIL to its own DEL, the code generators become almost trivial. MAINSAIL operations which require many instructions on existing machines can be compactly represented with the DEL. The compiler need not worry about register optimization since there will be no registers in the DEL representation.

Since the MAINSAIL DEL is a close representation of the source code, there is no reason to "drop into assembly language" since any "sensible" program which could be written in the DEL could more easily be written in MAINSAIL.

Hardware support

To support this development, we propose the purchase of a dynamically micro-programmable machine with such supporting hardware as is necessary. This machine should be a universal host in the sense that it is not already oriented towards a particular machine code. Its software support is of little consequence since we will design our own operating system and high-level language support. We are interested in implementing sophisticated programs, and thus require a large address space (say 24 bits) and 32-bit arithmetic. We need sufficient control store, say 16K words, to support a debugging emulator and selected parts of the operating system.
The micro store must be able to quickly transfer words to and from main memory; in particular, we want to be able to quickly switch emulators. There must be facilities for interfacing to a variety of peripherals and to other computers.

There are some machines now available along these general lines (e.g. [1]), with the introduction of more imminent. Indeed, manufacturers are beginning to include user-microprogrammable features with new models of their traditional hardware, e.g. Digital Equipment Corporation’s PDP-11/50 and Data General’s Eclipse.

One such machine, EMMY, has been developed by the Stanford Emulation Laboratory, under the direction of Professor Michael Flynn of the Department of Electrical Engineering [2,5]. EMMY is a universal host machine which closely fits our needs. It is an unbiased yet efficient host for a wide range of target machine architectures. EMMY is scheduled to go into production in late 1977 by ICL of England (the emulation laboratory has been involved in the development of a prototype). We feel that this machine would suit our needs, but further evaluation is necessary.

We expect most of the development of the MAINSAIL DEL to be independent of any particular micro program representation. In particular, we are not at this time proposing to carry out any hardware design to orient the host processor towards MAINSAIL, though this approach would be reasonable if a large number of processors were to be distributed solely to support MAINSAIL execution.

References

1. Burroughs Corp., "B-1700 Systems Reference Manual," Burroughs Corp., Detroit, Michigan, 1972.
2. Flynn, M. J., Hoevel, L. W., and Neuhauser, C. J., "The Stanford Emulation Laboratory," Digital Systems Laboratory, Technical Report No. 118, Stanford University, June 1976.
3. Hoevel, L. W. and Flynn, M. J., "The Structure of Directly Executed Languages: A New Theory of Interpretive System Support," Digital Systems Laboratory, Technical Report No. 130, Stanford University, March 1977.
4. Hoevel, L. W., "DELtran Principles of Operation," Digital Systems Laboratory, Technical Note No. 108, Stanford University, March 1977.
5. Neuhauser, C. J., "An Emulation Oriented, Dynamic Microprogrammable Processor," Digital Systems Laboratory, Technical Note No. 65, Stanford University, October 1975.
6. Wilner, W., "Burroughs B-1700 Memory Utilization," AFIPS Proceedings, Vol. 41-I, FJCC, 1972, pp. 579-586.

Appendix V
AIM MANAGEMENT COMMITTEE MEMBERSHIP

The following are the membership lists of the various SUMEX-AIM management committees at the present time:

AIM EXECUTIVE COMMITTEE:

LEDERBERG, Joshua, Ph.D. (Chairman)
Department of Genetics, S331
Stanford University Medical Center
Stanford, California 94305
(415) 497-5801

AMAREL, Saul, Ph.D.
Department of Computer Science
Rutgers University
New Brunswick, New Jersey 08903
(201) 932-3546

BAKER, William R., Jr., Ph.D. (Executive Secretary)
Biotechnology Resources Program
National Institutes of Health
Building 31, Room 5B43
9000 Rockville Pike
Bethesda, Maryland 20014
(301) 496-5411

LINDBERG, Donald, M.D. (Adv Grp Member)
605 Lewis Hall
University of Missouri
Columbia, Missouri 65201
(314) 882-6966

MYERS, Jack D., M.D.
School of Medicine
Scaife Hall, 1291
University of Pittsburgh
Pittsburgh, Pennsylvania 15261
(412) 624-2649

AIM ADVISORY GROUP:

LINDBERG, Donald, M.D. (Chairman)
605 Lewis Hall
University of Missouri
Columbia, Missouri 65201
(314) 882-6966

AMAREL, Saul, Ph.D.
Department of Computer Science
Rutgers University
New Brunswick, New Jersey 08903
(201) 932-3546

BAKER, William R., Jr., Ph.D.
(Executive Secretary)
Biotechnology Resources Program
National Institutes of Health
Building 31, Room 5B43
9000 Rockville Pike
Bethesda, Maryland 20014
(301) 496-5411

BOBROW, Daniel G., Ph.D. [Term expiring]
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, California 94304
(415) 494-4438

FEIGENBAUM, Edward, Ph.D.
Department of Computer Science
Polya Hall, Room 213
Stanford University
Stanford, California 94305
(415) 497-4079

FELDMAN, Jerome, Ph.D. [Term expiring]
Department of Computer Science
University of Rochester
Rochester, New York
(716) 275-5671

LEDERBERG, Joshua, Ph.D. (Ex-officio)
Principal Investigator - SUMEX
Department of Genetics, S331
Stanford University Medical Center
Stanford, California 94305
(415) 497-5801

MILLER, George, Ph.D. [Term expiring]
The Rockefeller University
1230 York Avenue
New York, New York 10021
(212) 360-1801

MOHLER, William C., M.D.
Associate Director
Division of Computer Research and Technology
National Institutes of Health
Building 12A, Room 3033
9000 Rockville Pike
Bethesda, Maryland 20014
(301) 496-1168

MYERS, Jack D., M.D.
School of Medicine
Scaife Hall, 1291
University of Pittsburgh
Pittsburgh, Pennsylvania 15261
(412) 624-2649

REDDY, D.R., Ph.D. [Term expiring]
Department of Computer Science
Carnegie-Mellon University
Pittsburgh, Pennsylvania
(412) 621-2600, Ext. 149

SAFIR, Aran, M.D.
Department of Ophthalmology
Mount Sinai School of Medicine
City University of New York
Fifth Avenue and 100th Street
New York, New York 10029
(212) 369-4721

STANFORD COMMUNITY ADVISORY COMMITTEE:

LEDERBERG, Dr. Joshua (Chairman)
Principal Investigator - SUMEX
Department of Genetics, S331
Stanford University Medical Center
Stanford, California 94305
(415) 497-5801

COHEN, Stanley N., M.D.
Department of Clinical Pharmacology, S169
Stanford University Medical Center
Stanford, California 94305
(415) 497-5315

DJERASSI, Dr. Carl
Department of Chemistry, Stauffer I-106
Stanford University
Stanford, California 94305
(415) 497-2783

FEIGENBAUM, Dr. Edward
Serra House
Department of Computer Science
Stanford University
Stanford, California 94305
(415) 497-4878

LEVINTHAL, Dr. Elliott C.
Department of Genetics
Stanford University Medical Center
Stanford, California 94305
(415) 497-5813

Appendix VI
USER INFORMATION - GENERAL BROCHURE

Revised May 1976

Appendix VII
GUIDELINES FOR PROSPECTIVE USERS

SUMEX-AIM RESOURCE INFORMATION FOR POTENTIAL USERS

National users may gain access to the facility resources through an advisory panel for a national program in Artificial Intelligence in Medicine (AIM). The AIM Advisory Group consists of members-at-large of the AI and medical communities, facility users and the Principal Investigator of SUMEX as an ex-officio member. A representative of the National Institutes of Health-Biotechnology Resources Program (NIH-BRP) serves as Executive Secretary.

Under its enabling 5-year grant, the SUMEX-AIM computing resource is allocated to qualified users without fee. This, of course, entails a careful review of the merits and priorities of proposed applications. At the direction of the Advisory Group, expenses related to communications and transportation to allow specific users to visit the facility also may be covered.

USER QUALIFICATIONS

The SUMEX-AIM facility is a community effort, not merely a machine service. Applications for membership are judged on the basis of the following criteria:

1) The scientific interest and merit of the proposed research and its relevance to the health research missions of the NIH.
2) The congruence of research needs and goals to the AI functions of SUMEX-AIM as opposed to other computing alternatives.
3) The user’s prospective contributions and role in the community, with respect to computer science, e.g., developing and sharing new systems or applications programs, sharing use of special hardware, etc.
4) The user’s potential for substantive scientific cooperation with the community, e.g., to share expert knowledge in relevant scientific specialties.
5) The quantitative demands for specific elements of the SUMEX-AIM resource, taking account of both mean and ceiling requirements.

In many respects, this requires a different kind of information for judgment of proposals than that required for routine grant applications seeking monetary funding support. Information furnished by users also is indispensable to the SUMEX staff in conducting their planning, reporting and operational functions.

The following questionnaire encompasses the main issues concerning the Advisory Group. However, this should neither obstruct clear and imaginative presentation nor restrict format of the application. The potential user should prepare a statement in his own words using previously published material or other documents where applicable. In this respect, the questionnaire may be most useful as a checklist and reference for finding in other documentation the most cogent replies to the questions raised.

For users mounting complex and especially non-standard systems, the decision to affiliate with SUMEX may entail a heavy investment that would be at risk if the arrangement were suddenly terminated. The Advisory Group endeavors to follow a responsible and sensitive policy along these lines -- one reason for cautious deliberation; and even in the harshest contingencies, it will make every effort to facilitate graceful entry and departure of qualified users.
Conversely, it must have credible information about thoughtful plans for long-term requirements including eventual alternatives to SUMEX-AIM.

SUMEX-AIM is a research resource, not an operational vehicle for health care. Many programs are expected to be investigated, developed and demonstrated on SUMEX-AIM with spinoffs for practical implementation on other systems. In some cases, the size, scope and probable validation of clinical trials would preclude their being undertaken on SUMEX-AIM as now constituted. Please be as explicit as possible in your plans for such outcomes.

Applicants, therefore, should submit:

1) One to two-page outline of the proposal.
2) Response to questionnaire, cross-referenced to supporting documents where applicable.
3) Supporting documents.
4) List of submitted materials, cross-referenced.

We would welcome a draft (2 copies) of your submission for informal comment if you so desire. However, for formal consideration by the SUMEX-AIM Advisory Group, please submit 13 copies of the material requested above in final form.

Elliott Levinthal, Ph.D.
AI User Liaison
SUMEX-AIM Computer Project
c/o Department of Genetics, S047
Stanford University Medical Center
Stanford, California 94305
Telephone: (415) 497-5813

May, 1976

SUMEX-AIM RESOURCE QUESTIONNAIRE FOR POTENTIAL USERS

Please provide either a brief reply to the following or cite supporting documents.

A) MEDICAL AND COMPUTER SCIENCE GOALS

1) Describe the proposed research to be undertaken on the SUMEX-AIM resource.
2) How is this research presently supported? Please identify application and award statements in which the contingency of SUMEX-AIM availability is indicated. What is the current status of any application for grant support of related research by any federal agency? Please note if you have received notification of any disapproval or approval, pending funding, within the past three years.
Budgetary information should be furnished where it concerns operating costs and personnel for computing support. Please furnish any contextual information concerning previous evaluation of your research plans by other scientific review groups.
3) What is the relevance of your research to the AI approach of SUMEX-AIM as opposed to other computing alternatives?

B) COLLABORATIVE COMMUNITY BUILDING

1) Will the programs designed in your research efforts have some possible general application to problems analogous to that research?
2) What application programs already publicly available can you use in your research? Are these available on SUMEX-AIM or elsewhere?
3) What opportunities or difficulties do you anticipate with regard to making available your programs to other collaborators within a reasonable interval of publication of your work?
4) Are you interested in discussing with the SUMEX staff possible ways in which other artificial-intelligence research capabilities might interrelate with your work?
5) If approved as a user, would you advise us regarding collaborative opportunities similar to yours with other investigators in your field?

C) HARDWARE AND SOFTWARE REQUIREMENTS

1) What computer facilities are you now using in connection with your research or do you have available at your institution? In what respect do these not meet your research requirements?
2) What languages do you either use or wish to use? Will your research require the addition of major system programs or languages to the system? Will you maintain them? If you are committed to systems not now maintained at SUMEX, what effort would be required for conversion to and maintenance on the PDP-10 TENEX system? What are the merits of the alternative plan of converting your application programs to one of the already available standards?
Would the latter facilitate the objectives of Part B), Collaborative Community Building?
3) Can you estimate your requirements for CPU utilization and disk space? What time of day will your CPU utilization occur? Would it be convenient or possible for you to use the system during off-peak periods? Please indicate (as best you can) the basis for these estimates and the consequences of various levels of restriction or relaxation of access to different resources. SUMEX-AIM’s tangible resources can be measured in terms of:
   a) CPU cycles.
   b) Connect time and communications.
   c) User terminals (In special cases these may be supported by SUMEX-AIM.).
   d) Disk space.
   e) Off-line media: printer outputs, tapes (At most, limited quantities to be mailed.).
   Can you estimate your requirements? With respect to a) and b), there are loading problems during the daily cycle. Can you indicate the relative utility of prime-time (0900-1600 PST) vs. off-peak access?
4) What are your communication plans (TYMNET, ARPANET, other)? How will your communication and terminal costs be met? See following note concerning network connections to SUMEX-AIM.
5) If this is a development project, please indicate your long-term plans for software implementation in an applied context keeping in mind the research mission of SUMEX-AIM.

Our procedures are still evolving, and we welcome your suggestions about this framework for exchanging information. Needless to say, each question should be qualified a) "insofar as relevant to your proposal", and b) "to the extent of available information". Please do not force a reply to a question that seems inappropriate. We prefer that you label it as such so that it can be dealt with properly in future dialogue.
Above all, we are eager to work with potential users in any way that would help minimize bureaucratic burdens and still permit a responsible regard for our accountability both to the NIH and the public. Please do not hesitate to address the substance of these requirements in the format most applicable to you.

NETWORK CONNECTIONS TO SUMEX-AIM

TYMNET

Attached is a list of available TYMNET nodes and associated telephone numbers. The cost to users of using TYMNET is the telephone charge from user location to the nearest TYMNET node. This is available only for communication to SUMEX-AIM and not for other facilities that may be connected to TYMNET. In some cases, there are "foreign exchanges" set up by users. These may offer less expensive communication. Details of these possibilities can best be learned by calling the nearest TYMNET node. The telephone company can provide information on comparative costs of leased lines, toll charges, etc. The initial capital investment for TYMNET installation as well as login and hourly charges is provided by SUMEX-AIM. Standard usage charges on TYMNET are approximately $3/connect-hour.

SUMEX-AIM is connected to the ARPANET. Our name is SUMEX-AIM; our nickname is AIM. We support the new TELNET protocol. Our network address is decimal 56, octal 70. This provides convenient access for ARPANET Hosts and Associates and those who have accounts with ARPANET.