Privileged Communication                                Joshua LEDERBERG

BIOGRAPHICAL SKETCH - NII, H. Penny

RESEARCH SUPPORT

Grant No.: DAHC-15-73-C-0435 (Agency: ARPA)
Title of Project: Heuristic Programming Project
Current: $225,762, current year (7/76-7/77); $ --, project period (7/73-7/77); 100% effort (incl. indirect costs)
Proposed renewal: $375,000, current year (8/77-9/78); $725,000, project period (8/77-9/79); 80% effort

Grant No.: MCS 74-23461 (Agency: NSF)
Title of Project: Automation of Scientific Inference: Heuristic Computing Applied to Protein Crystallography
Funding: $75,000, current year (5/77-4/78); $150,200, project period (5/77-4/79 + 6 mos.); 20% effort (eff. 6/77) (incl. indirect costs)

RECENT PUBLICATIONS

1. Feigenbaum, E.A., Nii, H.P., et al.: HASP (Heuristic Adaptive Surveillance Program) Final Report, Vol. I-IV, Technical Report under ARPA Contract M66314-74-C-1235, Systems Control, Inc., Palo Alto, California, 1975. (Classified document)
2. Engelmore, R.A. and Nii, H.P.: A Knowledge-based System for the Interpretation of Protein X-ray Crystallographic Data. Heuristic Programming Project Memo, HPP-77-2 (also STAN-CS-77-589), January, 1977.
3. Nii, H.P. and Feigenbaum, E.A.: Knowledge-based Understanding of Signals. Proc. Workshop on Pattern-Directed Inference Systems, May, 1977.

SECTION II - PRIVILEGED COMMUNICATION

BIOGRAPHICAL SKETCH
(Give the following information for all professional personnel listed on page 3, beginning with the Principal Investigator. Use continuation pages and follow the same general format for each person.)

NAME: RINDFLEISCH, Thomas C.
TITLE: Senior Research Associate
BIRTHDATE (Mo., Day, Yr.): December 10, 1941
PLACE OF BIRTH (City, State, Country): Oshkosh, Wisconsin, U.S.A.
PRESENT NATIONALITY (If non-U.S. citizen, indicate kind of visa and expiration date): U.S. citizen

EDUCATION (Begin with baccalaureate training and include postdoctoral)
Institution and Location: Degree, Year Conferred, Scientific Field
Purdue University, Lafayette, Indiana: B.S., 1962, Physics
California Institute of Technology, Pasadena: M.S., 1965, Physics
California Institute of Technology, Pasadena: Ph.D.
Thesis to be completed; all course work and examinations completed.

HONORS: Graduated with Highest Honors, Purdue University; NSF Fellowship, Caltech; Sigma Xi
MAJOR RESEARCH INTEREST: Computer science applications in medical research; image processing and artificial intelligence
ROLE IN PROPOSED PROJECT: Facility Manager
RESEARCH SUPPORT (See instructions)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
(Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)

Department of Genetics, Stanford University School of Medicine:
1976 - present  Senior Research Associate/Director, SUMEX Computer Project
1974 - 1976     Research Associate/Director, SUMEX Computer Project
1971 - 1976     Research Associate - Mass Spectrometry, Instrumentation Research

Jet Propulsion Laboratory, California Institute of Technology, Pasadena:
1969 - 1971     Supervisor of Image Processing Development and Applications Group
1968 - 1969     Mariner Mars 1969 Cognizant Engineer for Image Processing
1962 - 1968     Engineer, design and implement image processing computer software

PUBLICATIONS (See continuation page.)

NIH 398 (FORMERLY PHS 398) Rev. 1/73

BIOGRAPHICAL SKETCH - RINDFLEISCH, Thomas C.

PUBLICATIONS

Rindfleisch, T. and Willingham, D.: A Figure of Merit Measuring Picture Resolution. JPL Technical Report 32-666, September, 1965.
Rindfleisch, T.: A Photometric Method for Deriving Lunar Topographic Information. JPL Technical Report 32-785, September, 1965.
Rindfleisch, T. and Willingham, D.: A Figure of Merit Measuring Picture Resolution. Advances in Electronics and Electron Physics, Vol. 22A, Photo-Electronic Image Devices, Academic Press, 1966.
Rindfleisch, T.: Photometric Method for Lunar Topography. Photogrammetric Engineering, March, 1966.
Rindfleisch, T.: Generalizations and Limitations of Photoclinometry. JPL Space Science Summary, Vol. III, 1967.
Rindfleisch, T.: The Digital Removal of Noise from Imagery. JPL Space Science Summary 37-62, Vol. III, 1970.
Rindfleisch, T.: Digital Image Processing for the Rectification of Television Camera Distortions. Astronomical Use of Television-Type Image Sensors. NASA Special Publication SP-256, 1971.
Rindfleisch, T., Dunne, J., Frieden, H., Stromberg, W. and Ruiz, R.: Digital Processing of the Mariner 6 and 7 Pictures. J. Geophysical Research, Vol. 76, No. 2, January, 1971.
Pereira, W.E., Summons, R.E., Reynolds, W.E., Rindfleisch, T.C. and Duffield, A.M.: The Quantitation of Beta-Aminoisobutyric Acid in Urine by Mass Fragmentography. Clinica Chimica Acta, 49, 1973.
Summons, R.E., Pereira, W.E., Reynolds, W.E., Rindfleisch, T.C. and Duffield, A.M.: Analysis of Twelve Amino Acids in Biological Fluids by Mass Fragmentography. Analytical Chemistry, Vol. 46, No. 4, April, 1974.
Pereira, W.E., Summons, R.E., Rindfleisch, T.C. and Duffield, A.M.: The Determination of Ethanol in Blood and Urine by Mass Fragmentography. Clin. Chim. Acta, 51, 1974.
Pereira, W.E., Summons, R.E., Rindfleisch, T.C., Duffield, A.M., Zeitman, B. and Lawless, J.G.: Stable Isotope Mass Fragmentography: Quantitation and Hydrogen-Deuterium Exchange Studies of Eight Murchison Meteorite Amino Acids. Geochem. et Cosmochim. Acta, 39, 163, 1975.
Dromey, R.G., Stefik, M.J., Rindfleisch, T.C. and Duffield, A.M.: Extraction of Mass Spectra Free of Background and Neighboring Component Contributions from Gas Chromatography/Mass Spectrometry Data. Analytical Chemistry, 48, 1368, 1976.
Smith, D.H., Yeager, W.J., Anderson, P.J., Fitch, W.L., Rindfleisch, T.C. and Achenbach, M.: Historical Library Search. An Approach to Quantitative Comparison of GC/MS Profiles of Complex Mixtures.
(Submitted for publication)

BIOGRAPHICAL SKETCH

NAME: SCHULZ, Rainer W.
TITLE: Computer Systems Specialist
BIRTHDATE (Mo., Day, Yr.): January 29, 1942
PLACE OF BIRTH (City, State, Country): Berlin, Germany
PRESENT NATIONALITY: U.S. citizen
SEX: Male

EDUCATION (Begin with baccalaureate training and include postdoctoral)
California State University, San Jose: B.A., 1964, Mathematics, Engineering

HONORS: Graduated Summa Cum Laude, California State University
MAJOR RESEARCH INTEREST: Computer systems design
ROLE IN PROPOSED PROJECT: System Programmer
RESEARCH SUPPORT (See instructions)
RESEARCH AND/OR PROFESSIONAL EXPERIENCE (See continuation page.)
PUBLICATIONS (none)

BIOGRAPHICAL SKETCH - SCHULZ, Rainer W.

RESEARCH AND/OR PROFESSIONAL EXPERIENCE

Work Experience:

1971 - present  Institute for Mathematical Studies in the Social Sciences (IMSSS), Stanford University: System Manager. Responsible for operations of large-scale PDP-10 timesharing system. Manager, system software. Technical evaluation responsibility of software and computer hardware. System design and systems development.
1970 - 1971     Computer Operations, Inc., Costa Mesa, California: Design of operating system for computer to be built by COI.
1969 - 1970     Berkeley Computer Corporation, Berkeley, California: Project leader of BCC timesharing software. Guided monitor and peripheral processor software design and implementation. Coded approximately 50% of basic system. Wrote some microcode for peripheral processors.
1967 - 1969     Scientific Control Corporation, Dallas, Texas: Assisted Project Genie at the University of California, Berkeley, refining XDS 940 timesharing system. Involved in design of SCC 6700 timesharing software and hardware, particularly resource allocation and memory management.
1965 - 1967     Xerox Data Systems, El Segundo, California: Diagnostic programming for I/O channels. Design of peripheral hardware simulators. Design/implementation of multi-programmed system evaluation and diagnostic test for all Sigma computers.
1964 - 1965     IBM, San Jose, California: Wrote an assembler and loader for IBM 1800 and 1130 systems. Assembler ran on a 1401. Wrote diagnostic programs for process control equipment. Assisted engineering in debugging prototype 1800 and 1130 machines.

Professional Activities:

Dates, in the order of the activities listed: 1975; 1974; 1974 - 1975; 1974 - 1975; 1973 - present; 1973 - present; 1973 - 1974; 1971 - 1976; 1971 - 1973.

Intel Corporation, Santa Clara, California: Data processing administrative consultant. System performance and hardware evaluation. System improvement proposals.
Systems Control, Inc., Palo Alto, California: Secure system design. Consultant in system design and computer system evaluation.
University of Southern California (USC-ECL, USC-ISI), Los Angeles: Consultant in system and administrative area regarding computer operations and system development.
Digital Equipment Corporation, Marlboro, Massachusetts: Consultant in system development area and marketing decisions for large-scale systems.
National Science Foundation, Washington, D.C.: Consultant in technological innovations. Evaluating proposals for technical feasibility. Reviewing highly technical projects in computer science area.
Computer Curriculum Corporation, Palo Alto, California: System consultant and software management of programming staff for small computer systems.
University of Hawaii, Honolulu: Lecturer in Computer System Design and Computer-Assisted Instruction.
Ames Research Center, Mountain View, California: Consultant in System Design and Development of timesharing systems for the ILLIAC IV Project.
Institute for the Future, Menlo Park, California: Consultant in Computer System Design for Information Retrieval Systems.

BIOGRAPHICAL SKETCH

NAME: SWEER, Andrew J.
TITLE: System Programmer
BIRTHDATE (Mo., Day, Yr.): March 12, 1945
PLACE OF BIRTH (City, State, Country): Washington, D.C., U.S.A.
PRESENT NATIONALITY: U.S. citizen
SEX: Male

EDUCATION (Begin with baccalaureate training and include postdoctoral)
University of Pittsburgh, Pennsylvania: B.S., 1965, Mathematics
University of Pittsburgh, graduate school (1965-66): no degree, Mathematics, Computer Science

HONORS:
MAJOR RESEARCH INTEREST: Operating systems
ROLE IN PROPOSED PROJECT: System Programmer
RESEARCH SUPPORT (See instructions)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE
1976 - present  Head System Programmer, SUMEX Computer Project, Department of Genetics, Stanford University
1974 - 1975     Senior Systems Designer, ILLIAC IV Project, Evans and Sutherland
1970 - 1974     Systems Analyst Supervisor, Computer Center, University of Pittsburgh
1968 - 1969     Computer Specialist, Office of Personnel Operations, Department of the Army, Headquarters, the Pentagon
1966 - 1968     Systems Programmer/Analyst, Computer Center, University of Pittsburgh

PUBLICATIONS (none)

BIOGRAPHICAL SKETCH

NAME: VELZADES, Nicholas
TITLE: R&D Engineer, Instrumentation Research Labs.
BIRTHDATE (Mo., Day, Yr.): August 25, 1932
PLACE OF BIRTH (City, State, Country): Larissa, Greece
PRESENT NATIONALITY: U.S. citizen

EDUCATION (Begin with baccalaureate training and include postdoctoral)
City College of San Francisco, California (1954-55)
University of California, Berkeley: B.S., 1958, Electrical Engineering
Stanford University: M.S., 1961, Engineering Science

HONORS:
MAJOR RESEARCH INTEREST: Electronic circuit design
ROLE IN PROPOSED PROJECT: Electronics Engineer
RESEARCH SUPPORT (See continuation page.)
RESEARCH AND/OR PROFESSIONAL EXPERIENCE

1962 - present  Electronics Engineer, Instrumentation Research Laboratories, Department of Genetics, Stanford University
1961 - 1962     Project Engineer, Fairchild Semiconductor (Instrumentation), Division of Fairchild Instrument and Camera Company, Palo Alto, Ca.
1958 - 1961     Senior Engineer, Link Division, General Precision, Inc., Palo Alto, Ca.

PUBLICATIONS (none)

BIOGRAPHICAL SKETCH - VELZADES, Nicholas

RESEARCH SUPPORT

Grant No.: RR-00512 (Agency: NIH)
Title of Project: Resource Related Research - Computers and Chemistry (DENDRAL)
Funding: $213,530, current year (5/77-4/78); $698,399, project period (5/77-4/89); 25% effort

Grant No.: GM20832 (Agency: NIH)
Title of Project: Genetics Research Project
Funding: $265,587, current year (5/77-4/78); $1,292,113, project period (5/74-4/79); 18% effort

Grant No.: NGR-05-020-004 (Agency: NASA)
Title of Project: Cytochemical Studies of Planetary Microorganisms
Funding: $137,509 (9/76-12/77); 1% effort

BIOGRAPHICAL SKETCH

NAME: WILCOX, Clark R.
TITLE: Student Research Assistant
BIRTHDATE (Mo., Day, Yr.): May 3, 1948
PLACE OF BIRTH (City, State, Country): Winston-Salem, North Carolina
PRESENT NATIONALITY: U.S. citizen
SEX: Male

EDUCATION (Begin with baccalaureate training and include postdoctoral)
Duke University, Durham, North Carolina: B.S., 1970, Mathematics
Stanford University: M.S., 1973, Computer Science
Stanford University (1973-present): Ph.D.
(In progress), Computer Science

HONORS: Phi Beta Kappa, Duke University; Graduated Magna Cum Laude, Duke University
MAJOR RESEARCH INTEREST: Software portability
ROLE IN PROPOSED PROJECT: System Programmer
RESEARCH SUPPORT (See instructions)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE

1974 - present  Student Research Assistant (MAINSAIL design/implementation), SUMEX Computer Project, Department of Genetics, Stanford University
1970 - present  Ph.D. Candidate, Department of Computer Science, Stanford University:
                1973-present  Research in software portability and directly executable languages under Dr. Michael Flynn
                1972-73       Research in complexity theory under Dr. Robert Floyd
1969 - 1970     Undergraduate student, Duke University:
                1969-70       Research in symbolic computation under Dr. Robert Caviness, Math.
                1969-70       Design/implementation of medical information system under Dr. William Hammond, Medicine
                1969          Programmer, Computer Center

PUBLICATIONS

Wilcox, C.R.: MAINSAIL - A Machine Independent Programming System. Proc. Digital Equipment Computer Users Society (DECUS), 2(4):975-979, Spring, 1976.

6 COLLABORATIVE PROJECT PROGRESS AND OBJECTIVES

The following subsections report on the collaborative use of the SUMEX facility, including the formally authorized projects within the Stanford and AIM aliquots and the various "pilot" efforts currently under way.
These project descriptions and comments are the result of a solicitation for contributions sent to each of the project Principal Investigators requesting the following information:

I) Summary of research program
   A) Technical goals
   B) Medical relevance and collaboration
   C) Progress summary
   D) Up-to-date list of publications
   E) Funding status
      1) Current funding
      2) Pending applications and renewals

II) Interactions with the SUMEX-AIM resource
   A) Examples of collaborations and medical use of programs via SUMEX
   B) Examples of sharing, contacts and cross-fertilization with other SUMEX-AIM projects (via workshops, system facilities, personal contact, etc.)
   C) Critique of resource services

III) Follow-on SUMEX grant period (8/78 - 7/83)
   A) Long-range user project goals and plans
   B) Justification for continued use of SUMEX by your project
   C) Comments and suggestions for future resource goals, development efforts, etc.

We believe that the reports of the individual projects speak for themselves as rationales for participation; in any case the reports are recorded as submitted and are the responsibility of the indicated project leaders.

6.1 STANFORD PROJECTS

The following group of projects is formally approved for access to the Stanford aliquot of the SUMEX-AIM resource. Their access is based on review by the Stanford Advisory Group and approval by Professor Lederberg as Principal Investigator. As noted previously, the DENDRAL project was the historical core application of SUMEX. Although this is described as a "Stanford project," a significant part of the development effort and of the computer usage is dedicated to national collaborator-users of the DENDRAL programs.

6.1.1 DENDRAL PROJECT

DENDRAL - Resource Related Research - Computers & Chemistry

Carl Djerassi, Principal Investigator
Professor of Chemistry
Stanford University

I.
OVERVIEW OF RESEARCH ACTIVITIES

Technical Goals

Our research, development and future plans focus on both the question of structure elucidation in general and the problem of providing computer assistance to scientists engaged in specific aspects of this important activity. A simplified representation of major milestones in solving unknown biomolecular structures by manual methods is presented in Figure 1.

[Figure 1: flowchart, not reproduced. Legible box labels include: unknown compound; spectroscopy; spectra; other data (chemical, biological and physical history); data interpretation; structural inferences and constraints; known compounds; structure assembly; candidate structures; eliminate inconsistent structures; new structural inferences and constraints; more spectroscopy; plan experiments; chemical transforms; reaction sequences; final structures.]

Figure 1. Important steps in manual solution of structures of unknown chemical compounds. These steps, indicated as separate boxes, may be performed explicitly or implicitly.

When structures are actually solved, the relationships among the boxes of Fig. 1 are considerably more complex than the figure indicates. Nevertheless, the Figure provides a good introduction to both our recent work and our future directions. We describe briefly each of the milestones in the following paragraphs. More detailed discussions of each topic follow in subsequent sections.

The first step in identification of an unknown structure is to separate it from other components in a potentially complex mixture and to isolate it in reasonably pure form. These steps are performed by scientists, frequently with the assistance of various instruments.
Although our research is not directed toward any part of this separation and isolation procedure (except insofar as these procedures also yield data which are subject to computer-assisted interpretation), information about the chemical and physical characteristics of the compound may be crucial to further efforts to determine its structure.

Depending on the quantity of sample available and its characteristics, various spectroscopic and additional chemical data are then collected on the unknown. A mass spectrum is frequently obtained, e.g., from a combined gas chromatograph/mass spectrometer (GC/MS) system. An important part of our recent proposal to the NIH is directed toward automation of combined GC/MS systems operated at high mass spectrometer resolving powers. Data on elemental compositions and relative ion abundances are then available in computer-readable form for further analysis (see MSRANK). The chemist possesses an armamentarium of spectroscopic techniques which can be brought to bear on a structure. One advantage of our work is that any data so obtained can be used to help solve the structure as long as it can be expressed, manually or by computer, in substructural statements about the unknown.

The next important phase in structure elucidation is interpretation of the available data (Fig. 1) in terms of structural features of the molecule. These interpretations may be in terms of known structural units ("superatoms", polyatomic aggregates of atoms in known configurations), or in terms of structural units, ring sizes, proton or carbon distributions. The latter set of features represents constraints on the kinds of structures which are possible. Our efforts in the area of computer-assisted data interpretation are focused on mass spectral and carbon-13 nuclear magnetic resonance (13CMR) data. We are developing general approaches to automated analysis of these data in terms of structural features of unknowns.
Our recent efforts are summarized in Figure 2, and discussed in detail subsequently. We have been concerned with use of these data from two points of view, planning and prediction (Fig. 2). During planning, experimental data are examined in order to extract specific structural information to be used in assembling candidate structures. In prediction, each candidate structure is tested to determine how closely its predicted spectrum agrees with the observed spectrum. The candidates can be ranked accordingly. The Meta-DENDRAL research is directed toward determination of rules of spectroscopic data which can be used either for planning or prediction (see below).

DATA INTERPRETATION

PLANNING: Extraction of structural information directly from spectroscopic data.
   1. Mass spectra - MDGGEN
   2. 13CNMR

PREDICTION: Use of spectroscopic data to rank candidate structures.
   1. MSPRUNE, MSPRED
   2. 13CNMR

Meta-DENDRAL: Formation of rules to be used for both planning and prediction.

Figure 2. Relationship between use of rules in either planning or prediction. Both approaches are used in utilizing data for structure elucidation.

Given possible structural fragments of the complete molecule and constraints on how these fragments may be assembled into complete molecules, a process of structural assembly follows (Fig. 1). No proven algorithm for solving this problem existed before earlier work supported by the current grant. Traditionally, this process has been left to manual, pencil and paper work. Our CONGEN program, which was designed to solve this problem, is the farthest advanced of programs designed to assist in various aspects of structure elucidation. It performs the structural assembly process, under constraints, and allows the scientist using the program to examine structural candidates and remove those deemed implausible (Fig.
1). A large portion of our recent and future work is directed toward improving the CONGEN program and building other facilities around it (see later sections). We have demonstrated the utility of CONGEN in structural studies, and subsequent sections discuss our recent developments and applications of CONGEN as well as our interactions with other scientists desiring access to our programs.

Given a set of structural candidates, the experimenter examines them to determine what experiments might be performed to focus on the correct structure by stepwise rejection of alternative hypotheses. When there are only a small number of possibilities under consideration, manual methods suffice. But CONGEN provides the capability for exhaustive enumeration of structural possibilities at a point in a structural problem when there may be many hundreds of possibilities. It is very difficult to examine these structures and plan experiments by hand. We have begun exploring ways to provide computer assistance to this important aspect of structure elucidation. We refer to this research area as the Experiment Planner, discussed in more detail below.

When new experiments have been planned, the researcher carries them out and uses the results as additional constraints on the structural candidates (Fig. 1). New experiments may include collecting additional spectroscopic data or performing a sequence of chemical reactions on the unknown. The latter experiments may be chosen to convert the unknown into a related compound which possesses physical or chemical properties more amenable to analysis. During the past year we have developed a program to assist scientists in carrying out representations of chemical reactions in the computer and eliminating undesired structural candidates based on constraints exercised on the products of the reaction. This work is described in two subsequent sections.
One section describes use of the program, which we call REACT, to explore structural possibilities exactly as outlined above. A later section describes recent progress in increasing the power of REACT.

Medical Relevance

Structure elucidation is a fundamental problem for medical practice and biomedical research. For example, we are collaborating with physicians in the Department of Pediatrics who monitor the body fluids of newborn infants in order to detect abnormal compounds. Much of the research leading to new drugs and new methods for synthesizing drugs also depends on careful analysis and identification of molecular structures of compounds. The computer tools that we are developing will aid in the determination of molecular structures by giving working scientists help with data collection, data interpretation, hypothesis testing and, most important, systematic consideration of all molecular structures that are consistent with the interpretations of the available data.

PROGRESS SUMMARY

Experiment Planner

We have begun preliminary considerations of design and implementation of an experiment planner. This program will assist chemists in designing the most effective set of experiments to perform to solve the structure. Although the experiment planner will be a future activity of our group, we are developing and using other structure manipulation functions which will provide groundwork for future developments.

One important aspect of experiment planning is the ability to examine in some way the set of candidate structures. Although many can be drawn for visual review, drawing is impractical when dozens or hundreds of structures are involved. To assist persons using CONGEN in reviewing their structures we have developed a function auxiliary to CONGEN which we call SURVEY.
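
The kind of review that such an auxiliary function automates can be illustrated with a small sketch. This is not the SURVEY implementation. It is a minimal Python illustration that assumes a hypothetical, simplified representation in which each structure is a set of element-labeled atoms plus undirected bonds, with hydrogens and bond orders ignored, and in which a pre-specified feature is found by a naive backtracking substructure search. The names `make_structure`, `contains` and `survey`, and the three example features, are introduced here for illustration only.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical simplified structure: atom id -> element symbol, plus a set of
# undirected bonds. Hydrogens and bond orders are ignored in this sketch.
Structure = Tuple[Dict[int, str], Set[FrozenSet[int]]]


def make_structure(atoms: Dict[int, str], bonds: List[Tuple[int, int]]) -> Structure:
    """Bundle atoms and bonds into the simplified representation."""
    return atoms, {frozenset(b) for b in bonds}


def contains(candidate: Structure, feature: Structure) -> bool:
    """Naive backtracking test: can `feature` be embedded in `candidate`?"""
    c_atoms, c_bonds = candidate
    f_atoms, f_bonds = feature
    f_ids = list(f_atoms)

    def extend(mapping: Dict[int, int]) -> bool:
        if len(mapping) == len(f_ids):
            return True  # every feature atom has been placed
        f = f_ids[len(mapping)]
        for c in c_atoms:
            if c in mapping.values() or c_atoms[c] != f_atoms[f]:
                continue
            # Each feature bond between f and an already-placed atom must
            # also be a bond between the corresponding candidate atoms.
            ok = all(
                frozenset((c, mapping[g])) in c_bonds
                for g in mapping
                if frozenset((f, g)) in f_bonds
            )
            if ok:
                mapping[f] = c
                if extend(mapping):
                    return True
                del mapping[f]  # backtrack
        return False

    return extend({})


def survey(candidates: Dict[str, Structure],
           features: Dict[str, Structure]) -> Dict[str, List[str]]:
    """Report, for each candidate, which pre-specified features it contains."""
    return {
        name: [fname for fname, feat in features.items() if contains(cand, feat)]
        for name, cand in candidates.items()
    }


# Illustrative feature file (heavy atoms only).
features = {
    "C(O)O group": make_structure({0: "C", 1: "O", 2: "O"}, [(0, 1), (0, 2)]),
    "C-N bond": make_structure({0: "C", 1: "N"}, [(0, 1)]),
    "N-C-C(O)O backbone": make_structure(
        {0: "N", 1: "C", 2: "C", 3: "O", 4: "O"},
        [(0, 1), (1, 2), (2, 3), (2, 4)],
    ),
}

# Illustrative candidates (heavy atoms only).
candidates = {
    "glycine": make_structure(
        {0: "N", 1: "C", 2: "C", 3: "O", 4: "O"},
        [(0, 1), (1, 2), (2, 3), (2, 4)],
    ),
    "acetic acid": make_structure(
        {0: "C", 1: "C", 2: "O", 3: "O"}, [(0, 1), (1, 2), (1, 3)]
    ),
    "ethylamine": make_structure({0: "C", 1: "C", 2: "N"}, [(0, 1), (1, 2)]),
}

result = survey(candidates, features)
for name, found in result.items():
    print(name, "->", found)
```

The exhaustive search is adequate for small features checked against a few hundred candidates. A production tool would also have to treat bond orders, hydrogens and symmetry, which this sketch deliberately leaves out.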
SURVEY FUNCTION: Aids in perception of any of a pre-specified set of structural features in a group of structural candidates, e.g.
   A) Functional groups
   B) Terpenoid skeletons
   C) Amino acid skeletons

Figure 3. Function of the SURVEY program and examples of recent application areas.

The function of SURVEY is summarized in Figure 3. SURVEY simply acts as a reminder to the scientist of the presence or absence of certain structures or structural features. During the past year we have used SURVEY extensively. For example, we have used it to detect implausible functional groups in a set of candidate structures, using a file of substructures representing a wide variety of functionalities. In many problems, implausible functional groups are forgotten and CONGEN is never constrained to remove them. Another example of use of SURVEY is in conjunction with collaborative work with persons in the Department of Genetics. In an analysis of serum or urinary metabolites in patients at high risk of metabolic disorder, we have had occasion to use CONGEN in exploration of unknown structures [Report HPP-77-11]. Some of these structures could formally be conjugates of amino acids with organic acids. If so, such structures will possess backbones of naturally-occurring amino acids. SURVEY was used to provide a summary of which structural candidates possessed such amino acid skeletons.

We have recently used SURVEY in a related application involving the structure of "polyalthenol", discussed by LeBoeuf, et al. (Figure 4). Superatoms and constraints supplied to CONGEN to derive structural candidates are summarized in Fig. 4. We summarize in Figure 5 the structural possibilities which resulted. There are five structures possessing a bicyclo[2.1.1] system, and six which possess a bicyclo[4.3.1] system (Fig. 5, top). These structures are energetically less favorable.
For example, several possess a double bond at a bridgehead atom, which violates Bredt's Rule. There remain, however, 11 structures which are not formally excluded by data presented by LeBoeuf, et al. Because these workers based their structural assignment on biogenetic grounds, we used SURVEY and REACT to test their hypothesis. We have, in computer-accessible libraries, known terpenoid ring systems which can be used within SURVEY to test sets of structures for known skeletons. None of the 22 structural candidates possesses a previously known skeleton. Because the authors postulated a relationship to a known skeleton via a single methyl shift, we used REACT to exercise a single methyl shift in all possible ways on each of the 22 candidates. SURVEY was then used to test the results for the presence of known terpenoid systems, and the drimane skeleton, the postulated precursor of polyalthenol, was the only known skeleton which resulted. This does not prove the hypothesis of LeBoeuf, et al., but certainly helps strengthen it.

SURVEY is, however, only the barest beginning of an experiment planner, even though it has proven useful. We plan to build from this beginning toward a much more powerful system.

M. LeBoeuf, M. Hamonniere, A. Cave, H. Gottleib, N. Kunesch, and E. Wenkert, Tet. Lett., 3559 (1976).

[Figure 4: not reproduced. The figure gives the molecular formula of "polyalthenol", a table of superatoms (arbitrary name, number and structure drawing for each) and the list of constraints: 1) all free valences bonded to non-hydrogen atoms; 2) GOODLIST entries; 3) GOODRINGS; 4) BADRINGS.]

Figure 4. Superatoms and constraints supplied to CONGEN in investigations of plausible structural alternatives to the proposed structure of polyalthenol.
[Figure 5: structure drawings of the candidates, not reproduced.]

Figure 5. Structural candidates for polyalthenol based on data given in Figure 4.

REACTION CHEMISTRY DEVELOPMENTS

1. Separation from CONGEN - communication via files of structures.
2. Adding constraints - site- and transform-specific.
3. Control structure - ramification
   A. Establish relationships among products and reactants
   B. Deal properly with ranges of numbers of products
4. Interaction - develop manipulation commands which parallel laboratory operations, e.g. separate into flasks, test contents of various flasks, incomplete separations, etc.
5. Representation of reactions
6. Prospective detection of duplicate products based on symmetry properties of: A) starting material; and B) transformation.

Figure 6. Current and future directions for improvement and extension of REACT, a program for exploration of applications of reaction chemistry to structure elucidation problems.