DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
PUBLIC HEALTH SERVICE
GRANT APPLICATION
Form Approved, Budget Bureau No. 68-RO
SECTION I

TO BE COMPLETED BY PRINCIPAL INVESTIGATOR (Items 1 through 7 and 16A)

1. TITLE OF PROPOSAL (Do not exceed 53 typewriter spaces): Resource Related Research - Computers and Chemistry (RR-00612 renewal)
2. PRINCIPAL INVESTIGATOR
2A. NAME (Last, First, Initial): Lederberg, Joshua
2B. TITLE OF POSITION: Professor and Chairman
Department of Genetics, Stanford University Medical Center, Stanford, California 94305
Department of Genetics, Department of Chemistry, and Department of Computer Science, Stanford University
Department of Genetics
MAJOR SUBDIVISION (See Instructions): School of Medicine
3. DATES OF ENTIRE PROPOSED PROJECT PERIOD: FROM 5/1/74 THROUGH 4/30/77
4. TOTAL DIRECT COSTS FOR PERIOD: $1,639,456
DIRECT COSTS FOR FIRST 12-MONTH PERIOD: $488,267
7. Research Involving Human Subjects (See Instructions): A. NO; B. YES, Approved; C. YES, Pending Review
Inventions (Renewal Applicants Only - See Instructions): A. NO; B. YES, Not previously reported; C. YES, Previously reported

TO BE COMPLETED BY RESPONSIBLE ADMINISTRATIVE AUTHORITY (Items 8 through 13 and 16B)

APPLICANT ORGANIZATION(S) (See Instructions): Stanford University, Stanford, California 94305
IRS No. 94-1156365
Congressional District No. 17
TYPE OF ORGANIZATION (Check applicable item): FEDERAL / STATE / LOCAL / [X] OTHER (Specify): Private, non-profit University
OFFICIAL IN BUSINESS OFFICE WHO SHOULD ALSO BE NOTIFIED IF AN AWARD IS MADE: K. D. Creighton, Deputy Vice Pres. for Business and Finance, Stanford University, Stanford, California 94305
NAME, TITLE, AND TELEPHONE NUMBER OF OFFICIAL(S) SIGNING FOR APPLICANT ORGANIZATION(S)
Telephone Number (415) 321-2300, X2.
FOR INSTITUTIONAL GRANT PURPOSES (See Instructions): 01 School of Medicine
c/o Sponsored Projects Office
Telephone Number(s): -2300, X2823
14. ENTITY NUMBER (Formerly PHS Account Number): 458210
16. CERTIFICATION AND ACCEPTANCE. We, the undersigned, certify that the statements herein are true and complete to the best of our knowledge and accept, as to any grant awarded, the obligation to comply with Public Health Service terms and conditions in effect at the time of the award. (Signatures required)
APR 26 1973
NIH 398 (FORMERLY PHS 398), Rev. 1/73

SECTION I
RESEARCH OBJECTIVES

NAME AND ADDRESS OF APPLICANT ORGANIZATION: Stanford University, Stanford, California 94305

NAME, SOCIAL SECURITY NUMBER, OFFICIAL TITLE, AND DEPARTMENT OF ALL PROFESSIONAL PERSONNEL ENGAGED ON PROJECT, BEGINNING WITH PRINCIPAL INVESTIGATOR
Lederberg, Joshua, Professor of Genetics, Department of Genetics
Djerassi, Carl, Professor of Chemistry, Department of Chemistry
Feigenbaum, Edward, Professor of Computer Science, Dept. of Computer Science
Buchanan, Bruce, Research Computer Scientist, Dept. of Computer Science
Duffield, Alan, Research Associate, Department of Genetics
Pereira, Wilfred, Associate, Department of Genetics
Rindfleisch, Thomas, Associate, Department of Genetics
Smith, Dennis, Associate, Department of Chemistry
Sridharan, Natesa, Associate, Department of Computer Science

TITLE OF PROJECT: Resource Related Research - Computers and Chemistry

USE THIS SPACE TO ABSTRACT YOUR PROPOSED RESEARCH. OUTLINE OBJECTIVES AND METHODS. UNDERSCORE THE KEY WORDS (NOT TO EXCEED 10) IN YOUR ABSTRACT.

The objectives of this research program are the development of innovative computer and biochemical analysis techniques for application in medical research and closely related aspects of investigative patient care.
We will apply the unique analytical capabilities of gas chromatography/mass spectrometry (GC/MS) and Carbon(13) Nuclear Magnetic Resonance Spectrometry (CMR), with the assistance of data interpreting computer programs utilizing artificial intelligence techniques, to investigate the chemical constituents of human body fluids in a variety of clinical contexts. Specific subtasks of this program include: 1) the application of artificial intelligence techniques to programs capable of interpreting mass spectra from basic principles as well as extending mass spectral theory by analysis of solved spectrum-structure examples, 2) the extension of GC/MS data systems incorporating an increasing level of automation and allowing the directed collection of specialized information, 3) the application of GC/MS techniques to analyze body fluids such as urine and blood, and to relate detected metabolic abnormalities to clinically observable disease states, and 4) the application of CMR techniques to assist in the determination of chemical structure.

The undersigned agrees to accept responsibility for the scientific and technical conduct of this project and for provision of required progress reports if a grant is awarded as the result of this application.

Date: APR 26 1973
Joshua Lederberg, Principal Investigator

RESOURCE-RELATED RESEARCH: COMPUTERS AND CHEMISTRY
(RR-00612 - Renewal Application)

TABLE OF CONTENTS

Introduction
Part A: Applications of Artificial Intelligence to Mass Spectrometry
Part B(i): Mass Spectrometer Data System Development
Part B(ii): Analysis of the Chemical Constituents of Body Fluids
Part C: Extension of the Theory of Mass Spectrometry by Computer
Part D: Applications of Carbon(13) Nuclear Magnetic Resonance Spectrometry to Assist in Chemical Structure Determination
Significance
Collaborative Arrangements
Facilities Available
Human Subjects
Budgets and Justification
Biographies

INTRODUCTION

This proposal seeks a three year extension of our existing grant for Resource Related Research - Computers and Chemistry (RR-00612). Over the two years we have been supported by this grant we have made significant progress in all of the areas we initially proposed, including clinical applications of body fluid analysis by gas chromatography/mass spectrometry (GC/MS), extensions to automate our GC/MS instrumentation and data systems, and the development of programs which, in specific areas, match human performance in interpreting mass spectra from first principles as well as extend mass spectral theory to new classes of compounds. Our success to date reinforces our expectations that this research will have a significant and useful impact on medical research involving studies of human biochemistry.
As discussed in section B(ii) of this proposal, we have bolstered contact with real clinical problems through the Department of Pediatrics (Professor Howard Cann). We have recently encountered preliminary correlations between the amount of beta-amino isobutyric acid present in the urine of children with lymphoblastic leukemia and the state of their disease, and also between a defect in phenylalanine-tyrosine metabolism and late metabolic acidosis in premature infants.

This project is highly interdisciplinary, merging the interests of Professors Lederberg (Genetics), Djerassi (Chemistry), and Feigenbaum (Computer Science) in evolving and applying mass spectrometry as an analytical tool in medicine and in modeling aspects of scientific problem solving processes. Mass spectrometry is an ideal domain for this collaboration. On the one hand it has special importance to medical science and organic chemistry as a remarkably sensitive and analytically precise physical method for studying human biochemistry at the molecular level. On the other hand, the problems of mass spectrum interpretation are at once sufficiently complex to challenge the human intellect and sufficiently structured to be dealt with by current computer programming concepts. It is thus a rich, real-world problem domain in which to study the emulation of lower level cognitive functions, knowledge representations, and theory formation processes.

This combination of interdisciplinary interests promises both near and long term returns for the research investment. As indicated above, even with relatively crudely automated systems, a significant impact can be made on relevant medical problems. In the longer term the increasing load of body fluid analyses, which will have to be performed to be responsive to clinical needs, will require unburdening chemists from the laborious processes of reducing and interpreting the large volumes of data involved.
These problems are squarely addressed by the proposed use of stored libraries of solved spectra, augmented by computer programs to extend such catalogs by "cognitive" insight.

This proposal is organized in a manner similar to the original in that the overall goals are divided into a number of subtasks. These comprise the original subtask definitions as well as one additional task proposed to explore the use of Carbon(13) nuclear magnetic resonance information as a potentially useful adjunct to mass spectral information to limit the space of candidate molecular structures. The respective proposal subtasks elaborated upon in subsequent sections include:

Part A: Applications of Artificial Intelligence to Mass Spectrometry
Part B(i): Mass Spectrometer Data System Development
Part B(ii): Analysis of the Chemical Constituents of Body Fluids
Part C: Extending the Theory of Mass Spectrometry by Computer
Part D: Applications of Carbon(13) Nuclear Magnetic Resonance Spectrometry to Assist Chemical Structure Determination

This proposal is related to several others pending, in progress, or terminating:

1) SUMEX (NIH: RR-00785, pending - Principal Investigator, J. Lederberg)-- This proposal seeks to establish a computer resource for the application of artificial intelligence in medicine as well as for the exploration of GC/MS as a tool for biomolecular characterization. The present renewal application is subsumed in the SUMEX application but is submitted independently to meet NIH renewal application deadlines which predate National Advisory Research Resources Council consideration of the SUMEX proposal. Should SUMEX be approved, this proposal will be withdrawn. Should SUMEX not be approved, this proposal seeks to continue support of our current mass spectrometry research efforts.

2) Genetics Research Center (NIH: pending - Principal Investigator, J.
Lederberg)-- This proposal seeks to establish a Genetics Research Center at Stanford for research in medical genetics and the application of such research to clinical aspects of medical genetics. This proposal incorporates a significant level of cooperation between the Departments of Genetics and Pediatrics at Stanford, including clinical applications of GC/MS. The Genetics Center proposal complements the present renewal application in that it concentrates on research aspects of genetic disease, whereas this proposal attacks basic problems of methodology as well as developmental aspects of applying GC/MS analyses of metabolic disorders as indicators of disease states in a broader context.

3) ACME (NIH: RR-00311, terminating, July 1973 - Principal Investigator, J. Lederberg)-- The ACME computing resource has been our major source of computing support for the reduction and analysis of mass spectral data. This support has been provided as a part of the ACME core research program without an explicit transfer of funds from the DENDRAL project. With the termination of NIH support, the ACME facility will be combined with other Medical Center computing functions on a fee-for-service basis, thereby introducing a new specific item in our budget to cover these computer costs.

4) Heuristic Programming Research in Artificial Intelligence (Advanced Research Projects Agency (ARPA): SD-183, in progress - Co-Principal Investigators, E. Feigenbaum and J. Lederberg)-- This on-going research effort complements the present proposal by supporting those aspects of artificial intelligence concept and program development not directly related to medical problem areas. The present NIH-supported project benefits from this research and acts to enable the transfer of these ideas into a medically relevant context.

The current resource grant is headed by Professor E. Feigenbaum as Principal Investigator.
He will shortly take a leave of absence for two years to accept the post of Deputy Director of the Information Processing Techniques Office of ARPA. During his absence, Professor Lederberg will act as Principal Investigator of the research project. Whereas Professor Feigenbaum will formally not be a member of the project during his tenure with ARPA, he will maintain his office locally, enabling him to maintain close intellectual contact with our research effort.

PART A: APPLICATIONS OF ARTIFICIAL INTELLIGENCE TO MASS SPECTROMETRY

OBJECTIVES:

The overall objective of Part A of this proposal is to extend the reasoning power of Heuristic DENDRAL. Mass spectrometry was initially chosen as the task area in which to explore the techniques of heuristic programming for molecular structure elucidation. Much of the past and proposed future efforts will remain directed strongly to analysis of mass spectra because of the sensitivity and specificity of the technique. It is clear, however, that information available from other spectroscopic techniques, utilized routinely by chemists when sample quantities are sufficient, can and should be used where appropriate to obtain structural information which cannot be provided by mass spectrometry alone. This point is elaborated in the subsequent discussion of progress and plans.

A corollary of the overall objective is to tie the Heuristic DENDRAL program very closely to the requirements of the chemical studies outlined below (analysis of steroids from body fluids) and in Part B of the proposal (analysis of chemical constituents of urine, blood, and other body fluids). We have previously directed and will continue to direct our studies toward classes of biologically relevant molecules. Thus we have the capability of providing significant support to the chemically oriented activities as the capabilities of Heuristic DENDRAL are extended.
The overall objective encompasses several sub-tasks, outlined below, all of which represent critical steps in building a powerful program in an incremental fashion. This approach provides an operational program which can be used by chemists in a routine production mode, while extensions of the program are under development. The sub-tasks are the following:

A) Extend Heuristic DENDRAL to analysis of the mass spectra of complex molecules. This includes the assessment of the capabilities and limitations of the program in analysis of unknown compounds or mixtures of compounds. It also includes refinement of planning rules which infer compound class or molecular substructure, both being extremely important in subsequent analysis of a mass spectrum.

B) Develop the Cyclic Structure Generator to provide DENDRAL with the capabilities for generation of all isomers of a given empirical formula. Define and incorporate constraints on the generator to exclude implausible isomers. Enlarge the capacity of the cyclic generator to accept constraints of demanded or forbidden substructures (GOODLIST, BADLIST).

C) Develop the ability to incorporate information available from ancillary mass spectrometric techniques (e.g., metastable ion data, low ionizing voltage data, isotopic labelling) and other spectroscopic data (e.g., substructures from NMR) into the existing Heuristic DENDRAL program.

D) Extend the Predictor, now capable of prediction of mass spectra for limited classes of molecules, to the design of experimental strategies. Given a set of data, and partial or ambiguous structural information based on these data, specify additional experiments which may be done to effect a unique solution or minimize ambiguities.

PROGRESS:

We have, in the past two years of the existing DENDRAL grant, made significant progress in each of the areas outlined above.
We feel that in some areas the progress has been particularly exciting, for example, the completion of the program for analysis of the mass spectra of complex molecules, and completion of the cyclic structure generator (unconstrained). The following represents a brief outline of accomplishments to date, keyed to the objectives A-D above.

A) Extension of Heuristic DENDRAL

Extension of Heuristic DENDRAL to the mass spectra of complex molecules dictated two important modifications in the approach used successfully for saturated, aliphatic, monofunctional (SAM) compounds. To reduce ambiguities of elemental composition inherent in low resolution mass spectra, the decision was made to extend the program to handle high resolution mass spectral data which specify the empirical composition of every ion. Although the basic strategy of Heuristic DENDRAL (plan, generate and test) was maintained, the absence of a cyclic structure generator at the time the program was written dictated that the basic skeleton, common to the class of molecules analyzed, be specified.

The techniques of artificial intelligence have now been applied successfully to a problem of direct biological relevance, namely, the analysis of the high resolution mass spectra of estrogenic steroids. The performance of this program has been shown to compare favorably with the performance of trained mass spectroscopists; see Smith, et al. (1972). The operation of this program has been detailed in this publication, a copy of which is attached. Briefly, the program was designed to emulate the thought processes of an expert as far as possible. High resolution mass spectral data are searched for evidence indicating possible substituent placements about the estrogen skeleton. Molecular structures allowed by the mass spectral data are tested against chemical constraints, and candidate solutions are proposed.
Further details of the performance in analysis of more than thirty estrogen-related derivatives are presented in the above publication.

Of particular significance in this effort were, in addition to exceptional performance, the potential for analysis of mixtures of estrogens WITHOUT PRIOR SEPARATION, and for generalization of the programming approach to other classes of molecules. Because of the structure of the Heuristic DENDRAL program it is immaterial whether the spectrum to be analyzed is derived from a single compound or a mixture of compounds. Each component is analyzed, in terms of molecular structure, in turn, independently of the other components. This facility, if successful in practice, would represent a significant advance of the technique of mass spectrometry. Many problem areas, because of physical characteristics of samples or limited sample quantities, could be successfully approached utilizing the spectra of the unseparated mixtures. Even in combined gas chromatography/mass spectrometry (GC/MS), many overlapping peaks will be unresolved and an analysis program must be capable of dealing with these mixtures.

In collaboration with Prof. H. Adlercreutz of the University of Helsinki, we have recently completed a series of analyses of various fractions of estrogens extracted from body fluids. These fractions (analyzed by us as unknowns) were found to contain between one and four major components, and structural analysis of each major component was carried out successfully by the above program. These mixtures were analyzed as unseparated, underivatized compounds. The implications of this success are considerable. Many compounds isolated from body fluids are present in very small amounts, and complete separation of the compounds of interest from the many hundreds of other compounds is difficult, time-consuming and prone to result in sample loss and contamination.
We have found in this study that mixtures of limited complexity, which are difficult to analyze by conventional GC/MS techniques without derivatization (which frequently makes structural analysis more difficult), can be rationalized even in the presence of significant amounts of impurities. A manuscript on this study has been submitted to the Journal of the American Chemical Society.

In the past year we have extended our library of high resolution mass spectra of estrogens to include 67 compounds. These data represent an important resource and have been included (as low resolution spectra for the moment) in a collection of mass spectra of biologically important molecules being organized by Prof. S. Markey at the University of Colorado. These data have been used extensively in developing the program strategies for Meta-DENDRAL (see Part C, below).

The Heuristic DENDRAL program for complex molecules has received considerable attention during the last year in order to generalize it from its previous emphasis on specific classes of compounds and program strategies. By removing information which is specific to estrogens, the program has become much more general. This effort has resulted in a production version of the program which is designed to allow the chemist to apply the program to the analysis of the high resolution mass spectrum of any molecule with a minimum of effort. Given the spectrum of a known or unknown compound, the chemist can supply the following kinds of information to guide analysis of the mass spectrum:

a) Specifications of basic structure (superatom) common to the class of molecules.
b) Specification of the fragmentation rules to be applied to the superatom, in the form of bond cleavages, hydrogen transfers and charge placement.
c) Special rules on the relative importance of the various fragments resulting from the above fragmentations.
d) Threshold settings to prevent consideration of low intensity ions.
e) Available metastable ion data and the way these data are subsequently used -- to establish definitive relationships between fragment ions and their respective molecular ions.
f) Available low ionizing voltage data -- to aid the search for molecular ions.
g) Results of deuterium exchange of labile hydrogens -- to specify the number of, e.g., -OH groups.

We have been very successful in testing the generality of the program, with particular emphasis on other classes of biologically important molecules. We have used the program in analysis of high resolution mass spectra of progesterone and some methylated analogs, a small number of androstane/testosterone related compounds, steroidal sapogenins and n-butyl-trifluoroacetyl derivatives of amino acids.

B) Cyclic Structure Generator

The cyclic structure generator has been completed after several years of effort under the continuing guidance of Professor Lederberg. The boundaries, scope and limitations of chemical structure can now be specified. The cyclic structure generator now rests on a firm mathematical foundation such that we are confident of its thoroughness and ability to generate structures, prospectively avoiding duplicate structures. The prospective nature of the generator is a necessity for efficient implementation, as retrospective checking of each generated structure to eliminate redundancies is too time consuming. The necessary concepts have recently been transformed into an operating program.

A manuscript describing the mathematical theory of the heart of the generator, the labelling algorithm, has been accepted by Discrete Mathematics (H. Brown, et al., 1973). A companion manuscript describing the mathematical theory of the complete generator has been submitted (H. Brown and L. Masinter, 1973, submitted). The cyclic structure generator in its entirety (encompassing acyclic and wholly cyclic structures and combinations thereof) will be described for chemists (L. Masinter et al., in preparation).
Apart from the labelling algorithm, the remainder of the problem involves, first, the combinatorics of assignment of atoms to cycles or chains, and second, construction of acyclic radicals to attach to the rings using the well known principles of acyclic DENDRAL. A companion manuscript will soon be submitted describing for chemists the core of the cyclic structure generator, the labelling algorithm. This algorithm is capable of construction of all isomers, of wholly cyclic graphs, which may be formed by labelling the nodes of a cyclic skeleton with atoms (e.g., C, N, O) or labelling the atoms of the skeleton with substituents (e.g., -CH3, -OH). Through the use of graph theory and the symmetry-group properties of cyclic graphs, the labelling algorithm avoids construction of redundant isomers. It identifies equivalent node positions prospectively, before labelling takes place. It is indicative of the precarious communication between chemists and mathematicians that this labelling problem had remained unsolved (except for trivial cases) despite attention for over 100 years.

As an indication of the complexity of chemistry in terms of numbers of possible structures, take the example of C6H6. The most familiar molecule with this molecular formula is benzene. Yet there are 217 topological isomers for C6H6 (with valence constraints), of which only 15 are pure trees. The simple addition of one oxygen atom to the empirical formula of benzene, giving C6H6O, yields 2237 isomers, of which the most familiar representative is phenol.

The first exercise of the generator has been to create a dictionary of carbocyclic skeletons. This time-consuming task would otherwise have to be done each time a new molecular formula is presented. The dictionary is structured to contain keys as to type of skeleton, number of rings, ring fusion, and so forth. The constraints which we wish to implement are then simple to exercise in the context of the dictionary.
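
The effect of skeleton symmetry on the number of distinct labellings can be illustrated with a small sketch. It is not the labelling algorithm of the generator: it assumes a single unfused ring, and it removes duplicates retrospectively by reducing every labelling to a canonical form, which is exactly the costly approach the prospective algorithm avoids. The function names (ring_symmetries, canonical, distinct_labellings) are hypothetical and chosen only for this illustration.

```python
from itertools import permutations


def ring_symmetries(n):
    """Return the 2n node permutations (rotations and reflections) of an n-membered ring."""
    syms = []
    for shift in range(n):
        syms.append(tuple((i + shift) % n for i in range(n)))  # rotation by `shift`
        syms.append(tuple((shift - i) % n for i in range(n)))  # reflection followed by rotation
    return syms


def canonical(labelling, syms):
    """Pick the lexicographically smallest image of a labelling under all symmetries."""
    return min(tuple(labelling[p] for p in perm) for perm in syms)


def distinct_labellings(labels):
    """Enumerate labellings of a ring with the given atoms, one per symmetry class.

    Brute force: every arrangement is generated and then reduced to its
    canonical form, so equivalent arrangements collapse to a single entry.
    """
    n = len(labels)
    syms = ring_symmetries(n)
    return sorted({canonical(lab, syms) for lab in set(permutations(labels))})


if __name__ == "__main__":
    # Four carbons and two nitrogens on a six-membered ring.
    for labelling in distinct_labellings("CCCCNN"):
        print("".join(labelling))
```

For four carbons and two nitrogens on a six-membered ring the sketch reports three labellings, corresponding to the adjacent, 1,3 and 1,4 placements of the two nitrogens; the other arrangements are images of these under the ring's rotations and reflections. The brute-force cost grows factorially with ring size, which is consistent with the remark above that retrospective checking is too time consuming for the real generator.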
C) Analysis Using Additional Data Sources

Several additional techniques are available to the mass spectroscopist other than recording the conventional mass spectrum. They provide complementary data which frequently are of great assistance in rationalization of the conventional spectrum, either in terms of structure or fragmentation mechanisms. We have designed the Heuristic DENDRAL program for complex molecules to use data from these additional techniques in much the same way as a chemist does. The following three types of data can now be used:

I) Metastable Ion (MI) Data. Metastable ions provide a means for relating fragment ions to molecular ions in a mass spectrum. This is important in two contexts. In examination of the spectrum of a known compound, the existence of a metastable ion provides strong evidence that a given fragment ion arises at least in part in a single decomposition process from an ion of higher mass (not necessarily the molecular ion). Investigations of this type are necessary to validate the fragmentation rules which guide the Heuristic DENDRAL program (e.g., investigations of metastable ions of estrogens, Smith, Duffield and Djerassi, 1972). The second context is the analysis of mixtures of compounds to determine which fragment ions in a very complex spectrum are descended from which molecular parents. We have explored the analysis time and specificity of results as a function of the amount of metastable ion data available on a mixture. A 10 to 100-fold reduction in computer time is observed to arrive at single, correct solutions for various mixture components (rather than 5-20 possible solutions limited by the conventional mass spectrum alone). These results are reported in detail in the description of the analysis of the estrogen mixtures (Smith, et al., 1973 (submitted)).

Metastable ions are those which are formed by fragmentation processes occurring during the flight of an ion after formation and acceleration.
These fragmentation processes may occur at any point along the flight path of ions through the mass spectrometer. Because of the complex behavior of metastable ions formed in magnetic or electric fields, they are usually studied in field-free regions. A conventional double focussing mass spectrometer possesses two field-free regions where metastable ions may be studied. One region lies between the electric sector and the magnetic sector. This region can be used to study so-called "normal" metastable ions, i.e., those metastable ions which are observed superimposed on the peaks in the conventional mass spectrum and which follow the relationship:

observed mass of metastable ion = (mass of daughter)**2 / (mass of parent).

The other field-free region lies between the ion source and the electric sector. Metastable ions formed in this region can be examined by de-tuning one analyzer of the instrument (defocussing). This procedure allows establishment of specific relationships between ions involved in a metastable decomposition so that the parent ion and its decomposition product can both be identified. This technique has led to much more useful information for the Heuristic DENDRAL program, as illustrated earlier in this section.

II) Low Ionizing Voltage (LV) Data. The key to successful operation of the Heuristic DENDRAL program is correct inference of the molecular ion(s) and molecular formula(e) in a given mass spectrum. In the past, metastable ion data were used to assist the program in correct identification of molecular ions. This procedure has now been supplemented, making the program cognizant of LV data. At lower ionizing voltages, molecular ions are formed with lesser amounts of excess internal energy. Most classes of molecules (those that display significant molecular ions) can be analyzed at a sufficiently low ionizing voltage such that only molecular ions are observed, as the internal energy is not sufficient to allow fragmentation.
This technique was used extensively in the analysis of estrogen mixtures and the resulting data simplify the program's task of determining molecular ions.

III) Isotopic Labeling. We have previously described how isotopic labeling of labile hydrogens with deuterium aids analysis. For example, the last phase of the analysis of spectra of complex molecules involves several "chemical" checks on the validity of proposed structures. The knowledge of the number of hydroxyl groups can be a powerful filter to reject certain candidate structures (Smith, et al., 1972).

There are many other kinds of data available to chemists engaged in structure elucidation. The details of chemical isolation and derivatization procedures may require that only certain types of functional groups are plausible. Spectroscopic data from other techniques (e.g., proton or C13 NMR, IR, UV) may be available for a particular unknown. We have designed the Heuristic DENDRAL program for complex molecules with these additional data in mind. Specific plans for implementation of these data as constraints on Heuristic DENDRAL are described in the Plans section below. Certain chemical information, for example, the knowledge that aromatic hydroxy functionalities have been methylated, can already be included as a constraint.

D) Extension of the Predictor Programs

The function of the Predictor in Heuristic DENDRAL has been to evaluate candidate solutions (structures) by prediction of their mass spectra, based on empirical fragmentation rules, and comparison of predicted versus observed spectra. This has been extended to high resolution mass spectra of complex molecules. Performance has been tested on estrogenic steroids and steroidal sapogenins. There are other aspects of prediction of behavior that we have incorporated and plan to incorporate in the Predictor.
We can now predict a minimum series of metastable defocussing experiments necessary to differentiate among candidate structures resulting from analysis of a mass spectrum. Other efforts are discussed in the Plans section, below. This approach amounts to design of optimum experimental strategies to effect a solution or minimize ambiguities.

We have begun to explore ways in which to predict the mass spectral behavior of molecules without the need to resort to the classical method of determining many mass spectra followed by empirical generalization. Dr. Gilda Loew has been investigating extended Hückel molecular orbital theory in an attempt at qualitative prediction of bond strength. A paper describing initial results on estrone will appear shortly (G. Loew, et al., 1973). Briefly, calculated net atomic charges appear to have little bearing on subsequent fragmentation of the molecule. Bond densities (which are related to bond strengths), however, provide some indication of which bonds are likely to undergo scission in the first step of a fragmentation process.

PLANS:

As in the previous section, research plans are keyed to the objectives A-D.

A) Extension of Heuristic DENDRAL

I) We will continue use of the present program in collaborative studies with Prof. Adlercreutz concerning estrogenic steroids from, e.g., pregnancy urines. Work to date has inspired a synthetic program at Stanford University to verify conclusions of the program with regard to new estrogen metabolites. The planning program will be used extensively in analysis of the synthetic products also. As the capability for analysis of the mass spectra of other classes of steroids is developed, we hope to extend this collaboration.

II) We feel we have achieved a high level of compound-class independence in our present program. As more classes are analyzed we expect that further "cleanup" may be necessary, but easy to carry out.
III) We are presently accumulating a large number of high resolution mass spectra of pregnanes and androstanes. For example, the first step away from estrogen analysis was initially going to be to the analysis of pregnanes, another biologically important class of steroids.

84. The table is arranged so as to illustrate its use in a fast computer program. A linear array with 138 cells, indexed as shown, has entries that never slip more than one position away from the value of the index. The composition values can therefore be accessed by direct lookup, obviating a table search. A card deck version of the table is available on request from the author.

This compilation is a greatly shortened form of some tables that were published some time ago.²

This work has been supported in part by the Advanced Research Projects Agency (contract SD-183), the National Aeronautics and Space Administration (grant NGR-05-020-004), and the National Institutes of Health (grant GM-00612-01).

¹ Beynon, J. H., and Williams, A. E., "Mass and Abundance Tables for use in Mass Spectrometry," Elsevier, Amsterdam, 1963.
² Lederberg, J., "Computation of Molecular Formulas for Mass Spectrometry," Holden-Day, San Francisco, 1964.
Table of Mass Fractions for all Combinations* of H, N, O (H ≤ 10, N ≤ 5, O ≤ 11)

       (-0.049 to -0.0008)        |            (0 to 0.03)            |          (0.03 to 0.088)
Index  m_f x 10^6   H   N   O  =C | Index  m_f x 10^6   H   N   O  =C | Index  m_f x 10^6   H   N   O  =C
  -49      -49787   0   2  11  17 |     0           0   0   0   0   0 |    31       31537  10   3  11  19
  -45      -45765   0   0   9  12 |     1         510   2   5   6  14 |    32       32363   4   2   1   4
  -38      -38554   0   4  10  18 |     2        1853   4   2   7  12 |    34       34216   8   4   8  16
  -37      -37211   2   1  11  16 |     4        4532   2   3   4   9 |    35       35559  10   1   9  14
  -34      -34532   0   2   8  13 |     5        5875   4   0   5   7 |    36       36895   6   5   5  13
  -30      -30510   0   0   6   8 |     6        6385   6   5  11  21 |    38       38238   8   2   6  11
  -25      -25978   2   3  10  17 |     7        7211   0   4   1   6 |    40       40917   6   3   3   8
  -24      -24635   4   0  11  15 |     8        8554   2   1   2   4 |    41       42260   8   0   4   6
  -23      -23299   0   4   7  14 |    10       10407   6   3   9  16 |    42       42770  10   5  10  20
  -21      -21956   2   1   8  12 |    11       11750   8   0  10  14 |    43       43596   4   4   0   5
  -19      -19277   0   2   5   9 |    13       13086   4   4   6  13 |    44       44939   6   1   1   3
  -15      -15255   0   0   3   4 |    14       14429   6   1   7  11 |    46       46792  10   3   8  15
  -14      -14745   2   5   9  18 |    15       15765   2   5   3  10 |    49       49471   8   4   5  12
  -13      -13402   4   2  10  16 |    17       17108   4   2   4   8 |    50       50814  10   1   6  10
  -10      -10723   2   3   7  13 |    18       18961   8   4  11  20 |    52       52150   6   5   2   9
   -9       -9380   4   0   8  11 |    19       19787   2   3   1   5 |    53       53493   8   2   3   7
   -8       -8044   0   4   4  10 |    20       21130   4   0   2   3 |    56       56172   6   3   0   4
   -6       -6701   2   1   5   8 |    21       21640   6   5   8  17 |    57       57515   8   0   1   2
   -4       -4022   0   2   2   5 |    22       22983   8   2   9  15 |    58       58025  10   5   7  16
   -2       -2169   4   4   9  17 |    25       25662   6   3   6  12 |    62       62047  10   3   5  11
   -1        -826   6   1  10  15 |    27       27005   8   0   7  10 |    64       64726   8   4   2   8
                                  |    28       28341   4   4   3   9 |    66       66069  10   1   3   6
                                  |    29       29684   6   1   4   7 |    68       68748   8   2   0   3
                                  |    30       31020   2   5   0   6 |    73       73280  10   5   4  12
                                  |                                   |    77       77302  10   3   2   7
                                  |                                   |    81       81324  10   1   0   2
                                  |                                   |    88       88535  10   5   1   8

* Arranged so that the index for each entry agrees with 1000 x m_f + 1.9.

[Reprinted from Journal of Chemical Education, Vol. 49, Page 613, September, 1972.] Copyright 1972, by Division of Chemical Education, American Chemical Society, and reprinted by permission of the copyright owner.
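The direct-lookup idea behind the table can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the author's card-deck program. It regenerates the entries by enumerating H, N, O combinations whose nominal mass is a multiple of 12, taking the mass fractions of H, N and O as 0.007825, 0.003074 and -0.005085 on the C = 12 scale. It stores each entry in the cell given by truncating 1000 x m_f, whereas the printed table shifts some entries by one position so that every cell holds at most one entry. The names `build_table` and `lookup` and the tolerance are hypothetical.

```python
from itertools import product

# Mass fractions on the C = 12 scale (H = 1.007825, N = 14.003074, O = 15.994915).
H_F, N_F, O_F = 0.007825, 0.003074, -0.005085


def build_table():
    """Map a cell index to the (m_f, H, N, O, =C) entries that fall in that cell."""
    table = {}
    for h, n, o in product(range(11), range(6), range(12)):
        nominal = h + 14 * n + 16 * o
        if nominal % 12 != 0:
            continue  # keep only combinations whose nominal mass equals a whole number of carbons
        frac = h * H_F + n * N_F + o * O_F
        cell = int(frac * 1000)  # int() truncates toward zero
        table.setdefault(cell, []).append((frac, h, n, o, nominal // 12))
    return table


def lookup(table, frac, tol=0.0002):
    """Return (H, N, O, =C) for a mass fraction by inspecting only the neighbouring cells."""
    cell = int(frac * 1000)
    nearby = [entry for c in (cell - 1, cell, cell + 1) for entry in table.get(c, [])]
    if not nearby:
        return None
    best = min(nearby, key=lambda entry: abs(entry[0] - frac))
    return best[1:] if abs(best[0] - frac) <= tol else None


TABLE = build_table()
print(lookup(TABLE, 0.000510))  # (2, 5, 6, 14)
print(lookup(TABLE, 0.042260))  # (8, 0, 4, 6)
```

Because no entry lies more than one cell from the computed index, the lookup inspects at most three cells however large the table is, which is the point of the arrangement described above.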
PART B(ii): ANALYSIS OF THE CHEMICAL CONSTITUENTS OF BODY FLUIDS

OBJECTIVES:

The overall objectives of this part of the proposal are to develop the uses of gas chromatography (GC) and mass spectrometry (MS), under "intelligent" computer management, for the clinical screening, diagnosis, and study of errors of metabolism. The efficacy of these analytical tools has been demonstrated when applied to limited populations of urine samples in the research laboratory environment. We propose to enlarge the clinical investigative applications of GC/MS technology and to demonstrate its utility for the diagnosis and screening of disease states. Specifically we will apply our GC/MS analysis capabilities to larger and more diversified populations to establish better defined norms, deviations related to identifiable disease states, and control parameters required to remove ambiguities from results.

BACKGROUND AND PROGRESS:

For some time we have focussed a substantial part of our effort on exploiting the use of the mass spectrometer as an analytical instrument for biochemical purposes. Our central approach has been to integrate the mass spectrometer with the gas chromatograph on the one hand and with "intelligent" computer management on the other. Gas chromatography is a versatile and broadly applicable method for the separation of biochemical specimens into a large number of distinct but unnamed fractions. The mass spectrometer has unique power to analyze such fractions and give information relevant to their molecular structure. The computer becomes indispensable for the overall management of the system and for the reduction and interpretation of the large volume of data emanating from the analytical instruments.
Our effort in instrumentation, therefore, is an integral part of this research and comprises a good deal of computational software embracing both real time instrument and data management as well as artificial intelligence. It also requires considerable effort in electronic and vacuum technology for the instrumentation hardware, and a coherent system approach for the overall integration of these components. These aspects of the effort are described in section B(i) of this proposal.

The routine screening of normal and abnormal body metabolites, as well as drugs and their metabolites, in human body fluids (ref 1) is currently the object of several research programs. Various non-specific methods, including thin layer (ref 2, 3), ion exchange (ref 4, 6), liquid (ref 5), and gas chromatography (ref 7-10), are used primarily with the goal of separating a large number of unnamed constituent materials. When used in conjunction with mass spectrometry, these methods become specific and provide a powerful means of positive identification of metabolites in human body fluids (ref 11-13). Of these techniques, gas chromatography is the most convenient to interface to the mass spectrometer because the carrier gas can easily be removed as the analysis proceeds on a continuous flow.

Based upon the references cited, as well as our own on-going programs, the ability of the GC/MS technique for the analysis of body fluids is well established. We have drawn upon the published literature in helping to design our experimental protocols. Standard chemical procedures for extracting, derivatizing, and hydrolyzing urine and plasma are used for the GC/MS analysis (ref 13). These procedures permit separation of the following classes of substances: acids, phenols, amino acids, and carbohydrates. It is possible to detect free or conjugated compounds within these classes. The gas chromatographic analysis of each class of compounds presents a metabolic profile.
Abnormal profiles (containing either excessively large peaks from one or more components or peaks which do not correspond to metabolites usually encountered) are then assayed by mass spectrometry. The mass spectra recorded during the elution of each gas chromatographic peak then serve to identify the constituents present in that peak.

Most medical centers have access to amino acid analyzers in order to screen patients for metabolic abnormalities of the principal amino acids, but unless a special research interest exists, other errors of metabolism cannot easily be studied. At this institution the GC/MS system provides us the opportunity to detect a wide variety of errors which show accumulation of novel amino acids, fatty acids, and many other metabolites in urine, blood, and other biological fluids and tissues.

Urine is known to contain several hundred organic compounds. The separation (gas chromatography) and hence identification (mass spectrometry) of these components would be an extremely difficult task. To simplify the separation problem the urine is chemically separated into four fractions as illustrated in the following diagram.

    URINE (pH = 1, internal standards added)
      |
      +-- ether phase: free acids (A)
      |
      +-- aqueous phase
            |
            +-- amino acids (B)
            |
            +-- carbohydrates (C)
            |
            +-- hydrolysis
                  |
                  +-- ether phase: hydrolyzed acids (D)
                  |
                  +-- aqueous phase: amino acids (E)

The experimental procedure used for working with a urine sample is as follows. To an aliquot (2.5 ml) of a 24 hour urine sample is added 6N hydrochloric acid until the pH is 1. Two internal standards, n-tetracosane and 2-amino octanoic acid, are then added. Ether extraction isolates the free acids (Fraction A) which are then methylated and analyzed by gas chromatography-mass spectrometry. An aliquot of the aqueous phase (0.5 ml)
is concentrated to dryness, reacted with n-butanol/hydrochloric acid followed by methylene chloride containing trifluoroacetic anhydride. This procedure derivatizes any amino acids (or water soluble amines) which are then subjected to GC/MS analysis (Fraction B). Another aliquot (0.5 ml) of the aqueous phase can be derivatized for the detection of carbohydrates (Fraction C). Concentrated hydrochloric acid (0.15 ml) is added to the urine (1.5 ml) after ether extraction and the mixture hydrolyzed for 4 hours under reflux. Ether extraction separates the hydrolyzed acid fraction (D) which is then methylated and analyzed by GC/MS. A portion of the aqueous phase (0.5 ml) from hydrolysis of the urine is concentrated to dryness and derivatized and analyzed for amino acids (Fraction E).

As an example of the application of these methods to biomedical problems, we can use some recent studies we have undertaken on the urine of a patient suffering from acute lymphoblastic leukemia. The gas chromatographic profile (Figure 1) of the amino acid fraction of his urine showed the presence of an abnormal peak (A). The mass spectra (Figure 2) recorded during the lifetime of this chromatographic peak identified this component as beta-amino isobutyric acid from a comparison with a literature (ref. 19) spectrum of authentic material. Quantitation showed that this patient was excreting 1.2 grams per day of beta-amino isobutyric acid. After medical treatment this metabolite was no longer detected in the patient's urine, thereby raising the question of whether beta-amino isobutyric acid can be used as a metabolic signature for the recognition of lymphoblastic leukemia and for the status of the disease in the course of the treatment cycle. Beta-amino isobutyric acid has been observed in the urine of 5 patients suffering from leukemia and in all instances it disappeared immediately following drug therapy.
We are continuing our study of this relationship in view of the recognized excretion of elevated amounts of beta-amino isobutyric acid as the result of a genetic trait. For instance Harris et al. (ref. 14) observed daily urinary excretions of 70-300 mg of beta-amino isobutyric acid and noted that histories of high excretion levels tended to exist in particular families.

As a second example of the application of GC/MS to biomedical problems we can cite preliminary studies on approximately 80 urine samples from a total of 11 premature or "small for gestational age" infants. This project was undertaken to investigate the phenomenon of late metabolic acidosis. This condition is characterised by low blood pH levels, poor weight gain, and, as distinct from respiratory acidosis, onset after the second day of life. Its incidence is higher in infants whose birthweight is less than 1750 g (one study shows 92% incidence for these children) than in infants with birthweight greater than 1750 g (26%). Of the 11 patients studied we were able to observe 6 closely and continuously for periods ranging from 6 to 8 weeks from day 3 of life. Three of these infants had birthweights below 1000 g and the other three were born weighing less than 1500 g. Of the 6, five showed symptoms corresponding to late metabolic acidosis and the other showed normal and even development. The five infants showing the acidosis all excreted very large amounts of p-hydroxyphenyllactic acid together with smaller amounts of p-hydroxyphenylpyruvic acid and p-hydroxyphenylacetic acid. After reaching a peak, the presence of these compounds in the urine gradually diminished and almost completely disappeared at the time blood pH and weight gain had returned to normal. The infant who did not show symptoms of acidosis only excreted minute amounts of these compounds during the period of observation.
The occurrence of large amounts of these compounds in the urine indicates a temporary defect in phenylalanine-tyrosine metabolism, and dietary factors such as protein and vitamin intake can be shown to affect the incidence and the severity of the condition. It is hoped that further studies will result in a clearer picture of relationships between the condition and diet and hence lead to a reduction in its occurrence.

In the course of these studies, we have recognized two areas where computer analysis of the data is important in order to handle the volume of data involved and to standardize the analyses performed. At present these operations, GC profile analysis and mass spectrum identification, are largely manual. In the case of GC profile analysis, approximately 40 peaks for each profile must be analyzed in terms of their positions, sizes, etc. relative to other peaks in the profile and instrument parameters to evaluate the presence or absence of abnormalities. For each abnormal peak, a number of mass spectra (5 to 10), each containing ion abundance measurements at approximately 500 masses, must be compared against catalogued known materials for identification. If the material is not in the catalog, the mass spectrum must be interpreted from basic principles, using high resolution spectrometry and other data sources as appropriate. These are very tedious operations requiring automation for even the proposed limited screening volume. The developmental aspects of these computer-related portions of the research program are discussed in the other sections of this proposal.

FUTURE PLANS

In the next grant period we plan to extend our efforts in applying GC/MS techniques to clinical problems both in terms of defining norms and in terms of studying identifiable disease states in collaboration with clinical investigators.
The most appropriate target material for this developmental effort is the metabolic output of NORMAL subjects under controlled conditions of diet and other intakes. The eventual application of this kind of analytical methodology to the diagnosis of disease obviously depends on the establishment of normal baselines, and much experience already tells us how important the influence of nutrient and medication intake can be in influencing the composition of urine, body fluids, and breath.

Among the most attractive subjects for such a baseline investigation are newborn infants already under close scrutiny in the Premature Research Center and the Clinical Research Center of the Department of Pediatrics at this institution. Such patients are currently, for valid medical reasons, under a degree of dietary control difficult to match under any other circumstance. Many other features of their physiological condition are being carefully monitored for other purposes as well. The examination of their urine and other effluents is therefore accompanied by the most economical context of other information and requires the least disturbance of these subjects.

Two obvious factors which could profoundly influence the excretion of metabolites detected by GC/MS are maturity and diet. We have already initiated a program for serial screening of urinary metabolite excretion in premature infants of various gestational ages and determination of changes in the pattern of excretion of various metabolites as a function of age following birth.
These studies are being performed on infants admitted to the Center for Premature Infants and the Intensive Care Nursery at Stanford, a source of some 500 premature infants per year. In addition, in conjunction with an independent study on the effects of both quality and quantity of oral protein intake on the incidence and pathogenesis of late metabolic acidosis of prematurity, we plan to measure the urinary excretion patterns of various metabolites and thereby partially assess the effect of diet on this screening method.

We shall use the analyses on blood and urine specimens from normal individuals in the final development of rapid, automated identification of compounds described by mass spectrometry. The computer will be used to match an unknown mass spectrum with reference spectra contained in computer files. Programs are also being developed which will provide the strategy for the computer to interpret an unknown mass spectrum (not contained in the library) and directly identify the compound (see Parts A and C). Limited libraries exist for urine and plasma GC/MS analyses and will require progressive compilation (assisted by the DENDRAL interpretation programs) as our clinical sampling proceeds. This will in turn speed the throughput of the system by allowing the simple identification of materials by computer library search procedures. This library will be shared freely with other investigators.

Given our ability to identify various constituents of urine and plasma and to understand normal variation, we shall apply the GC/MS system to pathology, making use of patients with already identified metabolic defects for control purposes. The main application will, of course, be diagnostic, and patients with suggestive clinical manifestations, such as psychomotor retardation and progressive neurologic disease, as well as suggestive pedigrees (e.g. affected offspring of consanguineous parents or multiplex sibships) will be investigated.
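The library search described above, matching an unknown mass spectrum against reference spectra held in computer files, can be sketched in a few lines. This is a minimal illustration, not the project's search program. The cosine similarity measure, the acceptance threshold, the library contents, and the compound names are all assumptions introduced for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two spectra given as {nominal mass: abundance} dicts."""
    dot = sum(abundance * b.get(mass, 0.0) for mass, abundance in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(unknown, library, threshold=0.9):
    """Return (name, score) of the best library match, or (None, score) if below threshold.

    A result of None marks a spectrum that would need interpretation from basic
    principles rather than simple identification by library search.
    """
    best_name, best_score = None, 0.0
    for name, reference in library.items():
        score = cosine_similarity(unknown, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical reference spectra; real entries would hold abundances at several hundred masses.
LIBRARY = {
    "compound_A": {57: 100.0, 71: 60.0, 85: 30.0},
    "compound_B": {91: 100.0, 65: 20.0},
}
name, score = search_library({57: 95.0, 71: 62.0, 85: 28.0}, LIBRARY)
print(name)  # compound_A
```

As the library grows through progressive compilation, fewer spectra fall below the threshold and need full interpretation, which is the throughput gain the text anticipates.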
These patients are seen relatively frequently at any university hospital, and their presence in the various in-patient and out-patient services of the Stanford Department of Pediatrics is well documented. The GC/MS system will be helpful in diagnosing not only errors of amino acid metabolism, but also many other metabolic disorders, some of which are lactic acidemia (ref 15), Refsum's disease (a defect in the oxygenation of phytanic acid (ref 16)), methylmalonic acidemia (ref 17) and orotic aciduria (ref 18). We also recognize the potential of this methodology to define new errors of metabolism.

We will collaborate with Professor Howard Cann of the Department of Pediatrics and derive much of the clinically significant material for analysis from patients in the Premature Research Center and the Clinical Research Center of the Department of Pediatrics and the Stanford University Children's Hospital. Analyses will be performed on existing GC and MS equipment in the Departments of Genetics and Chemistry.

REFERENCES

1) Schwartz, M.K., "Biochemical Analysis," Anal. Chem., 44, p. 9R (1972).
2) Heathcote, J.G., Davies, D.W., and Haworth, C., "The Effect of Desalting on the Determination of Amino Acids in Urine by Thin Layer Chromatography," Clin. Chim. Acta, 32, p. 457 (1971).
3) Davidow, B., Petri, N.L., and Quame, B., "A Thin Layer Chromatographic Screening Procedure for Detecting Drug Abuse," Amer. J. Clin. Pathol., 54, p. 714 (1968).
4) Efron, K. and Wolf, P.L., "Accelerated Single-column Procedure for Automated Measurement of Amino Acids in Physiological Fluids," Clin. Chem., 16, p. 621 (1972).
5) Burtis, C.A., "The Separation of the Ultraviolet-absorbing Constituents of Urine by High Pressure Liquid Chromatography," J. Chromatog., 52, p. 97 (1970).
6) Wilson-Pitt, W., Scott, C.D., Johnson, W.F., and Jones, G., "A Bench-top, Automated, High-resolution Analyzer for Ultraviolet Absorbing Constituents of Body Fluids," Clin. Chem., 16, p. 657 (1970).
7) Dalgliesh, C.E., Horning, E.C., Horning, M.G., Knox, K.L., and Yarger, K., "A Gas-liquid Chromatographic Procedure for Separating a Wide Range of Metabolites Occurring in Urine or Tissue Extracts," Biochem. J., 101, p. 792 (1966).
8) Teranishi, R., Mon, T.R., Robinson, A.B., Cary, P., and Pauling, L., "Gas Chromatography of Volatiles from Breath and Urine," Anal. Chem., 44, p. 168 (1972).
9) Pauling, L., Robinson, A.B., Teranishi, R., and Cary, P., "Quantitative Analysis of Urine Vapor and Breath by Gas-liquid Partition Chromatography," Proc. Nat. Acad. Sci. USA, 68, p. 2374 (1971).
10) Zlatkis, A. and Liebich, H.M., "Profile of Volatile Metabolites in Human Urine," Clin. Chem., 17, p. 592 (1971).
11) Mrochak, J.E., Butts, W.C., Rainey, W.T., and Burtis, C.A., "Separation and Identification of Urinary Constituents by Use of Multiple-analytical Techniques," Clin. Chem., 17, p. 72 (1971).
12) Horning, E.C. and Horning, M.G., "Human Metabolic Profiles Obtained by GC and GC/MS," J. Chromatog. Sci., 9, p. 129 (1971).
13) Jellum, E., Stokke, O., and Eldjarn, L., "Combined Use of Gas Chromatography, Mass Spectrometry, and Computer in Diagnosis and Studies of Metabolic Disorders," Clin. Chem., 18, p. 801 (1972).
14) Harris, H., "Family Studies on the Urinary Excretion of Beta-Amino Isobutyric Acid," Ann. Eugenics, Vol. 14, Page 43 (1953).
15) Haworth, J.C., Ford, J.L., and Younoszai, M.K., "Familial Chronic Acidosis due to an Error in Lactate and Pyruvate Metabolism," Canad. Med. Ass. J., 79, p. 773 (1967).
16) Herndon, J.H., Steinberg, D., and Uhlendorf, B.W., "Refsum's Disease: Defective Oxidation of Phytanic Acid in Tissue Cultures Derived from Homozygotes and Heterozygotes," New England J. of Med., 281, p. 1023 (1969).
17) Morrow, G., Schwartz, R.H., Hallock, J.A., and Barness, L.A., "Prenatal Detection of Methylmalonic Acidemia," J. Pediatrics, 77, p. 126 (1970).
18) Fallon, J.H., Smith, L.H., Graham, J.H., and Burnett, C.H., "A Genetic Study of Hereditary Orotic Aciduria," New England J. of Med., 270, p. 878 (1964).
19) Lawless, J.G. and Chadha, M.S., "Mass Spectral Analysis of C(3) and C(4) Aliphatic Amino Acid Derivatives," Anal. Biochem., 44, p. 473 (1971).
20) Reynolds, W.E., Bacon, V.A., Bridges, J.C., Coburn, T.C., Halpern, B., Lederberg, J., Levinthal, E.C., Steed, E., and Tucker, R.B., "A Computer Operated Mass Spectrometer System," Anal. Chem., 42, p. 1122 (1970).

FIGURE 1 Gas Chromatogram of the Amino Acid Fraction of Urine

FIGURE 2 Mass Spectrum of Beta-Amino Isobutyric Acid

PART C: EXTENSION OF THE THEORY OF MASS SPECTROMETRY BY COMPUTER

PART C. Extending the Theory of Mass Spectrometry by a Computer (Meta-DENDRAL)

OBJECTIVES:

The Heuristic DENDRAL performance program described in Part A is an automated hypothesis formation program which models "routine", day-to-day work in science. In particular, it models the inferential procedures of scientists identifying components, such as those found in human body fluids. The power of this program clearly lies in its knowledge about various classes of compounds normally found in body fluids, which knowledge allows identification of the compounds. The Meta-DENDRAL program described in this part is a critical adjunct to the performance program because it is designed to supply the knowledge which the performance program uses.

Theory formation is essential in order to carry out the routine analyses - either by hand or by computer. However, the staggering amount of effort required to build a working theory (even for a single class of compounds) holds back the routine analyses.
The goal of the Meta-DENDRAL program is to form working theories automatically (from collections of experimental data) and thus reduce the human effort required at this stage. By shortening the time between collecting data for a class of compounds and understanding the rules underlying the data, the Meta-DENDRAL program will thus provide an improvement in the development of diagnostic procedures.

Theory formation in science is both an intriguing problem for artificial intelligence research and a problem area in which scientists can benefit greatly from any help the computer can give. While the ill-structured nature of the theory formation problem makes it more a research task than an application, we have already provided computer programs which are of definite help to the theory-forming scientist.

Mass spectrometry is the task domain for the theory formation program as it is for the Heuristic DENDRAL program. It is a natural choice for us because we have developed a large number of computer programs for manipulating molecular structures and mass spectra in the course of Heuristic DENDRAL research and because of the interest in mass spectrometry among collaborative researchers already associated with the project. This is also a good task area because it is difficult, but not impossible, for human scientists to develop fragmentation rules to explain the mass spectrometric behavior of a class of molecules. Mass spectrometry has not been completely formalized, and there still remain gaps in the theory.

Understanding theory formation enough to automate substantial parts of it will benefit all of the biomedical sciences. More directly, building a computer program which forms a theory of mass spectrometry will greatly enhance the power of mass spectrometry as a diagnostic instrument.
Detailed accounts of this research are available in the DENDRAL Project annual report to the National Institutes of Health, in several research papers already published and in manuscripts submitted for publication.

PROGRESS:

In the period covered by the initial NIH grant the Meta-DENDRAL program has moved from a set of ideas to a set of working computer programs. The first three segments of Meta-DENDRAL have been programmed and can be used with new experimental data. These segments are first summarized and then described in more detail in subsequent sections.

We described the initial design of the Meta-DENDRAL program in a paper presented to the 2nd International Joint Conference on Artificial Intelligence (London, August, 1971). Further design details and partial implementation of programs were described in a paper presented at the 7th Machine Intelligence Workshop (Machine Intelligence 7, B. Meltzer & D. Michie, eds., 1972).

Summary of Segment 1

The data interpretation and summary program (INTSUM) defines the space of mass spectrometric processes, interprets all the data in terms of these processes, and summarizes them process by process. This program is capable of a much more thorough analysis of the data than a human can perform.

Summary of Segment 2

The rule formation program starts with the interpreted and summarized results of the data. It searches the set of processes for those that meet the criteria for rules, and attempts to resolve ambiguities when several processes explain many of the same data points. The resulting rules are characteristic processes for the whole class of molecules.

Summary of Segment 3

The class separation program is an extension of the simple rule formation program just mentioned. Because the initial set of molecules may not all behave alike in the mass spectrometer, it is necessary to separate the important subclasses and formulate characteristic rules for each subclass.

SEGMENT 1.
The initial segment of the theory formation program is data interpretation. After the experimental data have been collected for a large number of compounds, the program re-interprets all the data points in terms of its internal model of the experimental instrument. This part of the program has already proved useful to chemists studying the mass spectrometry of new classes of compounds. It has been described in a paper recently submitted for publication (Applications of Artificial Intelligence for Chemical Inference X. INTSUM. A Data Interpretation Program as Applied to the Collected Mass Spectra of Estrogenic Steroids, submitted to Tetrahedron).

The computer program for data interpretation and summary has been well developed. While it is never safe to call a program "finished", this program has reached the stage where we have turned it over to the chemists who want to look at explanatory mechanisms for the mass spectra of many compounds. Ordinarily, this is such a tedious task that chemists are forced to limit their analysis to a very few out of a total space of potentially interesting mechanisms. The computer program, on the other hand, systematically explores the space of possible mechanisms and collects evidence for each. This program is described in the Machine Intelligence 7 paper, and the results obtained by running it with many estrogen spectra are discussed in the manuscript submitted to Tetrahedron.

Mr. William C. White has been largely responsible for coding the program in LISP. The program runs in the overnight LISP system at the Medical School's ACME facility, and on the Stanford Computation Center IBM 360/67. It is currently being used by Dr. Steen Hammerum, a post-doctoral fellow in chemistry from the University of Copenhagen, to summarize the fragmentations found in the spectra of substituted progesterones, and by Dr. Dennis Smith to interpret data from other classes of steroids.

SEGMENT 2.
The second segment of Meta-DENDRAL produces reasonable rules of mass spectrometry. The rule formation segment starts with the interpreted and summarized data from the first segment. It looks for the processes which are most frequent, which explain highly significant data points, and which are least ambiguous with other processes. After applying these criteria, it selects a set of processes which appear to be characteristic of the whole set of molecules initially given.

Planning before rule formation is necessary because there is so much information in the summary of possible fragmentations found in the data. It is desirable to collect all the information to avoid missing unanticipated mechanisms which occur frequently throughout the compounds in the data. But even the summary of the mechanisms is voluminous enough to obscure the "obvious" rules waiting to be found.

In a planning program implemented by Mr. Steven Reiss, the computer peruses the summary looking for mechanisms with "strong enough" evidence to call them first-order rules of mass spectrometry. Our criteria for strong evidence may well change as we gain more experience. For the moment, the program looks for mechanisms which (a) appear in almost all the compounds (80%) and (b) have no viable alternatives (where "viable alternatives" are those alternative explanations which are frequently occurring and cannot be distinguished unambiguously).

The output of this program, even though crude in many senses, is useful to chemists who first want to see the highly reliable, unambiguous rules which can be formulated. If there are none, of course, there is little point in pressing ahead blindly. This is an indication that some modifications need to be made, for example, splitting up the original set of compounds into more homogeneous subgroups.
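The two planning criteria stated above, (a) presence in about 80% of the compounds and (b) absence of viable alternatives, can be expressed as a short filter. This is a sketch under assumptions, not Mr. Reiss's LISP program. The data layout, the process names, and the choice to treat an alternative as "viable" when it meets the same 80% frequency threshold are all hypothetical.

```python
def first_order_rules(evidence, alternatives, n_compounds, min_fraction=0.8):
    """Select processes supported in most compounds and lacking frequent alternatives.

    evidence:     {process: set of compounds whose spectra show the process}
    alternatives: {process: processes that explain the same data points and
                   cannot be distinguished from it}
    """
    def frequent(process):
        return len(evidence.get(process, ())) / n_compounds >= min_fraction

    rules = []
    for process in sorted(evidence):
        if not frequent(process):
            continue  # criterion (a): must appear in almost all compounds
        viable = [alt for alt in alternatives.get(process, ()) if frequent(alt)]
        if not viable:
            rules.append(process)  # criterion (b): no viable alternative explanation
    return rules


# Hypothetical summary for five compounds c1..c5; process names are illustrative only.
evidence = {
    "break_ring_D": {"c1", "c2", "c3", "c4", "c5"},
    "lose_CH3": {"c1", "c2", "c3", "c4", "c5"},
    "lose_H2O": {"c1", "c2", "c3", "c4"},
    "break_ring_B": {"c1"},
}
alternatives = {
    "break_ring_D": ["lose_CH3"],
    "lose_CH3": ["break_ring_D"],
    "lose_H2O": ["break_ring_B"],
}
print(first_order_rules(evidence, alternatives, n_compounds=5))  # ['lose_H2O']
```

In this example the two most frequent processes are rejected because each is a frequent, indistinguishable alternative to the other. An empty result would correspond to the case described above, where the compound set should be split into more homogeneous subgroups before proceeding.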
On the other hand, if some likely rules can be found, these will serve as "anchor points" for resolving ambiguities with other sets of mechanisms and also serve as a "core" of rules to be extended and modified in the course of detailed rule formation.

SEGMENT 3.

As mentioned above, class separation is important because the initial collection of compounds may not be known to behave alike in the instrument. The rule formation program must be prepared to retract its assumption of homogeneity. Mr. Steven Reiss, working with Dr. Buchanan, has written a first extension of the rule formation program which allows class separation on the basis of characteristic rules found for the subclasses. A paper describing segments 2 and 3 - rule formation with subclass separation - has been submitted to the 3rd International Joint Conference on Artificial Intelligence.

The computer programs produced to date have already proved useful for helping to formulate mass spectrometry theory for classes of biologically relevant molecules. Chemists have used these programs as tools for rule formation. They have examined the estrogenic steroids this way, including separate studies on some equilenins, acetates and benzoates. Also, they have used the program to interpret data from several classes of pregnanes.

Plans:

In the coming period we propose to focus on three aspects of theory formation. We plan to (1) extend the capabilities of the programs, (2) make our rule formation programs more usable by chemists, and (3) continue our exploration of the more theoretical aspects of rule formation.

1. We anticipate new difficulties as the classes of molecules under study become more complex, either with respect to structural features or mass spectrometric behavior. Although we have made the programs flexible, extending the work just to new sets of data will undoubtedly introduce new problems.
Now that the usefulness of the programs has been demonstrated, we propose to couple the theory formation program more closely to data of more direct clinical relevance. For example, the mass spectrometry of amino acids and the aromatic acids frequently found in urine needs to be better understood before automatic analysis of the components of (the acid and neutral fractions of) urine is successful. Parts A and B of this proposal, in other words, can both be helped by the continuation of Part C.

The program is now limited to forming rules which are more descriptive of the sample than explanatory. We are currently working on ways of generalizing the descriptive rules so that they are more truly general. Drs. Sridharan and Buchanan have started experimenting with computer programs which generalize the rules in various ways. Mr. Carl Farrell is currently working on a computer program for his Ph.D. thesis which allows systematic exploration of various methods of generalizing on rules. His work investigates the efficacy of different control structures as well as different inductive rules.

2. The programs are now used by chemists, but not without a fair amount of help from the programming staff. We must overcome some of the barriers to facile use before the programs can be counted as successful. For example, putting the data in the correct format can be made easier, as can defining constraints on the search space and modifying parameter values. The programs do not now require the chemist to know LISP. However, we propose to develop easier access to control of the programs through careful design of the user interface. Depending on hardware limitations, we would also like to provide a time-shared, graphics-oriented interface.

3. The descriptive form of rules mentioned above may be inherent in the conceptual framework we have chosen for the rule formation program.
The program uses a "ball and stick" model of molecular structures, so it is no surprise that situations and actions in rules are simply described. We wish to explore more sophisticated models of mass spectrometry with the hope of discovering how a program could search the space of possible models during rule formation. This is still a very challenging problem. We have so far concentrated on more practical aspects of theory formation - i.e., producing results of immediate utility. But we feel strongly that we must grapple with the outer reaches of the problem in order to arrive at meaningful solutions.

PUBLICATIONS -- PART C

B.G. Buchanan, E.A. Feigenbaum, J. Lederberg, "A Heuristic Programming Study of Theory Formation in Science", in Proceedings of Second International Joint Conference on Artificial Intelligence, Imperial College, London (September, 1971). (Also Stanford Artificial Intelligence Project Memo No. 145, Computer Science Dept. Report CS-221)

B.G. Buchanan, E.A. Feigenbaum, and N.S. Sridharan, "Heuristic Theory Formation: Data Interpretation and Rule Formation". In Machine Intelligence 7, Edinburgh University Press (1972).

B.G. Buchanan and N.S. Sridharan, "Rule Formation on Non-Homogeneous Classes of Objects", submitted for presentation at the Third International Joint Conference on Artificial Intelligence (Stanford, August, 1973).

PART D: APPLICATIONS OF CARBON(13) NUCLEAR MAGNETIC RESONANCE SPECTROMETRY TO ASSIST IN CHEMICAL STRUCTURE DETERMINATION

PART D. CARBON-13 NUCLEAR MAGNETIC RESONANCE SPECTROSCOPY

The goal of our Heuristic DENDRAL research is to develop rapid, accurate and flexible computer techniques for identifying unknown steroids and other biologically important compounds from spectroscopic data. We have made significant progress toward this goal: Our system is currently capable of correctly analyzing high-resolution mass spectra of estrogenic steroids and mixtures thereof.
As we extend our methods to the more complex problems presented by other steroid classes, and eventually by other types of biologically important molecules, we will find it necessary to have available sources of structural information other than mass spectroscopy. Carbon-13 nuclear magnetic resonance (CMR) spectroscopy is an ideal candidate.

Basically, the CMR experiment measures the extent to which each carbon nucleus in the sample molecule is shielded from an applied magnetic field. This shielding, or chemical shift, is caused by the distribution of electrons around the nucleus, and is determined by the carbon's hybridization and local chemical environment. Other investigators have determined that the shift of a carbon is strongly dependent upon the nature and placement of substituents at nearby centers, and that to a first approximation these substituent effects are additive. Thus, the CMR spectrum of a compound contains information which rather straightforwardly can be related to the possible local environments of each carbon.

The structural information provided by CMR data complements that from mass spectroscopy, and there is relatively little redundancy between the two methods. Data from the latter represent molecular fragmentations, which take place most readily near functional groups. Thus, mass spectroscopy frequently gives structural information about the environments of such groups. In CMR spectroscopy, on the other hand, the chemical shifts of carbons in large alkyl moieties, far removed from functionality, are the best understood and the most predictable. Further, the fragmentation of large molecules such as steroids can show the general pattern of substitution in the molecule, while CMR shifts are sensitive to specific local patterns. Because the two methods "mesh" so nicely, we see the development of analytic CMR techniques as an extremely fruitful field of research.
Our eventual aim is to completely define the structures of unknown compounds using only these two sources of information.

We are well equipped to study this field. In our Chemistry department, we have a Varian XL-100 (Fourier-transform) nuclear magnetic resonance spectrometer, one of the most sensitive and flexible instruments currently available for CMR work. We have competent investigators in our Chemistry and Computer Science departments who are interested in, and in fact currently working on, the project. Finally, we have had considerable experience with computerized structure analysis, and much of what we have learned can be applied to the CMR problem.

We have already begun investigating the use of CMR data in automated structure analysis, with our initial study focussed upon the acyclic amines. The analysis of low-resolution mass spectra of large amines is not capable of discerning the structures of long alkyl chains, so we felt that this class of molecules would provide a good test of CMR methods. Ms. Hanne Eggert of our group has obtained the CMR spectra of over 100 acyclic amines, and has derived an accurate set of predictive rules relating structure to chemical shifts. Dr. Raymond E. Carhart has used these rules to develop a computerized approach to the identification of amine structures from observed CMR spectra (see attached manuscript). The program, entitled AMINE, has proven to be extremely selective: The analysis of the CMR spectrum of trioctyl amine, for example, yields only seven possible structures, though the molecule has over 700 million structural isomers. In contrast, the analysis of the low-resolution mass spectrum of triheptyl amine gives nearly 2000 solutions out of a possible 38 million isomers. These results illustrate the tremendous amount of structural information which CMR spectroscopy can provide.
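As a rough worked comparison of the two selectivities just quoted, the surviving fraction of the isomer space can be written out for each method. The isomer counts are taken as the round figures given in the text (7 x 10^8 and 3.8 x 10^7), so the results are order-of-magnitude estimates only:

```latex
\[
\underbrace{\frac{7}{7 \times 10^{8}}}_{\text{CMR, trioctyl amine}} = 1 \times 10^{-8}
\qquad\text{versus}\qquad
\underbrace{\frac{2000}{3.8 \times 10^{7}}}_{\text{mass spectrum, triheptyl amine}} \approx 5.3 \times 10^{-5}
\]
```

On these figures the CMR analysis leaves a fraction of the isomer space that is smaller by a factor of roughly 5000, although the two examples involve different molecules and so are not strictly comparable.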
This source of information has, in general, been ignored in steroid-identification research, primarily because large amounts of sample (50 milligrams or more for steroids) are needed to obtain reliable CMR spectra. However, CMR spectroscopy is still a relatively new field, and the sensitivity of current instruments is far from the threshold which new technologies can provide. We expect the minimum sample size to drop to the sub-milligram level in the future, and with such sensitivity, the CMR spectrometer could be a powerful tool in biochemical and medical research. If this tool is to be utilized to its fullest extent, it is important that we begin now to develop the concepts and techniques needed in the interpretation of CMR data.

We propose, then, to study various classes of steroids in a manner analogous to the amine study, with the goal of developing a program which can "reason out" steroid structures from CMR data, perhaps in combination with mass-spectral data. Ms. Eggert has already collected CMR data on a variety of keto-substituted androstanes and cholestanes to assess the effect of the carbonyl group on the chemical shifts of the steroid-skeleton carbons, and has, in the process, uncovered some mistaken CMR shift assignments published in the literature. We will study a variety of functional groups in this way, deriving general rules for predicting the spectra of more complex steroids. As these rules emerge, we will couple them with the computerized heuristic-search and structure-generation techniques which we have developed in our previous mass- and CMR-spectroscopy research.

PUBLICATIONS -- PART D

R.E. Carhart and C. Djerassi, J. Chem. Soc. (Perkin II), submitted for publication (see attached preprint).

H. Eggert and C. Djerassi, J. Amer. Chem. Soc., in press.
Proofs (if required) by air mail to
Professor Carl Djerassi
Department of Chemistry
Stanford University
Stanford, California 94305

Applications of Artificial Intelligence for Chemical Inference. XI. Analysis of Carbon-13 NMR Data for Structure Elucidation of Acyclic Amines

Raymond E. Carhart* and Carl Djerassi, Departments of Computer Science and Chemistry, Stanford University, Stanford, California, 94305, U. S. A.

This paper describes a computer program, entitled AMINE, which uses a set of predictive rules to deduce the structures of acyclic amines from their empirical formulae and Carbon-13 NMR (CMR) spectra. The results, summarized in Tables 2-5, of testing the program on 102 amines indicate that AMINE is quite accurate and selective, even for large amines with many millions of structural isomers, and demonstrate that the computerized analysis of CMR data can be a powerful analytical tool. The logical structure of the program is outlined here, including a section on the general problem of spectrum matching. Generalizations of the methods used by AMINE are suggested.

I. INTRODUCTION

In recent years, there has been a substantial amount of research directed toward the computerized identification of molecular structure from mass-spectroscopic, NMR, and infra-red data. Our Heuristic DENDRAL program, which relies primarily upon mass-spectral data, has been shown to be quite accurate for certain classes of saturated, acyclic, monofunctional compounds, and more recently, the methods have been extended to the estrogenic steroids [3b]. There are limitations to the information content of mass-spectral data, however, particularly when compounds are considered which have long, perhaps highly branched alkyl chains.
An analysis of the mass spectrum of triheptylamine, for example, yields about 2000 solution structures, and although this is only a small fraction of the roughly 40 million (non-stereochemical) isomers of C21H45N, it is still an impractically large number. The problem is that alkyl moieties do not give characteristic fragmentation patterns, and in fact, most spectroscopic methods are relatively insensitive to their structure. However, recent studies indicate that C-13 nuclear magnetic resonance (CMR) spectroscopy is an exception. For several classes of compounds, rules have been obtained which allow one to predict the CMR spectrum of a substance from its molecular structure, and in all cases, the rules indicate that the chemical shift of any Carbon, even one in a large alkyl chain-end, depends heavily upon branching at nearby centers. Thus, it appears that CMR spectroscopy, either alone or in combination with other methods, could be a powerful tool in the computerized analysis of molecular structure. This paper outlines the methods by which such an analysis may be carried out for the acyclic amines, and describes a FORTRAN IV computer program [10], entitled AMINE, in which these methods are implemented.

This class of compounds was chosen for two reasons. First, the recent work of Eggert and Djerassi has yielded a detailed set of predictive rules for the acyclic amines. Secondly, for a given number of Carbon atoms, a saturated, acyclic amine has decidedly more structural possibilities than most other simple types of acyclic organic compounds (for example, stereochemistry aside, there are nearly 15 million C20-amines, but only about 6 million C20-alcohols), and thus the structural analysis of amines represents a particularly challenging problem.

II.
DEFINITION OF THE PROBLEM

A fully proton-decoupled, natural-abundance CMR spectrum typically consists of a number of sharp peaks representing the resonance frequencies, in the applied magnetic field, of the various types of Carbon atoms present in the sample. A standard compound, commonly TMS, is usually included in the sample to provide a reference frequency, and the peak positions, or chemical shifts, are measured as fractional deviations from this reference, in parts per million (ppm). Previous investigations have shown that the shift of a particular Carbon is determined by its hybridization and local environment, and thus each shift contains some structural information. There are a few ranges of shifts which are characteristic of certain functional groups, such as C=O or C=C, but aliphatic Carbons in most molecules lie in a broad spectral region from which detailed structural information cannot be extracted readily.

For acyclic amines [9a] and a few other types of compounds [9b-k], there exist predictive rules which allow one to calculate the spectrum of a compound whose structure is known, with a typical accuracy of about 1-2 ppm in a total range of roughly 100 ppm. For these classes, the structure-identification problem could in principle be solved via the generation of all possible structures of a particular type (say, acyclic amines with a particular number of Carbon atoms), the prediction of their spectra, and the comparison of these predictions with the observed spectrum. In fact, Sasaki et al. have used this procedure in the automated identification of a few small alkanes. For large molecules, though, the number of possible isomers can be overwhelming, and even a very efficient computer program could not carry out such an analysis in a reasonable length of time. Program AMINE is designed to accomplish the same goal, but in a much more efficient manner.
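To give a sense of how quickly this isomer space grows, the following is a minimal counting sketch. It assumes a Henze-Blair-style recurrence in which an alkyl radical is a Carbon carrying three unordered substituents (hydrogen or smaller alkyl radicals), and an acyclic amine is a Nitrogen carrying three such substituents. Stereochemistry is ignored. The function names are hypothetical, the code only counts structures, and it is not the AMINE structure generator.

```python
from math import comb


def count_triples(r, total):
    """Count unordered triples of substituents whose Carbon counts sum to
    `total`. r[k] is the number of distinct substituents with k Carbons;
    r[0] == 1 stands for a hydrogen atom."""
    count = 0
    for i in range(total // 3 + 1):
        for j in range(i, (total - i) // 2 + 1):
            k = total - i - j          # guaranteed i <= j <= k
            if i == j == k:
                count += comb(r[i] + 2, 3)          # three of one size
            elif i == j:
                count += comb(r[i] + 1, 2) * r[k]   # two equal sizes
            elif j == k:
                count += r[i] * comb(r[j] + 1, 2)
            else:
                count += r[i] * r[j] * r[k]         # all sizes differ
    return count


def alkyl_radical_counts(max_carbons):
    """r[n] = number of structurally distinct alkyl radicals with n Carbons.
    This also equals the number of n-Carbon saturated acyclic alcohols."""
    r = [1]  # r[0]: a bare hydrogen atom
    for n in range(1, max_carbons + 1):
        # The attachment Carbon carries three substituents totalling n - 1.
        r.append(count_triples(r, n - 1))
    return r


def amine_count(r, n_carbons):
    """Number of acyclic amines: three substituents on a central Nitrogen."""
    return count_triples(r, n_carbons)


r = alkyl_radical_counts(20)
print("C20 alcohols:", r[20])
print("C20 amines:  ", amine_count(r, 20))
```

Run as written, the sketch should give counts of the same size as those quoted in the Introduction (about 6 million C20-alcohols and nearly 15 million C20-amines), which is why exhaustive generation and testing of every isomer is impractical for the larger amines.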
AMINE takes, as its only input data, an observed CMR spectrum, the number of Carbons in the amine, and a goodness-of-fit criterion.

The observed spectrum consists of a list of shifts, o = (o_1, o_2, ..., o_n), measured in ppm relative to TMS. Each of these corresponds to one or more Carbons in the sample molecule. Under favorable circumstances, it is possible to determine the number of Carbons corresponding to each observed shift (this will be called the tally of the shift) once the relative peak intensities and the empirical formula are known. If the tally of a shift is known to be at least 2, 3, etc., then the shift is entered in duplicate, triplicate, etc. in the observed-shift list. These tallies are not necessary to the program's operation, but even if they are underestimated, they can add considerably to the speed and accuracy of the analysis.

The number of Carbons, N_c, in the amine must be greater than, or equal to, the number of shifts in the observed spectrum. Generally, N_c cannot be determined from the CMR spectrum, but must be obtained from some other analytical method such as mass spectroscopy or elemental analysis.

The goodness-of-fit criterion, DELTA, which is used in the comparison of o to the predicted spectra of molecules or molecular fragments, represents the maximum expected error in the predictive rules. The amine rules are derived, in part, from the alkane rules of Lindeman and Adams, who note that 95 percent of the studied alkanes have predicted shifts within 1.5 ppm of the observed values. A similar situation exists for the amines, so a value of DELTA = 1.5 ppm has been used in most of this work.

The goal of the program is to find all acyclic, N_c-Carbon amines whose predicted spectra satisfy the following two criteria:

a) Every predicted peak must lie within DELTA of one of the observed peaks; and

b) Within this limit, the predicted shifts must be assignable to the observed ones in such a way that all of the latter are accounted for.

III.
OVERVIEW OF PROGRAM OPERATION

The operation of program AMINE can best be viewed in terms of four interconnected processes: structure generation, pruning, filtering, and spectrum matching. The STRUCTURE GENERATOR builds a pool of increasingly large and complex alkyl chain-ends, and eventually uses these to construct amine molecules. It relies heavily upon the PRUNER to cull from the growing pool any chains which are inconsistent with the observed spectrum, and similarly upon the FILTER to test entire amine molecules. The FILTER also takes care of outputting the acceptable solution structures, and ranking them according to how well they fit the observations. Both the FILTER and the PRUNER use the spectrum MATCHER, which is responsible for the actual comparison of predicted and observed spectra. Each of these processes will be discussed in detail, below.

IV. STRUCTURE GENERATION

The structure generation scheme used in this study, which is related to the enumeration algorithm of Henze and Blair, is applicable only to saturated, acyclic, monofunctional compounds. It is an efficient approach from the standpoint of CMR structural analysis because it rapidly generates substructures which contain a relatively large number of "predictable" Carbons (i.e., those near the ends of alkyl chains), and thus many of these substructures may be ruled out early in the analysis as being inconsistent with the observed data.

At any point in the generation, the STRUCTURE GENERATOR contains a pool of monovalent alkyl radicals which, through pruning (see below), have been found to be consistent with the observed CMR spectrum. The pool initially contains only the -CH3 radical. By attaching one or more of these pool members (along with an appropriate number of hydrogen atoms) to a central Carbon, it constructs new radicals, each of which is passed to the PRUNER for testing.
Any that agree with the observed spectrum are included in the pool, and are subsequently used to construct larger chains. In the final step of the analysis, the STRUCTURE GENERATOR similarly attaches alkyl groups to a central Nitrogen, constructing amine molecules of the proper empirical formula. These it passes to the FILTER for final testing and ranking. At all stages of the generation, tests are made which insure that no radical or amine is considered twice.

As will be discussed below, a given alkyl radical actually undergoes several different tests during pruning, with each test corresponding to a distinct chemical environment in which the chain-end might exist. The STRUCTURE GENERATOR keeps a record of these tests for each pool member, and constantly checks that it is using the radicals in a consistent fashion. If, for example, the PRUNER finds that the ethyl group is consistent with the observed spectrum only if it is attached to Nitrogen in a secondary amine, the STRUCTURE GENERATOR will never construct an n-propyl group, sec-butyl group, or any other radical which contains an ethyl group connected to Carbon. Neither will it generate primary or tertiary amines with N-ethyl groups.

V. PRUNING

The PRUNER is the real heart of program AMINE. It is responsible for keeping the growing chain-end pool to a manageable size by weeding out alkyl radicals which are inconsistent with the observed spectrum. In testing a particular chain-end, R, shown schematically in Figure 1, the basic question considered by the PRUNER is: "Of all possible sets of CMR shifts which R could produce, is at least one consistent with the observed spectrum?" Actually, the question is somewhat more complex, but this provides a good starting point.

Now, according to the predictive rules, a Carbon's shift is determined by the structure which surrounds it, up to four bonds away.
Further, the effect of a first-row atom which is four bonds removed does not depend upon whether that atom is Carbon or Nitrogen. Thus, because X in Figure 1 must contain at least one such atom (namely Nitrogen), the shifts of Cδ and any Carbons "below" it are completely predictable and independent of the internal structure of X. The shifts of the remaining Carbons, Cα, Cβ and Cγ, depend to varying degrees upon the structure of X, with Cα being the most sensitive.

By investigating all possible X structures to a "depth" of four atoms (measured from the R-X bond), the PRUNER could generate an exhaustive list of spectra that R might produce, testing each for inconsistency with the observed one. Usually, though, X contains enough atoms that there are several hundred of these "depth-4" structures, and the above approach proves to be rather cumbersome. Instead, the PRUNER considers only "depth-3" expansions of X, for which Cβ and all Carbons below it are predictable. The shift of Cα is simply ignored, even when a reasonable estimate of its value might be made. This simplification cuts the number of unique X substructures to, at most, 94.

There are two factors which can reduce this number still further. First, some of the substructures may contain too many (or few) Carbons to be consistent with the known atom-count of X. Secondly, there are many cases in which a single predicted spectrum for R may result from two or more related X's. This situation arises because, according to the predictive rules [9a], the shifts of certain types of Carbons (specifically, those which are four or more bonds from Nitrogen, or three if the degree of the amine is known) are not sensitive to the type or distribution of first-row atoms which are four bonds away, but only to their number.
Thus, in the computation of the shift of Cβ, a given depth-3 expansion of X is equivalent to three other structures, and all four may be considered as a single entity in which the outermost atoms of X are represented only by their number (the no. of non-H atoms).

[Structure diagrams not recoverable from the source: one depth-3 expansion of X, the three structures equivalent to it, and the grouped representation annotated with the number of non-H atoms.]

Once such a grouping of X substructures has been done, there remain, at most, 69 cases for the PRUNER to consider. These are summarized in Table 1, where they are further grouped into fifteen classes according to a) the type of atom directly attached to R, b) the degree of that atom and c) if that atom is Carbon, and is attached to Nitrogen, the degree of the amine. The actual purpose of the PRUNER is to consider each of these classes, determining whether at least one class member gives R a predicted spectrum consistent with the observed one, and to return the results of the fifteen class-tests to the STRUCTURE GENERATOR.

The efficiency of this class-by-class investigation can be greatly improved by the inclusion of a hierarchy of pre-tests, each of which is aimed at excluding one or more classes at once. For example, classes 1-12 in Table 1 all have one common feature: The atom to which R is attached is Carbon. Thus, as a pre-test for all twelve classes, the PRUNER treats X as a Carbon whose neighbors are unknown (schematically, X = C-?) and predicts as much of the spectrum of R as possible. If these predictions do not match the observations, it bypasses all further consideration of classes 1-12 and proceeds with the X = N-? pre-test for classes 13-15. Otherwise, it considers a number of more detailed pre-tests, each corresponding to a possible set of neighbors to the central Carbon in X = C-?. The actual hierarchy is outlined in Figure 2.

In each of the pre-tests, the local environment of either Cβ or Cγ is known to a depth of only three atoms, and hence the corresponding shift cannot be predicted precisely.
In most of these cases, the PRUNER can derive upper and lower limits for the shift from the predictive rules. These limits, which define an estimated shift, encompass a relatively small spectral region (0-5 ppm) because the shift of a Carbon is usually not very sensitive to atoms which are four bonds away. Even though the estimated shifts are not exact, they convey useful information to the MATCHER, and thus increase the overall program efficiency.

VI. FILTERING

In the final stages of the analysis, the STRUCTURE GENERATOR constructs amine molecules with N_c Carbons by attaching to a central Nitrogen, one or more alkyl chains which have survived the pruning process. These amines are passed to the FILTER, which is responsible for calculating their total CMR spectra and, via the MATCHER, comparing these predictions with the observations. If an amine passes the test, the FILTER writes out the structure along with the predicted shifts. It then repeats the spectral comparison using progressively smaller values of DELTA until it finds the smallest value, DELMIN, for which a match still exists. In the event that several solution amines result from a particular run of AMINE, these DELMIN values can be helpful in ranking the candidates according to how well they fit the observed spectrum.

VII. SPECTRUM MATCHING

Eventually, the pruning and filtering processes reduce to problems in spectrum matching. Suppose the MATCHER receives for testing a list of m predicted shifts, some of which may be represented by small spectral regions rather than exact values. Now, the predictive rules are not precise, so each shift is actually associated with a range of acceptable values (given the generic symbol r) whose size is controlled by the input parameter DELTA.
This parameter measures the maximum tolerable disagreement between predicted and observed shifts, so the range for a shift, S, extends from S+DELTA to S-DELTA, while that for an estimated shift, bounded above and below by S_u and S_l, extends from S_u+DELTA to S_l-DELTA. It should be noted that, in the latter case, there are really two factors which contribute to the breadth of the range. One is the basic imprecision in the predictive rules, while the other arises because the PRUNER, in its pre-tests, sometimes calculates shifts for Carbons whose environment is not completely known. The spectrum-matching algorithm, though, makes no distinction between these; it only "knows" that a predicted spectrum consists of a list of ranges. This list will be written as r = (r_1, r_2, ..., r_m), with u_i and l_i as, respectively, the upper and lower bounds for the range r_i.

The nature of the observed spectrum, o = (o_1, o_2, ..., o_n), has been discussed in section II. The MATCHER takes these n shifts to be exact, because any estimated uncertainty (usually on the order of 0.1 ppm) in their measurement may be included in the tolerance DELTA.

It is the task of the MATCHER to ascertain whether r could be a subspectrum of o, or, if m = N_c (N_c being the number of Carbons in the amine), whether r could be interpreted as o. If, for a range r_i, there exists an observed shift o_j such that u_i >= o_j >= l_i, then it will be said that r_i can be assigned to o_j. The simplest test of agreement between r and o involves checking that each r_i in r can be assigned to at least one o_j in o. This test does not consider the important condition that, eventually, all shifts in o must be used, and therefore a stronger test can be defined.

If every Carbon in the molecule gives a different observed shift, or if an analysis of peak intensity data gives the tally of each peak, then n = N_c. In this case, it is clear that no two predicted shifts can be assigned to the same o_j.
Thus, referring to Figure 3, r does not match o even though the simple test is not violated, because r_1 and r_2 must both be assigned to the same observed shift. In more complicated cases, each of several r_i's may be assignable to two or more o_j's, and vice versa, so the application of this test in an efficient manner can present difficulties. Fortunately, simple matching theory [14a], an outgrowth of the mathematical field of graph theory, provides a general method (see MATCHING ALGORITHM, below) of finding the maximum number, M, of ranges which can be assigned to the elements of o without duplication. Clearly r cannot match o if this number is less than m.

There may be cases, though, in which complete tally information is unavailable, which means that the number of observed shifts, n, is smaller than the number of Carbons, N_c. In such cases, there are (N_c-n) "extra" shifts which lie somewhere beneath the n observed ones, but there is no way of determining where they belong. It is still possible to strengthen the simple test, but here, the additional constraint is that the predicted spectrum, once assigned, can have no more than (N_c-n) "extra" peaks, either. If the simple test is passed, then every r_i can be assigned to at least one o_j. However, M is the maximum number of ranges which can be assigned to o without duplication, so (m-M) must be the number of "extra" ranges in r. The condition that strengthens the simple test here, then, is (m-M) <= (N_c-n). Because M cannot exceed m, this condition reduces to the previous one (m=M) when n=N_c.

The above spectrum-matching scheme is useful not only in the current study, but for general cases in which a set of predicted CMR shifts of variable uncertainty is to be compared with an observed spectrum with, perhaps, incomplete intensity information.

VIII.
MATCHING ALGORITHM

The algorithm for determining M is related to the so-called "Hungarian method" [14b] of simple matching, but takes advantage of certain special features of the spectrum-matching problem. It may be described briefly as follows: Begin with M=0 and process the o_j's in algebraic order, beginning with the largest. For each o_j, scan the ranges r_i looking for those which satisfy u_i >= o_j >= l_i, but which have not yet been assigned. If there are none, proceed to the next o_j. If there is just one, assign it to o_j, increment M by 1 and proceed to the next o_j. If there are several, assign the one with the largest lower limit (l_i) to o_j, increment M by 1 and proceed to the next o_j. It is possible to prove that this gives the maximum matching between r and o, but a presentation of our proof is beyond the scope of this paper.

IX. RESULTS

Program AMINE has been implemented on the IBM 360/67 and DEC PDP-10 computers. Any mention of timing in the following discussion refers to total execution time (central processor time plus "wait time") on the former machine. The program requires about 35 thousand words of storage. A sample of the program's output is shown in Figure 4.

The only large set of amine CMR spectra available in the literature is that given by Eggert and Djerassi, who used it in the derivation of predictive rules. The set consists of 102 amine spectra, including both shifts and tallies. Three of these spectra correspond to diastereomeric mixtures, and these are not suitable for testing AMINE, because the program assumes that the input spectrum corresponds to a pure compound. Neither is tridodecylamine, because it exceeds the maximum number of Carbons (currently 24) allowed by the program. The remaining 98 amines were used in the testing of the program.
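The greedy procedure of section VIII, together with the acceptance condition (m-M) <= (N_c-n) of section VII, can be sketched as follows. This is an illustrative reading of the prose description only, not the FORTRAN IV code of AMINE. The function names and the shift values in the example are hypothetical.

```python
def shift_range(lower, upper, delta):
    """Acceptable range for a predicted shift (lower == upper) or an
    estimated shift (lower < upper), widened by DELTA on both sides."""
    return (lower - delta, upper + delta)


def max_assignments(ranges, observed):
    """Return M, the number of ranges assigned to observed shifts with no
    observed shift used twice, using the greedy rule described in the text."""
    used = [False] * len(ranges)
    m_assigned = 0
    for o in sorted(observed, reverse=True):      # largest shift first
        candidates = [i for i, (lo, hi) in enumerate(ranges)
                      if not used[i] and lo <= o <= hi]
        if candidates:
            # With several candidates, take the largest lower limit.
            best = max(candidates, key=lambda i: ranges[i][0])
            used[best] = True
            m_assigned += 1
    return m_assigned


def spectrum_matches(ranges, observed, n_carbons):
    """Simple test followed by the strengthened test (m - M) <= (N_c - n)."""
    m, n = len(ranges), len(observed)
    # Simple test: every range must contain at least one observed shift.
    for lo, hi in ranges:
        if not any(lo <= o <= hi for o in observed):
            return False
    return m - max_assignments(ranges, observed) <= n_carbons - n


# Hypothetical three-Carbon example with DELTA = 1.5 ppm.
delta = 1.5
predicted = [shift_range(s, s, delta) for s in (14.0, 14.5, 30.0)]

# Full tallies (n = N_c): two ranges compete for the shift at 14.2.
print(spectrum_matches(predicted, [14.2, 22.9, 30.1], n_carbons=3))  # False

# Incomplete tallies (n < N_c): one "extra" range is allowed.
print(spectrum_matches(predicted, [14.2, 30.1], n_carbons=3))        # True
```

The first call corresponds to the situation the text attributes to Figure 3: the simple test passes, but two ranges can only be assigned to the same observed shift, so the stronger test rejects the match.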
Some experimentation indicated that DELTA = 1.5 ppm was small enough for efficient and selective program operation, yet large enough that about 95% of the test cases gave the correct solution among the output structures. Increasing DELTA by 50% to 2.25 ppm slowed the program by a factor of 2-4, but AMINE always obtained the correct structure with this higher DELTA value. Generally, shift tallies were found to be unnecessary for amines containing fifteen or fewer Carbons, but for larger molecules, the analyses proved to be excessively costly unless all of the Carbons were identified in the observed spectrum.

With DELTA = 1.5 ppm, and using tallies for the amines with sixteen or more Carbons, the program obtained only one answer, the correct one, for the 88 amines listed in Table 2. The six cases summarized in Table 3 gave from two to seven solutions, with the correct structure ranked (see section VI) first or tied (ref. 16) for first. For three of these six, the inclusion of tallies ruled out the incorrect answers. Four amines gave no solutions with DELTA = 1.5 ppm. These were rerun using DELTA = 2.25 ppm, and tallies were included to offset the longer running times. As indicated in Table 4, three of these runs gave only the correct answer, while the fourth yielded two equally ranked solutions, including the correct one.

These analyses required from 0.02 to 100 seconds of computer time, with a typical 10- or 11-Carbon amine using about 1-2 seconds. In none of the runs was an incorrect solution obtained without the accompanying correct one, and in only four cases was it necessary to use the larger DELTA value. The results for the eight amines containing sixteen or more Carbons are especially encouraging: in a reasonable length of time, the program was able to select the correct structure, along with very few others, from an "isomer space" containing from about 300,000 (for N_c = 16) to about 700,000,000 (for N_c = 24) members (ref. 11).
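The size of the isomer space quoted above can be estimated with a short calculation. The sketch below assumes the standard generating-function recurrence for rooted trees with at most three branches per node, and is not necessarily the procedure of Henze and Blair; the function names are hypothetical. It relies on the observation that a saturated acyclic amine with n Carbons corresponds to an alkyl group with n + 1 Carbons, because the Nitrogen, like the root Carbon of an alkyl group, carries three substituents that are each an alkyl group or Hydrogen. Stereoisomers are not distinguished.

```python
from typing import List


def alkyl_counts(max_carbons: int) -> List[int]:
    """a[k] = number of structurally distinct alkyl groups with k Carbons.

    a[0] = 1 stands for a bare Hydrogen. The count for k + 1 Carbons is the
    number of unordered triples of smaller groups with k Carbons in total,
    obtained by averaging over the six orderings of the three branches.
    """
    a = [1]
    for n in range(max_carbons):
        # Ordered triples (i, j, n - i - j): coefficient of x^n in A(x)^3.
        cube = sum(a[i] * a[j] * a[n - i - j]
                   for i in range(n + 1) for j in range(n + 1 - i))
        # Triples with two identical branches: coefficient in A(x) * A(x^2).
        pair = sum(a[n - 2 * k] * a[k] for k in range(n // 2 + 1))
        # Triples with three identical branches: coefficient in A(x^3).
        triple = a[n // 3] if n % 3 == 0 else 0
        a.append((cube + 3 * pair + 2 * triple) // 6)
    return a


def amine_isomers(n_carbons: int) -> int:
    """Structural isomers of the acyclic saturated amine with n_carbons."""
    return alkyl_counts(n_carbons + 1)[n_carbons + 1]


if __name__ == "__main__":
    for n in (4, 16, 20, 24):
        print(n, amine_isomers(n))
```

For small cases the result can be checked by hand: the four C3 amines are propylamine, isopropylamine, N-methylethylamine, and trimethylamine.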
The above results are biased to some extent because the amines used for testing the program are the same ones used by Eggert and Djerassi in the predictive-rule formation. As a test of the generality of the program, analyses have been run on the spectra of four "unknown" amines which do not appear in the original list. The results of these test cases are summarized in Table 5. The spectra of the two 13-Carbon amines were analyzed using DELTA = 1.5 ppm, and no attempt was made to include tallies. Only the correct structure was obtained in these cases. For the two 20-Carbon amines, tallies were measured under special experimental conditions (see below). With DELTA = 1.5 ppm, one of these gave two equally ranked structures, while the other gave none. A rerun of the second case with DELTA = 2.25 ppm yielded five solutions with the correct one ranked as tied for second. This is the only case in which the ranking procedure favored an incorrect answer over the correct one, but here, as in most of the other multiple-result runs, the incorrect structures are sufficiently different from the correct one that they should be distinguishable by mass-spectroscopic techniques.

X. CONCLUDING COMMENTS

Two major conclusions result from this study. First, the CMR spectrum of an acyclic amine appears to be highly characteristic of the structure of the amine. For example, only one of the nearly 15 million (ref. 11) structural isomers of C20H43N gives a predicted spectrum which matches the observed spectrum of N-butyldi(2-ethylhexyl)amine. Thus it can be concluded that CMR data do indeed contain a tremendous amount of structural information. Secondly, it has been found that efficient methods for extracting this information exist, and can be implemented on the digital computer.

There is no reason to believe that these conclusions are peculiar
to the acyclic amines. The computational techniques outlined in this paper can readily be generalized to other classes of saturated, acyclic, monofunctional compounds. To do so for a particular class, one needs to obtain an accurate set of predictive rules and, perhaps, to modify the pruning process slightly to account for special features of those rules. Such rules already exist for alkanes (refs. 9c, 9d), alkenes (ref. 9e), and alcohols (ref. 9b), and as research in CMR spectroscopy progresses, further sets should become available. Extensions to polyfunctional and/or cyclic classes would also require more sophisticated structure-generation methods, but these are available (see, e.g., ref. 17). In short, it appears that the computerized analysis of CMR spectra holds great promise as an accurate and selective tool in the identification of unknown compounds.

XI. EXPERIMENTAL

The four "unknown" amines were prepared, and their proton noise-decoupled CMR spectra obtained, using previously described techniques (ref. 9a). The spectra of the two 20-Carbon amines were also run in the presence of chromium acetylacetonate, and the integrated intensities from these were used to determine the peak tallies (ref. 18). The observed shifts for the four amines are given below, in ppm downfield of internal TMS. The estimated uncertainty in these shifts is 0.1 ppm. For the two 20-Carbon amines, tallies are included in parentheses.

N-(3-methylbutyl)-2-ethylhexylamine: 53.6, 48.6, 39.8, 39.6, 31.7, 29.3, 26.3, 24.8, 23.2, 22.8, 14.1, 11.0.

N-(3-methylbutyl)-1,5-dimethylhexylamine: 53.5, 45.6, 39.9, 39.4, 37.8, 28.1, 26.4, 24.0, 22.7, 20.6.

N-(2-ethylhexyl)-N-(3-methylbutyl)heptylamine: 59.6(1), 95.0(1), 53.1(1), 38.1(1), 36.8(1), 32.3(1), 31.8(1), 29.7(1), 29.4(1), 27.9(2), 26.5(1), 24.9(1), 23.6(2), 23.0(2), 14.3(2), 11.0(1).

N-pentyl-N-(3,3-dimethylbutyl)-3,5,5-trimethylhexylamine: 54.5(1), 52.5(1), 51.9(1), 50.1(1), 40.9(1), 31.1(1), 30.2(3), 30.0(1), 29.8(1), 29.7(3), 27.7(2), 22.9(2), 14.1(1).

XII.
ACKNOWLEDGEMENTS

Thanks are due to Hanne Eggert for preparing the "unknown" amines and obtaining their spectra, and to Larry Masinter together with Drs. N. S. Sridharan and Bruce Buchanan for their helpful discussion and criticism of this work. The financial support for this project provided by the National Institutes of Health (grant RR-612) is gratefully acknowledged.

References

1. For part X, see D. H. Smith, B. G. Buchanan, W. C. White, E. A. Feigenbaum, J. Lederberg, and C. Djerassi, submitted for publication in Tetrahedron.
2. National Institutes of Health Postdoctoral Fellow, 1972-1973.
3. (a) B. G. Buchanan, A. M. Duffield, and A. V. Robertson in "Mass Spectroscopy: Techniques and Applications," ed. G. W. A. Milne, Wiley and Sons, New York, 1971, p. 121; (b) D. H. Smith, B. G. Buchanan, R. S. Engelmore, A. M. Duffield, A. Yeo, E. A. Feigenbaum, J. Lederberg, and C. Djerassi, J. Amer. Chem. Soc., 1972, 94, 5962, and previous papers in the series.
4. A. Buchs, A. M. Duffield, G. Schroll, C. Djerassi, A. B. Delfino, B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum, and J. Lederberg, J. Amer. Chem. Soc., 1970, 92, 6831.
5. (a) H. S. Hertz, R. A. Hites, and K. B. Biemann, Anal. Chem., 1971, 43, 681; (b) L. R. Crawford and D. J. Morrison, ibid., 1971, 43, 1790; (c) D. H. Smith, ibid., 1972, 44, 536; (d) P. C. Jurs, ibid., 1971, 43, 1812, and references cited therein.
6. (a) S.-I. Sasaki, Y. Kudo, S. Ochiai, and H. Abe, Mikrochimica Acta [Wien], 1971, 726; (b) S.-I. Sasaki, S. Ochiai, Y. Hirota, and Y. Kudo, Japan Analyst, 1972, 21, 916.
7. H. Abe and S.-I. Sasaki, The Science Reports of the Tohoku University, Series 1, 1972, 55, 63.
8. For a general discussion of CMR spectroscopy, see "Carbon-13 Nuclear Magnetic Resonance for Organic Chemists," G. C. Levy and G. L. Nelson, Wiley-Interscience, New York, 1972.
9. (a) H. Eggert and C. Djerassi, J. Amer. Chem. Soc., in press; (b) J. D. Roberts, F. J. Weigert, J. I. Kroschwitz, and H. J.
Reich, ibid., 1970, 92, 1338; (c) D. M. Grant and E. G. Paul, ibid., 1964, 86, 2984; (d) L. P. Lindeman and J. Q. Adams, Anal. Chem., 1971, 43; (e) D. E. Dorman, M. Jautelat, and J. D. Roberts, J. Org. Chem., 1971, 36, 2757; (f) D. K. Dalling and D. M. Grant, J. Amer. Chem. Soc., 1967, 89, 6612; (g) M. Christl, H. J. Reich, and J. D. Roberts, ibid., 1971, 93, 3463; (h) D. E. Dorman, S. J. Angyal, and J. D. Roberts, ibid., 1970, 92, 1351; (i) J. K. Crandall and S. A. Sojka, ibid., 1972, 94, 5084; (j) W. R. Woolfenden and D. M. Grant, ibid., 1966, 88, 1496; (k) F. J. Weigert and J. D. Roberts, ibid., 1970, 92, 1347, and references cited therein.
10. Copies of the program, along with sample input decks, may be obtained from the authors.
11. These isomer counts were computed using the enumeration algorithm of Henze and Blair, Reference 13.
12. G. N. La Mar, J. Amer. Chem. Soc., 1971, 93, 1040.
13. H. R. Henze and C. M. Blair, J. Amer. Chem. Soc., 1931, 53, 3042 and 3077.
14. (a) "The Theory of Graphs and its Applications," C. Berge, Wiley and Sons, New York, 1964, p. 92; (b) ibid., p. 99.
15. On the PDP-10, the program runs more slowly by a factor of about four (central processor time only).
16. Two structures are considered to be tied when their DELMIN values differ by 0.1 ppm or less.
17. Exhaustive, irredundant methods for the generation of cyclic structures have recently been developed by L. M. Masinter and N. S. Sridharan as part of our Heuristic DENDRAL project. A manuscript describing their work is in preparation.
18. S. Barcza and N. Engstrom, J. Amer. Chem. Soc., 1972, 94, 1762.

Table 1. The Substructures X Considered by the PRUNER.
[Table of structure diagrams with columns Class, "Depth 3", and X structure(s); the diagrams are not recoverable from the scan.]
Table 2. Cases for which AMINE obtained only the correct structure using DELTA = 1.5 ppm and, except as noted, no tallies.

Amine (prefix only)
methyl
ethyl
propyl
isopropyl
trimethyl
butyl
sec-butyl
isobutyl
tert-butyl
diethyl
pentyl
1-methylbutyl
2-methylbutyl
3-methylbutyl
2,2-dimethylpropyl
N-methyl-sec-butyl
N-methyl-tert-butyl
N-methyldiethyl
hexyl
1,3-dimethylbutyl
1,2,2-trimethylpropyl
2,2-dimethylbutyl
dipropyl
diisopropyl
N-ethylbutyl
N-ethyl-sec-butyl
triethyl
N,N-dimethyl-sec-butyl
N,N-dimethyl-tert-butyl
heptyl
1-methylhexyl
1-ethylpentyl
1,3-dimethylpentyl
N-methylhexyl
N-isopropylbutyl
N-isopropyl-sec-butyl
octyl
1-methylheptyl
2-ethylhexyl
1,5-dimethylhexyl
1,1,3,3-tetramethylbutyl
dibutyl
diisobutyl
N-ethylhexyl
N,N-dimethylhexyl
N,N-diethylbutyl
N,N-diethyl-sec-butyl
N-ethyldiisopropyl
nonyl
N-propylhexyl
N-sec-butylpentyl
N-sec-butyl-3-methylbutyl
N-tert-butyl-3-methylbutyl
N-methyl-1,1,3,3-tetramethylbutyl
tripropyl
decyl
dipentyl
N-butylhexyl
N-tert-butylhexyl
N-sec-butyl-3,3-dimethylbutyl
di(3-methylbutyl)
N-ethyldibutyl
N-ethyldibutyl
N,N-diisopropylbutyl
N-pentylhexyl
N-butyl-1-methylhexyl
N-pentyl-1,3-dimethylbutyl
N-(3,3-dimethylbutyl)pentyl
N-butyl-1-ethylpentyl
N-methyl-N-butylhexyl
N-propyldibutyl
N-isopropyldibutyl
N-(1,3-dimethylbutyl)hexyl
tributyl
N-ethyldipentyl
N-tert-butyldibutyl
N,N-dibutyl-3-methylbutyl
N,N-dibutylhexyl
N,N-dibutyl-3,3-dimethylbutyl
N-sec-butyldipentyl
N,N-dipentyl-1-methylpentyl
tripentyl
tri(3-methylbutyl)
Using tallies:
N,N-dipentyl-1,3-dimethylbutyl
N,N-dibutyl-1,1,3,3-tetramethylbutyl
trihexyl
N-butyldi(2-ethylhexyl)
di(2-ethylhexyl)

Table 3. Cases for which AMINE obtained two or more structures using DELTA = 1.5 ppm and, except as noted, no tallies.

Amine (prefix only): Solutions (prefix only)
dihexyl: dihexyl; N-pentylheptyl (a)
N-pentyl-1,1,3,3-tetramethylbutyl: N-pentyl-1,1,3,3-tetramethylbutyl; N-tert-butyl-1,1-dimethyl-
N-(1-ethylpentyl)-1-propyl-: N-(1-ethylpentyl)-1-propyl-; N-(1-ethylbutyl)-1-propyl-
N,N-dibutylheptyl: N,N-dibutylheptyl; N-butyl-N-pentylhexyl (a)
dioctyl: dioctyl; N-heptylnonyl; N-hexyldecyl
trioctyl: trioctyl; N-heptyl-N-octylnonyl; N,N-diheptyldecyl; N-hexyldinonyl; N-hexyl-N-octyldecyl; N-hexyl-N-heptylundecyl; N,N-dihexyldodecyl

[Rank column: fourteen entries reading "tied" survive; their alignment with the rows above is not recoverable from the scan.]

a) The use of tallies excludes these structures.
b) Tallies were used in these runs.

Table 4. Cases for which AMINE found no structures using DELTA = 1.5 ppm. The correct solutions appeared when DELTA was increased to 2.25 ppm and tallies were included.

Amine
1-isopropylhexylamine
N-pentyl-1,2,2-trimethylpropylamine
N-butyl-N-(1,2,2-trimethylpropyl)pentylamine (a)
N-butyl-N-pentyl(1,1,3,3-tetramethylbutyl)amine

a) A second structure, equally ranked, was found in this case: N-propyl-N-(1,2,2-trimethylpropyl)hexylamine.

Table 5. Results obtained by AMINE for the four "unknown" amines.

Amine (prefix only): DELTA (ppm), tallies used?
N-(3-methylbutyl)-1,5-dimethylhexyl: 1.5, no
N-(3-methylbutyl)-2-ethylhexyl: 1.5, no
N-heptyl-N-(3-methylbutyl)-2-ethylhexyl: 1.5, yes
N-pentyl-N-(3,3-dimethylbutyl)-3,5,5-trimethylhexyl: 2.25 (a), yes
Solutions (prefix only), with rank where legible:
N-(3-methylbutyl)-1,5-dimethylhexyl

N-(3-methylbutyl)-2-ethylhexyl

N-heptyl-N-(3-methylbutyl)-2-ethylhexyl, rank 1 (tied)
N-pentyl-N-(3-methylbutyl)-2-ethylhexyl, rank 1 (tied)

2-ethyl-1,5,5,7,7-pentamethyl-1-(2,2-dimethylpropyl)octyl
N-pentyl-N-(3,3-dimethylbutyl)-3,5,5-trimethylhexyl, rank 2 (tied)
N,N-di(tert-butyl)-2-methyl-2-(2,2-dimethylpropyl)hexyl, rank 2 (tied)
N-tert-butyl-1,1,3-trimethyl-3-(2,2-dimethylpropyl)octyl, rank 2 (tied)
2-ethyl-1,1,5,7,7-pentamethyl-5-(2,2-dimethylpropyl)octyl, rank 2 (tied)

a) With DELTA = 1.5 ppm, no structures were found for this amine.

Figure captions

Figure 1. A schematic illustration of R, the alkyl chain-end to be tested by the PRUNER. The group X contains the Nitrogen atom, along with any carbons and hydrogens not included in R.

Figure 2. The hierarchy of pre-tests used by the PRUNER. A "?" attached to an atom indicates that the neighbors of that atom are unknown at testing time.

Figure 3. A case in which r and o do not match when n = N_c, even though the simple test is passed.

Figure 4. Sample output from program AMINE (PDP-10 version). The solution structure is written in Polish-prefix notation as described in Reference 3a.

[Figures 1 and 2: structure diagrams. Figure 2 is a tree of partial structures leading to the Class 1 tests, the Class 2, 3 and 4 tests, the Class 5 tests, the Class 6 tests, and the Class 7 tests.]

PART A

PERSONNEL (name, department, title, % effort)
[first entry illegible]: Program Director
Feigenbaum, Edward A. (1) (CS): Co-Principal Invest., 10%
Buchanan, Bruce G. (1,2) (CS): Associate Invest., 50%
Duffield, Alan (Ch): Associate Invest., 25%
Smith, Dennis (Ch): Research Associate, 100%
Hammerum, Steen (Ch): Research Associate, 50%
Sridharan, Natesa (CS): Research Associate, 50%
Reiss, Steve (CS): Computer Programmer, 50%
Hjelmeland, Larry (CS): Research Assistant, 100%
Masinter, Larry (CS): Research Assistant, 50%
Stefik, Mark (CS): Research Assistant, 50%
Wharton, Kathy: Admin.
Assistant, 25%
Larson, Dee: Secretary, 25%
(1) See budget notes. (2) In first year, only 9/1/74-4/30/75 covered.
TOTAL PERSONNEL: $80,624

2. CONSULTANT COSTS (include fees and travel): $1,100
3. EQUIPMENT: none
4. SUPPLIES: office supplies, $350
5. STAFF TRAVEL: a. domestic, $1,400; b. foreign, none
6. PATIENT COSTS: none
7. ALTERATIONS AND RENOVATIONS: none
8. OTHER EXPENSES: telephone, postage, etc., $200; publication costs, $700; computer terminal rent, $3,200; computer usage costs, $36,000; total $40,100
9. Subtotal, Items 1 through 8: $123,574
10-11. TRAINEE EXPENSES (for training grants only): none
12. TOTAL DIRECT COST (add subtotals, Items 9 and 11, and enter on Page 1): $123,574

SECTION II - PRIVILEGED COMMUNICATION
Part A: BUDGET ESTIMATES FOR ALL YEARS OF SUPPORT REQUESTED FROM PUBLIC HEALTH SERVICE
DIRECT COSTS ONLY (omit cents)

Description                    1st period   2nd year   3rd year
Personnel costs                    80,624     95,175    100,320
Consultant costs                    1,100      1,200      1,300
Equipment                               -          -          -
Supplies                              350        400        450
Travel, domestic                    1,400      1,600      1,800
Travel, foreign                         -          -          -
Patient costs                           -          -          -
Alterations and renovations             -          -          -
Other expenses                     40,100     45,450     50,000
Total direct costs                123,574    143,825    153,870

TOTAL FOR ENTIRE PROPOSED PROJECT PERIOD (enter on Page 1, Item 4): $421,269

REMARKS: Justify all costs for the first year for which the need may not be obvious. For future years, justify equipment costs, as well as any significant increases in any other category.
If a recurring annual increase in personnel costs is requested, give percentage. (Use continuation page if needed.) See attached budget justification notes.

BUDGET - PARTS B (i) AND B (ii)
MASS SPECTROMETER DATA SYSTEM DEVELOPMENT AND ANALYSIS OF THE CHEMICAL CONSTITUENTS OF BODY FLUIDS

SECTION II - PRIVILEGED COMMUNICATION
DETAILED BUDGET FOR FIRST 12-MONTH PERIOD, 5/1/74 through 4/30/75

1. PERSONNEL (name, department, title, % effort)
Lederberg, Joshua (G): Principal Investigator or Program Director, 3%
Duffield, Alan (Ch): Associate Investig., 25%
Pereira, Wilfred (Ch): Research Associate, 50%
Summons, Roger (Ch): Post Doctoral Fellow, 100%
Rindfleisch, Thomas (E): Research Associate, 100%
Veizades, Nicholas (E): Research Engineer, 100%
Reynolds, Walter (E): Research Engineer, 20%
Tucker, Robert (CS): Computer Programmer, 75%
Wegmann, Annemarie (Ch): Sr. Research Assist., 100%
Steed, Ernest (E): Research Engineer, 10%
Pearson, Dale (E): Electronics Tech., 60%
DeFrancisci, Richard: Machinist, 20%
Allan, Muriel: Secretary, 25%
TOTAL PERSONNEL: $139,830

2. CONSULTANT COSTS (include fees and travel): none
3. EQUIPMENT: computer terminal, $3,000
4. SUPPLIES: office supplies, $750; chemicals, glassware, and lab apparatus, $2,500; GC supplies (gases, phases, columns, etc.), $950; dry ice and liquid nitrogen, $1,500; electronic supplies and parts, $3,500; GC/MS data recording media, $2,100; mini-computer supplies, $1,500; mass spec. repairs and parts, $7,600; total $20,400
5. STAFF TRAVEL: a. domestic, 1 east coast ($500), 1 mid-west ($350), 1 west coast ($150), total $1,000; b. foreign, none
6. PATIENT COSTS: none
7. ALTERATIONS AND RENOVATIONS: mass spectrometer laboratory air conditioning and power modifications, $2,500
8.
OTHER EXPENSES: telephone and data communications, $1,200; publication costs, $1,000; mini-computer maintenance contract, $4,600; computing costs from ACME follow-on, $64,000; total $70,800
9. Subtotal, Items 1 through 8: $237,530
10-11. TRAINEE EXPENSES (for training grants only): none
12. TOTAL DIRECT COST (add subtotals, Items 9 and 11, and enter on Page 1): $237,530

SECTION II - PRIVILEGED COMMUNICATION
Parts B (i) and (ii): BUDGET ESTIMATES FOR ALL YEARS OF SUPPORT REQUESTED FROM PUBLIC HEALTH SERVICE
DIRECT COSTS ONLY (omit cents)

Description                    1st period   2nd year   3rd year
Personnel costs                   139,830    148,066    156,775
Consultant costs                        -          -          -
Equipment                           3,000      3,000      3,000
Supplies                           20,400     21,050     22,250
Travel, domestic                    1,000      1,000      1,000
Travel, foreign                         -          -          -
Patient costs                           -          -          -
Alterations and renovations         2,500          -          -
Other expenses                     70,800     75,000     79,500
Total direct costs                237,530    248,116    262,525

TOTAL FOR ENTIRE PROPOSED PROJECT PERIOD (enter on Page 1, Item 4): $748,171

REMARKS: See attached budget justification.
BUDGET - PART C
EXTENSION OF THE THEORY OF MASS SPECTROMETRY BY COMPUTER

SECTION II - PRIVILEGED COMMUNICATION
DETAILED BUDGET FOR FIRST 12-MONTH PERIOD, 5/1/74 through 4/30/75

1. PERSONNEL (name, department, title, % effort)
Lederberg, Joshua (G): Principal Investigator or Program Director, 3%
Feigenbaum, Edward A. (1) (CS): Co-Principal Invest., 10%
Buchanan, Bruce G. (1,2) (CS): Associate Invest., 50%
Sridharan, Natesa (CS): Research Associate, 50%
Hammerum, Steen (Ch): Research Associate, 50%
White, William (CS): Computer Programmer, 50%
Farrell, Carl (CS): Research Assistant, 100%
Wharton, Kathy: Admin. Assistant, 25%
Larson, Dee: Secretary, 25%
(1) See budget notes. (2) Covers 9/1/74-4/30/75 in year 1.
TOTAL PERSONNEL: $48,521

2. CONSULTANT COSTS (include fees and travel): none
3. EQUIPMENT: none
4. SUPPLIES: $350
5. STAFF TRAVEL: a. domestic, $1,400; b. foreign, none
6. PATIENT COSTS: none
7. ALTERATIONS AND RENOVATIONS: none
8. OTHER EXPENSES: telephone, postage, etc., $200; publication costs, $700; computer terminal rental, $1,600; computer usage, $21,000; total $23,500
9. Subtotal, Items 1 through 8: $73,771
10-11. TRAINEE EXPENSES (for training grants only): none
12.
TOTAL DIRECT COST (add subtotals, Items 9 and 11, and enter on Page 1): $73,771

SECTION II - PRIVILEGED COMMUNICATION
Part C: BUDGET ESTIMATES FOR ALL YEARS OF SUPPORT REQUESTED FROM PUBLIC HEALTH SERVICE
DIRECT COSTS ONLY (omit cents)

Description                    1st period   2nd year   3rd year
Personnel costs                    48,521     61,194     64,655
Consultant costs                        -          -          -
Equipment                               -          -          -
Supplies                              350        400        450
Travel, domestic                    1,400      1,600      1,800
Travel, foreign                         -          -          -
Patient costs                           -          -          -
Alterations and renovations             -          -          -
Other expenses                     23,500     27,650     30,450
Total direct costs                 73,771     90,844     97,355

TOTAL FOR ENTIRE PROPOSED PROJECT PERIOD (enter on Page 1, Item 4): $261,970

REMARKS: See attached budget justification notes.

BUDGET - PART D
APPLICATIONS OF CARBON(13) NUCLEAR MAGNETIC RESONANCE SPECTROMETRY TO ASSIST IN CHEMICAL STRUCTURE DETERMINATION

SECTION II - PRIVILEGED COMMUNICATION
DETAILED BUDGET FOR FIRST 12-MONTH PERIOD, 5/1/74 through 4/30/75

1. PERSONNEL (name, department, title, % effort)
Djerassi, Carl (1) (Ch): Principal Investigator or Program Director, 3%
Carhart, Ray (2) (Ch): Post Doctoral Fellow, 100%
Unnamed (Ch): Post Doc. Res. Assoc., 100%
Van Antwerp, Craig (Ch): Research Assistant, 50%
(1) See budget notes. (2) Covers 9/1/74-4/30/75 in year 1.
TOTAL PERSONNEL: $33,592

2.
CONSULTANT COSTS (include fees and travel): none
3. EQUIPMENT: none
4. SUPPLIES: chemical supplies, $900
5. STAFF TRAVEL: a. domestic, 1 east coast trip, $500; b. foreign, none
6. PATIENT COSTS: none
7. ALTERATIONS AND RENOVATIONS: none
8. OTHER EXPENSES: publication costs and reproduction services, $100; NMR instrument usage (25 hrs/month @ $25/hour), $7,500; computer usage, $10,800; total $18,400
9. Subtotal, Items 1 through 8: $53,392
10-11. TRAINEE EXPENSES (for training grants only): none
12. TOTAL DIRECT COST (add subtotals, Items 9 and 11, and enter on Page 1): $53,392

SECTION II - PRIVILEGED COMMUNICATION
Part D: BUDGET ESTIMATES FOR ALL YEARS OF SUPPORT REQUESTED FROM PUBLIC HEALTH SERVICE
DIRECT COSTS ONLY (omit cents)

Description                    1st period   2nd year   3rd year
Personnel costs                    33,592     53,178     56,176
Consultant costs                        -          -          -
Equipment                               -          -          -
Supplies                              900      1,000      1,100
Travel, domestic                      500        500        500
Travel, foreign                         -          -          -
Patient costs                           -          -          -
Alterations and renovations             -          -          -
Other expenses                     18,400     20,000     22,200
Total direct costs                 53,392     74,678     79,976

TOTAL FOR ENTIRE PROPOSED PROJECT PERIOD (enter on Page 1, Item 4): $208,046

REMARKS: See attached budget justification notes.
COMPOSITE BUDGET - PARTS A+B+C+D

SECTION II - PRIVILEGED COMMUNICATION
DETAILED BUDGET FOR FIRST 12-MONTH PERIOD, 5/1/74 through 4/30/75

1. PERSONNEL (name, department, title, % effort)
Lederberg, Joshua (G): Principal Investigator or Program Director, 10%
Feigenbaum, Edward (CS): Co-Principal Inves., 20%
Djerassi, Carl (Ch): Co-Principal Inves., 3%
Buchanan, Bruce (CS): Associate Inves., 100%
Duffield, Alan (Ch): Associate Inves., 50%
Smith, Dennis (Ch): Research Associate, 100%
Sridharan, Natesa (CS): Research Associate, 100%
Hammerum, Steen (Ch): Research Associate, 100%
Pereira, Wilfred (Ch): Research Associate, 50%
Rindfleisch, Thomas (E): Research Associate, 100%
Carhart, Ray (Ch): Post Doctoral Fellow, 100%
Summons, Roger (Ch): Post Doctoral Fellow, 100%
Unnamed (Ch): Post Doc. Res. Assoc., 100%
See attached sheet.
TOTAL PERSONNEL: $302,567

2. CONSULTANT COSTS (include fees and travel): $1,100
3. EQUIPMENT: computer terminal, $3,000
4. SUPPLIES: see attached sheet, $22,000
5. STAFF TRAVEL: a. domestic, $4,300; b. foreign, none
6. PATIENT COSTS: none
7. ALTERATIONS AND RENOVATIONS: mass spectrometer laboratory air conditioning and power modifications, $2,500
8. OTHER EXPENSES: telephone, data communications, postage, etc., $1,600; publication costs, $2,500; mini-computer maintenance contract, $4,600; NMR instrument usage, $7,500; computer terminal rental, $4,800; computer usage (ACME follow-on, Campus 360/67, and ARPANET), $131,800; total $152,800
9. Subtotal, Items 1 through 8: $488,267
10-11. TRAINEE EXPENSES (for training grants only): none
12. TOTAL DIRECT COST (add subtotals, Items 9 and 11, and enter on Page 1): $488,267

Continuation page

PERSONNEL (continued) (name, department where listed, title, % time or effort)
Veizades, Nicholas (E): Research Engineer, 100%
Reynolds, Walter (E): Research Engineer, 20%
Steed, Ernest (E): Research Engineer, 10%
White, William (CS): Computer Programmer, 50%
Tucker, Robert (CS): Computer Programmer, 75%
Reiss, Steve (CS): Computer Programmer, 50%
Wegmann, Annemarie (Ch): Senior Research Assistant, 100%
Pearson, Dale (E): Electronics Technician, 60%
Hjelmeland, Larry (CS): Research Assistant, 100%
Masinter, Larry (CS): Research Assistant, 50%
Stefik, Mark (CS): Research Assistant, 50%
Farrell, Carl (CS): Research Assistant, 100%
Van Antwerp, Craig (Ch): Research Assistant, 50%
Wyche, Margaret: Laboratory Technician, 50%
DeFrancisci, Richard: Machinist, 20%
Wharton, Kathy: Administrative Assistant, 50%
Larson, Dee: Secretary, 50%
Allan, Muriel: Secretary, 25%

SUPPLIES
Office supplies: $1,450
Chemicals, glassware, and laboratory apparatus: $3,400
GC supplies (gases, phases, columns, etc.): $950
Dry ice and liquid nitrogen: $1,500
Electronic supplies and parts: $3,500
GC/MS data recording media (chart paper, Calcomp, etc.): $2,100
Mini-computer supplies (paper, ribbons, tapes, disks, etc.): $1,500
Mass spectrometer repairs and replacement parts: $7,600
Total: $22,000
COMPOSITE - PARTS A, B, C, & D

SECTION II - PRIVILEGED COMMUNICATION
BUDGET ESTIMATES FOR ALL YEARS OF SUPPORT REQUESTED FROM PUBLIC HEALTH SERVICE
DIRECT COSTS ONLY (omit cents)

Description                    1st period   2nd year   3rd year
Personnel costs                   302,567    357,613    377,926
Consultant costs                    1,100      1,200      1,300
Equipment                           3,000      3,000      3,000
Supplies                           22,000     22,850     24,250
Travel, domestic                    4,300      4,700      5,100
Travel, foreign                         -          -          -
Patient costs                           -          -          -
Alterations and renovations         2,500          -          -
Other expenses                    152,800    168,100    182,150
Total direct costs                488,267    557,463    593,726

TOTAL FOR ENTIRE PROPOSED PROJECT PERIOD (enter on Page 1, Item 4): $1,639,456

BUDGET DETAIL AND JUSTIFICATION

The budgets for the DENDRAL Project are presented in four parts, corresponding to the four proposal sections: A, B(i) and (ii), C, and D. Parts A and C represent the portions concerned with Heuristic and Meta-DENDRAL; Part B deals with the data system automation and instrument maintenance functions as well as the development aspects of GC/MS analysis of body fluids; and Part D is an extension of DENDRAL methodology to Carbon (13) nuclear magnetic resonance spectrometry.

As a general note, Professor Lederberg will devote a total of 10% of his time to this research as the Principal Investigator. His time is budgeted as follows: 4% on Part A, 3% on Part B, and 3% on Part C.
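As a cross-check on the composite figures, the yearly direct costs of the four parts can be summed and compared with the composite budget. The short sketch below uses only the total direct cost figures from the budget estimate pages; the variable names are illustrative.

```python
# Total direct costs (dollars) for years 1-3, from the Part A-D estimate pages.
parts = {
    "A": [123_574, 143_825, 153_870],
    "B": [237_530, 248_116, 262_525],
    "C": [73_771, 90_844, 97_355],
    "D": [53_392, 74_678, 79_976],
}
# Composite total direct costs for years 1-3, from the composite page.
composite = [488_267, 557_463, 593_726]

# Three-year total for each part.
part_totals = {name: sum(years) for name, years in parts.items()}
# Sum over the four parts for each year.
yearly = [sum(years[y] for years in parts.values()) for y in range(3)]
grand_total = sum(yearly)

if __name__ == "__main__":
    for name, total in part_totals.items():
        print(f"Part {name}: {total:,}")
    print("Yearly sums match composite:", yearly == composite)
    print(f"Entire project period: {grand_total:,}")
```

The same pattern can be applied to any single category (personnel, supplies, travel) by substituting that row's figures.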
The narrative comments on Parts A and C have been combined below because the personnel and computer resources overlap to a large extent.

BUDGET EXPLANATION - PARTS A & C

PERSONNEL:

a) The personnel on the DENDRAL staff constitute its most valuable resource. All of the people listed in the proposal are now working on the DENDRAL Project. All are necessary to support the high level of scientific activity in Chemistry (A. Duffield, D. Smith, S. Hammerum, and L. Hjelmeland) and Computer Science (E. Feigenbaum, B. Buchanan, N. Sridharan, W. White, S. Reiss, M. Stefik, L. Masinter, and C. Farrell). Mr. Mark Stefik's status will have changed to Research Assistant for Part A from his current status as Computer Programmer on Part B. Mr. Steve Reiss' salary has been increased in order to properly compensate him for the duties he performs. Recent changes in draft board policies allow Conscientious Objectors to receive higher compensation to reflect actual job duties. Specific University approval has been requested for this increase but has not yet been received. Mr. Larry Masinter has previously been paid from other funds, but is essential to the NIH-related work.

b) Salary figures are increased annually by 5% for merit increases and promotions. Fringe benefits are budgeted at the standard University rates of 17% through 8/74 and are increased annually per University projections to 18.3% in 9/74, 19.3% in 9/75, and 20.4% in 9/76. No new personnel are added in Year 2. However, the salary budget increases by more than the rates noted above because all of Dr. Buchanan's salary is covered (see c) below) and Professor Feigenbaum returns from his leave of absence (see d) below).

c) Bruce Buchanan currently has an NIH Career Development Award through 6/31/76. However, because of recent NIH budget cutbacks, there is a strong probability that this award will be cancelled before that date. Dr.
Ferguson of NIH stated on 2/[illegible]/73 that the award could only be guaranteed through [illegible]/74.

d) As noted in the Introduction to this proposal, Dr. Feigenbaum will be on leave of absence with ARPA for a period of two years. This overlaps the term of this grant application such that no salary is budgeted for Dr. Feigenbaum during the first grant year. His salary is budgeted starting in the second grant year when he will formally return to his position in this research project.

EQUIPMENT: No equipment purchases are required for Parts A and C.

SUPPLIES AND TRAVEL: Office supplies are budgeted based on our experience over the past year. The travel budget covers expected costs for attending professional meetings and maintaining contact with related work at other locations. Because Artificial Intelligence is a rapidly expanding field, it is essential to maintain a high degree of personal interaction in order to assimilate new developments. These budget items are increased and rounded at 10% per year.

OTHER EXPENSES: Telephone costs include connections and usage for computer terminals. Publication costs are budgeted at a nominal rate based on past experience and are increased by 10% per year. In the category of Computer Terminal Rent, the budget for Part A includes the lease cost of 2 portable Texas Instruments terminals. An additional terminal is added in 5/75 to accommodate increased use of the programs by personnel and a larger community of Stanford users. The Part C budget covers the continued lease of one T.I. terminal and an additional terminal starting in 5/75. Computer time is budgeted according to current rate structures based on our on-going experience in utilizing the Stanford (SCC) 360/67 and machines available via the ARPANET. We will not make use of the ACME follow-on machine (370/158) for Parts A and C because of the availability of superior LISP facilities on these other machines.
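The salary and fringe-benefit rule stated under PERSONNEL for Parts A & C (5% annual salary increase; staff benefits of 17% through 8/74, then 18.3%, 19.3%, and 20.4% from each following September) can be sketched as follows. This is only an illustration under stated assumptions: the grant year is taken to run May through April (the project period begins 5/1/74), the 5% increase is assumed to take effect at the start of each grant year, and the function names and the $12,000 base salary are hypothetical.

```python
from datetime import date

# Fringe-benefit rates and the date each takes effect, newest first.
# The 17% rate applies to everything before 9/74.
FRINGE_SCHEDULE = [
    (date(1976, 9, 1), 0.204),
    (date(1975, 9, 1), 0.193),
    (date(1974, 9, 1), 0.183),
    (date(1900, 1, 1), 0.170),
]

ANNUAL_RAISE = 0.05      # merit increases and promotions
FIRST_GRANT_YEAR = 1974  # grant years assumed to start on May 1


def fringe_rate(month_start):
    """Return the benefit rate in force for the month starting on month_start."""
    for effective, rate in FRINGE_SCHEDULE:
        if month_start >= effective:
            return rate
    raise ValueError("date precedes the fringe schedule")


def grant_year_cost(base_salary, grant_year):
    """Salary plus benefits for one grant year (1, 2 or 3).

    Assumption: the 5% raise is applied once, at the start of each grant year.
    """
    monthly_salary = base_salary * (1 + ANNUAL_RAISE) ** (grant_year - 1) / 12
    total = 0.0
    for offset in range(12):
        months_from_january = 4 + offset  # May is month index 4
        year = FIRST_GRANT_YEAR + (grant_year - 1) + months_from_january // 12
        month = months_from_january % 12 + 1
        total += monthly_salary * (1 + fringe_rate(date(year, month, 1)))
    return total


for year in (1, 2, 3):
    print(year, round(grant_year_cost(12_000, year), 2))
```

Because the benefit rate changes in September, each May-to-April grant year blends four months at one rate with eight months at the next, which is one reason personnel costs rise by somewhat more than 5% per year even with no change in staffing.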
Instrument data will be communicated from the 370/158 (see Part B) to the LISP programs for analysis.

BUDGET EXPLANATION - PARTS B(i) AND B(ii)

This budget covers instrumentation maintenance, data system development, and research into applications of GC/MS analysis of body fluids as described in Parts B(i) and B(ii) of this proposal. This budget represents a significant increase over that submitted for Part B of the DENDRAL grant currently in progress (current budget $60,000 per year). The major reasons for this increase are twofold: a) increases in required personnel support because of corresponding decreases in support from other sources and b) the need to implement our computing support from a source other than the ACME 360/50, for which NIH funding is terminating. We have rigorously attempted to keep these increases to an absolute minimum consistent with maintaining the viability of our unaugmented research program.

We have previously received substantial support for our GC/MS research from NASA. Because of shifting federal priorities, however, NASA support has declined substantially and, we project, will terminate in the first year of this renewal. At the same time, our research has been moving to emphasize more and more heavily GC/MS applications in clinically related aspects of metabolic indicators of disease. Thus it is reasonable, as well as necessary, that support for this continued research shift to NIH.

As mentioned in the Introduction to this proposal, we have an application pending with NIH-GMS for support in applying these techniques to aspects of genetic disease. These proposals are complementary in goals and it is assumed in this budget that the Genetics Center proposal will provide support for a major fraction (approximately 50%) of the low resolution GC/MS laboratory (Finnigan 1015 instrument) including personnel, supplies, etc. There is, however, a small amount of operational manpower overlap between the two proposed efforts.
If both proposals are funded, a savings will result through common operational support which will be negotiated with NIH at the appropriate time.

As discussed under future plans for Part B(i) of this proposal, we have had to plan an alternative source of computing to support this research because NIH subsidy of the ACME facility terminates in July 1973. We have chosen to use the Stanford-sponsored follow-on to ACME, mounted on an IBM 370/158, since our computer programs will operate with a minimum of modification. This facility will operate on a fee-for-service basis. Whereas its rate structure is still evolving, we have estimated, on the basis of available information, the cost of transferring our computing to that facility as reflected in our budget ($64,000 per year). It should be noted that this rate structure does not include indirect charges at this time. As the rate structure becomes better defined, the indirect cost may be fully included in the usage rates. This would necessitate a slight modification of the budget as will be negotiated with NIH as appropriate.

The following gives a detailed description of the various components of the Part B budget:

PERSONNEL: The personnel budgeted for GC/MS applications, laboratory operations, and data system development are necessary to achieve our research goals and are currently active in the GC/MS programs. Chemistry support for the interpretation of body fluid analyses in cooperation with our clinical collaborators includes Drs. A. Duffield (29%), W. Pereira (20%), and R. Summons (100%). M. Wyche provides laboratory and instrument operation support for the low resolution GC/MS laboratory. Messrs. Rindfleisch, Veizades, Reynolds, and Tucker are essential to the data system development effort and provide hardware and software maintenance support as well. Messrs. Rindfleisch (100%) and Tucker (75%) are primarily responsible for the software system design, implementation, and maintenance. Mr.
Veizades (100%) is primarily concerned with the hardware maintenance and development aspects of the high resolution MAT-711 instrument and Mr. Reynolds (20%) with the Finnigan 1015 low resolution instrument. Ms. A. Wegmann (100%) is responsible for the operation of the high resolution GC/MS instrument (MAT-711). Mr. Steed (10%) provides necessary glasswork development and maintenance, Mr. Pearson (60%) supports the fabrication and repair of electronic hardware for both instruments, and Mr. DeFrancisci (40%) provides necessary machinist support for mechanical repairs and fixtures. Ms. Allan (25%) provides required secretarial support for the above Instrumentation Research Laboratory personnel. This manpower complement is carried into the future years as shown. Salaries are increased by 5% per year and staff benefits are applied at standard University rates. These start at 17% in fiscal year 1974 (9/73 - 8/74) and increase to 18.3% in 9/74, 19.3% in 9/75, and 20.4% in 9/76 based on University projections.

EQUIPMENT: Our request for additional equipment is minimal. We budget for the purchase of a computer terminal in the first year for $3,000. This replaces a currently rented terminal integral to the data system and saves $5,280 over the three year grant period by purchasing instead of continued rental. In the second year we budget for an event counter necessary for proper equipment maintenance, for which we are assuming responsibility. We already maintain the Finnigan 1015 instrument and will take over the MAT-711 because of progressively poorer performance by Varian Associates in maintaining that instrument over the past year. This equipment is also needed to implement experimental control functions on the mass spectrometer. In the third year, replacement of outdated test equipment will be required; $3,000 is budgeted for this purpose.
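The terminal purchase described under EQUIPMENT implies a rental rate that the text does not state directly. A minimal sketch of that arithmetic follows, assuming the rental would otherwise continue for all 36 months of the grant period and that ownership carries no extra maintenance cost; both assumptions and all variable names are illustrative, not taken from the proposal.

```python
PURCHASE_PRICE = 3_000   # first-year equipment request for the terminal
STATED_SAVINGS = 5_280   # savings over the grant period, as stated
GRANT_MONTHS = 36        # three-year grant period

# If buying saves $5,280 relative to renting, the rental it replaces
# would have cost the purchase price plus the savings.
implied_rental_total = PURCHASE_PRICE + STATED_SAVINGS
implied_monthly_rent = implied_rental_total / GRANT_MONTHS

# Months of rental that would cost as much as buying outright.
breakeven_months = PURCHASE_PRICE / implied_monthly_rent

print(implied_rental_total, implied_monthly_rent, round(breakeven_months, 1))
```

On these assumptions the purchase pays for itself a little over a year into the grant period.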
SUPPLIES: Supplies are budgeted based on our actual operating experience and are minimized consistent with a viable research effort. Office supplies include stationery supplies, postage, reproduction services, etc. and are budgeted at $63 per month. The budget for chemicals, glassware, and laboratory apparatus ($2,900) provides the necessary materials for derivatizing and analyzing body fluid samples. GC supplies ($950) and dry ice and liquid nitrogen ($1,500) are necessary for instrument operation and are based on past experience. The largest part of the liquid nitrogen budget is used for the high resolution instrument. Electronics supplies and parts ($3,500) include circuit boards, semi-conductors, etc. needed for mass spectrometer control electronics, such as for the metastable acquisition system, as well as for maintaining our existing test equipment (oscilloscopes, voltmeters, power supplies, etc.). GC/MS data recording media ($2,100) include chart and Calcomp plotter papers of various types (including UV-sensitive paper for the MAT-711) for the purpose of recording mass spectrometer and gas chromatograph effluent data. The budgeted amount reflects our usage over the past year. Similarly, mini-computer supplies ($1,500) include Teletype and line printer paper and ribbons, magnetic tapes (DEC tape and IBM compatible tape), and disk cartridges based on previous usage history. The budget for mass spectrometer repairs and replacement parts ($7,600) covers our maintenance of these instruments based in part on predictable replacements (filaments, multipliers, etc.) and in part on an estimate from previous experience of unscheduled problems (power supplies, valves, pumps, etc.). The supplies budget for future years covers these same items with 6% added for increased usage and inflation.
TRAVEL: We have budgeted for travel to attend professional meetings and to visit other GC/MS laboratories on the basis of 1 east coast trip ($500), 1 mid-west trip ($350), and 1 west coast trip ($155).

ALTERATIONS AND RENOVATIONS: We have had problems with thermal overloads on the high resolution mass spectrometer instrument and associated electronics during the summer months. In addition, because of the modified computing configuration required by the ACME transition, we will locate a disk and printer equipment in the same laboratory to support the mini-computer interfacing the MAT-711. These conditions require an augmentation to existing air-conditioning and power facilities in the laboratory estimated at $2,500.

OTHER EXPENSES: We budget for telephone and data communications service based on our current experience ($100 per month). In addition, $[illegible] is budgeted for publication costs and $4,600 for mini-computer maintenance. This maintenance is an extension of our current contract with Digital Equipment Corporation and includes the prevailing 10% discount in the Stanford/DEC contract. We budget for data reduction and storage computing costs on the ACME follow-on machine (370/158) as follows, based on our ACME experience and current information on the follow-on system rate structure. We consume approximately 300,000 page-minutes of computing per month on ACME for development and production computing. At a rate of $.02 per page-minute, this comes to $6,000 per month. In addition, we use approximately $2,000 per month for data storage (20,000 blocks at $.10 per block per month). This gives a total of $96,000 per year and, applying a projected 30% discount rate for high volume usage, leaves an estimated net cost of $64,000 per year. These estimates are increased by 6% in succeeding years for increased usage and inflation.
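The computing estimate under OTHER EXPENSES can be reproduced step by step. The sketch below reads the quoted figures as 300,000 page-minutes per month at $.02, 20,000 storage blocks at $.10 per block per month, and a net budget of $64,000 per year; those readings, the use of integer cents, and the helper name `escalate` are assumptions made for illustration.

```python
# Usage and rates, with rates held in cents to keep the arithmetic exact.
PAGE_MINUTES_PER_MONTH = 300_000
CENTS_PER_PAGE_MINUTE = 2
STORAGE_BLOCKS = 20_000
CENTS_PER_BLOCK_MONTH = 10
STATED_NET_PER_YEAR = 64_000  # dollars, as budgeted

compute_monthly = PAGE_MINUTES_PER_MONTH * CENTS_PER_PAGE_MINUTE // 100
storage_monthly = STORAGE_BLOCKS * CENTS_PER_BLOCK_MONTH // 100
gross_annual = 12 * (compute_monthly + storage_monthly)

# Discount implied by the stated net figure, for comparison with the
# high-volume discount quoted in the text.
implied_discount = 1 - STATED_NET_PER_YEAR / gross_annual


def escalate(amount, years, rate=0.06):
    """Apply the 6% yearly allowance for increased usage and inflation."""
    return amount * (1 + rate) ** years


print(compute_monthly, storage_monthly, gross_annual)
print(round(implied_discount, 3))
print([round(escalate(STATED_NET_PER_YEAR, y)) for y in range(3)])
```

Taken at face value, the $64,000 net figure corresponds to a discount of about one third of the $96,000 gross, somewhat more than the 30% quoted.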
BUDGET EXPLANATION - PART D

This budget covers the portion of the research program which extends the DENDRAL methodology to Carbon(13) Nuclear Magnetic Resonance Spectrometry.

PERSONNEL: The personnel budget includes a salary for Dr. R. Carhart after the expiration of his NIH Fellowship in 6/74, one Post Doctoral Research Associate (to be added to the staff), and one half-time Research Assistant (Mr. Van Antwerp). No funding is requested for Dr. Carl Djerassi's time (3%). A Computer Programmer (to be added to the staff) is budgeted in 1975 to assume the additional anticipated programming duties. Salaries are increased by 5% per year and staff benefits are applied at standard University rates. These start at 17% in fiscal year 1974 (9/73 - 8/74) and increase per University projections to 18.3% in 9/74, 19.3% in 9/75, and 20.4% in 9/76.

SUPPLIES: We budget $900 for chemical supplies for the preparation of test samples.

TRAVEL: We budget $500 to cover one east coast trip.

OTHER EXPENSES: Other expenses include $100 for publication and reproduction costs and $7,500 for usage of the existing NMR instrument in the Department of Chemistry. This NMR usage is budgeted at standard rates covering 25 hours of usage per month at $25 per hour. In addition, we budget for use of the Stanford (SCC) 360/67 computer where CMR analysis programs, at the current level of development, are run. These costs are computed on the basis of 1.5 hours of usage per month at approximately $[illegible] per hour.

BIOGRAPHIES

BIOGRAPHICAL SKETCH
NAME: LEDERBERG, JOSHUA
TITLE: Professor and Executive Head, Department of Genetics
BIRTHDATE (Mo., Day, Yr.): 5-23-25
PLACE OF BIRTH (City, State, Country): Montclair, New Jersey
PRESENT NATIONALITY (If non-US citizen, indicate kind of visa and expiration date): U.S.A.
SEX: Male
EDUCATION (Begin with baccalaureate training and include postdoctoral)
INSTITUTION AND LOCATION, DEGREE, YEAR CONFERRED, SCIENTIFIC FIELD
Columbia College, New York, B.A.
1944
College of Physicians & Surgeons, Columbia University, New York (1944-46)
Yale University, Ph.D., 1947, Microbiology

HONORS
1957 - National Academy of Sciences
1958 - Nobel Prize in Medicine

MAJOR RESEARCH INTEREST: Molecular Genetics; Artificial Intelligence
ROLE IN PROPOSED PROJECT: PRINCIPAL INVESTIGATOR
RESEARCH SUPPORT (See instructions): SEE ATTACHMENTS

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 2 pages for each individual.)

1961-       Stanford University. Director, Kennedy Laboratories for Molecular Medicine
1959-       Professor, Genetics and Biology, and Executive Head, Department of Genetics, Stanford University
1957-1959   University of Wisconsin. Chairman, Department of Medical Genetics
1957        Melbourne University, Australia. Fulbright Visiting Professor of Bacteriology
1950        University of California, Berkeley. Visiting Professor of Bacteriology
1947-1959   University of Wisconsin. Professor of Genetics
1946-1947   Yale University. Research Fellow of the Jane Coffin Childs Fund for Medical Research
1945-1946   Columbia University. Research Assistant in Zoology

Professional Activities:
1967-       NIMH: National Mental Health Advisory Council
1961-1962   President (Kennedy)'s Panel on Mental Retardation
1960-       NASA Committees: Lunar and Planetary Missions Board
1958-       National Academy of Sciences: Committees on Space Biology
1950-       President's Science Advisory Committee panels; National Institutes of Health, National Science Foundation Study sections (genetics)
RESEARCH SUPPORT SUMMARY FOR JOSHUA LEDERBERG

1) NASA: NGR-05-020. Cytochemical Studies of Planetary Micro-organisms.
   Current year $180,000; total award $3,800,000; grant term 9/60-8/73; budgeted time 4%. (Future support dubious)
2) NIH: AI-05160. Genetics of Bacteria.
   Current year $60,000; total award $280,000; grant term 9/68-8/73; budgeted time 15%. (Renewal pending)
3) NIH: RR-00311. Advanced Computer for Medical Research (ACME), Stanford Medical School Facility.
   Current year $362,632; total award $2,612,632 (yrs 4-7); grant term 1966-7/73; budgeted time 25%. (See #5)
4) NIH: GM-. Genetics Research Center (J. Lederberg, Principal Investigator).
   Current year $547,035; total award $2,609,383; grant term 9/73-8/78; budgeted time 10%. (Pending)
5) NIH: RR-00785. Stanford University Medical Experimental Computer Facility (SUMEX). Successor to #3.
   Current year $884,660; total award $5,960,417; grant term 9/73-8/78; budgeted time 20%. (Pending)
6) NIH: Computer Laboratory Health Care Resource Program. Large Scale Screening of Body Fluids for Metabolic Signs of Disease with Computer-managed Gas Chromatography and Mass Spectrometry.
   Current year $159,881; total award $900,238; grant term 9/73-8/78; budgeted time 10%. (Pending, Program funds impounded)
7) NIH: GM00295. Training Grant in Genetics.
   Current year $143,964; total award $756,650; grant term 7/69-6/73; budgeted time 15%. (Renewal pending)

SELECTED LIST OF PUBLICATIONS

Lederberg, J., 1959. A View of Genetics. Les Prix Nobel en 1958: 170-89.

Buchs, A., A. B. Delfino, A. M. Duffield, C. Djerassi, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, 1970. Applications of Artificial Intelligence for Chemical Inference, VI. Approach to a general method of interpreting low resolution mass spectra with a computer. Helvetica Chimica Acta 53 (6): 1394-1417.

Feigenbaum, E. A., B. G. Buchanan, J. Lederberg, 1971. On generality and problem solving: a case study using the DENDRAL program, in Machine Intelligence 6 (B. Meltzer and D. Michie, eds.), Edinburgh University Press, p. 165-190.

Reynolds, W. E., V. A. Bacon, J. C. Bridges, T. C. Coburn, B. Halpern, J. Lederberg, E. C. Levinthal, E. Steed, R. B. Tucker, 1970. A Computer Operated Mass Spectrometer System. Analytical Chem. 42: 1122-1129, September 1970.

Lederberg, J. "Use of Computer to Identify Unknown Compounds: The Automation of Scientific Inference" in Biochemical Applications of Mass Spectrometry (G. R. Waller, ed.). John Wiley & Sons, New York (in press).

SECTION II - PRIVILEGED COMMUNICATION
Principal Investigator: Carl Djerassi

BIOGRAPHICAL SKETCH (Give the following information for all professional personnel listed on page 3, beginning with the Principal Investigator. Use continuation pages and follow the same general format for each person.)
NAME: Carl DJERASSI
TITLE: Professor of Chemistry
BIRTHDATE (Mo., Day, Yr.): October 29, 1923
PLACE OF BIRTH (City, State, Country): Vienna, Austria
PRESENT NATIONALITY (If non-US citizen, indicate kind of visa and expiration date): U.S.A.
SEX: Male
EDUCATION (Begin with baccalaureate training and include postdoctoral)
INSTITUTION AND LOCATION, DEGREE, YEAR CONFERRED, SCIENTIFIC FIELD
Kenyon College, A.B. (summa cum laude), 1942, Chemistry, Biology
University of Wisconsin, Ph.D., 1945, Organic chemistry, Biochemistry (minor)

HONORS: Hon. D.Sc., Natl. Univ. of Mexico (1953), Kenyon College (1958), Worcester Polytechnic Institute (1972); Hon. Prof., Fed. Univ. Rio de Janeiro (1969). Member, U.S. National Academy of Sciences, American Academy of Arts and Sciences; foreign member, Royal Swedish Academy of Sciences, German Academy of Natural Scientists (Leopoldina), Brazilian Academy of Sciences, (cont. below)

MAJOR RESEARCH INTEREST: Nat. prod. chemistry (steroids, alkaloids, terpenoids, antibiotics) and chem. applications of physical methods (mass spec., optical rotatory dispersion, circular dichroism)
ROLE IN PROPOSED PROJECT: Principal Investigator

RESEARCH SUPPORT (See instructions)

Grant                 Title                                                   Period               Current Year   Total      % Time/Effort Budgeted
NIH AM 04257          Mass Spectrometry in Organic and Biochemistry           10/1/70 to 9/30/75   $56,833        $316,016   10%
NIH GM AM 06840-15    Marine Chemistry with special emphasis on steroids      1/1/73 to 12/31/77   $112,550       $578,180   18%

The second entry is a pending application which, if approved, will represent a renewal of my current NIH Grants No. GM 06840 and No. AMCA-12785, both of which expire in 1973.

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)

Academic Experience: Professor of Chemistry, Stanford University, 1959-present. Associate Professor (1952-1954) and Professor (1954-1959), Wayne State University.

Industrial Research Experience: Ciba Pharmaceutical Co., Summit, N.J.: Research Chemist, 1942-1943 and 1945-1949. Syntex Corporation: Associate Director of Chemical Research (Mexico City) 1949-1952; Research Vice President (Mexico City) 1957-1960, (Palo Alto, California) 1960-1968; President, Syntex Research 1968-present.

Editorial Boards: (Current) Journal of the American Chemical Society, Steroids, Tetrahedron, Organic Mass Spectrometry.

(continued on next page)

Honors (cont.): Mexican Academy for Scientific Investigation. Hon. Fellow of Phi Lambda Upsilon, Amer. Academy of Pharmaceutical Sciences, British Chemical Society and Mexican Chemical Society, Phi Beta Kappa. Numerous hon. lectureships including 1964 Centenary Lecturer (The British Chemical Society) and 1969 Annual Chemistry Lecturer, Royal Swedish Academy of Engineering. American Chemical Society Award in Pure Chemistry (1958), Baekeland Medal (1959), Fritzsche Award (1960), Intra-Science Research Foundation Award (1969).
Freedman Patent Award of American Institute of Chemists (1971). Foreign Member, Royal Swedish Academy of Sciences (1972). D.Sc. (hon.), Worcester Polytechnic Institute (1972). Scheele-Lecturer, Pharmaceutical Society of Sweden (1972); American Chemical Society's Award for Creative Invention.

BIOGRAPHICAL SKETCH (C. Djerassi) - Continuation page
Principal Investigator: Carl Djerassi

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (cont.)

Miscellaneous: Chairman of the AAAS Gordon Research Conferences on Steroids and Natural Products (1952-1954); Member of American Pugwash Committee (1968 to present); Chairman of Latin America Science Board of National Academy of Sciences (1966-1968); Chairman of National Academy's Board on Science and Technology for International Development.

PUBLICATIONS: Author or co-author of 750 publications and six books. Approximately 150 papers and one book deal with various applications of chiroptical methods in organic and biochemistry.

SECTION II - PRIVILEGED COMMUNICATION

BIOGRAPHICAL SKETCH
NAME: Feigenbaum, Edward A.
TITLE: Principal Investigator, DENDRAL Project
BIRTHDATE (Mo., Day, Yr.): 1-20-36
PLACE OF BIRTH (City, State, Country): Weehawken, New Jersey
PRESENT NATIONALITY (If non-US citizen, indicate kind of visa and expiration date): U.S. Citizen
SEX: Male
EDUCATION (Begin with baccalaureate training and include postdoctoral)
INSTITUTION AND LOCATION, DEGREE, YEAR CONFERRED, SCIENTIFIC FIELD
Carnegie Institute of Technology, Pittsburgh, Pennsylvania, B.S., 1956, Electrical Engineering
Carnegie Institute of Technology, Pittsburgh, Pennsylvania, Ph.D., 1959, Behavioral Sciences
HONORS and memberships: American Psychological Association; Association for Computing Machinery (Member of the National Council 1966-68); American Association for the Advancement of Science.

MAJOR RESEARCH INTEREST: Artificial Intelligence
ROLE IN PROPOSED PROJECT: Principal Investigator
RESEARCH SUPPORT (See instructions)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)

1965-       Stanford University, Computer Science Department Faculty
1965-1968   Stanford University, Director, Computation Center
1963        Summer Research Training Institute in Computer Simulation of Cognitive Processes (National Science Foundation)
1962        Carnegie Corporation. Summer Research Training Institute in Heuristic Programming. Faculty member.
1960-1964   University of California, Berkeley
            Research, Center for Research in Management Science, 1960-196[illegible]
            Research, Center for Human Learning, 1961-1964
            Assistant and Associate Professor, School of Business Administration, 1960-64
1957-1960   The RAND Corporation, Santa Monica, California
1956        IBM Scientific Computing Center, New York

Selected Publications:

"Applications of Artificial Intelligence for Chemical Inference I. The Number of Possible Organic Compounds: Acyclic Structures Containing C, H, O and N", J. Am. Chem. Soc., 91, 2973 (1969). (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference II. Interpretation of Low Resolution Mass Spectra of Ketones", J. Am. Chem. Soc., 91, 2977 (1969). (Co-Author).

Publications of Edward Feigenbaum

"Applications of Artificial Intelligence for Chemical Inference III. Aliphatic Ethers Diagnosed by their Low Resolution Mass Spectra and Nuclear Magnetic Resonance", J. Am. Chem. Soc., 91, 7440 (1969). (Co-Author).
"Heuristic DENDRAL: A Program for Generating Explanatory Hypotheses in Organic Chemistry", in Machine Intelligence 4, Edinburgh University Press, 1969. (Co-Author).

"Toward an Understanding of Information Processes of Scientific Inference in the Context of Organic Chemistry", in Machine Intelligence 5, Edinburgh University Press, 1970. (Co-Author).

"A Heuristic Program for Solving a Scientific Inference Problem: Summary of Motivation and Implementation", Stanford Artificial Intelligence Project Memo No. 104, November 1969. (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference IV. Saturated Amines Diagnosed by Their Low Resolution Mass Spectra and Nuclear Magnetic Resonance Spectra", Journal of the American Chemical Society, 92, 6831 (1970). (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference V. An Approach to the Computer Generation of Cyclic Structures. Differentiation Between All the Possible Isomeric Ketones of Composition C6H10O", Organic Mass Spectrometry, 4, 493 (1970). (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference VI. Approach to a General Method of Interpreting Low Resolution Mass Spectra with a Computer", Helvetica Chimica Acta, 53, 1394 (1970). (Co-Author).

"On Generality and Problem Solving: A Case Study Using the DENDRAL Program", in Machine Intelligence 6, Edinburgh University Press (1971). (Co-Author).

"A Heuristic Programming Study of Theory Formation in Science", in Proceedings of the Second International Joint Conference on Artificial Intelligence, Imperial College, London (September 1971). (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference VIII. An Approach to the Computer Interpretation of the High Resolution Mass Spectra of Complex Molecules. Structure Elucidation of Estrogenic Steroids", Journal of the American Chemical Society, 94, 5962-5971 (1972). (Co-Author).

"Heuristic Theory Formation: Data Interpretation and Rule Formation", in Machine Intelligence 7, Edinburgh University Press (1972). (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference X. INTSUM. A Data Interpretation Program as Applied to the Collected Mass Spectra of Estrogenic Steroids", to be submitted. (Co-Author).

SECTION II - PRIVILEGED COMMUNICATION

BIOGRAPHICAL SKETCH
NAME: Buchanan, Bruce G.
TITLE: Research Computer Scientist
BIRTHDATE (Mo., Day, Yr.): 7-7-40
PLACE OF BIRTH (City, State, Country): St. Louis, Missouri
PRESENT NATIONALITY (If non-US citizen, indicate kind of visa and expiration date): U.S. Citizen
SEX: Male
EDUCATION (Begin with baccalaureate training and include postdoctoral)
INSTITUTION AND LOCATION, DEGREE, YEAR CONFERRED, SCIENTIFIC FIELD
Ohio Wesleyan University, B.A., 1961, Mathematics
Michigan State University, M.A., Ph.D., 1966, Philosophy

HONORS
Recipient of National Institutes of Health Career Development Award (1971-1976)
Invited Speaker at 1972 National Institutes of Health Symposium on Numerical Methods in Chemistry (Washington)

MAJOR RESEARCH INTEREST:
ROLE IN PROPOSED PROJECT: Associate Investigator
RESEARCH SUPPORT (See instructions)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)

1972-present   Research Computer Scientist, Stanford University
1966-1971      Research Associate, Stanford Artificial Intelligence Project

Publications:

"On the Design of Inductive Systems: Some Philosophical Problems". British Journal for the Philosophy of Science 20 (1969), 311-323. (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference II. Interpretation of Low Resolution Mass Spectra of Ketones". Journal of the American Chemical Society, 91, 2977-2981 (1969). (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference I. The Number of Possible Organic Compounds: Acyclic Structures Containing C, H, O and N". Journal of the American Chemical Society, 91, 2973-2976 (1969). (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference III. Aliphatic Ethers Diagnosed by Their Low Resolution Mass Spectra and NMR Data". Journal of the American Chemical Society, 91, 7440-5 (1969). (Co-Author).

"Heuristic DENDRAL: A Program for Generating Explanatory Hypotheses in Organic Chemistry". Machine Intelligence 4, Edinburgh University Press (1969). (Co-Author).

Publications of Bruce Buchanan:

"Toward an Understanding of Information Processes of Scientific Inference in the Context of Organic Chemistry". Machine Intelligence 5, Edinburgh University Press (1970). (Co-Author).

"On Generality and Problem Solving: A Case Study Using the DENDRAL Program". Machine Intelligence 6, Edinburgh University Press (1971). (Co-Author).

"Some Speculation About Artificial Intelligence and Legal Reasoning". Stanford Law Review, Vol. 23, No. 1, November 1970. (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference VI. Approach to a General Method of Interpreting Low Resolution Mass Spectra with a Computer". Helvetica Chimica Acta, 53, 1394 (1970). (Co-Author).

"An Application of Artificial Intelligence to the Interpretation of Mass Spectra", Mass Spectrometry Techniques and Applications (1970).

"Applications of Artificial Intelligence for Chemical Inference IV. Saturated Amines Diagnosed by Their Low Resolution Mass Spectra and Nuclear Magnetic Resonance Spectra". Journal of the American Chemical Society, 92, 6831 (1970). (Co-Author).

"The Heuristic DENDRAL Program for Explaining Empirical Data".
Proceedings of IFIP Congress 1971, Ljubljana, Yugoslavia. (Co-Author).

"A Heuristic Programming Study of Theory Formation in Science". Proceedings of Second International Joint Conference on Artificial Intelligence, Imperial College, London (1971). (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference VIII. An Approach to the Computer Interpretation of the High Resolution Mass Spectra of Complex Molecules. Structure Elucidation of Estrogenic Steroids". Journal of the American Chemical Society, 1972. (Co-Author).

"Heuristic Theory Formation: Data Interpretation and Rule Formation". Machine Intelligence 7, Edinburgh University Press (1972). (Co-Author).

"Review of Hubert Dreyfus' 'What Computers Can't Do: A Critique of Artificial Reason'", Computing Reviews (January, 1973).

"Applications of Artificial Intelligence for Chemical Inference IX. Analysis of Mixtures Without Prior Separation as Illustrated for Estrogens". Submitted to the Journal of the American Chemical Society. (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference X. INTSUM. A Data Interpretation Program as Applied to the Collected Mass Spectra of Estrogenic Steroids". To be submitted. (Co-Author).

Memberships
Association for Computing Machinery (ACM)
Philosophy of Science Association
American Association for Advancement of Science (AAAS)

BIOGRAPHICAL SKETCH
NAME: Alan M. DUFFIELD
TITLE: Research Associate
BIRTHDATE (Mo., Day, Yr.): December [illegible]
PLACE OF BIRTH (City, State, Country): Perth, Western Australia
PRESENT NATIONALITY (If non-US citizen, indicate kind of visa and expiration date): Australian, Permanent resident (Immigrant Visa)
SEX: Male
EDUCATION (Begin with baccalaureate training and include postdoctoral)
INSTITUTION AND LOCATION, DEGREE, YEAR CONFERRED, SCIENTIFIC FIELD
University of Western Australia, B.
Sc(lst Clhss Hons ) 1958 Organic Chemistry University of Western Australia Ph.D. 1962 Organic Chemsitry HONORS - wo MAJOR RESEARCH INTEREST ROLT IN PROPOSED PROJECT Applications of mass spectrometry to Organic Chemist/mass spectroscopist Piology and Biomedical Problems RESEARCH SUPPORT (See instructions) N/A RESEARCH AND/OR PROFESS, ONAL EXPER EN CE (Starurg with present cosigon, {ist {raining and experience reievant (0 area Of prs,est wisldn OF most representauve pubucations, Oo notexceed 3 Gages for each individual.) . we fet, Sete te het Research Associate, Department of Genetics, Stanford University 1970 School of Medicine 1969 - Head of the Mass Spectrometry Laboratory, Chemistry Department Stanford University . 1965 - 69 “esearch Associate, Department of Chemistry, Stanford University 1963 - 65 Posidoctoral Fellow, Department of Chemistry, Stantord University 1962 - 63 Postdoctoral Fellow, Department of Biochemistry, Stanford University . . School of Medicine. a PUBLICATIONS SINCE 1971 . . £ ¢} r 1. An Apolicetion of Artificial Intelligence to the Interpretation of Mass spectra. Mass Spectrometry, _B.W.G. Milne, Ed., John Wiley and Sons, i New York, 1971, pp, 121-178 . By B. G. duchanane A. M. Duffield and A. V. Robertson Any Bev. 3-79 10, il, 12, Mass Spectrometry in Structural ard Stereochemical Problems. CCIV. Spectra of Hydantoins.II. Electron Impact Induced Fragmentation of some Substituted Hydantoins. Org. Mass Spectr., 5, 551 (1971) By R. A. Corral, 0. 0. Orazi, A. M. Duffield and C. Djerassi Electron Impact Induced Hydrogen Scrambling in Cyclohexanol and Isomeric Methylcyclohexanols. Org. Mass Spectr., 5, 383 (1971) By R. H. Shapiro, S. P. Levine and A. M. Duffield Derivatives of 2-Biphenylcarboxylic Acid. Rev. Roumain. Chem., 16, 1095 (1971) By A. T. Balaban and A. M. Duffield Alkalcide aus Evonymus europaea L. Helv. Chim. Acta, S4&, 2144 (1971) By A. Kldsek, T. Reichstein, A. M. Duffield and F. Santavy Studies on Indian Medicinal Plarts. XXVIII. 
Sesquiterpene Lactones of Enhydra Fluctuans Lour. Structures of Enhydrin, Fluctuanin and Fluctuadin. Tetrahedron, 28, 2239 (1972). By E. Ali, P. P. Ghosh Dastidar, S. C. Pakrashi, L. J. Durham and A. M. Duffield
7. The Electron Impact Promoted Fragmentation of Aurone Epoxides. Org. Mass Spectr., 6, 199 (1972). By B. A. Brady, W. I. O'Sullivan and A. M. Duffield
8. The Determination of Cyclohexylamine in Aqueous Solutions of Sodium Cyclamate by Electron Capture Gas Chromatography. Anal. Letters, 4, 301 (1971). By M. D. Solomon, W. E. Pereira and A. M. Duffield
9. Computer Recognition of Metastable Ions. Nineteenth Annual Conference on Mass Spectrometry, Atlanta, 1971, p. 63. By A. M. Duffield, W. E. Reynolds, D. A. Anderson, R. A. Stillman, Jr. and C. E. Carroll
10. Spectrometrie de Masse. VI. Fragmentation de Dimethyl-2,2-dioxolanes-1,2-Insatures. Org. Mass Spectr., 5, 1409 (1971). By J. Kossanyi, J. Chuche and A. M. Duffield
11. Chlorpromazine Metabolism in Sheep. II. In vitro Metabolism and Preparation of 3H-7-Hydroxychlorpromazine. Journees D'Agressologie, 12, 333 (1971). By L. G. Brooks, M. A. Holmes, I. S. Forrest, V. A. Bacon, A. M. Duffield and M. D. Solomon
12. Mass Spectrometry in Structural and Stereochemical Problems. CCXVII. Electron Impact Promoted Fragmentation of O-Methyl Oximes of Some α,β-Unsaturated Ketones and Methyl Substituted Cyclohexanones. Canadian J. Chem., 50, 2776 (1972). By Y. M. Sheikh, R. J. Liedtke, A. M. Duffield and C. Djerassi

BIOGRAPHICAL SKETCH (Give the following information for all professional personnel listed on page 3, beginning with the Principal Investigator. Use continuation pages and follow the same general format for each person.)
NAME: Wilfred E. PEREIRA
TITLE: Research Associate
BIRTHDATE (Mo., Day, Yr.): June 23, [year illegible]
PLACE OF BIRTH (City, State, Country): Madras, S. India
PRESENT NATIONALITY (If non-U.S. citizen, indicate kind of visa and expiration date): Indian, Permanent Resident, Immigrant Visa
SEX: Male

EDUCATION (Begin with baccalaureate training and include postdoctoral)
INSTITUTION AND LOCATION; DEGREE; YEAR CONFERRED; SCIENTIFIC FIELD
Madras Medical College, Madras, India; B. Pharm.; 1960; Pharmaceutical Chemistry
Saugar Univ., Madhya Pradesh, India; M. Pharm.; 1962; Pharm. Chem. & Chem. of Natu[...]
U.C. Med. Center, San Francisco, Calif.; Ph.D.; 1968; Pharm. Chem. & Pharmacology

HONORS
MAJOR RESEARCH INTEREST: Identification of metabolites & drug metabolites in biological fluids
ROLE IN PROPOSED PROJECT: Organic chemist
RESEARCH SUPPORT (See instructions)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)
19[illegible] - 1970  Post Doctoral Fellow, Dept. of Genetics, Stanford University Med. School
1970 - present  Research Associate, same institution

During these four years I have been involved with peptide synthesis, amino acid analysis and synthetic organic chemistry. I helped develop methods for the separation of diastereoisomers by gas chromatography and have been involved with the routine use of gas chromatography mass spectrometry for the identification of urinary metabolites in normal and pathological urine and serum samples. My applications of mass spectrometry have included the development of mass fragmentography for the determination of the amino acid contents of soil and serum. My present project involves screening urine from leukemic patients for abnormal metabolites and investigating the metabolic fate of anti-leukemic chemotherapeutic agents in the body.
PUBLICATIONS
1. Transesterification with an Anion-exchange Resin; W. Pereira, V. Close, W. Patton and B. Halpern, J. Org. Chem. 34:2032 (1969).
2. Alcoholysis of the Merrifield-type Peptide-polymer Bond with an Anion Exchange Resin; W. Pereira, V. A. Close, E. Jellum, W. Patton and B. Halpern, Australian J. of Chem. 22:1337 (1969).

Publications
13. Thermal Fragmentation of Quinoline and Isoquinoline N-Oxides in the Ion Source of a Mass Spectrometer. Acta Chem. Scand., 26, 2423 (1972). By A. M. Duffield and O. Buchardt
14. Applications of Artificial Intelligence for Chemical Inference. VIII. An Approach to the Computer Interpretation of the High Resolution Mass Spectra of Complex Molecules. Structure Elucidation of Estrogenic Steroids. J. Amer. Chem. Soc., 94, 5962 (1972). By D. H. Smith, B. G. Buchanan, R. S. Engelmore, A. M. Duffield, A. Yeo, E. A. Feigenbaum, J. Lederberg and C. Djerassi
15. Mass Spectrometry in Structural and Stereochemical Problems. CCXIX. Identification of a Unidirectional Quadruple Hydrogen Transfer Process in 7-Phenyl-hept-3-en-2-one O-Methyl Oxime Ether. Org. Mass Spectr., 6, 1271 (1972). By R. J. Liedtke, Y. M. Sheikh, A. M. Duffield and C. Djerassi
16. An Automated Gas Chromatographic Analysis of Phenylalanine in Serum. Clinical Biochem., 5, 166 (1972). By E. Steed, W. Pereira, B. Halpern, M. D. Solomon and A. M. Duffield
17. Pyrrolizidine Alkaloids. XIX. Structure of the Alkaloid Erucifoline. Coll. Czech. Chem. Commun., (1972). By P. Sedmera, A. Klasek, A. M. Duffield and F. Santavy
18. Mass Spectrometry in Structural and Stereochemical Problems. CCXXII. Delineation of Competing Fragmentation Pathways of Complex Molecules from a Study of Metastable Ion Transitions of Deuterated Derivatives. Org. Mass Spectr., 7, (1973). By D. H. Smith, A. M. Duffield and C. Djerassi
19. Chlorination Studies I. The Reaction of Aqueous Hypochlorous Acid with Cytosine. Biochem. Biophys. Res. Commun., 48, 880 (1972). By W. Patton, V.
Bacon, A. M. Duffield, B. Halpern, Y. Hoyano, W. Pereira and J. Lederberg
20. A Study of the Electron Impact Fragmentation of Promazine Sulphoxide and Promazine using Specifically Deuterated Analogues. Austral. J. Chem., 26, (1973). By M. D. Solomon, R. Summons, W. Pereira and A. M. Duffield
21. Spectrometrie de Masse. VIII. Elimination d'eau Induite par Impact Electronique dans le Tetrahydro-1,2,3,4-naphtalenediol-1,2. Org. Mass. Spectrom., 7 (1973). By P. Perros, J. P. Morizui, J. Kossanyi and A. M. Duffield
22. The Determination of Phenylalanine in Serum by Mass Fragmentography. Clinical Biochem., submitted for publication (1973). By W. E. Pereira, V. A. Bacon, Y. Hoyano, R. Summons and A. M. Duffield

3. The Action of Nitrosyl Chloride on Phenylalanine Peptides; W. Patton, E. Jellum, D. Nitecki, W. Pereira and B. Halpern, Australian J. of Chem. 22:2709 (1969).
4. Abnormal Circular Dichroism of α-Amino Acid Esters; J. Cymerman Craig and W. E. Pereira, Tet. Let. 18:1563 (1970).
5. The Use of (+)-2,2,2-Trifluoro-1-Phenylethylhydrazine in the Optical Analysis of Asymmetric Ketones by Gas Chromatography; W. E. Pereira, M. Solomon and B. Halpern, Australian J. of Chem. 24:1103 (1971).
6. The Microsomal Oxygenation of Ethyl Benzene. Isotopic, Stereochemical, and Induction Studies; R. E. McMahon, H. R. Sullivan, J. Cymerman Craig and W. E. Pereira, Arch. Biochem. Biophys. 132:575 (1969).
7. The Steric Analysis of Aliphatic Amines with Two Asymmetric Centers by Gas-liquid Chromatography of Diastereoisomeric Amides; W. E. Pereira and B. Halpern, Australian J. Chem. 25:667 (1972).
8. Optical Rotatory Dispersion and Absolute Configuration - XVII, α-Alkylphenylacetic Acids; J. Cymerman Craig, W. E. Pereira, B. Halpern and J. W. Westley, Tetrahedron 27:1173 (1971).
9. The Optical Rotatory Dispersion and Circular Dichroism of α-Amino and α-Hydroxy Acids; J. Cymerman Craig and W. E. Pereira, Tetrahedron 26:3457 (1970).
10. The Determination of Cyclohexylamine in Aqueous Solutions of Sodium Cyclamate by Electron-capture Gas Chromatography; M. D. Solomon, W. E. Pereira and A. M. Duffield, Anal. Let. 4:301 (1971).

Publications continued-
11. Chlorination Studies. I. The Reaction of Aqueous Hypochlorous Acid with Cytosine; W. Patton, V. Bacon, A. M. Duffield, B. Halpern, Y. Hoyano, W. Pereira and J. Lederberg, Biochem. Biophys. Res. Commun. 48:880 (1972).
12. The Use of R-(+)-1-Phenylethylisocyanate in the Optical Analysis of Asymmetric Secondary Alcohols by Gas Chromatography; W. Pereira, V. A. Bacon, W. Patton, B. Halpern, and G. E. Pollock, Anal. Let. 3:23 (1970).
13. A Rapid and Quantitative Gas Chromatographic Analysis for Phenylalanine in Serum; B. Halpern, W. E. Pereira, M. D. Solomon and E. Steed, Anal. Biochem. 39:156 (1971).
14. Electron-impact Promoted Fragmentation of Alkyl-N-(1-Phenylethyl)-Carbamates of Primary, Secondary and Tertiary Alcohols; W. E. Pereira, B. Halpern, M. D. Solomon and A. M. Duffield, Org. Mass Spectrometry 5:157 (1972).
15. Peptide Sequencing by Low Resolution Mass Spectrometry; V. Bacon, E. Jellum, W. Patton, W. Pereira and B. Halpern, Biochem. Biophys. Res. Commun. 37:878 (1969).
16. A Gas Liquid Chromatographic Method for the Determination of Phenylalanine in Serum; E. Jellum, V. A. Close, W. Patton, W. Pereira and B. Halpern, Anal. Biochem. 31:227 (1969).
17. Quantitative Determination of Biologically Important Thiols and Disulfides by Gas Liquid Chromatography; E. Jellum, W. Patton, V. A. Bacon, W. E. Pereira and B. Halpern, Anal. Biochem. 31:339 (1969).
18. A Study of the Electron Impact-promoted Fragmentation of Promazine Sulfoxide and Promazine Using Specifically Deuterated Analogues; M. D. Solomon, R. Summons, W. Pereira and A. M. Duffield, Australian J. Chem. (1973, in press).
19. The Determination of Phenylalanine in Serum by Mass Fragmentography; W. Pereira, V. A. Bacon, Y. Hoyano, R. Summons and A. M. Duffield, Clin.
Biochem. (In press).
20. Chlorination Studies II. The Reaction of Aqueous Hypochlorous Acid with α-Amino Acids and Dipeptides; W. E. Pereira, Y. Hoyano, R. Summons, V. A. Bacon and A. M. Duffield, Biochem. et Biophys. Acta (In press).

BIOGRAPHICAL SKETCH (Give the following information for all professional personnel listed on page 3, beginning with the Principal Investigator. Use continuation pages and follow the same general format for each person.)
NAME: Thomas C. Rindfleisch
TITLE: Research Associate
BIRTHDATE (Mo., Day, Yr.): 12-10-41
PLACE OF BIRTH (City, State, Country): Oshkosh, Wisconsin, USA
PRESENT NATIONALITY (If non-U.S. citizen, indicate kind of visa and expiration date): USA
SEX: Male

EDUCATION (Begin with baccalaureate training and include postdoctoral)
INSTITUTION AND LOCATION; DEGREE; YEAR CONFERRED; SCIENTIFIC FIELD
Purdue University, Lafayette, Ind.; B.S.; 1962; Physics
California Institute of Technology, Pasadena, CA; M.S.; 1965; Physics
Ph.D. thesis to be completed. All course work and examinations completed.

HONORS: Purdue University, Graduated with Highest Honors, Sigma Xi.
MAJOR RESEARCH INTEREST: Space sciences, computer science and image processing
ROLE IN PROPOSED PROJECT: Technical Support
RESEARCH SUPPORT (See instructions)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)
1971-Present  Stanford University Medical School, Department of Genetics, Stanford, CA. Research Associate - Mass spectrometry, Instrumentation research.
1962-1971  Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA. Relevant Experience:
    1969-1971: Supervisor of Image Processing Development and Applications Group.
    1968-1969: Mariner Mars 1969 Cognizant Engineer for Image Processing.
    1962-1968: Engineer - design and implement image processing computer software.

1. Rindfleisch, T.
and Willingham, D., "A Figure of Merit Measuring Picture Resolution," JPL Technical Report 32-666, September 1, 1965.
2. Rindfleisch, T. and Willingham, D., "A Figure of Merit Measuring Picture Resolution," Advances in Electronics and Electron Physics, Volume 22A, Photo-Electronic Image Devices, Academic Press, 1966.

Thomas C. Rindfleisch
PUBLICATIONS (cont'd)
3. Rindfleisch, T., "A Photometric Method for Deriving Lunar Topographic Information," JPL Technical Report 32-786, September 15, 1965.
4. Rindfleisch, T., "Photometric Method for Lunar Topography," Photogrammetric Engineering, March 1966.
5. Rindfleisch, T., "Generalizations and Limitations of Photoclinometry," JPL Space Science Summary Volume III, 1967.
6. Rindfleisch, T., "The Digital Removal of Noise from Imagery," JPL Space Science Summary 37-62 Volume III, 1970.
7. Rindfleisch, T., "Digital Image Processing for the Rectification of Television Camera Distortions," Astronomical Use of Television-Type Image Sensors, NASA Special Publication SP-256, 1971.
8. Rindfleisch, T., Dunne, J., Frieden, H., Stromberg, W., and Ruiz, R., "Digital Processing of the Mariner 6 and 7 Pictures," Journal of Geophysical Research, Volume 76, Number 2, January 1971.
9. Rindfleisch, T., "Digital Image Processing," To be published, IEEE Special Issue, July 1972.

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH (Give the following information for all professional personnel listed on page 3, beginning with the Principal Investigator. Use continuation pages and follow the same general format for each person.)
NAME: Dennis H. Smith
TITLE: Research Associate
BIRTHDATE (Mo., Day, Yr.): 11/12/42
PLACE OF BIRTH (City, State, Country): New York
PRESENT NATIONALITY (If non-U.S. citizen, indicate kind of visa and expiration date): USA
SEX: Male

EDUCATION (Begin with baccalaureate training and include postdoctoral)
INSTITUTION AND LOCATION; DEGREE; YEAR CONFERRED; SCIENTIFIC FIELD
Massachusetts Inst.
of Technology, Cambridge, Mass.; S.B.; 1964; Chemistry
University of California, Berkeley, Berkeley, California; Ph.D.; 1967; Chemistry

HONORS: Alfred P. Sloan Foundation Scholarship; NASA Predoctoral Traineeship; Phi Lambda Upsilon, Sigma Xi
MAJOR RESEARCH INTEREST: Mass Spectrometry and A.I. in Chemistry
ROLE IN PROPOSED PROJECT: Research Associate
RESEARCH SUPPORT (See instructions): N/A

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)
1971-Present  Research Associate, Stanford University, Stanford, Ca.
1970-1971  Visiting Scientist, University of Bristol, Bristol, England
1967-1970  Assistant Research Chemist, University of Calif. at Berkeley, Berkeley, Ca.
1965-1967  NASA Pre-Doctoral Traineeship, University of Calif. at Berkeley, Berkeley, Ca.
Publications: See attached list.

Publications:
1. H. G. Langer, R. S. Gohlke, and D. H. Smith, "Mass Spectrometric Differential Thermal Analysis," Anal. Chem., 37, 433 (1965).
2. S. M. Kupchan, J. M. Cassady, J. E. Kelsey, H. K. Schnoes, D. H. Smith, and A. L. Burlingame, "Structural Elucidation and High Resolution Mass Spectrometry of Gaillardin, a New Cytotoxic Sesquiterpene Lactone," J. Amer. Chem. Soc. 88, 5292 (1966).
3. D. H. Smith, Ph.D. Thesis, "High Resolution Mass Spectrometry: Techniques and Applications to Molecular Structure Problems," Dept. of Chemistry, University of California, Berkeley, California (1967).
4. H. K. Schnoes, D. H. Smith, A. L. Burlingame, P. W. Jeffs, and W. Döpke, "Mass Spectra of Amaryllidaceae Alkaloids: The Lycorenine Series," Tetrahedron, 24, 2825 (1968).
5. A. L. Burlingame, D. H. Smith, and R. W. Olsen, "High Resolution Mass Spectrometry in Molecular Structure Studies, XIV.
Real-time Data Acquisition, Processing and Display of High Resolution Mass Spectral Data," Anal. Chem., 40, 13 (1968).
6. A. L. Burlingame and D. H. Smith, "High Resolution Mass Spectrometry in Molecular Structure Studies II. Automated Heteroatomic Plotting as an Aid to the Presentation and Interpretation of High Resolution Mass Spectra Data," Tetrahedron, 24, 5749 (1968).
7. W. J. Richter, B. R. Simoneit, D. H. Smith, and A. L. Burlingame, "Detection and Identification of Oxocarboxylic and Dicarboxylic Acids in Complex Mixtures by Reductive Silylation and Computer-Aided Analysis of High Resolution Mass Spectral Data," Anal. Chem., 41, 1392 (1969).
8. The Lunar Sample Preliminary Examination Team, "Preliminary Examination of Lunar Samples from Apollo 11," Science, 165, 1211 (1969).
9. S. M. Kupchan, W. K. Anderson, P. Bollinger, R. W. Doskotch, R. M. Smith, J. A. Saenz Renauld, H. K. Schnoes, A. L. Burlingame, and D. H. Smith, "Tumor Inhibitors, XXXIX. Active Principles of Acnistus arborescens. Isolation and Structural and Spectral Studies of Withaferin A and Withacnistin," J. Org. Chem., 34, 3858 (1969).
10. A. L. Burlingame, D. H. Smith, T. O. Merren, and R. W. Olsen, "Real-time High Resolution Mass Spectrometry," in Computers in Analytical Chemistry (Vol. 4 in Progress in Analytical Chemistry series), C. H. Orr and J. Norris, Eds., Plenum Press, New York, 1970, pp. 17-38.
11. The Lunar Sample Preliminary Examination Team, "Preliminary Examination of Lunar Samples from Apollo 12," Science, 167, 1325 (1970).
12. D. H. Smith, R. W. Olsen, F. C. Walls, and A. L. Burlingame, "Real-Time Mass Spectrometry: LOGOS--A Generalized Mass Spectrometry Computer System for High and Low Resolution, GC/MS and Closed-Loop Applications," Anal. Chem., 43, 1796 (1971).
13. A. L. Burlingame, J. S. Hauser, B. R. Simoneit, D. H. Smith, K. Biemann, N. Mancuso, R. Murphy, D. A. Flory, and M. A.
Reynolds, "Preliminary Organic Analysis of the Apollo 12 Cores," Proceedings of the Apollo 12 Lunar Science Conference, E. Levinson, Ed., M.I.T. Press, Cambridge, Mass. 1971, p. 1891.
14. D. H. Smith, "A Compound Classifier Based on Computer Analysis of Low Resolution Mass Spectral Data," Anal. Chem., 44, 536 (1972).
15. D. H. Smith and G. Eglinton, "Compound Classification by Computer Treatment of Low Resolution Mass Spectra-Application to Geochemical and Environmental Problems," Nature, 235, 325 (1972).
16. D. H. Smith, N. A. B. Gray, C. T. Dillinger, B. J. Kimble, and G. Eglinton, "Complex Mixture Analysis - Geochemical and Environmental Applications of a Compound Classifier Based on Computer Analysis of Low Resolution Mass Spectra," "Advances in Organic Geochemistry 1971," M. R. v.Gaertner and M. Weher, Ed., Pergamon Press, Oxford, New York, Toronto, Sydney and Braunschweig, 1972, p. 249.
17. D. H. Smith, B. G. Buchanan, R. S. Engelmore, A. M. Duffield, A. Yeo, E. A. Feigenbaum, J. Lederberg, and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference, VIII. An Approach to the Computer Interpretation of the High Resolution Mass Spectra of Complex Molecules. Structure Elucidation of Estrogenic Steroids," J. Amer. Chem. Soc., 94, 5962 (1972).
18. D. H. Smith, A. M. Duffield, and C. Djerassi, "Mass Spectrometry in Structural and Stereochemical Problems, CCXXII. Delineation of Competing Fragmentation Pathways of Complex Molecules from a Study of Metastable Ion Transitions of Deuterated Derivatives," Org. Mass. Spectrom., in press.
19. B. R. Simoneit, D. H. Smith, G. Eglinton, and A. L. Burlingame, "Applications of Real-Time Mass Spectrometric Techniques to Environmental Organic Geochemistry, II. San Francisco Bay Area Waters," Arch. Env. Contam. and Tox., in press.
20. D. H. Smith, B. G. Buchanan, R. S. Engelmore, H.
Adlercreutz, and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference, IX. Analysis of Mixtures Without Prior Separation as Illustrated for Estrogens," J. Amer. Chem. Soc., submitted for publication.
21. D. H. Smith, B. G. Buchanan, W. C. White, E. A. Feigenbaum, J. Lederberg, and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference X, INTSUM. A Data Interpretation and Summary Program as Applied to the Collected Mass Spectra of Estrogenic Steroids," Tetrahedron, submitted.
22. D. H. Smith, "Mass Spectrometry," Chapter X in Guide to Modern Methods of Instrumental Analysis, T. H. Gouw, Ed., Wiley-Interscience, New York, 1972.

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH (Give the following information for all professional personnel listed on page 3, beginning with the Principal Investigator. Use continuation pages and follow the same general format for each person.)
NAME: Sridharan, Natesa S.
TITLE: Research Associate
BIRTHDATE (Mo., Day, Yr.): 10-2-46
PLACE OF BIRTH (City, State, Country): Madras, India
PRESENT NATIONALITY (If non-U.S. citizen, indicate kind of visa and expiration date): India; pending permanent residence
SEX: Male

EDUCATION (Begin with baccalaureate training and include postdoctoral)
INSTITUTION AND LOCATION; DEGREE; YEAR CONFERRED; SCIENTIFIC FIELD
Indian Institute of Technology, Madras, India; Bachelor of Technology; 1967; Electrical Engineering
State University of New York, Stony Brook; M.S.; 1969; Computer Science
State University of New York, Stony Brook; Ph.D.; 1971; Computer Science

HONORS
University Fellow, 1968-1971, SUNY Stony Brook
Graduate Assistant, 1967-1968, SUNY Stony Brook
[...]emens Award (awarded for top rank
in Electrical Engineering), 1967, IIT Madras
[illegible] Merit Scholarship

MAJOR RESEARCH INTEREST: Computer Applications in Chemistry and Medicine
ROLE IN PROPOSED PROJECT: Research Associate
RESEARCH SUPPORT (See instructions)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)
1971-present  Research Associate, Heuristic Programming Project, Stanford University
1970-1971  Consultant, IAC Computer Company, Long Island, N.Y.

"Heuristic Theory Formation: Data Interpretation and Rule Formation". Machine Intelligence, Volume VII, 1972. (Co-Author).
"An Application of Artificial Intelligence to Organic Chemical Synthesis". Doctoral Dissertation, SUNY Stony Brook, August, 1971.