NIH ANNUAL REPORT RR-612
1/1/73 - 12/31/73

ANNUAL REPORT FOR RESEARCH PROJECT:
"RESOURCE RELATED RESEARCH -- COMPUTERS AND CHEMISTRY"
NIH GRANT 5 R24 RR00612-03

TABLE OF CONTENTS

I.   Application For Continuation
II.  Progress Report
     A. Applications of Artificial Intelligence to Mass Spectrometry
     B. Extensions of the Computer - Mass Spectrometry System
     C. Extending the Theory of Mass Spectrometry by a Computer
     Signature of Principal Investigator
III. Resource Finances
     A. Summary of Resource Expenditures
     B. Expenditure Details - Accounts A, B & C
        1. Budget Explanation - Justification
     C. Estimate - Next Budget Period
        1. Budget Explanation - Justification
     D. Supporting Data for Administrative Extension
IV.  Appendix
     A. "Applications of Artificial Intelligence for Chemical Inference XI"
     B. "Applications of Artificial Intelligence for Chemical Inference IX. Analysis of Mixtures Without Prior Separation as Illustrated for Estrogens"
     C. "Mass Spectrometry in Structural and Stereochemical Problems, CCXXII. Delineation of Competing Fragmentation Pathways of Complex Molecules from a Study of Metastable Ion Transitions of Deuterated Derivatives"
     D. "Mass Spectrometry in Structural and Stereochemical Problems, CCXXXIV. Applications of DADI, a Technique for Study of Metastable Ions, to Mixture Analysis"
     E. "Mass Spectrometry in Structural and Stereochemical Problems, CCXXXVIII. The Effect of Heteroatoms upon the Mass Spectrometric Fragmentation of Cyclohexanones"
     F. "Applications of Artificial Intelligence for Chemical Inference XII. Exhaustive Generation of Cyclic and Acyclic Isomers"
     G. "Applications of Artificial Intelligence for Chemical Inference XIII. Labelling Objects Having Symmetry"
     H. "Heuristic DENDRAL: Analysis of Molecular Structures"
     I. "Computer Generation of Vertex Graphs"
     J. "The Quantitation of BAIB in Urine by Mass Fragmentography"
     K. "The Determination of Phenylalanine in Serum by Mass Fragmentography"
     L. "The Simultaneous Quantitation of Ten Amino Acids in Soil Extracts by Mass Fragmentography"
     M. "An Analysis of Twelve Amino Acids in Biological Fluids by Mass Fragmentography"
     N. "The Quantitation of β-Amino Isobutyric Acid in Urine by Mass Fragmentography"
     O. "The Determination of Ethanol in Blood and Urine by Mass Fragmentography"
     P. "A Study of the Electron Impact Fragmentation of Promazine Sulphoxide and Promazine using Specifically Deuterated Analogues"
     Q. "Mass Spectrometry in Structural and Stereochemical Problems. CCXXXVII. Electron Impact Induced Hydrogen Losses and Migrations in Some Aromatic Amides"
     R. "Spectrometrie de Masse. IX. Fragmentations Induites par Impact Electronique de Glycols-α En Serie Tetraline"
     S. "Spectrometrie de Masse. VIII. Elimination d'eau Induite par Impact Electronique dans Le Tetrahydro-1,2,3,4-Naphtalene-diol-1,2"
     T. "Chlorination Studies I. The Reaction of Aqueous Hypochlorous Acid with Cytosine"
     U. "Chlorination Studies II. The Reaction of Aqueous Hypochlorous Acid with α-Amino Acids and Dipeptides"
     V. "Chlorination Studies IV. The Reaction of Aqueous Hypochlorous Acid with Pyrimidine and Purine Bases"
     W. "Applications of Artificial Intelligence for Chemical Inference X. INTSUM. A Data Interpretation Program as Applied to the Collected Mass Spectra of Estrogenic Steroids"
     X. "Analysis of Behavior of Chemical Molecules: Rule Formation on Non-Homogeneous Classes of Objects"

SECTION I

DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
PUBLIC HEALTH SERVICE
APPLICATION FOR CONTINUATION GRANT

TITLE: Resource-Related Research -- Computers and Chemistry
GRANT NUMBER: 5 R24 RR00612
TOTAL PROJECT PERIOD: From May 1, 1971 Through April 30, 1974
REQUESTED BUDGET PERIOD: From Jan. 1, 1974 Through April 30, 1974
PRINCIPAL INVESTIGATOR OR PROGRAM DIRECTOR: Dr. Edward A. Feigenbaum, Professor of Computer Science, Stanford University, Stanford, Ca. 94305
DEGREE: Ph.D.
PHS ACCOUNT NUMBER: 458210
DEPARTMENT, SERVICE, LABORATORY: Computer Science Department
MAJOR SUBDIVISION: School of Humanities & Sciences
APPLICANT ORGANIZATION: Stanford University, Stanford, California 94305
TITLE AND ADDRESS OF OFFICIAL IN BUSINESS OFFICE OF APPLICANT ORGANIZATION: K. D. Creighton, Deputy Vice President for Business & Finance, Stanford University, Stanford, California 94305
RESEARCH INVOLVING HUMAN SUBJECTS: No
INVENTIONS: No
PERFORMANCE SITE(S): Computer Science Department, Department of Genetics, Department of Chemistry, Stanford University, Stanford, California 94305
TELEPHONE INFORMATION:
     Principal investigator or program director: (415) 321-2300
     Business official: K. D. Creighton, (415) 321-2300, Ext. 2251
     Administrative official: Kathleen Butler, Sponsored Projects Officer, (415) 321-2300, Ext. 2883
DIRECT COSTS REQUESTED FOR BUDGET PERIOD: $61,412
CONGRESSIONAL DISTRICT OF APPLICANT ORGANIZATION: Congressional District #17
COUNTY OF APPLICANT ORGANIZATION: Santa Clara

CERTIFICATION AND ACCEPTANCE. We, the undersigned, certify that the statements herein are true and complete to the best of our knowledge and accept, as to any grant awarded, the obligation to comply with Public Health Service terms and conditions in effect at the time of the award.

SIGNATURES (Signatures required on original copy only. Use ink. "Per" signatures not acceptable.)
     PRINCIPAL INVESTIGATOR OR PROGRAM DIRECTOR                    DATE
     OFFICIAL SIGNING FOR APPLICANT ORGANIZATION                   DATE

U.S. DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
PUBLIC HEALTH SERVICE
NOTICE OF RESEARCH PROJECT

TITLE OF PROJECT: Resource-Related Research -- Computers and Chemistry

NAMES, DEPARTMENTS, AND OFFICIAL TITLES OF PRINCIPAL INVESTIGATORS OR PROJECT DIRECTORS AND ALL OTHER PROFESSIONAL PERSONNEL ENGAGED ON THE PROJECT:
Edward Feigenbaum, Principal Investigator; Bruce Buchanan, Computer Scientist; N. Sridharan, Computer Scientist; Alan Duffield, Research Associate; Ray Carhart, Research Associate; Harold Brown, Research Associate; Geoff Dromey, Research Associate; Tom Rindfleisch, Research Associate; Dennis Smith, Research Associate; Ernest Steed, Research Engineer; Nicholas Veizades, Research Engineer; Robert Tucker, Computer Programmer; William White, Computer Programmer.

NAME AND ADDRESS OF APPLICANT INSTITUTION: Stanford University, Stanford, California 94305

SUMMARY OF PROPOSED WORK (200 words or less -- omit confidential data.)
In the Science Information Exchange, summaries of work in progress are exchanged with government and private agencies supporting research in the biosciences and are forwarded to investigators who request such information. Your summary is to be used for these purposes.

A computer program, named Heuristic DENDRAL, has been developed in the Heuristic Programming Project at Stanford to work specifically with the problem of interpreting mass spectra.
It has already demonstrated its ability to interpret the low resolution mass spectra of aliphatic ketones, ethers and amines and the high resolution mass spectra of estrogenic steroids. The objective of the proposed work is to expand the capabilities of the Heuristic DENDRAL computer program in a variety of ways in order to make it a tool of wider utility.

The original proposal was broken into three separable, but related, proposals. All of them enhance the power of the mass spectrometer as a tool for organic chemists and biochemists and enhance the effectiveness of the Stanford interdepartmental mass spectrometry facility (Medical School and Chemistry Department) as a research resource. New work is proposed within the original three parts.

Part A. The first part proposes to extend the analytic capabilities of the Heuristic DENDRAL program to the mass spectra of complex organic compounds. In particular, the efficient implementation of the cyclic structure generating algorithm will be the focus.

Part B. This section of the proposal for computer interpretation of the mass spectra of metabolites has been divided into two categories, reflecting instrumentation and laboratory support for this goal.

Part B(i). The first subpart is devoted to the improvement of GC/MS data system capabilities and the coupling of extracted data to the Heuristic DENDRAL programs for analysis.

Part B(ii). The second subpart is aimed at analysis of body fluids, such as urine, by gas chromatography/mass spectrometry in the service of clinical problems.

Part C. The last part is work aimed at extending the knowledge about mass spectrometry, and thus extending the power of the mass spectrometer, by using a computer to codify and reason about large collections of mass spectra.

PROFESSIONAL SCHOOL (medical, dental, etc.) WITH WHICH THIS PROJECT SHOULD BE IDENTIFIED: Humanities and Sciences

SIGNATURE OF PRINCIPAL INVESTIGATOR                    DATE

Progress Report

Part A. APPLICATIONS OF ARTIFICIAL INTELLIGENCE TO MASS SPECTROMETRY

OBJECTIVES:

Research activities carried out under Part A of this project have been directed toward extending the reasoning power of Heuristic DENDRAL. Heuristic DENDRAL represents a paradigm for attacking problems in one of the major areas of importance to any scientific discipline dealing with molecules, the area of structure elucidation. We have focused our attention on the use of heuristic programming techniques for analysis of mass spectra and ancillary analytical data which can be obtained utilizing a mass spectrometer.

It is convenient to discuss objectives, progress and plans by examining three broad areas of activity in research connected with Part A. We wish to note that these areas conform to our overall strategy of PLAN-GENERATE-TEST. We have shown, earlier, how powerful this strategy is when applied to the task of structure elucidation utilizing mass spectral data. The areas and their objectives are the following:

(I) PLANNER:
(a) Extend the programs used for structure elucidation to structural analysis of complex molecules.
(b) Assess the capabilities and limitations of the PLANNER.
(c) Generalize the programming techniques to reduce compound class dependence.
(d) Explore the utility of ancillary data available from the mass spectrometer.
(II) STRUCTURE GENERATOR:
(a) Complete the exhaustive, irredundant generator of molecular structures.
(b) Develop efficient constraints on the generator to exploit its potential utility.
(c) Exploit the concepts developed for the structure generator in solving various structure problems (related to m.s. and others) and isomer problems.
(III) PREDICTOR:
(a) Extend the Predictor to still more complex molecular structures.
(b) Explore the design of experimental strategies, utilizing Predictor functions, to differentiate among candidate solutions.

We point out that the PLAN-GENERATE-TEST strategy, although applied to structure elucidation, has potential utility as a strategy for solving other chemical problems. Similarly, although we utilize mass spectral data almost exclusively, the same heuristic programming techniques allow facile extension to analysis of data from other types of analytical instrumentation. These were not objectives of the original research proposal but seem logical extensions for future work. We have illustrated the potential of these techniques for analysis of 13C NMR data (Carhart and Djerassi, 1973). This is discussed briefly under the PLANS section, below.

PROGRESS:

(I) PLANNER

The function of the Planner is to analyze mass spectral data acquired on a compound. The Planner attempts to derive structural information from these data using the rules of behavior of compounds in the mass spectrometer.

Objective (a): Extend Programs. The Planner is presently embodied in a program which also contains a set of functions to assemble this structural information into complete molecules (a primitive Structure Generator) and to test these molecular structures with other, not necessarily mass spectral, rules (a primitive Predictor). This performance program was written in this way to provide a useful tool for chemical studies while more general versions of the Structure Generator and Predictor were being developed. This program and its performance have been described in some detail in a publication and in previous progress reports.
A manuscript (Smith, et al., 1973) has now appeared describing the application of this program to the analysis of mixtures of compounds without prior separation.

Objective (b): Assess Capabilities. We have extended the capabilities of the Planner so that we can analyze both low and high resolution mass spectral data. A low resolution mass spectrum is regarded by the program as a pseudo high resolution spectrum wherein possible elemental compositions of each peak are limited only by the inferred molecular formula of the compound. This results in more ambiguity, with a commensurate increase in the number of candidate solutions, as would be expected considering the lower specificity of low resolution data as compared to high resolution data.

We have extended our capabilities for molecular ion determination utilizing a heuristic search technique through the space of plausible molecular ions. This technique has had significant success even when dealing with the low resolution mass spectra of compounds which display no molecular ion, for example the class of derivatized amino acids (trifluoroacetyl, n-butyl esters) important to studies carried out under Part B, below.

We have segmented the performance program to decrease the amount of memory required for its operation. This should increase the chances for other groups to make use of the program.

The limitations of the present performance program are primarily the requirement that some information about the class of compounds be known, and that, for each class, relatively detailed rules about the mass spectral fragmentation of this class be available. The former limitation results primarily from the nature of the program in that a complete structure generator is not incorporated. The primitive structure generator available to the program can only place substituents about an assumed skeleton.
This limitation will be alleviated when a structure generator with GOODLIST and BADLIST constraints is available (see Structure Generator, below). The latter limitation is more fundamental, but is characteristic of every spectroscopic technique to one degree or another. It must be assumed that analysis of a mass spectrum, alone, may not lead to sufficiently unambiguous information about the structure of the compound yielding the spectrum. It is for this reason that extensions of the programming techniques to encompass data from other spectroscopic techniques are attractive.

Objective (c): Generalize Techniques. We have carried out several successful experiments to ensure that the performance program, used originally for analysis of estrogenic steroids, retains only procedures which are compound-class independent. By supplying fragmentation rules for other classes of compounds, we have successfully carried out structure elucidation of molecules in several diverse classes including other steroidal hormones and related compounds (progesterones, testosterones, androsterones), steroidal sapogenins and derivatized amino acids.

Objective (d): Explore Utility. Previous progress reports have summarized in some detail the ways in which data from ancillary techniques in mass spectrometry (metastable ion and low ionizing voltage data, labile hydrogen exchange) can be used by the program. The utility of metastable ions for aid in structure elucidation continues as an active area of interest. Experience with the program has inspired studies on metastable ions, first, to help delineate the course of fragmentation of molecules with the purpose of extending and refining fragmentation rules used by the program (Smith, Duffield and Djerassi, 1973).
Experience with the increased specificity of structural information, with concomitant reduction in analysis time, when metastable ion information is available (Smith, et al., 1973) has led to a study of a new technique for detection and analysis of metastable ions (Direct Analysis of Daughter Ions, or DADI) and has illustrated the utility of this technique in mixture analysis (Smith, Djerassi, Maurer and Rapp, 1973).

Experience with the PLANNER has led to several research activities related to, but not supported by, this grant. Our studies of estrogen mixtures isolated from pregnancy urine have suggested new compounds likely to be important in the human metabolism of estrogens. Some of these compounds are hitherto unreported structures and a synthesis program is underway in Professor Djerassi's laboratory to produce some of these compounds. The Planner will be used as one method of comparison of the synthesized, authentic standards with those isolated from pregnancy urine.

Work is also being carried out to explore the fragmentation of model systems possessing two heteroatoms in close proximity. It is clear from the first of these studies (Block, Smith, and Djerassi, 1973) that the fragmentation of these difunctional systems does not reflect that of monofunctional analogs. More groundwork is required in this area to obtain better fragmentation rules for these systems.

(II) STRUCTURE GENERATOR

Objective (a): Complete the Generator. The last progress report discussed the completion of both the basic structure generator algorithm and program, which provide the capability for exhaustive generation of graph isomers of a given empirical formula, with prospective avoidance of duplicate structures. Since the time of the submission of that report, manuscripts describing the structure generator, directed specifically to an audience of chemists, have been submitted (Masinter, Sridharan, Lederberg, and Smith, 1973; Masinter and Sridharan, 1973).
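
The idea of exhaustive, duplicate-free isomer generation can be illustrated with a small sketch. It is not the DENDRAL algorithm: it handles only acyclic saturated hydrocarbons (carbon skeletons as trees with at most four bonds per carbon), and it removes duplicates retrospectively by comparing a canonical form, whereas the generator described above avoids duplicates prospectively. The function names canonical, grow and alkane_isomers are hypothetical, chosen for this illustration only.

```python
def canonical(adj):
    """Canonical string for an unlabelled tree given as {node: set_of_neighbours}."""
    degree = {v: len(ns) for v, ns in adj.items()}
    leaves = [v for v, d in degree.items() if d <= 1]
    remaining = len(adj)
    while remaining > 2:              # peel leaves until the 1 or 2 centre nodes remain
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            degree[leaf] = 0
            for nb in adj[leaf]:
                degree[nb] -= 1
                if degree[nb] == 1:
                    next_leaves.append(nb)
        leaves = next_leaves

    def encode(v, parent):
        # Sorted child encodings make the string independent of node numbering.
        return "(" + "".join(sorted(encode(c, v) for c in adj[v] if c != parent)) + ")"

    return min(encode(c, None) for c in leaves)


def grow(trees):
    """Attach one new carbon to every carbon that still has a free valence."""
    out = {}
    for adj in trees.values():
        new = len(adj)
        for v in adj:
            if len(adj[v]) < 4:       # carbon forms at most four C-C bonds
                bigger = {u: set(ns) for u, ns in adj.items()}
                bigger[v].add(new)
                bigger[new] = {v}
                out.setdefault(canonical(bigger), bigger)   # duplicates collapse here
    return out


def alkane_isomers(n_carbons):
    """All distinct carbon skeletons of the acyclic alkane with n_carbons carbons."""
    trees = {"()": {0: set()}}
    for _ in range(n_carbons - 1):
        trees = grow(trees)
    return trees


for n in range(1, 8):
    print(n, len(alkane_isomers(n)))
```

Comparing the printed counts with independently known isomer counts is the same kind of check as the counting-based verification of completeness and irredundancy, on a much smaller scale.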
Some effort over the past year has been devoted to verification of the completeness and irredundancy of the method. We have extended existing combinatorial counting algorithms to check that the numbers of isomers generated are correct. We have used an interactive version of the generator to verify that variations (allowed by the algorithm) of the mechanism of generation yield the same set of isomers. In this way we are now increasingly confident that the program's performance accurately reflects the mathematically proven algorithm on which it is based.

The Structure Generator has been briefly described, and placed in its context within Heuristic DENDRAL, in an invited paper presented at a NATO/CRS sponsored conference on Computer Representation and Manipulation of Chemical Information, held in Amsterdam in June, 1973 (Smith, Masinter and Sridharan, 1973).

We have also begun to develop techniques to expand the scope of the generator. One example, which has been completed, is adding extensions to the CATALOG. The CATALOG contains the set of vertex-graphs from which structures are assembled. The original CATALOG was not sufficient to generate all isomers of some potentially interesting compositions, e.g., those involving graphs possessing nodes of degree >3. We now have a program which constructs complete sets of vertex-graphs containing nodes of degree >3 from the set of trivalent graphs in the original CATALOG. We have thus extended the capabilities of the generator. Other such extensions are discussed in the PLANS section, below.

Objective (b): Develop Constraints. It is absolutely essential that we provide the mechanism for constraining the Structure Generator: without constraints it is merely a legal move generator, as in a chess-playing program.
For structure elucidation problems, the Planner can determine many features of the molecular structure from various types of experimental data, such as presence of functional groups and the numbers of double bonds and rings. Partial information of this sort can be used to constrain the Structure Generator to the space of plausible candidate structures. From a graph-theoretic point of view, however, constraining the graph generating algorithm is a difficult unsolved problem.

We are presently formulating several types of constraints to apply to the Structure Generator. Some types of constraint await the development of new mathematical tools (see PLANS), while others can be immediately implemented with relatively minor alterations to the algorithm. The class of constraints presently receiving attention deals with types of unsaturation (rings or double bonds) desired in the final structures. Related to this constraint is the constraint of number of quaternary carbons present. The former information (number and nature of multiple bonds) is readily available from several spectroscopic techniques, while the latter may be obtained from 13C NMR. The implementation of this class of constraints will be used as the model for future implementation of a GOODLIST (structural features known to be present) and a BADLIST (structural features known to be absent).

It is possible that some types of constraints may not be easily implemented within the algorithm. Thus, retrospective tests of isomers may be required to search for desired or unwanted features. We have developed some new approaches to graph matching which seem to be significantly more efficient than previous methods. Should prospective implementation of a constraint prove difficult, we will have at our disposal some powerful graph matching tools to exercise the constraint.
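
A retrospective GOODLIST/BADLIST test of the kind mentioned above can be sketched as a substructure search applied to each generated candidate. The sketch below is only an illustration under simplifying assumptions: structures are heavy-atom graphs (hydrogens implicit, bond orders ignored), matching is done by plain backtracking rather than by the more efficient graph matching approaches referred to in the text, and all names (adjacency, contains, apply_constraints) are hypothetical.

```python
def adjacency(n_atoms, bonds):
    """Neighbour sets for a structure with atoms 0..n_atoms-1."""
    adj = [set() for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def contains(structure, pattern):
    """True if the pattern graph can be embedded in the structure graph."""
    s_atoms, s_bonds = structure
    p_atoms, p_bonds = pattern
    s_adj = adjacency(len(s_atoms), s_bonds)
    p_adj = adjacency(len(p_atoms), p_bonds)

    def extend(mapping):
        i = len(mapping)                      # next pattern atom to place
        if i == len(p_atoms):
            return True
        for cand in range(len(s_atoms)):
            if cand in mapping or s_atoms[cand] != p_atoms[i]:
                continue
            # Every already-placed pattern neighbour must be bonded to cand.
            if all(cand in s_adj[mapping[j]] for j in p_adj[i] if j < i):
                if extend(mapping + [cand]):
                    return True
        return False

    return extend([])


def apply_constraints(candidates, goodlist=(), badlist=()):
    """Keep candidates containing every GOODLIST feature and no BADLIST feature."""
    return [c for c in candidates
            if all(contains(c, g) for g in goodlist)
            and not any(contains(c, b) for b in badlist)]


# Heavy-atom skeletons of the three C3H8O isomers (illustrative data).
propan_1_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
propan_2_ol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])
ether = (["C", "C", "O", "C"], [(0, 1), (1, 2), (2, 3)])
candidates = [propan_1_ol, propan_2_ol, ether]

c_o_c = (["C", "O", "C"], [(0, 1), (1, 2)])   # ether linkage as a feature

print(apply_constraints(candidates, goodlist=[c_o_c]))
print(apply_constraints(candidates, badlist=[c_o_c]))
```

Filtering after generation is simple but wasteful, since every unwanted isomer is still constructed first; this is the motivation for building constraints into the generator wherever possible.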
Objective (c): Exploit the Generator for Structure Elucidation. We have demonstrated the utility of some subsystems of the structure generator, e.g., the LABELLER, by exploring some problems of isomerism noted in the chemical literature. We have corrected the number and presented the identities of isomers formed by different substitutions of alkyl chains about a porphyrin nucleus. We are presently exploring some problems of isomerism of carbocyclic ring systems, specifically C10H10 and (CH)10, and CnH2n-4 tricyclic ring systems, n = 8 - 12, related to the mechanistics of isomeric interconversion.

We have the complete list of all 1176 topologically possible 6-membered Diels-Alder ring systems, using any combination of C, N, O and S. This list was generated using the PARTITIONER and an extended version of the LABELLER. These are all the 6-membered ring systems that can be embedded in structures resulting from the well-known Diels-Alder reaction. Of the 1176 possible ring systems, approximately 80% are unreported in the Ring Index. Many of these are chemically unstable, underscoring the need for a BADLIST implementation for the Cyclic Structure Generator. However, many of these unreported ring systems are certainly chemically plausible. Awareness of such gaps in relatively simple synthetic categories might lead to discovery of new categories of compounds with important biological effects.

(III) PREDICTOR

The function of the Predictor in the PLAN-GENERATE-TEST strategy is to perform a detailed evaluation of candidate solutions (structures) to a structure elucidation problem. It may use a more detailed model of spectroscopic behavior than that embodied in a Planner to attempt to differentiate among possible solutions.

Objective (a): Extend the Program. We have extended and generalized the Predictor used previously for saturated, aliphatic, monofunctional compounds.
Given a list of structures and rules of fragmentation processes, it will predict a mass spectrum for each structure. Prediction of relative ion abundances is crude, but previous work has shown that even crude measures of ion abundance are usually satisfactory. The predicted spectrum can then be matched with the observed spectrum and candidates ranked according to the quality of the match. The program works with structures and rules of any complexity.

An interesting philosophical question is how much knowledge should be brought to bear on interpretation of the data at the Planning vs. Predicting stages of analysis. It is our feeling that if more can be accomplished during Planning to constrain the Structure Generator, the analysis will be more efficient. On the other hand, some knowledge can be utilized only if a complete structure is specified, so that its use is restricted to a predictive role. Moreover, Predictor functions have a greater utility, as indicated in the subsequent section.

Objective (b): Differentiate Structures. The Predictor has a more obvious application in the design of experimental strategies to differentiate among candidate structures. Rules of spectroscopic behavior utilized during Planning demand the presence of some data to evaluate. The Predictor can then be used to request additional data from any source to aid in differentiation. We have explored this approach by analyzing the spectrum of a compound with the performance program. The Predictor was used to evaluate the set of candidate structures to define the minimum number of metastable defocussing experiments necessary to achieve a unique solution. Thus, no time need be spent acquiring unnecessary or redundant data. Clearly, this has important implications for future work in that many different types of data (e.g., NMR, IR) might be requested by the Predictor to facilitate identification.
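
The selection of a minimum set of discriminating experiments can be sketched as follows. The report does not say how the Predictor performs this selection, so the sketch assumes the simplest possible formulation: each candidate structure has a predicted outcome for every possible experiment, and we look for the smallest subset of experiments whose predicted outcomes differ for every pair of candidates. The search is brute force over subsets of increasing size, which is adequate only for small numbers of experiments. The function name and the sample data are hypothetical.

```python
from itertools import combinations


def minimal_experiments(predictions):
    """Smallest tuple of experiments whose predicted outcomes separate all candidates.

    predictions maps candidate -> {experiment: predicted outcome}.
    Returns None if even the full set of experiments cannot separate them.
    """
    candidates = list(predictions)
    experiments = sorted({e for p in predictions.values() for e in p})
    for size in range(len(experiments) + 1):
        for subset in combinations(experiments, size):
            signatures = {tuple(predictions[c].get(e) for e in subset)
                          for c in candidates}
            if len(signatures) == len(candidates):   # every candidate is unique
                return subset
    return None


# Illustrative data: is a metastable transition predicted (True) or not (False)?
predictions = {
    "structure_1": {"t1": True,  "t2": True,  "t3": False},
    "structure_2": {"t1": True,  "t2": False, "t3": False},
    "structure_3": {"t1": False, "t2": False, "t3": True},
}

print(minimal_experiments(predictions))
```

In this example no single transition separates all three structures, so two experiments are requested and the third is never run.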
PLANS

For the remaining period of this grant we propose to carry out the following extensions of the research outlined above.

(I) PLANNER

The major area of activity related to the present version of the Planner will be to focus our attention on using the program in support of chemical studies outlined under Part B (see below). The chemical extraction and derivatization procedures used in the analysis of body fluids restrict the types of compounds present in each separated fraction. Such simplifications make this a problem more amenable to attack. Only certain classes of compounds are present in each fraction, and we have some knowledge of the mass spectral fragmentation of these classes.

We wish to couple the program to the results of library matching procedures so that we direct our efforts to structure elucidation of those components which have not been previously identified. This is particularly important in the context of analysis problems such as those discussed under Part B.

We propose increasing the utility of the program by removing two present constraints: (a) allow unspecified "dummy" atoms in the skeleton instead of requiring a rigidly fixed structural skeleton, and (b) allow fragmentation processes to be specified more flexibly; in particular, allow fragmentations in substituents on the skeleton instead of requiring all fragmentations to cut through the skeleton.

Although we are presently uncomfortable with immediate coupling of the Structure Generator to the Planner, we propose continued exploration of the problems of controlling the generator automatically. Actual implementation awaits a more comprehensive treatment of the problem of constraints.

(II) STRUCTURE GENERATOR

The inclusion of a reasonable set of constraints is obviously required and will be the subject of most of our future development work. We plan to develop an interface to the present interactive version of the Structure Generator that speaks a more chemical language.
This interface will be designed to avoid the present requirement that the user know something about the program before he can use it. As the optimum method for implementation of a constraint is determined, the interface will be extended to translate the usual specification of the constraint in chemical terms into rules acting at the level of the program. As we stressed in development of the PLANNER, there are considerable advantages to building a powerful program in an incremental fashion. These steps are logically directed to our longer term goal of developing a useful structure elucidation tool for the chemist, based on the structure generator.

There are several other areas of interest which are peripherally related to the problem of constraints and which will occupy our attention. The Structure Generator knows no chemistry other than atom names and their associated valence. There are several important areas where this is an immediate problem. For example, the program has no explicit awareness of the aromatic resonances, leading to a remediable redundancy in the list of isomers. An aromaticity-predictor is also indispensable for anticipating chemical behavior of a structure. We wish to deal with types of isomers besides simple connectivity isomers. We need to have the facility for assembly of molecular sub-structures (the usual type of information inferred from spectroscopic data) when such an assembly yields new rings or multiple bonds.

All the above questions need a reexamination of the fundamental mathematical considerations. The present algorithm has been proven to yield complete and irredundant solutions. In devising new algorithms or variants of the present one, the burden of proof can be reduced to (the usually easier) equivalence to the previous algorithm.
Professor Harold Brown, who was the mathematician instrumental in initial development of the labelling algorithms for structure generation, will be with us again for several months to help attack the problems outlined above.

(III) PREDICTOR

Although the Predictor has been essentially finished for our own internal use, we propose to spend a modest amount of time in the coming months making it more usable by others. In particular, we wish to extend the initial work on predicting the new experiments necessary for distinguishing among candidate structures (e.g., predicting that a metastable peak at mass 70.1 would confirm one structure and disconfirm another). In addition, we plan to work on cataloging some existing sets of mass spectrometry rules in such a way that the program can be easily used for different classes of problems.

Part A References (published or submitted during year)

R. Carhart and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference XI....

D.H. Smith, B.G. Buchanan, R.S. Engelmore, H. Adlercreutz, and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference, IX. Analysis of Mixtures Without Prior Separation as Illustrated for Estrogens," J. Amer. Chem. Soc., September 5, 1973.

D.H. Smith, A.M. Duffield, and C. Djerassi, "Mass Spectrometry in Structural and Stereochemical Problems, CCXXII. Delineation of Competing Fragmentation Pathways of Complex Molecules from a Study of Metastable Ion Transitions of Deuterated Derivatives," Org. Mass Spectrom., 7, 367 (1973).

D.H. Smith, C. Djerassi, K.H. Maurer, and U. Rapp, "Mass Spectrometry in Structural and Stereochemical Problems, CCXXXIV. Applications of DADI, A Technique for Study of Metastable Ions, to Mixture Analysis," J. Amer. Chem. Soc., submitted (1973).

J.M. Block, D.H. Smith, and C. Djerassi, "Mass Spectrometry in Structural and Stereochemical Problems, CCXXXVIII. The Effect of Heteroatoms upon the Mass Spectrometric Fragmentation of Cyclohexanones," J.
Org. Chem., submitted (1973).
L.M. Masinter, N.S. Sridharan, J. Lederberg, and D.H. Smith, "Applications of Artificial Intelligence for Chemical Inference XII. Exhaustive Generation of Cyclic and Acyclic Isomers," J. Amer. Chem. Soc., submitted (1973).
L.M. Masinter and N.S. Sridharan, "Applications of Artificial Intelligence for Chemical Inference, XIII. Labelling Objects Having Symmetry," J. Amer. Chem. Soc., submitted (1973).
D.H. Smith, L.M. Masinter, and N.S. Sridharan, "Heuristic DENDRAL: Analysis of Molecular Structures," to be published in the proceedings of the NATO/CRS Advanced Study Institute on Computer Representation and Manipulation of Chemical Information.
N.S. Sridharan, "Computer Generation of Vertex Graphs," Stanford Computer Science Memo CS-73-381, Stanford University, July, 1973.

Part B-i. Gas Chromatograph - Mass Spectrometer Data System Development

OBJECTIVES AND RATIONALE

The objectives of this part of the research project are the improvement of GC/MS data system capabilities and the coupling of extracted data to the Heuristic DENDRAL programs for analysis. We ultimately seek a substantial degree of interaction between the instrumentation and the analysis programs, including computer specification and control of the data to be collected. In addition to the development goals, this portion of the project provides for the day-to-day operation of the GC/MS systems in support of mass spectrum interpretation computer program development (Parts A and C) and applications of GC and MS to biomedical and natural product sample analysis with collaborators.

Our rationale for this approach is that the overall system should be designed for problem solving rather than just for data acquisition. This implies that analytical computer programs, after review of available experimental data, could be able to specify additional information needed to confirm a solution or distinguish between alternative solutions.
Such requests could be passed back to an instrument management program to set up proper instrument parameters and collect the additional information.

Our initial objectives to implement an on-line, closed-loop system using the ACME computer facility have met with a number of difficulties. These grow principally out of ACME's limited computing capacity and commitments as a general time-sharing service. In addition, the scanning high resolution mass spectrometer has inherent sensitivity limitations, which do not preclude a demonstration but rather limit the practical sample volume which could be analyzed. Until such limitations can be overcome, particularly in terms of computing support, we have focussed our efforts on an open-loop demonstration of such an approach.

PROGRESS

Progress has been made in demonstrating a GC/High Resolution Mass Spectrometry capability, in further developing automated data analysis algorithms, and in planning for the implementation of a data system for the collection of metastable ion information. Progress in these and other areas directed toward the main research goals has been impacted by a transition in computing support which is still underway. This transition, discussed in more detail below, was occasioned by the phase-over of the ACME computing facility, which we had been using, from NIH grant subsidy to a fully fee-for-service operation under Stanford University auspices. Summaries of the results and problems encountered in each of the areas follow.

Gas Chromatography/High Resolution Mass Spectrometry (GC/HRMS)

We have verified the feasibility of combined gas chromatography/high resolution mass spectrometry (GC/HRMS). Using programs described in previous reports, we can acquire selected scans and reduce them automatically. The procedures are slow compared to "real-time" because of the limitations of the time-shared ACME facility.
We have recorded sufficient spectra of standard compounds to show that the system is performing well. A typical experiment which illustrates some of the parameters involved was the following. A mixture (approximately 1 microgram/component) of methyl palmitate and methyl stearate was analyzed by GC under conditions such that the GC peaks were well separated and of approximately 25 sec. duration. The mass spectrometer was scanned at a rate of 10.5 sec/decade, and a resolving power of 5000. The resulting mass spectra displayed peaks over a dynamic range of 100 to 1 and were automatically reduced to masses and elemental compositions without difficulty. Mass measurement accuracy appears to be 10 ppm over this dynamic range.

We have begun to exercise the GC/HRMS system on urine fractions containing significant components whose structures have not been elucidated on the basis of low resolution spectra alone. Whereas more work is required to establish system performance capabilities, two things have become clear: 1) GC/HRMS can be a useful analytical adjunct to our low resolution GC/MS clinical studies (Part B-ii), and 2) the sensitivity of the present system limits analysis to relatively intense GC peaks. This sensitivity limitation is inherent in scanning instruments, where one gives up a factor of 20-50 in sensitivity over photographic image plane systems in return for on-line data read-out. This limitation may be relieved by using television read-out systems in conjunction with extended channeltron detector arrays, as has been proposed by researchers at the Jet Propulsion Laboratory. We can nevertheless make progress in applying GC/HRMS techniques to accessible effluent peaks and can adapt the improved sensor capability when available.

These experiments have also shown that the ACME computer facility cannot reliably provide the rapid service required to acquire and file repetitive spectrometer scans.
This problem is to be expected in a heavily used time-shared facility without special configuration for high rate, real time support. Excepting possible requirements for real time data analysis (such as in a closed-loop system), this problem could be solved by implementing a large local buffer (e.g., disk) at the front-end data acquisition mini-computer. We are exploring this possibility in conjunction with the overall planning for computer support discussed below.

Data Analysis Algorithms

A. Peak Resolution

One of the significant trade-offs to be made in GC/HRMS is that of sensitivity versus resolution. In maintaining high instrument resolution (in the range of 5,000-10,000) while scanning fast enough to analyze a GC effluent peak (approximately 10 sec/decade), system sensitivity is constrained as discussed above. We have worked on a method for reducing instrument resolution requirements through more sophisticated computer analysis of a lower resolution output. In effect this transfers the burden of overlapping peak detection and mass determination to the computer instead of requiring inherently well resolved data out of the instrument. The advantage comes in better system sensitivity.

Unresolved peaks are separated by an analytical algorithm, the operation of which is based on a model peak derived from known singlet peaks in the data. Actual tabulated peak models are used rather than the assumption of a particular parametric shape (e.g., triangular, Gaussian, etc.). This algorithm provides an effective increase in system resolution by approximately a factor of three, thereby effectively increasing system sensitivity. By measuring and comparing successive moments of the sample and model peaks, a series of hypotheses are tested to establish the multiplicity of the peak, minimizing computing requirements for the usually encountered simple peaks.
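The moment comparison can be illustrated with a minimal sketch. It is not the project's program: the function names, the `rel_tol` noise allowance, and the synthetic data are hypothetical, and the closed form in `resolve_doublet` is one derivation that assumes both components have exactly the tabulated model shape. Under that assumption, the excess variance and excess third central moment of the sample over the model equal f(1-f)D^2 and f(1-f)(2f-1)D^3, where f is the area fraction of the lower-mass component and D is the separation of the two component centroids.

```python
import numpy as np

def peak_moments(x, y):
    """Area, centroid, variance and third central moment of a tabulated peak."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    area = y.sum()
    mean = (x * y).sum() / area
    dx = x - mean
    var = (dx**2 * y).sum() / area
    mu3 = (dx**3 * y).sum() / area
    return area, mean, var, mu3

def is_singlet(sample, model, rel_tol=0.05):
    """Accept the singlet hypothesis if the sample is no wider than the model.

    rel_tol is a hypothetical noise allowance, not a value from the report.
    """
    var_s = peak_moments(*sample)[2]
    var_m = peak_moments(*model)[2]
    return var_s <= var_m * (1.0 + rel_tol)

def resolve_doublet(sample, model):
    """Closed-form doublet split, assuming both components have the model shape.

    Returns ((area_low, centroid_low), (area_high, centroid_high)).
    """
    area, mean, var_s, mu3_s = peak_moments(*sample)
    _, _, var_m, mu3_m = peak_moments(*model)
    a = var_s - var_m   # excess variance      = f(1-f) D^2
    b = mu3_s - mu3_m   # excess third moment  = f(1-f)(2f-1) D^3
    if a <= 0.0:
        raise ValueError("sample is not wider than the model; treat as singlet")
    sep = np.sqrt(b**2 + 4.0 * a**3) / a   # D, separation of the two centroids
    f = 0.5 * (1.0 + b / (a * sep))        # area fraction of the lower-mass peak
    low = (area * f, mean - (1.0 - f) * sep)
    high = (area * (1.0 - f), mean + f * sep)
    return low, high

# Synthetic data in arbitrary units: a tabulated model peak and an
# unresolved pair built from two scaled, shifted copies of it.
xm = np.arange(-10, 11)
ym = np.exp(-0.5 * (xm / 2.0) ** 2)
x = np.arange(60)
y = np.zeros(60)
y[10:31] += 3.0 * ym   # component centred at x = 20
y[16:37] += 1.0 * ym   # component centred at x = 26

if not is_singlet((x, y), (xm, ym)):
    print(resolve_doublet((x, y), (xm, ym)))
```

Only area, centroid, variance and third central moment are needed here, so a simple peak is dismissed after one cheap width comparison and no iteration is involved for the two-component case.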
Analytic expressions for the amplitudes and positions of component peaks have been derived in the doublet case in terms of the first four moments of the peak complex. This eliminates time-consuming iteration procedures for this important multiplet case. Iteration is still required for more complex multiplets.

B. GC Analysis

The application of GC/MS techniques to clinical problems as described in Part B(ii) of this proposal has indicated the desirability of automating the analysis of the results of a GC/MS experiment. Analyzing the GC/MS output involves extracting, from the approximately 700 spectra collected during a GC run, the 50 or so representing components of the body fluid sample. The raw spectra are in part contaminated with background "column bleed" and in part composited with adjacent constituent spectra unresolved by the GC.

We have begun to develop a solution to this problem with promising results. By using a disk-oriented matrix transposition algorithm, the array of 700 spectra by 500 mass samples per spectrum can be rotated to gain convenient access to the "mass fragmentogram" form of the data. The transposition algorithm avoids many successive passes over the input data file as would be required in a straightforward approach. By generating a reorganized intermediate file, time savings by factors of 5-10 are achieved.

The fragmentogram form of the data, displayed at a few selected mass values, has been used at Stanford, MIT, and elsewhere for some time to evaluate the GC effluent profile as seen from these masses. Mass fragmentograms have the important property of displaying higher resolution in localizing GC effluent constituents. Thus by transposing the raw data to the mass fragmentogram domain, we can systematically analyze these data for baselines, peak positions, and amplitudes, and thus derive better mass spectra for the relatively few constituent materials.
These are free from background contamination and influences of adjacent GC peaks unresolved in the overall gas chromatogram. These spectra can then be analyzed by library search techniques or first principles as necessary.

We have applied a preliminary version of this algorithm to several urine samples. These contain several apparently simple peaks which in fact consist of multiple components. The algorithm performs well in separating out these constituents although further testing is required.

Closed-loop Instrument Control

In the long term, it could be possible for the data interpretation software to direct the acquisition of data in order to minimize ambiguities in problem solutions and to optimize system efficiency. The task of deciding among and collecting various types of mass spectral information (e.g., high resolution spectra, low ionizing voltage spectra, or selected metastable ion information) under closed-loop control during a GC experiment is difficult. Problems arise because of the large requirements placed on computer resources and present limitations in instrument sensitivity or data read-out imposed by the time constraints of GC effluent peak widths. Solutions to these problems may not be economically feasible within currently existing technology but seem achievable in the future.

We are studying this problem in a manner which would entail a multi (two or three) - pass system. This permits the collection of one type of data (e.g., high resolution mass spectra) during the first GC/MS analysis. Processing of these data by DENDRAL will reveal what additional data are necessary on specific GC peaks during a subsequent GC/MS run. Such additional data could help to uniquely solve a structure or at least to reduce the number of candidate structures.
This simulated closed-loop procedure could demonstrate the utility of DENDRAL type programs to examine data, determine solutions and propose additional strategies, but will not have the requirement of operating in real-time.

Some parameters in the acquisition of particular types of information, such as metastable data, will require computer control, even in the open-loop mode. We have considered plans to implement two aspects of instrument control, in addition to the magnetic scan control implemented for GC operation and reported previously. These include system resolution control, such as would be required to change from normal spectrum scanning mode to metastable scanning mode, and high voltage control necessary to selectively measure metastable ion fragmentation data. In addition to these we have considered the discrete switching of various electronic mode controls, which are straightforward and not discussed in detail. Implementation plans for computer control of these instrument functions have been delayed because of the ACME computing facility transition, which diverted the necessary hardware and software manpower.

Resolution control involves changing the widths of the slits at the exit of the ion source and the entrance to the ion multiplier detector. Additional source and electrostatic analyzer voltages must be controlled to optimize performance, as discussed later. Mechanical slit adjustment is accomplished on the MAT-711 instrument by heating wires which support the slit jaws. The resulting expansion or contraction of those wires moves the spring-loaded jaws. As implemented by the manufacturer, the time constants involved in heating the control wires are 5-10 seconds. It is possible to speed this up to approximately 0.5 seconds by application of a controlled overvoltage decreasing to the appropriate equilibrium value for the desired slit width. This was demonstrated by a series of experiments on an extra slit assembly mounted in a vacuum
jar in our laboratory. Cooling of the wires is relatively fast in the way they are mounted, so no problem exists in that direction. It is desirable to have feedback to indicate the actual slit width achieved rather than relying on a slit assembly calibration. Stretching of the support wires or changes in the spring tension under temperature cycling would change this calibration. An optical scheme to measure slit width in situ is possible. We do not contemplate implementing this feedback immediately because it requires major changes to the instrument flight tube.

Two types of metastable ion relationships are obtainable by suitable control of the double-focussing instrument. First, for a given daughter ion, one can trace the parent ions which give rise to it. Second, for a given parent ion one can trace the various daughters to which it gives rise. The first measurement ("metastable defocussing") is the more straightforward for this instrument since parent ions can be enumerated by a simple scan of the accelerating voltage, holding the electrostatic analyzer (ESA) voltage and magnetic field constant. The second type of scan requires the coordinated scan of two of the three fields. We feel that joint computer control of the accelerating voltage and ESA voltage is the simpler approach since the magnetic field is more difficult to set and monitor because of hysteresis effects. For a resolution of 1000 in the metastable ion mass measurement, the voltages must be set to approximately 0.01-0.02% accuracy. This requires a 14-16 bit digital-to-analog (D/A) converter to control the input (10 volts) to the operational amplifier which generates the high voltage. Similar D/A controls of ion source voltages for ion current and focus optimization can be implemented using optical isolators to allow vernier control of the various high voltages around the nominal 8 kV values.
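The relation between setting accuracy and converter word length can be checked with a short calculation. This is a sketch under a simplifying assumption that is not stated in the report: an ideal converter whose single step (full scale divided by 2^bits) must be no larger than the required tolerance. The function names are illustrative. On that assumption 0.02% and 0.01% call for at least 13 and 14 bits respectively, so the 14-16 bit range quoted above leaves some margin for converter non-idealities.

```python
import math

def dac_bits_for_accuracy(fractional_accuracy):
    """Smallest bit count whose step (1 / 2**bits of full scale) is <= the tolerance."""
    return math.ceil(math.log2(1.0 / fractional_accuracy))

def dac_step_volts(full_scale_volts, bits):
    """Size of one converter step in volts for an ideal converter."""
    return full_scale_volts / 2 ** bits

# Tolerances of 0.02% and 0.01% of full scale, as fractions.
for accuracy in (0.0002, 0.0001):
    print(accuracy, dac_bits_for_accuracy(accuracy))

# Step size on the 10 volt amplifier input for the word lengths in the text.
for bits in (14, 16):
    print(bits, dac_step_volts(10.0, bits))
```

For the 10 volt input, one step is about 0.61 mV at 14 bits and about 0.15 mV at 16 bits.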
Computing Transition

As mentioned earlier, the transition of the ACME computing facility from NIH subsidy to Stanford-sponsored fee-for-service operation has impacted our development efforts this past year. Both the low resolution instrument used for routine body fluid analysis research and the high resolution instrument are affected. All computing support was previously obtained from the ACME facility, much of it as core research without explicit transfer of funds. The transition has required consideration of both technical and economic factors.

The new facility represents a combination of the previous ACME interactive and real time computing load with various administrative and batch computing loads on a new IBM 370/158 computer. This new environment will have even more difficulty in supporting real time computing needs than ACME did. No real time support has been available since the 360/50 service was discontinued on July 31, although terminal service was reestablished in mid-August. Data acquisition service via the IBM 1800 is expected to be operational by early November.

For the high resolution instrument, this transition, as a minimum, necessitates an interface modification (we previously sent data through the IBM 2701 interface, no longer to be supported). It also amplifies the problems we encountered in sending and filing high rate mass spectrometer data (particularly during GC/MS runs). These problems would be present to some extent in any general time-sharing service machine without specific hardware and software configuration provision for these needs (such provisions for real time support had been proposed in our SUMEX computer application). After examining a variety of alternatives, we conclude that a dedicated mini-computer solution (built around a machine with the arithmetic capability of a PDP-11/45) would be highly attractive technically and relatively inexpensive.
A stand-alone mini-computer system would cost in the range of $50,000-$60,000, augmenting existing equipment, plus approximately $9,000 per year maintenance and $2,000 per year for supplies. Estimates for 370/158 support, based on current charging algorithms and previous utilization experience, run from $35,000-$50,000 per year. This spread is caused by uncertainties in the effects of planned measures to increase operating efficiency and possible changes to the rate structure. In any case, the mini-computer approach pays for itself in 1 to 2 years of operation and provides the responsiveness of a dedicated machine for real time support.

Unfortunately our existing budget does not provide for this solution. The budget is very marginal for purchase of computing support from the 370/158 as well. This latter approach is the only currently available one, however, since it can be implemented with relatively low start-up cost. The effect of budget limitations appears in terms of a reduced number of samples which can be run. We have attempted to minimize the other budget costs (manpower principally) to increase the computing funds available. This will necessarily impact our development goals. We hope, in the renewal application for DENDRAL support, to be able to implement the more effective mini-computer approach for the high resolution spectrometer as a longer term solution.

We have undertaken an interim mini-computer solution for the low resolution spectrometer (Finnigan 1015 quadrupole), which is primarily used for our body fluid analysis studies. For the same reasons outlined above, a mini-computer solution is attractive. In the case of the low resolution quadrupole instrument, a lesser capacity machine will suffice for immediate data acquisition and display functions. We have implemented such an interim system on a PDP-11/20 machine available from other funding sources.
This system, which is now operational, allows the acquisition of GC/MS data, limited by the capacity of the DEC tape storage medium to approximately 600 spectra per experiment. For certain types of GC analyses, up to 1000 spectra per experiment are required, so this limits, to some extent, the utility of this interim system. A CalComp plotter is supported for display purposes. A fixed head disk provides for library search procedures which are still being converted from the ACME system. We have applied to the NIH-GMS for funds to augment this system in order to relieve current limitations as part of a Genetics Center research proposal.

FUTURE PLANS

Our future plans are basically to continue development along the lines outlined above. We will complete the computing support transition steps described. These include primarily establishing a connection to the new 370/158 facility to provide interim support for the high resolution system. We will pursue additional software and hardware development goals as far as possible within the limited budget available. These efforts will concentrate for the most part on bringing up a metastable ion analysis data system. It should be reemphasized that the manpower levels proposed in the follow-on budget have been minimized to allow for purchasing computing time on the 370/158. The allocated manpower is required primarily for instrument operation and maintenance, with minimal provision for development efforts.

Part B(ii). Analysis of Body Fluids by Gas Chromatography/Mass Spectrometry.
The chemical separation of urine into the following fractions prior to GC/MS analysis has been described in previous DENDRAL Reports:

    free acids (analyzed by GC/MS as their methyl esters)
    amino acids (analyzed by GC/MS as their N-trifluoroacetate n-butyl esters)
    carbohydrates (analyzed by GC/MS as their trimethyl silyl ether derivatives)
    hydrolyzed acids (analyzed by GC/MS as their methyl esters)
    hydrolyzed amino acids (analyzed by GC/MS as their N-trifluoroacetate n-butyl esters)

During the past year we have extended these methods of fractionation to the following body fluids: blood (after an initial precipitation of proteins by the addition of ethanol) and amniotic and cerebrospinal fluids. The following summarizes the results obtained from an analysis of these fluids during the past year by gas chromatography-mass spectrometry.

URINE ANALYSIS:

A. The Development of a "Metabolic" Profile Characteristic of Neonatal Tyrosinemia Using Combined Gas Chromatography-Mass Spectrometry.

This work was carried out in collaboration with clinical colleagues from the Department of Pediatrics at Stanford University and a joint publication describing this research is in preparation. The study was based on a total of one hundred and four 24-hour urine samples from sixteen premature or small birthweight infants receiving treatment in the Stanford nursery. After exclusion of infants who became ill, died, or left the nursery, we were able to follow nine infants closely for periods of between 4 and 6 weeks from day 3 of life. All nine infants had birthweights of below 1500 g and three of these were below 1000 g. Of the nine infants studied, five showed transient tyrosinemia as shown by a marked elevation in the urinary excretion of the tyrosine metabolites p-hydroxyphenyllactic acid, p-hydroxyphenylpyruvic acid and p-hydroxyphenylacetic acid. There was also a less marked but distinct elevation in the urinary tyrosine output.
Figures 1 and 2 show the metabolic profiles of the same infant (J.L.) in the normal (a) and tyrosinemic (b) states. Figure 1 shows the free acid outputs, chromatographed as the methyl ester-methyl ether derivatives, and Figure 2 is an expression of the free amino acids of the same urines, chromatographed as the N-trifluoroacetyl n-butyl ester derivatives. In each case the concentration of each metabolite is a function of the peak height as compared to the height of the internal standard. Table 1 is a summary of the ranges of urinary output of tyrosine and metabolites observed for all the infants in the study.

TABLE 1
Daily Excretion in mg/kg

               Tyrosine    p-HPLactic    p-HPPyruvic    p-HPAcetic
Normal         0.2 - 3     0 - 5         0 - 0.5        0.2 - 2
Tyrosinemic    3 - 15      5 - 50        0.5 - 5        0.5 - 5

As shown by Table 1 and Figure 1, neonatal tyrosinemia is characterized by a very large increase in the output of p-hydroxyphenyllactic acid and by a 10-50 fold excess of the latter over p-hydroxyphenylpyruvic acid. Studies of the hereditary defects in tyrosine metabolism initially indicated that p-hydroxyphenylpyruvic acid was the major metabolite, although more recently cases have been reported where p-hydroxyphenyllactic acid is in a 2-5 fold excess over p-hydroxyphenylpyruvic acid. These latter determinations were made using GC and GC/MS methods and therefore probably reflect the improved specificity of the analytical procedure (previously colorimetric methods were used) rather than a difference of actual metabolic profile. Apart from the very large excess of p-hydroxyphenyllactic acid over its keto analog, we could detect no significant differences between the profiles shown in neonatal tyrosinemia and those published for hereditary disease. Other metabolites such as p-hydroxymandelic acid, DOPA and N-acetyltyrosine, which have previously been reported in tyrosinemic urine, were not seen to be elevated.

B. GC/MS Analysis of Urine from Children Suffering from Leukemia.
This research was carried out with twenty 24-hour urine samples supplied by Drs. Jordan Wilbur and Tom Long of the Stanford Children's Hospital. The acidic fraction of all urines studied in this project showed no abnormal metabolites, nor were gross amounts of known acids detected. The amino acid fraction, however, of six of the urine samples showed the presence of a non-protein amino acid, beta-aminoisobutyric acid (BAIB). In several of these instances the patients were excreting in excess of 1 gram of BAIB per day. The literature contains many references to increased BAIB excretion (genetic excretors, lead poisoning, pulmonary tuberculosis, march hemoglobinuria, thalassaemia and Down's Syndrome). The reported excretion of BAIB by leukemic patients was not substantiated by another investigator. There are several criticisms in the literature of the methods used for the quantitation of BAIB in biological fluids, and in order to fill this void a sensitive, specific and rapid method for the quantitation of BAIB has been developed. (SEE: The Quantitation of BAIB in Urine by Mass Fragmentography; W.E. Pereira, R. Summons, W.E. Reynolds, T.C. Rindfleisch and A.M. Duffield, in press).

C. GC/MS Analysis of Urine from Patients Suffering from Hodgkin's Disease.

During this study 20 urine samples from patients with diagnosed Hodgkin's Disease (Department of Oncology, Stanford University Medical Center) were analyzed and, in general, no abnormal metabolic profile could be found in any urine. There was one exception in which an individual was noted to excrete massive quantities of adipic acid (of the order of 1 gram per day).

D. Detection of Metabolic Errors by GC/MS Analysis of Body Fluids.

This project results from a collaborative effort between the Departments of Genetics and Pediatrics of the Stanford University Medical Center.
To date over 50 samples have been analyzed; the majority (35) being urine, while amniotic fluid (10), blood (6), and cerebrospinal fluid (6) were also analyzed. It has been and will continue to be our practice to analyze, in collaboration with clinical investigators, aliquots of fluid samples obtained for valid diagnostic purposes completely divorced from this research on GC/MS analysis techniques. This investigation is not intended to serve as a screening program for a large population but rather to focus on those individuals who exhibit suggestive clinical manifestations such as psychomotor retardation and progressive neurologic disease as well as suggestive pedigrees.

In the case of amniotic fluid the hope is to be able to monitor the condition of the fetus in those pregnancies which might be considered at risk. To date we have investigated specimens from normal pregnancies in order to establish the catalog of compounds to be observed in amniotic fluid. From this base it could prove possible to identify materials which might indicate the health of the fetus.

We have been able to confirm the presence of orotic acid in a urine from a person found to have orotic aciduria, while another urine sample was used to demonstrate our ability to identify the characteristic metabolites present in isovaleric acidemia. The following description refers to a urine from a child with hypophosphatasia.

A child died 33 hours after birth in Fresno, California, with the classical signs of hypophosphatasia. This genetic defect is marked by high phosphoethanolamine (PEA) concentrations in urine of affected homozygotes and unaffected heterozygotes. After derivatization (in this instance the TMS ethers of the water soluble carbohydrate fraction were prepared) we were able to detect by GC/MS large concentrations of ethanolamine and phosphoric acid but not PEA itself. The derivatization procedure we used most likely hydrolyzed PEA.
We were able to quantitate this compound in the infant's urine using an amino acid analyzer, and PEA excretion was extremely high (over 200 times normal values for infants), confirming the diagnosis. Next we examined urine samples from the child's parents, presumed heterozygotes, by GC/MS and by the amino acid analyzer. Again, no PEA was detected by the former method although the presence of ethanolamine and phosphoric acid was demonstrated. We determined the following excretion levels of PEA by amino acid analyzer:

    Newborn infant: 94 micromoles per 100 ml (normal 0.21-0.33)
    Father: 269 micromoles per 24 hours (normal 17-99)
    Mother: 32 micromoles per 24 hours (normal 17-99)

It is of interest that in this family the affected infant and his unaffected father both show subnormal serum alkaline phosphatase activity. The mother, who did not excrete increased amounts of PEA, was found to have normal activity of this enzyme in her serum. The following table summarizes the serum phosphatase activity measurements:

    Newborn infant: 0.2 units* (normal 2.8-6.7)
    Father: 0.7 units (normal 0.8-2.3)
    Mother: 3.4 units (normal 0.8-2.3)

    (* 1 unit is that phosphatase activity which will liberate 1 millimole of p-nitrophenol per hour per liter of serum)

E. Drug Analysis Service Using GC/MS

We were recently contacted by physicians to rapidly identify a drug self-administered by a patient in the Stanford University Hospital. From the mass spectrum the drug was identified as pentazocine within the hour. Although this is not part of the formal DENDRAL proposal, we expect that similar cases may arise in the future and we intend to respond positively to such requests.

Development of Library Search Routines for Mass Spectrum Identification

The analysis of a single body fluid fraction produces between 600 and 750 mass spectra.
In order to cope with the interpretation of the daily production of mass spectra (about 8 body fluid fractions for a total of between 4,800 and 6,000 mass spectra) we have begun the implementation of library search routines. Concurrent with the analysis of body fluids for metabolic content we have been recording the mass spectra of many reference compounds. This collection represents the beginning of the construction of a library of reference spectra. Late in 1973 we expect to receive from Dr. S. Markey, University of Colorado Medical Center, a more comprehensive library which he has collated from contributors (including our own laboratory) in the field of biological applications of gas chromatography/mass spectrometry.

Prior to the demise of the ACME computer facility at Stanford University, we ran library search routines on data collected from urine fractions. Because the ACME system was heavily loaded, our programs took about one minute per compound identification. However, the experience gained will be used to implement library search routines on our current PDP-11 GC/MS data system.

In addition we have sent mass spectra from several urine analyses to Dr. S. Grotch, Jet Propulsion Laboratory, Pasadena, California, in order that he could use his library search routines on real data. In this instance the limiting factor for efficient compound identification was the library content, which was limited to a few compounds of biological significance. In addition those compounds of interest that were present in the library were often in a derivatized form different from that used in our analytical methodology.

Application of GC/HRMS to Body Fluid Analysis

We reported in the last annual report of the DENDRAL project that the Varian MAT 711 mass spectrometer was interfaced with a gas chromatograph for the recording of low resolution mass spectra. We have now used this system for the recording of HRMS of gas chromatographic fractions from urine analyses.
We were able to record HRMS scans over several gas chromatographic peaks of interest in a number of urine fractions. The high resolution results were found to be of high quality in mass measurement accuracy. When the MAT 711 instrument was used for GC/HRMS, the sensitivity of the ion source was a limiting factor, in that less intense gas chromatographic peaks often lacked sufficient material to generate acceptable high resolution mass spectra. Notwithstanding this limitation, the HRMS data recorded on different urine fractions were used to confirm the identification of several metabolites. If the metabolite of concern were available only in quantities insufficient for direct GC/HRMS, preparative GC would be used to concentrate the component of interest for subsequent HRMS.

RESOURCE OPERATION

Over the term of this grant our mass spectrometry laboratories have provided support to numerous research projects in addition to the DENDRAL computer program development project funded under this grant. These cover a variety of applications at Stanford, in the United States, and abroad. Included are problems in the study of human metabolites, biochemistry, and natural product chemistry. Samples have been run in collaboration with outside people both on the MAT-711 GC/High Resolution Mass Spectrometer system and on the Finnigan 1015 GC/Low Resolution Quadrupole Mass Spectrometer system. The low resolution system has also been supported by a NASA research grant.

The following tables summarize the support rendered in terms of numbers of samples run through various types of analysis:

I. MAT-711 High Resolution System (period covered 11/71 - 6/73)

                                           Batch      Batch      GC/High    GC/Low
                                           High       Low        Resol.     Resol.
                                           Resol. MS  Resol. MS  MS         MS
    DENDRAL program development              317          3
    Stanford Genetics (body fluid analysis)   39         17         13
    Stanford Chemistry (non-DENDRAL;
      Dr. Djerassi's group)                   91        112                    50
    Stanford Chemistry (non-DENDRAL;
      Drs. van Tamelen, Johnson, Mosher,
      Collman, Altman, Goldstein)             29         23                     4
    Stanford Surgery (Dr. Fair)                8
    Dr. Adlerkreutz (Finland)                            10
    Dr. Venien (France)                       26
    Drs. Gilbert, Mors, Baker (Brazil)        40         44
    Dr. Orazi (Argentina)                     19          1
    Dr. Subramanian (India)                   10          5
    Dr. Khastgir (India)                       5
    Dr. O'Sullivan (Ireland)                   5
    Dr. Badr (Libya)                          30
    Dr. Mital (India)                          5
                                             ---        ---        ---        ---
    Total samples                            624        215         13         54

II. Finnigan 1015 Low Resolution System (period covered 8/72 - 8/73)

Note that the samples run are specified by fluid type. Each fluid is extracted and derivatized as described in Part B(ii) and therefore may represent several GC/LRMS analyses. Results of several of these analyses are discussed earlier in Part B(ii).

                                                            GC/Low Resolution MS
    Stanford Pediatrics (Drs. Cann, Sunshine and Johnson)   141 urines
                                                              7 amniotic fluids
                                                              6 bloods
                                                              2 cerebrospinal fluids
    Stanford Oncology (Dr. Rosenberg)                        20 urines
    Stanford Psychiatry - Genetics
      (Drs. Brodie and Cavalli-Sforza)                        4 cerebrospinal fluids
    Stanford Respiratory Medicine (Dr. Robin)                 2 urines
                                                              2 bloods
    Stanford Pharmacology (Dr. Kalman)                        2 extracts
    Stanford Biochemistry (Dr. Stark)                         8 extracts
    Stanford Children's Hospital (Drs. Wilbur and Long)      24 urines
    UC San Francisco Medical School - Dermatology
      (Dr. Banda)                                             2 urines
    Menlo Park V.A. Hospital (Dr. Forrest)                   13 extracts
    Palo Alto V.A. Hospital (Drs. Hollister and Green)        7 extracts
    University of Puerto Rico School of Medicine
      (Dr. Garcia-Castro)                                     7 urines
                                                            ---
    Total                                                   243 samples

PART B PUBLICATIONS

The following summarizes the publications resulting from research in the low resolution mass spectrometry laboratory over the past year, including body fluid analysis. This laboratory has been jointly supported by NIH (DENDRAL) and NASA. The listed publications include research relevant to both sponsors.

The Determination of Phenylalanine in Serum by Mass Fragmentography. Clinical Biochem., 6 (1973). By W.E. Pereira, V.A. Bacon, Y. Hoyano, R. Summons and A.M. Duffield.

The Simultaneous Quantitation of Ten Amino Acids in Soil Extracts by Mass Fragmentography. Anal. Biochem., 55, 236 (1973). By W.E. Pereira, Y. Hoyano, W.E. Reynolds, R.E. Summons and A.M. Duffield.

An Analysis of Twelve Amino Acids in Biological Fluids by Mass Fragmentography. Anal. Chem. By R.E. Summons, W.E. Pereira, W.E. Reynolds, T.C. Rindfleisch and A.M. Duffield.

The Quantitation of β-Aminoisobutyric Acid in Urine by Mass Fragmentography. Clin. Chim. Acta, in press. By W.E. Pereira, R.E. Summons, W.E. Reynolds, T.C. Rindfleisch and A.M. Duffield.

The Determination of Ethanol in Blood and Urine by Mass Fragmentography. Clin. Chim. Acta. By W.E. Pereira, R.E. Summons, T.C. Rindfleisch and A.M. Duffield.

A Study of the Electron Impact Fragmentation of Promazine Sulphoxide and Promazine using Specifically Deuterated Analogues. Austral. J. Chem., 26, 325 (1973). By M.D. Solomon, R. Summons, W. Pereira and A.M. Duffield.

Mass Spectrometry in Structural and Stereochemical Problems. CCXXXVII. Electron Impact Induced Hydrogen Losses and Migrations in Some Aromatic Amides. Org. Mass Spectrom., in press. By A.M. Duffield, G. DeMartino and C. Djerassi.

Spectrometrie de Masse. IX. Fragmentations Induites par Impact Electronique de Glycols-α en Serie Tetraline. Bull. Soc. Chim. France, 2105 (1973).

Spectrometrie de Masse. VIII. Elimination d'eau Induite par Impact Electronique dans le Tetrahydro-1,2,3,4-naphtalene-diol-1,2. Org. Mass Spectrom., 7, 357 (1973). By P. Perros, J.P. Morizur, J. Kossanyi and A.M. Duffield.

Chlorination Studies I. The Reaction of Aqueous Hypochlorous Acid with Cytosine. Biochem. Biophys. Res. Commun., 48, 880 (1972). By W. Patton, V. Bacon, A.M. Duffield, B. Halpern, Y. Hoyano, W. Pereira and J. Lederberg.

Chlorination Studies II. The Reaction of Aqueous Hypochlorous Acid with α-Amino Acids and Dipeptides. Biochim. et Biophys. Acta, 313, 170 (1973). By W.E. Pereira, Y. Hoyano, R. Summons, V.A. Bacon and A.M. Duffield.

Chlorination Studies IV. The Reaction of Aqueous Hypochlorous Acid with Pyrimidine and Purine Bases. Biochem. Biophys. Res. Commun., 53, 1195 (1973). By Y. Hoyano, V. Bacon, R.E. Summons, W.E. Pereira, B. Halpern and A.M. Duffield.

Part C. EXTENSION OF THE THEORY OF MASS SPECTROMETRY BY COMPUTER

OBJECTIVES:

Part C of the DENDRAL effort, termed Meta-DENDRAL, aims at providing theory formation help for chemists interested in the mass spectrometric behavior of new classes of compounds. Our goals are necessarily long-range because theory formation by computer is itself an exciting, unsolved problem in computer science. We have chosen to explore this problem in the context of mass spectrometry in order to make frontier computer research results available to working scientists.

The problem of finding judgmental rules for use in a computer program is common to many biomedical computing projects, such as medical diagnosis and therapy recommendation programs. In order to give these programs the knowledge that makes them perform at acceptable levels, a medical expert is often asked to summarize his own knowledge of the problem area in rules that the program can use. The Meta-DENDRAL theory formation program is a paradigm for the kind of assistance that computers can give to medical experts in this role. Programs of this sort can, first of all, provide the expert with an interpreted summary of a large collection of "hard" empirical data. Second, the program can suggest to the expert plausible rules that appear to explain major features of the data. Thus, the expert is able to assimilate large collections of data in the rules given to the computer. We believe that the Meta-DENDRAL work is a useful model on which fruitful work in other biomedical problems can be based.

The over-all strategy of this research is to model the theory formation activity of scientists. We start with a set of empirical data: known molecular structures and their associated mass spectra.
By exploring the possible mechanistic explanations of each mass spectrum, the program is able to find a set of mechanisms that appear to be characteristic for the class of molecules. These characteristic mechanisms constitute the general mass spectrometry rules for the class, or a first-level theory for the class. Further refinements of the rules give more sophisticated restatements of the theory. We have designed the programs in such a way as to provide useful results from the intermediate steps. The progress section discusses several sets of results that have been obtained, even though the entire program has not yet been completed.

PROGRESS:

In the past ten months (since January, 1973) the theory formation programs have seen significant application and significant new extensions. In addition, the work has been described in publications for both chemists and computer scientists.

Applications of Existing Programs. The INTSUM program, for interpreting and summarizing the mass spectra of many known compounds of one class, was described in the previous annual report as essentially finished. In this last period we have used this program to help understand the mass spectrometry of several classes of compounds, including estrogens, equilenins and other estrogenic steroids, androstanes, alkyl pregnanes, vinyl quinazolones, amino acids and aromatic acids. An article written for mass spectroscopists and soon to appear in Tetrahedron (Smith et al., enclosed) describes this program and its usefulness in understanding the previously unreported mass spectrometry of the equilenins. The amino acid and aromatic acid results are useful for interpreting the mass spectra taken from those fractions of urine (see Part B). The INTSUM program is available to anyone who requests it, as stated in the article soon to appear.
Because of the complexity of the program, we recommend that mass spectroscopists use this program on a network computer after they have collected a number of mass spectra from a class of compounds whose fragmentation mechanisms they wish to investigate.

Recent Extensions to Meta-DENDRAL. In this last period significant progress has been made on the theory formation programs that use the interpreted summary of the data provided by the INTSUM program. A simple rule formation program, described previously (HI7), finds the characteristic mass spectrometry mechanisms for a class of compounds, assuming that the compounds exhibit regular behavior as a class. Recent work has removed the restriction that the compounds must behave as a class: important classes can be found by the program within the set of given compounds. The procedure was described in a paper for the Third International Joint Conference on Artificial Intelligence, which is enclosed. At the same time that the rule formation program looks for characteristic mechanisms, the class separation procedure refines the class of molecules that appear to behave uniformly (i.e., appear to exhibit most of the characteristic mechanisms).

Another important extension of the theory formation program makes the rule descriptions more general and less specific to the class of compounds studied. The mechanisms in the rules are now described generally in terms of the kinds of bonds that break, and not in terms of the precise relations of the bonds to the skeletal structure common to the class. For example, a rule is now stated as "Any bond that is the second bond from a nitrogen atom is likely to break", rather than "In the skeleton R1-C2-N3-C4-R5 the bond between atoms 1 and 2 and the bond between atoms 4 and 5 are both likely to break". These general descriptions will allow much more freedom in the kinds of interpretations that can be placed on the INTSUM results.
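
The difference between the class-specific rule and the general rule quoted above can be illustrated with a small sketch. This is not the Meta-DENDRAL program; the graph representation (`atoms`, `bonds`) and the function name `second_bonds_from` are assumptions made only for illustration. Applied to the skeleton R1-C2-N3-C4-R5, the general description picks out the same two bonds that the skeleton-specific rule names explicitly.

```python
def second_bonds_from(element, atoms, bonds):
    """Return the bonds that are the second bond away from any atom of `element`.

    atoms: dict mapping atom number -> element symbol (hypothetical representation)
    bonds: list of (atom, atom) pairs
    """
    # Build an adjacency table so each atom's neighbors can be looked up.
    neighbors = {i: set() for i in atoms}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    targets = {i for i, symbol in atoms.items() if symbol == element}
    found = set()
    for a, b in bonds:
        if a in targets or b in targets:
            continue  # a bond touching the atom itself is the first bond, not the second
        if neighbors[a] & targets or neighbors[b] & targets:
            found.add((min(a, b), max(a, b)))
    return sorted(found)


# The skeleton from the text: R1-C2-N3-C4-R5
skeleton_atoms = {1: "R", 2: "C", 3: "N", 4: "C", 5: "R"}
skeleton_bonds = [(1, 2), (2, 3), (3, 4), (4, 5)]
print(second_bonds_from("N", skeleton_atoms, skeleton_bonds))  # [(1, 2), (4, 5)]

# The same description applies unchanged to a molecule outside that skeleton.
chain_atoms = {1: "C", 2: "C", 3: "C", 4: "N"}
chain_bonds = [(1, 2), (2, 3), (3, 4)]
print(second_bonds_from("N", chain_atoms, chain_bonds))  # [(2, 3)]
```

Because the description refers only to local properties of a bond, it can be evaluated on any molecule, which is what frees the rule from the skeleton common to one class.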
It is possible, for example, to alter the set of predicates used to describe bonds without altering the program. The program can be conceptualized as a search through the space of possible combinations of predicates. Some predicates describe the type of bond (e.g., 'single'), others describe the atoms joined by the bond (e.g., 'nitrogen', 'secondary'), and others describe the bonds and atoms next away from the bond that breaks. Some a priori heuristics limit consideration of complex predicates to chemically meaningful combinations, for example, by forbidding consideration of a single atom as both carbon and nitrogen. Other heuristics guide the process of expansion by forbidding a new predicate to be added to a description if its addition reduces the explanatory power of the existing description. For example, if a high average intensity is associated with breaking the X-X bond in X-X-N and further specification of either of the X's reduces the average intensity, then the description is not changed.

In addition to the work just mentioned, a generative model of rule formation has been pursued by Carl Farrell in his dissertation work directed by Professor Feigenbaum and Dr. Buchanan. He has written a program which accepts, as input, descriptions of specific molecules and all the primitive actions that might explain the mass spectra of those molecules. The output of the program is a set of general situation-action rules that describe classes of molecules that characteristically show evidence of significant actions.

PLANS

In the following period we plan to increase the performance capabilities of the theory formation program in several ways.

1. Sample Selection. The program's current strategy is to find the rules exhibited by most or all of the molecules in the initial sample. If the molecules are diverse, the rules will be diverse. Thus, we plan to add a preprocessor that can select a "simple" set of molecules for the rule formation to work with.
For example, unbranched (straight-chain) compounds should be expected to present fewer complications for initial theory formation than highly branched compounds. The effects of the complicating features can be studied after the simple rules have been found.

2. Rule Clarification. After simple rules have been found, we want the program to clarify the conditions under which the rules hold. By studying more complicated molecules, the program can find the simple rules that no longer hold for these cases. For example, we want the program to discover that terminal alpha carbons (as in CH3-X-N) are special. Or, the program should discover the effects of double bonds by examining new cases even though the molecules in the original set contained no double bonds.

3. Experimentation. Because the original set of molecules contains the simpler examples from which it is easier to find characteristic mechanisms, the program will need to clarify rules in the way suggested under (2). For a human scientist, this means describing new experiments to perform that will help place limits on the range of applicability of the rules. Looking at additional arbitrary molecules may be helpful, but not as helpful as looking at the specific molecules that will resolve specific questions about the preliminary rule set.

4. Integration of Results. When the program has examined two or more classes of molecules, it should be able to integrate the results into a common set of mechanisms (if any are common). The set of predicates used by the integration program may not have to be wider than the set used by the rule formation program, but one would expect the rules themselves to be more general. For example, integrating aliphatic amine and ether results should combine the separate alpha-cleavage rules (one with nitrogen, one with oxygen) into a more general rule (specifying 'N or O', or 'heteroatom').

PART C REFERENCES (Published or submitted during this year)

D.H. Smith, B.G. Buchanan, W.C. White, E.A. Feigenbaum, C. Djerassi and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference X. INTSUM. A Data Interpretation Program as Applied to the Collected Mass Spectra of Estrogenic Steroids". Tetrahedron. In press.

B.G. Buchanan and N.S. Sridharan, "Analysis of Behavior of Chemical Molecules: Rule Formation on Non-Homogeneous Classes of Objects". In Proceedings of the Third International Joint Conference on Artificial Intelligence, Stanford University (August, 1973). (Also Stanford Artificial Intelligence Project Memo No. 215.)

Related Publications

D. Michie and B.G. Buchanan, "Current Status of the Heuristic DENDRAL Program for Applying Artificial Intelligence to the Interpretation of Mass Spectra", August, 1973.

E.H. Shortliffe, S.G. Axline, B.G. Buchanan, T.C. Merigan and S.N. Cohen, "An Artificial Intelligence Program to Advise Physicians Regarding Antimicrobial Therapy". Computers & Biomedical Research. In press.

The undersigned agrees to accept responsibility for the scientific and technical conduct of the project and for provision of required progress reports if a grant is awarded as the result of this application.

    Date ____________        Edward A. Feigenbaum, Principal Investigator

SECTION III - FISCAL DATA FOR CURRENT BUDGET PERIOD (usually 12 months)
From 1/1/73 through 12/31/73        Grant Number 5 R24 RR00612-03

The following pertains to your CURRENT PHS budget. Do not include cost sharing funds. This information, in conjunction with that provided on Page 2, will be used in determining the amount of support for the NEXT budget period.

A. Budget categories. Columns: (1) current budget as approved by awarding unit; (2) actual expenditures through 8/31/73; (3) estimated expenditures and obligations for remainder of current budget period; (4) total estimated expenditures and obligations (Col. 2 plus Col. 3); (5) estimated unobligated balance (subtract Col. 4 from Col. 1).

Total, Accounts A, B and C

                                      (1)        (2)        (3)        (4)        (5)
    Personnel (salaries)           111,885     72,722     37,483    110,205      1,680
    Fringe benefits                 18,195     11,636      6,372     18,008        187
    Consultant costs                     -          -          -          -          -
    Equipment                        8,000      2,945      1,704      4,649      3,351
    Supplies                         5,900      1,768      1,454      3,222      2,678
    Travel, domestic                 1,000      1,233          -      1,233       (233)
    Travel, foreign                      -          -          -          -          -
    Patient costs                        -          -          -          -          -
    Alterations and renovations          -          -          -          -          -
    Other                           39,061     31,804     14,912     46,716     (7,655)
    Total direct costs             184,041    122,108     61,925    184,033          8
    Indirect costs                  76,958     47,579     26,882     74,461      2,497
    TOTALS                        $260,999   $169,687    $88,807   $258,494     $2,505

B. List all items of equipment purchased or expected to be purchased during this budget period which have a unit cost of $1000 or more.
C. Explain any significant balance or deficit shown in any category of Column 5.
D. List all other research support for Principal Investigator by source, project title, and annual amount.

Account A

                                      (1)        (2)        (3)        (4)        (5)
    Personnel (salaries)            38,931     24,147     13,248     37,395      1,536
    Fringe benefits                  6,359      3,864      2,252      6,116        243
    Consultant costs                   -0-        -0-        -0-        -0-        -0-
    Equipment                          -0-        -0-        -0-        -0-        -0-
    Supplies                           200        100        100        200        -0-
    Travel, domestic                   500        500          -        500        -0-
    Travel, foreign                    -0-        -0-        -0-        -0-        -0-
    Patient costs                      -0-        -0-        -0-        -0-        -0-
    Alterations and renovations        -0-        -0-        -0-        -0-        -0-
    Other                            9,256      8,437      2,590     11,027     (1,771)
    Total direct costs              55,246     37,048     18,190     55,238          8
    Indirect costs                  22,673     14,038      7,567     21,605      1,068
    TOTALS                         $77,919    $51,086    $25,757    $76,843     $1,076

Account B

                                      (1)        (2)        (3)        (4)        (5)
    Personnel (salaries)            48,636     34,754     16,254     51,008     (2,372)
    Fringe benefits                  7,864      5,561      2,763      8,324       (460)
    Consultant costs                     -          -          -          -          -
    Equipment maint. (MAT-711)       8,000      2,945      1,704      4,649      3,351
    Supplies                         5,500      1,548      1,274      2,822      2,678
    Travel, domestic                     -         98          -         98        (98)
    Travel, foreign                      -          -          -          -          -
    Patient costs                        -          -          -          -          -
    Alterations and renovations          -          -          -          -          -
    Other                           10,000      4,066      9,033     13,099     (3,099)
    Total direct costs              80,000     48,972     31,028     80,000          -
    Indirect costs                  36,800     22,306     14,583     36,889        (89)
    TOTALS                        $116,800    $71,278    $45,611   $116,889       $(89)

Account C

                                      (1)        (2)        (3)        (4)        (5)
    Personnel (salaries)            24,318     13,821      7,981     21,802      2,516
    Fringe benefits                  3,972      2,211      1,357      3,568        404
    Consultant costs                     -          -          -          -          -
    Equipment                            -          -          -          -          -
    Supplies                           200        120         80        200          -
    Travel, domestic                   500        635          -        635       (135)
    Travel, foreign                      -          -          -          -          -
    Patient costs                        -          -          -          -          -
    Alterations and renovations          -          -          -          -          -
    Other                           19,805     19,301      3,289     22,590     (2,785)
    Total direct costs              48,795     36,088     12,707     48,795        -0-
    Indirect costs                  17,485     11,235      4,732     15,967      1,518
    TOTALS                         $66,280    $47,323    $17,439    $64,762     $1,518

SECTION II - BUDGET (Continued)        Grant Number 5 R24 RR00612-03

B. Supplemental information regarding ITEMS in the proposed budget for the next period which require explanation or justification. (See instructions)

Salaries are lower on Account A because University guidelines for merit raises and cost of living raises were less than the budget estimates.

Salary costs slightly exceed our previous estimate on Account B for two reasons. First, Mr. Reynolds (Electrical Engineer) left to take another job in June and Mr. Stefik (Computer Programmer) left to attend graduate school in September. Their replacements, Mr. Veizades and Mr. Tucker respectively, receive slightly higher salaries. Second, we had planned to contract with Varian Associates for maintenance of the MAT-711 Mass Spectrometer. Varian could not provide that support, so we have taken on the task ourselves. Some of the added salary costs reflect associated machine shop and vacuum system support by local personnel.

The budgeted salary amount on Account C is lower because of a three month lapse in support, during the summer quarter, of a full-time student research assistant. Also, fall quarter support for a student research assistant will be less due to a change in personnel from a third year graduate student to a first year graduate student at a lower pay scale.

Fringe benefit costs on Account B are slightly higher, reflecting the added salary costs. Also, Stanford increased its fringe benefit rate from 16% to 17% in September.

Equipment maintenance costs on Account B are lower than expected for the MAT-711 spectrometer. This results from a lower price for locally provided maintenance, several months of instrument down time during the computing support transition, and better than average performance (no major breakdowns of expensive equipment).

Supplies are lower on Account B primarily because of instrument down time during the computing transition.

Travel expenses on Account B cover registration fees at a local Artificial Intelligence Conference.

"Other" expenses for Accounts A and C are higher than budgeted due to increased computing costs, for two reasons. First, as the computer programs become more useful, we find the demand for computer time increases. Second, as a result of the termination of ACME service, we could no longer take advantage of overnight rates more favorable than those on the campus facility. Account B costs are also higher because of the termination of free ACME service and the resulting need to operate on the fee-for-service campus facility.

Indirect costs are less than expected on Accounts A and C because a larger portion of the grant was spent on campus computer facilities, which are not subject to indirect costs. Account B indirect costs are slightly higher because of an increase in the University indirect rate from 46% to 47% of Net Total Direct Costs on September 1, 1973.

SECTION II - BUDGET (usually 12 months)
From 1/1/74 through 4/30/74        Grant Number 5 R24 RR00612-04

A. Itemize direct costs requested for next budget period.

Total, Accounts A, B and C

    Personnel (see separate detailed listing)
      Salary requested                          $35,802
      Fringe benefits                            $6,087
      Total                                                 $41,889
    Consultant costs                                            -0-
    Equipment                                                   -0-
    Supplies                                    See Accounts A, B & C
    Travel, domestic                                           $400
    Travel, foreign                                               -
    Patient costs                                                 -
    Alterations and renovations                                   -
    Other expenses                              See Accounts A, B & C
    TOTAL DIRECT COST                                       $61,412
    Indirect cost: 47% NTDC. Date of DHEW Agreement: Pending.

Account B

    Personnel (see separate detailed listing)
      Salary requested                          $14,886
      Fringe benefits                            $2,531
      Total                                                 $17,417
    Consultant costs                                            -0-
    Equipment                                                   -0-
    Supplies
      Electronic supplies                          $333
      Gas chromatograph supplies
        (columns, gases, etc.)                     $250
      Organic chemicals, glassware, etc.           $500
      Data recording paper
        (uv sensitive, calcomp, brush)             $333
      Mini-computer supplies
        (paper, ribbons, etc.)                     $167
    Travel, domestic                                              -
    Travel, foreign                                             -0-
    Patient costs                                               -0-
    Alterations and renovations                                 -0-
    Other expenses
      Telephone, reproduction,
        office supplies, etc.                      $500
      Computer terminal rental                     $533
      Mini-computer maintenance contract         $1,000
      Mass spectrometer repair and
        replacement parts                        $1,667
      Computer usage (370/158)                   $4,300
      Total                                                  $8,000
    TOTAL DIRECT COST                                       $27,000
    Indirect cost: 47% NTDC. Date of DHEW Agreement: Pending.

Account A

    Personnel (see separate detailed listing)
      Salary requested                          $12,930
      Fringe benefits                            $2,198
      Total                                                 $15,128
    Consultant costs                                            -0-
    Equipment                                                   -0-
    Supplies                                    Office
    Travel, domestic
    Travel, foreign
    Patient costs
    Alterations and renovations
    Other expenses
      Publications                                 $100
      Computer terminal rental                     $500
      Computing time & storage                   $3,240
      Total                                                  $3,840
    TOTAL DIRECT COST                                       $19,218
    Indirect cost: 47% NTDC. Date of DHEW Agreement: Pending.

Account C

    Personnel (see separate detailed listing)
      Salary requested                           $7,986
      Fringe benefits                            $1,358
      Total                                                  $9,344
    Consultant costs                                            -0-
    Equipment
    Supplies                                    Office
    Travel, domestic
    Travel, foreign
    Patient costs
    Alterations and renovations                                 -0-
    Other expenses
      Publications                                 $100
      Computer terminal rental                     $500
      Computer time & storage                    $5,000
      Total                                                  $5,600
    TOTAL DIRECT COST                                       $15,194
    Indirect cost: 47% NTDC. Date of DHEW Agreement: Pending.

SECTION II - BUDGET (Continued)        Grant Number 5 R24 RR00612-04

B. Supplemental information regarding ITEMS in the proposed budget for the next period which require explanation or justification. (See instructions)

All of the proposed budget figures are based on previous and current operating expense experience within the $61,412 total awarded. Manpower, supplies and other costs have been minimized to provide as much as possible for computing costs on the new SCIP 370/158 computer facility for Account B. Even so, the money which can be allocated to computing is below expected operating costs. This will limit the number of samples which can be processed through our mass spectrometry instrumentation.

We anticipate the possible requirement to pay 50% of a $10,000 U.S. Customs duty on Account B for the purchase of the MAT-711 Spectrometer from Varian-MAT in Germany. The other 50% would be paid by Varian. This was previously budgeted in year 02 but was deferred by NIH. We have been negotiating with the Department of Commerce for a waiver of these fees based on the unavailability of a similar instrument domestically. These negotiations are still underway with uncertain outcome.
We do not include the $5,000 in this budget but will negotiate separately with BRB for reinstatement of the deferred funds if payment is necessary.

Personnel costs for Accounts A and C support the same people as before, with the exception of the student research assistant on Account C. Mr. Farrell, who previously worked on the project, left and has been replaced by Mr. Mark Stefik, who, as a first-year graduate student, receives a slightly lower salary. Drs. Buchanan, Brown, Carhart and Dromey, who currently work for the project but receive support elsewhere, are shown at zero salary level to indicate a realistic manpower level with regard to computing and other related expenses.

The requested manpower on Account B is required for the operation and maintenance of the mass spectrometry systems. Again, the same personnel receive support, with two exceptions. Mr. Veizades, who replaced Mr. Reynolds, provides electronic and mechanical engineering support for system development and maintenance, and Mr. Tucker, who replaced Mr. Stefik, provides computer programming support for data system development and maintenance.

Travel estimates are slightly higher on Accounts A and C in light of the deficit experienced in the current year and the expected continuation of travel needs. No money is requested for equipment purchase.

Supplies and "other" budget items are detailed in each category and should be self-explanatory. All costs are estimated from past experience and have been held to a minimum because of the total budget level allotted for year 04. Actual computer service will be somewhat less than anticipated, however, because of the termination of ACME service. Equipment maintenance costs for the PDP-11/20 minicomputer and the MAT-711 spectrometer have been budgeted under "other". Note that we have taken on, as the most cost-effective approach, the maintenance of the spectrometer in addition to other locally built equipment.
We had previously planned to obtain this service from Varian-MAT, but this proved unsatisfactory.

SUPPORTING DATA FOR ADMINISTRATIVE EXTENSION

We have discussed with the Biotechnology Resources Branch possible relationships between a continuation proposal of this DENDRAL Resource-Related Research grant and the proposed SUMEX nationally shared computing resource (RR-00785). The problem of synchronizing these proposals may entail a request for administrative extension of the present DENDRAL award to prevent lapse of constituent programs pending their review in different organizational formats.

The monthly costs of such an extension are detailed here, on the basis of austere minima for the continuity of the team's work. Our attached estimate for the "next" period (1/1/74-4/30/74) already represents such a baseline budget, since we have had to absorb significantly increased computing costs (see "current" and "next" budget explanations) by reducing manpower support. The numbers shown below are therefore the monthly direct costs derived straightforwardly from the budget estimate for the next period. They would be allocated among the various budget categories in a similar manner.

Part A: $4,805 monthly
Part B: $6,750 monthly
Part C: $3,798 monthly
Total: $15,353 monthly (direct costs)
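
The relationship between these monthly figures and the four-month budget can be checked with a short calculation. The sketch below is illustrative only: the variable names are ours, the Account A and Account C totals are as read from the budget forms ($19,218 and $15,194), and the $61,417 figure is the total awarded cited in the budget justification. It assumes the monthly figures are the four-month totals divided by four and rounded to the nearest few dollars.

```python
# Monthly direct costs as stated in the text (dollars).
monthly = {"A": 4805, "B": 6750, "C": 3798}

# Assumption: the "next" budget period 1/1/74-4/30/74 spans four months.
MONTHS = 4

# Four-month direct-cost totals as read from the Account A and C forms.
form_totals = {"A": 19218, "C": 15194}

# Total awarded, as cited in the budget justification.
awarded_total = 61417

monthly_total = sum(monthly.values())
four_month_total = monthly_total * MONTHS

# Difference between each form total and four times the monthly figure.
rounding_gap = {k: form_totals[k] - monthly[k] * MONTHS for k in form_totals}

print("Monthly total:", monthly_total)
print("Four-month total from monthly figures:", four_month_total)
print("Gap to total awarded:", awarded_total - four_month_total)
print("Per-account rounding gaps:", rounding_gap)
```

The monthly figures sum to the stated $15,353, and four months at that rate fall within a few dollars of the total awarded, which is consistent with rounding.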