DR. LEDERBERG - RESOURCE-RELATED RESEARCH
COMPUTERS AND CHEMISTRY
(RR-00612 RENEWAL APPLICATION)

Submitted to the
BIOTECHNOLOGY RESOURCES BRANCH OF THE NATIONAL INSTITUTES OF HEALTH
December, 1973
School of Medicine, Stanford University

Form Approved, Budget Bureau No. 68-R0249
SECTION I
DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
PUBLIC HEALTH SERVICE
GRANT APPLICATION

TO BE COMPLETED BY PRINCIPAL INVESTIGATOR (Items 1 through 7 and 15A)

1. TITLE OF PROPOSAL (Do not exceed 53 typewriter spaces): Resource Related Research - Computers and Chemistry (RR-00612 renewal)
2. PRINCIPAL INVESTIGATOR
   2A. NAME (Last, First, Initial): Djerassi, Carl
   2B. TITLE OF POSITION: Professor of Chemistry
   2C. MAILING ADDRESS (Street, City, State, Zip Code): Department of Chemistry, Stanford University, Stanford, California 94305
   2D. DEGREE: Ph.D.
   2F. TELEPHONE DATA: 415 321-2300, Ext. 2783
   2G. DEPARTMENT, SERVICE, LABORATORY OR EQUIVALENT (See Instructions): Department of Chemistry
   2H. MAJOR SUBDIVISION (See Instructions): School of Humanities and Sciences
3. DATES OF ENTIRE PROPOSED PROJECT PERIOD (This application): FROM 5/1/74 THROUGH 4/31/79
4. TOTAL DIRECT COSTS REQUESTED FOR PERIOD IN ITEM 3: $1,350,795.00
5. DIRECT COSTS REQUESTED FOR FIRST 12-MONTH PERIOD: $276,197.00
6. PERFORMANCE SITE(S) (See Instructions): Department of Genetics, Department of Chemistry, and Department of Computer Science, Stanford University
7. Research Involving Human Subjects (See Instructions): A. NO; B. YES, Approved; C. YES, Pending Review
8. Inventions (Renewal Applicants Only - See Instructions): A. NO; B. YES, Not previously reported; C. YES, Previously reported

TO BE COMPLETED BY RESPONSIBLE ADMINISTRATIVE AUTHORITY (Items 8 through 13 and 15B)

9.
APPLICANT ORGANIZATION(S) (See Instructions): Stanford University, Stanford, California 94305. IRS No. 94-1156365. Congressional District No. 17.
10. NAME, TITLE, AND TELEPHONE NUMBER OF OFFICIAL(S) SIGNING FOR APPLICANT ORGANIZATION(S): c/o Sponsored Projects Office. Telephone Number(s): (415) 321-2300, X2883
11. TYPE OF ORGANIZATION (Check applicable item): FEDERAL; STATE; LOCAL; OTHER (Specify): Private, non-profit University
12. NAME, TITLE, ADDRESS, AND TELEPHONE NUMBER OF OFFICIAL IN BUSINESS OFFICE WHO SHOULD ALSO BE NOTIFIED IF AN AWARD IS MADE: K. D. Creighton, Deputy Vice Pres. for Business & Finance, Stanford University, Stanford, California 94305. Telephone Number: (415) 321-2300, X2551
13. ORGANIZATIONAL COMPONENT TO RECEIVE CREDIT FOR INSTITUTIONAL GRANT PURPOSES (See Instructions): 20 School of Humanities and Sciences
14. ENTITY NUMBER (Formerly PHS Account Number): 458210
15. CERTIFICATION AND ACCEPTANCE. We, the undersigned, certify that the statements herein are true and complete to the best of our knowledge and accept, as to any grant awarded, the obligation to comply with Public Health Service terms and conditions in effect at the time of the award.
    SIGNATURES (Signatures required on original copy only. Use ink. "Per" signatures not acceptable.)
    A. SIGNATURE OF PERSON NAMED IN ITEM 2A; DATE
    B. SIGNATURE(S) OF PERSON(S) NAMED IN ITEM 10; DATE

NIH 398 (FORMERLY PHS 398) Rev. 1/73

SECTION I
DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE
PUBLIC HEALTH SERVICE
RESEARCH OBJECTIVES

NAME AND ADDRESS OF APPLICANT ORGANIZATION: Stanford University, Stanford, California 94305

NAME, SOCIAL SECURITY NUMBER, OFFICIAL TITLE, AND DEPARTMENT OF ALL PROFESSIONAL PERSONNEL ENGAGED ON PROJECT,
BEGINNING WITH PRINCIPAL INVESTIGATOR: Carl Djerassi, Professor of Chemistry, Department of Chemistry; Joshua Lederberg, Professor of Genetics, Department of Genetics; Edward Feigenbaum, Professor of Computer Science, Department of Computer Science; Bruce Buchanan, Research Computer Scientist, Department of Computer Science; Alan Duffield, Research Associate, Department of Genetics; Dennis Smith, Research Associate, Department of Computer Science; Harold Brown, Research Associate, Department of Computer Science; Geoff Dromey, Research Associate (SS# to be supplied at a later date), Department of Computer Science.

TITLE OF PROJECT: Resource-Related Research -- Computers and Chemistry

USE THIS SPACE TO ABSTRACT YOUR PROPOSED RESEARCH. OUTLINE OBJECTIVES AND METHODS. UNDERSCORE THE KEY WORDS (NOT TO EXCEED 10) IN YOUR ABSTRACT.

The objectives of this research program are the development of innovative computer and biochemical analysis techniques for application in medical research and closely related aspects of investigative patient care. We will apply the unique analytical capabilities of gas chromatography/mass spectrometry (GC/MS), with the assistance of data interpreting computer programs utilizing artificial intelligence techniques, to investigate the chemical constituents of human body fluids in a variety of clinical contexts. Specific subtasks of this program include: 1) the application of artificial intelligence (AI) techniques to programs capable of interpreting mass spectra from basic principles as well as extending mass spectral theory by analysis of solved spectrum-structure examples, 2) the extension of GC/MS data systems to provide stand-alone capabilities for collecting low and high resolution mass spectral and metastable ion data, 3) the application of GC/MS and AI techniques to analysis of biomolecular structure elucidation problems of a large number of collaborators, and 4) the extension of artificial intelligence techniques to an interactive system for computer assisted structure elucidation based on a variety of data.
SECTION II - PRIVILEGED COMMUNICATION

DETAILED BUDGET FOR FIRST 12-MONTH PERIOD, FROM 5/1/74 THROUGH 4/31/75
(Amount requested; omit cents)

PERSONNEL: Listed separately on attached Detailed Salary Data.
CONSULTANT COSTS: None.
EQUIPMENT: Equipment Purchase (First Year Items Only): Display Terminal; PDP 11/20 Upgrade. Equipment Maintenance.
SUPPLIES: Itemized in the budget justification; includes Liquid Nitrogen.
DOMESTIC TRAVEL; FOREIGN TRAVEL; PATIENT COSTS (See Instructions); ALTERATIONS AND RENOVATIONS.
OTHER EXPENSES (Itemize):
    Publications, telephone, office supplies, postage       4,000
    Computer Terminal Lease (4)                             5,400
    Computer Usage - 370/158 (First Year Item Only)         5,000
TOTAL DIRECT COST (Enter on Page 1, Item 5):             $276,197
INDIRECT COST (See Instructions): 47% S&W. (If this is a special rate, e.g. off-site, so indicate.)

DETAILED SALARY DATA
NIH GRANT #RR-00612, 5/1/74-4/31/75

                                             % Effort    Salary    Fringe Benefits     Total
PRINCIPAL INVESTIGATORS:
    C. Djerassi                                  10         -0-          -0-             -0-
    J. Lederberg                                 10         -0-          -0-             -0-
    E. Feigenbaum                                10        2,910          514           3,424
RESEARCH ASSOCIATES:
    B. Buchanan (1)                              30        7,000        1,237           8,237
    A. Duffield                                  25        6,195        1,094           7,289
    D. Smith                                    100       16,200        2,862          19,062
    N. Sridharan                                100       16,050        2,835          18,885
    H. Brown                                    100       16,200        2,862          19,062
    G. Dromey                                   100       15,500        2,738          18,238
PROGRAMMERS:
    W. White                                    100       14,400        2,545          16,945
    R. Tucker                                   100       14,100        2,491          16,591
SENIOR RESEARCH ASSISTANT:
    A. Wegmann                                  100       15,000        2,650          17,650
ELECTRONICS ENGINEER:
    N. Veizades                                  60       11,670        2,062          13,732
GLASS BLOWER/MACHINIST:
    E. Steed                                     25        4,410          779           5,189
RESEARCH ASSISTANTS:
    L. Masinter                                 100        5,070          895           5,965
    M. Stefik                                   100        4,915          868           5,783
    To Be Appointed                             100        4,915          868           5,783
SECRETARIAL SUPPORT:
    K. Wharton                                  100        9,400        1,662          11,062
TOTAL:                                                  $163,935      $28,962        $192,897

(1) Dr. Buchanan's salary charges do not begin until 9/1/74, at which time his NIH Research Career Development Award expires.

SECTION II - PRIVILEGED COMMUNICATION

BUDGET ESTIMATES FOR ALL YEARS OF SUPPORT REQUESTED FROM PUBLIC HEALTH SERVICE
DIRECT COSTS ONLY (Omit Cents)

DESCRIPTION                      1ST YEAR    2ND YEAR    3RD YEAR    4TH YEAR    5TH YEAR       TOTAL
PERSONNEL COSTS                   192,897     210,611     225,129     240,630     257,383   1,126,650
CONSULTANT COSTS                      -0-         -0-         -0-         -0-         -0-         -0-
EQUIPMENT                          58,100      11,770      12,947      14,241      15,665     112,723
SUPPLIES                            9,600       6,920       7,612       8,370       9,207      41,709
DOMESTIC TRAVEL                     1,200       1,320       1,452       1,597       1,757       7,326
FOREIGN TRAVEL                        -0-         -0-         -0-         -0-         -0-         -0-
PATIENT COSTS                         -0-         -0-         -0-         -0-         -0-         -0-
ALTERATIONS AND RENOVATIONS           -0-         -0-         -0-         -0-         -0-         -0-
OTHER EXPENSES                     14,400      10,340      11,374      12,511      13,762      62,387
TOTAL DIRECT COSTS                276,197     240,961     258,514     277,349     297,774   1,350,795

TOTAL FOR ENTIRE PROPOSED PROJECT PERIOD (Enter on Page 1, Item 4): $1,350,795

REMARKS: Justify all costs for the first year for which the need may not be obvious. For future years, justify equipment costs, as well as any significant increases in any other category. If a recurring annual increase in personnel costs is requested, give percentage. (Use continuation page if needed.)

See following pages for budget justification.
BUDGET JUSTIFICATION

The availability of existing equipment, including the mass spectrometer and SUMEX computer, avoids the need for requesting funds for major laboratory items and substantial computing costs. Thus, the major expense in the resulting budget is for personnel. We feel that the personnel listed here are necessary to carry out the research, as justified below.

Recurring costs are about $227,000 per year. First year expenditures are higher to provide the instrumentation necessary for mass spectrometry service in the first year. We are requesting funds for five years to coincide with the funding of the AIM-SUMEX resource, to which we hope to make significant contributions.

This budget overlaps slightly with the budget for the Genetics Research Center (J. Lederberg, Principal Investigator). Dr. Alan Duffield's 25% salary budgeted here is covered by the other budget (where 100% of his salary is budgeted). 10% of Ms. Annemarie Wegmann's salary is covered there (with 100% of her salary budgeted here). These are the only overlapping items. We have no official notification of Genetics Center funding; if the present proposal is successful, the Genetics Center budget will be adjusted accordingly.

In the five-year budget, salaries are increased by 6% per year and staff benefits are computed at 17% for the period 5/74-8/74, 18% for the period 9/74-8/75, and are increased 1% per year thereafter, based on current University projections. Other budget categories are increased by 10% per year to account for inflation.

PERSONNEL:

BRUCE G. BUCHANAN
Dr. Bruce Buchanan holds an NIH Research Career Development Award to work on applications of artificial intelligence to health-related problems, including theory formation by computer. His work on those aspects of this grant is thus consistent with the Development Award.
Half-time support is requested after the third year of the Development Award (starting September, 1974) to cover the contingency that the award will not be extended to the full five years. These funds will be returned if the Award is extended.

DENNIS H. SMITH
Dr. Dennis H. Smith has been a member of the DENDRAL project since July, 1971. He has been responsible for the MS and its computer support, and has been involved in the application of the AI programs to structural studies of biomedically important compounds, primarily steroids. These responsibilities will continue in the future, with particular emphasis on providing the mass spectrometer/AI program link to the user community and its mass spectrometry and general structure elucidation needs, and in providing the necessary chemical knowledge and input for development of the computer programs and user interfaces for the proposed computer assisted structure elucidation effort.

ALAN DUFFIELD
Dr. Alan Duffield is the senior scientist in charge of the mass spectrometry facilities of the GRC. Because of his expertise in the analysis of mass spectra from various fractions of human body fluids, he will provide the link between the structure elucidation techniques of this proposal and other scientists with similar problems. The GC/HRMS facilities are also expected to provide service to the Genetics Center for high resolution analysis of compounds isolated from body fluids.

NATESA SRIDHARAN
Dr. Sridharan will be responsible for developing interface routines that allow new researchers to make use of the structure elucidation programs. We expect these routines to accept information about a research problem, in semi-formal terms, and translate it into a format the program can use. They should be complete enough so that individual researchers do not need to know about the inner workings of the programs. In addition, he will continue to help Dr. Brown and Mr. Masinter with development of the cyclic generator program.
(Within a few days of this writing, Dr. Sridharan has decided to take a leave of absence. During his absence we will recruit another Research Associate to perform his duties.)

HAROLD BROWN
Dr. Harold Brown's knowledge of graph theory and combinatorial mathematics is essential to the development of the cyclic structure generator. Many problems with development and implementation of this program have required sophisticated, new mathematical solutions worked out by Dr. Brown. For example, generating the dictionary of cyclic graphs and assembling substructures involve problems in graph theory that Dr. Brown is currently working on. Dr. Brown has submitted a proposal to the NSF to cover his salary for this research. If that grant is awarded, funds requested here for his salary will not be needed.

R. GEOFF DROMEY
Dr. Geoff Dromey is a chemist with strong computer science interests who has been associated with the project since September, 1973. He has become familiar with many aspects of the DENDRAL performance programs and will be expected to help outside researchers use those programs. In addition, he will be developing new programs, such as the program for molecular ion determination from mass spectra.

WILLIAM C. WHITE
Mr. William White provides high-level programming support for the theory formation programs, including helping to devise new programs in response to new research problems as well as implementing them. He wrote almost all of the LISP code for the INTSUM program, for example, and is currently responsible for the RULEGEN program.

MS. ANNEMARIE WEGMANN
Ms. Annemarie Wegmann is the Senior Research Assistant in charge of the GC/HRMS system. She was formerly head of Hewlett-Packard's Palo Alto gas chromatography applications laboratory and has been responsible for the operation of the GC/MS system since the delivery to our laboratory of the MAT-711 (November, 1971).
Her technical ability is absolutely essential to the continued operation and development of the mass spectrometry facility.

INSTRUMENT SUPPORT PERSONNEL
Messrs. Veizades and Steed will assist part time in maintaining the GC/MS system. Mr. Veizades is an Electronics Engineer who is responsible for the electronic and mechanical systems as well as providing the necessary voltage read-out and control development for the metastable analysis data system. Mr. Steed is a Research Engineer responsible for the system glasswork and vacuum system maintenance.

ROBERT TUCKER
Mr. Robert Tucker implements and maintains the computer programs for data acquisition and reduction of MS data. This includes translating existing PL/ACME into FORTRAN and PDP-11 assembly language. In addition, he will be responsible for improving these programs for repetitive HRMS scans, implementing the multiplet resolution algorithm and the software necessary for semi-automated collection of metastable ion data.

LARRY M. MASINTER
Larry Masinter, Research Assistant, will continue to work with Drs. Lederberg and Brown on the development of the cyclic structure generator. His LISP expertise has been an invaluable resource for every member of the research team.

MARK STEFIK
Mr. Mark Stefik, Research Assistant, combines two years of experience on the ACME/MS data acquisition system with a long-term commitment to computer science. He has developed interactive library search capabilities for the mass spectrometer and will continue to improve them. His knowledge of the data acquisition computer programs will be very valuable in assisting initial translation of those programs into FORTRAN (from PL/ACME code) for the extended PDP-11/20 system.

RESEARCH ASSISTANT - unnamed
We have interviewed two prospective Research Assistants, both of whom have broad chemical experience and strong computer science interests.
We request funds to hire one of them to provide additional links between computer science techniques and structure elucidation problems.

SECRETARIAL SUPPORT
One full-time secretary is necessary for the secretarial support of this number of scientists. Ms. Kathleen Wharton is now with the Computer Science group.

EQUIPMENT PURCHASE:
As discussed in the text (Section III.A), in the first year we plan to augment our existing PDP-11/20 computer (4k memory) to allow its operation as a stand-alone data system. We plan to add 16k of memory ($3,900), a floating point arithmetic unit ($7,500), an industry compatible tape drive ($9,900), a disk drive ($10,500), a low speed communications interface ($1,000), and a bootstrap loader and clock ($1,209). These devices together with state sales tax total to $34,000. The prices quoted are representative of the most cost-effective suppliers of the respective devices we have been able to locate. We will continue to review the market before implementation to maximize technical and cost performance.

As stated above, we plan to provide interface programs to provide the communication link between the users and the programs. The universal language of molecular structure is diagrammatic representation of the structures, drawn usually in two dimensions (or as two-dimensional representations of three dimensional information). Therefore, we feel that a graphics terminal such as the DEC GT-40 is necessary for effective sharing of the programs among Stanford users. The GT-40 terminal is a good choice for performing this structural display task, for a number of reasons. Programs are available for input and output of structural information which can be modified to run on a GT-40 (e.g., we have just implemented on an experimental basis routines made available to us by R.
Feldman, NIH); sophisticated structural display programs have been written especially for a GT-40 which we would hope to [illegible]; and the AIM-SUMEX resource will specifically support one GT-40 for use by the SUMEX staff. This terminal will be physically located in the MS laboratory since all of the users will interact with that laboratory.

EQUIPMENT MAINTENANCE:
Maintenance is budgeted for the proposed stand-alone PDP-11/20 system under DEC contract based on current prices. Also included is a budget for maintenance of the MAT-711 system. This estimate is based on our experience with parts replacements to date. We will provide the necessary maintenance manpower (see personnel justification) because Varian cannot provide adequate service.

SUPPLIES:
Supplies are budgeted in various categories based on our operating experience to date. Electronics supplies include parts necessary for maintaining our electronics and test equipment ($1,000) as well as parts in the first year for the metastable ion data system ($3,200). These comprise several D/A and A/D converters for accelerating voltage, ESA voltage, and magnetic field control as well as parts to upgrade the Hall probe mass marker. GC supplies include carrier gases, columns, phases, syringes, septa, etc., for GC/MS operation. The liquid nitrogen is required for cold trap operation on the MAT-711. Chemicals, glassware, etc., include the various organic chemicals, glassware, apparatus, glass tubing, etc., needed to support the GC/MS laboratory operation. Data recording media include special UV sensitive recording paper for the MAT-711, paper for GC and instrumentation recorder, and Calcomp paper and pens for ion current and spectrum plotting. Mini-computer supplies include paper, magnetic tape, ribbons, spare disk cartridges, etc., for data system operation.
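The budget figures quoted in this application can be cross-checked mechanically. The following is a minimal sketch in Python, not part of the original application: the function and variable names are hypothetical, the dollar figures are copied from the five-year budget table and the detailed salary data, and the reading of the staff benefit figure as a 4-month/8-month blend of the 17% and 18% rates is an assumption that happens to reproduce the stated first-year total.

```python
# Direct costs by category for years 1-5, copied from the five-year budget
# table (dollars, cents omitted). Categories budgeted at zero are left out.
BUDGET = {
    "Personnel": [192_897, 210_611, 225_129, 240_630, 257_383],
    "Equipment": [58_100, 11_770, 12_947, 14_241, 15_665],
    "Supplies": [9_600, 6_920, 7_612, 8_370, 9_207],
    "Domestic travel": [1_200, 1_320, 1_452, 1_597, 1_757],
    "Other expenses": [14_400, 10_340, 11_374, 12_511, 13_762],
}

# Totals as printed in the application.
STATED_CATEGORY_TOTALS = {
    "Personnel": 1_126_650,
    "Equipment": 112_723,
    "Supplies": 41_709,
    "Domestic travel": 7_326,
    "Other expenses": 62_387,
}
STATED_YEAR_TOTALS = [276_197, 240_961, 258_514, 277_349, 297_774]
STATED_GRAND_TOTAL = 1_350_795


def category_totals(budget):
    """Sum each category across all five years (row totals)."""
    return {name: sum(years) for name, years in budget.items()}


def year_totals(budget):
    """Sum all categories within each year (column totals)."""
    return [sum(column) for column in zip(*budget.values())]


def largest_escalation_gap(years, rate=0.10):
    """Largest gap, in dollars, between a stated figure for years 3-5 and
    the previous year's stated figure grown by `rate`.

    Year 1 is skipped because it contains one-time items.
    """
    recurring = years[1:]
    return max(
        abs(current - previous * (1 + rate))
        for previous, current in zip(recurring, recurring[1:])
    )


def blended_benefit_rate(months_at_17=4, months_at_18=8):
    """ASSUMPTION: the first-year benefit rate is a month-weighted blend of
    17% (5/74-8/74) and 18% (9/74-4/75)."""
    total_months = months_at_17 + months_at_18
    return (months_at_17 * 0.17 + months_at_18 * 0.18) / total_months


if __name__ == "__main__":
    print("Row totals agree:", category_totals(BUDGET) == STATED_CATEGORY_TOTALS)
    print("Column totals agree:", year_totals(BUDGET) == STATED_YEAR_TOTALS)
    print("Grand total agrees:", sum(year_totals(BUDGET)) == STATED_GRAND_TOTAL)
    for name in ("Equipment", "Supplies", "Domestic travel", "Other expenses"):
        gap = largest_escalation_gap(BUDGET[name])
        print(f"{name}: largest gap from 10% escalation = ${gap:.2f}")
    print("Benefits on $163,935:", round(163_935 * blended_benefit_rate()))
```

Under these assumptions the row, column, and grand totals all agree with the printed table, and the non-personnel categories track the 10% escalation to within a few dollars after the first year; the small differences presumably reflect rounding in the original figures.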
TRAVEL:
The travel budget covers estimated needs (2 east coast and 2 west coast trips) for attending related professional meetings and interfacing with potential program users nationally. Domestic travel is budgeted for two East Coast trips and two California trips per year among all personnel. No foreign travel is budgeted.

OTHER:
The "other" budget includes operating telephone, office supplies, postage, reproduction, etc., support necessary for this project based on our previous experience. The "computer usage" allocation provides a continued limited usage of the 370/158 computer during the augmentation of the PDP-11/20 system. This cost does not appear in later years. Terminal rental covers four terminals to be distributed among the MS laboratory, the Computer Science Dept., and J. Lederberg's laboratory.

BIOGRAPHICAL SKETCHES

SECTION II - PRIVILEGED COMMUNICATION
Principal Investigator: Carl Djerassi

BIOGRAPHICAL SKETCH
(Give the following information for all professional personnel listed on page 3, beginning with the Principal Investigator. Use continuation pages and follow the same general format for each person.)

NAME: Carl DJERASSI
TITLE: Professor of Chemistry
BIRTHDATE (Mo., Day, Yr.): October 29, 1923
PLACE OF BIRTH (City, State, Country): Vienna, Austria
PRESENT NATIONALITY: U.S.A.
SEX: Male

EDUCATION (Begin with baccalaureate training and include postdoctoral)
    Kenyon College: A.B. (summa cum laude), 1942; Chemistry, Biology
    University of Wisconsin: Ph.D., 1945; Organic chemistry, Biochemistry (minor)

HONORS: Hon. D.Sc., Natl. Univ. of Mexico (1953), Kenyon College (1958), Worcester Polytechnic Institute (1972); Hon. Prof., Fed. Univ. Rio de Janeiro (1969). Member U.S.
National Academy of Sciences, American Academy of Arts and Sciences; foreign member, Royal Swedish Academy of Sciences, German Academy of Natural Scientists (Leopoldina), Brazilian Academy of Sciences, (cont. below)

MAJOR RESEARCH INTEREST: Nat. prod. chemistry (steroids, alkaloids, terpenoids, antibiotics) and chem. applications of physical methods (mass spec., optical rotatory dispersion, circular dichroism).
ROLE IN PROPOSED PROJECT: Principal Investigator

RESEARCH SUPPORT (See instructions)

Grant               Title                                                 Period               Current Year   Total Budgeted   % Time/Effort
NIH AM 04257        Mass Spectrometry in Organic and Biochemistry         10/1/70 to 9/30/75   $52,306        $316,016         10%
NIH GM AM 06840-15  Marine Chemistry with special emphasis on steroids    1/1/73 to 12/31/77   $75,650        $578,180         18%
NSF                 Pending Grant Application #P3P3689, Magnetic Circular Dichroism in Organic Molecules, in the amount of $27,640.

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)

Academic Experience: Professor of Chemistry, Stanford University, 1959-present. Associate Professor (1952-1954) and Professor (1954-1959), Wayne State University.

Industrial Research Experience: Ciba Pharmaceutical Co., Summit, N.J.: Research Chemist, 1942-1943 and 1945-1949. Syntex Corporation: Associate Director of Chemical Research (Mexico City) 1949-1952; Research Vice President (Mexico City) 1957-1960, (Palo Alto, California) 1960-1968; President, Syntex Research, 1968-present. Zoecon Corporation (Palo Alto): President, 1968-present.

Editorial Boards: 1972, (Current) Journal of the American Chemical Society, Steroids, Tetrahedron, Organic Mass Spectrometry.

(continued on next page)

Honors (cont.): Mexican Academy for Scientific Investigation. Hon. Fellow of Phi Lambda Upsilon. Amer.
Academy of Pharmaceutical Sciences, British Chemical Society and Mexican Chemical Society, Phi Beta Kappa. Numerous hon. lectureships including 1964 Centenary Lecturer (The British Chemical Society) and 1969 Annual Chemistry Lecturer, Royal Swedish Academy of Engineering. American Chemical Society Award in Pure Chemistry (1958); Baekeland Medal (1959); Fritzsche Award (1960); Intra-Science Research Foundation Award (1969); Freedman Patent Award of American Institute of Chemists (1971). Foreign Member, Royal Swedish Academy of Sciences (1972). D.Sc. (hon.), Worcester Polytechnic Institute (1972). Scheele Lecturer, Pharmaceutical Society of Sweden (1972); American Chemical Society's Award for Creative Invention (1973); National Medal of Science (1973).

PHS-398, Rev. 3-70

BIOGRAPHICAL SKETCH (C. Djerassi), Continuation page
Principal Investigator: Carl Djerassi

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (cont.)

Miscellaneous: Chairman of the AAAS Gordon Research Conferences on Steroids and Natural Products (1952-1954); Member of American Pugwash Committee (1968 to present); Chairman of Latin America Science Board of National Academy of Sciences (1966-1968); Chairman of National Academy's Board on Science and Technology for International Development.

PUBLICATIONS
Author or co-author of six books and approximately 800 publications dealing with natural products (notably steroids, terpenoids, alkaloids and antibiotics), medicinal chemistry (primarily antihistamines, oral contraceptives and anti-inflammatory agents) and applications of physical methods (mass spectrometry, optical rotatory dispersion, magnetic circular dichroism) to organic and biochemical problems.
PHS-398, Rev. 2-69

BIOGRAPHICAL SKETCH

NAME: LEDERBERG, JOSHUA
TITLE: Professor and Executive Head, Department of Genetics
BIRTHDATE (Mo., Day, Yr.): 5-23-25
PLACE OF BIRTH (City, State, Country): Montclair, New Jersey
PRESENT NATIONALITY: U.S.A.
SEX: Male

EDUCATION (Begin with baccalaureate training and include postdoctoral)
    Columbia College, New York: B.A., 1944
    College of Physicians & Surgeons, Columbia University, New York (1944-46)
    Yale University: Ph.D., 1947; Microbiology

HONORS
    1957 - National Academy of Sciences
    1958 - Nobel Prize in Medicine

MAJOR RESEARCH INTEREST: Molecular Genetics; Artificial Intelligence
ROLE IN PROPOSED PROJECT: PRINCIPAL INVESTIGATOR
RESEARCH SUPPORT (See instructions): SEE ATTACHMENTS

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)

    1961-        Stanford University. Director, Kennedy Laboratories for Molecular Medicine
    1959-        Professor, Genetics and Biology, and Executive Head, Department of Genetics, Stanford University
    1957-1959    University of Wisconsin. Chairman, Department of Medical Genetics
    1957         Melbourne University, Australia. Fulbright Visiting Professor of Bacteriology
    1950         University of California, Berkeley. Visiting Professor of Bacteriology
    1947-1959    University of Wisconsin. Professor of Genetics
    1946-1947    Yale University. Research Fellow of the Jane Coffin Childs Fund for Medical Research
    1945-1946    Columbia University.
Research Assistant in Zoology

Professional Activities:
    1967-        NIMH: National Mental Health Advisory Council
    1961-1962    President (Kennedy)'s Panel on Mental Retardation
    1960-        NASA Committees: Lunar and Planetary Missions Board
    1958-        National Academy of Sciences: Committees on Space Biology
    1950-        President's Science Advisory Committee panels; National Institutes of Health, National Science Foundation study sections (genetics)

RESEARCH SUPPORT SUMMARY FOR JOSHUA LEDERBERG (12/7/73)

1) NASA: NGR-05-020-004. Cytochemical Studies of Planetary Micro-organisms. Current year $150,000; total award $3,950,000; grant term 9/60-8/74; budgeted time 4%. (Future support dubious)
2) NIH: AI-05160. Genetics of Bacteria. Current year $60,000; total award $280,000; grant term 9/68-8/73; budgeted time 15%. (Renewal Pending)
3) NIH: GM. Genetics Research Center. Current year $547,035; total award $2,609,383; grant term 9/73-8/78; budgeted time 10%. (Pending)
4) NIH: RR-00785. Stanford University Medical Experimental Computer Facility (SUMEX). Current year $571,567; total award $2,769,262; grant term 10/73-7/78; budgeted time 20%.
5) NIH: Computer Lab - Health Care Resource Program. Successor to #3. Large Scale Screening of Body Fluids for Metabolic Signs of Disease with Computer-managed Chromatography and Mass Spectrometry. Current year [illegible]; total award $908,238; grant term 9/73-8/78; budgeted time [illegible]. (Pending; Program Funds impounded)
6) NIH: GM00295. Training Grant in Genetics. Current year $121,172; total award $321,163; grant term 7/1/73-6/30/77; budgeted time 15%.

SELECTED LIST OF PUBLICATIONS

Lederberg, J., 1959. A View of Genetics. Les Prix Nobel en 1958: 170-89.

Buchs, A., A. B. Delfino, A. M. Duffield, C. Djerassi, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, 1970. Applications of Artificial Intelligence for Chemical Inference. VI. Approach to a general method of interpreting low resolution mass spectra with a computer. Helvetica Chimica Acta 53 (6): 1394-1417.

Feigenbaum, E. A., B. G. Buchanan, J. Lederberg, 1971. On generality and problem solving: a case study using the DENDRAL program. In Machine Intelligence 6 (B. Meltzer and D. Michie, eds.), Edinburgh University Press, p. 165-190.

Reynolds, W. E., V. A. Bacon, J. C.
Bridges, T. C. Coburn, B. Halpern, J. Lederberg, E. C. Levinthal, E. Steed, R. B. Tucker, 1970. A Computer Operated Mass Spectrometer System. Analytical Chem. 42: 1122-1129, September 1970.

Lederberg, J. "Use of Computer to Identify Unknown Compounds: The Automation of Scientific Inference". In Biochemical Applications of Mass Spectrometry (G. R. Waller, ed.). John Wiley & Sons, New York (in press).

BIOGRAPHICAL SKETCH

NAME: Feigenbaum, Edward A.
TITLE: Principal Investigator, DENDRAL Project
BIRTHDATE (Mo., Day, Yr.): 1-20-36
PLACE OF BIRTH (City, State, Country): Weehawken, New Jersey
PRESENT NATIONALITY: U.S. Citizen
SEX: Male

EDUCATION (Begin with baccalaureate training and include postdoctoral)
    Carnegie Institute of Technology, Pittsburgh, Pennsylvania: B.S., 1956, Electrical Engineering; Ph.D., 1959, Behavioral Sciences

HONORS and memberships: American Psychological Association; Association for Computing Machinery (Member of the National Council 1966-68); American Association for the Advancement of Science; SIGBIO Chairman, 11/73-present.

MAJOR RESEARCH INTEREST: Artificial Intelligence
ROLE IN PROPOSED PROJECT: Principal Investigator
RESEARCH SUPPORT (See instructions)

RESEARCH AND/OR PROFESSIONAL EXPERIENCE (Starting with present position, list training and experience relevant to area of project. List all or most representative publications. Do not exceed 3 pages for each individual.)
    1965-        Stanford University, Computer Science Department Faculty
    1965-1968    Stanford University, Director, Computation Center
    1963         Summer Research Training Institute in Computer Simulation of Cognitive Processes (National Science Foundation)
    1962         Carnegie Corporation. Summer Research Training Institute in Heuristic Programming. Faculty member.
    1960-1964    University of California, Berkeley. Research, Center for Research in Management Science, 1960-1964; Research, Center for Human Learning, 1961-1964; Assistant and Associate Professor, School of Business Administration, 1960-64
    1957-1960    The RAND Corporation, Santa Monica, California
    1956         IBM Scientific Computing Center, New York

Selected Publications:

"Applications of Artificial Intelligence for Chemical Inference I. The Number of Possible Organic Compounds. Acyclic Structures Containing C, H, O and N", J. Am. Chem. Soc., 91, 2973 (1969). (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference II. Interpretation of Low Resolution Mass Spectra of Ketones", J. Am. Chem. Soc., 91, 2977 (1969). (Co-Author).

Publications of Edward Feigenbaum (continued)

"Applications of Artificial Intelligence for Chemical Inference III. Aliphatic Ethers Diagnosed by their Low Resolution Mass Spectra and Nuclear Magnetic Resonance", J. Am. Chem. Soc., 91, 7440 (1969). (Co-Author).

"Heuristic DENDRAL: A Program for Generating Explanatory Hypotheses in Organic Chemistry", in Machine Intelligence 4, Edinburgh University Press, 1969. (Co-Author).

"Toward an Understanding of Information Processes of Scientific Inference in the Context of Organic Chemistry", in Machine Intelligence 5, Edinburgh University Press, 1970. (Co-Author).

"A Heuristic Program for Solving a Scientific Inference Problem: Summary of Motivation and Implementation", Stanford Artificial Intelligence Project Memo No. 104, November 1969. (Co-Author).

"Applications of Artificial Intelligence for Chemical Inference IV.
Saturated Amines Diagnosed by Their Low Resolution Mass Spectra and Nuclear Magnetic Resonance Spectra", Journal of the American Chemical Society, 92, 6831 (1970). (Co-Author).
"Applications of Artificial Intelligence for Chemical Inference V. An Approach to the Computer Generation of Cyclic Structures. Differentiation Between All the Possible Isomeric Ketones of Composition C6H10O", Organic Mass Spectrometry, 4, 493 (1970). (Co-Author).
"Applications of Artificial Intelligence for Chemical Inference VI. Approach to a General Method of Interpreting Low Resolution Mass Spectra with a Computer", Helvetica Chimica Acta, 53, 1394 (1970). (Co-Author).
"On Generality and Problem Solving: A Case Study Using the DENDRAL Program", in Machine Intelligence 6, Edinburgh University Press (1971). (Co-Author).
"A Heuristic Programming Study of Theory Formation in Science", Proceedings of the Second International Joint Conference on Artificial Intelligence, Imperial College, London (September 1971). (Co-Author).
"Applications of Artificial Intelligence for Chemical Inference VIII. An Approach to the Computer Interpretation of the High Resolution Mass Spectra of Complex Molecules. Structure Elucidation of Estrogenic Steroids", Journal of the American Chemical Society, 94, 5962-5971 (1972). (Co-Author).
"Heuristic Theory Formation: Data Interpretation and Rule Formation", in Machine Intelligence 7, Edinburgh University Press (1972). (Co-Author).
"Applications of Artificial Intelligence for Chemical Inference X. INTSUM - A Data Interpretation Program as Applied to the Collected Mass Spectra of Estrogenic Steroids", Tetrahedron, 29, 3117 (1973). (Co-Author).

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH
NAME, TITLE, BIRTHDATE (Mo., Day, Yr.): Buchanan, Bruce G.
Research Computer Scientist, 7-7-40
PLACE OF BIRTH (City, State, Country): St. Louis, Missouri
PRESENT NATIONALITY: U.S. Citizen
SEX: Male
EDUCATION:
Ohio Wesleyan University: B.A., 1961, Mathematics
Michigan State University: M.A., Ph.D., 1966, Philosophy
HONORS: Recipient of National Institutes of Health Career Development Award (1971-1976)
MAJOR RESEARCH INTEREST: Artificial Intelligence
ROLE IN PROPOSED PROJECT: Associate Investigator
RESEARCH SUPPORT: NIH Research Career Development Award, GM-29662
RESEARCH AND/OR PROFESSIONAL EXPERIENCE:
1972-present: Research Computer Scientist, Stanford University
1966-1971: Research Associate, Stanford Artificial Intelligence Project

Publications:
"On the Design of Inductive Systems: Some Philosophical Problems". British Journal for the Philosophy of Science 20 (1969), 311-323. (Co-Author).
"Applications of Artificial Intelligence for Chemical Inference II. Interpretation of Low Resolution Mass Spectra of Ketones". Journal of the American Chemical Society, 91, 2977-2981 (1969). (Co-Author).
"Applications of Artificial Intelligence for Chemical Inference I. The Number of Possible Organic Compounds: Acyclic Structures Containing C, H, O and N". Journal of the American Chemical Society, 91, 2973-2976 (1969). (Co-Author).
"Applications of Artificial Intelligence for Chemical Inference III. Aliphatic Ethers Diagnosed by Their Low Resolution Mass Spectra and NMR Data". Journal of the American Chemical Society, 91, 7440-45 (1969). (Co-Author).
"Heuristic DENDRAL: A Program for Generating Explanatory Hypotheses in Organic Chemistry".
Machine Intelligence 4, Edinburgh University Press (1969). (Co-Author).
"Toward an Understanding of Information Processes of Scientific Inference in the Context of Organic Chemistry". Machine Intelligence 5, Edinburgh University Press (1970). (Co-Author).
"On Generality and Problem Solving: A Case Study Using the DENDRAL Program". Machine Intelligence 6, Edinburgh University Press (1971). (Co-Author).
"Some Speculation About Artificial Intelligence and Legal Reasoning". Stanford Law Review, Vol. 23, No. 1, November 1970. (Co-Author).
"Applications of Artificial Intelligence for Chemical Inference VI. Approach to a General Method of Interpreting Low Resolution Mass Spectra with a Computer". Helvetica Chimica Acta, 53, 1394 (1970). (Co-Author).
"An Application of Artificial Intelligence to the Interpretation of Mass Spectra". Mass Spectrometry Techniques and Applications (1970).
"Applications of Artificial Intelligence for Chemical Inference IV. Saturated Amines Diagnosed by Their Low Resolution Mass Spectra and Nuclear Magnetic Resonance Spectra". Journal of the American Chemical Society, 92, 6831 (1970). (Co-Author).
"The Heuristic DENDRAL Program for Explaining Empirical Data". Proceedings of IFIP Congress 1971, Ljubljana, Yugoslavia. (Co-Author).
"A Heuristic Programming Study of Theory Formation in Science". Proceedings of Second International Joint Conference on Artificial Intelligence, Imperial College, London (1971). (Co-Author).
"Applications of Artificial Intelligence for Chemical Inference VIII. An Approach to the Computer Interpretation of the High Resolution Mass Spectra of Complex Molecules. Structure Elucidation of Estrogenic Steroids". Journal of the American Chemical Society, 1972. (Co-Author).
"Heuristic Theory Formation: Data Interpretation and Rule Formation". Machine Intelligence 7, Edinburgh University Press (1972). (Co-Author).
"Review of Hubert Dreyfus' 'What Computers Can't Do: A Critique of Artificial Reason'", Computing Reviews (January, 1973).
"Applications of Artificial Intelligence for Chemical Inference IX. Analysis of Mixtures Without Prior Separation as Illustrated for Estrogens". Submitted to Journal of the American Chemical Society. (Co-Author).
"Applications of Artificial Intelligence for Chemical Inference X. INTSUM - A Data Interpretation Program as Applied to the Collected Mass Spectra of Estrogenic Steroids". Tetrahedron, 29, 3117 (1973). (Co-Author).
"Rule Formation on Non-Homogeneous Classes of Objects". In Proceedings of the Third International Joint Conference on Artificial Intelligence (Stanford, 1973). (Co-Author).
"Current Status of the Heuristic DENDRAL Program for Applying Artificial Intelligence to the Interpretation of Mass Spectra". DENDRAL Project Memo, August 1973.

Memberships: Association for Computing Machinery (ACM); Philosophy of Science Association; American Association for Advancement of Science (AAAS)

BIOGRAPHICAL SKETCH
NAME: Alan M. Duffield
TITLE: Research Associate
BIRTHDATE: December [date illegible]
PLACE OF BIRTH (City, State, Country): Perth, Western Australia
PRESENT NATIONALITY: Australian, Permanent resident, Immigrant Visa
SEX: Male
EDUCATION:
University of Western Australia: B.Sc. (1st Class Hons), 1958, Organic Chemistry
University of Western Australia: Ph.D., 1962, Organic Chemistry
HONORS:
MAJOR RESEARCH INTEREST: Applications of mass spectrometry to Biology and Biomedical Problems
ROLE IN PROPOSED PROJECT: Organic Chemist/mass spectroscopist
RESEARCH SUPPORT: N/A
RESEARCH AND/OR PROFESSIONAL EXPERIENCE:
1970: Research Associate, Department of Genetics, Stanford University School of Medicine
1969: Head of the Mass Spectrometry Laboratory, Chemistry Department, Stanford University
1965-69: Research Associate, Department of Chemistry, Stanford University
1963: Postdoctoral Fellow, Department of Chemistry, Stanford University
1962-63: Postdoctoral Fellow, Department of Biochemistry, Stanford University School of Medicine

PUBLICATIONS SINCE 1971
1. An Application of Artificial Intelligence to the Interpretation of Mass Spectra. Mass Spectrometry, G. W. A. Milne, Ed., John Wiley and Sons, New York, 1971, pp. 121-178. By B. G. Buchanan, A. M. Duffield and A. V. Robertson.
2. Mass Spectrometry in Structural and Stereochemical Problems. CCIV. Spectra of Hydantoins. II. Electron Impact Induced Fragmentation of Some Substituted Hydantoins. Org. Mass Spectr., 5, 551 (1971). By R. A. Corral, O. O. Orazi, A. M. Duffield and C. Djerassi.
3. Electron Impact Induced Hydrogen Scrambling in Cyclohexanol and Isomeric Methylcyclohexanols. Org. Mass Spectr., 5, 383 (1971). By R. H. Shapiro, S. P. Levine and A. M. Duffield.
4. Derivatives of 2-Biphenylcarboxylic Acid. Rev. Roumain. Chem., 16, 1095 (1971). By A. T. Balaban and A. M. Duffield.
5. Alkaloide aus Evonymus europaea L. Helv. Chim. Acta, 54, 2144 (1971). By A. Klásek, T. Reichstein, A. M. Duffield and F. Santavy.
6. Studies on Indian Medicinal Plants. XXVIII. Sesquiterpene Lactones of Enhydra fluctuans Lour. Structures of Enhydrin, Fluctuanin and Fluctuadin. Tetrahedron, 28, 2285 (1972). By E. Ali, P. P. Ghosh Dastidar, S. C. Pakrashi, L. J. Durham and A. M.
Duffield.
7. The Electron Impact Promoted Fragmentation of Aurone Epoxides. Org. Mass Spectr., 6, 199 (1972). By B. A. Brady, W. I. O'Sullivan and A. M. Duffield.
8. The Determination of Cyclohexylamine in Aqueous Solutions of Sodium Cyclamate by Electron Capture Gas Chromatography. Anal. Letters, 4, 301 (1971). By M. D. Solomon, W. E. Pereira and A. M. Duffield.
9. Computer Recognition of Metastable Ions. Nineteenth Annual Conference on Mass Spectrometry, Atlanta, 1971, p. 63. By A. M. Duffield, W. E. Reynolds, D. A. Anderson, R. A. Stillman, Jr. and C. E. Carroll.
10. Spectrometrie de Masse. VI. Fragmentation de Dimethyl-2,2-dioxolanes-1,3 Insatures. Org. Mass Spectr., 5, 1409 (1971). By J. Kossanyi, J. Chuche and A. M. Duffield.
11. Chlorpromazine Metabolism in Sheep. II. In vitro Metabolism and Preparation of 3H-7-Hydroxychlorpromazine. Journees D'Agressologie, 12, 333 (1971). By L. G. Brooks, M. A. Holmes, I. S. Forrest, V. A. Bacon, A. M. Duffield and M. D. Solomon.
12. Mass Spectrometry in Structural and Stereochemical Problems. CCXVII. Electron Impact Promoted Fragmentation of O-Methyl Oximes of Some α,β-Unsaturated Ketones and Methyl Substituted Cyclohexanones. Canadian J. Chem., 50, 2776 (1972). By Y. M. Sheikh, R. J. Liedtke, A. M. Duffield and C. Djerassi.
13. Thermal Fragmentation of Quinoline and Isoquinoline N-Oxides in the Ion Source of a Mass Spectrometer. Acta Chem. Scand., 26, 2423 (1972). By A. M. Duffield and O. Buchardt.
14. Applications of Artificial Intelligence for Chemical Inference. VIII. An Approach to the Computer Interpretation of the High Resolution Mass Spectra of Complex Molecules. Structure Elucidation of Estrogenic Steroids. J. Amer. Chem. Soc., 94, 5962 (1972). By D. H. Smith, B. G. Buchanan, R. S. Engelmore, A. M. Duffield, A. Yeo, E. A. Feigenbaum, J. Lederberg and C. Djerassi.
15. Mass Spectrometry in Structural and Stereochemical Problems. CCXIX.
Identification of a Unidirectional Quadruple Hydrogen Transfer Process in 7-Phenyl-hept-3-en-2-one O-Methyl Oxime Ether. Org. Mass Spectr., 6, 1271 (1972). By R. J. Liedtke, Y. M. Sheikh, A. M. Duffield and C. Djerassi.
16. An Automated Gas Chromatographic Analysis of Phenylalanine in Serum. Clinical Biochem., 5, 166 (1972). By E. Steed, W. Pereira, B. Halpern, M. D. Solomon and A. M. Duffield.
17. Pyrrolizidine Alkaloids. XIX. Structure of the Alkaloid Erucifoline. Coll. Czech. Chem. Commun., (1972). By P. Sedmera, A. Klásek, A. M. Duffield and F. Santavy.
18. Mass Spectrometry in Structural and Stereochemical Problems. CCXXII. Delineation of Competing Fragmentation Pathways of Complex Molecules from a Study of Metastable Ion Transitions of Deuterated Derivatives. Org. Mass Spectr., 7, (1973). By D. H. Smith, A. M. Duffield and C. Djerassi.
19. Chlorination Studies I. The Reaction of Aqueous Hypochlorous Acid with Cytosine. Biochem. Biophys. Res. Commun., 48, 880 (1972). By W. Patton, V. Bacon, A. M. Duffield, B. Halpern, Y. Hoyano, W. Pereira and J. Lederberg.
20. A Study of the Electron Impact Fragmentation of Promazine Sulphoxide and Promazine using Specifically Deuterated Analogues. Austral. J. Chem., 26, (1973). By M. D. Solomon, R. Summons, W. Pereira and A. M. Duffield.
21. Spectrometrie de Masse. VIII. Elimination d'eau Induite par Impact Electronique dans le Tetrahydro-1,2,3,4-naphtalenediol-1,2. Org. Mass Spectrom., 7 (1973). By P. Perros, J. P. Morizur, J. Kossanyi and A. M. Duffield.
22. The Determination of Phenylalanine in Serum by Mass Fragmentography. Clinical Biochem., submitted for publication (1973). By W. E. Pereira, V. A. Bacon, Y. Hoyano, R. Summons and A. M. Duffield.

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH
NAME, TITLE, BIRTHDATE (Mo., Day, Yr.): Dennis H.
Smith, Research Associate, 11/12/42
PLACE OF BIRTH (City, State, Country): New York
PRESENT NATIONALITY: USA
SEX: Male
EDUCATION:
Massachusetts Inst. of Technology, Cambridge, Mass.: S.B., 1964, Chemistry
University of California, Berkeley, California: Ph.D., 1967, Chemistry
HONORS: Alfred P. Sloan Foundation Scholarship; NASA Predoctoral Traineeship; Phi Lambda Upsilon; Sigma Xi
MAJOR RESEARCH INTEREST: Mass Spectrometry and A.I. in Chemistry
ROLE IN PROPOSED PROJECT: Research Associate
RESEARCH SUPPORT: N/A
RESEARCH AND/OR PROFESSIONAL EXPERIENCE:
1971-Present: Research Associate, Stanford University, Stanford, Ca.
1970-1971: Visiting Scientist, University of Bristol, Bristol, England
1967-1970: Assistant Research Chemist, University of Calif. at Berkeley, Berkeley, Ca.
1965-1967: NASA Pre-Doctoral Traineeship, University of Calif. at Berkeley, Berkeley, Ca.
Publications: See attached list.

PUBLICATIONS: D. H. SMITH
1. H. G. Langer, R. S. Gohlke and D. H. Smith, "Mass Spectrometric Differential Thermal Analysis," Anal. Chem., 37, 433 (1965).
2. S. M. Kupchan, J. M. Cassady, J. E. Kelsey, H. K. Schnoes, D. H. Smith and A. L. Burlingame, "Structural Elucidation and High Resolution Mass Spectrometry of Gaillardin, a New Cytotoxic Sesquiterpene Lactone," J. Amer. Chem. Soc., 88, 5292 (1966).
3. D. H. Smith, Ph.D. Thesis, "High Resolution Mass Spectrometry: Techniques and Applications to Molecular Structure Problems," Dept. of Chemistry, University of California, Berkeley, California (1967).
4. H. K. Schnoes, D. H. Smith, A. L. Burlingame, P. W. Jeffs and W. Döpke, "Mass Spectra of Amaryllidaceae Alkaloids: The Lycorenine Series," Tetrahedron, 24, 2825 (1968).
5. A. L. Burlingame, D. H. Smith and R. W. Olsen, "High Resolution Mass Spectrometry in Molecular Structure Studies, XIV. Real-time Data Acquisition, Processing and Display of High Resolution Mass Spectral Data," Anal. Chem., 40, 13 (1968).
6. A. L. Burlingame and D. H. Smith, "High Resolution Mass Spectrometry in Molecular Structure Studies II. Automated Heteroatomic Plotting as an Aid to the Presentation and Interpretation of High Resolution Mass Spectral Data," Tetrahedron, 24, 5749 (1968).
7. W. J. Richter, B. R. Simoneit, D. H. Smith and A. L. Burlingame, "Detection and Identification of Oxocarboxylic and Dicarboxylic Acids in Complex Mixtures by Reductive Silylation and Computer-Aided Analysis of High Resolution Mass Spectral Data," Anal. Chem., 41, 1392 (1969).
8. The Lunar Sample Preliminary Examination Team, "Preliminary Examination of Lunar Samples from Apollo 11," Science, 165, 1211 (1969).
9. S. M. Kupchan, W. K. Anderson, P. Bollinger, R. W. Doskotch, R. M. Smith, J. A. Saenz Renauld, H. K. Schnoes, A. L. Burlingame and D. H. Smith, "Tumor Inhibitors, XXXIX. Active Principles of Acnistus arborescens. Isolation and Structural and Spectral Studies of Withaferin A and Withacnistin," J. Org. Chem., 34, 3858 (1969).
10. A. L. Burlingame, D. H. Smith, T. O. Merren and R. W. Olsen, "Real-time High Resolution Mass Spectrometry," in Computers in Analytical Chemistry (Vol. 4 in Progress in Analytical Chemistry series), C. H. Orr and J. Norris, Eds., Plenum Press, New York, 1970, pp. 17-38.
11. The Lunar Sample Preliminary Examination Team, "Preliminary Examination of Lunar Samples from Apollo 12," Science, 167, 1325 (1970).
12. D. H. Smith, "Mass Spectrometry," Chapter X in Guide to Modern Methods of Instrumental Analysis, T. M. Gouw, Ed., Wiley-Interscience, New York, 1972.
13. D. H. Smith, R. W. Olsen, F. C.
Walls and A. L. Burlingame, "Real-time Mass Spectrometry: LOGOS--A Generalized Mass Spectrometry Computer System for High and Low Resolution, GC/MS and Closed-Loop Applications," Anal. Chem., 43, 1796 (1971).
14. A. L. Burlingame, J. S. Hauser, B. R. Simoneit, D. H. Smith, K. Biemann, N. Mancuso, R. Murphy, D. A. Flory and M. A. Reynolds, "Preliminary Organic Analysis of the Apollo 12 Cores," Proceedings of the Apollo 12 Lunar Science Conference, E. Levinson, Ed., M.I.T. Press, Cambridge, Mass., 1971, p. 1891.
15. D. H. Smith, "A Compound Classifier Based on Computer Analysis of Low Resolution Mass Spectral Data," Anal. Chem., 44, 536 (1972).
16. D. H. Smith and G. Eglinton, "Compound Classification by Computer Treatment of Low Resolution Mass Spectra - Application to Geochemical and Environmental Problems," Nature, 235, 325 (1972).
17. D. H. Smith, N. A. B. Gray, C. T. Pillinger, B. J. Kimble and G. Eglinton, "Complex Mixture Analysis - Geochemical and Environmental Applications of a Compound Classifier Based on Computer Analysis of Low Resolution Mass Spectra," Adv. in Org. Geochem., 1971, p. 249.
18. D. H. Smith, B. G. Buchanan, R. S. Engelmore, A. M. Duffield, A. Yeo, E. A. Feigenbaum, J. Lederberg and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference, VIII. An Approach to the Computer Interpretation of the High Resolution Mass Spectra of Complex Molecules. Structure Elucidation of Estrogenic Steroids," J. Amer. Chem. Soc., 94, 5962 (1972).
19. D. H. Smith, A. M. Duffield and C. Djerassi, "Mass Spectrometry in Structural and Stereochemical Problems, CCXXII. Delineation of Competing Fragmentation Pathways of Complex Molecules from a Study of Metastable Ion Transitions of Deuterated Derivatives," Org. Mass Spectrom., 7, 367 (1973).
20. P. Longevialle, D. H. Smith, H. M. Fales, R. J. Highet and A. L. Burlingame, "High Resolution Mass Spectrometry in Molecular Structure Studies, V. The Fragmentation of Amaryllis Alkaloids in the Crinine Series," Org.
Mass Spectrom., 7, 401 (1973).
21. B. R. Simoneit, D. H. Smith, G. Eglinton and A. L. Burlingame, "Applications of Real-time Mass Spectrometric Techniques to Environmental Organic Geochemistry, II. San Francisco Bay Area Waters," Arch. Env. Contam. and Tox., 1, 193 (1973).
22. D. H. Smith, B. G. Buchanan, R. S. Engelmore, H. Adlercreutz and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference, IX. Analysis of Mixtures Without Prior Separation as Illustrated for Estrogens," J. Amer. Chem. Soc., 95, 6078 (1973).
23. D. H. Smith, B. G. Buchanan, W. C. White, E. A. Feigenbaum, J. Lederberg and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference, X. INTSUM - A Data Interpretation and Summary Program Applied to the Collected Mass Spectra of Estrogenic Steroids," Tetrahedron, 29, 3117 (1973).
24. G. Loew, M. Chadwick and D. H. Smith, "Applications of Molecular Orbital Theory to the Interpretation of Mass Spectra. Prediction of Primary Fragmentation Sites in Organic Molecules," Org. Mass Spectrom., 7, 1241 (1973).

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH
NAME: Sridharan, Natesa S.
TITLE: Research Associate
BIRTHDATE (Mo., Day, Yr.): 10/2/46
PLACE OF BIRTH (City, State, Country): Madras, India
PRESENT NATIONALITY: India; 5/73 - U.S. permanent residence
SEX: Male
EDUCATION:
Indian Institute of Technology, Madras, India: Bachelor of Technology, 1967, Electrical Engineering
State University of New York, Stony Brook: M.S., 1969, Computer Science; Ph.D.,
1971, Computer Science
HONORS: University Fellow, 1968-1971, SUNY Stony Brook; Graduate Assistant, 1967-1968, SUNY Stony Brook; Siemens Award (awarded for top rank in Electrical Engineering), 1967, IIT Madras; National Merit Scholarship, 1963-1967, IIT Madras
MAJOR RESEARCH INTEREST: Computer Application in Chemistry and Medicine
ROLE IN PROPOSED PROJECT: Research Associate
RESEARCH SUPPORT:
RESEARCH AND/OR PROFESSIONAL EXPERIENCE:
1971-present: Research Associate, Heuristic Programming Project, Stanford University
1970-1971: Consultant, IAC Computer Corp., Long Island, N.Y.
Sridharan, N.S., "An Application of Artificial Intelligence to Organic Chemical Synthesis", Doctoral Thesis, State University of New York at Stony Brook, 1971.
Sridharan, N.S., "Search Strategies of Organic Chemical Synthesis", Third International Joint Conference on Artificial Intelligence (3IJCAI), Stanford, 1973.
Sridharan, N.S. (co-author), "Heuristic DENDRAL: Analysis of Molecular Structure", Proc. NATO Advanced Study Institute, Amsterdam, 1973.
Sridharan, N.S. (co-author), "Heuristic Theory Formation", Machine Intelligence, Volume 7, Edinburgh, 1972.

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH
NAME: Brown, Harold D.
TITLE: Associate Professor
BIRTHDATE (Mo., Day, Yr.): July 12, 1934
PLACE OF BIRTH (City, State, Country): South Bend, Indiana
PRESENT NATIONALITY: U.S.
SEX: Male
EDUCATION:
University of Notre Dame, Notre Dame, Indiana: M.Sc., 1963, Mathematics
Ohio State University, Columbus, Ohio: Ph.D., 1966, Mathematics
(No Baccalaureate Degree)
HONORS: Summa Cum Laude, Notre Dame
MAJOR RESEARCH INTEREST: Applied Discrete Mathematics, Computer Science
ROLE IN PROPOSED PROJECT: Research Associate
RESEARCH SUPPORT: Principal Investigator, NSF-GP-16793 (Expires March, 1974); Pending Proposal NSF (Proposed starting date September, 1974)
RESEARCH AND/OR PROFESSIONAL EXPERIENCE:
Visiting Associate Professor, Computer Science, Stanford University, 1971-72, 1973-present
Associate Professor, Mathematics, Ohio State University, 1966-
Visiting Professor, Mathematics, Rhein.-Westf. Tech. Hoch., Aachen, 1972 and 1973
Visiting Member, Courant Institute, New York University, 1967-68
Instructor/Assistant Professor, Assistant Chairman, Mathematics, Ohio State U., 1963-65
Assistant to the Chairman, Mathematics, University of Notre Dame, 1960-63
Director or Associate Director, NSF-SSTP, 1964-70

Publications
Near Algebras, Ill. J. Math. 12 (1968), pg. 215.
Distributor Theory in Near Algebras, Comm. Pure App. Math. XXI (1968), pg. 535.
An Algorithm for the Determination of Space Groups, Math. Comp. 23 (1969), pg. 499.
Some Empirical Observations on Primitive Roots, with H. Zassenhaus, J. Number Theory 3 (1971), pg. 306.
A Generalization of Farey Sequences, with K. Mahler, J. Number Theory 3 (1971), pg.
364.
Basic Computations for Orders, Stanford CS Memo STAN-CS-72-208.
An Application of Zassenhaus' Unit Theorem, Acta Arith. XX (1972), pg. 154.
Integral Groups I: The Reducible Case, with J. Neubüser and H. Zassenhaus, Numer. Math. 19 (1972), pg. 386.
Integral Groups II: The Irreducible Case, with J. Neubüser and H. Zassenhaus, Numer. Math. 20 (1972), pg. 22.
Integral Groups III: Normalizers, with J. Neubüser and H. Zassenhaus, Math. Comp. 27 (1973), pg. 167.
Constructive Graph Labeling Via Double Cosets, with L. Hjelmeland and L. Masinter, Discrete Math. in press and Stanford CS Memo STAN-CS-72-318.
An Algorithm for the Construction of the Graphs of Organic Molecules, with L. Masinter, Discrete Math. in press and Stanford CS Memo STAN-CS-73-261.
The Crystallographic Groups of 4-dimensional Space, with J. Neubüser, H. Wondratschek and H. Zassenhaus, Wiley-Interscience in press.

SECTION II - PRIVILEGED COMMUNICATION
BIOGRAPHICAL SKETCH
NAME: Dromey, Robert Geoffrey
TITLE: Research Associate
BIRTHDATE (Mo., Day, Yr.): 11/21/46
PLACE OF BIRTH (City, State, Country): Castlemaine, Victoria, Australia
PRESENT NATIONALITY: Australian, J-1 Visa, Exp. 10/8/74
SEX: Male
EDUCATION:
Swinburne College of Technology, Melbourne, Australia: Diploma of Appl. Chem., 1968, Chemistry
La Trobe University, Melbourne, Australia: Ph.D.,
1973, Molecular Science
HONORS: CSIRO Postdoctoral Studentship; Commonwealth Postgraduate Research Scholarship; Walter Lindrum Memorial Scholarship; Equivalent of First Class Honors, Master of Science Preliminary (1969)
MAJOR RESEARCH INTEREST: Application of Artificial Intelligence Techniques to Bio-Medical and Chemical Problems.
ROLE IN PROPOSED PROJECT: Research Associate
RESEARCH SUPPORT:
RESEARCH AND/OR PROFESSIONAL EXPERIENCE:
1973: DENDRAL Project, Stanford University, Computer Science Department
1973: Software Development for Graphics Systems, La Trobe University, Computer Centre
1969-73: Construction, development and applications of an on-line photoelectron spectrometer, La Trobe University, Chemistry Department
1969-73: Application of Deconvolution Techniques to the Processing of Experimental Data.

Publications:
"Deconvolution and Its Application to the Processing of Experimental Data", Intl. Journal of Mass Spectrometry and Ion Physics, 1970, 4. (co-author).
"Inverse Convolution in Mass Spectrometry", Intl. Jnl. Mass Spec. Ion Phys., 1971, 6. (co-author).
"A Combined Time Averaging-Deconvolution Technique Applied to Electron Impact Ionization Efficiency Curves", International Journal of Mass Spectrometry & Ion Physics, 1971, 6. (co-author).
"The Perfect Direction and Velocity Focus at 254°34' in a Cylindrical Electrostatic Field", Reviews of Scientific Instruments, 1973, 44. (co-author).
"Detection of Spin-Orbit Splitting in the Photoelectron Spectrum of O2+ by Deconvolution", Chem. Physics Letters (in press), 1973.
(co-author).
"The Effect of Finite Line Widths on the Interpretation of Photoelectron Spectra", Journal of Electron Spectroscopy (accepted for publication). (co-author).
"An On-line Ultraviolet Photoelectron Spectrometer for High-Resolution Studies of Molecular Structure", Australian Journal of Chemistry (accepted for publication). (co-author).
"Photoelectron Spectroscopic Correlation of the Molecular Orbitals of the Alkanes and Alkyliodides", Journal of Molecular Structure (submitted for publication). (co-author).
"Comparison of the Photoelectron Spectra and the Photoionization Efficiency Curves for the Alkyliodides", Transactions of the Faraday Society (submitted for publication). (co-author).
"A Convolution-Deconvolution Algorithm Using Fast Fourier Transforms", Decuscope, 1973 (in press).

RESEARCH PLAN

BIOMOLECULAR CHARACTERIZATION: ARTIFICIAL INTELLIGENCE
A Program of Resource-Related Research

I. INTRODUCTION
   A. Objectives
   B. Background and Rationale
   C. Relationship to AIM-SUMEX and the Genetics Research Center
II. SPECIFIC AIMS
III. METHODS
IV. SIGNIFICANCE OF PROPOSED RESEARCH
V. FACILITIES & EQUIPMENT
VI. BIBLIOGRAPHY
Table 1
Figures 1-3
Appendix A: Letters of Interest
Appendix B: 1973 Annual Report to the NIH

I. INTRODUCTION

This renewal application is intended to sustain and augment the capabilities of the mass spectrometry (MS) program which has served as a major institutional resource at Stanford for some years. With previous support from NASA and NSF it has made possible a highly interdisciplinary set of research projects ranging over: artificial intelligence (AI) in biomolecular characterization, natural product chemistry, clinical biochemical studies on steroids, and the mechanisms of molecular fragment formation in mass spectrometry. While the facility equipment for mass spectrometry has been funded mostly by other agencies, connected research programs embrace several NIH research projects as well.
In addition, this activity was closely coupled with the ACME Medical School computer resource (1966-1973) and will have similar associations with the new AIM-SUMEX computer resource recently funded by the BRB (see Section I.C).

Previous support reflects the diversified facets of this interdisciplinary research. NASA has supported projects in new instrumentation, including the initial mass spectrometer-computer link; NSF has supported chemical research; and ARPA has supported our artificial intelligence research and initial application to mass spectrometry. Overall cutbacks have forced NASA to reduce funding for this area of research despite their interest. Under ARPA support to Drs. Feigenbaum and Lederberg for AI research, the DENDRAL program became recognized as one of the most successful AI applications programs. However, ARPA is chartered to fund frontier computer science research and no longer provides funds for the DENDRAL applications programs. ARPA has indicated a reluctance to continue funding to this group for the theory formation work in chemistry, although we expect to continue to receive ARPA support for more theoretical aspects of our research program (e.g., automatic programming).

We previously submitted a comprehensive proposal to the NIH (RR-00785, 3/28/73) which included an application for the AIM-SUMEX computing resource and a renewal of the existing DENDRAL grant (RR-00612). This proposal was approved for 5 years by the National Advisory Research Resources Council. Certain reservations were, however, communicated to us: they concerned especially what we must agree was an ambitious effort to close the control loop for "intelligent automation" whose costs overreached the immediate utility of the expected result. During subsequent discussions with the Biotechnology Resources Branch, taking into account the council review and a number of diverse policy issues, we agreed administratively to segment the two components of the original proposal.
The AIM-SUMEX portion of the original proposal (excluding DENDRAL) was recently funded for 5 years as a national resource for artificial intelligence in medicine. The present proposal for resource-related research in biomolecular characterization and artificial intelligence is an elaboration of the DENDRAL portion, incorporating intensive reexamination and revision of the previous proposal.

With the differentiation of priorities represented by AIM-SUMEX, the Genetics Research Center (GRC), and continuing work on artificial intelligence under Dr. Feigenbaum's leadership, the present renewal application places more emphasis than heretofore on real-world oriented applications. Correspondingly, we have agreed that it is now more appropriate that Dr. Djerassi should be designated as Principal Investigator in this phase of our work.

As outlined in section B.2, the interests and responsibilities of Professors Djerassi (Chemistry), Feigenbaum (Computer Science) and Lederberg (Genetics) have been closely interdigitated. With their further connections with many colleagues, these programs enjoy a high degree of university-wide participation. For example, the Genetics Department is also closely affiliated with Biology, Biochemistry, Pediatrics, Psychiatry and Medicine through joint appointments or joint research projects or both. This breadth would be difficult to obtain except at a few institutions where the medical school is both academically and geographically integrated with the university to the degree that characterizes the Stanford University environment.

GLOSSARY OF ABBREVIATIONS

ACME  Advanced Computer for Medical Research (NIH-funded computer resource, 1968-1973)
AI  artificial intelligence
AIM-SUMEX  A comprehensive computer resource intended to serve the national requirement for artificial intelligence in medicine.
This will be implemented at the Stanford University facility called AIM-SUMEX.
ARPA  Advanced Research Projects Agency of the Department of Defense
BRB  Biotechnology Resources Branch
13CMR  carbon-13 magnetic resonance
GC  gas chromatography or gas chromatograph
GRC  Genetics Research Center (Stanford, J. Lederberg, Principal Investigator; NIGMS-approved and awaiting funding. Grant #P01-GM 20832-01)
HRMS  high resolution mass spectrometry
IR  infra-red
IRL  Instrumentation Research Laboratory (Stanford Genetics Department)
LRMS  low resolution mass spectrometry
MCD  magnetic circular dichroism
MS  mass spectrometry or mass spectrometer
NASA  National Aeronautics & Space Administration
NMR  nuclear magnetic resonance
NSF  National Science Foundation
ORD  optical rotatory dispersion
PL/ACME  a modified version of the PL/1 computer language (for the Stanford ACME computer facility)
SUMEX  Stanford University Medical Experimental Computer Resource (NIH-funded computer resource, 1973-1978)
UV  ultra-violet

A. OBJECTIVES

Core Research. The funds now applied for would permit 1) the continued funding of the MS laboratory as a biomolecular characterization resource; 2) advancement of laboratory instrumentation capability in specific areas of GC-HRMS and the exploitation of metastable peak analysis; and 3) the further development of AI computer techniques to match the instrumentation. This work will emphasize practical utilization for applications in biomolecular characterization connected with other on-going biomedical research programs. It will include, for example, a) the analysis of mixtures by GC/MS; b) metastable peak analysis for difficult problems of pure compounds and of mixtures not readily separable by GC; c) optimized data analysis for characterization of MS peaks; and d) heuristic analysis of spectra for the molecular ion composition.

Our project is the only systematic effort, to our knowledge, currently underway in this country for computer assisted structure elucidation.
Subsequent to our early publications, an intensive program has been mounted in Japan in similar areas. This situation may be contrasted with computer assisted organic synthesis, an area receiving considerable attention from several research groups. These capabilities can be beneficially provided to a wider community via the AIM-SUMEX resource.

Research on the emulation of human intellect by computer programs will undoubtedly influence the efficiency with which chemical research can be applied to ever more complex problems of health, e.g., intermediary metabolism and its pathologies; environmental influences on health; the development and critical validation of new therapeutic agents.

The achievement of these objectives depends on the continued maintenance and development of the DENDRAL AI programming system (see below). The advent of the AIM-SUMEX facility will remove some of the serious computational limits on the exercise of this system that have delayed recent progress.

Education. In our university setting, pre-doctoral and post-doctoral education of course constitutes a part of our mission. As far as is practically possible, research participation in the DENDRAL program has been coupled with dissertation work by graduate students and post-doctoral research experience respectively. Examples of people (and their research area) whose education has been enhanced in this way are the following:

Graduate Students: J. Simek, pedagogical aspects of the structure generator; Wai Lee Tan, synthesis of new estrogen compounds; H. Eggert, 13CMR of amines and steroidal ketones; C. Van Antwerp, 13CMR of steroidal alcohols; C. Farrell, theory formation from mass spectral data; L. Masinter, development of the structure generator; M. Stefik, AI applications to chemistry.

Postdoctoral Fellows: G. Dromey, theory formation from analytical data; R. Gritter, mass spectral fragmentation of biologically active steroids; R.
Carhart, analysis of 13CMR spectra by DENDRAL-like programs; S. Hammerum, development of better fragmentation rules for progesterones.

Formal organization. This project has been a long-term commitment of Djerassi, Lederberg and Feigenbaum functioning in effect as co-investigators. We coordinate our activities with day-to-day contacts in the pursuit of convergent research objectives. In the light of the extension of our collaborative activity during the last two years, we are now organizing a formal advisory group to include, in addition to ourselves, H. Cann, J. Barchas, and E. Van Tamelen. This group will advise the principal investigators on the direction of the program with respect to allocating available facilities and seeking out and helping other collaborators. This designation simply recognizes the fact that many of our colleagues have already been engaged in relevant collaborative research with MS.

A MS resource has recently been funded at the University of California/Berkeley, under the direction of Dr. A.L. Burlingame. Drs. Djerassi and Burlingame have recently engaged in some collaborative research which was made more successful by the sharing of facilities and expertise available at one institution but not at the other. We would hope to maintain and strengthen these contacts to avoid unnecessary duplication of effort. We plan to discuss with Dr. A.L. Burlingame the most appropriate procedures for coordinating the related activities of our respective programs at the University of California/Berkeley and here. This may take the form of reciprocal membership in advisory committees.

The "hardware resource" to which this application is pegged has been identified as the MS facility. While these instruments alone represent an investment of over $300,000, funded previously by several agencies, they do not represent the most important resource. We would use this designation instead for the working team led by the principal and co-investigators.
The skills embraced by this group include, as mentioned, computer science, structural organic chemistry, molecular biology, instrumentation engineering and a wide range of other disciplines. They are represented not only in the principal professors but in a diversified and accomplished professional research staff (see Budget Justification). The program for which funds are now requested is the vital means by which the interests of this group can be sustained in a coordinated effort that would be very costly both in funds and in time if it had to be reconstructed from scratch. Without the financial support now requested, this line of collaborative research will have to be abandoned, and with it a unique style of interdisciplinary collaboration, and the MS facility will be terminated.

B. BACKGROUND AND RATIONALE

1. The Structure Elucidation Problem

a) The General Problem. Analysis of molecular structure is a major activity in our program of resource related research. For the specific task of elucidating molecular structures, i.e., the topology of atom-to-atom connectivities, analysts utilize a mixture of information derived from chemical procedures and spectroscopic techniques. Each item of information, if not redundant or uninterpretable, contributes to the solution of the problem. Chemists draw upon a tremendous body of specific knowledge about the task area (e.g., clinical chemistry, biochemistry), molecular structure, spectroscopic techniques, etc., in order to piece together this information and infer the structure of molecules. These features, and the relative simplicity of the final concept of a structure, make the problem particularly well-suited for applications of the techniques of AI to assist research workers performing the task.

b) Djerassi's Laboratory. Professor Djerassi has been concerned with structure elucidation problems since the beginning of his chemical research.
His activities at Stanford have been concerned heavily with the application of particular spectroscopic techniques to structural studies of biomedically important compounds. These techniques include optical rotatory dispersion (ORD) and, more recently, magnetic circular dichroism (MCD) (both of them supported initially by the NIH). Since 1961 he and his group have also been concerned with MS because of the power of the technique, in terms of specificity and sensitivity, as an analytical tool for structure elucidation. Four books and approximately 250 articles on MS have been published by him and his colleagues.

The technique of MS does not suffice for all structure determination problems, but it is a very powerful tool in areas where there exists a body of knowledge about the MS behavior of related molecules. When sample size is limited, MS may well be the only technique that can be utilized. The recent availability of high resolution mass spectrometers has made HRMS the technique of choice for many applications because under ideal conditions the exact mass number uniquely specifies the empirical formula of a molecule or fragment. On a parallel course, the technique of GC/MS, routinely available with low resolution mass spectrometers (GC/LRMS), has revolutionized investigations wherever complex mixtures are encountered. All of the above considerations argue that an extension of MS at Stanford to provide routine GC/LRMS and GC/HRMS analyses would be the next logical step to assist researchers depending on this facility for solutions of their structure elucidation problems.

2. Historical Background

a) Mass Spectrometry Laboratory. Prior to the existing DENDRAL grant, the groundwork was laid for computerization of the existing mass spectrometers, an Associated Electrical Industries MS-9 high resolution mass spectrometer and an Atlas CH-4 low resolution mass spectrometer.
This work, supported primarily by NASA via the Instrumentation Research Laboratory (IRL) in the Department of Genetics, resulted in link-up to the then existing ACME computer facility via a PDP-11 mini-computer which acted as a buffer between the spectrometers and ACME. Initial data acquisition and reduction programs were written for the system and utilized on a limited basis. The funding of the DENDRAL proposal, NIH grant RR-612 (May 1, 1971-present), in conjunction with additional resources provided by the IRL, resulted in a major improvement to these capabilities. The fruits of these efforts are described under section I.B.3 (below).

b) Summary of Early DENDRAL Development. In 1964, Lederberg devised a notational algorithm for chemical structures (termed DENDRAL) that allowed questions of molecular structure to be framed in precise graph-theoretic terms (Refs. 1,3-5,12). He also showed how to use the DENDRAL algorithm to generate complete and irredundant lists of structural isomers (Refs. 1,5). In 1965-66 Lederberg and Feigenbaum began exploring the idea of using the isomer generator in an artificial intelligence program - searching the space of possible structures for plausible solutions to a problem much as a chess-playing program searches the space of legal moves for the best moves (Refs. 7,12).

This approach guarantees that every possible solution to a problem is considered - either implicitly, as when whole classes of unstable structures are rejected, or explicitly, as when complete molecules are tested for plausibility. In either case, an investigator easily determines the criteria for rejection and acceptance and knows that no possibilities have been forgotten. This approach also guarantees that structures appear in the list only once - that automorphic representations of the same complex molecule have not been included. In both these respects the computer program has an advantage over manual approaches to structure elucidation.
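The two guarantees described above, completeness and irredundancy, can be illustrated with a minimal sketch. This is not the DENDRAL algorithm itself; it is a small illustration, restricted to acyclic carbon skeletons (the alkanes), that grows every skeleton one carbon at a time and uses a canonical string to discard automorphic duplicates. The function names (`rooted_code`, `canonical`, `alkane_skeletons`) are hypothetical and chosen only for this example.

```python
def rooted_code(adj, v, parent):
    """Encode the subtree hanging from v as a parenthesis string.

    Child codes are sorted, so two rooted trees that differ only in the
    order of their branches receive the same code.
    """
    children = sorted(rooted_code(adj, w, v) for w in adj[v] if w != parent)
    return "(" + "".join(children) + ")"


def canonical(adj):
    """Canonical form of an unrooted tree: the smallest code over all roots."""
    return min(rooted_code(adj, root, -1) for root in range(len(adj)))


def alkane_skeletons(n_carbons, max_valence=4):
    """Return one adjacency list per distinct acyclic skeleton of n carbons.

    Completeness: every skeleton with n+1 carbons is obtained by attaching
    a carbon to some skeleton with n carbons, and all attachments are tried.
    Irredundancy: skeletons are keyed by canonical form, so each appears once.
    """
    trees = {"()": [[]]}  # a single carbon with no C-C bonds
    for _ in range(n_carbons - 1):
        grown = {}
        for adj in trees.values():
            for v in range(len(adj)):
                if len(adj[v]) >= max_valence:
                    continue  # carbon v already has four carbon neighbours
                new = [list(neighbours) for neighbours in adj] + [[v]]
                new[v].append(len(adj))
                grown.setdefault(canonical(new), new)
        trees = grown
    return list(trees.values())


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, len(alkane_skeletons(n)))
```

Under these assumptions the counts reproduce the familiar numbers of alkane isomers (for example, 5 hexanes and 9 heptanes). Rejecting a vertex that already has four neighbours is a small instance of discarding a whole class of structures implicitly, before any of its members is built.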
c) Initial collaboration with Djerassi (Refs. 14,15,19,20,21,22,24). Lederberg and Feigenbaum realized that (a) only through application to real problems could the AI approach be materially advanced and critically evaluated, and (b) MS appeared to be a fruitful applications area. MS appeared to be an excellent problem area because of the close relationship between spectral fragmentation patterns and molecular structure for many classes of molecules. Djerassi's interest and expertise - and daily interaction between members of his group and the AI group - led to a series of joint publications describing the approach and initial results of the programs. The success of these collaborative efforts led to the proposal to the NIH for initial funding to extend these efforts.

d) Efforts Under NIH Funding for DENDRAL (Refs. 25-41). The initial funding by NIH provided the opportunity to upgrade the instrumentation and computer programs. In particular we were able to mount a concerted project on both the analysis of mass spectra of biomedically important compounds and the mathematical aspects of molecular structure. Progress reports to the NIH describe this research in detail. The most recent annual report appears in Appendix B. A series of publications directed to audiences both in computer science and chemistry are listed in the bibliography. The following section (Section 3) summarizes the capabilities for structure elucidation which, in themselves, constitute an important result of past work.

e) Related Research. An important side effect of the DENDRAL project is the extent to which additional research was inspired and carried out to fill gaps in existing knowledge. This research, not supported by the DENDRAL grant, has been beneficial to on-going DENDRAL work, and vice-versa. Publications which have arisen from this research are listed in the bibliography (Refs. 58-70).
A brief review of these publications should indicate the need for precise specification of the knowledge elicited from chemists and used in computer programs. As an example, consider the description and application of an early algorithm for generation of cyclic structural isomers (21). This paper considered the problem of spectroscopic differentiation of isomers of C6H10O. Unsaturated ethers fall in one of the classes of isomeric compounds which must be considered, but the MS of unsaturated ethers had not been investigated systematically. This work was subsequently carried out in Professor Djerassi's laboratory independently of DENDRAL support, but of benefit to DENDRAL (62). Other examples will be found in the Bibliography (Refs. 58-79).

3. Existing Capabilities

We have worked to develop distinctive capabilities for molecular structure elucidation, bringing together a high quality HRMS system and AI programs applied to biomolecular characterization. The feasibility of our analytical approach has been demonstrated in several problem areas, based upon the development both of a MS system and a general set of computer programs for use in new areas. The principal capabilities are summarized below. These are now in being and were developed primarily under NIH funding to this project, with additional support supplied by ARPA and NASA in specific areas. (These agencies have reduced funding levels for this work because overall cutbacks have forced NASA to cut out this area of research despite their interest, and ARPA is chartered to provide funds for frontier computer science research but not for applications. Thus the NIH is the principal source of support for future development of applications programs in the interdisciplinary area of artificial intelligence/health related chemical problems.)

a. HRMS System and Coupled GC/LRMS System.
We have coupled the NIH-supported Varian-MAT 711 High Resolution Mass Spectrometer with a Hewlett Packard Gas Chromatograph and demonstrated its utility for GC/LRMS analysis of such difficult analytic problems as the free sterols (i.e., not derivatized) isolated from marine and other sources. Advanced data reduction techniques for this instrument were written for use with the ACME computer system (360/50) and now exist in Stanford's new 370/158, which continues to support the PL/ACME language. GC/HRMS scans on extracts from urine and amniotic fluid demonstrated this system's capability to provide high quality mass measurements on complex mixtures obtained from biological sources. An example of one GC/HRMS run on the amino acid fraction of amniotic fluid is presented below (Sec. III.D).

b. DENDRAL Structure Generator (Refs. 1-6,14,31,37,38,40,41)
The DENDRAL Structure Generator program accomplishes exhaustive and irredundant generation of isomers, with and without rings. This program guarantees consideration of every candidate structure - either implicitly, as when whole classes of structures are forbidden, or explicitly, as when individual compounds in a class are specified. It corresponds to the "legal move generator" of computerized chess playing and other heuristic programs.

c. DENDRAL Planner (Refs. 25,28,33)
We have written a very general set of computer programs for determining structural features from analytical data in well-defined areas. Such general planning programs have been written for low and high resolution MS, interpreted proton NMR spectroscopy and 13CMR data.

d. INTSUM (Refs. 26,29,34,35)
INTSUM is a computer program that aids in finding interpretive rules for MS. The program interprets a large collection of MS data according to criteria specified by the investigator. Then it summarizes the data to show which of the possible interpretations seem most plausible.

e. RULEGEN (Refs.
26,35)
RULEGEN is the current rule generation program that suggests various rules of interpretation for the MS data summarized by INTSUM. Although not finished, the program can provide useful assistance in practical theory formation.

f. Ancillary Techniques
1. The MS facility provides other types of experiments in MS, including ultra-high resolution measurements (masses determined via peak matching), defocussed metastable ion determinations (Barber-Elliott technique) and low ionizing voltage experiments. These data are utilized by both scientists and programs where appropriate.
2. Additional computer programs provide added problem-solving assistance.
   a. Predictor program for predicting major features of mass spectra.
   b. Programs for drawing and displaying chemical structures.
   c. Subroutines developed in conjunction with or existing as parts of the Structure Generator for problems of partitioning, construction of vertex-graphs, and constructive graph labelling. These can be applied to answer certain questions of isomerism which do not require the complete generator. For example, the labelling algorithm can list all structures resulting from substituting sites of a carbocyclic skeleton with stated numbers of different functional groups.

g. Other Spectroscopic Techniques
Available to us are the facilities of Professor Djerassi's laboratory for work requiring additional spectroscopic data. Also available on a fee for service basis are extensive spectroscopic facilities (NMR, I.R., and U.V.) of the chemistry department. These would be utilized for collecting additional data on particular structure problems and gathering data on known compounds (particularly in the area of 13CMR) as the AI programs become knowledgeable about other spectroscopic information.

h. Chemical Facilities
The staff and facilities of the chemistry department represent substantial synthesis capabilities and general chemical know-how. This resource can be called upon to provide
assistance in synthesis of model or labelled compounds, derivatization of mixtures, and so forth. For example, a graduate student in chemistry is presently engaged in thesis research dealing with the laboratory synthesis of a new estrogen metabolite strongly suspected to be a component of certain pregnancy urines. The previously proposed structure of this compound was one of the candidate structures inferred by the planner in a study of estrogen mixtures (11-dehydroestradiol-17-alpha, ref. 33).

4. User Community

Economic utilization of existing and proposed facilities can be realized by sharing them with a community of users. Lacking supplementary funds that would be needed for a comprehensive, major service facility, this community will include the following groups, but will be informally available to others.

A. Stanford Community

i) Stanford Chemistry Department (except for Hodgson, all are heavily supported by the NIH in their research efforts). Letters of interest are attached to the proposal in Appendix A.

Prof. C. Djerassi - steroids, marine sterols
Prof. W. Johnson - steroids
Prof. E. Van Tamelen - steroids, triterpenoids, other natural products
Prof. H. Mosher - natural products (e.g., marine toxins)
Prof. K. Hodgson - biological ligands, ligand-metal complexes
Prof. J. Collman - cytochrome P450 models

ii) Stanford Medical School Collaborators. The following research projects in the Stanford Biomedical Community will furnish samples for mass spectrometric analysis under the present proposal. Attached to this proposal (Appendix A) are copies of the letters of interest in the proposed facility received from the principal investigators of these grants.

Dr. James R. Trudell, Department of Anesthesia, Stanford University School of Medicine. Drug metabolite identification in humans.
Dr. Irene S. Forrest, Biomedical Research Laboratory, Veterans Administration Hospital, Palo Alto. Drug metabolite identification in humans.
Dr. I. Rabinowitz and D.I.
Wilkinson, Department of Dermatology, Stanford University School of Medicine. Prostaglandins.
Prof. Eugene D. Robin, Department of Respiratory Medicine, Stanford University School of Medicine. Ratio of NAD+/NADH in cells by measuring ratio of oxidized to reduced redox pairs.
Dr. Leo E. Hollister, Veterans Administration Hospital/Department of Medicine, Stanford University School of Medicine. Metabolism of marihuana.
Dr. Hiram Sera, Pharmacy Department, Stanford University Hospital. Drug identification.
Dr. Sumner M. Kalman, Department of Pharmacology, Stanford University School of Medicine. Drug and drug metabolite identification.
Dr. Jack Barchas, Department of Psychiatry, Stanford University School of Medicine. Neurotransmitters and related compounds in man.
Dr. Keith A. Kvenvolden, Chemical Evolution Branch, NASA Ames Research Center, Mountain View, Calif. Amino acids, acids in geochemical samples, structure of products formed from electrical discharges in gas mixtures.
Dr. William R. Fair, Department of Urology, Stanford University School of Medicine. Identification of the prostatic antibacterial factor; polyamines (putrescine, spermine, spermidine) in body fluids of patients with prostatic carcinoma.

Besides the user projects just summarized, other major prospects are in sight. At the time of writing, the chair of pharmacology is vacant. Conversations with the leading candidate have indicated a deep-seated interest in GC/HRMS as the principal analytical tool for broad ranging studies of drug metabolism in man.

B. Extramural Users

The development of the techniques of ORD, MS and MCD at Stanford has been paralleled with extensive sharing of these resources nation- and world-wide in collaborative research efforts, without any additional funding. Experience has shown that discretionary selection of problems, rather than the provision of routine service, results in better utilization of our people and instrumentation resources.
We would extend this provision of services, including available computer programs, to a limited number of extramural users. Note, for example, our successful collaboration with Professor Adlercreutz, Meilahti Hospital, University of Helsinki, on the identification of estrogens from body fluids utilizing the AI planning program (ref. 33).

C. Relationship to AIM-SUMEX and the Genetics Research Center

The present application is strengthened by two research projects related to, but not overlapping, the proposed research of this grant.

1) AIM-SUMEX (NIH RR-00785, Oct. 1, 1973, thru July 31, 1978, Principal Investigator, J. Lederberg). This is a resource grant to establish a national facility for applications of artificial intelligence in medicine (AIM). Our own use of this facility will include SUMEX PDP-10 computer time and file storage necessary to run the DENDRAL artificial intelligence programs. This support will be furnished without charge to the present proposal. It represents an annual investment of about $190,000 in computer time equivalent value.

The AIM-SUMEX computing facility is shared equally between a national user community (AIM) and a Stanford Medical School community. The DENDRAL research will be supported out of the Stanford portion. The AIM service will be administered under the policy control of a national advisory committee and will be implemented over a national computer network. AIM-SUMEX provides the means for members of the national user community interested in structure elucidation to access the DENDRAL programs.

2) Genetics Research Center (NIH P01-GM 20832-01 - approved by the NIGMS Council, awaiting funding, Principal Investigator, J. Lederberg). This research proposal is a comprehensive grant which would support interdepartmental research at the Stanford Medical School in Medical Genetics, Pediatrics and other clinical applications.
A section of that proposal concerns the use of GC/LRMS for screening body fluids for evidence of inborn errors of metabolism. (This project grew out of the initial DENDRAL grant, one of the research goals of which was the analysis of body fluids using GC/MS.) This research on inborn metabolic errors will be conducted jointly in the Stanford Departments of Genetics and Pediatrics using existing equipment (Finnigan 1015 Quadrupole mass spectrometer, Varian Aerograph GC and a PDP-11/20 based data system).

We appreciated the value of GC/HRMS analyses of selected extracts of body fluids (i.e., those containing metabolites not identified by routine GC/LRMS data) when formulating the Genetics Research Center proposal. Accordingly, a small amount of funding was there requested for recording selected GC/HRMS data on the GC/Varian MAT 711 mass spectrometer in the Department of Chemistry. If these funds are awarded, we will negotiate with NIH a suitable elimination of this minor overlap with the present budget.

II. SPECIFIC AIMS

The specific aims enumerated in this section will be pursued in the highly inter-disciplinary manner that has characterized the DENDRAL project from the start of its NIH support. The aims are not disjoint, but interactive and inter-dependent. For example, the power of MS and, potentially, other spectroscopic techniques can be enhanced by the use of computer programs to perform various aspects of structure elucidation and theory formation. From the standpoint of computer science, one measure of the utility of techniques of artificial intelligence is how well they perform in real-world applications. It is necessary in the development of these programs to have a source of data and informed, involved team-mates able to criticize methods and results. The aims are elaborated in the methods section.

We have attempted to keep the proposal to a readable length. Therefore, some detail has been omitted.
However, many details can be found in the bibliography and we are prepared to provide additional information during the site visit.

1. Enhance the power of the MS resource.

The existing MS resource, together with computer programs which exist or which are proposed (see Aim 2, below), is capable of solving some of the structure elucidation problems of the user community given computer support for data collection and reduction. We refer specifically to the areas of GC/LRMS and routine, batch HRMS samples. We believe that many of the problems of the user community require more powerful techniques (see Section III). These techniques, specifically GC/HRMS and semi-automatic metastable defocussing, can be provided with a minimum of cost and effort, thus enhancing considerably the capabilities of the resource.

Our first aim is to provide the resource with adequate computer support (replacing the previous ACME system) to enable collection and reduction of mass spectral data including low and high resolution scans and data on defocussed metastable ions. We propose to develop this computer support in the ways described below. (These aims are written to include the work necessary to implement the extended PDP-11/20 computer system. A description of the rationale for this choice is provided in Section III.A and the specific augmentations in the Budget Justification.)

A) Convert existing, proven data acquisition and reduction programs from the PL/ACME language into Fortran, consistent with time-critical assembly language programs for data acquisition and instrument control. These programs will be written in Fortran to enhance compatibility with the computer systems of other users of such packages.

B) Modify these programs, as required, to handle acquisition and reduction of frequent or repetitive HRMS scans with selected instrument performance feedback to the operator, and to take advantage of the expanded capabilities of the extended 11/20 system.
Prototype GC/HRMS systems have been developed at Stanford and elsewhere, but this type of facility (in contrast to GC/LRMS) is not now available to the Stanford community. When this system is developed, service will be available to the Stanford community of research collaborators and, if our resources permit, to any scientist requesting assistance. In many instances this type of collaboration will require far more involvement of convergent interests, efforts and skills than merely running samples on request. We have in mind the chemical and eventually biological interpretation of the analytical data as a matter of joint concern, as appropriate.

We have previously illustrated the advantages of high resolution mass spectral data in the computer analysis of mass spectra (e.g., ref. 28). Also, we have previously shown that the same program can deal with analyses of mixtures without prior separation, especially when additional data (e.g., from selected metastable defocussing experiments) were provided (Ref. 33). We wish to use the MS resource and the computer program in further studies of mixtures of compounds which are difficult or impossible to separate by GC. The advent of routine systems for high pressure liquid chromatography has made many of these separations possible, but the liquid chromatograph is not presently interfaced to the MS.

Many of the problems of the user community require analysis of complex mixtures which are amenable to treatment by GC/MS techniques. We feel that where sample quantities permit, acquisition of GC/HRMS data is highly desirable. These data can be provided by the resource supplemented with computer support (above). We propose to continue tests of the GC/MS combination, operating under moderately high mass resolutions (5000-10000), to define in detail the optimum operating conditions of the GC/HRMS combination. This will provide the necessary information on maximum practical sensitivity to be expected.
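As a rough check on what a given operating resolution can separate, the following minimal sketch computes the mass separation of a few elemental-composition doublets and the resolving power needed to split them. It assumes the common definition of resolving power as m divided by delta m, uses rounded monoisotopic masses, and takes an ion mass of 100 purely as an illustrative value; the function names are hypothetical.

```python
# Rounded monoisotopic masses (12C = 12 exactly); values are approximate.
MASS = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def formula_mass(counts):
    """Mass of a group of atoms given as {element: number of atoms}."""
    return sum(MASS[element] * n for element, n in counts.items())


def doublet_separation(group_a, group_b):
    """Mass difference between two groups of the same nominal mass."""
    return abs(formula_mass(group_a) - formula_mass(group_b))


def required_resolving_power(ion_mass, delta_m):
    """Resolving power m / delta_m needed to separate the doublet."""
    return ion_mass / delta_m


def is_resolved(ion_mass, delta_m, resolving_power):
    """True when the instrument setting meets the m / delta_m requirement."""
    return required_resolving_power(ion_mass, delta_m) <= resolving_power


DOUBLETS = {
    "CH2 vs N": ({"C": 1, "H": 2}, {"N": 1}),
    "CH4 vs O": ({"C": 1, "H": 4}, {"O": 1}),
    "C3N vs H2O3": ({"C": 3, "N": 1}, {"H": 2, "O": 3}),
}

if __name__ == "__main__":
    ion_mass = 100.0          # illustrative assumption
    setting = 10_000          # upper end of the range under test
    for name, (a, b) in DOUBLETS.items():
        dm = doublet_separation(a, b)
        need = required_resolving_power(ion_mass, dm)
        print(f"{name}: delta m = {dm:.4f}, needs about {need:.0f}, "
              f"resolved at {setting}: {is_resolved(ion_mass, dm, setting)}")
```

Under these assumptions the CH2/N and CH4/O doublets fall within reach of a resolving power of 10,000 near mass 100, while the C3N/H2O3 doublet does not. The requirement scales linearly with ion mass, so the same doublets become harder to separate for heavier ions.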
This information can then be used in collaboration with the user community for sample preparation.

The GC/HRMS system would normally be operated at reduced mass spectrometer resolutions to maximize sensitivity. We have existing multiplet resolution programs to increase the resolving power of the MS. We propose to provide the multiplet resolution program with heuristic guidance based on compositional variations inferred from molecular ions or other singlet peaks. For example, a resolving power of 10,000 is barely sufficient to resolve ions which differ by CH2 vs. N (delta m = 0.012) for ions of about mass 100. Although it will resolve CH4 vs. O doublets (delta m = 0.037) at this mass, it will not resolve closer doublets such as C3N vs. H2O3 (delta m = 0.003). We can provide exhaustive tabulations of multiplets by mass separations (based on ref. 30) which can be used by the multiplet resolution program.

We have previously indicated the power of metastable ion information in the operation of our programs for structure elucidation (refs. 28, 33). We have extended one of our programs (the MS predictor program) to propose metastable defocussing experiments in order to avoid collection of unnecessary data (see Aim 2, below). Although we can collect these data (Barber-Elliott technique) manually on our existing Varian MAT-711 MS, this is an exceedingly wasteful operation, both in terms of sample consumption and time. We propose to implement some automation of collection of these data on metastable ions. We also propose to begin preliminary investigation of alternative modes of metastable ion determination (see Methods (Sec. III), below).

2. Develop performance and theory formation programs to assist in the solution of structure elucidation problems in biomedicine.
Computer programs have already been written for analysis of low and high resolution mass spectra, for generation of acyclic and cyclic molecular structures, for labelling structural skeletons with atoms, for analyzing 13CMR spectra of amines and for interpretation and summary of large volumes of data gathered on model compounds (see Existing Capabilities above, for references). We wish to increase the utility of these programs by providing interactive facilities that allow easier access to them, by increasing their generality and power, and by supplementing them with new reasoning programs.

Performance Programs:

The current structure generator program will be subjected to further detailed tests before using it for structure determination problems.

A new algorithm for generating cyclic skeletons (with no multiple bonds) will be programmed and checked. The algorithm is written and informally proved. A formal proof will be devised as well. This algorithm represents one very powerful approach to the problem of implementation of constraints, as discussed in the following paragraph.

The generating programs will be modified to allow isomer generation within constraints. Different kinds of constraints can be inferred from different kinds of spectroscopic data. We intend to give the program knowledge of a variety of these.

The Planner programs that infer constraints from mass spectrometry data will be broadened to include additional knowledge about the spectral behavior of classes of compounds of relevance to the NIH-sponsored research of the user community. In addition, we will add the capability for utilization of information about chemical isolation procedures (e.g., one expects acidic and neutral compounds in solvent extraction of acidified body fluids) and relative GC retention times (e.g., to admit the possibility of homologous series).
We propose to implement a more general method for inferring the identity of the molecular ion whether or not this appears explicitly in the spectrum. This information is important for the successful operation of the structure generator and the Planner. We want the program to use whatever information is available and not depend, as it currently does, on having knowledge of the structural class together with inference rules for that class.

Interface routines will be written to make it easier for other scientists to use these programs. We have to wait for an interactive system before starting this: AIM-SUMEX will be ideal. Input/output routines will be crucial to easy use of the system. However, we also want to give users the facility to understand the system's reasoning steps so they can take advantage of it.

In addition to making the computer programs available through AIM-SUMEX, we would like to translate parts of the LISP code into another language - for reasons of both efficiency and exportability. We have talked with computer professionals at IBM Research Center about using the APL language. FORTRAN, ALGOL and PL/1 are other languages whose merits for our purposes we will explore.

We wish to continue a low level of effort on computer programs that interpret other kinds of spectroscopic data. Planning programs similar to the MS Planner could be written for automatic analysis of data from other spectroscopic techniques (e.g., IR, UV), as we have illustrated for 13CMR (ref. 39).

The structure generator's view of chemical structure is topological and is presently unconstrained by bond lengths and angles. Because stereochemical considerations are frequently important in structure elucidation, we propose to begin consideration of stereochemistry in the structure generation and evaluation processes.
A program with detailed knowledge about information obtainable from various spectroscopic techniques could be written to examine a list of candidate solutions and propose experiments necessary and sufficient to distinguish among them. The program would represent an extended Predictor (e.g., ref. 27). We have a first version of a program that suggests "crucial" metastable peaks to be sought in order to distinguish among candidate structures. Work on this program will continue at a low level of activity, possibly expanding into areas other than MS.

One topic we will continue to pursue is our collaborative effort with Dr. Gilda Loew, Genetics Department, on the potential application of molecular orbital theory to prediction of mass spectra (ref. 71).

Theory Formation Programs:

The rule formation program (RULEGEN) will be extended so that it can search a larger space of rules. Present a priori constraints on the rule generation give us a search reduction from tens of millions to a thousand possible rules. Even though search heuristics now allow efficient search of these possibilities, we want to be able to deal with much larger spaces efficiently, as when the number of primitive predicates is drastically increased.

The RULEGEN program will be modified so that complex fragmentation and rearrangement processes are manipulated nearly as easily as simple fragmentations. The program currently finds fragmentation rules involving one or two bonds, possibly followed by hydrogen migration. In the case of cyclic systems such as estrogens, however, the program must be able to work with sets of three or more bonds in some cleavages.

Interactive programs will be provided on AIM-SUMEX for the investigator to query the rule generation program. For example, many questions now arise about the program steps by which the program infers the rules it suggests as explanations of the regularities. Why, for example, was some particular rule not considered plausible?
New data will have to be selected in order to test the rules and to differentiate among competing rules. We will write a program that suggests new experiments (i.e., new data to obtain), depending on the nature of the existing rules.

The test phase of the theory formation program will be written as an evaluation function of each rule against new data. Insofar as any new experiments are "crucial" experiments, the evaluation function may merely reject a proposed rule. Mostly, however, rules will have to be evaluated against new data along many dimensions: frequency, strength of evidence, uniqueness, simplicity, and the like.

We wish to experiment with the whole theory formation program to determine the critical aspects of our design. For example, (1) how sensitive is the program to discrepancies, inconsistencies and errors in the data? (2) how well can the program find rules within a slightly different model of chemistry? (3) how well can the program perform with one pass through the data, or several passes? and (4) how critical are the principles of theory formation?

3. Apply the structure elucidation techniques - both instrumentation and computer programs - to biomedically relevant compounds.

Our own interests are in elucidating the structures of, and understanding the MS of, marine sterols, hormonal steroids, and compounds isolated from human body fluids that can be associated with genetic disorders (from research in the GRC). In addition, we will be working closely with members of the Stanford Medical School and Chemistry Department - in particular those mentioned above (Section I.B.4) - on their structure elucidation problems in which MS will be used.

Although most users expect to require HRMS and GC/HRMS data, some of their problems will be attacked utilizing GC/LRMS techniques and library search through (usually) restricted libraries of mass spectral data.
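To make the idea of a library search through a small, restricted library concrete, a minimal sketch follows. It is only an illustration: the cosine similarity measure, the function names and the toy spectra are all assumptions introduced here and are not the search method used or proposed by the project.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine of the angle between two spectra given as {m/z: intensity}."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(unknown, library, top_n=3):
    """Rank library entries by similarity to the unknown spectrum."""
    scored = [(cosine_similarity(unknown, spec), name)
              for name, spec in library.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_n]]


# Toy, made-up low resolution spectra (hypothetical entries, not real data).
TOY_LIBRARY = {
    "toy_entry_A": {43: 100, 57: 80, 71: 40},
    "toy_entry_B": {74: 100, 87: 60, 143: 10},
    "toy_entry_C": {43: 30, 74: 100, 87: 55},
}

unknown = {74: 95, 87: 62, 143: 12}
hits = search_library(unknown, TOY_LIBRARY)
for name, score in hits:
    print(name, round(score, 3))
```

Restricting the library (for example to one compound class) simply means passing a smaller dictionary, which is one reason restricted libraries keep the search inexpensive.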
We propose to investigate some extensions to the technique of library search (see Methods) to complement our existing and planned DENDRAL programs. We plan to continue our exchange of mass spectral data and library search information as we have previously done with Dr. S. Markey (University of Colorado Medical School) and Dr. F. W. McLafferty (Cornell University).

As in the past, attention to new biomedical research problems will lead to increased capabilities in the computer programs. We require close communication with the people engaged in the research so that the programs actually assist the researcher while increasing in power. Collaborative proposals have come out of such past DENDRAL sponsored work, for example, large portions of the GRC proposal and a proposal for 13CMR research.

We envision the interaction and collaboration with the user community to involve the following:

a) In all cases, we plan close cooperation with the users in all aspects of the problem. Although the basic isolation procedures are the problem of each investigator, his knowledge of the available facilities and their limitations can be an important aid to sample preparation and analysis of the results. This is particularly true for collaborators who are unfamiliar with the techniques of HRMS (e.g., sample size and resolving power necessary to separate the mass doublets that can be realistically expected in different contexts).

b) The needs of the user community will be varied. Drs. Duffield and Smith will, in collaboration with the users, determine the kinds of MS experiments which will be most useful, considering sample complexity, stability, quantity, and so forth. We wish to utilize fully the existing resource and our proposed extensions, bringing to bear on a problem any technique which is appropriate and can be provided.
This will include the full scope of available experimental techniques in MS (LRMS, HRMS, GC/LRMS, GC/HRMS, metastable defocussing, ultra-high resolution mass measurements) and available computer programs (see below).

c) Many problems will be amenable to treatment by computer programs which exist or which will be developed, for example, structural isomer problems or HRMS interpretation on compounds in a well-understood class. We will take the responsibility for utilizing these programs where appropriate to assist in structure elucidation problems. We will instruct members of the community in use of the programs when programs are used routinely by collaborators.

III. METHODS

Molecular structure elucidation entails the intelligent and patient application of a large body of knowledge to each specific problem. The importance and relative difficulty of the problem impel us to seek the powerful assistance of computer programs to help chemists in their analyses. It is unlikely that such programs will ever replace chemists, especially because computer programs are readily written only to focus on rather narrow aspects of problems. Nevertheless, our past research is reasonably forwarded as a demonstration of the computer's ability to assist in practical biomolecular characterization, although this was a spinoff from theoretically oriented research.

In order to meet the major objectives of this proposal we will focus our attention primarily on structure elucidation of biomedically important compounds through MS and AI. However, many of the computer programs can already use information from other analytical techniques. So we want to be able to think of structure elucidation in the context of an ensemble of analytical capabilities.

A.
Enhancing the Power of the Mass Spectrometry Resource

We have developed a significant resource consisting of instrumentation (the Varian MAT-711 and ancillary equipment) and computer programs for instrument evaluation, data acquisition and reduction. Routine reduction of high resolution mass spectra to elemental compositions and ion abundances without human intervention provides the capability for efficient handling of large volumes of high resolution mass spectra (such as will result from GC/HRMS runs). The development of the GC and of the GC/MS combination is in the excellent hands of Ms. Annemarie Wegmann, who is responsible for operation of the complete system. We now have more than two years of operational experience with the MS, the GC and related equipment under a wide variety of experimental conditions.

None of the resource-related research discussed in this proposal can be carried out without significant quantities of mass spectral data. The existence and extensions of the MS resource, the development of computer techniques and the applications to biomedical problems demand an efficient mechanism for acquisition and reduction of MS data, and eventual transmission of the data to the SUMEX resource. Thus, operation of the MS requires substantial computer support to deal with the large volumes of data produced by the system at high data rates. We feel that a properly configured system of hardware and software should provide, at a minimum, the following capabilities:

1) Detailed evaluation of the condition and performance of the MS prior to recording data on valuable samples, with feedback to the operator.

2) A coordinated system of hardware and software for signal conditioning, peak detection and peak analysis.

3) Data reduction techniques based on a computed (not theoretical) model of the MS, including peak shapes, mass/time function, and resolving power as a function of mass.

4) Peak profile analysis for multiplet detection and resolution.
5) Computer control of scan rates, clock rates and optimum analog and digital filtering parameters.

6) Some on-line feedback, to the operator, assessing the performance of the system during an experiment.

7) The system must deal with frequent or repetitive HRMS scans, requiring the capability for rapid storage and analysis of large volumes of data.

Previous support of our research by the NIH and NASA has given us a firm foundation of programs and experience. We have, up until the termination of the ACME computer facility (July 31, 1973), demonstrated capabilities 1-5 above. We were precluded from pursuing capabilities 6 and 7 due to the configuration of the ACME facility.

The demise of the ACME computer facility and the subsequent incorporation of the PL/ACME language into a new IBM 370/158 facility under Stanford auspices has forced a reevaluation of the means for providing HRMS laboratory computing support. We had previously depended exclusively on ACME for data reduction processing. The ACME transition poses both technical and fiscal decisions in that the real-time support capabilities of the new facility will be different from ACME's and the fee-for-service basis of the facility requires an explicit budget allocation for its use. Previously we had received ACME computing support without charge as part of the core research effort.

Since we were thus required to revise our computing plans, we have explored a number of options for near-term as well as longer term solutions. As outlined in the attached annual report, we have chosen an interim approach (through the end of the current grant, 4/30/74) which minimizes near-term costs, including hardware and software conversions as well as operating expenses. This approach entails connecting the MAT-711 spectrometer to the 370/158 computer through an IBM 1800 interface.
It allows use of the existing PL/ACME programs but will have real-time response limitations at least as severe as ACME had (which is inadequate for either GC/LRMS or GC/HRMS). Our existing computing budget provides for only a very low level of instrument utilization in this mode.

For a longer term solution these constraints are unacceptable. Current estimates are that continued use of the inadequate 1800-370/158 connection and PL/ACME interactive programs under full instrument productivity would cost up to $4,100 per month. Three alternatives have been investigated for improving technical performance and reducing cost. This review has resulted in our current proposal to augment the existing mini-computer system (PDP-11/20) with local storage and arithmetic capabilities. This stand-alone system would not support real-time, on-line data reduction but would allow routine data acquisition and instrument performance evaluation, followed by off-line data reduction.

Alternatives considered include:

1) Modified 370/158 Connection

We discussed with personnel in the Stanford Center for Information Processing (SCIP) various approaches for improving 370/158 service. Detailed planning is still under way within SCIP in regard to real-time support and future pricing policies. Thus the following conclusions are tentative.

It appears that the long-term cost would be prohibitive to continue real-time data acquisition by the 370/158. Instead, a store-and-forward system was proposed. This would entail an augmentation of the existing PDP-11/20 front-end mini-computer with memory, disk, tape, and a new interface to the 370/158, totaling about $28,000. This approach is workable, if limited near real-time instrument performance evaluations could be made to assure satisfactory instrument setup and data acquisition.
It was recommended that the existing software be converted from PL/ACME to a more efficient language (such as FORTRAN) to reduce operating costs. This would require approximately 4-6 man-months of effort. The resulting decrease in operating costs could not be estimated in time for this proposal because the new SCIP pricing policies are not formulated and inadequate 370/158 system analysis tools are operational to evaluate our benchmarks in terms of detailed resource consumption. We have therefore budgeted an approach based on the remaining two options with the understanding that we will reconsider the SCIP option before proceeding with an implementation should this proposal be funded.

2) SUMEX

The recently approved AIM-SUMEX PDP-10 facility will provide necessary computing support for the development and use of DENDRAL AI programs. The MS laboratory produces data which these programs analyze and thus has a close relationship to the AI research. The SUMEX computer could help in the off-line reduction of instrument data, particularly during the early stages of the project when the machine load will be relatively light (20-25%). The present programs would require conversion from PL/ACME as in option (1), which would take 4-6 man-months. Such computing may use from 15-30 minutes of CPU time per day, depending on the amount of GC/MS work. While this approach saves operating computing costs, the front-end PDP-11/20 would require augmentation as in option (1) ($28,000) to allow store-and-forward operation with subsequent off-line data reduction on SUMEX. This is needed because SUMEX is not configured to allow real-time acquisition of the volume of data anticipated.

This approach, while the least costly, would entail a measurable use of the PDP-10 resources which we feel are better reserved for the intended AIM-SUMEX applications.
In addition, because of the priorities anticipated for allocation of SUMEX to AI research, particularly as loading increases, scheduling may be required which will constrain the MS laboratory operation. For these reasons, we feel a better, even though slightly more expensive, approach is a stand-alone PDP-11/20 data reduction system.

3) Stand-Alone PDP-11/20 System

The augmentations of the existing front-end PDP-11/20 required for store-and-forward operation in conjunction with the 370/158 or with SUMEX come close to meeting the needs of a stand-alone data system. In addition to the memory, disk and tape, an augmented arithmetic capability is needed to allow rapid floating point calculations. A special device for this purpose costs about $7,500. The SUMEX interface can be less sophisticated in this case, however, accounting for the much lower data volume after reduction, so that the total cost of the stand-alone system would be $34,900. As with the other options, conversion of the present programs would be required.

This approach, while slightly more expensive, has the advantages of off-loading all data logging and reduction functions from SUMEX and affords an adequate capacity for non-real-time, stand-alone data reduction on the PDP-11/20. It furthermore allows more freedom and responsiveness in the operation of the MS laboratory since data collection or reduction can be scheduled without worrying about the impact on AIM-SUMEX users. We therefore propose and have budgeted an augmentation of our existing mini-computer system as a stand-alone data reduction facility.

Members of the biomedical community (see User Community, Sec. I.B.4 above) desiring access to our facilities for structure elucidation have a variety of problems, some of which can be solved by existing instrumentation and computer techniques, as noted above.
However, many problems consist of complex mixtures of compounds where analysis by conventional GC/LRMS does not lead to unambiguous solutions, and separation of components on a preparative scale for other spectroscopic analysis is difficult (e.g., see marine sterols, section D, below). These problems are amenable to attack by a system comprised of a GC/HRMS combination, the GC providing separation, coupled with the MS operating at high resolution to provide elemental compositions. Thus, upgrading of our current system so that GC/HRMS data can be provided on a routine basis is a desirable, and we believe necessary, step to solve many of these problems.

We propose to continue the development of the GC/HRMS system while maintaining existing capabilities of routine HRMS analysis and GC/MS where this efficiently responds to local needs. Many members of the user community will require, in addition to GC/HRMS, HRMS analysis of relatively pure compounds or mixtures of small numbers of compounds. We will provide this capability on an interim basis, using Stanford's IBM 370/158 system while the PDP-11/20 system is being upgraded.

We were able, using the ACME computer facility, to start evaluating the operation of a GC/MS system at high mass resolutions. These experiments were hampered somewhat by the limitations of the computer system used to acquire the data (only occasional, single scans were possible); they were necessarily discontinued (as well as all HRMS operation!) upon the termination of ACME. We do have, however, some benchmark figures for the performance of the proposed system. Mixtures of fatty acid esters (e.g., methyl palmitate and methyl stearate) gave good quality mass measurements (+-10 ppm) over a dynamic range of 100:1 for sample sizes of the order of 0.5-1.0 micrograms/component during 190 sec/decade in mass scans (resolving powers 5,000-8,900).
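The way a mass measurement of about +-10 ppm narrows the possible elemental compositions can be sketched as follows. This is a brute-force illustration only, not the project's data reduction program: the atomic masses are standard monoisotopic values, and the heteroatom limits, the loose hydrogen ceiling (H =< 2C + N + 2), the neglect of the electron mass and the function names are assumptions made for the example.

```python
# Monoisotopic masses (12C = 12 exactly); standard values, not from the proposal.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def exact_mass(c, h, n, o):
    return c * MASS["C"] + h * MASS["H"] + n * MASS["N"] + o * MASS["O"]


def ppm_error(measured, exact):
    """Signed measurement error in parts per million."""
    return (measured - exact) / exact * 1e6


def candidate_compositions(measured, tol_ppm=10.0, max_n=4, max_o=4):
    """List (C, H, N, O) counts whose exact mass is within tol_ppm."""
    hits = []
    for c in range(int(measured // MASS["C"]) + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                rest = measured - exact_mass(c, 0, n, o)
                h = round(rest / MASS["H"])  # hydrogens fill the remainder
                if h < 0 or h > 2 * c + n + 2:
                    continue  # negative or chemically implausible H count
                if abs(ppm_error(measured, exact_mass(c, h, n, o))) <= tol_ppm:
                    hits.append((c, h, n, o))
    return hits


# Hypothetical measurement near the molecular ion of methyl palmitate, C17H34O2.
measured = 270.2559
candidates = candidate_compositions(measured)
for comp in candidates:
    print(comp, round(ppm_error(measured, exact_mass(*comp)), 2), "ppm")
```

Tightening `tol_ppm` or lowering the heteroatom limits shrinks the candidate list, which is the sense in which chemical context and mass accuracy work together.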
We are haltingly continuing our evaluation of the GC/HRMS system even without a data system, making measurements on individual ions of the mass standard and known materials in the GC effluent. These data can be approximately translated into expectations during dynamic scanning. We have performed an extensive series of measurements on both methyl stearate and cholesterol (not derivatized), the latter compound being more representative of our current research problems. These measurements tend to confirm the preliminary data described above. Firmer data will be available subsequent to the submission of this proposal.

We propose to operate our existing GC/MS system under high resolution conditions aiming toward optimization of resolving powers, scan rates and GC and molecular separator operating conditions to determine the maximum usable sensitivity of the system. We recognize that the ultimate sensitivity will not approach that attainable by photographic methods of recording; we feel that the ability for on-line operation and evaluation of the operating conditions of the MS partially offsets the sensitivity disadvantages. We realize that some structure elucidation problems will not be amenable to study because of the sensitivity limitations; we feel, however, that many problems of interest to the User Community can be studied effectively with this performance capability.

Rather than propose a research program to increase the sensitivity of high resolution mass spectrometers (e.g., McLafferty, et al., Anal. Chem., 44, 2282 (1972), dynamic rescanning of peaks; Jet Propulsion Laboratory - chemical multiplier emission/detector arrays, private communication to T. Rindfleisch), we propose to identify our limitations and, with our collaborators, use discretion in selecting and preparing samples.
Further accelerations of technical capability to meet the state of the art in sensitivity will require investments in hardware that can be better justified at a later stage of a successful facility program. Meanwhile, other laboratories can be expected to make significant contributions to this important problem. Practical regard for budget limitations is the main reason we do not press this issue ourselves at the present time.

Significant improvements in sensitivity (with only small decreases in mass measurement accuracy) can be achieved by operating the MS at reduced resolving powers coupled with intelligent analysis of the resulting data to detect and resolve the potentially greater number of overlapping peak envelopes. This proposal is not entirely new (e.g., see Smith, et al., Anal. Chem., 43, 1796 (1971); Burlingame, et al., in "Computers in Analytical Chemistry," C.R. Orr and J.A. Norris, Ed., Progress in Analytical Chemistry, Vol. 4, Plenum Press, New York, N.Y., 1970, Chap. III). We can, however, significantly extend these earlier techniques by utilization of our multiplet resolution algorithm. This algorithm, embodied in a computer program, has been shown to increase the effective resolving power of the MS by up to a factor of three. It bases its operation on a dynamic model of peak shape computed directly from the data. For computational efficiency and to avoid spurious information, this algorithm would be best implemented as a post-processor, basing its search for multiplets on the results of prior elemental composition determination.

The ability to detect and analyze for unresolved peaks is mediated by consideration of the mass measurement accuracy of an MS system. These systems are capable of determining peak positions (and thus masses) to a small fraction of the peak width.
The high accuracy of such measurements (+- 2-10 ppm) can, in fact, be utilized to detect and "resolve" multiplets in instances where the unresolved species are known precisely (see Burlingame, et al., ref. above, for CH vs. 13C doublet detection and resolution). For instances where the heteroatom content of a molecule is known or where the possibilities are reduced severely by chemical, spectroscopic and mass measurement heuristics, there may be a range of possible overlapping ions resulting from fragmentation of the molecule. These potential overlaps may be computed and then used (in combination with the known resolving power and mass measurement accuracy of the MS and the measured mass of the peak, assuming it was comprised of only one type of ion) to direct the multiplet resolution program.

As an example, we have computed the possible mass doublets for various ranges of compositions (Lederberg, et al., to be published). A sample table for C, N, O =< 4 is appended (Table 1). Only 28 of the 364 possibilities are shown, namely those whose mass difference (e) < .05 mass units. Of these 28, 13 show e > .03 and would be fairly easy to resolve, requiring 1/5000 resolution at MW=150. At the other extreme, 5 doublets show e < .01 (CN4 vs. H4O4; C2H2O vs. N3; C2N2 vs. H4O3; C3N vs. H2O3; and C4 vs. H2NO2) which would demand special treatment for resolution. The 10 doublets for which .01 =< e =< .03 pose the interesting challenges for tradeoff of resolution vs. sensitivity in the context of given problems. For example, if N is absent, the only ambiguities are C3 vs. H4O2 (e = -.02) and C4 vs. O3 (e = .015).

Much as we would wish always to have unambiguous empirical formulas for all ions, HRMS remains a valuable tool despite these limitations. As shown by these examples, even moderate resolution reduces the number of candidates to a manageably small number of alternatives.
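The doublet arithmetic above can be reproduced with a short calculation. The sketch below rests on one reading of the text, stated here as an assumption: a doublet is a pair of ions of equal nominal mass whose C, N and O counts each differ by at most 4, with hydrogens making up the nominal mass, which gives (9^3 - 1)/2 = 364 distinct pairs. The atomic masses are standard monoisotopic values and the function names are illustrative only.

```python
from itertools import product

# Monoisotopic masses (12C = 12 exactly); standard values, not from the proposal.
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def doublet_difference(d_c, d_n, d_o):
    """Mass difference between two ions of equal nominal mass whose
    C, N, O counts differ by (d_c, d_n, d_o); H balances the nominal mass."""
    d_h = -(12 * d_c + 14 * d_n + 16 * d_o)
    return (d_c * MASS["C"] + d_n * MASS["N"]
            + d_o * MASS["O"] + d_h * MASS["H"])


def enumerate_doublets(limit=4):
    """Return [((d_c, d_n, d_o), |e|)], one entry per +/- pair of vectors."""
    pairs = []
    for d in product(range(-limit, limit + 1), repeat=3):
        if d > (0, 0, 0):  # keeps one of each (d, -d) pair, drops the zero vector
            pairs.append((d, abs(doublet_difference(*d))))
    return pairs


def required_resolving_power(mass, delta_m):
    """Resolving power M / delta M needed to separate a doublet at mass M."""
    return mass / delta_m


doublets = enumerate_doublets()
closest = sorted((e, d) for d, e in doublets if e < 0.01)

print(len(doublets), "possible doublets")
print("CH2 vs N: e =", round(abs(doublet_difference(1, -1, 0)), 4))
for e, d in closest:
    print(d, round(e, 4))
print("R for e = .03 at mass 150:", round(required_resolving_power(150, 0.03)))
```

Under these assumptions the enumeration yields the 364 pairs and the five doublets with e < .01 quoted in the text; counts at the other thresholds can shift by one or so depending on rounding at the .03 and .05 boundaries.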
Contextual and internal data (within the spectrum) can be used to trim these further at two levels: (a) pooling of peak statistics to sharpen decision probabilities on the presence of heteroatoms -- the fragments are subsets of the molecule and (b) the assemblage of candidate solutions under each of the alternative formulae. Manifestly, computer processing can sort out branches of decision trees that would soon exhaust human patience. These heuristics are built into the DENDRAL programs (solutions based on fragmentation theory), but are also applicable to table look-up approaches.

We (ref. 29,33), and others (e.g., H.-K. Wipf, et al., J. Amer. Chem. Soc., 95, 3369 (1973)) have illustrated the importance of metastable ion determinations in automated structure elucidation based on MS data. Data on metastable ions must be judiciously selected because of the time and sample normally required to perform the measurements. Our programs are now capable of precise specification of those experiments necessary and sufficient to distinguish among a set of candidate structures. We seek more efficient ways of acquiring these selected instrumental data. This can be accomplished with minimal cost by developing the hardware and software necessary to perform (defocussed) metastable scans and calculate the data. Much of the hardware, except an accurate sensor for accelerating voltage, already exists. We have had considerable experience in peak detection on the software side; the calculations to determine transitions are simple. It is assumed that the operator would manually adjust the instrument to the desired "daughter" mass prior to initiation of the scan of metastable origins ("parents") of this daughter.

The recent availability of reversed-geometry instruments has provided new methods of metastable defocussing (e.g., Beynon, et al., Anal. Chem., 45 (12), 1023A (1973)). We have illustrated the power of these techniques in mixture analysis (ref. 69).
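The simple calculation referred to above for a defocussed (accelerating voltage) scan can be sketched as follows. The sketch assumes the usual relation for this type of scan at fixed electrostatic and magnetic fields: with the instrument set to the daughter mass m2 at normal accelerating voltage V0, a metastable peak appearing at voltage V1 indicates a parent of mass m1 = m2 * V1 / V0. The function names and the voltages in the example are hypothetical.

```python
def parent_mass(daughter_mass, v_normal, v_defocussed):
    """Parent mass for a metastable peak seen at v_defocussed while the
    instrument is tuned to daughter_mass at v_normal (m1 = m2 * V1 / V0)."""
    if v_normal <= 0:
        raise ValueError("normal accelerating voltage must be positive")
    return daughter_mass * v_defocussed / v_normal


def metastable_parents(daughter_mass, v_normal, peak_voltages):
    """Convert a list of detected peak voltages into candidate parent masses."""
    return [round(parent_mass(daughter_mass, v_normal, v), 1)
            for v in peak_voltages]


# Hypothetical scan: daughter at m/z 100 with V0 = 4000 V, peaks found at
# 4600 V and 6000 V during the voltage scan.
parents = metastable_parents(100.0, 4000.0, [4600.0, 6000.0])
print(parents)
```

Because the parent mass depends directly on the ratio V1 / V0, the accuracy of the accelerating voltage sensor mentioned in the text sets the accuracy of the assigned transition.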
No "normal" geometry instrument is equipped to perform these measurements to determine all the daughters of a given parent, information which is frequently more useful than the converse. This information can be obtained, in principle, by synchronous variation of two of the three fields (magnetic/accelerating/electrostatic deflection) in a very accurate way. We would like to explore this possibility because we feel that this technique, if feasible, would represent a significant upgrading of the many standard geometry, double-focussing instruments in existence.

B. Computer Assisted Structure Elucidation

As mentioned above, some existing programs can be used immediately for structure elucidation problems using MS data. The programs have been described in detail elsewhere and are mentioned in the section on existing capabilities (Sec. I.B.E, above). The Planner's performance, for example, is excellent precisely in the areas where MS, by itself, is capable of definitive structure analysis. The general intellectual flexibility of the human chemist is beyond the reach of plausible programs. On the other hand, where the history of a sample is known, so as to restrict the potential classes of compounds, and for classes where the rules of MS fragmentation are well understood, the program's performance matches that of trained mass spectroscopists; the program also offers some advantages in its exhaustive and rapid analysis of the data. Many structure elucidation problems of the user community fit into this category and existing resources can fulfill these needs.

Whether man- or computer-implemented, MS cannot solve all structure elucidation problems, however. In such cases, recourse is to other spectroscopic techniques if sample size permits. As described in the introductory section, diverse information is pieced together to achieve a solution.
Interactive computer programs can assist in segments of this procedure, with the advantages of exhaustive evaluation of the data and the molecular structures suggested by these data.

In our own and in planned collaborative work, we will call upon the extensive facilities of the chemistry department for acquisition of additional spectroscopic data. These services are financed by fees, paid from existing research grants of the user community. There are sufficient documented examples of structure elucidation problems to obviate the requirement for extensive use of these additional facilities in development of the programs.

On the other hand, the intensive pursuit of mechanized "intelligence" in the domain of MS requires more than availability of public MS data. It requires the collaboration of skilled chemists actively engaged in practical MS research and, at the same time, committed to the exploration of innovations in the application of AI to the solution of the problems. As in the past, we will develop the computer programs through close collaboration among Drs. Duffield and Smith (and other members of their groups) and the program designers and programmers. For us, this means daily consultation for discussion of strategy, extensions to the program, and solutions to new problems.

In particular, we propose to continue software development (on the AIM-SUMEX facility) as follows:

1) The recently completed structure generating algorithm will be the core of our efforts to assist in structure elucidation. The structure generator can guarantee that the correct solution is somewhere in the list of possibilities. Additional programs, such as the Planner, allow us to avoid exhaustive generation in practice. Some parts of the cyclic structure generator program have not been extensively tested yet, and these tests will be the first task to complete.

2)
The structure elucidation task is strongly directed toward rejection of whole categories (e.g., compound classes) of solutions as quickly as possible by using as much knowledge about the chemical history or characteristics of a sample as is available. Details of spectroscopic data then define the molecular framework more precisely. Each step in this procedure represents the application of constraints on the set of possible solutions. Computational efficiency demands that these constraints be applied early in the generation process when the structure generator is utilized. We have made some effort to examine the kinds of constraints used by scientists engaged in structure elucidation. We have begun designing strategies so that these constraints can be brought to bear on the structure generator. Some of these strategies involve minor changes to the existing program; others require significant extensions of existing generating functions. One approach which seems particularly attractive to us is presently under development. This approach will utilize the existing structure generator, with some modifications, to generate a dictionary of cyclic skeletons up to those containing a maximum of twelve tertiary vertices. The dictionary will be a complete, irredundant list of ring systems which contain no multiple bonds and no cut-edges (acyclic parts). This dictionary will be organized and keyed so that many constraints can be implemented easily. The dictionary will allow exhaustive specification of ring systems with double bonds and/or aromaticity. The rings themselves can be labelled with heteroatoms to generate heterocyclic ring systems, or with acyclic radicals to generate substituted ring systems. The existence of the dictionary will lead to greater computational efficiency, as it needs to be generated only once, and specific configurations of rings (numbers, sizes, fusions) can be pulled immediately from the dictionary.
We propose to continue these investigations so that a reasonable variety of constraints can be recognized and utilized effectively by a computer program. This represents the first step toward increasing the chemical knowledge of a program which views molecular structures and their manipulation as mathematical entities and transforms.

3) At present, effective use of the structure generator or its subroutines for special problems requires a detailed knowledge of the program. We propose to develop an interface between users and the program to remove this requirement. The interface would contain elements of structure input and display routines and a simple language for application of constraints. Portions of these elements are available from other workers (e.g., Richard Feldman, NIH) and we would draw on these sources whenever possible.

4) We propose that initial efforts will be directed toward a system where the scientist examines his own data and inputs his findings (in terms of allowed and disallowed structural features) to the program as constraints. The generator would then provide a list of possible solutions to be evaluated, followed by iteration on this procedure.

5) Many structure elucidation problems can be characterized as assembly of sub-structures inferred from spectroscopic data into complete molecular structures. Although there are two instances in the literature describing programs with the capability to solve this problem (see S. Sasaki, "Determination of Organic Structures by Physical Methods, Vol. 5," F.C. Nachod and J.J. Zuckerman, Ed., Academic Press, New York and London, 1973, p. 285; M.E. Munk, C.S. Sodano, R.L. McLean, and T.H. Haskell, J. Amer. Chem. Soc., 89, 4158 (1967)), we do not feel these approaches fulfill the requirements for generating complete lists of structures and avoiding duplicate structures.
We have some strategies to solve this problem, thus extending the scope of the generator while tying it more closely to the methods used by chemists engaged in structure elucidation. Our existing structure generator has this capability as long as the sub-structures are connected only by a single bond, so that no new rings are formed.

6) We wish to implement general routines for finding molecular ions from spectroscopic data in order to improve the general power of the Planning program. The current Planning program depends on having some metastable ion information with HRMS data, together with knowledge of the structural class with special rules for the class. We will incorporate strategies suggested by Biemann (K. Biemann and W.J. McMurray, Tet. Lett., 647 (1965)) and McLafferty (R. Venkataraghavan, F. W. McLafferty, and G. E. Van Lear, Org. Mass Spectrom., 2, 1 (1969)) for finding molecular ions, but also give the program the flexibility to use class-specific information when available. The procedure will be to use these kinds of information within a general heuristic search paradigm.

7) The section on aims indicated some longer-term directions which might be pursued. Of these, we feel that the incorporation of three-dimensional information into the program is perhaps most important (e.g., representation of three dimensional information, molecular modelling including steric factors). Lederberg has previously discussed ways (Ref. 1) in which three dimensional information can be considered in the generation and representation of molecular structures. More recently, the work of Wipke (J. Amer. Chem. Soc., in press; personal communication) in connection with computer assisted organic synthesis has provided important results which we would attempt to utilize to avoid unnecessary duplication of effort. We plan to collaborate with Dr. G. Loew (Stanford Genetics Dept.)
to utilize her available programs on molecular orbital methods to determine local minima for conformations.

Another longer term goal which we feel is both interesting and important is the use of an extended Predictor (which we have previously described in the context of MS) to assist in distinguishing among potential solutions to a structure elucidation problem. We have recently carried out some extensions to the existing Predictor by incorporating the ability to suggest metastable defocussing experiments. Further extensions to include knowledge about other spectroscopic techniques and the information which can be elicited from these techniques are clearly feasible and could be a powerful extension to our computer assistance efforts.

C. Theory Formation

One important aim of this project is to improve the existing theory formation capabilities and thus provide more assistance to scientists investigating regularities within classes of compounds. This is a theory formation task at a very pragmatic level. The MS theory that the program attempts to find is of the same form as the one practicing mass spectroscopists use for structure elucidation. Thus, resulting pieces of theory are extensions to both the scientists' theory and the computer's theory of the discipline.

To improve this program we need to complete the Plan-Generate-Test program that has been started (as described in the appended annual report) and tune it over many test cases. We also wish to make the programs interactive and easy to use so that they are more readily accessible. This can be done when the programs are transferred to the AIM-SUMEX facility.

We plan to apply the theory formation program to two different kinds of data: (a) the data collected in the interest of understanding the mass spectrometry of a particular class of compounds, as was done for estrogenic steroids, and (b) collections of diverse data that may provide some insight into more general fragmentation mechanisms.
For example, we hope to find general rules analogous to the alpha-cleavage rule or the stability of aromatic rings.

The INTSUM program mentioned in Section (1) is the planning phase of the theory formation program. It currently runs in batch mode on Stanford's 360/67 computer. We wish to add an interactive monitor to INTSUM to give an investigator the ability to set up his own conditions for interpreting the mass spectra and to control the type of summary he wishes to see. For example, if he is interested in the allowable hydrogen transfers associated with one specific process, the program could be instructed to produce a very specific summary. Also, we wish to add an interactive program for answering questions about the results. For example, an investigator should be able to find out easily how many processes involve cleavage of a specific bond and how strong their resulting MS peaks are.

The INTSUM program is now used routinely by mass spectroscopists at Stanford engaged in investigations of the mass spectrometric fragmentation of various classes of organic compounds, primarily steroids. A manuscript is now in preparation (Ref. 54) describing the fragmentation of progesterone and related compounds. The program was used extensively in this work. We are now beginning a detailed examination of the fragmentation of steroids related to the androstane skeleton, particularly the biologically important testosterones. We propose to continue to use the INTSUM program in its present form and as it is improved in support of these studies.

The generator of rules that we now have, RULEGEN, does a credible job of explaining the regularities summarized by INTSUM. It has found, for example, the well-known alpha-cleavage fragmentation process and beta cleavage followed by rearrangement in the low resolution data for fifteen aliphatic amines. The program will be extended
in two important ways to increase its utility: (i) the program needs to be able to work with an increased number of descriptive predicates in the generation of rules, and (ii) it needs to be given a more flexible representation of complex fragmentation mechanisms so that it can more easily find rules involving more than two bonds.

We will continue working with low resolution MS data of the 150-200 monofunctional aliphatic compounds studied previously in the context of the performance program. These compounds are well-understood and thus provide a good test of the program's effectiveness. In order to insure generality in the theory formation programs, we will also test the system against the high resolution mass spectra of the 68 estrogenic steroids. Since they are also well-understood, these compounds will show how well the program can deal with complex ring systems, multifunctional compounds, cleavages involving more than two bonds, and high resolution data.

The existing programs are in good working order - within definite limits - so we expect to apply them to new sets of data from the MS laboratory as interest arises. For example, as the high and low resolution MS from marine sterols are collected we expect to use INTSUM and RULEGEN (at least) to assist in the interpretation and generalization of these data. Since these problems will advance the state of knowledge of MS, it is not correct to look on them as test problems. However, in the past the programs developed most rapidly when they were applied to unsolved problems of interest to our colleagues in the chemistry department.

For development of the interactive programs, we will rely heavily on the criteria of acceptability by Stanford users. The programs themselves will be written in INTERLISP on the SUMEX computer. Initially, we will provide interactive access to the control parameters of the programs in order to allow users to tailor their runs to their immediate interests.
Later we hope to expand these to allow interrogation of the programs with respect to both contents of the results and the program's reasoning steps.

D. Applications to Biomedical Problems

We can immediately offer to the user community the Planner, for analysis of HRMS in terms of molecular structure. The program is insensitive to the source of the MS data, and we foresee significant use of the program for analysis of spectra of mixtures without prior separation and spectra from the GC/HRMS facility without additional programming effort. Examples of applications areas are summarized below.

We wish to exploit our existing capabilities of the analysis of biological mixtures without prior separation (ref. 33). This approach will prove particularly useful in studies of mixtures which are difficult to separate and analyze by GC. Phytoecdysones related to ecdysone, an insect molting hormone, present such a problem. GC of these compounds is very difficult, although high-pressure liquid chromatography has recently been used to carry out separations. This class of compounds represents an interesting and valuable test case for our combined MS and computer techniques, particularly the specification and subsequent acquisition of metastable defocussing data for precise linking of parent and fragment ions in the spectrum of a complex mixture (refs. 28, 33). Model compounds, mixtures and current structure elucidation problems are available (Nakanishi, Columbia; Takemoto, Tohoku University, Sendai, Japan). Although most users cannot be completely specific as to the nature of their future structure elucidation problems, we feel that several of these problems can be handled by such an approach.

As the structure generator and its extensions are developed further, we foresee continuing use of an interactive version applied to specific problems of the user community.
As an example, the work in collaboration with the GRC project will involve studies of several classes of compounds extracted from human body fluids (e.g., aromatic and aliphatic acids, various classes of bases, amino acids and carbohydrates) which contain representatives varying by substitutions about a small number of molecular skeletons. The generator can define all isomers which must be considered as possible solutions.

For those problems which are amenable to attack by library search procedures, e.g., screening of GC/LRMS runs of marine sterols to weed out known compounds, we propose to use these procedures and to investigate extensions to them. Using a procedure related to that described by McLafferty (K.-S. Kwok, et al., J. Amer. Chem. Soc., 95, 4185 (1973)), we seek to determine from modified library search techniques the known structures which yield similar spectra. Utilizing the DENDRAL structural manipulation routines, we would then seek to determine those related structures (whose spectra are not in the library) which are possible solutions. A library, including Wiswesser Line Notation names, exists (F. W. McLafferty, private communication) and would be of some utility in this work.

The MS facility in conjunction with our programs will be used in studies of the following nature:

1) Prof. Djerassi - we plan use of the MS facilities and computer programs in ongoing research connected with existing NIH-supported studies on steroids and marine sterols and continued collaboration with Prof. Adlercreutz on estrogen mixtures isolated from body fluids. Further collaboration with Prof. Adlercreutz will be on structural studies of new estrogen metabolites whose presence in mixtures has been inferred through our previous collaborative efforts. The work on marine sterols presently utilizes GC/LRMS and frequently laborious separation procedures to isolate individual fractions for HRMS analysis. GC/HRMS will be a significant assistance in this effort.
We plan MS studies of known marine sterols (utilizing INTSUM) to derive fragmentation rules, which then will be used in the Planner to aid structure elucidation of new compounds. We also plan further work on extensions of MS theory in the steroid field, initially focussed on additional biomedically important classes of steroids related to the pregnane (progesterones) and androstane (testosterones) skeletons. This work is currently being carried out by Dr. Smith in collaboration with two visiting senior scientists (Dr. Roy Gritter, Dr. Geoff Dromey) currently on sabbatical leave fellowships.

2) Chemistry Department Collaborators - as indicated by the responses summarized in the letters of interest (Appendix A), there is significant interest in use of the MS facility by other NIH-supported members of the chemistry department. All those listed are familiar with the technique of MS as applied to structure elucidation problems. Most have used MS frequently, particularly Prof. Van Tamelen in his studies of the cyclization of squalene and related studies in the terpenoid and steroid field. The interests of these collaborators are generally in HRMS and GC/HRMS, with occasional use of other capabilities of the system. The types of compounds studied by this group and an indication of the amount of use expected are summarized in the letters of interest.

3) Genetics Research Center (GRC): (Profs. J. Lederberg, H. Cann; Dr. A. Duffield) The body fluids analyzed by GC/LRMS to date include urine, blood, amniotic fluid and cerebrospinal fluid. Each body fluid is fractionated into the following compound classes:

a) organic acids and neutral compounds
b) amino acids
c) carbohydrates

which after appropriate derivatization are analyzed by the GC/LRMS/computer system. A library of known LRMS will serve as the primary means of identifying metabolites from their experimentally recorded LRMS.
In those instances where the LRMS is insufficient for metabolite identification, GC/HRMS data will be necessary to determine the composition of all ions in its mass spectrum. These data will greatly enhance the prospects of identifying the metabolite in question.

It is known (on past performance) that if a compound is present in body fluids at the level of 1 microgram per GC peak then good quality HRMS will be recorded (ion amplitude dynamic range of 1:100, mass accuracy of +-5 ppm) using the Varian MAT 711 mass spectrometer. If the GC peak of interest contains insufficient material for a HRMS scan then preparative GC could be used to concentrate that portion of the chromatograph effluent prior to GC/HRMS.

Prior to the demise of the ACME computer system (July 31, 1973) we developed a GC/HRMS system and applied it to the analysis of extracts from body fluids. The following example represents results obtained with this system during its development. The example used was a routine analysis and was run to determine the capability of the overall system during its development and not as an unknown sample of extreme interest.

The total ion plot recorded during the lifetime of the GC/HRMS analysis of an amniotic fluid is reproduced as Figure 1. A complete high resolution scan was recorded on each of the peaks shown in Figure 1. Filing time of the time-shared ACME computer system did not allow the system to operate in a repetitive scan mode. For the sake of brevity only the GC/HRMS scan (# 1594, Figure 2) corresponding to glutamic acid N-TFA O-n-Butyl ester derivative is produced. (The corresponding GC/LRMS scan is Figure 3.) The scan time per decade of mass was 10.5 seconds, the resolution 6,500 and the matching tolerance for the assignment of empirical composition set to 4 mmu. The results show that the system was capable of accurate mass measurement with a dynamic range in ion amplitude of about 33:1 in this instance.
The cessation of computer support for the GC/HRMS system did not allow a HRMS analysis to be made which was crucial to the identification of a metabolite present in a body fluid. Since that time, however, several instances have arisen where GC/HRMS data would have been collected in an effort to identify metabolites not previously seen.

The sample throughput in the GRC project with existing personnel is expected to approach 5 to 7 body fluids per week (15-21 GC/LRMS fractions to be run in the Genetics Department per week). On average GC/HRMS would be required on 1 - 2 samples per week.

The research interests of the Medical School collaborators relative to the proposed MS resource are summarized in the letters of interest (Appendix A). The MS services required by this community will include GC/LRMS (Forrest, Sera, Kalman for drug and drug metabolite identification; Rabinowitz and Wilkinson for prostaglandin identification; Robin for identification of oxidized/reduced redox pairs; Hollister for Marihuana metabolites; Barchas, neurotransmitters; Fair, polyamines and the prostatic antibacterial factor in urine); GC/HRMS (Trudell, drug metabolite identification; Kvenvolden, structure of amino acids and related compounds; plus samples as required from interests described under GC/LRMS).

In those instances where the biological extract contains insufficient material for a GC/HRMS scan, preparative GC, using existing instrumentation within the chemistry department, can be used to concentrate the material prior to the GC/HRMS analysis. If the material of interest is obtained relatively pure by this technique then HRMS analysis using direct sample insertion into the ion source would be utilized.

As mentioned above, several of the computer programs have immediate utility for assisting with structure elucidation problems. For example, the Structure Generator program can answer structural isomerism questions independently of mass spectrometry (e.g.
, to provide lists of isomers in conjunction with isomer interconversion problems such as carbonium ion rearrangements). Because the program will be able to generate complete lists of isomers with (or without) some specified structural features, a researcher can have confidence that no possibilities have been overlooked.

Some interest in the structure generator has been expressed by representatives of the pharmaceutical industry. The generator could be used to suggest complete sets of structural alternatives for possible synthesis, once a physiologically active congener has been identified.

In more general terms, the structure generator can be richly suggestive of new, unexplored areas of synthetic organic chemistry. For example, the generator has been used by a graduate student in chemistry, Mr. Jan Simek, to identify the space of possible Diels-Alder condensation products consisting of six atoms of any combination of carbon, nitrogen, oxygen, and sulfur in a six-membered ring with one double bond. A literature search through the Ring Index revealed that many of the ring systems have never been reported.

IV. SIGNIFICANCE OF PROPOSED RESEARCH

Structure elucidation is an important and difficult problem for biomedical scientists. Many of them lack the detailed chemical background necessary to be efficient in this endeavor. Generally speaking, they also lack the frequently complex and expensive equipment (e.g., high resolution mass spectrometers) to provide spectroscopic data to assist them in solving problems of molecular structure. We plan to provide the chemical and analytical expertise to facilitate the solution of their structural problems.

This research aims at providing more powerful techniques for determining molecular structures than are now routinely available.
In particular, we have proposed (a) providing extended MS services as a means of collecting powerful analytic data for scientists; (b) developing (and extending) sophisticated computer programs to assist with the interpretation of the data from mass spectrometry and elsewhere; (c) developing (and extending) novel computer programs to assist with formulation of the rules of interpretation; and (d) applying these state of the art techniques to problems of biomedical relevance. Our research group is thus dedicated to a broad-based attack on the applications of structure elucidation to biological and biomedical problems.

The proposed research not only holds promise for significant long-term advances, it can have immediate benefits as well. Many members of the biomedical community at Stanford have called upon the MS laboratory for assistance in the past and will continue to do so in the future. The proposed resource will provide the conduit for a substantial increase in the utilization of MS within the Stanford biomedical community. The ability of the proposed resource to interpret the experimental data it generates (enhanced by the close proximity of the resource and biomedical community) should result in a successful program of interdisciplinary research.

HRMS is an important source of data for these problems, and GC/HRMS is still more important. Previous investment by the NIH in the Varian MAT-711 HRMS system at Stanford can be utilized now and built upon for the future. Continued operation of the GC/MS system will give the Stanford community access to state-of-the-art spectroscopic techniques and to professional mass spectroscopists who can help with ongoing problems.

The computer programs themselves constitute a unique resource for assisting with the structure determination. The previous NIH grant supported development of the programs. In part, we are requesting funds to exploit these programs.
One of the most significant aspects of this work is its interdisciplinary view of solving molecular structure problems by intelligently directed search of the space of chemical graph structures. As a result of posing the structure determination problem in this framework, we have been able to further the knowledge about structure elucidation in at least three ways. First, some of the knowledge used by analytical chemists has been made more precise for use in a computer program. Second, codifying such knowledge for the computer has led to the discovery of new research areas to extend our existing knowledge of MS. Several publications listed in the bibliography (Refs. 42 and following) are reports of exactly this kind of research. Finally, the computer's systematic search through the space of possible structures gives the practicing scientist the confidence that no structures were merely overlooked. The efficiency of the program depends on the exclusions of many whole classes of compounds, but the computer will have rejected those classes using precise, explicitly stated criteria.

Our recent work on finding MS interpretation rules (theory formation) can provide additional unique capabilities for assisting with the problem solving. We wish to continue this research because it offers hope for a solution to the problem of furnishing real-world knowledge to computer programs -- in particular to the computer programs that assist with structure elucidation. This is a pressing problem in current AI research. High performance programs, of which DENDRAL is most often cited, derive their power from large stores of knowledge. Yet there are no routine methods for infusing such systems with knowledge of the task domain. We believe our research in theory formation holds a key to the solution of this problem.

V. FACILITIES & EQUIPMENT

The Stanford Mass Spectrometry Laboratory will provide MS services on the Varian MAT-711 mass spectrometer coupled with a Hewlett-Packard gas chromatograph (Model 7610A). As service instruments for more routine mass spectral analyses, the laboratory has MS-9 and CH-4 mass spectrometers. Data reduction is currently provided on Stanford's IBM 370/158 computer in conjunction with a front-end PDP-11/20 data acquisition computer. (The PDP-11/20 presently has only the capability for buffering peak profile data between the mass spectrometer and the IBM 370/158 computer at the Stanford Computer Center.) An alternative to buying time on the 370/158 is proposed and discussed in the budget justification.

The AI programs will be run on the NIH-sponsored AIM-SUMEX computer facility (a PDP-10 computer with the TENEX operating system, 192K words of memory, and adequate peripherals for our purposes). Running these programs on SUMEX will incur no charge.

VI. BIBLIOGRAPHY

A. DENDRAL PUBLICATIONS

(1) J. Lederberg, "DENDRAL-64 - A System for Computer Construction, Enumeration and Notation of Organic Molecules as Tree Structures and Cyclic Graphs", (technical reports to NASA, also available from the author and summarized in (12)).
(1a) Part I. Notational algorithm for tree structures (1964) CR.57029
(1b) Part II. Topology of cyclic graphs (1965) CR.68898
(1c) Part III. Complete chemical graphs; embedding rings in trees (1969)

(2) J. Lederberg, "Computation of Molecular Formulas for Mass Spectrometry", Holden-Day, Inc. (1964).

(3) J. Lederberg, "Topological Mapping of Organic Molecules", Proc. Nat. Acad. Sci., 53:1, January 1965, pp. 134-139.

(4) J. Lederberg, "Systematics of organic molecules, graph topology and Hamilton circuits. A general outline of the DENDRAL system." NASA CR-48899 (1965).

(5) J. Lederberg, "Hamilton Circuits of Convex Trivalent Polyhedra (up to 18 vertices)", Am. Math. Monthly, May 1967.

(6) G. L.
Sutherland, "DENDRAL - A Computer Program for Generating and Filtering Chemical Structures", Stanford Artificial Intelligence Project Memo No. 49, February 1967.

(7) J. Lederberg and E. A. Feigenbaum, "Mechanization of Inductive Inference in Organic Chemistry", in B. Kleinmuntz (ed) Formal Representations for Human Judgment, (Wiley, 1968) (also Stanford Artificial Intelligence Project Memo No. 54, August 1967).

(8) J. Lederberg, "Online computation of molecular formulas from mass number." NASA CR-94977 (1968).

(9) E. A. Feigenbaum and B. G. Buchanan, "Heuristic DENDRAL: A Program for Generating Explanatory Hypotheses in Organic Chemistry", in Proceedings, Hawaii International Conference on System Sciences, B. K. Kinariwala and F. F. Kuo (eds), University of Hawaii Press, 1968.

(10) B. G. Buchanan, G. L. Sutherland, and E. A. Feigenbaum, "Heuristic DENDRAL: A Program for Generating Explanatory Hypotheses in Organic Chemistry". In Machine Intelligence 4 (B. Meltzer and D. Michie, eds) Edinburgh University Press (1969), (also Stanford Artificial Intelligence Project Memo No. 62, July 1968).

(11) E. A. Feigenbaum, "Artificial Intelligence: Themes in the Second Decade". In Final Supplement to Proceedings of the IFIP68 International Congress, Edinburgh, August 1968 (also Stanford Artificial Intelligence Project Memo No. 67, August 1968).

(12) J. Lederberg, "Topology of Molecules", in The Mathematical Sciences - A Collection of Essays, (ed.) Committee on Support of Research in the Mathematical Sciences (COSRIMS), National Academy of Sciences - National Research Council, M.I.T. Press, (1969), pp. 37-51.

(13) G. Sutherland, "Heuristic DENDRAL: A Family of LISP Programs", Stanford Artificial Intelligence Project Memo No. 80, March 1969.

(14) J. Lederberg, G. L. Sutherland, B. G. Buchanan, E. A. Feigenbaum, A. V. Robertson, A. M. Duffield, and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference I.
The Number of Possible Organic Compounds: Acyclic Structures Containing C, H, O and N". Journal of the American Chemical Society, 91:11 (May 21, 1969).
(15) A. M. Duffield, A. V. Robertson, C. Djerassi, B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum, and J. Lederberg, "Application of Artificial Intelligence for Chemical Inference II. Interpretation of Low Resolution Mass Spectra of Ketones". Journal of the American Chemical Society, 91:11 (May 21, 1969).
(16) B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum, "Toward an Understanding of Information Processes of Scientific Inference in the Context of Organic Chemistry", in Machine Intelligence 5, (B. Meltzer and D. Michie, eds) Edinburgh University Press (1970), (also Stanford Artificial Intelligence Project Memo No. 99, September 1969).
(17) J. Lederberg, G. L. Sutherland, B. G. Buchanan, and E. A. Feigenbaum, "A Heuristic Program for Solving a Scientific Inference Problem: Summary of Motivation and Implementation", in R. Banerji & M.D. Mesarovic (eds.) Theoretical Approaches to Non-Numerical Problem Solving, New York: Springer-Verlag, 1970. (Also, Stanford Artificial Intelligence Project Memo No. 104, November 1969.)
(18) C. W. Churchman and B. G. Buchanan, "On the Design of Inductive Systems: Some Philosophical Problems". British Journal for the Philosophy of Science, 20 (1969), pp. 311-323.
(19) G. Schroll, A. M. Duffield, C. Djerassi, B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum, and J. Lederberg, "Application of Artificial Intelligence for Chemical Inference III. Aliphatic Ethers Diagnosed by Their Low Resolution Mass Spectra and NMR Data". Journal of the American Chemical Society, 91:26 (December 17, 1969).
(20) A. Buchs, A. M. Duffield, G. Schroll, C. Djerassi, A. B. Delfino, B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum, and J. Lederberg, "Applications of Artificial Intelligence For Chemical Inference. IV.
Saturated Amines Diagnosed by Their Low Resolution Mass Spectra and Nuclear Magnetic Resonance Spectra", Journal of the American Chemical Society, 92, 6831 (1970).
(21) Y.M. Sheikh, A. Buchs, A.B. Delfino, G. Schroll, A.M. Duffield, C. Djerassi, B.G. Buchanan, G.L. Sutherland, E.A. Feigenbaum and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference V. An Approach to the Computer Generation of Cyclic Structures. Differentiation Between All the Possible Isomeric Ketones of Composition C6H10O", Organic Mass Spectrometry, 4, 493 (1970).
(22) A. Buchs, A.B. Delfino, A.M. Duffield, C. Djerassi, B.G. Buchanan, E.A. Feigenbaum and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference VI. Approach to a General Method of Interpreting Low Resolution Mass Spectra with a Computer", Helvetica Chimica Acta, 53, 1394 (1970).
(23) E.A. Feigenbaum, B.G. Buchanan, and J. Lederberg, "On Generality and Problem Solving: A Case Study Using the DENDRAL Program". In Machine Intelligence 6 (B. Meltzer and D. Michie, eds.) Edinburgh University Press (1971). (Also Stanford Artificial Intelligence Project Memo No. 131.)
(24) A. Buchs, A.B. Delfino, C. Djerassi, A.M. Duffield, B.G. Buchanan, E.A. Feigenbaum, J. Lederberg, G. Schroll, and G.L. Sutherland, "The Application of Artificial Intelligence in the Interpretation of Low-Resolution Mass Spectra", Advances in Mass Spectrometry, 5, 314.
(25) B.G. Buchanan and J. Lederberg, "The Heuristic DENDRAL Program for Explaining Empirical Data". In proceedings of the IFIP Congress 71, Ljubljana, Yugoslavia (1971). (Also Stanford Artificial Intelligence Project Memo No. 141.)
(26) B.G. Buchanan, E.A. Feigenbaum, and J. Lederberg, "A Heuristic Programming Study of Theory Formation in Science." In proceedings of the Second International Joint Conference on Artificial Intelligence, Imperial College, London (September, 1971). (Also Stanford Artificial Intelligence Project Memo No. 145.)
(27) Buchanan, B. G., Duffield, A.M., Robertson, A.V., "An Application of Artificial Intelligence to the Interpretation of Mass Spectra", Mass Spectrometry Techniques and Applications, Edited by G. W. A. Milne, John Wiley & Sons, Inc., 1971, p. 121-77.
(28) D.H. Smith, B.G. Buchanan, R.S. Engelmore, A.M. Duffield, A. Yeo, E.A. Feigenbaum, J. Lederberg, and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference VIII. An Approach to the Computer Interpretation of the High Resolution Mass Spectra of Complex Molecules. Structure Elucidation of Estrogenic Steroids", Journal of the American Chemical Society, 94, 5962-5971 (1972).
(29) B.G. Buchanan, E.A. Feigenbaum, and N.S. Sridharan, "Heuristic Theory Formation: Data Interpretation and Rule Formation". In Machine Intelligence 7, Edinburgh University Press (1972).
(30) Lederberg, J., "Rapid Calculation of Molecular Formulas from Mass Values". Jnl. of Chemical Education, 49, 413 (1972).
(31) Brown, H., Masinter L., Hjelmeland, L., "Constructive Graph Labeling Using Double Cosets". Discrete Mathematics (in press). (Also Computer Science Memo 318, 1972).
(32) B. G. Buchanan, Review of Hubert Dreyfus' "What Computers Can't Do: A Critique of Artificial Reason", Computing Reviews (January, 1973). (Also Stanford Artificial Intelligence Project Memo No. 181.)
(33) D. H. Smith, B. G. Buchanan, R. S. Engelmore, H. Adlercreutz and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference IX. Analysis of Mixtures Without Prior Separation as Illustrated for Estrogens". Journal of the American Chemical Society 95, 6078 (1973).
(34) D. H. Smith, B. G. Buchanan, W. C. White, E. A. Feigenbaum, C. Djerassi and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference X. Intsum. A Data Interpretation Program as Applied to the Collected Mass Spectra of Estrogenic Steroids". Tetrahedron, 29, 3717 (1973).
(35) B. G. Buchanan and N. S.
Sridharan, "Rule Formation on Non-Homogeneous Classes of Objects". In proceedings of the Third International Joint Conference on Artificial Intelligence (Stanford, California, August, 1973). (Also Stanford Artificial Intelligence Project Memo No. 215.)
(36) D. Michie and B.G. Buchanan, "Current Status of the Heuristic DENDRAL Program for Applying Artificial Intelligence to the Interpretation of Mass Spectra". August, 1973.
(37) H. Brown and L. Masinter, "An Algorithm for the Construction of the Graphs of Organic Molecules", Discrete Mathematics (in press). (Also Stanford Computer Science Department Memo STAN-CS-73-361, May, 1973)
(38) D.H. Smith, L.M. Masinter and N.S. Sridharan, "Heuristic DENDRAL: Analysis of Molecular Structure," Proceedings of the NATO/CNA Advanced Study Institute on Computer Representation and Manipulation of Chemical Information, in press.
(39) R. Carhart and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference XI: The Analysis of C13 NMR Data for Structure Elucidation of Acyclic Amines", J. Chem. Soc. (Perkin II), 1753 (1973).
(40) L. Masinter, N. Sridharan, and D.H. Smith, "Applications of Artificial Intelligence for Chemical Inference XII: Exhaustive Generation of Cyclic and Acyclic Isomers.", submitted to Journal of the American Chemical Society.
(41) L. Masinter, N.S. Sridharan, R. Carhart and D.H. Smith, "Applications of Artificial Intelligence for Chemical Inference XIII: An Algorithm for Labelling Chemical Graphs", submitted to Journal of the American Chemical Society.
(42) The Determination of Phenylalanine in Serum by Mass Fragmentography. Clinical Biochem., 6 (1973). By W.E. Pereira, V.A. Bacon, Y. Hoyano, R. Summons and A.M. Duffield.
(43) The Simultaneous Quantitation of Ten Amino Acids in Soil Extracts by Mass Fragmentography. Anal. Biochem., 55, 236 (1973). By W.E. Pereira, Y. Hoyano, W.E. Reynolds, R.E. Summons and A.M. Duffield.
(44) An Analysis of Twelve Amino Acids in Biological Fluids by Mass Fragmentography. Anal. Chem., in press. By R.E. Summons, W.E. Pereira, W.E. Reynolds, T.C. Rindfleisch and A.M. Duffield.
(45) The Quantitation of B-Aminoisobutyric Acid in Urine by Mass Fragmentography. Clin. Chim. Acta, in press. By W.E. Pereira, R.E. Summons, W.E. Reynolds, T.C. Rindfleisch and A.M. Duffield.
(46) The Determination of Ethanol in Blood and Urine by Mass Fragmentography. Clin. Chim. Acta, in press. By W.E. Pereira, R.E. Summons, T.C. Rindfleisch and A.M. Duffield.

Publications Describing DENDRAL-Related Research But Not Funded By This Grant

(47) An Automated Gas Chromatographic Analysis of Phenylalanine in Serum. Clinical Biochem., 5, 166 (1972). By E. Steed, W. Pereira, B. Halpern, M. D. Solomon and A.M. Duffield.
(48) Pyrrolizidine Alkaloids. XIX. Structure of the Alkaloid Erucifoline. Coll. Czech. Chem. Commun., 37, 4112 (1972). By P. Sedmera, A. Klasek, A.M. Duffield and F. Santavy.
(49) Chlorination Studies I. The Reaction of Aqueous Hypochlorous Acid with Cytosine. Biochem. Biophys. Res. Commun., 48, 880 (1972). By W. Patton, V. Bacon, A.M. Duffield, B. Halpern, Y. Hoyano, W. Pereira and J. Lederberg.
(50) A Study of the Electron Impact Fragmentation of Promazine Sulphoxide and Promazine using Specifically Deuterated Analogues. Austral. J. Chem., 26, 325 (1973). By M.D. Solomon, R. Summons, W. Pereira and A.M. Duffield.
(51) Spectrometrie de Masse VIII. Elimination d'eau Induite par Impact Electronique dans le Tetrahydro-1,2,3,4-Naphtalene-diol-1,2. Org. Mass Spectrom., 7, 357 (1973). By P. Perros, J.P. Morizur, J. Kossanyi and A.M. Duffield.
(52) Chlorination Studies II. The Reaction of Aqueous Hypochlorous Acid with a-Amino Acids and Dipeptides. Biochim. et Biophys. Acta, 313, 170 (1973). By W.E. Pereira, Y. Hoyano, R. Summons, V.A. Bacon and A.M. Duffield.
(53) Spectrometrie de Masse. IX.
Fragmentations Induites par Impact Electronique de Glycols en Serie Tetraline. Bull. Chim. Soc. France, 2105 (1973). By P. Perros, J.P. Morizur, J. Kossanyi and A.M. Duffield.
(54) The Use of Mass Spectrometry for the Identification of Metabolites of Phenothiazines. Proceedings of the Third International Symposium on Phenothiazines, Raven Press, New York, 1973. By A.M. Duffield.
(55) Chlorination Studies IV. The Reaction of Aqueous Hypochlorous Acid with Pyrimidine and Purine Bases. Biochem. Biophys. Res. Commun., 53, 1195 (1973). By Y. Hoyano, V. Bacon, R.E. Summons, W.E. Pereira, B. Halpern and A.M. Duffield.
(56) Mass Spectrometry in Structural and Stereochemical Problems. CCXXXVII. Electron Impact Induced Hydrogen Losses and Migrations in Some Aromatic Amides. Org. Mass Spectrom., in press. By A.M. Duffield, G.S. deMartino and C. Djerassi.
(57) Stable Isotope Mass Fragmentography: Quantitation and Hydrogen-Deuterium Exchange Studies of Eight Murchison Meteorite Amino Acids. Geochem. et Cosmochim. Acta, submitted for publication. By W.E. Pereira, R.E. Summons, T.C. Rindfleisch, A.M. Duffield, B. Zeitman and J.G. Lawless.
(58) Mass Spectrometry in Structural and Stereochemical Problems CLXXXIII. A Study of the Electron Impact Induced Fragmentation of Aliphatic Aldehydes. J. Amer. Chem. Soc., 91, 6814 (1969). By R.J. Liedtke and C. Djerassi.
(59) Mass Spectrometry in Structural and Stereochemical Problems CXCVII. Electron-Impact Induced Functional Group Interaction in 4-Benzyloxycyclohexyl Trimethylsilyl Ether. Org. Mass Spectrom., 4, 257 (1970). By Paul D. Woodgate, Robin T. Gray and Carl Djerassi.
(60) Mass Spectrometry in Structural and Stereochemical Problems CXCVIII. A Study of the Fragmentation Processes of Some a,B-Unsaturated Aliphatic Ketones. Org. Mass Spectrom., 4, 273 (1970). By Younus M. Sheikh, A.M. Duffield and Carl Djerassi.
(61) Mass Spectrometry in Structural and Stereochemical Problems CCII.
Interaction of Remote Functional Groups in Acyclic Systems upon Electron Impact. J. Org. Chem., 36, 1796 (1971). By M. Sheehan, R.J. Spangler, M. Ikeda and C. Djerassi.
(62) Mass Spectrometry in Structural and Stereochemical Problems CCVII. Fragmentation of Unsaturated Ethers. Org. Mass Spectrom., 5, 895 (1971). By J. P. Morizur and C. Djerassi.
(63) Mass Spectrometry in Structural and Stereochemical Problems CCVIII. The Effect of Double Bonds Upon the McLafferty Rearrangement of Carbonyl Compounds. J. Amer. Chem. Soc., 94, 473 (1972). By J.R. Dias, Y.M. Sheikh and C. Djerassi.
(64) Mass Spectrometry in Structural and Stereochemical Problems CCXV. Behavior of Phenyl-Substituted a,B-Unsaturated Ketones Upon Electron Impact. Promotion of Hydrogen Rearrangement Processes. J. Org. Chem., 37, 776 (1972). By R.J. Liedtke, A.F. Gerrard, J. Diekman and C. Djerassi.
(65) Mass Spectrometry in Structural and Stereochemical Problems CCXXII. Delineation of Competing Fragmentation Pathways of Complex Molecules from a Study of Metastable Ion Transitions of Deuterated Derivatives. Org. Mass Spectrom., 7, 367 (1973). By D.H. Smith, A.M. Duffield and C. Djerassi.
(66) The Carbon-13 Magnetic Resonance Spectra of Acyclic Aliphatic Amines. J. Amer. Chem. Soc., 95, 3710 (1973). By H. Eggert and C. Djerassi.
(67) The Carbon-13 Nuclear Magnetic Resonance Spectra of Keto Steroids. J. Org. Chem., 38, 3788 (1973). By H. Eggert and C. Djerassi.
(68) Mass Spectrometry in Structural and Stereochemical Problems CCXXXVIII. The Effect of Heteroatoms Upon the Mass Spectrometric Fragmentation of Cyclohexanones. J. Org. Chem., in press. By J.H. Block, D.H. Smith, and C. Djerassi.
(69) Mass Spectrometry in Structural and Stereochemical Problems CCXXXIV. Applications of DADI, a Technique for Study of Metastable Ions, to Mixture Analysis. J. Amer. Chem. Soc., submitted for publication. By D.H. Smith, C. Djerassi, K.H. Maurer, and U. Rapp.
(70) Mass Spectrometry in Structural and Stereochemical Problems. The Fragmentation of Progesterone and Alkyl-Substituted Progesterones, in preparation, by S. Hammerum and C. Djerassi.
(71) "Applications of Molecular Orbital Theory to the Interpretation of Mass Spectra: Prediction of Primary Fragmentation Sites in Organic Molecules," Org. Mass Spectrom., 7, 1241 (1973), by G. Loew, M. Chadwick, and D.H. Smith.

PROGRESS REPORT

TEXT OF 1973 ANNUAL REPORT FOR RESEARCH PROJECT: RESOURCE-RELATED RESEARCH -- COMPUTERS AND CHEMISTRY

Part A. APPLICATIONS OF ARTIFICIAL INTELLIGENCE TO MASS SPECTROMETRY

OBJECTIVES:

Research activities carried out under Part A of this project have been directed toward extending the reasoning power of Heuristic DENDRAL. Heuristic DENDRAL represents a paradigm for attacking problems in one of the major areas of importance to any scientific discipline dealing with molecules, the area of structure elucidation. We have focused our attention on the use of heuristic programming techniques for analysis of mass spectra and ancillary analytical data which can be obtained utilizing a mass spectrometer.

It is convenient to discuss objectives, progress and plans by examining three broad areas of activity in research connected with Part A. We wish to note that these areas conform to our overall strategy of PLAN-GENERATE-TEST. We have shown, earlier, how powerful this strategy is when applied to the task of structure elucidation utilizing mass spectral data. The areas and their objectives are the following:

(I) PLANNER:
(a) Extend the programs used for structure elucidation to structural analysis of complex molecules.
(b) Assess the capabilities and limitations of the PLANNER.
(c) Generalize the programming techniques to reduce compound class dependence.
(d) Explore the utility of ancillary data available from the mass spectrometer.

(II) STRUCTURE GENERATOR:
(a) Complete the exhaustive, irredundant generator of molecular structures.
(b) Develop efficient constraints on the generator to exploit its potential utility.
(c) Exploit the concepts developed for the structure generator in solving various structure-problems (related to m.s. and others) and isomer-problems.

(III) PREDICTOR:
(a) Extend the Predictor to still more complex molecular structures.
(b) Explore the design of experimental strategies, utilizing Predictor functions, to differentiate among candidate solutions.

We point out that the PLAN-GENERATE-TEST strategy, although applied to structure elucidation, has potential utility as a strategy for solving other chemical problems. Similarly, although we utilize mass spectral data almost exclusively, the same heuristic programming techniques allow facile extension to analysis of data from other types of analytical instrumentation. These were not objectives of the original research proposal but seem logical extensions for future work. We have illustrated the potential of these techniques for analysis of 13C NMR data (Carhart and Djerassi, 1973). This is discussed briefly under the PLANS section, below.

PROGRESS:

(I) PLANNER

The function of the Planner is to analyze mass spectral data acquired on a compound. The Planner attempts to derive structural information from these data using the rules of behavior of compounds in the mass spectrometer.

Objective (a): Extend Programs. The Planner is presently embodied in a program which also contains a set of functions to assemble this structural information into complete molecules (a primitive Structure Generator) and to test these molecular structures with other, not necessarily mass spectral, rules (a primitive Predictor). This performance program was written in this way to provide a useful tool for chemical studies while more general versions of the Structure Generator and Predictor were being developed. This program and its performance have been described in some detail in a publication and in previous progress reports.
A manuscript (Smith, et al., 1973) has now appeared describing the application of this program to the analysis of mixtures of compounds without prior separation.

Objective (b): Assess Capabilities. We have extended the capabilities of the Planner so that we can analyze both low and high resolution mass spectral data. A low resolution mass spectrum is regarded by the program as a pseudo high resolution spectrum wherein possible elemental compositions of each peak are limited only by the inferred molecular formula of the compound. This results in more ambiguity, with a commensurate increase in the number of candidate solutions, as would be expected considering the lower specificity of low resolution data as compared to high resolution data.

We have extended our capabilities for molecular ion determination utilizing a heuristic search technique through the space of plausible molecular ions. This technique has had significant success even when dealing with the low resolution mass spectra of compounds which display no molecular ion, for example the class of derivatized amino acids (trifluoroacetyl, n-butyl esters) important to studies carried out under Part B, below.

We have segmented the performance program to decrease the amount of memory required for its operation. This should increase the chances for other groups to make use of the program.

The limitations of the present performance program are primarily the requirement that some information about the class of compounds be known, and that, for each class, relatively detailed rules about the mass spectral fragmentation of this class be available. The former limitation results primarily from the nature of the program in that a complete structure generator is not incorporated. The primitive structure generator available to the program can only place substituents about an assumed skeleton.
This limitation will be alleviated when a structure generator with GOODLIST and BADLIST constraints is available (see Structure Generator, below). The latter limitation is more fundamental, but is characteristic of every spectroscopic technique to one degree or another. It must be assumed that analysis of a mass spectrum, alone, may not lead to sufficiently unambiguous information about the structure of the compound yielding the spectrum. It is for this reason that extensions of the programming techniques to encompass data from other spectroscopic techniques are attractive.

Objective (c): Generalize Techniques. We have carried out several successful experiments to ensure that the performance program, used originally for analysis of estrogenic steroids, retains only procedures which are compound-class independent. By supplying fragmentation rules for other classes of compounds, we have successfully carried out structure elucidation of molecules in several diverse classes including other steroidal hormones and related compounds (progesterones, testosterones, androsterones), steroidal sapogenins and derivatized amino acids.

Objective (d): Explore Utility. Previous progress reports have summarized in some detail the ways in which data from ancillary techniques in mass spectrometry (metastable ion and low ionizing voltage data, labile hydrogen exchange) can be used by the program. The utility of metastable ions for aid in structure elucidation continues as an active area of interest. Experience with the program has inspired studies on metastable ions, first, to help delineate the course of fragmentation of molecules with the purpose of extending and refining fragmentation rules used by the program (Smith, Duffield and Djerassi, 1973).
Experience with the increased specificity of structural information with concomitant reduction in analysis time when metastable ion information is available (Smith, et al., 1973) has led to a study of a new technique for detection and analysis of metastable ions (Direct Analysis of Daughter Ions, or DADI) and has illustrated the utility of this technique in mixture analysis (Smith, Djerassi, Maurer and Rapp, 1973).

Experience with the PLANNER has led to several research activities related to, but not supported by, this grant. Our studies of estrogen mixtures isolated from pregnancy urine have suggested new compounds likely to be important in the human metabolism of estrogens. Some of these compounds are hitherto unreported structures and a synthesis program is underway in Professor Djerassi's laboratory to produce some of these compounds. The Planner will be used as one method of comparison of the synthesized, authentic standards with those isolated from pregnancy urine.

Work is also being carried out to explore the fragmentation of model systems possessing two heteroatoms in close proximity. It is clear from the first of these studies (Block, Smith, and Djerassi, 1973) that the fragmentation of these difunctional systems does not reflect that of monofunctional analogs. More groundwork is required in this area to obtain better fragmentation rules for these systems.

(II) STRUCTURE GENERATOR

Objective (a): Complete the Generator. The last progress report discussed the completion of both the basic structure generator algorithm and program, which provide the capability for exhaustive generation of graph isomers of a given empirical formula, with prospective avoidance of duplicate structures. Since the time of the submission of that report, manuscripts describing the structure generator, directed specifically to an audience of chemists, have been submitted (Masinter, Sridharan, Lederberg, and Smith, 1973; Masinter and Sridharan, 1973).
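An exhaustive, irredundant generator can be cross-checked by comparing the number of structures it produces against an independently computed count. The following is a minimal sketch of such a count for one simple family only, the alkyl groups CnH2n+1 (stereoisomers ignored), using the standard Polya-style recurrence for rooted trees with at most three branches at each carbon. It is an illustration under those assumptions and is not the counting algorithm used with the DENDRAL generator; the function name `alkyl_counts` is hypothetical.

```python
def alkyl_counts(max_carbons):
    """Return r, where r[n] is the number of structurally distinct
    alkyl groups with n carbons (r[0] = 1 stands for a bare hydrogen)."""
    r = [0] * (max_carbons + 1)
    r[0] = 1
    for n in range(1, max_carbons + 1):
        m = n - 1  # carbons left to share among the three branches of the root
        # Ordered triples of branches (i, j, k) with i + j + k = m.
        triples = sum(r[i] * r[j] * r[m - i - j]
                      for i in range(m + 1)
                      for j in range(m - i + 1))
        # Triples unchanged by swapping two branches: (i, i, k), 2i + k = m.
        pairs = sum(r[i] * r[m - 2 * i] for i in range(m // 2 + 1))
        # Triples unchanged by rotating all three branches: (i, i, i), 3i = m.
        same = r[m // 3] if m % 3 == 0 else 0
        # Burnside average over the six permutations of three branches.
        r[n] = (triples + 3 * pairs + 2 * same) // 6
    return r


if __name__ == "__main__":
    for n, count in enumerate(alkyl_counts(7)):
        print(n, count)
```

For example, the count for four carbons is 4 (n-butyl, sec-butyl, isobutyl, tert-butyl); a generator restricted to the same family should produce exactly that many structures.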
Some effort over the past year has been devoted to verification of the completeness and irredundancy of the method. We have extended existing combinatorial counting algorithms to check that the numbers of isomers generated are correct. We have used an interactive version of the generator to verify that variations (allowed by the algorithm) of the mechanism of generation yield the same set of isomers. In this way we are now increasingly confident that the program's performance accurately reflects the mathematically proven algorithm on which it is based.

The Structure Generator has been briefly described, and placed in its context within Heuristic DENDRAL, in an invited paper presented at a NATO/CRS sponsored conference on Computer Representation and Manipulation of Chemical Information, held in Amsterdam in June, 1973 (Smith, Masinter and Sridharan, 1973).

We have also begun to develop techniques to expand the scope of the generator. One example, which has been completed, is adding extensions to the CATALOG. The CATALOG contains the set of vertex-graphs from which structures are assembled. The original CATALOG was not sufficient to generate all isomers of some potentially interesting compositions, e.g., those involving graphs possessing nodes of degree >3. We now have a program which constructs complete sets of vertex-graphs containing nodes of degree >3 from the set of trivalent graphs in the original CATALOG. We have thus extended the capabilities of the generator. Other such extensions are discussed in the PLANS section, below.

Objective (b): Develop Constraints. It is absolutely essential that we provide the mechanism for constraining the Structure Generator: without constraints it is merely a legal move generator, as in a chess-playing program.
For structure elucidation problems, the Planner can determine many features of the molecular structure from various types of experimental data, such as presence of functional groups, and the numbers of double bonds and rings. Partial information of this sort can be used to constrain the Structure Generator to the space of plausible candidate structures. From a graph-theoretic point of view, however, constraining the graph generating algorithm is a difficult unsolved problem.

We are presently formulating several types of constraints to apply to the Structure Generator. Some types of constraint await the development of new mathematical tools (see PLANS), while others can be immediately implemented with relatively minor alterations to the algorithm. The class of constraints presently receiving attention deals with types of unsaturation (rings or double bonds) desired in the final structures. Related to this constraint is the constraint of number of quaternary carbons present. The former information (number and nature of multiple bonds) is readily available from several spectroscopic techniques, while the latter may be obtained from 13C NMR. The implementation of this class of constraints will be used as the model for future implementation of a GOODLIST (structural features known to be present) and a BADLIST (structural features known to be absent).

It is possible that some types of constraints may not be easily implemented within the algorithm. Thus, retrospective tests of isomers may be required to search for desired or unwanted features. We have developed some new approaches to graph matching which seem to be significantly more efficient than previous methods. Should prospective implementation of a constraint prove difficult, we will have at our disposal some powerful graph matching tools to exercise the constraint.
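The unsaturation constraint discussed above rests on simple arithmetic: the molecular formula fixes the total number of rings plus double bonds, and spectroscopic evidence then limits how that total may be divided. The sketch below illustrates only this bookkeeping, using the standard formula U = C - H/2 + N/2 + 1 for neutral closed-shell molecules. The function names and the `min_rings`/`max_rings` parameters are hypothetical and do not describe the generator's actual interface.

```python
def unsaturation(c, h, n=0, o=0, halogens=0):
    """Rings plus double-bond equivalents for C_c H_h N_n O_o X_halogens.

    Divalent atoms such as oxygen do not change the count; `o` is accepted
    only so that a full formula can be passed in.
    """
    twice = 2 * c + 2 + n - h - halogens
    if twice < 0 or twice % 2:
        raise ValueError("not a neutral closed-shell composition")
    return twice // 2


def ring_double_bond_splits(total, min_rings=0, max_rings=None):
    """List the (rings, double_bond_equivalents) pairs that add up to
    `total` and respect the ring-count limits. A triple bond counts as
    two double-bond equivalents."""
    upper = total if max_rings is None else min(total, max_rings)
    return [(rings, total - rings) for rings in range(min_rings, upper + 1)]


if __name__ == "__main__":
    u = unsaturation(6, 10, o=1)          # C6H10O
    print(u)
    print(ring_double_bond_splits(u))     # all ways to divide the total
    print(ring_double_bond_splits(u, max_rings=1))
```

For C6H10O the total is two, so a ketone of this composition (one C=O) must also contain exactly one ring or one additional double bond; a constraint such as "at most one ring" removes only the bicyclic division.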
Objective (c): Exploit the Generator for Structure Elucidation. We have demonstrated the utility of some subsystems of the structure generator, e.g., the LABELLER, by exploring some problems of isomerism noted in the chemical literature. We have corrected the number and presented the identities of isomers formed by different substitutions of alkyl chains about a porphyrin nucleus. We are presently exploring some problems of isomerism of carbocyclic ring systems, specifically C10H10 and (CH)10 and CnH2n-4 tricyclic ring systems, n = 8 - 12, related to the mechanistics of isomeric interconversion.

We have the complete list of all 1176 topologically possible 6-membered Diels-Alder ring systems, using any combination of C, N, O and S. This list was generated using the PARTITIONER and an extended version of the LABELLER. These are all the 6-membered ring systems that can be embedded in structures resulting from the well-known Diels-Alder reaction. Of the 1176 possible ring systems, approximately 80% are unreported in the Ring Index. Many of these are chemically unstable, underscoring the need for a BADLIST implementation for the Cyclic Structure Generator. However, many of these unreported ring systems are certainly chemically plausible. Awareness of such gaps in relatively simple synthetic categories might lead to discovery of new categories of compounds with important biological effects.

(III) PREDICTOR

The function of the Predictor in the PLAN-GENERATE-TEST strategy is to perform a detailed evaluation of candidate solutions (structures) to a structure elucidation problem. It may use a more detailed model of spectroscopic behavior than that embodied in a Planner to attempt to differentiate among possible solutions.

Objective (a): Extend the Program. We have extended and generalized the Predictor used previously for saturated, aliphatic, monofunctional compounds.
Given a list of structures and rules of fragmentation processes, it will predict a mass spectrum for each structure. Prediction of relative ion abundances is crude, but previous work has shown that even crude measures of ion abundance are usually satisfactory. The predicted spectrum can then be matched with the observed and candidates ranked according to the quality of the match. The program works with structures and rules of any complexity.

An interesting philosophical question is how much knowledge should be brought to bear on interpretation of the data at the Planning vs. Predicting stages of analysis. It is our feeling that if more can be accomplished during Planning to constrain the Structure Generator, the analysis will be more efficient. On the other hand, some knowledge can be utilized only if a complete structure is specified, so that its use is restricted to a predictive role. Moreover, Predictor functions have a greater utility, as indicated in the subsequent section.

Objective (b): Differentiate Structures. The Predictor has a more obvious application in the design of experimental strategies to differentiate among candidate structures. Rules of spectroscopic behavior utilized during Planning demand the presence of some data to evaluate. The Predictor can then be used to request additional data from any source to aid in differentiation. We have explored this approach by analyzing the spectrum of a compound with the performance program. The Predictor was used to evaluate the set of candidate structures to define the minimum number of metastable defocussing experiments necessary to achieve a unique solution. Thus, no time need be spent acquiring unnecessary or redundant data. Clearly, this has important implications for future work in that many different types of data (e.g., NMR, IR) might be requested by the Predictor to facilitate identification.
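The idea of requesting only the experiments needed for a unique solution can be sketched as a small search problem. Assume, purely for illustration, that a predictor has already tabulated the expected outcome of each possible experiment for each candidate structure; the task is then to find the fewest experiments whose combined predicted outcomes differ for every pair of candidates. The sketch below uses exhaustive search over subsets of increasing size. The function name, the table layout, and the example data are all hypothetical, and this is not a description of how the Predictor itself performs the selection.

```python
from itertools import combinations


def smallest_discriminating_set(candidates, predictions):
    """Return the smallest list of experiments that tells all candidates
    apart, or None if even the full set cannot.

    predictions[experiment][candidate] is the predicted outcome (any
    hashable value, e.g. True if a metastable peak is expected).
    """
    experiments = sorted(predictions)
    for size in range(len(experiments) + 1):
        for subset in combinations(experiments, size):
            # One tuple of predicted outcomes per candidate.
            signatures = {tuple(predictions[e][c] for e in subset)
                          for c in candidates}
            if len(signatures) == len(candidates):
                return list(subset)
    return None


if __name__ == "__main__":
    # Hypothetical table: is a given metastable transition predicted?
    table = {
        "exp1": {"A": True, "B": True, "C": False},
        "exp2": {"A": True, "B": False, "C": False},
        "exp3": {"A": True, "B": True, "C": True},
    }
    print(smallest_discriminating_set(["A", "B", "C"], table))
```

In this made-up table no single experiment separates all three candidates, while the first two together do, so the third (which predicts the same outcome for every candidate) never needs to be run. Exhaustive search is adequate for a handful of experiments; a greedy choice would be a reasonable substitute for larger tables at the cost of guaranteed minimality.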
PLANS

For the remaining period of this grant we propose to carry out the following extensions of the research outlined above.

(I) PLANNER

The major area of activity related to the present version of the Planner will be to focus our attention on using the program in support of chemical studies outlined under Part B (see below). The chemical extraction and derivatization procedures used in the analysis of body fluids restrict the types of compounds present in each separated fraction. Such simplifications make this a problem more amenable to attack. Only certain classes of compounds are present in each fraction, and we have some knowledge of the mass spectral fragmentation of these classes.

We wish to couple the program to the results of library matching procedures so that we direct our efforts to structure elucidation of those components which have not been previously identified. This is particularly important in the context of analysis problems such as those discussed under Part B.

We propose increasing the utility of the program by removing two present constraints: (a) allow unspecified "dummy" atoms in the skeleton instead of requiring a rigidly fixed structural skeleton, and (b) allow fragmentation processes to be specified more flexibly; in particular, allow fragmentations in substituents on the skeleton instead of requiring all fragmentations to cut through the skeleton.

Although we are presently uncomfortable with immediate coupling of the Structure Generator to the Planner, we propose continued exploration of the problems of controlling the generator automatically. Actual implementation awaits a more comprehensive treatment of the problem of constraints.

(II) STRUCTURE GENERATOR

The inclusion of a reasonable set of constraints is obviously required and will be the subject of most of our future development work.

We plan to develop an interface to the present interactive version of the Structure Generator that speaks a more chemical language.
This interface will be designed to avoid the present requirement that the user know something about the program before he can use it. As the optimum method for implementation of a constraint is determined, the interface will be extended to translate the usual specification of the constraint in chemical terms into rules acting at the level of the program. As we stressed in development of the PLANNER, there are considerable advantages to building a powerful program in an incremental fashion. These steps are logically directed to our longer term goal of developing a useful structure elucidation tool for the chemist, based on the Structure Generator.

There are several other areas of interest which are peripherally related to the problem of constraints and which will occupy our attention. The Structure Generator knows no chemistry other than atom names and their associated valence. There are several important areas where this is an immediate problem. For example, the program has no explicit awareness of aromatic resonance, leading to a remediable redundancy in the list of isomers. An aromaticity predictor is also indispensable for anticipating the chemical behavior of a structure. We wish to deal with types of isomers besides simple connectivity isomers. We need to have the facility for assembly of molecular sub-structures (the usual type of information inferred from spectroscopic data) when such an assembly yields new rings or multiple bonds.

All the above questions need a reexamination of the fundamental mathematical considerations. The present algorithm has been proven to yield complete and irredundant solutions. In devising new algorithms or variants of the present one, the burden of proof can be reduced to (the usually easier) equivalence to the previous algorithm.
Professor Harold Brown, who was the mathematician instrumental in initial development of the labelling algorithms for structure generation, will be with us again for several months to help attack the problems outlined above.

III. PREDICTOR

Although the Predictor has been essentially finished for our own internal use, we propose to spend a modest amount of time in the coming months making it more usable by others. In particular, we wish to extend the initial work on predicting the new experiments necessary for distinguishing among candidate structures (e.g., predicting that a metastable peak at mass 70.1 would confirm one structure and disconfirm another). In addition, we plan to work on cataloging some existing sets of mass spectrometry rules in such a way that the program can be easily used for different classes of problems.

Part A References (Published or submitted during year)

R. Carhart and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference XI....

D.H. Smith, B.G. Buchanan, R.S. Engelmore, H. Adlercreutz, and C. Djerassi, "Applications of Artificial Intelligence for Chemical Inference, IX. Analysis of Mixtures Without Prior Separation as Illustrated for Estrogens," J. Amer. Chem. Soc., September 5, 1973.

D.H. Smith, A.M. Duffield, and C. Djerassi, "Mass Spectrometry in Structural and Stereochemical Problems, CCXXII. Delineation of Competing Fragmentation Pathways of Complex Molecules from a Study of Metastable Ion Transitions of Deuterated Derivatives," Org. Mass Spectrom., 7, 367 (1973).

D.H. Smith, C. Djerassi, K.-H. Maurer, and U. Rapp, "Mass Spectrometry in Structural and Stereochemical Problems, CCXXXIV. Applications of DADI, A Technique for Study of Metastable Ions, to Mixture Analysis," J. Amer. Chem. Soc., submitted (1973).

J.M. Block, D.H. Smith, and C. Djerassi, "Mass Spectrometry in Structural and Stereochemical Problems, CCXXXVIII. The Effect of Heteroatoms upon the Mass Spectrometric Fragmentation of Cyclohexanones," J.
Org. Chem., submitted (1973).

L.M. Masinter, N.S. Sridharan, J. Lederberg, and D.H. Smith, "Applications of Artificial Intelligence for Chemical Inference, XII. Exhaustive Generation of Cyclic and Acyclic Isomers," J. Amer. Chem. Soc., submitted (1973).

L.M. Masinter and N.S. Sridharan, "Applications of Artificial Intelligence for Chemical Inference, XIII. Labelling Objects Having Symmetry," J. Amer. Chem. Soc., submitted (1973).

D.H. Smith, L.M. Masinter, and N.S. Sridharan, "Heuristic DENDRAL: Analysis of Molecular Structures," to be published in the proceedings of the NATO/CRS Advanced Study Institute on Computer Representation and Manipulation of Chemical Information.

N.S. Sridharan, "Computer Generation of Vertex Graphs", Stanford Computer Science Memo CS-73-381, Stanford University, July, 1973.

Part B-i
Gas Chromatograph - Mass Spectrometer Data System Development

OBJECTIVES AND RATIONALE

The objectives of this part of the research project are the improvement of GC/MS data system capabilities and the coupling of extracted data to the Heuristic DENDRAL programs for analysis. We ultimately seek a substantial degree of interaction between the instrumentation and the analysis programs, including computer specification and control of the data to be collected. In addition to the development goals, this portion of the project provides for the day-to-day operation of the GC/MS systems in support of mass spectrum interpretation computer program development (Parts A and C) and applications of GC and MS to biomedical and natural product sample analysis with collaborators.

Our rationale for this approach is that the overall system should be designed for problem solving rather than just for data acquisition. This implies that analytical computer programs, after review of available experimental data, could be able to specify additional information needed to confirm a solution or distinguish between alternative solutions.
Such requests could be passed back to an instrument management program to set up proper instrument parameters and collect the additional information.

Our initial efforts to implement an on-line, closed-loop system using the ACME computer facility have met with a number of difficulties. These grow principally out of ACME's limited computing capacity and its commitments as a general time-sharing service. In addition, the scanning high resolution mass spectrometer has inherent sensitivity limitations, which do not preclude a demonstration but rather limit the practical sample volume which could be analyzed. Until such limitations can be overcome, particularly in terms of computing support, we have focussed our efforts on an open-loop demonstration of such an approach.

PROGRESS

Progress has been made in demonstrating a GC/High Resolution Mass Spectrometry capability, in further developing automated data analysis algorithms, and in planning for the implementation of a data system for the collection of metastable ion information. Progress in these and other areas directed toward the main research goals has been impacted by a transition in computing support which is still underway. This transition, discussed in more detail below, was occasioned by the phase-over of the ACME computing facility, which we had been using, from NIH grant subsidy to a fully fee-for-service operation under Stanford University auspices. Summaries of the results and problems encountered in each of the areas follow.

Gas Chromatography/High Resolution Mass Spectrometry (GC/HRMS)

We have verified the feasibility of combined gas chromatography/high resolution mass spectrometry (GC/HRMS). Using programs described in previous reports, we can acquire selected scans and reduce them automatically. The procedures are slow compared to "real-time" because of the limitations of the time-shared ACME facility.
We have recorded sufficient spectra of standard compounds to show that the system is performing well. A typical experiment which illustrates some of the parameters involved was the following. A mixture (approximately 1 microgram/component) of methyl palmitate and methyl stearate was analyzed by GC under conditions such that the GC peaks were well separated and of approximately 25 sec. duration. The mass spectrometer was scanned at a rate of 10.5 sec/decade and a resolving power of 5000. The resulting mass spectra displayed peaks over a dynamic range of 100 to 1 and were automatically reduced to masses and elemental compositions without difficulty. Mass measurement accuracy appears to be 10 ppm over this dynamic range.

We have begun to exercise the GC/HRMS system on urine fractions containing significant components whose structures have not been elucidated on the basis of low resolution spectra alone. Whereas more work is required to establish system performance capabilities, two things have become clear: 1) GC/HRMS can be a useful analytical adjunct to our low resolution GC/MS clinical studies (Part B-ii), and 2) the sensitivity of the present system limits analysis to relatively intense GC peaks. This sensitivity limitation is inherent in scanning instruments, where one gives up a factor of 20-50 in sensitivity relative to photographic image plane systems in return for on-line data read-out. This limitation may be relieved by using television read-out systems in conjunction with extended channeltron detector arrays, as has been proposed by researchers at the Jet Propulsion Laboratory. We can nevertheless make progress in applying GC/HRMS techniques to accessible effluent peaks and can adopt the improved sensor capability when available.

These experiments have also shown that the ACME computer facility cannot reliably provide the rapid service required to acquire and file repetitive spectrometer scans.
This problem is to be expected in a heavily used time-shared facility without special configuration for high rate, real time support. Excepting possible requirements for real time data analysis (such as in a closed-loop system), this problem could be solved by implementing a large local buffer (e.g., disk) at the front-end data acquisition mini-computer. We are exploring this possibility in conjunction with the overall planning for computer support discussed below.

Data Analysis Algorithms

A. Peak Resolution

One of the significant trade-offs to be made in GC/HRMS is that of sensitivity versus resolution. In maintaining high instrument resolution (in the range of 5,000-10,000) while scanning fast enough to analyze a GC effluent peak (approximately 10 sec/decade), system sensitivity is constrained as discussed above. We have worked on a method for reducing instrument resolution requirements through more sophisticated computer analysis of a lower resolution output. In effect this transfers the burden of overlapping peak detection and mass determination to the computer instead of requiring inherently well resolved data out of the instrument. The advantage comes in better system sensitivity.

Unresolved peaks are separated by an analytical algorithm, the operation of which is based on a model peak derived from known singlet peaks in the data. Actual tabulated peak models are used rather than the assumption of a particular parametric shape (e.g., triangular, Gaussian, etc.). This algorithm provides an effective increase in system resolution by approximately a factor of three, thereby effectively increasing system sensitivity. By measuring and comparing successive moments of the sample and model peaks, a series of hypotheses is tested to establish the multiplicity of the peak, minimizing computing requirements for the usually encountered simple peaks.
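The idea of comparing moments of a sample peak against a tabulated model peak can be sketched in a few lines. This is a simplified illustration only: it uses just the second central moment, whereas the text refers to successive moments, and the function names and the 5% tolerance are assumptions. It relies on a standard identity for mixtures: if a sample peak is the sum of two copies of the model shape with fractional areas p and 1-p separated by d, its variance equals the model variance plus p(1-p)d².

```python
from typing import Sequence, Tuple


def moments(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Return (area, centroid, variance) of a sampled peak profile."""
    area = sum(ys)
    if area <= 0:
        raise ValueError("peak profile must have positive area")
    mean = sum(x * y for x, y in zip(xs, ys)) / area
    var = sum((x - mean) ** 2 * y for x, y in zip(xs, ys)) / area
    return area, mean, var


def looks_like_singlet(sample, model, tol: float = 0.05) -> bool:
    """Hypothesis test (assumed form): the sample is a singlet if it is
    no broader than the model peak, within a fractional tolerance."""
    _, _, var_sample = moments(*sample)
    _, _, var_model = moments(*model)
    return var_sample <= var_model * (1.0 + tol)


# Tabulated model peak (a made-up triangular profile, for illustration).
model_peak = ([0, 1, 2, 3, 4], [0, 1, 2, 1, 0])

# Synthetic doublet: two equal copies of the model, 3 sample units apart.
doublet_peak = (list(range(8)), [0, 1, 2, 1, 1, 2, 1, 0])

model_variance = moments(*model_peak)[2]      # 0.5
doublet_variance = moments(*doublet_peak)[2]  # 0.5 + 0.25 * 3**2 = 2.75
```

A peak that passes the singlet test needs no further work, which is how a scheme of this kind keeps computing cost low for the common simple case.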
Analytic expressions for the amplitudes and positions of component peaks have been derived in the doublet case in terms of the first four moments of the peak complex. This eliminates time consuming iteration procedures for this important multiplet case. Iteration is still required for more complex multiplets.

B. GC Analysis

The application of GC/MS techniques to clinical problems as described in Part B(ii) of this proposal has indicated the desirability of automating the analysis of the results of a GC/MS experiment. Analysis of the GC/MS output involves extracting, from the approximately 700 spectra collected during a GC run, the 50 or so representing components of the body fluid sample. The raw spectra are in part contaminated with background "column bleed" and in part composited with adjacent constituent spectra unresolved by the GC. We have begun to develop a solution to this problem, with promising results.

By using a disk-oriented matrix transposition algorithm, the array of 700 spectra by 500 mass samples per spectrum can be rotated to gain convenient access to the "mass fragmentogram" form of the data. The transposition algorithm avoids the many successive passes over the input data file that would be required in a straightforward approach. By generating a reorganized intermediate file, time savings by factors of 5-10 are achieved. The fragmentogram form of the data, displayed at a few selected mass values, has been used at Stanford, MIT, and elsewhere for some time to evaluate the GC effluent profile as seen from these masses. Mass fragmentograms have the important property of displaying higher resolution in localizing GC effluent constituents. Thus by transposing the raw data to the mass fragmentogram domain, we can systematically analyze these data for baselines, peak positions, and amplitudes, and thus derive better mass spectra for the relatively few constituent materials.
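The change of data layout described above can be shown with a small in-memory sketch. This is not the disk-oriented algorithm of the report, whose details are not given here; it only illustrates what the transposed "mass fragmentogram" form looks like and why it is convenient for locating GC peaks. The function names are hypothetical.

```python
from typing import List, Sequence


def to_fragmentograms(spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Transpose spectra[scan][mass_index] into fragmentograms[mass_index][scan].

    In-memory illustration only; a run of 700 scans by 500 mass samples
    would have been transposed on disk on the machines described.
    """
    if not spectra:
        return []
    width = len(spectra[0])
    if any(len(scan) != width for scan in spectra):
        raise ValueError("all spectra must have the same number of mass samples")
    return [list(column) for column in zip(*spectra)]


def peak_scan(fragmentogram: Sequence[float]) -> int:
    """Index of the scan at which one mass channel reaches its maximum."""
    return max(range(len(fragmentogram)), key=fragmentogram.__getitem__)


# Three scans, two mass channels (toy intensities).
raw_spectra = [
    [1.0, 0.0],
    [5.0, 2.0],
    [2.0, 9.0],
]
fragmentograms = to_fragmentograms(raw_spectra)
```

Here the two mass channels peak at different scans (1 and 2), which is the kind of separation in time that lets overlapping GC constituents be told apart.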
The derived spectra are free from background contamination and from the influences of adjacent GC peaks unresolved in the overall gas chromatogram. These spectra can then be analyzed by library search techniques or from first principles as necessary. We have applied a preliminary version of this algorithm to several urine samples. These contain several apparently simple peaks which in fact consist of multiple components. The algorithm performs well in separating out these constituents, although further testing is required.

Closed-loop Instrument Control

In the long term, it could be possible for the data interpretation software to direct the acquisition of data in order to minimize ambiguities in problem solutions and to optimize system efficiency. The task of deciding among and collecting various types of mass spectral information (e.g., high resolution spectra, low ionizing voltage spectra, or selected metastable ion information) under closed-loop control during a GC experiment is difficult. Problems arise because of the large requirements placed on computer resources and present limitations in instrument sensitivity or data read-out imposed by the time constraints of GC effluent peak widths. Solutions to these problems may not be economically feasible within currently existing technology but seem achievable in the future.

We are studying this problem in a manner which would entail a multi-pass (two or three pass) system. This permits the collection of one type of data (e.g., high resolution mass spectra) during the first GC/MS analysis. Processing of these data by DENDRAL will reveal what additional data are necessary on specific GC peaks during a subsequent GC/MS run. Such additional data could help to uniquely solve a structure or at least to reduce the number of candidate structures.
This simulated closed-loop procedure could demonstrate the utility of DENDRAL type programs to examine data, determine solutions and propose additional strategies, but will not have the requirement of operating in real-time.

Some parameters in the acquisition of particular types of information, such as metastable data, will require computer control, even in the open-loop mode. We have considered plans to implement two aspects of instrument control, in addition to the magnetic scan control implemented for GC operation and reported previously. These include system resolution control, such as would be required to change from normal spectrum scanning mode to metastable scanning mode, and high voltage control necessary to selectively measure metastable ion fragmentation data. In addition to these we have considered the discrete switching of various electronic mode controls, which are straightforward and not discussed in detail. Implementation plans for computer control of these instrument functions have been delayed because of the ACME computing facility transition, which diverted the necessary hardware and software manpower.

Resolution control involves changing the widths of the slits at the exit of the ion source and the entrance to the ion multiplier detector. Additional source and electrostatic analyzer voltages must be controlled to optimize performance, as discussed later. Mechanical slit adjustment is accomplished on the MAT-711 instrument by heating wires which support the slit jaws. The resulting expansion or contraction of those wires moves the spring-loaded jaws. As implemented by the manufacturer, the time constants involved in heating the control wires are 5-10 seconds. It is possible to speed this up to approximately 0.5 seconds by application of a controlled overvoltage decreasing to the appropriate equilibrium value for the desired slit width.
This was demonstrated by a series of experiments on an extra slit assembly mounted in a vacuum jar in our laboratory. Cooling of the wires is relatively fast in the way they are mounted, so no problem exists in that direction. It is desirable to have feedback to indicate the actual slit width achieved rather than relying on a slit assembly calibration. Stretching of the support wires or changes in the spring tension under temperature cycling would change this calibration. An optical scheme to measure slit width in situ is possible. We do not contemplate implementing this feedback immediately because it requires major changes to the instrument flight tube.

Two types of metastable ion relationships are obtainable by suitable control of the double-focussing instrument. First, for a given daughter ion, one can trace the parent ions which give rise to it. Second, for a given parent ion, one can trace the various daughters to which it gives rise. The first measurement ("metastable defocussing") is the more straightforward for this instrument since parent ions can be enumerated by a simple scan of the accelerating voltage, holding the electrostatic analyzer (ESA) voltage and magnetic field constant. The second type of scan requires the coordinated scan of two of the three fields. We feel that joint computer control of the accelerating voltage and ESA voltage is the simpler approach since the magnetic field is more difficult to set and monitor because of hysteresis effects. For a resolution of 1000 in the metastable ion mass measurement, the voltages must be set to approximately 0.01-0.02% accuracy. This requires a 14-16 bit digital-to-analog (D/A) converter to control the input (10 volts) to the operational amplifier which generates the high voltage. Similar D/A controls of ion source voltages for ion current and focus optimization can be implemented using optical isolators to allow vernier control of the various high voltages around the nominal 8 kV values.
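The link between the stated voltage accuracy and the D/A converter word length can be checked with a short calculation. The sketch below assumes the simplest possible criterion, that one least significant bit must be no larger than the tolerance as a fraction of full scale; the report does not state the criterion actually used, and the function names are hypothetical.

```python
def min_dac_bits(tolerance_ppm: int) -> int:
    """Smallest word length n such that 1 / 2**n <= tolerance (of full scale).

    Assumed criterion: one LSB no larger than the tolerance. The tolerance
    is given in parts per million to keep the arithmetic in integers.
    """
    if tolerance_ppm <= 0:
        raise ValueError("tolerance must be positive")
    steps = -(-1_000_000 // tolerance_ppm)  # ceiling of 1e6 / tolerance_ppm
    return (steps - 1).bit_length()         # smallest n with 2**n >= steps


def lsb_volts(bits: int, full_scale_volts: float = 10.0) -> float:
    """Voltage step of an ideal converter over the given full-scale input."""
    return full_scale_volts / (1 << bits)


bits_for_0_01_percent = min_dac_bits(100)  # 0.01% = 100 ppm
bits_for_0_02_percent = min_dac_bits(200)  # 0.02% = 200 ppm
step_at_14_bits = lsb_volts(14)            # about 0.61 mV on a 10 V input
```

Under this criterion 0.01% needs at least 14 bits and 0.02% needs 13, so the 14-16 bit range quoted in the text covers the tighter tolerance and leaves some margin for converter non-idealities.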
Computing Transition

As mentioned earlier, the transition of the ACME computing facility from NIH subsidy to Stanford-sponsored fee-for-service operation has impacted our development efforts this past year. Both the low resolution instrument used for routine body fluid analysis research and the high resolution instrument are affected. All computing support was previously obtained from the ACME facility, much of it as core research without explicit transfer of funds. The transition has required consideration of both technical and economic factors.

The new facility represents a combination of the previous ACME interactive and real time computing load with various administrative and batch computing loads on a new IBM 370/158 computer. This new environment will have even more difficulty in supporting real time computing needs than ACME did. No real time support has been available since the 360/50 service was discontinued on July 31, although terminal service was reestablished in mid-August. Data acquisition service via the IBM 1800 is expected to be operational by early November.

For the high resolution instrument, this transition, as a minimum, necessitates an interface modification (we previously sent data through the IBM 2701 interface, which is no longer to be supported). It also amplifies the problems we encountered in sending and filing high rate mass spectrometer data (particularly during GC/MS runs). These problems would be present to some extent in any general time-sharing service machine without specific hardware and software configuration provision for these needs (such provisions for real time support had been proposed in our SUMEX computer application). After examining a variety of alternatives, we conclude that a dedicated mini-computer solution (built around a machine with the arithmetic capability of a PDP-11/45) would be highly attractive technically and relatively inexpensive.
A stand-alone mini-computer system would cost in the range of $50,000-$60,000, augmenting existing equipment, plus approximately $9,000 per year for maintenance and $2,000 per year for supplies. Estimates for 370/158 support, based on current charging algorithms and previous utilization experience, run from $35,000 to $50,000 per year. This spread is caused by uncertainties in the effects of planned measures to increase operating efficiency and possible changes to the rate structure. In any case, the mini-computer approach pays for itself in 1 to 2 years of operation and provides the responsiveness of a dedicated machine for real time support.

Unfortunately our existing budget does not provide for this solution. The budget is very marginal for purchase of computing support from the 370/158 as well. This latter approach is the only currently available one, however, since it can be implemented with relatively low start-up cost. The effect of budget limitations appears in terms of a reduced number of samples which can be run. We have attempted to minimize the other budget costs (manpower principally) to increase the computing funds available. This will necessarily impact our development goals. We hope, in the renewal application for DENDRAL support, to be able to implement the more effective mini-computer approach for the high resolution spectrometer as a longer term solution.

We have undertaken an interim mini-computer solution for the low resolution spectrometer (Finnigan 1015 quadrupole), which is primarily used for our body fluid analysis studies. For the same reasons outlined above, a mini-computer solution is attractive. In the case of the low resolution quadrupole instrument, a lesser capacity machine will suffice for immediate data acquisition and display functions. We have implemented such an interim system on a PDP-11/20 machine available from other funding sources.
This system, which is now operational, allows the acquisition of GC/MS data, limited by the capacity of the DEC tape storage medium to approximately 600 spectra per experiment. For certain types of GC analyses, up to 1090 spectra per experiment are required, so this limits, to some extent, the utility of this interim system. A Calcomp plotter is supported for display purposes. A fixed head disk provides for library search procedures, which are still being converted from the ACME system. We have applied to the NIH-GMS for funds to augment this system in order to relieve current limitations as part of a Genetics Center research proposal.

FUTURE PLANS

Our future plans are basically to continue development along the lines outlined above. We will complete the computing support transition steps described. These include primarily establishing a connection to the new 370/158 facility to provide interim support for the high resolution system. We will pursue additional software and hardware development goals as far as possible within the limited budget available. These efforts will concentrate for the most part on bringing up a metastable ion analysis data system. It should be reemphasized that the manpower levels proposed in the follow-on budget have been minimized to allow for purchasing computing time on the 370/158. The allocated manpower is required primarily for instrument operation and maintenance, with minimal provision for development efforts.

Part B(ii) - Analysis of Body Fluids by Gas Chromatography/Mass Spectrometry.
The chemical separation of urine into the following fractions prior to GC/MS analysis has been described in previous DENDRAL Reports:

free acids (analyzed by GC/MS as their methyl esters)
amino acids (analyzed by GC/MS as their N-trifluoroacetate n-butyl esters)
carbohydrates (analyzed by GC/MS as their trimethylsilyl ether derivatives)
hydrolyzed acids (analyzed by GC/MS as their methyl esters)
hydrolyzed amino acids (analyzed by GC/MS as their N-trifluoroacetate n-butyl esters)

During the past year we have extended these methods of fractionation to the following body fluids: blood (after an initial precipitation of proteins by the addition of ethanol) and amniotic and cerebrospinal fluids. The following summarizes the results obtained from an analysis of these fluids during the past year by gas chromatography-mass spectrometry.

URINE ANALYSIS:

A. The Development of a "Metabolic" Profile Characteristic of Neonatal Tyrosinemia Using Combined Gas Chromatography-Mass Spectrometry.

This work was carried out in collaboration with clinical colleagues from the Department of Pediatrics at Stanford University, and a joint publication describing this research is in preparation. The study was based on a total of one hundred and four 24-hour urine samples from sixteen premature or small birthweight infants receiving treatment in the Stanford nursery. After exclusion of infants who became ill, died, or left the nursery, we were able to follow nine infants closely for periods of between 4 and 6 weeks from day 3 of life. All nine infants had birthweights below 1500 g, and three of these were below 1000 g.

Of the nine infants studied, five showed transient tyrosinemia, as shown by a marked elevation in the urinary excretion of the tyrosine metabolites p-hydroxyphenyllactic acid, p-hydroxyphenylpyruvic acid and p-hydroxyphenylacetic acid. There was also a less marked but distinct elevation in the urinary tyrosine output.
Figures 1 and 2 show the metabolic profiles of the same infant (J.L.) in the normal (a) and tyrosinemic (b) states. Figure 1 shows the free acid outputs, chromatographed as the methyl ester-methyl ether derivatives, and Figure 2 is an expression of the free amino acids of the same urines, chromatographed as the N-trifluoroacetyl n-butyl ester derivatives. In each case the concentration of each metabolite is a function of the peak height as compared to the height of the internal standard. Table 1 is a summary of the ranges of urinary output of tyrosine and metabolites observed for all the infants in the study.

TABLE 1
Daily Excretion in mg/kg

              Tyrosine   p-HPLactic   p-HPPyruvic   p-HPAcetic
Normal        0.2 - 3    0 - 5        0 - 0.5       0.2 - 2
Tyrosinemic   3 - 15     5 - 50       0.5 - 5       0.5 - 5

As shown by Table 1 and Figure 1, neonatal tyrosinemia is characterized by a very large increase in the output of p-hydroxyphenyllactic acid and by a 10-50 fold excess of the latter over p-hydroxyphenylpyruvic acid. Studies of the hereditary defects in tyrosine metabolism initially indicated that p-hydroxyphenylpyruvic acid was the major metabolite, although more recently cases have been reported where p-hydroxyphenyllactic acid is in a 2-5 fold excess over p-hydroxyphenylpyruvic acid. These latter determinations were made using GC and GC/MS methods and therefore probably reflect the improved specificity of the analytical procedure (previously colorimetric methods were used) rather than a difference of actual metabolic profile. Apart from the very large excess of p-hydroxyphenyllactic acid over its keto analog, we could detect no significant differences between the profiles shown in neonatal tyrosinemia and those published for hereditary disease. Other metabolites such as p-hydroxymandelic acid, DOPA, and N-acetyltyrosine, which have previously been reported in tyrosinemic urine, were not seen to be elevated.

B. GC/MS Analysis of Urine from Children Suffering from Leukemia.
This research was carried out with twenty 24-hour urine samples supplied by Drs. Jordan Wilbur and Tom Long of the Stanford Children's Hospital. The acidic fraction of all urines studied in this project showed no abnormal metabolites, nor were gross amounts of known acids detected. The amino acid fraction of six of the urine samples, however, showed the presence of a non-protein amino acid, beta-aminoisobutyric acid (BAIB). In several of these instances the patients were excreting in excess of 1 gram of BAIB per day. The literature contains many references to increased BAIB excretion (genetic excretors, lead poisoning, pulmonary tuberculosis, march hemoglobinuria, thalassaemia and Down's Syndrome). The reported excretion of BAIB by leukemic patients was not substantiated by another investigator. There are several criticisms in the literature of the methods used for the quantitation of BAIB in biological fluids, and in order to fill this void a sensitive, specific and rapid method for the quantitation of BAIB has been developed. (SEE: The Quantitation of BAIB in Urine by Mass Fragmentography; W.E. Pereira, R. Summons, W.E. Reynolds, T.C. Rindfleisch and A.M. Duffield, in press).

C. GC/MS Analysis of Urine from Patients Suffering from Hodgkin's Disease.

During this study 20 urine samples from patients with diagnosed Hodgkin's Disease (Department of Oncology, Stanford University Medical Center) were analyzed and, in general, no abnormal metabolic profile could be found in any urine. There was one exception, in which an individual was noted to excrete massive quantities of adipic acid (of the order of 1 gram per day).

D. Detection of Metabolic Errors by GC/MS Analysis of Body Fluids.

This project results from a collaborative effort between the Departments of Genetics and Pediatrics of the Stanford University Medical Center.
To date over 50 samples have been analyzed, the majority (35) being urine, while amniotic fluid (10), blood (6), and cerebrospinal fluid (6) were also analyzed. It has been and will continue to be our practice to analyze, in collaboration with clinical investigators, aliquots of fluid samples obtained for valid diagnostic purposes completely divorced from this research on GC/MS analysis techniques. This investigation is not intended to serve as a screening program for a large population but rather to focus on those individuals who exhibit suggestive clinical manifestations, such as psychomotor retardation and progressive neurologic disease, as well as suggestive pedigrees.

In the case of amniotic fluid the hope is to be able to monitor the condition of the fetus in those pregnancies which might be considered at risk. To date we have investigated specimens from normal pregnancies in order to establish the catalog of compounds to be observed in amniotic fluid. From this base it could prove possible to identify materials which might indicate the health of the fetus.

We have been able to confirm the presence of orotic acid in a urine from a person found to have orotic aciduria, while another urine sample was used to demonstrate our ability to identify the characteristic metabolites present in isovaleric acidemia.

The following description refers to a urine from a child with hypophosphatasia. A child died 33 hours after birth in Fresno, California, with the classical signs of hypophosphatasia. This genetic defect is marked by high phosphoethanolamine (PEA) concentrations in urine of affected homozygotes and unaffected heterozygotes. After derivatization (in this instance the TMS ethers of the water soluble carbohydrate fraction were prepared) we were able to detect by GC/MS large concentrations of ethanolamine and phosphoric acid but not PEA itself. The derivatization procedure we used most likely hydrolyzed PEA.
We were able to quantitate this compound in the infant's urine using an amino acid analyzer, and PEA excretion was extremely high (over 200 times normal values for infants), confirming the diagnosis. Next we examined urine samples from the child's parents, presumed heterozygotes, by GC/MS and by the amino acid analyzer. Again, no PEA was detected by the former method although the presence of ethanolamine and phosphoric acid was demonstrated. We determined the following excretion levels of PEA by amino acid analyzer:

Newborn infant: 94 micromoles per 100 ml (normal 0.21-0.33)
Father: 269 micromoles per 24 hours (normal 17-99)
Mother: 32 micromoles per 24 hours (normal 17-99)

It is of interest that in this family the affected infant and his unaffected father both show subnormal serum alkaline phosphatase activity. The mother, who did not excrete increased amounts of PEA, was found to have normal activity of this enzyme in her serum. The following table summarizes the serum phosphatase activity measurements:

Newborn infant: 0.2 units* (normal 2.8-6.7)
Father: 0.7 units (normal 0.8-2.3)
Mother: 3.4 units (normal 0.8-2.3)

(* 1 unit is that phosphatase activity which will liberate 1 millimole of p-nitrophenol per hour per liter of serum)

E. Drug Analysis Service Using GC/MS

We were recently contacted by physicians to rapidly identify a drug self-administered by a patient in the Stanford University Hospital. From the mass spectrum the drug was identified as pentazocine within the hour. Although this service is not part of the formal DENDRAL proposal, we expect that similar cases may arise in the future and we intend to respond positively to such requests.

Development of Library Search Routines for Mass Spectrum Identification

The analysis of a single body fluid fraction produces between 600 and 750 mass spectra.
In order to cope with the interpretation of the daily production of mass spectra (about 8 body fluid fractions for a total of between 4,800 and 6,000 mass spectra) we have begun the implementation of library search routines. Concurrent with the analysis of body fluids for metabolic content we have been recording the mass spectra of many reference compounds. This collection represents the beginning of the construction of a library of reference spectra. Late in 1973 we expect to receive from Dr. S. Markey, University of Colorado Medical Center, a more comprehensive library which he has collated from contributors (including our own laboratory) in the field of biological applications of gas chromatography/mass spectrometry.

Prior to the demise of the ACME computer facility at Stanford University, we ran library search routines on data collected from urine fractions. Because the ACME system was heavily loaded, our programs took about one minute per compound identification. However, the experience gained will be used to implement library search routines on our current PDP-11 GC/MS data system.

In addition we have sent mass spectra from several urine analyses to Dr. S. Grotch, Jet Propulsion Laboratory, Pasadena, California, in order that he could use his library search routines on real data. In this instance the limiting factor for efficient compound identification was the library content, which was limited to a few compounds of biological significance. In addition, those compounds of interest that were present in the library were often in a derivatized form different from that used in our analytical methodology.

Application of GC/HRMS to Body Fluid Analysis

We reported in the last annual report of the DENDRAL project that the Varian MAT 711 mass spectrometer was interfaced with a gas chromatograph for the recording of low resolution mass spectra. We have now used this system for the recording of HRMS of gas chromatographic fractions from urine analyses.
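As an aside on the library search routines described above, the following is a minimal sketch of how an unknown spectrum might be ranked against a library of reference spectra. It is an illustration only, not the routine run on ACME or planned for the PDP-11: the cosine-similarity score, the function names, and the peak lists (which are invented and do not correspond to any real compound) are all assumptions.

```python
import math

def normalize(spectrum):
    """Scale a spectrum (dict of integer m/z -> intensity) to unit length."""
    norm = math.sqrt(sum(v * v for v in spectrum.values()))
    if norm == 0:
        return {}
    return {mz: v / norm for mz, v in spectrum.items()}

def similarity(a, b):
    """Cosine similarity of two spectra; 1.0 means identical peak patterns."""
    na, nb = normalize(a), normalize(b)
    return sum(na[mz] * nb[mz] for mz in na.keys() & nb.keys())

def search_library(unknown, library, top=3):
    """Return the `top` best (score, name) pairs, best match first."""
    scored = [(similarity(unknown, ref), name) for name, ref in library.items()]
    scored.sort(reverse=True)
    return scored[:top]

# Hypothetical reference spectra (invented peak lists, for illustration only).
LIBRARY = {
    "reference A": {41: 30, 57: 100, 84: 45, 140: 10},
    "reference B": {43: 100, 58: 60, 86: 20},
    "reference C": {41: 25, 57: 90, 85: 50, 152: 5},
}

unknown = {41: 28, 57: 100, 84: 40, 140: 12}
best = search_library(unknown, LIBRARY)
print(best[0][1])  # name of the closest reference spectrum
```

A score of this kind only works when the unknown and the reference were prepared as the same derivative, which is the limitation noted above for the externally supplied library.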
We were able to record HRMS scans over several gas chromatographic peaks of interest in a number of urine fractions. The high resolution results were found to be of high quality in mass measurement accuracy. When using the MAT 711 instrument for GC/HRMS the sensitivity of the ion source was a limiting factor, in that less intense gas chromatographic peaks often lacked sufficient material to generate acceptable high resolution mass spectra. Notwithstanding this limitation, the HRMS data recorded on different urine fractions were used to confirm the identification of several metabolites. If by chance the metabolite of concern was available only in quantities insufficient for direct GC/HRMS, preparative GC would be used to concentrate the component of interest for subsequent HRMS.

RESOURCE OPERATION

Over the term of this grant our mass spectrometry laboratories have provided support to numerous research projects in addition to the DENDRAL computer program development project funded under this grant. These cover a variety of applications at Stanford, in the United States, and abroad. Included are problems in the study of human metabolites, biochemistry, and natural product chemistry. Samples have been run in collaboration with outside people both on the MAT-711 GC/High Resolution Mass Spectrometer system and the Finnigan 1015 GC/Low Resolution Quadrupole Mass Spectrometer system. The low resolution system has also been supported by a NASA research grant. The following tables summarize the support rendered in terms of numbers of samples run through various types of analysis:

I. MAT-711 High Resolution System (period covered 11/71 - 6/73).

Columns: Batch High Resol. MS; Batch Low Resol. MS; GC/High Resol. MS; GC/Low Resol. MS

DENDRAL program devel.: 317, 3
Stanford Genetics (body fluid analysis): 39, 7, 13
Stanford Chemistry (non-DENDRAL - Dr. Djerassi's group): 91, 112, 50
Stanford Chemistry (non-DENDRAL - Drs. van Tamelen, Johnson, Mosher, Collman, Altman, Goldstein): 29, 23, [illegible]
Stanford Surgery (Dr. Fair): 8
Dr. Adlerkreutz (Finland): 10
Dr. Venien (France): 26
Drs. Gilbert, Mors, Baker (Brazil): 40, [illegible]
Dr. Orazi (Argentina): 19, 1
Dr. Subramanian (India): 10, 5
Dr. Khastgir (India): 5
Dr. O'Sullivan (Ireland): 5
Dr. Badr (Libya): 30
Dr. Mital (India): 5

Totals: 624 samples; 215 samples; 13 samples; 54 samples

II. FINNIGAN 1015 Low Resolution System (period covered 8/72 - 8/73).

Note: the samples run are specified by fluid type. Each fluid is extracted and derivatized as described in Part B (ii) and therefore may represent several GC/LRMS analyses. Specific results of various of the analyses run are discussed earlier in Part B (ii).

Collaborators:

Stanford Pediatrics (Drs. Cann, Sunshine and Johnson)
Stanford Oncology (Dr. Rosenberg)
Stanford Psychiatry - Genetics (Drs. Brodie and Cavalli-Sforza)
Stanford Respiratory Medicine (Dr. Robin)
Stanford Pharmacology (Dr. Kalman)
Stanford Biochemistry (Dr. Stark)
Stanford Children's Hospital (Drs. Wilbur and Long)
UC San Francisco Medical School - Dermatology (Dr. Banda)
Menlo Park V.A. Hospital (Dr. Forrest)
Palo Alto V.A. Hospital (Drs. Hollister and Green)
University of Puerto Rico School of Medicine (Dr. Garcia-Castro)

GC/Low Resolution MS, as printed: 24, 13 urines; amniotic fluids; bloods; cerebrospinal fluids; urines; cerebrospinal fluids; urines; bloods; extracts; extracts; urines; urines; extracts; extracts; urines.

Total: 243 samples

PART B PUBLICATIONS

The following summarizes the publications resulting from research in the low resolution mass spectrometry laboratory over the past year, including body fluid analysis. This laboratory has been jointly supported by NIH (DENDRAL) and NASA. The listed publications include research relevant to both sponsors.

The Determination of Phenylalanine in Serum by Mass Fragmentography.
Clinical Biochem., 6 (1973)
By W.E. Pereira, V.A. Bacon, Y. Hoyano, R. Summons and A.M. Duffield.
The Simultaneous Quantitation of Ten Amino Acids in Soil Extracts by Mass Fragmentography.
Anal. Biochem., 55, 236 (1973)
By W.E. Pereira, Y. Hoyano, W.E. Reynolds, R.E. Summons and A.M. Duffield.

An Analysis of Twelve Amino Acids in Biological Fluids by Mass Fragmentography.
Anal. Chem.
By R.E. Summons, W.E. Pereira, W.E. Reynolds, T.C. Rindfleisch and A.M. Duffield.

The Quantitation of β-Aminoisobutyric Acid in Urine by Mass Fragmentography.
Clin. Chim. Acta, in press
By W.E. Pereira, R.E. Summons, W.E. Reynolds, T.C. Rindfleisch and A.M. Duffield.

The Determination of Ethanol in Blood and Urine by Mass Fragmentography.
Clin. Chim. Acta
By W.E. Pereira, R.E. Summons, T.C. Rindfleisch and A.M. Duffield.

A Study of the Electron Impact Fragmentation of Promazine Sulphoxide and Promazine using Specifically Deuterated Analogues.
Austral. J. Chem., 26, 325 (1973)
By M.D. Solomon, R. Summons, W. Pereira and A.M. Duffield.

Mass Spectrometry in Structural and Stereochemical Problems. CCXXXVII. Electron Impact Induced Hydrogen Losses and Migrations in Some Aromatic Amides.
Org. Mass Spectrom., in press.
By A.M. Duffield, G. DeMartino and C. Djerassi.

Spectrometrie de Masse. IX. Fragmentations Induites par Impact Electronique de Glycols-α en Serie Tetraline.
Bull. Soc. Chim. France, 2105 (1973).

Spectrometrie de Masse. VIII. Elimination d'eau Induite par Impact Electronique dans le Tetrahydro-1,2,3,4-Naphtalene-diol-1,2.
Org. Mass Spectrom., 7, 357 (1973).
By P. Perros, J.P. Morizur, J. Kossanyi and A.M. Duffield.

Chlorination Studies I. The Reaction of Aqueous Hypochlorous Acid with Cytosine.
Biochem. Biophys. Res. Commun., 48, 880 (1972)
By W. Patton, V. Bacon, A.M. Duffield, B. Halpern, Y. Hoyano, W. Pereira and J. Lederberg.

Chlorination Studies II. The Reaction of Aqueous Hypochlorous Acid with α-Amino Acids and Dipeptides.
Biochim. et Biophys. Acta, 313, 170 (1973).
By W.E. Pereira, Y. Hoyano, R. Summons, V.A. Bacon and A.M. Duffield.

Chlorination Studies IV.
The Reaction of Aqueous Hypochlorous Acid with Pyrimidine and Purine Bases.
Biochem. Biophys. Res. Commun., 53, 1195 (1973).
By Y. Hoyano, V. Bacon, R.E. Summons, W.E. Pereira, B. Halpern and A.M. Duffield.

Part C. EXTENSION OF THE THEORY OF MASS SPECTROMETRY BY COMPUTER

OBJECTIVES:

Part C of the DENDRAL effort, termed Meta-DENDRAL, aims at providing theory formation help for chemists interested in the mass spectrometric behavior of new classes of compounds. Our goals are necessarily long-range because theory formation by computer is itself an exciting, unsolved problem in computer science. We have chosen to explore this problem in the context of mass spectrometry in order to make frontier computer research results available to working scientists.

The problem of finding judgmental rules for use in a computer program is common to many biomedical computing projects, such as medical diagnosis and therapy recommendation programs. In order to give these programs the knowledge that makes them perform at acceptable levels, a medical expert is often asked to summarize his own knowledge of the problem area in rules that the program can use. The Meta-DENDRAL theory formation program is a paradigm for the kind of assistance that computers can give to the medical experts in this role. Programs of this sort can, first of all, provide the expert with an interpreted summary of a large collection of "hard" empirical data. Second, the program can suggest to the expert plausible rules that appear to explain major features of the data. Thus, the expert is able to assimilate large collections of data in the rules given to the computer. We believe that the Meta-DENDRAL work is a useful model on which fruitful work in other biomedical problems can be based.

The over-all strategy of this research is to model the theory formation activity of scientists. We start with a set of empirical data which are known molecular structures and their associated mass spectra.
By exploring the possible mechanistic explanations of each mass spectrum, the program is able to find a set of mechanisms that appear to be characteristic for the class of molecules. These characteristic mechanisms constitute the general mass spectrometry rules for the class, or a first-level theory for the class. Further refinements of the rules give more sophisticated restatements of the theory. We have designed the programs in such a way as to provide useful results from the intermediate steps. The progress section discusses several sets of results that have been obtained, even though the entire program has not yet been completed.

PROGRESS:

In the past ten months (since January, 1973) the theory formation programs have seen significant application and significant new extensions. In addition, the work has been described in publications for both chemists and computer scientists.

Applications of Existing Programs.

The INTSUM program, for interpreting and summarizing the mass spectra of many known compounds of one class, was described in the previous annual report as essentially finished. In this last period we have used this program to help understand the mass spectrometry of several classes of compounds, including estrogens, equilenins and other estrogenic steroids, androstanes, alkyl pregnanes, vinyl quinazolones, amino acids and aromatic acids. An article written for mass spectroscopists and soon to appear in Tetrahedron (Smith, et al., enclosed) describes this program and its usefulness in understanding the previously unreported mass spectrometry of the equilenins. The amino acid and aromatic acid results are useful for interpreting the mass spectra taken from those fractions of urine (see Part B).
The INTSUM program is available to anyone who requests it, as stated in the article soon to appear. Because of the complexity of the program, we recommend that mass spectroscopists use this program on a network computer after they have collected a number of mass spectra from a class of compounds whose fragmentation mechanisms they wish to investigate.

Recent Extensions to Meta-DENDRAL.

In this last period significant progress has been made on the theory formation programs that use the interpreted summary of the data provided by the INTSUM program. A simple rule formation program, described previously (HI7), finds the characteristic mass spectrometry mechanisms for a class of compounds, assuming that the compounds exhibit regular behavior as a class. Recent work has removed the restriction that the compounds must behave as a class: important classes can be found by the program within the set of given compounds. The procedure was described in a paper for the Third International Joint Conference on Artificial Intelligence, which is enclosed. At the same time that the rule formation program looks for characteristic mechanisms, the class separation procedure refines the class of molecules that appear to behave uniformly (i.e., appear to exhibit most of the characteristic mechanisms).

Another important extension of the theory formation program makes the rule descriptions more general and less specific to the class of compounds studied. The mechanisms in the rules are now described generally in terms of the kinds of bonds that break, and not in terms of the precise relations of the bonds to the skeletal structure common to the class.
For example, a rule is now stated as "Any bond that is the second bond from a nitrogen atom is likely to break", rather than "In the skeleton R1-C2-N3-C4-R5 the bond between atoms 1 and 2 and the bond between atoms 4 and 5 are both likely to break". These general descriptions will allow much more freedom in the kinds of interpretations that can be placed on the INTSUM results. It is possible, for example, to alter the set of predicates used to describe bonds without altering the program.

The program can be conceptualized as a search program through the space of possible combinations of predicates. Some predicates describe the type of bond (e.g., 'single'), others describe the atoms joined by the bond (e.g., 'nitrogen', 'secondary'), and others describe the bonds and atoms next away from the bond that breaks. Some a priori heuristics limit consideration of complex predicates to chemically meaningful combinations, for example, by forbidding consideration of a single atom as both carbon and nitrogen. Other heuristics guide the process of expansion by forbidding a new predicate to be added to a description if its addition reduces the explanatory power of the existing description. For example, if a high average intensity is associated with breaking the X-X bond in Y-X-X and further specification of either of the X's reduces the average intensity, then the description is not changed.

In addition to the work just mentioned, a generative model of rule formation has been pursued by Carl Farrell in his dissertation work directed by Professor Feigenbaum and Dr. Buchanan. He has written a program which accepts, as input, descriptions of specific molecules and all the primitive actions that might explain the mass spectra of those molecules. The output of the program is a set of general situation-action rules that describe classes of molecules that seem characteristically to show evidence of significant actions.
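The search through combinations of predicates described above can be illustrated with a minimal sketch. It is not the Meta-DENDRAL program: the attribute names, the greedy strategy, the use of average intensity as the measure of explanatory power, and the observations themselves (invented numbers) are assumptions made for illustration. A description is a set of attribute-value predicates on the breaking bond; a predicate is added only when it raises the average intensity of the bonds the description covers, and each attribute can hold only one value, so a contradictory description (an atom that is both carbon and nitrogen) cannot be formed.

```python
from statistics import mean

# Each observation: (features of the bond that broke, observed intensity).
# Invented data, for illustration only.
OBSERVATIONS = [
    ({"type": "single", "atom1": "carbon", "atom2": "carbon", "neighbor": "nitrogen"}, 80.0),
    ({"type": "single", "atom1": "carbon", "atom2": "carbon", "neighbor": "nitrogen"}, 70.0),
    ({"type": "single", "atom1": "carbon", "atom2": "carbon", "neighbor": "carbon"}, 10.0),
    ({"type": "single", "atom1": "carbon", "atom2": "nitrogen", "neighbor": "carbon"}, 5.0),
    ({"type": "double", "atom1": "carbon", "atom2": "carbon", "neighbor": "carbon"}, 2.0),
]

def matches(description, bond):
    """True if the bond satisfies every predicate in the description."""
    return all(bond.get(attr) == value for attr, value in description.items())

def average_intensity(description, observations):
    """Average intensity over the bonds covered by the description."""
    hits = [inten for bond, inten in observations if matches(description, bond)]
    return mean(hits) if hits else 0.0

def candidate_predicates(description, observations):
    """Predicates on attributes the description does not yet constrain."""
    seen = set()
    for bond, _ in observations:
        for attr, value in bond.items():
            if attr not in description:
                seen.add((attr, value))
    return sorted(seen)

def refine(description, observations):
    """Greedily add the predicate that most raises the average intensity;
    stop, leaving the description unchanged, when none raises it."""
    current = dict(description)
    score = average_intensity(current, observations)
    while True:
        best = None
        for attr, value in candidate_predicates(current, observations):
            trial_score = average_intensity({**current, attr: value}, observations)
            if trial_score > score and (best is None or trial_score > best[0]):
                best = (trial_score, attr, value)
        if best is None:
            return current, score
        score, attr, value = best
        current[attr] = value

rule, rule_score = refine({}, OBSERVATIONS)
print(rule, rule_score)
```

With these invented observations the search settles on the single predicate that the neighboring atom is nitrogen; specifying the bond type or the bonded atoms further does not raise the average intensity, so the description is left unchanged, in the spirit of the X-X example above.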
PLANS:

In the following period we plan to increase the performance capabilities of the theory formation program in several ways.

1. Sample Selection. The program's current strategy is to find the rules exhibited by most or all of the molecules in the initial sample. If the molecules are diverse, the rules will be diverse. Thus, we plan to add a preprocessor that can select a "simple" set of molecules for the rule formation to work with. For example, unbranched (straight-chain) compounds should be expected to present fewer complications for initial theory formation than highly branched compounds. The effects of the complicating features can be studied after the simple rules have been found.

2. Rule Clarification. After simple rules have been found, we want the program to clarify the conditions under which the rules hold. By studying more complicated molecules, the program can find the simple rules that no longer hold for these cases. For example, we want the program to discover that terminal alpha carbons (as in CH3-X-N) are special. Or, the program should discover the effects of double bonds by examining new cases even though the molecules in the original set contained no double bonds.

3. Experimentation. Because the original set of molecules contains the simpler examples from which it is easier to find characteristic mechanisms, the program will need to clarify rules in the way suggested under (2). For a human scientist, this means describing new experiments to perform that will help place limits on the range of applicability of the rules. Looking at additional arbitrary molecules may be helpful, but not as helpful as looking at the specific molecules that will resolve specific questions about the preliminary rule set.

4. Integration of Results. When the program has examined two or more classes of molecules, it should be able to integrate the results into a common set of mechanisms (if any are common).
The set of predicates used by the integration program may not have to be wider than the set used by the rule formation program, but one would expect the rules themselves to be more general. For example, integrating aliphatic amine and ether results should combine the separate alpha-cleavage rules (one with nitrogen, one with oxygen) into a more general rule (specifying 'N or O', or 'heteroatom').

PART C REFERENCES (Published or submitted during this year)

D.H. Smith, B.G. Buchanan, W.C. White, E.A. Feigenbaum, C. Djerassi and J. Lederberg, "Applications of Artificial Intelligence for Chemical Inference X. INTSUM. A Data Interpretation Program as Applied to the Collected Mass Spectra of Estrogenic Steroids". Tetrahedron. In press.

B.G. Buchanan and N.S. Sridharan, "Analysis of Behavior of Chemical Molecules: Rule Formation on Non-Homogeneous Classes of Objects". In proceedings of the Third International Joint Conference on Artificial Intelligence, Stanford University (August, 1973). (Also Stanford Artificial Intelligence Project Memo No. 215.)

Related Publications

D. Michie and B.G. Buchanan, "Current Status of the Heuristic DENDRAL Program for Applying Artificial Intelligence to the Interpretation of Mass Spectra". August, 1973.

E.H. Shortliffe, S.G. Axline, B.G. Buchanan, T.C. Merrigan and S.N. Cohen, "An Artificial Intelligence Program to Advise Physicians Regarding Antimicrobial Therapy". Computers & Biomedical Research. In press.

HUMAN SUBJECTS

As a part of this research project, GC/MS analysis techniques will be applied to human body fluids in collaboration with clinical investigators, and blood and urine specimens will be collected from human subjects. Collection of VOIDED URINE SPECIMENS presents no risk to the patient. Blood samples will not be taken solely for the purpose of this research but rather would be collected as part of a diagnostic procedure deemed necessary for clinical diagnosis.
The undersigned agrees to accept responsibility for the scientific and technical conduct of the project and for provision of required progress reports if a grant is awarded as the result of this application.

Carl Djerassi
Principal Investigator

APPENDIX A

FIGURES 1-3

[Table of mass differences; entries and footnote not legible.]

FIGURE 1 [not legible]
FIGURE 2. [High resolution mass spectrum data reduction printout, run on the MAT 711. File MPHO1594; source: urine fraction, amino acids (Gehrke technique); ID GVP-17H; date run 730313. Number of peaks found = 170; 38 calibration masses with last one = 455.0; matching tolerance = 4.000 mmu. The printout tabulates the calibration masses with their projection errors, followed by each measured mass with its area, candidate elemental compositions and error in mmu. The numeric listing is not legible.]
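The Figure 2 printout matches each measured mass against candidate elemental compositions within a tolerance of 4.000 mmu. The sketch below illustrates that kind of matching under stated assumptions; it is not the data reduction program that produced the printout. The function name, the restriction to C, H, N and O, and the limits on nitrogen and oxygen counts are hypothetical choices, and no chemical valence check is applied. The atomic masses are the standard monoisotopic values.

```python
# Monoisotopic atomic masses (carbon-12 scale).
MASS = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}

def match_compositions(measured, tolerance_mmu=4.0, max_n=2, max_o=5):
    """List C/H/N/O compositions whose exact mass lies within the tolerance
    (in millimass units) of the measured mass, smallest error first."""
    tol = tolerance_mmu / 1000.0
    results = []
    for c in range(int(measured // MASS["C"]) + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                heavy = c * MASS["C"] + n * MASS["N"] + o * MASS["O"]
                rest = measured - heavy
                if rest < -tol:
                    break  # more oxygen only makes the sum heavier
                # The tolerance is far below 1 mass unit, so only the
                # nearest whole number of hydrogens can match.
                h = round(rest / MASS["H"])
                if h < 0:
                    continue
                exact = heavy + h * MASS["H"]
                error_mmu = (measured - exact) * 1000.0
                if abs(error_mmu) <= tolerance_mmu:
                    results.append(({"C": c, "H": h, "N": n, "O": o}, error_mmu))
    results.sort(key=lambda item: abs(item[1]))
    return results

# 41.03931 is the first measured mass legible in the Figure 2 listing.
candidates = match_compositions(41.03931)
for composition, error in candidates:
    print(composition, round(error, 3))
```

A tighter tolerance shortens the candidate list for each peak, which is why the mass measurement accuracy of the high resolution scans matters for confirming metabolite identifications.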
FIGURE 3. GC/LRMS (FINNIGAN 1015 QUADRUPOLE MASS SPECTROMETER). SAMPLE: GLUTAMIC ACID N-TFA O-n-BUTYL ESTER DERIVATIVE. [Mass spectrum plot; axis ticks from 40 to 480; plot label: AMNIOTIC FLUID D408, FILE 344.]

APPENDIX B

LETTERS OF INTEREST

STANFORD UNIVERSITY
STANFORD, CALIFORNIA 94305
DEPARTMENT OF CHEMISTRY

December 17, 1973

Professor Carl Djerassi
Department of Chemistry
Stanford University
Stanford, California 94305

Dear Carl:

I am writing to indicate the anticipated use of mass spectral facilities by my research group in the foreseeable future. As has been true in the past, we plan to utilize both GC/HRMS and simple HRMS for various purposes, especially 1) the determination of structure of enzymic cyclization products, including members of the lanosterol class, derived from squalene oxide-like substrates, the purpose being the elucidation of the mechanism of enzymic steroid synthesis, and 2) the characterization and confirmation of structures of intermediates in the synthesis of natural products, including polycyclic terpenoids, alkaloids of physiological interest, and nucleosides, and 3) identification and/or structure determination of organic materials employed in our organic-inorganic program devoted to nitrogen fixation and related processes.

Very truly yours,

E. E. van Tamelen
Professor of Chemistry

EEvT/jlb

OFFICE MEMORANDUM • STANFORD UNIVERSITY

Date: December 3, 1973
To: Carl Djerassi
From: Keith Hodgson
Subject: Response to inquiry about GC/HRMS facility

In response to your three questions concerning the potential use of upgraded GC/HRMS facilities:

1.
Yes, especially in the study of certain biological ligands and lower molecular weight ligand-metal complexes.

2. Potential use of the facility might run in the range of 8-10 samples per year, most of which probably would be handled most easily by simple HRMS.

3. No, no research is cur[rently] [illegible] for the next 6 mon[ths] (at least) [illegible] by NIH.

OFFICE MEMORANDUM • STANFORD UNIVERSITY

Date: December 3, 1973
To: Professor Carl Djerassi
From: James P. Collman, Professor of Chemistry
Subject:

Please excuse our belated response to your inquiry of November 20 concerning a potential upgrading of mass spectrometry facilities.

The service you mention in your memo of the 20th would be valuable to us. We would have significant use for the GC/HRMS for a project dealing with models for cytochrome P450-based monooxygenases currently supported by the NIH.

OFFICE MEMORANDUM • STANFORD UNIVERSITY

Date: December 13, 1973
To: Professor Djerassi
From: Professor Harry S. Mosher
Subject: Your proposal to the NIH

On our NIH Grant on the investigation of animal toxins we have been studying natural products from the skin of Central American frogs (atelopidtoxin) and some products from marine animals (nudibranchs) as well as some new choline esters isolated from the hypobranchial gland of various sea snails. Some, if not all, of these are mixtures. Obviously the new capabilities of the mass spectrometry laboratory would be of value to me. I expect only occasional use of HRMS and GC/HRMS, but on these occasions these techniques would be very important.
OFFICE MEMORANDUM • STANFORD UNIVERSITY

Date: 14 December 1973
To: C. Djerassi
From: W. S. Johnson
Subject:

The contemplated new facility for high resolution mass spectrometry and combined gas chromatography/high resolution mass spectrometry would be of extreme value to our research program concerning the non-enzymic biogenetic-like cyclization of polyolefines, a project which is presently supported by NIH Grant AM 3787-14. If this facility were to become available, we would expect to use it extensively in the analysis of product mixtures of the aforementioned cyclizations. We estimate that our need for the gas chromatographic capability would be about 20% of the total need for the mass spectrometry service.

STANFORD UNIVERSITY MEDICAL CENTER
STANFORD, CALIFORNIA 94305 • (415) 321-1200 Ext. 5785
STANFORD UNIVERSITY SCHOOL OF MEDICINE
Department of Anesthesia

November 30, 1973

Professor Joshua Lederberg
Department of Genetics
School of Medicine
Stanford University
Stanford, California 94305

Dear Dr. Lederberg:

Thank you for including my laboratories in the group which could be served by a GC/HRMS facility. As you know, Dr. Cohen and I have our own GC/MS/Computer System. Our use of the proposed facility would be limited to those times when it is necessary to use high resolution to identify a metabolite. I would estimate a need for three GC/HRMS and three HRMS spectra per year.

My work is entirely supported by the National Institutes of Health.

Sincerely yours,

James R. Trudell, Ph.D.
JRT:rw

[Stamp: NOV 27]

VETERANS ADMINISTRATION HOSPITAL
3801 MIRANDA AVENUE
PALO ALTO, CALIFORNIA 94304

IN REPLY REFER TO:

November 26, 1973

Professor J. Lederberg
Department of Genetics
Stanford University School of Medicine
Stanford, California 94305

Dear Prof. Lederberg:

Dr. Allan Duffield of your department has informed me that you plan to obtain additional apparatus that would provide high resolution GC/MS as a service to the Stanford community.

We have in the past used the hospitality of your department in the identification of metabolites and derivatives of phenothiazine drugs and cannabinoids by GC/MS. Originally, we had the collaboration of Dr. B. Halpern and more recently Dr. A. Duffield, who was instrumental in helping us with some of our problems.

Our department would indeed be most interested in availing ourselves of GC/MS analyses in the course of our current NIH projects which again are concerned essentially with drug metabolism and the isolation and characterization of unknown drug derivatives.

As a rough estimate, I would think that we may be interested in the analyses of about five samples per month, two of which will require high resolution MS.

I certainly hope that your project to acquire the sophisticated new instrumentation you are seeking will be successful.

Sincerely yours,

Irene S. Forrest, Ph.D.
Chief, Biochem. Research Lab. (151F)

ISF:jr

Show veteran's full name, VA file number, and social security number on all correspondence.

OFFICE MEMORANDUM • STANFORD UNIVERSITY

[Stamp: DEC 3 1973]

Date: November 30, 1973
To: Joshua Lederberg
From: I. Rabinowitz, Ph.D.; D. I. Wilkinson, Ph.D.
Subject: RE: NIH GC/HRMS Proposal

Research carried out in this department has strongly implicated a role for the prostaglandins in the etiology of psoriasis (E. M. Farber, K. Aso, 32nd Annual Meeting, American Academy Dermatology, Chicago, Ill., Dec. 1973; E. M. Farber et al, J. Invest.
Derm., in preparation; E. M. Farber et al, Nature New Biology, in preparation).

The prostaglandins are a class of C20 fatty acids, having molecular weights near 350 and basal tissue concentrations in the nanogram and picogram per gram range. The prostaglandins are presently detected by radioimmunoassay, bioassay and mass spectrometric techniques, among others. There is considerable controversy concerning the method of choice for measurement of absolute amounts of prostaglandin in various tissues. In particular, it has been suggested that mass spectrometric techniques yield more accurate quantitative assays than radioimmunoassay techniques (Adv. Biosciences, 9, 71-123, 1973, Ed. G. Raspé, S. Bernhard, Pergamon Press, N.Y.).

Radioimmunoassay techniques are currently in use in our laboratories, and the addition of mass spectrometry capability would greatly increase the definitiveness of our studies, as well as make available to us a powerful tool for the study of prostaglandin precursors and metabolites.

Work to date has been supported in part by NIH Grant No. AM 15107.

I. Rabinowitz, Ph.D.
Department of Dermatology

D. I. Wilkinson, Ph.D.
Department of Dermatology

IR:DIW:ss

OFFICE MEMORANDUM • STANFORD UNIVERSITY

[Stamp: DEC 3 1973]

Date: November 30, 1973
To: Joshua Lederberg, Department of Genetics; Carl Djerassi, Department of Chemistry
From: Eugene D. Robin, M.D., Department of Respiratory Medicine
Subject: Your memo of November 20, 1973 describing a proposed GC/HRMS facility.

I have applied to the NIH for a continuation of my research grant, Adaptations To O2 Depletion, in which I have proposed to measure the redox state of NAD+/NADH and NADP+/NADPH by measuring the ratio of oxidized to reduced redox pairs using gas chromatography/mass spectrometry. These analyses will be conducted with the assistance of Drs.
Alan Duffield and Wilfred Pereira of the Department of Genetics.

I welcome the opportunity to have a GC/HRMS facility available on campus to support the GC/LRMS available in the Department of Genetics. The facility you propose to establish will be of importance to us in those instances where assignment of molecular composition to ionized fragments is crucial for mass spectral interpretation. I would anticipate using this service between one and two times a month.

Sincerely yours,

Eugene D. Robin, M.D.
Professor of Medicine and Physiology

EDR:ods

[Stamp: NOV 30 1973]

VETERANS ADMINISTRATION HOSPITAL
3801 MIRANDA AVENUE
PALO ALTO, CALIFORNIA 94304

IN REPLY REFER TO:

November 28, 1973

Dr. Joshua Lederberg
Department of Genetics
Stanford University School of Medicine
Palo Alto, California 94305

Dear Dr. Lederberg:

I should be very pleased if you were able to obtain through the National Institutes of Health a GC/MS facility which could be shared jointly by members of the Stanford University faculty.

At present, I am being funded under grant DA-00424-01 for a study of the metabolism of marihuana. We have made significant progress in our methods of extracting metabolites, in isolating new ones by thin-layer chromatographic techniques, and by purifying them to some degree as determined by GLC. The big bottleneck has been the lack of ready access to a GC/MS set-up which would permit further characterization of the metabolites.

Our needs would be primarily for GC/low resolution MS, for which we have extensive need, perhaps the analysis of 15-25 samples per month. Depending on the outcome of these analyses, we might have 1 to 2 samples per month requiring GC/high resolution MS.
We anticipate having little need for high resolution MS without GC because of the fact that our samples are isolated from complex mixtures and are nearly impossible to purify.

If there is any way in which I could assist in helping obtain such a facility for the University, please let me know.

Sincerely yours,

Leo E. Hollister, M.D.
Associate Professor of Medicine

LEH:bh

STANFORD UNIVERSITY HOSPITAL
Pharmacy Department

Date: September 5, 1973
To: Dr. J. Lederberg, Director, Department of Genetics
From: Hiram H. Sera, Director
Subject: Drug Analysis Service with Gas Chromatograph and Mass Spectrometer

I wish to express our appreciation to your department for assisting us in identifying a drug sample submitted to us from the E1 patient care area.

BACKGROUND:

The patient on E1A with G.I. disturbance, joint pains and occasional spike temperature was found to possess an unidentified medication in a plastic vial and was found to have self-administered the drug intramuscularly while in the hospital. The house staff was notified and the drug sample was submitted to us for immediate identification.

Through my previous association and knowledge of Drs. Summons' and W. Pereira's (in Dr. Duffield's instrumentation research laboratory) work with gas chromatograph and mass spectrometer, I had taken the liberty to request their assistance in the identification. In an hour, the determination was made and the drug was found to be Pentazocine or Talwin, which is a synthetic analgesic used commonly in this hospital in tablet and injection forms.

Since we do occasionally receive similar requests from physicians, I wish to call on your staff again in the future.

Thank you.

HHS:lh

cc: Mr. John Williams, Dr. Roger Summons, Dr. W. Pereira, Dr. A.
Duffield

OFFICE MEMORANDUM • STANFORD UNIVERSITY

Date: November 26, 1973
To: Joshua Lederberg, Department of Chemistry; Carl Djerassi, Department of Chemistry
From: Sumner M. Kalman
Subject: Mass Spectrometry, Your Memo of November 20, 1973.

A central facility for mass spectrometry and GC/MS would be highly desirable from my point of view. We often need to identify metabolites of drugs that interfere with our assays, and that represent research problems as well. Frequently we need to check the purity of a reference material which is in short supply. I have received much help from both your laboratories in the past and would welcome the opportunity to use an expanded facility. For many of our problems low resolution MS is satisfactory and I hope you mean to provide this service too.

With respect to your questions I anticipate that (1) Yes. (2) We would probably use GC/MS once a month or more. We would use MS at about the same rate. (3) Yes.

Sincerely yours,

Sumner M. Kalman, M.D.
Professor of Pharmacology
Director, Drug Assay Lab

OFFICE MEMORANDUM • STANFORD UNIVERSITY

[Stamp: DEC 3 1973]

Date: 3 December 1973
To: Joshua Lederberg
From: Jack Barchas
Subject: GC/HRMS

Our thanks to you and Alan Duffield for inquiring of our interest in the proposed GC/HRMS. We would find it quite useful, as we are currently applying for funding for a quadrupole mass spectrometer for mass fragmentography studies. With such a unit, there would be many times when the capability of the HRMS instrumentation would be valuable in structural elucidation.
We would expect very heavy utilization of our instrument if we were to obtain the funding, and, therefore, would expect to make considerable use of the proposed GC/HRMS, which is an essential ancillary tool. The GC aspects of the instrument would be valuable, since we would expect to be studying a number of unknowns and the GC separation would be an integral part of that process.

Our work is supported by NIMH, ONR, NASA, and the Alcohol Abuse division of HEW.

JDB/rs

[Stamp: NOV 30 1973]

NATIONAL AERONAUTICS AND SPACE ADMINISTRATION
AMES RESEARCH CENTER
MOFFETT FIELD, CALIFORNIA 94035

REPLY TO ATTN OF: LLPE: 239-9

November 28, 1973

Professor Joshua Lederberg
Department of Genetics
School of Medicine
Stanford University
Stanford, CA 94305

Dear Professor Lederberg:

I was delighted to learn of your proposed plans to upgrade your mass spectrometry capabilities by providing routine high resolution mass spectrometry (HRMS) and combined gas chromatography/high resolution mass spectrometry (GC/HRMS). Such a service could be of inestimable value to our program.

As you know we are developing gas chromatography/high resolution mass spectrometry facilities for NASA's interests. In particular we are modifying our equipment in order to determine carbon and nitrogen isotopic compositions of organic molecules. If available we would use your proposed facilities for our routine GC/HRMS analysis of biologically significant molecules which are sought in our program. Most of our work requires GC/HRMS as opposed to HRMS.

In addition, we are also most interested in computer programs which aid in mass spectral interpretations. Although we have a few of our own programs, we would be most eager to upgrade our own interpretation capabilities through use of programs from your facility.
Our work thus far has been supported solely by NASA; we are not supported at present by NIH.

I hope that our expression of interest will be of use to you in obtaining funding for a potentially most useful analytical facility.

Sincerely yours,

Keith A. Kvenvolden
Chief, Chemical Evolution Branch

OFFICE MEMORANDUM • STANFORD UNIVERSITY

[Stamp: NOV 29 1973]

Date: November 28, 1973
To: Joshua Lederberg, Ph.D., S331
From: William R. Fair, M.D., S287
Subject: Use of facilities for high resolution mass spectral analysis with gas chromatography.

As your memo of November 1973 requested, we have answered the questions concerning our interest in GC/HRMS.

1. This service would be of definite value to us in two projects currently being investigated in our laboratories.

a) The identification, distribution, and biological significance of the prostatic antibacterial factor (PAF). Our preliminary experiments indicate that this is a basic polypeptide, perhaps attached to a divalent metal such as zinc.

b) This service would also be of value in the determination of the urinary polyamine levels in patients with various genitourinary tract malignancies. Our initial experiments along this line indicate that there is significant elevation of polyamines in patients with prostatic carcinoma. The use of GC/HRMS would enable a more precise quantitation of these differences and enable us to expand our research into other areas concerning the biochemical significance of the polyamines.

I would estimate that on the PAF project we would use approximately 2-4 samples per month and perhaps 10-12 samples per month on the polyamine projects. Both of these projects would require the use of GC/HRMS.

A portion of our research on the PAF is currently supported by a grant from the NIH. The amount of this grant is $36,698, and this grant will terminate on December 31, 1974.