165. Applications of "Artificial Intelligence" for Chemical Inference. VI¹). Approach to a General Method of Interpreting Low Resolution Mass Spectra with a Computer

Helvetica Chimica Acta, Vol. 53, Fasc. 6 (1970), Nr. 165

by Armand Buchs²), Allan B. Delfino³), A. M. Duffield, Carl Djerassi, B. G. Buchanan, E. A. Feigenbaum and J. Lederberg

Contribution from the Departments of Chemistry, Computer Science and Genetics, Stanford University, Stanford, California 94305, USA

(15. VI. 70)

Summary. The program known as "Heuristic DENDRAL" is now able to interpret, fully automatically, the low resolution mass spectra of any compound with elemental formula $C_nH_{2n+v}X$ (X = O, S or N; v = valence of X). The option of using NMR. spectra to facilitate the interpretation has been retained. It is no longer necessary to supply the program with the elemental formula of the compound whose structure is to be determined. The theoretical data concerning mass spectrometry and nuclear magnetic resonance are generated by the program itself. At no point does the chemist need to supply any data other than the mass spectrum and, if desired, the NMR. spectrum. The performance of the program was tested with 210 mass spectra. The correct structure always appears in the answer. The results reported in Tables 2, 3 and 4 show that the number of isomers compatible with the answer given by the program represents a very large reduction of the total number of isomers that are a priori possible candidates.

Previous publications have described the results of heuristic computer programming for the interpretation of the low resolution mass spectra of ethers [2] and amines [3]. These two classes of compounds belong to the general heteroatomic class $C_nH_{2n+v}X$ (v = valence of X) with which this paper is concerned. We shall review them in the light of recently achieved improvements. The ether subclass that the program can analyze has been extended beyond methyl, ethyl and propyl ethers to include any ether structure. Moreover, alcohols, thioethers and thiols have been added to the program's repertoire. The need to supply the empirical formula has been removed: the INFERENCE MAKER program can now accept as its sole inputs the mass spectrum and, optionally, the NMR. spectrum of the unknown compound. The purpose of this paper is to describe how the program first decides on a plausible empirical formula (and therefore a molecular weight), how it then generates the corresponding set of subgraphs, builds for each subgraph the theory related to its structure, and finally infers plausible substructures from the mass spectra of amines, ethers, alcohols, thiols and thioethers.

¹) For Part V see reference [1].
²) On leave of absence from the University of Geneva, Switzerland.
³) Present address: Allen-Babcock Computing, Palo Alto, California 94303.

[Diagram 1. Choice of the most plausible empirical formula. Flowchart: read mass spectrum → reduce mass spectrum → accept or reject mass spectrum (if rejected, read next mass spectrum) → rank heteroatoms → take best-ranked heteroatom → accept or reject heteroatom (if rejected, take next heteroatom; stop when all are rejected) → infer molecular weight → build empirical formula → generate superatoms and theory → perform validation process → if some superatoms are validated, output result; if none is validated, increase molecular weight by 14 and build a new empirical formula.]
The basic design of Heuristic DENDRAL is described in our earlier publication dealing with saturated ethers [2] and is summarized again in our publication dealing with amines [3]. As will be shown in this paper, the efficiency achieved by the INFERENCE MAKER with the general class of 'saturated acyclic monofunctional' (SAM) compounds is such that the two other phases of Heuristic DENDRAL (STRUCTURE GENERATOR and PREDICTOR) need not be used.

Diagram 2. INFERENCE MAKER output with heptane-3-ol (1) as an unknown
ACTUAL MASS SPECTRUM = ((27.41) (28.11) (29.40) (30.3) (31.40) (32.1) (41.48) (42.6) (43.25) (44.6) (45.12) (55.13) (56.7) (57.18) (58.10) (59.100) (60.3) (67.1) (69.67) (70.5) (71.1) (72.1) (73.2) (84.1) (85.1) (86.2) (87.30) (88.2) (98.3))
MASS SPECTRUM CORRECTED FOR 13C = ((27.41) (28.11) (29.40) (30.3) (31.40) (32.1) (41.48) (42.6) (43.25) (44.6) (45.12) (55.13) (56.7) (57.18) (58.10) (59.100) (60.1) (67.1) (69.67) (70.3) (71.1) (72.1) (73.2) (84.1) (85.1) (86.2) (87.30) (88.1) (98.3))
NMR. SPECTRUM = ((9.20 6T) (1.37 8M) (3.40 1M))
Run 1
NUMBER OF CARBON-BOUND METHYLS
NUMBER OF OXYGEN-BOUND METHYLS
TOTAL NUMBER OF METHYLS
MINIMUM NUMBER OF ALPHA-CARBON-BOUND HYDROGENS
INFERRED MOLECULAR WEIGHT = 116
INFERRED EMPIRICAL FORMULA = C7H16O
SUBGENERA INFERRED: *EA-S-(C4H9, C2H5)
1 ISOMER
TOTAL NUMBER OF ISOMERS: 1
Run 2
WAS A NMR. SPECTRUM AVAILABLE ? NO
INFERRED MOLECULAR WEIGHT = 116
INFERRED EMPIRICAL FORMULA = C7H16O
SUBGENERA INFERRED: *EA-S-(C4H9, C2H5)
4 ISOMERS
TOTAL NUMBER OF ISOMERS: 4

The decision processes invoked by the INFERENCE MAKER in choosing the most plausible empirical formula are represented schematically in Diagram 1 and will be illustrated with an example, the mass spectrum of heptane-3-ol (1).

CH3-CH2-CH(OH)-CH2-CH2-CH2-CH3   (1)

The actual mass spectrum of 1 and the spectrum corrected for the 13C isotope contributions are tabulated in Diagram 2⁴).
The program is supplied with the actual mass spectrum (and the NMR. spectrum, if one was recorded) and starts by deciding whether the spectrum plausibly belongs to a SAM compound. It first strips from the mass spectrum all the ion signals that will be used later, during the validation process. Then, depending on the average intensity⁵) of the remaining ion signals (the reduced spectrum), the program either accepts the mass spectrum as a plausible SAM candidate or rejects it from further consideration at this very early stage. For SAM compounds, the ions removed from the mass spectrum can all be formed by mechanistically important fragmentation paths. They belong to the following series:

1. α-Cleavage series for nitrogen, oxygen and sulfur SAM compounds, starting with m/e 30 (CH2=NH2+), 31 (CH2=OH+) and 47 (CH2=SH+), respectively. The following ions belonging to these series are removed from the mass spectrum of 1: m/e 30 (CH4N)⁶), 31 (CH3O), 44 (C2H6N), 45 (C2H5O), 58 (C3H8N), 59 (C3H7O), 72 (C4H10N), 73 (C4H9O), 86 (C5H12N) and 87 (C5H11O).
2. Alkyl series ions ($C_nH_{2n+1}$) arising from bond rupture between the heteroatom and an α-carbon, with the charge remaining on the hydrocarbon moiety. This removes the ions with m/e 29 (C2H5), 43 (C3H7), 57 (C4H9), 71 (C5H11) and 85 (C6H13) from the actual mass spectrum of 1 (Diagram 2).
3. Alkyl series ions ($C_nH_{2n-1}$) originating from the primary loss of water and a methyl radical, followed by olefin expulsion. The ions with m/e 27 (C2H3), 41 (C3H5), 55 (C4H7) and 69 (C5H9) were also eliminated from the mass spectrum of 1.
4. Alkyl series ions ($C_nH_{2n}$) arising from the loss of XH2 (X = O or S), followed by expulsion of olefinic molecules. In the mass spectrum of 1 the following ions belong to this category: m/e 28 (C2H4), 42 (C3H6), 56 (C4H8), 70 (C5H10) and 98 (C7H14). They are therefore removed from the actual mass spectrum of 1.
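The reduction and acceptance steps can be sketched in Python. This is a minimal illustration under our own assumptions, not the INFERENCE MAKER code: every series is generated as a progression with step 14 (one CH2) up to the highest mass in the spectrum, the average is taken over the peaks that survive the reduction, and the thresholds are those the paper states for acceptance (average below 3%, no remaining peak above 10%). The names `removal_masses`, `reduce_spectrum` and `accept_as_sam` are hypothetical.

```python
"""Minimal sketch of the spectrum-reduction / SAM-acceptance step.

Hypothetical helper names; the series limits are simplified.
"""

# Heptane-3-ol (1), actual spectrum from Diagram 2: {m/e: % of base peak}
HEPTANE_3_OL = {
    27: 41, 28: 11, 29: 40, 30: 3, 31: 40, 32: 1, 41: 48, 42: 6, 43: 25,
    44: 6, 45: 12, 55: 13, 56: 7, 57: 18, 58: 10, 59: 100, 60: 3, 67: 1,
    69: 67, 70: 5, 71: 1, 72: 1, 73: 2, 84: 1, 85: 1, 86: 2, 87: 30,
    88: 2, 98: 3,
}


def removal_masses(max_mass: int) -> set:
    """Masses of the four ion series stripped before the acceptance test."""
    masses = set()
    # 1. alpha-cleavage series starting at CH2=NH2+ (30), CH2=OH+ (31), CH2=SH+ (47)
    for start in (30, 31, 47):
        masses.update(range(start, max_mass + 1, 14))
    # 2.-4. C_nH_{2n+1} (29, 43, ...), C_nH_{2n-1} (27, 41, ...), C_nH_{2n} (28, 42, ...)
    for offset in (1, -1, 0):
        masses.update(range(28 + offset, max_mass + 1, 14))
    return masses


def reduce_spectrum(spectrum: dict) -> dict:
    """Keep only the peaks not explained by any of the four series."""
    series = removal_masses(max(spectrum))
    return {mz: inten for mz, inten in spectrum.items() if mz not in series}


def accept_as_sam(spectrum: dict, max_average: float = 3.0,
                  max_peak: float = 10.0) -> bool:
    """Accept if the reduced spectrum averages < 3% and has no peak above 10%."""
    reduced = reduce_spectrum(spectrum)
    if not reduced:  # assumption: an empty reduced spectrum is acceptable
        return True
    average = sum(reduced.values()) / len(reduced)
    return average < max_average and max(reduced.values()) <= max_peak


if __name__ == "__main__":
    print(reduce_spectrum(HEPTANE_3_OL))  # {32: 1, 60: 3, 67: 1, 88: 2}
    print(accept_as_sam(HEPTANE_3_OL))    # True
```

With these simplified series limits the heptane-3-ol spectrum reduces to m/e 32, 60, 67 and 88 (one peak more than the list printed in the paper, since m/e 32 belongs to none of the four series) and is accepted.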
For a mass spectrum to be accepted as a plausible SAM candidate, the reduced spectrum must not only exhibit a low average intensity (< 3%), but must also contain no signal with an intensity greater than 10%. The reduced spectrum of 1 contains the following ions:

m/e:        60   67   88
Intensity:   3    1    2

⁴) The mass spectra reported in Diagram 2 are tabulated as a sequence of dotted pairs. In each dotted pair the right part is the relative abundance of the ion whose mass is given in the left part.
⁵) All intensity values refer to relative abundances, with the intensity of the base peak = 100%.
⁶) Since the program has not yet made a decision about the heteroatom, it treats the ions with m/e 30, 44, 58, 72 and 86 as arising by α-cleavage from an amine molecular ion; in the mass spectrum of 1 their empirical formulae are actually $C_nH_{2n}O$ (n = 1 to 5).

As it satisfies both conditions, the actual mass spectrum tabulated in Diagram 2 is accepted as the spectrum of a SAM molecule and is subjected to further tests. The program then assigns a plausibility score to each heteroatom it knows, presently nitrogen, oxygen and sulfur, by summing the intensities of the theoretical series of α-fission ions corresponding to that heteroatom. For any heteroatom X, the lowest mass α-cleavage peak has a mass corresponding to the formula $CH_2{=}XH_{v-1}$ (v = valence of X). To calculate the scores, the program uses the following relationships:

$$A = \mathrm{Mass}(X) + \mathrm{Valence}(X) + \mathrm{Mass}(\mathrm{CH})$$

where X is the heteroatom;

$$M_i = A + 14\,i$$

where i is one less than the carbon number corresponding to $M_i$;

$$I(M_i) = \text{intensity of the ion of mass } M_i, \qquad \mathrm{Score} = \sum_{i=0}^{n} I(M_i)$$

where n is the largest integer satisfying

$$14\,n + A \le M_{\max}$$

and $M_{\max}$ is the highest mass number in the spectrum.

[…] equals 3 or 4. In such a case the program infers as the lowest probable molecular weight the value $(M'_{\max} + 14)$.
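The heteroatom score can be written directly from these relationships. The sketch below uses hypothetical names (`heteroatom_score`, `rank_heteroatoms`), Mass(CH) = 13 and the nominal masses 14, 16 and 32 for N, O and S; n is the largest integer with 14n + A ≤ M_max and the sum runs over i = 0..n. With the heptane-3-ol spectrum of Diagram 2 it gives 22 for nitrogen and 0 for sulfur, the scores the paper quotes for this compound, and 184 for oxygen.

```python
"""Minimal sketch of the heteroatom plausibility score (hypothetical names)."""

# Heptane-3-ol (1), actual spectrum from Diagram 2: {m/e: % of base peak}
HEPTANE_3_OL = {
    27: 41, 28: 11, 29: 40, 30: 3, 31: 40, 32: 1, 41: 48, 42: 6, 43: 25,
    44: 6, 45: 12, 55: 13, 56: 7, 57: 18, 58: 10, 59: 100, 60: 3, 67: 1,
    69: 67, 70: 5, 71: 1, 72: 1, 73: 2, 84: 1, 85: 1, 86: 2, 87: 30,
    88: 2, 98: 3,
}

# Nominal mass and valence of each heteroatom known to the program
HETEROATOMS = {"N": (14, 3), "O": (16, 2), "S": (32, 2)}
MASS_CH = 13


def heteroatom_score(spectrum: dict, x: str) -> int:
    """Score = sum of I(M_i) for M_i = A + 14 i, i = 0..n, 14 n + A <= M_max."""
    mass_x, valence_x = HETEROATOMS[x]
    a = mass_x + valence_x + MASS_CH   # lowest alpha-cleavage ion CH2=XH_{v-1}
    m_max = max(spectrum)              # highest mass number in the spectrum
    n = (m_max - a) // 14              # largest n with 14 n + A <= M_max
    return sum(spectrum.get(a + 14 * i, 0) for i in range(n + 1))


def rank_heteroatoms(spectrum: dict) -> list:
    """Heteroatoms in order of decreasing score."""
    return sorted(HETEROATOMS, key=lambda x: heteroatom_score(spectrum, x),
                  reverse=True)


if __name__ == "__main__":
    for x in HETEROATOMS:
        print(x, heteroatom_score(HEPTANE_3_OL, x))  # N 22, O 184, S 0
    print(rank_heteroatoms(HEPTANE_3_OL))            # ['O', 'N', 'S']
```

The score is only a ranking device; whether a ranked heteroatom is then accepted depends on heteroatom-specific thresholds (for example 100 for nitrogen and 20 for sulfur), which this sketch does not model.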
This takes into account the fact that, for many alcohols, the last ion in the mass spectrum arises by loss of water from the molecular ion. Evaluating the formula given above for this process with oxygen as the heteroatom, we find the following values from the mass spectrum of 1:

$$\text{Molecular weight} = 16 + 2 + 14\,n$$

$$M_{\max} = 98; \qquad \text{for } n = 6:\ M'_{\max} = 102, \text{ i.e. greater than } M_{\max}$$

Since $(M'_{\max} - M_{\max})$ equals 4, the program assumes that m/e 98 (C7H14) corresponds to the loss of H2O from the molecular ion and therefore adds 14 mass units to $M'_{\max}$, inferring m/e 116 as the lowest plausible molecular weight; it will use this value to build the first empirical formula.

The results we have obtained with 210 mass spectra of amines, ethers, alcohols, thiols and thioethers show that the correct molecular weight is always inferred on the first attempt for those mass spectra whose highest mass number is M, M−1, M−2, M+1, M+2 or M+3, or even M−18 and M−17 for oxygen-containing compounds. The molecular ion need not be present in the spectrum. If the highest mass number in the spectrum is smaller than M−10, the program will infer the molecular weight M' of the next lower homolog, provided this does not lead to the apparent presence of intense ions at mass-spectrometrically improbable mass points M' − R (with 2 < R < 15). A mass spectrometrist would have to deal with this kind of spectrum in much the same manner as the program does.

When the program is working with oxygen or with sulfur, it makes a final decision about allowing the spectrum to enter the validation process with one of these two heteroatoms. In the electron impact induced fragmentation of alcohols, ethers, thiols and thioethers, the hydrocarbon moiety of the molecule plays an important role [4]. A rather large fraction of the total ion current is carried by the hydrocarbon-type ions $C_nH_{2n+1}$ and $C_nH_{2n-1}$. To accept the spectrum with oxygen or sulfur as heteroatom, the program requires the sum of the average intensities in these two hydrocarbon series to be greater than 5% or 2%, respectively. The two ion series start with n = 3⁹), i.e. with the ions m/e 41 and 43, and end when n is such that the m/e of ion $C_nH_{2n+1}$ exceeds the mass of ion $(M - CH_2XH_{v-1})$. With our example (1) the $C_nH_{2n-1}$ series includes the ions m/e 41, 55, 69 and 83. The average intensity includes all the ions of the series, i.e. even those missing from the spectrum, such as m/e 83 in our example. The $C_nH_{2n+1}$ series includes the ions m/e 43, 57, 71 and 85. Since the sum of the average intensities of these two series amounts to 43%, i.e. a value well above the required 5%, oxygen is accepted as a plausible heteroatom.

Once a molecular weight has been inferred, the program generates the empirical formula. Given the inferred heteroatom, the calculation for SAM compounds is performed as follows: if M is the inferred molecular weight, X the heteroatom, and $C_nH_yX$ the general formula, then

$$n = \frac{M - \mathrm{Mass}(X) - \mathrm{Valence}(X)}{14}, \qquad y = M - \bigl(12\,n + \mathrm{Mass}(X)\bigr).$$

For example 1, for which 116 was inferred as the value of M (with X = oxygen), this gives

$$n = \frac{116 - 16 - 2}{14} = 7, \qquad y = 116 - (12 \times 7 + 16) = 16,$$

i.e. empirical formula = C7H16O.

After building the empirical formula, the program builds the subgraphs, or superatoms¹⁰), corresponding to that heteroatom and the theory associated with those superatoms. With our example 1, the program generates the ether and alcohol subgraphs, formulates for each subgraph its associated mass and NMR. spectral theory, and tries to validate these subgraphs.
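The molecular-weight inference, the hydrocarbon-series check and the empirical formula can be combined in one short sketch. One part is an explicit assumption: the definition of $M'_{\max}$ is not part of this section, so the hypothetical `lowest_molecular_weight` takes it to be the homologous mass Mass(X) + Valence(X) + 14n closest to $M_{\max}$, and then applies the stated rule of adding 14 when $M'_{\max} - M_{\max}$ is 3 or 4. All function names are our own.

```python
"""Sketch: lowest molecular weight, hydrocarbon-series check, empirical formula."""

# Heptane-3-ol (1), actual spectrum from Diagram 2: {m/e: % of base peak}
HEPTANE_3_OL = {
    27: 41, 28: 11, 29: 40, 30: 3, 31: 40, 32: 1, 41: 48, 42: 6, 43: 25,
    44: 6, 45: 12, 55: 13, 56: 7, 57: 18, 58: 10, 59: 100, 60: 3, 67: 1,
    69: 67, 70: 5, 71: 1, 72: 1, 73: 2, 84: 1, 85: 1, 86: 2, 87: 30,
    88: 2, 98: 3,
}
HETEROATOMS = {"N": (14, 3), "O": (16, 2), "S": (32, 2)}  # (mass, valence)


def lowest_molecular_weight(m_max: int, x: str) -> int:
    """Lowest plausible M for C_nH_{2n+v}X, where M = Mass(X) + Valence(X) + 14 n.

    ASSUMPTION: M'_max is the homologous mass closest to M_max (ties upward).
    """
    mass_x, valence_x = HETEROATOMS[x]
    base = mass_x + valence_x
    n = (m_max - base + 7) // 14      # nearest homolog index
    m_prime = base + 14 * n
    if m_prime - m_max in (3, 4):     # last ion taken to be M - H2O or M - OH
        return m_prime + 14
    return m_prime


def hydrocarbon_series_sum(spectrum: dict, mw: int, x: str) -> float:
    """Sum of the average intensities of the C_nH_{2n-1} and C_nH_{2n+1} series."""
    mass_x, valence_x = HETEROATOMS[x]
    limit = mw - (14 + mass_x + valence_x - 1)   # mass of ion (M - CH2XH_{v-1})
    n_values = []
    n = 3                                        # series start at m/e 41 and 43
    while 14 * n + 1 <= limit:                   # stop once C_nH_{2n+1} exceeds limit
        n_values.append(n)
        n += 1
    if not n_values:
        return 0.0
    total = 0.0
    for offset in (-1, 1):                       # C_nH_{2n-1}, then C_nH_{2n+1}
        intensities = [spectrum.get(14 * k + offset, 0) for k in n_values]
        total += sum(intensities) / len(intensities)  # missing ions count as 0
    return total


def empirical_formula(mw: int, x: str) -> str:
    """n = (M - Mass(X) - Valence(X)) / 14 and y = M - (12 n + Mass(X))."""
    mass_x, valence_x = HETEROATOMS[x]
    n, remainder = divmod(mw - mass_x - valence_x, 14)
    if remainder or n < 1:
        raise ValueError(f"{mw} is not a SAM molecular weight for {x}")
    y = mw - (12 * n + mass_x)
    return f"C{n}H{y}{x}"


if __name__ == "__main__":
    mw = lowest_molecular_weight(max(HEPTANE_3_OL), "O")
    print(mw)                                             # 116
    print(hydrocarbon_series_sum(HEPTANE_3_OL, mw, "O"))  # 43.25
    print(empirical_formula(mw, "O"))                     # C7H16O
```

For heptane-3-ol this reproduces the worked values: $M_{\max}$ = 98 gives $M'_{\max}$ = 102 and an inferred weight of 116, the two hydrocarbon series average 32% and 11.25% (43.25% in total), and the formula is C7H16O.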
If one or more subgraphs are validated, the inference process for the unknown structure is complete; if no subgraph is validated for the molecular weight, the attempt is classed as a failure. In that case the program makes a further attempt with the same heteroatom but a different molecular weight. Since the first molecular weight was a lower limit, the new molecular weight is 14 mass units greater than the prior one, and a new empirical formula is calculated from it. A change of molecular weight or empirical formula does not affect the number and kind of superatoms required for validation; the superatoms and theory are built anew only if the heteroatom changes. If no substructure is substantiated after the program has tried to validate the subgraphs corresponding to the best-ranked heteroatom with three consecutive empirical formulae, the program assumes that, despite its high score, the highest ranked and accepted heteroatom is not the correct one. The INFERENCE MAKER then makes the same kind of attempt with the next best-ranked heteroatom, i.e. it checks its consistency with the mass spectrum, infers a starting molecular weight in accord with the mass of the new heteroatom and the highest mass number of the mass spectrum, calculates an empirical formula, generates subgraphs and the corresponding theory, and invokes the validation process.

⁹) The program ignores the two ions at m/e 27 and 29 (n = 2). In general they are of no value for the interpretation of mass spectra, especially of SAM compounds.
¹⁰) A superatom is defined as a structural unit with at least one free valence. In this context, the program generates only superatoms containing the heteroatom and all the α-carbon atoms with their protons; also, the program attaches only carbon atoms to the free valences.
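The retry strategy amounts to two nested loops: heteroatoms in order of rank, and at most three homologous molecular weights per heteroatom, each 14 mass units above the last. In this sketch `start_weight` and `validate` are hypothetical callbacks standing in for the molecular-weight inference and for the whole validation phase; it illustrates the control flow only.

```python
"""Sketch of the retry logic around the validation phase (hypothetical callbacks)."""
from typing import Callable, Iterable, List, Optional, Tuple


def infer_structure(
    ranked_heteroatoms: Iterable[str],
    start_weight: Callable[[str], int],
    validate: Callable[[str, int], List[str]],
    attempts_per_heteroatom: int = 3,
) -> Optional[Tuple[str, int, List[str]]]:
    """Try up to three consecutive molecular weights for each heteroatom.

    start_weight(x): lowest plausible molecular weight for heteroatom x.
    validate(x, mw): validated superatoms for that formula (empty if none).
    """
    for x in ranked_heteroatoms:           # best-ranked accepted heteroatom first
        # superatoms and theory would be rebuilt here, once per heteroatom
        mw = start_weight(x)
        for _ in range(attempts_per_heteroatom):
            validated = validate(x, mw)
            if validated:
                return x, mw, validated    # inference complete
            mw += 14                       # next homolog: one more CH2
    return None                            # not a SAM compound, as far as we can tell


if __name__ == "__main__":
    # Toy validator: pretend only M = 130 validates an oxygen superatom
    def toy_validate(x, mw):
        return ["*EA-SP*"] if (x, mw) == ("O", 130) else []

    print(infer_structure(["O", "N", "S"], lambda x: 116, toy_validate))
    # ('O', 130, ['*EA-SP*'])
```

Diagram 4 (isopropyl n-amyl ether) follows this path: the first formula, for M = 116, validates nothing, and M = 130 yields *EA-SP*.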
If no result is supported after every heteroatom known to the program has been postulated with three consecutive empirical formulae, the mass spectrum cannot have resulted from a SAM compound, as far as the INFERENCE MAKER program is concerned.

In actual practice the program did find a subgraph consistent with the mass spectrum of heptane-3-ol (1). The output illustrated in Diagram 2 consists of two separate runs; in the first the mass spectrum was supplemented by an NMR. spectrum, and in the second the NMR. spectrum was ignored. If no subgraph had been validated for C7H16O, the program would have substituted C8H18O and finally C9H20O. If still no subgraph were validated, the program would have classified the mass spectrum as not belonging to a compound of the SAM class. Nitrogen or sulfur subgraphs would not have been generated, because the observed scores (22 and 0) are below the threshold values (100 and 20) for these two heteroatoms. These preliminary decisions about consistency between heteroatom and spectrum do not ensure that only mass spectra of SAM compounds will enter the validation process, but they sharply decrease the probability of accepting spectra of non-SAM compounds. It should be stressed that even if inadequate mass spectra pass this entrance filter, they still have to pass numerous tests during the validation process before they could be wrongly classified as SAM compound mass spectra.

[Diagram 3. Relations between the name and the structure of superatoms. Heteroatom prefixes: EA = O, AM = N, TH = S. α-Substitution symbols: M = -CH3, P = -CH2-, S = -CH(-)-, T = -C(-)2-. Examples: -CH2-O-CH2- (*EA-PP*), -CH(-)-O-CH2- (*EA-SP*), HO-C(-)2- (*EA-T*), -CH2-S-CH3 (*TH-PM*), -CH(-)-S-CH2- (*TH-SP*), -C(-)2-S-C(-)2- (*TH-TT*), -CH(-)-NH-CH(-)- (*AM-SS*), -C(-)2-N(CH3)-CH3 (*AM-TMM*), -C(-)2-N(CH2-)-CH(-)- (*AM-TSP*).]

For each class of compounds, the subgraphs built by the program must represent a complete and irredundant set of substructures: any SAM structure must belong to one and only one subgraph. This is achieved by forming the superatom names from a combination of the four symbols T, S, P and M, called α-substitution symbols (see Diagram 3), preceded by a heteroatom prefix (AM for nitrogen, EA for oxygen and TH for sulfur). The meaning of the α-substitution symbols, and the structure each symbol or combination of symbols represents, is described in our publication dealing with amines [3]. We will briefly review this notation and illustrate it for the general class of SAM compounds.

For each subclass (amines, alcohols, ethers, thiols, thioethers) the number of superatoms depends on the valence of the heteroatom. For nitrogen, all combinations of the symbols T, S, P and M taken one, two and three at a time result in a total of 31 superatoms. Because oxygen and sulfur are divalent, only one- and two-letter combinations exist for these heteroatoms. The canonical order of the α-substitution symbols (T > S > P > M) requires a higher-value symbol to be written to the left of a lower-value symbol; this allows only one way to write a particular name. A subgraph with one tertiary α-carbon, one secondary α-carbon and one α-methyl radical must have its partial name written as TSM, not STM, MST or MTS. The number of symbols in the name is the number of carbon atoms directly bound to the heteroatom (α-carbons). With the heteroatoms currently known to the program, i.e. oxygen, sulfur and nitrogen, names with three symbols can only represent superatoms of tertiary amines; those with two symbols refer to secondary amines as well as to ethers and thioethers, while one-symbol names may represent primary amines, alcohols and thiols.
In each particular name, the α-substitution symbols themselves give the number of β-carbon(s) attached to each α-carbon atom (3 for T, 2 for S, 1 for P and none for M). The general relationship between superatom names and the structures they represent is depicted in Diagram 3, along with some examples.

Once the INFERENCE MAKER has inferred a heteroatom, it builds the corresponding superatom names, and for each superatom it constructs a set of properties associated with the superatom, together with the mass spectrometric and NMR.-related conditions that must be satisfied for the superatom to be validated. This is possible because the name of a superatom carries all the needed information (structure, weight, mass of the lowest possible α-fission peak, etc.). Moreover, the name of a superatom contains enough structural information to decide which fragmentations can be expected to predominate in a molecular ion containing, as a subunit, the partial structure represented by that name.

The program builds a set of numbers using the digits 1, 2, 3 and 4. These numbers may contain from one to v digits, v being the valence of the heteroatom. No digit of higher value may be written to the right of a digit of lower value, and all possible numbers that do not violate this canonical order must be included in the set. With example 1 the following 14 numbers are generated¹¹): 1, 2, 3, 4, 11, 21, 22, 31, 32, 33, 41, 42, 43 and 44. Each number is then translated into the corresponding α-substitution symbols (1 to M, 2 to P, 3 to S, 4 to T).

¹¹) For v = 3 (e.g. nitrogen), the following 20 combinations would be added to the 14 generated for divalent heteroatoms: 111, 211, 221, 222, 311, 321, 322, 331, 332, 333, 411, 421, 422, 431, 432, 433, 441, 442, 443 and 444.
The heteroatom prefix is attached with an intervening dash, and the name is enclosed between two asterisks. The result is a name like *AM-SS* for the secondary amine superatom with both α-carbons monosubstituted (see Diagram 3). Since we are interested in subgraphs with at least one free valence, names containing only M's are ignored¹²).

Each superatom has intrinsic properties as well as properties connected with mass and NMR. spectrometry. Some properties depend only on the heteroatom prefix; they are constant for a given heteroatom. Some of the intensity threshold values used during the validation process are examples of such properties. Other properties depend only on the combination of α-substitution symbols; they are not related to any particular heteroatom. Finally, each superatom has properties that are determined jointly by the heteroatom prefix and the combination of α-substitution symbols. Moreover, whereas some properties are simply numerical values that the program uses in calculations, others are switches that tell the program which tests to perform for each particular superatom. The properties associated with each superatom are calculated and classified according to the following outline:

A. Intrinsic properties. Structure and weight are the only two intrinsic properties; their values depend on the complete superatom name (heteroatom prefix and α-substitution symbols). The program knows the partial structure corresponding to each α-substitution symbol (M = -CH3, P = -CH2-, S = -CH(-)- and T = -C(-)2-). The heteroatom is deduced from the heteroatom prefix (AM = N, EA = O and TH = S), and the number of hydrogen atoms attached to the heteroatom equals the valence of the heteroatom minus the number of α-substitution symbols. The weight of the superatom is not calculated from the chemical structure, but directly from the name.
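The name generation can be sketched with Python's `itertools.combinations_with_replacement`, which yields each multiset of digits exactly once; reversing each tuple and sorting gives the canonical (non-increasing) numbers in the order listed above. The dictionaries and function names are our own, not the program's.

```python
"""Minimal sketch: canonical superatom names from non-increasing digit strings."""
from itertools import combinations_with_replacement

DIGIT_TO_SYMBOL = {"1": "M", "2": "P", "3": "S", "4": "T"}
PREFIX = {"N": "AM", "O": "EA", "S": "TH"}
VALENCE = {"N": 3, "O": 2, "S": 2}


def canonical_numbers(valence: int) -> list:
    """All numbers of 1..valence digits from 1-4 whose digits never increase."""
    numbers = []
    for length in range(1, valence + 1):
        # combinations_with_replacement yields each multiset once, in
        # non-decreasing order; reversing gives the canonical digit order
        level = [int("".join(str(d) for d in reversed(combo)))
                 for combo in combinations_with_replacement((1, 2, 3, 4), length)]
        numbers.extend(sorted(level))
    return numbers


def superatom_names(heteroatom: str) -> list:
    """Translate digits to symbols and drop names made only of M's."""
    names = []
    for number in canonical_numbers(VALENCE[heteroatom]):
        symbols = "".join(DIGIT_TO_SYMBOL[d] for d in str(number))
        if set(symbols) == {"M"}:   # *X-M*, *X-MM*, *X-MMM* are whole molecules
            continue
        names.append(f"*{PREFIX[heteroatom]}-{symbols}*")
    return names


if __name__ == "__main__":
    print(canonical_numbers(2))
    # [1, 2, 3, 4, 11, 21, 22, 31, 32, 33, 41, 42, 43, 44]
    print(len(superatom_names("O")), len(superatom_names("N")))  # 12 31
```

`superatom_names("O")` gives the 12 oxygen names in the same order as the rows of Table 1 (from *EA-P* to *EA-TT*), and `superatom_names("N")` gives the 31 amine superatoms.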
A mass is assigned to each α-substitution symbol (15 to M, 14 to P, 13 to S and 12 to T) and also to each heteroatom prefix. The mass corresponding to a heteroatom prefix is the mass of the molecule $XH_v$ (v = valence of X), which gives 17 for AM, 18 for EA and 34 for TH. The mass of any superatom is obtained by adding the masses of its α-substitution symbols to the difference between the mass of the heteroatom prefix and the number of α-substitution symbols. For superatom *TH-SP* (see Diagram 3), for example, this leads to 13 + 14 + (34 − 2) = 59.

B. Mass spectrometric properties which depend on the α-substitution symbols only. The number of carbon-carbon bonds available for α-cleavage (or, equivalently, the number of free valences of the superatom) and the total degree of substitution of the α-carbons are examples of such properties. To calculate the number of free valences, the program assigns a value to each α-substitution symbol (0 to M, 1 to P, 2 to S and 3 to T); the sum of these values is the number of free valences. For example, superatom *AM-TSP* (see Diagram 3) has 3 + 2 + 1, i.e. 6, free valences.

¹²) The three general names *X-M*, *X-MM* and *X-MMM* (with X = AM, or also EA and TH when at most two M's are present) represent molecules. *EA-M* and *EA-MM*, for example, stand for methanol and dimethyl ether, respectively.

Table 1. Tests used during the validation process
NMR. tests: SIZE, TMC, HMC, HYC. Mass spectrometry tests: direct tests (M−XH2, M−CH2XH, CH2=XH, CH2=XCH3, EVION; thresholds printed in the header: <1%, >2%, 1-100, >2%, >10%, >10%) and multistep tests (ALPHA, REARR, ALKFIT).

Superatom  SIZE  TMC  HMC  HYC   Mass spectrometry tests
*X-P*       2     1    0    2    ? + - - + - + - - -
*X-S*       3     2    0    1    - - + - - - - + - +
*X-T*       4     3    0    0    - - - - - - - + - +
*X-PM*      3     2    1    2    + - - + - + - - - +
*X-PP*      4     2    0    4    + - - - - - - + +
*X-SM*      4     3    1    1    ? - - - - + - +
*X-SP*      5     3    0    3    + - - - - - ? + + +
*X-SS*      6     4    0    2    + - - - - - - + + +
*X-TM*      5     4    1    0    + - - - - - - + - +
*X-TP*      6     4    0    2    + - - - - - - + + +
*X-TS*      7     5    0    1    + - - - - - - + + +
*X-TT*      8     6    0    0    + - - - - - - + + +

X = EA or TH. + means that the switch for that test is 'on'; - means that it is 'off'.

The total degree of substitution of the α-carbons is a different kind of property: it constitutes a switch that the program sets 'on' or 'off' depending on the name of the superatom under test. During the validation process the program performs the tests related to that property only if the switch is 'on'. Table 1 reports all the switches used for the validation of the 12 oxygen or sulfur superatoms; from now on they will be referred to as tests rather than switches. Some tests are simple, like checking the intensity of a particular ion signal (test 'M−XH2', Table 1), while others involve more complex multistep processes, like searching the mass spectrum for sets of α-cleavage ions at m/e values consistent with the structure of the superatom under test and with intensities in accord with the charge-retaining power of the heteroatom (test 'ALPHA', Table 1). Test ALPHA is discussed in more detail later in the text.
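Both name-derived quantities of sections A and B are simple sums over the symbols. The sketch below uses the symbol masses, prefix masses and free-valence values given in the text; the helper names `parse`, `superatom_mass` and `free_valences` are hypothetical.

```python
"""Minimal sketch: superatom mass and free valences computed from the name."""

SYMBOL_MASS = {"M": 15, "P": 14, "S": 13, "T": 12}
SYMBOL_FREE_VALENCES = {"M": 0, "P": 1, "S": 2, "T": 3}
PREFIX_MASS = {"AM": 17, "EA": 18, "TH": 34}   # mass of XH_v


def parse(name: str) -> tuple:
    """'*TH-SP*' -> ('TH', 'SP')"""
    prefix, symbols = name.strip("*").split("-")
    return prefix, symbols


def superatom_mass(name: str) -> int:
    """Sum of symbol masses + (prefix mass - number of symbols)."""
    prefix, symbols = parse(name)
    return sum(SYMBOL_MASS[s] for s in symbols) + PREFIX_MASS[prefix] - len(symbols)


def free_valences(name: str) -> int:
    """Number of C-C bonds available for alpha-cleavage."""
    _, symbols = parse(name)
    return sum(SYMBOL_FREE_VALENCES[s] for s in symbols)


if __name__ == "__main__":
    print(superatom_mass("*TH-SP*"))    # 59
    print(free_valences("*AM-TSP*"))    # 6
```

As a sanity check, the same mass rule gives 32 for *EA-M* and 46 for *EA-MM*, the molecular weights of methanol and dimethyl ether named in footnote 12.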
The test 'REARR' (see Table 1), for example, is set 'on' for those superatoms which, if present in the molecular ion as the central subunit, would lead after electron impact to a favored hydrogen rearrangement process. This occurs only with molecular ions containing as part of their structure a superatom with at least one substituted α-carbon. For such molecular ions one can expect the mass spectrum to exhibit strong signals for ions arising from the well-known [5] rearrangement mechanism depicted below (2, a → b → c), with an ether (X = O) or thioether (X = S) molecular ion as example:

[Scheme 2: a → b → c. The molecular ion a, bearing substituents R¹ to R⁴ on the α-carbons, undergoes α-cleavage with loss of R to give b; hydrogen migration with C-X cleavage then gives c.]

Only superatoms whose names contain at least two α-substitution symbols other than M, at least one of them being S or T, possess the required structure. To decide whether the test should be performed, the program removes the M's from the superatom name and sets test REARR 'on' or 'off' depending on which α-substitution symbols are left.

C. Mass spectrometric properties which depend on the complete superatom name. These include both tests and numerical properties. The lowest possible mass of an ion formed by α-fission for a particular superatom is an example of a numerical property. The program calculates its value for each superatom by adding to the mass of the superatom the mass of (n − 1) methyl radicals, where n is the number of free valences. For superatom *TH-TT* (see Diagram 3), for example, the smallest α-fission fragment is (CH3)3C-S-C(CH3)2-; it cannot have a mass smaller than m/e 131 (mass of superatom = 56, n = 6). An example of a test is 'ALPHA' (see Table 1); it tells the program how to handle conditions related to α-cleavage, depending on the charge-retaining power of the heteroatom and the structure of the superatom.
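The REARR switch and the lowest α-fission mass both follow mechanically from the name. The sketch repeats the mass tables so that it runs on its own; the function names are hypothetical, and the two rules are the ones stated in the text (drop the M's, then require at least two symbols including an S or T; add (n − 1) methyls of mass 15 to the superatom mass).

```python
"""Minimal sketch: REARR switch and lowest alpha-fission mass (hypothetical names)."""

SYMBOL_MASS = {"M": 15, "P": 14, "S": 13, "T": 12}
SYMBOL_FREE_VALENCES = {"M": 0, "P": 1, "S": 2, "T": 3}
PREFIX_MASS = {"AM": 17, "EA": 18, "TH": 34}   # mass of XH_v
METHYL = 15


def parse(name: str) -> tuple:
    """'*TH-TT*' -> ('TH', 'TT')"""
    prefix, symbols = name.strip("*").split("-")
    return prefix, symbols


def superatom_mass(name: str) -> int:
    prefix, symbols = parse(name)
    return sum(SYMBOL_MASS[s] for s in symbols) + PREFIX_MASS[prefix] - len(symbols)


def free_valences(name: str) -> int:
    _, symbols = parse(name)
    return sum(SYMBOL_FREE_VALENCES[s] for s in symbols)


def rearr_switch(name: str) -> bool:
    """'on' if, once the M's are removed, at least two symbols remain
    and at least one of them is S or T."""
    _, symbols = parse(name)
    rest = symbols.replace("M", "")
    return len(rest) >= 2 and ("S" in rest or "T" in rest)


def lowest_alpha_fission_mass(name: str) -> int:
    """Superatom mass + (n - 1) methyl radicals, n = number of free valences."""
    return superatom_mass(name) + METHYL * (free_valences(name) - 1)


if __name__ == "__main__":
    print(lowest_alpha_fission_mass("*TH-TT*"))   # 131
    oxygen = ["*EA-" + s + "*" for s in
              ("P", "S", "T", "PM", "PP", "SM", "SP", "SS", "TM", "TP", "TS", "TT")]
    print([n for n in oxygen if rearr_switch(n)])
    # ['*EA-SP*', '*EA-SS*', '*EA-TP*', '*EA-TS*', '*EA-TT*']
```

The same rule gives 31 for *EA-P* (CH2=OH+) and 45 for *EA-S*, the first members of the oxygen α-cleavage series.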
The subtests it implies are described in the part dealing with the validation phase of the INFERENCE MAKER program. Other tests are simple intensity checks (tests 'CH2=XH', 'CH2=XCH3', 'M−CH2XH', etc., Table 1).

D. Mass spectrometric properties which depend only on the heteroatom prefix. These properties include some of the threshold values assigned to the intensity of particular ions or ion series. Oxygen-containing superatoms, for example, are accepted for further consideration only if the hydrocarbon-type ions $C_nH_{2n+1}$ originating from C-O cleavage exhibit a sum of intensities greater than 5%. The program sets this threshold to different values for sulfur- or nitrogen-containing superatoms.

E. Properties pertaining to NMR. spectrometry. Here again, the values of some of these properties depend only on the α-substitution symbols, while for others they change from heteroatom to heteroatom. Properties which take different values for different structures around the heteroatom are:
1. The minimum number of methyl radicals required by the structure of a superatom (test 'TMC', Table 1).
2. The number of methyl radicals linked to the heteroatom (test 'HMC', Table 1).
3. The maximum number of protons bound to α-carbon atoms, excluding methyl protons (test 'HYC', Table 1).

Since we are dealing exclusively with saturated structures, the minimum number of methyl radicals that an NMR. spectrum should exhibit to agree with a superatom structure equals the number of free valences plus the number of M's in the superatom name. For example, the structure of superatom *AM-TMM* (see Diagram 3) requires that at least five methyl groups be inferred from the NMR. spectrum¹³). To calculate the number of methyl groups linked to the heteroatom, the program simply counts the M's appearing in the name.
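The three NMR. properties can be derived from the symbols alone. This sketch takes TMC as the number of free valences plus the number of M's, HMC as the number of M's, and HYC as a sum of 2 protons per P, 1 per S and 0 per T (the per-symbol counts the paper gives); the function name is hypothetical. With those rules it reproduces the TMC, HMC and HYC columns of Table 1 for all 12 oxygen or sulfur superatoms.

```python
"""Minimal sketch: NMR. properties TMC, HMC and HYC from the symbols alone."""

SYMBOL_FREE_VALENCES = {"M": 0, "P": 1, "S": 2, "T": 3}
ALPHA_PROTONS = {"M": 0, "P": 2, "S": 1, "T": 0}   # methyl protons excluded


def nmr_properties(symbols: str) -> tuple:
    """Return (TMC, HMC, HYC) for alpha-substitution symbols such as 'SP'."""
    n_methyl_symbols = symbols.count("M")
    free = sum(SYMBOL_FREE_VALENCES[s] for s in symbols)
    tmc = free + n_methyl_symbols    # every attached chain ends in >= 1 methyl
    hmc = n_methyl_symbols           # methyls bound directly to the heteroatom
    hyc = sum(ALPHA_PROTONS[s] for s in symbols)
    return tmc, hmc, hyc


if __name__ == "__main__":
    for symbols in ("P", "PM", "SP", "TT"):
        print(symbols, nmr_properties(symbols))
```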
A definite number of protons is part of the structure of every α-substitution symbol that has at least one free valence left for a carbon-carbon linkage (2 for P, 1 for S and 0 for T). By adding together the protons of these α-substitution symbols, the program determines the maximum number of α-carbon hydrogens allowed by each structure. Superatom *EA-PP* (see Diagram 3), for example, is assigned four such protons.

Once the superatom and theory generation phase has been completed, the program corrects the relative abundances of the signals in the mass spectrum by removing isotope peaks; it then deletes from the spectrum any peak appearing at an improbable mass (M−3 through M−14), renormalizes the intensities of the remaining ions to 100% for the base peak, and initiates the validation process for each of the 31 (nitrogen)¹⁴) or 12 (oxygen and sulfur) superatoms.

For oxygen or sulfur SAM compounds, some of the tests are similar to those designed for amines; this holds for all the tests that are not related to mass spectrometry. The main difference arises from the fact that nitrogen, in contrast to oxygen or sulfur, is very efficient in stabilizing, and hence retaining, the positive charge. This drastically affects the fragmentation pattern of amines and, as shown in our publication [3], almost all the tests dealing with mass spectrometry relied on the charge localization concept [6]; α-cleavage and rearrangement according to the mechanism depicted previously (see 2) were the two main processes used by the INFERENCE MAKER program to interpret amine low resolution mass spectra efficiently.

¹³) In order to generate a SAM molecule from *AM-TMM*, the addition of three alkyl radicals is required. They may or may not be methyl radicals, but in the latter case each alkyl chain must terminate in at least one methyl group.
¹⁴) A detailed description of the tests each amine superatom undergoes is given in our publication [3].

As is well known [7], oxygen and sulfur are less effective than nitrogen in accommodating the positive charge. α-Cleavage plays a less important role, especially when the molecule is large or the alkyl radicals are substantially branched. The influence of the heteroatom upon the fragmentation is often overshadowed by the hydrocarbon moiety of the molecule; this has to be overcome for a successful interpretation of the mass spectrum. The partial lack of charge retention apparently hinders interpretation more for ethers and thioethers than for alcohols or mercaptans. The fragmentation is no longer triggered by a clear driving force, as it was for amines. Other fragmentation paths have to be considered, such as C-X bond scissions with the charge remaining on the alkyl radical (X = O or S), and loss of XH2 or HXR followed by olefin expulsion according to the mechanism depicted below (3), in which a hydrogen atom migrates from the carbon bearing R to X:

$$[\mathrm{HX{-}CH_2{-}CH_2{-}CH_2{-}CH_2R}]^{+\bullet} \longrightarrow \mathrm{CH_2{=}CHR} + \mathrm{XH_2} + \mathrm{C_2H_4} \qquad (3)$$

In order to describe how the validation phase of the INFERENCE MAKER program infers the correct superatom, along with the size of the alkyl radicals $C_nH_{2n+1}$ attached to each free valence, the various tests reported in Table 1 will be illustrated using the mass spectrum of isopropyl n-amyl ether (4), a molecule which contains an *EA-SP* subgraph (see Diagram 3).

(CH3)2CH-O-CH2-(CH2)3-CH3   (4)

The correct answer for this compound is *EA-SP-(CH3, CH3) (C4H9), where EA stands for oxygen and SP gives the number and the structure of the α-carbon atoms.

Diagram 4.
INFERENCE MAKER output with isopropyl n-amyl ether (4) as an unknown

    ACTUAL MASS SPECTRUM = ((31.2) (41.15) (42.10) (43.100) (44.4) (45.30) (55.6) (56.1)
        (57.1) (59.3) (69.3) (70.5) (71.43) (72.2) (73.21) (115.16) (116.1))
    MASS SPECTRUM CORRECTED FOR BC = ((31.2) (41.15) (42.10) (43.100) (44.2) (45.30) (55.6)
        (56.1) (57.1) (59.3) (69.3) (70.5) (71.43) (72.1) (73.21) (115.16))
    WAS A NMR. SPECTRUM AVAILABLE ? NO
    INFERRED MOLECULAR WEIGHT = 116
    INFERRED EMPIRICAL FORMULA = C7H16O
    SUBGENERA INFERRED: NONE
    WAS A NMR. SPECTRUM AVAILABLE ? NO
    INFERRED MOLECULAR WEIGHT = 130
    INFERRED EMPIRICAL FORMULA = C8H18O
    SUBGENERA INFERRED: *EA-SP-(CH3,CH3)(C4H9)    4 ISOMERS
    TOTAL NUMBER OF ISOMERS: 4
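A minimal Python sketch of the spectrum reduction described before Diagram 4 follows, applied to the actual spectrum listed there with M = 130. It is an illustration, not the program's LISP code. The dictionary representation and the function name are assumptions. The isotope-peak correction is left out because the text does not state its rule. Note that the M−3 to M−14 window removes m/e 116 in this example.

```python
def reduce_spectrum(peaks: dict[int, float], mol_weight: int) -> dict[int, float]:
    """Drop peaks at M-3 ... M-14 and rescale so that the base peak is 100%.

    `peaks` maps m/e -> relative intensity. Isotope correction is assumed to
    have been applied already (its rule is not reproduced in this sketch).
    """
    low, high = mol_weight - 14, mol_weight - 3
    kept = {mz: i for mz, i in peaks.items() if not (low <= mz <= high)}
    base = max(kept.values())
    return {mz: round(100.0 * i / base, 1) for mz, i in sorted(kept.items())}


# Actual spectrum of isopropyl n-amyl ether (4) as listed in Diagram 4.
SPECTRUM_4 = {31: 2, 41: 15, 42: 10, 43: 100, 44: 4, 45: 30, 55: 6, 56: 1,
              57: 1, 59: 3, 69: 3, 70: 5, 71: 43, 72: 2, 73: 21, 115: 16, 116: 1}

reduced = reduce_spectrum(SPECTRUM_4, 130)
print(116 in reduced, reduced[43])  # False 100.0
```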
The second part of the answer indicates that two methyl radicals are attached to the 'S' α-carbon atom and a butyl radical (n-butyl, sec-butyl, t-butyl or isobutyl) to the 'P' α-carbon atom. Each of the 12 oxygen superatoms built by the program is initially put on a list. The program then checks each superatom for consistency with the data (the mass spectrum and, if one was supplied, the NMR. spectrum). As soon as a superatom fails a test, it is removed from the list. The final result shows all the remaining superatoms and, for each of them, the alkyl radicals attached to each free valence.
Diagram 4 contains the mass spectrum of 4 and the answer given by the INFERENCE MAKER on the basis of that spectrum. The first test (test 'SIZE', Table 1) is related to the size of the empirical formula which the program deduced from the mass spectrum. To pass that test, a superatom must not require more carbon atoms than are available. The program calculates the minimum number of carbon atoms each superatom requires to build the smallest possible molecule by adding the number of free valences to the number of α-substitution symbols. These minimum numbers are reported in Table 1 for each superatom. For CnH2n+2X compounds (X = O or S), all superatoms pass that test provided n is greater than 7. With our example (4), the program selected C8H18O as the second empirical formula, and no pruning was achieved by that test. For heptane-3-ol (1), superatom *EA-TT* is eliminated at that very early stage of the validation process.

The next three tests are only effective when an NMR. spectrum is supplied, in which case they are applied before any mass spectrometry test. In order to build a saturated molecule, each superatom requires three things. The first is a minimum number of methyl radicals (test 'TMC', Table 1). The second is a definite number of methyl radicals linked to the heteroatom (test 'HMC', Table 1). The third is a maximum number of α-carbon bound hydrogen atoms (test 'HYC', Table 1). Any superatom for which one of these conditions is not satisfied by the signals present in the NMR. spectrum is discarded and is not tested against the mass spectral data. It should be stressed that the program uses NMR. spectra only as methyl counters and, if desired, as α-carbon proton counters¹⁵). It does not rely on fully interpreted NMR. spectra; if the user has some doubts about the multiplicity of signals, or if no integration curve was recorded, the program will also accept partial information [3]. From the NMR.
spectrum of heptane-3-ol (1) the program inferred the presence of two carbon-bound methyl radicals and no oxygen-bound methyl group (see Diagram 2, run 1). The NMR. filter eliminates superatoms *EA-PM*, *EA-SM* and *EA-TM*, which require the presence of a methoxy group. It also eliminates all superatoms for which more than two methyl radicals are mandatory (see test 'TMC', Table 1). Only superatoms *EA-P*, *EA-S* and *EA-PP* pass. With that particular compound, the same final result is obtained with and without the aid of NMR. data, as far as the number of inferred superatoms is concerned (see Diagram 2). Using NMR. data results in efficient pruning at the very beginning of the validation phase, and assigns a straight chain structure to the C4H9 radical. As no NMR. spectrum was recorded for isopropyl n-amyl ether (4), the program simply skips the NMR. tests.

15) A detailed description explaining how the program takes advantage of NMR. data is reported in our previous publication dealing with amines [3].

The program then proceeds to the mass spectrometry tests¹⁶). The first condition programmed in the mass spectrometry part of the validation process is listed in Table 1 as 'M−XH2'. If the peak at the m/e value of the M−XH2 ion appears with an intensity greater than 1%, all superatoms whose names are formed by more than one α-substitution symbol are rejected. Mass spectra of secondary alcohols are allowed to display intensities between 1% and 100% for the M−H2O ion, and those of tertiary alcohols any intensity (from 0% to 100%). For primary alcohols, however, this ion must be present in the spectrum with a relative abundance greater than 2%.
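The oxygen rules of test 'M−XH2' fit in a single predicate, sketched below in Python. The cut-offs are read from the text: ethers must show at most 1% at M−18, primary alcohols more than 2%, secondary alcohols at least 1%, and tertiary alcohols any intensity. Treating the 1% bound for secondary alcohols as inclusive is an interpretation. The superatom encoding, the abridged spectrum and the function name are hypothetical, and the sulfur case is not covered.

```python
OXYGEN_SUPERATOMS = ["P", "S", "T", "PP", "PM", "SM", "TM",
                     "SP", "TP", "SS", "TS", "TT"]


def m_minus_xh2_test(superatom: str, intensity: float) -> bool:
    """Test 'M-XH2' for oxygen; `intensity` is the M-18 peak in % of the base peak."""
    if len(superatom) > 1:      # ethers: more than one alpha-substitution symbol
        return intensity <= 1.0
    if superatom == "P":        # primary alcohols
        return intensity > 2.0
    if superatom == "S":        # secondary alcohols: 1% to 100%
        return intensity >= 1.0
    if superatom == "T":        # tertiary alcohols: any intensity
        return True
    raise ValueError(f"unknown superatom {superatom!r}")


# Abridged spectrum of 4 (M = 130); there is no peak at m/e 112.
spectrum_4 = {43: 100, 45: 30, 71: 43, 73: 21, 115: 16}
m18 = spectrum_4.get(130 - 18, 0.0)
# Every superatom except P and S survives, as stated for example 4.
print([sa for sa in OXYGEN_SUPERATOMS if m_minus_xh2_test(sa, m18)])
```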
For superatom *X-P* (X = EA or TH), the program then requires that the only peak which can arise from α-cleavage exhibit an intensity above 10% (test 'CH2=XH', Table 1). If it does, the program calculates the average intensity of all ions belonging to the series (M−XH2) − n × C2H4, starting with n = 1 and ending at m/e 42 (test 'EVION', Table 1). If this average intensity exceeds 10% (20% for mercaptans), the program then checks the average intensity of the CnH2n−1 and CnH2n+1 ions. It starts with n = 3 (m/e 41 and 43) and increases n until the m/e of these ions reaches the mass of M−CH2XH, where M represents the molecular weight. Superatom *X-P* is definitely accepted if this last value exceeds 30% when X = EA or 85% when X = TH.

The mass spectrum of 4 does not exhibit an M−18 ion; superatoms *EA-P* and *EA-S* are therefore eliminated. Methyl ethers with a monosubstituted α-carbon always expel CH3OH (32 mass units) upon electron impact. Superatom *EA-PM* is therefore rejected, because no M−32 ion appears in the mass spectrum of 4 (test 'M−CH3XH', Table 1).

The next tests programmed into the validation process pertain to conditions about α-cleavage ions and the corresponding CnH2n+1 ions formed by fission of the C–X bond. For those superatoms which have only one free valence, the program requires an intensity greater than 10% for the only possible α-cleavage ion (tests 'CH2=XH' and 'CH2=XCH3', Table 1). For any other superatom, the program builds all genera¹⁷) in accord with the structure of the superatom and the empirical formula. In order to achieve that, the masses of all theoretically possible α-cleavage peaks are calculated.
If n represents the number of free valences of a superatom, the m/e of the lowest mass α-fission ion is obtained by adding the mass of the superatom (m) to the mass of n−1 methyl radicals. This is the lightest such ion that can be pictured using the superatom's structure and the elements of the empirical formula. The m/e of the heaviest potential α-fission ion corresponds to the mass of the M−15 ion (M = inferred molecular weight). Considering superatom *EA-SP* (n = 3, m = 43) and the empirical formula C8H18O, potential α-scission ions can only have the following masses: m/e = 73 (C4H9O), 87 (C5H11O), 101 (C6H13O) and 115 (C7H15O). From these masses, the program then calculates all combinations of n peaks which satisfy the following relationship. If $M$ is the molecular weight, $m$ the mass of the superatom and $p_i$ the m/e of an α-cleavage peak, then $(p_1, p_2, \ldots, p_n)$ with $p_1 \le p_2 \le \cdots \le p_n$ is a valid combination if

$$\sum_{i=1}^{n} p_i = (n-1)\,M + m$$

16) Only tests for oxygen or sulfur SAM compounds will be discussed here. Those pertaining to amines have been extensively explained in our publication [3] and are still valid.
17) A generic description or genus is defined as an entity displaying the superatom and the alkyl radicals available for saturating the free valences, without any specification of the precise distribution of these radicals among the free valences. For example, *EA-SP-(CH3,CH3,C4H9) is referred to as a genus. A description in which the respective positions of the radicals are unequivocally specified will be referred to as a subgenus. From the genus *EA-SP-(CH3,CH3,C4H9), the two subgenera *EA-SP-(CH3,CH3)(C4H9) and *EA-SP-(CH3,C4H9)(CH3) can be formed. Subgenera represent structures which are completely defined, except for the inner structure of any CnH2n+1 radical containing more than two carbon atoms that is attached to the α-carbon atoms.
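The relationship holds because each α-cleavage ion has the mass M minus the radical lost at one free valence, and the n radicals together weigh M − m. The ion masses therefore sum to nM − (M − m). The minimal Python sketch below enumerates the combinations for the *EA-SP* example. It is an illustration, not the program's code, and the function names are hypothetical. It also converts each ion back to the size of the expelled CkH2k+1 radical, k = (M − p − 1)/14.

```python
from itertools import combinations_with_replacement


def alpha_ion_masses(m: int, n: int, mol_weight: int) -> list[int]:
    """Possible alpha-cleavage ions: from m + (n-1)*15 up to M-15, in CH2 steps."""
    return list(range(m + (n - 1) * 15, mol_weight - 15 + 1, 14))


def valid_combinations(m: int, n: int, mol_weight: int) -> list[tuple[int, ...]]:
    """All non-decreasing n-tuples of ion masses whose sum is (n-1)*M + m."""
    target = (n - 1) * mol_weight + m
    masses = alpha_ion_masses(m, n, mol_weight)
    return [c for c in combinations_with_replacement(masses, n) if sum(c) == target]


def radical_carbons(combo, mol_weight: int) -> list[int]:
    """Carbon count k of the radical C_kH_(2k+1) expelled to form each ion."""
    return [(mol_weight - p - 1) // 14 for p in combo]


# *EA-SP* (n = 3, m = 43) with C8H18O (M = 130):
for combo in valid_combinations(43, 3, 130):
    print(combo, radical_carbons(combo, 130))
# (73, 115, 115) [4, 1, 1]
# (87, 101, 115) [3, 2, 1]
# (101, 101, 101) [2, 2, 2]
```

The radical sizes [4, 1, 1], [3, 2, 1] and [2, 2, 2] are those of *EA-SP-(CH3,CH3,C4H9), *EA-SP-(CH3,C2H5,C3H7) and *EA-SP-(C2H5,C2H5,C2H5).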
With our example (4), three a priori valid combinations satisfy the equation: (101, 101, 101), (73, 115, 115) and (87, 101, 115). They correspond, respectively, to the subgenus *EA-SP-(C2H5,C2H5,C2H5) and to the two genera *EA-SP-(CH3,CH3,C4H9) and *EA-SP-(CH3,C2H5,C3H7). It should be noted that for all polyvalent ether superatoms the genera are built without reference to the mass spectrum. This is not the case for the two polyvalent alcohol superatoms *EA-S* and *EA-T*. Since α-cleavage plays a more important role for alcohols than for ethers, the program performs a preselection. It constructs only the subgenera for which α-cleavage leads to a set of ions whose intensities sum to more than 20%. With our example, three subgenera are possible: *EA-T-(CH3,CH3,C5H11), *EA-T-(CH3,C3H7,C3H7) and *EA-T-(C2H5,C2H5,C3H7). Only the first one is generated (see the mass spectrum, Diagram 4).

The validity of each genus is then tested for consistency with the mass spectrum. All the conditions about α-cleavage are included in the multistep test 'ALPHA' reported in Table 1. Diagram 5 illustrates how the program arrived at the correct solution for the mass spectrum of 4. It shows which superatoms were discarded even before genera were constructed, which genera were built, and how they were eliminated. All the subtests included in the general test 'ALPHA' are also recorded in Diagram 5. First, the program requires that no potential α-fission peak except the M−15 peak be absent from the spectrum (subtest 'ANYZERO', Diagram 5). The mass spectrum of 4 contains no peaks corresponding to the loss of either C2H5 or C3H7 from the molecular ion. All genera with an ethyl or a propyl group attached to an α-carbon are therefore eliminated. Of the 19 genera and subgenera reported in Diagram 5, 13 were eliminated by that test. The next test is only performed for ethyl ethers having superatom *EA-PP* as a central subunit.
For such subgenera the program requires that the ion CH3CH2OH+ (m/e 46) give a signal with an intensity greater than 2%. The subgenus *EA-PP-(CH3,C5H11) is eliminated from further consideration by that test (subtest 'ETHION', Diagram 5). Important α-series peaks (CH2=XH+ + i × 14) may appear at masses smaller than that of the ion formed by α-cleavage expulsion of the largest alkyl fragment. Such peaks cannot be accounted for unless the molecular ion can undergo a favored rearrangement process according to the mechanism depicted under 2 (see test 'REARR', Table 1). Since m/e 45 is one of the major peaks in the mass spectrum of 4, the two subgenera *EA-SM-(CH3,C5H11) and *EA-TM-(CH3,CH3,C4H9) are rejected (subtest 'LOWP', Diagram 5). By the same reasoning, the program will eliminate any molecule if the mass spectrum under study exhibits a strong signal (> 10%) at a mass above that of the ion formed by α-cleavage expulsion of the smallest alkyl fragment (subtest 'NOHIP', Diagram 5). With our example, all the remaining candidates contain at least one α-carbon bound methyl radical. Since m/e 115 is the last peak in the mass spectrum, none of them is eliminated by that test.

Oxygen-containing fragments formed by α-cleavage do not stabilize the positive charge as well as nitrogen-containing ones do, but they can still compete with alkyl radicals for charge retention. This affords diagnostically useful ions, especially when the alkyl group is not large enough to be highly branched. Before performing the next test, the program checks the size of the biggest α-carbon bound alkyl group. If it is larger than C3H7, it could contain a quaternary carbon atom and would then compete favorably with the heteroatom for charge retention. In such a case, no minimum value is assigned to the sum of the intensities of the α-cleavage ions.
Yet, if the α-carbon atoms of ether and thioether molecules bear only small radicals, the total ion current carried by the α-cleavage ions should amount to at least 10% of the current carried by the ion giving the strongest signal. None of the remaining molecules was eliminated by that test (subtest 'ALPHASUM', Diagram 5).

Diagram 5. Description of the inference phase with isopropyl n-amyl ether (4)

    Oxygen superatoms:    P   S   T   PP  PM  SM  TM  SP  TP  SS  TS  TT
    After test M−18:      T   PP  PM  SM  TM  SP  TP  SS  TS  TT
    After test M−32:      T   PP  SM  TM  SP  TP  SS  TS  TT

    BUILD GENERA
      T-(CH3,CH3,C5H11)
      TS-(CH3,CH3,CH3,CH3,C2H5)
      TT-(CH3,CH3,CH3,CH3,CH3,CH3)
      PP-(C2H5,C4H9)           PP-(CH3,C5H11)          PP-(C3H7,C3H7)
      SM-(C2H5,C4H9)           SM-(CH3,C5H11)          SM-(C3H7,C3H7)
      TP-(CH3,CH3,CH3,C3H7)    TP-(CH3,CH3,C2H5,C2H5)
      SS-(CH3,CH3,CH3,C3H7)    SS-(CH3,CH3,C2H5,C2H5)
      TM-(CH3,CH3,C4H9)        TM-(CH3,C2H5,C3H7)      TM-(C2H5,C2H5,C2H5)
      SP-(CH3,CH3,C4H9)        SP-(CH3,C2H5,C3H7)      SP-(C2H5,C2H5,C2H5)

    TESTS PERTAINING TO α-CLEAVAGE
      ANYZERO → ETHION → LOWP → NOHIP → ALPHASUM → BRANCH → ALL15

    BUILD SUBGENERA
      SP-(CH3,CH3)(C4H9)       SP-(CH3,C4H9)(CH3)
      → REARR → ALKFIT

    ANSWER IS: SP-(CH3,CH3)(C4H9)
      (CH3)2CH–O–CH2–(CH2)3–CH3
      (CH3)2CH–O–CH2–CH(CH3)–CH2–CH3
      (CH3)2CH–O–CH2–CH2–CH(CH3)2
      (CH3)2CH–O–CH2–C(CH3)3

At 70 eV the larger alkyl fragment is preferentially expelled in an α-cleavage. Some molecules can generate more than one ion by α-cleavage. For these, the program requires that each such ion give a stronger signal than the next heavier ion formed by the same process, provided the alkyl radical expelled to give the heavier ion is smaller than C3H7.
If it is C3H7 or larger, it could be a secondary or even a tertiary radical, and the program weakens its requirement. In such a case the intensity of the low mass ion has to be greater than (0.5 + 0.1 × ΔC) × I. Here I stands for the intensity of the higher mass ion, and ΔC for the difference in size (number of carbon atoms) between the two alkyl radicals lost to give the two α-cleavage peaks being compared. This test takes into account the possibility of branching as well as the respective sizes of the CnH2n+1 radicals expelled (subtest 'BRANCH', Diagram 5). Candidate *EA-T-(CH3,CH3,C5H11) is expected to give a stronger signal for ion M−C5H11 than for ion M−CH3. Since this is not the case in the mass spectrum tabulated in Diagram 4 (see m/e 115 and m/e 59), that molecule is rejected.

When a molecule has a methyl radical attached to one of its α-carbon atoms, the M−15 ion is often missing from the mass spectrum, especially when larger radicals can be expelled by α-cleavage from other sites. If, however, all α-cleavages lead to the M−15 ion, i.e. if the molecule bears only methyl groups on its α-carbons, a stricter condition applies. The program keeps such a molecule for further testing only if the M−15 ion appears in the spectrum with a relative abundance exceeding 20 × (1 − 1/m), where m represents the number of methyl radicals attached to α-carbon atoms. The subgenus *EA-TT-(CH3,CH3,CH3,CH3,CH3,CH3) would have passed that test (subtest 'ALL15', Diagram 5) if m/e 115 had shown up with an intensity greater than 20 × 5/6, i.e. greater than 16.7%.

For all the remaining candidates for which there is more than one way to distribute the alkyl radicals among the free valences of the superatom, the program then builds subgenera out of the genera. From *EA-SP-(CH3,CH3,C4H9), the only genus not rejected at that stage of the validation process, the program builds the two subgenera *EA-SP-(CH3,CH3)(C4H9) (5) and *EA-SP-(CH3,C4H9)(CH3) (6).
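Subtests 'BRANCH' and 'ALL15' reduce to simple intensity comparisons, sketched below in Python. The sketch assumes each candidate is described by the distinct radical sizes (in carbons) it can lose by α-cleavage, together with the observed intensity of each resulting ion. It takes three carbons (C3H7) as the size from which the weakened 'BRANCH' rule applies. The function names are hypothetical.

```python
def branch_test(alpha_ions: list[tuple[int, float]], weak_from: int = 3) -> bool:
    """Subtest 'BRANCH'.

    alpha_ions: (carbons of the radical expelled, intensity of the resulting ion)
    for each distinct alpha-cleavage ion; a larger radical gives a lighter ion.
    """
    # Order from the lightest ion (largest radical lost) to the heaviest.
    ions = sorted(alpha_ions, key=lambda t: t[0], reverse=True)
    for (c_large, i_low), (c_small, i_high) in zip(ions, ions[1:]):
        if c_small < weak_from:
            ok = i_low > i_high                                # strict rule
        else:
            ok = i_low > (0.5 + 0.1 * (c_large - c_small)) * i_high  # weakened rule
        if not ok:
            return False
    return True


def all15_test(n_methyls: int, m_minus_15: float) -> bool:
    """Subtest 'ALL15' for molecules bearing only methyls on their alpha-carbons."""
    return m_minus_15 > 20 * (1 - 1 / n_methyls)


# *EA-T-(CH3,CH3,C5H11): losing C5H11 gives m/e 59 (3%), losing CH3 gives m/e 115 (16%).
print(branch_test([(5, 3.0), (1, 16.0)]))   # False
# *EA-TT-(CH3 x 6): m/e 115 must exceed 20 x 5/6, about 16.7%.
print(all15_test(6, 16.0))                  # False
```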
    CH3–CH(CH3)–O–CH2–C4H9        (5)
    C4H9–CH(CH3)–O–CH2–CH3        (6)

The program then simulates for structures 5 and 6 the rearrangement process depicted under 2, calculating the mass of every potential ion arising from such a mechanism. The molecule passes the test (test 'REARR', Table 1) if at least one signal corresponding to such an ion is present in the mass spectrum with a relative abundance above 25% (15% for thioethers and 30% for amines). From structure 5, the two ions CH3–CH=OH+ (m/e 45) and CH2=OH+ (m/e 31) can originate from α-cleavage followed by simultaneous hydrogen transfer to the oxygen atom and C–O bond scission. Structure 6 can also lead to these two ions and, in addition, to the ion C4H9–CH=OH+ (m/e 87). Since m/e 45 has an intensity of 30% in the mass spectrum under study, both structures 5 and 6 are accepted by that test (see Diagram 5).

The last test which each remaining candidate undergoes is listed in Table 1 as 'ALKFIT'. The final decision about keeping or rejecting a molecule depends on the relative abundances of the CnH2n+1 ions formed by rupture of the C–O bond. The minimum intensity each alkyl ion should exhibit depends on two factors. One is the size of the ion. The other is the degree of substitution of the carbon atom which was originally an α-carbon of the molecular ion. The higher the degree of substitution of this carbon atom, the more likely is C–O bond rupture with charge retention on the alkyl moiety. Moreover, large alkyl ions tend to decompose further, so the bigger the alkyl ion, the smaller its diagnostic value. The program requires that all CnH2n+1 ions (with n > 2) formed by cleavage of the C–O bond give signals with intensities exceeding the integer value of

$$\frac{500 + 150\,s}{n^{3}}$$

where s represents the degree of substitution of the α-carbon atom (0 for –CH2–, 1 for >CH– and 2 for >C<) and n the number of carbon atoms in the alkyl ion.
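The 'ALKFIT' threshold is easy to check numerically. In the Python sketch below, each C–O cleavage ion is given as (n, s, observed intensity). Two points are interpretations. First, "exceeding the integer value" is read as a strict comparison with the floored threshold. Second, an ion whose threshold floors to 0 may be absent; this is how the text describes unbranched ions larger than C7H15. The function names are hypothetical.

```python
def alkfit_threshold(n: int, s: int) -> int:
    """Integer value of (500 + 150*s) / n**3 for a C_nH_(2n+1) ion; s = 0, 1 or 2."""
    return (500 + 150 * s) // n**3


def alkfit_test(alkyl_ions: list[tuple[int, int, float]]) -> bool:
    """alkyl_ions: (n, s, observed intensity) for every C-O cleavage ion with n > 2."""
    for n, s, intensity in alkyl_ions:
        threshold = alkfit_threshold(n, s)
        if threshold > 0 and not intensity > threshold:
            return False
    return True


print([alkfit_threshold(5, s) for s in (0, 1, 2)])   # [4, 5, 6]
# Candidate 5: C3H7+ (m/e 43, s = 1) at 100% and C5H11+ (m/e 71, s = 0) at 43%.
print(alkfit_test([(3, 1, 100.0), (5, 0, 43.0)]))     # True
# Candidate 6: C6H13+ (m/e 85, s = 1) is absent, but its threshold is 3%.
print(alkfit_test([(6, 1, 0.0)]))                     # False
```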
Alkyl radicals which are branched at the α-carbon atom are thus required to yield stronger signals than the corresponding unbranched ones, and the minimum required intensity also decreases as the size of the alkyl radical increases. For example, the relative abundance of a peak corresponding to a C5H11+ ion must exceed 4% if the α-carbon atom is not branched, 5% if it is monosubstituted, and 6% if it is disubstituted¹⁸). The formula also allows unbranched CnH2n+1+ ions larger than C7H15+ to be missing from the mass spectrum. With our example, candidate 6 would have passed that test with one more peak in the spectrum. That peak would be at m/e 85, corresponding to a C6H13+ ion formed according to 7, with a relative abundance greater than (500 + (1 × 150))/216, i.e. 3%.

    C4H9–CH(CH3) | O–CH2–CH3          m/e 85                   (7)
    CH3–(CH2)4 | O | CH(CH3)–CH3       m/e 71, m/e 43           (8)

    (| marks the C–O bond broken; the m/e values are those of the resulting alkyl ions.)

The correct molecule (5) is accepted by that final test. The peaks at m/e 43 (C3H7+) and m/e 71 (C5H11+), which originate from the cleavages shown in 8, exceed their thresholds. These are (500 + 150)/27, i.e. 24%, and 500/125, i.e. 4%, respectively (see the spectrum tabulated in Diagram 4).

18) These values are calculated from the formula (500 + 150 s)/n³, with n = 5 and s = 0, 1 and 2 respectively.

Finally, a subroutine calculates the number of isomers which are compatible with the structure of each inferred subgenus. Diagram 4 shows that the program first selected a molecular weight of 116 amu from the mass spectrum of 4, and attempted to validate a C7H16O structure. Since no such molecule could fully explain the mass spectrum, the program repeated the process with C8H18O and found the correct answer. The program was not misled by the absence of the molecular ion at m/e 130. This raises the following question: would an experienced mass spectrometrist have rejected all C7H16O isomers?

The results we have obtained with 210 mass spectra are reported in Tables 2, 3, and 4. Results for 31 amine mass spectra other than the ones listed in Table 4
are already reported in one of our publications [3]. The correct structure is always included in the answer. In all cases the initial search space¹⁹) is already curtailed tremendously by using only mass spectral data. The results we have obtained also show a further point. If NMR. spectra were used (only as methyl counters), the structure determination would be completely solved for many of the examples reported in Tables 2, 3, and 4.

It can be concluded that, even without the aid of NMR. spectrometry, the efficiency of the INFERENCE MAKER program is such that the PREDICTOR program of Heuristic DENDRAL cannot further differentiate between the inferred structures. If desired, the STRUCTURE GENERATOR program can be used to draw the structures. We agree that 'saturated acyclic monofunctional' molecules represent only a small fraction of all known organic compounds. Nevertheless, it is interesting to realize that with those compounds the program in general performs better than an experienced mass spectrometrist. More important, perhaps, is the fact that this kind of research requires a formalization of mass spectrometry rules; such a formalization did not exist before. The mass spectra of SAM compounds were interpreted successfully, especially those of ethers and alcohols, which are known to be difficult to interpret without taking advantage of low voltage data [9]. In view of this success, we believe that no major obstacle exists which would prevent such a program from working with more complicated molecules.

19) Since the program starts without knowing the elemental composition, it is not possible to assign a definite value to the size of the search space. Once the program has inferred an empirical formula CnH2n+vX (v = valence of X), the search space includes all the isomers of empirical formulae CnH2n+vX, Cn+1H2n+2+vX and Cn+2H2n+4+vX. The number of a priori possible isomers reported in Tables 2, 3 and 4 for each compound has been limited to the isomers corresponding to the correct empirical formula. These numbers are calculated by a subroutine of the INFERENCE MAKER program. In one of our previous publications [8] the numbers of isomers with empirical formulae C11H24O and C12H26O were wrongly reported to be 2460 and 6123, respectively; they should be corrected to 2426 and 6045.

Table 2. Results for ether and alcohol mass spectra

    Alcohol                       CnH2n+2O isomers         A     B
    n-butyl                                      7         2     1
    isobutyl                                     7         2     1
    sec-butyl                                    7         3     2
    2-methyl-2-butyl                            14         1     1
    1-pentyl                                    14         4     1
    3-pentyl                                    14         1     1
    2-methyl-1-butyl                            14         4     2
    2-pentyl                                    14         2     1
    3-hexyl                                     32         2     1
    3-methyl-1-pentyl                           32         8     4
    4-methyl-2-pentyl                           32         4     1
    1-hexyl                                     32         8     1
    3-heptyl                                    72         4     1
    2-heptyl                                    72         8     1
    3-ethyl-3-pentyl                            72         1     1
    2,4-dimethyl-3-pentyl                       72         3     1
    1-heptyl                                    72        17     1
    3-methyl-1-hexyl                            72        17     6
    1-octyl                                    171        39     1
    3-octyl                                    171         8     1
    2,3,4-trimethyl-3-pentyl                   171         3     1
    1-nonyl                                    405        89     1
    2-nonyl                                    405        39     1
    1-decyl                                    989       211     1
    6-ethyl-3-octyl                            989        39     9
    3,7-dimethyl-1-octyl                       989       211    41
    1-dodecyl                                6 045     1 238     1
    2-butyl-1-octyl                          6 045     1 238    25
    1-tetradecyl                            38 322     7 639     1
    3-tetradecyl                            38 322     1 238     1
    1-hexadecyl                            151 375    48 865     1

    Ether                         CnH2n+2O isomers         A     B
    Methyl n-propyl                              7         2     1
    Methyl isopropyl                             7         3     1
    Methyl n-butyl                              14         2     1
    Methyl isobutyl                             14         2     1
    Ethyl isopropyl                             14         1     1
    Ethyl n-butyl                               32         4     1
    Ethyl isobutyl                              32         4     2
    Ethyl sec-butyl                             32         2     2
    Ethyl t-butyl                               32         1     1
    Di-n-propyl                                 32         1     1
    Di-isopropyl                                32         1     1
    n-Propyl n-butyl                            72         2     1
    Ethyl n-pentyl                              72         4     1
    Methyl n-hexyl                              72         8     1
    Isopropyl sec-butyl                         72         3     2
    Isopropyl n-pentyl                         171         4     1
    n-Propyl n-pentyl                          171         4     1
    Di-n-butyl                                 171         3     1
    Isobutyl t-butyl                           171         2     1
    Ethyl n-heptyl                             405        34     1
    n-Butyl n-pentyl                           405         8     1
    Di-n-pentyl                                989        10     1
    Di-isopentyl                               989        18     7
    Di-n-hexyl                               6 045       125     2
    Di-n-octyl                             151 375       780     1
    Bis-2-ethylhexyl                       151 375       780    21
    Di-n-decyl                          11 428 365    22 366     1

    A = Inferred isomers when only mass spectrometry is used.
    B = Inferred isomers when the number of methyl radicals is known from NMR. data.

Table 3. Results for thioether and thiol mass spectra

    Thioether                     CnH2n+2S isomers         A     B
    Methyl ethyl                                 3         1     1
    Methyl n-propyl                              7         1     1
    Methyl isopropyl                             7         7     1
    Di-ethyl                                     7         1     1
    Methyl n-butyl                              14         3     1
    Methyl isobutyl                             14         5     2
    Methyl t-butyl                              14         1     1
    Ethyl isopropyl                             14         1     1
    Ethyl n-propyl                              14         2     1
    Ethyl n-butyl                               32         3     1
    Ethyl t-butyl                               32         1     1
    Ethyl isobutyl                              32         3     2
    Di-n-propyl                                 32         2     1
    Methyl n-pentyl                             32        10     1
    Di-isopropyl                                32         1     1
    Ethyl n-pentyl                              72         4     1
    n-Propyl n-butyl                            72         5     1
    Isopropyl n-butyl                           72         5     2
    Isopropyl t-butyl                           72         1     1
    n-Propyl isobutyl                           72         3     2
    Isopropyl sec-butyl                         72         4     3
    n-Propyl n-pentyl                          171         4     1
    Ethyl n-hexyl                              171         8     1
    Di-n-butyl                                 171         5     1
    Di-sec-butyl                               171         3     1
    Di-isobutyl                                171         3     1
    Methyl n-heptyl                            171        21     1
    Di-n-pentyl                                989        12     1
    Di-n-hexyl                               6 045        36     1
    Di-n-heptyl                             38 322       153     1

    Thiol                         CnH2n+2S isomers         A     B
    n-Propyl                                     3         2     1
    Isopropyl                                    3         1     1
    n-Butyl                                      7         3     1
    Isobutyl                                     7         3     1
    t-Butyl                                      7         1     1
    2-methyl-2-butyl                            14         1     1
    3-methyl-2-butyl                            14         2     1
    3-methyl-1-butyl                            14         6     3
    1-pentyl                                    14         4     1
    3-pentyl                                    14         5     3
    2-pentyl                                    14         6     3
    1-hexyl                                     32         8     1
    2-hexyl                                     32        12     5
    2-methyl-1-pentyl                           32         8     4
    4-methyl-2-pentyl                           32         4     2
    3-methyl-3-pentyl                           32         1     1
    2-methyl-2-hexyl                            72         8     3
    1-heptyl                                    72        17     1
    2-ethyl-1-hexyl                            171        39     9
    1-octyl                                    171        39     1
    1-nonyl                                    405        89     1
    1-decyl                                    989       211     1
    1-dodecyl                                6 045     1 238     1

    A = Inferred isomers when only mass spectrometry is used.
    B = Inferred isomers when the number of methyl radicals is known from NMR. data.

Table 4.
Results for amine mass spectra

    Amine                         CnH2n+3N isomers           A     B
    1-propyl                                     4           1     1
    Isopropyl                                    4           2     1
    1-butyl                                      8           2     1
    Isobutyl                                     8           2     1
    sec-Butyl                                    8           4     2
    t-Butyl                                      8           3     1
    Di-ethyl                                     8           3     1
    N-methyl-n-propyl                            8           4     1
    Ethyl-n-propyl                              17           5     1
    N-methyl-di-ethyl                           17           4     1
    1-pentyl                                    17           4     1
    Isopentyl                                   17           4     2
    2-pentyl                                    17           2     1
    3-pentyl                                    17           5     1
    3-methyl-2-butyl                            17           4     1
    N-methyl-1-butyl                            17           4     1
    N-methyl-sec-butyl                          17           3     1
    N-methyl-isobutyl                           17           4     1
    1-hexyl                                     39           8     1
    Tri-ethyl                                   39           2     1
    2-hexyl                                     39           8     1
    Di-1-propyl                                 39           8     1
    Di-isopropyl                                39           8     1
    N-methyl-1-pentyl                           39           8     1
    N-methyl-isopentyl                          39           8     2
    Ethyl-n-butyl                               39           6     1
    N,N-dimethyl-1-butyl                        39          10     1
    1-heptyl                                    89          17     1
    Ethyl-1-pentyl                              89          16     1
    1-Butyl-isopropyl                           89 (illegible)     4
    4-methyl-2-hexyl                            89          16     4
    N-methyl-di-isopropyl                       89          15     3
    1-octyl                                    211          39     1
    Ethyl-1-hexyl                              211          24     1
    1-methylheptyl                             211          34     1
    2-ethylhexyl                               211          39     9
    1,1-dimethylhexyl                          211          32     4
    Di-1-butyl                                 211          24     1
    Di-sec-butyl                               211          33     8
    Di-isobutyl                                211          17     5
    Di-ethyl-2-butyl                           211          17     3
    3-octyl                                    211          26     2
    1-nonyl                                    507          89     1
    N-methyl-di-n-butyl                        507          13     1
    Tri-1-propyl                               507           2     1
    Di-1-pentyl                              1 238          83     1
    Di-isopentyl                             1 238         109    16
    N,N-dimethyl-2-ethylhexyl                1 238         156     9
    1-undecyl                                3 057         507     1
    1-dodecyl                                7 639       1 238     1
    1-tetradecyl                            48 865      10 115     1
    Di-1-heptyl                             48 865         646     1
    N,N-dimethyl-1-dodecyl                  48 865       4 952     1
    Tri-1-pentyl                           124 906          40     1
    Bis-2-ethylhexyl                       321 988       2 340    24
    N,N-dimethyl-1-tetradecyl              321 988       3 895     1
    (Di-ethyl)-1-dodecyl                   321 988       2 476     1
    1-heptadecyl                           830 219     124 906     1
    N-methyl-bis-2-ethylhexyl              830 219       2 340    24
    1-octadecyl                          2 156 010      48 865     1
    N-methyl-1-octyl-1-nonyl             2 156 010      15 978     1
    N,N-dimethyl-1-octadecyl            14 715 813   1 284 792     1

    A = Inferred isomers when only mass spectrometry is used.
    B = Inferred isomers when the number of methyl radicals is known from NMR. data.
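Footnote 19 attributes the a priori isomer counts to a subroutine of the INFERENCE MAKER program, which is not described. The Python sketch below offers an independent check, not the authors' method. It first counts alkyl radicals with the standard Pólya cycle-index recurrence for carbon trees in which each carbon carries at most three further branches. It then counts CnH2n+2O isomers as alcohols (one alkyl radical on oxygen) plus ethers (an unordered pair of alkyl radicals). The sketch reproduces the Table 2 totals from C4 to C10, and the corrected values 2426 (C11H24O) and 6045 (C12H26O) of footnote 19. The function names are hypothetical.

```python
def alkyl_counts(max_c: int) -> list[int]:
    """a[k] = number of C_kH_(2k+1) alkyl radicals (a[0] = 1 stands for H).

    Each carbon carries up to three further branches, so the generating
    function satisfies A(x) = 1 + x * Z(S3; A), with
    Z(S3; A) = (A(x)^3 + 3 A(x) A(x^2) + 2 A(x^3)) / 6.
    """
    a = [1] + [0] * max_c
    for n in range(1, max_c + 1):
        k = n - 1  # coefficient of x^(n-1) in Z(S3; A)
        cube = sum(a[i] * a[j] * a[k - i - j]
                   for i in range(k + 1) for j in range(k + 1 - i))
        mixed = sum(a[i] * a[(k - i) // 2] for i in range(k + 1) if (k - i) % 2 == 0)
        triple = a[k // 3] if k % 3 == 0 else 0
        a[n] = (cube + 3 * mixed + 2 * triple) // 6
    return a


def oxygen_isomers(n: int) -> int:
    """Saturated acyclic C_nH_(2n+2)O isomers: alcohols plus ethers."""
    a = alkyl_counts(n)
    alcohols = a[n]
    ethers = sum(a[i] * a[n - i] for i in range(1, (n + 1) // 2))
    if n % 2 == 0:
        half = a[n // 2]
        ethers += half * (half + 1) // 2  # unordered pair of equal-size alkyls
    return alcohols + ethers


print([oxygen_isomers(n) for n in range(4, 11)])  # [7, 14, 32, 72, 171, 405, 989]
print(oxygen_isomers(11), oxygen_isomers(12))     # 2426 6045
```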
Financial assistance from the Advanced Research Projects Agency (contract SD-183), the National Aeronautics and Space Administration (grant NGR-05-020-004) and the National Institutes of Health (grants AM-12758 and AM 04527) is gratefully acknowledged.

Experimental. – The computer program described here runs on the IBM 360/67 computer at the Stanford Computation Center and is written in the LISP programming language. It can interpret low resolution mass spectra at a rate of 20 spectra per minute. Mass spectra which had not been reported in the literature were recorded in our laboratory, some with a Varian MAT CH-4 mass spectrometer, others with an AEI MS-9 instrument.

BIBLIOGRAPHY

[1] Y. M. Sheikh, A. Buchs, A. B. Delfino, G. Schroll, A. M. Duffield, C. Djerassi, B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum & J. Lederberg, Org. Mass Spectrom., submitted for publication.
[2] G. Schroll, A. M. Duffield, C. Djerassi, B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum & J. Lederberg, J. Amer. chem. Soc. 91, 7740 (1969).
[3] A. Buchs, A. M. Duffield, G. Schroll, C. Djerassi, A. B. Delfino, B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum & J. Lederberg, J. Amer. chem. Soc., submitted for publication.
[4] H. Budzikiewicz, C. Djerassi & D. H. Williams, 'Mass Spectrometry of Organic Compounds', pp. 100–101, Holden-Day, San Francisco 1967.
[5] Op. cit. [4], p. 300.
[6] Op. cit. [4], pp. 9–14.
[7] Op. cit. [4], p. 297.
[8] J. Lederberg, G. L. Sutherland, B. G. Buchanan, E. A. Feigenbaum, A. V. Robertson, A. M. Duffield & C. Djerassi, J. Amer. chem. Soc. 91, 2973 (1969).
[9] Op. cit. [4], p. 231.