3) Generation of both soft and hard copy displays of data
contained in the Context Base and the MDB.

4) User selectable analysis of any data available to the DMS.

The implementation of METASYS is not yet complete. The
PMON and the basic MVI processes are functional, but the DMS
development has not yet been completed. This development is
proceeding rapidly, however, which can be attributed to the
advantages of program development on the PDP 10.

1.9 Summary

As the above hardware and software improvements are being
made we will continue evaluation of the GC/HRMS' system in
parallel with its actual application to real problems. GC/HRMS

is a relatively new and difficult technique for routine
application. In order to use it effectively, we will have to
exert some effort toward determining and optimizing the
performance of the many elements of the system, the GC, the MS,

and the computer hardware and software.

2 PART 2: DEVELOPING PERFORMANCE AND THEORY FORMATION
PROGRAMS TO ASSIST IN BIOMEDICAL STRUCTURE ELUCIDATION PROBLEMS

2.1 Introduction

The Heuristic DENDRAL computer programs assist with
structure elucidation problems by helping interpret mass spectra
and helping generate structures that are consistent with data
obtained from a variety of other spectroscopic and
physical/chemical sources. The Meta-DENDRAL programs assist with
rule formation problems in eases where the rules of mass
spectrometry are not known.

Both the interpretation and rule formation programs are
written as interactive tools to be controlled by professionals to
combine the professional’s judgment with the computer’s
combinatorial power.

13
2.2 CONGEN

The CONGEN [48,53] program represents a significant
extension of a program which has developed over the last several
years, the cyclic structure generator [40,41]. The purpose of
CONGEN is to assist the chemist in determining the chemical
structure of an unknown compound by 1} allowing him to specify
certain types of structural information about the compound which

he has determined from any source (e.g., spectroscopy, chemical
degradation, method of isolation, ete.) and 2) generating an
exhaustive and non-redundant list of structures that are

consistent with the information. The generation is a stepwise
process, and the program allows interaction at every stage; based
upon partial results the chemist may be reminded of additional
information which he can specify, thus limiting further the
number of final structures.

CONGEN fits with the other DENDRAL programs as a "backstop"
solution to structure elucidation problems. If the mass spectrum
of an unknown compound is available, then CLEANUP and MOLION
could be used, but if the general class of the compound is not

known, PLANNER has no starting point from which to work. In such
cases, structural information can be extracted manually from the
Spectrum and given to CONGEN for analysis. Because CONGEN makes

no assumptions about the source of this information, other
spectroscopic or chemical techniques may be used to supply
Supplemental data.

At the heart of CONGEN are two algorithms whose accuracy
has been mathematically proven and whose computer implementation
has been well tested. The structure generation algorithm
{31,37,40,41] is designed to determine all topologically unique
ways of assembling a given set of atoms, each with an associated
valence, into molecular structures. The atoms may be chemical
atoms with standard chemical valences, or they may be names
representing molecular fragments ("sSuperatoms") of any desired
complexity, where the valence corresponds to the total number of
bonding sites available within the superaton. Because the
structure generation algorithm can produce only structures in
which the superatoms appear as single atoms (we refer to these as
intermediate structures), a second procedure, the imbedding
algorithm [48,53] is needed to expand the superatoms to their
full chemical identities.

These two routines give the chemist the ability to
eonstruct

structures from a given set of molecular “building blocks"
which may be atoms or larger fragments. By itself, this capacity
is of limited utility because the number of final structures can
be overwhelming in many cases. Usually, the chemist has
additional information (if only some general rules about chemical
stability, which the program has no concept of) that can be used
to limit the number of structural possibilities. For example, he

14
may know that because of a compound’s’ stability, it cannot
contain a peroxide linkage (0-0) and thus the programs need not
consider such structures when there are two or more oxygens in
the "building block" list.

In the past year CONGEN has-7 reached the level of a
practical production program which can aid chemists, both locally
and at remote network sites, in solving the structures of drug-
related compounds and natural products. The development of this
program during the year has been strongly guided by the
difficulties and new requirements which have appeared as it was
applied to awide variety of cases, and its efficiency and
usefulness have increased dramatically. We report here the
details of the modifications and additions we have made to
CONGEN, and the effects they have had on its utility. Also,
because of the rich repertoire of structure modification and
testing functions available within CONGEN, we have found it to be

an invaluable "laboratory" for the testing of new ideas, and we
briefly describe two pilot projects which form the basis for
future research. Discussion of applications of CONGEN to

problems of biochemical interest is included in Part 3.
Program modifications

DEPTH-FIRST GENERATION. This modification has been both
the most difficult and the most useful. The structure-generation
algorithm which was originally part of CONGEN processed the
"tree" of subgoals and subgoals-of-subgoals ina breadth first
fashion. Although this was the most logically coherent and
understandable encoding of the algorithm, it meant that a user
would have to wait until the very end of a generation problem
before he could see any of the results. This was particularly
frustrating when a problem was submitted to CONGEN which was too
big and/or time-consuming, because the user could never get any
results at all. To alleviate this difficulty, we undertook a
complete reorganization of the structure-generation algorithm so
that it would proceed depth-first, giving results continuously as
the computation progressed.

It is difficult to communicate the complexity of such a
reprogramming without a major digression, but the flavor of the
necessary changes is captured in the following example. At
several points in the algorithm, there are what might be called
"branching funetions" whose purpose it is to solve some
intermediate problem which has several alternate solutions. It
is easiest to define such a funetion so that it computes the
whole list of possibilities and returns the list to the caller.
It is then the caller’s responsibility to determine what is to be
done with each possibility, and the branching function itself can
be viewed as a separate module. This is a breadth first
approach, and the difficulty is that the caller can make no
progress until the branching function has eonstructed and
returned all possibilities. The depth-first approach is to have
the branching function itself be responsible for further

15
processing each time it creates a new result. To retain the
modularity of the branching function, some mechanism is needed to
allow the caller to "tell" it what this further processing
consists of, and such a mechanism was instituted throughout the
structure-generation algorithm.

We made use of the depth-first generation by instituting an
interrupt mechanism in CONGEN whereby a user can examine the
developing list of structures as they are created. This isa
tremendous advantage both psychologically, because it gives the
user a feeling that the program is "doing something", and
operationally because it provides rapid feedback. A chemist can
now often see quickly that a given case will create many more
Structures than expected, and the intermediate output can suggest
forgotten constraints or superatoms. The following is an example
of a terminal session in which the interrupt mechanism is used.
The character control-S gives a "snapshot" of progress on the
problem while control-I allows for the drawing of partial
results. Both of these features are illustrated in the sample
CONGEN session shown in Appendix A.

NEW CAPABILITIES FOR THE USER. There have been several
additions to CONGEN which are visible to the user and which
generally increase the flexibility and power of the program.
These include

1) Making CONGEN aware of aromaticity, a chemical property
of molecules which results from certain combinations of double
bonds in rings. Aromaticity has a profound effect upon both the
chemical reactivity and symmetry properties of molecules, and
CONGEN can now be directed to detect aromaticity in its output
structures, to compensate for the difference between the actual
symmetry of an aromatic system and the symmetry which appears in
the graph representing it, and to distinguish aromatic from non-
aromatic atoms when it tests GOODLIST and BADLIST entries.

2) Giving the user the ability to type "?" to any prompt in
the program, which results in a summary of the possible inputs.
In some cases this summary is a list of possible commands, while
in others it is a short explanatory message. A new interactive
teletype-input routine was developed which makes it easy to
include such help messages in the program, and which mimics the
handy command-recognition and command-completion features of the
TENEX operation system.

3) Including new specifications in the EDITSTRUC language
for describing substructural features. The user can now declare
a bond in a substructure to be an "anybond", which means that the
atoms at the termini are connected but that the multiplicity of
the connection is unspecified. This is especially handy when
defining substructures containing aromatic portions because bond
multiplicity is an indistinet concept in aromatic systems.
Another new structural element which can be specified is a
"Linknode", a node which stands for a variable-length chain of

16
atoms of the given type rather than a single atom. The minimum
and maximum lengths of such a chain can be specified as well.
The linknode feature is useful for defining constraints on ring
fusions and other constraints such as Bredt’s rule which depend
on path length. Other extensions have been made internal to
CONGEN which will shortly be reflected in the user-level language
of EDITSTRUC. These include numerical inequalities involving
node properties (e.g., "the number of H’s on atom 3 is greater
than the number of H’s on atom 5") or linknode lengths (e.g.,
"the sum of the lengths of linknodes 2 and 6 is greater than 5"),
and greater control over the number of fittings found for a
GOODLIST constraint (e.g., the ability to distinguish between
"the number of N’s in six-membered rings" and "the number of six-
membered rings containing N").

Y)Allowing greater flexibility in the selection of terminal
type. This choice controls the output of structural drawings so
they are best suited to the user’s terminal. Several different
types of character-oriented and graphics-display terminals are
now supported.

5) Making CONGEN accessible from the GUEST login account at

SUMEX. This involved preventing a GUEST user from reaching
eertain critical points in CONGEN which would allow greater
system access than is normally authorized for guests. We can now

offer trial access to CONGEN via the guest mechanism without
worrying about SUMEX misuse.

6) Creating a BATCH command for CONGEN. This allows the
user to submit time-consuming, compute-bound calculations to the
batch-processing facility of SUMEX. The computation is then run
automatically at off-hours when it will not overload the system
resources. The user can now run CONGEN in its interactive mode
to input all of his data and then submit the large tasks to BATCH
for overnite processing.

7) Including a pruning function MSPRUNE which is used to
test a list of candidate structures for consistency with a set of
observed peaks from a mass spectrum. The candidates are
typically generated by CONGEN using structural data from other
sources. The user specifies the observed MS peaks (as elemental
compositions or nominal masses or a combination of both) along
with a set of constraints on the allowed cleavage processes.
MSPRUNE retains only those candidates which can account for the

observations via one of these allowed processes. The constraints
Speak of the number of bonds broken and the number of steps ina
process, the proximity of pairs of cleaved bonds {i.e., whether

or not two adjacent bonds can break in a given process), the
multiplicity or aromaticity of each cleaved bond and the possible
neutral transfers. MSPRUNE is the first CONGEN function which
can aid directly in the interpretation of "raw" spectral data.

8) Internal CONGEN Developments. The basic algorithms used
for structure generation in CONGEN are firmly rooted in

17
mathematical graph theory. During the past year, there has been
Significant refinement of several of these graph theoretical
algorithms. The new algorithms have been coded in SAIL, an
extended ALGOL type language; and a sophisticated executive has
been developed to coordinate the various SAIL routines as well as
to direct the communication and control between the SAIL
component and LISP component of CONGEN.

The power and utility of CONGEN rests, to a great extent,
on the fact that it can generate structures under user supplied
constraints. The most powerful of the routines used in
constrained structure generation is the fragment imbedder
[37,48]. It is this routine which permits CONGEN to efficiently
generate only those structures’ containing given polyatomic
fragments (i.e., superatoms). The fragment imbedding program was
completely rewritten so that it operates now in a "depth first"
rather than "breadth first" style. This was done so that the
user can request CONGEN to produce examples only of candidate
structures in those cases where the total number of candidate
Structures is very large. This change also increases the
efficiency of the fragment embedding process.’ and has’ the
advantage that if a CONGEN run must be interrupted, the user is
left with at least some candidate structures rather than just
intermediate results.

During the grant period, a very general substructure
matching algorithm was developed and coded in SAIL. This
algorithm accepts as input a structure and a "pattern" and

returns the number of times the pattern distinctly occurs in the
structure. Here a pattern is a partially specified substructure
in which atom names, bond widths and hydrogen attachments all may
assume a range of values. This routine is used by CONGEN for
post checking of structures and classifying lists of structures.

An improved technique to determine the topological symmetry
group of a structure was also developed and coded in SAIL. This
routine is used in several parts of CONGEN, e.g., fragment
imbedding. This new routine is, statistically, at least an order
of magnitude faster than the old group finding routine.

The language LISP, although quite powerful, does not
produce very efficient machine code. It was for this reason that
several of the routines used by CONGEN were coded in SAIL.
However, because of the widely variant data types, LISP and SAIL
are not compatible languages. Hence, all of the SAIL programs
reside in their own TENEX fork, and they communicate with the
LISP fork via a shared memory page. The new CONGEN SAIL code
executive program handles all interfork communication for the
SAIL routines, and it allows one to make additions or
modifications to the SAIL portion of CONGEN with relative ease.
This ease of change is also aided by the fact that all the SAIL
programs are written in highly modularized form.

Preliminary testing of the new CONGEN SAIL fork indicates

18
these modifications and additions will yield a Significant
increase in the overall efficiency of CONGEN, and hence will
enable one to consider a broader range of chemical problems.

INTERNAL CONGEN IMPROVEMENTS = LISP. Because of the divers
assortment of chemical problems to which CONGEN has been applied,
we have been able to exercise all parts of the program ina
variety of contexts. As a result, we have been able to uncover a
number of hidden inefficiencies in the LISP section of CONGEN,
and although correcting these has not had a direct impact on the
command structure of the program, we estimate that a decrease of
over 50 in CPU time has been achieved for typical CONGEN cases.
In some cases this decrease is as high as 90.

These improvements have been numerous, but one stands out
as most significant. Several changes were made to the graph-
matching routine which is responsible for testing the presence or
absence of structural features in molecules or molecular
fragments. The new routine uses list space (a key resource in
the LISP programming system) much more Parsimoniously, and it
incorporates a new and very efficient representation of
Substructures which makes optimum use of the linked-list data
representation in LISP, Also ineluded were a number of
heuristics which, although they do not alter the Output of the
graph matcher, do dramatically decrease the amount of time Spent
on typical tests. The highly efficient SAIL graph matcher,
described above, will soon supplement the LISP version, thought
the latter will still be needed in some cases because of its
greater flexibility.

Other inefficiencies were detected and fixed in the portion
of CONGEN which builds tree-like molecules and molecular
fragments, where it was discovered that a built-in assumption
(that the most common monovalent atom would be hydrogen) was
adversely effecting the running times of some CONGEN eases, and
in the portion responsible for eomputing the symmetry groups of
graphs.

PILOT PROJECTS. CONGEN provides an excellent environment
for the testing of new ideas because it contains an extensive
"library" of functions for the creation, manipulation and testing
of topological representatives of molecular structure. Below we
describe two pilot projects which were explored within this
environment and which provide the basis for proposed future
research topics.

We developed within CONGEN a program called XMECH [60]
whose purpose it was to study the possible mechanisms of
ecyclizations and skeletal rearrangements of monoterpanes,
terpanes and sesquiterpanes. The study of these compound classes
is an important sub-field of natural-products chemistry, and
Simple carbonium-ion mechanisms, such as cyclizations to double
bonds and 1,2-alkyl and/or 1,2-hydride shifts, are frequently
invoked to rationalize interrelationships between various

19
skeletal types. Using XMECH we were able to explore various
combinations of these basic mechanisms and to develop exhaustive
lists of skeletal types, known and unknown, which should be
accessible from known biogenetic precursors via this approach.
Our results indicate that although such mechanistic
rationalizations are widely used, the method is quite non-
selective: If a sufficient number of mechanistic steps is
included to account for even a modest fraction of known
skeletons, a vastly larger number of skeletal types are obtained
which have never been seen in nature, It seems clear that there
are much subtler mechanistic considerations which account for the
Specificity of biogenetic pathways, and our work points out the
danger of rationalizing that specificity with an overly simple
model. XMECH has laid the groundwork for a much more general
Program, REACT, in which a user will be able to define chemical
reactions and apply them to problems of mechanistic chemistry and
structure elucidation.

A second pilot project is the Program MDGGEN which embodies
a new, general approach to the interpretation of a mass Spectrum
in terms of structural possibilities for an unknown. The method
used in MDGGEN compliments the MSPRUNE function described above
(section 7 of NEW CAPABILITIES FOR THE USER) because it uses MS
data at the beginning of a problem rather than as a final filter
on candidate structures. Whereas MSPRUNE is logically part of
the TEST phase in the traditional DENDRAL scheme of PLAN-~
GENERATE-TEST, MDGGEN logically belongs in the PLAN phase.
Conceptually, MDGGEN is related to the PLANNER program, except
that MDGGEN analyzes MS data without relying upon class-specific
fragmentation rules as does PLANNER. Using avery simple and
general fragmentation theory, MDGGEN processes’ selected peaks
from a mass spectrum and constructs possible ways of segmenting
the overall composition of the molecule to account for those
peaks. These segmented descriptions are graphs Similar to
topological chemical structures except that one node may stand
not just for a single chemical atom, but a collection of atoms fa
composition) representing a connected piece of the molecule. We
call these mass-distribution graphs, or MDG’s. The structure-
generation facilities of CONGEN allow us to assemble the atoms
within each node-composition in all unique ways, and to imbed
these assemblies in all unique ways into the overall MDG
structures. In this way, we arrive at chemical structures which
account for the MS data according to the Simple theory. MDGGEN
is still in its infancy, With the practical limitations of
computer time and storage requirements restricting it to small
molecules (up to perhaps ten non-hydrogen atoms) and relatively
few observed peaks Cup to roughly seven or eight ion
compositions). This early development, which could take place
rapidly because of the existing facilities within CONGEN, has
helped us to focus our attention on the critical advances which
will be needed in creating a more flexible and generally useful
program.

20
2.3 PLANNER

The DENDRAL PLANNER program [28,33] is designed to analyze
the mass spectrum of a compound opr 2° a mixture of related
compounds. Because there is no ab initi"™ way of relating a mass
spectrum of a complex organic molecule to the structure of that
molecule, PLANNER requires fragmentation rules for the class of
compounds to which the unknown belongs. This is its major
limitation.

Applications and limitations of PLANNER have been discussed
extensively. [28,33] The program is very powerful in instances
where mass spectrometry rules are strong (i.e., general, with few
exceptions). In instances where rules are weak or nonexistent,
additional work on known structures and spectra may yield useful
rules to make PLANNER applicable (see INTSUM and RULEGEN, below).
One unique feature of

PLANNER is its ability to analyze the spectra of mixtures
in a systematic and thorough way. Thus, it can be applied to
spectra obtained as mixtures when GC/MS data are unavailable or
impossible to obtain.

The power of the PLANNER has been substantially increased
by including the MOLION program (discussed below) as a subroutine
for computing the list of plausible molecular ions. Since this
Subprogram does not depend on knowledge of the compound class,
the PLANNER no longer needs to have class-specifie rules for
determining the mass and empirical formula of the unknown
molecule.

The major use of the Planner in the past year has been as a
means of testing new class-specific mass spectrometry rules

proposed by the Meta-DENDRAL program described below. One
measure of quality of a set of proposed rules is their ability to
discriminate among isomers in the same class. For example, the

monoketoandrostane rules can be partly evaluated by their ability
to assign the keto group to the correct substituent position,
based on the mass spectrum of the compound. Since there are
eleven possible positions, we are asking the rules Lo
discriminate the correct structure from the other ten
monoketoandrostanes.

2.4 Meta-dendral Rule Formation Programs

When the mass spectrometry rules for a given class of
compounds are not known, the INTSUM, RULEGEN and RULEMOD programs
can help a chemist formulate those rules. Essentially, these
programs categorize the plausible fragmentations for a class of
compounds by looking at the mass spectra of several molecules in
the class. All molecules are assumed to belong to one class
whose skeletal structure must be specified. Also, the mass

21
Spectra and the structures of all the molecules must be given to
the program.

INTSUM collects evidence for all possible fragmentations
(within user-specified eonstraints) and summarizes the results.

For example, a user may be interested in all fragmentations
involving one or two bonds, but not three; aromatic rings may be
known to be unfragmented ; and the user may be interested only in

fragmentations resulting in an ion containing a heteroatom.
Under these constraints, the program correlates all peaks in the
mass spectra with all possible fragmentations. The summary of
results shows the number of molecules in whose spectra there is
evidence for each particular fragmentation, along with the total
(and average) ion current associated with the fragmentation.

The INTSUM program [34] is in routine, production use to
assist in interpretation of the mass spectra of new classes of
molecules (see Part 3 for details).

The RULEGEN program attempts to explain the regularities
found by INTSUM in terms of the underlying structurai features
around the bonds in question that seem to "drive" the
fragmentations. For example,

INTSUM will notice significant fragmentation of the two
different bonds alpha to the carbonyl group in aliphatic ketones.
It is left to RULEGEN to discover that these are both instances
of the same fundamental alpha-cleavage process that can be
predicted any time a bond is alpha to a carbonyl group.

The RULEMOD program modifies and condenses the set of rules
produced by INTSUM and RULEGEN together. It looks at the
negative evidence associated with each candidate rule in order to
select the best ones, then merges rules that seem to explain the
Same breaks (if possible). The program was substantially
improved in several ways, as described in the next section.

2.4.1 Improvements Made to the Meta-DENDRAL Programs

2.4.1.1 INTSUM Improvements

Transfers of arbitrary neutral species can now be specified
as part of the mass spectrometry processes, instead of transfers
of hydrogen atoms”) alone. This capability increases the utility
of the program in at least two ways: first, it allows a chemist
to control the program better -- to produce the kinds of results
that are more chemically meaningful -- and second, it allows the
program to explore more complex processes within its space and
time limitations. For example, carbon monoxide and water were
listed as plausible neutral molecules to transfer in or out of
fragments for the triketoandrostanes. Thus, the processes are
listed with and without these transfers, just as chemists prefer,

22
instead of showing loss of CO as a set of two breaks around the
keto group, or loss of H,0 as loss of oxygen (breaking the C=0
bond) accompanied by loss of two hydrogens. What is more, the
program can now produce these results without violating its
chemical heuristics of (a) not breaking adjacent bonds, and ‘b)
not breaking double bonds. This economy also pays off in
increasing the complexity oof the processes) that can be
considered. Because loss of CO, for example, is a result of a
transfer instead of the result of breaking two bonds, the number
of bonds broken in accompanying processes can be increased by
two.

Another INTSUM improvement was to increase the options for
initial data filtering. Thresholding is too simple for many
problems, so we now provide an option to cluster peaks and select
the n largest peaks from each cluster.

The format of the input data is also now less” strict than
before. We have written programs to read spectra in Aldermaston
format. And we have merged CONGEN’s Editstruc package into the
INTSUM setup routines to allow a chemist to associate structures
With spectra interactivity. This greatly decreases the chances
of error in setting up the input data.

Several modifications were also made to the program to
increase its efficiency, e.g., processing all intensities as
integers (between 0 and 1000).

2.4.1.2 RULEGEN Improvements

The evaluation of prospective rules in RULEGEN guides the
entire rule generation procedure. To tune this procedure, we
modified the evaluation function in several ways and compared the
resulting sets of rules. We were looking for an objective way of
telling the program to keep rules general, but "not too general".
The current evaluation function is substantially improved asa
result.

Because the RULEGEN program searches such a large space of
partial and complete rules, it requires large amounts of computer

time (sometimes more than 60 cpu minutes). Thus, we have
investigated several improvements for efficiency alone. In
addition, we have made the program easier to set up and run in
batch mode to reduce the chemist’s personal time investment. And
we have made the program easily restarted from any intermediate
point -- to protect the chemist from machine failures.

2.4.1.3 RULEMOD Improvements

At the time of the last annual report RULEMOD was a new
prgram still in its experimental stages. Since then we have
added new subprograms and integrated the program with other
programs to make it a useful and necessary part of Meta-DENDRAL.

23
Two new subprograms greatly improve RULEMOD’s performance.
(1) A program to add specifications to rules was completed. It
looks for plausible ways of making a rule more specific in order
to decrease the number of counterexamples to the rule. (2) A
complementary program to make rules more general was also
completed. The program tries to find ways to reduce the number
of descriptors on nodes of subgraphs in order to increase the
breadth of applicability of rules. Its major constraint is that
it cannot make any change that would increase the number of
counterexamples. Both of these subprograms make the final rules
much closer to rules that chemists approve of.

The subprogram that merges rules was also improved. The
program tries to merge pairs of rules into a more general form
for economy and clarity of rules. Its major constraint is that
no explanations are lost, i.e., all the data points explained by
the initial pair of rules will still be explained after merging.
Formerly we insisted that the more general form must cover all
the same data points as the initial rules, but this was found to
be too narrow a constraint. By giving the program a more global
view of the entire set of rules, we can let the more general,
merged form explain fewer data points than its component rules as
long as other rules explain the remainder.

2.4.2 Search for New Applications of the Rule Formation
Programs

In this year the Meta-DENDRAL programs have matured enough
to let us consider extending them beyond mass spectrometry. The
domain that we chose was 13C NMR spectroscopy, for a variety of
reasons.

13C NMR has been characterized as the spectroscopic
technique of the 1970°s [68]. Our laboratories have been
involved in experimental work on 13C NMR spectra of amines, keto
and hydroxy steroids [62-64]. In addition, we have carried out a
preliminary investigation of a Heuristic DENDRAL approach to
interpretation of 13C spectra of amines [39].

There are several parallels between rule formation in mass

Spectrometry and 13C NMR spectrometry. In both techniques the
precise reasons for molecular fragmentation (in the former) or
NMR absorption (in the latter) are poorly understood. In the

absence of a detailed theory capable of accurate prediction of
spectra, we seek empirical srules which can relate observed data
to measurable structural parameters. Some of the structural
parameters presumed relevant, @e.g.,; atom type, bond
multiplicities, are shared in both techniques. Some of the
current Meta-DENDRAL structural manipulation functions can be
used for either technique. An important difference is that the
Planning phase of Meta-DENDRAL (i.e., INTSUM) necessary in
applications in mass spectrometry is not required for 13C NMR
because we will deal initially with spectra whose absorption

24
peaks (or "shifts" relative to any internal standard) are
assigned to specific atoms in the known structures. Typically
scientists have sought an explanation for the 13C NMR shift of an
atom in terms of the structural environment of the atom.
Searching such structural environments is a problem which is
amenable to solution by existing and proposed parts of the Meta-
DENDRAL program.

As in applications to mass spectrometry [58] we will
Propose a set of factors which might affect 13C NMR absorptions.
With a description of these factors we will use the Meta-DENDRAL
program to produce a set of rules which will reproduce and
predict resonance shifts of individual 13C atoms.

The current Meta-DENDRAL program represents a basic
framework for studying 13C NMR rule formation. We believe that
the program will require little revision to accommodate the
differences in data and rules. We have already considered some
of the problems of changing the form of rules. The subgraphs in
the situation parts of rules need to be generated "outward" from
a specific 13C atom instead of outward from a bond broken in the
mass spectrometer. The action parts of rules need to take
account of an explicit absorption range whereas for mass
Spectrometry the rules predict much more precise data points
{mass positions). We have made a preliminary test of the
program’s extensibility in the context of alkanes.

For the alkane study we used only a topological model of
molecular structure, not a geometric model. The rules that were
formed from a test set predicted shifts for 13C atoms in other
alkanes (outside the test set) with accuracy within 1.5 ppm. The
major modifications needed in the program to _ produce these
preliminary results were the following:

(a) change RULEGEN to generate rules by expanding the
Subgraph environments outward from a central atom rather than
from a central atom rather than from a central bond;

(b) change the form of rules to associate a range of shifts
with each subgraph rather than a precise fragment mass;

Ce) redefine RULEGEN’s evaluation function for partial
rules to take account of the desire to predict narrow ranges of
shifts.

Other domains were considered, including finding rules to
associate pharmacological activity with molecular structure and
finding rules for other organic chemical analysis techniques. of
all that we considered, 13C NMR appears to offer the most in
terms of both feasibility and utility.

25
2.5 Results

2.5.1 Keto-androstanes

We have shown that the Meta-DENDRAL program is capable of
rationalizing the mass spectral fragmentations of sets of
molecules in terms of substructural features of the molecules.
On Known test cases, aliphatic amines and estrogenic steroids,
the Meta-DENDRAL program rediscovered the well-characterized

fragmentation processes reported in the literature. On the three
classes of ketoandrostanes for which no general class rules have
been reported, the mono-, di-, and triketoandrostanes, the

program found general rules describing the mass spectrometric
behavior of those classes. The general rules shown in Tables II,
IV, and VI explain many of the significant ions for compounds in
these classes while predicting few spurious ions. The program
has discovered consistent fragmentation behavior in sets of
molecules which have not appeared by manual examination to behave
homogeneously in the mass spectrometer.

Programs with knowledge of the scientific domain can
provide "smart" assistance to working scientists, as shown by the
reasoned suggestions this program makes about extensions to mass
spectrometry theory. We are aware that the program is not
discovering a new framework for mass spectrometry theory; to the
contrary, it comes close to capturing in a computer program all
we could discern by observing human problem-solving behavior. It
is intended to relieve chemists of the need to exercise their
personal heuristics over and over again, and thus we believe it
can aid chemists in suggesting more novel extensions to existing
theory. It can be argued that the two-dimensional connectivity
model of molecules used in this study is not the right model for
mass spectrometry; that there are deeper rationalizations of a
fragmentation process than subgraph environments. However, this
model is commonly used by working chemists and once
fragmentations based on this model are defined, chemists can
readily provide the remaining "mechanistic" rationalizations or
see that further experimental work with labeled compounds is
necessary. (Other limitations of the method have been discussed
at the end of the methods section.)

Recent statistical pattern recognition work addresses some
of the points on rule formation and spectrum prediction raised in
this paper. We have avoided blind statistical methods for three
important reasons. 1) We wish to explore thousands of possible
Subgraphs with associated features, as we search for those which
are in some way important. Current pattern recognition
procedures are restricted to much smaller numbers of manually (for
computer-assisted) selected features, adding additional bias to

the procedure. 2) We want to know how certain rules were
obtained by the program and why certain other rules were rejected
or not detected. We can trace the reasoning steps of the Meta-

DENDRAL program and determine chemically meaningful answers to

26
such questions in a way that is not possible with purely
statistical programs. 3) We wish to constrain the rule formation

activity in ways that are natural to a working chemist. For
example, we may want the program to avoid fragmentations
involving aromatic rings or two bonds to the same atom, or, as
mentioned above, we may want to look at fragmentations

accompanied by loss of CO or other neutral fragments.

Rules can be formulated to explain data in terms that are
known to be meaningful to chemists; most importantly, the rule

formation constraints are under the control of the chemist. Also
we feel that this approach provides a high level of generality in
describing fragmentation processes. Although the rules are

developed in the context of a particular set of compounds, they
are not tied to that set but can be applied in other contexts, or
eompared to rules developed from other sets of compounds ina
search for common features of the rules. For these reasons, we
believe that the Meta-DENDRAL program offers a powerful and
useful complement to pattern recognition programs for finding
relationships between structures and spectral data.

We are cautiously optimistic about the general
applicability of this rule formatton method, although we have
demonstrated its utility for only a small number of compound
classes and only in the context of mass spectrometry.

2.6 Heuristic Programming Project Workshop

In the first week of January, 1976, about fifty
representatives of local SUMEX-AIM projects convened at Stanford
for four days to explore common interests. Six projects at
various degrees of development were discussed during the
eonference. They included the DENDRAL and META-DENDRAL projects,
the MYCIN project, the Automated-Mathematician project, the Xray-
Crystallography project, and the MOLGEN project. Because of the
interdisciplinary nature of each of these projects, the first day
of the conference was reserved for tutorials and broad overviews.
The domain-specific background information for each of the
projects was presented and discussed so that more technical
discussions could be given on the following days. In addition
the scope and organization of each of the projects was presented
focusing on the tasks that were being automated, how people
perform these tasks, and why the automation was useful or
interesting.

In the following days of the workshop, common themes in the
management and design of large systems were explored. These
included the modular representations of knowledge, gathering of
large quantities of expert knowledge, and program interaction
with experts in dealing with the knowledge base. Several of the
projects were faced with the difficulties of representing diverse
kinds of information and with utilizing information from diverse

27
sources in proceding towards a computational goal. Parallel
developments within several of the projects were explored, for
example, in the representation of molecular structures and in the
development of experimental plans in the MOLGEN and DENDRAL
projects. The use of heuristic search in large, complex spaces
was a basic theme to most of the projects. The use of modularized
knowledge typically in the form of rules was explored for several
of the projects with a view towards automatic acquisition, theory
formation, and program explanation systems.

For each of the projects, one session was devoted to plans
for future development. One of the interesting questions for
these sessions was the effect of emerging technology on
feasibility of new aspects of the projects. The potential uses
of distributed computing and parallel processing in the various
projects were explored, particularly in the context of the
DENDRAL project.

Most of the participants felt that the conference gave them
a better understanding of related projects. And because many
members of the SUMEX-AIM staff actively participated, the
workshop also provided all projects with information about system
developments and plans. The discussions and sharing of ideas
encouraged by this conference has continued through a series of
weekly lunches open to this whole community.

3 PART 3: APPLICATIONS TO BIOMEDICAL STRUCTURE
ELUCIDATION PROBLEMS

3.1 Introduction

In our grant proposal we discussed the application of the
instrumentation and computer programs described above to the
Study of molecular structure problems ina variety of biomedical
applications areas. This is our primary research area, and we
discussed specific classes of problems and eompounds for
investigation. We also made it quite clear that our facilities
would be made available to wider community of collaborators/users
aS our resources permitted. Both categories of application,
i.e., within our own group, and with an outside group, are
described in some detail below. Our last annual report described
several steps taken to encourage a broad community of researchers
to use our facilities. For example, we sent a questionnaire to
members of the American Society for Mass Spectrometry, Committee

28
III on Computer Applications, and a follow-up letter to persons
indicating a desire to know more about access to our programs.
The same note has been sent to several other persons whom we know

from personal contacts might be interested. Because of the
nature of their investigations, many of these people receive NIH
support. Several of our publications (e.g., [45-49,53-61])

mention the availability of our programs. In addition, through
individual contacts and formal presentations at conferences we
have been encouraging outside use of the programs.

The availability of SUMEX as a mechanism for resource
sharing has made it possible for us to extend access to our
programs to a number of people. Without SUMEX, this access would
be impossible, and most of our programs (those which are not
easily exportable) could be used only by ourselves.

3.2 Applications by Professor Djerassi’s Research Group

Our existing grants, outlined below, mesh well with our
instrumentation and program development under the present award.
Under NIH Grant GM06840 we have been studying natural products
from marine sources with major emphasis. on terpenoids and

sterols. For this work we have been dependent on the use of our
711  ainstrument for high resolution mass spectrometry which we
require for the identification of all new compounds, many of
which are present in only very small quantities. We were

particularly anxious to have access to GC coupled with a high
resolution mass spectrometer because we hope to be able to screen
large numbers of marine animals for their sterol content using
this technique. We are currently engaged in intensive efforts in
analysis of mixtures of marine sterols involving our computer-
based procedures. The program for the development of the
computer operated and assisted system of marine sterol structure
analysis has been planned to proceed in three stages:

1) Analysis of all literature published concerning marine sterols
so that a complete listing of known sterol structures and

organisms studied could be compiled.

2) Collection, evaluation, digitization and computer file
construction for the mass spectra of all known marine
sterols, followed by the institution of a computer operated
file search sequence for direct analysis of marine sterol GC-
MS data.

3) The application of the INTSUM, RULEGEN, and RULEMOD programs
to the computer file of marine sterol spectra so that a
series of fragmentation rules can be extracted for use in the
generation of possible structures from mass spectral data for
new marine sterols, that is, sterols whose mass spectra
cannot be matched with any spectra contained in the computer
search file.

29
We are presently completing the second stage and beginning
the third. The following discussion will be a summary of the
work that has been completed, and the work that is in progress or
planned.

The literature concerning marine sterols is extremely
extensive. Over a thousand reports concerning marine sterols can
be found scattered throughout a multitude of journals dating back
to the initial report by Henze in 1908. In spite of the
occurrence of a number of good review works in the literature, we
have found the compilation of all reported marine sterol
Structures and organisms studied to have been an imposing task,
which we have now completed successfully. The search has also
pointed up a number of entire phyla of marine invertebrates for
which no sterol analysis have been reported, and has therefore
pointed out perhaps the best candidates to which the developing
automated analytical procedures should be applied. The search
has also generated an extensive and very refined list of
descriptions which are now used in a computer generated update of
our bibliography every two weeks for this very active field.
This laboratory has been involved in sterol work for some years
and so our own samples and mass spectral files have made a
Significant contribution to the compilation of the complete mass
spectral file of marine sterols.

Table I represents a listing of marine sterol spectra as
well as a listing of purely synthetic sterol mass spectra (for
use in evaluation of the INTSUM results) which have been
eontributed by this laboratory. These spectra are now part of
completely functional computer files. We have requested and
received samples of other marine sterols from researchers around
the worid who have reported their isolation. A large number of
these sterols have now had mass spectra taken and the enlargement
of our computer mass spectral file is proceeding rapidly.

The series of programs for processing raw GC-MS data and
searching mass spectral files have recently been instituted on
the chemistry PDP 11/45 computer. The series of programs which
have potential application to processing our data are CLEANUP (a
Program for subtracting GC column bleed or background and noise
from raw GC-MS data, and resolving spectra of overlapping
elutants), MOLION (a program for generation of molecular ion
candidates from mass spectral secondary losses), and SEARCH (fa
program for searching and comparing experimental mass spectra to
the file of known marine sterol mass spectra). several data
Management programs exist for displaying the results of the file
search and other operations. Development of a program to utilize
GC retention indices is progressing. The first experimental file
search for an actual sample run will be possible within the next
few weeks, but we have already used the SEARCH program to process
and evaluate several duplicate marine sterol mass spectra from
our files as listed in table I. Table II represents the results
of this kind of experiment. Three separate (24E)-STIGMASTA-
5,24(28)-DIEN-3BETA-OL (trivial name "FUCOSTEROL") mass spectra

30
were compared to 25 marine sterol mass spectra in the computer
files via the SEARCH program. The program was able to select
each of the mass spectra from the main file with the inclusion of
one thirty carbon sterol (24Z2)-24-PROPYLIDENECHOLEST-~5-2N-3BETA-
OL which possesses a structure similar to FUCOSTEROL, the twenty-
nine carbon sterol. This kind of study has shown that in
Principle the SEARCH program funetions for marine sterol
correlations, but requires some fine tuning to reduce this kind
of error. The search strategy modifications should be complete
within the next several weeks.

One other aspect of this work should be mentioned. We have
found that for very complex marine sterol mixtures a single GC-MS
run is sufficient to identify the major sterol components anda
few minor components. Further separation procedures are required
to analyze the remaining minor components. We have found many of
the minor components to be of significant biosynthetic and
ecological interest. We have spent a considerable effort
perfecting rapid separations or enrichments of these minor sterol
components so that GC-MS analysis can be run on then. We now
have a procedure utilizing silica gel, alumina , silver nitrate
impregnated alumina and silica gel, and high pressure reversed
phase liquid chromatography which produces separations and/or
enrichments so that GC-MS data can be obtained for every sterol
of even a 30 component mixture. Perfecting these separations
have required over six months. We have used the sterol extracts
of two Gorgonians or soft corals, Pseudoplexaura Porosa and
Plexaura Homomolla. Within these extracts we have discovered
several new classes of marine sterols, including several twenty-
two carbon sterols of unusual stereochemistry, a twenty-one
carbon sterol, several new 5-BETA stanols, and ae series of
extremely interesting 19-nor-delta-5-sterols ‘publications in
preparation). We feel certain that with the institution of the
computer assisted procedures described herein, the time required
for this kind of study (half a year) can be cut down to weeks.

Application of INTSUM to the marine sterol spectral files
has just begun. One aspect of the INTSUM work which should be
mentioned here is that in addition to the free 3-beta-hydroxy
marine sterol files, a number of marine sterol derivatives
(acetates, O-methyl ethers, trimethylsilyl ethers, and other
derivatives) were compiled from the mass spectral library in this
laboratory. INTSUM will be applied to these marine sterol
derivative files in order to extract fragmentation rules.
Comparison of the results for the free and derivatized sterols
will point up the cases where some of the derivatives ‘(which have
superior GC properties) can be used with a minimum of loss of
mass spectral information. We are confident that the file search
system will be functioning before July. We already have marine
extracts arriving from our collaborators in Brazil, and have
offered the use of the system, once it is functioning, to
researchers in Japan and Britain. We feel that the system will
be of great benefit to the large number of researchers in the
marine sterol field.

31
Another major area of interest in our chemical laboratories
is the structural analysis of marine terpenoids using CONGEN in
conjunction with a variety of spectroscopic data collected on
these compounds. For the past year we have been involved in the
application of CONGEN in the area of structural elucidation
specifically related to marine natural products other than
steroids. CONGEN’s advantages in these studies lie chiefly in
its ability to provide interactively the chemist with assurance
that no plausible solutions have been overlooked, as well as an
insightful measure of the progress of the problem, thereby
Suggesting clues to guide the course of the investigation.

(+)-Palustrol.

The utility of CONGEN has been demonstrated recently [57]
in the identification of (+)-palustrol, a tricyelie sesquiterpene
alcohol from the marine Xeniid Cespitularia virdis. Inferences
derived from 1H and 13C nmr spectra suggested molecular fragments
whose assembly by CONGEN-) resulted in an initial set of 272
candidate structures. Examination of the set suggested
appropriate nmr decoupling experiments resulting in the
imposition of additional constraints which reduced the initial
set of candidates to 88. Dehydration of the tertiary alcohol and
Spectral examination of the resulting olefins provided additional
structural constraints which reduced the set further to 22.
Recognition of an additional constraint after examining these
possibilities eliminated two of the 22. Of the remaining 20
structures, only four (1 - 4) obey the isoprene rule, and of
these four, i and 2 may be deleted because there dehydration
would yield unsaturated analogs which violate Bredt’s rule.

HO
HO
1 2
726
OH
3 4

Examination of the literature revealed that structure 3 had
been assigned to (-)-palustrol (L. Doleijs, V. Herout, and F.
Sorm, CCCC, 26, 811 (1961)). The published infrared spectrum for
(-)-palustrol was identical in all respects to that of the

unknown alcohol, thus establishing its structure. Our structure

3, however, displays the opposite rotation of polarized light.
Briareine D.

A recent study by Tursch and Bartholome (C. Bartholome,

PhD. Thesis, University of Brussels, 1974) resulted in two
alternative proposed structures for Briareine D, one of four
chlorinated diterpene lactones isolated from the gorgonian

Briareum asbestinun.

Rigorous examination of the structural inferences which led
to the proposed structures yielded molecular fragments and
constraints which were supplied to CONGEN for’ construction of

structural candidates. The results confirmed the proposed
structures, 5 and 6, and, more importantly, suggested two
additional candidates (7,8) which had not been considered
previously and could not be excluded on the basis of existing
data. rc
XX x
Cl xX |
O
HO Xx HO x
5 6
x xX x
O
xX
| |
HO x HO x
7 8
(X =RCOO)

Work is currently in progress on the CONGEN-assisted
structure elucidation of the aglycone portion of Lemnalialoside,
a diterpene glycoside from Lemnalia digitata, and a tricyclic
sesquiterpene hydrocarbon from Sinularia mayi.

Further applications are summarized under headings of

subsequent sections which refer to specific programs. Much of
the effort in application of our programs to the mass spectral
data implicitly assumes that the data are available. In fact,

without the current and future instrumentation effort discussed
in Part 1, these program applications would not be feasible.

33
3.2.1 CLEANUP

The spectral cleanup program, written for ourselves and our
collaborators in the Dept. of Genetics, Stanford Hospital (see
Local/Stanford Community, below) is now in routine use. A
manuscript describing the method is now in press [61]. Several
improvements have been made in the program to increase its
capabilities for dealing with complex multiplets of overlapping
GC peaks and to improve its efficiency. The resulting version of
the program has been exported to several other laboratories which
have expresses interest in our methods (see end of Part 3).

3.2.2 INTSUM

As a means of extending the rules of fragmentation in mass
spectrometry, several classes of compounds are under study as we
attempt to determine characteristic modes of fragmentation. The
following is a brief description of each such class and the
current status of our research:

1. Pregnanes: Pregnanes related to the progesterone skeleton
have been analyzed in some detail in collaboration with Dr.
S.  Hammerum, (University of Copenhagen, Denmark). Two
manuscripts describing this work have recently appeared
[65,66].

2. Androstanes: Keto-substituted analogs of the skeleton of the
important steroidal hydrocarbon, androstane, were being

Studied in collaboration with Dr. Roy Gritter (an IBM
scientist who spent his sabbatical leave in our laboratory
learning more about mass spectrometry). This study is
important to our understanding of the mass spectral behavior
of complex, polycyclic systems. It is providing a model for
the use of Meta-DENDRAL programs. We have completed this
study and a manuscript describing our method and results is
now in press [58] in the Journal of the American Chemical
Society.

3. Macrolide Antibiotics : We have finished the first Stages of
our analysis of the fragmentation of several members of these
macrocyclic systems. We have solicited and obtained a mall
number of additional compounds to supplement our own limited
number of samples. We are currently correlating the INTSUM
results from closely related structures to identify
systematic modes of fragmentation. We are designing
experiments of deuterium labelling and metastable defocusing
to help distinguish among alternative explanations by INTSUM
for several prominent ions in the spectra of these compounds.
Further efforts on this problem are hindered by lack of
available standards.

yy Insect Juvenile Hormones: In collaboration with Dr. Loren
Dunham, Zoecon Corp., we are investigating regularities in

34
the fragmentation behavior of the juvenile hormones.
Previous work on the mass spectra of these compounds was
carried out only at low resolving powers. We have obtained
the high resolution mass spectral data for these compounds
and have completed the INTSUM analysis of the data. Our
findings have been described in a manuscript which will
appear shortly [67] in Organic Mass Spectrometry. Our
results will prove valuable for structural analysis and
detection of these compounds and congeners.

5) Marine Sterols : The previous’ section Summarizes our
continuing efforts in marine sterol analysis, including the
importance of INTSUM in these studies.

3.2.3 RULEGEN AND RULEMOD

As described above, RULEGEN and RULEMOD can be _ used to
assist in discovery of mass spectrometry fragmentation rules
which depend on substructural features of molecules. Thus, it
can be used for classes of compounds where the fragmentation does
not depend on the basic skeleton, but on local features expressed
by common substructures. Our’ studies [58] on the performance of
the program (see Meta-DENDRAL section) have involved analysis of
Spectra of previously well-characterized classes of compounds.
We have analyzed spectra of aliphatic amines and estrogenic
steroids in terms of fragmentation dependence on substructural
features of these molecules. Excellent agreement with literature
descriptions of fragmentation were obtained. We then proceeded
with a study of the previously uncorrelated mono-, di- and
triketoandrostanes. Our results [58] provide new insights into
regularities of molecular fragmentation among members of the same
group. The results also indicate little or no additivity of
effects of keto substitution; spectra of diketoandrostanes are
not superpositions of the respective monoketoandrostanes.

3.2.4 CONGEN

We are currently engaged in efforts to explore the utility
of CONGEN to avariety of structure elucidation problems. The
current areas of application are summarized below, together with
progress to date.

1) Ion Structures: CONGEN has been used to construct possible ion
structures under a variety of constraints in support of
studies on the structures of ions in the mass spectrometer.
These studies are crucial to a deeper understanding of

molecular fragmentation. The programs results are used to
ensure that no plausible alternatives have been overlooked
during efforts to characterize the structures. We have

recently published a detailed description of the use of
CONGEN which illustrates the systematic approach available
with the program [55].

35
2) Terpenoid Systems: We are using CONGEN to explore questions of
the scope of terpenoid isomerisn. We would like to determine
some criteria which might allow us to say something about why
only certain structural types are found in nature, to the
exclusion of many possibilities which are very similar in
structure. A manuscript describing our first results is now
in press in Tetrahedron [60] and describes some aspects of
the structural isomerism of mono- and sesquiterpenoid
skeletons.

3) Seope of Structural Isomerism: We are investigating the
philosophical and pedagogical aspects of the scope of
structural isomerism. This investigation is important to our
program design and strategy as we identify the ways persons
consider and reject whole categories of structural
possibilities. A manuscript describing this work has

appeared in the Journal of Chemical Information and Computer
Science [54].

4) Constraint Implementation: A detailed description of the kinds
of constraints available to guide CONGEN in its exploration

of structural possibilities has been presented [56]. This
description also presents how constraints and efficient
implementation of chemical "common sense" were derived from

considerations of manual approaches to structural problems.

5) Marine Natural Products: The previous section described use of
CONGEN in solving unknown structures in this area of
application of our techniques.

3.3 Utilization of the Mass Spectrometry Resource

3.3.1 Applications of High Resolution Mass Spectrometry
A) Prof. Djerassi’s Group

We have run about 75 samples to obtain high resolution mass
spectra in support of DENDRAL research problems. These have
ineluded marine sterols faequisition of reference spectra and
verification of structures of new synthetic materials), macrolide
antibiotics, ketoandrostanes and substituted pregnanes for Meta-
DENDRAL studies of fragmentation processes.

B) Stanford Chemistry Department.

We have run a number of spectra for other researchers in
the Department of Chemistry. Samples have included a number of
diterpanes, alkaloids and unknown compounds from both chemical

and enzymatic cyclization procedures.

C) Other Stanford Community

36
We have run spectra for a number of our collaborators in

the Medical School. These have included samples from the
Departments of Genetics, Psychiatry and Anaesthesia, representing
Structural analyses of metabolic products, drug purity and

possible reaction products of an anesthetic, respectively.
D) U.S. and Foreign Collaborators.

Spectra have been obtained for Dr. Dunham, Zoecon Corp., of
Juvenile hormones for INTSUM studies [67]; Dr. Gritter, now back
at IBM, steroids for Meta-DENDRAL studies [58]; Dr. Fitch, Yale
University, alkaloid metabolites; Dr. Tomer, Univ. of Brooklyn,
spectra for fragmentation studies; Dr. Jaeger, Univ. of Wyoming,
structure identification of crown ether components; Dr.
Spangler, Univ. of Idaho, structure identification of sulfides
for studies of remote sulfur-sulfur interaction in the mass
spectrometer. High resolution spectra have been provided to Dr.
Nakano, Venezuela, alkaloids, Drs. Mors and Gilbert, Brazil,
Steroids and alkaloids, Dr. Sultanbawa, Ceylon, triterpenes and
alkaloids, and Dr. Orazi, Argentina, terpenoids.

3.3.2 Applications of GC/High Resolution Mass
Spectrometry.

During the past year we have analyzed the following samples
by GC/ HRMS (these samples represent real applications and do not
include the many samples of standard compounds which were
analyzed during this time during development of the GC/HRMS
system):

A) Prof. Djerassi’s group - We have analyzed about 40
mixtures of marine natural products, primarily sterols, by
GC/HRMS. Some samples were standard compounds necessary as
reference materials but available only as mixtures. some samples
were mixtures of unknown compounds. Spectra were obtained

primarily on underivatized sterols, occasionally from acetate
derivatives.

B) Other Stanford collaborators - We have run GC/HRMS
analyses of several mixtures of diterpenes and precursors, and
enzymatic and chemical cyclization products of squalene epoxide
analogs for Prof. van Tamelen, Dept. of Chemistry. We have
analyzed ten urine fractions in conjunction with on-going work
with Prof. Lederberg’s group in the Dept. of Genetics. These
have been primarily organic and amino acid fractions, derivatized
as appropriate, and urinary polyamines analyzed as the
trifluoroacetate derivatives.

3.3.3 Other Mass Spectral Studies

We have obtained a number of conventional mass spectra ‘flow
resolution) in cases where high resolution data were not required

37