UNDERSTANDING THE INFORMATION PROCESSES OF SCIENTIFIC INFERENCE IN THE CONTEXT OF ORGANIC CHEMISTRY

By: B. G. Buchanan, G. L. Sutherland, E. A. Feigenbaum

PART I: THE MASS SPECTROMETRY PROGRAM

The set of computer programs known as Heuristic DENDRAL is an attempt to develop machine intelligence in a scientific field. In particular, its task domain is the analysis of mass spectra, chemical data gathered routinely from a relatively new analytical instrument, the mass spectrometer. Heuristic DENDRAL has been developed as a joint project of the departments of Computer Science, Chemistry, and Genetics at Stanford University. This collaboration of chemists and computer scientists has produced what appears to be an interesting program from the viewpoint of artificial intelligence and a useful tool from the viewpoint of chemistry.*

* This research was supported by the Advanced Research Projects Agency. The Heuristic DENDRAL project is carried out in collaboration with Professor Joshua Lederberg, Dr. Allan Delfing, Dr. Alan Duffield, Dr. Gustav Schroll, and Professor Carl Djerassi.

For this discussion it is sufficient to say that a mass spectrometer is an instrument into which is put a minute sample of some chemical compound and out of which comes data usually represented as a bar graph. This is what is referred to here as the mass spectrum. The instrument itself bombards molecules of the compound with electrons, thereby producing ions of different masses in varying proportions. The x-points of the bar graph represent the masses of ions produced and the y-points represent the relative abundances of ions of these masses.

The Heuristic DENDRAL process of analyzing a mass spectrum by computer consists of three phases. The first, preliminary inference (or planning), obtains clues from the data as to which classes of chemical compounds are suggested or forbidden by the data. The second phase, structure generation, enumerates all possible explicit structural hypotheses which are compatible with the inferences made in phase one. The third phase, prediction and testing, predicts consequences from each structural hypothesis and compares this prediction with the original spectrum to choose the hypothesis which best explains the data. Corresponding to these three phases are three sub-programs. The program(s) have been described in previous publications, primarily in the volume Machine Intelligence 4, and in a series of Stanford Artificial Intelligence Project Memos (4, 5, 7).

The Preliminary Inference Maker program contains a list of names of structural fragments, each of which has special characteristics with respect to its activity in a mass spectrometer. These are called "functional groups". Each functional group in the list is a LISP atom, with properties specifying the necessary and/or sufficient conditions (spectral peaks) which will appear in a mass spectrum of a substance containing that fragment. Other properties of the functional group indicate which other groups are related to this one, as special or general cases. The program progresses through the group list, checking for the necessary and sufficient conditions of each group. Two lists are constructed for output: Goodlist enumerates functional groups which might be present, and Badlist lists functional groups which cannot be in the substance that was introduced to the mass spectrometer.
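To make the flavor of this table concrete, the following sketch (written in modern Common Lisp, since the memo's 1969 LISP differs in detail) shows how such a group list might be checked against a spectrum. The property name NECESSARY-PEAKS, the helper functions, and the particular masses are our illustration, not the actual program's; the masses shown (43 and 58 for ketones, 30 for amines) are merely chemically plausible examples.

    ;; Hypothetical group table: each group is a symbol whose property
    ;; list holds the peaks that must appear in the spectrum.
    (setf (get 'ketone 'necessary-peaks) '(43 58))   ; illustrative masses
    (setf (get 'amine  'necessary-peaks) '(30))

    (defun peak-present-p (mass spectrum)
      "SPECTRUM is a list of (mass . intensity) pairs."
      (assoc mass spectrum))

    (defun classify-groups (groups spectrum)
      "Return two values: a Goodlist of groups whose necessary peaks all
    appear in SPECTRUM, and a Badlist of the groups ruled out."
      (let ((goodlist '()) (badlist '()))
        (dolist (g groups)
          (if (every (lambda (m) (peak-present-p m spectrum))
                     (get g 'necessary-peaks))
              (push g goodlist)
              (push g badlist)))
        (values (nreverse goodlist) (nreverse badlist))))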
Goodlist and Badlist are the inputs to the Structure Generator, which is an algorithmic generator of all isomers (topologically possible graphs) of a given empirical formula (collection of atoms). Each Goodlist item is treated as a "superatom", so that any functional group inferred from the data by the Preliminary Inference Maker will be guaranteed to appear in the list of candidate hypotheses output by the Structure Generator. The Structure Generator's operation is based on the DENDRAL algorithm for classifying and comparing acyclic structures (9). The algorithm guarantees a complete, non-redundant list of isomers of an empirical formula. It is the foundation for the development of the whole mass spectrometry program.

The third sub-program is the Mass Spectrum Predictor, which contains what has been referred to as the "complex theory of mass spectrometry". This is a model of the processes which affect a structure when it is placed in a mass spectrometer. Some of these rules determine the likelihood that individual bonds will break, given the total environment of the bond. Other rules are concerned with larger fragments of a structure, like the functional groups which are the basis of the Preliminary Inference Maker. All of these deductive rules are applied (recursively) to each structural hypothesis coming from the Structure Generator. The result is a list of mass-intensity number pairs, which is the predicted mass spectrum for each candidate molecule. Any structure is thrown out which appears to be inconsistent with the original data (i.e., its predicted spectrum is incompatible with the observed spectrum). The remaining structures are ranked from most to least plausible on the basis of how well their spectra compare with the data. The top ranked structure is considered to be the "best explanation".

Thanks to the collaboration of Dr. Gustav Schroll, an NMR (Nuclear Magnetic Resonance) Predictor and Inference Maker have been added to the program. Thus the program can confirm and rank candidate structures through predictions independently of mass spectroscopy, bringing the whole process more in line with standard accounts of "the scientific method". Thus the Heuristic DENDRAL program is expanding from the "automatic mass spectroscopist" to the "automatic analytical chemist". Other analytical tools, such as infra-red spectroscopy, will be incorporated eventually.

Three papers have appeared in the chemical literature (1, 2, 3) in the past year. The first paper describes the Heuristic DENDRAL program and tabulates numbers of isomers for many compounds. This is of particular interest to chemists because it indicates the size of the search space in which structures must be found to match specific data. The second paper explains the application of the program to ketones, the subclass of molecular structures containing the keto radical (C=O). The whole process from preliminary inference (planning) through structure generation and prediction of theoretical spectra was applied to many examples of ketone spectra. The results, in terms of actual structures identified, were encouraging. The third paper explains the application of the program to ethers. Introducing the NMR Predictor contributed to the successful results which are described in the ether paper. Acceptance of these papers by a chemistry journal is some measure of the program's capability, but indicates more its novelty and potential.
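The rank-by-comparison step described above reduces, in outline, to predicting a spectrum for each candidate and scoring it against the data. A minimal sketch follows; the scoring rule (counting predicted peaks that appear in the observed spectrum) is our simplification, not the program's actual comparison measure.

    (defun spectrum-score (predicted observed)
      "Count the predicted (mass . intensity) peaks whose mass also
    appears in OBSERVED, a list of (mass . intensity) pairs."
      (count-if (lambda (peak) (assoc (car peak) observed)) predicted))

    (defun rank-candidates (candidates observed predict-fn)
      "Predict a spectrum for each candidate structure, discard those
    scoring zero, and sort the rest from most to least plausible."
      (let ((scored (mapcar (lambda (c)
                              (cons c (spectrum-score (funcall predict-fn c)
                                                      observed)))
                            candidates)))
        (mapcar #'car
                (sort (remove-if (lambda (sc) (zerop (cdr sc))) scored)
                      #'> :key #'cdr))))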
A better measure of the program's performance level than journal acceptance is provided by comparing it with professionals. In July (1969) Professor Carl Djerassi, an eminent mass spectroscopist, asked the members of his graduate mass spectrometry seminar to interpret three mass spectra, giving them only the empirical formulas of the structures and stating the fact that they were acyclic structures - just the information given to the program. On the first problem, the program and one graduate student got the correct structure; another graduate student and a post-doctoral fellow were both close, but not correct. On the second problem, the program got the correct answer; two graduate students included the correct answer in undifferentiated sets of two and four structures; while the post-doctoral fellow missed the answer. On the last problem, the program missed the correct structure and the post-doctoral fellow included it in a pair of equally likely structures. The computer spent approximately two to five minutes on each problem; the chemists spent between fifteen and forty minutes on each. From this small experiment and their own observations, (admittedly sympathetic) mass spectroscopists have said the program performs as well as graduate students and post-doctoral fellows in its limited task domain.

The success of the mass spectrometry program is encouraging. One reason for this success is the large amount of mass spectrometry knowledge which chemists have imparted to the program. Yet this has been one of the biggest bottlenecks in developing the program. When there was only one theory of mass spectrometry in the program, viz., the complex theory in the Predictor, we were relatively insensitive to the difficulty of adding new information to the theory. Although it was a time-consuming process, it was still manageable by one programmer, working with one chemist, with most of the time spent programming as opposed to criticizing. By the time the planning phase was added to the program, it was easier to see how to shorten the task of programming by separating the chemical theory from the routines which work on the theory. The separation was by no means complete here, but it was successful enough to reduce the programming time drastically for the addition of new pieces of theory. Because the theory could be changed by changing an entry in a table, many iterations with the expert were now possible in a single one or two hour session at the console. The preponderance of time was now spent by the chemist deciding how to change the rules in the table to bring the program's behavior more in line with real data.

The organization of the Preliminary Inference Maker made the process of expanding its chemical knowledge relatively simple, compared to the process of putting knowledge into the Structure Generator and Predictor programs. Both of these programs are on their way to becoming "table driven" in much the same way as the Preliminary Inference Maker is now. (See Part IV.) Yet re-designing the programs to allow easy additions and changes to the chemical knowledge will not solve all our problems. Because mass spectroscopy is a relatively young discipline, the theory does not exist in any sort of comprehensive codified form. Part II will discuss some of the problems of obtaining the chemical theory that has been incorporated into the programs so far.
Further, the presence of any body of knowledge in the programs brings up questions of how and where this knowledge is to be represented, stored, and referenced within the programs. Part III will elaborate on these issues.

PART II: ELICITING A THEORY FROM AN EXPERT

As in the case of the Greenblatt chess program, the proficiency of the mass spectrometry program is due in large measure to the great number of times the behavior of the program has been criticized by good "players", with subsequent modifications to the program. In both cases, the heuristics of good play were not born full-blown out of the head of the programmer; they were built up, modified and tuned through many interactions with persons who were in a position to criticize the performance of the program. Yet one of the greatest bottlenecks in our total system of chemists, programmers and program has been eliciting and programming new pieces of information about mass spectrometry. One problem is that the rate of information transfer is much slower than we would like. Another problem is that the theory itself is not as well defined as we had hoped. Since these two problems are common to a broad range of artificial intelligence programs, our encounter with them will be described in detail.

It should be understood from the start that there presently is no axiomatic or even well organized theory of mass spectrometry which we could transfer to the program from a text book or from an expert. The theory is in very much the same state as the theory of good chess play: there exists a collection of principles and empirical generalizations laced throughout with seemingly ad hoc rules to take care of exceptions. No one has quantified these rules and only a few attempts have been made to systematize them. Thus the difficulty in eliciting rules of mass spectrometry from the expert lies only partly in the clumsiness of the program; the primitive state of the theory certainly contributes to our difficulty too. In our case, this problem has been compounded by having the theory of mass spectrometry in two different forms in the program: one in the prediction phase, and a less complex -- but hopefully compatible -- theory in the planning phase. The implications of this added difficulty will be discussed in Part III.

The following dialog illustrates some of the difficulties we encountered at the console, apart from machine troubles and programming problems. It is not a literal transcript, but both parties to the actual dialog agree that it is a fair condensation of some of the sessions in which they focused on the Predictor's theory of mass spectrometry. The sessions, typically, were one or two hours long. Because much of the process depended on what the program could do, both parties sat at a teletype tied to the PDP-10 time sharing system in which the LISP programs resided. The expert in this dialog is A, the programmer is B, and meta-comments are bracketed.

A: Since El Supremo and the rest want us to work on ketones, I guess we should get started.

B: OK. Incidentally, why are ketones important?

A: Besides being very common in organic chemistry, we also know something of their mass spectrometry because they've been studied a lot.

B: What subgraph exactly will cause a molecule to be classed as a ketone?

A: The keto, or carbonyl, radical. That is -C=O (noticing puzzled look).

B: Then all of these are ketones?

    CH3 - CH2 - C=O - R
    CH3 - C=O - R
    H - C=O - R

A: Wait a minute.
The first two are ketones, but the last one is a special case which we should distinguish in the program. It defines the class of aldehydes.

B: So can we formulate the general rule that a ketone is any molecule containing C - C=O - C (thinking of the LISP list "(C (2 O) (1 C) (1 C))").

A: That's it.

B: Now what mass spectrometry rules do you have for ketones?

A: Three processes will dominate: alpha-cleavage, elimination of carbon monoxide from the alpha-cleavage fragments, and the McLafferty rearrangements.

B: OK. I wrote those down -- now tell me exactly what each one means. Start with alpha-cleavage -- do you mean the bond next to the heteroatom?

A: (Digression on notation -- often alpha-cleavage would mean this bond, but not for mass spectrometry.) ... Here alpha-cleavage is cleavage of the C-C=O bond, i.e., cleavage next to the carbonyl radical -- on both sides, don't forget.

B: All right. That's an easy rule to put in (translating to a new LISP function which defines alpha-cleavage as cleavage which results in a fragment (i.e., a list) whose first atom has a non-carbon atom on its property list). Shall we say the peaks are always high?

A: That will do as a start. We don't really pay much attention to intensities just as long as the peaks show up. (Reasons why exact intensities cannot be computed are explained briefly -- B's interpretation is that chemists just don't know enough about them.)

B: Now let's get on to the second process -- loss of carbon monoxide from the alpha-cleavage fragments. Would you write that out in detail? Exactly what happens to the fragment CH3-CH2-C=O, for instance?

A: Let's see, that is mass 57. You will see a high 57 peak for this fragment and you'll also see a 29 peak because of this process:

    CH3 - CH2 - C=O  -->  CH3 - CH2 (m/e=29)  +  CO

B: Is that all there is to it, just drop off the C=O from the fragment (thinking of making a second call to the LISP function which breaks bonds and returns fragments)? Does this happen in every case?

A: Yes, it's that simple.

B: What about the intensities of these new peaks?

A: Well, as far as we know they'll be pretty strong all the time. Let me check some spectra here. (A looks through a notebook containing some mass spectra of simple ketones to check on the relative abundance of alpha-cleavage minus 28 peaks.) ... Well, some of the time they're not recorded below mass 40, so it's a little hard to say. But it looks like the alpha-cleavage minus 28 peaks are about half as strong as the alpha-cleavage peaks in most cases. ... (A and B digress on the generality of the process; A thinks of the chemical processes, while B thinks of their LISP representation.)

A: (Finally.) Now the last important process for ketones, and this also holds for aldehydes too, is the McLafferty rearrangement. That is just beta cleavage with migration of the gamma hydrogen.

B: You lost me again. Would you write down an example?

A: Take the case we've been working with, but with a normal propyl on the one side. Here's how we would show what's going on:

    CH3 - CH2 - C(=O) - CH2 - CH2 - CH3   (m/e=100)
        -->  CH3 - CH2 - C(OH) = CH2   (m/e=72)   +   CH2 = CH2

B: I guess I still don't understand. Would you mind going through that step by step?
A: We can't really say what the sequence of events is, just that from the molecular ion of mass 100 you get another ion of mass 72 -- the McLafferty rearrangement is just one way of explaining how that happens. (Digression on how chemists can be confident of what the process is, including some discussion of deuterium labeling and meta-stable transition peaks.)

B: Suppose we wanted to tell the program about McLafferty rearrangements, as I guess we do. What do I tell it in this case?

... (A and B work out the details step by step as best they can. Both A and B suffer from B's lack of experience.)

B: Let's see if I have this straight with another example.

... (B picks an example which is too difficult for the first approximation to the rules which he understands at this point. This leads to a lengthy discussion of the conditions under which just one McLafferty rearrangement will occur, and conditions under which a "double McLafferty" will occur. At the end, B's most valuable possession is a piece of paper on which A has sketched several examples with cryptic notes. B promises to program these three rules, knowing full well that he won't get them right the first time but knowing that it will be easier for A to correct specific errors than to understand everything at once. A promises to review the published spectra of simple ketones to come up with some closer estimates of the relative intensities of the peaks resulting from these processes.)

Second Session:

B: The program and I are a little smarter than last time. But we both need some help. Let me show you what it does with a few specific examples. (B calls the program, and types in a few examples.)

... (At this point, A looks at the examples and their corresponding entries in the notebook of actual mass spectra. As he looks he diagrams the processes -- typically all processes for a molecule are superimposed on the graph structure of the molecule, with arrows pointing out of the main graph to the graphs of "daughter ions".)

A: In all these cases the alpha-cleavages are pretty good, the alpha-cleavage minus 28 peaks are OK most of the time, but I don't understand what the program is doing with McLafferty rearrangements. Also, there are a couple of things that I didn't mention last time -- I remembered them as I reviewed the ketone literature last night; so naturally the program doesn't know about them.

B: Let me write these down.

A: Two things: there is a difference in relative abundance of the alpha-cleavage peaks depending on whether it is major alpha or minor alpha, and second, very often you will see a McLafferty plus one peak after the McLafferty rearrangements.

B: Let's come back to those after you've told me what is wrong with the program as far as it goes.

A: (Looking at the examples run by the program.) In the first case you have the alpha-cleavage and alpha minus carbon monoxide peaks. But what are these others?

B: Let's see. (B inputs the example again with a switch turned on which allows him to see which major functions get executed and what their results are.) The program thinks it can do a double McLafferty rearrangement -- isn't that right?

A: It should do one McLafferty rearrangement, but I don't see the right peak for that. Here is the one it should do (sketching it out). It looks like you've tried to do something quite different.

... (After much time the errors are traced to a basic misunderstanding on B's part and some programming errors.)
B: Well, I guess I'd better take care of those things before you look at more examples. Perhaps I can add those other things you mentioned earlier. What's this business about major alpha and minor alpha?

A: It is just a way of bringing the intensities predicted by the program more in line with the actual intensities. In these examples the major alpha-cleavage is the alpha-cleavage in which the larger alkyl fragment is lost. (A sketches several examples to illustrate his point.)

B: What sort of general principle defines the minor alpha?

A: The smaller alkyl fragment lost.

... (B agrees to put this in the program after getting it clear. A new LISP function is mostly conceptualized by now. Within a few months, however, some poor results were traced to this form of the principle, so it had to be reformulated to consider more than merely the size of the fragment.)

B: Now what about the other thing -- the McLafferty-plus-one peaks?

A: Well, we don't know much about it, but it seems that in almost all cases where you see a McLafferty rearrangement peak you also see a peak at one mass unit higher. Of course we can't say where the extra mass comes from, but it doesn't really matter.

B: Suppose the program just sticks in the extra peak at x+1 for every x from a McLafferty rearrangement?

... (B's suggestion is motivated by the existing LISP code. The only time the program knows it has a McLafferty peak is inside one function. After a brief discussion of this, both A and B decide that the next step is to get the program to make more accurate predictions. The discussion switches, then, to adding this ketone information to the planning phase of the program.)

After deciding upon an interesting class of organic molecules, such as ketones, ethers, or amines, the first step toward informing the program about the mass spectrometry theory for that class is to ask a mass spectroscopist what rules he generally uses when considering molecules of the class. His first answer is that he expects specific fragmentations and rearrangements to dominate the entire process, with different mass numbers resulting in different contexts. He expects just four processes to explain all significant peaks in the mass spectra of acyclic ketones: (1) cleavage next to the C=O (keto) group, i.e., alpha-cleavage, (2) loss of carbon monoxide (CO) from the ions resulting from alpha-cleavage, (3) the rearrangement process known as the "McLafferty rearrangement" (migration of the gamma hydrogen to the oxygen with subsequent beta cleavage), and possibly (4) addition of a proton to ions resulting from McLafferty rearrangements. The last process is given far less weight than the first three, seemingly because there are still too many exceptions to put much confidence in it. But it is still useful enough of the time to warrant inclusion in the list.

It is impossible to identify a process with any specific mass number because these processes result in different spectral lines when applied to different structures. For example, alpha-cleavage (next to the C=O) in C-C-C-C=O-C-C results in peaks at mass points 57 and 71, while in C-C-C-C-C=O-C-C the alpha-cleavage peaks are at mass points 57 and 85. These four rules were put into the Predictor's complex theory and, in a different form, into the rough theory of the planning stage.
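To make the arithmetic of these four processes concrete, here is a sketch (ours, in illustrative Common Lisp; the function name and representation are hypothetical) that predicts peak masses for a straight-chain ketone R1-CO-R2 given the carbon counts of the two alkyl sides. An n-carbon alkyl radical has mass 14n+1 and CO has mass 28, so for 3-hexanone, (ketone-peaks 3 2) yields exactly the peaks discussed in the dialog: 71 and 57 (alpha cleavage), 43 and 29 (loss of CO), and 72 (McLafferty). The fourth rule would simply add x+1 for each McLafferty peak x.

    (defun ketone-peaks (n1 n2)
      "Predicted peak masses for the straight-chain ketone R1-CO-R2,
    where R1 and R2 have N1 and N2 carbons."
      (let* ((mw     (+ (* 14 (+ n1 n2)) 30))  ; 3-hexanone (3,2) => 100
             (alpha1 (+ (* 14 n1) 29))         ; acylium R1-CO+, e.g. 71
             (alpha2 (+ (* 14 n2) 29))         ; acylium R2-CO+, e.g. 57
             (peaks  (list alpha1 alpha2
                           (- alpha1 28) (- alpha2 28))))  ; loss of CO
        ;; McLafferty rearrangement needs a gamma hydrogen (chain >= 3);
        ;; the neutral lost is the alkene C(n-1)H(2n-2), mass 14(n-1).
        (when (>= n1 3) (push (- mw (* 14 (- n1 1))) peaks))
        (when (>= n2 3) (push (- mw (* 14 (- n2 1))) peaks))
        peaks))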
The problems we encountered with these rules are typical of three fundamental problems we have learned to expect: (1) unanticipated concepts require additional programming, (2) counter-examples to the first rules force revisions, and (3) a false start leads to a change in strategy.

The first difficulty is just a variation on the old adage "Expect the unexpected". In our case one root of this problem is lack of communication between expert and non-expert. Because the expert tries to make his explanations simple enough for the layman, he leaves out relations or concepts which very often turn out to be important for the performance of the program. Initially the Predictor's theory treated each cleavage independently of the others. But the introduction of the concepts of major and minor alpha-cleavages destroyed this independence and forced revisions on the program. Since the expert measured the relative abundance of minor alpha-cleavage peaks in terms of the major peaks, it was essential to calculate the abundance of the major alpha-cleavage peaks first. The technique for handling this was to introduce a switch indicating whether the major alpha-cleavage had been encountered yet (with appropriate tests and settings in various places). The underlying reason for using this technique rather than another was to plug the hole as quickly as possible (and, as a corollary, to fix things with a minimum of reprogramming).

In the planning stage, the anticipated form of a rule was a list of peaks at characteristic mass points (where these could be relative to the molecular weight). But in order to identify alpha-cleavage peaks in ketones the program needed to find a pair of peaks at masses x1 and x2 which satisfied the relation x1 + x2 = molecular weight + 28. So the program was extended in two ways to account for this: first, a LISP function was allowed to stand in place of an x,y pair as an acceptable rule form in the table of planning rules, and second, a function was added to the set of available rules. The function looks for n peaks x1, ..., xn which sum to the molecular weight plus k, where n and k are different for different functional groups (n=2, k=+28 for ketones). A sketch of such a function appears at the end of this discussion.

The second fundamental difficulty in this whole process has come after the additional programming was completed to take care of new concepts, when we are in a position to try out the programs on real data. Typically these first trials uncover counter-examples to the initial set of rules: we have often been surprised at the low quality of the inferences on this first pass. For example, we quickly found that the theoretical ketone rules did not always hold for methyl ketones, i.e., for structures containing the radical -C=O-CH3. The alpha-cleavage on the methyl side produced a much weaker peak than was originally expected, and methyl ketones often failed to show significant McLafferty rearrangement peaks, contrary to expectations. Thus it was necessary to alter the original rule that both alpha-cleavage peaks for ketones must be high peaks, to allow for the virtual absence of the peak corresponding to loss of the methyl radical. Also, because of the methyl case it was necessary to alter the conditions which determined the strength of McLafferty rearrangement peaks in ketones. Experimental mass spectra often contain peaks which the theory either cannot account for or would have predicted were absent, and the spectra often fail to show peaks where the theory predicts there should be some.
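The n-peak function promised above might look like the following sketch (our reconstruction and naming, not the original LISP); it searches a list of peak masses for n masses summing to a target, the molecular weight plus k.

    (defun peaks-summing-to (masses n target)
      "Return a list of N masses drawn from MASSES that sum to TARGET,
    or NIL if no such selection exists."
      (cond ((null masses) nil)
            ((= n 1) (and (member target masses) (list target)))
            (t (let ((rest-pick (peaks-summing-to (rest masses) (1- n)
                                                  (- target (first masses)))))
                 (if rest-pick
                     (cons (first masses) rest-pick)
                     (peaks-summing-to (rest masses) n target))))))

    ;; For ketones, n = 2 and k = +28, so the identifying condition is:
    ;;   (peaks-summing-to peak-masses 2 (+ molecular-weight 28))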
Because experimental spectra diverge from the theory in these ways, the first attempts to use almost strictly theoretical rules in the context of real data often reveal counter-examples to the rules. A theoretical chemist, however, wants to sweep away these discrepancies -- we have heard such comments as "typing error", "recording error", "impure sample", "insensitive instrument", "uncareful operation of the instrument", and so on. In tracking down the source of the discrepancies we first check the original data to see that the computer has looked at what we wanted it to. Occasionally, our friends have even re-run samples in their own laboratory to check the reliability of the data. But our limited experience indicates that the data are seldom in error: it is the theory that needs more work.

From the chemists' point of view, the dialog process is also helpful for discovering gaps in the theory. Only when they started stating their theoretical rules as precisely as the computer program demands did they realize how little their theory of mass spectroscopy says about some simple classes of molecules. For example, when considering the class of amines, a chemist wrote out 30 interesting amine superatoms* which he believed exhausted the possibilities. A program which was developed later to generate superatoms convinced him there were, in fact, 31 possibilities. Even Professor Carl Djerassi, author of a comprehensive book on mass spectroscopy, terms his exposition "woefully inadequate" in places because of the gaps discovered in the computer model. (Research is underway to fill these gaps.)

* As readers of the Machine Intelligence 4 description of Heuristic DENDRAL will remember, a superatom is a structural fragment which is treated as a single unit. For example, when given the amine superatom -CH2-NH-CH3, the program will use this structure as an atomic element without considering any structural variants of it such as -CH2-CH2-NH2. Thus several atoms in the graph can be replaced by a single superatom, at a considerable saving for the Structure Generator.

Making a false start is the third type of problem, which is usually discovered only after a few iterations of examining new results and patching rules. Because this requires backtracking and reprogramming, it is painful to realize that some early decisions were bad in light of subsequent developments. We have had courage enough to label only a few of our decisions as false starts. For example, in the planning phase we quickly got into trouble with identification rules for ether subgraphs by over-specifying the subgraphs. We had successfully attacked a previous class of molecules (ketones) by dividing the class into an elaborate hierarchy of subgraphs, each with its own set of identifying rules. But this approach was not transferrable to the new class, apparently because the mass spectrometry of ethers follows a different pattern. By the time we had defined rules for C-O-C, CH2-O-CH2, CH3-O-CH2, and CH3-CH2-O-CH2 we were no longer able to make sound inferences. Thus it was necessary to start at the beginning and define a less hierarchical, broader and smaller set of ether subgraphs.

Typically it has taken weeks of interaction with a chemist at a console to proceed past the first two difficulties, never knowing whether we were making a false start. However, the iterative process itself is not finished when a set of rules is found which seems to "do the right thing".
Because of the number and the complexity of the subgraphs, we often run into trouble because we do not have the patience to grind out the consequences of the inferences which the planning phase makes. For many examples of spectra our rules excluded so many subgraphs that, even though the program was properly instructed to put a particular superatom into every structure generated, it could not generate any structures at all. In these cases we have had to weaken the identifying rules still more -- with the result that we often let in incorrect classes of molecules to insure that we never excluded the correct ones.

The end of the iterative process to establish planning rules for a class of molecules comes when we have a set of rules which correctly identifies substructures contained in all available examples of mass spectra for that class, e.g., for all acyclic ethers. Similarly, the end of the process to establish the deductive rules comes when the chemists satisfy themselves that the predicted mass spectra agree in significant respects with the published mass spectra of a broad range of examples.

It should be mentioned that we recognize the need to clear up the bottleneck of getting new information into the computer. Here, as elsewhere, many alternative designs are open to us. For instance, we could get rid of the "middle man" in the information transfer by educating a programmer in mass spectroscopy or by educating a chemist in LISP. Or we could replace the middle man with a program designed to perform the same function as B (the layman/programmer) in the dialog above. In effect, we have been moving slowly in all three of these directions at once. But what we would most like to pursue is the design of a program to elicit information from an expert who is not also a programmer. (This seems especially attractive to the real-life B, needless to say.)

In many areas of science -- especially the rapidly expanding frontier areas -- the rules which will someday be incorporated into a unified theory exist only in an uncodified morass of recent papers and unpublished notes, and in the heads of researchers on the frontier. Because of the number and complexity of the rules, they are easy to forget, especially so in a collection that is messy. The process of codifying this collection is thus both tedious and important. For this reason automation of the dialog is of general interest: B is not the only one who stands to gain.

Because B's function is more than translating from chemical language to LISP, the program must be more than a compiler. Writing the compiler and, before that, designing a rich enough chemical language seem unavoidable in the general problem. B does even more than an interactive compiler which asks for clarifications of statements. B also asks questions to fill in gaps, he uses analogies (and occasionally even sees one), he constructs possible counter-examples, and he puts new information into all parts of the system which can use it. Each one of these additional functions adds another level of complexity to the problem of automating the dialog. Yet the language of any particular science may be sufficiently formal and constrained that the whole problem is still tractable. In our task area these problems may be as well in hand as anywhere. The next few remarks will briefly show how they are manifested in the DENDRAL system.
B's experience has been that the expert can easily overlook a logical possibility, for example, one of all possible permutations of carbon, hydrogen and nitrogen atoms in a terminal radical. Because of the exhaustive Structure Generator within the program -- in fact, at the heart of the program -- it is possible to enumerate all structures within a specified class. Thus it is possible to use a program to check for gaps in any list of structures provided by a chemist. An important but non-trivial problem, then, is finding heuristics which will select "interesting" missing structures, that is, structures the chemist would like to know he missed.

Frequently the discussion of a new functional group will call in analogies with what has been discussed before. "Amines are like ethers" was one specific remark that B had to make sense of; a smart program should at least know what questions to ask to make sense of the analogy. It will take a much smarter program to recognize these analogies itself. The point is that the dialog will move much faster if the program can at least use analogical information.

Constructing counter-examples may often require a thorough understanding of the theory. But B has been of some help to A even though he has only a little knowledge of mass spectrometry. The dialog program might easily watch to see what kinds of cases the expert needs to patch up. This strategy now leads B to ask "But what about the methyl case?" for every set of rules that doesn't explicitly consider methyls. And, surprisingly, this reminder is often helpful.

Finally, the "middle man" in the process is sometimes expected to put pieces of theory in appropriate places of the program, and sometimes to shift information from one place to another. The difficulty here, of course, is that different parts of the program require different representations of the knowledge: the planning phase is written in terms of transforming spectral lines into structural pieces, while the Predictor is written for transforming structural pieces into spectral lines. As the theory becomes more complex and as the representations diverge, it becomes more difficult to assess the consistency of the different representations. Human intelligence now decides the questions of where to put new information, how to represent it, and how to make it consistent with other statements. These questions will be discussed in the next section. Let it suffice here to say that a dialog routine cannot be blind to how and where the information will be used.

In sum, eliciting a theory from an expert is a tedious process that is worth automating. It has been our key to the wealth of knowledge not yet accessible in textbook packages. And it has benefited the scientist, since it provides a means of codifying a loose collection of empirical generalizations into a theory. Automating half of the information transfer should add confidence in results as well as speed to the process. Our concern is not so much building a program which teaches itself mass spectrometry as building one which has the capacity to be taught.

PART III: GENERAL PROBLEMS OF DESIGN, SEARCH, AND REPRESENTATION

Behind the discussion of the information transfer process is the unquestioned assumption that the performance of the Heuristic DENDRAL system depends critically on the amount of knowledge it has about mass spectrometry.
Thus it is necessary to be able to add more and more theory to the program in the easiest possible way -- through some such process as the dialog just discussed. In addition to the amount of information the system has, the performance of the system also depends upon how and when that information is used during the problem solving process. Writing a program to use the theory of mass spectrometry presupposes making a choice about how and where to reference the theory. That is, it presupposes choosing one design for the system over others, choosing an efficient search strategy, and choosing appropriate representations for the theory.

In systems science the best design is the one which maximizes the stated objective function. Thus an objective function provides a measure of performance for any design of the system, when the function is available. Unfortunately, there is no epistemological theory which allows us to define one objective function and alter the design of Heuristic DENDRAL systematically to bring its level of performance closer and closer to the objective. Our criteria for evaluating the performance of the system are admittedly intuitive: we say that a design, manifested in a computer program, is better the less time the program takes, the more compact the program is, and the more problems it can solve. (Also, an intuitive concept of elegance may lie below the performance measure as a means of judging between programs which seem to perform equally well with respect to the other criteria.)

The larger problem of designing the system efficiently cannot be ignored by anyone writing complex computer programs. But design questions involve more than just programming considerations. As with other large programs, Heuristic DENDRAL is broken into segments, with each segment expected to contribute to the solution of the whole problem in such a way that the performance of the entire system is efficient over a broad class of problems. If we were given just one design to implement on a computer, the questions would be questions of coding and running efficiency. But we have been forced to realize that our first choice of design was not the best one after all, and that we must concern ourselves with choosing among all possible designs for systems which perform the same task.

Apart from the fact that no completely satisfactory measure of performance is forthcoming, there remains a problem of relating the performance of the components of the system with the performance of the whole system. In some systems the parts are completely independent; thus maximizing the performance of each part results in maximizing the performance of the whole system. But in the case of this program, as in other complex systems, the components are so interrelated that the best total system is different from a collection of the "best" independent parts, because the measure of each part's contribution must bring in the goals of the other parts.

The problem of where to put theoretical knowledge into the system is one aspect of the design problem which is of particular interest to us. There are several components of this system which might profit from access to the theory of mass spectrometry if we chose to represent the theory suitably for each part. But we must balance benefits to a part of the system against cost to the whole system. For example, the addition of theory to the planning stage increases its contribution, and benefits the total system, as mentioned earlier, with only a small increase in program space.
Approximately three-quarters of a second spent scanning the data to make a rough plan resulted in the saving of ten or more minutes of computer time in the successive stages of the program. By our intuitive measures of good performance, we took that as an improvement, as long as the reliability of the later parts was not undermined by hasty planning. However, in the case where we gave the planning program identifying conditions for thirty amine subgraphs we did run into serious time trouble, but not where we expected it. We expected trouble to show up in a slow-down of the planning program, when it showed up at all. But in the amine case, the slow-down came in the generator because of the number of generation constraints added by the planning program: three to eight subgraphs, typically, would be added to Goodlist and the rest of the thirty subgraphs added to Badlist. The generator just had too much information to process. Our solution was to reduce the number of Badlist additions, since (a) this was the major source of trouble in the generator, and (b) we could be assured that we never deleted correct answers this way. Although we did increase the number of wrong answers from the generator, they would be ruled out when the predictive theory of mass spectrometry was applied later.

Woven through the pattern of alternative designs for the system are alternative search strategies which are available to the system designers. In the designs actually programmed, the over-all search strategy has been to define a subspace, generate all hypotheses in that subspace, and test each. But at least two different strategies are available to the program: (A) test each node in the subspace during generation (i.e., test partial hypotheses), and (B) generate one candidate hypothesis, then use a GPS-like difference-reducing strategy to generate better hypotheses. Both of these alternatives will be discussed as a means of bringing out some of our design problems, and as a weak means of justifying the strategy used in the program.

The alternative strategy (A) has, in fact, been tried in one version of the program, with only incomplete results so far. In the simplest application of this strategy, the generator consults the deductive theory at each node in the generation tree to determine whether the data indicate that an unproductive branch has just been initiated. That is, the theory is consulted to determine which partial hypotheses are not worth expanding. Unproductive branches are pruned, another node is added to each partial hypothesis, and the test is repeated. For example, part way down the search tree one branch (partial hypothesis) might be an oxygen atom with unbranched carbon atoms on either side (-CH2 - O - CH2-), and the next move for the generator might be to attach a terminal carbon to one of the carbons, resulting in the partial hypothesis -CH2 - O - CH2 - CH3. Consulting the theory will tell the generator that this is a fruitful branch only if the data contain peaks at 59 and the molecular weight minus 15 (M-15); otherwise the branch would be pruned at this point. Because of the large number of nodes in the unconstrained hypothesis space, it was quickly evident that this strategy could be applied in this simple way only when the planning phase had indicated a
relatively small subspace.

One reason why this alternative strategy (A) will not work well in this task area is that the theory of mass spectrometry in the program, as in the heads of chemists, is highly context dependent. The theory can say very little about the behavior of isolated atoms or small groups of atoms in the mass spectrometer without knowing their environment in the molecule. An ethyl group (CH3-CH2-), for instance, usually produces some small peaks in the spectrum at masses 29 and M-29, but when it is adjacent to a keto radical (C=O) it will produce strong M-29 and 29 peaks (depending, of course, on the structure attached to the other side of the keto radical). When an ethyl is attached to an oxygen in an ether (CH3-CH2-O-), on the other hand, the theory predicts a peak at M-15 but not at M-29, and no peak at mass 29. More importantly, the theory can say very little about pieces of structure which do not contain at least one terminus. But the canons of structure generation begin with a node at the center of the structure, working down toward the termini. The theory can say almost nothing, for example, about a chain of carbon atoms in the center of a molecule without knowing what is at the ends of the chain. In short, it must know the context.

For any class of problems where it is difficult to validate partial hypotheses, the node-by-node search strategy is not the best of alternatives. The current design, with no theory used inside the generator (and thus no node-by-node testing), is superior to the node-by-node test strategy with respect to confidence, and almost certainly with respect to time.* Only after branches of the search tree terminate, i.e., when complete chemical structures are generated, can the theory be called with confidence, for only then is the context of each piece of the molecule completely determined. But the intermediate calls to the theory will then either be incorrect or a waste of time.

* Those familiar with earlier versions of the Heuristic DENDRAL system may recall that a rough deductive test was once applied at each node, using what we called the "zero-order theory of mass spectrometry". The simplicity of the tests was both the beauty and the downfall of the zero-order theory. Because it was not a complex theory, the test was very cheap, and thus could be applied to every node. But it was such an oversimplified theory that it very often returned incorrect answers to the tests. We have not abandoned hope of finding heuristics which indicate circumstances under which cheap tests are reliable. We are also asking ourselves how to call the complex theory efficiently, as described in (A1) and (A2) of the text to follow. Just asking questions of this sort, and asking how to incorporate their answers (if found) into the LISP program, incidentally, have led to a successful reformulation of the program. The new code, designed to allow reference to a more general theory than the zero-order theory, runs about twice as fast with about three-fourths the number of instructions.

Adding one or both of two levels of complexity to the node-by-node testing strategy (A), however, may make it competitive with the current test-at-the-end strategy for our problem. First, we can add some meta-theory to the testing routine or, second, we can reorganize the generator to make the theoretically significant nodes come at the top of the generation tree. The sketch below illustrates the context dependence which any such refinement must respect; (A1) and (A2) then take up the two refinements in turn.
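A sketch of the ethyl example above, in the same illustrative Common Lisp used earlier (the rule representation and symbolic intensities are ours, not the program's):

    (defun ethyl-peaks (neighbor mw)
      "Peaks predicted for a terminal ethyl group (CH3-CH2-), given the
    group it is attached to and the molecular weight MW.  Returns a
    list of (mass . intensity) pairs with symbolic intensities."
      (case neighbor
        (keto         (list (cons 29 'strong) (cons (- mw 29) 'strong)))
        (ether-oxygen (list (cons (- mw 15) 'present)))  ; no 29 or M-29
        (otherwise    (list (cons 29 'small) (cons (- mw 29) 'small)))))

The point of the sketch is that the rule cannot fire until the neighbor is known, which is exactly what a partial hypothesis fails to supply.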
(A1) Adding meta-theory to the testing routine is relatively simple, since it is possible to say a priori that the theory itself is uninformative or perhaps misleading on certain classes of partial structures. Thus the first test on a partial hypothesis is to determine whether the theory can say anything about it -- whether this partial hypothesis warrants the expense of calling the full deductive theory. In this way, the number of calls to the theory is considerably reduced. The moral seems to be that a little meta-theory goes a long way.

(A2) Reorganizing the Structure Generator is a second way to maximize the pruning ability of the deductive theory in node-by-node checking. As mentioned earlier, the canons of generation initiate each structure at the center, so that generation is from the center out to the termini. So in most cases near the beginning of the generation process the testing routine provides no information which allows pruning. Testing begins to pay off only after termination of one of the branches of the partial structure. By starting the generator at a terminal atom (instead of at a central atom) the deductive theory could often prune very effectively at the top of the search tree where it is most desirable. One reason why we have not pursued this strategy, however, is that we now have no way to decide which end of the structure will make the most informative terminal radicals. In those cases where the oxygen of an ether molecule, for example, lies close to one end and far from the other, as in CH3-CH2-O-CH2-CH2-CH2-CH2-CH3, the savings would be positive for the terminal atom near the oxygen, but negative for the other choice.

(B) Another completely different search strategy which the program might have used is a GPS-like difference-reducing strategy, mentioned above as the second alternative to the current test-at-the-end strategy. The Structure Generator could construct any molecule as an initial hypothesis -- preferably within some constraints set by a smart planning program -- and the rest of the time would be spent finding differences between the predicted and actual mass spectra and then reducing those differences by changing the structure of the candidate. Chemists find this suggestion attractive because they use somewhat the same strategy in analyzing mass spectra, since they are without the benefit of an exhaustive generator. However, they have been unable to articulate a measure of progress toward the goal or a description of the process of finding relevant differences.

Another reason the GPS strategy does not fit our problem is that unless the program keeps a precise record of hypotheses already considered, it will have trouble avoiding loops. The structural changes would be made in pieces, in response to the salient differences at any level. Thus it is quite likely that a sequence of changes, each meant to reduce one of a set of differences, would soon be in a loop, because changing one piece of structure to reduce the one difference might well introduce other differences in the mass spectra.

Another important reason why the GPS framework is not suited for this problem is that the chemist does not necessarily work incrementally toward the goal, as GPS does. He may add a feature to the hypothesis at one stage which seems to introduce more differences than it reduces. And then, because of that, he may finish the problem in a few swift strokes.
For example, shifting the position of a functional group in a candidate molecule may explain some puzzling spectral lines but introduce puzzles about other lines that the previous structure had explained. This strategy of temporarily retreating from the goal, so to speak, is also common in synthetic chemistry and in theorem proving. In both cases, expressions (or molecules) are introduced at one stage which are more complex than the ones at the previous step, because the remainder of the problem-solving activity is thus simplified. In other words, there are certain problems for which step-by-step movement toward a goal is not the best strategy; mass spectrum analysis appears to be one of them.

Although the two alternative search strategies A and B introduce new difficulties, modifying the current strategy may well improve the program without adding serious problems. One extreme is to use a powerful enough theory in the planning stage to produce only a single unambiguous hypothesis. That is, plan the hypothesis generation process so carefully in light of data and theory that just one structure meets the constraints. This means adding much more new theory to the planning program. The planning stage now has a table of interesting and relatively common subgraphs, each coupled with a set of identifying conditions. Pieces of structure for which the theory has too little context to identify their presence or absence are left out of the table entirely. The rest of the table is organized hierarchically. However, using a powerful enough theory requires enumerating whole molecules (because the theory cannot be applied unambiguously to pieces of molecules out of the total context), resulting in an enumeration which would be far too large to catalog or search. On the other hand, enumerating subgraphs -- or pieces of molecules -- in a much more manageable list leaves ambiguities in the ways the pieces can be put together in a complete molecule. That is, if we want to plan carefully enough to isolate exactly one structure for any number of atoms, the entries in the table must specify the total context for each piece of structure. In this case the planning program must do a table look-up on spectrum-molecule pairs, obviating the need for the Structure Generator or Predictor at all. (Much work in the application of computers to analytic chemistry has this flavor.) Cataloging anything less than whole structures will result in looser constraints, since some contextual information must be omitted, and thus will result in generating more than one whole structure in those cases where there is more than one way to put the identified pieces together.

While we cannot rigorously justify our design decisions, and in particular our decision to use one search strategy over another, we have been able to explore some alternative designs. Perhaps more importantly, we have found that the Heuristic DENDRAL system is fertile ground for exploring these general problems.

Another class of problems which the system forces on us has been called "the Representation Problem". There appear to be several problems under this rubric: choosing a convenient representation for the theory, deciding when to proliferate representations, deciding when two representations are consistent, and switching from one representation to another. None of these appears to warrant the title 'the problem of representation' any more than the others; they all require solution in any system which admits any of them.
Initially, the only theory of mass spectrometry of any complexity in the program was the deductive theory in the Predictor. The most crucial aspect of the representation problem at that time -- and probably the only aspect we saw -- was choosing a convenient representation. And then, also, we held a simplistic view of what made a representation convenient. We meant, roughly, a representation that was easy to code and write programs for. Since then it has become obvious that convenience is also conditional on the persons adding statements to the theory, as discussed in the second section. For the sake of communicating with the expert, for example, it may be necessary to cast the theory in terms of bonds and atoms at the level of the dialog, but then transfer those statements to a representation in terms of electron clouds and charge localization for the efficient operation of the program. That is, there may be a need for two representations even though there is only one theory. With only one representation it is very possible that either communication with the expert or execution of the program will become cumbersome. On the other hand, separating the internal representation from the one which is convenient for communication makes it more difficult to find mistakes in the program and to explain mistakes to the expert who must ultimately correct them.

With the addition of planning to the program, it was expedient to introduce a new representation of mass spectrometry theory which could be easily read by the planning program. Even though all of the information was already in the Predictor's theory, it was not in a form which could be easily used for planning. For example, the Predictor's theory indicates that a pair of peaks (at least one of which is high) will appear in the mass spectra of ketones as a result of breaks on either side of the keto (C=O) group. Thus, because of the appearance of C=O (mass 28) in each resulting fragment, the peaks will add up to the molecular weight plus 28. The theory in the planning program also knows this, but it uses the theory in reverse. The planning program looks for a pair of peaks in the data (at least one of which is high) which sum to M+28 as a necessary condition for the appearance of the keto group. That is, the Predictor uses structural information to infer pieces of the bar graph, while the planning program uses bar graph information to infer pieces of structure. (The sketch below shows the two directions side by side.)

Duplication of information may be the preferred means to processing efficiency, even at an obvious cost in space, as it almost certainly is in this case, where conditionals are read left to right in the prediction (deductive) phase and re-representations are read the other way in the planning phase. Even more critical than the space vs. processing time question, though, is the question of consistency. The system has no way of checking its own theories for inconsistencies. Worrying about the consistency of different representations of the theory may be considered a waste of time, but we see this as a serious issue because of the complexity of the body of knowledge about mass spectrometry. We even have to be careful now with the internal consistency of each representation because of complexity. For example, the rules of the planning program have occasionally put a subgraph on Goodlist and a more general form of that subgraph on Badlist: to say something like "this is an ethyl ketone but it is not a ketone".
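Returning to the keto example, the duplication of a single rule in two representations can be sketched as follows (our illustration, reusing PEAKS-SUMMING-TO from the earlier sketch; the function names are hypothetical):

    ;; Predictor direction: structure --> bar graph.  Given the masses
    ;; of the two alkyl sides of R1-CO-R2, each alpha-cleavage fragment
    ;; keeps the CO (mass 28).
    (defun predict-keto-peaks (r1-mass r2-mass)
      (list (+ r1-mass 28) (+ r2-mass 28)))

    ;; Planning direction: bar graph --> structure.  The same fact read
    ;; in reverse: a pair of peaks summing to MW + 28 is a necessary
    ;; condition for the keto group.
    (defun keto-group-suggested-p (peak-masses mw)
      (peaks-summing-to peak-masses 2 (+ mw 28)))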
Our solution to this particular Goodlist-Badlist problem avoids the consistency issue by allowing the planning program to check only as far as the first "no" answer in the family tree. In general, however, because of the complexity of the theory we are not confident that the programs are internally consistent, let alone consistent with each other. The consistency problem would evaporate if there were just one representation of the theory which could be read by all parts of the system which use the theory. But it may be unreasonable to expect to find one representation which is suitable for all purposes. Another solution to the consistency question is to add either (1) a program which can read both representations of the theory to check for inconsistencies, or (2) a different representation to which modifications will be made, and a program which writes the other two representations from the third after each set of changes. At the least, the consistency of the whole system can be checked empirically by running examples. It may well be that this is also the best that can be done; there may be no logical proof of consistency for this vaguely stated body of knowledge. In any case, the system should be designed in such a way that the opportunities for introducing inconsistencies are minimized.

If the consistency problem is dismissed by disposing of all but one representation of the theory in a system, then the problems of representation become vacuous for that system. When different representations of the same body of knowledge remain, however, it is possible that switching from one to another inside the program will be desirable. In this system, for instance, it would be very desirable to be able to move information automatically from the Predictor's complex theory of mass spectrometry to the planning program's theory.

The convenience and consistency questions just mentioned have directed attention to the benefits of switching representations. There are at least two ways of carrying it out here. First, and more generally, if the theory were suitably represented, for example in a table, a program could conceivably move pieces of information from one place to another, making appropriate transformations on the way. This is very difficult for any complex body of knowledge, though, since it is difficult to put it into a perspicuous form and to write a program which can interpret it. The less general way of moving mass spectrometry theory from Predictor to Preliminary Inference Maker also appears slightly less difficult. In effect, the program can be asked to perform a "Gedanken experiment", i.e., to pose questions about mass spectrometry and answer them itself without outside help. The program already has almost all the necessary equipment for such an experiment. The major power of the idea is that there is already a systematic Structure Generator for producing the instances of molecules of any class, for example, all methyl ketones. Moreover, the Structure Generator can also produce the exemplars, or superatoms, which define the class. The Predictor tells what happens to each particular molecule in the mass spectrometer. All that remains is a program to classify the predicted mass spectra and find the common spectral features. These features are just what the planning program needs to identify the class. In this way the Predictor's theory is transferable to the planning program.
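Schematically, the experiment might be organized as follows. Here GENERATE-CLASS (the Structure Generator restricted to a class by its superatom), PREDICT-SPECTRUM (the Predictor), and FEATURES-OF (the classification step which remains to be written) are invented names standing for the components described above.

    (defun common-spectral-features (class-superatom)
      ;; Generate every structure in the class, predict each one's mass
      ;; spectrum, and keep only the spectral features shared by all of
      ;; them.  The shared features are what the planning program needs
      ;; in order to identify the class.
      (let ((feature-sets
              (mapcar (lambda (molecule)
                        (features-of (predict-spectrum molecule)))
                      (generate-class class-superatom))))
        (reduce #'intersection feature-sets)))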
Much of our current effort is directed to just these points: set up one central theory which the expert modifies, and automatically move the new information to appropriate places. This effort requires much reprogramming, some of which is described in the next part of the paper; it requires improving the communication with experts as described in the second part; and it requires answering the critical design questions just discussed.

PART IV: TABLE DRIVEN PROGRAMS AND RECENT PROGRAMMING CHANGES IN HEURISTIC DENDRAL

Parts II and III have discussed the problems of obtaining and representing scientific theories for a computer program. Designing the actual computer programs to access the theory is another problem which, fortunately, seems easier to solve than the others. The general programming approach, adopted after several trials, is summed up in the phrase "table driven program". The idea* is to separate the theory from the program which works with the theory by putting specific items of theory on lists and in global variables. Changing the theory, then, involves little actual re-programming. This allows experiments to be carried out with different versions of the theory, a very useful feature when dealing with a subject which is as uncodified as mass spectrometry.

* This idea is worked out in detail in Donald Waterman's program to learn the heuristics of draw poker (10).

A. The first of the DENDRAL programs to be written as a table driven program was the planning program (Preliminary Inference Maker), which bases most of its operation on a list of names and their associated properties. The planner has a list of functional groups and subgroups arranged in family hierarchies, e.g., (A) ketone, (A1) methyl-ketone, (A2) ethyl-ketone, etc. Associated with each group and subgroup is a set of identifying conditions. The program picks the first main functional group on its list and checks its identifying conditions against the given mass spectrum, e.g., for the subgroup C2H5 - C=O - CH2 - C - C, we have X1 + X2 = M + 28 (alpha cleavage) and 72 high (McLafferty rearrangement). If any condition fails to be satisfied, the group and all its subgroups are ruled out - their structures are put on Badlist. If all conditions are satisfied, the structure of this group is put on Goodlist - a list of preferred subgraphs. Then the subgroups will be checked in a similar way; a sketch of this control loop is given below. All groups known to the program are thus considered, either explicitly or implicitly. Modifying either the list of subgroups or their properties will drastically affect the behavior of the program. Yet all the theory of mass spectrometry in this program is contained in one or the other place.
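The control loop just described might be rendered as the following sketch. Each table entry is assumed here to be a list of a group name, its identifying conditions, and its subgroups, and CONDITIONS-SATISFIED-P stands for tests like the ketone conditions above; this format is an illustrative assumption, not the program's actual property-list representation.

    (defvar *goodlist* nil)   ; subgraphs which might be present
    (defvar *badlist* nil)    ; subgraphs which cannot be present

    (defun check-group (group spectrum)
      ;; GROUP is a list (name identifying-conditions subgroups).  If
      ;; every identifying condition is satisfied by the spectrum, the
      ;; group goes on Goodlist and its subgroups are checked in turn;
      ;; otherwise the group and all its subgroups are ruled out at
      ;; once, without examining the subgroups' own conditions.
      (destructuring-bind (name conditions subgroups) group
        (cond ((conditions-satisfied-p conditions spectrum)
               (push name *goodlist*)
               (dolist (sub subgroups)
                 (check-group sub spectrum)))
              (t (rule-out group)))))

    (defun rule-out (group)
      ;; Place the group's structure, and those of all its subgroups,
      ;; on Badlist.
      (destructuring-bind (name conditions subgroups) group
        (declare (ignore conditions))
        (push name *badlist*)
        (dolist (sub subgroups)
          (rule-out sub))))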
B. The Structure Generator program has been table driven to a small extent; in particular, three lists, Orderlist, Badlist, and Goodlist, function as tables which determine the structures which will be generated and their order. Orderlist contains a list of all chemical atoms which the program can use. Each atom has properties such as valence, weight, symmetries, etc. Removing an atom from Orderlist effectively removes it from the domain of the Structure Generator. The relative order of atoms on Orderlist determines, to a small extent, the order of structures in the output list. Badlist is another table which controls output of the Structure Generator. If Badlist is nil, all topologically possible structures will appear. Otherwise, any structure containing one of the Badlist subgraphs is pruned from the generation tree as soon as the Badlist item first appears. This does not change the generating sequence, but rather eliminates structures from the unfiltered output list. Goodlist serves two purposes: it can determine the order in which structures are generated and it can limit generation to a specified class of structures. Those structures containing preferred substructures present on Goodlist will be generated first, while structures containing none of the preferred substructures will be generated last, or not at all if generation is to be limited.

One of the basic problems inherent in the Structure Generator, however, has been its rigid insistence on following the canons of DENDRAL order as they existed four years ago when the program was written. These canons specified the canonical form of a structure, and thus the implicit generating sequence, by stating the following rules: count, degree, apical node, and afferent link are the attributes, in decreasing order of importance; 1 is lowest count, increasing integer values are higher; the value of apical nodes follows Orderlist, usually c