Relative System Loading by Community

Section 2.2.2

[Figure 13. Monthly Terminal Connect Time by Community. Panels: National AIM, Stanford, Staff; each plots connect hours per month on a 0-4000 scale.]

E. A. Feigenbaum 51

2.2.3 Individual Project and Community Usage

The following table shows cumulative resource usage by project during the
past grant year. The entries include a summary of the operational funding
sources (outside of SUMEX-supplied computing resources) for currently active
projects, total CPU consumption by project (hours), total terminal connect time
by project (hours), and average file space in use by project (pages, 1 page = 512
computer words). These data were accumulated for each project for the months
between May 1978 and April 1979. Again, the well-developed use of the resource by
the Stanford community can be seen. It should be noted that the Stanford
projects have voluntarily shifted a substantial part of their development work to
non-prime time hours, which is not explicitly shown in these cumulative data. It
should also be noted that a significant part of the DENDRAL and MYCIN efforts,
here charged to the Stanford aliquot, supports development dedicated to
national community access to these systems. The actual demonstration and use of
these programs by extramural users is charged to the national community in the
"AIM USERS" category, however.


RESOURCE USE BY INDIVIDUAL PROJECT - 5/78 THROUGH 4/79

CPU CONNECT FILE SPACE
NATIONAL AIM COMMUNITY (Hours) (Hours) (Pages)

1) ACT PROJECT 111.39 1497.82 2555
"Acquisition of
Cognitive Procedures"
John Anderson, Ph.D.
Carnegie-Mellon Univ.

2) CHEM SYNTHESIS PROJECT 370.90 5730.58 8339
"Simulation & Evaluation
of Chemical Synthesis"
W. Todd Wipke, Ph.D.
U. California, Santa Cruz

3) MOD HUMAN COGN PROJECT 38.26 654.28 223
(since 12/78)
"Hierarchical Models
of Human Cognition"
Peter Polson, Ph.D.
Walter Kintsch, Ph.D.
University of Colorado

4) HIGHER MENTAL FUNCTIONS 30.89 490.29 2687
"Intelligent Speech
Prosthesis"
Kenneth Colby, M.D.
UCLA

5) INTERNIST PROJECT 196.99 2658.47 7832
"DIALOG: Computer Model
of Diagnostic Logic"
Jack Myers, M.D.
Harry Pople, Ph.D.
University of Pittsburgh

6) MISL PROJECT 3.50 132.47 1120
"Medical Information
Systems Laboratory"
Morton Goldberg, M.D.
Bruce McCormick, Ph.D.
U. Illinois, Chicago Cir.

7) PUFF/VM PROJECT 97.48 3351.63 2222
"Biomedical Knowledge
Engineering in
Clinical Medicine"
John Osborn, M.D.
Inst. Medical Sciences,
San Francisco
Edward Feigenbaum, Ph.D.
Stanford University

8) RUTGERS PROJECT 30.63 868.12 10093
"Computers in Biomedicine"
Saul Amarel, D.Sc.

9) SCP PROJECT 18.39 436.90 275
"Simulation of
Cognitive Processes"
James Greeno, Ph.D.
Alan Lesgold, Ph.D.
University of Pittsburgh

10) AIM PILOT PROJECTS

Psychopharm. Advisor 25.63 537.73 773
Organ Culture 24.35 449.21 924
Commun. Enhancement 1.83 121.71 329
KRL Demonstrations 2.53 54.06 388
AIM Pilot Totals 54.34 1162.71 2414
11) AIM Administration 14.58 461.15 5808

12) AIM Users on Stanford Projects

AGE 1.17 82.22 14
DENDRAL 44.37 860.51 1992
MOLGEN .20 6.39 24
MYCIN 5.12 137.33 295
Guest (all projects) 47.01 812.21 189
Other .63 27.74 144
AIM User Totals 98.50 1927.00 1762
COMMUNITY TOTALS 1065.67 19371.42 45330
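
Each subtotal in the table is the sum of the rows above it, which makes the transcribed figures easy to spot-check. The following is a minimal sketch in Python that uses the AIM pilot rows as transcribed here. The names PILOT_ROWS, REPORTED_TOTAL, and column_sums are illustrative only. Decimal is used so that hundredths of an hour add exactly, with no binary floating-point rounding.

```python
from decimal import Decimal

# AIM pilot rows as transcribed above: (CPU hours, connect hours, file pages).
PILOT_ROWS = {
    "Psychopharm. Advisor": (Decimal("25.63"), Decimal("537.73"), 773),
    "Organ Culture": (Decimal("24.35"), Decimal("449.21"), 924),
    "Commun. Enhancement": (Decimal("1.83"), Decimal("121.71"), 329),
    "KRL Demonstrations": (Decimal("2.53"), Decimal("54.06"), 388),
}

# The "AIM Pilot Totals" line as reported in the table.
REPORTED_TOTAL = (Decimal("54.34"), Decimal("1162.71"), 2414)


def column_sums(rows):
    """Sum each column separately; Decimal keeps the hour sums exact."""
    cpu = sum((r[0] for r in rows.values()), Decimal("0"))
    connect = sum((r[1] for r in rows.values()), Decimal("0"))
    pages = sum(r[2] for r in rows.values())
    return cpu, connect, pages


print(column_sums(PILOT_ROWS) == REPORTED_TOTAL)
```

The same check, run against the AIM user rows, does not reproduce the reported connect-time and file-space subtotals. This suggests that at least one of those entries was garbled in transcription.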

CPU CONNECT FILE SPACE
STANFORD COMMUNITY (Hours) (Hours) (Pages)

1) AI HANDBOOK PROJECT 80.69 1935.01 2021
Edward Feigenbaum, Ph.D.

2) DENDRAL PROJECT 1315.63 19639.31 21517
"Resource Related Research:
Computers and Chemistry"
Carl Djerassi, Ph.D.

3) AGE PROJECT 28.76 1022.46 1344
"Generalization
of AI Tools"
Edward Feigenbaum, Ph.D.

4) HYDROID PROJECT 39.65 1725.03 789
"Distributed Processing
and Problem Solving"
Gio Wiederhold, Ph.D.

5) MOLGEN PROJECT 384.31 6954.92 5730
"Experiment Planning System
for Molecular Genetics"
Edward Feigenbaum, Ph.D.
Laurence Kedes, M.D.
Douglas Lenat, Ph.D.
Nancy Martin, Ph.D.
U. New Mexico

6) MYCIN PROJECT 499.07 8384.56 8687
"Computer-based Consult.
in Clin. Therapeutics"
Bruce Buchanan, Ph.D.
Edward Shortliffe, M.D., Ph.D.

7) PROTEIN STRUCT MODELING 206.48 2958.98 4392
"Heuristic Comp. Applied
to Prot. Crystallog."
Edward Feigenbaum, Ph.D.

8) RX PROJECT (since 2/79) 7.57 608.94 312
Robert Blum, M.D.
Gio Wiederhold, Ph.D.

9) STANFORD PILOT PROJECTS
Genetics Applic. 104.50 -- 482
Quantum Chemistry 178.64 -- 810
Ultrasonic Imaging 5.32 -- 85
Miscellaneous .43 -- --
Stanford Pilot Totals 288.89 -- 1384

10) SU-ASSOCIATES 22.06 -- 1557
COMMUNITY TOTALS 2873.11 -- --

CPU CONNECT FILE SPACE
SUMEX STAFF (Hours) (Hours) (Pages)
1) Staff 953.68 28941.65 9028
2) MAINSAIL Development 446.39 9045.69 3804
3) Staff associates, misc. 65.62 2776.72 4503
COMMUNITY TOTALS 1465.69 40764.06 --

CPU CONNECT FILE SPACE
SYSTEM OPERATIONS (Hours) (Hours) (Pages)
1) Operations 1949.22 78944.64 81114
RESOURCE TOTALS 7353.69 187036.64 191512


2.2.4 Network Usage

The following plots show total terminal connect time per month for TYMNET
and ARPANET users since initial connection. No corresponding plot is presented
for the experimental TELENET connection because of frequent line configuration
changes during the connection period and the short period of active use.

[Plot: TYMNET Usage, connect hours per month (0-1200), by quarter, 1975 through 1979]

Figure 14. TYMNET Usage Data

[Plot: ARPANET Usage, connect hours per month (0-1200)]

Figure 15. ARPANET Usage Data

2.3 Resource Equipment Summary

A complete inventory of resource equipment is being submitted separately
along with the budget material.


2.4 Publications

The following are publications by the SUMEX staff, including papers
describing the SUMEX-AIM resource and ongoing research as well as documentation
of system and program developments. Publications for individual collaborating
projects are detailed in their respective reports (see Section 4 on page 64).

[1] Carhart, R.E., Johnson, S.M., Smith, D.H., Buchanan, B.G., Dromey, R.G., and
Lederberg, J., Networking and a Collaborative Research Community: A Case Study
Using the DENDRAL Programs, ACS Symposium Series, Number 19, Computer
Networking and Chemistry, Peter Lykos (Editor), 1975.

[2] Levinthal, E.C., Carhart, R.E., Johnson, S.M., and Lederberg, J., When
Computers Talk to Computers, Industrial Research, November 1975.

[3] Wilcox, C. R., MAINSAIL - A Machine-Independent Programming System,
Proceedings of the DEC Users Society, Vol. 2, No. 4, Spring 1976.

[4] Wilcox, Clark R., The MAINSAIL Project: Developing Tools for Software
Portability, Proceedings, Computer Application in Medical Care, October,
1977, pp. 76-83.

[5] Lederberg, J. L., Digital Communications and the Conduct of Science: The New
Literacy, Proc. IEEE, Vol. 66, No. 11, Nov 1978.

[6] Wilcox, C. R., Jirak, G. A., and Dageforde, M. L., MAINSAIL - An Approach to
Software Portability, in preparation.

[7] Rindfleisch, T. C., Feigenbaum, E. A., and Lederberg, J., SUMEX-AIM - A Model
for Resource Sharing and Scientific Collaboration, in preparation.

Mr. Clark Wilcox also chaired the session on "Languages for Portability" at
the DECUS DECsystem10 Spring '76 Symposium.

In addition, a substantial continuing effort has gone into developing,
upgrading, and extending documentation about the SUMEX-AIM resource, the SUMEX-
TENEX system, the many subsystems available to users, and MAINSAIL. These
efforts include a number of major documents (such as SOS, PUB, and TENEX-SAIL
manuals) as well as a much larger number of document upgrades, user information
and introductory notes, an ARPANET Resource Handbook entry, and policy
guidelines.


3 Resource Finances

3.1 Budget Information

The budget for the SUMEX project, detailing past actual costs, current year
status, and estimates for the next grant year, is submitted in a separate
document to the NIH.

3.2 Resource Funding

The SUMEX-AIM resource is essentially wholly funded by the Biotechnology
Resources Program (6). The various collaborator projects which use SUMEX are
independently funded with respect to their manpower and operating expenses. They
obtain from SUMEX, without charge, access to the computing and, in most cases,
communications facilities in exchange for their participation in the scientific
and community building goals of SUMEX.

(6) Except for participation by Stanford University in accordance with
general cost-sharing and for assistance to SUMEX from other projects with
overlapping aims and interests.


4 Collaborative Project Reports

The following subsections report on the collaborative use of the SUMEX
facility. Descriptions are included for the formally authorized projects within
the national AIM and Stanford aliquots and the various "pilot" efforts currently
under way. These project descriptions and comments are the result of a
solicitation for contributions sent to each of the project Principal
Investigators requesting the following information:

I. SUMMARY OF RESEARCH PROGRAM
A. Technical goals
B. Medical relevance and collaboration
C. Progress summary
D. List of relevant publications
E. Funding support status (see below for details)

II. INTERACTIONS WITH THE SUMEX-AIM RESOURCE
A. Collaborations and medical use of programs via SUMEX
B. Sharing and interactions with other SUMEX-AIM projects
(via workshops, resource facilities, personal contacts, etc.)
C. Critique of resource management
(community facilitation, computer services, capacity, etc.)

III. RESEARCH PLANS (8/79 - 7/81)
A. Long range project goals and plans
B. Justification and requirements for continued SUMEX use
[This section will be of special importance to the Advisory
Committee and is your application for continued access.]
C. Your needs and plans for other computational resources, beyond
SUMEX/AIM
D. Recommendations for future community and resource development

We believe that the reports of the individual projects speak for themselves as
rationales for participation; in any case the reports are recorded as submitted
and are the responsibility of the indicated project leaders.


4.1 National AIM Projects
The following group of projects is formally approved for access to the AIM
aliquot of the SUMEX-AIM resource. Their access is based on review by the AIM
Advisory Group and approval by the AIM Executive Committee.


4.1.1 Acquisition of Cognitive Procedures (ACT)

Acquisition of Cognitive Procedures (ACT)

Dr. John Anderson
Carnegie-Mellon University
Pittsburgh, Pennsylvania

I. Summary of Research Program
A. Technical goals:

To develop a production system that will serve as an interpreter of the
active portion of an associative network. To model a range of cognitive tasks
including memory tasks, inferential reasoning, language processing, and problem
solving. To develop an induction system capable of acquiring cognitive
procedures with a special emphasis on language acquisition.

B. Medical relevance and collaboration:

1. The ACT model is a general model of cognition. It provides a useful
model of the development and performance of the kinds of decision
making that occur in medicine.

2. The ACT model also represents basic work in AI. It is in part an
attempt to develop a self-organizing intelligent system. As such it
is relevant to the goal of development of intelligent artificial aids
in medicine.

We have been evolving a collaborative relationship with James Greeno and
Alan Lesgold at the University of Pittsburgh. They are applying ACT to modeling
the acquisition of reading and problem solving skills. We have made ACT a guest
system within SUMEX. ACT is currently at the state where it can be shipped to
other INTERLISP facilities. We have received a number of inquiries about the ACT
system. ACT is a system in a continual state of development but we periodically
freeze versions of ACT which we maintain and make available to the national AI
community.

C. Progress and accomplishments:

ACT provides a uniform set of theoretical mechanisms to model such aspects
of human cognition as memory, inferential processes, language processing, and
problem solving. ACT's knowledge base consists of two components, a
propositional component and a procedural component. The propositional component
is provided by an associative network encoding a set of facts known about the
world. This provides the system's semantic memory. The procedural component
consists of a set of productions which operate on the associative network. ACT's
production system is considerably different from many of the other currently
available systems (e.g., Newell's PSG). These differences have been introduced
in order to create a system that will operate on an associative network and in
order to accurately model certain aspects of human cognition.


A small portion of the semantic network is active at any point in time.
Productions can only inspect that portion of the network which is active at that
time. This restriction to the active portion of the network provides a means to
focus the ACT system in a large data base of facts. Activation can spread down
network paths from active nodes to activate new nodes and links. To prevent
activation from growing continuously there is a dampening process which
periodically deactivates all but a select few nodes. The condition of a
production specifies that certain features be true of the active portion of the
network. The action of a production specifies that certain changes be made to
the network. Each production can be conceived of as an independent “demon.” Its
purpose is to see if the network configuration specified in its condition is
satisfied in the active portion of memory. If it is, the production will execute
and cause changes to memory. In so doing it can allow or disallow other
productions which are looking for their conditions to be satisfied. Both the
spread of activation and the selection of productions are parallel processes
whose rates are controlled by "strengths" of network links and individual
productions. An important aspect of this parallelism is that it is possible for
multiple productions to be applied in a cycle. Much of the early work on the ACT
system was focused on developing computational devices to reflect the operation
of parallel, strength-controlled processes and working out the logic for creating
functioning systems in such a computational medium.
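
The activation and matching cycle described above can be illustrated with a small toy sketch. It is not ACT's implementation, which was written in INTERLISP. The class and function names (Network, Production, cycle), the activation threshold, and the use of "keep the k most active nodes" as the dampening step are all assumptions made for illustration.

In the sketch, link strengths scale how much activation passes from node to node. Every production whose condition is met by the active nodes fires in the same cycle. Production strength only sets the activation its action nodes receive, which is a simplification of ACT's strength-controlled selection.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    # Hypothetical representation: node -> {neighbor: link strength in (0, 1]}.
    links: dict
    # Currently active nodes and their activation levels.
    activation: dict = field(default_factory=dict)

    def spread(self, threshold=0.1):
        """One step of spreading activation along weighted links."""
        incoming = {}
        for node, level in self.activation.items():
            for nbr, strength in self.links.get(node, {}).items():
                incoming[nbr] = incoming.get(nbr, 0.0) + level * strength
        for nbr, amount in incoming.items():
            if amount >= threshold:
                self.activation[nbr] = max(self.activation.get(nbr, 0.0), amount)

    def dampen(self, keep):
        """Deactivate all but the `keep` most active nodes."""
        top = sorted(self.activation, key=self.activation.get, reverse=True)[:keep]
        self.activation = {n: self.activation[n] for n in top}


@dataclass
class Production:
    name: str
    condition: frozenset   # nodes that must all be active
    action: frozenset      # nodes the production activates
    strength: float = 1.0


def cycle(net, productions, keep=3):
    """Spread, fire every satisfied production in parallel, then dampen."""
    net.spread()
    active = set(net.activation)
    fired = [p for p in productions if p.condition <= active]
    for p in fired:
        for node in p.action:
            net.activation[node] = max(net.activation.get(node, 0.0), p.strength)
    net.dampen(keep)
    return [p.name for p in fired]


if __name__ == "__main__":
    net = Network(links={"dog": {"animal": 0.8, "bark": 0.5}}, activation={"dog": 1.0})
    rules = [
        Production("P1", frozenset({"dog", "animal"}), frozenset({"pet"}), 0.9),
        Production("P2", frozenset({"bark"}), frozenset({"noise"}), 0.7),
        Production("P3", frozenset({"cat"}), frozenset({"meow"})),
    ]
    print(cycle(net, rules))        # ['P1', 'P2']
    print(sorted(net.activation))   # ['animal', 'dog', 'pet']
```

In the usage example, two productions fire in the same cycle. Dampening then leaves only the three most active nodes, so the "bark" and "noise" nodes fall out of the active portion of memory.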

We have successfully implemented a number of small-scale systems that model
various psychological tasks in the domain of memory, language processing, and
inferential reasoning. There was a larger scale project to model the language
processing mechanisms of a young child. This includes implementation of a
production system to analyze linguistic input, make inferences, ask and answer
questions, etc.

The current research is focused on developing mechanisms for the
acquisition of skills. In the framework of the ACT system this maps into
acquiring new productions and modifying old productions. We have developed
learning devices to enable existing productions to create new productions, to
adjust the strengths of existing productions, to produce more general variants of
existing productions, to produce more discriminant variants of existing
productions, and to combine a number of existing productions into a single
compact production. We have developed the F version of the ACT system which has
these learning facilities. We have so far tested out the system in a number of
small learning examples. Current goals involve applying the system to the
acquisition of language skills, development of mathematical problem solving
skills, and acquisition of initial programming skills.
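
Of the learning devices listed above, combining existing productions into a single compact production (composition) is the easiest to show in isolation. The sketch below assumes productions that only test for and add ground facts; real ACT productions match network patterns with variables. Rule, compose, and the example facts are hypothetical names chosen for illustration.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    """A production reduced to sets of facts: test `condition`, then assert `adds`."""
    condition: FrozenSet[str]
    adds: FrozenSet[str]


def compose(first: Rule, second: Rule) -> Rule:
    """Collapse `first` followed by `second` into a single production.

    Facts in `second.condition` that `first` itself adds no longer need to be
    tested, so they drop out of the composed condition.
    """
    condition = first.condition | (second.condition - first.adds)
    return Rule(frozenset(condition), first.adds | second.adds)


# Hypothetical two-step pair: compute a column sum, then write it down.
p1 = Rule(frozenset({"goal:add", "digit-a", "digit-b"}), frozenset({"sum"}))
p2 = Rule(frozenset({"goal:add", "sum"}), frozenset({"write-sum"}))
print(compose(p1, p2))
```

The composed rule tests only what the pair needed from the outside world and performs both actions in one step. This is the speed-up that composition is meant to capture.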

The basic insight in this research is to model skill acquisition as an
interaction between deliberate learning and automatic induction. To the extent
that the teacher or the learner is able to understand the skill to be acquired,
it is possible for ACT to directly create the necessary productions. However, as
a fallback for less structured situations, ACT has automatic induction mechanisms
that try to develop the necessary mechanisms by an intelligent trial-and-error
inductive process. Much of our research has gone to identifying the heuristics
used by this inductive process. Traditionally, there has been a contrast in
psychology between learning with understanding and learning by trial and error.
It is now clear to us that most real learning situations involve a mixture and
the key to understanding skill acquisition is to understand that mixture.


One major project is the investigation of the learning of skills in
geometry. We have written several versions of a program that provides reasons,
i.e., postulate names, for worked-out proofs. A number of new mechanisms were
developed for this program. For instance, we developed a semantic net
representation of the goal tree for problem solving. We also developed ways for
the program to automatically shift from a serial search to a parallel search for
relevant postulates. There were also several applications of ACT’s general
learning mechanisms to learn and speed up the use of postulates.

D. Current list of project publications:

[1] Anderson, J.R. Language, Memory, and Thought. Hillsdale, N.J.: L. Erlbaum
Assoc., 1976.

[2] Kline, P.J. & Anderson, J.R. The ACTE User's Manual, 1976.

[3] Anderson, J.R., Kline, P. & Lewis, C. Language processing by production
systems. In P. Carpenter and M. Just (Eds.). Cognitive Processes in
Comprehension. L. Erlbaum Assoc., 1977.

[4] Anderson, J.R. Induction of augmented transition networks. Cognitive
Science, 1977, 125-157.

[5] Anderson, J.R. & Kline, P. Design of a production system. Paper presented
at the Workshop on Pattern-Directed Inference Systems, Hawaii, May 23-27,
1977.

[6] Anderson, J.R. Computer simulation of a language acquisition system: A
second report. In D. LaBerge and S.J. Samuels (Eds.). Perception and
Comprehension. Hillsdale, N.J.: L. Erlbaum Assoc., 1978.

[7] Anderson, J.R., Kline, P.J., & Beasley, C.M. A theory of the acquisition of
cognitive skills. In G.H. Bower (Ed.). Learning and Motivation, Vol. 13.
New York: Academic Press, 1979.

[8] Anderson, J.R., Kline, P.J., & Beasley, C.M. Complex Learning. In R. Snow,
P.A. Federico, & W. Montague (Eds.). Aptitude, Learning, and Instruction:
Cognitive Process Analyses. Hillsdale, N.J.: Lawrence Erlbaum Assoc.,
1979.

II. Interaction With the SUMEX-AIM Resource
A. & B. Collaborations, interactions, and sharing of programs via SUMEX.

We have received and answered many inquiries about the ACT system over the
ARPANET. This involves sending documentation, papers, and copies of programs.


The most extensive collaboration has been with Greeno and Lesgold who are also on
SUMEX (see the report of the Simulation of Comprehension Processes project).
There is an ongoing effort to assist them in their research. Feedback from their
work is helping us with system design.

We find the SUMEX-AIM workshops ideal vehicles for updating ourselves on
the field and for getting to talk to colleagues about aspects of their work of
importance to us.

Due to memory space problems encountered by ACT (see section III.A.1) we
expect that soon we will need to make use of the smaller version of INTERLISP
developed at SUMEX for use in the CONGEN program.

C. Critique of resource management.

The SUMEX-AIM resource has been well suited for the needs of our project.
We have made the most extensive use of the INTERLISP facilities and the
facilities for communication on the ARPANET. We have found the SUMEX personnel
extremely helpful both in terms of responding to our immediate emergencies and in
providing advice helpful to the long-range progress of the project. Despite the
fact that we are not located at Stanford, we have not encountered any serious
difficulties in using the SUMEX system; in fact, there are real advantages in
being in the Eastern time zone where we can take advantage of the low load on the
system during the morning hours. We have been able to get a great deal of work
done during these hours and try to save our computer-intensive work for this
time.

Two location changes by the ACT project (from Michigan to Yale in the
summer of 1976 and from Yale to Carnegie-Mellon in the summer of 1978) have
demonstrated another advantage of working on SUMEX: In both cases we were back to
work on SUMEX the day after our arrival.

III. Research Plans (8/79-7/81)
A. Long-range user project goals and plans:

Our long-range goals are: (1) Continued development of the ACT system; (2)
Application of the system to modeling of various cognitive processes; (3)
Dissemination of the ACT system to the national AI community.

1. System Development. Efficiency problems are the most serious ones currently
facing the ACT system. Even the modest-size simulations of learning we
have done (about 100 productions) run out of space in INTERLISP after 200
cycles and each cycle may take almost a minute of real-time during periods
of moderate system load. We are developing the capability to represent
productions as compiled LISP code which should significantly improve the
speed of the system and, perhaps even more important, should alleviate
space problems because of INTERLISP's ability to overlay compiled code.

We also hope to implement ACT in the smaller versions of INTERLISP that
have been developed at SUMEX.


2. Application to Modeling Cognitive Processes. We anticipate a gradual
decrease in the amount of effort that will go into system development and
an increase in the amount of effort that will go into application of the
system for modeling. We mentioned above the modeling efforts that we are
using to assess the suitability of the ACT F system. We have long-range
commitments to apply the ACT learning model to the following three topics:
acquisition of language (both first and second language acquisition);
acquisition of programming skills; and acquisition of problem solving skills
in the domain of geometry. We find each of these topics to be of
considerable interest in and of itself, but each will also serve as a
strong test of the learning model. We are hopeful that the systems that
are acquired by ACT will satisfy computational standards of good
artificial intelligence. Therefore, in future years we would also be
interested in applying the ACT model to acquisition of cognitive skills in
medically related domains such as diagnosis or scientific inference.

SUMEX would be an ideal location for collaboration on such a project.

We are also designing a system that will learn to give reasons for proofs.
It will have the ability to use existing knowledge about such things as
iteration, to accept instructions from a textbook, and to automatically
become more efficient as it works on proofs. One learning mechanism we
are very interested in is composition, a more general version of the
transitive rule of inference used to combine productions. It promises to
be interesting in its ability to change goal trees while problem solving.
We will investigate it further.

3. Dissemination of the ACT Project. Although a guest version of ACT has been
implemented, a user manual will have to be completed for this version
before it is truly accessible to guests. A manual for the E version of
ACT has existed for some time, but a manual for the F (learning) version of
ACT is currently in preparation.

B. Justification for continued use of SUMEX:

Our goal for the ACT system is that it should serve as a ready-made
"programming language" available to members of the cognitive science community
for assembling psychologically-accurate simulations of a wide range of cognitive
processes. Our intention and ability to provide such a resource justifies our
use of the SUMEX facility. This facility is designed expressly for the purpose
of developing and supporting such national AI resources and is, in this regard,
clearly superior to the (otherwise outstanding) facilities we have available
locally from the Carnegie-Mellon computer science department. Among the most
important SUMEX advantages are the availability of INTERLISP on a machine
accessible by either the ARPANET or TYMNET and the existence of a GUEST login.
It appears that, at least for the time being, ACT has no hope of being a national
resource unless it resides at SUMEX and, given the local unavailability of a
network-accessible INTERLISP, it would even be very difficult to shift any
significant portion of our development work from SUMEX to CMU.

C. Needs and plans for other computational resources

Carnegie-Mellon's plans to begin upgrading its PDP-10 hardware to emerging
state-of-the-art machines (VAX, LISP machines, etc.) promise to provide an
excellent resource eventually, and we hope to have access to that resource as it
develops. However, given that a considerable amount of software development will
be required, a sophisticated LISP system such as INTERLISP is not likely to be
available on this hardware in the near future.

D. Comments and suggestions for future resource goals:

We would, of course, be delighted if the computational capacity of the
SUMEX facility could be increased. The slowness of the system at peak hours is a
limiting factor, although it is not grievous. This problem is perhaps less
grievous for us than for Stanford-based users because of our ability to use morning
hours. We do not feel any urgent need for development of new software.


4.1.2 Chemical Synthesis Project (SECS)

SECS - Simulation and Evaluation of Chemical Synthesis

Principal Investigator: W. Todd Wipke
Board of Studies in Chemistry
University of California at Santa Cruz

Coworkers: (Postdoctoral Fellows) S. Krishnan, C. Buse, and M. Huber
(Graduate Students) G. Ouchi and D. Dolata
(Programmers) T. Blume, M. Toy, and M. Case

I. SUMMARY OF RESEARCH PROGRAM
A. Technical Goals.

The long range goal of this project is to develop the logical principles of
molecular construction and to use these in developing practical computer programs
to assist investigators in designing stereospecific syntheses of complex bio-
organic molecules. Our specific goals this past year focused on basic research
into representation of strategies, incorporation of automatic processing of
functional group interchange, and preparing a robust version of SECS for updating
the ADP network copy and prerelease to NIH and other collaborators.

B. Medical Relevance and Collaboration.

The development of new drugs and the study of how drug structure is related
to biological activity depends upon the chemist's ability to synthesize new
molecules as well as his ability to modify existing structures, e.g.,
incorporating isotopic labels or other substituents into biomolecular substrates.
The Simulation and Evaluation of Chemical Synthesis (SECS) project aims at
assisting the synthetic chemist in designing stereospecific syntheses of
biologically important molecules. The advantages of this computer approach over
normal manual approaches are many: 1) greater speed in designing a synthesis; 2)
freedom from bias of past experience and past solutions; 3) thorough
consideration of all possible syntheses using a more extensive library of
chemical reactions than any individual person can remember; 4) greater capability
of the computer to deal with the many structures which result; and 5) capability
of computer to see molecules in graph theoretical sense, free from bias of 2-D
projection.

The objective of using SECS in metabolism is to predict the plausible
metabolites of a given xenobiotic in order that they may be analyzed for possible
carcinogenicity. Metabolism research may also find this useful in the
identification of metabolites in that it suggests what to look for. Finally, it
seems there may even be application of this technique in problem domains where
one wishes to alter molecules so certain types of metabolism will be blocked.


C. Progress and Accomplishments.

Research Environment: At the University of California, Santa Cruz, we have
a GT40 and a GT46 graphics terminal connected to the SUMEX-AIM resource by 1200
baud leased lines (one leased line supported by SUMEX). We also have a TI 725,
TI 745, CDI-1030, DIABLO 1620, and an ADM-3A terminal used over leased lines to
SUMEX. UCSC has only a small IBM 370/145, a PDP-11/45 and 11/70 (the latter are
limited to small student time-sharing jobs of 12 K words per user), all of which
are unsuitable for this research. The SECS laboratory is located in the same
building as the synthetic chemists at Santa Cruz so there is very facile
interaction.

THE SECS PROGRAM is a large interactive program. On SUMEX it occupies
about 150K words if not overlayed and about 68K when overlayed. SECS is
generally used from a GT4X terminal, but can with less convenience be used from a
teletype. In the former case, the chemist draws in the target molecule to be
synthesized using the light-pen. The basic sequence then is that the program
analyzes the structure for rings, functional groups, stereochemistry, etc.,
builds a three-dimensional model, and if appropriate also a Huckel Molecular
Orbital model of the pi-systems, and finally on the basis of this knowledge,
selects from a library of chemical transforms those reactions which could be used
in the last step of the synthesis of this target. First the program reviews the
generated precursors to see that they do not violate simple chemical rules of
valence and stability, then the chemist reviews the precursors to delete those
that seem uninteresting, and to select one for further processing in the same way
the original target structure was processed.
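
The generate-screen-select loop described above can be outlined in a short schematic. The outline assumes that precursors can be produced by independent transform functions. It omits the structural analysis stage (rings, functional groups, 3-D and Huckel models), uses strings as stand-ins for molecules, and names its pieces hypothetically (Node, expand, is_valid, chemist_keeps); none of these are parts of SECS. Continuing the synthesis corresponds to calling expand again on whichever child the chemist selects.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-ins: a structure is any string, and a transform maps a
# target structure to zero or more candidate precursors.
Transform = Callable[[str], Iterable[str]]


@dataclass
class Node:
    structure: str
    transform: str = ""                       # transform that produced this node
    children: List["Node"] = field(default_factory=list)


def expand(node: Node,
           transforms: Dict[str, Transform],
           is_valid: Callable[[str], bool],
           chemist_keeps: Callable[[Node], bool]) -> List[Node]:
    """Grow one level of the synthesis tree below `node`."""
    for name, apply in transforms.items():
        for precursor in apply(node.structure):
            if not is_valid(precursor):       # automatic valence/stability screen
                continue
            child = Node(precursor, name)
            if chemist_keeps(child):          # interactive review by the chemist
                node.children.append(child)
    return node.children


target = Node("ABCD")
kept = expand(
    target,
    {"split": lambda s: [s[:2], s[2:]], "bad": lambda s: ["X"]},
    is_valid=lambda s: "X" not in s,
    chemist_keeps=lambda n: len(n.structure) > 1,
)
print([n.structure for n in kept])  # ['AB', 'CD']
```

Separating the automatic screen from the chemist's review mirrors the order described in the text. Cheap rule checks discard impossible precursors before a person has to look at them.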

Bug Fixes, Additions and Modifications: In the past year considerable
effort has been devoted to the elimination of bugs and improvement of human
engineering features. All bugs which had been found by us or reported by other
users have been corrected. By deliberately requesting SECS to perform
contradictory or ambiguous tasks, several additional bugs were uncovered and
fixed. The addition of some simple routines to handle input has made it
virtually impossible for the user to crash the program by giving it incorrect
input. The overall result is that SECS 2.7 is by far the most robust version of
the program ever produced and is the pre-release version being made available to
those who request it.

SECS Users Manual: The previous SECS Users Manual (version 2.0) has been
completely rewritten to include the extensive additions and modifications made
since the release of version 2.0. The manual provides not only operating
instructions but also background information and examples showing users how
best to use SECS 2.7.


Hardcopy of the Synthesis Tree: A user can now specify structures in the
synthesis tree to be plotted. The selection can be by individual structure, by
the lineage of a structure, or by conditions such as all structures with a
priority value greater than 60 or that have been rated "GOOD". A separate
program then drives a local Zeta plotter to plot the synthesis tree with
structures, transform names and priorities, in a tree format specified by the
user. Trees containing thousands of structures can be plotted; the plot is
simply generated in strips that are later pasted together. This makes it easy
to send a chemist a permanent record of the synthesis tree that can be mounted
on his wall and provide guidance to his ongoing experimental project.
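
A minimal sketch of the three selection options and the strip layout follows.
It assumes a hypothetical `Node` record (identifier, priority, rating, parent)
for each structure in the tree and treats the plotted tree as a list of layout
columns cut into fixed-width strips; only the threshold of 60 and the "GOOD"
rating come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    ident: int
    priority: int
    rating: str = ""                 # e.g. "GOOD", as assigned by the chemist
    parent: Optional["Node"] = None  # None for the original target


def lineage(node: Optional[Node]) -> List[Node]:
    """The node plus every ancestor back to the target."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def select_by_condition(nodes: List[Node], min_priority: int = 60) -> List[Node]:
    """Condition-based selection: priority above 60 or rated "GOOD"."""
    return [n for n in nodes if n.priority > min_priority or n.rating == "GOOD"]


def strips(columns: list, width: int) -> List[list]:
    """Cut a wide layout into fixed-width strips to be pasted together later."""
    return [columns[i:i + width] for i in range(0, len(columns), width)]


root = Node(0, 100)
a = Node(1, 70, parent=root)
b = Node(2, 40, rating="GOOD", parent=a)
c = Node(3, 20, parent=root)
print([n.ident for n in select_by_condition([root, a, b, c])])  # [0, 1, 2]
print([n.ident for n in lineage(b)])                            # [2, 1, 0]
print(strips(list(range(7)), 3))                                # [[0, 1, 2], [3, 4, 5], [6]]
```
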

ALCHEM Library: We received a number of transforms that had originally been
written by the SECS group and subsequently modified by chemists at Merck. Most
of these transforms are major improvements. However, some transforms,
particularly those involving bond migrations, had been modified in such a way
that chemically reasonable transformations could be suppressed for purely
strategic reasons. Our philosophy has always been to keep chemical and strategic
considerations separate. The Merck-modified transforms have been included in our
chemistry library. Our current focus is on strategic control, but we are
correcting ALCHEM transform errors as they appear. We hope that as SECS is used
at more sites, we will receive additional contributions to our current library
of approximately 400 transforms.

Strategic Control: In the early days of computer synthesis, the major
problems were in representing reactions so that the computer could carry them
out correctly. The problem has now shifted to how to guide the program
efficiently toward pathways that are not only chemically plausible but also
synthetically significant. We refer to this guiding as strategic control.
Without strategic control, SECS applies all reactions that "fit" the target,
which generates one level of the synthesis tree. Although in theory the chemist
could select appropriate precursors and still find many good syntheses, in
practice so many precursors are generated that it is difficult to pick out the
"good" ones, it is difficult to foresee where a given precursor might ultimately
lead, and the process is so tiring that one does not explore the synthesis tree
as completely as one should. Feedback from users of SECS indicates that they too
recognize strategic control as a major and urgent need for this research.

The problem is to control the program without introducing unnecessary bias,
since freedom from bias is the computer's advantage over manual analysis. We
have developed a philosophy and an implementation which we feel may solve this
problem. We define a strategy as a general principle that helps guide one in
generating a simple synthesis. Strategies are based on symmetry, mathematical
considerations of yield, economy of operations, etc. We prevent strategies from
being based on any particular reaction. When a strategy is applied to a
particular synthetic target molecule, it generates goals. Goals are described
only in terms of molecular structural changes or features and may not, for
example, refer to reactions. Thus strategies create goals, and both are
completely independent of the reaction library.
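
The separation between strategies, goals and the reaction library can be
sketched as follows. The `Goal` record, the `break_bond` vocabulary and the
`symmetric_bonds` input are hypothetical; the point is only that a strategy
function receives the molecule and never the transform library, and that goals
are matched against transforms afterwards.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Goal:
    # A goal names only a structural change; it never mentions a reaction.
    change: str    # hypothetical vocabulary, e.g. "break_bond"
    where: str     # label of a bond or atom in the target


def symmetry_strategy(molecule: dict) -> List[Goal]:
    """Hypothetical strategy: prefer breaking bonds that lie on a symmetry element.

    It receives the molecule only, never the transform library.
    """
    return [Goal("break_bond", bond) for bond in molecule.get("symmetric_bonds", [])]


def is_relevant(bonds_broken_by_transform: Iterable[str], goals: List[Goal]) -> bool:
    """Goals are matched to transforms only after the goals have been generated."""
    wanted = {g.where for g in goals if g.change == "break_bond"}
    return bool(wanted & set(bonds_broken_by_transform))


goals = symmetry_strategy({"symmetric_bonds": ["b3", "b7"]})
print(goals)
print(is_relevant(["b7"], goals), is_relevant(["b1"], goals))  # True False
```
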

Our list-structured language continues to evolve as the need for new
expressions arises. We have generalized its structure to allow any number of
machine-generated goals, and we have improved the human interface to the goals
by preventing accidental recursive goals and providing extensive help and
explanation of how to create and modify goals. Much of our effort has been
directed toward creating goals, both to save the chemist time and to ensure that
good goals are not accidentally overlooked.
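
One way to prevent accidental recursive goals is a depth-first cycle check over
the goal definitions. This sketch assumes, hypothetically, that each goal is
defined by a list of the other goals it is built from; the actual goal language
is not described in this section.

```python
from typing import Dict, List


def has_recursive_goal(definitions: Dict[str, List[str]]) -> bool:
    """True if some goal refers back to itself, directly or through other goals.

    `definitions` maps each goal name to the goals it is built from (a
    hypothetical stand-in for the list-structured goal language).
    """
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state: Dict[str, int] = {}

    def visit(name: str) -> bool:
        state[name] = ACTIVE
        for child in definitions.get(name, []):
            s = state.get(child, UNSEEN)
            if s == ACTIVE:              # back edge: the goal reaches itself
                return True
            if s == UNSEEN and visit(child):
                return True
        state[name] = DONE
        return False

    return any(state.get(n, UNSEEN) == UNSEEN and visit(n) for n in definitions)


print(has_recursive_goal({"A": ["B"], "B": ["C"], "C": []}))  # False
print(has_recursive_goal({"A": ["B"], "B": ["A"]}))           # True
```
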

The following paragraphs describe some of the current strategy work.

Subgoals. When a chemical transform has a high priority and seems able to
satisfy a goal on the goal list, the transform is "relevant", but it still may
not be "applicable" owing to some mismatch between what the transform requires
and what the operand structure has. This mismatch can spawn a SUBGOAL to change
the structure until the transform is applicable. The first use of subgoals in
SECS is for automatic functional group interchange (FGI).

The new subgoals have been expanded to carry enough information to allow the
program to continue from the point where a structural mismatch forced the
initial halt. After the subgoal has been satisfied and the FGI intermediate has
been created, SECS returns to the originating transform and proceeds with its
application. After this has been done for all subgoal-created intermediates,
SECS presents the chemist with the resulting multi-step tree.
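
The subgoal mechanism can be sketched for the single-interchange case. The toy
structure is a set of functional-group labels, and the `fgi_table` mapping
(present group, required group) to an interchange step is hypothetical; the
sketch only shows the control flow: try the relevant transform, spawn an FGI
subgoal on a mismatch, then return to the originating transform.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Transform:
    name: str
    needs: str   # functional group the transform requires in the operand


def apply_with_subgoal(structure: FrozenSet[str], transform: Transform,
                       fgi_table: Dict[Tuple[str, str], str]) -> Optional[List[str]]:
    """Return the steps produced for one relevant transform, or None."""
    if transform.needs in structure:
        return [transform.name]              # applicable as it stands
    # Mismatch: spawn the subgoal "make transform.needs present" via FGI.
    for present in sorted(structure):
        fgi_step = fgi_table.get((present, transform.needs))
        if fgi_step is not None:
            # Subgoal satisfied: the FGI intermediate exists, so return to the
            # originating transform and apply it to that intermediate.
            return [fgi_step, transform.name]
    return None                              # subgoal could not be satisfied


table = {("group_A", "group_B"): "FGI_A_to_B"}   # hypothetical interchange
t = Transform("T1", needs="group_B")
print(apply_with_subgoal(frozenset({"group_B"}), t, table))  # ['T1']
print(apply_with_subgoal(frozenset({"group_A"}), t, table))  # ['FGI_A_to_B', 'T1']
print(apply_with_subgoal(frozenset({"group_C"}), t, table))  # None
```
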

On complex molecules with a large number of functional groups, many subgoals
are created, even when duplicates are prevented. This caused problems due to
storage limitations. The problem has been partially solved by enabling SECS to
estimate the likelihood of success of the subgoal-originating transform before
the subgoal is generated or applied. This not only saves space by preventing the
creation of subgoals whose originating transform will predictably fail, but also
saves CPU time by eliminating attempts to satisfy these fruitless subgoals. In
test cases, 50% to 75% of the originating transforms could be shown to
predictably fail, saving a corresponding share of space and time.
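
The pre-screening step amounts to calling a cheap success estimate before any
subgoal is built. In this sketch `predict_success` and `make_subgoal` are
hypothetical callables; the returned fraction is the share of candidate
originating transforms skipped, the quantity the text reports as 50% to 75% in
its test cases. The inputs below are illustrative only.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
S = TypeVar("S")


def generate_subgoals(transforms: List[T],
                      predict_success: Callable[[T], bool],
                      make_subgoal: Callable[[T], S]) -> Tuple[List[S], float]:
    """Build subgoals only for originating transforms predicted to succeed.

    Returns the subgoals and the fraction of transforms skipped.
    """
    subgoals: List[S] = []
    skipped = 0
    for t in transforms:
        if not predict_success(t):
            skipped += 1          # no storage or CPU spent on this subgoal
            continue
        subgoals.append(make_subgoal(t))
    fraction_skipped = skipped / len(transforms) if transforms else 0.0
    return subgoals, fraction_skipped


calls = []


def make_subgoal(t: str) -> str:
    calls.append(t)               # record which subgoals were actually built
    return f"subgoal for {t}"


subgoals, skipped = generate_subgoals(
    ["T1", "T2", "T3", "T4"],
    predict_success=lambda t: t in {"T2", "T4"},   # hypothetical estimator
    make_subgoal=make_subgoal,
)
print(subgoals, skipped)   # ['subgoal for T2', 'subgoal for T4'] 0.5
```
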

Since this process involves evaluating transforms in an uncertain
environment, not all failures can be predicted. Approximately 10% of the
subgoals created still lead to "useless" intermediates. However, none of the
eliminated subgoals would have led to "fruitful" intermediates, so the process
is quite acceptable.

A Functional Group Oriented Strategy. Another machine-generated strategy,
based on the functional groups present in the target molecule, has been
implemented in the SECS program. In its present form, transforms that utilize
functional groups regarded as sensitive are favored over those that do not. The
effect is to focus the attention of the program on one part of the molecule
until the sensitive functional group(s) are removed or altered, or until that
part of the molecule is removed completely. At present, three levels of
functional group sensitivity have been defined for this purpose: very sensitive,
sensitive and not sensitive. The classification of a particular functional group
depends on its sensitivity toward a range of reaction conditions and its
"protectability".

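A minimal sketch of favoring transforms that touch sensitive groups, using the
three levels named above. The group names in `SENSITIVITY`, the
(name, priority, groups used) tuples, and the tie-break on priority are
assumptions for illustration only.

```python
from enum import IntEnum
from typing import List, Sequence, Tuple


class Sensitivity(IntEnum):
    NOT_SENSITIVE = 0
    SENSITIVE = 1
    VERY_SENSITIVE = 2


# Hypothetical classification table; the report says the real classification
# depends on reactivity toward a range of conditions and on "protectability".
SENSITIVITY = {"group_A": Sensitivity.VERY_SENSITIVE, "group_B": Sensitivity.SENSITIVE}


def sensitivity_of(groups_used: Sequence[str]) -> Sensitivity:
    return max((SENSITIVITY.get(g, Sensitivity.NOT_SENSITIVE) for g in groups_used),
               default=Sensitivity.NOT_SENSITIVE)


def order_transforms(transforms: List[Tuple[str, int, Sequence[str]]]) -> List[str]:
    """Favor transforms touching the most sensitive group; break ties by priority."""
    ranked = sorted(transforms, key=lambda t: (sensitivity_of(t[2]), t[1]), reverse=True)
    return [name for name, _, _ in ranked]


print(order_transforms([
    ("T1", 80, ["group_C"]),
    ("T2", 50, ["group_A"]),
    ("T3", 60, ["group_B"]),
    ("T4", 70, ["group_B"]),
]))  # ['T2', 'T4', 'T3', 'T1']
```
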
Similarity. We have previously reported the development of an algorithm for
determining the degree of similarity, s, between two chemical structures.
Although that algorithm was mathematically satisfying in that s = 1.0 only when
the two structures are identical, it was time-consuming to calculate. We have
now developed a second algorithm, which is more empirical but very rapidly
computed. This second algorithm has been compared with the first on many
examples and found to be quite good at detecting when two structures are
synthetically similar. Both algorithms take into account atom types, bond types,
stereochemistry, functional groups, rings, etc. Papers describing these
functions are in draft form and will soon be submitted for publication.
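
Neither similarity formula is given here. As a hedged stand-in, the sketch below
uses a weighted Jaccard (Ruzicka) coefficient over counted features of the kinds
the text lists (atom types, bond types, functional groups, rings,
stereochemistry). Unlike the first SECS algorithm, it returns 1.0 for any two
structures with identical feature counts, not only for identical structures,
which makes it closer in spirit to a fast empirical measure. The feature
dictionaries are hypothetical, heavy-atom-only descriptions.

```python
from collections import Counter
from typing import Dict, Iterable


def features(structure: Dict[str, Iterable[str]]) -> Counter:
    """Count typed features (atom types, bond types, groups, rings, stereo)."""
    counts: Counter = Counter()
    for kind in ("atoms", "bonds", "groups", "rings", "stereo"):
        counts.update(f"{kind}:{x}" for x in structure.get(kind, ()))
    return counts


def similarity(a: Dict[str, Iterable[str]], b: Dict[str, Iterable[str]]) -> float:
    """Weighted Jaccard: sum of per-feature minima over sum of maxima."""
    fa, fb = features(a), features(b)
    union = sum((fa | fb).values())
    if union == 0:
        return 1.0                      # two empty descriptions
    return sum((fa & fb).values()) / union


acetone = {"atoms": ["C", "C", "C", "O"], "bonds": ["C-C", "C-C", "C=O"], "groups": ["ketone"]}
isopropanol = {"atoms": ["C", "C", "C", "O"], "bonds": ["C-C", "C-C", "C-O"], "groups": ["alcohol"]}
print(similarity(acetone, acetone))      # 1.0
print(similarity(acetone, isopropanol))  # 0.6
```
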

Currently the similarity module requires a special version of SECS. We plan
in the next year to incorporate this module into the standard version of SECS
so that the bonds that, if broken, could lead to identical or similar