DETAILED PROGRESS REPORT Section 1.3.2.5

1.3.2.5 SYSTEM RELIABILITY AND BACKUP

System reliability has remained high over the past years; excellent under
stable hardware and software conditions and degrading temporarily during
debugging and development periods and during periods of difficult hardware
problems. In general we take the system down for approximately 50 hours per
month for scheduled hardware maintenance, file backup, and other maintenance. In
addition we average from 10 to 15 hours per month in unscheduled downtime.

During periods of particularly severe hardware or software difficulty we must
absorb substantially more downtime.
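
As a rough illustration of what these figures imply, the sketch below converts the quoted downtime into a monthly availability fraction. The 720-hour month (30 days) is an assumption made for the arithmetic and is not a figure from this report.

```python
# Rough availability estimate from the downtime figures quoted above.
# Assumption (not in the report): a 30-day month of 720 hours.
HOURS_PER_MONTH = 30 * 24

SCHEDULED_DOWN = 50           # hours/month: maintenance, file backup, etc.
UNSCHEDULED_DOWN = (10, 15)   # hours/month: reported range


def availability(scheduled, unscheduled, total=HOURS_PER_MONTH):
    """Fraction of the month during which the system is up."""
    return (total - scheduled - unscheduled) / total


best = availability(SCHEDULED_DOWN, UNSCHEDULED_DOWN[0])
worst = availability(SCHEDULED_DOWN, UNSCHEDULED_DOWN[1])
print(f"availability: {worst:.1%} to {best:.1%}")
```

Under that assumption the system is up for roughly 91 percent of the month in normal periods.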

1.3.2.6 PROGRAMMING LANGUAGES

Over the past years we or members of the SUMEX-AIM community have continued
to maintain the major languages on the system at current release levels, have
TENEXized several languages to improve efficiency, and have investigated a number
of issues related to the efficiency of programs written in various LISP
implementations and the exportability of programs. These issues are becoming
increasingly critical in dealing with AI performance programs which have reached
a level of maturity so that substantial, non-developmental user communities are
growing. The following summarizes general accomplishments and the following
section discusses in detail the work this past year in designing a machine-
independent ALGOL-like system (MAINSAIL).

LISP Efficiency:

There has been an on-going debate among a number of projects over the best
language to choose for developmental implementation of the various AI programs.
The key issues include ease and flexibility of conceptual representation of
program functions and objects, interactive debugging support, efficiency, and
exportability. To date the predominant language choice for AIM research has been
LISP and more particularly INTERLISP. These issues are important because they
influence the time required to develop new AI programs and subsequently the
incremental load placed on the SUMEX machine when in use. We recently attempted
an evaluation of INTERLISP and ILISP including the relative efficiencies of the
two languages and the level of assistance the language systems provide the user
in developing programs. The tests were based on an implementation of a subset of
REDUCE (a symbolic algebra manipulator). The results of several iterations in
program refinement by experts in the respective languages were that the runtimes
for the two versions were quite comparable (far less than the factor of 5-10
disparity predicted by ILISP enthusiasts). A more disquieting result was the
substantial difference in runtimes depending on how particular functions were
coded IN THE SAME LANGUAGE. It is apparent from the results that factors of 10
differences in time can result from a superficial implementation - expert
programming insight is essential to efficient program performance. This is not a
real surprise in that it is true of programming in any language - the problems
may be increased by such a rich language as INTERLISP with such a wide array of
ways to do the same thing but with little guidance as to the relative costs. It
has proven very difficult to quantify the "rules" for good programming. Mr.
Masinter and Mr. Phil Jackson attempted to document good INTERLISP programming
habits and issued a bulletin for SUMEX users.
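
The kind of same-language disparity described above can be illustrated with a small sketch. Python stands in for LISP here, and the example is an analogy chosen for illustration, not the REDUCE test code. Both functions build the same list, but the first copies the whole list at every step (much as repeatedly calling APPEND to add one element would), while the second adds each element in place. The function names and the copy-counting scheme are hypothetical.

```python
# Two ways to build the same list, counting element copies instead of timing
# so that the comparison is deterministic.

def build_by_copying(n):
    """Rebuild the whole list on every step."""
    result, copies = [], 0
    for x in range(n):
        copies += len(result) + 1   # old elements are copied, plus the new one
        result = result + [x]
    return result, copies


def build_in_place(n):
    """Add each new element to the existing list."""
    result, copies = [], 0
    for x in range(n):
        copies += 1                 # only the new element is stored
        result.append(x)
    return result, copies


slow, slow_cost = build_by_copying(1000)
fast, fast_cost = build_in_place(1000)
assert slow == fast
print(slow_cost, fast_cost)
```

The two versions return identical results, yet the work done by the first grows with the square of the list length. Nothing in the surface syntax warns the programmer of the difference.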

A further impact of these data is that it is very difficult to
simultaneously develop a new AI program and make the implementation highly
efficient. With the iterations required to develop the conceptual design of the
program, it is difficult to ensure its efficiency. This may lead to the need to
reimplement the program after the basic development stabilizes to increase
efficiency while still accommodating convenient and orderly further development.
Such reimplementation may or may not be best done in LISP - this will depend on
many factors including the nature of the program data structure requirements and
anticipated further development efforts.

MAINSAIL Progress

SUMEX, in its role as a nationally shared computer resource, is an
appropriate vehicle for the development of software unbound by the underlying
machine environment. We have a built-in community of program developers acutely
aware of the significance of providing their work to a broader base of users.
This intersection of hardware capability, software expertise, and dedication to
resource sharing presents a unique opportunity to promote a system designed for
program sharing.

The MAINSAIL (3) project has three closely related goals:

1) Provide an integrated set of tools for the creation of efficient portable
software on a variety of computer systems, and provide support and
continued development of these tools in a form compatible across all
implementations.

2) Study innovative approaches to portability, both hardware and software,
and develop such approaches into effective tools.

3) Promote the development and distribution of portable software, advise and
assist in its design, and evaluate its applicability.

By portable software we mean computer programs which may be executed on a
variety of machines with few, if any, alterations. MAINSAIL itself will provide
the initial example of portable software, since all of the system is written in
the MAINSAIL language except for those parts which are determined by the host
environment (hardware, instruction set, operating system, etc.). Even these
parts are embedded within MAINSAIL.

------------------------------------------------------------------------------

(3) The MAINSAIL (MAchine-INdependent SAIL) language is derived from SAIL, a
programming language developed at Stanford University’s Artificial Intelligence
Laboratory. It is not compatible with SAIL, since SAIL was designed for a PDP-10
with TOPS-10, and hence contains machine-dependencies. However it has retained
the basic attributes of SAIL as an extended ALGOL-like language. A summary of
some of the features of the MAINSAIL language and their relationship to other
languages is given in Appendix III on page 231 (see Book II).

There is a key distinction between MAINSAIL’s approach to portability and
the "classical" approach characterized by languages such as FORTRAN, ALGOL, LISP,
COBOL and BASIC. These languages attempt to adhere to a single syntax standard
which is separately implemented for each different computer system. Invariably
these implementations have differences which preclude the creation of a program
which is accepted by all. It is difficult, if not impossible, to define a
language standard which is unambiguous and at the same time sufficiently
comprehensible to provide the basis for compatible implementations. Furthermore,
many implementors yield to the temptation to provide "enhancements" to the
standard which immediately introduce machine and system dependencies.

MAINSAIL, on the other hand, provides a single system (written primarily in
itself) which is employed at every site. This is made possible by its ability to
compile itself into code for a variety of machines. Only the compiler’s code
generators and the runtime operating-system interfaces need be rewritten for each
implementation. These parts of MAINSAIL are at a level which has already been
defined by the machine-independent parts, and do not affect the language from the
user’s viewpoint. Thus the "language standard" has been reduced to a "semantic
standard" which is surrounded by machine-independent software.
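
The division of labor described above, one shared machine-independent system with only the code generators rewritten per machine, can be sketched in miniature. Everything in this sketch is hypothetical: the function names, the intermediate form, and both imaginary "targets" are invented for illustration and are not taken from MAINSAIL.

```python
# Toy illustration: one machine-independent front end, with a separate code
# generator for each target. All names and both targets are hypothetical.

def front_end(expr):
    """Machine-independent part: flatten a nested tuple such as
    ('+', 1, ('*', 2, 3)) into a postfix list of operations."""
    if isinstance(expr, int):
        return [("push", expr)]
    op, left, right = expr
    return front_end(left) + front_end(right) + [("apply", op)]


def gen_stack_machine(ops):
    """Code generator for an imaginary stack machine."""
    names = {"+": "ADD", "*": "MUL"}
    return [f"PUSH {arg}" if kind == "push" else names[arg]
            for kind, arg in ops]


def gen_register_machine(ops):
    """Code generator for an imaginary register machine."""
    names = {"+": "add", "*": "mul"}
    code, depth = [], 0
    for kind, arg in ops:
        if kind == "push":
            code.append(f"load r{depth}, #{arg}")
            depth += 1
        else:
            depth -= 1
            code.append(f"{names[arg]} r{depth - 1}, r{depth}")
    return code


# Only this table and the generators change when a new machine is added.
CODE_GENERATORS = {"stack": gen_stack_machine,
                   "register": gen_register_machine}


def compile_for(target, expr):
    return CODE_GENERATORS[target](front_end(expr))


program = ("+", 1, ("*", 2, 3))
for target in CODE_GENERATORS:
    print(target, compile_for(target, program))
```

In this sketch the meaning of the source program is fixed entirely by `front_end`, so a user sees the same language on either target. That is the sense in which the standard is carried by shared software rather than by a written specification.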

It remains to be seen whether the temptation to augment the language with
machine-dependencies (for purposes of ultimate efficiency or to take advantage of
particular local system features) can be overcome. Herein also lies the biggest
"price" to be paid for exportability. The code emitted from the MAINSAIL
compiler can be (and is, based on tests to date) at least as efficient as that
from many machine-dependent compilers. On the other hand, special machine or
operating system features that cannot be uniformly implemented may provide local
optimizations at the cost of exportability or vice versa. We cannot effectively
measure the extent of this cost at this stage.

DEVELOPMENT APPROACH

We do not underestimate the difficulty in obtaining the cooperation of a
community which will span a wide variety of applications and hardware/software
systems. If MAINSAIL is to obtain widespread use, it is crucial that it have an
effective and credible base of support. The initial parts of MAINSAIL are just
about ready for limited distribution. We want to maintain close supervision of
this distribution, and ensure that systems labelled as MAINSAIL are not altered
without our approval. In this regard we are pursuing legal channels to safeguard
the integrity of MAINSAIL software. We plan to take MAINSAIL through an orderly
progression of development, and to avoid casual distribution with no provision
for a solid base of maintenance and future growth.

REVIEW OF PROGRESS TO DATE

MAINSAIL has been under development for almost three years now. Beginning
with an initial goal of converting the PDP-10 SAIL compiler to generate code for
a PDP-11, several versions had been implemented on a PDP-10 and a PDP-11, and the
groundwork had been laid for extending the system to a wider variety of machines.
The current version was begun in August of 19756.

Early versions of MAINSAIL attempted to maintain close compatibility with
the original SAIL, but in surveying a wider variety of machines (especially mini-
computers), we concluded that this compatibility could be maintained only at the
expense of portability. It was felt that MAINSAIL could contribute more by
providing a truly portable system. Thus we began redesigning MAINSAIL,
rebuilding from previous implementations. This effort has resulted in a new
version which is still under development, and is now being tested on several
systems.

Initial implementations of the current design are for DEC PDP-10’s with the
TENEX operating system and with the TOPS-10 operating system. The TENEX version
is being tested at SUMEX and has been installed at one other TENEX site (Stanford
- IMSSS). The TOPS-10 version was developed at SUMEX by using TENEX facilities
which provide compatibility with TOPS-10. The Rutgers University PDP-10 facility
was chosen for external testing since it is a standard TOPS-10 system, and can be
accessed from SUMEX over a network. MAINSAIL is now undergoing preliminary
testing there. A modified TOPS-10 version has been set up on the Stanford
AI-lab’s PDP-10, but also has not been open to general use.

Little additional work will be necessary to make the TENEX version execute
on a DECSYSTEM-20 since TOPS-20 is derived from TENEX. However, some time will
be needed to take full advantage of the extended instruction set of the KL-10.
Two sites are available for TOPS-20 development: the LOTS facility at Stanford;
and a machine at SRI, close to Stanford and accessible over a network. Both of
these sites have expressed an interest in using MAINSAIL.

The PDP-11 has been chosen as the first mini-computer to be implemented.
Code generators have been written for it but not debugged. Several variants of
these code generators will be necessary to cover the full PDP-11 family.

MAINSAIL interfaces to three PDP-11 operating systems (RT-11, RSX-11 and
UNIX) are now under development. All of these operating systems are available to
the MAINSAIL project on PDP-11’s at Stanford. RT-11 will be the first to be
implemented. The mix of instruction sets, operating systems and configurations
will be a good test of MAINSAIL’s ability to provide a compatible implementation,
even across this one family of computers. We expect the PDP-11 systems to be
operational by this summer.

1.3.2.7 STANFORD AI HANDBOOK PROJECT

The AI Handbook is a compendium of short articles (3-5 pages each) about
the projects, ideas, problems and techniques that make up the field of Artificial
Intelligence. Over 150 articles have been drafted by researchers and students in
the field, on topics ranging in depth from "Augmented Transition Networks"
(ATN’s) to "An Overview of Natural Language Research", and covering the entire
breadth of AI research: search, robotics, speech understanding, real-world
applications, etc. An outline of the current contents of the handbook is given
in Appendix II on page 225 (see Book II).

During the Spring of 1976 the final push for drafting new articles was
completed, with some 60 articles produced by students during that quarter. Since
then the process has begun of rewriting the various chapters of the Handbook to
produce coherent manuscripts from the original work of five to ten authors. This
effort involves rewriting articles for accuracy and completeness as well as
integrating the 15 to 25 articles in a section into an editorially uniform and
readable document. An editor has been added to the project team who will be
responsible for maintaining a consistent format and style in the Handbook.

When completed, each chapter will be reviewed by experts in the appropriate
research area before it is released to the public. At present, the chapter on
Natural Language research is completed and being reviewed, and we expect that the
sections on Search, Speech Understanding, Representation of Knowledge, and
Automatic Programming will be completed during the next two months. During the
Fall of 1977 the first seven chapters of the handbook will be published in
preliminary form. Meanwhile, the handbook is already available to cooperative
experts and critics on-line via the SUMEX-AIM network connections. We are
considering maintaining the handbook on-line, with occasional hard-copy editions,
and believe this method of "publication" may be a prototype for other
encyclopedic monographs.

1.3.2.8 USER SOFTWARE AND INTRA-COMMUNITY COMMUNICATION

In addition to the system and language software development efforts of
SUMEX, we have assembled or developed where necessary a broad range of utilities
and user software. These include operational aids, statistics packages,
DEC-supplied programs, improvements to the TOPS-10 emulator, text editors, text
search programs, file space management programs, graphics support, a batch
program execution monitor, text formatting and justification assistance, and
magnetic tape conversion aids. We have also developed a number of user
information assistance programs such as a "WHOIS" facility to recover names and
affiliations of users and a "HELP" facility to locate on-line documentation of
interest through key word searches.

Of major importance for our community effort is the set of tools for inter-
user communications. We have enhanced the message sending and manipulation
programs to better integrate text editing facilities for easier message
preparation and reading. We have also developed a unique "bulletin board" system
to deal with informal notes, thereby bridging a functional gap between formal
system documents and private message communications between individual users.
The bulletin board system provides an informal and dynamic base for information
about system facilities, lore, bugs, etc. or can provide a means for intra-
project communication and coordination.

The system has been in operation for more than one year and has been
exported to IMSSS (Stanford’s other TENEX site) and USC-ECL. We have also
proposed that the next generation of ARPANET information services provide for
bulletin board-like facilities. At SUMEX-AIM there are 10 bulletin boards, 8 of
which are project-specific. The main system bulletin board currently contains
more than 140 bulletins under 85 topics covering system status announcements,
explanations of recent crashes, hardware troubles and monitor upgrades, new
developments, bugs, and little-documented features of our programming languages
and utilities. Project bulletin boards have been used for notices and minutes of
meetings, references to and abstracts of papers, coordination of on-going
developments, vacation schedules, documentation and announcements of various
kinds.

Current Bulletin Board features include:
Multiple bulletin boards (public, private, general, specific, etc.).

Topics and subtopics (separated by periods) may be nested to any depth.

Expire dates for each bulletin, after which they are removed automatically.

Interest-list-of-topics for each user allows him to be notified about new
bulletins he is interested in and to ignore others.

Users notified when new bulletins arrive, by running BBCHECK (the
bulletin-board MAIL CHECK) or by mail.

Help and browsing facilitated in a variety of ways (? can be typed anywhere,
general and command-specific help provided).

Command structure modelled after the TENEX EXEC, with conscious attention to
human-engineering.

Companion program BBREAD is a bulletin-board READMAIL.

Companion program BBNEWS types out a directory listing of any new bulletins.
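
Three of the features listed above (period-separated nested topics, expire dates, and per-user interest lists) can be modelled in a few lines. This is a minimal sketch written for illustration and does not describe how the actual bulletin board programs are implemented. The class, the function names, and the sample topics are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Bulletin:
    topic: str      # nested topic path; levels are separated by periods
    text: str
    expires: date   # the bulletin is dropped after this date


def is_under(topic, interest):
    """True if topic is the interest itself or one of its subtopics."""
    t, i = topic.split("."), interest.split(".")
    return t[:len(i)] == i


def live(bulletins, today):
    """Keep only bulletins whose expire date has not yet passed."""
    return [b for b in bulletins if b.expires >= today]


def of_interest(bulletins, interests, today):
    """Live bulletins falling under any topic on a user's interest list."""
    return [b for b in live(bulletins, today)
            if any(is_under(b.topic, i) for i in interests)]


# Hypothetical sample data.
board = [
    Bulletin("SYSTEM.CRASHES", "Disk controller fault", date(1977, 6, 1)),
    Bulletin("LISP.INTERLISP.BUGS", "Compiler bug", date(1977, 9, 1)),
    Bulletin("LISP.ILISP", "New release", date(1977, 3, 1)),
]
for b in of_interest(board, ["LISP"], today=date(1977, 5, 1)):
    print(b.topic, "-", b.text)
```

Comparing whole path components, rather than raw string prefixes, keeps an interest in LISP from matching an unrelated topic such as LISPX.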

1.3.2.9 DOCUMENTATION AND EDUCATION

We have spent considerable effort to develop, maintain, and facilitate
access to our documentation so as to accurately reflect available software. The
HELP and Bulletin Board systems have been important in this effort. We have
limited manpower for user assistance. In general, users are responsible for
their own software development and maintenance. The SUMEX staff, however,
(including Lederberg and Rindfleisch) share the responsibilities for system level
assistance to users, tracking down bugs, reviewing user suggestions, etc. The
terminal linking facilities of TENEX have been valuable tools to assist remote
user groups and also for system users to communicate with each other. With the
recent initial release of the MAINSAIL system on selected machines, we are
becoming increasingly involved in describing MAINSAIL and advising user projects
in its possible applications.

1.3.2.10 SOFTWARE COMPATIBILITY AND SHARING

At SUMEX-AIM we firmly believe in importing rather than reinventing
software where possible. At SUMEX many avenues exist for sharing between the
system staff, various user projects, other facilities, and vendors. In the past
without communication networks, the system vendor served as the focal point for
distribution of most software to user sites. Since the process of distributing
tapes (and particularly of handling bug reports and user suggestions) was very
slow, it was common for sites to take a version of a program and then modify and
maintain it locally. This caused a proliferation of home-grown versions of
software. Similar impediments have existed to the dissemination of user
software. User organizations like SHARE and DECUS have helped to overcome these
problems but communication is still cumbersome. The advent of fast and
convenient communication facilities coupling communities of computer facilities
has the potential of making a major difference in facilitating inter-group
cooperation and lowering these barriers.

The TENEX sites on the ARPANET have been interacting increasingly with each
other to develop new software systems. This functions effectively to build
communication around the network and promote a functional division of labor and
expertise. The other major advantage is that as a by-product of the constant
communication about particular software, personal connections between staff
members of the various sites develop. These connections serve to pass general
information about software tools and to encourage the exchange of ideas among the
sites. Certain common problems are now regularly discussed on a multi-site
level. We continue to draw significant amounts of system software from other
ARPANET sites, reciprocating with our own local developments. Interactions have
included mutual backup support, hardware configuration experiments, operating
system enhancements, utility or language software, and user project
collaborations. We have been able to import many new pieces of software and
improvements to existing ones in this way. Examples of imported software include
the message manipulation program MSG, TENEX SAIL, TENEX SOS, INTERLISP, the
RECORD program, ARPANET host tables, and many others. Reciprocally, we have
exported our contributions such as the drum page migration system, KI-10 page
table efficiency improvements, GTJFN enhancements, PUB macro files, the bulletin
board system, SNDMSG enhancements, our BATCH monitor, etc. The most recent
example of this cooperative use of networks is in the preliminary export of
MAINSAIL.

1.3.2.11 RESOURCE MANAGEMENT

PHILOSOPHY OF MANAGEMENT

The tidiest way to administer a national resource would be by subcontract
to a fee-compensated, neutral agent. This would still have to involve a
governing body that could speak to the technical and quality-control interests of
the served constituency. Appropriate in some circumstances, this model would
separate the administration of a resource from active research and development.
An approach expected to foster greater creativity is to couple the resource with
an active user-center. This of course can lead to manifest conflicts of interest
that must be addressed and avoided if the resource is to be fairly available on a
regional or national basis.

As indicated in the introduction, our proposal for the latter approach was
followed by searching negotiations over a management plan that would be sensitive
to these considerations. The bureaucratic procedures, much as they have to be
spelled out, are almost the last items that need to be specified for such a plan.
Far more important is a charter that spells out the underlying objectives and
responsibilities of the program, and which establishes incentives, resources, and
obligations for proper performance. We believe the plan that was negotiated and
implemented has all of these ingredients, and has made the design of the
procedural framework a matter of simple common-sense logic from these premises.
It will be plain that the convergence of local self-interest, and peer and
contractual responsibility offers the best assurance that the programmatic goals
will be respected, and simplifies the tasks of surveillance and accountability.

The self-interest part of this equation stems from our original motivation
in requesting the resource: the need for specialized computing facilities to
support intense, interdisciplinary studies in applications of AI at Stanford
University Medical School. Comprising several departments (Genetics, Medicine,
Computer Science and Chemistry), and interwoven projects (e.g., DENDRAL,
Heuristic Programming, MYCIN, MOLGEN) and principal faculty (Professors
Lederberg, Feigenbaum, Djerassi, Cohen, and Buchanan), a substantial body of
research that has progressed and evolved over many years would be sacrificed if
such a resource were not available. Successful, stable collaborations of this
scope are not readily found. This history both depends upon and contributes to
the doctrine of resource-sharing that underlies the SUMEX-AIM effort.

One premise of the management plan was therefore the charter allocation of
half the user-available capacity of the SUMEX facility to the Stanford complex of
projects, subject to a local committee chaired by Professor Lederberg.

The acceptance of this principle clearly defines the local benefit of the
resource, minimizes anxiety and conflict-of-interest, and en suite enables the
local group to respond quite objectively to the allocations that are made by an
Executive Committee for the "national" or non-Stanford aliquot (see "Executive
and Advisory Committee Organization" below). Another important contribution to
the success of the plan is the welcome participation of an NIH-BRP representative
on the Executive Committee. What would be inappropriate meddling, in the conduct
of a narrower research project funded by NIH, is a communication channel and
source of detached judgment that has been invaluable in expediting the
innumerable decisions about which NIH must and should be consulted in the week-
to-week business of the resource. The efficacy of this principle, as is
appropriate to acknowledge here, has been validated and enhanced by the style and
energy that Dr. William Baker has brought to this task.

That the "national" community should be conscientiously cultivated for the
most efficacious use of its aliquot, and that further growth of facilities should
in due course be distributed, are further inferences from the charter principles.

Finally, the recognition in the charter that SUMEX-AIM was not merely a
retail-store for computer cycles, but the means of building a community, was a
necessary basis for the morale of the whole operation. Some of these matters
were addressed further in the section on SIGNIFICANCE (see Section 1.2 on page
4). The remainder of this section will now speak to the way in which these
responsibilities are handled bureaucratically.

ORGANIZATION AND PROCEDURES

The SUMEX-AIM resource is administered within the Genetics Department of
the Stanford University Medical School, Professor Lederberg’s "main office",
though he also holds appointments in the Computer Science Dept. and the Human
Biology program. Its mission, locally and nationally, entails both the
recruitment of appropriate research projects interested in medical AI
applications and the catalysis of interactions among these groups and the broader
medical community. User projects are separately funded and autonomous in their
management. They are selected for access to SUMEX on the basis of their
scientific and medical merits as well as their commitment to the community goals
of SUMEX. Currently active projects span a broad range of application areas such
as clinical diagnostic consultation, molecular biochemistry, belief systems
modeling, mental function modeling, and instrument data interpretation (see
Section 6 on page 41 in Book II). We have pondered the possibilities of a
fee-for-service approach to allocation of the resource. We believe that this would
be inappropriate for an experimental system of such national scope, whose pricing
structure would have to be revised almost on a week-to-week basis to fairly
respond to evolutionary changes in the system. This would also pose problems of
accountability for the transfer of funds from one institution to another. Our
present policy of non-monetary allocation control, which we propose to continue
for the next term, of course accentuates our responsibility for the careful
selection of projects with high scientific and community merit.

EXECUTIVE AND ADVISORY COMMITTEE ORGANIZATION

As the SUMEX-AIM project is a multilateral undertaking by its very nature,
we have created several management committees to assist in administering the
various portions of the SUMEX resource. As defined in the SUMEX-AIM management
plan adopted at the time the initial resource grant was awarded, the available
facility capacity is allocated 40% to Stanford Medical School projects, 40% to
national projects, and 20% to common system development and related functions.
Within the Stanford aliquot, Dr. Lederberg has established an advisory committee
to assist him in selecting and allocating resources among projects appropriate to
the SUMEX mission. The current membership of this committee is listed in
Appendix V (see Book II).
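
The allocation percentages above can be reconciled with the earlier statement that the charter gives the Stanford projects half of the user-available capacity, if "user-available" is read as the capacity remaining after the system development share. That reading is an assumption, and the short calculation below simply checks the arithmetic under it. The variable names are illustrative.

```python
from fractions import Fraction

# Shares of total facility capacity as stated in the management plan.
ALLOCATION = {
    "stanford": Fraction(40, 100),
    "national": Fraction(40, 100),
    "system": Fraction(20, 100),   # common system development and related functions
}
assert sum(ALLOCATION.values()) == 1

# Assumed reading: user-available capacity excludes the system share.
user_available = ALLOCATION["stanford"] + ALLOCATION["national"]
stanford_share_of_user = ALLOCATION["stanford"] / user_available

print(user_available, stanford_share_of_user)
```

Under this reading, four fifths of the facility is user-available and the Stanford aliquot is exactly half of it.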

For the national community, two committees serve complementary functions.
An Executive Committee oversees the operations of the resource as related to
national users and makes the final decisions on authorizing admission for
projects. It also establishes policies for resource allocation and approves
plans for resource development and augmentation within the national portion of
SUMEX (e.g., hardware upgrades, MAINSAIL development priorities, etc.). The
Executive Committee oversees the planning and implementation of the AIM Workshop
series currently implemented under Prof. S. Amarel of Rutgers University and
assures coordination with other AIM activities as well. The committee will play
a key role in assessing the possible need for additional future AIM community
computing resources and in deciding the optimal placement and management of such
facilities. The current membership of the Executive Committee is listed in
Appendix V (see Book II).

Reporting to the Executive Committee, an Advisory Group represents the
interests of medical and computer science research relevant to AIM goals. The
Advisory Group serves several functions in advising the Executive Committee: 1)
recruiting appropriate medical/computer science projects, 2) reviewing and
recommending priorities for allocation of resource capacity to specific projects
based on scientific quality and medical relevance, and 3) recommending policies
and development goals for the resource. The current Advisory Group membership is
given in Appendix V (see Book II).

These committees have actively functioned in support of the resource.
Except for the meetings held during the AIM workshops, the committees have met by
telephone conference owing to the size of the groups and to save the time and
expense of personal travel to meet face to face. These telephone meetings, in
conjunction with terminal access to related text materials, have served quite
well in accomplishing the agenda business and facilitate greatly the arrangement
of meetings. Other solicitations of advice requiring review of sizable written
proposals are done by mail.

We will continue to work with the management committees to recruit the
additional high quality projects which can be accommodated and to evolve resource
allocation policies which appropriately reflect assigned priorities and project
needs. We hope to make more generally available information about the various
projects both inside and outside of the community and thereby to promote the
kinds of exchanges exemplified earlier and made possible by network facilities.

NEW PROJECT RECRUITING

The SUMEX-AIM resource has been announced through a variety of media as
well as by correspondence, contacts of NIH-BRP with a variety of prospective
grantees who use computers, and contacts by our own staff and committee members.
The number of formal projects that have been admitted to SUMEX has more than
doubled since the start of the project; others are working tentatively as pilot
projects or are under review.

We have prepared a variety of materials for the new user ranging from
general information such as is contained in a brochure (see Appendix VI in
Book II) to more detailed information and guidelines for determining whether a
user project is appropriate for the SUMEX-AIM resource. Dr. E. Levinthal has
prepared a questionnaire to assist users seriously considering applying for
access to SUMEX-AIM (see Appendix VII in Book II). Pilot project categories
have been established both within the Stanford and national aliquots of the
facility capacity to assist and encourage projects just formulating possible AIM
proposals pending their application for funding support and in parallel formal
application for access to SUMEX. Pilot projects are approved for access for
limited periods of time after preliminary review by the Stanford or AIM Advisory
Group as appropriate to the origin of the project.

These contacts have sometimes done much more than provide support for
already-formulated programs. For example, Prof. Feigenbaum’s group at Stanford
has initiated a major collaborative effort with Dr. Osborn’s group at the
Institutes of Medical Sciences in San Francisco. This project in "Pulmonary
Function Monitoring and Ventilator Management - PUFF/VM" (see Section 6.4.6 on
page 197 in Book II) originated as a pilot request to use MLAB in a small way for
modeling. Subsequently the AI potentialities of this domain were recognized by
Feigenbaum, Nii, and Osborn, who have submitted a joint proposal to NIH and have
pilot status at present.

The following lists the fully authorized projects currently comprising the
SUMEX-AIM community (see Section 6 in Book II for more detailed descriptions).
The five projects that formed the nucleus authorized at the initial funding of
the resource in December 1973 are marked by "<*>".

National -

1) Acquisition of Cognitive Procedures (ACT); Dr. J. Anderson (Yale
University)

<*> 2) Higher Mental Functions Project; K. Colby, M.D. (University of California
at Los Angeles)

3) INTERNIST Project; J. Myers, M.D. and Dr. H. Pople (University of
Pittsburgh)

4) Medical Information Systems Laboratory (MISL); J. Wilensky, M.D. and Dr.
B. McCormick (University of Illinois at Chicago Circle)

<*> 5) Rutgers Computers in Biomedicine; Dr. S. Amarel (Rutgers University)

6) Chemical Synthesis Project (SECS); Dr. T. Wipke (University of California
at Santa Cruz)

Stanford -

<*> 1) DENDRAL Project; Drs. C. Djerassi, J. Lederberg, and E. Feigenbaum

2) Large Multi-processor Arrays (HYDROID); Dr. G. Wiederhold

3) Molecular Genetics Project (MOLGEN); Drs. J. Lederberg, E. Feigenbaum, and
N. Martin

<*> 4) MYCIN Project; S. Cohen, M.D. and Dr. B. Buchanan

<*> 5) Protein Structure Modelling; Drs. J. Kraut and S. Freer (University of
California at San Diego) and E. Feigenbaum (Stanford)

As an additional aid to new projects or collaborators with existing
projects, we provide a limited amount of funds to support the terminal and
communications needs of users without access to such equipment. We are currently
leasing 6 terminals and 4 modems for users, as well as 4 foreign exchange lines to
better couple the Rutgers project into the TYMNET and a leased line between
Stanford and U. C. Santa Cruz for the Chemical Synthesis project.

STANFORD COMMUNITY BUILDING

The Stanford community has undertaken several internal efforts to encourage
interactions and sharing between the projects centered here. Professor
Feigenbaum organized a seminar class with the goal of assembling a handbook of AI
concepts, techniques, and current state-of-the-art. This project has had
enthusiastic support from the students, and substantial progress has been made in
preparing many sections of the handbook as reported earlier. An outline of the
material being prepared can be found in Appendix II on page 225 (see Book II).
Several examples of completed articles are given in Appendix I on page 202 (see
Book II).

A second community-building effort was a mini-conference on AI held at
Stanford in January 1976. This 3-day series of meetings featured presentations
by each of the local projects and comparative discussions of approaches to
current problems in AI research such as knowledge representations, production
system strategies, and rule formation. Weekly informal lunch meetings
(SIGLUNCH) are also held among community members to discuss general AI topics,
concerns and progress of individual projects, or system problems as appropriate,
and to hear a number of invited outside speakers.

AIM WORKSHOP SUPPORT

The Rutgers Computers in Biomedicine resource (under Dr. Saul Amarel) has
organized a series of workshops devoted to a range of topics related to
artificial intelligence research, medical needs, and resource sharing policies
within NIH. Meetings have been held for the past two years at Rutgers and
another is planned for this summer. The SUMEX facility has acted as a prime
computing base for the workshop demonstrations. We expect to continue this
support for future workshops. The AIM workshops provide much useful information
about the strengths and weaknesses of the performance programs both in terms of
criticisms from other AI projects and in terms of the needs of practicing medical
people. We plan to continue to use this experience to guide the community
building aspects of SUMEX-AIM.

RESOURCE ALLOCATION POLICIES

As the SUMEX facility has become increasingly loaded, a number of diverse
and conflicting demands have arisen which require controlled allocation of
critical facility resources (file space and central processor time). We have
already spelled out a policy for file space management; an allocation of file
storage is defined for each authorized project in conjunction with the management
committees. This allocation is divided among project members in any way desired
by the individual principal investigators. System allocation enforcement is
implemented by project each week. As the weekly file dump is done, if the
aggregate space in use by a project is over its allocation, files are archived
from user directories over allocation until the project is within its allocation.
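
The weekly enforcement pass described above can be illustrated with a minimal Python sketch. It is not the SUMEX implementation: the `UserDirectory` type, the function name, and the order in which directories and files are chosen for archiving (largest overage first, largest file first) are all assumptions made for illustration, since the report does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UserDirectory:
    """Hypothetical model of one user directory within a project."""
    name: str
    allocation: int                                       # pages allotted by the PI
    files: Dict[str, int] = field(default_factory=dict)   # file name -> pages

    @property
    def in_use(self) -> int:
        return sum(self.files.values())


def enforce_project_allocation(directories: List[UserDirectory],
                               project_allocation: int) -> List[Tuple[str, str]]:
    """Archive files from over-allocation directories until the project fits.

    Returns the (directory, file) pairs that were archived. Files are only
    taken from directories that exceed their own allocation.
    """
    archived = []
    project_in_use = sum(d.in_use for d in directories)
    # Assumption: visit the directories with the largest overage first.
    for d in sorted(directories, key=lambda d: d.in_use - d.allocation, reverse=True):
        # Assumption: within a directory, archive the largest files first.
        for fname, pages in sorted(d.files.items(), key=lambda kv: kv[1], reverse=True):
            if project_in_use <= project_allocation or d.in_use <= d.allocation:
                break
            del d.files[fname]          # stands in for moving the file to archive
            archived.append((d.name, fname))
            project_in_use -= pages
    return archived
```

Directories that are within their own allocation are never touched in this sketch, which matches the statement that files are archived "from user directories over allocation".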

We have recently implemented system scheduling controls to attempt to
maintain the 40:40:20 balance in terms of CPU utilization (see page 18). The
initial complement of user projects justifying the SUMEX resource was centered to
a large extent at Stanford. Over the first term of the SUMEX grant, a
substantial growth in the number of national projects was realized. During the
same time the Stanford group of projects has matured as well and in practice the
40:40 split between Stanford and non-Stanford projects is not ideally realized
(see Figure 8 on page 43 and the tables of recent project usage on page 45).

Our job scheduling controls bias the allocation of CPU time based on percent time
consumed relative to the time allocated over the 40:40:20 community split. The
controls are "soft" however in that they do not waste computer cycles if users
below their allocated percentages are not on the system to consume the cycles.
The operating disparity in CPU use to date reflects a substantial difference in
demand between the Stanford community and the developing national projects,
rather than inequity of access. For example, the Stanford utilization is spread
over a large part of the 24-hour cycle, while national-AIM users tend to be more
sensitive to local prime-time constraints. (The 3-hour time-zone phase shift
across the continent is of substantial help in load-balancing.) For the present,
we propose to continue our policy of "soft" allocation enforcement for the fair
split of resource capacity. If necessary to assure proper apportionment, we can
implement a pie-slice reservation system to more rigidly control the allocations.
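
One way to picture a "soft" control of this kind is sketched below in Python. The report does not describe the actual scheduler mechanism, so the function name, the accounting inputs, and the ratio used to rank communities are assumptions. The sketch only shows the stated property: communities below their share are favored, but no cycles are withheld when they have nothing to run.

```python
# Target split of available CPU between the three communities (from the text).
ALLOCATION = {"Stanford": 0.40, "AIM": 0.40, "Staff": 0.20}


def pick_community(cpu_used, runnable):
    """Choose which community receives the next time slice.

    cpu_used: dict mapping community -> CPU time consumed in the current
              accounting window (hypothetical bookkeeping).
    runnable: set of communities that currently have a runnable job.
    Returns a community name, or None if nothing is runnable.
    """
    if not runnable:
        return None
    total = sum(cpu_used.get(c, 0.0) for c in ALLOCATION)

    def pressure(community):
        # Fraction of CPU consumed so far, relative to the allocated fraction.
        share = cpu_used.get(community, 0.0) / total if total > 0 else 0.0
        return share / ALLOCATION[community]

    # Only communities with runnable jobs compete, so a community that is over
    # its share still runs when the others are absent ("soft" enforcement).
    return min(sorted(runnable), key=pressure)
```

A rigid pie-slice reservation would differ in that an absent community's slice would be left idle instead of being handed to whoever is runnable.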

Our system also categorizes users in terms of access privileges. These
comprise fully authorized users, pilot projects, guests, and network visitors in
descending order of system capabilities. We want to encourage bona fide medical
and health research people to experiment with the various programs available with
a minimum of red tape while not allowing unauthenticated users to bypass the
advisory group screening procedures by coming on as guests. So far we have had
relatively little abuse compared to what other network sites have experienced,
perhaps on account of the personal attention that senior staff gives to the logon
records, and to other security measures. However, the experience of most other
computer managers behooves us to be cautious about being as wide-open as might be
preferred for informal service to pilot efforts and demonstrations. We will
continue developing this mechanism in conjunction with management committee
policy decisions.

1.3.2.12 SUMMARY OF RESOURCE USAGE

The following data give an overview of SUMEX-AIM resource usage. There are
five sub-sections containing data respectively for 1) monthly CPU time consumed,
2) resource usage by community (AIM and Stanford), 3) resource usage by project,
4) recent diurnal loading data, and 5) network usage data.

MONTHLY CPU TIME CONSUMED

Figure 7. Monthly CPU Time Consumed
[Plot: CPU Time Used (Hrs), 0 to 600, by month, 1974 to 1977.]

RELATIVE SYSTEM LOADING BY COMMUNITY

The SUMEX resource is divided, for administrative purposes, into 3 major
communities: user projects based at the Stanford Medical School, user projects
based outside of Stanford (national AIM projects), and common systems development
efforts. As defined in the resource management plan approved by BRP at the start
of the project, the available resource in terms of CPU capacity and file space
will be divided between these communities as follows:

Stanford 40%
AIM 40%
Staff 20%

The "available" resources to be divided up in this way are those remaining after
various monitor and community-wide functions are accounted for. These include
such things as job scheduling, overhead, network service, file space for
subsystems and documentation, etc.

The monthly usage of CPU and file space resources for each of these three
communities relative to their respective aliquots is shown in the plots in Figure
8 and Figure 9. It is clear that the Stanford projects have held an edge in
system usage despite our efforts at resource allocation and the substantial
voluntary efforts by the Stanford community to utilize non-prime hours. This
reflects the development of the Stanford group of projects relative to those
getting started on the national side and has correspondingly accounted for much
of the progress in AI program development to date.
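
The comparison of usage against aliquots can be written out as a small Python sketch. The function names and all numbers in the usage note are hypothetical, chosen only to show the arithmetic: the available resource is what remains after monitor and community-wide functions are deducted, and each community's share of that remainder is compared with its 40:40:20 aliquot.

```python
# Aliquots of the available resource, in percent (from the management plan).
ALIQUOTS = {"Stanford": 40.0, "AIM": 40.0, "Staff": 20.0}


def community_shares(total_capacity, system_overhead, usage):
    """Percent of the *available* resource used by each community.

    total_capacity:  total CPU hours (or file pages) in the period.
    system_overhead: amount consumed by monitor and community-wide functions.
    usage:           dict mapping community -> amount used.
    """
    available = total_capacity - system_overhead
    if available <= 0:
        raise ValueError("no resource left after system overhead")
    return {c: 100.0 * usage.get(c, 0.0) / available for c in ALIQUOTS}


def deviation_from_aliquot(shares):
    """Positive values mean a community used more than its aliquot."""
    return {c: shares[c] - ALIQUOTS[c] for c in ALIQUOTS}
```

For example, with a hypothetical 1000 hours of capacity, 200 hours of overhead, and usage of 400, 160, and 80 hours, the shares of the 800 available hours are 50%, 20%, and 10%, so Stanford would be 10 points over its aliquot and the other two communities under theirs.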

Figure 8. CPU Usage by Community
[Three plots by month, 1974 to 1977: National AIM, Stanford, and System Staff.]

Figure 9. File Space Usage by Community
[Three plots of % of Avail. Space Used by month, 1974 to 1977: National AIM
(scale to 40), Stanford (scale to 40), and System Staff (scale to 20).]

INDIVIDUAL PROJECT AND COMMUNITY USAGE

The table following shows cumulative resource usage by project in the past
grant year. The data displayed include a description of the operational funding
sources (outside of SUMEX-supplied computing resources) for currently active
projects, total CPU consumption by project (Hours), total terminal connect time
by project (Hours), and average file space in use by project (Pages, 1 page = 512
computer words). These data were accumulated for each project for the months
between May 1976 and April 1977. Again the well developed use of the resource by
the Stanford community can be seen. It should be noted that the Stanford
projects have voluntarily shifted a substantial part of their development work to
non-prime time hours which is not shown in these cumulative data. It should also
be noted that a significant part of the DENDRAL and MYCIN efforts, here charged
to the Stanford aliquot, support development efforts dedicated to national
community access to these systems. The actual demonstration and use of these
programs by extramural users is charged to the national community in the "AIM
USERS" category, however.

RESOURCE USE BY INDIVIDUAL PROJECT

                                       CPU        CONNECT   FILE SPACE
STANFORD COMMUNITY                 (Hours)        (Hours)      (Pages)

1) DENDRAL PROJECT                 1181.64       19657.56        13058
   "Resource Related Research
   Computers and Chemistry"
   NIH RR-00612-08
   (3 yrs. 1977-80)
   ARPA DAHC-15-73-C-0435
   (2 yrs. 1977-79)

2) HYDROID PROJECT                   40.61    (illegible)          239
   "Distributed Processing
   and Problem Solving"
   ARPA DAHC-15-73-C-0435

3) MOLGEN PROJECT                    85.37        2394.73         1853
   NSF MCS75-11649
   NSF MCS76-11935
   (2 yrs. 1976-78)

4) MYCIN PROJECT                    410.90    (illegible)         6688
   "Computer-based Consult.
   in Clin. Therapeutics"
   HEW HS-01544 (2 yrs. 1977-79)
   NSF (2 yrs. 1977-79)

5) PROTEIN STRUCT MODELING          159.46    (illegible)         2477
   "Heuristic Comp. Applied
   to Prot. Crystallog."
   NSF DCR 74-23451
   (2 yrs. 1977-79)
   ARPA DAHC-15-73-C-0435

6) AI HANDBOOK PROJECT               26.67         404.42          639

7) PILOT PROJECTS                   327.46    (illegible)         3506
   (see reports in
   Section 6.3 in
   Book II)

COMMUNITY TOTALS                      2232    (illegible)  (illegible)

                                       CPU        CONNECT   FILE SPACE
NATIONAL AIM COMMUNITY             (Hours)        (Hours)      (Pages)

1) ACT PROJECT                       57.02        1195.84          986
   "Acquisition of
   Cognitive Procedures"
   NIMH MH29353
   ONR N0014-77-6-0242

2) HIGHER MENTAL FUNCTIONS          206.03        2680.16         2198
   "Computer Models in
   Psychiatry and Psychother."
   NIH MH-27132-02 (2 yrs.)
   UCLA NPI Gen. Res.

3) INTERNIST PROJECT                205.20        2721.26         3535
   (DIALOG)
   "Computer Model of
   Diagnostic Logic"
   BHRD MB-00144-03 (3 yrs.)

4) MISL PROJECT                       9.27         389.05          876
   "Medical Information
   Systems Laboratory"
   US-PHS-MB00114-03 (3 yrs.)

5) RUTGERS PROJECT                  139.63        2433.43        10862
   "Computers in Biomedicine"
   NIH RR-00643-05 (3 yrs.)

6) SECS PROJECT                     308.96        4374.03         4515
   "Chemical Synthesis"

7) AIM PILOT PROJECTS                40.91        1326.56         1558
   (see reports in
   Section 6.4 in
   Book II)

8) AIM Administration                11.13         383.22  (illegible)

9) AIM Users                         56.89         672.35  (illegible)

COMMUNITY TOTALS                   1035.04       16166.99  (illegible)

                                       CPU        CONNECT   FILE SPACE
SUMEX STAFF AND SYSTEM             (Hours)        (Hours)      (Pages)

1) Staff                            903.07       23198.86        11919
2) Miscellaneous                     80.87        2508.98         1721
3) Operations                      1505.50       53113.94        32382

COMMUNITY TOTALS                   2489.44       88321.78        46022

RESOURCE TOTALS                    5757.45      143977.15       101136

SYSTEM DIURNAL LOADING VARIATIONS

The following figures give a picture of the recent variations in diurnal
SUMEX system load, taken during March 1977. The plots include:

Figure 10 - Total number of jobs logged in to the system

Figure 11 - Percent of total CPU time used by logged in jobs (maximum is 200%
for dual processor capacity)

Figure 12 - Percent of total CPU time consumed as overhead: I/O wait, core
management, scheduling, etc. (maximum = 200%)

Figure 13 - Balance set size (number of jobs in core)

Figure 14 - Number of runnable jobs (whether or not in core)

The abscissa for these plots is broken into 20-minute intervals throughout
the day. The ordinate for each interval is the average of all the daily
measurements for that interval over the weekdays during March 1977. A daily
measurement for a given 20-minute interval is in turn an average of the
appropriate statistic sampled every 10 seconds. Since these plots display
overall average data, they give a representative illustration of the general
characteristics of diurnal loading. There are, of course, substantial
fluctuations in the quantities measured from day to day as well and, for some,
also on time scales shorter than the intervals displayed in the figures. For
example, in Figure 14 the number of runnable jobs (equivalent to the system "load
average") shows a fairly smooth curve peaking at 6.7 jobs. On both a scale of
minutes and from day to day, however, the number of runnable jobs will vary from
only a few to 12 or more. This fluctuation is not shown in these average plots
but also plays a role in the responsiveness of the system.
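
The two-level averaging described above (10-second samples averaged within each 20-minute interval of a day, then those daily values averaged over weekdays) can be sketched in Python as follows. The function name and the input format, a sequence of `(datetime, value)` samples, are assumptions for illustration and are not taken from the SUMEX measurement software.

```python
from collections import defaultdict
from statistics import mean

INTERVAL_SECONDS = 20 * 60   # 20-minute intervals, 72 per day


def diurnal_profile(samples):
    """Return {interval index: average over weekdays of the daily averages}.

    samples: iterable of (timestamp, value) pairs, where timestamp is a
             datetime.datetime and value is the sampled statistic.
    """
    # Step 1: collect the samples for each (day, interval), weekdays only.
    per_day = defaultdict(list)
    for ts, value in samples:
        if ts.weekday() >= 5:            # skip Saturday and Sunday
            continue
        seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
        per_day[(ts.date(), seconds // INTERVAL_SECONDS)].append(value)

    # Step 2: one daily measurement per (day, interval) is the sample mean.
    per_interval = defaultdict(list)
    for (_day, interval), values in per_day.items():
        per_interval[interval].append(mean(values))

    # Step 3: the plotted ordinate is the mean of the daily measurements.
    return {i: mean(v) for i, v in sorted(per_interval.items())}
```

Averaging the daily means gives each weekday equal weight even when some days have fewer samples, and it is also why the day-to-day and minute-to-minute fluctuations mentioned above do not appear in the plotted curves.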

In the heading of each plot are shown range statistics for the measurement
over various parts of the day. Range data include the minimum value "Low",
average value "Ave", and maximum value "High". The first line of the heading
gives the range over the whole day and on succeeding lines, "Prime Time" covers
6:00-18:00 Pacific time and "Non Prime Time" covers the remaining night time
hours.
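
The heading statistics can be derived from the interval averages as in the Python sketch below. The function names are hypothetical, and the sketch assumes that an interval belongs to prime time when its start falls at or after 6:00 and before 18:00; the report does not say how the boundary intervals were assigned.

```python
def range_stats(values):
    """Return (Low, Ave, High) for a non-empty list of interval averages."""
    return (min(values), sum(values) / len(values), max(values))


def heading_stats(profile, interval_minutes=20,
                  prime_start_hour=6, prime_end_hour=18):
    """Compute the three heading lines from {interval index: average}.

    Assumes the profile contains at least one prime and one non-prime
    interval; min() raises ValueError on an empty list otherwise.
    """
    prime, non_prime = [], []
    for index, value in profile.items():
        start_minute = index * interval_minutes
        if prime_start_hour * 60 <= start_minute < prime_end_hour * 60:
            prime.append(value)
        else:
            non_prime.append(value)
    return {
        "Total Day": range_stats(prime + non_prime),
        "Prime Time": range_stats(prime),
        "Non Prime Time": range_stats(non_prime),
    }
```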

It can be noted in Figure 12 that the current overhead level for the dual
processor system is quite high (about 33% per processor). This is because of the
limited memory size (256K words) we currently have and the resulting increase in
swapping interrupt rate and I/O wait time. We have a proposal pending with the
AIM Executive committee to augment our memory which should reduce this overhead
down to our earlier single processor levels (about 15-20% per processor).

Figure 10. Average Diurnal Loading (3/77): Total Number of Jobs

Total Day (Low= 13.2, Ave= 23.7, High= 37.2)
Prime Time (Low= 13.3, Ave= 28.4, High= 37.2)
Non Prime Time (Low= 13.2, Ave= 17.9, High= 22.7)

[Plot: total number of jobs (scale to 50) versus Pacific time of day, 0 to 24.]

Figure 11. Average Diurnal Loading (3/77): Percent Time Used

Total Day (Low= 39.2, Ave= 92.6, High= 133.5)
Prime Time (Low= 39.2, Ave= 104.3, High= 133.5)
Non Prime Time (Low= 48.5, Ave= 78.1, High= 117.5)

[Plot: percent of CPU time used (scale to 200) versus Pacific time of day, 0 to 24.]