Technical Progress Section 2.1.2.5

2.1.2.5 Core Research

Over the past year we have supported several core research activities aimed at developing information resources, basic AI research, and tools of general interest to the SUMEX-AIM community. Specific areas of current effort include:

1) The AI Handbook, a compendium of knowledge about the field of Artificial Intelligence being compiled by students and investigators at several research facilities across the nation. The handbook is broad in scope, covering in a series of articles all of the important ideas, techniques, and systems developed during 20 years of research in AI. Each article is about four pages long and is written for non-AI specialists and students of AI. The AI Handbook effort is described in more detail in Section 4.2.1 on page 130, and an outline of the current contents of the handbook can be found in Appendix I.

2) The AGE project, an attempt to isolate inference, control, and representation techniques from previously developed knowledge-based programs; reprogram them for domain independence; write a rule-based interface that will help a user understand what the package offers and how to use the modules; and make the package available to other members of the AIM community. A more detailed description of progress on the AGE package can be found in Section 4.2.2 on page 133.

3) The MAINSAIL project, an attempt to demonstrate the design of an ALGOL-like language system that facilitates software transportability between different machine/operating system environments. A final report on this effort is given below.

It should be noted that SUMEX is providing only partial support for the AI Handbook and the AGE projects, with complementary support coming from an ARPA contract to the Heuristic Programming Project.
MAINSAIL System for Software Transportability

At the end of this grant year the MAINSAIL project will have successfully designed, demonstrated, and documented an ALGOL-like language system for machine-independent software design. This system includes the compiler, code generators, and run-time support for a range of target machine environments including TENEX, TOPS-20, TOPS-10, RT-11, RSX-11, and UNIX. The designs for other environments have been studied but resources have not allowed more extensive implementations. Within Council-approved funding and manpower limits and the AI charter of the SUMEX resource, we do not have access to the more extensive resources that would be required to continue effective development and export of this system beyond this initial research and demonstration phase. We are hopeful that the principal individuals involved (Messrs. Wilcox and Jirak and Ms. Dageforde) will be successful in forming a small private company to support and continue development of MAINSAIL with independent funding from a growing group of potential users.

The following is a final summary of this demonstration phase of the MAINSAIL effort. A detailed final report is in preparation.

The primary effort during the past year has been directed at making MAINSAIL a stable, maintainable, complete system ready for distribution and serious production programming. Implementations developed in prior years have been improved and new ones added. The number of evaluating users has increased, as has the number of application programs written in MAINSAIL. The project is now at the point where new implementations can be undertaken, and the groundwork for portability laid over the previous years can now begin to show its strength.

E. A. Feigenbaum 26
The compiler has undergone major examination, improvement, and reduction in the size of its data structures, and as a result is now able to run on machines with small address spaces (e.g., 32K words). The language itself has remained stable. The runtime system has not undergone any major modifications since September. Distribution of MAINSAIL beyond its initial test sites has begun.

Due to the increasing size of the MAINSAIL user community, the need for user support has also increased significantly. As part of our effort to evaluate MAINSAIL's effectiveness in actual applications we have provided user consultation within available resources, but we have been limited in the amount of help we could provide while continuing active development efforts.

A research project based on MAINSAIL is underway, aimed at providing an efficient program execution and development environment on a high-level language "MAINSAIL machine" which directly executes a tailor-made MAINSAIL instruction set.

a) Implementations

The PDP-10 TENEX version of MAINSAIL has now been in use for about three years at two local sites. A version for a somewhat non-standard TOPS-10 has been used locally to a lesser extent for two years. Standard TOPS-10 was implemented about a year and a half ago and received a moderate amount of use at a remote site. This year standard TOPS-10 was sent to four new sites, including the NIH DCRT Computer Facility.

A TOPS-20 implementation was derived from the TENEX implementation during the past year. The TOPS-20 version is not yet complete in that it is simply a TENEX implementation with a few minor modifications. Utilization of features of the KL-processor instruction set and proper handling of structures in file names have not yet been implemented, but are relatively straightforward additions. This version of the TOPS-20 implementation is now undergoing evaluation at a number of sites, and is becoming the most requested version of MAINSAIL.
Due to the interest in using MAINSAIL on machines with small address spaces, substantial development work was done during this past year on PDP-11 implementations. On many minicomputer configurations the limited address space can have an adverse effect on the performance of large programs. Whenever a working set of modules cannot be contained in primary memory the system begins to exhibit the classic thrashing condition. Since modules are normally swapped from disk, the attendant I/O overhead seriously degrades the program's performance.

Some minicomputers have additional memory which is not directly addressable. Typically, this memory can be accessed only by changing a hardware relocation device. A portable caching algorithm has been developed to allow MAINSAIL to take advantage of such memory to reduce the effects of thrashing. The additional memory is used in a two-tier storage hierarchy with the disk. Since access to the additional memory is much faster than access to the disk, swapping from the additional memory takes less time than swapping from the disk. MAINSAIL modules are maintained in the memory cache in a most-recently-used fashion. When the cache fills up, the least recently used module is bumped from the memory cache. The modular design of MAINSAIL made this caching approach quite natural, and the approach holds much promise for further utilization of memory hierarchies.

A PDP-11/40 running the RT-11 operating system has been running MAINSAIL programs for two years. The RSX-11M operating-system interface is complete. A PDP-11/34 running RSX-11M was used as our main testing site during the development of the compiler running in a small address space. The operating-system interface for UNIX is also complete. A few MAINSAIL utility programs have been run on a PDP-11/34 using UNIX and there are no outstanding problems. This implementation requires further testing.
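The module caching scheme described above can be sketched as follows. This is only an illustrative Python model, not the MAINSAIL implementation: the class name `ModuleCache`, the `load_from_disk` callable, and the capacity measured in whole modules are all assumptions made for the example. It shows the two-tier idea: a fetch is served from the fast extra memory when possible, and otherwise swaps the module in from disk and bumps the least recently used module if the cache is full.

```python
from collections import OrderedDict


class ModuleCache:
    """Toy two-tier module store: a small, fast cache in front of a slow disk."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity              # modules the extra memory can hold (assumed unit)
        self.load_from_disk = load_from_disk  # hypothetical callable: module name -> image
        self.cache = OrderedDict()            # ordered from least to most recently used
        self.disk_reads = 0                   # number of slow swaps from disk

    def fetch(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)      # mark as most recently used
            return self.cache[name]
        image = self.load_from_disk(name)     # slow path: swap in from disk
        self.disk_reads += 1
        self.cache[name] = image
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # bump the least recently used module
        return image


# Hypothetical module names, chosen only for the demonstration.
cache = ModuleCache(2, lambda name: "<image of %s>" % name)
for module in ["PARSER", "CODEGEN", "PARSER", "SYMTAB", "PARSER"]:
    cache.fetch(module)
print(cache.disk_reads, list(cache.cache))
```

In this run only three of the five fetches go to disk, because PARSER stays resident while CODEGEN, the least recently used module, is bumped to make room for SYMTAB.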
The runtime system has been run on a standard PDP-11/03 with DEC floppy disks. This was purely a demonstration effort for two main reasons: 1) the floppies are extremely slow, and 2) their storage capacity is insufficient for holding anything other than simple programs, since most of the storage is taken up by the operating system, its utilities, and the MAINSAIL runtimes.

The runtime system and the compiler have been used on a number of LSI-11 configurations. These configurations had either dual-density non-DEC floppy disks or an RK-equivalent hard disk pack. Some of these machines had an additional 32K words of video memory which MAINSAIL utilized as a module cache. Prior to the demonstration of the compiler under RSX-11M, the fastest PDP-11 compilation on record occurred on an LSI-11 with video memory and an RK-type disk.

The operating system interfaces, once written, have caused few problems. There have been two major sources of difficulty in implementing for the PDP-11:

1) The porting of data between machines is often difficult. We are hampered by the limited availability of compatible peripherals. For instance, our primary RT-11 development machine has neither magnetic tape nor floppy disks. It can communicate with other PDP-11's only by exchanging RK-type disk packs, or with SUMEX over a 2400 baud terminal line.

2) PDP-11 code generation is non-trivial, and a number of bugs were discovered. This can be contrasted with the PDP-10 code generators, which have caused almost no problems because of the richness of the PDP-10 instruction set and the ample word size.

The PDP-11 code generation problems have been decreasing in frequency. The demonstration of the compiler on the PDP-11 has increased confidence in the code generators, though floating point code generation is only now beginning to undergo extensive testing.

The problems of data exchange have proven substantial.
It is difficult to formulate a general strategy due to the diverse file systems which are encountered. These problems are often underestimated, resulting in unexpected delays in the development of portable systems.

A number of groups are interested in the development of other MAINSAIL implementations, including ones for the VAX, ECLIPSE and TI-990 computers. We hope to start work on these implementations this summer through the private company being formed.

b) Distribution and Use

The distribution beyond Stanford of PDP-10 versions of MAINSAIL was begun this year. MAINSAIL has been implemented at sites on the ARPANET, and ported via magnetic tape to other locations. All sites have been able to run MAINSAIL as soon as the files were taken off the tape. This is in contrast to the typical hardware manufacturer's software, which often takes days, or even weeks, to make executable. There are currently three sites using the TENEX version, six using the TOPS-10 version, and five using the TOPS-20 version.

Extensive work was done this year on various PDP-11 configurations, and the PDP-11 version is now beginning to be exported beyond the test sites here at Stanford. One user has been writing MAINSAIL programs for two years and running them on his PDP-11/40. These include a 3-dimensional graphics package, a code optimizer for the DEC VT-11 display processor, a flow rate monitor for a cell-sorter connected to the PDP-11, and machine-independent arbitrary-precision arithmetic routines. A geophysics group is writing MAINSAIL programs to extract data from ERTS tapes and then to perform a variety of image analyses on the extracted data. Another user is writing a machine-independent interprocessor communications facility.
A sampling of other MAINSAIL programs developed during the past year includes a machine-independent tape transfer program; a program which compares two text files and prints out the differences on a per-line basis; a program which forms a new text file from selected pages of existing text files; a "conference" program enabling more than two people at once to carry out an on-line discussion; a "calculator" program; and a record I/O package which, given a pointer to a record, prints out the values of the fields of the record. Work has also begun on a portable text editor.

A number of sites are now evaluating MAINSAIL with the intent of using it for substantial product development. In most cases, the sites are primarily attracted by MAINSAIL's portability, since no other language has previously played this role while at the same time providing a rich programming environment.

c) Compiler Design

A detailed analysis was made of the compiler, its algorithms, and its use of data structures. The goal was to reduce the size and number of data structures to allow the compiler to fit on machines with small address spaces, without sacrificing too much efficiency. Once the compiler was able to fit, the next goal was to improve efficiency and reduce compilation time.

First, an analysis of the compiler and of the runtime system uncovered some inefficiencies which were corrected. Next, various compiler configurations were examined. By configuration is meant the various ways in which procedures can be combined into modules. On a machine with a large address space the compiler is most efficient if it consists of a few modules, since that reduces the number of intermodule calls. But on a machine with a small address space, module swapping is necessary, and compilation time is roughly proportional to the number of swaps.
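To make the notion of a configuration concrete, the following Python sketch counts module swaps for a trace of procedure calls under a given grouping of procedures into modules. It is a simplified illustration only: the function `count_swaps`, the procedure and module names, the sizes, and the choice of least-recently-used replacement are all assumptions for the example, not details taken from the MAINSAIL compiler.

```python
def count_swaps(trace, config, module_size, memory):
    """Count module swap-ins for a procedure-call trace.

    trace       -- procedure names in call order
    config      -- dict mapping each procedure to the module containing it
    module_size -- dict mapping each module to its size (arbitrary units)
    memory      -- space available for resident modules (same units)
    """
    resident = []   # resident modules, least recently used first
    used = 0
    swaps = 0
    for proc in trace:
        module = config[proc]
        if module in resident:
            resident.remove(module)       # already in memory: refresh its position
            resident.append(module)
            continue
        swaps += 1                        # module must be swapped in
        while resident and used + module_size[module] > memory:
            used -= module_size[resident.pop(0)]   # evict least recently used (assumed policy)
        resident.append(module)
        used += module_size[module]
    return swaps


trace = ["scan", "parse", "scan", "emit", "scan", "parse"]

# Configuration A: every procedure in its own small module.
fine = {"scan": "M_SCAN", "parse": "M_PARSE", "emit": "M_EMIT"}
fine_sizes = {"M_SCAN": 4, "M_PARSE": 4, "M_EMIT": 4}

# Configuration B: scan and parse combined into one larger module.
coarse = {"scan": "FRONT", "parse": "FRONT", "emit": "BACK"}
coarse_sizes = {"FRONT": 8, "BACK": 4}

for memory in (8, 12):
    print(memory,
          count_swaps(trace, fine, fine_sizes, memory),
          count_swaps(trace, coarse, coarse_sizes, memory))
```

Even in this tiny example the swap count depends on both the grouping and the memory available, which is the trade-off a configuration study has to measure.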
We wanted to determine whether a "better" configuration (one which required less swapping) than that of the existing compiler could be found. A MAINSAIL program was written to simulate compilation on machines with various address spaces. The simulation was driven by exact data, obtained from traces of all procedure calls made during given compilations. A format was devised for easily specifying potential compiler configurations, and the simulator tested their efficiency. The resulting data showed that curves plotting amount of memory versus number of swaps are smoothly exponential. Examination of this data indicates that for 32K machines, another 10K would cut the number of module swaps in half, thus greatly increasing compilation speed.

As a result, two configurations are now in use: a "big" configuration to be run on machines with "large" address spaces, and a "small" configuration to be run on those with small address spaces. As predicted by the simulation, use of the "optimal" small configuration significantly increased the compilation speed on the PDP-11. Use of the "big" configuration, with just a few large modules, also improved the compiler speed on the PDP-10.

A new approach to code generation has been introduced over the past year. It utilizes tree structures for the intermediate representation, rather than the more primitive triples or quadruples. A tree structure is built for each procedure, and code is generated by walking the tree. This new approach will probably be used in all future code generators since it allows for procedure-wide optimization, and also supports the debugging version described later.

d) Language Design

The language itself has been very stable this past year, undergoing only a few simple additions. The fact that it has remained stable while supporting the past year of development is convincing evidence that the language has matured to the point of commercial viability.
The ability to access certain fields of the array descriptor for an array was added. These fields give the name and bounds of the array. Similarly, the name of a file can be accessed via a pointer to the file descriptor.

MAINSAIL originally guaranteed ASCII character codes. Last year, for portability reasons, it was decided that MAINSAIL would no longer specify the exact character set used, but only that minimal assumptions would be made about the character set. A number of system procedures were added to complement the guaranteed character set assumptions.

e) Runtime Design

At the time of the last annual report, a new runtime system, oriented toward execution efficiency and less memory utilization, was under implementation. It has been very stable since its completion in September 1978. Some examples of further improvements are: 1) the ability to have a map of memory printed, showing the number of pages used for control space, data space, and buffers; 2) tuning of the garbage collection facility; and 3) a new response to an error message that causes the printing of a table listing the procedure calls that led up to the call to the error message routine.

The concept of a "module library" has also been introduced. The output of each compilation (after assembly) is an executable file. When a program consists of a large number of modules, it quickly becomes inconvenient (if not impossible) to have a separate file for each executable module. "Module libraries", bulk repositories for modules, were designed to solve this problem. A utility module was written to provide the necessary management functions, as were procedures to insert and delete library files from a runtime list of libraries maintained by the MAINSAIL system. The MAINSAIL runtimes themselves, along with the compiler modules, now reside in module libraries.
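The module library idea can be illustrated with a small Python sketch. It is a loose model under stated assumptions, not the MAINSAIL runtime interface: the names `ModuleLibrary`, `Runtime`, `open_library`, `close_library`, and `find_module` are hypothetical, and the rule that libraries are searched in the order they were inserted is an assumption made for the example.

```python
class ModuleLibrary:
    """A bulk repository holding many executable modules in one place."""

    def __init__(self, name):
        self.name = name
        self.modules = {}                 # module name -> executable image

    def add(self, module_name, image):
        self.modules[module_name] = image

    def remove(self, module_name):
        self.modules.pop(module_name, None)


class Runtime:
    """Keeps the runtime list of libraries and resolves module names."""

    def __init__(self):
        self.libraries = []               # searched in insertion order (assumed)

    def open_library(self, library):
        if library not in self.libraries:
            self.libraries.append(library)

    def close_library(self, library):
        if library in self.libraries:
            self.libraries.remove(library)

    def find_module(self, module_name):
        for library in self.libraries:
            if module_name in library.modules:
                return library.modules[module_name]
        return None                       # not found in any open library


# Hypothetical library and module names for the demonstration.
system_lib = ModuleLibrary("SYSTEM")
system_lib.add("COMPILER", "<compiler image>")
user_lib = ModuleLibrary("USER")
user_lib.add("EDITOR", "<editor image>")

runtime = Runtime()
runtime.open_library(system_lib)
runtime.open_library(user_lib)
print(runtime.find_module("EDITOR"))
runtime.close_library(user_lib)
print(runtime.find_module("EDITOR"))
```

The point of the design is that a program made of many modules needs only a handful of library files, and the set of libraries consulted can change while the program runs.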
f) Emulation Research

MAINSAIL is being used as the basis of research into a language-oriented approach to program representation and execution. Such an approach starts with language characteristics, which determine program representation (instruction set) and execution environment, which in turn determine the processor architecture. This is in contrast to the conventional machine-oriented approach in which the instruction set and processor architecture exist independently of the language, and hence dictate the representation and limit the execution environment.

As technology provides increasing flexibility in machine design, high-level-language processors provide an alternative to general-purpose machine-language processors. The MAINSAIL compiler, with its retargetable code generators and large body of machine-independent software, is an ideal basis for this study.

A comprehensive study is being made of the static and dynamic characteristics of MAINSAIL programs. Based on this study, a number of language representations are obtained by varying two primary design criteria: the nature of an operand and the encoding of the instruction stream. The resulting representations range from a stream of bit-aligned fields which directly reflect the source language structure, to a sequence of simple instructions with highly-constrained operands.

Code generators, as well as instruction-set interpreters, have been developed for a number of such representations. Machine architectures which provide efficient implementations for these representations are also under exploration. The goal is to provide an extremely efficient MAINSAIL processor from the standpoint of program execution time as well as program development time. Such a processor should be viewed as a "language processor" rather than a general-purpose processor since it is designed explicitly for the purpose of executing a single high-level language.
A language processor can be used either as a stand-alone system which serves a single user, or as a component in a larger system consisting of many language processors (which need not all support the same language) that are assigned to appropriate user programs under control of an executive processor.

A MAINSAIL debugger based on this research is operational, though it has not been released for general use. This debugger involves an interpreter for a MAINSAIL instruction set (called "s-code", for structured code) which so closely captures the structure of MAINSAIL that it can be "decompiled" into what is essentially the source text, including the original variable names. The code generator for s-code utilizes the new tree-structured intermediate code, which is unbiased with regard to the form of the target code.

The mode of operation on a conventional computer involves compilation of those modules which are to be debugged into s-code. These s-code modules may be freely mixed with native code modules (e.g., modules compiled into the PDP-10 instruction set). During execution, MAINSAIL automatically determines when an s-code module is to gain control, and at that point gives control to the interpreter. The interpreter allows execution to progress in a manner which directly reflects the source program. The user can single step and place break points on the source-statement level, display and alter the values of variables, and display the decompiled text being executed.

A screen-oriented debugger would involve the cursor moving along the displayed text as it was being executed in single-step mode, with the user moving the cursor to points at which break points are to be displayed, or under variables whose values are to be displayed. The current debugger has been designed to support such an approach, but does not yet support this mode of operation.
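The statement-level debugging operations just described (single stepping, break points on source statements, and display and alteration of variables) can be illustrated with a toy Python model. Nothing here reflects the actual s-code format or the MAINSAIL debugger's commands: the `Debugger` class, its methods, and the representation of a program as a list of (source text, action) pairs are assumptions chosen to keep the sketch short.

```python
class Debugger:
    """Toy interpreter that executes a program one source statement at a time."""

    def __init__(self, statements):
        self.statements = statements   # list of (source text, function taking env)
        self.env = {}                  # variable name -> value
        self.pc = 0                    # index of the next statement to execute
        self.breakpoints = set()       # statement indexes at which to halt
        self.stopped_at = None         # break point we are currently halted on

    def step(self):
        """Execute one statement; return its source text, or None when finished."""
        if self.pc >= len(self.statements):
            return None
        text, action = self.statements[self.pc]
        action(self.env)
        self.pc += 1
        return text

    def run(self):
        """Run until a break point or the end; return the halt index or None."""
        while self.pc < len(self.statements):
            if self.pc in self.breakpoints and self.stopped_at != self.pc:
                self.stopped_at = self.pc      # halt before executing this statement
                return self.pc
            self.stopped_at = None
            self.step()
        return None


# A three-statement program written in an invented notation.
program = [
    ("x := 2",      lambda env: env.update(x=2)),
    ("y := x * 10", lambda env: env.update(y=env["x"] * 10)),
    ("z := x + y",  lambda env: env.update(z=env["x"] + env["y"])),
]

dbg = Debugger(program)
dbg.breakpoints.add(2)          # break before "z := x + y"
halted = dbg.run()
print(halted, dbg.env)          # display variables at the break point
dbg.env["y"] = 5                # alter a variable, then resume
dbg.run()
print(dbg.env["z"])
```

Because each unit of execution corresponds to one source statement, the source text can be shown alongside execution at every step, which is the property the decompilable representation is meant to provide.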
Program execution can be made to halt based on a variety of conditions such as entry to a particular module or procedure; execution of a particular statement; or execution of a specified number of statements since the start of the program. This latter type of break point allows the user to restart a program which encountered an error, and have it break a specified number of statement executions before the error. Single step operation then allows examination of the execution environment on a statement-by-statement basis up to the point of the error.

Whenever the s-code interpreter detects an error (e.g., subscript out of range), it gives control to the debugger, which informs the user what module, procedure and statement caused the error, and displays decompiled text around the statement. The user can then use the full power of the debugger to determine the source of the error. The entire runtime system and compiler can now be interpreted in this fashion.

The "MAINSAIL machine" being designed as part of the research will directly execute the s-code representation, i.e., s-code is the (macro) instruction set of the machine. Due to the compactness of s-code (approximately one-third the size of equivalent PDP-10 code), and its transparency with respect to the MAINSAIL execution environment, the MAINSAIL machine will provide optimized program execution along with the debugging capabilities. Since s-code is the instruction set of the MAINSAIL machine, all modules can be decompiled and debugged with no penalty in execution speed.

2.1.2.6 User Software and Intra-Community Communication

We have continued to assemble and maintain a broad range of utilities and user support software.
These include operational aids, statistics packages, DEC-supplied programs, improvements to the TOPS-10 emulator, text editors, text search programs, file space management programs, graphics support, a batch program execution monitor, text formatting and justification assistance, and magnetic tape conversion aids. Over the past year we have undertaken several significant development efforts to provide needed new programs to the SUMEX-AIM community. These include:

1) TTYFTP - A number of users have needed to move files between their local machines and SUMEX but were not connected to the ARPANET. Examples include the transfer of data between the PUFF project at Pacific Medical Center in San Francisco and SUMEX, distribution of MAINSAIL to various non-network sites, and movement of instrument data in support of the DENDRAL or Ultrasound Imaging (Ob-Gyn) projects. We have undertaken development of a file transfer program usable over any teletype line (hardline, dial-up, TYMNET, etc.) which incorporates appropriate control protocols and error checking. The design is based on the DIALNET protocols designed by Crispin at the Stanford AI Laboratory. Differences from DIALNET were necessary to achieve machine and data source independence. We also expanded the DIALNET packet opcodes to include a new packet (RCT) which prevents data overruns, and augmented the DIALNET "request for connection" packet to contain additional needed parameters.

TTYFTP is written in MAINSAIL so that we can take advantage of the machine independence inherent in the language. The program is written modularly, and has a scheduler module which can service up to eight FTP modules per line, one packet processor per line, and multiple lines. Because of this it can run as either a user process or a server process. The latter can be either a listening server (handling incoming lines) or a host server, started up by a user program and then logged off after all transfers are complete.
We have preserved DIALNET compatibility so that we will be able to communicate with machines running DIALNET. After the TENEX implementation is completed, we will make the changes necessary to connect TENEX to a PDP-11 RT-11 system and follow that with an RSX-11M version. Since MAINSAIL is up and running under all three of these operating systems, this process is greatly simplified.

2) EMACS - We have continued to import and support the EMACS text editing system from MIT. This editor offers a broader range of services than TVEDIT but has lacked a smoothly human-engineered interface. Substantial effort has gone into developing macro packages that improve the human engineering features of EMACS and into providing introductory documentation for new users. This has been closely coordinated with similar efforts at SRI and MIT. A community of EMACS users is now developing at SUMEX.

3) ARCHED - In order to facilitate management of file archive directories, we have been developing a display-oriented editor to give improved interaction when posting retrieval requests and to allow records of previously archived files to have descriptive comments attached, be expunged (because they are outdated), or be moved into secondary archive directories. Facilities will exist to allow viewing files based on name template specifications or date constraints.

We have also made changes and updates to many of the existing programs. While many of these changes were maintenance bug fixes, major efforts were involved in bringing up new versions of PASCAL, SAIL, BACKUP, MACRO, LINK10, GLOB, PA1050, and a new set of utility routines used by many of the DEC CUSP's. Improvements were made in PUB (a text formatting program), MSG (a message reading program written by J. Vittal), and BBD (the bulletin board reading program developed at SUMEX).
Several other new programs are in various stages of being brought up on the system, including Knuth's text publication system, TEX; a program to periodically update a news summary file from the AP news service files kept at the Stanford AI Laboratory, APNEWS; a program to connect to the Stanford Center for Information Processing machines, GOTRAN; an improved program to locate users on the SUMEX system and on other ARPANET sites, FIND; and an improved mail facility for GUESTS.

2.1.2.7 Documentation and Education

We have spent considerable effort to develop, maintain, and facilitate access to our documentation so as to accurately reflect available software. The HELP and Bulletin Board subsystems have been important in this effort. As subsystems are updated, we generally publish a bulletin or small document describing the changes. As more and more changes occur, it becomes harder and harder for users to track down all of the change pointers. We are in the process of reviewing the existing documentation system again for compatibility with the programs now on line and to integrate changes into the main documents. This will also be done with a view toward developing better tools for maintaining up-to-date documentation.

2.1.2.8 Software Compatibility and Sharing

At SUMEX-AIM we firmly believe in importing rather than reinventing software where possible. As noted above, a number of the packages we have brought up are from outside groups. Many avenues exist for sharing between the system staff, various user projects, other facilities, and vendors. The advent of fast and convenient communication facilities coupling communities of computer facilities has made possible effective intergroup cooperation and decentralized maintenance of software packages. The TENEX sites on the ARPANET have been a good model for this kind of exchange based on a functional division of labor and expertise.
The other major advantage is that as a by-product of the constant communication about particular software, personal connections between staff members of the various sites develop. These connections serve to pass general information about software tools and to encourage the exchange of ideas among the sites. Certain common problems are now regularly discussed on a multi-site level. We continue to draw significant amounts of system software from other ARPANET sites, reciprocating with our own local developments. Interactions have included mutual backup support, experience with various hardware configurations, experience with new types of computers and operating systems, designs for local networks, operating system enhancements, utility or language software, and user project collaborations. We have been able to import many new pieces of software and improvements to existing ones in this way. Examples of imported software include the message manipulation program MSG, TENEX SAIL, PASCAL, TENEX SOS, INTERLISP, the RECORD program, ARPANET host tables, and many others. Reciprocally, we have exported our contributions such as the crash analysis program, drum page migration system, KI-10 page table efficiency improvements, GTJFN enhancements, PUB macro files, the bulletin board system, MAINSAIL, SPELL, SNDMSG enhancements, our BATCH monitor, and improved SA-10 software.

We have also assisted groups that have interacted with SUMEX user projects in getting access to software available in our community. For example, Prof. Dreiding's group in Switzerland became interested in some of the system software available here after attending the DENDRAL CONGEN workshops (see Section 4.2.3 on page 139). We have provided him with the non-licensed programs requested.

2.1.3 Resource Management

2.1.3.1 Organization

The SUMEX-AIM resource is administered jointly by the Departments of Genetics and Computer Science of Stanford University. Its mission, locally and nationally, entails both the recruitment of appropriate research projects interested in medical AI applications and the catalysis of interactions among these groups and the broader medical community. User projects are separately funded and autonomous in their management. They are selected for access to SUMEX on the basis of their scientific and medical merits as well as their commitment to the community goals of SUMEX. Currently active projects span a broad range of application areas such as clinical diagnostic consultation, molecular biochemistry, belief systems modeling, mental function modeling, and instrument data interpretation (descriptions of the individual collaborative projects are in Section 4 beginning on page 64).

At the end of the last grant year, Professor Lederberg assumed his new role as president of Rockefeller University and Professor Feigenbaum, chairman of the Stanford Department of Computer Science, took over as principal investigator of the SUMEX project. This management transition took place without interruption, and the SUMEX-AIM community continues to function with the same high level of vitality as before. This is due, in large part, to the depth of Professor Feigenbaum's prior involvement as co-principal investigator and Stanford's multi-disciplinary support of SUMEX-AIM. Professor Lederberg continues an active role in the SUMEX-AIM community as chairman of the AIM Executive Committee and on a more frequent basis through the system message facilities.

Professor Stanley Cohen has continued his role on the Stanford SUMEX Advisory Committee and has assumed a new role on the national AIM Executive Committee. He provides biomedical ties and coordination with the Stanford Medical School and projects.

2.1.3.2 Management Committees

Since the SUMEX-AIM project is a multilateral undertaking by its very nature, we have created several management committees to assist in administering the various portions of the SUMEX resource. As defined in the SUMEX-AIM management plan adopted at the time the initial resource grant was awarded, the available facility capacity is allocated 40% to Stanford Medical School projects, 40% to national projects, and 20% to common system development and related functions. Within the Stanford aliquot, Prof. Feigenbaum and BRP have established an advisory committee to assist in selecting and allocating resources among projects appropriate to the SUMEX mission. The current membership of this committee is listed in Appendix III.

For the national community, two committees serve complementary functions. An Executive Committee oversees the operations of the resource as related to national users and makes the final decisions on authorizing admission for new projects and revalidating continued access for existing projects. It also establishes policies for resource allocation and approves plans for resource development and augmentation within the national portion of SUMEX (e.g., hardware upgrades, significant new development projects, etc.). The Executive Committee oversees the planning and implementation of the AIM Workshop series currently implemented under Prof. S. Amarel of Rutgers University and assures coordination with other AIM activities as well. The committee will play a key role in assessing the possible need for additional future AIM community computing resources and in deciding the optimal placement and management of such facilities. The current membership of the Executive Committee is listed in Appendix III.

Reporting to the Executive Committee, an Advisory Group represents the interests of medical and computer science research relevant to AIM goals.
The Advisory Group serves several functions in advising the Executive Committee: 1) recruiting appropriate medical/computer science projects, 2) reviewing and recommending priorities for allocation of resource capacity to specific projects based on scientific quality and medical relevance, and 3) recommending policies and development goals for the resource. The current Advisory Group membership is given in Appendix III.

These committees have actively functioned in support of the resource. Except for the meetings held during the AIM workshops, the committees have "met" by messages, net-mail, and telephone conference owing to the size of the groups and to save the time and expense of personal travel to meet face to face. The telephone meetings, in conjunction with terminal access to related text materials, have served quite well in accomplishing the agenda business and greatly facilitate the arrangement of meetings. Other solicitations of advice requiring review of sizable written proposals are done by mail.

We will continue to work with the management committees to recruit the additional high quality projects which can be accommodated and to evolve resource allocation policies which appropriately reflect assigned priorities and project needs. We will continue to make information available about the various projects both inside and outside of the community and thereby promote the kinds of exchanges exemplified earlier and made possible by network facilities.

2.1.3.3 New Project Recruiting

The SUMEX-AIM resource has been announced through a variety of media as well as by correspondence, contacts of NIH-BRP with a variety of prospective grantees who use computers, and contacts by our own staff and committee members. The number of formal projects that have been admitted to SUMEX has more than trebled since the start of the project to a current total of 9 national AIM projects and 8 Stanford projects. Others are working tentatively as pilot projects or are under review.

We have prepared a variety of materials for the new user ranging from general information such as is contained in a SUMEX-AIM overview brochure to more detailed information and guidelines for determining whether a user project is appropriate for the SUMEX-AIM resource. Dr. E. Levinthal has prepared a questionnaire to assist users seriously considering applying for access to SUMEX-AIM. Pilot project categories have been established both within the Stanford and national aliquots of the facility capacity to assist and encourage new projects in formulating possible AIM proposals and pending their application for funding support. Pilot projects are approved for access for limited periods of time after preliminary review by the Stanford or AIM Advisory Group as appropriate to the origin of the project.

These contacts have sometimes done much more than provide support for already formulated programs. For example, Prof. Feigenbaum's group at Stanford previously initiated a major collaborative effort with Dr. Osborn's group at the Institutes of Medical Sciences in San Francisco. This project in "Pulmonary Function Monitoring and Ventilator Management - PUFF/VM" (see Section 4.1.7 on page 98) originated as a pilot request to use MLAB in a small way for modeling. Subsequently the AI potentialities of this domain were recognized by Feigenbaum, Nii, and Osborn and a joint proposal was submitted to and funded by NIH. This past summer John Kunz from Dr. Osborn's laboratory spent approximately half time at Stanford to learn more about AI research and to participate more closely in the development of the PUFF/VM program. Similarly, Prof. Feigenbaum and Ms. Nii recently spent two days with Profs. Kintsch and Polson at the University of Colorado, introducing them to the newly developed AGE package for use in formulating their program on modeling aspects of human cognition.

The following lists the fully authorized projects currently comprising the SUMEX-AIM community (see Section 4 for more detailed descriptions). The nucleus of five projects that were authorized at the initial funding of the resource in December 1973 is marked by "<i>" and the new projects admitted this past year by "<n>".

3) Hierarchical Models of Human Cognition; Drs. W. Kintsch and P. Polson (University of Colorado)
4) Higher Mental Functions Project; K. Colby, M.D. (University of California at Los Angeles)
5) INTERNIST Project; J. Myers, M.D. and Dr. H. Pople (University of Pittsburgh)
6) Medical Information Systems Laboratory (MISL); M. Goldberg, M.D. and Dr. B. McCormick (University of Illinois at Chicago Circle)
7) Pulmonary Function Project (PUFF/VM); J. Osborn, M.D. (Institutes of Medical Sciences, San Francisco) and Dr. E. Feigenbaum (Stanford University)
8) Rutgers Computers in Biomedicine; Dr. S. Amarel (Rutgers University)
9) Simulation of Comprehension Processes; Drs. J. Greeno and A. Lesgold (University of Pittsburgh)

Stanford Community -

1) AI Handbook Project; Dr. E. Feigenbaum
<i> 2) DENDRAL Project; Drs. C. Djerassi and E. Feigenbaum
3) Generalization of AI Tools (AGE); Dr. E. Feigenbaum
4) Large Multi-processor Arrays (HYDROID); Dr. G. Wiederhold
5) Molecular Genetics Project (MOLGEN); Dr. E. Feigenbaum and L. Kedes, M.D.
<i> 6) MYCIN Project; E. H. Shortliffe, M.D. and Dr. B. Buchanan
7) Protein Structure Modeling; Drs. E. Feigenbaum and R. Engelmore
<n> 8) RX Project; R. Blum, M.D.

As an additional aid to new projects or collaborators with existing projects, we provide a limited amount of funds to support the terminal and communications needs of users without access to such equipment. We are currently providing support for 6 terminals and 4 modems for users as well as a leased line between Stanford and the University of California at Santa Cruz for the Chemical Synthesis project.

2.1.3.4 Stanford Community Building

The Stanford community has undertaken several internal efforts to encourage interactions and sharing between the projects centered here. Professor Feigenbaum organized a project with the goal of assembling a handbook of AI concepts, techniques, and current state-of-the-art. This project has had enthusiastic support from the students and substantial progress has been made in preparing many sections of the handbook (see Section 4.2.1 on page 130 for more details). Weekly informal lunch meetings (SIGLUNCH) are also held among community members to discuss general AI topics, concerns and progress of individual projects, or system problems as appropriate. In addition, presentations from a substantial number of outside speakers are invited.

2.1.3.5 Existing Project Reviews

We have conducted a continuing careful review of on-going SUMEX-AIM projects to maintain a high scientific quality and relevance to our medical AI goals and to maximize the resources available for newly developing applications projects. At the last AIM workshop, meetings of the AIM Advisory Group and Executive Committee were held to review the national AIM projects. These groups recommended continued access for all formal projects then on the system. They also recommended phasing out the Organ Culture pilot project.

In the fall of 1978, meetings of the Stanford Advisory Group were held to review projects supported out of the Stanford aliquot. The recommendation of this group was to phase out support for the Hydroid Project, pending work more directly applicable to SUMEX-AIM goals. The group also recommended phasing out the Quantum Chemistry and Genetics Applications pilot projects unless stronger AI relevance were established immediately. The Quantum Chemistry project is attempting to develop ties to the DENDRAL stereochemistry effort. Since Prof.
Loew will move to Rockefeller University this summer, her access to SUMEX would come under the jurisdiction of the AIM Executive Committee and we will ask them to review her application for continued support. The Genetics Applications project has acquired its own machine for statistical calculations on genetic demographic data and has stopped using SUMEX.

2.1.3.6 AIM Workshop Support

The Rutgers Computers in Biomedicine resource (under Dr. Saul Amarel) has organized a series of workshops devoted to a range of topics related to artificial intelligence research, medical needs, and resource sharing policies within NIH. Meetings have been held for the past several summers at Rutgers. In May 1979, a mini-AIM workshop devoted to clinical diagnosis programs was organized by MIT-Tufts and Rutgers and held in Vermont. This meeting was small (about 25 attendees) and emphasized detailed technical discussions about system designs and the strengths and weaknesses of various approaches. Many of the attendees were graduate students in order to maximize the benefit of personal contacts and discussions for on-going research projects. Topics covered in the discussions included state-of-the-art in explanation, causality in reasoning, strategies of focusing and dealing with multiple diagnostic problems, issues of representation and grain of description, creating and updating a knowledge base, planning strategies, issues of time representation, and inexact reasoning.

The SUMEX facility has served as a communications base for workshop planning and provided support for workshop demonstrations when requested. We expect to continue this support for future workshops. The AIM workshops provide much useful information about the strengths and weaknesses of the performance programs both in terms of criticisms from other AI projects and in terms of the needs of practicing medical people. We plan to continue to use this experience to guide the community building aspects of SUMEX-AIM.

2.1.3.7 Resource Allocation Policies

As the SUMEX facility has become increasingly loaded, a number of diverse and conflicting demands have arisen which require controlled allocation of critical facility resources (file space and central processor time). We have already spelled out a policy for file space management; an allocation of file storage is defined for each authorized project in conjunction with the management committees. This allocation is divided among project members in any way desired by the individual principal investigators. System allocation enforcement is implemented by project each week. As the weekly file dump is done, if the aggregate space in use by a project is over its allocation, files are archived from user directories over allocation until the project is within its allocation.

We have implemented effective system scheduling controls (see page 16) to attempt to maintain the 40:40:20 balance in terms of CPU utilization and to avoid system and user inefficiencies during overload conditions. The initial complement of user projects justifying the SUMEX resource was centered to a large extent at Stanford. Over the past five years of the SUMEX grant, a substantial growth in the number of national projects was realized. During the same time the Stanford group of projects has matured as well and in practice the 40:40 split between Stanford and non-Stanford projects is not ideally realized (see Figure 11 on page 49 and the tables of recent project usage on page 52). Our job scheduling controls bias the allocation of CPU time based on percent time consumed relative to the time allocated over the 40:40:20 community split. The controls are "soft" however in that they do not waste computer cycles if users below their allocated percentages are not on the system to consume the cycles.
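
The "soft" bias described above can be illustrated with a small sketch. This is not the SUMEX monitor code, which is not reproduced in this report; it is a minimal Python illustration under stated assumptions. The names `ALLOCATION`, `Job`, `usage_ratio`, and `pick_next` are hypothetical, and the rule of favoring the community with the lowest consumed-to-allocated ratio is one plausible reading of "bias based on percent time consumed relative to the time allocated".

```python
from dataclasses import dataclass

# Target shares from the 40:40:20 community split (keys are illustrative).
ALLOCATION = {"stanford": 0.40, "aim": 0.40, "staff": 0.20}


@dataclass
class Job:
    name: str       # hypothetical job label
    community: str  # one of the ALLOCATION keys


def usage_ratio(community, cpu_used):
    """Share of CPU consumed so far, divided by the community's target share."""
    total = sum(cpu_used.values())
    if total == 0:
        return 0.0
    return (cpu_used[community] / total) / ALLOCATION[community]


def pick_next(runnable, cpu_used):
    """Choose the runnable job whose community is furthest below its share.

    Only communities that currently have a runnable job are considered, so
    the unused share of an absent community is never held in reserve. That
    is the assumed meaning of "soft" enforcement here.
    """
    if not runnable:
        return None
    return min(runnable, key=lambda job: usage_ratio(job.community, cpu_used))
```

With hypothetical consumption figures of 60, 30, and 10 units for the Stanford, AIM, and staff communities, an AIM job would be preferred over a Stanford job when both are runnable, but a Stanford job still runs immediately when it is the only one waiting.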

The operating disparity in CPU use to date reflects a substantial difference in demand between the Stanford community and the developing national projects, rather than inequity of access. For example, the Stanford utilization is spread over a large part of the 24-hour cycle, while national-AIM users tend to be more sensitive to local prime-time constraints. (The 3-hour time zone phase shift across the continent is of substantial help in load balancing.) During peak times under the new overload controls, the Stanford community still experiences mutual contentions and delays while the AIM group has relatively open access to the system. For the present, we propose to continue our policy of "soft" allocation enforcement for the fair split of resource capacity.

Our system also categorizes users in terms of access privileges. These comprise fully authorized users, pilot projects, guests, and network visitors in descending order of system capabilities. We want to encourage bona fide medical and health research people to experiment with the various programs available with a minimum of red tape while not allowing unauthenticated users to bypass the advisory group screening procedures by coming on as guests. So far we have had relatively little abuse compared to what other network sites have experienced, perhaps on account of the personal attention that senior staff gives to the logon records, and to other security measures. However, the experience of most other computer managers behooves us to be cautious about being as wide open as might be preferred for informal service to pilot efforts and demonstrations. We will continue developing this mechanism in conjunction with management committee policy decisions.

We have also encouraged mature projects to apply for their own machine resources in order to preserve the SUMEX-AIM resource for research and development efforts and to support projects unable to justify their own machines. The DENDRAL project is currently applying for a VAX machine to support their planned development and program export work. This machine would be integrated with the SUMEX resource through the planned local network and would be dedicated to biomolecular structure elucidation problems. At the same time it would give SUMEX resource staff experience with the VAX architecture in anticipation of projected developments within the ARPANET AI community to move toward that machine for INTERLISP support. Other projects may make similar proposals in the near future.

2.1.4 Future Plans

Our plans for the next grant year are a continuation of the work in progress as discussed earlier. Specific goals are outlined below. Objectives for the individual collaborating projects are discussed in their respective reports (see Section 4 beginning on page 64).

1) RESOURCE OPERATIONS

We will continue to make available to the SUMEX-AIM communities an effective, state-of-the-art facility to support the development of medical AI programs and to facilitate collaborations between community members. Goals include:

a) Continue development of the existing KI-TENEX facility to maximize effectiveness for community use. We expect to continue improving system reliability and efficiency, subsystem software, documentation and user help facilities, and communications facilities.

b) Finish procurement of a satellite machine (DEC 2020) and integrate it into the existing SUMEX-AIM facility. This will include developing necessary hardware and software interfaces (Ethernet) and evolving management policies and tools with the AIM Executive Committee to allocate this resource most effectively to meet community needs.
This system will also give us experience with the many issues of distributing computing resources among collaborating projects that we expect to face in future years.

c) Recruit new applications and projects to broaden the range of high quality medical AI applications. Several potential user projects are currently pending review and we will explore others that might be suggested by advisory group members or other contacts. We will continue to review existing projects in relation to SUMEX AI goals and capacity and to encourage the development of independent resources to support mature projects.

d) We plan to work closely with other AIM resource nodes, such as the one at Rutgers, to ensure effective community support between the facilities and to take further advantage of expertise in various user groups for system and user software development.

e) We will submit an application for a follow-on renewal term to the current 3-year grant which terminates in July 1981. This application will focus on continued development of artificial intelligence tools and applications central to the needs of medical science and the development of effective computing resources within the SUMEX-AIM community to enable progress towards those goals.

2) TRAINING AND EDUCATION

Within our resources, we will continue to assist new and established user projects in gaining access to SUMEX-AIM facilities. Collaborating projects will provide their own manpower and expertise for the development and dissemination of their AI programs. Goals include:

a) Continue to provide a high standard of system documentation and limited staff assistance for user problems.

b) Allocate funds approved for "collaborative linkages" in cooperation with the AIM Executive Committee to assist collaborating projects to meet their needs for communication and access to the SUMEX-AIM resource.

c) Provide continued support for the AIM workshop activities in the form of demonstration support, participation in workshop discussions, and assistance for potential pilot users in understanding the SUMEX-AIM community.

3) CORE RESEARCH

Next year, no further work is planned on the MAINSAIL project for which highly successful initial design and demonstration phases were completed this past year. Our core research work will emphasize continued development of tools of general interest to the SUMEX-AIM community, AI information resources, and basic efforts to understand and build knowledge-based "intelligent agent" programs. This work will complement on-going collaborator project developments by providing links to make more general results available to the entire community. We will continue to provide partial funding for selected individuals in the Stanford Heuristic Programming Project for these core research goals with special relevance to SUMEX medical AI applications. This support is an appropriate share, complementing funding from other sources such as ARPA and NSF. Attention will be focused on a number of areas of research:

a) AI Handbook - complete publication of Volume I and concentrate on the research, draft, and external review process for Volume II.

b) AGE - improve the user interface to the AGE "tool kit" including tutorial and design assistance subsystems. We will also extend the range of tools available including such mechanisms as backward-chained inference, heuristic search, portions of the MOLGEN "units" package, and semantic networks.

c) Representation - design appropriate symbolic structures for modeling knowledge about a problem. Presently this phase is carried out entirely by system builders. Goals are to codify the knowledge used to make such decisions, both as an aid to the system builders and ultimately to enable programs themselves to choose appropriate representations.

d) Reasoning - model the appropriate inference mechanisms for a problem and build systems that incorporate those models.

e) Knowledge acquisition - design of systems that acquire knowledge by communication with human experts.

f) Multiple uses of knowledge - design of systems that use the symbolic representation of the domain knowledge for additional purposes such as consensus building (accommodating conflicting advice from experts whose competence may be equal but whose "styles" vary), tutoring of human students by employing the knowledge base (both the information it contains and the way it is organized), and explanation (constructing a chain of rules which satisfactorily rationalize the system's behavior to an observer).

2.2 Summary of Resource Usage

The following data give an overview of SUMEX-AIM resource usage. There are four subsections containing data respectively for 1) overall system loading, 2) resource use by community, 3) resource use by project, and 4) network use.

2.2.1 Overall System Loading

The following plots display several different aspects of system loading over the life of the project. These include total CPU time delivered per month, the peak number of jobs logged in, and the peak load average. The monthly "peak" value of a given variable is the average of the daily peak values for that variable during the month. Thus, these "peak" values are representative of average monthly loading maxima and do not reflect the largest excursions seen on individual days.

These data show well the continued growth of SUMEX use and the self-limiting saturation effect of system load average, especially after installation of our overload controls early in 1978. Since late 1976, when the dual processor capacity became fully used, the peak daily load average has remained between about 5.5 and 6.
This is a measure of the user capacity of our current hardware configuration and the mix of AI programs.

[Figure 8. Total CPU Time Consumed by Month. Plot of total CPU usage, 1975 through 1979, annotated with: 256K Memory Added, Disks Upgraded, Dual Processor Installed.]

[Figure 9. Peak Number of Jobs by Month. Plot of peak jobs logged in, 1975 through 1979, annotated with: 256K Memory Added, Disks Upgraded, Dual Processor Installed.]

[Figure 10. Peak Load Average by Month. Plot of peak load average, 1975 through 1979, annotated with: 256K Memory Added, Dual Processor Installed.]

2.2.2 Relative System Loading by Community

The SUMEX resource is divided, for administrative purposes, into 3 major communities: user projects based at the Stanford Medical School, user projects based outside of Stanford (national AIM projects), and common system development efforts. As defined in the resource management plan approved by BRP at the start of the project, the available system CPU capacity and file space resources are divided between these communities as follows:

Stanford   40%
AIM        40%
Staff      20%

The "available" resources to be divided up in this way are those remaining after various monitor and community-wide functions are accounted for. These include such things as job scheduling, overhead, network service, file space for subsystems, documentation, etc.

The monthly usage of CPU and file space resources for each of these three communities relative to their respective aliquots is shown in the plots in Figure 11 and Figure 12. Terminal connect time is shown in Figure 13.
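
As a reading aid for the community plots, the comparison of usage against aliquot can be sketched as follows. This is an illustrative Python sketch, not the accounting program used at SUMEX. The function names are hypothetical, the usage figures in the usage note are invented for illustration and are not measured data, and the sketch assumes that the "available" capacity is simply the sum of the three communities' usage after monitor and community-wide overhead has been excluded.

```python
# Aliquots, in percent, from the resource management plan.
ALIQUOT_PERCENT = {"Stanford": 40.0, "AIM": 40.0, "Staff": 20.0}


def community_shares(cpu_hours):
    """Percent of the available (non-overhead) CPU time used by each community."""
    available = sum(cpu_hours.values())
    if available <= 0:
        raise ValueError("no community usage recorded for this period")
    return {name: 100.0 * hours / available for name, hours in cpu_hours.items()}


def relative_to_aliquot(shares):
    """Ratio of each community's share to its aliquot (1.0 means exactly on target)."""
    return {name: shares[name] / ALIQUOT_PERCENT[name] for name in shares}
```

For a hypothetical month with 110, 60, and 30 CPU hours charged to the Stanford, AIM, and staff communities, the shares come to 55%, 30%, and 15%, so Stanford would sit above its 40% aliquot while the other two communities would sit below theirs.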

It is clear that the Stanford projects have held an edge in system usage despite our efforts at resource allocation and the substantial voluntary efforts by the Stanford community to utilize non-prime hours. This reflects the maturity of the Stanford group of projects relative to those getting started on the national side and has correspondingly accounted for much of the progress in AI program development to date.

[Figure 11. Monthly CPU Usage by Community. Three panels (National AIM, Stanford, Staff) plotting % CPU used by month.]

[Figure 12. Monthly File Space Usage by Community. Three panels (National AIM, Stanford, Staff) plotting % file space used by month, each annotated with: Disks Upgraded.]