SUMEX
STANFORD UNIVERSITY MEDICAL EXPERIMENTAL COMPUTER RESOURCE

COMPETING RENEWAL APPLICATION RR-00785

BOOK I
RESEARCH PROPOSAL

Submitted to
BIOTECHNOLOGY RESOURCES PROGRAM
NATIONAL INSTITUTES OF HEALTH

June 1, 1977

DEPARTMENT OF GENETICS
STANFORD UNIVERSITY SCHOOL OF MEDICINE

Joshua Lederberg, Principal Investigator

Privileged Communication

Table of Contents - BOOK I

Table of Contents - BOOK I
List of Figures
Table of Contents - BOOK II
1. BACKGROUND AND PROPOSED WORK
  1.1 OVERVIEW OF OBJECTIVES AND RATIONALE
  1.2 SIGNIFICANCE
  1.3 BACKGROUND AND PROGRESS
    1.3.1 PROGRESS SUMMARY
    1.3.2 DETAILED PROGRESS REPORT
      1.3.2.1 DEFINITION OF TERMS AND OBJECTIVES
      1.3.2.2 FACILITY HARDWARE
      1.3.2.3 SYSTEM SOFTWARE
      1.3.2.4 NETWORK COMMUNICATION FACILITIES
      1.3.2.5 SYSTEM RELIABILITY AND BACKUP
      1.3.2.6 PROGRAMMING LANGUAGES
      1.3.2.7 STANFORD AI HANDBOOK PROJECT
      1.3.2.8 USER SOFTWARE AND INTRA-COMMUNITY COMMUNICATION
      1.3.2.9 DOCUMENTATION AND EDUCATION
      1.3.2.10 SOFTWARE COMPATIBILITY AND SHARING
      1.3.2.11 RESOURCE MANAGEMENT
      1.3.2.12 SUMMARY OF RESOURCE USAGE
      1.3.2.13 NETWORK USAGE STATISTICS
      1.3.2.14 PUBLICATIONS
      1.3.2.15 RESOURCE STAFFING HISTORY
2. SPECIFIC AIMS
  2.1 RESOURCE OPERATIONS AIMS
  2.2 TRAINING AND EDUCATION AIMS
  2.3 CORE RESEARCH AIMS
3. METHODS OF PROCEDURE
  3.1 RESOURCE OPERATIONS PLANS
    3.1.1 SYSTEM HARDWARE AND MONITOR PLANS
    3.1.2 COMMUNICATION NETWORK PLANS
    3.1.3 SOFTWARE SUPPORT PLANS
    3.1.4 COMMUNITY MANAGEMENT PLANS
  3.2 TRAINING AND EDUCATION PLANS
  3.3 CORE RESEARCH PLANS
    3.3.1 GENERALIZATION OF AI TECHNIQUES
      3.3.1.1 DESIGN OF KNOWLEDGE-BASED CONSULTATION SYSTEMS
      3.3.1.2 ATTEMPT TO GENERALIZE (AGE) PACKAGE
      3.3.1.3 PLAN PACKAGE
      3.3.1.4 HEURISTIC KNOWLEDGE ACQUISITION
      3.3.1.5 GENERAL EXPLANATION SYSTEM
    3.3.2 SOFTWARE EXPORT ALTERNATIVES
      3.3.2.1 NETWORK ACCESS
      3.3.2.2 MACHINE-INDEPENDENT LANGUAGE IMPLEMENTATION
      3.3.2.3 EXPORTABLE (PDP-10) SYSTEM
    3.3.3 EXPORTABLE MACHINE PLANS
    3.3.4 MAINSAIL DEVELOPMENT PLANS
      3.3.4.1 DEVELOPMENT MANAGEMENT
      3.3.4.2 LANGUAGE DEVELOPMENT
      3.3.4.3 COMPILER DEVELOPMENT
      3.3.4.4 RUNTIME DEVELOPMENT
      3.3.4.5 DEBUGGING SYSTEM DEVELOPMENT
      3.3.4.6 DOCUMENTATION PLANS
      3.3.4.7 MAINTENANCE AND DISTRIBUTION PLANS
      3.3.4.8 PLANS FOR ADDITIONAL IMPLEMENTATIONS
      3.3.4.9 MAINSAIL OPERATING SYSTEM PLANS
      3.3.4.10 MICROCODED MAINSAIL MACHINE PLANS
      3.3.4.11 DEVELOPMENT OF PORTABLE SOFTWARE
4. AVAILABLE FACILITIES

List of Figures

SUMEX-AIM Computer Configuration
Cost-effectiveness of SUMEX Augmentations
Capacity and Loading Increase with Dual Processor Augmentation
TYMNET Network Map
ARPANET Geographical Network Map
ARPANET Logical Network Map
Monthly CPU Time Consumed
CPU Usage by Community
File Space Usage by Community
Average Diurnal Loading (3/77): Total Number of Jobs
Average Diurnal Loading (3/77): Percent Time Used
Average Diurnal Loading (3/77): Percent Overhead
Average Diurnal Loading (3/77): Balance Set - Jobs in Core
Average Diurnal Loading (3/77): Runnable Jobs
TYMNET and ARPANET Usage Data

Table of Contents - BOOK II

5. BIOGRAPHICAL SKETCHES
6. COLLABORATIVE PROJECT PROGRESS AND OBJECTIVES
  6.1 STANFORD PROJECTS
    6.1.1 DENDRAL PROJECT
    6.1.2 HYDROID PROJECT
    6.1.3 MOLGEN PROJECT
    6.1.4 MYCIN PROJECT
    6.1.5 PROTEIN STRUCTURE PROJECT
  6.2 NATIONAL AIM PROJECTS
    6.2.1 ACQUISITION OF COGNITIVE PROCEDURES (ACT)
    6.2.2 CHEMICAL SYNTHESIS PROJECT (SECS)
    6.2.3 HIGHER MENTAL FUNCTIONS PROJECT
    6.2.4 INTERNIST PROJECT
    6.2.5 MEDICAL INFORMATION SYSTEMS LABORATORY
    6.2.6 RUTGERS COMPUTERS IN BIOMEDICINE
  6.3 PILOT STANFORD PROJECTS
    6.3.1 GENETICS APPLICATIONS PROJECT
    6.3.2 BAYLOR-METHODIST CEREBROVASCULAR PROJECT
    6.3.3 COMPUTER ANALYSIS OF CORONARY ARTERIOGRAMS
    6.3.4 QUANTUM CHEMICAL INVESTIGATIONS
  6.4 PILOT AIM PROJECTS
    6.4.1 COMMUNICATION ENHANCEMENT PROJECT
    6.4.2 AI IN PSYCHOPHARMACOLOGY
    6.4.3 ORGAN CULTURE PROJECT
    6.4.4 NEUROPROSTHESES PROJECT
    6.4.5 MATHEMATICAL MODELING OF PHYSIOLOGICAL SYSTEMS
    6.4.6 PUFF/VM PROJECT
Appendix I    OVERVIEW OF ARTIFICIAL INTELLIGENCE RESEARCH
Appendix II   AI HANDBOOK OUTLINE
Appendix III  SUMMARY OF MAINSAIL LANGUAGE FEATURES
Appendix IV   MICROPROGRAMMED MAINSAIL PLANS
Appendix V    AIM MANAGEMENT COMMITTEE MEMBERSHIP
Appendix VI   USER INFORMATION - GENERAL BROCHURE
Appendix VII  GUIDELINES FOR PROSPECTIVE USERS
II. RESEARCH PLAN - BOOK I

This is an application for renewal of a grant supporting the Stanford University Medical Experimental computer (SUMEX) research resource for applications of Artificial Intelligence in Medicine (AIM). The research plan has been divided into several logical parts:

1) Book I - Resource research objectives and rationale, progress report, and detailed research plans.

2) Book II - Biographical sketches, collaborating project reports and plans, and supporting appendixes.

3) Budget - First year budget detail, five-year budget summary, and budget explanation and justification.

1 BACKGROUND AND PROPOSED WORK

1.1 OVERVIEW OF OBJECTIVES AND RATIONALE

The SUMEX-AIM project is a national computer resource with a dual mission: 1) the promotion of applications of artificial intelligence (AI) computer science research to biological and medical problems and 2) the demonstration of computer resource sharing within a national community of health research projects. In the body of this proposal, we offer definitions and explanations of these efforts at several levels of detail to meet the needs of reviewers from various perspectives. For this overview, we give only a brief summary of our recent accomplishments, present status and expectations for the requested term of the renewal, the five years beginning August 1, 1978.

Definitive funding of the SUMEX-AIM resource was initiated in December 1973. The principal hardware was delivered and accepted in April 1974, and the system became operational for users during the summer of 1974. The present renewal is therefore written from a perspective of just short of three years of experience in attempting to develop and serve the user community for the resource.

The original SUMEX proposal was an outgrowth of two lines of endeavor at Stanford that had been supported by the Biotechnology Resources Program.
The ACME project (Advanced Computer for MEdical Research), 1965-72, had introduced the innovation of interactive time-shared computing to the medical research community at the Stanford Medical Center. Based on an IBM 360/50 with mass core storage, this system was notable for the ease with which physicians and scientists, previously inexperienced with computers, were able to learn a variety of applications with minimal help from professional programmers. With the further development of the technology, and the rationalization of computer support functions at Stanford, this system was eventually integrated with the university-wide time-sharing service. While ACME had some shortcomings as a production (contra development) tool, many of our colleagues at the medical school still look back regretfully at having lost it as a medical-school-dedicated system tuned to their special needs.

The second line, the DENDRAL project, is a resource-related project connected with applications of artificial intelligence to problems of molecular characterization by analytical instruments like mass-spectrometry, gas-chromatography, nuclear magnetic resonance, and so on.

In 1972 we applied to NIH for the establishment at Stanford of a next generation computer resource to supplant ACME for applications for which the university-wide facility was inadequate. The DENDRAL project was the central source of this initiative; several others entailing real-time instrumentation as much as AI needs were also specified. During the subsequent 18 months, we entered a phase of protracted review and negotiations with BRP and its advisory groups, from which emerged the policy determination that resources of this scope were best justified if they could be functionally specialized, but geographically generalized.
The emerging technology of computer networking opened an opportunity to demonstrate this model in a way that could serve both local and national needs. With all of this in mind, we were happy to undertake the responsibility of such a demonstration, which seemed important as a step in community-building as well as in providing the computing resources so urgently needed for our own and others' research efforts. In many respects it would have been far more convenient to focus on our own requirements, but the satisfaction of these seemed both infeasible and too limited an aspiration in the face of the suggested opportunity.

Three years is hardly long enough for a conclusive determination of the success of such a model, though we can fairly take pride in the diligence and technical competence with which we have responded to the community responsibilities mandated by the terms of the award. An important element in satisfying those responsibilities was the establishment of a mutually satisfactory management structure, on which we report in further detail below. Good will and common purpose are of course the indispensable ingredients, and we are grateful to have been able to offer this service in a congenial framework, and at the same time to be able to support our local computing research needs.

Our technical task has been achieved: to collect and implement an effective set of hardware and software tools supporting the development of large and complex AI programs and to facilitate communications and interactions between user groups. In effect, users throughout the country can turn on their own teletype or CRT-display terminals, dial a local number, and logon to SUMEX-AIM with the same ease as if it were located on their own campus -- and have access to a specialized resource unlikely to be matched nearby.

From the community viewpoint, we have substantially increased the roster of user projects (from an initial 5) to 11 current major projects plus a group of pilot efforts.
Many of these projects are built around the communications network facilities we have assembled, bringing together medical and computer science collaborators from remote institutions and making their research programs available to still other remote users. As discussed in the sections describing the individual projects, a number of the computer programs under development by these groups are maturing into tools increasingly useful to the respective research communities. The demand for production-level use of these programs has surpassed the capacity of the present SUMEX facility and has raised the general issues of how such software systems can be optimized for production environments, exported, and maintained.

The principal thrust of this renewal proposal is to sustain the momentum of SUMEX-AIM, both as a facility and as a community, during a period of rapid change in the technology and economics of computers. For reasons that will be justified in more detail, we do not plan further major expansion of centralized hardware at SUMEX, believing that growing community needs should now be met as justified at distributed nodes. It is difficult to make firm predictions of the technological changes that will present themselves during the period of the grant, but it may be that some conversion of the system will be necessary if only to keep pace with the software exchanged with cognate communities.

More concretely, our objectives for this next grant term include:

1) Maintaining the vitality of the AIM community of projects. This will entail scrutiny of old and new projects in what is approaching a steady-state of maximum capacity, and improving the efficiency with which developmental programs can be furnished to medical research groups.

2) Continued computational support for the AIM community based initially on our existing KI-10 facility.
We expect the computing hardware technology to change substantially in the next few years with the availability of both more powerful and smaller and cheaper machines. Additional large-machine resources may still be necessary to meet the growing needs of the community during this period. As already stated, this kind of growth should be implemented at sites other than Stanford, but can be embraced by the same management structure as governs SUMEX-AIM. We plan to study these new technological alternatives affecting our central facility and to attempt to maintain software compatibility for our dual KI-10 system. Only should this prove untenable or grossly inefficient will we consider a hardware conversion to a more directly compatible implementation.

3) Continued work to improve system software and communication facilities for community interactions and the dissemination of programs. This will include advantageous connections to emerging communications networks and administrative efforts to exploit community expertise and sharing in software development.

4) Core research work to explore ways of exporting complex AI programs including new language support (MAINSAIL), specialized satellite computer systems, the use of networks for software dissemination and maintenance, and examinations of more operationally efficient implementations of AI programs. We will continue to work closely with the XEROX-PARC group, which remains primarily responsible for maintaining INTERLISP.

5) Core research work to attempt to generalize and document AI tools that have been developed in the context of a number of individual application projects. This will include work to organize the present state-of-the-art in AI techniques and tools through the AI-Handbook effort and the development of generalized software packages for the acquisition, representation, and utilization of knowledge in AI programs. These packages will facilitate the exploration of new areas of application of these tools.
1.2 SIGNIFICANCE

Viewed in the narrowest definition of a biotechnology resource, SUMEX-AIM is justified by the technical capabilities it offers for the pursuit of research using advanced computer applications relevant to the NIH mission. The progress reports of the various user projects speak for themselves in the diversity and pertinence of the work accomplished. We do not underestimate (and share as a grave responsibility) the overall investment charged to the resource; but this is quite reasonable when apportioned over the whole range of projects. The shared resource is plainly far more economical than any alternative method of providing comparable facilities to such a range of users distributed over the country.

Similar considerations apply to a variety of other kinds of research hardware. Unique to the computer is the extent to which shared hardware contributes to methodological cooperation; what in this context we call software compatibility. This follows from the unparalleled complexity of computer programs as process-specifications. What other techniques are or can be formulated as recipes of 190,000 or more instructions, each of which must be faithfully executed or the whole system will collapse? Yet we know that a great deal of our knowledge, e.g., in medical diagnosis, may prove to be of similar complexity when explicitly and formally expressed. We infer that many fields of scientific inquiry will have to use similar methods of exchange of critical commentary; that the electronic communications of computer programs is a prototype for the maintenance of other knowledge bases essential for the fabric of a complex and demanding society. The computer is at one time the node of a knowledge-sharing network, and the device for verifying the consistency and pertinence of the updates and criticisms that the users remit.
Thus we can view our resource as exemplifying a technology that induces a new social organization of scientific effort (we would not be the first to recall Gutenberg; and to view ourselves as analogs of some of the early experiments with the use of the print medium for journals and academies). From this perspective, it is quite fitting that the initial grant that established SUMEX-AIM was attended by so much preoccupation with managerial design, not ordinarily the favorite occupation of scientific types.

Several concrete illustrations of the encouragement of dynamic criticism that enhances the robustness of shared knowledge can be elicited from current projects (see Section 6 on page 41 in Book II), apart from the most familiar instances of sharing of software over the computer networks. The MYCIN rule bases, and the text of the AI Handbook are continuously updated by critical users and reviewers. In fact, the text of various parts of this proposal went through dozens of iterative revisions, with comments from many interested groups, within the several weeks that were dedicated to its preparation.

Another, and one of the most interesting examples, was the experimental use of the CONGEN program (see the DENDRAL progress report on page 42 in Book II) in a graduate class in advanced organic chemistry taught by Professor Djerassi. Each of 25 students scanned the recent literature for claims of new structures whose proofs were deemed to be interesting or dubious or both. Five examples were selected for exhaustive reexamination by the students. In each case, the published proof was found to be defective when it was checked by CONGEN -- alternative structures having been overlooked by the authors that still gave good fits to the given data. These and several comparable examples of asserted scientific fact are being more carefully reexamined in the authors' laboratories in response to the program's refutations. In due course, we believe this kind of mechanized checking of "proofs" of chemical structures will be a routine part of the peer review critical function of the editorial staff of the journals. These advances are facilitated by the tight internal cohesion of argument in structural organic chemistry, compared to other scientific fields -- precisely why this scientific domain was the one chosen for our initial work on applied AI.

The technical and sociological implications of our program are in fact elaborated throughout this proposal. By contrast, this may be the place to digress with some more personal observations (in the voice of the principal investigator) about the need for scientists to attend more self-consciously to the process of science itself, and to the political questions of social choice that are part of the accountability of science, to offer due return for value received.

Although SUMEX-AIM is rooted in the sub-discipline of "Artificial Intelligence" we understand and share the discomfort that many bystanders have in trying to give it a precise definition. It might have been preferable to think of "knowledge-engineering" as the thread that links almost all of our projects. This has connotations that might recall "data-base-management"; and we should not disparage the role that efficient systems for retrieving complex data will have in our effort. But our task is not usually to maintain a telephone-directory with yellow pages, but instead to gather, test and validate a hierarchy of generalized rules that operate both on each other, and on data of the kind that are the province of the information-retrieval subdiscipline. The development of the computer programs to perform these operations is the software-science part of our effort. Behind it is necessarily a new level of focussed inquiry into the rules of scientific inference in detail that could only be cross-checked by interaction with the machine.

We are traversing a time when the very justification for basic research is under critical, often even hostile scrutiny. Many quarters are asking such questions as "How much of the health progress of the past 30 years can be attributed to advances in knowledge connected with NIH-supported research?" and "Are our institutional arrangements and patterns of funding really the most appropriate for the most efficient 'transfer of technology' from the basic laboratory 'to the bedside'?" Less often raised by external critics is, "To what extent does the present system support the most fundamental innovations within science itself; or does it inevitably focus overwhelming support on the most obvious, transparent questions and discourage more revolutionary kinds of inquiry?" Within the NIH directorate, it has been stipulated that "Currently, within the research community, formal processes are lacking to assure systematic identification and evaluation of clinically relevant research information, and its effective transfer to the health care community...."

It is not always popular to insist that these questions must be faced up to -- that basic science cannot indefinitely subsist on unconfirmed faith as to its promise. Furthermore, it is easy to show that many short-term advances have arisen from the most pragmatic kinds of investigation: empirical screening for antibiotics or antidiuretics has undoubtedly generated more life-saving therapeutic products than the most sophisticated molecular biology, up to the present moment. Indeed, salt-water, intelligently administered, has been one of the great life-savers of the recent era!
On the other hand, I hold that it would be tragic to undermine the enormous long range potential of basic insight without a deeper analysis of the process by which knowledge and insight move from basic science into clinical problems; and we just might find some ways to improve the system without wrecking it!

These remarks should be taken as exposing a philosophical preoccupation rather than as the design of a research program. They do relate to efforts like the MOLGEN project, which include a great deal of focussed introspection on the intellectual substance of scientific inquiry. It would be premature to claim that computer programs per se will soon be delegated the major responsibility for "systematic identification of relevant knowledge", although they can already play a very helpful role in assisting human intelligence to correlate bibliographic data, and in other ways. However, the very process of implementing an "applied philosophy of science", which is the principal forework of developing a domain for the application of knowledge-based AI, is exactly the kind of formal systematization called for in these renewed efforts to facilitate technology transfer to health care. Longer range success in our AI research will be as important in helping us understand what we are doing as scientists and diagnosticians as in providing mechanical assistance to these ends.

Although our substantive efforts are mostly concerned with the "micro-problems" of scientific or clinical inference, there may be more important treasures in a macro-perspective on the integration of knowledge in medicine. My own most important laboratory accomplishments have all concerned the discovery of new problems, and the bringing together of previously disparate disciplines, rather than the solution of extant puzzles -- the discovery of sex in bacteria, better viewed as the marriage of genetics and bacteriology, is perhaps the least controversial instance.
I believe that it is reasonable to expect that the systematization of biomedical knowledge, to which computer AI will make an indispensable contribution, is an important side effect of these investigations in knowledge-engineering; and that this will lead in turn to the recognition of holes in the overall fabric that badly need patching.

We have too little theory of the practice of science to offer more than case studies at this time -- I have been spending some time in collaboration with a historian and sociologist in trying to achieve a better understanding of the dynamics of discovery of bacterial recombination, and found there is more to the context of that story than my own ingenuity. But it is also very difficult to reconstruct such events without critical recordings of the incidents as they occur -- recordings we are learning how to make in the MOLGEN work. [** Copies of a working paper illustrating this are available on request. **]

To turn to a more clinically urgent arena, it is somewhat dismaying to recall that it took 35 years from Beadle and Tatum's discovery of nutritional mutants in Neurospora to the beginnings of the biochemical genetics of such important situations in man as atherosclerosis. I do intend to initiate some inquiry as to the inevitability of delays of that kind, which seem retrospectively absurd. We will not get analytically persuasive or policywise sound determinations of such questions without more attention to the underlying process of scientific inquiry than unselfconscious scientists are customarily wont to indulge in.

This kind of speculation can also be translated into concrete research programs, which in turn may evoke some new principles. Kidney-stones are an unlikely arena of concern for someone of my particular scientific background: but a number of issues have emerged in consultations with some of my colleagues in the Stanford Division of Urology.
There has been substantial evidence for some time of a significant genetic factor in chronic recurrence of stones. This does not seem to be correlated with overall rates of calcium oxalate excretion; indeed one must focus on the stone as a pathological form of crystal aggregation -- much larger quantities of calcium oxalate are passed as microcrystals by normal individuals. Several workers have identified mucopolysaccharides in the matrix of these stones, and some have speculated about their possible role as initiators or cements in stone formation. On the other hand, geneticists have long known that blood-group substances (mucopolysaccharides!) appear in the secretions, including the urine, of the Se/se and Se/Se [Secretor] genotypes; although saliva is the preferred sample for diagnosis. Still another worker, a pathologist, has remarked on the occurrence of mucopolysaccharide concretions in the tubules near the renal papillae of Se/se subjects. To the best of my knowledge, these disciplinary nuggets have been privately and separately held, and there has been no effort to study their possible interconnection. A survey is now underway at Stanford to test a possible statistical association of Secretor and blood group type with stone recurrence.

These suggestions were arrived at through interpersonal discourse, experts from different disciplines being able to furnish provocative data points when prodded by a more general inquiry. Could one imagine a more general problem-generator that could arrive at similar conclusions? Perhaps so -- one could parse through the medical subspecialties, or through significant diseases, to ask more systematically if they had been scrutinized from the perspective of, say, biochemical genetics. And this raises many other hypothetical inputs to a combinatorial-generator of potential new interdisciplines.
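The combinatorial generator suggested above might be sketched as follows. This is a minimal illustration only, not a program from the proposal: the function name `candidate_interdisciplines`, the topic and perspective lists, and the set of already-studied pairs are all hypothetical placeholders chosen for the example.

```python
from itertools import product


def candidate_interdisciplines(topics, perspectives, already_studied):
    """Yield every (topic, perspective) pair not yet recorded as studied.

    All three inputs are hypothetical: a real generator would need
    curated lists and a literature search to decide what counts
    as "already studied".
    """
    for topic, perspective in product(topics, perspectives):
        if (topic, perspective) not in already_studied:
            yield topic, perspective


# Illustrative inputs only; they make no claim about the actual literature.
topics = ["kidney stones", "atherosclerosis"]
perspectives = ["biochemical genetics", "pathology"]
already_studied = {("atherosclerosis", "biochemical genetics")}

for pair in candidate_interdisciplines(topics, perspectives, already_studied):
    print(pair)
```

The sketch enumerates the full cross product and filters it, so the number of candidates grows as the product of the list lengths; deciding which surviving pairs are worth a scientist's attention is left entirely to the reader of the output.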
One hastens to add that most of the rotely drawn intersections will be meaningless or empty -- enough perhaps that the whole game may end up looking quite silly. However, the problematics of the game have not been explored, and to that extent, there is a pilot project here that I intend to pursue. Its practical feasibility will depend in part on the briskness with which relevant data can be fetched from the literature and from other experts, and I will be exploring possibilities of on-line access to bibliographic databases 1) to help support this effort, and 2) to suggest further research efforts in the use of AI techniques for bibliographic inquiry in ways that may be pertinent to macro-policy of research management.

1.3 BACKGROUND AND PROGRESS

1.3.1 PROGRESS SUMMARY

This progress summary covers the period from December 1973, when the SUMEX-AIM resource was initially funded, through April 1977. During this period we have met all of the defined goals of the resource:

i) We have established an effective computing facility to support a nation-wide community of medical AI research projects including connections to two computer communication networks to provide wide geographical access to the facility and research programs.

ii) We have actively recruited a growing community of user projects and collaborations. The initial complement of collaborators included five projects. This roster has grown to eleven fully authorized projects currently plus a group of approximately six pilot efforts in various stages of formulation. Recruiting efforts have included a public dedication and announcement of the resource, NIH referrals from computer-based project reviews, direct contacts by resource personnel and on-going projects as well as contacts through the AIM workshop series coordinated by the Rutgers Computers in Biomedicine resource under Dr. Saul Amarel.
iii) We have established an AIM community management structure based on an overseeing Executive Committee and an Advisory Group to assist in recruiting and assessing new project applications and in guiding the priorities for SUMEX-AIM developments and resource allocations. These committees also provide a formal mechanism for user projects to request adjustments in their allocated share of facility resources and to make known their desires for resource developments and priorities.

iv) SUMEX user projects have made good progress in developing more effective consultative computer programs for medical research; one of the major goals toward which our AI applications are aimed. These performance programs provide expertise in analytical biochemical analyses and syntheses, medical diagnoses, and various kinds of cognitive and affective psychological modeling.

v) We have worked hard to build system facilities to enable the inter- and intra-group communications and collaborations upon which SUMEX is based. We have a number of examples in which user projects combine medical and computer science expertise from geographically remote institutions and numerous examples of users from all over the United States and occasionally from Europe experimenting with the developing AI programs. The SUMEX staff itself has had good success in establishing such sharing relationships on a system level with other research groups and has many examples of complementary development and maintenance agreements for system programs.

vi) We have made numerous improvements to the computing resource to extend its capacity, to improve its efficiency, to enhance its human interfaces, to improve its documentation, and to enhance the range of software facilities available to user projects.
vii) We have begun a core research effort to investigate alternatives and programming tools to facilitate the exportability of user and system software. This is just now producing a "machine-independent" implementation of the ALGOL-like SAIL language which will run on a range of large and small machines and provide a language base for transferring programs.

viii) We have supported community efforts in the more systematic documentation of AI concepts and techniques and in building more general software tools for the design and implementation of AI application programs. These have included a Stanford AI Handbook project comprising a compendium of short articles about the projects, ideas, problems, and techniques that make up the field of AI.

1.3.2 DETAILED PROGRESS REPORT

The following material covers in greater detail the SUMEX-AIM resource activities over the past 3.5 years. These sections attempt to define in more detail the technical objectives of our research community and include progress in the context of the resource staff and the resource management. Details of the progress and plans for our external collaborator projects are presented in Section 6 on page 41 (in Book II).

1.3.2.1 DEFINITION OF TERMS AND OBJECTIVES

Artificial Intelligence is a branch of computer science which attempts to discern the underlying principles involved in the acquisition and utilization of knowledge in reasoning, deduction, and problem-solving activities (1). Currently authorized projects in the SUMEX community are concerned in some way with the application of these principles to biomedical research.
The tangible objective of this approach is the development of computer programs which, using formal and informal knowledge bases together with mechanized hypothesis formation and problem solving procedures, will be more general and effective consultative tools for the clinician and medical scientist. The exhaustive search potential of computerized hypothesis formation and knowledge base utilization, constrained where appropriate by heuristic rules or interactions with the user, has already produced promising results in areas such as chemical structure elucidation and synthesis, diagnostic consultation, and mental function modeling. Needless to say, much is yet to be learned in the process of fashioning a coherent scientific discipline out of the assemblage of personal intuitions, mathematical procedures, and emerging theoretical structure of the "analysis of analysis" and of problem solving. State-of-the-art programs are far more narrowly specialized and inflexible than the corresponding aspects of human intelligence they emulate; however, in special domains they may be of comparable or greater power, e.g., in the solution of formal problems in organic chemistry or in the integral calculus.

An equally important function of the SUMEX-AIM resource is an exploration of the use of computer communications as a means for interactions and sharing between geographically remote research groups in the context of medical computer science research. This facet of scientific interaction is becoming increasingly important with the explosion of complex information sources and the regional specialization of groups and facilities that might be shared by remote researchers. Our community building role is based upon the current state of computer communications technology. While far from perfected, these new capabilities offer highly desirable latitude for collaborative linkages, both within a given research project and among them.
Several of the active projects on SUMEX are based upon the collaboration of computer and medical scientists at geographically separate institutions; separate both from each other and from the computer resource. The network experiment also enables diverse projects to interact more directly and to facilitate selective demonstrations of available programs to physicians and medical students. Even in their current developing state, we have been able to demonstrate that such communication facilities allow access to the rather specialized SUMEX computing environment and programs from a great many areas of the United States (even to a limited extent from Europe) for potential new research projects and for research product dissemination and demonstration. In a similar way, the network connections have made possible close collaborations in the development and maintenance of system software with other facilities.

(1) For recent reviews to give some perspective on the current state of AI, see: (i) Winston, P.H., "Artificial Intelligence", Addison-Wesley Publishing Co., 1977; (ii) Nilsson, N.J., "Artificial Intelligence", Information Processing 74, North-Holland Pub. Co. (1975); and (iii) a summary by Feigenbaum, E. A., attached as Appendix I, page 202 (see Book II). An additional overview of research areas in AI is provided by the outline for an "Artificial Intelligence Handbook" being prepared under Professor Feigenbaum by computer science students at Stanford (see Appendix II on page 225 in Book II).

1.3.2.2 FACILITY HARDWARE

Based on the AI mission of SUMEX-AIM, we selected a Digital Equipment Corporation (DEC) model KI-10 computer system for our facility.
This selection was based on 1) hardware architectural and performance features, 2) available software support relevant to AI applications, 3) price versus performance data for the system, and 4) the scope of the user community from which we might expect to draw collaborators and share software. This choice has proved highly effective.

The current system hardware configuration is diagrammed in Figure 1 on page 14. It is the result of a number of augmentations over the past 3 years to meet the capacity needs of the growing SUMEX-AIM project community. Our initial configuration consisted of a KI-10 processor, core memory (192K 36-bit words @ 1 microsecond), swapping storage (1.7M words @ 8 msec average rotational latency and 2 microsecond/word transfer rate), file storage (40M words), magnetic tapes, DEC tapes, terminal line scanner, and line printer. Our network connections are discussed in Section 1.3.2.4 on page 20.

This system reached prime-time saturation by fall of 1974. Since many of our medical and other professional collaborators cannot adjust their schedules to match light computer loading during the night-time hours, prime-time responsiveness is crucial to being able to support medical experimentation with developing programs and to allow community growth. We have taken active steps to transfer as much prime-time loading as feasible to evening and night hours, including shifting personnel schedules (particularly for Stanford-based projects), controlling the allocations of CPU resources between various user communities and projects, and encouraging jobs not requiring intimate user interaction to run during off hours by developing batch job facilities. Despite these efforts, prime-time loading has remained quite high, particularly with the growth of the number of user projects.

A similar congestion has persisted in the on-line file space we have been able to allocate to user projects.
Again we have implemented controls to try to assure effective use of available space and to encourage use of external file storage facilities such as the ARPANET Data Computer and other computer sites. Nevertheless, the interactive character of SUMEX use, the large AI program files, and the extensive use of SUMEX for collaborator communications have continuously raised file space demands beyond those we could meet.

We have proposed a number of hardware configuration augmentation steps to the Executive Committee to cost-effectively provide additional capacity. These were based on analyses of predominant system bottlenecks and enhancement steps feasible within available budgets. The enhancements approved by the committee and implemented include:

1) Add 64K words of core memory and 20M words of file storage (11/74)

2) Add second KI-10 CPU for dual processor operation (5/76)

3) Add 256K words of core memory and upgrade file system to higher volume, lower cost technology (recently approved by NIH and the AIM Executive Committee with implementation in progress)

A plot of effective CPU capacity as a function of continuing investment is shown in Figure 2 on page 15 and displays the cost-effectiveness of our sequential augmentations.

At the present time our hardware configuration has grown about as much as is cost-effective. Additional growth would entail significant redesigns of the system including upgrades of existing hardware. Contemplating such future expansion also raises the issues of compatibility with newer hardware technologies being announced. These provide advantages in speed, cost, size, and maintainability. Such a complete upgrade is not envisioned in the immediate future, as a number of interesting new product announcements are expected over the next 1 or 2 years that could substantially affect such an upgrade strategy.
Our plans in this direction are discussed in more detail under the proposed resource plans for the continuation period (see Section 3.1 on page 62).

Figure 1. SUMEX-AIM Computer Configuration [diagram]

Figure 2. Cost-effectiveness of SUMEX Augmentations

[Plot: Estimated Capacity in Useful KI-10 Equivalents (Net of overhead) versus Cumulative System Investment ($M). Labeled points, in order of increasing investment: Initial purchase, 3/74, KI-10 with 192K memory; Add 64K memory, 11/74; Add second KI-10, 5/76; Add 256K memory and upgrade file/tape system (estimated improvement - upgrade in progress).]

This plot illustrates the incremental increases in computing capacity achieved as a function of cumulative investment in the SUMEX-AIM facility. The higher slope of the curve after the initial investment illustrates both the substantial investment in peripheral devices (file system, tapes, communications, etc.) and the trend toward lower memory prices. The largest impact in terms of PDP-10 memory price reductions occurred around the time of adding the 64K increment in November 1974. Since then processor prices have stayed relatively stable and memory prices have dropped less dramatically.
It should be noted that semiconductor memories have not yet made a big inroad in the PDP-10 market; this technology is where the more recent memory price reductions have occurred.

The original purchase of 1 KI-10 with 192K of memory for about $800K performed with about 60% efficiency under peak load. Adding the 64K of memory for $75K brought the efficiency up to about 85%. Then adding the second processor for $200K increased throughput to about 1.3-1.4 KI-10 equivalents. This step represents about a 59% increase in throughput for a 20% increased investment.

A proposal has been approved recently by the AIM Executive Committee and NIH to augment core memory by 256K words. This augmentation would increase throughput to about 1.7 KI-10 equivalents for another $100K; this would be a 26% throughput increase for 8% additional investment. As part of the proposed memory augmentation we plan to upgrade the file and tape systems as well, to relieve file space congestion and increase system operations efficiency. Including the net cost of the file/tape upgrade in these figures (purchase price less resale of existing equipment) raises the proposed additional investment to $160K and the fractional increase from 8% to 13%. Of course, the disk upgrade affects CPU throughput only indirectly in that the increased speed reduces contention, particularly when moving head swapping is necessary. It contributes primarily to supporting the growing on-line file needs of the projects.

Figure 3.
Capacity and Loading Increase with Dual Processor Augmentation

                   1-PROC OP'N   2-PROC TRNS'N   2-PROC OP'N    2-PROC OP'N
                   1/76 - 4/76   5/76 - 8/76     9/76 - 12/76   1/77 - 3/77

Peak Ld Ave             4.8           5.6             6.0            6.6
Peak Jobs              30.2          33.3            34.7           38.1
% Overhead/            18.1          31.1            33.2           31.9
  processor
Total CPU             304.4         384.9           534.0          520.1
  Hrs/Mo

This table presents system usage data averaged over several months preceding, during, and after installation of the SUMEX-AIM dual processor system in order to show real changes in peak loading capacity and computing resources delivered. The first three rows of data are derived from monthly diurnal loading data and reflect average prime-time peak loading conditions (daily peak usage figures are often considerably higher, but those shown better represent gross trends). The last row gives average total monthly CPU hours delivered during the various periods.

With the common criterion that users have pushed both the single and dual processor systems to the limits of useful work in terms of prime time responsiveness, it is clear that the second processor has substantially increased throughput ("tolerable" peak load average up 38%, number of jobs up 26%, and delivered CPU hours up 71%). At the same time the overhead burden per machine has risen from 18 to 32%, principally in the category of I/O wait (total scheduler time and time waiting for a runnable job to be loaded in core). An additional factor, not explicitly shown in these data (because we only have a 1 msec clock), is the added time spent at interrupt level servicing drum swapping. This adds another 10-15% estimated overhead. We feel these increased overhead figures can be reduced roughly to the single processor levels by adding more memory, thereby effectively recovering about 40-50% of the capacity of a KI-10 processor. A proposal is now pending with the AIM Executive Committee for this augmentation and we expect it to be implemented within the funding ceiling of the current grant.
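The percentage gains quoted for Figures 2 and 3 can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the input numbers are taken from the text and the Figure 3 table, while the function and variable names are hypothetical, and 1.35 is an assumed midpoint of the quoted 1.3-1.4 KI-10 equivalents.

```python
def pct_increase(before, after):
    """Percentage increase going from `before` to `after`."""
    return 100.0 * (after - before) / before

# Figure 3: single-processor period (1/76-4/76) versus the most recent
# dual-processor period (1/77-3/77).
FIGURE_3 = {
    "peak load average": (4.8, 6.6),
    "peak jobs": (30.2, 38.1),
    "CPU hours/month": (304.4, 520.1),
}
fig3_gains = {name: pct_increase(single, dual)
              for name, (single, dual) in FIGURE_3.items()}

# Figure 2: useful KI-10 equivalents after each augmentation step.
# 0.85 = one KI-10 at 85% efficiency; 1.35 = assumed midpoint of 1.3-1.4;
# 1.7 = estimate for the 256K memory augmentation.
THROUGHPUT = [0.85, 1.35, 1.7]
throughput_gains = [pct_increase(a, b)
                    for a, b in zip(THROUGHPUT, THROUGHPUT[1:])]

for name, gain in fig3_gains.items():
    print(f"{name}: +{gain:.1f}%")
for gain in throughput_gains:
    print(f"throughput step: +{gain:.1f}%")
```

Under these assumptions the computed values fall within about one percentage point of the 38%, 26%, 71%, 59%, and 26% figures quoted in the text.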
1.3.2.3 SYSTEM SOFTWARE

In parallel with the choice of DEC PDP-10 hardware for the SUMEX-AIM facility, we selected the TENEX operating system developed by Bolt, Beranek, and Newman (BBN) as the most effective for our medical AI applications work. TENEX was the only available demand-paged system to support simultaneous large address space users, offered the INTERLISP language for LISP-oriented program development, and was well integrated with the ARPANET facilities which provide an excellent base for our community sharing efforts. This choice has proven a very effective one in that the productivity of the TENEX community in AI research has been highly advantageous to us (2).

The original BBN TENEX was written for a hardware-modified KA-10 system. This version of the system required a substantial amount of work to accommodate the relatively limited paging facilities of the KI-10 to run effectively. These early phases also included substantial monitor work to incorporate the TYMNET memory-sharing interface which connects us to the TYMNET and to integrate the high speed swapping storage. We have made numerous enhancements to the monitor calls and corrections of bugs to develop a highly reliable and effective operating system for our community work.

We continue to work to improve the efficiency of the system and its effectiveness in allocating valuable resources. For example, we have modified the handling of user page tables so that the expensive procedure of clearing page tables and setting them up to run time-shared users could be minimized. This involved creating a pool of page tables which could be allocated to currently running users and could be kept available without setup overhead. We also implemented a system for migrating dormant pages from our fast swapping storage to moving head disk. This preserves the use of this limited resource for the currently active jobs.
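The page-table pool described above can be pictured with a small model. This is a minimal sketch and not the TENEX monitor code: the class name, the use of a Python dict to stand in for a page table, and the least-recently-used reclaim policy are all assumptions made for illustration; the text only states that a pool of page tables is kept available to avoid repeated setup.

```python
from collections import OrderedDict

class PageTablePool:
    """Hypothetical fixed-size pool of page tables kept set up for recent jobs."""

    def __init__(self, size):
        self.size = size
        self.tables = OrderedDict()  # job id -> page table, least recent first
        self.setups = 0              # number of expensive clear-and-setup steps

    def acquire(self, job_id):
        """Return the page table for job_id, setting one up only if needed."""
        if job_id in self.tables:
            # Table is still set up from the job's last run: no setup cost.
            self.tables.move_to_end(job_id)
            return self.tables[job_id]
        if len(self.tables) >= self.size:
            # Pool is full: reclaim the least recently used table (assumed policy).
            self.tables.popitem(last=False)
        self.setups += 1             # the expensive clear-and-setup happens here
        table = {}
        self.tables[job_id] = table
        return table

pool = PageTablePool(size=2)
for job in (1, 2, 1, 2):
    pool.acquire(job)
print(pool.setups)  # two jobs alternating in a pool of two need only two setups
```

The point of the sketch is that jobs which are rescheduled while their table is still in the pool pay no setup cost; only a job displaced from the pool incurs the clear-and-setup step again.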
We have implemented a form of "soft" CPU allocation control in the monitor, assisted by a program which adjusts user percentages for the scheduler based on the dynamic loading of the system. The allocation control structure works based on the scheduler queue system and takes account of the a priori allocation of CPU time and that actually consumed. Our TENEX uses a hierarchy of five queues for jobs ranging from highly interactive jobs requiring only small amounts of CPU time between waits to more CPU intensive jobs which can run for long periods without user interaction. The interactive queues (text editing, etc.) are scheduled at highest priority without consideration of allocation percentages. If nothing is runnable from the high priority queues, the CPU-bound queues are scanned and jobs are selected for running based on how much of their allocated time has been received during a given allocation cycle time (currently 100 seconds). If no such jobs are runnable, then those that have received their allocation of CPU time already are scheduled based on how much they are over allocation and how long they have waited to be run again. This system is not a reservation system in that it does not guarantee a given user some percentage of the system. It allocates cycles preferentially, trading off a priori allocations with actual demand, but does not waste cycles.

(2) It should be noted that DEC has recently adopted a form of TENEX (TOPS-20) as their choice for future system marketing. They have made improvements in a number of areas of the monitor and subsystem software but have also shown an increasing tendency to make changes to the TOPS-20 system that impair compatibility with older TENEX systems. The long-term impact of this trend toward incompatibilities with the coming DEC "standard" is discussed in more detail on page 62.
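The three-tier selection order described above can be sketched as follows. This is a simplified illustration, not the monitor's scheduler: it collapses the five queues into an interactive class and a CPU-bound class, and the `Job` fields, the `pick_next` name, and the exact ordering rules within each tier (longest wait first, smallest fraction of allocation first, overrun minus wait) are assumptions. Only the tier order and the 100-second cycle come from the text.

```python
from dataclasses import dataclass

CYCLE_SECONDS = 100.0  # allocation cycle time quoted in the text

@dataclass
class Job:
    name: str
    interactive: bool      # True if the job is in a high-priority interactive queue
    share: float           # a priori fraction of CPU allocated (0..1)
    used: float = 0.0      # CPU seconds consumed in the current cycle
    waited: float = 0.0    # seconds since the job last ran
    runnable: bool = True

    @property
    def allotted(self) -> float:
        """CPU seconds this job is entitled to in one allocation cycle."""
        return self.share * CYCLE_SECONDS

def pick_next(jobs):
    """Choose the next job to run, or None if nothing is runnable."""
    ready = [j for j in jobs if j.runnable]
    if not ready:
        return None
    # Tier 1: interactive jobs run first, ignoring allocation percentages.
    interactive = [j for j in ready if j.interactive]
    if interactive:
        return max(interactive, key=lambda j: j.waited)
    # Tier 2: CPU-bound jobs still under their allocation; favor the job that
    # has received the smallest fraction of its allotted time (assumed rule).
    under = [j for j in ready if j.used < j.allotted]
    if under:
        return min(under, key=lambda j: j.used / j.allotted)
    # Tier 3: every job is over allocation; favor small overrun and long wait
    # (assumed weighting). The CPU is never left idle while a job is runnable.
    return min(ready, key=lambda j: (j.used - j.allotted) - j.waited)
```

Because the third tier always returns some runnable job, the sketch reflects the "soft" character of the control: allocations bias the choice but never cause cycles to go unused.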
This allocation control system is still in an experimental state and we are attempting to evolve the "best" policies with the AIM Executive Committee for dividing the system fairly and effectively among the various communities of users.

During the spring of 1976 we implemented a dual processor version of TENEX as the most cost-effective way to increase our processing capacity. In order to upgrade to the new KL-"n" technology, we would have had to replace most of the equipment that had been purchased initially. For the cost of an additional processor and 8 man-months of intensive software development we were able to increase our CPU capacity by 75%. We have an additional 40% equivalent of a KI-10 processor which can be made available by increasing memory to reduce our swapping contention.

The dual processor system that has evolved is running quite reliably. It treats the two machines in an almost symmetric manner. The only difference is that one of the machines has all of the I/O equipment attached to it. They both schedule jobs independently and share the rest of the non-I/O-device monitor code. The areas of the monitor involving the management of resources and jobs which cannot be manipulated by both machines simultaneously are protected by a system of locks. We have made some measurements indicating that overhead for lock waits is less than 10%. The overall increase in capacity provided by the processor upgrade is illustrated in Figure 3 on page 17, which measures key loading parameters in the periods before and after the dual processor installation.

Observing the delivery of DEC's high-performance KL-TENEX systems over the past 6 months, it seems clear that for the investment, we made the best choice for the community by implementing the dual processor upgrade. We hope to augment the memory soon to finish exploiting the capacity this extra machine provides and to remove some non-linearities remaining in system swapping performance.
Now that the dual processor system has stabilized, we are undertaking another assessment of system performance to be sure we have removed residual and correctable inefficiencies. This study is on-going now.

Finally, over the past year we made several substantial improvements in the "GTJFN" monitor call which interactively acquires handles on file names specified by the user. These extensions allow for more general "wild card" specifications and interactive help in deciding between and searching for existing file name alternatives. They also give the user much more flexibility in designating groups of files and therefore in structuring his data.

With a working dual processor system, the current implementation of allocation controls in our system, the diverging path of the DEC TOPS-20 system, the termination of active BBN TENEX development, and the unique complications of the KI-10 paging system, we have not made any concerted effort to upgrade our TENEX system to the latest BBN release (1.34). The advantages of such an upgrade are not overwhelming in the face of the complicated conversion (KI paging, dual processor, special swapping device handler, TYMNET service routines, local JSYS's, etc.) and resulting system unreliability for some period.

Another area of software development is in the EXECutive program, which is the basic user interface to manipulate files, directories, and devices; control job and terminal parameter settings; observe job and system status; and execute public and private programs. This work improves system accommodation to users and provides more convenient and useful information about system and job status.
Through such features as login default files, directed file search path commands, mail notification, help facilities, better file archival and retrieval commands, and flexible status information, we have tried to make it easier for users to work on the SUMEX-AIM machine.

1.3.2.4 NETWORK COMMUNICATION FACILITIES

A highly important aspect of the SUMEX system is effective communication with remote users. In addition to the economic arguments for terminal access, networking offers other advantages for shared computing such as uniform user access to multiple machines and special purpose resources, convenient file transfers for software sharing and multiple machine use, more effective backup, co-processing between remote machines, and improved inter-user communications.

Over the past year we have been substantially aided in exporting the MAINSAIL system through our network connections. Because of the developmental nature of the language at present, it is important that we have close interactions with the user community and that we be able to effectively perform bug fixes and upgrades. Since MAINSAIL by its nature involves operations on a variety of machines and since our access to example systems cannot be entirely local, the network connections to Rutgers, the Stanford AI Lab, and Stanford Research Institute have been invaluable. It would be considerably more difficult to export MAINSAIL and communicate with users via tapes and mail.

We have based our remote communication services on two networks, TYMNET and ARPANET. These were the only networks existing at the start of the project which allowed foreign host access. Since then, other commercial network systems (notably TELENET) have come into existence and are growing in coverage and services.
The two networks to which we are currently connected complement each other; the TYMNET providing primarily terminal service with very broad geographical coverage and unrestricted user access, and the ARPANET having more limited access but providing a broader range of communication services. Together, these networks give a good view of the current strengths and weaknesses of this approach.

Users asked to accept a remote computer as if it were next door will use a local telephone call to the computer as a standard of comparison. Current network terminal facilities do not quite accomplish the illusion of a local call. Data loss is not a problem in network communications - in fact, with the more extensive error checking schemes, data integrity is much higher than for a long distance phone link. On the other hand, networking relies upon shared community use of telephone lines to procure widespread geographical coverage at substantially reduced cost. However, unless enough total line capacity is provided to meet peak loads, substantial queueing and traffic jams result in the loss of terminal responsiveness.

TYMNET: Networks such as TYMNET are a complex interconnection of nodes and lines spanning the country (see Figure 4 on page 24). The primary cause of delay in passing a message through the network is the time to transfer a message from node to node and the scheduling of this traffic over multiplexed lines. This latter effect only becomes important in heavily loaded situations; the former is always present. Clearly from the user viewpoint, the best situation is to have as few nodes as possible between him and the host; this means many interconnecting lines through the network and correspondingly higher costs for the network manager.
TENEX in some ways emphasizes this conflict more than other time-sharing systems because of the highly interactive nature of terminal handling (e.g., command and file name recognition and non-printing program commands as in text editors or INTERLISP). In such instances, individual characters must be seen by the host machine to determine the proper echo response, in contrast to other systems where only "line at a time" commands are allowed.

We have connected SUMEX to the TYMNET in two places as shown in Figure 4 so as to allow more direct access from different parts of the country. Based on delay time statistics collected during the previous year from our TYMSTAT program, the response times are scarcely acceptable. When delay times exceed 200-300 milliseconds, the character printing lag problems become noticeable with a full duplex, 30 char/sec terminal. In the past these times have been particularly bad in New York, with peak delays approaching 3 seconds one way! Other nodes have shown uniformly high readings as well. These data were reflected in the subjective, but strongly articulated, comments of many of our user groups.

We have had numerous meetings with TYMNET personnel to try to ease these problems and have instituted reroutings of the lines connecting SUMEX-AIM to the network. Also, local lines to more strategic terminal nodes have been considered for users in areas poorly served by the existing line layout. TYMNET has also made some upgrades in the internal connectivity and speeds with which data is switched within their node clusters. These changes seem to have had some beneficial effects in that delay times have improved and user complaints have subsided. We will continue to pursue improvements in TYMNET response, but user terminal interactions such as used in TENEX programs are not realized in the time-sharing systems offered by most other TYMNET users and hence are not supported well by TYMNET.
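To put the quoted delay figures in perspective, the sketch below estimates how far a character echo lags behind the typist when every character must travel to the host and back. It is an illustrative calculation only: the function name is hypothetical, and it assumes the delay is the same in both directions and that the terminal is being driven at its full 30 char/sec rate.

```python
CHARS_PER_SECOND = 30  # full duplex terminal speed quoted in the text

def echo_lag(one_way_delay_ms):
    """Return (round-trip echo delay in ms, characters typed before the echo).

    Assumes a symmetric network delay, so the echo of a character arrives
    after twice the one-way delay.
    """
    round_trip_ms = 2 * one_way_delay_ms
    chars_behind = round_trip_ms / 1000 * CHARS_PER_SECOND
    return round_trip_ms, chars_behind

# One-way delays: the threshold range from the text and the New York peak.
for delay_ms in (200, 300, 3000):
    round_trip_ms, chars = echo_lag(delay_ms)
    print(f"{delay_ms} ms one way: echo after {round_trip_ms} ms, "
          f"{chars:g} characters behind at full speed")
```

Under these assumptions, even the 200-300 millisecond threshold corresponds to an echo trailing the keyboard by a dozen or more character times, which is consistent with the lag becoming noticeable at that point.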
TYMNET has implemented 1200 baud service in 7 major cities over the past year. Unfortunately many of our users are not in these cities, so we have only limited experience with the 1200 baud support.

ARPANET: The ARPANET, while designed for more general information transfer than purely terminal handling, has similar bottleneck problems in its topology (see the current geographical and logical maps of the ARPANET in Figure 5 and Figure 6 on page 25). These are reduced by the use of relatively higher speed interconnection lines (50 K baud instead of 2400 - 9600 baud lines as in TYMNET), but response delays through many nodes become objectionable eventually as well.

Consistent with the agreements with ARPA when we were granted network access initially, we are enforcing a policy to restrict the use of the ARPANET to users who have affiliations with ARPA-supported contractors and system/software interchange with cooperating TENEX sites. The administration of the network passed from the ARPA Information Processing Techniques Office to the Defense Communications Agency as of July 1975. At that time policies were announced restricting access to DoD-affiliated users. We have restricted the facilities for calling from SUMEX out to other sites on the ARPANET to authorized users. This also protects the SUMEX-AIM machine from acting as an expensive terminal handler for other machines - this function is better fulfilled by dedicated terminal handling machines (TIPS). In general, we have developed excellent working relationships with other sites on the ARPANET for system backup and software interchange; such day-to-day working interactions with remote facilities would not be possible without the integrated file transfer, communication, and terminal handling capabilities unique to the ARPANET.
We take very seriously the responsibility to provide effective communication capabilities to SUMEX-AIM users and are continuously looking for ways to improve our existing facilities as well as investigate alternatives becoming available. We have done preliminary investigations of the TELENET facilities that have been rapidly expanding this past year. BBN has hooked one of their TENEX systems up to TELENET and, whereas we did not have the same quantitative tools we have for measuring response on the TYMNET, we observed TELENET delays at least as long as those encountered on TYMNET. We did the reverse experiment by using long distance telephone to connect from the TELENET node in Washington, D.C. to the SUMEX machine in California and observed the same sort of delays, reaching several seconds per character. The TELENET has many attractive features in terms of a symmetry analogous to that of the ARPANET for terminal traffic and file transfers and, being commercial, would not have the access restrictions of the ARPANET. However, until the network throughput improves we would not get substantial benefits from connecting to it.

Figure 4. TYMNET Network Map [map]

Figure 5. ARPANET Geographical Network Map, April 1977 [map]
(Note: this map does not show ARPA's experimental satellite connections. Names shown are IMP names, not (necessarily) host names.)

Figure 6. ARPANET Logical Network Map, March 1977 [map]
(Please note that while this map shows the host population of the network according to the best information obtainable, no claim can be made for its accuracy. Names shown are IMP names, not (necessarily) host names.)

1.3.2.5 SYSTEM RELIABILITY AND BACKUP

System reliability has remained high over the past years;
excellent under stable hardware and software conditions and degrading temporarily during debugging and development periods and during periods of difficult hardware problems. In general we take the system down for approximately 50 hours per month for scheduled hardware maintenance, file backup, and other maintenance. In addition we average from 10 to 15 hours per month in unscheduled downtime. During periods of particularly difficult hardware or software problems we must absorb substantially more downtime.

1.3.2.6 PROGRAMMING LANGUAGES

Over the past years we or members of the SUMEX-AIM community have continued to maintain the major languages on the system at current release levels, have TENEXized several languages to improve efficiency, and have investigated a number of issues related to the efficiency of programs written in various LISP implementations and the exportability of programs. These issues are becoming increasingly critical in dealing with AI performance programs which have reached a level of maturity so that substantial, non-developmental user communities are growing. The following summarizes general accomplishments and the following section discusses in detail the work this past year in designing a machine-independent ALGOL-like system (MAINSAIL).

LISP Efficiency:

There has been an on-going debate among a number of projects over the best language to choose for developmental implementation of the various AI programs. The key issues include ease and flexibility of conceptual representation of program functions and objects, interactive debugging support, efficiency, and exportability. To date the predominant language choice for AIM research has been LISP and more particularly INTERLISP. These issues are important because they influence the time required to develop new AI programs and subsequently the incremental load placed on the SUMEX machine when in use.
We recently attempted an evaluation of INTERLISP and ILISP including the relative efficiencies of the two languages and the level of assistance the language systems provide the user in developing programs. The tests were based on an implementation of a subset of REDUCE (a symbolic algebra manipulator). The results of several iterations in program refinement by experts in the respective languages were that the runtimes for the two versions were quite comparable (far less than the factor of 5-10 disparity predicted by ILISP enthusiasts). A more disquieting result was the substantial difference in runtimes depending on how particular functions were coded IN THE SAME LANGUAGE. It is apparent from the results that factors of 10 differences in time can result from a superficial implementation; expert programming insight is essential to efficient program performance. This is not a real surprise in that it is true of programming in any language; the problems may be increased by such a rich language as INTERLISP with such a wide array of ways to do the same thing but with little guidance as to the relative costs. It has proven very difficult to quantify the "rules" for good programming. Mr. Masinter and Mr. Phil Jackson attempted to document good INTERLISP programming habits and issued a bulletin for SUMEX users.

A further impact of these data is that it is very difficult to simultaneously develop a new AI program and make the implementation highly efficient. With the iterations required to develop the conceptual design of the program, it is difficult to ensure its efficiency. This may lead to the need to reimplement the program after the basic development stabilizes to increase efficiency while still accommodating convenient and orderly further development. Such reimplementation may or may not be best done in LISP; this
will depend on many factors including the nature of the program data structure requirements and anticipated further development efforts.

MAINSAIL Progress

SUMEX, in its role as a nationally shared computer resource, is an appropriate vehicle for the development of software unbound by the underlying machine environment. We have a built-in community of program developers acutely aware of the significance of providing their work to a broader base of users. This intersection of hardware capability, software expertise, and dedication to resource sharing presents a unique opportunity to promote a system designed for program sharing.

The MAINSAIL (3) project has three closely related goals:

1) Provide an integrated set of tools for the creation of efficient portable software on a variety of computer systems, and provide support and continued development of these tools in a form compatible across all implementations.

2) Study innovative approaches to portability, both hardware and software, and develop such approaches into effective tools.

3) Promote the development and distribution of portable software, advise and assist in its design, and evaluate its applicability.

By portable software we mean computer programs which may be executed on a variety of machines with few, if any, alterations. MAINSAIL itself will provide the initial example of portable software, since all of the system is written in the MAINSAIL language except for those parts which are determined by the host environment (hardware, instruction set, operating system, etc.). Even these parts are embedded within MAINSAIL.

(3) The MAINSAIL (MAchine-INdependent SAIL) language is derived from SAIL, a programming language developed at Stanford University's Artificial Intelligence Laboratory.
It is not compatible with SAIL, since SAIL was designed for a PDP-10 with TOPS-10, and hence contains machine-dependencies. However it has retained the basic attributes of SAIL as an extended ALGOL-like language. A summary of some of the features of the MAINSAIL language and their relationship to other languages is given in Appendix III on page 231 (see Book II).

There is a key distinction between MAINSAIL's approach to portability and the "classical" approach characterized by languages such as FORTRAN, ALGOL, LISP, COBOL and BASIC. These languages attempt to adhere to a single syntax standard which is separately implemented for each different computer system. Invariably these implementations have differences which preclude the creation of a program which is accepted by all. It is difficult, if not impossible, to define a language standard which is unambiguous and at the same time sufficiently comprehensible to provide the basis for compatible implementations. Furthermore, many implementors yield to the temptation to provide "enhancements" to the standard which immediately introduce machine and system dependencies.

MAINSAIL, on the other hand, provides a single system (written primarily in itself) which is employed at every site. This is made possible by its ability to compile itself into code for a variety of machines. Only the compiler's code generators and the runtime operating-system interfaces need be rewritten for each implementation. These parts of MAINSAIL are at a level which has already been defined by the machine-independent parts, and do not affect the language from the user's viewpoint. Thus the "language standard" has been reduced to a "semantic standard" which is surrounded by machine-independent software.
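The division of labor described above, one machine-independent body of software plus a small code generator per target, can be illustrated with a toy sketch. This is not MAINSAIL code and does not reflect its internals: the Python function names, the intermediate form, and both "targets" (a hypothetical accumulator machine and a hypothetical stack machine, with invented mnemonics) are assumptions made purely for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Machine-independent intermediate form (hypothetical): (operation, constant).
IR = List[Tuple[str, int]]


def front_end(constants: List[int]) -> IR:
    """Machine-independent part: translate 'sum these constants' into IR.

    Assumes a non-empty list of constants.
    """
    ir: IR = [("load", constants[0])]
    ir += [("add", n) for n in constants[1:]]
    return ir


def gen_accumulator(ir: IR) -> List[str]:
    """Code generator for an invented single-accumulator machine."""
    mnemonic = {"load": "LOAD", "add": "ADD"}
    return [f"{mnemonic[op]} {n}" for op, n in ir]


def gen_stack(ir: IR) -> List[str]:
    """Code generator for an invented stack machine."""
    out: List[str] = []
    for op, n in ir:
        out.append(f"PUSH {n}")
        if op == "add":
            out.append("ADD")  # pop two values, push their sum
    return out


# Only this table and the generators are target-specific.
CODE_GENERATORS: Dict[str, Callable[[IR], List[str]]] = {
    "accumulator": gen_accumulator,
    "stack": gen_stack,
}


def compile_for(target: str, constants: List[int]) -> List[str]:
    """Run the shared front end, then the generator for the chosen target."""
    return CODE_GENERATORS[target](front_end(constants))
```

In this sketch, supporting a third target means writing one more generator function and registering it in `CODE_GENERATORS`; `front_end` is untouched, which is the property the text attributes to the MAINSAIL design.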
It remains to be seen whether the temptation to augment the language with machine-dependencies (for purposes of ultimate efficiency or to take advantage of particular local system features) can be overcome. Herein also lies the biggest "price" to be paid for exportability. The code emitted from the MAINSAIL compiler can be (and is, based on tests to date) at least as efficient as that from many machine-dependent compilers. On the other hand, special machine or operating system features that cannot be uniformly implemented may provide local optimizations at the cost of exportability or vice versa. We cannot effectively measure the extent of this cost at this stage.

DEVELOPMENT APPROACH

We do not underestimate the difficulty in obtaining the cooperation of a community which will span a wide variety of applications and hardware/software systems. If MAINSAIL is to obtain widespread use, it is crucial that it have an effective and credible base of support. The initial parts of MAINSAIL are just about ready for limited distribution. We want to maintain close supervision of this distribution, and ensure that systems labelled as MAINSAIL are not altered without our approval. In this regard we are pursuing legal channels to safeguard the integrity of MAINSAIL software. We plan to take MAINSAIL through an orderly progression of development, and to avoid casual distribution with no provision for a solid base of maintenance and future growth.

REVIEW OF PROGRESS TO DATE

MAINSAIL has been under development for almost three years now. Beginning with an initial goal of converting the PDP-10 SAIL compiler to generate code for a PDP-11, several versions had been implemented on a PDP-10 and a PDP-11, and the groundwork had been laid for extending the system to a wider variety of machines. The current version was begun in August of 19756.

Early versions of MAINSAIL attempted to maintain close compatibility with the original SAIL, but in surveying a wider variety of machines (especially mini-computers), we concluded that this compatibility could be maintained only at the expense of portability. It was felt that MAINSAIL could contribute more by providing a truly portable system. Thus we began redesigning MAINSAIL, rebuilding from previous implementations. This effort has resulted in a new version which is still under development, and is now being tested on several systems.

Initial implementations of the current design are for DEC PDP-10's with the TENEX operating system and with the TOPS-10 operating system. The TENEX version is being tested at SUMEX and has been installed at one other TENEX site (Stanford - IMSSS). The TOPS-10 version was developed at SUMEX by using TENEX facilities which provide compatibility with TOPS-10. The Rutgers University PDP-10 facility was chosen for external testing since it is a standard TOPS-10 system, and can be accessed from SUMEX over a network. MAINSAIL is now undergoing preliminary testing there. A modified TOPS-10 version has been set up on the Stanford AI Lab's PDP-10, but also has not been open to general use.

Little additional work will be necessary to make the TENEX version execute on a DECSYSTEM-20 since TOPS-20 is derived from TENEX. However, some time will be needed to take full advantage of the extended instruction set of the KL-10. Two sites are available for TOPS-20 development: the LOTS facility at Stanford; and a machine at SRI, close to Stanford and accessible over a network. Both of these sites have expressed an interest in using MAINSAIL.

The PDP-11 has been chosen as the first mini-computer to be implemented. Code generators have been written for it but not debugged. Several variants of these code generators will be necessary to cover the full PDP-11 family.
MAINSAIL interfaces to three PDP-11 operating systems (RT-11, RSX-11 and UNIX) are now under development. All of these operating systems are available to the MAINSAIL project on PDP-11's at Stanford. RT-11 will be the first to be implemented. The mix of instruction sets, operating systems and configurations will be a good test of MAINSAIL's ability to provide a compatible implementation, even across this one family of computers. We expect the PDP-11 systems to be operational by this summer.

1.3.2.7 STANFORD AI HANDBOOK PROJECT

The AI Handbook is a compendium of short articles (3-5 pages each) about the projects, ideas, problems and techniques that make up the field of Artificial Intelligence. Over 150 articles have been drafted by researchers and students in the field, on topics ranging in depth from "Augmented Transition Networks" (ATN's) to "An Overview of Natural Language Research", and covering the entire breadth of AI research: search, robotics, speech understanding, real-world applications, etc. An outline of the current contents of the handbook is given in Appendix II on page 225 (see Book II).

During the Spring of 1976 the final push for drafting new articles was completed, with some 60 articles produced by students during that quarter. Since then the process has begun of rewriting the various chapters of the Handbook to produce coherent manuscripts from the original work of five to ten authors. This effort involves rewriting articles for accuracy and completeness as well as integrating the 15 to 25 articles in a section into an editorially uniform and readable document. An editor has been added to the project team who will be responsible for maintaining a consistent format and style in the Handbook. When completed, each chapter will be reviewed by experts in the appropriate research area before it is released to the public.
At present, the chapter on Natural Language research is completed and being reviewed, and we expect that the sections on Search, Speech Understanding, Representation of Knowledge, and Automatic Programming will be completed during the next two months. During the Fall of 1977 the first seven chapters of the handbook will be published in preliminary form. Meanwhile, the handbook is already available to cooperative experts and critics on-line via the SUMEX-AIM network connections. We are considering maintaining the handbook on-line, with occasional hard-copy editions, and believe this method of "publication" may be a prototype for other encyclopedic monographs.

1.3.2.8 USER SOFTWARE AND INTRA-COMMUNITY COMMUNICATION

In addition to the system and language software development efforts of SUMEX, we have assembled or developed where necessary a broad range of utilities and user software. These include operational aids, statistics packages, DEC-supplied programs, improvements to the TOPS-10 emulator, text editors, text search programs, file space management programs, graphics support, a batch program execution monitor, text formatting and justification assistance, and magnetic tape conversion aids. We have also developed a number of user information assistance programs such as a "WHOIS" facility to recover names and affiliations of users and a "HELP" facility to locate on-line documentation of interest through key word searches.

Of major importance for our community effort is the set of tools for inter-user communications. We have enhanced the message sending and manipulation programs to better integrate text editing facilities for easier message preparation and reading. We have also developed a unique "bulletin board" system to deal with informal notes, thereby bridging a functional gap between formal system documents and private message communications between individual users.
The bulletin board system provides an informal and dynamic base for information about system facilities, lore, bugs, etc. or can provide a means for intra-project communication and coordination. The system has been in operation for more than one year and has been exported to IMSSS (Stanford's other TENEX site) and USC-ECL. We have also proposed that the next generation of ARPANET information services provide for bulletin board-like facilities.

At SUMEX-AIM there are 10 bulletin boards, 8 of which are project-specific. The main system bulletin board currently contains more than 140 bulletins under 85 topics covering system status announcements, explanations of recent crashes, hardware troubles and monitor upgrades, new developments, bugs, and little-documented features of our programming languages and utilities. Project bulletin boards have been used for notices and minutes of meetings, references to and abstracts of papers, coordination of on-going developments, vacation schedules, documentation and announcements of various kinds.

Current Bulletin Board features include:

   Multiple bulletin boards (public, private, general, specific, etc.).

   Topics and subtopics (separated by periods) may be nested to any depth.

   Expire dates for each bulletin, after which they are removed automatically.

   Interest-list-of-topics for each user allows him to be notified about new bulletins he is interested in and to ignore others.

   Users notified when new bulletins arrive, by running BBCHECK (the bulletin-board MAIL CHECK) or by mail.

   Help and browsing facilitated in a variety of ways (? can be typed anywhere, general and command-specific help provided).

   Command structure modelled after the TENEX EXEC, with conscious attention to human-engineering.

   Companion program BBREAD is a bulletin-board READMAIL.

   Companion program BBNEWS types out a directory listing of any new bulletins.
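A few of the features listed above (period-separated nested topics, expiration dates, and per-user interest lists) can be sketched as a small data model. The following Python is a hypothetical illustration only: the actual bulletin board system runs under TENEX and its internal structures are not described here, so every name below (`Bulletin`, `topic_matches`, `live_bulletins`, `bulletins_for_user`) and the case-insensitive matching rule are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Bulletin:
    topic: str      # period-separated path, e.g. "System.Crashes.Hardware"
    text: str
    expires: date   # last day on which the bulletin is still shown


def topic_matches(interest: str, topic: str) -> bool:
    """True if `topic` is `interest` itself or is nested beneath it.

    Comparison is by whole path components (assumed case-insensitive),
    so an interest in "System" does not match the topic "SystemNews".
    """
    want = interest.lower().split(".")
    have = topic.lower().split(".")
    return have[:len(want)] == want


def live_bulletins(board: List[Bulletin], today: date) -> List[Bulletin]:
    """Drop bulletins whose expire date has passed."""
    return [b for b in board if b.expires >= today]


def bulletins_for_user(board: List[Bulletin], interests: List[str],
                       today: date) -> List[Bulletin]:
    """Unexpired bulletins that fall under any topic on the interest list."""
    return [b for b in live_bulletins(board, today)
            if any(topic_matches(i, b.topic) for i in interests)]
```

Storing the topic as a dotted string and splitting it on demand keeps nesting unbounded, in the spirit of "nested to any depth", without requiring an explicit tree structure.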
1.3.2.9 DOCUMENTATION AND EDUCATION

We have spent considerable effort to develop, maintain, and facilitate access to our documentation so as to accurately reflect available software. The HELP and Bulletin Board systems have been important in this effort.

We have limited manpower for user assistance. In general, users are responsible for their own software development and maintenance. The SUMEX staff, however, (including Lederberg and Rindfleisch) share the responsibilities for system level assistance to users, tracking down bugs, reviewing user suggestions, etc. The terminal linking facilities of TENEX have been valuable tools to assist remote user groups and also for system users to communicate with each other.

With the recent initial release of the MAINSAIL system on selected machines, we are becoming increasingly involved in describing MAINSAIL and advising user projects in its possible applications.

1.3.2.10 SOFTWARE COMPATIBILITY AND SHARING

At SUMEX-AIM we firmly believe in importing rather than reinventing software where possible. At SUMEX many avenues exist for sharing between the system staff, various user projects, other facilities, and vendors. In the past, without communication networks, the system vendor served as the focal point for distribution of most software to user sites. Since the process of distributing tapes (and particularly of handling bug reports and user suggestions) was very slow, it was common for sites to take a version of a program and then modify and maintain it locally. This caused a proliferation of home-grown versions of software. Similar impediments have existed to the dissemination of user software. User organizations like SHARE and DECUS have helped to overcome these problems but communication is still cumbersome.
The advent of fast and convenient communication facilities coupling communities of computer facilities has the potential to make a major difference by facilitating inter-group cooperation and lowering these barriers.

The TENEX sites on the ARPANET have been interacting increasingly with each other to develop new software systems. This functions effectively to build communication around the network and promote a functional division of labor and expertise. The other major advantage is that as a by-product of the constant communication about particular software, personal connections between staff members of the various sites develop. These connections serve to pass general information about software tools and to encourage the exchange of ideas among the sites. Certain common problems are now regularly discussed on a multi-site level. We continue to draw significant amounts of system software from other ARPANET sites, reciprocating with our own local developments. Interactions have included mutual backup support, hardware configuration experiments, operating system enhancements, utility or language software, and user project collaborations. We have been able to import many new pieces of software and improvements to existing ones in this way. Examples of imported software include the message manipulation program MSG, TENEX SAIL, TENEX SOS, INTERLISP, the RECORD program, ARPANET host tables, and many others. Reciprocally, we have exported our contributions such as the drum page migration system, KI-10 page table efficiency improvements, GTJFN enhancements, PUB macro files, the bulletin board system, SNDMSG enhancements, our BATCH monitor, etc. The most recent example of this cooperative use of networks is in the preliminary export of MAINSAIL.

1.3.2.11 RESOURCE MANAGEMENT

PHILOSOPHY OF MANAGEMENT

The tidiest way to administer a national resource would be by subcontract to a fee-compensated, neutral agent.
This would still have to involve a governing body that could speak to the technical and quality-control interests of the served constituency. Appropriate in some circumstances, this model would separate the administration of a resource from active research and development. An approach expected to foster greater creativity is to couple the resource with an active user-center. This of course can lead to manifest conflicts of interest that must be addressed and avoided if the resource is to be fairly available on a regional or national basis.

As indicated in the introduction, our proposal for the latter approach was followed by searching negotiations over a management plan that would be sensitive to these considerations. The bureaucratic procedures, much as they have to be spelled out, are almost the last items that need to be specified for such a plan. Far more important is a charter that spells out the underlying objectives and responsibilities of the program, and which establishes incentives, resources, and obligations for proper performance. We believe the plan that was negotiated and implemented has all of these ingredients, and has made the design of the procedural framework a matter of simple common-sense logic from these premises. It will be plain that the convergence of local self-interest, and peer and contractual responsibility offers the best assurance that the programmatic goals will be respected, and simplifies the tasks of surveillance and accountability.

The self-interest part of this equation stems from our original motivation in requesting the resource: the need for specialized computing facilities to support intense, interdisciplinary studies in applications of AI at Stanford University Medical School.
Comprising several departments (Genetics, Medicine, Computer Science and Chemistry), and interwoven projects (e.g., DENDRAL, Heuristic Programming, MYCIN, MOLGEN) and principal faculty (Professors Lederberg, Feigenbaum, Djerassi, Cohen, and Buchanan), a substantial body of research that has progressed and evolved over many years would be sacrificed if such a resource were not available. Successful, stable collaborations of this scope are not readily found. This history both depends upon and contributes to the doctrine of resource-sharing that underlies the SUMEX-AIM effort.

One premise of the management plan was therefore the charter allocation of half the user-available capacity of the SUMEX facility to the Stanford complex of projects, subject to a local committee chaired by Professor Lederberg. The acceptance of this principle clearly defines the local benefit of the resource, minimizes anxiety and conflict-of-interest, and en suite enables the local group to respond quite objectively to the allocations that are made by an Executive Committee for the "national" or non-Stanford aliquot (see "Executive and Advisory Committee Organization" below).

Another important contribution to the success of the plan is the welcome participation of an NIH-BRP representative on the Executive Committee. What would be inappropriate meddling, in the conduct of a narrower research project funded by NIH, is a communication channel and source of detached judgment that has been invaluable in expediting the innumerable decisions about which NIH must and should be consulted in the week-to-week business of the resource. The efficacy of this principle, as is appropriate to acknowledge here, has been validated and enhanced by the style and energy that Dr. William Baker has brought to this task.
That the "national" community should be conscientiously cultivated for the most efficacious use of its aliquot, and that further growth of facilities should in due course be distributed, are further inferences from the charter principles.

Finally, the recognition in the charter that SUMEX-AIM was not merely a retail-store for computer cycles, but the means of building a community, was a necessary basis for the morale of the whole operation.

Some of these matters were addressed further in the section on SIGNIFICANCE (see Section 1.2 on page 4). The remainder of this section will now speak to the way in which these responsibilities are handled bureaucratically.

ORGANIZATION AND PROCEDURES

The SUMEX-AIM resource is administered within the Genetics Department of the Stanford University Medical School, Professor Lederberg's "main office", though he also holds appointments in the Computer Science Dept. and the Human Biology program. Its mission, locally and nationally, entails both the recruitment of appropriate research projects interested in medical AI applications and the catalysis of interactions among these groups and the broader medical community. User projects are separately funded and autonomous in their management. They are selected for access to SUMEX on the basis of their scientific and medical merits as well as their commitment to the community goals of SUMEX. Currently active projects span a broad range of application areas such as clinical diagnostic consultation, molecular biochemistry, belief systems modeling, mental function modeling, and instrument data interpretation (see Section 6 on page 41 in Book II).

We have pondered the possibilities of a fee-for-service approach to allocation of the resource.
We believe that this would be inappropriate for an experimental system of such national scope, whose pricing structure would have to be revised almost on a week-to-week basis to fairly respond to evolutionary changes in the system. This would also pose problems of accountability for the transfer of funds from one institution to another. Our present policy of non-monetary allocation control, which we propose to continue for the next term, of course accentuates our responsibility for the careful selection of projects with high scientific and community merit.

EXECUTIVE AND ADVISORY COMMITTEE ORGANIZATION

As the SUMEX-AIM project is a multilateral undertaking by its very nature, we have created several management committees to assist in administering the various portions of the SUMEX resource. As defined in the SUMEX-AIM management plan adopted at the time the initial resource grant was awarded, the available facility capacity is allocated 40% to Stanford Medical School projects, 40% to national projects, and 20% to common system development and related functions. Within the Stanford aliquot, Dr. Lederberg has established an advisory committee to assist him in selecting and allocating resources among projects appropriate to the SUMEX mission. The current membership of this committee is listed in Appendix V (see Book II).

For the national community, two committees serve complementary functions. An Executive Committee oversees the operations of the resource as related to national users and makes the final decisions on authorizing admission for projects. It also establishes policies for resource allocation and approves plans for resource development and augmentation within the national portion of SUMEX (e.g., hardware upgrades, MAINSAIL development priorities, etc.). The Executive Committee oversees the planning and implementation of the AIM Workshop series currently implemented under Prof. S.
Amarel of Rutgers University and assures coordination with other AIM activities as well. The committee will play a key role in assessing the possible need for additional future AIM community computing resources and in deciding the optimal placement and management of such facilities. The current membership of the Executive Committee is listed in Appendix V (see Book II).

Reporting to the Executive Committee, an Advisory Group represents the interests of medical and computer science research relevant to AIM goals. The Advisory Group serves several functions in advising the Executive Committee: 1) recruiting appropriate medical/computer science projects, 2) reviewing and recommending priorities for allocation of resource capacity to specific projects based on scientific quality and medical relevance, and 3) recommending policies and development goals for the resource. The current Advisory Group membership is given in Appendix V (see Book II).

These committees have actively functioned in support of the resource. Except for the meetings held during the AIM workshops, the committees have met by telephone conference owing to the size of the groups and to save the time and expense of personal travel to meet face to face. These telephone meetings, in conjunction with terminal access to related text materials, have served quite well in accomplishing the agenda business and facilitate greatly the arrangement of meetings. Other solicitations of advice requiring review of sizable written proposals are done by mail.

We will continue to work with the management committees to recruit the additional high quality projects which can be accommodated and to evolve resource allocation policies which appropriately reflect assigned priorities and project needs.
We hope to make more generally available information about the various projects both inside and outside of the community and thereby to promote the kinds of exchanges exemplified earlier and made possible by network facilities.

NEW PROJECT RECRUITING

The SUMEX-AIM resource has been announced through a variety of media as well as by correspondence, contacts of NIH-BRP with a variety of prospective grantees who use computers, and contacts by our own staff and committee members. The number of formal projects that have been admitted to SUMEX has more than doubled since the start of the project; others are working tentatively as pilot projects or are under review.

We have prepared a variety of materials for the new user ranging from general information such as is contained in a brochure (see Appendix VI in Book II) to more detailed information and guidelines for determining whether a user project is appropriate for the SUMEX-AIM resource. Dr. E. Levinthal has prepared a questionnaire to assist users seriously considering applying for access to SUMEX-AIM (see Appendix VII in Book II). Pilot project categories have been established both within the Stanford and national aliquots of the facility capacity to assist and encourage projects just formulating possible AIM proposals pending their application for funding support and in parallel formal application for access to SUMEX. Pilot projects are approved for access for limited periods of time after preliminary review by the Stanford or AIM Advisory Group as appropriate to the origin of the project.

These contacts have sometimes done much more than provide support for already-formulated programs. For example, Prof. Feigenbaum's group at Stanford has initiated a major collaborative effort with Dr. Osborn's group at the Institutes of Medical Sciences in San Francisco. This project in "Pulmonary Function Monitoring and Ventilator Management - PUFF/VM" (see Section 6.4.6 on page 197 in Book II) originated as a pilot request to use MLAB in a small way for modeling. Subsequently the AI potentialities of this domain were recognized by Feigenbaum, Nii, and Osborn who have submitted a joint proposal to NIH and have a pilot status at present.

The following lists the fully authorized projects currently comprising the SUMEX-AIM community (see Section 6 in Book II for more detailed descriptions). The nucleus of five projects that were authorized at the initial funding of the resource in December 1973 are marked by "<*>".

National -

       1) Acquisition of Cognitive Procedures (ACT); Dr. J. Anderson (Yale University)

   <*> 2) Higher Mental Functions Project; K. Colby, M.D. (University of California at Los Angeles)

       3) INTERNIST Project; J. Myers, M.D. and Dr. H. Pople (University of Pittsburgh)

       4) Medical Information Systems Laboratory (MISL); J. Wilensky, M.D. and Dr. B. McCormick (University of Illinois at Chicago Circle)

   <*> 5) Rutgers Computers in Biomedicine; Dr. S. Amarel (Rutgers University)

       6) Chemical Synthesis Project (SECS); Dr. T. Wipke (University of California at Santa Cruz)

Stanford -

   <*> 1) DENDRAL Project; Drs. C. Djerassi, J. Lederberg, and E. Feigenbaum

       2) Large Multi-processor Arrays (HYDROID); Dr. G. Wiederhold

       3) Molecular Genetics Project (MOLGEN); Drs. J. Lederberg, E. Feigenbaum, and N. Martin

   <*> 4) MYCIN Project; S. Cohen, M.D. and Dr. B. Buchanan

   <*> 5) Protein Structure Modelling; Drs. J. Kraut and S. Freer (University of California at San Diego) and E. Feigenbaum (Stanford)

As an additional aid to new projects or collaborators with existing projects, we provide a limited amount of funds for use to support terminals and communications needs of users without access to such equipment.
We are currently leasing 6 terminals and 4 modems for users, as well as 4 foreign exchange lines to better couple the Rutgers project into the TYMNET and a leased line between Stanford and U. C. Santa Cruz for the Chemical Synthesis project.

STANFORD COMMUNITY BUILDING

The Stanford community has undertaken several internal efforts to encourage interactions and sharing between the projects centered here. Professor Feigenbaum organized a seminar class with the goal of assembling a handbook of AI concepts, techniques, and current state-of-the-art. This project has had enthusiastic support from the students, and substantial progress has been made in preparing many sections of the handbook, as reported earlier. An outline of the material being prepared can be found in Appendix II on page 225 (see Book II). Several examples of completed articles are given in Appendix I on page 202 (see Book II).

A second community-building effort was a mini-conference on AI held at Stanford in January 1976. This 3 day series of meetings featured presentations by each of the local projects and comparative discussions of approaches to current problems in AI research such as knowledge representations, production system strategies, and rule formation. Weekly informal lunch meetings (SIGLUNCH) are also held among community members to discuss general AI topics, concerns and progress of individual projects, or system problems as appropriate; these meetings have also included a number of outside invited speakers.

AIM WORKSHOP SUPPORT

The Rutgers Computers in Biomedicine resource (under Dr. Saul Amarel) has organized a series of workshops devoted to a range of topics related to artificial intelligence research, medical needs, and resource sharing policies within NIH. Meetings have been held for the past two years at Rutgers and another is planned for this summer.
The SUMEX facility has acted as a prime computing base for the workshop demonstrations. We expect to continue this support for future workshops. The AIM workshops provide much useful information about the strengths and weaknesses of the performance programs, both in terms of criticisms from other AI projects and in terms of the needs of practicing medical people. We plan to continue to use this experience to guide the community building aspects of SUMEX-AIM.

RESOURCE ALLOCATION POLICIES

As the SUMEX facility has become increasingly loaded, a number of diverse and conflicting demands have arisen which require controlled allocation of critical facility resources (file space and central processor time). We have already spelled out a policy for file space management: an allocation of file storage is defined for each authorized project in conjunction with the management committees. This allocation is divided among project members in any way desired by the individual principal investigators. System allocation enforcement is implemented by project each week. As the weekly file dump is done, if the aggregate space in use by a project is over its allocation, files are archived from user directories over allocation until the project is within its allocation.

We have recently implemented system scheduling controls to attempt to maintain the 40:40:20 balance in terms of CPU utilization (see page 18). The initial complement of user projects justifying the SUMEX resource was centered to a large extent at Stanford. Over the first term of the SUMEX grant, a substantial growth in the number of national projects was realized. During the same time the Stanford group of projects has matured as well, and in practice the 40:40 split between Stanford and non-Stanford projects is not ideally realized (see Figure 8 on page 43 and the tables of recent project usage on page 45).
Our job scheduling controls bias the allocation of CPU time based on percent time consumed relative to the time allocated over the 40:40:20 community split. The controls are "soft", however, in that they do not waste computer cycles if users below their allocated percentages are not on the system to consume the cycles. The operating disparity in CPU use to date reflects a substantial difference in demand between the Stanford community and the developing national projects, rather than inequity of access. For example, the Stanford utilization is spread over a large part of the 24-hour cycle, while national-AIM users tend to be more sensitive to local prime-time constraints. (The 3-hour time-zone phase shift across the continent is of substantial help in load-balancing.) For the present, we propose to continue our policy of "soft" allocation enforcement for the fair split of resource capacity. If necessary to assure proper apportionment, we can implement a pie-slice reservation system to control the allocations more rigidly.

Our system also categorizes users in terms of access privileges. These comprise fully authorized users, pilot projects, guests, and network visitors, in descending order of system capabilities. We want to encourage bona fide medical and health research people to experiment with the various programs available with a minimum of red tape, while not allowing unauthenticated users to bypass the advisory group screening procedures by coming on as guests. So far we have had relatively little abuse compared to what other network sites have experienced, perhaps on account of the personal attention that senior staff gives to the logon records, and to other security measures. However, the experience of most other computer managers behooves us to be cautious about being as wide-open as might be preferred for informal service to pilot efforts and demonstrations. We will continue developing this mechanism in conjunction with management committee policy decisions.
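As a rough illustration of the two allocation mechanisms described under RESOURCE ALLOCATION POLICIES, the following Python sketch shows one way a "soft" CPU bias and a weekly file-space check could work. It is not the SUMEX-TENEX implementation. Only the 40:40:20 split and the archive-until-within-allocation rule come from the text; the function names, the bias formula (entitled share divided by consumed share), and the choice to archive the least recently accessed files first are assumptions made for the example.

```python
# Illustrative sketch only; all names and formulas here are hypothetical.

# Nominal community split of the available CPU capacity (from the text).
ALLOCATION = {"stanford": 0.40, "aim": 0.40, "staff": 0.20}


def scheduling_bias(cpu_used, allocation=ALLOCATION):
    """Return a per-community priority factor (assumed formula).

    cpu_used maps community -> CPU hours consumed in the accounting window.
    A community that has consumed less than its allotted share gets a
    factor above 1.0; one that has consumed more gets a factor below 1.0.
    """
    total = sum(cpu_used.values())
    if total == 0:
        return {name: 1.0 for name in allocation}
    bias = {}
    for name, share in allocation.items():
        used_fraction = cpu_used.get(name, 0.0) / total
        # Entitled share divided by consumed share; guard against zero use.
        bias[name] = share / used_fraction if used_fraction > 0 else float("inf")
    return bias


def pick_next(runnable, cpu_used):
    """Choose the next job from runnable, a list of (job_id, community).

    "Soft" control: only communities that actually have runnable jobs
    compete, so cycles are never held idle for an absent community.
    """
    if not runnable:
        return None
    bias = scheduling_bias(cpu_used)
    return max(runnable, key=lambda job: bias[job[1]])


def files_to_archive(files, allocation_pages):
    """Weekly file-space check for one project.

    files is a list of (name, pages, last_access_day). Files are selected
    for archiving, least recently accessed first (an assumed ordering),
    until the project's space in use is within its allocation.
    """
    in_use = sum(pages for _, pages, _ in files)
    archived = []
    for name, pages, _ in sorted(files, key=lambda f: f[2]):
        if in_use <= allocation_pages:
            break
        archived.append(name)
        in_use -= pages
    return archived
```

With usage of 70, 20, and 10 hours for the three communities, a waiting national job would be preferred over a waiting Stanford job; if only Stanford jobs are runnable, one of them is still selected, which is the sense in which the control is "soft".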
1.3.2.12 SUMMARY OF RESOURCE USAGE

The following data give an overview of SUMEX-AIM resource usage. There are five sub-sections containing data respectively for 1) monthly CPU time consumed, 2) resource usage by community (AIM and Stanford), 3) resource usage by project, 4) recent diurnal loading data, and 5) network usage data.

MONTHLY CPU TIME CONSUMED

Figure 7. Monthly CPU Time Consumed
[Plot: CPU time used (hours, axis 0 to 600) by month, 1974 through 1977.]

RELATIVE SYSTEM LOADING BY COMMUNITY

The SUMEX resource is divided, for administrative purposes, into 3 major communities: user projects based at the Stanford Medical School, user projects based outside of Stanford (national AIM projects), and common systems development efforts. As defined in the resource management plan approved by BRP at the start of the project, the available resource in terms of CPU capacity and file space will be divided between these communities as follows:

     Stanford   40%
     AIM        40%
     Staff      20%

The "available" resources to be divided up in this way are those remaining after various monitor and community-wide functions are accounted for. These include such things as job scheduling, overhead, network service, file space for subsystems and documentation, etc.

The monthly usage of CPU and file space resources for each of these three communities relative to their respective aliquots is shown in the plots in Figure 8 and Figure 9. It is clear that the Stanford projects have held an edge in system usage despite our efforts at resource allocation and the substantial voluntary efforts by the Stanford community to utilize non-prime hours.
This reflects the development of the Stanford group of projects relative to those getting started on the national side and has correspondingly accounted for much of the progress in AI program development to date.

Figure 8. CPU Usage by Community
[Three plots by month, 1974 through 1977: National AIM (axis to 40), Stanford (axis to 40), and System Staff (axis to 20).]

Figure 9. File Space Usage by Community
[Three plots of % of available space used by month, 1974 through 1977: National AIM (axis to 40), Stanford (axis to 40), and System Staff (axis to 20).]

INDIVIDUAL PROJECT AND COMMUNITY USAGE

The table following shows cumulative resource usage by project in the past grant year.
The data displayed include a description of the operational funding sources (outside of SUMEX-supplied computing resources) for currently active projects, total CPU consumption by project (Hours), total terminal connect time by project (Hours), and average file space in use by project (Pages, 1 page = 512 computer words). These data were accumulated for each project for the months between May 1976 and April 1977. Again the well developed use of the resource by the Stanford community can be seen. It should be noted that the Stanford projects have voluntarily shifted a substantial part of their development work to non-prime time hours, which is not shown in these cumulative data. It should also be noted that a significant part of the DENDRAL and MYCIN efforts, here charged to the Stanford aliquot, supports development efforts dedicated to national community access to these systems. The actual demonstration and use of these programs by extramural users is charged to the national community in the "AIM USERS" category, however.

RESOURCE USE BY INDIVIDUAL PROJECT

STANFORD COMMUNITY
                                                        CPU        FILE SPACE
                                                        (Hours)    (Pages)
  1) DENDRAL PROJECT                                    1181       13058
     "Resource Related Research Computers and Chemistry"
     NIH RR-00612-08 (3 yrs. 1977-80)
     ARPA DAHC-15-73-C-0435 (2 yrs. 1977-79)
  2) HYDROID PROJECT                                      40         239
     "Distributed Processing and Problem Solving"
     ARPA DAHC-15-73-C-0435
  3) MOLGEN PROJECT                                       85        1853
     NSF MCS75-11649
     NSF MCS76-11935 (2 yrs. 1976-78)
  4) MYCIN PROJECT                                       410        6688
     "Computer-based Consult. in Clin. Therapeutics"
     HEW HS-01544 (2 yrs. 1977-79)
     NSF (2 yrs. 1977-79)
  5) PROTEIN STRUCT MODELING                             159        2477
     "Heuristic Comp. Applied to Prot. Crystallog."
     NSF DCR 74-23451 (2 yrs. 1977-79)
     ARPA DAHC 15-73-C-0435
  6) AI HANDBOOK PROJECT                                  26         639
  7) PILOT PROJECTS                                      327        3506
     (see reports in Section 6.3 in Book II)

     COMMUNITY TOTALS                                   2232

     [The fractional digits of the CPU figures and the CONNECT (Hours) column of this table are not legible in this copy; CPU values are shown as whole hours.]

NATIONAL AIM COMMUNITY
                                                        CPU        CONNECT    FILE SPACE
                                                        (Hours)    (Hours)    (Pages)
  1) ACT PROJECT                                          57.02     1195.84      986
     "Acquisition of Cognitive Procedures"
     NIMH MH29353
     ONR NOO14-77-6-0242
  2) HIGHER MENTAL FUNCTIONS                             206.03     2680.16     2198
     "Computer Models in Psychiatry and Psychother."
     NIH MH-27132-02 (2 yrs.)
     UCLA NPI Gen. Res.
  3) INTERNIST PROJECT (DIALOG)                          205.20     2721.26     3535
     "Computer Model of Diagnostic Logic"
     BHRD MB-00144-03 (3 yrs.)
  4) MISL PROJECT                                          9.27      389.05      876
     "Medical Information Systems Laboratory"
     US-PHS-MB00114-03 (3 yrs.)
  5) RUTGERS PROJECT                                     139.63     2433.43    10862
     "Computers in Biomedicine"
     NIH RR-00643-05 (3 yrs.)
  6) SECS PROJECT                                        308.96     4374.03     4515
     "Chemical Synthesis"
  7) AIM PILOT PROJECTS                                   40.91     1326.56     1558
     (see reports in Section 6.4 in Book II)
  8) AIM Administration                                   11.13      383.22
  9) AIM Users                                            56.89      672.35

     COMMUNITY TOTALS                                   1035.04    16166.99

SUMEX STAFF AND SYSTEM
                                                        CPU        CONNECT    FILE SPACE
                                                        (Hours)    (Hours)    (Pages)
  1) Staff                                               903.07    23198.86    11919
  2) Miscellaneous                                        80.87     2508.98     1721
  3) Operations                                         1505.50    ?3113.94    32382

     COMMUNITY TOTALS                                   2489.44    88321.78    46022

     RESOURCE TOTALS                                    5757.45   143977.15   101136

SYSTEM DIURNAL LOADING VARIATIONS

The following figures give a picture of the recent variations in diurnal SUMEX system load, taken during March 1977. The plots include:

     Figure 10 - Total number of jobs logged in to the system
     Figure 11 - Percent of total CPU time used by logged in jobs (maximum is 200% for dual processor capacity)
     Figure 12 - Percent of total CPU time consumed as overhead: I/O wait, core management, scheduling, etc. (maximum = 200%)
     Figure 13 - Balance set size (number of jobs in core)
     Figure 14 - Number of runnable jobs (whether or not in core)

The abscissa for these plots is broken into 20 minute intervals throughout the day.
The ordinate for each interval is the average of all the daily measurements for that interval over the weekdays during March 1977. A daily measurement for a given 20 minute interval is in turn an average of the appropriate statistic sampled every 10 seconds. Since these plots display overall average data, they give a representative illustration of the general characteristics of diurnal loading. There are, of course, substantial fluctuations in the quantities measured from day to day as well and, for some, also on time scales shorter than the intervals displayed in the figures. For example, in Figure 14 the number of runnable jobs (equivalent to the system "load average") shows a fairly smooth curve peaking at 6.7 jobs. On both a scale of minutes and from day to day, however, the number of runnable jobs will vary from only a few to 12 or more. This fluctuation is not shown in these average plots but also plays a role in the responsiveness of the system.

In the heading of each plot are shown range statistics for the measurement over various parts of the day. Range data include the minimum value "Low", average value "Ave", and maximum value "High". The first line of the heading gives the range over the whole day; on succeeding lines, "Prime Time" covers 6:00-18:00 Pacific time and "Non Prime Time" covers the remaining night time hours.

It can be noted in Figure 12 that the current overhead level for the dual processor system is quite high (about 33% per processor). This is because of the limited memory size (256K words) we currently have and the resulting increase in swapping interrupt rate and I/O wait time. We have a proposal pending with the AIM Executive Committee to augment our memory, which should reduce this overhead to our earlier single processor levels (about 15-20% per processor).
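The two-stage averaging behind the diurnal plots (10-second samples averaged into 20-minute intervals for each day, then averaged across the weekdays of the month) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the measurement code used on the system: the function names and the (seconds since midnight, value) sample format are hypothetical, and only the 10-second sampling period, the 20-minute interval, and the Low/Ave/High range statistics come from the text.

```python
from collections import defaultdict

SAMPLE_SECONDS = 10                        # sampling period stated in the text
BIN_SECONDS = 20 * 60                      # 20-minute plotting interval
BINS_PER_DAY = 24 * 3600 // BIN_SECONDS    # 72 intervals per day


def daily_bin_averages(samples):
    """Average one day's samples into 20-minute bins.

    samples is an iterable of (seconds_since_midnight, value) pairs
    (an assumed format). Returns {bin_index: mean} for bins with data.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for second, value in samples:
        b = int(second) // BIN_SECONDS
        sums[b] += value
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sums}


def diurnal_profile(days):
    """Average the per-day bin averages over a list of weekdays.

    Returns a list of length BINS_PER_DAY; a bin with no data on any
    day is None.
    """
    per_day = [daily_bin_averages(day) for day in days]
    profile = []
    for b in range(BINS_PER_DAY):
        values = [day[b] for day in per_day if b in day]
        profile.append(sum(values) / len(values) if values else None)
    return profile


def range_stats(profile, first_bin=0, last_bin=BINS_PER_DAY):
    """Return (Low, Ave, High) over a slice of the profile."""
    values = [v for v in profile[first_bin:last_bin] if v is not None]
    return min(values), sum(values) / len(values), max(values)
```

Under these assumptions, the "Prime Time" statistics for 6:00-18:00 would correspond to `range_stats(profile, 18, 54)`, since each hour spans three 20-minute bins. Averaging the daily averages, rather than pooling all samples, gives each weekday equal weight even if some samples are missing on a given day.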
Figure 10. Average Diurnal Loading (3/77): Total Number of Jobs
[Plot: jobs logged in (axis 0 to 50) against Pacific time of day, 0 to 24 hours.]
     Total Day       (Low= 13.2, Ave= 23.7, High= 37.2)
     Prime Time      (Low= 13.3, Ave= 28.4, High= 37.2)
     Non Prime Time  (Low= 13.2, Ave= 17.9, High= 22.7)

Figure 11. Average Diurnal Loading (3/77): Percent Time Used
[Plot: percent of CPU time used (axis 0 to 200) against Pacific time of day, 0 to 24 hours.]
     Total Day       (Low= 39.2, Ave= 92.6, High= 133.5)
     Prime Time      (Low= 39.2, Ave= 104.3, High= 133.5)
     Non Prime Time  (Low= 48.5, Ave= 78.1, High= 117.5)

Figure 12. Average Diurnal Loading (3/77): Percent Overhead
[Plot: percent of CPU time consumed as overhead (axis 0 to 200) against Pacific time of day, 0 to 24 hours.]
     Total Day       (Low= 24.4, Ave= 46.7, High= 63.9)
     Prime Time      (Low= 26.3, Ave= 52.5, High= 63.9)
     Non Prime Time  (Low= 24.4, Ave= 39.5, High= 50.3)

Figure 13. Average Diurnal Loading (3/77): Balance Set - Jobs in Core
[Plot: number of jobs in core (axis 0 to 12) against Pacific time of day, 0 to 24 hours.]
     Total Day       (Low= .7, Ave= 2.4, High= 4.9)
     Prime Time      (Low= .7, Ave= 3.1, High= 4.9)
     Non Prime Time  (Low= .8, Ave= 1.6, High= 2.8)

Figure 14. Average Diurnal Loading (3/77): Runnable Jobs
[Plot: number of runnable jobs against Pacific time of day. The heading gives Low= .7 for the total day, Low= .7 for prime time, and Low= .8 for non prime time; the remaining range statistics are not legible in this copy.]

1.3.2.13 NETWORK USAGE STATISTICS

NETWORK USAGE PLOTS

The plots in Figure 15 show the major billing components for SUMEX-AIM TYMNET usage. These include the total connect time for terminals coming into SUMEX and the total number of characters transmitted over the net. The ratio of characters received at SUMEX to characters sent to the terminal is about 1:12 over our period of usage. Also shown for recent months is a plot of ARPANET connect time, which tracks the corresponding data for TYMNET usage fairly closely.
No data for "character" transmission are available for ARPANET, since file transfers and terminal traffic use different byte sizes and these data are not resolved and maintained for the ARPANET.

Figure 15. TYMNET and ARPANET Usage Data
[Two plots by month, 1974 through 1977: connect time (hours) for TYMNET and ARPANET, and characters transmitted for TYMNET.]

1.3.2.14 PUBLICATIONS

The following are publications by the SUMEX staff; they include papers describing the SUMEX-AIM resource and on-going research as well as documentation of system and program developments. Publications for individual collaborating projects are detailed in their respective reports (see Section 6 on page 44 in Book II).

[1] Carhart, R.E., Johnson, S.M., Smith, D.H., Buchanan, B.G., Dromey, R.G., and Lederberg, J., "Networking and a Collaborative Research Community: a Case Study Using the DENDRAL Programs", ACS Symposium Series, Number 19, COMPUTER NETWORKING AND CHEMISTRY, Peter Lykos (Editor), 1975.

[2] Levinthal, E.C., Carhart, R.E., Johnson, S.M., and Lederberg, J., "When Computers Talk to Computers", Industrial Research, November 1975.

[3] Wilcox, C.R., "MAINSAIL - A Machine-Independent Programming System", Proceedings of the DEC Users Society, Vol 2, No 4, Spring 1976.

Mr. Clark Wilcox also chaired the session on "Languages for Portability" at the DECUS DECsystem10 Spring '76 Symposium.

In addition, as reported earlier, a substantial effort has gone into developing, upgrading, and extending documentation about the SUMEX-AIM resource, the SUMEX-TENEX system, the many subsystems available to users, and MAINSAIL.
These efforts include a number of major documents (such as SOS, PUB, and TENEX-SAIL manuals) as well as a much larger number of document upgrades, user information and introductory notes, an ARPANET Resource Handbook entry, and policy guidelines (see Appendix VI and Appendix VII in Book II). Publications for individual user projects are summarized in the respective reports (see Section 6 in Book II).

1.3.2.15 RESOURCE STAFFING HISTORY

PROFESSIONAL PERSONNEL (YEARS 01-04)

     Name                     Title of Position               (*) % of Effort
     Lederberg, Joshua        Principal Investigator           10
     Rindfleisch, Thomas      Facility Manager                100
     Levinthal, Elliott       AIM Liaison                      22
     Cower, Richard           System Programmer               100
     Crossland, James         System Programmer               100
     Gilmurray, Frank         System Programmer               100
     Heathman, Michael        System Programmer               100
     Lieb, James              System Programmer               100
     Reiss, Steven            System Programmer               100
     Sweer, Andrew            System Programmer               100
     Tucker, Robert           System Programmer               100
     Schulz, Rainer           System Programmer - IMSSS        61
     Roberts, Ronald          System Programmer - IMSSS        50
     Smith, Robert            System Programmer - IMSSS        52
     Quam, Lynn               Syst. Prog. - Cardiology         50
     Johnson, Suzanne         Applications Programmer          50
     Snito, Nancy             Applications Programmer         100
     Kahler, Richard          User Consultant                 100
     Jackson, Phillip         User Support Specialist         100
     Wilcox, Clark            Syst. Prog. - Res. Asst.        100
     Veizades, Nicholas       Electronics Engineer - IRL       63
     Nozaki, Thomas           Electronics Engineer - IRL       50

     Period of Appointment (entries in printed order; this column lists 23 periods for the 22 names above, so the row alignment is not certain):
     10/1/73 - present; 10/1/73 - present; 12/1/73 - present; 6/24/74 - 6/15/77; 8/6/74 - 1/16/76; 6/1/77 (tent. start); 10/1/73 - 8/15/75; 7/1/74 - 11/14/75; 10/1/73 - 7/31/74; 1/19/76 - present; 6/1/77 (tent. start); 2/1/74 - present; 2/1/74 - 7/31/74; 5/1/75 - 7/31/75; 5/1/75 - 7/31/75; 3/1/76 - 5/31/76; 7/22/74 - present; 3/25/74 - 8/20/76; 12/1/75 - present; 11/18/74 - 7/28/75; 3/25/74 - present; 10/1/73 - present; 5/1/74 - present

     (*) The figures shown give the % of effort during the respective periods of employment.

2 SPECIFIC AIMS

The following outlines the specific objectives of the SUMEX-AIM resource during the follow-on five year period. Note that these objectives cover only the resource nucleus; objectives for individual collaborating projects are discussed in their respective reports (see Section 6 on page 41 in Book II). We break our research aims into the categories 1) resource operations, 2) training and education, and 3) core research.

2.1 RESOURCE OPERATIONS AIMS

The broad objectives remain to provide an effective computing facility with extensive network access to support the community of projects developing AI applications in medicine. This goal includes the limited dissemination of these programs to outside research groups to provide the necessary feedback from actual research applications for effective program development. Specific aims include:

1) Continue the building of a community of projects applying AI techniques to medical problems, including improving mechanisms for inter- and intra-group collaborations and communications. We plan to extend the existing AIM community management structure to accommodate justified growth in computing resources at other sites, including a close collaboration between nodes on such a "resource network" and a meaningful division of responsibilities and regional expertise. To minimize administrative barriers to the community-oriented goals of SUMEX-AIM, we plan to retain the current user funding arrangements; user projects will fund their own manpower and local needs and will actively contribute their special expertise to the SUMEX-AIM community in return for an allocation of computing resources under the control of the AIM management committee structure.
There will be no "fee for service" charges for community members. While AI is our defining theme, we may entertain exceptional applications justified by some other unique feature of SUMEX-AIM essential for important biomedical research.

2) Provide an effective computing resource to support the development and research dissemination of large and complex computer programs for a broad range of medical AI applications. This will include the continued development and refinement of the existing resource and the development and implementation of a plan for the upgrade of current hardware to the emerging next generation when justified by community, technical, and economic advantages.

3) Provide effective and geographically accessible network communication facilities to the SUMEX-AIM community for effective remote collaborations and to allow external users to experiment with available AI programs. We also plan to demonstrate the utility of network communications for scientific collaboration, in selected cases which do not interfere with our primary mission, to groups in other areas of computer science related to medicine. The ONET collaboration (see the Rutgers Resource progress report on page 144) illustrates the value of these facilities apart from the AI programs themselves.

2.2 TRAINING AND EDUCATION AIMS

Our goals during the follow-on period for assisting new and established users of the SUMEX-AIM resource are a continuation of those adopted for the first grant term. Collaborating projects will provide their own manpower and expertise for the development and dissemination of their AI programs. The SUMEX resource will provide community-wide support and will work to make resource goals and AI performance programs known and available to appropriate medical scientists. Specific aims include:

1) Provide documentation and assistance in interfacing users to resource facilities and programs.
We will continue to exploit particular areas of expertise within the community for developing pilot efforts in new application areas.

2) Continue to allocate "collaborative linkage" funds to qualifying new and pilot projects to provide for communications and terminal support pending formal approval and funding of their projects. These funds are allocated in cooperation with the AIM Executive Committee reviews of prospective user projects.

3) Provide support for a "visiting scientist" position to allow prospective qualified SUMEX-AIM project investigators or users to spend a term in close contact with on-going research work. The selection of appropriate candidates for this rotating position would be made in cooperation with the AIM Executive Committee.

4) Continue to support AIM Workshop activities in collaboration with the Rutgers Computers in Biomedicine resource.

2.3 CORE RESEARCH AIMS

Our core research efforts will emphasize the generalization and documentation of tools and techniques available for AI research and applications and the examination of alternative approaches for implementing and exporting large and complex AI performance programs. These efforts will be important community-wide to facilitate the investigation of new application areas and to meet the demand, beyond SUMEX-AIM capacity, for external users to be able to run developed AI programs conveniently. Fortunately, we have independent funding from various agencies for research activities that overlap the core-research opportunity, e.g., CONGEN, MOLGEN, Heuristic Programming Project, and DENDRAL mass spectrometry. Specific aims include:

1) Continue to encourage community efforts at organizing and developing AI techniques by supporting projects such as the AI Handbook, special language developments (e.g., KRL), and other projects community members may propose to contribute.
2) Explore the generalizations of AI tools for knowledge acquisition, representation, and utilization; reasoning in the presence of uncertainty; strategy planning; and explanations of reasoning pathways. This effort will attempt to extract and generalize some of the best concepts and functional capabilities developed in the context of particular projects (e.g., DENDRAL, MYCIN, MOLGEN, etc.). The objective is to evolve a body of software packages that can be used to build future knowledge-based systems more efficaciously and to explore other medical AI applications.

3) Explore AI software implementation and export mechanisms such as network communication systems, machine-independent languages, and special purpose computer systems. This will include the continued development of the MAINSAIL system and the investigation of microprogrammable machines specialized for target languages or satellite general purpose machines capable of running existing systems.

Even the present level of computer capacity is not sufficient to meet the demands of a number of our projects. The DENDRAL CONGEN program is a good example, where the potential for effective application to real biochemical structure determination problems is close but it simply takes too long to run problems that are really interesting. Therefore new approaches to computing are needed that may involve parallel processing, multiple small machines, or new developments from commercial vendors such as very much cheaper analogs of the PDP-10 that could be run in a more nearly dedicated mode.

3 METHODS OF PROCEDURE

This section details our plans for SUMEX-AIM goals during the next five year period. As indicated earlier, objectives and plans for individual collaborating projects are discussed in Section 6 on page 41 (see Book II). In general SUMEX-AIM will retain its community orientation in formulating and implementing a resource for AI research in medicine.
We have had good success at integrating the tools and expertise of on-going active research efforts where possible and building on these where extensions or innovations are necessary. This orientation has proved to be an effective way to build the current facility and community and we expect it to be equally productive during the next period. We have assembled a growing community of projects which contribute to SUMEX-AIM resource goals and have at the same time come to depend on SUMEX for computing support and as a means of interacting with collaborators. We plan to continue our commitment to providing effective support to this community of projects.

This opportunistic approach also places constraints on synchronizing particular advances with our community needs. We are presently facing demands for increased computing resources as well as for effective methods for exporting mature AI performance programs. At the same time a new generation of hardware and firmware systems is just becoming available. These will have a large impact as a means to meet our goals, providing economic and technical advantages while minimizing redesign and reprogramming requirements. The announcement of a new generation of general purpose machines that might run AI software using existing operating systems and language support, with substantially reduced capital investment, is anticipated to be one to two years off. Such systems could be used to export software packages intact or to incrementally augment central resources like SUMEX. A similar situation exists for special purpose microprogrammable machines which can be tailored to particular language needs for increased throughput and efficiency. We aim to respond in a timely fashion to take advantage of this emerging technology but until concrete details are publicly available, we can only describe our basic objectives and general design possibilities.
Thus the following description of research plans concentrates on software issues in planning for assimilation of the new technologies, with the expectation that hardware announcements one to two years hence will impel careful reconsideration of our strategies. Detailed budgets for computing hardware conversions are only approximate pending more detailed information on pricing. Our approach is to describe the research concept and the gross estimated funding required, for review of these objectives at this time. We will further refine and elaborate the details of these plans during the first one to two years of the grant and submit them through the AIM Executive and Advisory Committees and the NIH Biotechnology Resources Program Office for approval prior to implementation.

3.1 RESOURCE OPERATIONS PLANS

3.1.1 SYSTEM HARDWARE AND MONITOR PLANS

As discussed in the progress section and supported by collaborating project reports, we have implemented an effective computing resource to support AI applications to medical research. We have augmented the present system to increase its effective capacity as far as we economically can to meet community needs. We do not propose any substantial changes either in the scope of the existing resource or in its capacity.

Other members of our community have proposals pending for other regional centers which may be justified on their own merits and the needs of the AIM community. We support the development of such regional expertise and specialization where justified, which may allow a more coherent adaptation of a particular facility’s resources to the needs of a subset of the AIM community.
For example, a substantial group of biochemical structure analysis projects has grown up (DENDRAL, Chemical Synthesis Project, Protein Structure Project, and Molecular Genetics Project) as well as a group of medical diagnostic projects (MYCIN, Rutgers ONET, and INTERNIST as well as several pilot efforts). If regionalization becomes indicated, AIM facilities could be reoriented to serve the special needs of these research and target communities via separate systems, while maintaining close administrative and informational ties. We cannot predict the funding support such new facilities might receive but we will cooperate fully in getting them started and in assuring effective management for the benefit of the overall AIM community.

Our own facility has operated at capacity since early in our present grant term, despite periodic augmentation, owing to the continuing maturation of on-going projects and the recruitment of new users. As indicated earlier, our present hardware cannot be augmented further without upgrades to major mainframe and memory components. This should be done only after optimizing with respect to available new systems which are scheduled for announcement in the next year or so. There have been a number of recent relevant announcements but these machines have not yet been of a capacity or economic advantage to warrant immediate upgrade (indeed our decision to develop the dual KI-10 processor system was made on the basis of optimum cost-effectiveness within current technology and budgets). Furthermore, these systems are being sold packaged with relatively expensive memory and file storage, and future releases may allow a more cost-effective mix of components from multiple vendors. Our hardware design is now approximately five to six years old and will be twelve years old by the end of the follow-on 5 year grant term.
The economics and technical performance of the newer systems, the evolving software gaps from inherent backward incompatibilities, and the reliability and maintainability of our existing equipment will pose new opportunities and problems. They may point to a strong rationale for an upgrade of the SUMEX-AIM system to meet the needs of the AI community we are supporting. The costs of this new generation of hardware will represent a progressively smaller part of the overall effort, compared to human resource inputs, especially if user participation is fairly weighted.

The TOPS-20 system DEC is currently marketing is derived from TENEX, but DEC has already made changes which cause incompatibilities with earlier systems. Many of these are in the direction of improved system performance (file system redundancy, system call enhancements, etc.) while others are of less obvious value (file naming conventions, message file formats, etc.). Whatever the reason, DEC’s TOPS-20 system will likely dominate future system purchases and will increasingly diverge from ours. This causes a larger burden in our pursuit of software sharing and will affect the ease with which we can cooperate with other potential AIM network nodes. To avoid effective isolation, we will have to maintain effective compatibility. DEC has no plans for making TOPS-20 run on KI-10’s and it is not likely others will undertake this within the currently strict licensing restrictions and DEC’s motivations to sell KL-10’s. Our apparent alternatives are to upgrade to some KL-"n" system when this product line matures and fills out so a proper choice can be made, or to progressively modify our current system to remain as compatible as possible.
A hardware conversion would likely cost at least $500,000 (based on current prices, but presumably much less as time passes) while system modifications for compatibility will entail 1-2 additional people per year in software effort. The cost of the latter approach must also include a measure of user community investment to circumvent unavoidable residual incompatibilities. The choice for optimum return will depend on the timing of major price declines for a given hardware capability, and on the way that cognate facilities evolve and participate in sharing software burdens. We do not expect these trade-offs to be clear before 1979.

We tentatively propose to expend the man-effort required to maintain compatibility between our existing system and TOPS-20 so long as this remains tenable. We budget initially one person for this purpose and add an additional programmer at the middle of the grant term. If this approach proves too costly and ineffective, we may propose reallocating these funds for a hardware conversion. Such a contingency would be thoroughly reviewed with AIM management committees and the NIH-BRP before finalizing a plan or requesting additional funding.

In the meantime we plan to reevaluate the performance of our existing system to wring out any remaining inefficiencies for more effective community support. The dual processor system has stabilized nicely and, with the memory augmentation we are implementing, we will have addressed all of the obvious sources of inefficiency. We will re-review the detailed operation of the facility to try to uncover remaining areas of cleanup. Recent measurements show that a high percentage of available time (80-90% in one recent test) is spent in various system routines which provide the rich set of monitor calls available through the TENEX system. It is therefore important to optimize the efficiency of the most widely used calls.
We also plan as part of this investigation to examine alternative strategies for managing memory allocations to running jobs. This will include attempting to minimize paging overhead by preloading job working sets, so that swapping I/O is better utilized and overlapped with other activities rather than waiting for page faults to read in pages on demand. We will also consider giving programs some control over working set definition.

3.1.2 COMMUNICATION NETWORK PLANS

Networks remain centrally important to the research goals of SUMEX-AIM. We have had good success at meeting the geographical needs of the community during the early phases through our ARPANET and TYMNET connections. The major problems focus on terminal interaction delays through relatively slow or congested network facilities. In the next year or so TYMNET will be announcing their upgraded network (TYMNET II) which may offer additional advantages for our community such as higher terminal speeds, more dynamic terminal routing, and inter-host communications. If additional AIM servers are implemented, it will be important to coordinate their network access with that of SUMEX for effective user interactions and system collaborations. During this same period ARPANET may be undergoing similar redesigns and possible further specialization to defense needs. In parallel, the TELENET facilities are evolving rapidly, and whereas they offer a symmetric service for file transfer and terminal traffic, character delays are currently too high to warrant connecting immediately. We expect to retain our present connections over the early phases of the follow-on grant and to evaluate new upgrades as they become available. The specific goals for this upgrade will be improved terminal support and effective file transfer mechanisms available community-wide, particularly to interact with other AIM nodes.
3.1.3 SOFTWARE SUPPORT PLANS

We will continue to maintain the system, language, and utility support software on our system at the most current release levels, including up-to-date documentation. We will also be extending the facilities available to users where appropriate, drawing upon other community developments where possible. We rely heavily on the needs of the user community to direct system software development efforts. Two specific areas we plan to pursue are extensions to the bulletin board system and improved facilities for managing and organizing collections of related information such as program libraries and documentation, bulletin board or message files, collections of user profile information, etc.

Bulletin board extensions will include improved facilities for searching for relevant information, associating a given bulletin with multiple topic labels, and more effectively apprising users of new information of interest.

We are also examining extensions of the TENEX file system syntax and design to allow better logical organization and access to groups of file information. This may include facilities to define a hierarchical data structure, a "file system within a file", to name and manipulate logically related but independent pieces of information. A number of programs use ad hoc directories to access segments of information. We would hope to better standardize and improve such tools.

3.1.4 COMMUNITY MANAGEMENT PLANS

We plan to retain the current management structure that has worked out well for the recruitment and review of new projects and the guiding of resource policy formation. We expect the Executive and Advisory Committees to play a continuing important role in advising on priorities for facility evolution and on-going community development efforts such as MAINSAIL in addition to their recruitment efforts.
The composition of the Executive Committee will grow as needed to assure representation of major user groups and medical and computer science applications areas. The Advisory Group membership rotates, with each member serving one to two years, and spans both medical and computer science research expertise. We expect to maintain this policy.

The AIM workshops under the Rutgers resource have served a valuable function in bringing community members and prospective users together. We will continue to support this effort in terms of the Stanford community participation and providing a computing base for workshop demonstrations and communications.

3.2 TRAINING AND EDUCATION PLANS

We have an on-going commitment, within the constraints of our staff size, to maintain a high level of documentation of the evolving software support on the SUMEX-AIM system and to provide user help facilities such as the HELP and Bulletin Board systems. These latter aids are the best way we can assist resource users to find the information they need when they need it to solve access problems. Since much of our community is geographically remote from our machine, these on-line aids are indispensable for self help. We will also provide on-line personal assistance to users within the capacity of available staff through the SNDMSG and LINK facilities.

We allocate funds in our budget to continue the "collaborative linkage" support initiated during the first term of the SUMEX-AIM grant. These funds are allocated under Executive Committee authorization for terminal and communications support to help get new users and pilot projects started. We also have requested support for a "visiting scientist" position which will allow selected prospective investigators to gain first hand experience by visiting on-going projects such as those at Stanford.
We feel this can serve an important role in catalyzing the development of new application areas and in disseminating the AI programs and techniques developed within the SUMEX-AIM community. The selection of appropriate individuals will be coordinated with the AIM committees as well.

Finally, we will continue to actively support the AIM workshop series in terms of planning assistance, participation in program presentations and discussions, and providing a computing base for AI program demonstrations and experimentation.

3.3 CORE RESEARCH PLANS

3.3.1 GENERALIZATION OF AI TECHNIQUES

The SUMEX-AIM facilities have made it possible to explore many of the frontiers of Artificial Intelligence research within the context of specific systems of medical relevance. Among those issues are the acquisition, representation and utilization of knowledge (both formal and judgmental), reasoning under uncertainty, explanation of a program’s reasoning steps, and strategy planning. During the next period we wish to extract some of the best concepts and programming techniques from the specific programming systems, demonstrate their generality by incorporating them into other working programs, and design and implement packages which can be used to construct other high performance, knowledge-based systems.

The five projects described below are proposed as basic core research in support of the various AIM community projects applying the techniques of AI research to biomedical problems. References for this material can be found on page 76. Because these projects are extensions of on-going work, we are able to generalize from existing programs without requesting support for maintenance or development of the programs themselves. This is another example of the synergistic community interactions of the SUMEX-AIM resource.
3.3.1.1 DESIGN OF KNOWLEDGE-BASED CONSULTATION SYSTEMS

Objective

Recent work has suggested that one key to the creation of intelligent systems is the incorporation in programs of large amounts of task-specific knowledge. We intend to develop (i) methods of using large stores of expert knowledge as a foundation for computer-based reasoning, and (ii) methods of facilitating the knowledge transfer from human experts to computer programs. We believe that this will lead to principles that may help turn the art of building large systems into more of a science, and thus aid other investigators who are building large knowledge-based systems. To do this, we will work on a number of problems involving knowledge representation, accumulation, management, and use, in the context of a software "laboratory" designed to facilitate the construction and use of large knowledge bases.

Motivation

Some of the earliest work in artificial intelligence centered around attempts to create generalized problem solvers. Work on programs like GPS [Newell72] and theorem proving [Nilsson71], for instance, was inspired by the apparent generality of human intelligence and motivated by the belief that it might prove possible to develop a single program applicable to all (or most) problems. While this early work demonstrated that there was a large body of useful general purpose techniques (such as problem decomposition into subgoals, and heuristic search in its many forms), these techniques did not by themselves offer sufficient power for high performance. Recent work has instead focussed on the incorporation of large amounts of task-specific knowledge in what have been called "knowledge-based" systems. Rather than non-specific problem solving power, knowledge-based systems have emphasized high performance based on the accumulation of large amounts of knowledge about a single domain.
A second successful focus in work on intelligent systems has been the emphasis on the utility of solving "real world" problems, rather than artificial problems fabricated in simplified domains. This is motivated by the belief that artificial problems may prove in the long run to be more a diversion than a foundation for further work, and by the belief that the field has developed sufficiently to provide techniques that can aid working scientists. While artificial problems may serve to isolate and illustrate selected aspects of a task, solutions developed for those selected aspects often do not generalize well to the complete problem.

There are numerous current examples of successful systems embodying both of these trends, systems which apply task-specific knowledge to real world problems. They include efforts at symbolic manipulation of algebraic expressions [Macsyma74], speech understanding [Lesser74], chemical inference [Buchanan71], and interactive consultants in a few specific areas [Pople75, Shortliffe75].

While all of these systems display an encouraging level of performance, two fundamental problems remain. First, assembling the knowledge base for each of these is a difficult, continuous task that has in most cases extended over several years. Second, the result of this effort is typically a system with an impressive level of performance, but only within a sharply limited domain of application. High performance has been achieved at the cost of generality and man-years of work in knowledge base construction.

But if programs require large stores of knowledge for high performance, can we take a step back and discover powerful and broadly applicable techniques for accomplishing this transfer of knowledge? That is, can we discover ways of facilitating the communication, management and use of large amounts of task-specific knowledge?
The result would be an intelligent system whose generality arose from access to the appropriate human experts, and whose power was based on the store of knowledge it acquired from them.

Two central themes of the proposed work are facilitating knowledge base construction and improving the generality of the reasoning programs that use the knowledge base. We intend to employ a computer system based on broadly applicable techniques for knowledge encoding and use, and couple it with powerful techniques for accomplishing the transfer of knowledge from human experts to computer programs. The foundation for the computer system will be provided by the domain independent core of the MYCIN system [Shortliffe75, Davis77]. This will be the basis for a software "laboratory" in which we can examine the relevant issues of knowledge representation, accumulation, management, and use.

By setting this work in the context of a specific, existing body of software, a number of very general issues become focussed into specific questions. Since the program that constitutes our "laboratory" has been demonstrated to have a strong degree of domain independence, the results of this work will be widely applicable.

This should produce a new form of generality. Unlike GPS, we do not offer one program which can solve problems in any domain. Rather, we offer the foundation for a system, along with a methodology for instantiating that system in any one specific domain. The foundation and methodology provide a framework for the expression, management, and use of domain specific knowledge, to make this instantiation task a reasonable one. It is there in the foundation and the methodology that our generality lies, not in the final performance program which results.
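The separation described above, a domain independent foundation that is instantiated with the knowledge of one specific domain, can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the MYCIN core: it assumes simple if-then rules over named conditions, uses plain goal-directed (backward) chaining, and omits certainty factors, explanation, and knowledge acquisition. The names Rule, backward_chain, CONSULTATION_RULES, and STRUCTURE_RULES are hypothetical, and the two rule sets are invented toy examples rather than real domain knowledge.

```python
from typing import FrozenSet, List, NamedTuple, Set


class Rule(NamedTuple):
    """A domain rule: if every premise holds, the conclusion holds."""
    premises: FrozenSet[str]
    conclusion: str


def backward_chain(goal: str, rules: List[Rule], facts: Set[str],
                   pending: FrozenSet[str] = frozenset()) -> bool:
    """Domain independent core: try to establish `goal` from facts and rules."""
    if goal in facts:
        return True
    if goal in pending:                 # guard against circular rule chains
        return False
    pending = pending | {goal}
    return any(
        rule.conclusion == goal
        and all(backward_chain(p, rules, facts, pending) for p in rule.premises)
        for rule in rules
    )


# Two invented toy knowledge bases (assumptions, not real expert rules).
# Only the rule sets differ; the reasoning code above is shared.
CONSULTATION_RULES = [
    Rule(frozenset({"fever", "positive_culture"}), "infection_suspected"),
    Rule(frozenset({"infection_suspected", "organism_identified"}),
         "therapy_recommendable"),
]
STRUCTURE_RULES = [
    Rule(frozenset({"loses_water_fragment", "contains_oxygen"}),
         "alcohol_candidate"),
]

consult_facts = {"fever", "positive_culture", "organism_identified"}
print(backward_chain("therapy_recommendable", CONSULTATION_RULES, consult_facts))
```

In this sketch the generality lies in backward_chain and the Rule format, while each performance program is simply the core paired with one rule set.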
3.3.1.2 ATTEMPT TO GENERALIZE (AGE) PACKAGE

The objective of this research is to isolate inference, control and representation techniques from previous knowledge-based programs; reprogram them for domain independence; write a rule-based interface that will help a user understand what the package offers and how to use the modules; and make the package available to SUMEX users, other research groups engaged in knowledge-based systems development, and the general scientific community.

Detailed Discussion:

The goal of this new effort is to construct a computer program to facilitate the building of knowledge-based systems. The design and implementation of the program will be based primarily on the experience gained in building knowledge-based systems at the Heuristic Programming Project in the last decade. The programs that have been built are: DENDRAL [Buchanan71], Meta-DENDRAL [Buchanan72], MYCIN [Shortliffe76], AM [Lenat76], HASP [Nii77], Protein Structure Modeler [Engelmore77], and MOLGEN [Stefik77] (the latter two currently under development). Initially, the AGE program will embody methods used in our programs. However, the long-range objective is to integrate methods and techniques developed at other A.I. laboratories. The final product is to be a collection of useful "building-block" subprograms, combined with a knowledge-based front-end that will assist a user in constructing knowledge-based programs. It is hoped that AGE can speed up this process and facilitate transfer of the technology by: (1) packaging common AI software tools so that they do not need to be reprogrammed for every problem; and (2) helping people who are not knowledge-engineering specialists to write knowledge-based programs.

Two Specific Research Activities of the AGE Effort are:

1. The isolation of techniques used in knowledge-based systems.
It has always been difficult to determine if a particular problem-solving method used in a knowledge-based program is "special" to a particular domain or whether it generalizes easily to other domains. In the currently existing knowledge-based programs the domain-specific knowledge and the manipulation of such knowledge using AI techniques are often so closely coupled that it is difficult to make use of the programs for other domains. We need to isolate the AI techniques that are general and to determine precisely the conditions for their use.

2. Guiding users in the application of these techniques.

Once the various techniques are isolated and programmed for use, an "intelligent front end" is needed to guide users in their application. Initially, we assume that the user understands AI techniques and knows what he wants to do, but that he does not understand how to use the AGE program to accomplish his task. The program at this stage of the development will need to have the basic tools coupled with a package to guide the user in applying these tools.

A longer-range interest involves helping the user determine what techniques are applicable to his task. That is, we assume that the user does not understand the necessary techniques of writing knowledge-based programs. Some questions to be posed are: What are the criteria for determining if a particular application is suited to a particular problem-solving framework? How do you decide the best way to represent knowledge for a given problem?

There are some smaller, but by no means trivial, questions which also need answering. Is there a "best way" to write production rules which would apply to many task domains? Is there a data representation that would cover many tasks? What is the best way to handle differences in the ability of the users of the AGE program?
Research Plan:

The AGE program will be developed along two separate fronts, both of which are divided into incremental development stages. The first of these fronts is the development of the ability to help build many different types of knowledge-based programs (the "generality" front). The second front is the development of "intelligence" in the interaction between the user and the AGE program; i.e. moving from dialogues on "how to use the tools in AGE" to "what tools to use" (the "how-to-what" dialogue front). The proposed development plan contains the following stages:

a. Generality: The development of a program package that will enable the user to build "HASP-like" knowledge-based programs characterized by the integration of multiple sources of knowledge, multi-level representation of solution hypotheses, opportunistic problem-solving methods, and explanation capability of the reasoning steps. The HASP-like paradigm has been used to solve problems of interpreting large amounts of digitized physical signals, but can also be extended to problems of processing large amounts of symbolic data.

Dialogue: The development of dialogue to show the user how to utilize the packaged components in AGE to build HASP-like programs. The interactive capability will be limited to: specifying how to build multi-level hypothesis structure; how to write production rules to represent domain knowledge; and how to use various techniques available for opportunistic hypothesis formation.

b. Generality: Supplement the ability to build HASP-like programs with a capability to build MYCIN-like goal oriented programs.

Dialogue: Same level of dialogue capability with additional ability to discuss how to chain rules and how to specify the necessary parameters for the context tree.

c. Generality: Same level as for b., i.e.
ability to build HASP-like, MYCIN-like, or combined HASP- and MYCIN-like knowledge-based programs.

Dialogue: Begin to extract from the user some key characteristics of the task, and using that information begin to suggest appropriate knowledge representation and problem-solving techniques for the user’s task. This interactive capability will be limited to the generality level at this point in the AGE development.

d. Test phase: Test the usefulness of the AGE system by developing an application program in some task domain.

(a) An application program will be chosen from among on-going program development efforts within our own project or within the SUMEX-AIM community. An application will be chosen whose primary task is that of interpreting large amounts of symbolic data or described signal data.

(b) Collect specific knowledge needed for the application program and begin to develop the program using the AGE system.

3.3.1.3 PLAN PACKAGE

The PLAN package is oriented toward the representation of plans-of-action and toward an expert’s knowledge of the best problem solving strategies to employ in his domain. A feature of the package is its ability to make inferences on components of planning and strategy rules so that new plans and strategies can be constructed readily from previous ones. The representation will allow the manipulation of various "levels of detail" of plans and strategies. The package will be made available as previously mentioned in connection with AGE.

Detailed Discussion:

Before starting a technical presentation of the ideas for the Plan Package, it is worth highlighting some of the issues which motivate its development.

a. How can a variety of types of domain actions be accommodated in a knowledge base?

b. How can a variety of types of strategy and control knowledge be incorporated in a knowledge base?

c. How can a variety of types of problem solving states be expressed and manipulated by the system?

d. How should plans be represented?
e. How can the problem statements for a variety of types of problems be acquired?

f. How does the expression and representation of problem solving states relate to the expression of the domain and strategy knowledge?

The Plan Package consists of two major entities -- the Planning Network and the Strategy Package. The Planning Network is a set of software which manages the representation of the plans created during the problem solving process. When a problem is acquired from a user, it is represented as an initial planning network. Problem solving takes place as the active strategy rules manipulate the planning network to create solutions. The Strategy Package itself is discussed in the next section.

Since the planning state knowledge is important for the expression of strategy in the Plan Package, it is worthwhile exploring briefly the nature of this knowledge. It is useful to consider the planning network as being composed of three parallel planes -- the solution plane, the planning plane, and the focus plane. These planes contain (1) the solution steps (domain rule applications) and world states, (2) the planning and design steps, and (3) the focus of attention knowledge, respectively. All three planes of the network are built dynamically during the problem solving process. Different types of nodes in the network correspond to the different components of the problem solving process.

A number of issues have been raised about the management of strategy knowledge.

a. How should strategies be expressed?

b. How can strategy information be assimilated so that the system will use it appropriately when designing or explaining solutions?

c. How can a knowledge-based system assist a domain expert in structuring and expressing his ideas about strategy?

Means-ends analysis is one of the simplest ideas in the current stock of methods for problem solving.
As such, it should exist as a standard strategy in a strategy package of artificial intelligence techniques to be used as needed. The current state of artificial intelligence, where a researcher must re-code Means-ends analysis any time he wishes to use it, is akin to a carpenter forging a new hammer for each job. One approach for making an instance of Means-ends analysis available as a tool would be to provide a packaged program which accepts arguments for the various components of Means-ends analysis (e.g. a difference table, difference function, etc.). The alternative being proposed here is a system which uses schemata to drive the strategy acquisition process and which can guide a user through the details. The goal is to create a supportive environment for the painless testing of fairly high level strategies. Such a system should be able to draw on its knowledge base to provide assistance in casting a problem into a Means-ends framework.

In summary, other systems have stumbled over the expression of more complex forms of domain and strategy rules and have been limited to solving a single kind of problem. We propose extending this work by developing what we have termed the Plan Package. The Plan Package consists of two major components -- a schema-based representation for the problem-solving states termed the Planning Network and a schema-based representation for domain rules and strategies termed the Strategy Package. The Planning Network will provide a representation for a variety of types of problem solving so that the problem solving system will be able to solve more than one type of problem. The Strategy Package will provide a set of standard artificial intelligence strategies in the form of schemata, which may be instantiated into strategy rules when they are supplied with the particulars of domain knowledge.
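As a concrete picture of a standard strategy being supplied with domain particulars, the "packaged program" form of Means-ends analysis mentioned above (one that accepts a difference function and a difference table as arguments) might be sketched as follows. This is only a minimal illustration written in Python, which is not a language used in this proposal; the operator names and the small errand domain are invented for the example, and the sketch is not the schema-driven Strategy Package itself.

```python
def means_ends(state, goal, table, depth=8):
    """Return (plan, final_state) that achieves goal, or None on failure.

    state, goal: frozensets of facts.
    table: difference table mapping a missing fact to candidate operators,
           each given as (name, preconditions, additions, deletions).
    depth: recursion bound so that the search always terminates.
    """
    if depth == 0:
        return None
    # Difference function: goal facts that do not yet hold in the state.
    diffs = sorted(goal - state)
    if not diffs:
        return [], state
    for name, pre, add, delete in table.get(diffs[0], []):
        # Subgoal: reach a state in which the operator can be applied.
        sub = means_ends(state, pre, table, depth - 1)
        if sub is None:
            continue
        plan, mid = sub
        after = (mid - delete) | add
        # Resume work on the original goal from the new state.
        rest = means_ends(after, goal, table, depth - 1)
        if rest is not None:
            return plan + [name] + rest[0], rest[1]
    return None


# Invented errand domain, for illustration only.
NONE = frozenset()
table = {
    "at_shop": [("go_to_shop", NONE,
                 frozenset({"at_shop"}), frozenset({"at_home"}))],
    "at_home": [("go_home", NONE,
                 frozenset({"at_home"}), frozenset({"at_shop"}))],
    "have_milk": [("buy_milk", frozenset({"at_shop", "have_money"}),
                   frozenset({"have_milk"}), frozenset({"have_money"}))],
}
start = frozenset({"at_home", "have_money"})
goal = frozenset({"at_home", "have_milk"})
plan, final = means_ends(start, goal, table)
```

With this table the call produces the plan go_to_shop, buy_milk, go_home. Every piece of domain knowledge enters through the `table` argument, which is the kind of "particulars" a schema would have to elicit from the user.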
The Strategy Package schemata will facilitate the acquisition of tailored strategies by guiding a user a step at a time through the particulars of the acquisition process. The Plan Package will be developed and tested in the domain of molecular genetics as part of the MOLGEN project. It will be further developed and extended to other domains as a test for generality as part of the AGE project.

3.3.1.4 HEURISTIC KNOWLEDGE ACQUISITION

Automatic Rule Formation Methods

Given a body of data from which rules are to be formed, together with a basic approach to rule induction, there remains a range of ways in which the data may be utilized, which differ in the degree of parallelism involved in the examination of instances. At one extreme are methods in which rules are formed and refined in a sequence of steps, each step involving the examination of one new instance. At the other extreme are methods which involve a single-pass rule formation process, using all available data. There are, of course, many intermediate possibilities. We propose to investigate, within the Meta-DENDRAL framework, whether some of these methods are optimal in the sense of yielding rules of comparatively high quality with the expenditure of comparatively little computing effort. It is hoped that the investigation will lead us to some general insights concerning the optimal utilization of data in automatic rule formation.

Research Plan:

a. Develop and implement one or more procedures for updating an evolving set of rules on the basis of newly examined data. These procedures will make use of existing capabilities of the RULEGEN and RULEMOD programs, and will make possible the implementation of a variety of schemes for data utilization, as described above.

b. Select and implement a representative subset of the class of data utilization schemes indicated above, and test their performance in the application area of mass spectrometry.

c. Describe in a technical report these experiments, their results, and the lessons learned.

Rule Acquisition via Dialogue

Since large stores of knowledge appear to be required for high performance, the process of accumulating that information should be made as easy as possible. The fundamental question here is how we can make it easy for the expert to tell the system what he knows about the domain. Some initial steps in this direction are described in [Davis76], which reports on the use of what has been labelled "meta-level knowledge" as a basis for establishing communication between the system and an expert. In the simplest terms, meta-level knowledge refers to giving the system the ability to "know what it knows", and can support a wide range of useful abilities.

The basic approach developed there relies on the notion of knowledge acquisition in the context of a shortcoming in the knowledge base. That is, rather than simply asking an expert to "explain all he knows about the field", we allow him to challenge the system with difficult problems and observe its behavior. If he indicates at some point that the system has made a mistake, there is available a large amount of contextual information which can aid in the process of knowledge explication and communication. Thus rather than asking "What is there to know about this domain?", we can say "Here is a problem on which you claim the system made a mistake. Here is the knowledge it used to reach its answer. Now WHAT IS IT THAT YOU KNOW AND THE SYSTEM DOESN’T that allows you to avoid making that mistake?" This appears to be an effective approach to the problem, since it creates a well defined context, allowing the expert to focus his attempt to describe his knowledge of the domain, and provides the system with a set of expectations about the content of the new knowledge it is going to receive.
Both of these offer significant advantages in helping to build up the knowledge base.

Working from this foundation, we plan to extend these ideas to provide a powerful system for knowledge acquisition. Currently, for example, the scope of the context is limited to a particular error in the knowledge base during a particular session with the expert. It ought to be extended to provide a wider perspective, so that the system could form more sophisticated expectations about a particular tutor, thereby making communication between them more effective. Thus rather than forming expectations concerning only the shortcoming presently under examination, the system might also be able to consider the past several shortcomings, in an attempt to detect a broader "theme" in the knowledge it was acquiring.

There ought also to be more effective control over the system’s use of context. The system is currently too "single-minded", in that it holds tenaciously to any expectations it may have formed. There should be a way of indicating to the system that it has formed incorrect assumptions, and that it should "sit back and observe" for a while until it can get "reoriented".

Dealing with large knowledge bases also requires a range of auxiliary capabilities that assist the expert in keeping track of and organizing his work. Together these constitute a "scratch pad" of sorts that allows him to annotate his new additions, mark existing rules that may need further work, or perhaps examine selected parts of the knowledge base to find areas that may presently be weak. All of these should be aimed at making it possible for the expert to extend his work over several sessions without loss of continuity, and to keep track of both changes that are required and work that has been done, no matter how large the knowledge base may eventually grow to be.
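To make the range of data utilization schemes discussed under Automatic Rule Formation Methods more concrete, the following sketch shows how a single rule-update procedure can serve both extremes, simply by varying how many instances are examined per step. It is a toy written in Python under stated assumptions: the learner (intersecting attribute-value pairs) is a deliberately simple stand-in and is not RULEGEN or RULEMOD, and the attribute names and instances are invented.

```python
def update(rule, instances):
    """Refine a conjunctive rule so that it still covers each new instance.

    A rule is a frozenset of (attribute, value) pairs; None means that no
    rule has been formed yet. Toy stand-in for a real rule-update procedure.
    """
    for inst in instances:
        features = frozenset(inst.items())
        rule = features if rule is None else rule & features
    return rule


def form_rule(data, chunk_size):
    """Feed the data to update() in chunks of chunk_size instances.

    chunk_size=1 is the one-new-instance-per-step extreme;
    chunk_size=len(data) is the single-pass extreme.
    Returns the final rule and the number of refinement steps taken.
    """
    rule, steps = None, 0
    for first in range(0, len(data), chunk_size):
        rule = update(rule, data[first:first + chunk_size])
        steps += 1
    return rule, steps


# Invented training instances, for illustration only.
data = [
    {"bond": "C-N", "ring": False, "loss": "methyl"},
    {"bond": "C-N", "ring": True, "loss": "methyl"},
    {"bond": "C-N", "ring": False, "loss": "ethyl"},
]

incremental_rule, incremental_steps = form_rule(data, chunk_size=1)
batch_rule, batch_steps = form_rule(data, chunk_size=len(data))
```

For this simple learner both schemes arrive at the same rule and differ only in the number of refinement steps. With a more realistic update procedure the schemes could also differ in rule quality and computing effort, which is the trade-off the research plan above proposes to measure.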
3.3.1.5 GENERAL EXPLANATION SYSTEM

The function of an explanation capability is to permit the user or builder of a knowledge based system to determine:

1. in general, how the system solves problems or uses information;
2. retrospectively, how the system solved a particular problem;
3. interactively, how and why the system came up with its current answers.

The success of the explanation capability for the MYCIN rule based system indicates the usefulness of this capability in debugging the system and in making it easier for a user to learn and believe the system’s operations. To make it easier to build explanation capabilities for future knowledge based systems, including systems whose knowledge is embedded in procedures, we intend to construct a system which will provide explanations for a wide class of problem solvers.

Given the appropriate trace of a program’s decisions and states, and a model of its problem solving process, it should be possible to answer a variety of well constrained but informative questions about program operation, in general or in a specific run. The aim of this research is to determine what sorts of traces and process models are needed to support selected types of explanations in several classes of knowledge based problem solvers. When the requirements for a class are determined, we intend to implement a general explanation facility to provide the selected explanations for programs in that class. Such a facility should be made useful for several classes of problem solver.

The steps of the research will include:

1. Choose the types of problem solvers to which the explanation system will be applied;
2. Select example knowledge based systems of each class (e.g. protein structure modelling as an example of event/model driven hypothesis formation systems);
3. For each system selected, determine the questions to be asked, and what information, such as traces and process descriptions, is needed to answer them;
4. Implement a facility which accepts descriptions of a problem solver class and enables the user to ask the questions for that class about an example system;
5. Investigate new kinds of explanation capabilities -- for example, how a program’s operation might be meaningfully summarized for several kinds of users, such as domain experts and programmer/system designers.

References for this section

[Buchanan71] Buchanan BG, Lederberg J, The heuristic DENDRAL program for explaining empirical data, IFIP, 1971, pp 179-188.

[Buchanan72] Buchanan BG, et al., Heuristic theory formation: data interpretation and rule formation, in Machine Intelligence 7, (Meltzer & Michie, eds), pp 257-292, 1972.

[Davis76] Davis R, Applications of meta level knowledge to the construction, maintenance, and use of large knowledge bases, (thesis), AI Memo 283, Stanford University, July 1976.

[Davis77] Davis R, Buchanan B, Shortliffe E, Production rules as a representation for a knowledge-based consultation program, Artificial Intelligence (to appear, Jan 77).

[Engelmore77] Engelmore R, Nii H Penny, A Knowledge-Based System for the Interpretation of Protein X-Ray Crystallographic Data, 1977.

[Lenat76] Lenat D, AM: An Artificial Intelligence Approach to Discovery in Mathematics as Heuristic Search, Ph.D. Thesis in Computer Science, 1976.

[Lesser74] Lesser V R, Fennell R D, Erman L D, Reddy D R, Organization of the HEARSAY II speech understanding system, IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-23, February 1975, pp 11-23.

[MACSYMA74] The MACSYMA reference manual, September 1974, The MATHLAB Group, MIT.

[Nii77] Nii H Penny, Feigenbaum E, Rule-based Understanding of Signals, presented at Workshop on Pattern-directed Inference Systems, 1977.

[Newell72] Newell A, Simon H, Human Problem Solving, Prentice-Hall, 1972.
[Nilsson71] Nilsson N J, Problem Solving Methods in Artificial Intelligence, McGraw Hill, 1971.

[Pople75] Pople H, Meyers J, Miller R, DIALOG, a model of diagnostic logic for internal medicine, 4IJCAI, pp 848-855. (The system has recently been renamed INTERNIST.)

[Shortliffe76] Shortliffe E H, Computer-based clinical therapeutics: MYCIN, American Elsevier, 1976.

[Stefik77] Stefik M, Martin N, A Review of Knowledge-Based Systems as a Basis for a Genetics Experiment Designing System, 1977.

3.3.2 SOFTWARE EXPORT ALTERNATIVES

Over the past few years, a number of the programs being developed by SUMEX-AIM projects have reached a developmental maturity where we need to consider ways of meeting the demands to make them operationally available to a larger user community and to export them where appropriate to other sites. Current examples of such programs include the CONGEN biochemical structure elucidation program, the SECS chemical synthesis analysis program, and the MYCIN, ONET, and INTERNIST medical diagnosis programs. Our present PDP-10 facilities are quite insufficient for meeting the operational needs of this growing group of users, even if providing this level of service were within the SUMEX-AIM mandate.

These programs have been written in a variety of source languages (principally various dialects of LISP or SAIL) and are characterized by very large address space requirements. The development medium for these programs at Stanford has been the PDP-10 TENEX environment, and the choice of language was made to facilitate development and representation of logical program concepts.

In contemplating the export of such programs, several points seem relevant:

- Development is continuing on the programs to extend their conceptual framework and operational effectiveness.
This implies that there must be a low threshold between developmental versions of the programs and operational ones during this phase and that the implementation environment of the programs must be conducive to both.

- Because of the complexity of the programs, it is likely that their maintenance and upgrade should be centralized. This implies a convenient means of receiving user feedback and of providing program updates.

- Because of the address space requirements for these programs (even after possible rewrite for increased efficiency), it does not appear reasonable to export them via 16-bit minicomputers where unwieldy overlay structures would be required to circumvent the addressing constraints.

- The target community for these types of programs will be fairly heterogeneous. Users may include academic research groups, industrial houses, hospitals, and educational institutions. One can expect the native computing resources in these various user sites to cover a wide range of hardware and operating systems, not all existing PDP-10’s. We cannot expect many users interested in the programs to be able to set up a full-scale PDP-10 site capable of running them.

We have been considering a number of mechanisms for exporting such software. These include a) implementing individual programs on machines which could be accessed by interested users over some (commercial) network, b) implementing (or reimplementing existing) individual programs in an appropriate language which is "machine independent" and thereby could be run on a user’s existing computer given some minimum size, or c) making the programs available on an exportable machine (PDP-10 or its more cost-effective descendants) which is compatible with the existing programs and the centralized PDP-10 facilities used for continuing development.

3.3.2.1 NETWORK ACCESS

There is a growing number of uses of computer networks for program dissemination, ranging from business accounting and modeling packages available from commercial vendors to attempts to consolidate research tools such as a collection of mass spectral library search and analysis programs (see for example S. R. Heller, G. W. A. Milne, and R. J. Feldmann, "A Computer-based Chemical Information System", Science, Vol. 195, Number 4275, page 253, 1/21/77). The existing network connections at SUMEX are well-configured for experiments within our capacity on this means of disseminating software.

For many such programs, this seems to be well-suited for export; and indeed Heller reports 162 current user groups subscribing to his Chemical Information System. However, unless the network machine runs the same operating system and language in which the program was developed, a conversion would be required and perhaps at the same time a barrier would be established between the continued development of a program and its operational use. This appears to be the case for at least one proposal for a network-available version of our CONGEN program. The DENDRAL project has undertaken a very laborious conversion of CONGEN from its native LISP implementation to one in MAINSAIL to achieve a level of exportability for lack of other immediately available mechanisms.

Other aspects of this approach involve security and privacy. Some of the data used with these programs are sensitive (patient records, or private, unpublished information on chemical structures, etc.). Public access of the kind a network provides can create problems in protecting these data, and individual user groups may prefer to run the programs on machines which are under their local control.
Finally, since many of these tools are in the research domain, it is not clear that they would be cost-effective in a commercial environment.

3.3.2.2 MACHINE-INDEPENDENT LANGUAGE IMPLEMENTATION

An ideal which has long been sought for program sharing is to develop languages with "universally" accepted standards and which are implemented in machine independent ways so that programs running in one machine environment will run in another with a minimum of conversion effort. This of course involves both language implementation and application program implementation concessions to achieve effective machine independence. We are now working on a machine independent version of the SAIL language called MAINSAIL to experiment with these sorts of issues. Our detailed plans for MAINSAIL development are given below, including the possibility of special microprogrammed machines which may most economically and efficiently run MAINSAIL.

Practically speaking, the machine-independent language approach is best-suited to the design of new program systems; and in the particular case of MAINSAIL, to those that can be effectively expressed by means of an ALGOL-like language. For existing programs, an extensive conversion would be required. We are still exploring the full range of implications of language choice for AI programs such as are being developed on SUMEX, but it is likely that MAINSAIL cannot be a universal substitute for the full range of languages (including LISP) useful for these programs in both operational use and on-going development. MAINSAIL is nevertheless a definitive step toward understanding the requirements, advantages, and costs of machine independent systems. It may offer a useful base for implementing all or parts of new systems as well as for the ultimate reengineering of existing systems as they become fully operational.

3.3.2.3 EXPORTABLE (PDP-10) SYSTEM

An alternative view is that with the dramatic downward plunge of hardware costs, the costs of software development should play a larger and larger role in determining software/hardware optimizations. An attractive solution involves a PDP-10-like machine which could run the existing software intact and which could be made available for a reasonable cost to interested user (or network) groups. Since the machine could run the native operating system and language in which the program was developed, the initial conversion would be minimized and future developments (either conceptual or for improved efficiency) would be readily incorporated. Furthermore, a given user group could (perhaps with a change of microcode or system) run programs from various PDP-10 environments. By using network communication facilities, such satellite machines could retain contact with central development efforts, share files or data bases where appropriate, and provide a means for cost-effective incremental expansion by adding more such satellite machines or upgrading to a larger PDP-10 configuration when usage justifies. In this sense, this option is really a variant on the first network option, using a more flexible hardware capability which can adapt better to individual program and development group/user community needs.

This approach may be best suited for this intermediate stage in AI program development where continued research and improvement is going on while extensive operational access is demanded.
An economical export by this means defers the need for reprogramming until the design is fully stabilized and ready to be "cast in concrete". Nevertheless, even if the host machine is very inexpensive, in the long term if a factor of 10 improvement in speed or the number of users supported is possible by reprogramming, then a reimplementation will likely be warranted eventually as development tapers off and more and more users demand efficient production runs.

3.3.3 EXPORTABLE MACHINE PLANS

Because of the already large effort that has gone into other existing software systems we are attempting to export, the "exportable machine" option may offer a substantial advantage in minimizing conversion efforts, maintaining contact with program development groups, and offering a cost-effective way for even relatively small groups to use these programs. This is particularly important in just moving from the strictly developmental phase into a combined development/refinement/operational stage. For our purposes, such a machine could be either a hardware-designed PDP-10 or a microprogrammed emulation of this machine. As a tentative functional configuration we would like the machine to perform at about the speed of a KI-10 with several users including:

- PDP-10 instruction set and "BB&N" paging facilities
- at least 256K logical address space
- 256K physical memory size (36 bit words, < 1 microsecond cycle)
- memory interface for swapping device and small file system including at least 200M bytes of disk storage
- facilities for about 16 terminals
- 200-300 lpm printer
- slow tapes
- some kind of external bus interface (I/O bus, UNIBUS, etc.)
- facilities for network communication connections

The cost for such a system (CPU, memory, and minimal peripherals) should ideally be in the range of $50,000 - $100,000.
This may be below the initial announcement price for such machines but should represent realistic longer term pricing possibilities. A number of vendors may be in the planning stages of such machines, which could be announced within the next 18 months. We budget for an initial version of such a machine at $200,000 based on very general pricing estimates (noting also that no vendor announcement has been made). The detailed alternatives and plans for this acquisition will be reviewed with the AIM management committees before implementation.

The detailed requirements for integrating such a machine into the SUMEX-AIM resource are also necessarily vague, since this will depend on needed operating system and user support changes to accommodate the reduced size and perhaps different memory management system (paging). These changes may also reflect themselves in modifications for the language support underlying the programs we want to export. We expect to track these developments closely during the first year of the follow-on grant and to formulate a plan for acquiring such a machine for experiments in packaging our AI programs for export. We will only be able to assess the required level of system software work when the details of the vendor systems become known. The budgetary details are discussed in the "justification" section of the five year budget plan.

These kinds of machines may also offer an effective way to incrementally expand the capacity of facilities like SUMEX and we will review them in this context as well (see the discussion of facility hardware upgrade plans on page 62). The main issues arising in coupling such satellite systems to the central facility as independent machines involve managing a distributed file system, convenient terminal routing, and allocating users between machines.
These are all manageable problems within existing technology such as we employed in developing the initial dual processor implementation.

Since we are operating on fully amortized hardware, the indicated time table is driven by the real costs of system software modernization and compatibility of maintenance. Local users will be less injured by persevering with dated systems than a wider community to which software must be efficaciously exported in a contemporaneous idiom.

3.3.4 MAINSAIL DEVELOPMENT PLANS

The on-going MAINSAIL development effort was described earlier as part of our detailed progress report. A summary of language features can be found in Appendix III on page 231 (see Book II). This section summarizes the planned directions for future MAINSAIL developments. These efforts have two complementary thrusts: 1) development as a programming system and research tool and 2) demonstration of implementations for additional target systems. The first area is independent of what machines are used as hosts and seeks to explore the design ramifications, programming techniques, and advantages and costs of machine independence. The second area addresses the acquisition of practical experience in the export and use of MAINSAIL on real systems and the issues involved in gaining user acceptance of MAINSAIL as a programming tool.

3.3.4.1 DEVELOPMENT MANAGEMENT

In the early phases, the design for MAINSAIL was developed by Mr. Wilcox with a range of community inputs collected in relatively informal exchanges. These have included discussions with the designers of the SAIL language, studies of other languages (PASCAL, ALGOL-60/68, and SIMULA in particular), comments on our preliminary design documents from interested groups, presentations and discussions at several DECUS symposia, and community experimentation and critique of evolving MAINSAIL implementations.
Our network connections have been invaluable in this regard, providing access to our documents, allowing rapid responses to suggestions, and providing a means for network collaborators to experiment with MAINSAIL on their own machines as implementations have become available.

As MAINSAIL achieves a more operational status and we receive feedback from a larger community, we will reexamine many of these initial design decisions based on criteria of generality and effective portability as well as community acceptability. In this process we will formalize our user community contacts to take better advantage of their suggestions for system evolution and for effective system maintenance. We will, of course, provide a mechanism for reporting community comments (most easily done via networks) and may organize workshops or participate in other meetings to disseminate and discuss MAINSAIL. The AIM Executive Committee will play a key role in advising about development plans and making priority trade-offs within our limited available resources.

3.3.4.2 LANGUAGE DEVELOPMENT

Interrupts: We are currently investigating the implementation of both deferred and immediate interrupt facilities for MAINSAIL to give the ability to stop a program in the midst of execution, communicate with an interrupt-driven I/O device, or synchronize cooperating processes. A key issue is how to coordinate interrupt control transfers with on-going dynamic memory and storage management. This is particularly critical for immediate interrupts as may be needed for real time applications. It may be necessary to restrict the range of language facilities available during such interrupts. We will continue these studies and implement appropriate interrupt handling support.

Concurrency: The current implementation of MAINSAIL has been designed with concurrency in mind, and appears to provide a solid base.
We must complete the definition of the role of concurrency in MAINSAIL, then specify a set of primitives needed to support concurrency. There will then be an efficient implementation of these primitives including a convenient and flexible user interface.

Minimize runtime checking: Much of the code produced for runtime checking could be eliminated if the compiler "understood" more about the program. We propose to give MAINSAIL the ability to verify that certain conditions are met within the program so that more checking can be done at compile time, and less at runtime. This involves exploration of what features MAINSAIL should include to allow the programmer to help in this process.

LEAP: LEAP is a facility in SAIL which provides an associative data store to allow the retrieval of data based on partial specifications. We have encountered a number of prospective MAINSAIL users who have used and feel a need for LEAP. We plan to investigate the most useful features in LEAP which should be incorporated into MAINSAIL. It should be pointed out that many of the facilities of LEAP can easily and efficiently be coded in MAINSAIL using RECORD’s.

3.3.4.3 COMPILER DEVELOPMENT

Increase speed of compilation: There is much room for improvement in the speed of compilation. The current version was designed for flexibility rather than efficiency. Most important is a close look at the symbol-table lookup, for that is where (the first pass of) the compiler spends most of its time.

Improve error detection and recovery: The compiler’s error detection and recovery is now rather primitive. In general the entire edit-compile-debug loop should be streamlined for user convenience. We propose the utilization of a text editor as an integral part of compilation, so that MAINSAIL can automatically switch between compiling and user editing.

Machine-independent code optimization: The first pass of the compiler produces an intermediate language which is the same for all target machines. This intermediate language is simply a recoding of the source file into an assembly-like language which reflects the properties of MAINSAIL. Various machine-independent transformations could be carried out on this intermediate text to translate it into an equivalent but more efficient representation of the source program.

Machine-dependent code optimization: The MAINSAIL code generators, themselves being MAINSAIL procedures, can be more readily written to utilize complicated algorithms and data structures if necessary to generate efficient code. At present, the primary hurdle to a thorough analysis of the intermediate code by the code generators is the lack of a "look ahead" facility. We propose adding to the second pass the ability to build a machine-independent structure, on the procedure level, which can be interrogated by the code generators prior to generating code for a procedure. This would allow the code generators to make decisions based on a global knowledge of a procedure.

3.3.4.4 RUNTIME DEVELOPMENT

The runtime system is composed of modules which support the code generated for a user module. A single small module, called the kernel, is permanently resident, while all other modules are swapped as necessary. The modularity of the runtime system is what allows MAINSAIL to run in a small address space.

Optimize system modules: To a large extent, the efficiency of the system modules determines the efficiency of user programs. Thus it is well worth our time to optimize these modules. We propose to develop some modules which measure system performance. These would also be made available to users to help them evaluate their programs. A profile of a program, reporting how many times each statement is executed, is also proposed. The primary use of these performance measurements will be for the tuning of memory allocation, swapping and garbage collection.
MAINSAIL is largely independent of the exact strategies utilized, thus providing much leeway in working with alternate approaches. These algorithms need to be separately tuned for each implementation.

Virtual data space: MAINSAIL now supports the swapping of control sections, which could be considered a form of virtual control space. We are interested in studying whether this same form of support can be extended to data. Now that MAINSAIL can support a virtually unlimited control space (by breaking the program into modules), an implementation will be limited primarily by the amount of data which must be resident. We propose to add facilities to the language which allow the user to help structure the data so that it can be efficiently moved between memory hierarchies.

Support data operations: Machines which do not directly support the data types which MAINSAIL offers will need additional support modules. In particular, we need to write machine-independent modules to perform arithmetic on long integers, reals and long reals.

Runtime certifier: We will need a runtime certifier, i.e., a set of modules which give new MAINSAIL implementations a thorough workout, comparing the results with those obtained from running MAINSAIL on other machines. We have been using the compiler for this purpose, but it does not exercise all facilities of MAINSAIL, e.g., real and long real.

3.3.4.5 DEBUGGING SYSTEM DEVELOPMENT

We feel an effective and integrated debugging system will play a key role in the utility of MAINSAIL. Our goal is to provide interactive debugging capabilities comparable to those of INTERLISP, which can significantly increase programming productivity. The combination of comprehensive debugging facilities with efficient production execution will help bridge the gap between program development and operational use.
The basic approach involves the integration of the now distinct phases of source text editing, compilation and execution. An internal representation of the program will be maintained which can serve a variety of purposes. This representation will be interpreted during debugging so that MAINSAIL can monitor execution and interact with the user in a manner which reflects the program structure. Errors can be corrected by editing this structure, and execution continued with no need for recompilation. Program text can be generated from the structure in a standard format, including the original variable names. Machine code can be generated from this same structure, and compiled and interpreted code intermixed during execution. This provides fast execution of debugged modules along with interpreted execution of modules under scrutiny.

Interpreted execution will allow for the interrogation of variables, setting and removal of break points, procedure trace, and single stepping. We plan to integrate these capabilities with a display terminal under the control of an editor, though the debugger will also operate from a hard-copy terminal. A split-screen facility will allow the program text to be viewed during execution along with any output from the program.

There are a number of difficult problems to be resolved concerning the relationship between the original source text (if any) and its internal representation, which may be edited during debugging. Unlike LISP, the MAINSAIL syntax requires a significant amount of compilation before it can be put into a form which can be interpreted with reasonable efficiency.

3.3.4.6 DOCUMENTATION PLANS

Language manual: The currently available documentation for MAINSAIL consists of a preliminary language reference manual. It will be rewritten and expanded to be useful to users unfamiliar with SAIL.

Runtime manual: We will also provide a runtime manual which explains what happens during program execution.
This information can be enlightening when designing a program, though its primary purpose is to document the machine-independent runtime system. This manual will also be necessary for the implementation of MAINSAIL on a new machine.

Code generation manual: A third manual, the code generation manual, will describe how to write code generators. This involves a description of the intermediate code, and how it is presented to the code generators. The goal is to describe the code generation process in sufficient detail to allow any user to write a complete set of code generators. In this way the burden of implementing MAINSAIL on new machines can be dispersed.

System implementation manual: The system implementation manual will describe how to write the machine-dependent parts of the runtime system. This manual will describe which procedures need to be written, and the data structures and other procedures with which they interact. It will also describe all the parts of MAINSAIL, how they fit together, and how to build a new system.

3.3.4.7 MAINTENANCE AND DISTRIBUTION PLANS

The maintenance and distribution of MAINSAIL could easily overwhelm us if we do not carefully plan for it. This is a good opportunity to bring someone else into the project, since it presents the chance to become familiar with the inner workings of the system.

Local experts: Each site must have a local expert who can repair errors in the machine-dependent portions and make patches to the machine-independent parts prior to receiving a new version which incorporates the changes. Another role for the expert would be that of liaison between the local user community and SUMEX. Questions and bug reports should first be directed to the local contact, and then directed to SUMEX in a form standardized across all sites.
SUMEX liaison: As MAINSAIL begins to be used at a number of sites, we would expect the number of inquiries from potential users to rise to the point where it could require an inordinate amount of time from the developers. We propose that an additional person be hired at SUMEX as a liaison for MAINSAIL. This individual must be capable of fixing bugs and generally keeping current versions of the system healthy. The liaison will keep in touch with the local experts, and pass to them any necessary updates. This involves making tapes and sending them through the mail; editing the documentation, overseeing its printing and distribution; responding to inquiries from potential users; consulting with new users concerning program design (but not actually writing users' programs); and new user orientation.

3.3.4.8 PLANS FOR ADDITIONAL IMPLEMENTATIONS

The current implementations are for the PDP-10 and PDP-11. These give us experience on medium and small scale machines. We plan to hold off on introducing additional implementations until we have received sufficient feedback from these. It appears that the orchestration of parallel implementations on a wide variety of machines will rival the technical problems in difficulty.

We have surveyed a large number of computer systems while designing MAINSAIL. Most of these are known to us only through manuals, so that further study will be necessary to determine how well a particular system could support MAINSAIL. Among the machines surveyed are: IBM (360/370, Series/1), CDC (6000 Series, 7600), UNIVAC (1100 Series), Texas Instruments (990), Honeywell (Level 6), Varian (V70), Hewlett-Packard (3000 and 2100), Data General (NOVA, ECLIPSE), Interdata (16 and 32 bit series), SEL (32), Harris (Slash series), Burroughs (B1700) and MODCOMP.
We plan to keep abreast of new computer announcements, since we are in the position of relatively easily providing software for emerging hardware. Choices for target systems will be based on user demand and priorities established in consultation with the AIM management committees.

We are projecting approximately two man-months to create a new implementation, though this will vary according to how well the target machine and operating system fit MAINSAIL, and the availability of a target system during the early design iterations. Additional time will be required to actually install the implementation at the target site, have it thoroughly tested, distribute documentation and make it generally available.

There are, of course, problems in developing MAINSAIL for a machine to which we have no access. The code generators and operating-system interface can be written independently of the target machine, but the debugging of these will require access for a period of at least a few weeks. It would not be acceptable to implement MAINSAIL on a machine by sending tapes through the mail. There appear to be four possibilities: access over a network; access to a nearby machine for which MAINSAIL has been implemented; rental or loan of a machine for the duration of the development; emulation of the target machine.

3.3.4.9 MAINSAIL OPERATING SYSTEM PLANS

In the course of designing the operating system interfaces it has become apparent that MAINSAIL needs very little support from any machine-dependent operating system, at least with regard to the execution of a single program. We feel that in many cases we could provide our own stand-alone version of MAINSAIL for single-job environments. Technology seems to be pointing in the direction of less expensive computers which can be dedicated to a single user at a time, and these would be the initial target of our operating system. In the context of a single-job system, MAINSAIL's primary need is a file system and device drivers.
Once our primitive operating system is written in MAINSAIL, it should not be difficult to add monitor commands and utilities such as file manipulation. Of course the MAINSAIL operating system would be special-purpose in that it would support a single language, with everything designed around that language.

The main elements of our operating system would be the compiler, a text editor, the MAINSAIL runtime system, and the additional modules to support the file system and I/O. MAINSAIL does not need a linker, overlay system, or loader (the swapping of modules takes care of those). Additional components of the system could simply be added as new modules. A goal would be to design an open-ended operating system kernel which could be extended by the user as desired.

3.3.4.10 MICROCODED MAINSAIL MACHINE PLANS

We have thus far been discussing the achievement of portability by making MAINSAIL fit existing machines. If the reason for portability is understood as the desire to provide an economically viable way of distributing software, then another approach is to make the hardware fit MAINSAIL, and distribute the hardware along with the software.

We propose to design an "optimal" representation of MAINSAIL code for emulation by a microprogrammable computer; to purchase a suitable computer for MAINSAIL emulation; to implement MAINSAIL and the supporting microcode on this computer; and to evaluate the resulting system to determine the economic and technical feasibility of distributing such an integrated hardware-software programming environment. Details of our plans are given in Appendix IV on page 235 (see Book II).

We expect considerable improvement over implementations for existing machines which have been accommodated to less than optimal, and in some cases quite poor, instruction sets.
Many benefits accrue from such an approach, and it is likely that microcoded hardware, specialized to a particular language or application, will play an increasingly important role in the development and operational use of future software systems.

We expect a microcoded MAINSAIL to outperform other MAINSAIL implementations in much the same way that DELtran (a "directly executable language" (DEL) implementation for FORTRAN II) outperforms FORTRAN II (4). Initial measurements show that the DELtran representation is less than one fifth the size of the code generated by the FORTRAN-H optimizing compiler, and executes about five times faster. MAINSAIL is perhaps better suited to the emulation approach than FORTRAN because of the locality of reference provided by procedures, records and modules. A preliminary DEL has already been designed for MAINSAIL, but further work is necessary before we can predict (or demonstrate) size and execution comparisons with standard implementations.

This work will complement the on-going implementations of MAINSAIL on conventional hardware. Thus we will be in a unique position to compare the two approaches.

The combination of a microprogrammed machine with the MAINSAIL operating system could result in a system optimized for the execution of MAINSAIL programs. As hardware costs continue to fall we see this approach as a realistic way of providing a powerful system at a low price. We are interested in determining whether a "soft" machine of this sort can be provided cheaply enough to serve as a basis for the export of software which presently requires extensive hardware facilities.

(4) See Hoevel, L. W. and Flynn, M. J., "The Structure of Directly Executed Languages: A New Theory of Interpretive System Support," Stanford Digital Systems Laboratory, Technical Note No. 108, Stanford University, March 1977.

3.3.4.11 DEVELOPMENT OF PORTABLE SOFTWARE

We would like to see a collection of portable programs developed in MAINSAIL both to serve as examples of portable software, and to provide support to those sites which begin to rely on MAINSAIL as the primary programming resource. Such software development will also help us debug MAINSAIL, familiarize the programmers with it, and spread its use. We are aiming for the complete support of a stand-alone MAINSAIL implementation which is aligned with developing hardware trends, i.e., video displays and compact, relatively inexpensive computers and peripherals.

We do not now have the facilities to implement all of this software at SUMEX, and thus expect to collaborate with others in its design and implementation. It is imperative that the software be portable except possibly for certain well-defined modules which need support outside MAINSAIL (e.g., special device support).

Display editor: A MAINSAIL text editor is at the core of a number of planned developments. Our interest is centered around a display-oriented editor because of its clear superiority over hard-copy editors. The TV-EDIT program now in use at Stanford and a few other sites is an excellent base of development, especially since it is written in SAIL. We would like to see additional features added to what TV-EDIT now possesses. Our intended applications for compilation and debugging require a split-screen facility, and a multi-file capability. It must direct all communication with the display through a display package, as described below. This separates the editing functions from the display functions, so that the editor is independent of the display and hence can be used with a variety of displays.
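The separation of editing functions from display functions can be illustrated with a minimal sketch. It is written in Python purely for illustration and does not represent MAINSAIL or TV-EDIT code. Every name in it (Display, ScreenDisplay, HardCopyDisplay, Editor, show_line) is hypothetical; the only point is that the editor calls a single abstract interface and never touches a particular terminal.

```python
from abc import ABC, abstractmethod


class Display(ABC):
    """Hypothetical display-package interface; the only thing the editor sees."""

    @abstractmethod
    def show_line(self, row: int, text: str) -> None:
        """Present text as the content of the given row."""


class ScreenDisplay(Display):
    """Stand-in for a video terminal: keeps a fixed grid of rows."""

    def __init__(self, rows: int, cols: int) -> None:
        self.cols = cols
        self.rows = [""] * rows

    def show_line(self, row: int, text: str) -> None:
        # Rows beyond the bottom of the screen are simply not shown.
        if 0 <= row < len(self.rows):
            self.rows[row] = text[: self.cols]


class HardCopyDisplay(Display):
    """Stand-in for a printing terminal: output can only be appended."""

    def __init__(self) -> None:
        self.printed: list[str] = []

    def show_line(self, row: int, text: str) -> None:
        self.printed.append(f"{row:3d}: {text}")


class Editor:
    """Editing functions only; all output goes through the Display interface."""

    def __init__(self, display: Display) -> None:
        self.display = display
        self.lines: list[str] = []

    def insert_line(self, index: int, text: str) -> None:
        self.lines.insert(index, text)
        self.refresh(index)

    def refresh(self, start: int = 0) -> None:
        # Redraw every line from the first one that changed.
        for row in range(start, len(self.lines)):
            self.display.show_line(row, self.lines[row])
```

Under these assumptions the same Editor object works unchanged whether it is constructed with a ScreenDisplay or a HardCopyDisplay, which is the property the paragraph above asks for.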
Display package: A display package is necessary as part of the editor, and is also important as a package for use by other programs. The display package will accept standard commands to control a display terminal. It must be smart enough to simultaneously maintain several areas on the screen. Such a package will be machine-independent (as much as possible), but have terminal-dependent modules which feed the terminal hardware commands to effect the machine-independent commands. It should be able to drive a hard-copy terminal as if it were a limited display terminal.

Graphics package: Similar to the display package is a graphics package for drawing pictures on a graphics display device. This package would allow for the description of pictures, the choice of display device, and the display of the pictures. This package would be machine-independent and display-independent. The OMNIGRAPH system developed by Sproull at NIH may form the basis for this package.

Document preparation: A simple document preparation program would serve as the "back end" to the display editor. We feel that much of the work of current document programs could be provided by the editor in a form providing instant feedback. Thus the primary purpose of the document program would be to provide global processing, e.g., to generate a table of contents or index, and fill in symbolic references with appropriate chapter or section numbers.

Math and statistics packages: MAINSAIL currently has a mathematics package with trigonometric and logarithmic functions. These functions need additional testing for accuracy, and should be augmented with other functions, e.g., a random-number generator. There is also a need for a statistics package.
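As an illustration of what accuracy testing of such functions might involve, the following Python sketch compares a function under test against a trusted reference over a range of arguments and reports the worst absolute error. It is not MAINSAIL code. The function taylor_sin is a hypothetical stand-in for a portable math-package routine, and Python's math.sin is assumed to be an adequate reference.

```python
import math


def taylor_sin(x: float, terms: int = 10) -> float:
    """Hypothetical function under test: sine by a truncated Taylor series."""
    # Reduce the argument to [-pi, pi] so the series converges quickly.
    x = math.remainder(x, 2.0 * math.pi)
    term = x
    total = x
    for n in range(1, terms):
        # Each term follows from the previous: multiply by -x^2 / ((2n)(2n+1)).
        term *= -x * x / ((2 * n) * (2 * n + 1))
        total += term
    return total


def max_abs_error(func, reference, lo: float, hi: float, samples: int = 1000) -> float:
    """Largest |func(x) - reference(x)| over evenly spaced points in [lo, hi]."""
    worst = 0.0
    for i in range(samples + 1):
        x = lo + (hi - lo) * i / samples
        worst = max(worst, abs(func(x) - reference(x)))
    return worst
```

A call such as max_abs_error(taylor_sin, math.sin, -10.0, 10.0) gives one number per function and range, which could be compared across implementations. Evenly spaced sampling is only one possible choice; a real test would probably also probe arguments near the ends of the reduced range and very large arguments.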
4 AVAILABLE FACILITIES

The existing SUMEX-AIM computer and communications configurations have been described in earlier sections. The number of personnel to support this follow-on work will remain at approximately the same level as before, so no additional office space will be required. We anticipate no changes will be needed for the machine-room facilities.