3.2 An Experimental Planning Program

The central, and most interesting, part of the Molecular Genetics project will be the experiment planning program (PLANEX). PLANEX is meant to be an interactive program which combines the intuition and expert knowledge of a molecular genetics investigator with the thoroughness of a computer having a detailed knowledge base. The investigator will sketch the initial conditions for an experiment and the desired final condition. PLANEX will be developed to allow the user to specify required or suggested intermediate steps. PLANEX will suggest intermediate steps and additional options, and will verify the expected results within the limits of the knowledge base.

The program will initially be designed to check the steps of an experiment, and possibly fill in the details between small steps. The direction of development of PLANEX will be toward a program which can eventually take bigger steps, interpret less precise requirements by the experimenter, and offer more useful alternatives based on the knowledge base. The investigator could request varying degrees of detail and, at all times, the heuristics and reasoning tools used by PLANEX as it evaluates alternatives would be accessible to a user via an explanation system. Freed from the constraints of checking all details, the experimenter could explore the possibilities of many experiments before choosing one and also have novel experiments presented for his consideration.

A scenario for a possible run of PLANEX might be the following. We suppose that a molecular biologist has a new restriction enzyme (call it R12) and that he wants to consider alternative experiments for determining its specificity, i.e., the specific site on the DNA molecule for its application. His first step might be to create an enzyme description of R12, using the procedures for entering information about any enzyme in the MOLGEN knowledge base.
The description would include such information as its name, enzyme classification ("endonuclease" if nothing more specific is known), IUPAC number, cost, availability, stability, salt activity tables, substrate description, and the names and concentrations of impurities known to be present.

When the available information about the new enzyme had been entered, the user would then call in PLANEX. He would tell PLANEX that he wanted an experiment to determine the specificity of the enzyme R12. PLANEX would ask if the user has digested some DNA to exhaustion and determined the initial and final segment sizes. (From this, the length of the restriction sequence can be estimated. We assume that this length is estimated to be five nucleotides.) PLANEX now interprets the user's experiment as the following:

Given the initial state as follows:

    Initial State: Segments of unknown base sequence (resulting from complete digestion of phage DNA by R12).

construct a sequence of steps to the final state where:

    Final State:
    1. The identities of the last few nucleotides on the 5' ends of the fragments have been determined.
    2. The identities of the last few nucleotides on the 3' ends of the fragments have been determined.

For simplicity, let us assume that the user is willing to limit his initial goal to determining the identity of the terminal nucleotide and that he wishes first to do this for the 5' ends of the fragments. The choices for doing this include:

    A) Label the 5' termini of the fragments with radioactive phosphate groups, followed by a separation procedure.
    B) Convert the 5' end to hydroxyl using phosphatase. The terminal base can then be distinguished from the other nucleotides by chromatographic means after a 3' to 5' exonuclease digestion.

Successful reasoning by PLANEX at this level will depend on characterizing the available options. The more general the classifications and heuristics, the more apt PLANEX will be at generating new combinations of techniques.
For example, the two methods of terminal nucleotide analysis mentioned above would fit into the following general scheme:

    To determine the identity of the 5' terminal nucleotide on an oligonucleotide,
    1) "Label" the end nucleotide.
    2) Break the oligonucleotide into pieces which can be separated.
    3) Identify the pieces which are labeled.

In this context, "labeling" means any technique which makes the piece containing the terminal nucleotide distinguishable in some separation and identification procedure. It would include the above techniques as well as, for example, replacement of the terminal base in a predictable way by a base analog (as in the "turnover" technique using Polymerase).

Let us suppose that one of the experiments that the user wants to consider at this point is method (B) from above, that the Snake Venom 3' to 5' exonuclease has been chosen to break the oligonucleotide into pieces, and that the separation technique is a type of chromatography capable of distinguishing nucleotides from nucleosides and determining their identity. The experimental plan at this point looks like the following:

    Initial State: Mixture of oligonucleotides of average length 200 nucleotides with unknown 5' terminal nucleotides.
    Operation:     Apply Phosphatase.
    State:         Mixture of oligonucleotides of average length 200 nucleotides with unknown 5' terminal nucleosides.
    Operation:     Apply Snake Venom 3' to 5' exonuclease.
    State:         Mixture of nucleotides and (terminal) nucleosides.
    Operation:     Separate nucleosides from nucleotides and determine identity of nucleosides.
    Final State:   Nucleosides have been identified. (Identity of 5' terminal nucleotides of fragments has been determined.)

At this point, the experiment is thoroughly outlined, although there are a number of smaller steps still to be determined. The user asks PLANEX to fill in some more details.
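
The outline above can be pictured as a chain of operations, each of which needs a certain kind of material and yields another. The following is only a minimal sketch of that idea, not the PLANEX design: the `Operation` record, the `find_gaps` function, and the state labels (including the assumption that the exonuclease step wants shorter oligonucleotides than the phosphatase step delivers) are all illustrative names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    name: str
    requires: str  # material the step needs (illustrative label)
    produces: str  # material the step yields (illustrative label)


def find_gaps(initial_state, operations):
    """Return (step index, material on hand, material required) for every
    operation whose required input is not the output of the previous step."""
    gaps = []
    on_hand = initial_state
    for index, op in enumerate(operations):
        if op.requires != on_hand:
            gaps.append((index, on_hand, op.requires))
        on_hand = op.produces
    return gaps


# Hypothetical state labels; a real knowledge base would be far richer.
LONG_P = "long oligonucleotides, 5'-phosphate"
LONG_OH = "long oligonucleotides, 5'-hydroxyl"
SHORT_OH = "short oligonucleotides, 5'-hydroxyl"
MIXED = "nucleotides and terminal nucleosides"

outline = [
    Operation("apply phosphatase", LONG_P, LONG_OH),
    Operation("apply snake venom exonuclease", SHORT_OH, MIXED),
    Operation("separate and identify nucleosides", MIXED, "nucleosides identified"),
]
gaps_in_outline = find_gaps(LONG_P, outline)

# Inserting a step whose output matches the unmet requirement closes the gap.
shorten = Operation("apply pancreatic endonuclease", LONG_OH, SHORT_OH)
filled = outline[:1] + [shorten] + outline[1:]
gaps_in_filled = find_gaps(LONG_P, filled)
```

In this toy version a gap is detected by simple label equality; the proposal envisions the match being decided from the enzyme knowledge base instead.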
Filling in the details means that PLANEX should generate the intermediate steps so that the "required input" for each operation is matched by the "output" of the previous step, that is, so that there is a complete sequence of states and operations from the initial state to the final state. In this case, PLANEX suggests the use of Pancreatic endonuclease after the Phosphatase step and before the Snake Venom step to reduce the length of the oligonucleotides as required for more rapid action by the Snake Venom exonuclease. Similarly, a denaturation step may be inserted before the Phosphatase step. The generation of both of these steps is caused by the interpretation of the enzymatic knowledge base for the enzymes used in the operation. At a finer level of detail, PLANEX will consider steps which adjust the pH or ionic concentrations to maximize the reaction yield. Heuristics, under user control, weigh the various considerations which lead to the generation of these subgoals and order them into a hierarchy, so that the "more important" criteria are considered first. Finally, the user may ask PLANEX to estimate the yields, costs, and time required to perform the overall experiment.

At any point in a session, a user could backtrack to explore a different possibility. The ability to compare several different experiments is useful in cases where confirming experiments are used to guard against experimental error. In many cases, planning would not proceed to the end of an experiment, as when the results at a particular step dramatically affect the selection of the following step. An example of this occurs in the prologue of the scenario experiment, when PLANEX asked the user for information necessary to estimate the length of the recognition sequence of the enzyme. Had the user elected to determine more of the sequence than the end nucleotide, this information would have been essential in choosing between methods which use overlapping sequences.
The user's choice to identify only the end nucleotide greatly simplified the experiment.

3.3 An Enzyme Simulation Program

Enzymes form the primary tools geneticists use to manipulate DNA structures. The most common types include exonucleases, which break the backbone phosphodiester bond starting from an end, gap, or nick; endonucleases, which break an internal backbone bond; ligases, which seal a break in the DNA backbone; and polymerases, which add bases to a primed single DNA strand and fill in gaps in double strands. As mentioned, a special type of endonuclease, the restriction enzyme, functions to break the DNA backbone at very precisely specified sites. All of these processes must be simulated to provide accurate modelling of enzymatic action. One of the first processes that we will model will be the ligation of endonuclease-generated DNA fragments into linear and circular structures.

The simulation program will operate in the following manner. The program is given the detailed action to be carried out (e.g. apply a 3' to 5' exonuclease) and the initial pool of the various types and concentrations of DNA structures present. It will decide, using advice from the user, what structural features are important in this experiment, and focus on those types as the simulation proceeds. The program will choose an operator function and apply it to a structure selected stochastically from the pool, producing a possibly new structure. This may either increase the concentration of one of the present structures (decreasing that of another) or add a new structure to the pool. The process will continue until all structures are inert to enzymatic action, or until a specified time interval has passed.

One major representation difficulty for the simulation program is that the number of DNA structures present in an actual experiment is often in the billions.
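
The simulation loop described above can be sketched in a few lines. This is a deliberately small illustration under stated assumptions, not the proposed simulator: structures are reduced to single strands characterized only by their length, the pool holds counts rather than concentrations, each event removes exactly one nucleotide, and a strand of length 1 is treated as inert. The function name `simulate_exonuclease` and its parameters are hypothetical.

```python
import random


def simulate_exonuclease(pool, max_steps, seed=0):
    """Apply a one-nucleotide exonuclease event to strands picked at random.

    pool maps strand length -> number of strands of that length.
    Stops when every strand is inert (length 1) or max_steps events occurred.
    """
    rng = random.Random(seed)
    pool = dict(pool)  # work on a copy; the caller's pool is left untouched
    for _ in range(max_steps):
        active = [length for length, count in pool.items()
                  if count > 0 and length > 1]
        if not active:
            break  # all structures are inert to the enzyme
        # Stochastic selection, weighted by how many strands each class holds.
        weights = [pool[length] for length in active]
        length = rng.choices(active, weights=weights, k=1)[0]
        # Decrease one class ...
        pool[length] -= 1
        if pool[length] == 0:
            del pool[length]
        # ... and increase an existing class or add a new one to the pool.
        pool[length - 1] = pool.get(length - 1, 0) + 1
    return pool


final_pool = simulate_exonuclease({3: 2}, max_steps=100)
```

Keeping one count per class of equivalent strands, rather than one record per molecule, is what keeps the pool small even when the number of molecules is very large.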
Offsetting the difficulty of such large numbers is the fact that many of the DNA structures can be considered essentially identical, but only within the context of a particular experiment. That is, the criteria under which structures may be considered to be identical are dependent upon the particular experiment. For example, topology and lengths of segments are most important in the ligation experiment mentioned above, and precise nucleotide sequences interior to the DNA chains are of little significance. In other experiments, dominant features involve the locations of nicks and gaps.

During the simulation, a structure must be "instantiated" from a description in the pool of structures to a level of detail consistent with the intent of the experiment. Then the enzyme action is carried out on the structure, resulting perhaps in several changes. Finally, the resulting structure must be reincorporated into the pool. If it is "equivalent" to another structure, then it is a simple matter to increase the appropriate concentration. Otherwise, a new structure must be added to the pool. The idea is to pick "equality criteria" and "instantiation details" broad enough to keep computations reasonable but narrow enough so that the results of the simulation correspond to laboratory results.

A second problem in simulation is the handling of impure enzymes, as for example, an exonuclease with endonuclease impurities. This may involve the construction of an event queue type of simulation in which the minor enzymatic action occurs as often as the relative concentrations indicate.

Finally, a difficulty occurs when not only qualitatively accurate answers are required from a simulation program, but also precise values of DNA structure concentrations at any moment in experimental real time.
This means careful checking, probably by our geneticist consultants in the laboratory, of all contradictory rate constants, as well as possibly adding a level of mathematical rigor to some already designed models of physical processes, e.g. the probability of DNA ends in a test tube solution coming close enough to join.

Again, we wish to emphasize the human engineering aspects we intend to build into all of our programs, for which we can probably rely on our experience with DENDRAL and MYCIN. Full facilities for examining intermediate results in a manner natural to geneticists will be provided, as will powerful interactive methods of control. The user will be able to easily modify rate constants, starting DNA concentrations, and physical properties like temperature and pH during the simulation, and he will be able to trace a process backwards and restart from any point with new parameters. The simulator will interact with the EDNA structure editor to allow facile entry, modification, and display of all structures.

3.4 Knowledge Base

The knowledge which must be represented in a problem solving system can be classified into three major categories:

    1. knowledge which can be computed using a formal algorithm
    2. knowledge (rules or procedures) for which no well-defined algorithm exists but for which good heuristics (based on expertise in the field) exist or can be developed
    3. factual data

A strong attempt will be made to represent knowledge in a uniform manner. Every item in the base could be viewed by the system in terms of a transformation at some level of detail. Some transformations combine, separate or modify substances; some merely increase knowledge. A planning program could view all data in this manner. Certainly much of the knowledge in the first two categories can be represented by procedures or rules, while many different data structures will be used for the representation of factual data.
Some of the factual data may be incorporated into an algorithm or heuristic procedure. The knowledge base will be organized in a hierarchical manner so that it is easy for the system to access specific subclasses of information, such as enzyme knowledge, specific experimental techniques, or DNA structural data.

Central to the design of the knowledge base will be ensuring that data entry and modification by the expert geneticist is done in a way natural to him. This means providing a descriptive language which allows the geneticist to express the diverse types of knowledge in a language that is appropriate to the problem domain. The MYCIN system offers an excellent example to follow. It translates the input of the expert to an internal representation and then gives the expert a paraphrase of the input. The expert can correct the paraphrase interactively until he is satisfied that the program has understood correctly. With the diversity of knowledge MOLGEN is intended to handle, we may ultimately have several different language subsets for specialized use.

It is particularly important in a rapidly growing field such as molecular genetics that the knowledge base be easy to modify and expand. Again, the MYCIN example is an excellent one. Any user can add new rules to his own working space. If these rules prove useful, the system staff adds them to the MYCIN program.

A difficult, important problem is the checking for internal consistency of the knowledge base. Eventually, we hope to develop methods to check the internal consistency of subsets of the knowledge base. For example, inconsistency in the enzyme descriptions could cause application errors which would appear as incorrect planning steps. Checking for consistency of the enzyme subset of the knowledge base could alleviate this problem.

Another feature of our knowledge base will be a literature reference or other source identification for each item represented.
This source documentation will be referred to by the explanation system and will also be directly retrievable.

All of the design criteria outlined for the knowledge base in general apply to the enzyme knowledge. It can be ordered hierarchically: by enzyme function, initial substrate, product substrate, pH levels. There is knowledge that fits into each of the three general categories mentioned above. Furthermore, the type of information needed for each enzyme is similar: name, reference, basic type, substrate description, reaction catalyzed, and modifying information about parameters such as pH, salt concentration and inhibitors.

We expect the design of our enzyme knowledge base to be a dynamic process lasting at least a year. The description language will surely change as geneticists attempt to supply information using it. Building a reasonably complete file for basic experiments will take time and effort for both computer scientists and geneticists.

An example of how an enzymatic description might be used by the simulation and planning programs serves to clarify the need for comprehensive data. Ligase, and its simple function of "sealing" a nick in the DNA backbone by making a single phosphodiester bond, has been briefly mentioned previously. A straightforward simulation problem would be to determine relative populations of circular and linear DNA after given periods of application of ligase to known DNA structures. For this simulation to be accurate, precise rate constants of ligase action, and how they are affected by conditions like pH, salt concentration, temperature, etc., must be provided in the enzyme knowledge base. In general, the simulator will be accessing the chemical details of the enzymatic mechanism. The planning program, however, requires more information on the applicability of enzymes to the problem being considered: what substrate a given enzyme will act on, and what types of DNA will compete with, or inhibit, the desired enzymatic action.
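
The common fields listed above, together with the planner's need to decide whether an enzyme applies, can be sketched as a record plus a filter. This is a hedged illustration only: the `EnzymeDescription` class, the `applicable` function, and every entry in the sample knowledge base (names, pH ranges, inhibitors) are placeholders invented for the example, not data about real enzymes and not the proposed description language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnzymeDescription:
    name: str
    reference: str            # literature or other source for the entry
    basic_type: str           # e.g. "ligase", "exonuclease"
    substrate: str
    reaction: str
    ph_range: tuple = (0.0, 14.0)        # modifying information (placeholder)
    inhibitors: frozenset = frozenset()  # substances that block the enzyme


def applicable(enzyme, basic_type, substrate, ph, substances_present=()):
    """'Discriminating' test: could this enzyme be used at this plan step?"""
    low, high = enzyme.ph_range
    return (
        enzyme.basic_type == basic_type
        and enzyme.substrate == substrate
        and low <= ph <= high
        and not (enzyme.inhibitors & set(substances_present))
    )


# Placeholder entries; all names and values are invented for illustration.
knowledge_base = [
    EnzymeDescription("ligase-X", "(source)", "ligase", "nicked duplex DNA",
                      "seal nick", (7.0, 8.0), frozenset({"inhibitor-1"})),
    EnzymeDescription("ligase-Y", "(source)", "ligase", "nicked duplex DNA",
                      "seal nick", (6.0, 9.0)),
    EnzymeDescription("exo-Z", "(source)", "exonuclease", "single-stranded DNA",
                      "remove terminal nucleotide"),
]

choices = [e.name for e in knowledge_base
           if applicable(e, "ligase", "nicked duplex DNA", 7.5, ["inhibitor-1"])]
```

A simulator would read different parts of the same record, such as rate constants, which this sketch leaves out.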
Suppose, for example, that the geneticist wished a plan for inserting a segment of foreign DNA into a host molecule for replication; the planning program would then have to pick an appropriate ligase from a selection of possible candidates. Discriminating factors would be those just discussed, substrates and inhibitors, as well as how well experimental conditions would fit in with the rest of the plan. To summarize, the simulator needs "acting" information; the planner requires "discriminating" information.

The organization of the knowledge base is central to the design of the system. The enzyme knowledge base will be used to test the ideas sketched here. Of course, we will need to add other types of knowledge concerning heuristics for planning, information about laboratory techniques and physical processes in order to have a workable system.

3.5 A DNA Structure Entry and Editing System

One of the basic routines proposed for the molecular genetics program is an editor for DNA (EDNA), already partially completed. The idea is to have an interactive routine which accepts "text editor"-style commands allowing easy manipulation of DNA structures which are presented to the user in pictorial form. The inspiration for such an editor is drawn from an analogous routine in the DENDRAL project which facilitates the viewing and manipulation of chemical structures. The creation of the chemical structure editor has brought the internal representations of chemical structures out to the expert chemist user in a form that is natural to him and easy to use. The result has been a tremendously increased use of DENDRAL by chemists and an immediate incorporation of the tool by other programmers working on various parts of the project. We expect the EDNA routine to be used as a basic tool in many programs within the molecular genetics project.
In its completed form, EDNA will provide the user with the ability to edit DNA structures, build large structures from smaller ones, view them with several optional levels of detail, and save them on file. In many cases, structures and parts of structures will be referenced by name. It would be a simple matter, for example, for a user to read a "T6-phage" DNA structure from a file and print out its genetic map or any other level of detail to the extent that it is known by the system. New details could be entered using easy "insert segment" or "edit segment" commands. EDNA would be called by other programs, for example, by the simulator. The simulator will call EDNA so that the user can specify the initial DNA mixtures, and again to print out the results of the simulation or to explain the actions on structures.

Underlying the pictorial representations created by EDNA is an internal list structure representation of DNA. For example, a nucleotide is represented by a node which contains information to distinguish between DNA and RNA, the pyrimidine and purine bases, as well as their methylated derivatives. The node includes [pointer names illegible in source] and "H" pointers to other nodes in the structure representation, corresponding to the naturally occurring chemical bonds of the same names. Nicks and gaps in the DNA can be represented implicitly in the list structure. Other formalized types of nodes are used to represent sections of DNA where the information is less complete, that is, where the bases or the exact locations of particular features are not known.

The EDNA program is already partially written and tested. At this time various routines for drawing structures at different levels of detail are running, as are the basic routines for manipulating the nodes in the list space.
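
The node-and-pointer representation described above can be illustrated with a small sketch. It is an assumption-laden toy, not the EDNA data structure: the class `Nucleotide`, the pointer attribute `backbone` (standing in for the backbone-bond pointer whose name is not legible in the text), and the helper functions are hypothetical. Only the "H" pointer for the hydrogen bond to the paired base takes its name from the description.

```python
class Nucleotide:
    """One node of the list structure (illustrative fields only)."""

    def __init__(self, base, sugar="DNA", methylated=False):
        self.base = base              # "A", "C", "G", "T" (or "U" for RNA)
        self.sugar = sugar            # distinguishes DNA from RNA
        self.methylated = methylated  # methylated derivative of the base
        self.backbone = None          # next node along the strand (assumed name)
        self.h = None                 # "H" pointer: hydrogen-bonded partner


def build_strand(bases):
    """Create nodes for a strand and link them along the backbone."""
    nodes = [Nucleotide(b) for b in bases]
    for left, right in zip(nodes, nodes[1:]):
        left.backbone = right
    return nodes


def pair(strand_a, strand_b):
    """Set H pointers between two antiparallel strands of equal length."""
    for a, b in zip(strand_a, reversed(strand_b)):
        a.h, b.h = b, a


def find_nicks(strand):
    """A nick is implicit: adjacent nodes with no backbone pointer between them."""
    return [i for i in range(len(strand) - 1)
            if strand[i].backbone is not strand[i + 1]]


top = build_strand("ACGT")
bottom = build_strand("ACGT")   # ACGT is its own reverse complement
pair(top, bottom)
top[1].backbone = None          # break one backbone bond: a nick after position 1
```

Here the nick is never stored as an object; it is visible only as a missing pointer between two nodes whose partners remain paired, which is one way to read "represented implicitly".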
Several trial structures have been drawn and saved on files, including some structures with hairpin configurations and others involving nicks and gaps. The structure editing commands are currently being implemented, and the methods for superimposing higher biological orders of structure, for example, the superstructures of genes and special codons, are still in the design stage.

4 Resources

The principal computer science personnel involved in the design and construction of the system components described in part III of this proposal will be Professor Nancy Martin at the University of New Mexico, and two computer science doctoral thesis students at Stanford University, Peter Friedland and Mark Stefik. Molecular genetics knowledge, expertise, insights, techniques, and experimental heuristics will be provided by the researchers in Professor Joshua Lederberg's laboratory at Stanford, particularly post-doctoral fellow Stanislav Ehrlich and graduate student Jerry Feitelson. Professor Lederberg himself will provide substantial amounts of time on a regular basis for directing the project from the genetics viewpoint. Professor Edward Feigenbaum and Dr. Bruce Buchanan will direct the computer science aspects of the project.

Offices for the MOLGEN project will be provided within the Stanford Heuristic Programming Project so as to foster interaction and exchange of ideas with workers on similar projects. Active projects within the Heuristic Programming Project include DENDRAL, a knowledge-based system for the analysis of organic compounds from spectroscopic data; MYCIN, a system for the diagnosis and treatment of infectious disease; and a project for the determination of protein structures from x-ray diffraction data. Approximately thirty workers, including faculty, research associates, and graduate students, are involved among the projects.
All of these projects are active in the design of intelligent systems for specific application areas, and there has been considerable benefit from exchange and comparison of ideas.

The superb computing facilities of the NIH-supported SUMEX-AIM timesharing installation (Carhart 1975) will be available at no charge to this project. The SUMEX-AIM facility, with Prof. Lederberg as principal investigator, is a national resource for the application of artificial intelligence techniques to problems in biology and medicine. Resources to be provided will include all CPU-time and storage required. Those involved at Stanford will be operating through hard-wired or dial-up equipment to the SUMEX PDP-10, while those at the University of New Mexico will access the system through either the ARPA network or TYMNET.

The SUMEX-AIM facility is a powerful interactive computing system open to a national community. SAIL (Stanford Artificial Intelligence Language) and other high level languages are available and supported by a large system staff. Many convenient text editors for developing programs are provided. The TENEX operating system supports flexible file handling and sophisticated storage management for a highly interactive computing environment.

5 Bibliography

Bates, D. J. and Frieden, C., 1973. "A Small Computer System for the Routine Analysis of Enzyme Kinetic Mechanisms," Comp. and Biomed. Res., 6, pp. 474-486.

Bertazzoni, U., Ehrlich, S. D., and Bernardi, G., 1973. "Radioactive Labeling and Analysis of 3'-terminal Nucleotides of DNA Fragments," Biochimica et Biophysica Acta, 312, pp. 192-201.

Bloomfield, V. [...]

[...] "Determination of Genes, Restriction Sites, and DNA Sequences Surrounding the 6S RNA Template of Bacteriophage Lambda," Proc. Nat. Acad. Sci. USA, 72, pp. 1817-1821.

Smith, D. H., Buchanan, B. G., Engelmore, R. S., Adlercreutz, H., and Djerassi, C., 1973. "Applications of Artificial Intelligence for Chemical Inference IX.
Analysis of Mixtures without Prior Separation as Illustrated for Estrogens," J. Am. Chem. Soc., 95, 6078.

Sobell, H. M., 1973. "Symmetry in Protein-Nucleic Acid Interaction and Its Genetic Implications," Advances in Genetics, Academic Press, pp. 411-490.

Sulkowski, E. and Laskowski, M., Sr., 1962. "Mechanism of Action of Micrococcal Nuclease on DNA," J. Biol. Chem., 237, pp. 2620-2625.

Winograd, T., 1972. Understanding Natural Language, Academic Press.

Wipke, W. T., Gund, P., and Friedland, P., 1975. "ALCHEM: A Language for Describing Chemical Transformations," in preparation.

Wong, A. K. C., Reichert, T. A., Cohen, D. N., and Aygun, B. O., 1974. "A Generalized Method for Matching Informational Macromolecular Code Sequences," Comp. in Bio. and Med., 4, pp. 43-57.

7 Budget

NATIONAL SCIENCE FOUNDATION
Washington, D. C. 20550

RESEARCH GRANT PROPOSAL BUDGET (TWO YEAR TOTAL)
2 Years Beginning 6/1/76

Institution: Stanford University
Principal Investigator(s): E. A. Feigenbaum, J. Lederberg
NSF Funded Program Name: MOLGEN: A Computer Science Application to Molecular Genetics

                                                       Man-months Proposed
                                                        Cal   Acad    Sum   Amount
A. SALARIES AND WAGES:
  1. Senior personnel:
     a. (Co) Principal Investigator (list by name)
        J. Lederberg ***                                  -      -      -        -
        E. A. Feigenbaum                                       1.8          11,954
                                                             (10%) (100%)
     b. Faculty Associates (list by name)
        (Sub-total)
  2. Other personnel (Non-faculty)
     a. Research Assoc. (Post-doctoral) (list separately by
        name if available, otherwise give numbers)
        Bruce G. Buchanan,
        Research Computer Scientist (25%)                                    6,838
     b. Non-Fac. Professionals (Other) (list separately by
        category, giving number, e.g. one computer programmer)
     c. (3) Grad Students (Res. Asst.)                                      32,478
     d. ( ) Pre-Baccalaureate Students
     e. ( ) Secretarial-Clerical
     f. ( ) Technical, Shop & Other
     Total Salaries and Wages                                               51,270
B. STAFF BENEFITS                                                            9,711
C. TOTAL SALARIES, WAGES AND STAFF BENEFITS (A + B)                         60,981
D. PERMANENT EQUIPMENT: (List as Required)
     Purchase of two computer terminals **                                   5,180
     Total Permanent Equipment                                               5,180
E. EXPENDABLE SUPPLIES AND EQUIPMENT                                         1,000
F. TRAVEL:
     1. Domestic                                                             2,000
     2. Foreign (list as required)
     Total Travel                                                            2,000
G. PUBLICATION COSTS                                                           400
H. COMPUTER COSTS (if charged as direct costs)
I. OTHER COSTS: (itemize by major type)
     Terminal maintenance                                                      960
     Communications (terminal-to-computer,
       project business)                                                     1,500
     Total Other Costs                                                       2,460
J. TOTAL DIRECT COSTS (C through I)                                         72,021
K. INDIRECT COSTS: +
     1. On Campus ___% of ___                                               41,523
     2. Off Campus ___% of ___
     Total Indirect Costs                                                   41,523
L. TOTAL COSTS (J plus K)                                                  113,544
M. TOTAL CONTRIBUTIONS FROM OTHER SOURCES
N. TOTAL ESTIMATED PROJECT COST                                            113,544

RESEARCH GRANT PROPOSAL BUDGET
Year Beginning 6/1/76

                                                       Man-months Proposed
                                                        Cal   Acad    Sum   Amount
A. SALARIES AND WAGES:
  1. Senior personnel:
     a. (Co) Principal Investigator (list by name)
        J. Lederberg ***                                  -      -      -        -
        E. A. Feigenbaum                                        .9           2,753
                                                             (10%)
     b. Faculty Associates (list by name)
        (Sub-total)
  2. Other personnel (Non-faculty)
     a. Research Assoc. (Post-doctoral) (list separately by
        name if available, otherwise give numbers)
     b. Non-Fac. Professionals (Other) (list separately by
        category, giving number, e.g. one computer programmer)
     c. (3) Grad Students (Res. Asst.)                                      16,224
     d. ( ) Pre-Baccalaureate Students
     e. ( ) Secretarial-Clerical
     f. ( ) Technical, Shop & Other
     Total Salaries and Wages                                               18,977
B. STAFF BENEFITS                                                            3,409
C. TOTAL SALARIES, WAGES AND STAFF BENEFITS (A + B)                         22,386
D. PERMANENT EQUIPMENT: (List as Required)
     Purchase of two computer terminals **                                   5,180
     Total Permanent Equipment                                               5,180
E. EXPENDABLE SUPPLIES AND EQUIPMENT                                           500
F. TRAVEL:
     1. Domestic                                                             1,000
     2. Foreign (list as required)
     Total Travel                                                            1,000
G. PUBLICATION COSTS                                                           200
H. COMPUTER COSTS (if charged as direct costs)
I. OTHER COSTS: (itemize by major type)
     Maintenance of computer terminals                                         480
     Communications (terminal-to-computer,
       project business phone, postage)                                        750
     Total Other Costs                                                       1,230
J. TOTAL DIRECT COSTS (C through I)                                         30,496
K. INDIRECT COSTS: +
     1. On Campus ___% of ___                                               17,438
     2. Off Campus ___% of ___
     Total Indirect Costs                                                   17,438
L. TOTAL COSTS (J plus K)                                                   47,934
M. TOTAL CONTRIBUTIONS FROM OTHER SOURCES
N. TOTAL ESTIMATED PROJECT COST                                             47,934

RESEARCH GRANT PROPOSAL BUDGET
Year Beginning 6/1/77

                                                       Man-months Proposed
                                                        Cal   Acad    Sum   Amount
A. SALARIES AND WAGES:
  1. Senior personnel:
     a. (Co) Principal Investigator (list by name)
        J. Lederberg ***                                  -      -      -        -
        E. A. Feigenbaum                                        .9      2    9,201
                                                             (10%) (100%)
     b. Faculty Associates (list by name)
        (Sub-total)
  2. Other personnel (Non-faculty)
     a. Research Assoc. (Post-doctoral) (list separately by
        name if available, otherwise give numbers)
        Bruce G. Buchanan,
        Research Computer Scientist (25%)                                    6,838
     b. Non-Fac. Professionals (Other) (list separately by
        category, giving number, e.g. one computer programmer)
     c. (3) Grad Students (Res. Asst.)                                      16,254
     d. ( ) Pre-Baccalaureate Students
     e. ( ) Secretarial-Clerical
     f. ( ) Technical, Shop & Other
     Total Salaries and Wages                                               32,293
B. STAFF BENEFITS                                                            6,302
C. TOTAL SALARIES, WAGES AND STAFF BENEFITS (A + B)                         38,595
D. PERMANENT EQUIPMENT: (List as Required)
     Total Permanent Equipment
E. EXPENDABLE SUPPLIES AND EQUIPMENT                                           500
F. TRAVEL:
     1. Domestic                                                             1,000
     2. Foreign (list as required)
     Total Travel                                                            1,000
G. PUBLICATION COSTS                                                           200
H. COMPUTER COSTS (if charged as direct costs)
I. OTHER COSTS: (itemize by major type)
     Terminal maintenance                                                      480
     Communications (terminal-to-computer,
       project business)                                                       750
     Total Other Costs                                                       1,230
J. TOTAL DIRECT COSTS (C through I)                                         41,525
K. INDIRECT COSTS: +
     1. On Campus ___% of ___                                               24,085
     2. Off Campus ___% of ___
     Total Indirect Costs                                                   24,085
L. TOTAL COSTS (J plus K)                                                   65,610
M. TOTAL CONTRIBUTIONS FROM OTHER SOURCES
N. TOTAL ESTIMATED PROJECT COST                                             65,610

BUDGET NOTES

Salary increases estimated at 10%, effective Sept. 1.

*   Equal to 2/9 academic year salary.
**  Over two-year period, lease price exceeds purchase price plus maintenance. However, leases can be arranged if administratively more convenient to NSF.
*** Professor Lederberg's activity on this project will be done without charge to the budget.
+   INDIRECT COSTS: On Campus, 56% of Total Direct Costs thru 9/1/76; 58% of Total Direct Costs thereafter.
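
As a cross-check on the budget arithmetic, the second-year form (year beginning 6/1/77) can be recomputed from its line items. This is only a sketch for verification: the dictionary keys are labels chosen here, and the check of the indirect-cost rate assumes that the 58% on-campus rate from the budget notes applies to the whole of that year.

```python
# Second-year line items, in dollars, as given on the form.
second_year_direct = {
    "C. salaries, wages and staff benefits": 38_595,
    "E. expendable supplies and equipment": 500,
    "F. travel": 1_000,
    "G. publication costs": 200,
    "I. other costs": 1_230,
}
indirect_costs = 24_085  # item K
total_costs = 65_610     # item L

# J. Total direct costs is the sum of items C through I.
total_direct = sum(second_year_direct.values())

# Difference, in cents, between 58% of direct costs and the stated item K;
# integer arithmetic avoids floating-point rounding.
rate_error_cents = abs(58 * total_direct - 100 * indirect_costs)
```

Under these assumptions the direct costs come to 41,525, adding item K reproduces item L, and 58% of the direct costs agrees with the stated indirect costs to within one dollar.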