November 2, 1991

Preliminary report, Laboratory of Molecular Genetics and Informatics
The Rockefeller University

Our laboratory research is focussed on ways in which the intracellular environment can influence differential mutagenesis. Our cognitive modelling research tracks how we approach this problem in day-to-day laboratory planning, and supports that planning through more systematic plan-generate-test paradigms. These are founded on DENDRAL-style knowledge-based systems. An important element is a hypothesis generator (or classifier) which can be expected in principle to generate a tree of all possible hypotheses. In DENDRAL, this was facilitated by a structure-generating algorithm which could produce all valid molecular structures. In our current effort, we have an n-dimensional phase space in which we seek to identify a heuristically useful set of orthogonal variables, each of which can be exhaustively scanned. Later, our plan and test modules will restrict the combinatorial generator to a reasonable scope. For some time before we implement this in computer programs, we are monitoring our own laboratory dialectic, seeking the most economical and rigorous axes.

Before turning to regionally differential mutagenesis, we have done a logical survey of mutagenesis in general, defined as changes in DNA base sequence which are propagated in successive generations of DNA replication in vivo. After several trials, we concluded that the problem can be dissected into two major branches, each with a number of non-orthogonal components:

   a) The attributes of the DNA target
   b) The environmental features (biological, chemical, physical) playing on the target.

Under a), we can consider the target's primary ... quaternary structure (the last being the cellular or extracellular localization). But these components are non-orthogonal, as a given primary sequence can be found in a variety of conformational states.
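The generate-then-restrict idea described above (an exhaustive scan over orthogonal variables, later narrowed by plan and test modules) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the laboratory's program: the axis names and values in `AXES`, the helper names `generate` and `restrict`, and the `chemical_only` constraint are all hypothetical placeholders.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Iterator, List

Hypothesis = Dict[str, str]
Constraint = Callable[[Hypothesis], bool]


def generate(axes: Dict[str, List[str]]) -> Iterator[Hypothesis]:
    """Exhaustively scan the space: one value from every axis per hypothesis."""
    names = list(axes)
    for values in product(*(axes[name] for name in names)):
        yield dict(zip(names, values))


def restrict(hypotheses: Iterable[Hypothesis],
             constraints: List[Constraint]) -> Iterator[Hypothesis]:
    """Keep only the hypotheses that satisfy every planning constraint."""
    for hypothesis in hypotheses:
        if all(ok(hypothesis) for ok in constraints):
            yield hypothesis


# Hypothetical axes, chosen only to make the example concrete.
AXES = {
    "strandedness": ["single", "double"],
    "conformation": ["relaxed", "supercoiled"],
    "agent": ["biological", "chemical", "physical"],
}


def chemical_only(hypothesis: Hypothesis) -> bool:
    """Illustrative scope restriction: look at chemical agents only."""
    return hypothesis["agent"] == "chemical"


everything = list(generate(AXES))
in_scope = list(restrict(everything, [chemical_only]))
print(len(everything), len(in_scope))  # 12 4
```

The generator is only exhaustive if the axes really are independent; where components are non-orthogonal, as noted above, some generated combinations would be redundant or impossible and would have to be removed by further constraints.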
The attributes of the DNA target also encompass the possible chemical reactions of DNA, and it is helpful to know that DNA structure allows only a limited range of chemical alterations (some very familiar, others purely hypothetical). These range from nicking (ss- or ds-) scission of phosphodiester bonds, to reactions of the furanose sugar, to change or extraction of the purine/pyrimidine bases. They can in principle also be followed by religation of the phosphodiester (possibly in configurations other than the canonical 3’-5’) and further chemistry of the sugars and bases. At the secondary level, base pairing and ss- to ds- transitions, and vice versa, come immediately to mind.

This approach is in its early stages, but we have already found it very useful in organizing the enormous mass of relevant information about mutagenesis processes, direct and indirect. The following lists elaborate the preceding narrative. Subsequently, we will superimpose a planning perspective that can limit the hypothesis space by heuristics concerning sources of specificity, namely how some DNA (genes, sequences, loci) could be expected to be more affected than others by an environmental insult or stimulus.

   1) Characterize the various forms of the substrate (DNA) on which mutagenic processes operate,
   2) Identify environmental and cellular signals and reagents which may participate in the process of genetic change,
   3) Compile the components of cellular metabolism which act on the substrate, to create a set of primitive transformations which DNA can undergo,
   4) Describe mutation-generated changes in primary sequence as the result of the application of these metabolic primitives to the appropriate forms of DNA.

To characterize DNA as a substrate for mutagenesis, it is necessary to identify sets of physical attributes which are orthogonal to each other.
Informally, we mean that these physical qualities are mostly independent of each other, and that a given molecule may be characterized by the values of several attributes. Three such important attributes are:

   1) The quaternary structure of the DNA target. This includes the strandedness (single, double, triple or quadruple).
   2) Conformational aspects such as breathing, writhing, supercoiling status, catenation and cruciform structures. These attributes are the consequence of temperature, local sequence, ionic strength, nucleotide pools, etc.
   3) Primary lesions. Examples are strand scission at the phosphodiester bond, deletions, insertions, chemical modifications, and non-covalent intercalations of small molecules.

The features described above (large-scale structure, conformation and characterization of primary lesions) can form the basis for a classification of mutations from the perspective of the substrate. To illustrate, we can look more closely at the third group (primary lesions) and suggest a classification of mutations with respect to the chemistry of unreplicated DNA (i.e. before its presentation to a polymerase).
As we do so, we can anticipate the actions which may result from the presence of such a DNA substrate:

   1) Backbone changes -- mutations which interrupt the phosphodiester backbone
      A) Backbone scission of ssDNA (as a lesion in dsDNA)
         Possible actions:
            a) religation
            b) formation of gap via exonucleolytic action
            c) displacement synthesis
      B) Backbone scission of dsDNA (can produce blunt or staggered ends)
         Possible actions:
            a) introduction of a stable end (telomere)
            b) recombinogenic repair
   2) Changes in which the backbone is not affected -- (base altered instead)
      A) Removal of base (depurination)
      B) substitution (C -> U, pseudo-bases)
      C) base-base union (T dimers)
      D) destruction or modification of base
      Possible actions:
         a) reaction with other ligands
         b) interference with replication [modulo recA binding]

Signals and reagents which may influence information metabolism can be classified as follows:

   1) Molecules
      A) macromolecules
      B) small molecules
   2) Physical agents
      A) emissions through the entire electromagnetic spectrum
      B) pressure (including sonic effects resulting from pressure fluctuation)
      C) temperature
      D) osmolality
      E) surfaction
      F) pH
      G) humidity
   3) Physical trauma to cells
   4) Electric charge
   5) Gravity
   6) Surface constraints

We can further describe each of these signals from another perspective, which identifies the spatial, metabolic or physical point of interaction with the cell. Many cellular responses to external signals are mediated by changes in DNA conformation or DNA-protein binding, suggesting a nexus of feedback from environment to differential mutagenesis.

   1) Those transduced by a receptor:
      A) membrane
      B) cytoplasm
      C) nucleus -- in bacteria, e.g., enzyme inducers and derepressors
   2) Nutrients (C and N metabolism)
      A) elementary sources of C, N, P ...
      B) specific growth factors
      C) light or cognate sources of chemical energy
   3) Reagents
      A) low molecular weight, e.g. alkylating agents (mustard gas; formaldehyde; hydrazines)
      B) high molecular weight
         enzymes
         ribozymes
         other nucleic acids (e.g. antisense RNA)

It is then possible to identify action primitives which can alter DNA that has been characterized in this way. The cellular metabolic processes which act on DNA include such components as replication, excision repair, nucleolytic erosion, chemical modification, recombination, incorporation of base analogs, and recognition and subsequent repair of mismatches and modifications. We intend to describe these processes as ordered compilations of primitives.

To illustrate primitives, we can examine replication in finer detail. The preconditions on the DNA template include its availability (melting of the helix to form ssDNA), the location of protein binding sites and the presence of a primer. The primitive actions then include base-pairing, ligation and excision.

Given these primitives, it should be possible to describe the transformation of one DNA sequence into another as the result of a series of operations on well-defined states. For example, we can describe the potential sequence changes in a DNA molecule by a state diagram which describes editing. Editing occurs immediately after the addition of a new base to the 3’ end of a growing chain. It must happen before ligation of the next base and involves a 3’-5’ exonuclease.

[State diagram of editing; layout not recoverable. Legible labels: base-pair; ligate (phosphodiester bond); exonuclease (3’-5’) with yes/no branches; erode.]

This diagram can be elaborated by including endonuclease nicking of DNA, often in response to chemical cross-links, or to base-pair mismatch. The free ends then can be processed just as above.
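The editing cycle described above can be written out as a small finite state machine. The sketch below is a toy model under stated assumptions, not the laboratory's formalism: the state names follow the primitives named in the text (base-pair, ligate, 3’-5’ exonuclease, erode), but the exact transitions are inferred from the prose, strand polarity is ignored, and the function `replicate` and its `edit` switch are hypothetical.

```python
from enum import Enum, auto
from typing import Iterable, List, Tuple

# Watson-Crick complements; strand polarity is ignored in this toy model.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


class State(Enum):
    BASE_PAIR = auto()    # select the next incoming base
    LIGATE = auto()       # join it to the 3' end of the growing chain
    EXONUCLEASE = auto()  # 3'-5' exonuclease tests the newest base
    ERODE = auto()        # remove a mispaired terminal base
    DONE = auto()


def replicate(template: str, incoming: Iterable[str],
              edit: bool = True) -> Tuple[str, List[State]]:
    """Copy `template`, drawing candidate bases from `incoming`.

    Returns the new strand and the sequence of states visited.
    With edit=False the exonuclease step is skipped (assumed variant).
    """
    supply = iter(incoming)
    chain: List[str] = []
    trace: List[State] = []
    candidate = ""
    state = State.BASE_PAIR
    while state is not State.DONE:
        trace.append(state)
        if state is State.BASE_PAIR:
            if len(chain) == len(template):
                state = State.DONE
                continue
            candidate = next(supply, "")
            if candidate not in COMPLEMENT:
                raise ValueError("ran out of valid incoming bases")
            state = State.LIGATE
        elif state is State.LIGATE:
            chain.append(candidate)
            state = State.EXONUCLEASE if edit else State.BASE_PAIR
        elif state is State.EXONUCLEASE:
            paired = COMPLEMENT[template[len(chain) - 1]] == chain[-1]
            state = State.BASE_PAIR if paired else State.ERODE
        elif state is State.ERODE:
            chain.pop()
            state = State.BASE_PAIR
    return "".join(chain), trace


# A mispaired G is offered at the second position.
edited, trace = replicate("ATG", ["T", "G", "A", "C"])
unedited, _ = replicate("ATG", ["T", "G", "C"], edit=False)
print(edited, trace.count(State.ERODE))  # TAC 1
print(unedited)                          # TGC
```

With editing on, the mispaired base is eroded and replaced; with editing off, it persists in the new strand, which is the sense in which a change in primary sequence can be described as the outcome of a series of primitives applied to well-defined states.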
These finite state machines are a gratifyingly compact summary of systems as diverse as standard DNA replication (with editing), excisional and mismatch repair, and error-prone repair associated with activation of the "SOS" system.

Most of the detail of this report is taken from Dr. M. Noordewier’s notes of weekly discussion sections of the laboratory.

Yours sincerely,
Joshua Lederberg