A PROPOSAL FOR NAMING HOST CELL DERIVED INSERTS IN RETROVIRUS GENOMES

John M. Coffin¹ and Harold E. Varmus²

¹Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Boston, Massachusetts 02111
²Department of Microbiology, University of California, San Francisco, California 94143

ABSTRACT

We propose a system for naming inserted sequences in transforming retroviruses (i.e. onc genes), based on trivial names derived from a prototype strain of virus.

A number of retroviruses have been isolated from naturally occurring or laboratory-induced tumors. Some of these are able to induce rapid disease in laboratory animals and to transform the morphological and/or growth properties of appropriate tissue culture cells (for review see ........). All such viruses whose genomes have been closely examined share a common feature: the presence of a nucleotide sequence which encodes a protein unnecessary for viral replication but required for the induction of the transformed phenotype (........). Such sequences have been generally referred to as onc genes (........). As shown in Table 1, at least twelve distinct onc genes have been identified in at least twenty isolates of transforming retroviruses. Where tested, all such genes have been found to be closely related to a sequence present in the uninfected host cell, yet distinct from any endogenous viruses which might be present. It has been proposed that the transforming viruses have arisen by a mechanism involving recombination between virus and cellular information, with the consequence that an apparently normal cellular gene has come under the replicative and expression controls provided by the viral genome (........) and, by virtue of modification in structure and/or mode of expression, has acquired the ability to cause cell transformation.
While there is general agreement among workers in the field concerning the nature of onc genes and their relationship to the host cell, there is substantial confusion surrounding the names of these sequences and their cellular relatives. For example, the name src, originally used to designate the onc gene of Rous sarcoma virus (........), has recently been applied generally to sequences which are completely unrelated in sequence, in nature of the gene product, and in location in the genome. The use of identical names for genes of unrelated sequence and function can lead to serious problems in communication. An additional problem has arisen in the description of the endogenous sequence related to an onc gene. The sequences related to the various src genes, for example, have often been called "sarc", with the result that the viral and cellular sequences are pronounced identically. More cumbersome designations for such sequences have been proposed, but not widely accepted.

Retrovirus genes encoding replicative functions (i.e. gag, pol, and env) have been accorded three letter names derived from their function or some other feature (........). We propose, for simplicity and readability, that this system be extended to include the nonreplicative inserts found in many strains of retrovirus. According to this proposed system, such inserts (or onc genes) will be given trivial three letter designations. These names are not meant to imply specific diseases, target cells, or functions; rather, they are to be simply names of sequences which are not derived from viral replicative information, and which encode a protein (or a portion of a polyprotein) likely to be involved in transformation of the infected cell. We also propose a system for distinguishing the viral from the related cellular sequence and, where necessary, the sequences in related viral strains from one another.
The names for these sequences are to be assigned according to the following guidelines:

1. The names should be 3 letters, lower case italics.
2. The names should be trivial; that is, no target cell specificity or functional significance is implied, and they are to be considered as names of coding sequences only.
3. They are to be derived in some mellifluous, yet mnemonic way from the name of the prototype virus or viruses or some other memorable feature of them.
4. Related sequences in different viruses from the same species are to be called by the same name. A given name should (when completely resolved) point to the same cell sequence and the same or a closely related protein product, although it should not be necessary to have identified all of these to assign a name.
5. When necessary for clarity, the differences between inserts in related viruses can be indicated by prefixing the name with the abbreviation or name for the virus or virus strain.
6. The related sequence found in the cell of origin will be designated with a lower case c- preceding the sequence name, e.g. c-src. The animal species of the cellular homologue should be indicated in parentheses following the name of the sequence (e.g. c-src (chicken)). The unadorned name will always indicate the viral sequence only.
7. Protein products will be designated according to previous convention except that no superscripts will be used; thus, pp60src, P150c-abl, and P110gag-abl stand for the product of src, the product of the endogenous cell sequences related to abl, and the polyprotein containing both gag and abl specific information, respectively.
8. Should the same virus be found to have two independently expressed inserts (i.e. coding for different proteins through distinct mRNAs), then they can be distinguished by affixing -A, -B, etc. to the name.
9. Such names should be reserved for sequences not related to viral genes. Such situations as spleen focus-forming virus, which seems to have only variants of viral replicative genes (........), and the 30S region of Ha and Ki MSV, which is apparently derived from an endogenous virus-like element (........), should not be so named. In this way, it can be assured that the names are unique.
10. Names along the same lines can also be given to nontransforming inserts if found in retroviruses or deliberately put there, but should be limited to genetically significant regions, i.e. those with a protein (or functional RNA) product.
11. An exception to rule 4 can be made (although it need not be) in the case where somewhat different yet related inserts are found in viruses of different species.
12. Strict genetic evidence is not required to assign a name, but it should be shown A) that the region is non-viral, and B) that it has either a protein (or functional RNA) product or a genetically identifiable function.

A list of suggested names is shown in Table 1. We note that many of the assignments are tentative and that more names will likely be added in the future. Three of the names on this list (src, myb, erb) are already in use. Erb and myb were originally proposed with a different rationale, i.e. that they were indicative of transformed cell type (........). We do not consider transformed cell type to be a useful criterion for such assignments, since many of the viruses cause a variety of diseases, since at least seven of the onc sequences are in viruses that cause sarcoma as their most common disease, and since even in those viruses that do cause a relatively unique definable disease (such as Abelson MuLV), there is no general agreement concerning the nature of the transformed cell. The three names mentioned, however, should in this context be considered as trivial names derived from the name of the prototype virus, and we suggest they be so used.
We do suggest changing the name proposed for the transforming insert of avian myelocytomatosis virus MC29 and related viruses (mac; ........) to myc to match more closely the name of the prototype virus.

If the name of an onc "gene" is considered to describe an inserted sequence, all or at least part of which encodes a functional product, then (at least in principle) it can be precisely defined as that sequence which is unrelated to the genome of any replication-competent nontransforming virus (i.e. not belonging to a gag, pol, or env gene or to some noncoding internal or terminal region of such a virus). With many of these sequences, it is quite difficult to obtain a definition by purely genetic techniques, since they are usually found in replication-defective viruses. In all cases, however, it is possible to use physical, biochemical, and recombinant DNA techniques to define the limits of onc sequences with precision, for example by comparing nucleotide sequences of a transforming virus, its nontransforming but replication-competent helper, and the related cellular sequence or sequences with each other and with the amino acid sequence of the suspected gene product. A region of a genome defined in this way is not, in the strictest sense, a "gene". However, to refer to a defined sequence as an onc gene, while imprecise, should not create serious confusion, so long as it is understood that not all of the sequence may be directly involved in encoding a product and that additional viral sequences may encode part of the final gene product.

Some of the names proposed may not at first seem as mellifluous as might be desirable. However, with practice they seem to be fairly easy to pronounce; for example, abl can be pronounced like "able" and fps like "fips". We also suggest that mas be pronounced "mass" to avoid confusion with mos ("mos").

The following investigators have agreed to these guidelines: S. Aaronson, P. Balduzzi, J. Ball, D.
Baltimore, H. Bauer, J. M. Bishop, D. Dina, R. Eisenman, R. Friis, D. Fujita, A. Goldberg, H. Hanafusa, S. Hughes, W. Joklik, G. S. Martin, S. Rasheed, F. Reynolds, N. Rosenberg, C. Sherr, J. Stephenson, H. Temin, G. Theilen, K. Toyoshima, G. Vande Woude, I. Verma, P. Vogt, M. Weber, R. Weinberg and M. Yoshida.

TABLE 1. PROPOSED NAMES FOR onc GENES

Viral Insert  Virus Strain                               Probable Animal Origin   Protein Product
rel           avian reticuloendotheliosis virus-T        turkey                   ?
RSV-src       Rous sarcoma virus                         chicken                  pp60src
B77-src       B77 avian sarcoma virus                    chicken                  pp60src
rASV-src      recovered avian sarcoma virus              chicken, Japanese quail  pp60src
PR-RSV-src    Prague strain Rous sarcoma virus           chicken                  pp60src
AMV-myb       avian myeloblastosis virus strain BAI-1    chicken                  ?
E26-myb       avian leukemia virus strain E26            chicken                  ?
MC29-myc      avian myelocytoma virus MC29               chicken                  P110gag-myc
CMII-myc      avian myelocytoma virus CMII               chicken                  P90gag-myc
MH2-myc       avian myelocytoma and carcinoma virus MH2  chicken                  P100gag-myc
OK10-myc      avian myelocytoma virus OK10               chicken                  [illegible]
AEV-erb-A     avian erythroblastosis virus               chicken                  P75gag-erb-A
AEV-erb-B     avian erythroblastosis virus               chicken                  P45gag-erb-B
FSV-fps       Fujinami sarcoma virus                     chicken                  P140gag-fps
PRCII-fps     PRCII sarcoma virus                        chicken                  P105gag-fps
Moloney-mos   Moloney murine sarcoma virus               mouse                    ?
Gazdar-mos    Gazdar murine sarcoma virus                mouse                    ?
Rasheed-ras   Rasheed rat sarcoma virus                  rat                      P29gag-ras
Kirsten-ras   Kirsten murine sarcoma virus               rat                      P21ras
Harvey-ras    Harvey murine sarcoma virus                rat                      P21ras
abl           Abelson murine leukemia virus              mouse                    P120gag-abl
ST-fes        Snyder-Theilen feline sarcoma virus        cat                      P85gag-fes
GA-fes        Gardner-Arnstein feline sarcoma virus      cat                      P110gag-fes
[illegible]   McDonough feline sarcoma virus             cat                      P170gag-mas
[illegible]   Woolly monkey sarcoma virus                woolly monkey            ?
Y73-yes       Y73 avian sarcoma virus                    chicken                  P90gag-yes
ESV-yes       Esh sarcoma virus                          chicken                  P80gag-yes
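
The written forms described in the guidelines (three lower-case letters, an optional strain prefix, a c- prefix and species in parentheses for the cellular homologue, and protein designations written without superscripts) can be illustrated with a short sketch. The Python below is not part of the proposal: the function names check_name, viral_insert, cellular_homologue and protein_product, and their parameters, are hypothetical and chosen only for illustration. Italics are a typesetting matter and are not modelled.

```python
import re

# A sequence name is exactly three lower-case letters.
_NAME = re.compile(r"[a-z]{3}")


def check_name(name: str) -> str:
    """Return the name unchanged if it has three lower-case letters, else raise."""
    if not _NAME.fullmatch(name):
        raise ValueError(f"not a three-letter lower-case name: {name!r}")
    return name


def viral_insert(name: str, strain: str = "", insert: str = "") -> str:
    """Viral sequence name, with an optional strain prefix and an optional
    -A/-B style suffix for independently expressed inserts (hypothetical helper)."""
    label = check_name(name)
    if insert:
        label = f"{label}-{insert}"
    return f"{strain}-{label}" if strain else label


def cellular_homologue(name: str, species: str = "") -> str:
    """Cellular relative: c- prefix, species in parentheses when given."""
    label = f"c-{check_name(name)}"
    return f"{label} ({species})" if species else label


def protein_product(prefix: str, size: int, *genes: str) -> str:
    """Protein designation on one line, no superscripts; gene parts joined by '-'."""
    return f"{prefix}{size}{'-'.join(genes)}"


if __name__ == "__main__":
    print(viral_insert("erb", strain="AEV", insert="A"))  # AEV-erb-A
    print(cellular_homologue("src", "chicken"))           # c-src (chicken)
    print(protein_product("pp", 60, "src"))               # pp60src
    print(protein_product("P", 110, "gag", "abl"))        # P110gag-abl
```

The sketch checks only the written form of a name. Whether a sequence qualifies for a name at all (non-viral origin, a protein or functional RNA product) is an experimental question that it does not attempt to capture.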