The present invention builds upon an extensive body of prior art, including the physical and chemical nature of nucleic acids, the principles of genetics and biochemistry, and specific chemical and enzyme-catalyzed reactions. The following is a non-exhaustive list of general background references useful for explaining operating principles and defining terms generally used in the art.
1. General:
a. Watson, J. D., The Molecular Biology of the Gene, 3rd Ed., Benjamin, Menlo Park, Calif. (1976).
b. Hayes, W., The Genetics of Bacteria and Their Viruses, 2nd Ed., Blackwell Scientific Publ., Oxford (1968).
2. Restriction Endonucleases:
Roberts, R. J., Crit. Rev. Biochem., 4, 123 (1976).
3. Chemical Synthesis of Oligonucleotides:
a. Itakura, K., et al., J. Biol. Chem., 250, 4592 (1975).
b. Itakura, K., et al., J. Am. Chem. Soc., 97, 7327 (1975).
4. Restriction Site "Linkers":
Scheller, et al., Science, 196, 177 (1977).
5. Nucleotide Sequence Determination:
a. Sanger, F., et al., Proc. Nat. Acad. Sci. USA, 74, 5463 (1977).
b. Maxam, A. M., and Gilbert, W., Proc. Nat. Acad. Sci. USA, 74, 560 (1977).
6. DNA Replication:
Kornberg, A., DNA Replication,
7. Recombinant DNA:
a. Sinsheimer, R. L., Ann. Rev. Biochem., 46, 415 (1977).
b. Ullrich, A., et al., Science, 196, 1313 (1977).
c. Villa-Komaroff, L., et al., Proc. Nat. Acad. Sci. USA, 75, 3727 (1978).
d. Burrell, C. J., et al., Nature, 279, 43 (1979).
e. Bolivar, F., et al., Gene, 2, 95 (1977).
Developments in recombinant DNA technology have made it possible to isolate specific genes or portions thereof from higher organisms, such as man and other mammals, and to transfer the genes or fragments to a microorganism species, such as bacteria or yeast. The transferred gene is replicated and propagated as the transformed microorganism replicates. As a result, the transformed microorganism may become endowed with the capacity to make whatever protein the gene or fragment encodes, whether it be an enzyme, a hormone, an antigen or an antibody, or some portion thereof. The microorganism passes on this capability to its progeny, so that in effect the transfer has resulted in a new strain having the described capability. See, for example, Ullrich, A., et al., supra, and Seeburg, P. H., et al., Nature, 270, 486 (1977).
A basic fact underlying the practical application of this technology is that the DNA of all living organisms, from microbes to man, is chemically similar, being composed of the same four nucleotides. The significant differences lie in the sequences of these nucleotides in the polymeric DNA molecule. The nucleotide sequences specify the amino acid sequences of the proteins that make up the organism. Although most of the proteins of different organisms differ from each other, the coding relationship between nucleotide sequence and amino acid sequence is fundamentally the same for all organisms. For example, the nucleotide sequence that codes for the amino acid sequence of a human protein in human cells will, when transferred to a microorganism, be recognized as coding for the same amino acid sequence, and may result in synthesis of the same human protein by the microorganism.
Abbreviations used herein are given in Table 1.
TABLE 1
______________________________________
DNA  -- deoxyribonucleic acid          A   -- Adenine
RNA  -- ribonucleic acid               T   -- Thymine
cDNA -- complementary DNA              G   -- Guanine
        (enzymatically synthesized     C   -- Cytosine
        from an mRNA sequence)         U   -- Uracil
mRNA -- messenger RNA                  ATP -- Adenosine triphosphate
dATP -- deoxyadenosine triphosphate    TTP -- Thymidine triphosphate
dGTP -- deoxyguanosine triphosphate
dCTP -- deoxycytidine triphosphate
______________________________________
The coding relationships between nucleotide sequence in DNA and amino acid sequence in protein are collectively known as the genetic code, shown in Table 2.
TABLE 2
Genetic Code
______________________________________
First         Second position            Third
position    U      C      A      G       position
______________________________________
U           Phe    Ser    Tyr    Cys     U
            Phe    Ser    Tyr    Cys     C
            Leu    Ser    Non²   Non³    A
            Leu    Ser    Non¹   Trp     G
C           Leu    Pro    His    Arg     U
            Leu    Pro    His    Arg     C
            Leu    Pro    Gln    Arg     A
            Leu    Pro    Gln    Arg     G
A           Ile    Thr    Asn    Ser     U
            Ile    Thr    Asn    Ser     C
            Ile    Thr    Lys    Arg     A
            Met    Thr    Lys    Arg     G
G           Val    Ala    Asp    Gly     U
            Val    Ala    Asp    Gly     C
            Val    Ala    Glu    Gly     A
            Val    Ala    Glu    Gly     G
______________________________________
Non: nonsense (chain-terminating) codon.
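For readers who wish to check sequences against Table 2 mechanically, the code can be held in a lookup table. The following Python sketch is illustrative only and is not part of the disclosed method: it assumes the standard code of Table 2, writes amino acids with their one-letter abbreviations rather than the three-letter names used in the table, marks the nonsense codons ("Non") as "*", and the names `GENETIC_CODE` and `translate` are hypothetical.

```python
from itertools import product

# Bases in the order used for the rows and columns of Table 2.
BASES = "UCAG"

# One-letter amino acid codes for all 64 codons, listed with the first base
# varying slowest and the third base fastest; "*" marks the three nonsense
# (chain-terminating) codons written "Non" in Table 2.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

GENETIC_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Decode a message triplet by triplet from its first nucleotide,
    stopping at the first nonsense codon. DNA input (T for U) is accepted."""
    rna = mrna.upper().replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = GENETIC_CODE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGUUUGGCUAA"))  # MFG
```

Counting entries in `GENETIC_CODE` reproduces the degeneracy visible in Table 2, for example six codons each for Leu, Ser and Arg and a single codon each for Met and Trp.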
An important feature of the code, for present purposes, is the fact that each amino acid is specified by a trinucleotide sequence, also known as a nucleotide triplet. The phosphodiester bonds joining adjacent triplets are chemically indistinguishable from all other internucleotide bonds in DNA. Therefore the nucleotide sequence cannot be read to code for a unique amino acid sequence without additional information to determine the reading frame, which is the term used to denote the grouping of triplets used by the cell in decoding the genetic message.
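The dependence on reading frame can be illustrated with a short sketch. Assuming the same standard code (rebuilt here in compact form so the example stands alone), the hypothetical function `read_frame` groups an arbitrary sample sequence into triplets starting at offset 0, 1 or 2; the three groupings yield three unrelated amino acid sequences from the identical string of nucleotides ("*" marks a nonsense codon).

```python
from itertools import product

# Standard code, first base varying slowest and third fastest; "*" = nonsense.
CODE = {
    "".join(codon): aa
    for codon, aa in zip(
        product("UCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}


def read_frame(seq: str, frame: int) -> list[str]:
    """Decode every complete triplet of seq, starting `frame` (0, 1 or 2)
    nucleotides from the 5' end."""
    rna = seq.upper().replace("T", "U")
    return [CODE[rna[i:i + 3]] for i in range(frame, len(rna) - 2, 3)]


sample = "AUGGCAUUUGGAUAA"  # arbitrary example sequence
for frame in range(3):
    print(frame, "".join(read_frame(sample, frame)))
# 0 MAFG*
# 1 WHLD
# 2 GIWI
```

Nothing in the nucleotide string itself marks which of the three outputs a cell would use; that information comes from where decoding begins.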
Essential to recombinant DNA technology are the relatively small, autonomously replicating DNA elements, including plasmids and virus DNA molecules. These are used to infect, transfect, or transform host cells in which they are replicated. The process of uptake of such DNA by a living cell and its replication in that cell is termed herein the transfer of such DNA to a host cell. The term "transfer," as used herein, includes the processes termed transformation, transfection or DNA uptake in the art. The genetic information contained in such molecules includes, at a minimum, that needed to ensure replication of the DNA after transfer to a host cell. Normally, these autonomously replicating elements carry additional genetic information. For example, plasmids often carry antibiotic resistance genes which confer survival value on host cells carrying the plasmids, and virus DNAs contain genes for virus coat proteins and for functions associated with maturation of intact viruses. In addition, wholly unrelated genes may be included. Autonomously replicating DNA elements exist as double stranded ring structures, generally on the order of a few million daltons in molecular weight, although some exceed 10⁸ daltons. They usually represent only a small fraction of the total DNA of a host cell. Such DNA is usually separable from host cell DNA by virtue of the great difference in size between them. In addition, these elements are usually isolated as intact ring structures, whose topological constraints may be exploited to produce separations on the basis of density from host cell DNA, which is typically isolated in the form of linear fragments. Many such DNAs may be induced to carry a segment of heterologous DNA by the technique of opening the ring, inserting the fragment of heterologous DNA and reclosing the ring, forming an enlarged molecule comprising the inserted heterologous DNA segment.
Alternatively, a segment of heterologous DNA may be inserted in place of a previously deleted non-essential region. In either situation, the DNA then serves as a carrier for the heterologous inserted fragment and is termed a transfer vector.
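The two ways of loading a transfer vector described above, opening the ring to insert a fragment and substituting a fragment for a deleted non-essential region, can be modelled as string operations on one strand of a circular molecule. This is a minimal sketch under simplifying assumptions: the ring is stored as a string read from an arbitrary origin, cohesive-end chemistry is ignored, and all function names and sequences are hypothetical.

```python
def insert_into_ring(vector: str, position: int, fragment: str) -> str:
    """Open a circular vector between position-1 and position, insert the
    fragment, and reclose the ring. Because the origin is arbitrary,
    position len(vector) is the same junction as position 0."""
    p = position % len(vector)
    return vector[:p] + fragment + vector[p:]


def replace_region(vector: str, start: int, end: int, fragment: str) -> str:
    """Delete vector[start:end] (assumed non-essential) and put the fragment
    in its place. For simplicity the region may not span the origin."""
    if not 0 <= start <= end <= len(vector):
        raise ValueError("region must lie within the stored sequence")
    return vector[:start] + fragment + vector[end:]


def same_ring(a: str, b: str) -> bool:
    """Two strings describe the same circular molecule if one is a rotation
    of the other."""
    return len(a) == len(b) and b in a + a


vector = "AAAACCCCGGGGTTTT"                             # hypothetical vector
print(insert_into_ring(vector, 8, "ATAT"))               # AAAACCCCATATGGGGTTTT
print(replace_region(vector, 4, 8, "ATATAT"))            # AAAAATATATGGGGTTTT
print(same_ring(insert_into_ring(vector, 0, "ATAT"), vector + "ATAT"))  # True
```

The last line shows why the origin of the stored string carries no meaning for a ring: inserting "at the start" and "at the end" produce the same enlarged circle.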
Transfer may be accomplished by any process whereby host cells are induced to incorporate the DNA. Although the mechanics of the process remain obscure, it is a widely observed phenomenon, known to occur in many species of bacteria, yeast, and cultured mammalian cells. Once a cell has incorporated a transfer vector, the latter is replicated within the cell and the replicas are distributed to the daughter cells when the cell divides. Any genetic information contained in the nucleotide sequence of the transfer vector DNA can, in principle, be expressed in the host cell. Typically, a transformed host cell is recognized by its acquisition of traits carried on the plasmid, such as antibiotic resistance. Any given transfer vector may be made in quantity by growing a pure culture of cells containing the transfer vector and isolating transfer vector DNA therefrom.
The means for inserting heterologous DNA segments into transfer vector DNA include the restriction endonucleases. Restriction endonucleases are hydrolytic enzymes capable of catalyzing site-specific cleavage of DNA molecules. The locus of restriction endonuclease action is determined by the existence of a specific nucleotide sequence, termed the recognition site for the restriction endonuclease. Many restriction endonucleases from a variety of bacterial species have been isolated and characterized in terms of the nucleotide sequences of their recognition sites. Some restriction endonucleases hydrolyze the phosphodiester bonds on both strands at the same point, producing blunt ends. Others catalyze hydrolysis of bonds separated from each other by a few nucleotides, producing free single stranded regions at each end of the cleaved molecule. Such single stranded ends are self-complementary, hence cohesive, and may be used to rejoin the hydrolyzed DNA. Since any DNA susceptible to cleavage by such an enzyme must contain the same recognition site, the same cohesive ends will be produced, so that it is possible to join heterologous DNA sequences that have been treated with a given restriction endonuclease to other sequences treated with the same enzyme. See Roberts, R. J., supra.
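The relation between where an enzyme cuts within its site and the kind of end produced can be made concrete. The sketch below assumes a palindromic recognition site cut at the same offset on each strand (each read 5' to 3'); EcoRI, PstI and SmaI are used only as familiar examples not named in the text, and the function names are hypothetical. It also checks the self-complementarity of the EcoRI end, which is what makes such ends cohesive.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def cut_ends(site: str, cut: int) -> tuple[str, str]:
    """Kind and sequence of the single-stranded end left when a palindromic
    site is cut after `cut` nucleotides on the top strand; the bottom strand
    is then cut after len(site) - cut nucleotides in top-strand coordinates."""
    other = len(site) - cut
    if cut < other:
        return "5'", site[cut:other]
    if cut > other:
        return "3'", site[other:cut]
    return "blunt", ""


def digest(seq: str, site: str, cut: int) -> list[str]:
    """Top-strand sequences of the pieces of a linear molecule cut at every
    occurrence of the site (the bottom strand is not tracked here)."""
    pieces, start = [], 0
    i = seq.find(site)
    while i != -1:
        pieces.append(seq[start:i + cut])
        start = i + cut
        i = seq.find(site, i + 1)
    pieces.append(seq[start:])
    return pieces


# Familiar examples: EcoRI G^AATTC, PstI CTGCA^G, SmaI CCC^GGG.
kind, end = cut_ends("GAATTC", 1)
print(kind, end, end == reverse_complement(end))   # 5' AATT True
print(cut_ends("CTGCAG", 5))                        # ("3'", 'TGCA')
print(cut_ends("CCCGGG", 3))                        # ('blunt', '')
print(digest("TTGAATTCAAAAGAATTCGG", "GAATTC", 1))  # ['TTG', 'AATTCAAAAG', 'AATTCGG']
```

Every piece produced by the same enzyme carries the same AATT end, which is the property the text relies on for joining heterologous DNA to similarly treated sequences.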
Restriction sites are relatively rare, those having short recognition sequences being more commonly encountered than those having longer recognition sequences. An important feature of the distribution of restriction endonuclease recognition sites is that they are randomly distributed with respect to structural genes and with respect to reading frames. In the prior art, there is no known way to predict (a) whether a given restriction site will exist in a given region and (b) where within that region it will be located. Consequently, a major initial effort of all prior art recombinant DNA projects involves the construction of a restriction map, which is a diagram of the DNA under investigation showing the relative locations of various restriction sites. Once the restriction map is known, it is possible to devise the optimum strategy for inserting and deleting fragments approximately as desired. An important consequence of the random distribution of restriction sites is that the investigator must devise essentially ad hoc strategies for each project, utilizing the nature and location of restriction sites as best he can for his purposes. An exception to this general statement is that individual restriction sites may be deleted by known techniques.
DNA which has been cleaved by a restriction endonuclease may be rejoined in a reaction catalyzed by the enzyme DNA ligase. Fragments having cohesive ends are covalently joined with high efficiency under conditions conducive to base pairing of the cohesive ends. The enzyme also catalyzes the joining of fragments having base-paired, blunt ends. However, fragments having protruding, non-complementary single stranded ends are not joined.
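A restriction map, the expected rarity of sites of a given length, and the ligase rule for joining ends can each be sketched in a few lines. The following Python is illustrative only: the three enzymes are well-known examples not named in the text, the spacing estimate assumes random sequence with equal base frequencies, the ligation rule is a simplification of the behaviour stated above, and all names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Well-known six-base sites chosen for illustration; all are palindromic,
# so scanning one strand finds every site.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def restriction_map(seq: str, enzymes: dict[str, str] = ENZYMES) -> list[tuple[int, str]]:
    """(position, enzyme) for every recognition site, sorted by position."""
    hits = []
    for name, site in enzymes.items():
        i = seq.find(site)
        while i != -1:
            hits.append((i, name))
            i = seq.find(site, i + 1)
    return sorted(hits)


def expected_spacing(site_length: int) -> int:
    """Approximate mean distance between occurrences of one specific site of
    this length in random sequence with equal base frequencies."""
    return 4 ** site_length


def can_ligate(end_a: tuple[str, str], end_b: tuple[str, str]) -> bool:
    """Ends are (kind, single-stranded sequence), kind in {"5'", "3'", "blunt"}.
    Blunt ends join blunt ends; protruding ends join only if they are of the
    same kind and complementary (one is the reverse complement of the other)."""
    (kind_a, seq_a), (kind_b, seq_b) = end_a, end_b
    if kind_a != kind_b:
        return False
    if kind_a == "blunt":
        return True
    return seq_a == reverse_complement(seq_b)


print(restriction_map("GGATCCAAGAATTCTTAAGCTTGAATTC"))
# [(0, 'BamHI'), (8, 'EcoRI'), (16, 'HindIII'), (22, 'EcoRI')]
print(expected_spacing(4), expected_spacing(6))    # 256 4096
print(can_ligate(("5'", "AATT"), ("5'", "AATT")))  # True  (EcoRI end to EcoRI end)
print(can_ligate(("5'", "AATT"), ("5'", "GATC")))  # False (EcoRI end to BamHI end)
```

The factor of four per additional base in the recognition sequence is why sites with short recognition sequences are encountered far more often than those with longer ones.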
The term "expression" is used in recognition of the fact that an organism seldom if ever makes use of all its genetically endowed capabilities at any given time. Even in relatively simple organisms such as bacteria, many proteins which the cell is capable of synthesizing are not synthesized, although they may be synthesized under appropriate environmental conditions. When the protein product, coded by a given gene, is synthesized by the organism, the gene is said to be expressed. If the protein product is not made, the gene is not expressed. Normally, the expression of genes in E. coli is regulated as described generally, infra, in such manner that proteins whose function is not useful in a given environment are not synthesized and metabolic energy is conserved.
The development of capabilities for synthesizing DNA chemically (see Itakura, et al., supra) has made it possible to synthesize restriction enzyme recognition site sequences (Scheller, et al., supra). A desired oligonucleotide recognition site sequence may be chemically synthesized, attached to the ends of a DNA fragment by blunt end ligation, and treated with the appropriate restriction endonuclease to provide cohesive ends. The fragment may then be inserted in a transfer vector cleaved by the same restriction endonuclease; because the cut ends of the transfer vector DNA and of the heterologous fragment are cohesive, they may be joined together with high efficiency in a ligase catalyzed reaction.
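The linker procedure can be traced step by step on the top strand. In this minimal sketch the enzyme is assumed to be EcoRI, the linker `GGAATTCC`, and the fragment and vector are hypothetical sequences; the bottom strand is not modelled, and the function names are illustrative. The point is only that the fragment ends up bounded by the same cohesive ends as the opened vector, so that recognition sites are re-formed at both junctions after ligation.

```python
SITE, CUT = "GAATTC", 1   # EcoRI (G^AATTC), an illustrative choice of enzyme
LINKER = "GGAATTCC"       # hypothetical synthetic linker carrying the site


def find_sites(seq: str, site: str = SITE) -> list[int]:
    """Start positions of every occurrence of the recognition site."""
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def prepare_insert(fragment: str, linker: str = LINKER) -> str:
    """Blunt-end ligate a linker to each end of the fragment, digest, and
    return the top strand of the middle piece, which now carries a cohesive
    end on each side."""
    if find_sites(fragment):
        raise ValueError("fragment contains an internal recognition site")
    tailed = linker + fragment + linker
    hits = find_sites(tailed)
    return tailed[hits[0] + CUT:hits[-1] + CUT]


def clone(vector: str, insert: str) -> str:
    """Open a vector having exactly one site (searched in its stored linear
    form) at the cut and ligate the insert into the matching cohesive ends."""
    hits = find_sites(vector)
    if len(hits) != 1:
        raise ValueError("vector must contain exactly one recognition site")
    p = hits[0] + CUT
    return vector[:p] + insert + vector[p:]


insert = prepare_insert("ATGCCGTTAGC")         # hypothetical blunt fragment
print(insert)                                  # AATTCCATGCCGTTAGCGG
recombinant = clone("TTTTGAATTCAAAA", insert)  # hypothetical vector
print(find_sites(recombinant))                 # [4, 23]
```

The check in `prepare_insert` reflects a practical limitation of the approach: digestion with the enzyme also cleaves any copy of its site inside the fragment.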
The organization of structural genes coding for individual amino acid sequences within the total genome is a matter of much current interest. It appears that the genome of virtually all organisms, with the exception of certain viruses, consists of both essential and nonessential regions. These terms are functionally defined and depend upon the growth conditions. For example, the ampicillin resistance gene is not essential for growth of bacteria in a medium lacking ampicillin; it is essential if ampicillin is in the medium. In addition to specific structural genes whose essentiality depends upon conditions, there appear to exist regions which are non-coding in the conventional sense and whose presence is not essential. In prior art genetic analyses, such regions have been identified by deletion mutations in which segments of the genome are permanently excised without loss of viability. An example is the b2 region of bacteriophage lambda. In eucaryotes, untranslated regions are found on either side of most structural genes. In addition, current evidence indicates that the structural genes in some instances contain internal non-coding regions, termed introns, whose function is at present unknown (see Crick, F., Science, 204, 264 (1979)).
The concept of essential and non-essential genes is fundamental to the present invention. For purposes of this application, a gene of an autonomously replicating DNA element is essential if, under the growth conditions employed, the function for which it codes is required for continued replication of the element. A region which is non-essential, with reference to a given autonomously replicating element, is one whose function is not required, under the growth conditions, for continued replication of the element. It will be understood that a given region may be defined as essential under one set of growth conditions and non-essential under another. A case in point is the ampicillin resistance gene previously described. A gene which is essential under one set of conditions may be rendered non-essential by complementation. In complementation, the function of a gene provided by one genetic element is duplicated by providing a second gene within the cell coding for the same function. If the first gene should be defective, bearing an insertion, deletion or base change leading to loss of function, growth and replication of the DNA element are still permitted because the second gene provides the normal function. Therefore, by manipulation of growth conditions and/or complementing functions, it is possible to render virtually any gene essential or non-essential.
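The functional definition of essentiality, including its dependence on growth conditions and its removal by complementation, amounts to a simple rule. The sketch below encodes it under a deliberately crude, hypothetical growth model in which replication always requires one function and ampicillin in the medium additionally requires resistance; the class, function and gene names (`rep`, `bla`, `bla2`) are illustrative only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Gene:
    name: str
    function: str
    intact: bool = True  # False after an inactivating insertion, deletion or base change


def required_functions(medium: frozenset[str]) -> set[str]:
    """Hypothetical growth rule: the element always needs a replication
    function; ampicillin in the medium also makes resistance necessary."""
    required = {"replication"}
    if "ampicillin" in medium:
        required.add("ampicillin resistance")
    return required


def replicates(genes: list[Gene], medium: frozenset[str]) -> bool:
    """The element replicates if every required function is supplied by some
    intact gene anywhere in the cell (on the element or complementing it)."""
    supplied = {g.function for g in genes if g.intact}
    return required_functions(medium) <= supplied


def is_essential(gene: Gene, genes: list[Gene], medium: frozenset[str]) -> bool:
    """Essential: the element replicates with the gene intact but not without it."""
    knocked_out = [replace(g, intact=False) if g is gene else g for g in genes]
    return replicates(genes, medium) and not replicates(knocked_out, medium)


rep = Gene("rep", "replication")                 # hypothetical replication gene
bla = Gene("bla", "ampicillin resistance")       # resistance gene on the element
helper = Gene("bla2", "ampicillin resistance")   # complementing copy elsewhere

print(is_essential(bla, [rep, bla], frozenset()))                        # False
print(is_essential(bla, [rep, bla], frozenset({"ampicillin"})))          # True
print(is_essential(bla, [rep, bla, helper], frozenset({"ampicillin"})))  # False
```

The three calls correspond to the three situations described in the text: no selection, selection, and selection with complementation.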
Nucleotide sequence analysis of DNA has proven to be a powerful technique for the elucidation of gene structure and for the prediction of amino acid sequences. The methods of Maxam and Gilbert, supra, and of Sanger, supra, provide powerful tools for rapid sequence analysis. Both methods are presently limited by the need for detailed restriction mapping of the DNA to be sequenced, because both employ gel electrophoresis to separate the oligonucleotides. The length of sequence which can be determined from a given starting point is limited by the resolution of the electrophoresis gels. To extend the sequence beyond the nucleotide length resolvable by current gels, it is necessary to provide new starting points for the analysis by means of restriction endonuclease cuts. Although the sequencing procedures themselves are rapid, obtaining sufficient information about restriction sites is relatively time consuming. Furthermore, as with other aspects of recombinant DNA technology, all experiments must be designed on an ad hoc basis, depending upon what restriction sites exist in the DNA to be sequenced and how they are located with respect to each other. Prior to the present invention, no general method existed for undertaking routine sequence analysis of DNA fragments.
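Why new starting points are needed can be seen from a coverage calculation. The following is a hypothetical model, not a description of either published method: each restriction cut is assumed to allow reading a fixed number of nucleotides, set by gel resolution, in both directions, and the function reports the stretches that no read reaches. The fragment length, cut positions and read length in the example are invented for illustration.

```python
def unread_gaps(length: int, starts: list[int], read_length: int) -> list[tuple[int, int]]:
    """Half-open intervals [a, b) of a linear molecule not covered by any read.

    Hypothetical model: a cut at position s yields two labelled ends, so the
    sequence can be read for up to read_length nucleotides on each side,
    covering [s - read_length, s + read_length) clipped to the molecule.
    The molecule's own ends (0 and length) count only if listed in starts."""
    intervals = sorted(
        (max(0, s - read_length), min(length, s + read_length)) for s in starts
    )
    gaps, covered_to = [], 0
    for a, b in intervals:
        if a > covered_to:
            gaps.append((covered_to, a))
        covered_to = max(covered_to, b)
    if covered_to < length:
        gaps.append((covered_to, length))
    return gaps


# Invented numbers: a 1000-nucleotide fragment, two internal cuts, and a gel
# that resolves 200 nucleotides from each starting point.
print(unread_gaps(1000, [0, 300, 550, 1000], 200))       # [(750, 800)]
print(unread_gaps(1000, [0, 300, 550, 775, 1000], 200))  # []
```

Each remaining gap can be closed only by locating a further cut that falls within reach of it, which in the prior art requires additional restriction mapping.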