The present invention relates to the isolation of a specific nucleotide sequence which contains the genetic information coding for a specific protein, the synthesis of DNA having this specific nucleotide sequence and transfer of that DNA to a microorganism host wherein the DNA may be replicated. More specifically, the present invention relates to the isolation of the insulin gene, its purification, transfer and replication in a microbial host and its subsequent characterization. Novel products are produced according to the present invention. These products include a recombinant plasmid containing the specific nucleotide sequences derived from a higher organism and a novel microorganism containing as part of its genetic makeup a specific nucleotide sequence derived from a higher organism.
The symbols and abbreviations used herein are set forth in the following table:
DNA--deoxyribonucleic acid
RNA--ribonucleic acid
cDNA--complementary DNA (enzymatically synthesized from an mRNA sequence)
mRNA--messenger RNA
tRNA--transfer RNA
dATP--deoxyadenosine triphosphate
dGTP--deoxyguanosine triphosphate
dCTP--deoxycytidine triphosphate
A--adenine
T--thymine
G--guanine
C--cytosine
Tris--2-amino-2-hydroxymethyl-1,3-propanediol
EDTA--ethylenediamine tetraacetic acid
ATP--adenosine triphosphate
TTP--thymidine triphosphate
A=adenine
G=guanine
C=cytosine
T=thymine
X=T or C if Y is A or G
X=C if Y is C or T
Y=A, G, C or T if X is C
Y=A or G if X is T
W=C or A if Z is A or G
W=C if Z is C or T
Z=A, G, C or T if W is C
Z=A or G if W is A
QR=TC if S is A, G, C or T
QR=AG if S is T or C
S=A, G, C or T if QR is TC
S=T or C if QR is AG
J=A or G
K=T or C
L=A, T, C or G
M=A, C or T
DNA is a high molecular weight polymer of biological origin. The structural units of the polymer are deoxyribonucleotides, each of which contains a purine or pyrimidine base to which is linked deoxyribose having a phosphate moiety esterified at the 3' or 5' hydroxyl of the deoxyribose. The polymer is constructed by the linking together of deoxyribonucleotides by the formation of phosphodiester bonds between the 5' position of one nucleotide and the 3' of its next neighbor. A linear polymer of nucleotides is thus formed, sometimes having at one terminus a free 5' phosphate and/or at the other a free 3' phosphate. In some instances one or both of the phosphate termini may be removed by hydrolysis leaving free 5' or 3' hydroxyl ends. The significant feature of this linkage mode is that it results in a polynucleotide strand which is directional in the sense that one end can be distinguished from the other.
There are four purine or pyrimidine bases found in the vast majority of DNAs which have been analyzed. These are the purines, adenine and guanine, and the pyrimidines, cytosine and thymine (hereinafter A, G, C, and T, respectively). It is the specific sequence of the bases A, G, C, and T which confers the biological functions of DNA as the repository of genetic information in a living cell.
The native conformation of DNA is in the form of paired polynucleotide strands of opposite directionality. The strands are held together by the cooperative effect of multiple hydrogen bonding between specific purine-pyrimidine pairs. The molecular sizes and hydrogen bonding angles are such that A and T form a specific pair and G and C form a specific pair. As a result, the base sequence in one strand of native DNA is mirrored by a complementary sequence in the other strand, due to the base pairing relationships just described. By way of illustrating this relationship, a heptanucleotide having the sequence ACCGTTG, reading from the 5' end to the 3' end, will be found paired with a complementary strand having the sequence CAACGGT from 5' to 3'. By convention, however, the native structure is depicted with one strand in the 5' to 3' orientation and the complementary strand in the 3' to 5' orientation:
5' ACCGTTG 3'
3' TGGCAAC 5'
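The pairing rule and the orientation convention can be made concrete with a short Python sketch. The function names `complement` and `reverse_complement` are illustrative choices, not terms from the text; the example reproduces the heptanucleotide given above, once in the conventional 3' to 5' depiction and once rewritten 5' to 3'.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Pair each base in place, so a strand given 5'->3' yields its partner read 3'->5'."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """The partner strand rewritten in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


top = "ACCGTTG"  # 5' -> 3'
print("5'", top, "3'")               # 5' ACCGTTG 3'
print("3'", complement(top), "5'")   # 3' TGGCAAC 5'
print(reverse_complement(top))       # CAACGGT
```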
DNA may exist in several alternate states in addition to the native configuration, a linear double stranded polymer, as described above. It may also exist as individual single strands, and it may exist as a double stranded molecule for a portion of its length but containing single stranded gaps or single stranded ends. Of particular biological significance is the fact that DNA commonly forms closed rings by the formation of phosphodiester bonds between its opposite ends. The 5' end of one strand joins its 3' end by means of a phosphodiester linkage, and a similar linkage is formed between the 5' and 3' ends of the complementary strand. Such rings have been found ranging in molecular weight from less than 1×10^6 to more than 1×10^9. Rings which are not covalently closed can also be formed. A procedure for forming such rings from double stranded linear DNA has been developed in the prior art and will be described in detail below. In general, the procedure involves the addition of complementary sequences to either the 5' or 3' ends of the linear molecule. Such complementary single strand sequences are termed cohesive ends because they are capable of pairing with each other by means of the specific hydrogen bonded base pairing relationships described. When such pairing occurs, under the appropriate conditions of temperature, ionic strength and solvent composition, a double stranded ring can be formed, held in place by the hydrogen bonding interactions of the cohesive ends. Similarly, a small linear piece may be joined with a large linear piece provided the two have cohesive ends of complementary sequence, and the combination can also form a closed ring if both ends of both molecules are mutually cohesive. Enzyme reactions can form covalent bonds joining the ends and stabilizing the structure.
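A minimal sketch of the cohesive-end condition, assuming each single-stranded end is written 5' to 3' and that both ends protrude with the same polarity (both 5' or both 3' single-stranded ends). Because the strands pair antiparallel, two such ends can pair along their full length only when one is the reverse complement of the other. The function `ends_cohere` is hypothetical and ignores the temperature, ionic strength and solvent effects mentioned above.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[base] for base in reversed(seq))


def ends_cohere(end_a: str, end_b: str) -> bool:
    """True if two single-stranded ends (each written 5'->3', same polarity)
    can pair base-for-base along their whole length, antiparallel."""
    return end_a == reverse_complement(end_b)


print(ends_cohere("AATT", "AATT"))  # True: a self-complementary end pairs with a copy of itself
print(ends_cohere("AAAA", "TTTT"))  # True: complementary homopolymer runs
print(ends_cohere("AATT", "AAGG"))  # False
```

The same test covers both ways of generating cohesive ends discussed in this section: self-complementary ends left by an enzyme, and complementary sequences added separately to two molecules.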
The biological significance of the base sequence of DNA, as previously stated, is as a repository of genetic information. It is known that the sequence of bases in DNA is used as a code specifying the amino acid sequence of all proteins made by the cell. In addition, portions of the sequence are used for regulatory purposes, to control the timing and amount of each protein made. The nature of these controlling elements is only partially understood. Finally, the sequence of bases in each strand is used as a template for the replication of DNA which accompanies cell division.
The manner by which base sequence information in DNA is used to determine the amino acid sequence of proteins is a fundamental process which, in its broad outlines, is universal to all living organisms. It has been shown that each amino acid commonly found in proteins is determined by one or more trinucleotide or triplet sequences. Therefore, for each protein, there is a corresponding segment of DNA containing a sequence of triplets corresponding to the protein amino acid sequence. The genetic code is shown in the accompanying table.
In the process of converting the nucleotide sequence information into amino acid sequence structure, a first step, termed transcription, is carried out. In this step, a local segment of DNA having a sequence which specifies the protein to be made is first copied into RNA. RNA is a polynucleotide similar to DNA except that ribose is substituted for deoxyribose and uracil is used in place of thymine. The bases in RNA are capable of entering into the same kind of base pairing relationships that exist with DNA. Consequently, the RNA transcript of a DNA nucleotide sequence will be complementary to the sequence copied. Such RNA is termed messenger RNA (mRNA) because of its status as an intermediary between the genetic apparatus and the protein synthesizing apparatus of the cell. Isolation of intact mRNA is technically extremely difficult due to the presence of the enzyme RNase, which catalyzes the hydrolysis of the phosphodiester bonds in the ribonucleotide sequence. This enzyme is ubiquitous, extremely stable and highly active. The hydrolysis of even a single phosphodiester bond within the mRNA chain cannot be tolerated, since that would destroy the sequence continuity necessary to preserve the genetic information. Within the cell, mRNA is used as a template in a complex process involving a multiplicity of enzymes and organelles, which results in the synthesis of the specified amino acid sequence. This process is referred to as the translation of the mRNA.
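The pairing rule for transcription, complementary base pairing with U substituted for T, can be sketched as follows. The sketch assumes the template (copied) DNA strand is supplied 5' to 3' and returns the mRNA 5' to 3'; it models only the pairing rule, not the enzymology, and `transcribe` is an illustrative name. Using the complementary strand of the earlier heptanucleotide, CAACGGT, as template yields an mRNA with the sequence of ACCGTTG, with U in place of T.

```python
# Ribonucleotide laid down opposite each DNA template base (uracil replaces thymine).
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """mRNA (5'->3') complementary to a DNA template strand given 5'->3'.

    The template is read from its 3' end, because the transcript grows 5'->3'
    antiparallel to the strand being copied."""
    return "".join(RNA_PARTNER[base] for base in reversed(template))


print(transcribe("CAACGGT"))  # ACCGUUG
```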
There are often additional steps, called processing, which are carried out to convert the amino acid sequence synthesized by the translational process into a functional protein. An example is provided in the case of insulin.
______________________________________
Genetic Code
______________________________________
Phenylalanine (Phe)  TTK    Histidine (His)      CAK
Leucine (Leu)        XTY    Glutamine (Gln)      CAJ
Isoleucine (Ile)     ATM    Asparagine (Asn)     AAK
Methionine (Met)     ATG    Lysine (Lys)         AAJ
Valine (Val)         GTL    Aspartic acid (Asp)  GAK
Serine (Ser)         QRS    Glutamic acid (Glu)  GAJ
Proline (Pro)        CCL    Cysteine (Cys)       TGK
Threonine (Thr)      ACL    Tryptophan (Try)     TGG
Alanine (Ala)        GCL    Arginine (Arg)       WGZ
Tyrosine (Tyr)       TAK    Glycine (Gly)        GGL
Termination signal   TAJ    Termination signal   TGA
______________________________________
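Key: Each 3-letter triplet represents a trinucleotide of mRNA, having a 5' end on the left and a 3' end on the right, written with T in place of U as in the corresponding DNA sequence. The letters stand for the purine or pyrimidine bases forming the nucleotide sequence; the degenerate symbols (J, K, L, M and the paired symbols X/Y, W/Z and QR/S) are defined in the table of symbols and abbreviations.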
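The compact notation of the table can be checked by expanding it. In this sketch the conditional symbols have been resolved by hand into unconditional patterns (XTY becomes TTJ and CTL, WGZ becomes CGL and AGJ, QRS becomes TCL and AGK); the dictionary names and the `translate` helper are illustrative, and the table's abbreviation Try is kept for tryptophan. Expansion should give each of the 64 triplets exactly once, and translation reads triplets from the 5' end until a termination signal. The test sequences are arbitrary, not insulin sequences.

```python
from itertools import product

# Single-letter degenerate symbols from the table of abbreviations.
SYMBOLS = {"A": "A", "C": "C", "G": "G", "T": "T",
           "J": "AG", "K": "TC", "L": "ATCG", "M": "ACT"}

# Genetic code table with the conditional pairs resolved by hand:
# XTY -> TTJ, CTL; WGZ -> CGL, AGJ; QRS -> TCL, AGK.
COMPRESSED = {
    "Phe": ["TTK"], "Leu": ["TTJ", "CTL"], "Ile": ["ATM"], "Met": ["ATG"],
    "Val": ["GTL"], "Ser": ["TCL", "AGK"], "Pro": ["CCL"], "Thr": ["ACL"],
    "Ala": ["GCL"], "Tyr": ["TAK"], "His": ["CAK"], "Gln": ["CAJ"],
    "Asn": ["AAK"], "Lys": ["AAJ"], "Asp": ["GAK"], "Glu": ["GAJ"],
    "Cys": ["TGK"], "Try": ["TGG"], "Arg": ["CGL", "AGJ"], "Gly": ["GGL"],
    "Stop": ["TAJ", "TGA"],
}


def expand(pattern: str) -> list[str]:
    """All concrete triplets matched by a pattern of degenerate symbols."""
    return ["".join(bases) for bases in product(*(SYMBOLS[s] for s in pattern))]


CODON_TABLE: dict[str, str] = {}
for residue, patterns in COMPRESSED.items():
    for pattern in patterns:
        for codon in expand(pattern):
            if codon in CODON_TABLE:
                raise ValueError(f"{codon} assigned twice")
            CODON_TABLE[codon] = residue


def translate(seq: str) -> list[str]:
    """Read triplets from the 5' end until a termination signal (U is read as T)."""
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[i:i + 3]]
        if residue == "Stop":
            break
        protein.append(residue)
    return protein


print(len(CODON_TABLE))          # 64
print(translate("ATGTTTTGGTAA"))  # ['Met', 'Phe', 'Try']
print(translate("AUGCUAAGA"))     # ['Met', 'Leu', 'Arg']
```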
The immediate precursor of insulin is a single polypeptide, termed proinsulin, which contains the two insulin chains A and B connected by another peptide, C. See Steiner, D. F., Cunningham, D., Spigelman, L. and Aten, B., Science 157, 697 (1967). Recently it has been reported that the initial translation product of insulin mRNA is not proinsulin itself, but a preproinsulin that contains more than 20 additional amino acids on the amino terminus of proinsulin. See Chan, S. J., Keim, P. and Steiner, D. F., Proc. Natl. Acad. Sci. USA 73, 1964 (1976) and Lomedico, P. T. and Saunders, G. F., Nucl. Acids Res. 3, 381 (1976). The structure of the preproinsulin molecule can be represented schematically as NH2-(pre-peptide)-B chain-(C peptide)-A chain-COOH.
Many proteins of medical or research significance are found in or made by the cells of higher organisms such as vertebrates. These include, for example, the hormone insulin, other peptide hormones such as growth hormone, proteins involved in the regulation of blood pressure, and a variety of enzymes having industrial, medical or research significance. It is frequently difficult to obtain such proteins in usable quantities by extraction from the organism, and this problem is especially acute in the case of proteins of human origin. Therefore there is a need for techniques whereby such proteins can be made by cells outside the organism in reasonable quantity. In certain instances, it is possible to obtain appropriate cell lines which can be maintained by the techniques of tissue culture. However, the growth of cells in tissue culture is slow, the medium is expensive, conditions must be accurately controlled, and yields are low. Moreover, it is often difficult to maintain a cultured cell line having the desired differentiated characteristics.
In contrast, microorganisms such as bacteria are relatively easy to grow in chemically defined media. Fermentation technology is highly advanced, and can be well controlled. Growth of organisms is rapid and high yields are possible. In addition, certain microorganisms have been thoroughly characterized genetically and in fact are among the best characterized and best understood organisms.
Therefore it is highly desirable to achieve the transfer of a gene coding for a protein of medical significance, from an organism which normally makes the protein to an appropriate microorganism. In this way it is possible that the protein could eventually be made by the microorganism, under controlled conditions of growth, and obtained in the desired quantities. It is also possible that substantial reductions in the over-all costs of producing the desired protein could be achieved by such a process. In addition, the ability to isolate and transfer the genetic sequence which determines the production of a particular protein into a microorganism having a well-defined genetic background could provide a research tool of great value to the study of how the synthesis of such a protein is controlled and how the protein is processed after synthesis.
The present invention provides a means for achieving the above-recited goals. A process is disclosed comprising a complex series of steps that include enzyme-catalyzed reactions. The nature of these enzyme reactions, as they are understood in the prior art, is described below.
Reverse transcriptase catalyzes the synthesis of DNA complementary to an RNA template strand in the presence of the RNA template, an oligodeoxynucleotide primer and the four deoxynucleoside triphosphates, dATP, dGTP, dCTP, and TTP. The reaction is initiated by the non-covalent bonding of the oligodeoxynucleotide primer to the 3' end of mRNA, followed by stepwise addition of the appropriate deoxynucleotides, as determined by base pairing relationships with the mRNA nucleotide sequence, to the 3' end of the growing chain. The product molecule may be described as a hairpin structure containing the original RNA together with a complementary strand of DNA joined to it by a single stranded loop of DNA. Reverse transcriptase is also capable of catalyzing a similar reaction using a single stranded DNA template, in which case the resulting product is a double stranded DNA hairpin having a loop of single stranded DNA joining one set of ends. See Aviv, H. and Leder, P., Proc. Natl. Acad. Sci. USA 69, 1408 (1972) and Efstratiadis, A., Kafatos, F. C., Maxam, A. M. and Maniatis, T., Cell 7, 279 (1976).
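The stepwise primer extension can be sketched as a string operation. Assumptions not stated in the text: the mRNA ends in a poly(A) tail, so that an oligo(dT) primer can anneal at its 3' end, and the sketch stops at the first-strand product without modelling the RNA partner in the hybrid or the hairpin loop. `first_strand_cdna` is a hypothetical name.

```python
# Deoxynucleotide added opposite each ribonucleotide of the template.
DNA_PARTNER = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str, primer: str) -> str:
    """Extend a DNA primer annealed at the 3' end of an mRNA (both written 5'->3').

    Returns the first-strand cDNA, 5'->3'. Each loop step adds one
    deoxynucleotide to the 3' end of the growing chain, chosen by base
    pairing with the next template base toward the mRNA's 5' end."""
    if not primer or len(primer) > len(mrna):
        raise ValueError("primer must be non-empty and no longer than the template")
    site = mrna[-len(primer):]
    if primer != "".join(DNA_PARTNER[b] for b in reversed(site)):
        raise ValueError("primer does not pair with the 3' end of the mRNA")
    chain = primer
    for base in reversed(mrna[:-len(primer)]):
        chain += DNA_PARTNER[base]
    return chain


# Hypothetical mRNA ending in a short poly(A) tail, primed with oligo(dT).
print(first_strand_cdna("GGCAUGAAAAAA", "TTTTTT"))  # TTTTTTCATGCC
```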
Restriction endonucleases are enzymes capable of hydrolyzing phosphodiester bonds in double stranded DNA, thereby creating a break in the continuity of the DNA strand. If the DNA is in the form of a closed loop, the loop is converted to a linear structure. The principal feature of an enzyme of this type is that its hydrolytic action is exerted only at a point where a specific nucleotide sequence occurs. Such a sequence is termed the recognition site for the restriction endonuclease. Restriction endonucleases from a variety of sources have been isolated and characterized in terms of the nucleotide sequence of their recognition sites. Some restriction endonucleases hydrolyze the phosphodiester bonds on both strands at the same point, producing blunt ends. Others catalyze hydrolysis of bonds separated by a few nucleotides from each other, producing free single stranded regions at each end of the cleaved molecule. Such single stranded ends are self-complementary, hence cohesive, and may be used to rejoin the hydrolyzed DNA. Since any DNA susceptible to cleavage by such an enzyme must contain the same recognition site, the same cohesive ends will be produced, so that it is possible to join heterologous sequences of DNA which have been treated with a restriction endonuclease to other sequences similarly treated. See Roberts, R. J., Crit. Rev. Biochem. 4, 123 (1976). Restriction sites are relatively rare; however, the general utility of restriction endonucleases has been greatly amplified by the chemical synthesis of double stranded oligonucleotides bearing the restriction site sequence. Therefore virtually any segment of DNA can be coupled to any other segment simply by attaching the appropriate restriction oligonucleotide to the ends of the molecule and subjecting the product to the hydrolytic action of the appropriate restriction endonuclease, thereby producing the requisite cohesive ends. See Heyneker, H. L., Shine, J., Goodman, H. M., Boyer, H. W., Rosenberg, J., Dickerson, R. E., Narang, S. A., Itakura, K., Lin, S. and Riggs, A. D., Nature 263, 748 (1976) and Scheller, R. H., Dickerson, R. E., Boyer, H. W., Riggs, A. D. and Itakura, K., Science 196, 177 (1977).
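A sketch of staggered cleavage at a palindromic recognition site follows. EcoRI, which recognizes GAATTC and cuts each strand between G and A, is used purely as an illustrative default; the text does not name an enzyme here. The bottom strand is written 3' to 5' beneath the top strand, and each fragment is returned as a (top, bottom) pair, so the four-base single-stranded ends appear as the length difference between the paired strings. `digest` is a hypothetical helper that assumes copies of the site do not overlap.

```python
# Pairing partners, used to write the bottom strand 3'->5' beneath the top strand.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def digest(top: str, site: str = "GAATTC", cut: int = 1) -> list[tuple[str, str]]:
    """Cleave a duplex at every copy of a palindromic recognition site.

    `top` is the top strand 5'->3'; the bottom strand is its complement, written
    3'->5' and aligned beneath it. The top strand is cut `cut` bases into each
    site; because the site is palindromic, the bottom strand is cut `cut` bases
    in from the other side, leaving staggered single-stranded ends."""
    bottom = "".join(PAIR[b] for b in top)
    starts = [i for i in range(len(top) - len(site) + 1) if top.startswith(site, i)]
    top_cuts = [s + cut for s in starts]
    bottom_cuts = [s + len(site) - cut for s in starts]

    def split(strand: str, cuts: list[int]) -> list[str]:
        bounds = [0] + cuts + [len(strand)]
        return [strand[a:b] for a, b in zip(bounds, bounds[1:])]

    return list(zip(split(top, top_cuts), split(bottom, bottom_cuts)))


for top_piece, bottom_piece in digest("AAGAATTCTT"):
    print(top_piece, "/", bottom_piece)
# AAG / TTCTTAA    (bottom strand protrudes: 3'-TTAA-5', i.e. 5'-AATT-3')
# AATTCTT / GAA    (top strand protrudes: 5'-AATT-3')
```

Both fragments carry the same AATT single-stranded end, which is why any two molecules cut with the same enzyme can be rejoined.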
S1 endonuclease is an enzyme of general specificity capable of hydrolyzing the phosphodiester bonds of single stranded DNA or of single stranded gaps or loops in otherwise double stranded DNA. See Vogt, V. M., Eur. J. Biochem, 33, 192 (1973).
DNA ligase is an enzyme capable of catalyzing the formation of a phosphodiester bond between two segments of DNA having a 5' phosphate and a 3' hydroxyl, respectively, such as might be formed by two DNA fragments held together by means of cohesive ends. The normal function of the enzyme is thought to be in the joining of single strand nicks in an otherwise double stranded DNA molecule. However, under appropriate conditions, DNA ligase is capable of catalyzing blunt end ligation in which two molecules having blunt ends are covalently joined. See Sgaramella, V., Van de Sande, J. H., and Khorana, H. G., Proc. Natl. Acad. Sci. USA 67, 1468 (1970).
Alkaline phosphatase is an enzyme of general specificity capable of hydrolyzing phosphate esters including 5' terminal phosphates on DNA.
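The requirements in the two preceding paragraphs can be combined into a small model. This is a hypothetical sketch: each duplex end is reduced to its single-stranded end and whether its 5'-terminal phosphate is present, a 3' hydroxyl is assumed on every strand, and the function reports how many of the two strands at a junction DNA ligase could seal. Each end contributes the 5' terminus of one strand at the junction, so the count equals the number of phosphorylated ends. The usage with phosphatase-treated ends is an illustrative scenario, not a step stated in this section.

```python
from dataclasses import dataclass

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(PAIR[b] for b in reversed(seq))


@dataclass
class End:
    """One end of a DNA duplex, reduced to what ligation depends on (hypothetical model)."""
    overhang: str = ""          # single-stranded end, 5'->3'; "" means a blunt end
    phosphate_5: bool = True    # False after alkaline phosphatase treatment


def strands_sealed(a: End, b: End, blunt_conditions: bool = False) -> int:
    """How many of the two strands at the a/b junction DNA ligase could join (0-2)."""
    if a.overhang or b.overhang:
        # cohesive ends must pair to hold the fragments together
        aligned = a.overhang == reverse_complement(b.overhang)
    else:
        # blunt ends join only under the special conditions noted in the text
        aligned = blunt_conditions
    if not aligned:
        return 0
    # each strand's bond needs a 5' phosphate on one side (3'-OH assumed present)
    return int(a.phosphate_5) + int(b.phosphate_5)


vector_end = End("AATT", phosphate_5=False)    # phosphatase-treated ends
insert_end = End("AATT")                       # untreated fragment
print(strands_sealed(vector_end, vector_end))  # 0: neither strand can be sealed
print(strands_sealed(vector_end, insert_end))  # 1: one strand sealed, a nick remains
```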
A further step in the overall process to be described is the insertion of a specific DNA fragment into a DNA vector, such as a plasmid. Plasmid is the term applied to any autonomously replicating DNA unit which might be found in a microbial cell, other than the genome of the host cell itself. A plasmid is not genetically linked to the chromosome of the host cell. Plasmid DNAs exist as double stranded ring molecules generally on the order of a few million molecular weight, although some are greater than 10^8 molecular weight, and they usually represent only a small percentage of the total DNA of the cell. Plasmid DNA is usually separable from host cell DNA by virtue of the great difference in size between them. Plasmids can replicate independently of the rate of host cell division, and in some cases their replication rate can be controlled by the investigator by variations in the growth conditions. Although the plasmid exists as a closed ring, it is possible by artificial means to introduce a segment of DNA into the plasmid, forming a recombinant plasmid with enlarged molecular size, without substantially affecting its ability to replicate or to express whatever genes it may carry. The plasmid therefore serves as a useful vector for transferring a segment of DNA into a new host cell. Plasmids which are useful for recombinant DNA technology typically contain genes which may be useful for selection purposes, such as genes for drug resistance.
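Opening a plasmid at a single recognition site and splicing in a fragment with matching cohesive ends can be followed at the sequence level. This is a sketch under stated assumptions: sequences are top strands only, the plasmid is written from an arbitrary point on the circle, the EcoRI-type site and cut position are illustrative defaults, and the insert is assumed to be a fragment cut at both ends by the same enzyme and ligated in one orientation (the opposite orientation is equally possible in practice). `insert_into_plasmid` is a hypothetical helper, and the sequences used are toy strings.

```python
def insert_into_plasmid(plasmid: str, insert: str,
                        site: str = "GAATTC", cut: int = 1) -> str:
    """Open a circular plasmid at its single recognition site and ligate in a fragment.

    Sequences are top strands (5'->3'); `plasmid` is written from an arbitrary
    point on the circle. `insert` is assumed to be a fragment cut at both ends by
    the same enzyme, so it starts with site[cut:] and ends with site[:cut].
    Returns the recombinant circle, written from the opened site."""
    n = len(plasmid)
    wrapped = plasmid + plasmid[:len(site) - 1]   # lets a site span the start point
    hits = [i for i in range(n) if wrapped.startswith(site, i)]
    if len(hits) != 1:
        raise ValueError(f"expected one {site} site, found {len(hits)}")
    if not (insert.startswith(site[cut:]) and insert.endswith(site[:cut])):
        raise ValueError("insert does not carry the matching cohesive ends")
    start = (hits[0] + cut) % n
    linear = plasmid[start:] + plasmid[:start]    # linearized: begins site[cut:], ends site[:cut]
    return linear + insert                        # both junctions regenerate the site


recombinant = insert_into_plasmid("TTTGAATTCCCC", "AATTCAAAG")
print(recombinant, len(recombinant))  # AATTCCCCTTTGAATTCAAAG 21
```

The sketch insists on exactly one site because a plasmid cut at two sites would fall into pieces rather than simply open, and the result is longer than the plasmid by exactly the length of the insert, the "enlarged molecular size" noted above.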
In addition to the specialized techniques of the prior art just described, the present work also entails the use of numerous conventional techniques known in the art including chromatography, electrophoresis, centrifugation, solvent extraction, and precipitation. Reference is made to such specific techniques in the examples.
For general background see Watson, J. D., The Molecular Biology of the Gene, 3d Ed., Benjamin, Menlo Park, Calif., (1976); Davidson, J. N., The Biochemistry of the Nucleic Acids, 8th Ed., Revised by Adams, R. L. P., Burdon, R. H., Campbell, A. M. and Smellie, R. M. S., Academic Press, New York, (1976); and Hayes, W., "The Genetics of Bacteria and Their Viruses", Studies in Basic Genetics and Molecular Biology, 2d Ed., Blackwell Scientific Pub., Oxford (1968).
To illustrate the practice of the present invention, the isolation and transfer of the rat insulin gene is described in detail. Insulin was chosen for this effort because of its central significance from the standpoint of clinical medicine and from the standpoint of basic research. The disclosed procedure is applicable by those of ordinary skill in the art to the isolation of the insulin gene of other organisms, including humans.
Insulin was first isolated in 1922. At present, the use of this hormone in the treatment of diabetes is well known. Although slaughterhouses provide beef and pig pancreases as insulin sources, a shortage of the hormone is developing as the number of diabetics increases worldwide. Moreover, some diabetics develop an allergic reaction to beef and pig insulin, with deleterious effects. The ability to produce human insulin in quantities sufficient to satisfy world needs is therefore highly desirable. Manufacturing human insulin in bacteria is a technique which could achieve this goal. However, prior to the present invention, progress toward this goal was thwarted by the fact that no technique had been developed to introduce the insulin gene into bacteria. The present invention provides such a technique.
Further research is required before it is possible to make proteins, like insulin, on a commercial scale from bacteria which have received a specific DNA sequence that is the genetic determinant of that protein. Whether or not a gene within a cell makes protein depends on many factors, including the position and orientation of the DNA relative to special sequences of the host DNA that tell the host cell when to start and stop making protein. The first steps, isolating the appropriate gene and transferring it to bacteria, are now achievable by the processes of the present invention. These processes are described in detail for the insulin gene.
In addition to its direct usefulness in the production of proteins of therapeutic interest by microorganisms, the process of the present invention is useful in research aimed at gaining a further understanding of how the expression of insulin genes is controlled in normal and pathological states such as diabetes. Little is currently known about the nature of such control. Although insulin is composed of two polypeptide chains, designated A and B, it is the product of a single gene. Insulin is produced specifically by certain endocrine cells, termed B cells, in the pancreas. The B cells are found as part of certain histologically distinct structures within the pancreas known as the islets of Langerhans, where they comprise the majority of cells.
The ability to obtain DNA having a specific sequence which is the genetic code for a specific protein makes it possible to modify the nucleotide sequence by chemical or biological means such that the specific protein ultimately produced is also modified. This would make it possible to produce, for example, a modified insulin tailored to suit a specific medical need. The genetic capacity to produce any insulin-related amino acid sequence having the essential functional properties of insulin may therefore be conferred upon a microorganism.
The ability to transfer the genetic code for a specific protein necessary to the normal metabolism of a particular higher organism to a microorganism such as a bacterium opens significant possibilities for culture production of such proteins. This in turn affords significant possibilities for augmenting or replacing the output of such proteins with those produced by microorganisms altered pursuant to this invention, whenever the ability of the higher organism to produce such proteins normally has been impaired. It also suggests, e.g., the possibility of establishing symbiotic relationships between microorganisms produced pursuant to this invention and human beings with chronic or acute deficiency diseases, whereby microorganisms genetically altered as herein taught might be implanted in or otherwise associated with a human to compensate for the pathologic deficiency in the metabolism of the latter.