In the past decade, the science of genetic engineering has developed rapidly. A variety of processes are known for inserting a heterologous gene into bacteria, whereby the bacteria become capable of efficient expression of the inserted genes. Such processes normally involve the use of plasmids which may be cleaved at one or more selected cleavage sites by restriction endonucleases, discussed below. Typically, a gene of interest is obtained by cleaving one piece of DNA and the resulting DNA fragment is mixed with a fragment obtained by cleaving a vector such as a plasmid. The different strands of DNA are then connected ("ligated") to each other to form a reconstituted plasmid. See, for example, U.S. Pat. No. 4,237,224 (Cohen and Boyer, 1980); U.S. Pat. No. 4,264,731 (Shine, 1981); U.S. Pat. No. 4,273,875 (Manis, 1981); U.S. Pat. No. 4,322,499 (Baxter et al., 1982), and U.S. Pat. No. 4,336,336 (Silhavy et al., 1982). A variety of other reference works are also available. Some of these works describe the natural processes whereby DNA is transcribed into messenger RNA (mRNA) and mRNA is translated into protein; see, e.g., Stryer, 1981 (note: all references cited herein, other than patents, are listed with citations after the Examples); Lehninger, 1975. Other works describe methods and products of genetic manipulation; see, e.g., Maniatis et al., 1982; Setlow and Hollaender, 1979.
Most of the genetic engineering work performed to date involves the insertion of genes into various types of cells: primarily bacteria such as E. coli, but also other types of microorganisms such as yeast, and mammalian cells. However, many of the techniques and substances used for genetic engineering of animal cells and microorganisms are not directly applicable to genetic engineering involving plants.
As used herein, the term "plant" refers to a multicellular differentiated organism that is capable of photosynthesis, such as angiosperms and multicellular algae. This does not include microorganisms, such as bacteria, yeast, and fungi. However, the term "plant cells" includes any cell derived from a plant; this includes undifferentiated tissue such as callus or crown gall tumor, as well as plant seeds, propagules, pollen, and plant embryos.
A variety of plant genes have been isolated, some of which have been published and/or are publicly available. Such genes include the soybean actin gene (Shah et al., 1982), corn zein (Pederson et al., 1982), soybean leghemoglobin (Hyldig-Nielsen et al., 1982), and soybean storage proteins (Fischer and Goldberg, 1982).
The Regions of a Gene
The expression of a gene involves the creation of a polypeptide which is coded for by the gene. This process involves at least two steps: part of the gene is transcribed to form messenger RNA, and part of the mRNA is translated into a polypeptide. Although the processes of transcription and translation are not fully understood, it is believed that the transcription of a DNA sequence into mRNA is controlled by several regions of DNA. Each region is a series of bases (i.e., a series of nucleotide residues comprising adenosine (A), thymidine (T), cytidine (C), and guanosine (G)) which are in a desired sequence. Regions which are usually present in a eucaryotic gene are shown on FIG. 1. These regions have been assigned names for use herein, and are briefly discussed below. It should be noted that a variety of terms are used in the literature, which describes these regions in much more detail.
An association region 2 causes RNA polymerase to associate with the segment of DNA. Transcription does not occur at association region 2; instead, the RNA polymerase normally travels along an intervening region 4 for an appropriate distance, such as about 100-300 bases, after it is activated by association region 2.
A transcription initiation sequence 6 directs the RNA polymerase to begin synthesis of mRNA. After it recognizes the appropriate signal, the RNA polymerase is believed to begin the synthesis of mRNA an appropriate distance, such as about 20 to about 30 bases, beyond the transcription initiation sequence 6. This is represented in FIG. 1 by intervening region 8.
The foregoing sequences are referred to collectively as the promoter region of the gene.
The next sequence of DNA is transcribed by RNA polymerase into messenger RNA which is not translated into protein. In general, the 5' end of a strand of mRNA attaches to a ribosome. In bacterial cells, this attachment is facilitated by a sequence of bases called a "ribosome binding site" (RBS). However, in eucaryotic cells, no such RBS sequence is known to exist. Regardless of whether an RBS exists in a strand of mRNA, the mRNA moves through the ribosome until a "start codon" is encountered. The start codon is usually the series of three bases, AUG; rarely, the codon GUG may cause the initiation of translation. The non-translated portion of mRNA located between the 5' end of the mRNA and the start codon is referred to as the 5' non-translated region 10 of the mRNA. The corresponding sequence in the DNA is also referred to herein as 5' non-translated region 12. The specific series of bases in this sequence is not believed to be of great importance to the expression of the gene; however, the presence of a premature start codon might affect the translation of the mRNA (see Kozak, 1978).
A promoter sequence may be significantly more complex than described above; for example, certain promoters present in bacteria contain regulatory sequences that are often referred to as "operators." Such complex promoters may contain one or more sequences which are involved in induction or repression of the gene. One example is the lac operon, which normally does not promote transcription of certain lactose-utilizing enzymes unless lactose is present in the cell. Another example is the trp operator, which does not promote transcription or translation of certain tryptophan-creating enzymes if an excess of tryptophan is present in the cell. See, e.g., Miller and Reznikoff, 1982.
The next sequence of bases is usually called the coding sequence or the structural sequence 14 (in the DNA molecule) or 16 (in the mRNA molecule). As mentioned above, the translation of a polypeptide begins when the mRNA start codon, usually AUG, reaches the translation mechanism in the ribosome. The start codon directs the ribosome to begin connecting a series of amino acids to each other by peptide bonds to form a polypeptide, starting with methionine, which always forms the amino terminal end of the polypeptide (the methionine residue may be subsequently removed from the polypeptide by other enzymes). The bases which follow the AUG start codon are divided into sets of 3, each of which is a codon. The "reading frame," which specifies how the bases are grouped together into sets of 3, is determined by the start codon. Each codon codes for the addition of a specific amino acid to the polypeptide being formed. The entire genetic code (there are 64 different codons) has been solved; see, e.g., Lehninger, supra, at p. 962. For example, CUA is the codon for the amino acid leucine; GGU specifies glycine, and UGU specifies cysteine.
Three of the codons (UAA, UAG, and UGA) are "stop" codons; when a stop codon reaches the translation mechanism of a ribosome, the polypeptide that was being formed disengages from the ribosome, and the last preceding amino acid residue becomes the carboxyl terminal end of the polypeptide.
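By way of illustration only, the translation rules described above (a reading frame fixed by the first AUG, one amino acid per three-base codon, and termination at UAA, UAG, or UGA) can be sketched in Python. The function name `translate`, the one-letter amino acid abbreviations, and the example mRNA string are assumptions introduced for this sketch; the sketch ignores the rare GUG start codon and all ribosome-level detail.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated in
# U, C, A, G order (third base varying fastest); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string from its first AUG to the first in-frame stop.

    Returns the polypeptide as one-letter amino acid codes, or an empty
    string if the mRNA contains no AUG start codon.
    """
    start = mrna.find("AUG")  # the start codon sets the reading frame
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA: release the polypeptide
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical mRNA: 5' non-translated "GG", then AUG CUA GGU UGU UAA.
print(translate("GGAUGCUAGGUUGUUAAGG"))
```

In this hypothetical example the bases before AUG and after UAA play the role of the 5' and 3' non-translated regions, and the product is methionine, leucine, glycine, cysteine, matching the codon assignments given above for CUA, GGU, and UGU.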
The region of mRNA which is located on the 3' side of a stop codon in a monocistronic gene is referred to herein as 3' non-translated region 18. This region is believed to be involved in the processing, stability, and/or transport of the mRNA after it is transcribed. This region 18 is also believed to contain a sequence, polyadenylation signal 20, which is recognized by an enzyme in the cell. This enzyme adds a substantial number of adenosine residues to the mRNA molecule, to form poly-A tail 22.
The DNA molecule has a 3' non-translated region 24 and a polyadenylation signal 26, which code for the corresponding mRNA region 18 and signal 20. However, the DNA molecule does not have a poly-A tail. Polyadenylation signals 20 (mRNA) and 26 (DNA) are represented in the figures by a heavy dot.
Gene-Host Incompatibility
The same genetic code is utilized by all living organisms on Earth. Plants, animals, and microorganisms all utilize the same correspondence between codons and amino acids. However, the genetic code applies only to the structural sequence of a gene, i.e., the segment of mRNA bounded by one start codon and one stop codon which codes for the translation of mRNA into polypeptides.
However, a gene which performs efficiently in one type of cell may not perform at all in a different type of cell. For example, a gene which is expressed in E. coli may be transferred into a different type of bacterial cell, a fungus, or a yeast. However, the gene might not be expressed in the new host cell. There are numerous reasons why an intact gene which is expressed in one type of cell might not be expressed in a different type of cell. See, e.g., Sakaguchi and Okanishi, 1981. Such reasons include:
1. The gene might not be replicated or stably inherited by the progeny of the new host cell.
2. The gene might be broken apart by restriction endonucleases or other enzymes in the new host cell.
3. The promoter region of the gene might not be recognized by the RNA polymerases in the new host cell.
4. One or more regions of the gene might be bound by a repressor protein or other molecule in the new host cell, because of a DNA region which resembles an operator or other regulatory sequence of the host's DNA. For example, the lac operon includes a polypeptide which binds to a particular sequence of bases next to the lac promoter unless the polypeptide is itself inactivated by lactose. See, e.g., Miller and Reznikoff, 1982.
5. One or more regions of the gene might be deleted, reorganized, or relocated to a different part of the host's genome. For example, numerous procaryotic cells are known to contain enzymes which promote genetic recombination (such as the rec proteins in E. coli; see, e.g., Shibata et al., 1979) and transposition (see, e.g., The 45th Cold Spring Harbor Symposium on Quantitative Biology, 1981). In addition, naturally-occurring genetic modification can be enhanced by regions of homology between different strands of DNA; see, e.g., Radding, 1978.
6. mRNA transcribed from the gene may suffer from a variety of problems. For example, it might be degraded before it reaches the ribosome, or it might not be polyadenylated or transported to the ribosome, or it might not interact properly with the ribosome, or it might contain an essential sequence which is deleted by RNA processing enzymes.
7. The polypeptide which is created by translation of the mRNA coded for by the gene may suffer from a variety of problems. For example, the polypeptide may have a toxic effect on the cell, or it may be glycosylated or converted into an altered polypeptide, or it may be cleaved into shorter polypeptides or amino acids, or it may be sequestered within an intracellular compartment where it is not functional.
In general, the likelihood of a foreign gene being expressed in a cell tends to be lower if the new host cell is substantially different from the natural host cell. For example, a gene from a certain species of bacteria is likely to be expressed by other species of bacteria within the same genus. The gene is less likely to be expressed by bacteria of a different genus, and even less likely to be expressed by non-bacterial microorganisms such as yeast, fungus, or algae. It is very unlikely that a gene from a cell of one kingdom (the three kingdoms are plants, animals, and "protista" (microorganisms)) could be expressed in cells from either other kingdom.
These and other problems have, until now, thwarted efforts to obtain expression of foreign genes in plant cells. For example, several research teams have reported the insertion of foreign DNA into plant cells; see, e.g., Lurquin, 1979; Krens et al., 1982; Davey et al., 1980. At least three teams of researchers have reported the insertion of entire genes into plant cells. By use of radioactive DNA probes, these researchers have reported that the foreign genes (or at least portions thereof) were stably inherited by the descendants of the plant cells. See Hernalsteens et al., 1980; Garfinkel et al., 1981; Matzke and Chilton, 1981. However, there was no reported evidence that the foreign genes were expressed in the plant cells.
Several natural exceptions to the gene-host incompatibility barriers have been discovered. For example, several E. coli genes can be expressed in certain types of yeast cells, and vice-versa. See Beggs, 1978; Struhl et al., 1979.
In addition, certain types of bacterial cells, including Agrobacterium tumefaciens and A. rhizogenes, are capable of infecting various types of plant cells, causing plant diseases such as crown gall tumor and hairy root disease. These Agrobacterium cells carry plasmids, designated as Ti plasmids and Ri plasmids, which carry genes which are expressed in plant cells. Certain of these genes code for enzymes which create substances called "opines," such as octopine, nopaline, and agropine. Opines are utilized by the bacterial cells as sources of carbon, nitrogen, and energy. See, e.g., Petit and Tempe, 1978. The opine genes are believed to be inactive while in the bacterial cells; these genes are expressed only after they enter the plant cells.
In addition, a variety of man-made efforts have been reported to overcome one or more of the gene-host incompatibility barriers. For example, it has been reported that a mammalian polypeptide which is normally degraded within a bacterial host can be protected from degradation by coupling the mammalian polypeptide to a bacterial polypeptide that normally exists in the host cell. This creates a "fusion protein;" see, e.g., Itakura et al., 1977. As another example, in order to avoid cleavage of an inserted gene by endonucleases in the host cell, it is possible to either (1) insert the gene into host cells which are deficient in one or more endonucleases, or (2) duplicate the gene in cells which cause the gene to be methylated. See, e.g., Maniatis et al., 1981.
In addition, various efforts to overcome gene-host incompatibility barriers involve chimeric genes. For example, a structural sequence which codes for a mammalian polypeptide, such as insulin, interferon, or growth hormone, may be coupled to regulatory sequences from a bacterial gene. The resulting chimeric gene may be inserted into bacterial cells, where it will express the mammalian polypeptide. See, e.g., Guarente et al., 1980. Alternatively, structural sequences from several bacterial genes have been coupled to regulatory sequences from viruses which are capable of infecting mammalian cells. The resulting chimeric genes were inserted into mammalian cells, where they reportedly expressed the bacterial polypeptide. See Southern and Berg, 1982; Colbere-Garapin et al., 1981.
Restriction Endonucleases
In general, an endonuclease is an enzyme which is capable of breaking DNA into segments of DNA. An endonuclease is capable of attaching to a strand of DNA somewhere in the middle of the strand, and breaking it. By comparison, an exonuclease removes nucleotides from the end of a strand of DNA. All of the endonucleases discussed herein are capable of breaking double-stranded DNA into segments. This may require the breakage of two types of bonds: (1) covalent bonds between phosphate groups and deoxyribose residues, and (2) hydrogen bonds (A-T and C-G) which hold the two strands of DNA to each other.
A "restriction endonuclease" (hereafter referred to as an endonuclease) breaks a segment of DNA at a precise sequence of bases. For example, EcoRI and HaeIII recognize and cleave the following sequences: ##STR1##
In the examples cited above, the EcoRI cleavage created a "cohesive" end with a 5' overhang (i.e., the single-stranded "tail" has a 5' end rather than a 3' end). Cohesive ends can be useful in promoting desired ligations. For example, an EcoRI end is more likely to anneal to another EcoRI end than to a HaeIII end.
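As a rough illustration of sequence-specific cleavage, the following Python sketch cuts the top strand of a DNA string at each recognition site. The recognition sequences and cut positions used here (EcoRI: GAATTC, cut after the first G; HaeIII: GGCC, cut between GG and CC) are taken from general knowledge rather than from the structure drawings referred to above, and the names `ENZYMES`, `digest`, and `five_prime_overhang` are hypothetical. Only one strand is modeled.

```python
# Top-strand recognition site and the offset within it at which the strand
# is cut. These values are assumptions drawn from general knowledge.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),   # G^AATTC, leaves a 5' AATT overhang
    "HaeIII": ("GGCC", 2),    # GG^CC, leaves a blunt end
}


def digest(seq, enzyme):
    """Return the top-strand fragments produced by cutting seq with enzyme."""
    site, offset = ENZYMES[enzyme]
    fragments = []
    last_cut = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(seq[last_cut:cut])
        last_cut = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[last_cut:])  # remainder after the final cut
    return fragments


def five_prime_overhang(enzyme):
    """Single-stranded tail left by a palindromic site cut at a 5' offset.

    The two strands are cut symmetrically, so the bases between the two cut
    positions remain unpaired. An empty string means a blunt end.
    """
    site, offset = ENZYMES[enzyme]
    return site[offset:len(site) - offset]


dna = "TTGAATTCAAGGCCTT"  # hypothetical sequence with one site for each enzyme
print(digest(dna, "EcoRI"), five_prime_overhang("EcoRI"))
print(digest(dna, "HaeIII"), repr(five_prime_overhang("HaeIII")))
```

Under these assumptions the EcoRI fragments carry matching single-stranded AATT tails, which is why one EcoRI end anneals more readily to another EcoRI end than to a blunt HaeIII end.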
Over 100 different endonucleases are known, each of which is capable of cleaving DNA at specific sequences. See, e.g., Roberts, 1982. All restriction endonucleases are sensitive to the sequence of bases. In addition, some endonucleases are sensitive to whether certain bases have been methylated. For example, two endonucleases, MboI and Sau3a, are capable of cleaving the following sequence of bases as shown: ##STR2##
MboI cannot cleave this sequence if the adenine residue is methylated (me-A). Sau3a can cleave this sequence, regardless of whether either A is methylated. To some extent the methylation (and therefore the cleavage) of a plasmid may be controlled by replicating the plasmids in cells with desired methylation capabilities. An E. coli enzyme, DNA adenine methylase (dam), methylates the A residues that occur in GATC sequences. Strains of E. coli which do not contain the dam enzyme are designated as dam.sup.- cells. Cells which contain dam are designated as dam.sup.+ cells.
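The effect of dam methylation on cleavage can be sketched as follows. This is a simplified, single-strand model offered only as an illustration: methylation is represented as a set of string indices, and the names `dam_methylate`, `cleavable_sites`, and `BLOCKED_BY_ME_A` are hypothetical.

```python
GATC = "GATC"

# Whether methylation of the A in GATC blocks cleavage, per the text above.
BLOCKED_BY_ME_A = {"MboI": True, "Sau3a": False}


def find_sites(seq):
    """Return the start index of every GATC occurrence in seq."""
    sites = []
    pos = seq.find(GATC)
    while pos != -1:
        sites.append(pos)
        pos = seq.find(GATC, pos + 1)
    return sites


def dam_methylate(seq):
    """Indices of the A residues that a dam+ host would methylate."""
    return {pos + 1 for pos in find_sites(seq)}


def cleavable_sites(seq, enzyme, methylated):
    """GATC sites that enzyme can still cleave, given methylated A indices."""
    return [
        pos for pos in find_sites(seq)
        if not (BLOCKED_BY_ME_A[enzyme] and (pos + 1) in methylated)
    ]


plasmid = "TTGATCAAGATCTT"          # hypothetical sequence with two GATC sites
from_dam_plus = dam_methylate(plasmid)  # replicated in dam+ cells
from_dam_minus = set()                  # replicated in dam- cells

print(cleavable_sites(plasmid, "MboI", from_dam_plus))
print(cleavable_sites(plasmid, "Sau3a", from_dam_plus))
print(cleavable_sites(plasmid, "MboI", from_dam_minus))
```

In this model a plasmid replicated in dam.sup.+ cells is protected from MboI but not from Sau3a, while the same plasmid replicated in dam.sup.- cells is cleaved by both.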
Several endonucleases are known which cleave different sequences, but which create cohesive ends which are fully compatible with cohesive ends created by other endonucleases. For example, at least five different endonucleases create 5' GATC overhangs, as shown in Table 1.
TABLE 1
Endonuclease    Sequence        Effect of me-A
MboI            ##STR3##        Inhibited by me-A
Sau3a           same as MboI    Unaffected by me-A
BglII           ##STR4##        Unaffected by me-A
BclI            ##STR5##        Inhibited by me-A
BamHI           ##STR6##        Unaffected by me-A
A cohesive end created by any of the endonucleases listed in Table 1 will ligate preferentially to a cohesive end created by any of the other endonucleases. However, a ligation of, for example, a BglII end with a BamHI end will create the following sequence: ##STR7##
This sequence cannot be cleaved by either BglII or BamHI; however, it can be cleaved by MboI (unless methylated) or by Sau3a.
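The loss of the original sites on hybrid ligation can be checked with a short Python sketch. The recognition sequences below (BglII: AGATCT, BclI: TGATCA, BamHI: GGATCC, each cut after the first base; MboI and Sau3a: GATC) are assumed from general knowledge, since the structure drawings are not reproduced here, and the function names are hypothetical. Only the top strand is modeled.

```python
# Top-strand recognition sequences; each of these enzymes is assumed to cut
# after the first base of its site, leaving a 5' GATC overhang.
SITES = {"BglII": "AGATCT", "BclI": "TGATCA", "BamHI": "GGATCC"}
MBOI_SAU3A_SITE = "GATC"


def hybrid_junction(left_enzyme, right_enzyme):
    """Top-strand sequence formed when an end cut by left_enzyme is ligated
    to an end cut by right_enzyme.

    The left fragment contributes the base before the cut; the right
    fragment contributes the overhang and the rest of its site.
    """
    return SITES[left_enzyme][0] + SITES[right_enzyme][1:]


def enzymes_that_recut(junction):
    """Names of the enzymes whose recognition sequence occurs in junction."""
    hits = [name for name, site in SITES.items() if site in junction]
    if MBOI_SAU3A_SITE in junction:
        hits.append("MboI/Sau3a")
    return hits


junction = hybrid_junction("BglII", "BamHI")
print(junction, enzymes_that_recut(junction))

# For comparison, re-ligating two BamHI ends regenerates the BamHI site.
same = hybrid_junction("BamHI", "BamHI")
print(same, enzymes_that_recut(same))
```

Under these assumptions the BglII/BamHI junction keeps the central GATC, so it remains a substrate for MboI (if unmethylated) and Sau3a even though neither parent enzyme recognizes it.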
Another endonuclease which involves the GATC sequence is PvuI, which creates a 3' overhang, as follows: ##STR8##
Another endonuclease, ClaI, cleaves the following sequence: ##STR9##
If X.sub.1 is G, or if X.sub.2 is C, then the sequence may be cleaved by MboI (unless methylated, in which case ClaI is also inhibited) or Sau3a.
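The dependence on the flanking bases can be verified by enumeration. The sketch below assumes, from general knowledge rather than from the structure drawing, that the ClaI recognition sequence is ATCGAT with X.sub.1 immediately on its 5' side and X.sub.2 immediately on its 3' side; the function name is hypothetical.

```python
from itertools import product

CLAI_SITE = "ATCGAT"  # assumed ClaI recognition sequence (top strand)


def flanks_creating_gatc():
    """Return every (X1, X2) pair for which X1 + ATCGAT + X2 contains GATC."""
    return [
        (x1, x2)
        for x1, x2 in product("ACGT", repeat=2)
        if "GATC" in x1 + CLAI_SITE + x2
    ]


pairs = flanks_creating_gatc()
print(pairs)
```

Of the 16 possible flanking combinations, the ones that create an overlapping GATC sequence are exactly those in which X.sub.1 is G or X.sub.2 is C.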