One of the major advances in molecular biology has been the ability to produce recombinant proteins, especially proteins which have therapeutic value. Recombinant polynucleotides encoding the proteins of interest can be introduced by way of expression vectors into any number of host cells which will then produce the desired protein. This technique is especially useful for the production of short peptides, particularly those which do not require extensive post-translational modification for biological activity.
Current manufacturing procedures allow for several methods for the manufacture of proteins. One such method includes the use of peptide synthesizers designed for research purposes. Production of small peptides of high value has been accomplished by peptide synthesizers in the past. Advances in peptide synthesis in the last 30 years have allowed the synthesis of peptides up to approximately 120 amino acids in length. While the technical limit on peptide length is approximately 100–120 amino acids, the yield drops off with increasing length. This fundamental yield limitation leads to drastically increased cost for synthesizing long peptides or for synthesizing large quantities of small peptides. For this and other reasons, the industrial scale synthesis of peptides via peptide synthesizers, chemical synthesis, or manual synthesis is not feasible for long peptides and proteins.
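The drop in yield with length can be illustrated with a simple stepwise model. The sketch below assumes that each residue is added in an independent coupling step with the same efficiency, so that the fraction of full-length chains is the per-step efficiency raised to the number of couplings. The function name and the 99% per-step efficiency are hypothetical values chosen for illustration; they are not figures taken from this document.

```python
def overall_yield(length: int, coupling_efficiency: float) -> float:
    """Fraction of chains reaching full length under a simple stepwise model.

    Assumes (for illustration only) that every coupling step succeeds
    independently with the same probability. A peptide of `length`
    residues requires `length - 1` couplings.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be between 0 and 1")
    return coupling_efficiency ** (length - 1)


if __name__ == "__main__":
    # Hypothetical per-step efficiency of 99%.
    for n in (10, 50, 120):
        print(n, overall_yield(n, 0.99))
```

Under this model even a high per-step efficiency compounds into a substantial loss over a hundred or more steps, which is consistent with the cost increase described above.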
A second method includes the production of peptides through microbial fermentation. A number of peptides have been synthesized in this fashion including human insulin in yeast. This method may or may not be suitable depending on the size of the protein and the post-translational modification required.
A third method is the use of transgenic plants. Transgenic plants can be used as factories to produce proteins on a tons per year scale. Transgenic plants do not require the large investment in infrastructure that is required with large scale production of proteins by fermentation and plants can be consumed directly, thus eliminating the need to purify the protein. In addition, facilities for the harvesting, storage and processing of plants are largely in place. Edible transgenic plants also provide a means by which peptides of nutritional or therapeutic value can be administered without further processing through the direct consumption of the plants, their seeds or fruits, or edible products made from the plants.
The development of the polymerase chain reaction (PCR) has greatly aided in the production of recombinant polynucleotides for host cell transformation. The basic PCR procedure, which is described in U.S. Pat. Nos. 4,683,202, 4,683,195 and 4,800,159, typically involves the treatment of a double-stranded polynucleotide template with a pair of oligonucleotide primers which flank the sequence of interest. Conditions are manipulated so that the primers bind to the complementary templates and extension of the 3′ ends of each primer results in production of two new double stranded polynucleotides containing the sequence of interest. The newly produced polynucleotides are then denatured, usually by heating, and the process of primer annealing and extension repeated. By repeating the process many times, copies of the desired sequence can be produced in an exponential fashion. Using PCR, it is possible to rapidly produce large numbers of recombinant polynucleotides for host cell transformation. In addition, variations on the basic PCR technique allow for such things as the introduction of restriction enzyme cleavage sites, site directed and random mutations, and the production of chimeric proteins.
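The exponential character of the amplification can be sketched numerically. The example below uses a common textbook idealization in which each cycle multiplies the number of copies by (1 + efficiency), where an efficiency of 1.0 corresponds to perfect doubling. The function name and the efficiency parameter are assumptions introduced here for illustration and do not come from the cited patents.

```python
def pcr_copies(initial_templates: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after a given number of PCR cycles.

    Assumes (for illustration only) a constant per-cycle efficiency:
    each cycle multiplies the copy number by (1 + efficiency).
    Real reactions plateau as primers and nucleotides are consumed.
    """
    if initial_templates < 0 or cycles < 0:
        raise ValueError("initial_templates and cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_templates * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Perfect doubling versus a hypothetical 90% per-cycle efficiency.
    for n in (10, 20, 30):
        print(n, pcr_copies(1, n), pcr_copies(1, n, efficiency=0.9))
```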
As part of the PCR reaction, the primers used become part of the newly synthesized molecule. In most cases, the presence of the primers does not create a problem since the value of the protein produced is not affected by the presence of the primers. In many cases, the presence of the primers is an advantage, because they allow the introduction of mutations, cleavage sites for the introduction of the sequence into a vector, or sites which can be used to link several sequences together to produce a longer sequence than can normally be produced using PCR alone.
One type of polynucleotide that can be produced by PCR is that which contains tandem repeats. Tandem repeats are especially useful in the production of short peptides. During expression of the protein encoded by the sequence, the presence of large numbers of small molecules can create an osmotic stress on the host cell. This osmotic stress can result in decreased translation or, in extreme cases, death of the host cell, thus limiting the amount of the protein produced in plants. The osmotic stress can be decreased if, instead of many small molecules, a smaller number of large protein molecules, each containing multiple copies of the peptide, is produced. These large protein molecules can then be processed to produce the smaller peptides.
Methods have been developed for the production of recombinant proteins containing repeating units. For example, Sadler et al. ((1980) Gene 8:279–300) disclose plasmids containing tandem repeats of a synthetic lactose operator constructed by combining linkered 40 base operator fragments. Gupta et al. ((1983) Bio/Technology 1:602–609) report the construction of repeats of a palindromic dodecamer by annealing and ligation. Maugh et al. (U.S. Pat. No. 5,149,657) and Ferrari et al. (U.S. Pat. Nos. 5,243,038 and 6,018,030) teach the production of polypeptides containing repeats of adhesive proteins by ligation of individual fragments. Although effective, production of nucleotide sequences containing tandem repeats by ligation is slow and labor intensive, and does not result in the rapid production of polynucleotides such as is possible with PCR.
White et al. ((1991) Anal. Biochem. 199:184–190) disclose a method for the production of polynucleotides containing repeating units in which oligonucleotide and partially complementary linker pairs are ligated together to form concatemers. These concatemers then serve as templates in a PCR reaction which may or may not contain supplemental primers. In a variation, the oligonucleotide and linker pairs are not ligated together to form concatemers, but are simply combined in the PCR reaction mixture, where their complementary portions anneal to form a double stranded complex with single stranded extensions at their 5′ ends. White et al. teach the use of the products produced as hybridization probes, or targets in applications such as run-on transcription or analysis of repetitive DNA sequences. White et al. do not teach or suggest the use of the method for the production of polypeptides containing repeating units.
One limitation of many of the prior art methods is the presence of linkers within the polynucleotides produced. As with primers, the presence of linkers can serve useful functions, for example, providing a cleavage site for inserting the polynucleotide into an expression vector or encoding a cleavage site to allow isolation of the individual peptides after expression. There are some circumstances, however, where the elimination of linkers in repetitive polypeptides may be advantageous, for example, in small bioactive peptides where the presence of even a single additional amino acid can have a marked effect on biological function. In such instances, the presence of the linker or remnants of the linker following cleavage can have a detrimental effect on activity and requires that the linkers be cleaved from the peptide, which then must be separated from the free linkers in order to obtain a purified product. The increased number of steps required can greatly add to the cost of production for peptides that are produced in large quantities. The present invention provides for the efficient assembly of repeating polynucleotides with or without intervening linkers or sequences.
An additional problem with polynucleotides containing tandem repeats is stability within a host cell. Gupta et al. ((1983) Bio/Technology 1:602–609) reported that a palindromic DNA containing a dodecamer was stable only when its size did not exceed 120 base pairs. The same authors noted, however, that stability could be achieved by insertion of a nonpalindromic sequence. Such a solution is not feasible, however, in the situation where the inclusion of additional sequences is undesirable. An alternative solution exploits the degeneracy in the genetic code. See, e.g., U.S. Pat. Nos. 5,149,657 and 5,243,038. In this method, different codons are used, resulting in sequences which encode the same amino acid sequence but which contain different nucleotide sequences. In this way, the repetitiveness of the nucleotide sequence is decreased, resulting in greater stability. Until the present invention, however, degenerate sequences have not been used in conjunction with nucleotide chain extension reactions such as PCR. Instead, degenerate sequences were synthesized, ligated together and repeatedly inserted into vectors to produce sequences with large numbers of repeats. Previous methods to produce tandem repeats by chain extension have utilized sequences of known composition. Presumably this was done to ensure proper annealing, which is necessary for chain extension to take place. What is needed, therefore, is a method for the rapid production by chain extension of nucleotide sequences encoding repeating peptides wherein the nucleotide sequences utilized exploit the degeneracy of the genetic code. The present invention meets that need.
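The idea of exploiting codon degeneracy can be illustrated in silico. The sketch below is not the chain extension method of the invention; it only shows how a repeating peptide can be encoded by nucleotide sequences that differ from repeat to repeat. It builds the standard genetic code table, picks a synonymous codon at random for each residue, and confirms that the result still translates to the same repeating peptide. The function names and the example peptide GVGVP are hypothetical and chosen purely for illustration.

```python
import random
from math import prod

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Reverse lookup: amino acid -> list of synonymous codons.
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def encode_repeats(peptide: str, copies: int, rng: random.Random) -> str:
    """Encode `copies` tandem repeats of `peptide`, choosing a random
    synonymous codon for every residue of every repeat."""
    return "".join(
        rng.choice(AA_TO_CODONS[aa]) for _ in range(copies) for aa in peptide
    )


def translate(dna: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def count_encodings(peptide: str) -> int:
    """Number of distinct nucleotide sequences encoding one copy of `peptide`."""
    return prod(len(AA_TO_CODONS[aa]) for aa in peptide)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    dna = encode_repeats("GVGVP", 4, rng)  # hypothetical example peptide
    assert translate(dna) == "GVGVP" * 4
    print(dna)
    print(count_encodings("GVGVP"))
```

Because each repeat draws its codons independently, the nucleotide sequence is generally less repetitive than the amino acid sequence it encodes, which is the property the degeneracy approach relies on.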
The present inventors have surprisingly discovered a novel method by which it is possible to rapidly produce nucleotide sequences with high, preferably maximum, degeneracy encoding repeating peptide units by chain extension. Unlike previous methods, the present invention does not require that the exact sequences of the oligonucleotides used in the chain extension reaction be known. Rather, oligonucleotides can be used that have been synthesized to result in the greatest possible variation in nucleotide sequence allowed by the genetic code. Thus, the present invention provides a novel method for the rapid, economical production of highly stable nucleotide sequences encoding large repeating protein molecules.