The present invention relates generally to processes for the synthesis of polynucleotides, such as DNA and fragments of DNA, RNA and fragments of RNA, plasmids, genes, and chemically and/or structurally modified polynucleotides. The present invention also relates to the generation of libraries of polynucleotides, library screening and identification of library members having desired characteristics.
Living cells can be “reprogrammed,” in vitro or in vivo, to produce useful amounts of desired proteins or other compounds by introducing the appropriate nucleic acids (DNA or RNA) into them; this concept is the keystone of modern biotechnology. The construction of recombinant DNA molecules necessary to achieve this “reprogramming” or to perform a varied and growing number of other functions is a frequent and necessary activity of molecular biology research and of biotechnological endeavors in industrial and academic settings. By improving the process by which DNA or RNA molecules of arbitrary sequence are made, a significant increase of productivity in biotechnology could be achieved, resulting in benefits in many fields including medical research, agriculture and the chemical industry. For example, numerous efforts to sequence the entire genomes of a variety of organisms (microbes, animals and plants) have generated many large databases of gene sequences. These genes can be made and studied experimentally through laborious and time-consuming techniques involving the isolation and subsequent manipulation (generally referred to as molecular cloning) of DNA from the organism in which the gene is found and/or expressed. Alternatively, inefficient DNA synthesis methods can be used, as described below.
The ability to synthesize large RNA or DNA molecules (e.g., entire genes) is of value to any endeavor that relies on recombinant DNA technology. As alluded to above, DNA molecules of arbitrary sequence can be synthesized in vitro. A solid phase method to synthesize oligonucleotides that is now widely used in commercial DNA synthesizers is reported in U.S. Pat. No. 4,458,066. Current DNA synthesizers, however, are limited to the production of relatively short single-stranded DNA oligonucleotide molecules of length typically less than 200 nucleotides (nt). In contrast, the average prokaryotic gene is 1000 basepairs (bp) in length, a eukaryotic cDNA is frequently longer than 2000 bp, and most plasmids are larger than 3000 bp. Although state-of-the-art oligonucleotide synthesizers relying on beta-cyanoethyl phosphoramidite chemistry (U.S. Pat. No. 5,935,527) can make and purify 48 oligonucleotides in less than 48 hours (25 nt/oligo × 48 oligonucleotides = 1200 nt, a typical bacterial gene), it is still very time-consuming and labor-intensive to assemble these oligonucleotides together into a single gene.
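The oligonucleotide count in the parenthetical above can be checked with a short calculation. The Python sketch below assumes, as the quoted figure does, that 25-nt oligonucleotides tile a single strand end to end without overlap; the function name `oligos_needed` is illustrative only and does not come from the cited references.

```python
import math


def oligos_needed(target_length_nt: int, oligo_length_nt: int = 25) -> int:
    """Number of non-overlapping oligonucleotides needed to cover one strand.

    Assumption: oligonucleotides tile the strand end to end, so the last
    one may be shorter than oligo_length_nt.
    """
    return math.ceil(target_length_nt / oligo_length_nt)


# Sizes quoted in the text: the 1200-nt example, an average prokaryotic
# gene, a eukaryotic cDNA, and a small plasmid.
for label, length in [
    ("48-oligo example", 1200),
    ("prokaryotic gene", 1000),
    ("eukaryotic cDNA", 2000),
    ("plasmid", 3000),
]:
    print(f"{label}: {length} nt -> {oligos_needed(length)} oligonucleotides")
```

Under these assumptions the 1200-nt example requires 48 oligonucleotides, matching the figure in the text, and each additional kilobase adds another 40 assembly steps.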
Gene synthesis, a service frequently offered commercially by oligonucleotide manufacturers, is expensive (approximately $10 to $20/bp) and slow (frequently requiring several weeks) because current methods are labor-intensive. A method to make relatively large DNA molecules by mixing two long oligonucleotides (up to 400 nt) and amplifying the desired double-stranded DNA fragment from the mixture using the polymerase chain reaction (PCR) is reported in European Patent Application 90201671.6. This method becomes more complicated and requires extensive manipulations by a skilled technician when molecules larger than 400 bp must be synthesized. Similar statements can be made of the method of Khorana, Science, 1979, 203, 614-625.
A method to synthesize long nucleic acid molecules in which a ribo- or deoxyribo-oligonucleotide attached to a solid support is extended by the sequential addition of other “assembly” oligonucleotides is reported in U.S. Pat. No. 5,942,609 and Chen, et al., Nucleic Acids Res., 1990, 18, 871. Of key importance to this process is the annealing of a partially complementary “bridging” oligonucleotide to the two oligonucleotides that will be covalently linked together by a ligase. Although this method will likely achieve its stated goal of synthesizing long polynucleotides, the need for the synthesis of a bridging oligonucleotide adds to the total number of oligonucleotides which must be synthesized and purified, with an attendant increase in costs and time of synthesis. In addition, the assembly of a complex mixture of oligonucleotides would greatly complicate this process because of the large number of different bridging oligonucleotides that would be needed to bring together the assembly oligonucleotides. Moreover, it would be advantageous to obviate the need for the annealing step required to productively bind the bridging oligonucleotide to its target assembly oligonucleotides. Such a step may introduce complications due to the need to avoid non-specific hybridization problems. Complications may include the need to carefully control hybridization temperatures over lengthy incubation periods as well as to carefully design each bridging oligonucleotide to bind specifically to the desired sequence.
International Publication WO 83/02626 reports a method of assembling a polyribonucleotide using the enzyme T4 RNA ligase, including time-consuming purification steps, but does not include the use of solid phase methods which would facilitate automation and increase the reliability of the process. In contrast, Mudrakovskaia et al. (Bioorg. Khim., 1991, 17, 819-822) report a “solid-phase enzymic synthesis of oligoribonucleotides” but do not disclose how the method could be used to couple more than a few nucleotides to a tethered oligonucleotide. Similarly, Schmitz, et al. (Org. Lett., 1999, 1, 1729) describe the synthesis of short oligonucleotides from mononucleotide building blocks using T4 RNA ligase, but report exceedingly long reaction times, militating against the formation of longer sequences. Neither International Publication WO 83/02626, Mudrakovskaia et al., nor Schmitz, et al. discloses how their methods could be used to synthesize large (>200 nt) DNA or RNA molecules without requiring numerous and laborious purification steps.
Harada et al. (Proc. Natl. Acad. Sci. USA, 1993, 90, 1576-1579) report in vitro selection techniques to characterize DNA sequences that are ligated efficiently by T4 RNA ligase. Tessier et al. (Anal. Biochem., 1986, 158, 171-178) report a set of reaction conditions for ligation of DNA fragments up to 40 bases in length. Zhang et al. (Nuc. Acids Res., 1996, 24, 990-991) report single-stranded DNA ligation by T4 RNA ligase for PCR cloning of 5′ noncoding fragments and coding sequence of a particular gene. Ligation of oligonucleotides using T4 RNA ligase has also been reported in Walker, et al., Proc. Natl. Acad. Sci. USA, 1975, 72, 122 and Ohtsuka, et al., Nucleic Acids Res., 1976, 3, 1613, but the technique was recognized as problematic due to the accumulation of unwanted by-products (Krug, et al., Biochemistry, 1982, 21, 1858).
The enhanced ability for de novo synthesis of large polynucleotides or genes may greatly facilitate the preparation of combinatorial libraries of polynucleotides because it would be much more efficient than existing methods. For example, combinatorial libraries of genes can be made by cassette mutagenesis (Oliphant, et al., Gene, 1986, 44, 177 and Oliphant, et al., Proc. Natl. Acad. Sci. USA, 1989, 86, 9094) whereby genes with random combinations of nucleotides are created. Similarly, U.S. Pat. Nos. 5,723,323; 5,763,192; 5,814,476; and 5,817,483 describe libraries of expression vectors having stochastic DNA regions. By simultaneously randomly mutating fifteen nucleotides of a gene, a billion different sequences can be generated. Current methods of screening and molecular cloning often limit the number of sequences that can be screened to a much smaller number. Although there are examples of libraries with 10^8 individual mutants (Cwirla, et al., Proc. Natl. Acad. Sci. USA, 1990, 87, 6378), certain screening methods to identify useful enzymes are limited to a few thousand mutants. A process to optimize combinatorial libraries has been proposed (Arkin, et al., Proc. Natl. Acad. Sci. USA, 1992, 89, 7811) and tested (Delagrave, et al., Protein Eng., 1993, 6, 327 and Delagrave, et al., Biotechnology, 1993, 11, 1548) to circumvent this problem. A related approach has also been proposed to deal with the combinatorial diversity of phylogenies of protein sequences (Goldman, et al., Biotechnology, 1992, 10, 1557). However, these methods consider only libraries having degeneracies at the nucleotide level. In some instances, such as for large sets of phylogenetically related sequences, combinatorial libraries where degeneracies are at the oligonucleotide level (i.e., blocks of nucleotides), rather than at the nucleotide level, are more favorable. This difference would allow alteration of an entire sequence instead of just a few nucleotides.
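The figure of a billion sequences from fifteen randomized positions follows from the four-letter nucleotide alphabet. The Python sketch below works through that arithmetic and compares it with the screening capacities mentioned above; the function name `library_size` is hypothetical, and "a few thousand" is represented by an assumed value of 5,000 purely for illustration.

```python
def library_size(randomized_positions: int, alphabet_size: int = 4) -> int:
    """Number of distinct sequences when each position varies independently."""
    return alphabet_size ** randomized_positions


full_library = library_size(15)   # 4**15 distinct sequences
largest_reported = 10 ** 8        # library size cited from Cwirla, et al.
assumed_screen = 5_000            # assumed stand-in for "a few thousand"

print(f"sequences from 15 randomized nucleotides: {full_library:,}")
print(f"fraction covered by a 10^8-member library: {largest_reported / full_library:.3f}")
print(f"fraction covered by the assumed screen:    {assumed_screen / full_library:.2e}")
```

Since 4^15 = 2^30 = 1,073,741,824, even a library of 10^8 members samples less than a tenth of the possible sequences.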
In an effort to prepare populations of polynucleotides, a method referred to as DNA shuffling has been developed. According to this method, described in U.S. Pat. No. 6,117,679 and Stemmer, et al., Proc. Natl. Acad. Sci. USA, 1994, 91, 10747, a series of related polynucleotides are isolated, fragmented, and recombined to form a population of polynucleotide variants. The recombination of related polynucleotides proceeds via hybridization of complementary or partially complementary fragments. The requirement for hybridization limits this method to polynucleotides with a certain minimal amount of homology. Moreover, recombination between polynucleotides tends to occur at points of high sequence identity which are found randomly along the sequences. There is, therefore, little control of the sites of recombination during a shuffling experiment. Furthermore, DNA shuffling methods are not amenable to working with RNA. However, in certain cases it may be advantageous to work directly with RNA molecules. For example, many viral genomes consist of single strands of RNA, as in flaviviruses such as Dengue, Japanese Encephalitis and West Nile, retroviruses such as HIV, and other animal and plant pathogens, including viroids (Fundamental Virology, Lippincott-Raven, Philadelphia, Pa., 1996). By constructing recombinant viral genomes, valuable vaccines may be developed (Guirakhoo, et al., Virology, 1999, 257, 363 and Monath, et al., Vaccine, 1999, 17, 1868), and the availability of methods to synthesize and recombine RNA more rapidly may accelerate this type of research.
A method of synthesizing large polynucleotides (such as RNA or DNA molecules longer than 200 bp) of arbitrary or predefined sequence and in a manner that will more readily lend itself to automation is desired. In addition, an improved version of the enzyme T4 RNA ligase that would increase the ability of this enzyme to catalyze the ligation of two oligonucleotides is also desired. Ideally, the improved enzyme would efficiently catalyze the ligation of oligonucleotides. Also, the ability of the enzyme to carry out these reactions at an elevated temperature or to use ddATP instead of ATP would be valuable properties in an improved ligase. By increasing the productivity of gene synthesis in laboratories, the present invention would improve scientists' ability to find, for example, enzymes capable of catalyzing reactions necessary to synthesize a new drug.
All in all, de novo gene synthesis is a powerful technique that when fully optimized would contribute greatly to the fields of biotechnology and medicine. Not only would gene synthesis facilitate the manipulation of large polynucleotides by offering, for example, better control over the positioning of restriction sites or optimization of regions of sequence governing gene expression; the ability to synthetically build a gene would allow the directed and rapid formation of combinatorial gene libraries. Screening of these libraries for genes with desired properties may allow the discovery or development of new and improved biomolecules such as enzymes with increased activity or receptors with higher ligand affinity. Thus, new methods for the synthesis of polynucleotides are needed, and the present invention is directed toward this need, as well as others.
The present invention provides methods of preparing large polynucleotides (such as RNA or DNA molecules longer than 200 bp) of arbitrary sequence and in a manner that will more readily lend itself to automation than existing methods.
One aspect of the present invention is directed to methods of preparing a polynucleotide having at least 200 nucleotides and a predetermined nucleotide sequence comprising: providing a solid support, providing a plurality of oligonucleotides, wherein the combination of the nucleotide sequences of the oligonucleotides comprises the nucleotide sequence of the polynucleotide, contacting the solid support with the 3′ terminus of a first oligonucleotide from the plurality of oligonucleotides to form a tethered oligonucleotide, ligating the 3′ terminus of another oligonucleotide from the plurality of oligonucleotides to the 5′ terminus of the tethered oligonucleotide, and repeating the ligation with other oligonucleotides until the polynucleotide is prepared.
Another aspect of the present invention is directed to methods of preparing a polynucleotide having at least 200 nucleotides and a predetermined nucleotide sequence comprising: providing a solid support, providing a plurality of oligonucleotides, wherein the combination of the nucleotide sequences of the oligonucleotides comprises the nucleotide sequence of the polynucleotide, contacting the solid support with the 5′ terminus of a first oligonucleotide from the plurality of oligonucleotides to form a tethered oligonucleotide, ligating the 5′ terminus of another oligonucleotide from the plurality of oligonucleotides to the 3′ terminus of the tethered oligonucleotide, and repeating the ligation with other oligonucleotides until the polynucleotide is prepared.
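The two tethering orientations described above differ only in the direction in which the chain grows. The following Python sketch is a purely informational model, assuming sequences are written 5′ to 3′ as strings and that each ligation is an idealized concatenation; it ignores all chemistry, and the function names and the 240-nt test sequence are hypothetical.

```python
def split_into_oligos(sequence: str, oligo_length: int) -> list[str]:
    """Cut a target sequence (written 5'->3') into consecutive oligonucleotides."""
    return [sequence[i:i + oligo_length] for i in range(0, len(sequence), oligo_length)]


def assemble_3prime_tethered(oligos: list[str]) -> str:
    """First oligo is tethered by its 3' terminus; the chain grows at the 5' end.

    The tethered oligo is therefore the 3'-most fragment, and each incoming
    oligo is joined by its 3' terminus to the free 5' terminus.
    """
    tethered = oligos[-1]
    for oligo in reversed(oligos[:-1]):
        tethered = oligo + tethered  # idealized ligation onto the 5' end
    return tethered


def assemble_5prime_tethered(oligos: list[str]) -> str:
    """First oligo is tethered by its 5' terminus; the chain grows at the 3' end."""
    tethered = oligos[0]
    for oligo in oligos[1:]:
        tethered = tethered + oligo  # idealized ligation onto the 3' end
    return tethered


target = "ACGT" * 60                      # hypothetical 240-nt target
oligos = split_into_oligos(target, 25)    # nine 25-nt oligos plus one 15-nt oligo
print(len(oligos), "oligonucleotides")
print(assemble_3prime_tethered(oligos) == target)
print(assemble_5prime_tethered(oligos) == target)
```

In this model both orientations yield the same product; the practical difference lies in which terminus of the growing chain remains free for the ligase, which the sketch does not attempt to represent.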
The present invention further embodies, inter alia, a method of preparing a polynucleotide from a plurality of oligonucleotides, the method comprising blocking the 3′ terminus of a first oligonucleotide with a blocking group to form a blocked oligonucleotide, wherein the first oligonucleotide comprises the 3′ terminus of the polynucleotide; coupling the 3′ terminus of a further oligonucleotide from the plurality of oligonucleotides to the 5′ terminus of the blocked oligonucleotide to form a coupled oligonucleotide; amplifying the coupled oligonucleotide to form an amplified oligonucleotide substantially free of blocking group; and repeating the blocking, coupling, and amplifying steps with the amplified oligonucleotide until the polynucleotide is prepared.
The present invention further embodies a method of preparing a polynucleotide from a plurality of oligonucleotides, the method comprising blocking the 3′ terminus of each of the oligonucleotides, except for an unblocked oligonucleotide comprising the 5′ terminus of the polynucleotide, with a blocking group to form a plurality of blocked oligonucleotides; coupling the 3′ terminus of the unblocked oligonucleotide with the 5′ terminus of one of the blocked oligonucleotides; amplifying the coupled oligonucleotide to form an amplified oligonucleotide substantially free of blocking group; and repeating the coupling and amplifying steps with the amplified oligonucleotide until the polynucleotide is prepared.
The present invention also contemplates a method of preparing a library of polynucleotides comprising simultaneously generating a plurality of different polynucleotides, wherein each of the polynucleotides is prepared by coupling a plurality of oligonucleotides using a ligase, wherein at least one of the oligonucleotides is attached to a solid support.
Libraries prepared according to the methods recited above are also contemplated by the present invention.
Further embodiments of the present invention include a method of identifying a polynucleotide with a predetermined property, the method comprising generating a library of polynucleotides according to any of the methods recited above, and selecting at least one polynucleotide within the library having the predetermined property.
The present invention further includes a method of identifying a polynucleotide with a predetermined property, the method comprising generating a library of polynucleotides according to any of the methods recited above; selecting at least one polynucleotide within the library having the predetermined property; and repeating the generating and selecting steps wherein at least one oligonucleotide of the selected polynucleotides is preferentially incorporated into the library.
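The iterative generate-and-select method recited above can be pictured as a loop in which oligonucleotide blocks found in selected polynucleotides are drawn more often in the next round. The Python sketch below is one possible reading under stated assumptions, not the method as claimed: the block sequences, the additive weighting scheme, and the GC-content score standing in for the "predetermined property" are all hypothetical.

```python
import random


def generate_library(block_choices, weights, size, rng):
    """Build `size` polynucleotides, picking one block per position by weight."""
    library = []
    for _ in range(size):
        blocks = [
            rng.choices(options, weights=[weights[pos][b] for b in options])[0]
            for pos, options in enumerate(block_choices)
        ]
        library.append(tuple(blocks))
    return library


def select_best(library, score, keep):
    """Keep the `keep` highest-scoring members (stand-in for a screen)."""
    return sorted(library, key=score, reverse=True)[:keep]


def reinforce(weights, selected, bonus=1.0):
    """Preferentially incorporate blocks seen in selected members next round."""
    for member in selected:
        for pos, block in enumerate(member):
            weights[pos][block] += bonus


def gc_score(member):
    """Hypothetical property: number of G and C nucleotides in the product."""
    seq = "".join(member)
    return seq.count("G") + seq.count("C")


# Hypothetical oligonucleotide blocks available at each of three positions.
block_choices = [["AAAT", "GGCA"], ["TTTA", "CCGT"], ["ATAT", "GCGC"]]
weights = [{b: 1.0 for b in options} for options in block_choices]
rng = random.Random(0)

for round_number in range(3):
    library = generate_library(block_choices, weights, size=50, rng=rng)
    selected = select_best(library, gc_score, keep=10)
    reinforce(weights, selected)

print(weights)
```

Because the degeneracy here sits at the level of whole blocks rather than single nucleotides, each round only needs to track one weight per candidate oligonucleotide at each position.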