The present invention relates to methods for mutagenizing nucleic acids and proteins. More particularly, the present invention relates to methods for mutagenizing nucleic acids and proteins relative to an initial target nucleic acid sequence by randomly priming the target sequence during amplification.
The sequences of genes encoding many important proteins have been determined at a rapid speed owing to the fast progress in the field of genomics. The three-dimensional structures of thousands of proteins have been determined by X-ray crystallography and other biophysical and biochemical methods, and many more polypeptide sequences critical for the biological function of the proteins have also been determined. However, to a large extent, the correlation between protein primary sequence, tertiary structure, and biological function remains elusive.
Proteins can generally tolerate a certain level of amino acid substitutions without severe consequences on folding or stability (Axe et al., (1996) Proc. Natl. Acad. Sci. USA 93:5590-5594; Bowie et al., (1990) Science 247:1306-1310; Gassner et al. (1996) Proc. Natl. Acad. Sci. USA 93:12155-12158; Baldisseri et al. (1991) Biochem. 30:3628-33; Huang et al. (1996) J. Mol. Biol. 258:688-703; Rennel et al. (1991) J. Mol. Biol. 222:67-88; Shortle (1995) Curr. Opin. Biotechnol. 6:387-393). On the other hand, for many proteins, a single particular residue can be critical to function, stability, or both (Philippon et al. (1998) Cell Mol. Life Sci. 54:341-346). Although it is desirable to be able to predict the folding pattern of a protein from its primary sequence and to correlate its structure with function in vivo, in reality this has proven to be a formidable task.
One approach to studying protein structure and function is site-directed mutagenesis. It is an important, but cumbersome approach to compiling an overall picture of protein functional character, let alone stability and regulatory characteristics in vivo. For example, serine beta-lactamases have been found to exhibit very diverse primary structures and catalytic profiles, but almost all of the known three-dimensional structures for serine beta-lactamases exhibit a high degree of similarity with apparently equivalent chemical functionalities in the same strategic positions (Philippon et al. (1998) Cell Mol. Life Sci. 54:341-346).
The apparent complexity of macromolecular structure-function correlation has made random mutagenesis an attractive approach to redesigning proteins. Many of the random mutagenesis methods developed so far are designed to introduce random base-pair substitutions.
Methods of saturation mutagenesis utilizing random or partially degenerate primers that incorporate restriction sites have been described (Hill et al. (1987) Methods Enzymol. 155:558-568; Reidhaar-Olson et al. (1991) Methods Enzymol. 208:564-586; Oliphant et al. (1986) Gene 44:177-183).
Error-prone polymerase chain reaction is another methodology for randomly mutating genes by altering the concentrations of the respective dNTPs in the presence of dITP (Leung, S. et al. (1989) Nucleic Acid Res. 17:1177-1195; Caldwell and Joyce (1992) In PCR Methods Application 2:28-33; Spee et al. (1993) Nucleic Acid Res. 21:777-778).
“Cassette” mutagenesis is another method for creating libraries of mutant proteins (Huebner et al. (1988) Gene 73:319-325; Hill et al. (1987) Methods Enzymol. 155:558-568; Shiraishi and Shimura (1988) Gene 64:313-319; U.S. Pat. Nos. 5,830,720; 5,830,721; 5,830,722; 5,830,728; 5,830,740; 5,830,741; and 5,830,742). Cassette mutagenesis typically replaces a sequence block length of a template with a partially randomized sequence. The maximum information content that can be obtained is thus limited statistically to the number of random sequences in the randomized portion of the cassette.
A protocol has also been developed by which synthesis of an oligonucleotide is “doped” with non-native phosphoramidites, resulting in randomization of the gene section targeted for random mutagenesis (Wang and Hoover (1997) J. Bacteriol. 179:5812-5819). This method allows control of position selection, while retaining a random substitution rate.
Zaccolo and Gherardi describe a method of random mutagenesis utilizing pyrimidine and purine nucleoside analogs (Zaccolo and Gherardi (1999) J. Mol. Biol. 285:775-783). This method yielded substitution mutations that produced a beta-lactamase with an increased catalytic rate against the cephalosporin cefotaxime. Crea describes a “walk through” method, wherein a predetermined amino acid is introduced into a targeted sequence at pre-selected positions (U.S. Pat. No. 5,798,208).
Methods for mutating a target gene by insertion and/or deletion mutations have also been developed. It has been demonstrated that insertion mutations could be accommodated in the interior of staphylococcal nuclease (Keefe et al. (1994) Protein Sci. 3:391-401). Another insertional mutagenesis method involves a partial fragmentation by a high frequency cutting restriction endonuclease, phosphatasing, and circularizing by appropriate linkers (Fitzgerald et al. (1994) Protein Sci. 3:391-401). Examples of deletional mutagenesis methods developed include the utilization of an exonuclease (such as exonuclease III or Bal31) or through oligonucleotide directed deletions incorporating point deletions (Ner et al. (1989) Nucleic Acids Res. 17:4015-4023).
Methods have also been developed to create molecular libraries as a part of the process of engineering the evolution of molecules with desired characteristics. Termed “directed evolution” or some variant thereof, protocols describing this type of technology typically involve the reassembly of fragments of DNA, representing a “shuffled” pool, in effect accelerating the recombinatorial process that leads to molecules with desired and/or enhanced characteristics (Stemmer (1994) Nature 370:389-391; Zhang et al. (1997) Proc. Natl. Acad. Sci. 94:4504-4509). Such “directed molecular evolution” approaches have been utilized to mutagenize enzymes (Gulik and Fahl (1995) Proc. Natl. Acad. Sci. USA 92:8140-8144; Stemmer (1994) Nature 370:389-391; You and Arnold (1996) Protein Eng. 9:77-83; Zhang et al. (1997) Proc. Natl. Acad. Sci. USA 94:4504-4509), antibodies (Barbas et al. (1994) Proc. Natl. Acad. Sci. USA 91:3809-3813; Crameri et al. (1997) Nature Biotech. 15:436-438), fluorescent proteins (Heim and Tsien (1996) Curr. Biol. 6:178-182; Siemering et al. (1996) Curr. Biol. 6:1653-1663), and entire operons (Crameri et al. (1996) Nature Med. 2:100-102).
The present invention provides methods of random mutagenesis that facilitate random truncation, insertion, deletion and substitution of a target polynucleotide using partially random-sequenced oligonucleotides. The methods can be employed to generate random libraries of polynucleotides and polypeptides which can be screened for clones that exhibit desired biological characteristics (e.g. stability, solubility, catalytic activity, catalytic specificity, binding affinity and specificity, etc.) under a specified environment.
In one embodiment, a method is provided for producing mutagenized polynucleotide from a target sequence comprising:
(a) taking a sample comprising
(i) a target sequence including a section to be mutagenized,
(ii) a first primer where the first primer includes a first fixed sequence and a first unknown sequence 3′ to the first fixed sequence, and
(iii) a second primer where the second primer includes a second fixed sequence that differs from the first fixed sequence, and a second unknown sequence 3′ to the second fixed sequence;
(b) performing one or more cycles of primer extension amplification on the sample in the presence of at least one polymerase such that the first primer is extended relative to the target sequence; and
(c) performing one or more additional cycles of primer extension amplification on the sample such that the second primer is extended relative to the first primer that was extended in step (b) to form the mutagenized polynucleotide.
According to the above method, the first and the second primer may optionally include a portion which is complementary to the target sequence.
Also according to the above method, the terms first and second unknown sequences reflect the use of a library of first primers and a library of second primers, where the first and second unknown sequences vary within the respective libraries of first and second primers. As a result, the particular first and second unknown sequences that are employed in the method are not known in advance to the person performing the method.
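The primer design described above can be illustrated with a minimal Python sketch. It assumes primers are written 5′ to 3′, so that the unknown sequence follows the fixed sequence in the string. The function name `make_primer_library`, the two fixed sequences (chosen here only because they contain the BamHI site GGATCC and the EcoRI site GAATTC), the tail length and the library size are all hypothetical illustration values, not taken from the specification.

```python
import random

BASES = "ACGT"

def make_primer_library(fixed_seq, unknown_len, size, seed=None):
    """Build `size` primers, each a fixed 5' sequence followed by a random 3' tail.

    Primers are written 5' -> 3', so appending the tail places the unknown
    sequence 3' to the fixed sequence.
    """
    rng = random.Random(seed)
    library = []
    for _ in range(size):
        tail = "".join(rng.choice(BASES) for _ in range(unknown_len))
        library.append(fixed_seq + tail)
    return library

# Hypothetical fixed sequences; they differ from each other as the method requires.
FIRST_FIXED = "CGCGGATCCATG"    # assumption: contains a BamHI site (GGATCC) and ATG
SECOND_FIXED = "CGCGAATTCTTA"   # assumption: contains an EcoRI site (GAATTC)

first_library = make_primer_library(FIRST_FIXED, unknown_len=8, size=5, seed=1)
second_library = make_primer_library(SECOND_FIXED, unknown_len=8, size=5, seed=2)

for primer in first_library:
    print(primer[:len(FIRST_FIXED)], primer[len(FIRST_FIXED):])
```

Because the tails are drawn at random, the specific unknown sequence carried by any given primer is not known in advance, which is the sense in which the sequences are "unknown" in this sketch.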
In another embodiment, a method is provided for producing a library of mutagenized polynucleotides from a target sequence comprising:
(a) taking a sample comprising
(i) a target sequence including a section to be mutagenized,
(ii) a library of first primers where the first primers include a first fixed sequence and a first unknown sequence 3′ to the first fixed sequence, the first unknown sequence varying within the library of first primers, and
(iii) a library of second primers where the second primers include a second fixed sequence that differs from the first fixed sequence, and a second unknown sequence 3′ to the second fixed sequence, the second unknown sequence varying within the library of second primers;
(b) performing one or more cycles of primer extension amplification on the sample in the presence of at least one polymerase such that a member of the library of the first primers is extended relative to the target sequence; and
(c) performing one or more additional cycles of primer extension amplification on the sample such that a member of the library of the second primers is extended relative to the first primer that was extended in step (b) to form the library of mutagenized polynucleotides.
According to the above method, each of the first and second primers in the library may optionally include a portion which is complementary to the target sequence.
According to the above method, since the first and second unknown sequences vary within the respective libraries of first and second primers, the particular first and second unknown sequences that are employed in the method are not known in advance to the person performing the method.
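Steps (b) and (c) of the above methods can be sketched as a deliberately simplified Python model. It is an illustration under stated assumptions, not the actual reaction: each primer tail is assumed to anneal at the single site where it has the fewest mismatches, extension is assumed to run to the end of the template, and the second primer is assumed to anneal downstream of the first. All names (`revcomp`, `best_site`, `two_step_product`) and all sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def revcomp(seq):
    """Reverse complement of a DNA string written 5' -> 3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def best_site(template, probe):
    """Return (start, mismatches) of the template window most similar to probe.

    Ties are resolved in favor of the leftmost window.
    """
    if len(probe) > len(template):
        raise ValueError("probe is longer than template")
    best_start, best_mm = 0, len(probe) + 1
    for start in range(len(template) - len(probe) + 1):
        window = template[start:start + len(probe)]
        mismatches = sum(a != b for a, b in zip(window, probe))
        if mismatches < best_mm:
            best_start, best_mm = start, mismatches
    return best_start, best_mm

def two_step_product(target, fixed1, tail1, fixed2, tail2):
    """Sense-strand sequence of the product after steps (b) and (c)."""
    # Step (b): the first primer's tail resembles the sense strand at position i,
    # so it pairs with the antisense strand there and is extended to the end.
    i, _ = best_site(target, tail1)
    downstream = target[i + len(tail1):]
    # Step (c): the second primer pairs with the extended first primer; on the
    # sense strand its footprint is the reverse complement of its tail.
    footprint = revcomp(tail2)
    j, _ = best_site(downstream, footprint)
    return fixed1 + tail1 + downstream[:j] + footprint + revcomp(fixed2)

# Hypothetical 32-nt target and primers (fixed sequences carry BamHI / EcoRI sites).
TARGET = "AAAACCCCGGGGTTTTACGTACGTTGCATGCA"
FIXED1, FIXED2 = "GGATCC", "GAATTC"
product = two_step_product(TARGET, FIXED1, "CCCGG", FIXED2, "GCAAC")
print(product)
print(len(TARGET), "nt target ->", len(product) - len(FIXED1) - len(FIXED2), "nt retained")
```

In this example the product keeps only an internal stretch of the target, flanked by the two fixed sequences, which is how a truncated sequence arises in the model. Running the same function over every pairing of first and second primers drawn from two libraries would give products of varying length.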
In yet another embodiment, a method is provided for producing a library of mutagenized polynucleotides from a target sequence comprising:
(a) taking a sample comprising
(i) a target sequence including a section to be mutagenized,
(ii) a library of first primers where the first primers include a first fixed sequence and a first unknown sequence 3′ to the first fixed sequence, the first unknown sequence varying within the library of first primers, and
(iii) a library of second primers where the second primers include a second fixed sequence that differs from the first fixed sequence;
(b) performing one or more cycles of primer extension amplification on the sample in the presence of at least one polymerase such that a member of the library of the first primers is extended relative to the target sequence; and
(c) performing one or more additional cycles of primer extension amplification on the sample such that a member of the library of the second primers is extended relative to the first primer that was extended in step (b) to form the library of mutagenized polynucleotides.
According to this embodiment, the second fixed sequence of the second primer may be substantially homologous to a portion of the target sequence, such that the members of the resulting library of mutagenized polynucleotides are amplification products of the target sequence truncated at one end.
Methods are also provided for producing mutagenized polypeptides from a target sequence by forming a library of mutagenized polynucleotides according to any of the above methods and expressing polypeptides from the library of mutagenized polynucleotides.
According to any of the above methods, the target sequence may have a sequence which is known or partially or completely unknown. Optionally, the target sequence is a DNA sequence encoding a portion of an antibody such as the complementarity-determining region (CDRs, e.g. the variable regions of the heavy chain or the light chain), and more preferably a single chain antibody including the variable regions of the heavy chain and the light chain of an antibody.
According to any of the above methods, the target sequence may be a member of a library of DNA sequences that have conserved regions and hypervariable regions. For example, the target sequence is a member of a library of DNA sequences encoding an antibody library, in particular, a single chain antibody library.
Also according to any of the above methods, each of the first and second fixed sequences preferably includes at least one restriction site, which facilitates subcloning in an expression vector, and the ultimate synthesis of RNA and polypeptides from the polynucleotides produced according to the methods. The synthesis of RNA and polypeptides can be performed in vitro or in vivo in transformed or transfected host cells.
Also according to any of the above methods, one of the first and second fixed sequences may include a “start” codon sequence (e.g. ATG or GTA) and the other of the first and second fixed sequences may include a sequence encoding one or more translation stop codons.
Also according to any of the above methods, the lengths of the first and second primers may optionally be between 10 and 80 nucleotides, preferably between 12 and 60 nucleotides and more preferably between 15 and 40 nucleotides. Optionally, the first and second primers may include one or more inosines at the penultimate and ultimate positions of the 3′ end.
Also according to any of the above methods, the unknown sequences are preferably at least partially unknown. More specifically, a first portion of the unknown sequences may be fixed within the library and a second portion may vary within the library. In a preferred embodiment, the unknown sequence further includes a sequence encoding one or more specific amino acid residues such as the conserved amino acid residues of the protein encoded by the target sequence.
The unknown sequences of the first and second primers may optionally be synthetic and may be synthesized by randomly incorporating A, T, G, C, I or U.
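A partially unknown sequence, in which one portion is fixed and another varies, can be sketched as a pattern that is filled in at random. In the Python sketch below, the letter `N` marks a position that varies within the library and every other letter is kept as written. The function name `sample_unknown`, the pattern `NNNTGGNNN` (with the tryptophan codon TGG standing in for a conserved residue) and the choice of a four-letter alphabet are assumptions made for illustration; the alphabet argument could be widened to include I or U as the text permits.

```python
import random

def sample_unknown(pattern, alphabet="ACGT", rng=None):
    """Fill each 'N' in pattern with a random base; keep all other letters fixed."""
    rng = rng or random.Random()
    return "".join(rng.choice(alphabet) if ch == "N" else ch for ch in pattern)

# Hypothetical pattern: three varying positions, a fixed TGG codon
# (tryptophan), then three more varying positions.
PATTERN = "NNNTGGNNN"
rng = random.Random(42)
tails = [sample_unknown(PATTERN, rng=rng) for _ in range(5)]
for tail in tails:
    print(tail)
```

Every sampled tail keeps the fixed codon in place while the flanking positions differ from primer to primer.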
The first and second unknown sequences in the above methods preferably have a length between 3 and 70 nucleotides, more preferably between 4 and 50 nucleotides, and most preferably between 5 and 15 nucleotides.
Also according to any of the above methods, the sample preferably includes the first primer at a concentration approximately equivalent to the concentration of the second primer. The concentrations of the first and second primers are each independently preferably between about 0.01 and 100 μM, more preferably between about 0.1 and 10 μM, and most preferably between about 0.2 and 1.0 μM.
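The stated lengths and concentrations can be related by a short calculation: the number of primer molecules in a reaction compared with the number of distinct unknown sequences of a given length. The Python sketch below assumes a fully random unknown sequence over four bases, equal representation of every sequence, and a 50 μL reaction volume. The volume and the function names are assumptions and do not come from the specification.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

def primer_molecules(conc_uM, volume_uL):
    """Number of primer molecules at a given concentration (uM) and volume (uL)."""
    moles = (conc_uM * 1e-6) * (volume_uL * 1e-6)  # mol/L times L
    return moles * AVOGADRO

def distinct_sequences(unknown_len, alphabet_size=4):
    """Number of possible fully random sequences of the given length."""
    return alphabet_size ** unknown_len

def copies_per_sequence(conc_uM, volume_uL, unknown_len, alphabet_size=4):
    """Average copies of each distinct unknown sequence, assuming equal representation."""
    return primer_molecules(conc_uM, volume_uL) / distinct_sequences(unknown_len, alphabet_size)

# Assumed 50 uL reaction at the low end of the most preferred concentration range.
for length in (5, 10, 15):
    print(length, distinct_sequences(length), f"{copies_per_sequence(0.2, 50, length):.3g}")
```

Under these assumptions, even a 15-nucleotide unknown sequence is represented by thousands of copies of each possible sequence at 0.2 μM, so the whole sequence space can in principle be sampled within a single reaction.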
Also according to any of the above methods, the sample preferably includes salts such as NaCl and Mg2+ or any other components which facilitate desirable reaction characteristics.
Also according to any of the above methods, at least a portion of the multiple cycles of primer extension polymerase amplification may be performed such that extension by the polymerase is at least partially performed at a temperature below 70° C. for at least 30 sec.
Also according to any of the above methods, at least a portion of the multiple cycles of primer extension polymerase amplification may be performed such that extension by the polymerase is at least partially performed at a temperature below 60° C. for at least 30 sec.
Also according to any of the above methods, at least a portion of the multiple cycles of primer extension polymerase amplification may be performed such that extension by the polymerase is at least partially performed at a temperature below 50° C. for at least 30 sec.
Also according to any of the above methods, at least a portion of the multiple cycles of primer extension polymerase amplification may be performed such that extension by the polymerase is performed by heating the amplification reaction mixture from a temperature between about 30° C. and 60° C. to a temperature between about 65° C. and 75° C. for at least 30 sec.
Also according to any of the above methods, at least a portion of the multiple cycles of primer extension polymerase amplification may be performed by ramping the temperature from between about 30° C. and 60° C. to a temperature between about 65° C. and 75° C. for at least 1 min.
Also according to any of the above methods, at least a portion of the multiple cycles of primer extension polymerase amplification may be performed by ramping the temperature from between about 30° C. and 60° C. to a temperature between about 65° C. and 75° C. for at least 1 min, wherein the incubation time after each ramp is shorter than that of the previous ramp.
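The ramped extension conditions described above can be written out as a cycling schedule. The following Python sketch is one possible reading: each cycle ramps from a start temperature between 30° C. and 60° C. to an end temperature between 65° C. and 75° C., and the incubation after each ramp is shorter than the one before while staying at or above 1 min. The default temperatures, the first hold time, the decrement and the function name `ramp_schedule` are hypothetical values chosen for illustration.

```python
def ramp_schedule(cycles, start_c=37.0, end_c=72.0,
                  first_hold_s=120, decrement_s=10, min_hold_s=60):
    """Return one dict per cycle with ramp temperatures and a decreasing hold time."""
    if not 30.0 <= start_c <= 60.0:
        raise ValueError("start temperature should lie between 30 and 60 C")
    if not 65.0 <= end_c <= 75.0:
        raise ValueError("end temperature should lie between 65 and 75 C")
    if decrement_s <= 0:
        raise ValueError("decrement must be positive so that holds get shorter")
    last_hold = first_hold_s - (cycles - 1) * decrement_s
    if cycles < 1 or last_hold < min_hold_s:
        raise ValueError("hold time would drop below the minimum")
    return [
        {
            "cycle": n + 1,
            "ramp_from_c": start_c,
            "ramp_to_c": end_c,
            "hold_s": first_hold_s - n * decrement_s,
        }
        for n in range(cycles)
    ]

schedule = ramp_schedule(cycles=5)
for step in schedule:
    print(step)
```

The function rejects parameter combinations that would push a hold below the 1 min floor, rather than silently clamping it, so that every hold in the returned schedule is strictly shorter than the previous one.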
Also according to any of the above methods, it is noted that the first and second primers may anneal to any portion of the target sequence. After at least one cycle of primer extension amplification, a truncated sequence of the target sequence is synthesized. When libraries of the first and second primers are included in the amplification reaction, truncated sequences of various lengths can be synthesized after at least one cycle of primer extension amplification.
Also according to any of the above methods, it is noted that the random sequence included in the first and second primers may anneal to the target sequence to form an imperfect double-stranded sequence during the at least one cycle of primer extension amplification. Such an imperfect double-stranded sequence may include mismatches, bulges or loops which may result in insertion, deletion and substitution of the target sequence.
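The relationship between an imperfect duplex and the resulting mutation can be illustrated by comparing a primer tail with the target site it anneals to. The Python sketch below uses the standard-library `difflib.SequenceMatcher` as a stand-in for the pairing: a position that differs is read as a mismatch (substitution), an extra base in the primer as a bulge in the primer (insertion), and a target base with no partner as a bulge in the target (deletion). This is a string comparison, not a model of hybridization, and the function name and sequences are hypothetical.

```python
from difflib import SequenceMatcher

def classify_changes(target_site, primer_tail):
    """List the changes the primer tail would introduce relative to the target site.

    Both sequences are written in the same (sense) orientation.
    """
    matcher = SequenceMatcher(None, target_site, primer_tail, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            changes.append(("substitution", target_site[i1:i2], primer_tail[j1:j2]))
        elif tag == "delete":
            changes.append(("deletion", target_site[i1:i2], ""))
        elif tag == "insert":
            changes.append(("insertion", "", primer_tail[j1:j2]))
    return changes

SITE = "GATTACAGC"  # hypothetical target site
print(classify_changes(SITE, "GATTGCAGC"))   # one mismatched position
print(classify_changes(SITE, "GATTATCAGC"))  # extra base in the primer
print(classify_changes(SITE, "GATTCAGC"))    # target base without a partner
```

Because the extended primer becomes part of the product, whatever differences the tail carries relative to the target site are copied into the mutagenized polynucleotide in subsequent cycles.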
Also according to any of the above methods, it is noted that the library of mutagenized polynucleotides formed may include homologs of the truncated sequences of the target sequence which include at least two sequences from the library of the first or second primers.
Also according to any of the above methods, it is noted that the library of mutagenized polynucleotides formed may include homologs of the truncated sequences of the target sequence where at least two portions of the truncated sequences of the target sequence have been deleted.
Also according to any of the above methods, it is noted that the library of mutagenized polynucleotides formed may include homologs of the target sequence where at least a portion of the mutagenized polynucleotides have been mutagenized at one or more separate locations on the target sequence.
The present invention also relates to reagents for performing the various methods of the present invention. For example, the reagents may be a first primer, a library of first primers, a second primer, and a library of second primers. The present invention may also include other reagents disclosed herein.
The present invention also relates to kits for performing the various methods of the present invention. The kits may include any two or more reagents employed in these methods, including, for example, a first primer, a library of first primers, a second primer, a library of second primers, one or more polymerases, and other reagents and buffers which may be used to employ these methods. In one embodiment, the kit includes a first primer and a second primer. In another embodiment, the kit includes a library of first primers and a library of second primers.