Methods of preparing DNA of mixed composition are becoming increasingly important in the study of biomolecular function as well as in the search for substances with new and useful properties. As DNA synthesis technology improved in the early 1980s, it became feasible to perform multiple syntheses as a means of generating mixtures of oligonucleotides. In principle, large and diverse collections could be made in multiple syntheses. In practice, several investigators realized that large numbers of oligonucleotides could be generated in a single synthesis by coupling mixtures of mononucleotides instead of unique monomer building blocks. The complexity of the resulting collection or "library" of oligonucleotides is determined by the number of monomers coupled and the number of sites at which mixtures of monomers are introduced (for example, an equal mixture of all four monomers at n sites gives 4^n sequences).
Oligonucleotides of mixed composition are increasingly being used in protein mutagenesis for the study of structure and function. By expressing DNA sequences of mixed composition, a corresponding library of mutant proteins is generated. Allied with appropriate screening techniques, such libraries can be searched for substances with altered properties, and are therefore useful in the study of biomolecular function. The most general class of mutagenesis methods employs oligonucleotides based on the sequence of the wild-type gene, and incorporating modifications that will eventually give rise to any desired amino acid sequence changes. These methods were recently reviewed in the August 1991 issue of Current Opinion in Structural Biology.
Virtually all genetic studies of protein structure and activity employ substitution mutations: one or several amino acid side chains are replaced, but the length of the protein and the spacing of residues are conserved. In order to facilitate the generation of large numbers of substitution mutations in a single experiment, a number of prior art techniques have been developed (for review, see Botstein, D. & Shortle, D. (1985) Science 229, 1193-1201 and Zoller, M. J. (1991) Curr. Opin. Struct. Biol. 1, 605-610), the most popular of which involve the chemical synthesis of complex mixtures of oligonucleotides which are used either as mutagenic primers for DNA synthesis (see Hermes, J. D., Parekh, S. M., Blacklow, S. C., Koster, H., & Knowles, J. R. (1989) Gene 84, 143-151) or as mutagenic duplex fragments for ligation to restriction fragments (see Matteucci, M. D. & Heyneker, H. L. (1983) Nucl. Acids Res. 11, 3113-3121). To generate the required single amino acid substitutions, each monomer used for oligonucleotide synthesis is "doped" with small amounts of the three non-wild-type mononucleotides. In principle, this method can provide every possible nucleotide substitution in a gene segment in a single experiment. Since the distribution of nucleotide substitutions will follow Poisson statistics, two mononucleotide replacements in the same codon will be relatively rare at levels of doping that give one or just a few amino acid substitutions per mutant gene. Consequently, for practical purposes, this strategy for generating mixtures of mutagenic oligonucleotides can be expected to yield only about one third of all possible amino acid substitutions, with the types of amino acid substitutions induced at a particular position being determined by the sequence of the wild-type codon. It should also be noted that prior art monomer doping of oligonucleotides cannot be used to induce other types of changes in DNA sequence, such as insertions or deletions.
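The estimate of roughly one third can be checked with a short calculation. The following Python sketch is illustrative only and forms no part of the cited prior art. It assumes the standard genetic code, treats doping as independent at each nucleotide (a simple binomial model standing in for the Poisson description above), and uses hypothetical names (CODON_TABLE, reachable, mean_fraction_reachable, single_hit_probability, multi_hit_probability) and a hypothetical doping rate chosen purely for illustration.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reachable(codon):
    """Amino acids obtainable from `codon` by exactly one nucleotide change.

    Synonymous changes and changes to a stop codon are not counted.
    """
    wild_type = CODON_TABLE[codon]
    found = set()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            amino_acid = CODON_TABLE[mutant]
            if amino_acid not in ("*", wild_type):
                found.add(amino_acid)
    return found


def mean_fraction_reachable():
    """Average, over the 61 sense codons, of the fraction of the 19
    alternative amino acids reachable by one nucleotide change."""
    sense = [c for c, aa in CODON_TABLE.items() if aa != "*"]
    return sum(len(reachable(c)) for c in sense) / (19 * len(sense))


def single_hit_probability(p):
    """Probability that exactly one of the three codon positions is replaced,
    assuming an independent per-nucleotide replacement rate `p`."""
    return 3 * p * (1 - p) ** 2


def multi_hit_probability(p):
    """Probability that two or three positions of one codon are replaced."""
    return 1 - (1 - p) ** 3 - single_hit_probability(p)


if __name__ == "__main__":
    print("ATG (Met) ->", sorted(reachable("ATG")))
    print("mean fraction reachable:", round(mean_fraction_reachable(), 3))
    p = 0.02  # hypothetical doping rate, for illustration only
    print("single hit:", single_hit_probability(p))
    print("multiple hits:", multi_hit_probability(p))
```

For the methionine codon ATG, for instance, the sketch finds six reachable amino acids (I, K, L, R, T and V) out of the nineteen alternatives; the remaining thirteen require two or three replacements within the same codon, which the binomial term shows to be much rarer than a single replacement at low doping rates.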
A related application of mixed DNA synthesis uses vast collections of diverse oligonucleotides in processes directed at discovering substances with new and useful properties. Libraries of peptides (Cwirla, S. E., Peters, E. A., Barrett, R. W., & Dower, W. J. (1990) Proc. Natl. Acad. Sci. USA 87, 6378-6382), RNA (Tsai, D., Kenan, D., & Keene, J. (1992) Proc. Natl. Acad. Sci. USA 89, 8864-8868) and DNA (Bock, L., Griffin, L., Latham, J., Vermaas, E., & Toole, J. (1992) Nature 355, 564-566), all of which were generated from collections of oligonucleotides prepared by mixed monomer synthesis, have been screened to locate molecules which bind to particular target substances. In this approach, the utility of peptide libraries is critically dependent on the way in which the oligonucleotide mixture is generated. This arises because of the degeneracy of the genetic code: amino acids are not represented by equal numbers of trinucleotide codons, some amino acids being encoded by only one codon, some by as many as six. Therefore, although oligonucleotides prepared from equal mixtures of all four monomers may contain each of the 64 trinucleotides, the encoded amino acids are represented unevenly, and "stop" codons are unavoidably generated. As a result, amino acids which are encoded by the largest number of codons are over-represented at the expense of those encoded by only one or two codons. By way of example, if a particular type of mutation is desired (for example, substitution of only hydrophobic amino acids), the resulting library will contain a high proportion of undesired species. This drawback is particularly critical as the number of positions at which substitutions are made increases.
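The uneven representation described above can be made concrete by counting codons. The following Python sketch is illustrative only; it assumes the standard genetic code and an exactly equal mixture of the four monomers at every position, and the names CODON_TABLE, codon_counts and fraction_without_stop are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_counts():
    """Number of codons specifying each amino acid ('*' denotes stop)."""
    return Counter(CODON_TABLE.values())


def fraction_without_stop(n_codons):
    """Fraction of sequences with no stop codon when `n_codons` consecutive
    codons are each drawn uniformly from all 64 trinucleotides."""
    stops = codon_counts()["*"]
    return ((64 - stops) / 64) ** n_codons


if __name__ == "__main__":
    # With an equal mixture, each amino acid appears in proportion to its
    # codon count, so six-codon residues occur six times as often as
    # single-codon residues.
    for amino_acid, n in sorted(codon_counts().items(), key=lambda kv: -kv[1]):
        print(amino_acid, n, f"{n / 64:.3f}")
    print("no stop in 10 codons:", round(fraction_without_stop(10), 3))
```

Under these assumptions leucine, serine and arginine (six codons each) are each six times as frequent as methionine or tryptophan (one codon each), and three of every 64 trinucleotides are stop codons.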
In an attempt to improve the efficiency of synthesizing mixed DNA sequences for preparation of peptide and protein libraries, schemes have been introduced in which monomers are mixed in a rational manner. For example, Youvan has calculated optimal mixtures of monomers for specifying particular subsets of amino acids (Arkin, A. P., & Youvan, D. C. (1992) Bio/Technology, 10, 297-300). Use of these mixtures increases the proportion of desired amino acids in a peptide or protein library. It does not, however, preclude generating undesired substitutions arising from particular combinations of monomers. Even with this method, the desired substitutions are usually a fraction of those introduced at each site. Consequently, as the number of sites altered increases, the proportion of desired mutants in the library decreases.
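The decline in the proportion of desired mutants can be illustrated numerically. The following Python sketch is illustrative only; it assumes that the altered sites are independent and that the same fraction of desired substitutions applies at every site. The function name desired_fraction and the per-site value of 0.5 are hypothetical and are not taken from the cited work.

```python
def desired_fraction(per_site_fraction, n_sites):
    """Fraction of library members carrying a desired substitution at every
    one of `n_sites` independently altered sites."""
    if not 0.0 <= per_site_fraction <= 1.0:
        raise ValueError("per_site_fraction must lie between 0 and 1")
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    return per_site_fraction ** n_sites


if __name__ == "__main__":
    # Hypothetical per-site fraction of 0.5, for illustration only.
    for n in (1, 3, 6, 10):
        print(n, desired_fraction(0.5, n))
```

Because the per-site fractions multiply, any per-site fraction below one causes the desired portion of the library to fall geometrically as further sites are altered.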
In recognition of the problems associated with the use of mixtures of monomers, Huse has described a method (disclosed in WO 92/03461) in which DNA synthesis is performed so as to emulate multiple syntheses. This is achieved by carrying out the synthesis on multiple solid supports which can be mixed and re-divided when necessary. In this way, diverse mixtures of oligonucleotides can be made using monomers, and the problems associated with the degeneracy of the genetic code avoided. The method has two disadvantages: (i) for each synthesis, labour-intensive dividing and re-mixing of support material is required, and (ii) the total number of different sequences which can be synthesized is limited by the number of physically separable supports used in the synthesis, which is typically of the order of 10^8.
In summary, existing methods of synthesis of multiple DNA sequences suffer several disadvantages:
1. Although every possible nucleotide substitution can be generated using oligonucleotides doped with mixed monomers, contiguous two and three mononucleotide substitutions are extremely uncommon. This is disadvantageous with regard to protein mutagenesis since each amino acid in a protein is specified by three contiguous nucleotides, and this strategy can efficiently generate only approximately one third of all possible amino acid substitutions for each wild-type amino acid in a single synthesis.
2. No strategy involving the synthesis of mixtures of oligonucleotides, as taught by the prior art, allows for the generation of mutant proteins with insertions of one or more codons at more than a single site in the synthesized oligonucleotide.
3. The degeneracy of the genetic code means that any mixture of mononucleotides used in mixed DNA synthesis unavoidably gives oligonucleotides containing undesired codons or does not provide all desired codons. This problem becomes critical as the number of positions at which mixtures are introduced increases.
4. Methods which simulate multiple syntheses are labour intensive, and the diversity of sequences which can be generated is limited by the number of physically separable supports used.
The present invention provides solutions to these problems and enables the preparation of mixed oligonucleotides with a multitude of applications in modern molecular biology. For example, mixed oligonucleotides prepared according to the present invention can be used to generate genes encoding peptide and/or protein libraries. Additionally, trinucleotides are useful in preparing degenerate primers for the polymerase chain reaction. The invention is particularly useful for protein mutagenesis; single-stranded mutagenesis primers and double-stranded "cassettes" encoding any combination of amino acids can be readily prepared by applying the method disclosed herein. The present invention also enables substitution, insertion, and deletion mutagenesis.
The method relies on the use of pre-synthesized oligonucleotides and, additionally, specially protected mono- and oligonucleotides, which are compatible with the most efficient methods of DNA synthesis. Trinucleotide building blocks have been used previously in DNA synthesis (see, for example, Hirose, T., Crea, R., & Itakura, K. (1978) Tet. Lett., 2449-2452; Miyoshi, K., Miyake, T., Hozumi, T., & Itakura, K. (1980) Nucl. Acids Res., 8, 5473-5489) when stepwise coupling yields were low and it was more desirable to incorporate the largest possible oligonucleotide blocks at each step. This earlier work differs from the present invention in that (i) it relied on inefficient and outdated phosphotriester chemistry and would therefore not allow multiple couplings, (ii) it was not directed at generating diverse and useful collections of mixed oligonucleotides, and (iii) it did not enable insertion and deletion mutagenesis.