The present invention relates generally to the manipulation of genetic materials and more particularly to the use of recombinant procedures to selectively modify double-stranded DNA sequences, especially cDNA sequences, for storage and incorporation into expression vectors.
The development of specific DNA sequences for insertion into DNA vectors in an attempt to secure microbial expression of polypeptides encoded thereby is accomplished by a variety of techniques. Three principal methods for obtaining DNA sequences are: (1) the chemical manufacture of DNA sequences; (2) the isolation of double-stranded DNA sequences from donor DNA; and (3) the in vitro synthesis of double-stranded DNA sequences by reverse transcription of messenger RNA isolated from donor cells. The last of these methods involves formation of a DNA complement of messenger RNA and is referred to as a "cDNA" or "copy DNA" method.
Chemical manufacture of polypeptide-specifying DNA sequences is clearly a method of choice, provided the amino acid sequence of the polypeptide to be microbially expressed is known and efficient synthetic procedures can be applied in the assembly of the sequence. (See, e.g., Stabinsky, U.S. patent application Ser. No. 375,493, filed May 6, 1982; and Alton, et al., U.S. patent application Ser. No. 375,494, filed May 6, 1982).
When double-stranded DNA sequences are chemically manufactured, it is seldom the case that significant difficulties are encountered either in storing the manufactured sequences in a convenient vector (such as the E. coli DNA plasmid, pBR322) or in inserting the sequences into functional expression vectors wherein their microbial transcription into mRNA is placed under the control of selected promoter/regulator sequences, transcription termination sequences, and the like. This is so because designs for chemical manufacture of polypeptide synthesis-directing DNA sequences can rather easily be adapted to incorporate initial and terminal sequences specifying, e.g., microbial mRNA translation initiation sequences, microbial mRNA translation termination sequences, and DNA sequences providing recognition sites for restriction endonuclease enzyme cleavage which will facilitate storage and/or insertion of the sequence into selected expression vectors.
Where the sequence of the polypeptide to be expressed is unknown, and in cases when chemical manufacturing procedures are not efficiently practicable, resort must be made to donor DNA isolation and cDNA procedures to obtain desired DNA sequences. Such sequences are seldom in condition for ready incorporation into microbial expression vectors. It is almost invariably the case that the isolated sequence must be processed at its 5' end either to delete base pairs coding for an undesired polypeptide leader region or to insert base pairs coding for microbial translation initiation. Similarly, where the isolated sequence includes at its 3' end base pairs coding for an undesired terminal polypeptide region, these must ordinarily be deleted and an appropriate translation termination sequence must be inserted. Further, it is almost invariably the case that DNA sequences will need to be added to the isolated sequence to allow the storage (and amplification) in a vector or the insertion of the isolated sequence into an expression vector in the correct reading frame and location relative to a promoter/regulator region and/or transcription termination sequence.
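By way of illustration only, the terminal processing described above (deletion of leader codons at the 5' end, provision of a translation initiation codon, and provision of a translation termination codon in the correct reading frame) may be sketched at the sequence level as follows. The function names, the `leader_codons` parameter, and the choice of TAA as the termination codon are assumptions made for this sketch and are not taken from any cited procedure; the sketch models only the resulting sequence, not the laboratory manipulations needed to obtain it.

```python
# Illustrative sketch only: names and defaults are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # termination codons of the standard genetic code
START_CODON = "ATG"                  # translation initiation codon


def split_codons(seq):
    """Split a sequence whose length is a multiple of three into codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def prepare_coding_region(coding, leader_codons=0, stop_codon="TAA"):
    """Return a coding region trimmed of its leader and framed by start/stop codons.

    coding        -- sense-strand coding sequence, read 5' to 3', in frame
    leader_codons -- number of leader codons to delete from the 5' end
    stop_codon    -- termination codon to place at the 3' end
    """
    coding = coding.upper()
    if len(coding) % 3 != 0:
        raise ValueError("coding region is not a whole number of codons")
    if stop_codon not in STOP_CODONS:
        raise ValueError("stop_codon is not a termination codon")

    # Delete the undesired leader codons from the 5' end.
    body = split_codons(coding)[leader_codons:]

    # Drop an existing terminal stop codon so that exactly one is present at the end.
    if body and body[-1] in STOP_CODONS:
        body = body[:-1]

    # An in-frame stop inside the region would truncate the polypeptide.
    if any(codon in STOP_CODONS for codon in body):
        raise ValueError("in-frame stop codon inside the coding region")

    # Supply a translation initiation codon if the trimmed region lacks one.
    if not body or body[0] != START_CODON:
        body = [START_CODON] + body

    return "".join(body) + stop_codon
```

For example, `prepare_coding_region("ATGAAACCCGGGTTT", leader_codons=2)` removes the first two codons, supplies an initiation codon, and appends a termination codon. Working in whole codons keeps the retained sequence in its original reading frame.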
A number of procedures have been developed in the art for performing selective modifications on double-stranded sequences. U.S. Pat. No. 4,342,832, for example, describes construction of cloning vehicles wherein a cDNA gene coding for a desired polypeptide is placed under the control of a selected expression promoter. Briefly stated, the cDNA sequence is analyzed for the presence of a unique restriction endonuclease recognition site near the 5' end and cleaved at such a site, effectively deleting undesired sequences (5' to the structural gene) along with at least a part of the desired polypeptide coding region. A chemically manufactured DNA sequence is then employed as a replacement for the lost polypeptide coding region and this manufactured sequence will ordinarily include a translation initiating sequence and desired recognition sites to facilitate incorporation into an expression vector. The same types of manipulations can be performed to secure modification of the 3' end of the cDNA sequence.
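The sequence analysis step referred to above, in which a cDNA sequence is examined for a restriction endonuclease recognition site occurring only once and lying near the 5' end, may be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the small table of enzymes, and the `window` parameter defining "near the 5' end" are hypothetical choices for this sketch and do not reproduce the method of the cited patent. The four recognition sequences listed are palindromic, so a search of one strand suffices for them.

```python
# Illustrative sketch only: helper names, enzyme table, and window size are assumptions.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
}


def find_all(seq, site):
    """Return the 0-based start position of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # step by one to catch overlapping hits
    return positions


def unique_sites_near_5prime(seq, window=60, sites=RECOGNITION_SITES):
    """Map enzyme name to site position for sites that occur exactly once in seq
    and begin within the first `window` bases of the 5' end."""
    seq = seq.upper()
    result = {}
    for enzyme, site in sites.items():
        hits = find_all(seq, site)
        if len(hits) == 1 and hits[0] < window:
            result[enzyme] = hits[0]
    return result
```

A sequence containing one EcoRI site and two BamHI sites would report only the EcoRI site, since cleavage at a repeated site would fragment the sequence at more than one location. An empty result corresponds to the situation in which no unique site is available.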
In the absence of one or more unique recognition sites in the DNA sequence to be modified, of course, the procedures of U.S. Pat. No. 4,342,832 cannot be performed. Further, even where such recognition sites are available, practice of the methods may require chemical synthesis of very long double-stranded DNA sequences as replacements for deleted polypeptide coding sequences and may therefore involve nearly as much work as total gene synthesis.
Of interest to the background of the invention are those publications treating the use of single-stranded DNA "primers" for isolating selected cDNA sequences and for effecting nucleotide base changes in central and terminal portions of double-stranded DNA sequences. See, e.g., Montell, C., et al., Nature, 295: 380-384 (1982); Gillam, S., et al., Gene, 12: 129-137 (1980); Goeddel, D., Nucleic Acids Research, 8: 4057-4074 (1980); and Hu, N., et al., Gene, 17: 271-277 (1982).
Of particular interest to the background of the invention are recently-published European Patent Application Nos. 054330 and 054331, which illustrate use of primers in methods for modifying double-stranded cDNA sequences. According to the illustrated methods, the DNA sequence coding for "mature" thaumatin is isolated from the central portion of a larger sequence coding for preprothaumatin by a series of manipulations involving use of single-stranded primers. The single-stranded primers are hybridized to single-stranded portions of the mature thaumatin sequence which have been isolated from the preprothaumatin double-stranded sequence by, e.g., denaturation. DNA polymerase extension and S1 endonuclease digestion are employed to provide a sequence having, at its 5' end, codons specifying the initial amino acids of mature thaumatin and, at its 3' end, a translation termination codon. In order to incorporate such a modified sequence into an expression vector, it must be immediately associated by blunt-end ligation with one or more synthetic, double-stranded, "linker" DNA sequences providing a translation initiation codon and/or a recognition site suited for proper insertion into the expression vector. (See FIG. 4 of 054330 and FIG. 13 of 054331). Significant disadvantages attend use of the procedures illustrated. First, no means is provided for developing a selected terminal double-stranded sequence apart from performing a separate reaction with a linker. Second, no means is provided for "storage" or amplification of the modified double-stranded DNA sequence. Each modifying procedure must therefore be followed at its completion by association with a synthetic linker. If insertion into two different expression vectors is desired, for example, the modification procedures must essentially be duplicated in their entirety.
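The hybridization of a single-stranded primer to a denatured strand, as referred to above, depends on base complementarity, and may be illustrated by the following sketch. It assumes perfect complementarity over the full primer length and ignores hybridization conditions entirely; the function names and the toy sequences are hypothetical and are not drawn from the cited applications.

```python
# Illustrative sketch only: perfect-match annealing on a toy template.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def annealing_sites(template, primer):
    """Return 0-based positions on the template (read 5' to 3') at which the
    primer can anneal with perfect complementarity.

    A primer anneals wherever the template contains the primer's reverse
    complement, since the two strands pair in antiparallel orientation.
    """
    template = template.upper()
    target = reverse_complement(primer)
    positions = []
    start = template.find(target)
    while start != -1:
        positions.append(start)
        start = template.find(target, start + 1)
    return positions
```

For a template strand `ATGGCTACCTTCGAG`, a primer intended to pair with the bases `ACCTTC` would be written `GAAGGT`, and `annealing_sites` reports the position at which that stretch begins on the template.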
That the above-noted examples of procedures for selective modification of double-stranded DNA sequences are not readily and easily applied is evidenced by the fact that numerous cDNAs coding for commercially important polypeptides have been isolated and sequenced but have not as yet been successfully employed to effect microbial expression of the polypeptides. As one example, Sasavage, et al., J. Biol. Chem., 257: 678-681 (1982) reports on the preparation, isolation, and sequencing of a double-stranded DNA coding for the 199 amino acid polypeptide sequence of bovine prolactin along with an approximately 30 amino acid leader region. However, the authors did not report microbial expression of bovine prolactin.
There exists, therefore, a substantial need in the art for improved methods and materials for selective modification of double-stranded DNA sequences, especially cDNAs, allowing for their storage and their incorporation into expression vectors. Such methods could be illustratively applied to readily secure the microbial expression of commercially significant polypeptides such as bovine prolactin.