A. Field Of The Invention
This invention relates to recombinant-DNA technology. More specifically, it relates to linkers which can be used to insert specified restriction sites into a gene sequence, and to methods which permit the use of these linkers.
B. Description Of The Art
The inventor expresses his gratitude to The Helen Hay Whitney Foundation for its financial support, and to Johns Hopkins University for the use of their laboratories in connection with this invention.
Enzymatic cleavage and joining of DNA is of central importance to the generation of recombinant-DNA molecules. However, a limiting factor in many cloning strategies was that researchers had to rely on only those restriction sites provided by nature. This was a problem because in some cases appropriate restriction sites were not present at the right locations. Moreover, in other cases the desired restriction site was present at too many points in the gene sequence, and thus could not be selectively used.
In order to understand an approach that the prior art chose to try to solve these problems, it is important to have an understanding of the terms "nucleotide" and "oligonucleotide". "Nucleotides" are organic compounds having a nitrogenous base, a five-carbon backbone (usually a sugar) and a phosphoric acid group. Many nitrogenous bases are derived from purine or pyrimidine, such as uracil ("U"), thymine ("T"), cytosine ("C"), 5-methyl cytosine, 5-hydroxymethyl cytosine, adenine ("A"), guanine ("G"), 2-methyladenine, and 1-methylguanine. "Nucleosides" are usually N-glycosides of these pyrimidine or purine bases. Among these are the ribonucleosides, which contain D-ribose as the sugar component, and the 2' deoxyribonucleosides, which contain 2'-deoxy-D-ribose as the sugar component. The most prevalent nucleosides are adenosine, guanosine, cytidine, uridine, 2' deoxyadenosine, 2' deoxyguanosine, 2' deoxycytidine, and 2' deoxythymidine.
The names for the corresponding "nucleotides" are the same except that "5'-phosphoric acid" is added to reflect the presence of a phosphate group. The nucleotides are also known by their abbreviations AMP, GMP, CMP, UMP, dAMP, dGMP, dCMP, and dTMP. These nucleotides can also occur as the 5' di-phosphates and the 5' triphosphates (e.g. ADP, ATP). As used herein, the term "nucleotide" is meant to refer to all of these variants, as well as similar variants such as where the nitrogenous base or the sugar backbone is further modified.
"Oligonucleotides" are compounds made by linking a relatively small number (e.g. fewer than twenty) of nucleotides together in a sequence. The term is also meant to include compounds where the 5' end of the oligonucleotide is OH rather than phosphate, and other similar variants. The sequence of an oligonucleotide is normally labeled by reference to the sequence of its nitrogenous bases. The five most prevalent bases are those that have been abbreviated above by the letters A, G, T, C, and U.
To solve the problems described above, the art developed eight and ten base double stranded oligonucleotides (also known as "adaptors") that had a base sequence recognized by the desired restriction enzyme. See e.g. F. Heffron et al., 75 P.N.A.S. USA 6012-6016 (1978). (The disclosures of this reference and all other articles cited herein are incorporated by reference as if fully set forth below.) These Heffron et al. adaptors were ligated into blunt ends randomly produced by DNAase I, thereby converting these sites to the desired specificity.
In writing out the sequence of a double stranded eight base oligonucleotide adaptor, it was conventional to abbreviate the oligonucleotide by writing a first strand 5' to 3' such as 5'-CCCCGGGG-3', and then writing underneath it in the reverse direction (3' to 5') the complementary strand. For example:

5'-CCCCGGGG-3'
3'-GGGGCCCC-5'
In this regard, G is known to be complementary to C, T is known to be complementary to A, and A is known to be complementary to U.
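The notation convention described above can be illustrated with a minimal Python sketch. The function names `complementary_strand` and `format_duplex` are hypothetical helpers introduced here for illustration only, and the sketch assumes a DNA strand containing only the bases A, T, G, and C.

```python
# Base pairing for DNA as stated in the text: G pairs with C, T pairs with A.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand_5_to_3: str) -> str:
    """Return the complementary strand read 3' to 5'.

    The result is position-aligned with the input, so it can be
    written directly underneath the first strand.
    """
    return "".join(DNA_PAIRS[base] for base in strand_5_to_3.upper())


def format_duplex(strand_5_to_3: str) -> str:
    """Write a double stranded oligonucleotide in the two-line convention."""
    top = strand_5_to_3.upper()
    bottom = complementary_strand(top)
    return f"5'-{top}-3'\n3'-{bottom}-5'"


print(format_duplex("CCCCGGGG"))
# 5'-CCCCGGGG-3'
# 3'-GGGGCCCC-5'
```

The sketch does not handle RNA (where A pairs with U) or modified bases; those would need an extended pairing table.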
One problem with the above described approach is that amino acids are coded for in three base groupings (e.g. CCC-CGG-GG-). Thus, an eight or ten base adaptor has extra bases beyond a whole number of codons. As a result, if one inserts such an adaptor into a gene sequence, the insertion is likely to cause frame shifts and distortions.
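The arithmetic behind this frame shift problem can be sketched as follows. This is an illustrative Python example only; the helper names `split_into_codons` and `preserves_reading_frame` are assumptions made for this sketch and do not come from the cited art.

```python
CODON_LENGTH = 3  # amino acids are coded for in three base groupings


def split_into_codons(sequence: str) -> list[str]:
    """Split a sequence into three base groups; the last group may be partial."""
    return [sequence[i:i + CODON_LENGTH]
            for i in range(0, len(sequence), CODON_LENGTH)]


def preserves_reading_frame(insert_length: int) -> bool:
    """An insertion keeps downstream codons in frame only if its
    length is a whole multiple of three bases."""
    return insert_length % CODON_LENGTH == 0


print(split_into_codons("CCCCGGGG"))  # ['CCC', 'CGG', 'GG']

for length in (6, 8, 10, 12):
    leftover = length % CODON_LENGTH
    print(length, "bases:", "in frame" if preserves_reading_frame(length)
          else f"{leftover} extra base(s), frame shift")
```

Six and twelve base insertions leave no remainder, while eight and ten base insertions leave two and one extra bases respectively, which is the source of the frame shifts noted above.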
The art therefore developed six base ("hexameric") double stranded adaptors which did not have these problems. However, in view of the very short length of these adaptors, these prior art adaptors were designed so as to be completely complementary to themselves (e.g. ##STR1##). Because of this, they were not useful for very important restriction sites which did not present "blunt" ends after cleavage (unless one was willing to first alter the restriction site ends). Moreover, if not inserted at exactly the right place in the sequence, these adaptors could cause the protein on one or both sides of the restriction site to lose or change a coded amino acid, with resulting distortions.
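What it means for a strand to be "completely complementary" to itself can be checked mechanically: the strand, read 5' to 3', must equal its own complement read in the opposite direction. The following Python sketch shows one way to test this. The function name `is_self_complementary` and the hexamer sequences used are hypothetical illustrations and are not the structure referred to in the paragraph above.

```python
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def is_self_complementary(strand_5_to_3: str) -> bool:
    """True if the strand equals its own reverse complement, so that
    two copies of the same strand can pair to form a blunt duplex."""
    strand = strand_5_to_3.upper()
    reverse_complement = "".join(DNA_PAIRS[b] for b in reversed(strand))
    return strand == reverse_complement


# Illustrative sequences only (assumed for this sketch).
for candidate in ("CCCGGG", "CCCGGA", "CCCCGGGG"):
    print(candidate, is_self_complementary(candidate))
```

Under this check a self-complementary adaptor can be made from a single synthesized strand, but both ends of the resulting duplex are necessarily flush.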
Other problems in the art included that once a double stranded adaptor had been formed, the adaptor would be suitable only for a site of one structure. Thus, one had to inventory many types of adaptors in the laboratory. Further, in order to cause prior art adaptors to ligate effectively, one often had to use a large excess of the adaptor. Then, one either had to purify away the excess, or waste costly restriction enzyme on digesting the excess.
The state of the prior art can be appreciated with reference to three recent articles. In one, J. D. Boeke, 181 Mol. Gen. Genet. 288-291 (1981), a two codon (six base) insertion was achieved by first cutting with an enzyme to leave two base overhanging "sticky" ends, then filling in both strands with a polymerase to gain two bases, then adding a very large adaptor having a four base segment of interest at one end, and then chopping off everything but the four bases. This method is more expensive and less efficient than the present invention, and its application is limited to very specific sequences.
The second article is J. Stone et al., 37 Cell 549-558 (1984) (not prior art) where a convoluted and inefficient process for inserting two codons was reported. Multiple twelve base adaptors were inserted into blunt end sites. Most of the excess DNA adaptor was then cut away and the DNA religated. Analysis of clones indicated multiple adaptor insertion. These clones had to be reopened and trimmed (yet again) to leave a single insertion (6 bases), and then recircularized. This method is also apparently limited to blunt end sites.
In the third article, J. Vieira et al., 19 Gene 259-268 (1982), a method is provided for inserting a restriction site of twelve bases. As before, the method is not general, and is limited to making a four amino acid insertion.
Thus, it can be seen that a need has existed for an improved means of converting restriction sites to a selected restriction enzyme specificity while creating only a six base insertion.