Since its development, the polymerase chain reaction (PCR) has revolutionized the field of molecular biology. PCR is a process by which any DNA sequence of interest (of reasonable size) located between two known sequences can be amplified using two primers homologous to these known sequences: a forward primer and a reverse primer, the latter binding to the complementary strand. PCR therefore makes it easy to clone any sequence of interest, provided that two known flanking sequences are available. No knowledge of the sequence located between the two primer binding sites is required.
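The principle can be illustrated with a minimal in-silico sketch. It assumes exact primer matches and a template given as its top strand only. The forward primer is searched for directly. The reverse primer binds the complementary strand, so it is searched for as its reverse complement. Everything between the two sites is returned regardless of its sequence. The functions `reverse_complement` and `in_silico_pcr` are hypothetical helpers, not part of any library.

```python
from typing import Optional

# Base-pairing table for DNA (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the predicted amplicon (top strand, 5'->3') or None.

    Simplifying assumptions: exact primer matches only, first product only,
    and no limit on amplicon length.
    """
    start = template.find(forward)
    if start == -1:
        return None  # forward primer has no binding site
    # The reverse primer anneals to the bottom strand, so on the top strand
    # its binding site reads as the primer's reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None  # no reverse-primer site downstream of the forward primer
    return template[start:end + len(site)]


# The 7 bases between the primer sites ("GATTACA") are never inspected.
template = "TTTT" + "ACGTTGCA" + "GATTACA" + "TTGGATCC" + "CCCC"
print(in_silico_pcr(template, "ACGTTGCA", "GGATCCAA"))  # ACGTTGCAGATTACATTGGATCC
```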
A related but different problem in molecular biology is the identification of unknown sequences that flank a region of known nucleotide sequence. PCR cannot be used directly to amplify a fragment containing both the known and the unknown sequence, since sequence information for primer design is available at only one end of the fragment to be amplified. Examples of such problems include determining the nucleotide sequence flanking a stably integrated transgene (for example, in a T-DNA), the nucleotide sequence flanking a transposon insertion, or the nucleotide sequence of the variable region of an antibody for which only the isotype is known. The first two examples refer to DNA molecules, while the third refers to RNA molecules. This difference is not important, since RNA molecules can be converted to cDNA by reverse transcription using a primer that binds to the known region.
Over the years, many protocols have been developed to identify unknown sequences that flank known sequences. Many of these protocols either attach an adaptor to the end of the unknown sequence or use PCR with one or several unspecific primers (primers containing a known constant sequence, the adaptor sequence, followed by a random or unspecific variable sequence) that bind randomly to DNA, including to sequences in the vicinity of the known sequence. Two to three PCRs are then performed using combinations of adaptor primers and known region-specific primers (also called gene-specific primers, abbreviated herein as “gsp”). After the first PCR, both specific and non-specific products are typically obtained. The proportion of specific products increases in the second PCR, performed using an adaptor primer and a nested known sequence-specific primer, but many unspecific products are still present. The unknown sequence can be identified by sequencing the amplified product. However, if several specific products are expected to be amplified in the same reaction (for example, a genome might contain several transgenes or several transposons, or an RNA population might contain a large number of different antibodies), direct sequencing will not be useful. Rather, the amplified products will have to be cloned and the recombinant plasmids individually sequenced.
There are many approaches for cloning PCR products. Cloning is typically done by ligating together DNA fragments prepared by digestion with type II restriction enzymes. This process usually requires several steps: (1) the plasmids (or PCR products) containing the fragment to be subcloned and the recipient vector are digested with one or two restriction enzymes, (2) the digested fragments are separated by gel electrophoresis and the desired fragments are extracted from the gel, (3) the purified vector and insert fragments are ligated together using a DNA ligase such as T4 DNA ligase, and (4) the ligation product is transformed into competent E. coli cells.
One limitation of standard cloning techniques is that the restriction sites chosen for cloning must not be present within the fragment to be cloned. In the problem discussed here, part of the sequence to be cloned is unknown and may contain one of these sites by chance. Using restriction enzymes for cloning can therefore be expected to truncate some of the amplified fragments, losing part of their sequence, or even to prevent cloning of some of the products entirely.
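The risk can be made concrete with a small sketch that scans an amplified fragment for the recognition sites of the enzymes one intends to use. It assumes exact site matching and lists only three common type II enzymes (EcoRI `GAATTC`, BamHI `GGATCC`, HindIII `AAGCTT`). The helper `internal_sites` is hypothetical. Any hit inside the unknown flanking region means that fragment would be cut during the digestion step.

```python
from typing import Dict, Iterable, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sequences (5'->3') of a few common type II restriction enzymes.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) match."""
    return [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]


def internal_sites(fragment: str, enzymes: Iterable[str]) -> Dict[str, List[int]]:
    """Map each enzyme whose site occurs in `fragment` to its positions.

    Both strands are checked; for palindromic sites (all three above) the
    reverse complement is the site itself, so no duplicates are added.
    """
    hits: Dict[str, List[int]] = {}
    for name in enzymes:
        site = RECOGNITION_SITES[name]
        positions = set(find_all(fragment, site))
        positions.update(find_all(fragment, reverse_complement(site)))
        if positions:
            hits[name] = sorted(positions)
    return hits


# A fragment whose (unknown) flank happens to carry an EcoRI site.
fragment = "ATGGAATTCAAACCCGGG"
print(internal_sites(fragment, ["EcoRI", "BamHI", "HindIII"]))  # {'EcoRI': [3]}
```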
An entirely different cloning strategy has been developed that does not require restriction enzymes: it relies on generating single-stranded DNA ends on both the insert and the vector. Complementary single-stranded overhangs on the DNA insert and on the cloning vector anneal. If the annealed region is longer than about 12-15 nucleotides, and the two ends of a linear insert anneal with the two ends of a linearized plasmid vector, ligation-free cloning can be achieved. Host cells such as E. coli transformed with the annealed product repair the nicks, forming a circular plasmid capable of replicating (Li & Evans, 1997, Nucleic Acids Res., 25, 4165-4166).
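The annealing condition can be expressed as a simple check. This sketch assumes each single-stranded overhang is written 5'->3' and that two overhangs anneal only when one is the exact reverse complement of the other. The 12-nucleotide default for `min_len` is taken from the lower end of the 12-15 nt range above. The function names are hypothetical, and real annealing also depends on GC content and temperature, which this ignores.

```python
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def overhangs_anneal(a: str, b: str, min_len: int = 12) -> bool:
    """True if overhangs a and b (both 5'->3') are fully complementary
    and long enough to hold the junction together without ligation."""
    return len(a) >= min_len and b == reverse_complement(a)


def ligation_free_circle(junctions: List[Tuple[str, str]], min_len: int = 12) -> bool:
    """A linear insert and a linearized vector form a circle only if
    both insert/vector junctions anneal."""
    return len(junctions) == 2 and all(
        overhangs_anneal(a, b, min_len) for a, b in junctions
    )


left = ("GATTCCAGGTCA", "TGACCTGGAATC")   # 12-nt complementary pair
right = ("CCTAGTTGACAG", "CTGTCAACTAGG")  # 12-nt complementary pair
print(ligation_free_circle([left, right]))                 # True
print(ligation_free_circle([left, ("GATTCC", "GGAATC")]))  # False: only 6 nt
```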
One of the first cloning methods developed on this principle was UDG cloning. A DNA fragment was PCR-amplified using primers containing an arbitrary 12-nucleotide extension that included at least 4 uracils. The vector was also amplified by PCR, using primers containing a 12-nucleotide extension complementary to the extension in the insert primers. After PCR amplification, insert and vector were treated with uracil DNA glycosylase (UDG), which catalyzes the hydrolysis of the N-glycosylic bond between uracil and the sugar and thereby opens the DNA extensions in the vector and insert, creating single-stranded DNA ends. After vector and insert were annealed, the mix was transformed into chemically competent E. coli cells, which trim and repair the ends at the junction site (Nisson P. E., Rashtchian A., & Watkins P. C., 1991, PCR Methods Appl., 1:120-123).
The advantages of this strategy are that it is efficient and yields very few empty vector constructs. The drawbacks are that (1) primers containing uracil are expensive, (2) the entire vector has to be amplified by PCR, and (3) the primer extensions have to contain 4 uracils and 4 adenines (complementary to the uracils in the extension of the complementary fragment), so they cannot consist of an arbitrary 12-nucleotide sequence of choice.
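The primer requirement and the effect of UDG can be sketched under deliberately simplified assumptions. Uracils are written as `U` inside an otherwise DNA primer. UDG treatment is modelled as turning every `U` into an abasic site shown as `-`. An extension is accepted only if it is exactly 12 nt long and contains at least 4 uracils, as described above. The names `valid_udg_extension` and `udg_treat`, and the example sequences, are hypothetical.

```python
EXTENSION_LENGTH = 12  # 12-nt extension, per the protocol described above
MIN_URACILS = 4        # at least 4 uracils in the extension


def valid_udg_extension(extension: str) -> bool:
    """Check the 5' extension of a UDG-cloning primer."""
    return (
        len(extension) == EXTENSION_LENGTH
        and set(extension) <= set("ACGTU")
        and extension.count("U") >= MIN_URACILS
    )


def udg_treat(primer: str) -> str:
    """Model UDG: each uracil base is excised, leaving an abasic site ('-').

    With several abasic sites, the short extension can no longer stay
    base-paired, so the end becomes single-stranded.
    """
    return primer.replace("U", "-")


extension = "CUACUACUACUA"     # hypothetical extension with 4 uracils
primer = extension + "ATGGCT"  # hypothetical gene-specific part
print(valid_udg_extension(extension))       # True
print(valid_udg_extension("CTACTACTACTA"))  # False: no uracils
print(udg_treat(primer))                    # C-AC-AC-AC-AATGGCT
```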
Another strategy, developed even earlier, is ligation-independent cloning (LIC), which is based on the 3′ to 5′ exonuclease activity of T4 polymerase (Aslanidis C. & de Jong, P. J., 1990, Nucleic Acids Res., 18:6069-6074). As with UDG cloning, a PCR fragment is amplified with two primers containing 12-nucleotide extensions; these extensions lack one of the 4 nucleotides, for example G. The amplified fragment is then treated with T4 polymerase in a buffer that contains dGTP but none of the other nucleotides. The 3′ to 5′ exonuclease activity of T4 polymerase removes nucleotides until it reaches the first G, where it stops because removal and incorporation of this nucleotide reach an equilibrium. This treatment therefore creates a 12 nt (nt stands for nucleotide herein) single-stranded extension that can be used to anneal vector and insert. The vector ends, which are compatible with those of the insert, are produced by T4 polymerase treatment in the presence of dCTP.
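The single-dNTP chew-back can be modelled in a few lines. This toy sketch assumes the strand being resected is written 5'->3', that the exonuclease removes bases from its 3' end one at a time, and that it halts at the first G it meets because dGTP is the only nucleotide present. The name `t4_resect` is hypothetical. The removed bases correspond to the single-stranded extension left on the opposite strand.

```python
from typing import Tuple


def t4_resect(strand: str, dntp: str = "G") -> Tuple[str, str]:
    """Simulate 3'->5' exonuclease resection in the presence of one dNTP.

    Returns (remaining strand, removed 3' bases). Resection stops at the
    last occurrence of `dntp` in the strand, i.e. the first one met when
    reading from the 3' end.
    """
    stop = strand.rfind(dntp)
    if stop == -1:
        # No stopping base: this toy model lets resection run to the end.
        return "", strand
    return strand[: stop + 1], strand[stop + 1 :]


# 3' end with a 12-nt G-free segment preceded by a G.
strand = "CCATG" + "ACTTCATACTCA"
remaining, removed = t4_resect(strand, "G")
print(remaining, removed, len(removed))  # CCATG ACTTCATACTCA 12
```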
In similar work, vector and insert were treated with T4 polymerase without adding any nucleotide (Yang et al., 1993, Nucleic Acids Res., 21:1889-1893). Because the resulting single-stranded extensions might be longer than 12-15 nucleotides, the annealed mix was then supplemented with all 4 deoxynucleotides and T4 polymerase to fill the single-stranded gaps.
Another procedure similar to both techniques described above, sequence and ligation-independent cloning (SLIC), is described in US 2007/0292954. The procedure is similar to the work of Yang et al., except that gaps in the annealed heteroduplex are not filled with DNA polymerase. This work also describes the assembly of up to 10 fragments into a vector.
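At the sequence level, assembling several fragments into a vector by end homology can be sketched as a chain of overlap merges. This assumes each fragment shares exactly `overlap` nucleotides of end homology with the next one and, for a circular product, that the last fragment overlaps the start of the linearized vector. It tracks sequence only and does not model exonuclease treatment, annealing, or repair. `assemble` is a hypothetical helper.

```python
from typing import List


def assemble(fragments: List[str], overlap: int, circular: bool = True) -> str:
    """Join fragments in order through identical terminal overlaps.

    fragments[0] is the linearized vector; each later fragment must start
    with the last `overlap` bases of the growing product.
    """
    if overlap <= 0:
        raise ValueError("overlap must be a positive number of bases")
    product = fragments[0]
    for i, frag in enumerate(fragments[1:], start=1):
        if product[-overlap:] != frag[:overlap]:
            raise ValueError(f"fragment {i} lacks the expected end homology")
        product += frag[overlap:]
    if circular:
        # Closing the circle: the end of the last fragment must match the
        # start of the vector; that shared stretch is kept only once.
        if product[-overlap:] != product[:overlap]:
            raise ValueError("last fragment does not overlap the vector start")
        product = product[:-overlap]
    return product


vector = "AAAACCCCGGGG"
insert_1 = "GGGGTTTTACGT"
insert_2 = "ACGTCATGAAAA"
print(assemble([vector, insert_1, insert_2], overlap=4))  # AAAACCCCGGGGTTTTACGTCATG
```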
These three related strategies allow cloning without using standard restriction sites at the junctions between vector and insert. However, non-specific amplification products, formed when one of the primers anneals non-specifically to a random sequence, and primer dimers can be cloned in addition to the specific products. There is therefore a need for a cleaner cloning strategy.