The enzymatic ligation of pairs of oligonucleotides bound to a target nucleic acid is widely known. It is generally thought that each oligomer must be of a minimum length to be ligated efficiently. Recent work has shown this minimum length to be about 6-8 bases (C. E. Pritchard and E. M. Southern, Nucl. Acids Res., 25, 3403-3407 (1997)). Ligation of oligonucleotides shorter than about 6 bases is therefore generally considered not to be possible.
Under certain conditions, primer-independent ligation can be accomplished using oligomers at least six bases long. In this manner, PCR primers were prepared in situ by concatenating a small number of hexamers, heptamers or octamers (T. Kaczorowski and W. Szybalski, Gene, 179, 189-193 (1996); L. E. Kotler, D. Zevin-Sonkin, I. A. Sobolev, A. D. Beskin and L. E. Ulanovsky, Proc. Natl. Acad. Sci. USA, 90, 4241-4245 (1993)). Such ligation in the absence of a primer is undesirable in the present methods and must be avoided, because replicating a polynucleotide sequence in a controlled and defined manner depends on knowing the point at which synthesis of the new strand originates.
Nucleic acids can be synthesized from a template, a primer and nucleotide triphosphates (NTPs) by the action of a polymerase. Labels can be incorporated by substituting labeled NTPs for a percentage of the unlabeled NTPs. However, the degree of label incorporation that can be achieved is limited, and the precise spacing of the labels cannot be controlled.
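Why the spacing is uncontrolled can be illustrated with a simplified model in which every position of the new strand independently receives a labeled nucleotide with probability equal to the labeled fraction. This is an assumption made only for illustration: it ignores the template sequence and the tendency of polymerases to discriminate against labeled analogs, and the function names below are hypothetical. The sketch computes the binomial probability of a given label count and simulates label positions on individual strands.

```python
import math
import random


def label_count_pmf(length: int, fraction: float, k: int) -> float:
    """Probability that exactly k of `length` positions carry a label.

    Assumes (hypothetically) that each position is labeled independently
    with probability `fraction`, which gives a binomial distribution.
    """
    return math.comb(length, k) * fraction**k * (1 - fraction) ** (length - k)


def simulate_labeled_positions(length: int, fraction: float, rng: random.Random) -> list[int]:
    """Return the 0-based positions that received a labeled nucleotide in one simulated strand."""
    return [i for i in range(length) if rng.random() < fraction]


if __name__ == "__main__":
    length, fraction = 100, 0.25  # illustrative values, not taken from the source
    print("expected labels per strand:", length * fraction)
    print("P(exactly 25 labels):", round(label_count_pmf(length, fraction, 25), 3))
    rng = random.Random(1)  # fixed seed so the run is reproducible
    for _ in range(3):
        positions = simulate_labeled_positions(length, fraction, rng)
        # Distance between consecutive labels on this strand.
        gaps = [b - a for a, b in zip(positions, positions[1:])]
        if gaps:
            print(f"{len(positions)} labels; spacing ranges from {min(gaps)} to {max(gaps)} bases")
        else:
            print(f"{len(positions)} labels")
```

With a 25% labeled fraction on a 100-base strand the expected count is 25 labels per strand, yet each simulated strand differs in both the number of labels and the distances between them, which is the lack of control described for this labeling approach.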
The polymerase chain reaction (PCR) is a method of amplifying a polynucleotide using a pair of primers, one complementary to each strand, that flank the region to be replicated. Nucleic acid synthesis proceeds by extension of each primer with a polymerase and the four dNTPs. Thermal cycling allows multiple copies of the template to be synthesized, approximately doubling the quantity of amplicon in each cycle. A variant termed the Ligase Chain Reaction (LCR) instead uses a ligase enzyme to join two pairs of oligonucleotides, thereby replicating the sequence of interest (D. Y. Wu and R. B. Wallace, Genomics, 4, 560-569 (1989)). The two oligonucleotides of each pair that are ligated together span the entire length of the replicated strand. To the best of Applicant's knowledge, replication of a nucleic acid by ligating a large number of small oligomers to a primer has not been achieved.
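The approximate doubling per cycle can be written as N_n = N_0(1 + E)^n, where N_0 is the starting number of template copies, n is the number of cycles and E is the per-cycle efficiency (E = 1 for perfect doubling). These symbols and the efficiency parameter are introduced here only for illustration. A minimal Python sketch of that idealized model, which deliberately ignores the plateau reached late in a real reaction when primers, dNTPs or enzyme activity become limiting:

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized amplicon count after `cycles` rounds of PCR.

    Each cycle multiplies the amount by (1 + efficiency); efficiency = 1.0
    means perfect doubling. Plateau effects are not modeled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Compare perfect doubling with a hypothetical 90% per-cycle efficiency.
    for eff in (1.0, 0.9):
        print(f"efficiency {eff}: {pcr_copies(1, 30, eff):.3e} copies after 30 cycles")
```

With perfect doubling a single template yields 2^30, about 1.07 × 10^9, copies after 30 cycles; because the factor compounds every cycle, even a small loss of efficiency per cycle noticeably reduces the final yield.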
Methods of providing sequence information using oligonucleotide ligation are disclosed in U.S. Pat. Nos. 5,750,341 and 5,770,367 and in a publication (S. Dubiley, E. Kirilov, Y. Lysov and A. Mirzabekov, Nucl. Acids Res., 25, 2259-2265 (1997)). The reported methods differ fundamentally from those of the present invention in requiring that oligomers be ligated one at a time and that the sequence be analyzed after each step. These methods are therefore far more laborious than those of the present invention.
Methods of Labeling Nucleic Acids--Present methods of labeling nucleic acids or oligonucleotides include the tailing method, random primed labeling, nick translation, labeled branched DNA and end labeling using a labeled primer. Each method suffers disadvantages in certain applications. For example, extending an end-labeled primer by PCR with unlabeled bases yields only one or a few labels per product nucleic acid.
The tailing method incorporates an indeterminate and uncontrolled number of labels by appending a tail of noncomplementary bases onto the nucleic acid of interest. The many additional bases not only add expense but may also interfere with hybridization and lead to nonspecific binding. In addition, the method is not readily applicable to the synthesis of short nucleic acids or oligonucleotides, since the tail could be longer than the sequence of interest.
The random primed method, applicable to the labeling of long nucleic acids, uses a mixture of primers that are extended by a polymerase with a mixture of labeled and unlabeled bases. The number of labeled bases incorporated is variable and arbitrary, and a mixture of numerous nucleic acid fragments of varying lengths is produced from both strands. Nick translation similarly produces a mixture of numerous fragments of varying lengths from both strands: nicks are introduced in both strands of the DNA, and new strands are synthesized from each nick using a mix of labeled and unlabeled bases. Because the positions of the nicks are arbitrary, label incorporation is not controlled either.
The branched DNA technology has been used in diagnostic tests as a means of attaching several labels to a target DNA. The methodology relies on creating several branches of synthetic nucleic acid, each bound to a probe, followed by hybridization of multiple labeled oligonucleotides to each of the branched amplification multimers. The method requires the costly preparation of many probes and of branched DNA, and it is not generally applicable, in particular to the generation of short pieces of highly labeled DNA.