The smallest possible difference between two DNA sequences is a change of a single base, a Single Nucleotide Polymorphism or SNP. Such differences are common in the human population, occurring roughly once every 1000 bases between any two unrelated individuals. Some SNPs have medically important consequences, while others are silent but may be useful as markers to study genetic transmission of traits.
A number of methods have been developed to score SNPs, including allele-specific hybridization, electrophoretic DNA sequencing, single-nucleotide extension using labeled chain terminators, the “Invader” assay (Third Wave Technologies, Madison, Wis.), mass spectrometry, and the 5′ nuclease assay (Taqman; see below). All of these methods entail assays that are either difficult or expensive to develop, or difficult or expensive to perform.
It will be appreciated that while SNPs are common, it is at times advantageous to score other polymorphisms such as insertions, deletions, rearrangements or sequence alterations involving more than one base. SNP scoring has been emphasized in the literature because it is the most difficult case, but most methods capable of scoring SNPs are also capable of scoring additional types of polymorphisms.
One of the known assays for detecting single-base differences in DNA samples uses an exonuclease specific for mismatched bases (see, e.g., U.S. Pat. No. 5,391,480). In general, such an assay involves labeling the 3′ nucleotide in a primer with a fluorescent marker. The labeled oligonucleotide is hybridized to an unknown DNA sample. If the 3′ nucleotide (the query position) of the oligonucleotide is complementary to the corresponding nucleotide in the hybridized DNA, it will be insensitive to nuclease; if there is a mismatch it will be sensitive to nuclease and will be cleaved. For example, in a PCR reaction, the query position corresponds to the 3′ end of one of the two primers. This primer is synthesized in two versions (1 and 2), one complementary to each of the two expected versions of a SNP (SNP versions 1 and 2, respectively). The 3′ nucleotides of primers 1 and 2 are labeled with distinguishable fluors. The polymerase used for the PCR is one capable of excising mismatched 3′ nucleotides (an “error-correcting” or “3′ exonuclease-activity-containing” polymerase). If the input template contains SNP version 1, then primer 2 will at some frequency anneal to an amplicon containing SNP version 1 and the 3′ nucleotide will be clipped off by the error-correcting activity of the polymerase. Clipped-off fluorescent nucleotides are detected by a decrease in fluorescence polarization (FP). At the same time, primer 1, which is fully complementary to SNP version 1, will at some frequency anneal to an amplicon containing SNP version 1 and be extended to full amplicon length. The extended primer then becomes insensitive to further attack by nuclease. 
Thus, if SNP version 1 is present, there will be a decrease in FP for the fluor linked to primer 2; if SNP version 2 is present, there will be a decrease in FP for the fluor linked to primer 1; if both SNP versions are present (as in a heterozygote), then there will be a decrease in FP for both fluors, but to a smaller extent for each.
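The scoring logic described above can be illustrated with a minimal sketch. It is not part of the cited prior-art protocol: the function name, the use of millipolarization (mP) units, and the single threshold value are assumptions chosen only for illustration, and a practical threshold would have to be low enough to register the smaller decrease expected for each fluor in a heterozygote.

```python
def call_snp_version(fp_drop_primer1, fp_drop_primer2, threshold=20.0):
    """Infer which SNP version(s) are present from decreases in FP.

    fp_drop_primer1: decrease in FP of the fluor linked to primer 1
    fp_drop_primer2: decrease in FP of the fluor linked to primer 2
    threshold: hypothetical minimum decrease (assumed mP units) that is
               counted as evidence that the 3' nucleotide was clipped off
    """
    # A mismatched primer is clipped, so a drop for the fluor on primer 2
    # indicates SNP version 1, and a drop for the fluor on primer 1
    # indicates SNP version 2.
    version1_present = fp_drop_primer2 >= threshold
    version2_present = fp_drop_primer1 >= threshold

    if version1_present and version2_present:
        return "heterozygote"
    if version1_present:
        return "version 1"
    if version2_present:
        return "version 2"
    return "no call"


if __name__ == "__main__":
    # Illustrative, made-up readings; not experimental data.
    print(call_snp_version(fp_drop_primer1=3.0, fp_drop_primer2=60.0))
    print(call_snp_version(fp_drop_primer1=30.0, fp_drop_primer2=28.0))
```

A "no call" result is included in the sketch for the case in which neither fluor shows a decrease, for example when amplification fails.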
Commercially available polymerases such as Pfu are capable of extending from a labeled 3′ nucleotide if it is correctly matched and clipping it if it is mismatched. This procedure is distinct from the “Taqman” assay (see, e.g., U.S. Pat. Nos. 5,210,015 and 5,487,972), which uses the 5′-3′ nuclease activity of some thermostable polymerases.
There are a number of problems and deficiencies with this method, however. First, known error-correcting polymerases, such as the family B polymerases of the genus Pyrococcus, are ill-suited to amplification of sequences directly from genomic DNA. The processivity of the polymerases is too low to reliably complete a full-length copy of an amplicon in a single round. Thus, completion of a full-length copy must rely on hybridization of the partial copies to a suitable template in the reaction mix, and therefore occurs only if the template concentration is relatively high. This creates a problem, because it is preferable to use low amounts of genomic DNA in a PCR reaction, both to allow use of DNA that is not highly purified and to reduce the amount of non-specific DNA present in the reaction, which can lead to side reactions. The prior art protocol is therefore conventionally performed by 1) pre-amplifying a region containing the SNP site using unlabeled primers and Taq or other polymerase capable of amplifying single copies, 2) purifying the amplified DNA, 3) re-amplifying with labeled primers and an error-correcting polymerase, and 4) detecting whether error correction has occurred.
Second, the methods used for scoring whether error correction has occurred (and therefore which versions of a SNP are present in the original sample) are inadequate for low cost and high throughput. Given the cost of reagents and disposables, and the amortized cost of equipment and space, it is exceedingly difficult to run a PCR for less than 10–20 US cents. Yet, for many applications, SNP scoring is not economical unless it can be done for 1 US cent per locus. Therefore, it is necessary to score at least ten and perhaps many more SNPs per PCR. Assays based on scoring with FP can score no more than 1 or 2 SNPs per PCR.
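The arithmetic behind this requirement can be made explicit with a short sketch. The function names are hypothetical, and the inputs are simply the approximate figures stated above (10–20 US cents per PCR, a target of 1 US cent per locus).

```python
import math


def cost_per_locus(pcr_cost_cents, snps_per_pcr):
    """Cost in US cents of scoring one SNP when one PCR scores several."""
    return pcr_cost_cents / snps_per_pcr


def min_snps_per_pcr(pcr_cost_cents, target_cents_per_locus):
    """Smallest number of SNPs per PCR that meets the per-locus target."""
    return math.ceil(pcr_cost_cents / target_cents_per_locus)


if __name__ == "__main__":
    # With FP scoring limited to 2 SNPs per PCR, a 10-cent PCR costs
    # 5 cents per locus, well above the 1-cent target.
    print(cost_per_locus(10, 2))
    # Number of SNPs per PCR needed at each end of the stated cost range.
    print(min_snps_per_pcr(10, 1), min_snps_per_pcr(20, 1))
```

Under these figures, a PCR costing 10 cents must score at least 10 SNPs, and one costing 20 cents at least 20, which is consistent with the statement that ten or more SNPs per PCR are needed.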
The current invention meets the need for an economical SNP assay that can be performed using small amounts of genomic DNA. Here, we describe an error-correction SNP assay capable of robust operation from small amounts of genomic DNA and several methods for parallelizing this assay for low-cost, high-throughput operation.
The processivity of a polymerase, i.e., the amount of product generated by the enzyme per binding event, can be enhanced by increasing the stability of the modifying enzyme/nucleic acid complex. Co-pending U.S. application Ser. No. 09/870,353 and WO01/92501 disclose modified polymerases whose processivity is increased by joining a sequence-non-specific double-stranded nucleic acid binding domain to the enzyme or to its catalytic domain. Among the modified polymerases disclosed are error-correcting Family B polymerases, which typically are used in the current invention.