This invention relates generally to the field of nucleic acid biology. More specifically, the invention provides methods and compositions for high-throughput amplification, detection and comparison of polynucleotide sequence variations in biological samples for research, diagnostic and therapeutic applications.
As the Human Genome Project approaches completion of a reference sequence of the human genome, increasing attention is being paid to uncovering DNA sequence variations among groups of individuals as well as between different human populations. Identifying these variations is a critical part of further exploration of the genetic basis for predisposition and resistance to disease. These sequence variations will serve as genetic markers in studies of diseases and traits with complex inheritance patterns and strong environmental interactions.
Currently, large-scale sequence assays for population-based genetic variations such as single nucleotide polymorphisms (SNPs) are done on hybridization-based oligonucleotide arrays (DNA chips). For example, U.S. Pat. No. 5,837,832 (Chee et al.) describes DNA chips containing arrays of four sets of probes, each of which differs from the others at a single nucleotide. Target polynucleotides of interest are hybridized to the DNA chip, and the specific sequence variations are detected based on the target polynucleotides' preference and degree of hybridization at discrete probe locations. Similar technology was used in U.S. Pat. No. 5,861,242 (Chee et al.) for analysis of various HIV DNA sequences.
Several problems are associated with current hybridization-based sequence variation assays and limit their applications. See the review by Hacia (1999) Nature Genetics Supp. 21:42-47. For example, the accuracy of the hybridization assay remains poor, which hinders its use in heterozygous mutation screens. The same experimental approach applied to any two sequences can yield results with vastly different accuracy. The false negative error rate of hybridization-based mutational analysis needs to be improved. Since the hybridization-based methodology hinges on hybridization differences caused by a single nucleotide, the specificity of hybridization-based sequence analysis can be dramatically influenced by variations in the target polynucleotide as well as in hybridization conditions. Hybridization-based mutation detection is particularly ineffective when the target polynucleotides are present only in trace amounts in the sample.
Detection of small quantities of genetic material represents a major challenge in biological research and clinical diagnosis. Polymerase chain reaction (PCR) provides a powerful tool for in vitro amplification of specific polynucleotide sequences, such as genomic DNA, single stranded cDNA or mRNA, with high sensitivity and specificity. One application of PCR is the amplification of target gene sequences in biological samples from, for example, environmental, food and medical sources, to allow identification of causative, pathogenic, spoilage or indicator organisms present in the sample.
Therefore, there exists a need for methods of analyzing sequence variations with higher accuracy and greater sensitivity.
The present invention provides novel methods for sequence variation analysis that are more sensitive, more accurate and less time-consuming than the conventional hybridization-based approaches.
In one aspect, the invention provides methods for detecting sequence variations between a target polynucleotide and a reference sequence, including single or multiple base substitutions, deletions or insertions, and other more complex variations. The methods utilize an array of multiple groups of oligonucleotide primers immobilized to a solid phase support, with each group of oligonucleotide primers being selected to span a particular region of the reference sequence, occupying a discrete area of the array, and comprising at least four sets of primers: 1) a first set that is exactly complementary to the reference sequence; and 2) three additional sets of primers, each of which is identical to the first set of primers but for the most 3′-end nucleotide which is different in each of the three sets. The array of the invention can be used in a polymerase-mediated amplification reaction, during which the target polynucleotide serves as template for the synthesis of detectable nascent polynucleotides which are extended from the appropriate sets of primers that are exactly complementary to the target polynucleotide. The immobilized primers enable "in-situ" hybridization and amplification of specific regions of the target polynucleotide on a solid-phase support. The nascent strand at each primer site can be detected quantitatively with labels that are incorporated into the strand during amplification. In one preferred embodiment, the amplification means for practicing the invention is PCR. The microarray on a solid phase support can comprise up to about 100,000 groups of primers. As such, the method is useful for detecting up to about 100,000 different regions of the target polynucleotide. For most applications, a high number of groups will be desirable, although it is clear that there is no lower limit to the number of groups which can be present on the support.
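The four-set primer group described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the method of the invention: the reference is assumed to be given 5' to 3' as the template strand, primers are taken as the reverse complement of a reference window, and the function names (`primer_group`, `call_template_base`), the example sequence and the "strongest signal wins" rule are all hypothetical.

```python
# Illustrative sketch only; names, sequence and calling rule are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def primer_group(reference, start, length):
    """Build one group of four primers for reference[start:start + length].

    The first primer is exactly complementary to the reference window.
    The other three are identical except for the 3'-end nucleotide.
    The 3'-end nucleotide pairs with reference[start].
    """
    window = reference[start:start + length]
    if start < 0 or len(window) != length:
        raise ValueError("window falls outside the reference sequence")
    perfect = reverse_complement(window)
    variants = [perfect[:-1] + b for b in BASES if b != perfect[-1]]
    return [perfect] + variants


def call_template_base(group, signals):
    """Infer the template base opposite the primer 3' end.

    Assumes the primer set with the strongest label signal is the one
    that was extended, which is a simplification of real detection.
    """
    strongest = max(range(len(group)), key=lambda i: signals[i])
    return COMPLEMENT[group[strongest][-1]]


group = primer_group("ATGCCGTAAGCT", start=2, length=6)
print(group)  # ['TACGGC', 'TACGGA', 'TACGGG', 'TACGGT']
print(call_template_base(group, [0.1, 0.9, 0.1, 0.05]))  # T
```

In this example the reference base at the interrogated position is G, so a strongest signal at the second primer set would suggest a G to T substitution in the target. A practical assay would presumably compare signal ratios against thresholds rather than take a simple maximum.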
According to one embodiment of the invention, an immobilized primer is used alone for asymmetric PCR of a target polynucleotide, which results in a single complementary strand attached to the solid phase at each appropriate primer site; the strand is optionally detected with labels incorporated into it. According to another embodiment of the invention, another primer for each target polynucleotide is present in solution so that both strands of a target polynucleotide can be symmetrically synthesized and retained at each primer site for enhanced detection.
The present invention can be used to detect sequence variations in a single target polynucleotide as compared with a reference sequence, in which case the DNA array of the invention comprises multiple groups of primers corresponding or relating to the reference sequence, as described above. Alternatively, the invention can be used to detect sequence variations in multiple target polynucleotides as compared with one or many reference sequences. The target polynucleotides can be structurally related or unrelated. When multiple target polynucleotides with no sequence homology are detected according to the present invention, the DNA microarray is divided into different areas, with each area devoted to a particular reference sequence aimed at a particular target polynucleotide. Multiple groups of primers are affixed onto the solid support within each area, with each group being selected to span a particular region of the reference sequence. As in the case of a single target polynucleotide, each group comprises at least four sets of primers: 1) a first set that is exactly complementary to the reference sequence; and 2) three additional sets of primers, each of which is identical to the first set of primers but for the most 3′-end nucleotide which is different in each of the three sets.
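The division of the microarray into areas, one per reference sequence, can be sketched as a simple data structure. This is a hedged illustration rather than a description of an actual array layout: the function name `build_layout`, the area names, the primer length, the tiling step and the example sequences are all hypothetical, and primers are again assumed to be the reverse complement of a reference window written 5' to 3'.

```python
# Illustrative sketch only; layout scheme and all names are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def four_primer_sets(window):
    """Return the exact-match primer plus three 3'-end variants."""
    perfect = "".join(COMPLEMENT[b] for b in reversed(window))
    variants = [perfect[:-1] + b for b in "ACGT" if b != perfect[-1]]
    return [perfect] + variants


def build_layout(references, primer_length, step):
    """Map each reference name to its own area of primer groups.

    Each area is a list of (start, group) pairs, where start is the
    offset of the reference window that the group spans.
    """
    layout = {}
    for name, seq in references.items():
        area = []
        # Tile the reference with windows of primer_length, step apart.
        for start in range(0, len(seq) - primer_length + 1, step):
            window = seq[start:start + primer_length]
            area.append((start, four_primer_sets(window)))
        layout[name] = area
    return layout


layout = build_layout(
    {"refA": "ATGCCGTA", "refB": "GGATCC"}, primer_length=6, step=1
)
for name, area in layout.items():
    print(name, len(area), "groups")
```

With a step of 1, every position of each reference that can anchor a full-length window receives its own group, so an area holds `len(seq) - primer_length + 1` groups. A larger step would sample fewer positions per reference.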
The invention further provides kits for detecting sequence variations in a target polynucleotide using either the symmetric or the asymmetric PCR approach disclosed herein. The kits comprise a microarray of PCR primers and the reagents necessary for PCR and detection. The microarray of primers can comprise up to about 100,000 groups of primers tailored to particular reference sequences. In one embodiment of the invention, the kits comprise labeled nucleotides capable of being incorporated into the synthesized strands during PCR.