Human diseases arise from a complex interaction between DNA polymorphisms or mutations and environmental factors. Single nucleotide polymorphisms (SNPs) have recently been identified as a potentially powerful means of genetic typing and are predicted to supersede microsatellite repeat analysis as the standard for genetic association, linkage, and mapping studies.
The major goal in human genetics is to ascertain the relationship between DNA sequence variation and phenotypic variation. For such studies, molecular polymorphisms are indispensable for conventional meiotic mapping, fine-structure mapping, and haplotype analysis. With the anticipated sequencing of a reference human genome and the identification of all human genes, studies of complex genetic disorders are expected to become more efficient if all human genes are searched systematically for functional variants by association and linkage disequilibrium studies. This requires technology and methods for the systematic discovery of genetic variation in human DNA, primarily single nucleotide polymorphisms (SNPs), the most abundant form of such variation.
Several types of polymorphism have been reported. A restriction fragment length polymorphism (RFLP) is a variation in DNA sequence that alters the length of a restriction fragment, as described in Botstein et al., Am. J. Hum. Genet. 32, 314-331 (1980). The underlying sequence variation may create or delete a restriction site, thus changing the length of the resulting restriction fragment. RFLPs have been widely used in human and animal genetic analyses (see WO 90/13668; WO 90/11369; Donis-Keller, Cell 51, 319-337 (1987); Lander et al., Genetics 121, 85-99 (1989)). When a heritable trait can be linked to a particular RFLP, the presence of the RFLP in an individual can be used to predict the likelihood that the individual will also exhibit the trait.
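To make the RFLP principle concrete, here is a minimal sketch that digests two hypothetical alleles in silico and reports the fragment lengths. It assumes the EcoRI recognition site GAATTC with a cut after the first base (G^AATTC); the sequences are invented for illustration and are not real human DNA.

```python
# Toy in-silico digest: a single-base change that destroys a restriction
# site turns two fragments into one uncut fragment (the basis of an RFLP).
# Assumption: EcoRI site GAATTC, cut between G and AATTC.

def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting `seq` at every occurrence of `site`."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)      # cut position within this site
        start = seq.find(site, start + 1)    # look for the next occurrence
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical alleles differing at one base inside the recognition site.
allele_1 = "ATGC" * 5 + "GAATTC" + "CGTA" * 5   # site intact
allele_2 = "ATGC" * 5 + "GAGTTC" + "CGTA" * 5   # A->G change destroys the site

print(digest(allele_1))  # [21, 25]
print(digest(allele_2))  # [46]
```

On a gel, the two alleles would therefore show different band patterns, which is what allows the RFLP to be scored.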
Other polymorphisms take the form of short tandem repeats (STRs), which include tandem di-, tri- and tetranucleotide repeated motifs. These tandem repeats are also referred to as variable number tandem repeat (VNTR) polymorphisms. VNTRs have been used in identity and paternity analysis (U.S. Pat. No. 5,075,217; Armour et al., FEBS Lett. 307, 113-115 (1992); Hom et al., WO 91/14003; Jeffreys, EP 370,719) and in a large number of genetic mapping studies.
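An STR allele is typically scored by the number of motif units in the repeat. A minimal sketch, assuming clean, error-free sequence reads and hypothetical flanking sequences (real STR callers must also deal with sequencing errors and PCR stutter):

```python
import re


def count_tandem_repeats(seq: str, motif: str) -> int:
    """Length, in motif units, of the longest uninterrupted run of `motif`."""
    # Non-capturing group, so findall returns each whole run of repeats.
    runs = re.findall(f"(?:{re.escape(motif)})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical dinucleotide (CA)n alleles with identical flanks.
allele_a = "GGT" + "CA" * 12 + "TTG"
allele_b = "GGT" + "CA" * 15 + "TTG"

# A heterozygous individual carrying both alleles would be typed as (12, 15).
print(count_tandem_repeats(allele_a, "CA"), count_tandem_repeats(allele_b, "CA"))
```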
Other polymorphisms take the form of single nucleotide variations between individuals of the same species. Such single nucleotide polymorphisms are far more frequent than RFLPs, STRs and VNTRs. Some occur in protein-coding sequences, in which case one of the polymorphic forms may give rise to the expression of a defective or otherwise variant protein. Others occur in noncoding regions; some of these may also result in defective or variant protein expression (e.g., as a result of defective splicing). Still others have no phenotypic effect. Because single nucleotide polymorphisms occur with greater frequency and are spaced more uniformly throughout the genome than other forms of polymorphism, there is a greater probability that one will be found in close proximity to a genetic locus of interest than would be the case for other polymorphisms. The presence of SNPs may be linked to, for example, a certain population, a disease state, or a propensity for a disease state.
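The proximity argument can be quantified under a simple model. Assuming markers are scattered along the chromosome as a homogeneous Poisson process with density ρ markers per kb, the probability of finding at least one marker within d kb on either side of a locus is 1 − exp(−2ρd). The densities below are illustrative placeholders, not measured values, and this model captures only the frequency effect; perfectly regular spacing would improve coverage further.

```python
import math


def prob_marker_within(density_per_kb: float, distance_kb: float) -> float:
    """P(at least one marker within `distance_kb` on either side of a locus).

    Assumes markers form a homogeneous Poisson process along the sequence,
    so the count in a window of 2 * distance_kb is Poisson-distributed.
    """
    expected = 2.0 * density_per_kb * distance_kb  # mean markers in the window
    return 1.0 - math.exp(-expected)


# Hypothetical densities: a dense marker class vs. a sparse one, 5 kb window.
for density in (1.0, 0.01):
    print(density, round(prob_marker_within(density, 5.0), 4))
```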
Generally, polymorphisms can be associated with susceptibility to a certain disease or condition. Polymorphisms that cause a change in protein structure are more likely to correlate with the likelihood of developing a particular type of disease or trait. It is therefore highly desirable to have methods that allow quick and inexpensive genotyping of subjects. Early identification of alleles that are linked to an increased likelihood of developing a condition would allow early intervention and prevention of the development of the disease.
Pharmacogenomics is the study of the relationship between an individual's genotype and that individual's response to a foreign compound or drug. Differences in the metabolism of therapeutics can lead to severe toxicity or therapeutic failure by altering the relation between dose and blood concentration of the pharmacologically active drug. Thus, a physician or clinician may consider applying knowledge obtained in relevant pharmacogenomics studies when selecting the drug, the dosage, and/or the therapeutic regimen.
Pharmacogenomics deals with clinically significant hereditary variations in drug response due to altered drug disposition and abnormal drug action in affected persons. See, for example, Eichelbaum, M. et al. (1996) Clin. Exp. Pharmacol. Physiol. 23(10-11):983-985 and Linder, M. W. et al. (1997) Clin. Chem. 43(2):254-266. In general, two types of pharmacogenetic conditions can be differentiated: genetic conditions transmitted as a single factor that alter the way drugs act on the body (altered drug action), and genetic conditions transmitted as single factors that alter the way the body acts on drugs (altered drug metabolism). These pharmacogenetic conditions can occur either as rare genetic defects or as naturally occurring polymorphisms. For example, glucose-6-phosphate dehydrogenase (G6PD) deficiency is a common inherited enzymopathy in which the main clinical complication is haemolysis after ingestion of oxidant drugs (antimalarials, sulfonamides, analgesics, nitrofurans) and consumption of fava beans. It would therefore be highly desirable to have fast and inexpensive methods for determining a subject's genotype so as to predict the best treatment.
Thus, there is considerable demand for high-throughput, very low-cost identification of nucleotide sequences in regions of known sequence in order to identify alleles of polymorphic genes, e.g., SNPs. Many methods are currently available to screen polymorphisms such as SNPs. A typical genotyping strategy involves three basic steps. The first step consists of amplifying the target DNA. This is necessary because a human genome contains 3×10⁹ base pairs of DNA, and most assays lack both the sensitivity and the selectivity to accurately detect a small number of bases, in particular a single base, in a mixture this complex. As a result, most current strategies rely on first using PCR to amplify a region of several hundred bases that includes the polymorphic region to be screened. This reaction requires two unique primers for each amplified region (“amplicon”). Once the complexity has been reduced, the second step consists of differentially labeling the alleles so that the genotype can be identified. This step involves attaching an identifiable marker (e.g., a fluorescent label or mass tag) in a manner that is specific to the base being assayed. The third step consists of detecting the allele to determine the individual's genotype. Detection mechanisms include fluorescent signals, the polarization of a fluorescent signal, mass spectrometry to identify mass tags, etc.
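The third step, turning two allele-specific signals into a genotype call, can be sketched as a simple ratio rule. This is a minimal illustration, not any particular platform's algorithm; the function name, the allele labels X and Y, and all thresholds (`homo_cut`, `het_band`, `min_total`) are hypothetical and would in practice be calibrated against known samples.

```python
def call_genotype(signal_x: float, signal_y: float,
                  homo_cut: float = 0.8,
                  het_band: tuple[float, float] = (0.3, 0.7),
                  min_total: float = 100.0) -> str:
    """Call a biallelic genotype from two allele-specific label intensities.

    All thresholds are hypothetical placeholders for illustration.
    """
    total = signal_x + signal_y
    if total < min_total:
        return "no call"            # too little signal to call reliably
    frac_x = signal_x / total       # fraction of signal from allele X's label
    if frac_x >= homo_cut:
        return "X/X"
    if frac_x <= 1.0 - homo_cut:
        return "Y/Y"
    if het_band[0] <= frac_x <= het_band[1]:
        return "X/Y"
    return "no call"                # ratio falls between clusters: ambiguous


print(call_genotype(900, 100), call_genotype(480, 520), call_genotype(50, 950))
```

Leaving a gap between the homozygous and heterozygous ranges trades call rate for accuracy: samples with ambiguous ratios are flagged rather than mis-called.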
Sensitivity, i.e., the detection limit, remains a significant obstacle in nucleic acid detection systems, and a variety of techniques have been developed to address this issue. Briefly, these techniques can be classified as either target amplification or signal amplification. Target amplification involves the amplification (i.e., replication) of the target sequence to be detected, resulting in a significant increase in the number of target molecules. Target amplification strategies include the polymerase chain reaction (PCR), strand displacement amplification (SDA), and nucleic acid sequence based amplification (NASBA).
Alternatively, rather than amplifying the target, some techniques use the target as a template to replicate a signaling probe, allowing a small number of target molecules to give rise to a large number of signaling probes that can then be detected. Signal amplification strategies include the ligase chain reaction (LCR), cycling probe technology (CPT), invasive cleavage techniques such as Invader™ technology, Q-Beta replicase (QβR) technology, and the use of “amplification probes” such as “branched DNA” that result in multiple label probes binding to a single target sequence.
The polymerase chain reaction (PCR) is widely used and described, and involves the use of primer extension combined with thermal cycling to amplify a target sequence; see U.S. Pat. Nos. 4,683,195 and 4,683,202, and PCR Essential Data, J. Wiley & Sons, Ed. C. R. Newton, 1995, all of which are incorporated by reference. In addition, there are a number of variations of PCR which also find use in the invention, including “quantitative competitive PCR” or “QC-PCR”, “arbitrarily primed PCR” or “AP-PCR”, “immuno-PCR”, “Alu-PCR”, “PCR single strand conformational polymorphism” or “PCR-SSCP”, allelic PCR (see Newton et al., Nucl. Acid Res. 17:2503 (1989)), “reverse transcriptase PCR” or “RT-PCR”, “biotin capture PCR”, “vectorette PCR”, “panhandle PCR”, and “PCR select cDNA subtraction”, among others.
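Because each thermal cycle can at most double the number of target copies, PCR grows exponentially: after n cycles with per-cycle efficiency E (0 ≤ E ≤ 1), N = N₀(1 + E)ⁿ. A minimal sketch of that idealized relation, which ignores the plateau phase reached when primers and nucleotides run out:

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield N = N0 * (1 + E) ** n (no plateau phase modeled)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# One template copy and 30 perfectly efficient cycles give about 1.07e9 copies.
print(pcr_copies(1, 30))
# A more realistic (assumed) 90% efficiency yields noticeably fewer copies.
print(pcr_copies(1, 30, 0.9))
```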
Strand displacement amplification (SDA) is generally described in Walker et al., in Molecular Methods for Virus Detection, Academic Press, Inc., 1995, and U.S. Pat. Nos. 5,455,166 and 5,130,238, all of which are hereby incorporated by reference.
Nucleic acid sequence based amplification (NASBA) is generally described in U.S. Pat. No. 5,409,818 and “Profiting from Gene-based Diagnostics”, CTB International Publishing Inc., N.J., 1996, both of which are incorporated by reference.
Cycling probe technology (CPT) is a nucleic acid detection system based on signal or probe amplification rather than target amplification, as is done in the polymerase chain reaction (PCR). Cycling probe technology relies on a molar excess of labeled probe that contains a scissile linkage of RNA. Upon hybridization of the probe to the target, the resulting hybrid contains an RNA:DNA portion. This RNA:DNA duplex is recognized by RNase H and the RNA is excised, resulting in cleavage of the probe. The probe now consists of two smaller sequences which may be released, thus leaving the target intact for repeated rounds of the reaction. The unreacted probe is removed and the label is then detected. CPT is generally described in U.S. Pat. Nos. 5,011,769, 5,403,711, 5,660,988, and 4,876,187, and PCT published applications WO 95/05480, WO 95/1416, and WO 95/00667, all of which are specifically incorporated herein by reference.
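The key point of CPT is that the target is recycled while the probe is consumed, so the signal grows with the number of rounds rather than exponentially as in target amplification. A toy simulation, assuming (hypothetically) that in every round each target molecule hybridizes to and cleaves exactly one intact probe, until the probe excess is used up:

```python
def cpt_cleaved_probes(targets: int, probes: int, cycles: int) -> list[int]:
    """Cumulative number of cleaved probes after each round of a toy CPT model.

    Assumption: each intact target cleaves one probe per round and is then
    released unchanged; cleavage stops once all probes are consumed.
    """
    cleaved = 0
    history = []
    for _ in range(cycles):
        intact = probes - cleaved
        cleaved += min(targets, intact)  # one cleavage per target, if probe remains
        history.append(cleaved)
    return history


print(cpt_cleaved_probes(10, 1000, 5))  # probe in large excess: linear growth
print(cpt_cleaved_probes(10, 25, 5))    # probe exhausted after three rounds
```

This is why CPT depends on a molar excess of labeled probe: once the probe is exhausted, additional rounds add no signal.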
The oligonucleotide ligation assay (OLA; closely related to the ligase chain reaction (LCR)) involves the ligation of at least two smaller probes into a single long probe, using the target sequence as the template for the ligase. See generally U.S. Pat. Nos. 5,185,243, 5,679,524 and 5,573,907; EP 0 320 308 B1; EP 0 336 731 B1; EP 0 439 182 B1; WO 90/01069; WO 89/12696; and WO 89/09835, all of which are incorporated by reference.
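In allele-specific OLA the polymorphic base is placed at the ligation junction, so ligation, and hence signal, occurs only when both probes pair perfectly with the target. A toy model of that logic, assuming perfect-match-only ligation (real ligases tolerate some mismatches away from the junction, and stringency depends on reaction conditions); all sequences are hypothetical:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ligates(target: str, upstream_probe: str, downstream_probe: str) -> bool:
    """True if the two probes anneal perfectly and contiguously to `target`.

    Probes are written 5'->3' and pair antiparallel with the target, so the
    joined product must equal the reverse complement of a stretch of target.
    """
    return (upstream_probe + downstream_probe) in revcomp(target)


# Hypothetical alleles, defined on the probe-sense strand; the SNP (T vs C)
# falls on the 3' end of the upstream probe, i.e. at the ligation junction.
target_t = revcomp("CC" + "GAAGTCT" + "TGCAGG" + "T")
target_c = revcomp("CC" + "GAAGTCC" + "TGCAGG" + "T")

print(ligates(target_t, "GAAGTCT", "TGCAGG"))  # True: matched junction
print(ligates(target_c, "GAAGTCT", "TGCAGG"))  # False: mismatch at junction
```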
Invader™ technology is based on structure-specific nucleases that cleave nucleic acids in a site-specific manner. Two probes are used, an “invader” probe and a “signaling” probe, which hybridize adjacently to a target sequence with a non-complementary overlap. The enzyme recognizes the overlap structure formed with the “tail” of the signaling probe, cleaves at the overlap, and releases the labeled “tail”, which can then be detected. The Invader™ technology is described in U.S. Pat. Nos. 5,846,717; 5,614,402; 5,719,028; 5,541,311; and 5,843,669, all of which are hereby incorporated by reference.
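The allele discrimination in an invasive cleavage assay can be reduced to a single check at the overlap position. The sketch below is a simplification under stated assumptions: the signaling probe is written 5'->3' with a hypothetical non-complementary flap of `flap_len` bases, the next base sits opposite the polymorphic target base, and cleavage releases the flap together with that base only when it pairs correctly. It does not model enzyme kinetics or the secondary detection reactions used in practice.

```python
from typing import Optional

PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def invader_release(target_base: str, signal_probe: str, flap_len: int) -> Optional[str]:
    """Return the released labeled fragment, or None if no cleavage occurs.

    Assumes the base right after the flap lies opposite the polymorphic
    target base, where the invader probe overlaps the signaling probe.
    """
    probe_snp_base = signal_probe[flap_len]
    if PAIR[target_base] != probe_snp_base:
        return None                       # mismatch: no cleavable structure
    return signal_probe[: flap_len + 1]   # flap plus the paired SNP base


probe = "CGCGC" + "T" + "GACCA"        # hypothetical flap + allele-specific probe
print(invader_release("A", probe, 5))  # CGCGCT
print(invader_release("G", probe, 5))  # None
```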
None of the methods currently used is particularly well suited to very high throughput at low cost. One of the principal shortcomings of the available methods is their reliance on the polymerase chain reaction (PCR) to generate a relatively simple DNA template for polymorphism analysis (i.e., genotyping). PCR is not easily multiplexed, which means that each assay for a particular polymorphism requires a separate reaction. This makes any high-throughput assay cumbersome and expensive, since millions of reactions would have to be performed to screen the requisite number of polymorphisms. Thus, there is a need for a method that allows thousands of polymorphic regions, e.g., SNPs, to be analyzed and quantified in a single reaction vessel, greatly increasing the throughput and decreasing the cost of analysis.
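The scale of the problem follows from simple arithmetic: since each reaction is run on one subject's DNA, the number of reactions is the number of subjects times the number of reactions needed per subject. A sketch with hypothetical study sizes (1,000 subjects, 1,000 SNPs) comparing one SNP per reaction with a hypothetical 1,000-plex reaction:

```python
import math


def reactions_needed(subjects: int, snps: int, snps_per_reaction: int = 1) -> int:
    """Total reactions when each reaction assays up to `snps_per_reaction` SNPs
    from a single subject's DNA."""
    return subjects * math.ceil(snps / snps_per_reaction)


print(reactions_needed(1000, 1000))        # 1000000 singleplex reactions
print(reactions_needed(1000, 1000, 1000))  # 1000 reactions, one per subject
```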