1. Field of the Invention
The present invention relates to methods of determining the identity of a polymorphic nucleotide in a target sequence having at least two variants such as a single nucleotide polymorphism, or SNP. The methods of the present invention utilize primers having sequences complementary to the region upstream of the position being analyzed. Extension of primers hybridized to target sites is carried out in the absence of a deoxyribonucleoside triphosphate (dNTP) or ribonucleoside triphosphate (rNTP) complementary to one of the polymorphic nucleotides. Differences in length between the primers and any extension products reveal the identity of the nucleotide present at the polymorphic site.
2. Background of the Invention
DNA polymorphism can be due to differences in sequence or in length of a genomic region. Approximately 80% of human DNA polymorphisms are sequence polymorphisms, while only about 20% are length polymorphisms. About 90% of sequence polymorphisms are single nucleotide polymorphisms (SNPs). SNPs are genetic variations that arise from differences in the identity of a single nucleotide in a nucleic acid sequence, giving rise to two variants (sometimes called alleles) of that site. Sites having three polymorphic nucleotides have also been detected. SNPs appear to be the most widely distributed genetic markers in the human genome, occurring approximately every kilobase. Since SNPs represent the most common type of DNA sequence variation, the ability to discriminate between variants of these genetic markers is a very important tool in genetic research.
Many inherited diseases are the result of single point mutations at SNP sites. In some cases, the single point mutation causing nucleotide substitution in a protein-encoding gene is sufficient to actually cause the disease, as in sickle cell anemia and hemophilia. For diseases influenced by a large number of genes, including diabetes, heart disease, various cancers, and certain psychiatric disorders, SNPs are studied as markers to aid scientists in creating detailed maps of genetic variation to help find disease-linked genes.
SNP markers can be used to identify genes involved in disease or associated with any detectable phenotype by identifying the variant bases of one or more SNPs that correlate with the presence, absence, or degree of severity of the condition. DNA samples are isolated from individuals with and without the disease, and the identity of the polymorphic bases of one or more SNPs from each population is determined. The variants having a statistical association with the disease or phenotype are identified. Thereafter, samples may be taken from individuals and the variant bases of one or more SNPs associated with a disease or phenotype can be identified to determine whether the individuals are likely to develop a particular disease or phenotype, or whether they already suffer from a particular disease or possess a particular phenotype. Mapping SNP markers associated with a disease or phenotype to their chromosomal locations can identify the genes in which they occur, or indicate nearby genes having a role in the development or severity of the disease. By developing a high-density SNP map of the human genome, scientists hope to be able to pinpoint the genetic origins of diseases, the genetic differences that predispose some individuals to disease and underlie variations in individual responses to treatment and, potentially, to predict the most appropriate drugs to treat disease in individuals of a given genetic makeup.
Both the high frequency and wide distribution of SNPs in the human genome make them a valuable source of biallelic markers for identity testing, genome mapping, and medical diagnostics. SNPs are densely spaced in the human genome, with an estimated number of more than 10^7 sites scattered along the 3×10^9 base pairs of the human genome. Because SNPs occur at a greater frequency and with more uniform distribution than other classes of polymorphisms such as variable number of tandem repeat (VNTR) polymorphisms or restriction fragment length polymorphisms (RFLPs), there is a greater probability that SNP markers will be found in close proximity to a genetic locus of interest. SNPs are also preferred as markers because they are mutationally more stable than VNTRs, which have a high mutation rate. In addition, genome analysis using VNTRs and RFLPs is highly dependent on the method used to detect the polymorphism, while new SNPs can easily be detected by sequencing, either random sequencing to detect new SNPs or targeted sequencing to analyze known SNPs.
The different forms of a characterized SNP are easy to distinguish and can therefore be used on a routine basis for genetic typing based on polymorphisms within and between individuals. SNPs correspond to a locus where the sequence differs by a single nucleotide and has only two alleles, making SNPs suitable for highly parallel detection and automated scoring. These features offer the possibility of developing rapid, high-throughput genotyping using SNP analysis.
At present, SNPs can be characterized using any of a variety of methods. These methods include direct or indirect sequencing of the site, oligonucleotide ligation assays (OLAs), ligase/polymerase analysis, use of allele-specific hybridization probes, use of dideoxyribonucleoside triphosphates (ddNTPs) for extension in solution or on solid phase, or use of restriction enzymes to map SNPs. A significant disadvantage of the oligonucleotide ligation assay (OLA) is that this method requires each possible variant of the SNP to be analyzed using a separate set of oligonucleotides for each nucleotide. A further drawback of OLA is that ligation is not a highly discriminating process, such that non-specific signals can occur with an unacceptably high frequency. Techniques such as sequencing, ligase/polymerase analysis, or restriction enzyme mapping are laborious, time-consuming, and often quite expensive for large-scale analysis. Methods such as extension in solution or on solid phase, or ligase/polymerase analysis, rely on incorporation of expensive ddNTPs or labeled dNTPs between bases at a polymorphic site. Since the signal is proportional to the number of ddNTPs or labeled dNTPs incorporated, these methods are often not sensitive enough to be used for routine analysis. For extension on solid phase, primers must first be immobilized to a solid support. Use of a solid support often interferes with hybridization of a primer to the target sequence.
A rapid, accurate, and cost-effective method is needed to meet demands for automated high-throughput analysis of SNPs.
The present invention provides a simple and effective method for determining the identity of the nucleotide present at a polymorphic site.
The invention involves detection of reaction products of a primer that hybridizes upstream of the polymorphic site. DNA polymerase or RNA polymerase is used to extend the primer in the absence of a dNTP or rNTP complementary to one of the variants at the polymorphic site, for example a SNP.
In detail, the invention provides a method for determining the identity of a nucleotide at a polymorphic site, said method comprising the following steps:
1) Oligonucleotide hybridization: Oligonucleotides having a nucleotide sequence complementary to that of a target molecule known to contain a SNP are hybridized in a manner such that the 3′ terminus of the hybridized oligonucleotide is upstream of the preselected site.
2) Polymerase extension: The hybridized primer is incubated with DNA or RNA polymerase and one or more dNTPs or rNTPs, under conditions sufficient to permit template-dependent polymerase incorporation of the dNTP or rNTP to the 3′ terminus of the hybridized oligonucleotide. Extension reactions are performed in the absence of a dNTP or rNTP complementary to one of the variants. Extension of the primer to the polymorphic position depends upon whether the reaction mixture contains the dNTP or rNTP complementary to the variant present at the preselected site.
3) Analysis: The reaction products are analyzed to determine whether the primer has been extended to the polymorphic position. Because the extension reactions are performed in the absence of a dNTP or rNTP complementary to one of the variants, the reaction product will not include the polymorphic base if the reaction mixture lacks the complementary nucleotide. Thus, the length of the reaction product depends on whether extension proceeded to the polymorphic position. Suitable methods for analysis include any convenient means of determining the length of the reaction product, including HPLC, capillary electrophoresis, microfluidics technology, or slab gel electrophoresis. The primers and extension products may be detected using well known methods, including use of intercalators, DNA-binding dyes, or UV light.
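The length-based readout of steps 1) to 3) can be illustrated with a minimal Python sketch. It is offered only as an idealized model, not as part of the claimed method: it assumes a DNA template, perfect polymerase fidelity, and extension that halts at the first template base whose complementary dNTP is missing. The function names, the 20-nucleotide primer length, the flanking sequence, and the choice of omitting dCTP are all hypothetical values chosen for illustration.

```python
# Watson-Crick complements for DNA template bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extension_length(template_downstream, dntps_present):
    """Count the bases an idealized polymerase could add to the primer.

    template_downstream: template bases in the order they are copied,
        starting just past the primer's 3' terminus (hypothetical input).
    dntps_present: set of bases (A, C, G, T) whose dNTPs are in the mix.
    """
    added = 0
    for base in template_downstream:
        if COMPLEMENT[base] not in dntps_present:
            break  # required dNTP is absent, so extension stops here
        added += 1
    return added


def reaches_polymorphic_site(template_downstream, site_index, dntps_present):
    """True if a base is incorporated opposite the polymorphic site."""
    return extension_length(template_downstream, dntps_present) > site_index


PRIMER_LENGTH = 20        # illustrative primer length (assumption)
SITE_INDEX = 2            # polymorphic base is the third template base
DNTPS = {"A", "G", "T"}   # dCTP omitted; it would pair with template G

for variant in ("A", "G"):
    template = "TT" + variant + "ACG"  # made-up flanking sequence
    added = extension_length(template, DNTPS)
    print(variant, PRIMER_LENGTH + added,
          reaches_polymorphic_site(template, SITE_INDEX, DNTPS))
```

In this toy case the template carrying variant A yields a 25-nucleotide product that runs past the polymorphic site, while the template carrying variant G yields a 22-nucleotide product that stops immediately before it, so the two variants would be distinguishable by product length alone.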