The DNA that makes up human chromosomes provides the instructions that direct the production of all proteins in the body. These proteins carry out the vital functions of life. Variations in DNA sequences encoding a protein may produce variations or mutations in the proteins encoded, potentially affecting the normal function of cells. Although environment often plays a significant role in disease, variations or mutations in the DNA of an individual are directly related to almost all human diseases, including infectious disease, cancer, and autoimmune disorders. Knowledge of genetics will help unravel the genetic bases of disease and be useful in treatment. For example, knowledge of human genetics has led to a limited understanding of variations between individuals in drug response, the field of pharmacogenetics. Over half a century ago, adverse drug responses were correlated with amino acid variations in two drug-metabolizing enzymes, plasma cholinesterase and glucose-6-phosphate dehydrogenase. Since then, careful genetic analyses have linked sequence polymorphisms in over 35 drug metabolism enzymes, 25 drug targets and 5 drug transporters with compromised levels of drug efficacy or safety (Evans and Relling, Science 286:487–91 (1999)).
Any two humans are 99.9% similar in their genetic makeup; thus, most of the sequence of the DNA of their genomes is identical. However, it is crucial to identify and understand the differences, as it is these DNA sequence differences that account for the phenotypic differences between individuals, including susceptibility to disease and response to treatment of disease. The differences in DNA appear in many forms: for example, there are deletions of many-base stretches of DNA, insertions of stretches of DNA, differences in the number of repetitive DNA elements in non-coding regions, and, perhaps most importantly, changes in single nitrogenous base positions in the genome called “single nucleotide polymorphisms” (SNPs).
There are several methods for SNP genotyping known in the art. For example, DNA sequencing is well known and generally available in the art and may be used to determine the location of SNPs in a genome. See, for example, Sambrook, et al., Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory, New York) (1989), and Ausubel, et al., Current Protocols in Molecular Biology (John Wiley and Sons, New York) (1997), incorporated herein by reference. Sequencing methods may be used to determine the sequence of the same genomic regions from different DNA strands, where the sequences are then compared by, for example, computer software, and the differences are noted. DNA sequencing methods may employ such enzymes as the Klenow fragment of DNA polymerase I, Sequenase (US Biochemical Corp, Cleveland, Ohio), Taq polymerase (Perkin Elmer), thermostable T7 polymerase (Amersham, Chicago, Ill.), or combinations of polymerases and proofreading exonucleases such as those found in the Elongase Amplification System marketed by Gibco/BRL (Gaithersburg, Md.). Preferably, the process is automated with machines such as the Hamilton Micro Lab 2200 (Hamilton, Reno, Nev.), Peltier Thermal Cycler (PTC200; MJ Research, Watertown, Mass.) and the ABI Catalyst and 373 and 377 DNA Sequencers (Perkin Elmer, Wellesley, Mass.).
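The software comparison step described above, in which sequences of the same genomic region from different DNA strands are compared and the differences noted, may be illustrated by the following minimal Python sketch. It assumes that the two sequences have already been aligned and are of equal length, so only single-base substitutions are reported. The function name `find_single_base_differences` and the example sequences are hypothetical and are not taken from any cited software.

```python
def find_single_base_differences(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are pre-aligned and equal in length; insertions
    and deletions are outside the scope of this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, sample_base) in enumerate(pairs):
        if ref_base != sample_base:
            differences.append((position, ref_base, sample_base))
    return differences


# Hypothetical example: the two strands differ only at 0-based position 4.
reference_strand = "ACGTACGTAC"
sample_strand = "ACGTTCGTAC"
print(find_single_base_differences(reference_strand, sample_strand))
# prints [(4, 'A', 'T')]
```

A practical pipeline would first align the reads, since an insertion or deletion shifts every downstream position and would otherwise be reported as a run of false substitutions.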
Capillary electrophoresis systems that are commercially available may be used to perform SNP analysis. In particular, capillary sequencing may employ flowable polymers for electrophoretic separation, four different fluorescent dyes (one for each nucleotide) which are laser activated, and detection of the emitted wavelengths by a charge coupled device camera. Output/light intensity may be converted to an electrical signal using appropriate software (e.g., Genotyper and Sequence Navigator, Perkin Elmer, Wellesley, Mass.) and the entire process from loading of samples to computer analysis and electronic data display may be computer controlled. Again, this method may be used to determine the sequence of the same genomic regions from different DNA strands, where the sequences are then compared and the differences noted.
Alternatively, once a genomic sequence from one reference DNA strand has been determined by sequencing, it is possible to use hybridization techniques to determine differences in sequence between the reference strand and other DNA strands. These differences may be SNPs. An example of a suitable hybridization technique involves the use of DNA chips (oligonucleotide arrays), for example, those available from Affymetrix, Inc., Santa Clara, Calif. For details on the use of DNA chips for the detection of, for example, SNPs, see U.S. Pat. No. 6,300,063 issued to Lipshutz, et al., and U.S. Pat. No. 5,837,832 to Chee, et al.
Another technique suitable for the detection of SNPs in genomic DNA is the Invader technology available from Third Wave Technologies, Inc., Madison, Wis. Examples of using this technology to detect SNPs may be found, e.g., in Hessner, et al., Clinical Chemistry 46(8):1051–56 (2000); and Hall, et al., PNAS 97(15):8272–77 (2000). In the Invader process, two short DNA probes hybridize to a target nucleic acid to form a structure recognized by a nuclease enzyme. For SNP analysis, two separate reactions are run—one for each SNP variant. If one of the probes is complementary to the sequence, the nuclease will cleave it to release a short DNA fragment termed a “flap”. The flap binds to a fluorescently-labeled probe and forms another structure recognized by a nuclease enzyme. When the enzyme cleaves the labeled probe, the probe emits a detectable fluorescence signal thereby indicating which SNP variant is present. One advantage of this method is that amplification of the target DNA sequence is not necessary.
Another technique for SNP analysis, rolling circle amplification, utilizes an oligonucleotide complementary to a circular DNA template to produce an amplified signal. Extension of the oligonucleotide results in the production of multiple copies of the circular template in a long concatemer. Typically, detectable labels are incorporated into the extended oligonucleotide during the extension reaction. The extension reaction can be allowed to proceed until a detectable amount of extension product is synthesized, then the extension product is analyzed by various methods, such as sequencing techniques or using microarrays.
Another technique suitable for the detection of SNPs makes use of the 5′-exonuclease activity of a DNA polymerase to generate a signal by digesting a probe molecule to release a fluorescently labeled nucleotide. This assay is frequently referred to as a TaqMan assay (see, e.g., Arnold, et al., BioTechniques 25(1):98–106 (1998)). A target DNA containing a SNP is amplified in the presence of a probe molecule that hybridizes to the SNP site. The probe molecule contains both a fluorescent reporter-labeled nucleotide at the 5′-end and a quencher-labeled nucleotide at the 3′-end. The probe sequence is selected so that the nucleotide in the probe that aligns with the SNP site in the target DNA is as near as possible to the center of the probe to maximize the difference in melting temperature between the correct match probe and the mismatch probe. As the PCR is conducted, the correct match probe hybridizes to the SNP site in the target DNA and is digested by the Taq polymerase used in the PCR assay. This digestion physically separates the fluorescently labeled nucleotide from the quencher, with a concomitant increase in fluorescence. The mismatch probe does not remain hybridized during the elongation portion of the PCR and is, therefore, not digested, and the fluorescently labeled nucleotide remains quenched.
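The probe design constraint described above, namely that the nucleotide aligned with the SNP site lies as near as possible to the center of the probe, may be sketched in Python as follows. This is an illustration only: the function name `centered_probe_window`, the probe length and the example sequence are hypothetical, and the window is returned in the sense of the supplied target sequence, so an actual probe may be this sequence or its complement depending on which strand is targeted.

```python
def centered_probe_window(target, snp_index, probe_length):
    """Return (window, snp_offset) with the SNP as close to the center as possible.

    `snp_index` is the 0-based position of the SNP in `target`. Near the ends
    of the target the window is shifted inward, so the SNP may be off-center.
    """
    if not 0 <= snp_index < len(target):
        raise ValueError("snp_index is outside the target sequence")
    if not 0 < probe_length <= len(target):
        raise ValueError("probe_length must be between 1 and len(target)")
    start = snp_index - probe_length // 2
    # Clamp the window so that it stays inside the target sequence.
    start = max(0, min(start, len(target) - probe_length))
    return target[start:start + probe_length], snp_index - start


# Hypothetical 30-base target with the SNP (G) at 0-based position 15.
target = "A" * 15 + "G" + "T" * 14
window, offset = centered_probe_window(target, snp_index=15, probe_length=11)
print(window, offset)
# prints AAAAAGTTTTT 5
```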
Denaturing HPLC using a polystyrene-divinylbenzene reverse phase column and an ion-pairing mobile phase also can be used to identify SNPs. In this process, a DNA segment containing a SNP is PCR amplified. After amplification, the PCR product is denatured by heating and mixed with a second denatured PCR product with a known nucleotide at the SNP position. The PCR products are annealed and are analyzed by HPLC at elevated temperature. The temperature is chosen to denature duplex molecules that are mismatched at the SNP location but not to denature those that are perfect matches. Under these conditions, heteroduplex molecules typically elute before homoduplex molecules. For an example of the use of this technique see Kota, et al., Genome 44(4):523–28 (2001).
SNPs can be detected using solid phase amplification and microsequencing of the amplification product. Beads to which primers have been covalently attached are used to carry out amplification reactions. The primers are designed to include a recognition site for a Type II restriction enzyme. After amplification—which results in a PCR product attached to the bead—the product is digested with the restriction enzyme. Cleavage of the product with the restriction enzyme results in the production of a single stranded portion including the SNP site and a 3′-OH that can be extended to fill in the single stranded portion. Inclusion of ddNTPs in an extension reaction allows direct sequencing of the product. For an example of the use of this technique to identify SNPs see Shapero, et al., Genome Research 11:1926–34 (2001).
Similarly, Shuber, U.S. Pat. No. 5,707,806, describes a method of minisequencing up to 2000 bp in the vicinity of a SNP after cleavage with mismatch repair enzymes. Methyl-directed mismatch repair enzymes, such as Mut S, Mut L, Mut H and Mut U, work as a complex to recognize and cleave at or around a mismatch (where the SNP is located if heteroduplexes have been formed). The Shuber method then employs dNTPs and ddNTPs and a polymerase to fill in the gaps. The products are then sequenced. The sequence of the sequenced products is compared to a known sequence and the differences are noted. DNA polymerases used for this method include DNA pol I, pol III, T7 DNA pol and T4 DNA polymerase.
Other techniques for SNP detection and/or genotyping are the Single Strand Conformation Polymorphisms (SSCP) technique and the Denaturing Gradient Gel Electrophoresis (DGGE) technique. In SSCP, sample and control DNAs are denatured and run on polyacrylamide gels in a non-denaturing environment. Single strands of DNA with a SNP are separated on the gel, and will show different mobility as compared to the single strands of the control DNA. The difference in mobility is caused by a conformational change of the single stranded DNA due to the single base change. However, in this method the examined DNA fragment size must be restricted to less than about 300 bp, as sensitivity of the assay is decreased if the fragment is larger.
The DGGE method is similar to SSCP in that it relies on DNA denaturation and gel electrophoresis, but DGGE uses heat or chemical denaturants within the gel to separate the two strands of the DNA being examined. DNA fragments have different melting temperatures, determined by their nucleotide sequence. G/C base pairs melt at a higher temperature than T/A base pairs. When separated by electrophoresis through a gradient of increasing temperature, the control DNA and the sample DNA that contains different nucleotides will melt at different specific points on the gel, according to their melting temperatures. In addition to other drawbacks to the SSCP and DGGE processes, neither process identifies the position of the SNP; they only indicate whether one or more different nucleotides are present.
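The dependence of melting temperature on nucleotide composition may be illustrated with a minimal Python sketch. It uses the Wallace rule of thumb (2 °C per A/T base and 4 °C per G/C base), which is an approximation intended for short oligonucleotides and is substituted here only to show the qualitative effect of a single base change. It is not a model of the domain melting that governs fragment mobility in DGGE, and the function name and sequences are hypothetical.

```python
def wallace_tm(sequence):
    """Estimate melting temperature in degrees C using the Wallace rule:
    2 degrees per A/T base plus 4 degrees per G/C base."""
    seq = sequence.upper()
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    if at_count + gc_count != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at_count + 4 * gc_count


control = "ATGCATTAGCATTA"  # hypothetical control fragment
sample = "ATGCATCAGCATTA"   # same fragment with T replaced by C at position 6
print(wallace_tm(control), wallace_tm(sample))
# prints 36 38
```

Under this approximation the single T-to-C change raises the estimate by 2 °C, consistent with the statement that G/C pairs melt at a higher temperature than T/A pairs. The sketch also shows why the result reveals only that the fragments differ, not where.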
In the Single Base Primer Extension assay, double stranded sample DNA is denatured and primers complementary to the sequence are added and allowed to anneal to the DNA. The primers are usually about 20–30 nucleotides in length, and their 3′ end is adjacent to the SNP. Next, DNA polymerase and ddNTPs with varying fluorescent tags are added. By identifying the 3′ base added to the primer, it is possible to identify if a SNP is present. This technique, like some of those discussed previously, requires that one know a priori the location of each SNP, and requires synthesis of a specific primer for each SNP location. As an alternative to using varying tags, a mass spectrometer may be employed. Sequenom, Inc. applies this assay with the use of mass spectrometry.
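The logic of the Single Base Primer Extension assay may be sketched in Python as follows. The sketch assumes an idealized reaction with perfect primer annealing and models the sample by the strand that reads the same as the primer; because the primer anneals to the complementary strand, the ddNTP added at its 3′ end matches the base immediately 3′ of the primer on the modeled strand. The function name `extended_base` and all sequences are hypothetical.

```python
def extended_base(sense_strand, primer):
    """Return the single base a polymerase would add to the primer's 3' end.

    The primer anneals to the strand complementary to `sense_strand`, so the
    primer reads the same as a stretch of the sense strand, and the added
    ddNTP matches the sense-strand base immediately 3' of the primer.
    """
    sense = sense_strand.upper()
    start = sense.find(primer.upper())
    if start == -1:
        raise ValueError("primer does not match the strand")
    snp_index = start + len(primer)
    if snp_index >= len(sense):
        raise ValueError("no base available 3' of the primer")
    return sense[snp_index]


primer = "GATTACAGATTACAGATTAC"           # hypothetical 20-nucleotide primer
allele_1 = "CC" + primer + "A" + "GGCT"   # sample carrying A at the SNP
allele_2 = "CC" + primer + "G" + "GGCT"   # sample carrying G at the SNP
print(extended_base(allele_1, primer), extended_base(allele_2, primer))
# prints A G
```

In the assay itself the identity of the added base is read from the fluorescent tag or from the mass of the extended primer; the sketch also makes plain that the primer, and therefore the SNP location, must be known in advance.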
Allele Specific Oligonucleotide Ligation is yet another technique employing specific primers. One primer is complementary to the target sequence 5′ to and including the SNP position. The second primer is complementary to the sequence immediately 3′ of the SNP position. The sample DNA is denatured and allowed to hybridize with the primers. DNA ligase is then added. If the upstream primer matches the SNP, ligation will be achieved between the two primers. If there is a mismatch, that is, if the primer does not match the SNP, ligation is not achieved. Thus, if ligation has taken place, the product will be a long single strand with the two primers connected together. Again, however, this technique requires that one know a priori the location of each SNP, and requires synthesis of two specific primers for each SNP location.
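The ligation test described above may be sketched in Python as follows. The sketch is a simplification: it predicts ligation only when both primers are perfectly complementary to adjacent sites on the target, whereas a real ligase discriminates chiefly at the junction between the primers. The names `reverse_complement` and `ligation_product`, and all sequences, are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def ligation_product(target, snp_primer, adjacent_primer):
    """Return the joined primers if both anneal perfectly and adjacently, else None.

    `snp_primer` is complementary to the target sequence 5' to and including
    the SNP; `adjacent_primer` is complementary to the sequence immediately
    3' of the SNP. All sequences are written 5'->3'.
    """
    # Region of the target covered by the two primers, in the target's sense.
    footprint = reverse_complement(snp_primer) + reverse_complement(adjacent_primer)
    if footprint in target.upper():
        return reverse_complement(footprint)  # the single ligated strand
    return None


flank_5 = "ACGGATCG"  # hypothetical target sequence 5' of the SNP
flank_3 = "CATTGCAA"  # hypothetical target sequence 3' of the SNP
target_g = "TT" + flank_5 + "G" + flank_3 + "CC"
target_a = "TT" + flank_5 + "A" + flank_3 + "CC"
snp_primer_g = reverse_complement(flank_5 + "G")  # primer specific for the G allele
adjacent_primer = reverse_complement(flank_3)

print(ligation_product(target_g, snp_primer_g, adjacent_primer) is not None)  # True
print(ligation_product(target_a, snp_primer_g, adjacent_primer) is not None)  # False
```

Genotyping both alleles in this scheme requires a second allele-specific primer, which reflects the stated drawback that specific primers must be synthesized for each known SNP location.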
Another assay for SNPs is Allele Specific Hybridization. As with Allele Specific Oligonucleotide Ligation, sample DNA is examined by hybridization to primers. In this technique, an oligonucleotide fabricated onto a solid support covers the SNP and regions 5′ and 3′ of the SNP. Sample DNA is denatured and allowed to hybridize to the oligonucleotide/solid support. The sample DNA/oligonucleotide/solid support is analyzed by eluting the bound sample DNA. When the SNP position in the sample DNA is complementary to the base located at the same position on the fabricated oligonucleotide, the two strands are separated with more difficulty than when there is a mismatch. This technique can be used in conjunction with various labels. Again, however, this technique requires a priori knowledge of the SNP and of the sequences surrounding the SNP, and the synthesis of an oligonucleotide for each SNP.
In addition to SNPs, the DNA of two organisms of the same species will differ in the number and/or position of any modified nitrogenous bases that are present. Such modifications or variations include methylations, oxidations or aminations of the basic nucleotides, including but not limited in any way to the methyladenosines and methylguanosines, 2′-O-methylcytidine, 2′-O-methyluridine, 8-oxoguanine, 8-oxoadenine, fapy-guanine, methyl-fapy-guanine, fapy-adenine, aflatoxin B1-fapy-guanine, 5-hydroxy-cytosine and 5-hydroxy-uracil. The number and/or position of such modified nucleotides may be an important factor in assessing disease-causing agents, particularly whether a certain chemical reagent is a carcinogen able to cause large numbers of these modified nucleotides in the DNA.
Thus, there is an interest and need in the art for a method of SNP or modified nucleotide (variation) analysis that does not require specialized primers or a priori knowledge of each SNP location. The present invention satisfies this need in the art.