1. Field of the Invention
The present invention is directed to a rapid method for determining the sequence of nucleic acid. The method is especially useful for genotyping, and for the detection of one to tens to hundreds to thousands of single nucleotide polymorphisms (SNPs) or mutations on single or on multiple chromosomes, and for the detection of chromosomal abnormalities, such as truncations, transversions, trisomies, and monosomies.
2. Background
Sequence variation among individuals comprises a continuum from deleterious disease mutations to neutral polymorphisms. There are more than three thousand genetic diseases currently known, including Duchenne Muscular Dystrophy, Alzheimer's Disease, Cystic Fibrosis, and Huntington's Disease (D. N. Cooper and M. Krawczak, “Human Genome Mutations,” BIOS Scientific Publishers, Oxford (1993)). Also, particular DNA sequences may predispose individuals to a variety of diseases such as obesity, arteriosclerosis, and various types of cancer, including breast, prostate, and colon. In addition, chromosomal abnormalities, such as trisomy 21, which results in Down's Syndrome, trisomy 18, which results in Edwards Syndrome, trisomy 13, which results in Patau Syndrome, monosomy X, which results in Turner's Syndrome, and other sex aneuploidies, account for a significant portion of the genetic defects in liveborn human beings. Knowledge of gene mutations, chromosomal abnormalities, and variations in gene sequences, such as single nucleotide polymorphisms (SNPs), will help to understand, diagnose, prevent, and treat diseases.
Most frequently, sequence variation is seen in differences in the lengths of repeated sequence elements, such as minisatellites and microsatellites, as small insertions or deletions, and as substitutions of the individual bases. Single nucleotide polymorphisms (SNPs) represent the most common form of sequence variation; three million common SNPs with a population frequency of over 5% have been estimated to be present in the human genome. Small deletions or insertions, which usually cause frameshift mutations, occur, on average, once in every 12 kilobases of genomic DNA (Wang, D. G. et al., Science 280: 1077–1082 (1998)). A genetic map using these polymorphisms as a guide is being developed (http://research.marshfieldclinic.org/genetics/; internet address as of Jan. 10, 2002).
The nucleic acid sequence of the human genome was published in February, 2001, and provides a genetic map of unprecedented resolution, containing several hundred thousand SNP markers, and a potential wealth of information on human diseases (Venter et al., Science 291:1304–1351 (2001); International Human Genome Sequencing Consortium, Nature 409:860–921 (2001)). However, the length of DNA contained within the human chromosomes totals over 3 billion base pairs, so sequencing the genome of every individual is impractical. Thus, it is imperative to develop high throughput methods for rapidly determining the presence of allelic variants of SNPs and point mutations, which predispose to or cause disease phenotypes. Efficient methods to characterize functional polymorphisms that affect an individual's physiology, psychology, audiology, ophthalmology, neurology, response to drugs, drug metabolism, and drug interactions also are needed.
Several techniques are widely used for analyzing and detecting genetic variations, such as DNA sequencing, restriction fragment length polymorphisms (RFLP), DNA hybridization assays, including DNA microarrays and peptide nucleic acid analysis, and the Protein Truncation Test (PTT), all of which have limitations. Although DNA sequencing is the most definitive method, it is also the most time-consuming and expensive. Often, the entire coding sequence of a gene is analyzed even though only a small fraction of the coding sequence is of interest. In most instances, a limited number of mutations in any particular gene account for the majority of the disease phenotypes.
For example, the cystic fibrosis transmembrane conductance regulator (CFTR) gene is composed of 24 exons spanning over 250,000 base pairs (Rommens et al., Science 245:1059–1065 (1989); Riordan et al., Science 245:1066–73 (1989)). Currently, there are approximately 200 mutations in the CFTR gene that are associated with a disease state of Cystic Fibrosis. Therefore, only a very small percentage of the reading frame for the CFTR gene needs to be analyzed. Furthermore, a total of 10 mutations make up 75.1% of all known disease cases. The deletion of a single phenylalanine residue, F508, accounts for 66% of all Cystic Fibrosis cases in Caucasians.
Hybridization techniques, including Southern Blots, Slot Blots, Dot Blots, and DNA microarrays, are commonly used to detect genetic variations (Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, Third Edition (2001)). In a typical hybridization assay, an unknown nucleotide sequence (“the target”) is analyzed based on its affinity for another fragment with a known nucleotide sequence (“the probe”). If the two fragments hybridize under “stringent conditions,” the sequences are thought to be complementary, and the sequence of the target fragment may be inferred from “the probe” sequence.
However, the results from a typical hybridization assay often are difficult to interpret. The absence or presence of a hybridization signal is dependent upon the definition of “stringent conditions.” Any number of variables may be used to raise or lower stringency conditions, such as salt concentration, the presence or absence of competitor nucleotide fragments, the number of washes performed to remove non-specific binding, and the time and temperature at which the hybridizations are performed. Commonly, hybridization conditions must be optimized for each “target” nucleotide fragment, which is time-consuming and inconsistent with a high throughput method. A high degree of variability is often seen in hybridization assays, as well as a high proportion of false positives. Typically, hybridization assays function as a screen for likely candidates, but a positive confirmation requires DNA sequencing analysis.
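The dependence of a hybridization result on the chosen stringency can be illustrated with a toy model. The following Python sketch is offered only as an illustration under stated assumptions: stringency is reduced to a single hypothetical parameter, `max_mismatches`, whereas a real assay depends on salt concentration, temperature, washes, and the other variables described above. The function names and example sequences are invented for this sketch.

```python
# Toy model of probe/target hybridization. Assumption: "stringency" is
# represented only by the number of tolerated mismatches; real assays are
# governed by salt, temperature, wash conditions, and competitor fragments.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def count_mismatches(probe: str, target: str) -> int:
    """Count positions at which the probe fails to base-pair with the target."""
    if len(probe) != len(target):
        raise ValueError("probe and target must be the same length")
    perfect_probe = reverse_complement(target)
    return sum(p != e for p, e in zip(probe.upper(), perfect_probe))


def hybridizes(probe: str, target: str, max_mismatches: int) -> bool:
    """Hypothetical rule: a lower max_mismatches means higher stringency."""
    return count_mismatches(probe, target) <= max_mismatches


# Hypothetical example sequences (not taken from any real gene).
target = "ATGGCTTCA"
matched_probe = reverse_complement(target)  # fully complementary probe
variant_probe = "TGAAGTCAT"                 # differs from it at one base

# The same probe/target pair gives a different answer at each stringency.
high_stringency_signal = hybridizes(variant_probe, target, max_mismatches=0)
low_stringency_signal = hybridizes(variant_probe, target, max_mismatches=1)
```

In this sketch the single-mismatch probe gives no signal at the higher stringency and a positive signal at the lower one, which is the interpretive difficulty described above: the inferred target sequence depends on where the stringency threshold is set.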
Several techniques for the detection of mutations have evolved based on the principle of hybridization analysis. For example, in the primer extension assay, the DNA region spanning the nucleotide of interest is amplified by PCR, or any other suitable amplification technique. After amplification, a primer is hybridized to a target nucleic acid sequence, wherein the last nucleotide of the 3′ end of the primer anneals immediately 5′ to the nucleotide position on the target sequence that is to be analyzed. The annealed primer is extended by a single, labeled nucleotide triphosphate. The incorporated nucleotide is then detected.
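The logic of the single-nucleotide extension step can be sketched in Python. This is a simplified illustration, not a description of any particular laboratory protocol: it assumes the primer pairs perfectly with the template, ignores labeling and detection chemistry, and uses invented sequences and function names.

```python
# Toy model of single-base primer extension. Assumptions: perfect primer
# annealing, no amplification or cleanup steps, hypothetical sequences.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def extend_by_one(template: str, primer: str) -> str:
    """Return the one nucleotide that would be added to the primer's 3' end.

    template: the strand the primer anneals to, written 5'->3'.
    primer:   the primer, written 5'->3'.
    """
    # The strand being synthesized reads as the reverse complement of the
    # template, so the primer is located within that sequence.
    product_strand = reverse_complement(template)
    start = product_strand.find(primer.upper())
    if start == -1:
        raise ValueError("primer does not anneal to the template")
    next_index = start + len(primer)
    if next_index >= len(product_strand):
        raise ValueError("no template base remains to be copied")
    return product_strand[next_index]


# Two hypothetical alleles that differ at one position (A versus G),
# written as the strand that has the same sense as the primer.
allele_a_sense = "GATTACAGGCATTCG"
allele_g_sense = "GATTACAGGCGTTCG"
primer = "TTACAGGC"  # its 3' end lies immediately before the variable base

base_for_allele_a = extend_by_one(reverse_complement(allele_a_sense), primer)
base_for_allele_g = extend_by_one(reverse_complement(allele_g_sense), primer)
```

Here the incorporated nucleotide is "A" for the first allele and "G" for the second, so identifying the single added base identifies the allele at the position of interest.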
There are several limitations to the primer extension assay. First, the region of interest must be amplified prior to primer extension, which increases the time and expense of the assay. Second, PCR primers and dNTPs must be completely removed before primer extension, and residual contaminants can interfere with the proper analysis of the results. Third, and most restrictive, the primer must be hybridized to the DNA template, which requires optimization of conditions for each primer and for each sequence that is analyzed. Hybridization assays have a low degree of reproducibility and a high degree of non-specificity.
The Peptide Nucleic Acid (PNA) affinity assay is a derivative of traditional hybridization assays (Nielsen et al., Science 254:1497–1500 (1991); Egholm et al., J. Am. Chem. Soc. 114:1895–1897 (1992); James et al., Protein Science 3:1347–1350 (1994)). PNAs are structural DNA mimics that follow Watson-Crick base pairing rules, and are used in standard DNA hybridization assays. PNAs display greater specificity in hybridization assays because a PNA/DNA mismatch is more destabilizing than a DNA/DNA mismatch and complementary PNA/DNA strands form stronger bonds than complementary DNA/DNA strands. However, genetic analysis using PNAs still requires a laborious hybridization step, and as such, is subject to a high degree of non-specificity and difficulty with reproducibility.
Recently, DNA microarrays have been developed to detect genetic variations and polymorphisms (Taton et al., Science 289:1757–60 (2000); Lockhart et al., Nature 405:827–836 (2000); Gerhold et al., Trends in Biochemical Sciences 24:168–73 (1999); Wallace, R. W., Molecular Medicine Today 3:384–89 (1997); Blanchard and Hood, Nature Biotechnology 14:1649 (1996)). DNA microarrays are fabricated by high-speed robotics, on glass or nylon substrates, and contain DNA fragments with known identities (“the probe”). The microarrays are used for matching known and unknown DNA fragments (“the target”) based on traditional base-pairing rules. The advantage of DNA microarrays is that one DNA chip may provide information on thousands of genes simultaneously. However, DNA microarrays are still based on the principle of hybridization, and as such, are subject to the disadvantages discussed above.
The Protein Truncation Test (PTT) is also commonly used to detect genetic polymorphisms (Roest et al., Human Molecular Genetics 2:1719–1721 (1993); Van Der Luit et al., Genomics 20:1–4 (1994); Hogervorst et al., Nature Genetics 10: 208–212 (1995)). Typically, in the PTT, the gene of interest is PCR amplified, subjected to in vitro transcription/translation, purified, and analyzed by polyacrylamide gel electrophoresis. The PTT is useful for screening large portions of coding sequence and detecting mutations that produce stop codons, which significantly diminish the size of the expected protein. However, the PTT is not designed to detect mutations that do not significantly alter the size of the protein.
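Why the PTT detects stop-codon mutations but misses size-neutral ones can be shown with a short Python sketch. It is an illustration under simplifying assumptions: translation uses the standard genetic code, protein "size" is approximated by the number of amino acids rather than by gel mobility, and the coding sequences are invented for this example.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequences (not taken from any real gene).
wild_type = "ATGGCTCAGAAATGGTTCTAA"
nonsense_mutant = "ATGGCTTAGAAATGGTTCTAA"  # CAG -> TAG creates a stop codon
missense_mutant = "ATGGCTCAGAGATGGTTCTAA"  # AAA -> AGA changes one residue

wild_type_length = len(translate(wild_type))
nonsense_length = len(translate(nonsense_mutant))
missense_length = len(translate(missense_mutant))
```

In this sketch the nonsense mutant yields a clearly shorter product than the wild type, while the missense mutant yields a product of the same length, which a size-based readout would not distinguish from the wild type.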
Thus, a need still exists for a rapid method of analyzing DNA, especially genomic DNA suspected of having one or more single nucleotide polymorphisms or mutations.