Methods for rapidly sequencing DNA are needed for analyzing diseases and mutations in the population and for developing therapies. The most commonly observed form of human sequence variation is the single nucleotide polymorphism (SNP), which occurs in approximately one of every 300 to 1,000 base pairs of genomic sequence. Building upon the complete sequence of the human genome, efforts are underway to identify the underlying genetic link to common diseases by SNP mapping or direct association. Technology developments focused on rapid, high-throughput, and low-cost DNA sequencing would facilitate the understanding and use of genetic information, such as SNPs, in applied medicine.
In general, 10% to 15% of SNPs will affect protein function by altering specific amino acid residues, will affect the proper processing of genes by changing splicing mechanisms, or will affect the normal level of expression of the gene or protein by varying regulatory mechanisms. It is envisioned that the identification of informative SNPs will lead to more accurate diagnosis of inherited disease, better prognosis of risk susceptibilities, or identification of sporadic mutations in tissue. One application of an individual's SNP profile would be to significantly delay the onset or progression of disease with prophylactic drug therapies. Moreover, an SNP profile of drug-metabolizing genes could be used to prescribe a specific drug regimen to provide safer and more efficacious results. To accomplish these ambitious goals, genome sequencing will move into a resequencing phase, with the potential for partial sequencing of a large majority of the population. This would involve sequencing, in parallel, specific regions or single base pairs distributed throughout the human genome to obtain the SNP profile for a given complex disease.
Sequence variations underlying most common diseases are likely to involve multiple SNPs that are dispersed throughout the associated genes and occur at low frequency. Thus, DNA sequencing technologies that employ strategies for de novo sequencing are more likely to detect and/or discover these rare, widely dispersed variants than technologies targeting only known SNPs.
Traditionally, DNA sequencing has been accomplished by the “Sanger” or “dideoxy” method, which involves the chain termination of DNA synthesis by the incorporation of 2′,3′-dideoxynucleotides (ddNTPs) using DNA polymerase (Sanger, F., Nicklen, S., and Coulson, A. R. (1977) DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74, 5463-5467). The reaction also includes the natural 2′-deoxynucleotides (dNTPs), which extend the DNA chain by DNA synthesis. Balanced appropriately, competition between chain extension and chain termination results in the generation of a set of nested DNA fragments, which are uniformly distributed over thousands of bases and differ in size by single-base increments. The ratio of dNTP/ddNTP in the sequencing reaction determines the frequency of chain termination, and hence the distribution of lengths of terminated chains. Electrophoresis is used to resolve the nested DNA fragments by their respective sizes. The fragments are then detected via the prior attachment of four different fluorophores to the four bases of DNA (i.e., A, C, G, and T), which fluoresce their respective colors when irradiated with a suitable laser source. To date, Sanger sequencing has been the most widely used method for discovery of SNPs by direct PCR sequencing (Gibbs, R. A., Nguyen, P.-N., McBride, L. J., Koepf, S. M., and Caskey, C. T. (1989) Identification of mutations leading to the Lesch-Nyhan syndrome by automated direct DNA sequencing of in vitro amplified cDNA. Proc. Natl. Acad. Sci. USA 86, 1919-1923) or genomic sequencing (Hunkapiller, T., Kaiser, R. J., Koop, B. F., and Hood, L. (1991) Large-scale and automated DNA sequence determination. Science 254, 59-67; International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409, 860-921).
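By way of illustration only, the dependence of terminated-chain length on the dNTP/ddNTP ratio described above can be sketched with a simple probabilistic model. The sketch assumes, as a simplification not stated in the text, that the polymerase incorporates a dNTP and its ddNTP counterpart with equal efficiency and that every position is equally likely to terminate; real polymerases discriminate between the two, so the numbers are indicative only. The function names are hypothetical.

```python
def termination_probability(dntp_to_ddntp_ratio: float) -> float:
    """Per-position chance that a ddNTP (rather than a dNTP) is incorporated.

    Assumption: equal incorporation efficiency, so the chance is simply the
    ddNTP share of the nucleotide pool, 1 / (1 + ratio).
    """
    if dntp_to_ddntp_ratio < 0:
        raise ValueError("ratio must be non-negative")
    return 1.0 / (1.0 + dntp_to_ddntp_ratio)


def fragment_length_probability(length: int, dntp_to_ddntp_ratio: float) -> float:
    """Probability that a chain terminates exactly at position `length`.

    The chain must be extended (length - 1) times and then terminated once,
    which gives a geometric distribution.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    p = termination_probability(dntp_to_ddntp_ratio)
    return (1.0 - p) ** (length - 1) * p


def mean_fragment_length(dntp_to_ddntp_ratio: float) -> float:
    """Mean of the geometric distribution: 1 / p = 1 + ratio."""
    return 1.0 / termination_probability(dntp_to_ddntp_ratio)


if __name__ == "__main__":
    for ratio in (10, 100, 1000):
        print(ratio, round(mean_fragment_length(ratio), 1))
```

Under these assumptions, raising the dNTP/ddNTP ratio shifts the distribution toward longer fragments, while lowering it concentrates signal in short fragments near the primer.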
The need for developing new sequencing technologies has never been greater than today, with applications spanning diverse research sectors including comparative genomics and evolution, forensics, epidemiology, and applied medicine for diagnostics and therapeutics. Current sequencing technologies are too expensive, labor-intensive, and time-consuming for broad application in human sequence variation studies. Genome center cost is calculated on the basis of dollars per 1,000 Q20 bases and can be generally divided into the categories of instrumentation, personnel, reagents and materials, and overhead expenses. Currently, these centers are operating at less than one dollar per 1,000 Q20 bases, with at least 50% of the cost resulting from DNA sequencing instrumentation alone. Developments in novel detection methods, miniaturization in instrumentation, microfluidic separation technologies, and an increase in the number of assays per run will most likely have the biggest impact on reducing cost.
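For orientation, the cost metric mentioned above can be sketched as follows. A Q20 base is one whose Phred quality score is at least 20, which corresponds to an estimated base-calling error probability of at most 1 in 100. The function names and the example figures are hypothetical and are not taken from any genome center's accounts.

```python
from typing import Iterable


def phred_to_error_probability(quality: float) -> float:
    """Convert a Phred quality score Q to an error probability, 10 ** (-Q / 10)."""
    return 10 ** (-quality / 10)


def count_q20_bases(qualities: Iterable[int]) -> int:
    """Count bases whose Phred score is 20 or higher."""
    return sum(1 for q in qualities if q >= 20)


def cost_per_1000_q20_bases(total_cost_dollars: float, q20_bases: int) -> float:
    """Dollars spent per 1,000 Q20 bases produced."""
    if q20_bases <= 0:
        raise ValueError("at least one Q20 base is required")
    return 1000.0 * total_cost_dollars / q20_bases


if __name__ == "__main__":
    # Hypothetical run: four bases, two of which meet the Q20 threshold.
    print(count_q20_bases([10, 20, 35, 19]))
    # Hypothetical budget: $500 total for one million Q20 bases.
    print(cost_per_1000_q20_bases(500.0, 1_000_000))
```

Dividing the total across instrumentation, personnel, reagents and materials, and overhead would then show which category dominates the per-base cost.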
It is therefore an object of the invention to provide novel compounds that are useful for efficient sequencing of genomic information in high-throughput sequencing reactions.
It is another object of the invention to provide novel reagents and combinations of reagents that can efficiently and affordably provide genomic information.
It is yet another object of the invention to provide libraries and arrays of reagents for diagnostic methods and for developing targeted therapeutics for individuals.