Genome and transcriptome sequencing profiles can provide important insights into cell physiology, medical diagnosis, prognosis and treatment planning. For example, whole genome single nucleotide polymorphism (SNP) profiling may provide an unbiased method of determining genetic predisposing factors for adverse drug reactions (ADRs). Genetic factors can determine individual susceptibility to both dose-dependent and dose-independent ADRs, see, e.g., Pirmohamed, Trends Pharmacol. Sci., 22:298-305 (2001). The cancer phenotype reflects changes in the expression patterns of hundreds or even thousands of genes that occur as a result of a mutation of an oncogene or a tumor suppressor gene. Functional genomic approaches such as DNA microarrays or serial analysis of gene expression (SAGE) can help determine the expression level of genes in the transcriptome of a cell or an organism, see, e.g., Polyak, J. Clin. Oncol., 19:2948-2958 (2001). However, understanding the meaning of expression patterns of transcriptomes requires knowledge of the sequences of the messages therein, and up to 30% of mammalian messenger RNA consists of thousands of distinct species, see, e.g., Brenner, Proc. Natl. Acad. Sci. USA, 97:1665-1670 (2000). Relatively minor alterations in sequences and expression can have a profound impact on cell physiology and organism health.
The majority of DNA sequencing methods presently in use are based on the chemical degradation method of Maxam and Gilbert, Proc. Natl. Acad. Sci. USA, 74:560-564 (1977) or the dideoxy chain termination approach of Sanger et al., Proc. Natl. Acad. Sci. USA, 74:5463-5467 (1977). The chain termination method has been improved in several ways and, in various forms, is widely used in commercial DNA sequencing instruments, e.g., Hunkapiller et al., Science, 254:59-67 (1991).
Both the chemical degradation and chain termination methods require the generation of one or more sets of labeled DNA fragments, each having a common origin and each terminating with a known base. The set or sets of fragments are then separated by size to obtain sequence information. The size separation is usually accomplished by high resolution electrophoresis, either gel or non-gel based, which must have the capacity to distinguish very large fragments differing in size by no more than a single nucleotide. Despite many significant improvements, the technique does not readily lend itself to miniaturization or to massively parallel implementation.
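By way of illustration only, the principle of reading a sequence from size-separated, base-terminated fragments can be sketched in Python as follows. The function names `terminated_fragment_lengths` and `read_ladder` are hypothetical and do not appear in the cited references. The sketch assumes an idealized case in which every fragment length is present and resolved exactly, which real electrophoresis only approximates.

```python
from typing import Dict, List


def terminated_fragment_lengths(strand: str) -> Dict[str, List[int]]:
    """Return, for each base, the lengths of fragments terminating in that base.

    All fragments share a common origin (position 1 of `strand`), so a
    fragment ending at position n has length n. Idealized: one fragment
    per position, no missing or duplicate lengths.
    """
    lengths: Dict[str, List[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(strand, start=1):
        lengths[base].append(position)
    return lengths


def read_ladder(lengths: Dict[str, List[int]]) -> str:
    """Order all fragments by size and read the terminal base at each size."""
    ladder = sorted(
        (length, base)
        for base, sizes in lengths.items()
        for length in sizes
    )
    return "".join(base for _, base in ladder)


if __name__ == "__main__":
    fragments = terminated_fragment_lengths("GATTACA")
    print(fragments)
    print(read_ladder(fragments))
```

Because the read depends on ordering fragments that differ by a single nucleotide, any loss of resolution between adjacent sizes makes the order, and hence the sequence, ambiguous. This is the separation requirement that limits miniaturization and parallel implementation.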
As an alternative to the Sanger-based approaches to DNA sequencing, several so-called “base-by-base” or “single base” sequencing approaches have been explored, see, e.g., U.S. Pat. No. 5,302,509; International Patent Applications WO 91/06678 and WO 93/21340; Canard et al., Gene, 148:1-6 (1994); and Metzker et al., Nuc. Acids Res., 22:4259-4267 (1994). These approaches are characterized by the determination of a single nucleotide per cycle of chemical or biochemical operations and by the absence of any separation step. Thus, these “single base” approaches allow the possibility of carrying out many thousands of sequencing reactions in parallel, for example, on target polynucleotides attached to microparticles or on solid phase arrays as described, for example, in International Patent Application PCT/US95/12678. “Single base” sequencing methods, however, have been hampered by problems such as inefficient chemistries that prevent determination of more than a few nucleotides in a sequencing operation.
Yet another alternative makes use of the specificity of Watson-Crick base pairing to determine sequence information. In these methods, a polynucleotide is digested with a nuclease to produce a single-stranded region. An oligonucleotide containing a known base is then hybridized to this single-stranded region under stringent hybridization conditions. By repeated cycles of digestion and hybridization, the sequence of the polynucleotide can be determined. Examples of such methods can be found in U.S. Pat. Nos. 5,552,278; 5,599,675; 5,710,000; 5,714,330; 5,831,065; 6,027,894; 6,013,445; 6,225,077; 6,251,600; 6,258,533; 6,291,181; European Patent Nos. EP 0 927 267 and EP 0 703 991; and International Patent Applications WO 98/48047, WO 98/15652 and WO 98/10095. Although these methods can be adapted for parallel sequencing, for example in arrays, they determine only a single nucleotide at a time for each polynucleotide to be sequenced, which slows sequence determination.
With the need to sequence whole genomes and the increasing reliance on sequence information for diagnostic purposes, what is needed is a method that is capable of simultaneously determining the identity and order of groups of nucleotides on a single polynucleotide and that is suitable for multiplex applications. The present invention meets that need.