The most commonly used methods of nucleic acid sequencing are the dideoxy-mediated chain termination method, also known as the "Sanger Method" (Sanger et al., J. Molec. Biol. 94:441 (1975); see also Prober et al., Science 238:336-340 (1987), both herein incorporated by reference in their entirety) and the "chemical degradation method," also known as the "Maxam-Gilbert method" (Maxam et al., Proc. Natl. Acad. Sci. (U.S.A.) 74:560 (1977), herein incorporated by reference in its entirety). Such methods are disclosed in Maniatis et al., Molecular Cloning, a Laboratory Manual, 2nd Edition, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1989), and Zyskind et al., Recombinant DNA Laboratory Manual, Academic Press, Inc., N.Y. (1988), both herein incorporated by reference in their entirety.
Both the dideoxy-mediated method and the Maxam-Gilbert method of DNA sequencing require the prior isolation of the DNA molecule that is to be sequenced. The sequence information is obtained by subjecting the reaction products to electrophoretic analysis (typically using polyacrylamide gels). Thus, a sample is applied to a lane of a gel, and the various species of nested fragments are separated from one another by their migration velocity through the gel.
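The electrophoretic readout described above can be illustrated with a minimal sketch. It assumes an idealized four-lane dideoxy experiment in which each lane reports the lengths of the fragments terminating in one base; the function name read_ladder and the lane data are hypothetical and are not taken from the cited references.

```python
def read_ladder(lanes):
    """Reconstruct a sequence from fragment lengths observed in four lanes.

    `lanes` maps each terminating base to the lengths (in nucleotides) of
    the nested fragments seen in that lane. Shorter fragments migrate
    faster, so ordering the fragments by length reproduces the order of
    bases in the synthesized strand.
    """
    by_length = {}
    for base, lengths in lanes.items():
        for n in lengths:
            if n in by_length:
                raise ValueError(f"two lanes report a fragment of length {n}")
            by_length[n] = base
    return "".join(by_length[n] for n in sorted(by_length))


# Hypothetical lane data for a six-nucleotide read.
lanes = {"A": [1, 4], "C": [2], "G": [3, 6], "T": [5]}
sequence = read_ladder(lanes)
```

The sketch ignores band compression, missing bands and other gel artifacts, which are among the practical difficulties of electrophoretic analysis.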
In response to the difficulties encountered in employing gel electrophoresis to analyze sequences, several alternative methods have been developed. In one such method, a solid phase array of nucleic acid molecules is employed. The array consists of combinatorial (i.e., random or pseudo-random) nucleic acid molecules. Chetverin et al. provides a general review of solid-phase oligonucleotide synthesis and hybridization techniques (Chetverin et al., Bio/Technology 12:1093-1099 (1994), herein incorporated by reference in its entirety).
Macevicz, for example, describes a method for determining nucleic acid sequence via hybridization with multiple mixtures of oligonucleotide probes. In accordance with this method, the sequence of a target polynucleotide is determined by permitting the target to sequentially hybridize with sets of probes having an invariant nucleotide at one position, and variant nucleotides at other positions (U.S. Pat. No. 5,002,867, herein incorporated by reference in its entirety). The Macevicz method determines the nucleotide sequence of the target by hybridizing the target with a set of probes, and then determining the number of sites on the target to which at least one member of the set is capable of hybridizing (i.e., the number of "matches"). This procedure is repeated until each member of the sets of probes has been tested.
Beattie et al. have described a protocol for the preparation of terminal amine-derivatized 9-mer oligonucleotide arrays on ordinary microscope slides (Beattie et al., Molec. Biotech. 4:213-225 (1995), herein incorporated by reference in its entirety). These oligonucleotide arrays can hybridize DNA target strands of up to several hundred bases in length and can discriminate against mismatches.
Drmanac has described a method for sequencing nucleic acid by hybridization using nucleic acid segments on different sectors of a substrate and probes that discriminate a one-base mismatch (Drmanac EP 797683, herein incorporated by reference in its entirety). Gruber describes a method for screening a sample for the presence of an unknown sequence using hybridization sequencing (Gruber, EP 787183, herein incorporated by reference in its entirety).
In contrast to the "Sanger Method" and the "Maxam-Gilbert method," which identify the entire sequence of nucleotides of a target polynucleotide, "microsequencing" methods determine the identity of only a single nucleotide at a "predetermined" site. Such methods have particular utility in determining the presence and identity of polymorphisms in a target polynucleotide.
Because single nucleotide polymorphisms constitute sites of variation flanked by regions of invariant sequence, their analysis requires no more than the determination of the identity of the single nucleotide present at the site of variation; it is unnecessary to determine a complete gene sequence for each patient. Several methods have been developed to facilitate the analysis of such single nucleotide polymorphisms.
The GBA™ Genetic Bit Analysis method disclosed by Goelet et al. (WO 92/15712, herein incorporated by reference in its entirety) is a particularly useful microsequencing method. In GBA™, the nucleotide sequence information surrounding a predetermined site of interrogation is used to design an oligonucleotide primer that is complementary to the region immediately adjacent to, but not including, the predetermined site. The target DNA template is selected from the biological sample and hybridized to the interrogating primer. This primer is extended by a single labeled dideoxynucleotide using DNA polymerase in the presence of at least two, and most preferably all four chain terminating nucleoside triphosphate precursors.
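The primer design and single-base extension described above can be sketched as follows. This is an illustration only, assuming a template written 5' to 3' and a zero-based index for the interrogated site; the function names, the example template and the primer length are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def design_primer(template, site, length):
    """Primer (5'->3') that anneals immediately 3' of `site` on the template.

    The primer covers the flanking region adjacent to the interrogated
    site but not the site itself, so its 3' end sits next to the site.
    """
    flank = template[site + 1 : site + 1 + length]
    if len(flank) < length:
        raise ValueError("not enough flanking sequence for the primer")
    return reverse_complement(flank)


def extend_once(template, site):
    """Base of the single chain terminator added to the primer's 3' end."""
    return COMPLEMENT[template[site]]


# Hypothetical template; the interrogated site is the 'A' at index 4.
template = "GATTACAGGCT"
primer = design_primer(template, site=4, length=5)
added = extend_once(template, site=4)
```

Because the terminator lacks a 3' hydroxyl, extension stops after one nucleotide, and the label on that nucleotide reports the base at the site.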
Mundy (U.S. Pat. No. 4,656,127, herein incorporated by reference in its entirety) discusses alternative microsequencing methods for determining the identity of the nucleotide present at a particular polymorphic site. Mundy's method employs a specialized exonuclease-resistant nucleotide derivative. A primer complementary to the allelic sequence immediately 3' to the polymorphic site is permitted to hybridize to a target molecule obtained from a particular animal or human. If the polymorphic site on the target molecule contains a nucleotide that is complementary to the particular exonuclease-resistant nucleotide derivative present, then that derivative will be incorporated by a polymerase onto the end of the hybridized primer. Such incorporation renders the primer resistant to exonuclease, and thereby permits its detection. Since the identity of the exonuclease-resistant derivative of the sample is known, a finding that the primer has become resistant to exonucleases reveals that the nucleotide present in the polymorphic site of the target molecule was complementary to that of the nucleotide derivative used in the reaction. Mundy's method has the advantage that it does not require the determination of large amounts of extraneous sequence data. It has the disadvantages of destroying the amplified target sequences and unmodified primer, and of being extremely sensitive to the rate of polymerase incorporation of the specific exonuclease-resistant nucleotide being used.
Cohen et al. (French Patent 2,650,840; PCT Appln. No. WO91/02087, both of which are herein incorporated by reference in their entirety) discuss a solution-based method for determining the identity of the nucleotide of a polymorphic site. As in the Mundy method of U.S. Pat. No. 4,656,127, a primer is employed that is complementary to allelic sequences immediately 3' to a polymorphic site. The method determines the identity of the nucleotide of that site using labeled dideoxynucleotide derivatives, which, if complementary to the nucleotide of the polymorphic site, will become incorporated onto the terminus of the primer.
In contrast to the method of Cohen et al. (French Patent 2,650,840; PCT Appln. No. WO91/02087), the GBA™ method of Goelet et al. can be conducted as a heterogeneous phase assay, in which the primer or the target molecule is immobilized to a solid phase. It is thus easier to perform and more accurate than the method discussed by Cohen. The method of Cohen has the significant disadvantage of being a solution-based extension method that uses labeled dideoxynucleoside triphosphates. In the Cohen method, the target DNA template is usually prepared by a DNA amplification reaction, such as PCR, that uses a high concentration of deoxynucleoside triphosphates, the natural substrates of DNA polymerases. These monomers will compete in the subsequent extension reaction with the dideoxynucleoside triphosphates. Therefore, following the PCR reaction, an additional purification step is required to separate the DNA template from the unincorporated dNTPs. Because it is a solution-based method, the unincorporated dNTPs are difficult to remove and the method is not suited for high volume testing.
Cheesman (U.S. Pat. No. 5,302,509, herein incorporated by reference in its entirety) describes a method for sequencing a single stranded DNA molecule using fluorescently labeled 3'-blocked nucleotide triphosphates. An apparatus for the separation, concentration and detection of a DNA molecule in a liquid sample has been described by Ritterband et al. (PCT Patent Application No. WO95/17676, herein incorporated by reference in its entirety). Dower et al. (U.S. Pat. No. 5,547,839, herein incorporated by reference in its entirety) describe a filter based detection system for the simultaneous parallel sequencing of an immobilized primer using fluorescent labels.
The delayed extraction PinPoint MALDI-TOF mass spectrometry method is a method for determining the identity of the incorporated non-extendible nucleotide by measuring the change in mass of the extended primer (Haff et al., Genome Methods 7:378-388 (1997), the entirety of which is herein incorporated by reference).
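The mass-shift readout described above can be sketched as follows. The residue masses are approximate average values for the four dideoxynucleotide monophosphate residues (each deoxynucleotide residue less one oxygen) and are supplied here as an assumption, as are the function name, the tolerance and the example masses; none of these is taken from Haff et al.

```python
# Approximate average mass (Da) added to a primer by one dideoxynucleotide
# residue. Assumed values, used only for illustration.
DD_RESIDUE_MASS = {"ddC": 273.2, "ddT": 288.2, "ddA": 297.2, "ddG": 313.2}


def call_terminator(primer_mass, extended_mass, tolerance=2.0):
    """Identify the incorporated terminator from the observed mass shift.

    Returns the name of the closest residue, or None when no residue lies
    within `tolerance` daltons of the measured shift.
    """
    shift = extended_mass - primer_mass
    best = min(DD_RESIDUE_MASS, key=lambda name: abs(DD_RESIDUE_MASS[name] - shift))
    if abs(DD_RESIDUE_MASS[best] - shift) > tolerance:
        return None
    return best


# Hypothetical primer and extended-primer masses.
called = call_terminator(5000.0, 5297.0)
rejected = call_terminator(5000.0, 5350.0)
```

The smallest spacing between the assumed residue masses is about 9 Da (ddT versus ddA), so the mass accuracy of the measurement must be better than that for an unambiguous call.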
Chee et al. (WO95/11995, herein incorporated by reference in its entirety) describes an array of primers immobilized onto a solid surface. Chee et al. further describes a method for determining the presence of a mutation in a target sequence by comparing against a reference sequence with a known sequence.
Several primer-guided nucleotide incorporation procedures for assaying polymorphic sites in DNA have been described (Kornher et al., Nucl. Acids Res. 17:7779-7784 (1989); Sokolov, Nucl. Acids Res. 18:3671 (1990); Syvanen et al., Genomics 8:684-692 (1990); Kuppuswamy et al., Proc. Natl. Acad. Sci. (U.S.A.) 88:1143-1147 (1991); Prezant et al., Hum. Mutat. 1:159-164 (1992); Ugozzoli et al., GATA 9:107-112 (1992); Nyren et al., Anal. Biochem. 208:171-175 (1993); and Wallace, WO89/10414, all of which are herein incorporated by reference in their entirety). These methods differ from GBA™ in that they all rely on the incorporation of labeled deoxynucleotides to discriminate between bases at a polymorphic site. In such a format, since the signal is proportional to the number of deoxynucleotides incorporated, polymorphisms that occur in runs of the same nucleotide can result in signals that are proportional to the length of the run (Syvanen et al., Amer. J. Hum. Genet. 52:46-59 (1993), herein incorporated by reference in its entirety). Such a range of locus-specific signals could be more complex to interpret, especially for heterozygotes, compared to the simple, ternary (2:0, 1:1, or 0:2) class of signals produced by the GBA™ method. In addition, for some loci, incorporation of an incorrect deoxynucleotide can occur even in the presence of the correct dideoxynucleotide (Kornher et al., Nucl. Acids Res. 17:7779-7784 (1989)). Such deoxynucleotide misincorporation events may be due to the Km of the DNA polymerase for the mispaired deoxy-substrate being comparable, in some sequence contexts, to the relatively poor Km of even a correctly base paired dideoxy-substrate (Kornberg, A. et al., In: DNA Replication, Second Edition (1992), W. H. Freeman and Company, N.Y.; Tabor, S. et al., Proc. Natl. Acad. Sci. (U.S.A.) 86:4076-4080 (1989), both of which are herein incorporated by reference in their entirety). This effect would contribute to the background noise in the polymorphic site interrogation.
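The ternary (2:0, 1:1, or 0:2) signal classes mentioned above lend themselves to a simple classification rule. The sketch below is one possible way to assign a class from the signals of two allele-specific labels; the function name, the background subtraction and the heterozygote window are hypothetical choices, not values from the cited methods.

```python
def call_genotype(signal_a, signal_b, background=0.0, het_window=(0.3, 0.7)):
    """Classify a biallelic site from two allele-specific signals.

    Returns "2:0", "1:1" or "0:2" according to the fraction of the
    background-corrected signal contributed by allele A, or None when
    there is no signal above background.
    """
    a = max(signal_a - background, 0.0)
    b = max(signal_b - background, 0.0)
    total = a + b
    if total == 0:
        return None
    fraction_a = a / total
    low, high = het_window
    if fraction_a > high:
        return "2:0"
    if fraction_a < low:
        return "0:2"
    return "1:1"


# Hypothetical signal pairs (arbitrary units).
homozygous_a = call_genotype(950, 30)
heterozygous = call_genotype(480, 520)
homozygous_b = call_genotype(20, 800)
```

When the signal instead scales with the number of labeled deoxynucleotides incorporated across a run, the expected ratios become locus-specific and a fixed window of this kind no longer applies.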
An alternative microsequencing approach, the "Oligonucleotide Ligation Assay" ("OLA") (Landegren et al., Science 241:1077-1080 (1988), herein incorporated by reference in its entirety) has also been described as being capable of detecting single nucleotide polymorphisms. The OLA protocol uses two oligonucleotides which are designed to be capable of hybridizing to abutting sequences of a single strand of a target. One of the oligonucleotides is biotinylated, and the other is detectably labeled. If the precise complementary sequence is found in a target molecule, the oligonucleotides will hybridize such that their termini abut, and create a ligation substrate. Ligation then permits the labeled oligonucleotide to be recovered using avidin, or another biotin ligand. Nickerson et al. have described a nucleic acid detection assay that combines attributes of PCR and OLA (Nickerson et al., Proc. Natl. Acad. Sci. (U.S.A.) 87:8923-8927 (1990), herein incorporated by reference in its entirety). In this method, PCR is used to achieve the exponential amplification of target DNA, which is then detected using OLA. In addition to requiring multiple, separate processing steps, such combinations inherit all of the problems associated with both PCR and OLA.
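The ligation logic of OLA can be sketched as follows. This is a deliberately simplified model which assumes that ligation occurs only when both oligonucleotides are perfectly complementary to abutting stretches of the target; it ignores hybridization kinetics and ligase fidelity, and the function names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def ligation_substrate_forms(target, upstream, downstream):
    """True if the two oligos can anneal end to end on the target strand.

    `upstream` and `downstream` are written 5'->3'; joining the 3' end of
    `upstream` to the 5' end of `downstream` gives the ligated product,
    which must be fully complementary to a stretch of the target.
    """
    product = upstream + downstream
    return reverse_complement(product) in target


# Hypothetical target strand and oligonucleotide pair.
target = "TTGACCTGAAGCTTAG"
matched = ligation_substrate_forms(target, "AGCTT", "CAGGT")
# A single mismatch at the 3' terminus of the upstream oligo prevents ligation.
mismatched = ligation_substrate_forms(target, "AGCTA", "CAGGT")
```

In the assay itself, only a ligated product carries both the biotin and the detectable label, so signal recovered on avidin indicates that the ligation substrate formed.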
Boyce-Jacino et al. have described a method for sequencing a polynucleotide using nested GBA (U.S. patent application Ser. No. 08/616,906, herein incorporated by reference in its entirety). In that method, an array of nested primer oligonucleotides is immobilized to a solid support. A target nucleic acid molecule is hybridized to the array of nested primer oligonucleotides and the hybridized array is sequenced using GBA.
Pastinen et al. describe a method for the multiplex detection of mutations wherein the mutations are detected by extending immobilized primers, which anneal to the template sequences immediately adjacent to the mutant nucleotide positions, with a single labeled dideoxynucleotide using a DNA polymerase (Pastinen et al., Genome Res. 7:606-614 (1997), herein incorporated by reference in its entirety). In this method, the oligonucleotide arrays were prepared by coupling one primer per mutation to be detected on a small glass area. Pastinen et al. have also described a method to detect multiple single nucleotide polymorphisms in an undivided sample (Pastinen et al., Clin. Chem. 42:1391-1397 (1996), herein incorporated by reference in its entirety). According to this method, the amplified DNA templates are first captured onto a manifold and then, with multiple minisequencing primers, single nucleotide extension reactions are carried out simultaneously with fluorescently labeled dideoxynucleotides.
Jalanko et al. applied the solid-phase minisequencing method to the detection of a mutation causing cystic fibrosis (Jalanko et al., Clin. Chem. 38:39-43 (1992), herein incorporated by reference in its entirety). In the method of Jalanko et al., an amplified DNA molecule which is biotinylated at the 5' terminus is bound to a solid phase and denatured. A detection primer, which hybridizes immediately before the mutation, is hybridized to the immobilized single stranded template and elongated with a single, labeled deoxynucleoside residue. Shumaker et al. describe another solid phase primer extension method for mutation detection (Shumaker et al., Hum. Mutation 7:346-354 (1996), herein incorporated by reference in its entirety). In this method, the template DNA was annealed to an oligonucleotide array, extended with ³²P dNTPs and analyzed with a phosphorimager. The grid position of the oligonucleotide identified the mutation site and the extended base identified the mutation.
Caskey et al. describe a method of analyzing a polynucleotide of interest using one or more sets of consecutive oligonucleotide primers differing within each set by one base at the growing end thereof (Caskey et al., WO 95/00669, herein incorporated by reference in its entirety). The oligonucleotide primers are extended with a chain terminating nucleotide and the identity of each terminating nucleotide is determined.
In conventional fluorescent-based sequencing applications, the predominant method of base calling involves the use of four dye-labeled terminators that have different emission spectra (as used herein, base calling refers to determining the identity of the nucleotide base). One such application employs laser excitation and a cooled CCD (charge-coupled device) detector (Kostichka and Smith, U.S. Pat. No. 5,162,654, herein incorporated by reference in its entirety) for the parallel detection of four fluorescently labeled DNA sequencing reactions during their electrophoretic separation in ultrathin (50-100 microns) denaturing polyacrylamide gels (Kostichka et al., Bio/Technology 10:78-81 (1992), herein incorporated by reference in its entirety).
Weiss et al. describe another fluorescent-based sequencing application (U.S. Pat. No. 5,470,710, herein incorporated by reference in its entirety). That method is an enzyme linked fluorescence method for the detection of nucleic acid molecules.
In these applications, spectral recognition of different dyes is primarily accomplished by capturing fluorescence emissions in specific spectral regions using one or more excitation wavelengths. One problem with this approach is the "cross-talk" of different dyes due to the relatively large width of dye spectra. Spectral cross-talk is one source of false recognition of dyes, resulting in base miscalling in fluorescent-based DNA sequence analysis (as used herein, the term miscalling refers to an error in determining the identity of the nucleotide base). The miscalling rate depends primarily on the signal-to-noise ratio (SNR) and the detection system's spectral selectivity: the miscalling rate increases as the SNR decreases. Therefore, if the spectrally recognized emission is weak, the spectral selectivity of the instrument will have to be improved to lower the miscalling rate.
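The effect of overlapping dye spectra on base calling can be illustrated with a small sketch. It assumes four detection channels and a reference emission profile for each dye-labeled base, and it calls the base whose profile is most similar in shape to the observed signal. The reference values, the similarity measure and the function names are hypothetical; they do not describe any particular instrument.

```python
import math

# Hypothetical relative emission of each dye-labeled base in four
# detection channels. Adjacent rows overlap, which models cross-talk.
REFERENCE = {
    "A": (0.70, 0.25, 0.04, 0.01),
    "C": (0.20, 0.60, 0.18, 0.02),
    "G": (0.03, 0.22, 0.60, 0.15),
    "T": (0.01, 0.04, 0.25, 0.70),
}


def cosine(u, v):
    """Cosine similarity between two non-zero channel-intensity vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def call_base(observed):
    """Return the base whose reference profile best matches `observed`."""
    if not any(observed):
        raise ValueError("no signal in any channel")
    return max(REFERENCE, key=lambda base: cosine(observed, REFERENCE[base]))


# Hypothetical four-channel readings (arbitrary units).
first = call_base((140, 52, 9, 3))
second = call_base((10, 45, 120, 33))
```

The more the reference profiles overlap, or the noisier the observed reading, the closer the similarity scores of neighboring dyes become, which is how cross-talk and low SNR translate into miscalls.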
It has been reported that the spectral cross-talk in macroscale, gel-based DNA fluorescent sequencing has been resolved by improved dye-terminator biochemistry, optimization of filter transmission spectra and software manipulation using an instrument of relatively low spectral selectivity (Yager et al., Curr. Opinion Biotechnol. 8:107-113 (1997), herein incorporated by reference in its entirety). For example, the ABI gel sequencer (ABI, Applied Biosystems, Inc., Foster City, Calif.) has the capability of generating an acceptable base calling error rate of 2% (ABI Prism) using a single excitation wavelength and filter-based detection optics. In the case of microarray fluorescent detection, the spectral cross-talk problem is more difficult to overcome due to the significantly smaller size of the reaction spots, which require high spatial resolution power and generate very limited numbers of detectable fluorescence photons. This miniaturization/detection problem is well-known in the field of DNA sequencing by microcapillary electrophoresis. Several methods, including the use of two excitation wavelengths (Li and Yeung, Applied Spectroscopy 49:1528-1533 (1995), herein incorporated by reference in its entirety) and multi-wavelength (complete spectrum) fluorescence detection (Karger et al., Nucleic Acids Res. 19:4955-4962 (1991), herein incorporated by reference in its entirety) have been developed to improve the spectral selectivity and identification in microcapillary multi-color sequencing.
Specific dye/base recognition on the microchip platform, which is considered to be a two-dimensional platform, is reported to be more complicated than in microcapillary methods, which are considered to be a one-dimensional platform. The two-dimensional nature of the microarray provides advantages in processing through-put due to the parallelism. However, it also requires a detection method that is compatible with its two-dimensional platform, in order for the through-put potential to be realized. There have been at least two microarray fluorescent detectors, including the "genescanner" from Hewlett Packard (Santa Clara, Calif.) (Taylor et al., J. Med. Genet. 31:937-94 (1994), herein incorporated by reference in its entirety) and a confocal scanner (General Scanning, Inc., Boston), developed using a filter-based confocal optics configuration, a one-dimensional (1-D) detector (the photomultiplier tube, PMT) and narrow bandpass interference filters to obtain sensitive detection and spectral identification of array emissions. The confocal configuration is used in these fluorescent detection instruments to obtain high spatial resolution and to reduce background emission by confining the detection volume (Sandison and Webb, Applied Optics 33:603-615 (1994), herein incorporated by reference in its entirety).
The disadvantages of confocal microarray scanners include: 1) low through-put caused by the necessity of sequential, point-by-point scanning of the microarray, 2) use of moving optical-mechanical parts for scanning, 3) use of expensive, high-quality focusing/collection optics, 4) high power excitation requirement due to the significant loss of collected emissions in the confocal pinhole, and 5) repeated scanning required for multi-color detection. Therefore, although confocal filter-based microchip scanners can potentially be used for spectral recognition of array emissions, they are inherently expensive and low through-put. In addition, for multi-color detection, photobleaching of the dyes under powerful laser excitation during repeated scans may further complicate the spectral analysis.
Four-color confocal fluorescence capillary array scanner sequencing apparatuses have been described (Mathies et al., U.S. Pat. No. 5,274,240; Kheterpal et al., Electrophoresis 17:1852-1859 (1996), both of which are herein incorporated by reference in their entirety). Kheterpal's array scanner utilizes a single laser wavelength of 488 nm to collect data from up to 25 capillaries in parallel. A capillary electrophoresis apparatus for the detection of nucleic acid sequences which employs a He-Ne laser has been described (Kambara, U.S. Pat. No. 5,667,656, herein incorporated by reference in its entirety). Ulmer describes another capillary sequencing apparatus employing fluorescently labeled bases and a laser detector (Ulmer, U.S. Pat. No. 5,674,743, herein incorporated by reference in its entirety).
Ives et al. describe a method for the detection of fluor-labeled (fluorescein, eosin, tetramethyl-rhodamine, Lissamine and Texas Red) dideoxynucleotides using a commercially available plate reader (Cytofluor II) (Ives et al., SPIE Proceeding 2680:258-269 (1996), herein incorporated by reference in its entirety). Ives et al. also disclose an experimental optical setup to detect fluorescence from fluor-labeled GBA™ dideoxynucleotides which uses excitation light from an air cooled argon laser at 488 nm with collection optics consisting of a spherical collection lens, Schott filters, fiber optic collection (collectively a filter-based optics configuration), an imaging spectrometer and a 0° C. thermoelectrically-cooled CCD camera. In addition, the optical system is used to detect the fluorescence emitting from a single reaction spot on a microchip.
Bogdanov et al. disclose the fluorescent imaging and quantification of solid support-bound nucleic acids (Bogdanov et al., SPIE Proceeding 2985:129-137 (1997), herein incorporated by reference in its entirety). The Bogdanov et al. reference discloses direct multicolor fluorescent imaging of a GBA™ array (GBA™ microchip) on a solid-state support with low background emission (glass microscope slide) for simultaneous (CCD camera) and sequential (commercial FluorImagers) reaction spots reading at excitation of various lasers. Bogdanov et al. employ a filter-based confocal optics configuration. Two-color fluorescent images of oligonucleotides labeled by fluorescein and CY3 were reported, as was CCD-based imaging of a direct multispot GBA™ image extension detection reaction using fluorescein-labeled ddATP.
Multi-color fluorescent detection has been used in macroscale gel-based sequencing. The present invention provides a microscale sequencing technique and apparatus with significant advantages over other solid-phase sequencing techniques and apparatuses. These advantages include simplification of sample and reagent processing, rapid and sensitive detection, as well as compatibility with high through-put processing. Through strategic combinations of a highly sensitive CCD detector with parallel image spectrometry, hyperspectral imaging detection on SPS microarrays has provided for a low-cost sequence analysis technology.