The analysis of nucleic acid molecules at the genome level is an extremely complex endeavor which requires accurate, rapid characterization of large numbers of often very large nucleic acid molecules via high throughput DNA mapping and sequencing. The construction of physical maps, and ultimately of nucleotide sequences, for eukaryotic chromosomes currently remains laborious and difficult. This is due, in part, to the fact that current procedures for mapping and sequencing DNA were originally designed to analyze nucleic acids at the gene, rather than at the genome, level (Chumakov, et al., 1992, Nature 359:380; Maier, et al., 1992, Nat. Genet. 1:273).
DNA Sequencing
Approaches to DNA sequencing have varied widely, and have made it possible to sequence entire genomes, including portions of the human genome. The most commonly used method has been the dideoxy chain termination method of Sanger (1977, Proc. Natl. Acad. Sci. USA 74:5463). However, this method is time-consuming, labor-intensive and expensive, requiring the analysis of four sets of radioactively labeled DNA fragments resolved by gel electrophoresis to determine the DNA sequence.
To overcome some of these deficiencies, automated DNA sequencing systems were developed which used four fluorescently labeled dideoxy nucleotides to label DNA (Smith et al., 1985, Nucleic Acids Res. 13:2399-2412; Smith et al., 1986, Nature 321:674; Prober et al., 1987, Science 238:336-341, which are incorporated herein by reference). Automated slab gel electrophoresis systems enable large-scale sequence acquisition (Roach et al., 1995, Genomics 26:345-353; Venter et al., 1996, Nature 381:364-366; Prober et al., 1987, Science 238:336-341; Lake et al., 1996, Science 273:1058; Strathmann et al., 1991, Proc. Natl. Acad. Sci. USA 88:1247-1250; and the complete genomic sequence of Saccharomyces cerevisiae in the Stanford database). Current large-scale sequencing is largely the domain of centers where costly and complex support systems are essential for the production efforts. Efforts to deal with sequence acquisition from a large population (usually less than 1,000) are limited to relatively small numbers of loci (Davies et al., 1995, Nature 371:130-136). However, these methods are still dependent on Sanger sequencing reactions and gel electrophoresis to generate ladders, and on robotic sample handling procedures to deal with the attendant numbers of clones and polymerase chain reaction products.
Some recently developed methods and devices for automated sequencing of bulk DNA samples that utilize fluorescently labeled nucleotides are described in U.S. Pat. No. 5,674,743; International Application Nos. PCT/GB93/00848 published Apr. 22, 1993 as WO 93/21340; PCT/US96/08633 published Jun. 4, 1996 as WO 96/39417; and PCT/US94/01156 published Jan. 31, 1994 as WO 94/18218. None of the recently developed methods is capable of sequencing individual nucleic acid molecules.
Techniques for sequencing large genomes of DNA have relied upon the construction of Yeast Artificial Chromosomes ("YAC") contiguous sequences. Preliminary physical maps of a large fraction of the human genome have been generated via YACs (Cohen et al., 1993, Nature 366:698-701). However, extensive high resolution maps of YACs have not been widely generated, due to the high frequency of rearrangement/chimerism among YACs, the low complexity of fingerprints generated by hybridization approaches, and the extensive labor required to overcome these problems. Ordered maps of YACs have been made optically by using a spermine condensation method (to avoid shearing the DNA) and fixing the clones in molten agarose onto derivatized glass surfaces (Cai et al., 1995, Proc. Natl. Acad. Sci. USA 92:5164-5168).
There have been several proposals for the rapid attainment of sequence data from clones that minimize or obviate the need for shotgun sequencing approaches or subcloning of large insert clones (Smith et al., 1994, Nature Genet. 7:40-47; Kupfer et al., 1995, Genomics 27:90-100; Chen et al., 1993, Genomics 17:651-656 and Roach et al., 1995, Genomics 26:345-353). Several of these approaches advocate the generation of "sequence sampled maps" (Smith et al., 1994, Nature Genet. 7:40-47 and Venter et al., 1996, Nature 381:364-366) which require fingerprinting of clones, or large numbers of subclones, to achieve good target coverage while simultaneously generating a fine-scale map.
A recent development has been the proposal of DNA sequencing of aligned and oriented Bacterial Artificial Chromosomes ("BAC") contiguous sequences (Venter et al., 1996, Nature 381:364-366; see also Smith et al., 1994, Nature Genetics 7:40-47; Kupfer et al., 1995, Genomics 27:90-100; and Chen et al., 1993, Genomics 17:651-656). BACs offer the advantage of considerably greater stability than YACs, are more easily physically managed due to their smaller size (approximately 100 to 200 kb for BACs versus approximately 500 kb to 2 Mb for YACs), and are more compatible with automated DNA purification procedures (Kim et al., 1996, Proc. Natl. Acad. Sci. USA 93:6297-6301; Kim et al., 1994, Genomics 24:527-534; and Schmitt et al., 1996, Genomics 33:9-20). Further approaches for the optical analysis of BAC clones were also developed (Cai et al., 1998, Proc. Natl. Acad. Sci. USA 95:3390-3395).
Limitations of the approaches described above include low throughput, DNA fragmentation (preventing subsequent or simultaneous multimethod analyses), and difficulties in automation. Despite the potential utilities of these and other approaches, it is increasingly clear that current molecular approaches were developed primarily for characterization of single genes, not entire genomes, and are, therefore, not optimally suited to the analysis of polygenic diseases and complex traits, especially on a population-wide basis (Risch et al., 1996, Science 273:1516-1517).
Visualization and Surface Mounting of Single DNA Molecules
Single molecule approaches represent a subset of current physical mapping approaches. Physical and genetic mapping constitute the two major approaches to genomic analysis, and are critical to the mapping and cloning of disease genes and to direct sequencing efforts. Methods of visualization of single DNA molecules include fluorescence microscopy in solution (Yanagida et al., 1986, in Applications of fluorescence in the biomedical sciences, Taylor et al. (eds), Alan Liss, New York, pp. 321-345; Yanagida et al., 1983, Cold Spring Harbor Symp. Quantit. Biol. 47:177; Matsumoto et al., 1981, J. Mol. Biol. 132:501-516; Schwartz et al., 1989, Nature 338:520-522; and Houseal et al., 1989, Biophys. J. 56:507-516); FISH (Manuelidis et al., 1982, J. Cell. Biol. 95:619; Lawrence et al., 1988, Cell 52:51; Lichter et al., 1990, Science 247:64; Heng et al., 1992, Proc. Natl. Acad. Sci. USA 89:9509; van den Engh et al., 1992, Science 257:1410); visualization by scanning tunneling microscopy or atomic force microscopy techniques (Keller et al., 1989, Proc. Natl. Acad. Sci. USA 86:5356-5360; see, e.g., Karrasch et al., 1993, Biophysical J. 65:2437-2446; Hansma et al., 1993, Nucleic Acids Research 21:505-512; Bustamante et al., 1992, Biochemistry 31:22-26; Lyubchenko et al., 1992, J. Biomol. Struct. and Dyn. 10:589-606; Allison et al., 1992, Proc. Natl. Acad. Sci. USA 89:10129-10133; Zenhausern et al., 1992, J. Struct. Biol. 108:69-73); visualization of circular DNA molecules (Bustamante et al., 1992, Biochemistry 31:22-26); DNA bending in transcription complexes by scanning force microscopy (Rees et al., 1993, Science 260:1646-1649); direct mechanical measurement of the elasticity of single DNA molecules using magnetic beads (Smith et al., 1992, Science 258:1122-1126); and alignment and detection of DNA molecules involving either elongation of end-tethered surface bound molecules by a receding air-water interface (U.S. Pat. Nos. 5,079,169; 5,380,833; Perkins et al., 1994, Science 264:819; and Bensimon et al., 1994, Science 265:2096-2098), or elongation of non-tethered molecules by "fluid fixation" (Samad et al., 1995, Nature 378:516-517; Cai et al., 1995, Proc. Natl. Acad. Sci. USA 92:5164-5168; Meng et al., 1995, Nature Genet. 9:432-438; Wang et al., 1995, Proc. Natl. Acad. Sci. USA 92:165-169; and Schwartz et al., 1993, Science 262:110-114). (See also Reed et al., "A Quantitative Study Of Optical Mapping Surfaces By Atomic Force Microscopy And Restriction Endonuclease Digestion" in press, Analytical Biochemistry; Cai et al., "High Resolution Restriction Maps Of Bacterial Artificial Chromosomes Constructed By Optical Mapping", 1998, Proc. Natl. Acad. Sci. USA 95:3390-3395; Samad and Schwartz, "Genomic Analysis by Optical Mapping" in Analytical Biotechnology--Genomic Analysis in press; Schwartz et al., 1997, Current Opinion in Biotechnology, 8:70-74; Samad, 1995, Genomics Research 59:1-4; Primrose, 1995, Principles of Genome Analysis: A guide to mapping and sequencing DNA from different organisms, Blackwell Science Ltd., Oxford England, pp. 76-77; and Bautsch et al., 1997 "Long-Range Restriction Mapping of Genomic DNA" in Genomic Mapping: A Practical Approach, Chapter 12, Paul H. Dear ed., Oxford University Press, New York, pp. 281-313).
New modes of molecular investigation have emerged from advances in molecular fixation techniques, labeling, and the development of scanning probe microscopies (Keller et al., 1989, Proc. Natl. Acad. Sci. USA 86:5356-5360; Bensimon et al., 1994, Science 265:2096-2098; Guthold et al., 1994, Proc. Natl. Acad. Sci. USA, 91:12927-12931; Hansma et al., 1996, Nucleic Acids Res. 24:713-720; Cai et al., 1995, Proc. Natl. Acad. Sci. USA 92:5164-5168; Meng et al., 1995, Nature Genet. 9:432-438; Weier et al., 1995, Hum. Mol. Genet. 4:1903-1910; Wang et al., 1995, Proc. Natl. Acad. Sci. USA 92:165-169; Schwartz et al., 1993, Science 262:110-114; Schena et al., 1995, Science 270:467-470; Heller et al., 1997, Proc. Natl. Acad. Sci. USA 94:2150-2155; Erie et al., 1994, Science 266:1562-1566; and Leuba et al., 1994, Proc. Natl. Acad. Sci. USA 91:11621-11625). In particular, molecular fixation techniques have relied on the application of outside forces such as electrical fields, a travelling meniscus (Michalet et al., 1997, Science 277:1518) or end-tethering of molecules with beads (Strick et al., 1996, Science 271:1835-1837) to fix DNA to solid surfaces. Biochemistries have been performed on surface-mounted DNA molecules, but the procedures used bulk deposition and analysis (Schena et al., 1995, Science 270:467-470; Heller et al., 1997, Proc. Natl. Acad. Sci. USA 94:2150-2155; Craig et al., 1990, Nucleic Acids Res. 18:2653-2660; and Nizetic et al., 1991, Proc. Natl. Acad. Sci. USA 88:3233-3237).
Once the nucleic acid molecules are fixed, they must be imaged and analyzed. Although the spatial resolution of conventional light microscopy is limited, cooled charge-coupled device (CCD) imagers have stimulated the development of new optical approaches to the quantitation of nucleic acids that may supplant electrophoresis-based techniques in many applications (Schena et al., 1995, Science 270:467-470; Lipshutz et al., 1995, Biotechniques 19:442-447; and Chee et al., 1996, Science 274:610-614). Yanagida and coworkers (Yanagida et al., 1986, in Applications of fluorescence in the biomedical sciences, Taylor et al. (eds), Alan Liss, New York, pp. 321-345) first investigated the molecular motions of fluorescently stained individual DNA molecules in solution by image-enhanced fluorescence microscopy. Optical mapping was subsequently developed for the rapid production of ordered restriction maps from individual, fluorescently stained DNA molecules (Cai et al., 1995, Proc. Natl. Acad. Sci. USA 92:5164-5168; Meng et al., 1995, Nature Genet. 9:432-438; Wang et al., 1995, Proc. Natl. Acad. Sci. USA 92:165-169; Schwartz et al., 1993, Science 262:110-114; Schwartz et al., 1997, Current Opinion in Biotechnology 8:70-74; Samad et al., 1995, Nature 378:516-517; and Samad et al., 1995, Genomic Research 59:1-4).
In the original method, individual fluorescently labeled yeast chromosomes were elongated and fixed in a flow of molten agarose generated between a coverslip and a glass slide (Schwartz et al., 1993, Science 262:110-114). Restriction endonuclease cleavage events were recorded as time-lapse images, following addition of magnesium ions to activate the added endonuclease. Cleavage sites appeared as growing gaps due to relaxation of DNA coils at nascent ends, and maps were constructed by measuring fragment sizes using relative fluorescent intensity or apparent length measurements.
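By way of illustration only, the relative fluorescence intensity sizing described above can be sketched as follows. This sketch is not taken from the cited references: the function names, the assumption that the total size of the molecule is known, and the assumption that integrated fluorescence is proportional to DNA mass are all hypothetical simplifications.

```python
def sizes_from_intensity(intensities, molecule_size_kb):
    """Estimate fragment sizes (kb) from integrated fluorescence intensities.

    Assumes (hypothetically) that stain uptake is uniform, so each fragment's
    share of the total intensity equals its share of the molecule's length.
    The intensities must be listed in their observed order along the molecule.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total fluorescence intensity must be positive")
    return [molecule_size_kb * value / total for value in intensities]


def cut_sites_from_sizes(sizes_kb):
    """Convert ordered fragment sizes into cleavage positions (kb from one end)."""
    positions = []
    running = 0.0
    for size in sizes_kb[:-1]:  # the last fragment ends at the molecule end
        running += size
        positions.append(running)
    return positions


# Hypothetical example: three fragments observed on a 200 kb molecule.
fragment_sizes = sizes_from_intensity([100, 300, 600], 200.0)
restriction_map = cut_sites_from_sizes(fragment_sizes)
```

Because the fragments remain in their original order on the surface, the cumulative sums of the estimated sizes give an ordered restriction map directly, without the reassembly step that gel electrophoresis would require.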
In another closed system, the DNA molecules (2-1,500 kb) were elongated and fixed using the flow and adhesion forces generated when a fluid sample is compressed between two glass surfaces, one derivatized with polylysine or APTES (Meng et al., 1995, Nature Genet. 9:432-438 and Cai et al., 1995, Proc. Natl. Acad. Sci. USA 92:5164-5168). Fixed molecules were digested with restriction endonucleases, fluorescently stained (Rye et al., 1992, Nucleic Acids Res. 20:2803-2812) and optically mapped (Meng et al., 1995, Nature Genet. 9:432-438 and Cai et al., 1995, Proc. Natl. Acad. Sci. USA 92:5164-5168). However, closed systems have limited access to the samples and cannot readily accommodate arrayed samples (Bensimon et al., 1994, Science 265:2096-2098 and Meng et al., 1995, Nature Genet. 9:432-438).
To increase the throughput and versatility of optical mapping and sequencing, multiple samples need to be arrayed on a single mapping surface. Although robotic gridding techniques for DNA samples exist (Heller et al., 1997, Proc. Natl. Acad. Sci. USA 94:2150-2155; Craig et al., 1990, Nucl. Acids Res. 18:2653-2660; and Nizetic et al., 1991, Proc. Natl. Acad. Sci. USA 88:3233-3237), such approaches were not designed to work with single molecule substrates and could not be relied upon to deposit molecules retaining significant accessibility to enzymatic action.
While single molecule techniques offer the potential advantage of an ordering capability which gel electrophoresis lacks, none of the current single molecule techniques can be used, on a practical level, as high resolution genomic sequencing tools. The molecules described by Yanagida (Yanagida, M. et al., 1983, Cold Spring Harbor Symp. Quantit. Biol. 47:177; Matsumoto, S. et al., 1981, J. Mol. Biol. 132:501-516) were visualized primarily free in solution, making any practical sequencing impossible. Further, while the FISH technique offers the advantage of using only a limited number of immobilized fragments, usually chromosomes, it is not possible to achieve the sizing resolution available with gel electrophoresis.
Single molecule tethering techniques, as listed above, generally involve individual nucleic acid molecules which have, first, been immobilized onto a surface via one or both of their ends, and, second, have been manipulated such that the molecules are stretched out. These techniques, however, are not suited to genome analysis. First, the steps involved are time consuming and can only be accomplished with a small number of molecules per procedure. Further, in general, the tethered molecules cannot be stored and used again.
Recently, special effort has centered on development of improved surface-based approaches for DNA fixation, compatible with a variety of molecular imaging techniques. Desirable DNA fixation attributes include: a usable population of elongated molecules, preservation of biochemical activity, parallel sample processing capabilities, high sample deposition rates, densely gridded samples and easy access to arrayed samples.
Present-day array hybridization technology already involves gridding DNA samples densely on open-faced, charged-membrane surfaces (Craig et al., 1990, Nucl. Acids Res. 18:2653-2660; and Nizetic et al., 1991, Proc. Natl. Acad. Sci. USA 88:3233-3237). Gridded sample arrays facilitate biochemical manipulations and analyses and are limited only by sample density and available biochemistries.
New approaches to molecular deposition, called "fluid fixation," involve placing small droplets of DNA solution onto critically derivatized glass surfaces, a process that readily elongates and fixes DNA molecules. Conveniently, the application of outside forces is completely obviated in the fluid fixation technique, thereby making the use of electrical fields, a travelling meniscus or end-tethering of molecules unnecessary. The passive nature of fluid fixation provides the platform needed for efforts to automate optical mapping and sequencing.
The observation of single fluorochromes using video rate imaging techniques has been described by Schmidt et al. (Schmidt et al., 1996, Proc. Natl. Acad. Sci. USA 93:2926-2929) using a standard fluorescence microscope, laser illumination, and a cooled CCD camera with frame shifting capability. A significant advance in signal/noise optimization was made by Funatsu et al. (Funatsu et al., 1995, Nature 374:555-559) by systematically minimizing noise in virtually every possible experimental and instrumentational variable.
In conclusion, a rapid, accurate method of optically sequencing individual nucleic acid molecules was needed in the art. Such nucleotide sequencing of single molecules would be useful for aligning/overlapping contiguous sequences for genomic mapping and genomic analysis, and in rapidly analyzing single nucleotide polymorphisms in a population of individual nucleic acid molecules.
Citation of documents herein is not intended as an admission that any of the documents cited herein is pertinent prior art, or an admission that the cited documents are considered material to the patentability of the claims of the present application. All statements as to the date or representations as to the contents of these documents are based on the information available to the applicants and do not constitute any admission as to the correctness of the dates or contents of these documents.