The present invention relates to methods for analyzing nucleic acid reactions in general and transcription reactions in particular. The method utilizes individual nucleic acid molecules immobilized along their length on a planar substrate. Using optical techniques, such as epifluorescent microscopy, the reactions of individual nucleic acid molecules can be studied using the method described herein.
The present invention also relates to scalable and massively automatable methods for imaging nucleic acid reactions, either in stop-action fashion or in real time. Bayesian inference estimation methods are utilized to analyze a population of images and to produce data sets of genome-sized scale to be used in the identification of genes, promoter regions, termination regions, or virtually any other phenomenon associated with transcription or reverse-transcription of nucleic acid molecules. The method can be used to fabricate maps of transcription events, to correlate these maps to restriction site maps, and to use these data to identify from where in a sequenced genome a single nucleic acid molecule originated.
The analysis of nucleic acid molecules at the genome level is an extremely complex endeavor which requires accurate, rapid characterization of large numbers of often very large nucleic acid molecules via high throughput DNA mapping and sequencing. The construction of physical maps, and ultimately of nucleotide sequences, for eukaryotic chromosomes currently remains laborious and difficult. This is due, in part, to the fact that current procedures for mapping and sequencing DNA were originally designed to analyze nucleic acids at the gene, rather than at the genome, level (Chumakov, et al., 1992, Nature 359:380; Maier, et al., 1992, Nat. Genet. 1:273).
Approaches to DNA sequencing have varied widely, and have made it possible to sequence entire genomes, including portions of the human genome. The most commonly used method has been the dideoxy chain termination method of Sanger (1977, Proc. Natl. Acad. Sci. USA 74:5463). However, this method is time-consuming, labor-intensive and expensive, requiring the analysis of four sets of radioactively labeled DNA fragments resolved by gel electrophoresis to determine the DNA sequence.
To overcome some of these deficiencies, automated DNA sequencing systems were developed which used four fluorescently labeled dideoxy nucleotides to label DNA (Smith et al., 1985, Nucleic Acids Res. 13:2399-2412; Smith et al., 1986, Nature 321:674; Prober et al., 1987, Science 238:336-341, which are incorporated herein by reference). Automated slab gel electrophoresis systems enable large-scale sequence acquisition (Roach et al., 1995, Genomics 26:345-353; Venter et al., 1996, Nature 381:364-366; Prober et al., 1987, Science 238:336-341; Lake et al., 1996, Science 273:1058; Strathmann et al., 1991, Proc. Natl. Acad. Sci. USA 88:1247-1250; and the complete genomic sequence of Saccharomyces cerevisiae in the Stanford database). Current large-scale sequencing is largely the domain of centers where costly and complex support systems are essential for the production efforts. Efforts to deal with sequence acquisition from a large population (usually less than 1,000) are limited to relatively small numbers of loci (Davies et al., 1995, Nature 371:130-136). However, these methods are still dependent on Sanger sequencing reactions and gel electrophoresis to generate ladders, and on robotic sample handling procedures to deal with the attending numbers of clones and polymerase chain reaction products.
Some recently developed methods and devices for automated sequencing of bulk DNA samples that utilize fluorescently labeled nucleotides are described in U.S. Pat. No. 5,674,743; International Application Nos. PCT/GB93/00848 published Apr. 22, 1993 as WO 93/21340; PCT/US96/08633 published Jun. 4, 1996 as WO 96/39417; and PCT/US94/01156 published Jan. 31, 1994 as WO 94/18218. None of the recently developed methods is capable of sequencing individual nucleic acid molecules.
Techniques for sequencing large genomes of DNA have relied upon the construction of Yeast Artificial Chromosomes ("YAC") contiguous sequences. Preliminary physical maps of a large fraction of the human genome have been generated via YACs (Cohen et al., 1993, Nature 366:698-701). However, extensive high resolution maps of YACs have not been widely generated, due to the high frequency of rearrangement/chimerism among YACs, the low complexity of fingerprints generated by hybridization approaches, and the extensive labor required to overcome these problems. Ordered maps of YACs have been optically made by using a spermine condensation method (to avoid shearing the DNA) and fixing the clones in molten agarose onto derivatized glass surfaces (Cai et al., 1995, Proc. Natl. Acad. Sci. USA 92:5164-5168). There have been several proposals for the rapid attainment of sequence data from clones that minimize or obviate the need for shotgun sequencing approaches or subcloning of large insert clones (Smith et al., 1994, Nature Genet. 7:40-47; Kupfer et al., 1995, Genomics 27:90-100; Chen et al., 1993, Genomics 17:651-656 and Roach et al., 1995, Genomics 26:345-353). Several of these approaches advocate the generation of "sequence sampled maps" (Smith et al., 1994, Nature Genet. 7:40-47 and Venter et al., 1996, Nature 381:364-366) which require fingerprinting of clones, or large numbers of subclones, to achieve good target coverage while simultaneously generating a fine-scale map.
A recent development has been the proposal of DNA sequencing of aligned and oriented Bacterial Artificial Chromosomes ("BAC") contiguous sequences (Venter et al., 1996, Nature 381:364-366; see also Smith et al., 1994, Nature Genetics 7:40-47; Kupfer et al., 1995, Genomics 27:90-100; and Chen et al., 1993, Genomics 17:651-656). BACs offer the advantage of considerably greater stability than YACs, are more easily physically managed due to their smaller size (~100 to 200 kb for BACs versus ~500 kb to 2 Mb for YACs), and are more compatible with automated DNA purification procedures (Kim et al., 1996, Proc. Natl. Acad. Sci. USA 93:6297-6301; Kim et al., 1994, Genomics 24:527-534; and Schmitt et al., 1996, Genomics 33:9-20). Further approaches for the optical analysis of BAC clones were also developed (Cai et al., 1998, Proc. Natl. Acad. Sci. USA 95:3390-3395).
Limitations of these approaches described above include low throughput, DNA fragmentation (preventing subsequent or simultaneous multimethod analyses), and difficulties in automation. Despite the potential utilities of these and other approaches, it is increasingly clear that current molecular approaches were developed primarily for characterization of single genes, not entire genomes, and are, therefore, not optimally suited to the analysis of polygenic diseases and complex traits, especially on a population-wide basis (Risch et al., 1996, Science 273:1516-1517).
Single molecule approaches represent a subset of current physical mapping approaches. Physical and genetic mapping constitute the two major approaches to genomic analysis, and are critical to mapping and cloning of disease genes and to direct sequencing efforts. Such methods of visualization of single DNA molecules include fluorescence microscopy in solution (Yanagida et al., 1986, in Applications of fluorescence in the biomedical sciences, Taylor et al. (eds), Alan Liss, New York, pp. 321-345; Yanagida et al., 1983, Cold Spring Harbor Symp. Quantit. Biol. 47:177; Matsumoto et al., 1981, J. Mol. Biol. 132:501-516; Schwartz et al., 1989, Nature 338:520-522; and Houseal et al., 1989, Biophys. J. 56:507-516); FISH (Manuelidis et al., 1982, J. Cell. Biol. 95:619; Lawrence et al., 1988, Cell 52:51; Lichter et al., 1990, Science 247:64; Heng et al., 1992, Proc. Natl. Acad. Sci. USA 89:9509; van den Engh et al., 1992, Science 257:1410); visualization by scanning tunneling microscopy or atomic force microscopy techniques (Keller et al., 1989, Proc. Natl. Acad. Sci. USA 86:5356-5360; see, e.g., Karrasch et al., 1993, Biophysical J. 65:2437-2446; Hansma et al., 1993, Nucleic Acids Research 21:505-512; Bustamante et al., 1992, Biochemistry 31:22-26; Lyubchenko et al., 1992, J. Biomol. Struct. and Dyn. 10:589-606; Allison et al., 1992, Proc. Natl. Acad. Sci. USA 89:10129-10133; Zenhausern et al., 1992, J. Struct. Biol. 108:69-73); visualization of circular DNA molecules (Bustamante et al., 1992, Biochemistry 31:22-26); DNA bending in transcription complexes by scanning force microscopy (Rees et al., 1993, Science 260:1646-1649); direct mechanical measurement of the elasticity of single DNA molecules using magnetic beads (Smith et al., 1992, Science 258:1122-1126); alignment and detection of DNA molecules involving either elongation of end-tethered surface bound molecules by a receding air-water interface (U.S. Pat. No. 5,079,169; U.S. Pat. No. 5,380,833; Perkins et al., 1994, Science 264:819; and Bensimon et al., 1994, Science 265:2096-2098), or elongation of non-tethered molecules by "fluid fixation" (Samad et al., 1995, Nature 378:516-517; Cai et al., 1995, Proc. Natl. Acad. Sci. USA 92:5164-5168; Meng et al., 1995, Nature Genet. 9:432-438; Wang et al., 1995, Proc. Natl. Acad. Sci. USA 92:165-169; and Schwartz et al., 1993, Science 262:110-114); (See also Reed et al., "A Quantitative Study Of Optical Mapping Surfaces By Atomic Force Microscopy And Restriction Endonuclease Digestion" in press, Analytical Biochemistry; Cai et al., "High Resolution Restriction Maps Of Bacterial Artificial Chromosomes Constructed By Optical Mapping", 1998, Proc. Natl. Acad. Sci. USA 95:3390-3395; Samad and Schwartz, "Genomic Analysis by Optical Mapping" in Analytical Biotechnology-Genomic Analysis in press, (see also, U.S. Pat. No. 6,147,198, issued Nov. 4, 2000 to David C. Schwartz and incorporated herein); Schwartz et al., 1997, Current Opinion in Biotechnology, 8:70-74; Samad, 1995, Genomics Research 59:1-4; and Primrose, 1995, Principles of Genome Analysis: A guide to mapping and sequencing DNA from different organisms, Blackwell Science Ltd., Oxford England, pp. 76-77; and Bautsch et al., 1997 "Long-Range Restriction Mapping of Genomic DNA" in Genomic Mapping: A Practical Approach, Chapter 12, Paul H. Dear ed., Oxford University Press, New York, pp. 281-313).
New modes of molecular investigation have emerged from advances in molecular fixation techniques, labeling, and the development of scanning probe microscopies (Keller et al., 1989, Proc. Natl. Acad. Sci. USA 86:5356-5360; Bensimon et al., 1994, Science 265:2096-2098; Guthold et al., 1994, Proc. Natl. Acad. Sci. USA, 91:12927-12931; Hansma et al., 1996, Nucleic Acids Res. 24:713-720; Cai et al., 1995, Proc. Natl. Acad. Sci. USA 92:5164-5168; Meng et al., 1995, Nature Genet. 9:432-438; Weier et al., 1995, Hum. Mol. Genet. 4:1903-1910; Wang et al., 1995, Proc. Natl. Acad. Sci. USA 92:165-169; Schwartz et al., 1993, Science 262:110-114; Schena et al., 1995, Science 270:467-470; Heller et al., 1997, Proc. Natl. Acad. Sci. USA 94:2150-2155; Erie et al., 1994, Science 266:1562-1566; and Leuba et al., 1994, Proc. Natl. Acad. Sci. USA 91:11621-11625). In particular, molecular fixation techniques have relied on the application of outside forces such as electrical fields, a travelling meniscus (Michalet et al., 1997, Science 277:1518) or end-tethering of molecules with beads (Strick et al., 1996, Science 271:1835-1837) to fix DNA to solid surfaces. Biochemistries have been performed on surface-mounted DNA molecules, but the procedures used bulk deposition and analysis (Schena et al., 1995, Science 270:467-470; Heller et al., 1997, Proc. Natl. Acad. Sci. USA 94:2150-2155; Craig et al., 1990, Nucleic Acids Res. 18:2653-2660; and Nizetic et al., 1991, Proc. Natl. Acad. Sci. USA 88:3233-3237).
Once the nucleic acid molecules are fixed, they must be imaged and analyzed. Although the spatial resolution of conventional light microscopy is limited, cooled, charge-coupled device (CCD) imagers have stimulated the development of new optical approaches to the quantitation of nucleic acids that may supplant electrophoresis-based techniques in many applications (Schena et al., 1995, Science 270:467-470; Lipshutz et al., 1995, Biotechniques 19:442-447; and Chee et al., 1996, Science 274:610-614). Yanagida and coworkers (Yanagida et al., 1986, in Applications of fluorescence in the biomedical sciences, Taylor et al. (eds), Alan Liss, New York, pp. 321-345) first investigated the molecular motions of fluorescently stained individual DNA molecules in solution by image-enhanced fluorescence microscopy. Optical mapping was subsequently developed for the rapid production of ordered restriction maps from individual, fluorescently stained DNA molecules (Cai et al., 1995, Proc. Natl. Acad. Sci. USA 92:5164-5168; Meng et al., 1995, Nature Genet. 9:432-438; Wang et al., 1995, Proc. Natl. Acad. Sci. USA 92:165-169; Schwartz et al., 1993, Science 262:110-114; Schwartz et al., 1997, Curr. Opinions in Biotechnology 8:70-74; Samad et al., 1995, Nature 378:516-517; and Samad et al., 1995, Genomic Research 59:1-4).
In the original method, individual fluorescently labeled yeast chromosomes were elongated and fixed in a flow of molten agarose generated between a coverslip and a glass slide (Schwartz et al., 1993, Science 262:110-114). Restriction endonuclease cleavage events were recorded as time-lapse images, following addition of magnesium ions to activate the added endonuclease. Cleavage sites appeared as growing gaps due to relaxation of DNA coils at nascent ends, and maps were constructed by measuring fragment sizes using relative fluorescent intensity or apparent length measurements.
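The intensity-based sizing described above can be illustrated with a minimal sketch. It assumes that integrated fluorescence is proportional to DNA mass and that the total size of the parent molecule is known (for example, from an internal size standard); the function name `fragment_sizes_kb` and the sample values are hypothetical and are not taken from the cited work.

```python
def fragment_sizes_kb(intensities, molecule_size_kb):
    """Apportion a known molecule size among its restriction fragments.

    Assumption: each fragment's integrated fluorescence intensity is
    proportional to its mass, so its share of the total intensity equals
    its share of the total length.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [molecule_size_kb * value / total for value in intensities]


# Hypothetical integrated intensities for three fragments, listed in the
# order in which they appear along one elongated 100 kb molecule.
sizes = fragment_sizes_kb([1200.0, 300.0, 500.0], molecule_size_kb=100.0)
print(sizes)
```

Because the fragments are measured in place along the molecule, the returned list preserves their order, which is the ordering capability that gel electrophoresis lacks.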
In another closed system, the DNA molecules (2-1,500 kb) were elongated and fixed using the flow and adhesion forces generated when a fluid sample is compressed between two glass surfaces, one derivatized with polylysine or APTES (Meng et al., 1995, Nature Genet. 9:432-438 and Cai et al., 1995, Proc. Natl. Acad. Sci. USA 92:5164-5168). Fixed molecules were digested with restriction endonucleases, fluorescently stained (Rye et al., 1992, Nucleic Acids Res. 20:2803-2812) and optically mapped (Meng et al., 1995, Nature Genet. 9:432-438 and Cai et al., 1995, Proc. Natl. Acad. Sci. USA 92:5164-5168). However, closed systems have limited access to the samples and cannot readily accommodate arrayed samples (Bensimon et al., 1994, Science 265:2096-2098 and Meng et al., 1995, Nature Genet. 9:432-438).
To increase the throughput and versatility of optical mapping and sequencing, multiple samples need to be arrayed on a single mapping surface. Although robotic gridding techniques for DNA samples exist (Heller et al., 1997, Proc. Natl. Acad. Sci. USA 94:2150-2155; Craig et al., 1990, Nucl. Acids Res. 18:2653-2660; and Nizetic et al., 1991, Proc. Natl. Acad. Sci. USA 88:3233-3237), such approaches were not designed to work with single molecule substrates and could not be relied upon to deposit molecules retaining significant accessibility to enzymatic action.
While single molecule techniques offer the potential advantage of an ordering capability which gel electrophoresis lacks, none of the current single molecule techniques can be used, on a practical level, as high resolution genomic sequencing tools. The molecules described by Yanagida (Yanagida, M. et al., 1983, Cold Spring Harbor Symp. Quantit. Biol. 47:177; Matsumoto, S. et al., 1981, J. Mol. Biol. 132:501-516) were visualized primarily free in solution, making any practical sequencing impossible. Further, while the FISH technique offers the advantage of using only a limited number of immobilized fragments, usually chromosomes, it is not possible to achieve the sizing resolution available with gel electrophoresis.
Single molecule tethering techniques, as listed above, generally involve individual nucleic acid molecules which have, first, been immobilized onto a surface via one or both of their ends, and, second, have been manipulated such that the molecules are stretched out. These techniques, however, are not suited to genome analysis. First, the steps involved are time consuming and can only be accomplished with a small number of molecules per procedure. Further, in general, the tethered molecules cannot be stored and used again.
Recently, special effort has centered on development of improved surface-based approaches for DNA fixation, compatible with a variety of molecular imaging techniques. Desirable DNA fixation attributes include: a usable population of elongated molecules, preservation of biochemical activity, parallel sample processing capabilities, high sample deposition rates, densely gridded samples and easy access to arrayed samples.
Present-day array hybridization technology already involves gridding DNA samples densely on open-faced, charged-membrane surfaces (Craig et al., 1990, Nucl. Acids Res. 18:2653-2660; and Nizetic et al., 1991, Proc. Natl. Acad. Sci. USA 88:3233-3237). Gridded sample arrays facilitate biochemical manipulations and analyses and are limited only by sample density and available biochemistries.
New approaches to molecular deposition, called "fluid fixation," involve placing small droplets of DNA solution onto critically derivatized glass surfaces which readily elongate and fix DNA molecules. Conveniently, application of outside forces is completely obviated in the fluid fixation technique, thereby making use of electrical fields, a travelling meniscus or end-tethering of molecules unnecessary. The passive nature of fluid fixation provides the platform needed for efforts to automate optical mapping and sequencing.
The observation of single fluorochromes using video rate imaging techniques has been described by Schmidt et al. (Schmidt et al., 1996, Proc. Natl. Acad. Sci. USA 93:2926-2929) using a standard fluorescence microscope, laser illumination, and a cooled CCD camera with frame shifting capability. A significant advance in signal/noise optimization was made by Funatsu et al. (Funatsu et al., 1995, Nature 374:555-559) by systematically minimizing noise in virtually every possible experimental and instrumentational variable.
In conclusion, a rapid, accurate method of optically sequencing individual nucleic acid molecules was needed in the art. Such nucleotide sequencing of single molecules would be useful for aligning/overlapping contiguous sequences for genomic mapping and genomic analysis, and in rapidly analyzing single nucleotide polymorphisms in a population of individual nucleic acid molecules.
Citation of documents herein is not intended as an admission that any of the documents cited herein is pertinent prior art, or an admission that the cited documents are considered material to the patentability of the claims of the present application. All statements as to the date or representations as to the contents of these documents are based on the information available to the applicants and do not constitute any admission as to the correctness of the dates or contents of these documents.
The recently published human genome maps have given us the first rough draft of our own human DNA genome. Venter et al., 2001, Science 291:1304-1350; International Human Genome Sequencing Consortium, 2001, Nature 409:860-921. A human transcriptome map was also constructed by using SAGE (serial analysis of gene expression) and provided a high-resolution view of gene distribution in chromosomal domains. Caron et al., 2001, Science 291:1289-1292. The achievement of this important milestone in genomic science was made possible through a combination of technological and organizational breakthroughs, and is now poised to serve as the touchstone for major discoveries in the biological sciences. Obviously, however, deciphering the nucleotide sequence of the human genome is only the first and perhaps smallest step in the process. The challenge now at hand is to discern the biological and biochemical significance embedded within the roughly 3,000,000,000 bases of the human genome. The same problem exists in the study of other, less massive genomes, such as those of bacteria and viruses. Careful and comprehensive study of transcriptional patterns within different organisms, cell-types, and environments is a critical consideration in this effort. Underlying these studies are the basic biochemical mechanisms that define transcriptional activities at the molecular level.
While the approaches and systems developed for large-scale sequencing have laid the foundation for a broad range of high-throughput systems for molecular analysis, these approaches are not well-suited for studying biochemical mechanisms associated with transcription. The present invention, however, is an in vitro method and device, a "system," that looks to the single molecule level for tracking numerous steps involved in gene expression and its modulation. While looking at individual molecules to study nucleic acid reactions and interactions, the method of the present invention can use an entire genome as a template, thereby elucidating transcription phenomena at an unprecedented scale and with unprecedented speed. The system utilizes biochemical and detection systems that readily enable statistical and computational analysis of the large data sets generated by the method. The method utilizes this optical mapping system to construct physical (and correlatable) maps of transcription events and restriction sites from ensembles of single DNA molecules.
Modern approaches to expression profiling based on microarrays and Affymetrix-brand chips (Affymetrix, Inc., Santa Clara, Calif.) are already proving their value in identifying genes associated with cellular function and development. Such studies are also providing the early clues to how networks of genes and their products work together to produce observable phenotypes. In addition, the identification of disease-related genes opens new routes for rational pharmaceutical intervention. The technologies that enabled these new studies include hybridization-based techniques (DNA microarrays), PCR-based techniques (differential display), and sequence-based techniques (SAGE, serial analysis of gene expression; and MPSS, massively parallel signature sequencing). Kozian and Kirschbaum, 1999, TIBTECH, 17:73-78.
DNA microarrays are now widely used for expression profiling, because they are intrinsically massively parallel and experimentally accessible. Brown and Botstein, 1999, Nature Genetics 21:33-37. Two main technologies are commonly used to produce DNA chips: photolithography, as developed by Affymetrix, and mechanical grid systems, which deposit PCR products or clones into two-dimensional arrays. Celis et al., 2000, FEBS Letters 480:2-16. While these approaches analyze the expression levels of thousands of genes simultaneously, they each suffer from insurmountable limitations in scalability, speed, and ease of automation. See, for example, Celis et al., supra.
The present invention is based on the development of techniques to grid multiple individual nucleic acid molecule samples; to image individual substrate molecules and single labeled nucleotides using automated fluorescence microscopy; and to integrate these techniques with a scheme for automatic construction of restriction fragment and DNA sequence maps, thereby creating methods and systems which eliminate operator interaction. The present invention also correlates these data with transcription events, such as initiation, pause, termination, etc. The invention thus includes a method for mapping transcription events, using an entire genome as a template, as well as a method to correlate those transcription events with restriction sites within the same genome.
Specifically, a first embodiment of the invention is directed to a method of analyzing enzymatic and chemical reactions of nucleic acids. The method comprises elongating and fixing onto a surface of a substrate a plurality of nucleic acid molecules in such a fashion that each individual nucleic acid molecule is fixed along its length onto the surface of the substrate with a small degree of relaxation so that the nucleic acid molecules are individually analyzable and accessible for enzymatic and chemical reactions. The elongated and fixed nucleic acid molecules are then subjected to an enzymatic or chemical reaction in the presence of a labeled reagent that generates signals correlating to the enzymatic or chemical reaction. Then, the signals generated by the labeled reagent are acquired and compiled, whereby the enzymatic or chemical reaction is analyzed.
Another embodiment of the invention is directed to a method of analyzing enzymatic and chemical reactions of nucleic acids, the method comprising first elongating and fixing onto a surface of a substrate a plurality of nucleic acid molecules in such a fashion that each individual nucleic acid molecule is fixed along its length onto the surface of the substrate with a small degree of relaxation so that the nucleic acid molecules are individually analyzable and accessible for enzymatic and chemical reactions. The elongated and fixed nucleic acid is then subjected to a transcription reaction followed by a restriction reaction in the presence of a labeled reagent that generates signals correlating to the transcription reaction and the restriction reaction, respectively. The signals generated by the labeled reagent are then acquired and compiled into an image. Individual elongated nucleic acid molecules are observed for the appearance of complexes corresponding to transcription events in the individual nucleic acid molecule; the same individual molecules are also observed for the appearance of gaps corresponding to cleavage sites between restriction fragments. These steps are reiterated on additional individual elongated nucleic acid molecules, to thereby generate additional images. The images are then compiled into an ordered map correlating transcription event sites and restriction enzyme cleavage sites.
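The compilation of per-molecule observations into a single ordered map can be sketched as follows. This is a deliberately simplified stand-in, not the Bayesian inference estimation referred to elsewhere in this specification: it assumes each molecule has already been oriented, that event positions are expressed as a fraction of molecule length, and that events seen on several molecules within a small tolerance of one another are the same site. The names `consensus_map`, `tolerance`, and `min_support` are hypothetical.

```python
from collections import defaultdict


def consensus_map(molecules, tolerance=0.02, min_support=2):
    """Merge per-molecule events into one ordered consensus map.

    molecules: list of molecules, each a list of (position, kind) pairs,
    where position is a fraction of molecule length (0.0 to 1.0) and kind
    is a label such as "cut" or "transcription".
    Returns a position-ordered list of (mean_position, kind, support).
    """
    by_kind = defaultdict(list)
    for events in molecules:
        for position, kind in events:
            by_kind[kind].append(position)

    consensus = []
    for kind, positions in by_kind.items():
        positions.sort()
        cluster = [positions[0]]
        for position in positions[1:]:
            if position - cluster[-1] <= tolerance:
                cluster.append(position)  # same site seen on another molecule
                continue
            if len(cluster) >= min_support:
                consensus.append((sum(cluster) / len(cluster), kind, len(cluster)))
            cluster = [position]
        if len(cluster) >= min_support:
            consensus.append((sum(cluster) / len(cluster), kind, len(cluster)))

    consensus.sort()
    return consensus


# Hypothetical observations from three imaged molecules.
molecules = [
    [(0.20, "cut"), (0.35, "transcription"), (0.70, "cut")],
    [(0.21, "cut"), (0.36, "transcription"), (0.69, "cut")],
    [(0.19, "cut"), (0.71, "cut"), (0.90, "transcription")],
]
ordered_map = consensus_map(molecules)
for position, kind, support in ordered_map:
    print(f"{position:.3f}  {kind}  seen on {support} molecules")
```

Requiring a minimum level of support discards events seen on only one molecule, such as the isolated event at 0.90 in the sample data, which is one simple way to suppress spurious signals when many molecules are pooled.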
Moreover, the map so generated can then be compared to known genomic sequences, whereby it can be determined from where within a genome a single nucleic acid molecule originated.
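One way such a comparison could proceed is sketched below, under stated assumptions: the reference sequence is digested in silico at every occurrence of a recognition sequence, and the ordered fragment sizes measured on a single molecule are slid along the reference fragments until every size agrees within a relative tolerance. The functions `in_silico_fragments` and `locate`, the toy reference, and the 10% tolerance are all hypothetical; for simplicity the cut is placed at the first base of the recognition site, whereas a real enzyme cuts at a defined offset within it.

```python
def in_silico_fragments(sequence, site):
    """Return (start, length) for each fragment of an in silico digest."""
    cuts = []
    index = sequence.find(site)
    while index != -1:
        cuts.append(index)
        index = sequence.find(site, index + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [(a, b - a) for a, b in zip(bounds, bounds[1:]) if b > a]


def locate(observed_sizes, reference_fragments, rel_tol=0.10):
    """Return reference start coordinates where the observed run of
    fragment sizes matches consecutive reference fragments."""
    hits = []
    n = len(observed_sizes)
    for i in range(len(reference_fragments) - n + 1):
        window = reference_fragments[i:i + n]
        if all(abs(size - length) <= rel_tol * length
               for size, (_, length) in zip(observed_sizes, window)):
            hits.append(window[0][0])
    return hits


# Toy 400-base reference with three EcoRI-like sites (GAATTC).
reference = ("A" * 50 + "GAATTC" + "C" * 94 + "GAATTC"
             + "T" * 194 + "GAATTC" + "G" * 44)
reference_fragments = in_silico_fragments(reference, "GAATTC")

# Hypothetical sizes of two adjacent interior fragments from one molecule.
hits = locate([105, 190], reference_fragments)
print(reference_fragments)
print(hits)
```

Only interior fragments are compared here because the end fragments of a randomly sheared molecule are truncated and would not match the reference digest.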
A still further embodiment of the invention is directed to a method of analyzing enzymatic and chemical reactions of nucleic acids wherein the nucleic acid is subjected to a reaction in a vessel, followed by transfer of the reaction products to an optical mapping surface. Here, the method comprises subjecting nucleic acid molecules to an enzymatic or chemical reaction in the presence of a labeled reagent that generates signals correlating to the enzymatic or chemical reaction, thereby generating nucleic acid reaction products. A plurality of the nucleic acid reaction products are then elongated and fixed onto a surface of a substrate in such a fashion that each individual nucleic acid molecule is fixed along its length onto the surface of the substrate with a small degree of relaxation so that the nucleic acid molecules are individually analyzable and accessible for further enzymatic and chemical reactions. The signals generated by the labeled reagent are then acquired and compiled, whereby the enzymatic or chemical reaction is analyzed. Maps can be generated from these reactions in the same fashion as noted in the preceding paragraphs.