2.1 Technical Field
The invention relates to methods for analyzing molecules and devices for performing such analysis. The methods and devices allow reliable analysis of single nucleic acid molecules. Such single molecules may be derived from natural samples such as cells, tissues, soil, air, and water, without separating or enriching individual components. In certain aspects of the invention, the methods and devices are useful in performing nucleic acid sequence analysis or nucleic acid quantification, including gene expression analysis.
2.2 Sequence Listing
The sequences of the polynucleotides described herein are listed in the Sequence Listing and are submitted on a compact disc containing the file labeled “CAL-2CIP PCT.txt”, 8.00 KB (8,192 bytes), which was created on an IBM PC, Windows 2000 operating system on Feb. 26, 2004 at 11:26:18 AM. The Sequence Listing entitled “CAL-2CIP PCT.txt” is herein incorporated by reference in its entirety. A computer readable format (“CRF”) and three duplicate copies (“Copy 1,” “Copy 2” and “Copy 3”) of the Sequence Listing “CAL-2CIP PCT.txt” are submitted herein. Applicants hereby state that the content of the CRF and Copies 1, 2 and 3 of the Sequence Listing, submitted in accordance with 37 CFR §1.821(c) and (e), respectively, are the same.
2.3 Background
There are three established DNA sequencing technologies. The dominant sequencing method used today is based on Sanger's dideoxy chain termination process (Sanger et al., Proc. Natl. Acad. Sci. USA 74:5463 (1977) herein incorporated by reference in its entirety) and relies on various gel-based separation instruments ranging from manual systems to fully automated capillary sequencers. The Sanger process is technically difficult and is limited to read lengths of about 1 kb or less, requiring multiple reads to achieve high accuracy. A second method, pyrosequencing, also uses polymerase to generate sequence information by monitoring production of pyrophosphate generated during consecutive cycles in which specific DNA bases are tested for incorporation into the growing chain (Ronaghi, Genome Res. 11:3 (2001), herein incorporated by reference in its entirety). The method provides an elegant multi-well plate assay, but only for local sequencing of very short 10-50 base fragments. This read length restriction represents a serious limitation for sequence-based diagnostics.
Both of the above technologies represent direct sequencing methods in which each base position in a chain is determined sequentially by direct experimentation. Sequencing by hybridization (SBH) (U.S. Pat. No. 5,202,231; Drmanac et al., Genomics 4:114 (1989), both of which are herein incorporated by reference in their entirety), uses the fundamental life chemistry of base-specific hybridization of complementary nucleic acids to indirectly assemble the order of bases in a target DNA. In SBH, overlapping probes of known sequence are hybridized to sample DNA molecules and the resulting hybridization pattern is used to generate the target sequence using computer algorithms (co-owned, co-pending U.S. patent application Ser. No. 09/874,772; Drmanac et al., Science 260:1649-1652 (1993); Drmanac et al., Nat. Biotech. 16:54-58 (1998); Drmanac et al., “Sequencing and Fingerprinting DNA by Hybridization with Oligonucleotide Probes,” In: Encyclopedia of Analytical Chemistry, pp. 5232-5237 (2000); Drmanac et al., “Sequencing by Hybridization (SBH): Advantages, Achievements, and Opportunities,” In: Advances in Biochemical Engineering/Biotechnology: Chip Technology, Hoheisel, J. (Ed.), Vol. 76, pp. 75-98 (2002), all of which are herein incorporated by reference in their entirety). Probes or DNA targets may be arrayed in the form of high-density arrays (see, for example, Cutler et al., Genome Res. 11:1913-1925 (2001), herein incorporated by reference in its entirety). Advantages of the SBH method include experimental simplicity, longer read length, higher accuracy, and multiplex sample analysis in a single assay.
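The SBH principle described above, in which overlapping probes of known sequence are assembled into a target sequence, can be illustrated with a minimal sketch. This is an idealized illustration only, not the algorithm of the cited references: it assumes error-free hybridization, a known starting probe, and a target in which no (k-1)-base overlap occurs more than once. The function names `spectrum` and `assemble` and the example target are hypothetical.

```python
def spectrum(target, k):
    """Return the set of all overlapping k-mers in target.

    Stands in for an idealized, error-free hybridization result:
    every probe that matches the target scores positive, and no other.
    """
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def assemble(probes, start, k):
    """Rebuild a sequence by chaining probes that overlap by k-1 bases.

    Assumes k >= 2 and that `start` is the first k-mer of the target.
    Extension stops at a dead end or at a branch point (more than one
    candidate), which is where repeats make the order ambiguous.
    """
    remaining = set(probes) - {start}
    seq = start
    while True:
        suffix = seq[-(k - 1):]
        candidates = [p for p in remaining if p.startswith(suffix)]
        if len(candidates) != 1:
            break  # dead end or ambiguous branch
        nxt = candidates[0]
        seq += nxt[-1]          # each new probe contributes one base
        remaining.remove(nxt)
    return seq


target = "ATGCCGTAAGCT"          # hypothetical example target
probes = spectrum(target, 4)     # positive 4-mer probes
rebuilt = assemble(probes, "ATGC", 4)
print(rebuilt == target)
```

In this sketch each positive probe extends the growing sequence by a single base, which is why the order of bases is determined indirectly from the overlap pattern rather than by reading each position in turn.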
Currently, there is a critical need for new biodefense technologies that can quickly and accurately detect, analyze, and identify all potential pathogens in complex samples. Current pathogen detection technologies generally lack the sensitivity and selectivity to accurately identify trace quantities of pathogens in such samples and are often expensive and difficult to operate. In addition, in their current implementations, all three sequencing technologies require large quantities of sample DNA. Samples are usually prepared by one of several amplification methods, primarily PCR. These methods, especially SBH, can provide good sequence-based diagnostics of individual genes or mixtures of 2-5 genes, although with substantial cost associated with DNA amplification and array preparation. Thus, all current sequencing methods lack the speed and efficiency needed to provide comprehensive sequence-based pathogen diagnostics and screening in complex biological samples at acceptable cost. This creates a wide gap between current technical capacity and new sequencing needs. Ideally, a suitable diagnostics process should permit a simultaneous survey of all critical pathogens potentially present in environmental or clinical samples, including mixtures in which engineered pathogens are hidden among other organisms.
The requirements for such comprehensive pathogen diagnostics include the need to sequence 10-100 critical genes or entire genomes simultaneously for hundreds of pathogens and to process thousands of samples. Ultimately, this will require sequencing 10-100 Mb of DNA per sample, or 100 Mb to 10 Gb of DNA per day for a lab performing continuous systematic surveys. Current sequencing methods have more than 100-fold lower sequencing throughput and 100-fold higher cost than is required for such comprehensive pathogen diagnostics and pre-symptomatic surveys.
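The per-sample and per-day figures above can be related by simple arithmetic. The sketch below is one reading of those figures, under the assumption (not stated in the text) that the low per-day figure pairs with the low per-sample figure and the high with the high; the variable names are illustrative only.

```python
MB = 10**6   # bases per megabase
GB = 10**9   # bases per gigabase

# Figures taken from the paragraph above.
per_sample = (10 * MB, 100 * MB)   # DNA sequenced per sample (low, high)
per_day = (100 * MB, 10 * GB)      # DNA sequenced per lab per day (low, high)

# Assumed pairing: low with low, high with high.
samples_low = per_day[0] // per_sample[0]
samples_high = per_day[1] // per_sample[1]

print(samples_low, samples_high)
```

Under that assumed pairing, the stated ranges correspond to roughly 10 to 100 samples per lab per day.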
Current biosensor technologies use a variety of molecular recognition strategies, including antibodies, nucleic acid probes, aptamers, enzymes, bioreceptors, and other small molecule ligands (Iqbal et al., Biosensors and Bioelectronics 15:549-578 (2000), herein incorporated by reference in its entirety). Molecular recognition elements must be coupled to a reporter molecule or tag to allow positive detection events.
Both DNA hybridization and antibody-based technologies are already widely used in pathogen diagnostics. Nucleic acid-based technologies are generally more specific and sensitive than antibody-based detection, but can be time-consuming and less robust (Iqbal et al., 2000, supra). DNA amplification (through PCR or cloning) or signal amplification is generally necessary to achieve reliable signal strength, and accurate prior sequence knowledge is required to construct pathogen-specific probes. Although development of monoclonal antibodies has increased the specificity and reliability of immunoassays, the technology is relatively expensive and prone to false positive signals (Doing et al., J. Clin. Microbiol. 37:1582-1583 (1999); Marks, Clin. Chem. 48:2008-2016 (2002), both of which are herein incorporated by reference in their entirety). Other molecular recognition technologies such as phage display, aptamers and small molecule ligands are still in their early stages of development and are not yet versatile enough to address all pathogen detection problems.
The main liability of all current diagnostic technologies is that they lack the sensitivity and versatility to detect and identify all potential pathogens in a sample. Weapons designers can easily engineer new biowarfare agents to foil most pathogen-specific probes or immunoassays. There is a clear and urgent need for efficient sequence-based diagnostics.
To this end, Applicants have developed a high-efficiency genome sequencing system, random DNA array-based sequencing by hybridization (rSBH). rSBH can be useful for genomic sequence analysis of all genomes present in complex microbial communities as well as for individual human genome sequencing. rSBH eliminates the need for DNA cloning or DNA separation and reduces the cost of sequencing relative to methods known in the art.