The benefits of the $1,000 genome have been well documented in the literature (Kaiser, J., “DNA Sequencing—A Plan to Capture Human Diversity in 1000 Genomes,” Science 319:395-395 (2008); Kuehn, B. M., “1000 Genomes Project Promises Closer Look at Variation in Human Genome,” JAMA 300:2715-2715 (2008); Mardis, E., “Anticipating the $1,000 Genome,” Genome Biol. 7:112 (2006); Metzker, M. L., “Emerging Technologies in DNA Sequencing,” Genome Res. 15:1767-1776 (2005); Schloss, J., “How to Get Genomes at One Ten-Thousandth the Cost,” Nature Biotechnol. 26:1113-1115 (2008)). Some of the important consequences include: (i) personalized medicine that could assist in more effective disease prevention and improve diagnosis and prognosis, matching the appropriate therapy to the specific patient through genome-wide evaluation of sequence variations; (ii) understanding genome-wide complexity; (iii) designing new therapeutics; and (iv) developing a de facto standard for in vitro diagnostics (IVD) irrespective of sequence variation type.
There is a plethora of different genetic variations that serve as effective biomarkers for a variety of diseases, such as sporadic mutations, inherited mutations, single nucleotide polymorphisms (SNPs), methylation patterns (epigenetics), gene expression, copy number variation, microsatellite instability, etc. Unfortunately, each of these variations requires its own assay format and, as such, is difficult to implement in the clinic due to the specialized equipment and expertise required to carry out each molecular assay (Thomas et al., “Biomedical Microelectromechanical Systems (BioMEMS) Using Electrophoresis for the Analysis of Genetic Mutations,” Molecular Review Diagnostics 2:429-447 (2002)). A “standard” assay format that can uncover the presence or absence of all sequence variations using a single instrument with little operator expertise will expand the utility of IVD. In many cases, extensive resequencing of selected exons in the genome can provide the necessary clinical information with the required sensitivity irrespective of the type of sequence variation.
Advances in DNA sequencing hold the promise to standardize and develop non-invasive molecular diagnosis to improve prenatal care, transplantation efficacy, cancer and other disease detection and individualized treatment. Currently, patients with predisposing or early disease are not identified, and those with disease are not given the best treatment—all because of failures at the diagnostic level. Consequently, there is an urgent need to develop automated ultra-fast sequencing platforms that may be used in the clinical laboratory. Such low-cost bench-top machines are needed to accelerate the discovery, validation and clinical use of molecular markers.
For example, in the cancer field, there is a need to develop such technology for early detection, guiding therapy, and monitoring for recurrence, all from a blood sample. This includes the need to develop (i) high sensitivity detection of promoter hypermethylation and hypomethylation (when present at 1% to 0.01% of cell-free DNA), (ii) high sensitivity detection of common and uncommon mutations in known genes (when present at 1% to 0.01% of cell-free DNA), (iii) accurate quantification of tumor-specific mRNA and miRNA isolated from tumor-derived exosomes or RISC complex in blood, (iv) accurate quantification of tumor-specific copy changes in DNA isolated from circulating tumor cells, and (v) accurate quantification of mutations, promoter hypermethylation and hypomethylation in DNA isolated from circulating tumor cells. All of the above cases (except quantification of tumor-specific copy changes in DNA isolated from circulating tumor cells) require focusing the sequencing on targeted genes or regions of the genome. Further, determination of the sequence information or methylation status from both strands of the original fragment provides critically needed confirmation of rare events.
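The 1% to 0.01% figures above can be put in perspective with a simple sampling estimate. The sketch below is illustrative only: it assumes each sampled cell-free DNA molecule is independently a variant with probability equal to the variant fraction, it ignores sequencing error, and the function name and the 95% confidence level are hypothetical choices not taken from the text.

```python
import math

def molecules_needed(variant_fraction: float, confidence: float = 0.95) -> int:
    """Smallest n such that P(at least one variant among n molecules) >= confidence.

    Assumes independent sampling: P(no variant in n draws) = (1 - f) ** n.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - variant_fraction))

# Variant fractions quoted in the text: 1% and 0.01% of cell-free DNA.
for fraction in (1e-2, 1e-4):
    n = molecules_needed(fraction)
    print(f"variant fraction {fraction:.2%}: sample about {n} molecules")
```

Under these assumptions, a hundred-fold drop in variant fraction raises the number of molecules that must be interrogated roughly a hundred-fold, which is one reason targeted sequencing of selected regions is attractive for rare-event detection.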
In the prenatal care field, there is an urgent need to develop non-invasive assays for: common aneuploidies, such as trisomy 21, 18, or 13; small deletions, such as those arising from deletions in the Duchenne muscular dystrophy (DMD) gene; other small copy number anomalies, such as those responsible for autism; balanced translocations, to determine potential clinical manifestations; methylation changes, which may result in diseases associated with imprinting, such as Angelman's syndrome or Prader-Willi syndrome; triplet repeat changes, responsible for diseases such as Huntington's disease; and point mutations, such as those in the CFTR gene responsible for cystic fibrosis.
Single molecule sequencing (SMS) provides some unique attributes not available with ensemble-based strategies, such as those based on PCR, in terms of attaining the ambitious mandates set forth by the $1,000 genome project. For example, SMS (i) streamlines the sample processing pipeline to reduce the finished base sequencing cost (Bayley, H., “Sequencing Single Molecules of DNA,” Curr. Opin. Chem. Biol. 10:628-637 (2006)); (ii) eliminates the need for amplification and its associated biases as well as the reagents and the need for designing primers appropriate for difficult regions of the genome (i.e., highly repetitive regions); (iii) provides the ability to look directly for methylation sites, rare mutations and other sequence variations with high identification efficiency; (iv) provides high sensitivity for monitoring copy number variations; and (v) generates long reads that can reduce assembly demands, and sequence through high repeat regions.
One type of SMS approach involved synthesis of a polymerase-generated complementary DNA strand composed of fluorescently-labeled deoxynucleotides (Davis et al., “Rapid DNA Sequencing Based Upon Single Molecule Detection,” Genetic Analysis-Biomolecular Engineering 8:1-7 (1991); Goodwin et al., “Application of Single Molecule Detection to DNA Sequencing,” Nucleosides & Nucleotides 16:543-550 (1997); Goodwin et al., “DNA Sequencing by Single Molecule Detection,” Prog. Biophys. Mol. Biol. 65:SMI02-SMI02 (1996)). The complementary DNA strand is anchored to a microbead using a streptavidin:biotin linkage. Optical trapping is used to suspend the bead:DNA complex in a flow stream filled with a highly processive exonuclease, which sequentially clips the terminal mononucleotides (dNMPs) and sends them through an excitation laser beam, where each produces a burst of fluorescent photons whose color identifies the base. While this approach is conceptually simple, and in spite of the demonstration of one-color sequencing (Werner et al., “Progress Towards Single-Molecule DNA Sequencing: A One Color Demonstration,” J. Biotechnol. 102:1-14 (2003)), several challenges have been encountered, including the inability to build a complement using exclusively dye-modified dNTPs, diffusional misordering resulting from scaling issues, and impurity fluorescence reducing the signal-to-noise ratio during single-molecule detection (Demas et al., “Fluorescence Detection in Hydrodynamically Focused Sample Streams: Reduction of Diffusional Defocusing by Association of Analyte With High-Molecular Weight Species,” Appl. Spectroscopy 52:755-762 (1998) and Goodwin et al., “DNA Sequencing by Single Molecule Detection,” Prog. Biophys. Mol. Biol. 65:SMI02-SMI02 (1996)).
Recently, alternative fluorescence-based SMS strategies have been proposed that follow incorporation events of fluorescently-labeled dNTPs by polymerases and use zero-mode waveguides monitoring dNTPs labeled with spectrally distinct dyes phospholinked to the dNTPs (Eid et al., “Real-Time DNA Sequencing From Single Polymerase Molecules,” Science 323:133-138 (2009)). Another approach uses single DNA molecules arrayed onto a solid support with each incorporation event generating a fluorescence burst of photons (Braslavsky et al., “Sequence Information Can be Obtained From Single DNA Molecules,” Proc. Nat'l. Acad. Sci., U.S.A. 100:3660-3964 (2003)). While these are excellent examples of securing sequence information directly from single molecules, they share some common challenges, such as the need for fluorescent substrates, the large amount of spectral overlap between molecular systems, which generates cross-talk or cross-excitation, and the need for extensive optical hardware to read the resulting signatures.
To circumvent the requirement for fluorescence-based reads from SMS formats, nanopore technologies have been proposed to allow for the direct read of DNA sequence data from electrical signatures of mononucleotides comprising the target DNA, obviating the need for fluorescence (Akeson et al., “Microsecond Time-Scale Discrimination Among Polycytidylic Acid, Polyadenylic Acid, and Polyuridylic Acid as Homopolymers or as Segments Within Single RNA Molecules,” Biophys. J. 77:3227-3233 (1999); Deamer & Branton, “Characterization of Nucleic Acids by Nanopore Analysis,” Acc. Chem. Res. 35:817-825 (2002); Meller & Branton, “Single Molecule Measurements of DNA Transport Through a Nanopore,” Electrophoresis 23:2583-2591 (2002); Meller et al., “Voltage-Driven DNA Translocations Through a Nanopore,” Phys. Rev. Lett. 86:3435-3438 (2001); and Meller et al., “Rapid Nanopore Discrimination Between Single Polynucleotide Molecules,” Proc. Nat'l. Acad. Sci. U.S.A. 97:1079-1084 (2000)). In most studies, the nanopore is α-hemolysin, which is a proteinaceous membrane channel produced by the bacterium S. aureus. From the application standpoint, the use of this pore has several limitations: (1) its mechanical and chemical stability are, in many cases, inadequate; (2) it has a fixed pore size that allows transduction of only selected types of molecules; and (3) manufacturing high-density arrays of such nanopores can be problematic. These α-hemolysin limitations have led to the use of synthetic nanopores (Rhee & Burns, “Nanopore Sequencing Technology: Research Trends and Applications,” Trends Biotechnol. 24:580-586 (2006) and Storm et al., “Fabrication of Solid-State Nanopores With Single-Nanometer Precision,” Nat. Mater. 2:537-541 (2003)) that can be fabricated with 1-50 nm sizes in polymer or silicon nitride membranes using electron or ion beams.
The attractive feature of the synthetic nanopores is the ability to adopt different readout modalities, such as the use of transverse electrodes decorating the synthetic pore to monitor perturbations in the tunneling current or conductance changes (Lagerqvist et al., “Fast DNA Sequencing Via Transverse Electronic Transport,” Nano Lett. 6:779-782 (2006); Lagerqvist et al., “Influence of the Environment and Probes on Rapid DNA Sequencing Via Transverse Electronic Transport,” Biophys. J. 93:2384-2390 (2007); Zikic et al., “Characterization of the Tunneling Conductance Across DNA Bases,” Phys. Rev. E 74(1 Pt 1):011919 (2006); and Zwolak & Di Ventra, “Colloquium: Physical Approaches to DNA Sequencing and Detection,” Rev. Modern Physics 80:141-165 (2008)).
In principle, structural information of DNA, whether using a natural or synthetic nanopore, is obtained by deducing the identity of a nucleotide from the blockage current magnitude as an intact DNA molecule is moved through the pore. The advantages of this DNA sequencing approach include: (1) the ability to sequence large DNA fragments (≥50 kbp); (2) no requirement for amplification or sub-cloning techniques; (3) no requirement for fluorescently labeled deoxynucleotides or dideoxynucleotides; (4) small input DNA sample sizes, on the order of 1×10⁸ copies for whole genome sequencing; and (5) a rate of DNA sequence acquisition that could provide near real-time readout. Unfortunately, DNA sequencing directly from a nanopore has yet to be demonstrated.
There have been several reviews focused on the potential of nanopore technology for DNA sequencing, and, as these reviews point out, a number of challenges must be overcome to realize this platform (Branton et al., “The Potential and Challenges of Nanopore Sequencing,” Nat. Biotechnol. 26:1146-1153 (2008) and Zwolak & Di Ventra, “Colloquium: Physical Approaches to DNA Sequencing and Detection,” Rev. Modern Physics 80:141-165 (2008)). First, translocation through the pore is fast (1-20 μs per nucleotide), requiring the readout electronics to operate at bandwidths in the MHz range. Second, the readout resolution requires a pore thickness equal to or less than the single base spacing of DNA molecules, ~0.34 nm. Because the thickness of both synthetic and α-hemolysin pores is much larger (5-15 nm) than this spacing, multiple bases simultaneously reside within the pore. Even if nanopores could be fabricated with this prerequisite thickness, the effective electric field read region would extend approximately 1 pore diameter unit on either side of the pore (Liu et al., “The Effect of Translocating Cylindrical Particles on the Ionic Current Through a Nanopore,” Biophys. J. 92:1164-1177 (2007)). Third, arrays of nanopores must be produced reproducibly and in high volume, with the prerequisite dimensions and at low cost, to accommodate the intended application. Fourth, high quality genomic DNA must be extracted from a diverse array of samples (blood, tissue, bone marrow, urine, saliva, etc.) and then processed to produce DNA fragments (~50 kbp), which are used as the input for sequencing. The sample preparation and sequencing steps should be integrated into a single platform and operate in a basic turn-key mode to allow a broad user base.
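The first two challenges can be illustrated with back-of-the-envelope arithmetic on the figures quoted above (1-20 μs per nucleotide, ~0.34 nm base spacing, 5-15 nm pore thickness). The following is a rough sketch, not a design calculation: the function names are hypothetical, and the bandwidth estimate assumes only one independent sample per nucleotide, so practical readout electronics would likely need a higher bandwidth than the values computed here.

```python
BASE_SPACING_NM = 0.34  # single-base spacing of DNA quoted in the text

def min_bandwidth_hz(dwell_time_s: float) -> float:
    """Rough lower bound: one independent sample per nucleotide dwell time."""
    return 1.0 / dwell_time_s

def bases_in_pore(pore_thickness_nm: float) -> float:
    """Approximate number of bases spanning a pore of the given thickness."""
    return pore_thickness_nm / BASE_SPACING_NM

# Translocation times per nucleotide quoted in the text.
for dwell_us in (1, 20):
    khz = min_bandwidth_hz(dwell_us * 1e-6) / 1e3
    print(f"{dwell_us} us per nucleotide -> at least ~{khz:.0f} kHz")

# Pore thickness range quoted in the text.
for thickness_nm in (5, 15):
    print(f"{thickness_nm} nm pore -> ~{bases_in_pore(thickness_nm):.0f} bases in the pore")
```

With these inputs, the one-sample-per-base bound spans roughly 50 kHz to 1 MHz, and a 5-15 nm pore holds on the order of 15 to 44 bases at once, which is why blockage currents from an intact strand reflect several nucleotides rather than one.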
Readout resolution limitations can be mitigated if nucleotides are physically separated from each other while maintaining their original order following clipping from the DNA, for example through the use of an exonuclease enzyme (Davis et al., “Rapid DNA Sequencing Based on Single Molecule Detection,” In Los Alamos Science (1992)). This has been demonstrated to be feasible using a highly processive exonuclease enzyme, which sequentially clips individual nucleotides from an intact DNA fragment and directs these bases through an α-hemolysin nanopore fitted with a cyclodextrin collar (Wu et al., “Protein Nanopores With Covalently Attached Molecular Adapters,” J. Am. Chem. Soc. 129:16142-16148 (2007) and Clarke et al., “Continuous Base Identification for Single-Molecule Nanopore DNA Sequencing,” Nature Nanotechnol. 4:265-270 (2009)). Unfortunately, the single base identification efficiency using blockage currents is 93-98% (Astier et al., “Toward Single Molecule Sequencing: Direct Identification of Ribonucleoside and Deoxyribonucleoside 5′-Monophosphates by Using an Engineered Protein Nanopore Equipped With a Molecular Adaptor,” J. Am. Chem. Soc. 128:1705-1710 (2006)), and therefore blockage currents alone do not provide the sequencing accuracy required to identify mutational sites, for example. Also, salt conditions required for optimum exonuclease activity could not be matched to conditions required for high accuracy base identification, and thus the identification efficiency ranged from 90% to 99%. Therefore, additional base identification strategies must be considered.
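To see why a 93-98% single base identification efficiency falls short for mutation detection, consider how errors accumulate over a read. The sketch below is illustrative only: it assumes base-calling errors are independent and uniform along the read, and the function names and the read lengths of 100 and 1,000 bases are hypothetical choices not taken from the text.

```python
def expected_errors(per_base_accuracy: float, read_length: int) -> float:
    """Expected number of miscalled bases, assuming independent errors."""
    return (1.0 - per_base_accuracy) * read_length

def prob_error_free(per_base_accuracy: float, read_length: int) -> float:
    """Probability that every base in the read is called correctly."""
    return per_base_accuracy ** read_length

# Identification efficiencies quoted in the text for blockage currents.
for accuracy in (0.93, 0.98):
    errors = expected_errors(accuracy, 1000)
    clean = prob_error_free(accuracy, 100)
    print(f"accuracy {accuracy:.0%}: ~{errors:.0f} errors per 1,000 bases, "
          f"P(error-free 100-base read) = {clean:.4f}")
```

Under these assumptions, even the upper figure implies about 20 miscalls per 1,000 bases, an error rate far higher than the 1% to 0.01% variant frequencies that clinical applications need to resolve.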
The present invention overcomes these and other deficiencies in the art.