Assays for analysis of biological processes are exploited for a variety of desired applications. For example, monitoring the activity of key biological pathways can lead to a better understanding of the functioning of those systems as well as those factors that might disrupt the proper functioning of those systems. In fact, various disease states caused by operation or disruption of specific biological pathways are the focus of much medical research. By understanding these pathways, one can model approaches for affecting them to prevent the onset of the disease or mitigate its effects once manifested.
A typical example of the exploitation of biological process monitoring is in the area of pharmaceutical research and development. In particular, therapeutically relevant biological pathways, or individual steps or subsets of individual steps in those pathways, are often reproduced or modeled in in vitro systems to facilitate analysis. By observing the progress of these steps or whole pathways in the presence and absence of potential therapeutic compositions, e.g., pharmaceutical compounds or other materials, one can identify the ability of those compositions to affect the in vitro system, and potentially beneficially affect an organism in which the pathway is functioning in a detrimental way. By way of specific example, reversible methylation of the 5 position of cytosine by methyltransferases is one of the most widely studied epigenetic modifications. In mammals, 5-methylcytosine (5-MeC) frequently occurs at CpG dinucleotides, which often cluster in regions called CpG islands that are at or near transcription start sites. Methylation of cytosine in CpG islands can interfere with transcription factor binding and is associated with transcriptional repression and gene regulation. In addition, DNA methylation is known to be essential for mammalian development and has been associated with cancer and other disease processes. Epigenetic enhancer patterns have been identified in colon cancer cell lines, and a 5-hydroxymethylcytosine epigenetic marker has been identified in certain cell types in the brain, suggesting that it plays a role in epigenetic control of neuronal function (Akhtar-Zaidi, et al. (2012) Science 336(6082):736-739; and S. Kriaucionis, et al., Science 2009, 324(5929): 929-30, incorporated herein by reference in their entireties for all purposes). Further information on cytosine methylation and its impact on gene regulation, development, and disease processes is provided in the art, e.g., in A. Bird, Genes Dev 2002, 16, 6; M.
Gardiner-Garden, et al., J Mol Biol 1987, 196, 261; S. Saxonov, et al., Proc Natl Acad Sci USA 2006, 103, 1412; R. Jaenisch, et al., Nat Genet 2003, 33 Suppl, 245; E. Li, et al., Cell 1992, 69, 915; A. Razin, et al., Hum Mol Genet 1995, 4 Spec No, 1751; P. A. Jones, et al., Nat Rev Genet 2002, 3, 415; P. A. Jones, et al., Nat Genet 1999, 21, 163; and K. D. Robertson, Nat Rev Genet 2005, 6, 597, all of which are incorporated herein by reference in their entireties for all purposes. Further, a large number of other nucleotide modifications are known in the art that play biological roles in some capacity, and these include, without limitation, N6-methyladenosine, N3-methyladenosine, N7-methylguanosine, pseudouridine, thiouridine, isoguanosine, isocytosine, dihydrouridine, queuosine, wyosine, inosine, triazole, diaminopurine, and 2′-O-methyl derivatives of adenosine, cytidine, guanosine, and uridine.
In contrast to determining a human genome, mapping of the human methylome is a more complex task because the methylation status differs between tissue types, changes with age, and is altered by environmental factors (P. A. Jones, et al., Cancer Res 2005, 65, 11241, incorporated herein by reference in its entirety for all purposes). Comprehensive, high-resolution determination of genome-wide methylation patterns from a given sample has been challenging due to the sample preparation demands and short read lengths characteristic of current DNA sequencing technologies (K. R. Pomraning, et al., Methods 2009, 47, 142, incorporated herein by reference in its entirety for all purposes).
Bisulfite sequencing is the current method of choice for single-nucleotide resolution methylation profiling (S. Beck, et al., Trends Genet 2008, 24, 231; and S. J. Cokus, et al., Nature 2008, 452, 215, the disclosures of which are incorporated herein by reference in their entireties for all purposes). Treatment of DNA with bisulfite converts unmethylated cytosine, but not 5-MeC, to uracil (M. Frommer, et al., Proc Natl Acad Sci USA 1992, 89, 1827, incorporated herein by reference in its entirety for all purposes). The DNA is then amplified (which converts all uracils into thymines) and subsequently analyzed with various methods, including microarray-based techniques (R. S. Gitan, et al., Genome Res 2002, 12, 158, incorporated herein by reference in its entirety for all purposes) or 2nd-generation sequencing (K. H. Taylor, et al., Cancer Res 2007, 67, 8511; and R. Lister, et al., Cell 2008, 133, 523, both incorporated herein by reference in their entireties for all purposes). While bisulfite-based techniques have greatly advanced the analysis of methylated DNA, they also have several drawbacks. First, bisulfite sequencing requires a significant amount of sample preparation time (K. R. Pomraning, et al., supra). Second, the harsh reaction conditions necessary for complete conversion of unmethylated cytosine to uracil lead to degradation of DNA (C. Grunau, et al., Nucleic Acids Res 2001, 29, E65, incorporated herein by reference in its entirety for all purposes), and thus necessitate large starting amounts of the sample, which can be problematic for some applications.
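The conversion logic described above can be illustrated with a short sketch. It is a toy model only, not a description of any particular laboratory or software pipeline: it assumes a single strand, complete conversion, error-free amplification and sequencing, and a read that aligns to the reference without gaps. The function names, the example sequence, and the methylated positions are hypothetical and chosen purely for illustration.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by amplification on one strand.

    An unmethylated C is read out as T (C -> U -> T), while a methylated C
    (5-MeC) is protected and is still read as C. All other bases are unchanged.
    """
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def call_methylation(reference, converted_read):
    """Return positions where the reference has C and the read still shows C."""
    return {
        i
        for i, (ref, obs) in enumerate(zip(reference, converted_read))
        if ref == "C" and obs == "C"
    }


# Hypothetical example: 0-based positions 1 and 9 are methylated CpG cytosines.
reference = "ACGTCCGATCGA"
methylated = {1, 9}

read = bisulfite_convert(reference, methylated)
calls = call_methylation(reference, read)

print(read)           # converted read
print(sorted(calls))  # positions inferred to be methylated
```

In this idealized setting the inferred positions match the methylated positions exactly. Incomplete conversion or DNA degradation, as discussed above, would break that correspondence in practice.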
Furthermore, because bisulfite sequencing relies on either microarray or 2nd-generation DNA sequencing technologies for its readout of methylation status, it also suffers from the same limitations as do these methodologies. For array-based procedures, the reduction in sequence complexity caused by bisulfite conversion makes it difficult to design enough unique probes for genome-wide profiling (S. Beck, et al., supra). Most 2nd-generation DNA sequencing techniques employ short reads and thus have difficulties aligning to highly repetitive genomic regions (K. R. Pomraning, et al., supra). This is especially problematic, since many CpG islands reside in such regions. Given these limitations, bisulfite sequencing is also not well suited for de novo methylation profiling (S. Beck, et al., supra).
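The reduction in sequence complexity mentioned above can be made concrete by counting distinct subsequences of a fixed length (k-mers) before and after conversion. The following sketch is illustrative only. It assumes a fully unmethylated, uniformly random toy sequence, which real genomes are not, and the sequence length, seed, and value of k are arbitrary choices. After conversion the strand contains only A, G, and T, so the number of distinct k-mers cannot exceed 3^k, compared with 4^k for the original four-letter alphabet.

```python
import random


def convert_unmethylated(seq):
    """Model full bisulfite conversion of an unmethylated strand: every C reads as T."""
    return seq.replace("C", "T")


def distinct_kmers(seq, k):
    """Return the set of distinct length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


# Toy input: a random 20,000-base sequence (an assumption, not real genomic data).
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(20000))
converted = convert_unmethylated(genome)

k = 8
before = len(distinct_kmers(genome, k))
after = len(distinct_kmers(converted, k))

print("distinct 8-mers before conversion:", before)
print("distinct 8-mers after conversion: ", after)
print("upper bound after conversion (3**k):", 3 ** k)
```

Fewer distinct k-mers means fewer candidate probe sequences that are unique to one location. This is one way to see why probe design and short-read alignment both become harder after conversion.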
In another widely used technique, methylated DNA immunoprecipitation (MeDIP), an antibody against 5-MeC is used to enrich for methylated DNA sequences (M. Weber, et al., Nat Genet 2005, 37, 853, incorporated herein by reference in its entirety for all purposes). MeDIP has many advantageous attributes for genome-wide assessment of methylation status, but its base resolution is not as high as that of bisulfite treatment-based methods. In addition, it is hampered by the same limitations of current microarray and 2nd-generation sequencing technologies.
Research efforts aimed at increasing our understanding of the human methylome would benefit greatly from the development of a new methylation profiling technology that does not suffer from the limitations described above. Further, additional modifications are known to occur in human genetic material that are not detectable by the methods described above, e.g., hydroxymethylcytosine bases. Accordingly, there exists a need for improved techniques for detection of modifications in nucleic acid sequences, and particularly nucleic acid methylation.
Further, DNA is under constant stress from both endogenous and exogenous sources and is vulnerable to chemical modifications through different types of damage, including oxidation, alkylation, radiation damage, and hydrolysis. DNA base modifications resulting from these types of DNA damage are widespread and play important roles in physiological pathways and disease phenotypes (see, e.g., Geacintov, et al. (2010) The Chemical Biology of DNA Damage, Wiley-VCH Verlag GmbH & Co. KGaA; Kelley, M R (2011) DNA Repair in Cancer Therapy: Molecular Targets and Clinical Applications, Elsevier Science; and Preston, et al. (2011) Semin. Cancer Biol. 20:281-293, the disclosures of which are incorporated herein by reference in their entireties for all purposes). Examples include 8-oxoguanine, 8-oxoadenine (oxidative damage; aging, Alzheimer's, Parkinson's), 1-methyladenine, 6-O-methylguanine (alkylation; gliomas and colorectal carcinomas), benzo[a]pyrene diol epoxide (BPDE), pyrimidine dimers (adduct formation; smoking, industrial chemical exposure, UV light exposure; lung and skin cancer), and 5-hydroxycytosine, 5-hydroxyuracil, 5-hydroxymethyluracil, and thymine glycol (ionizing radiation damage; chronic inflammatory diseases, prostate, breast and colorectal cancer). Currently, these and other products of DNA damage are detected using bulk measurements including chromatographic techniques, polymerase chain reaction assays, the Comet assay, mass spectrometry, electrochemistry, radioactive labeling and immunochemical methods (see, e.g., Kumari, et al. (2008) EXCLI J. 7:44-62, incorporated herein by reference in its entirety for all purposes). Sequencing individual DNA molecules would be beneficial for mapping base damage, which can occur at random DNA template positions.
Typically, modeled biological systems rely on bulk reactions that ascertain general trends of biological reactions and provide indications of how such bulk systems react to different effectors. While such systems are useful as models of bulk reactions in vivo, a substantial amount of information is lost in the averaging of these bulk reaction results. In particular, the activity of and effects on individual molecular complexes cannot generally be teased out of such bulk data collection strategies.
Single-molecule real-time analysis of nucleic acid synthesis has been shown to provide powerful advantages over nucleic acid synthesis monitoring that is commonly exploited in sequencing processes. In particular, by concurrently monitoring the synthesis process of nucleic acid polymerases as they work in replicating nucleic acids, one gains advantages of a system that has been perfected over millions of years of evolution. Specifically, the natural DNA synthesis processes provide the ability to replicate whole genomes in extremely short periods of time, and do so with an extremely high level of fidelity to the underlying template being replicated.