1. Technical Field
The present disclosure relates to compositions and methods for accurately detecting mutations using sequencing and, more particularly, to uniquely tagging double-stranded nucleic acid molecules such that sequence data obtained for a sense strand can be linked to sequence data obtained for the anti-sense strand when both are obtained via massively parallel sequencing methods.
2. Description of Related Art
Detection of spontaneous mutations (e.g., substitutions, insertions, deletions, duplications), or even induced mutations, that occur randomly throughout a genome can be challenging because these mutational events are rare and may exist in one or only a few copies of DNA. The most direct way to detect mutations is by sequencing, but the available sequencing methods are not sensitive enough to detect rare mutations. For example, mutations that arise de novo in mitochondrial DNA (mtDNA) will generally be present in only a single copy of mtDNA, which means these mutations are not easily found, since a mutation must be present in as much as 10-25% of a population of molecules to be detected by sequencing (Jones et al., Proc. Nat'l. Acad. Sci. U.S.A. 105:4283-88, 2008). As another example, the spontaneous somatic mutation frequency in genomic DNA has been estimated to be as low as 1×10⁻⁸ and 2.1×10⁻⁶ in human normal and cancerous tissues, respectively (Bielas et al., Proc. Nat'l. Acad. Sci. U.S.A. 103:18238-42, 2006).
One improvement in sequencing has been to take individual DNA molecules and amplify each molecule by, for example, polymerase chain reaction (PCR) and digital PCR. Indeed, massively parallel sequencing represents a particularly powerful form of digital PCR because multiple millions of template DNA molecules can be analyzed one by one. However, the amplification of single DNA molecules prior to or during sequencing by PCR and/or bridge amplification suffers from the inherent error rate of the polymerases employed for amplification, and spurious mutations generated during amplification may be misidentified as spontaneous mutations from the original (endogenous unamplified) nucleic acid. Similarly, DNA templates damaged during preparation (ex vivo) may be amplified and incorrectly scored as mutations by massively parallel sequencing techniques. Again, using mtDNA as an example, experimentally determined mutation frequencies are strongly dependent on the accuracy of the particular assay being used (Kraytsberg et al., Methods 46:269-73, 2008). Such assay-dependent discrepancies suggest that the spontaneous mutation frequency of mtDNA is either below, or very close to, the detection limit of these technologies. Massively parallel sequencing cannot generally be used to detect rare variants because of the high error rate associated with the sequencing process. One process using bridge amplification and sequencing by synthesis has shown an error rate that varies from about 0.06% to 1%, depending on various factors including read length, base-calling algorithms, and the type of variants detected (see Kinde et al., Proc. Nat'l. Acad. Sci. U.S.A. 108:9530-5, 2011).
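The scale of the problem can be illustrated with a rough back-of-the-envelope calculation using the figures cited above. The sketch below is only an illustration under simplifying assumptions that are not part of the cited studies: every sequencing or amplification error is assumed to be scored as a variant, the error rate is treated as uniform across positions, and each base is counted once. The function names are hypothetical.

```python
def expected_calls(bases_sequenced, mutation_frequency, error_rate):
    """Return (expected true mutations, expected spurious errors).

    Simplifying assumptions (illustrative only): errors are uniform
    per base and every error would be scored as a variant.
    """
    true_mutations = bases_sequenced * mutation_frequency
    spurious_errors = bases_sequenced * error_rate
    return true_mutations, spurious_errors


def errors_per_true_mutation(mutation_frequency, error_rate):
    """Ratio of spurious errors to genuine mutations per base sequenced."""
    return error_rate / mutation_frequency


if __name__ == "__main__":
    low_error, high_error = 0.0006, 0.01   # about 0.06% to 1% per base
    normal, cancer = 1e-8, 2.1e-6          # mutation frequencies cited above

    for label, freq in (("normal tissue", normal), ("cancerous tissue", cancer)):
        for err in (low_error, high_error):
            ratio = errors_per_true_mutation(freq, err)
            print(f"{label}, error rate {err:.2%}: "
                  f"~{ratio:,.0f} spurious errors per true mutation")
```

Under these assumptions, even the lowest cited error rate (about 0.06%) exceeds the estimated mutation frequency of normal tissue by a factor of roughly 60,000, so genuine mutations would be vastly outnumbered by artifacts.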