Genetic mutations underlie many aspects of life and death, through evolution and disease, respectively. Accordingly, their measurement is critical to several fields of research. Luria and Delbrück's classic fluctuation analysis is a prototypic example of the insights into biological processes that can be gained simply by counting the number of mutations in carefully controlled experiments (1). Counting de novo mutations in humans, i.e., those not present in their parents, has similarly led to new insights into the rate at which our species can evolve (2, 3). Likewise, counting genetic or epigenetic changes in tumors can inform fundamental issues in cancer biology (4). Mutations lie at the core of current problems in managing patients with viral diseases such as AIDS and hepatitis by virtue of the drug resistance they can cause (5, 6). Detecting such mutations, particularly before they become dominant in the viral population, will likely be essential to optimize therapy. Detection of donor DNA in the blood of organ transplant patients is an important indicator of graft rejection, and detection of fetal DNA in maternal plasma can be used for non-invasive prenatal diagnosis (7, 8). In neoplastic diseases, which are all driven by somatic mutations, the applications of rare mutant detection are manifold: they can help identify residual disease at surgical margins or in lymph nodes, follow the course of therapy when assessed in plasma, and perhaps identify patients with early, surgically curable disease when evaluated in stool, sputum, plasma, and other bodily fluids (9-11).
These examples highlight the importance of identifying rare mutations for both basic and clinical research. Accordingly, innovative ways to assess them have been devised over the years. The first methods were biologic assays based on prototrophy or on resistance to viral infection or drugs, as well as biochemical assays (1, 12-18). Molecular cloning and sequencing provided a new dimension to the field, as they allowed the type of mutation, rather than simply its presence, to be identified (19-24). Some of the most powerful of these newer methods are based on Digital PCR, in which individual molecules are assessed one by one (25). Digital PCR is conceptually identical to the analysis of individual clones of bacteria, cells, or viruses, but is performed entirely in vitro with defined, inanimate reagents. Several implementations of Digital PCR have been described, including the analysis of molecules arrayed in multi-well plates, in polonies, in microfluidic devices, and in water-in-oil emulsions (25-30). In each of these technologies, mutant templates are identified through their binding to oligonucleotides specific for the potentially mutant base.
Massively parallel sequencing represents a particularly powerful form of Digital PCR in that hundreds of millions of template molecules can be analyzed one by one. Its advantage over conventional Digital PCR methods is that multiple bases can be queried sequentially, easily, and automatically. However, massively parallel sequencing cannot generally be used to detect rare variants because of the high error rate associated with the sequencing process. For example, with the commonly used Illumina sequencing instruments, this error rate varies from ~1% (31, 32) to ~0.05% (33, 34), depending on factors such as the read length (35), the use of improved base-calling algorithms (36-38), and the type of variants detected (39). Some of these errors presumably result from mutations introduced during template preparation, during the pre-amplification steps required for library preparation, and during further solid-phase amplification on the instrument itself. Other errors are due to base mis-incorporation during sequencing and to base-calling errors. Advances in base calling can enhance confidence (e.g., (36-39)), but instrument-based errors are still limiting, particularly in clinical samples, wherein the mutation prevalence can be 0.01% or less (11). In the work described below, we show how templates can be prepared, and how the sequencing data obtained from them can be interpreted more reliably, so that relatively rare mutations can be identified with commercially available instruments.
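To make the scale of the problem concrete, the Python sketch below estimates how many mutant calls at a single queried position would arise from true mutant templates versus sequencing errors, using the error rates (~1% and ~0.05%) and the mutant prevalence (0.01%) quoted above. The function name `expected_calls`, the one-million-molecule input, and the worst-case assumption that every error on a wild-type template is scored as the mutant base are hypothetical simplifications for illustration, not part of the method described here.

```python
from dataclasses import dataclass


@dataclass
class ExpectedCalls:
    """Expected numbers of mutant calls at one queried base position."""
    true_mutant: float     # calls arising from genuinely mutant templates
    error_derived: float   # calls arising from errors on wild-type templates

    @property
    def fraction_true(self) -> float:
        """Fraction of all mutant calls that reflect real mutations."""
        total = self.true_mutant + self.error_derived
        return self.true_mutant / total if total else 0.0


def expected_calls(n_molecules: int, mutant_fraction: float,
                   error_rate: float) -> ExpectedCalls:
    """Hypothetical helper: expected true vs. error-derived mutant calls.

    Worst-case assumption (illustrative only): every sequencing error at the
    queried position on a wild-type template is scored as the mutant base.
    """
    true_mutant = n_molecules * mutant_fraction
    error_derived = n_molecules * (1 - mutant_fraction) * error_rate
    return ExpectedCalls(true_mutant, error_derived)


if __name__ == "__main__":
    # Error rates from the text (~1% and ~0.05%); mutant prevalence of 0.01%.
    for rate in (0.01, 0.0005):
        calls = expected_calls(1_000_000, 0.0001, rate)
        print(f"error rate {rate:.2%}: {calls.true_mutant:.0f} true vs "
              f"{calls.error_derived:.0f} error-derived calls "
              f"({calls.fraction_true:.1%} of calls are real)")
```

Under these assumptions, even the lower ~0.05% error rate yields roughly five error-derived calls for every true mutant call. If errors were instead spread evenly across the three alternative bases, the error-derived count would fall by about a factor of three but would still outnumber the true mutants.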
There is a continuing need in the art to improve the sensitivity and accuracy of sequence determinations for investigative, clinical, forensic, and genealogical purposes.