Identifying sequence variation within complex populations is an actively growing field, particularly with the advent of large-scale parallel nucleic acid sequencing. However, large-scale parallel sequencing has a significant limitation: the inherent error frequency of commonly used techniques is larger than the frequency of many of the actual sequence variations in the population. For example, error rates of 0.1-1% have been reported in standard high-throughput sequencing. As a result, detection of rare sequence variants suffers from high false positive rates when the frequency of the variants is low, such as at or below the error rate.
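The relationship between error rate and false positives can be illustrated with a rough calculation. The sketch below is only an illustration under simplifying assumptions that are not taken from the text above: sequencing errors are treated as independent substitutions, each erroneous call is assumed equally likely to be any of the three other bases, and the function name and parameters are hypothetical.

```python
def expected_variant_support(depth, variant_freq, error_rate):
    """Estimate how many reads showing a variant base are real versus erroneous.

    Assumptions (illustrative only): errors are independent substitutions,
    and a miscalled base is equally likely to be any of the 3 other bases.
    """
    # Reads from true variant molecules that are called correctly.
    true_reads = depth * variant_freq * (1 - error_rate)
    # Reads from non-variant molecules miscalled as this specific variant base.
    false_reads = depth * (1 - variant_freq) * error_rate / 3
    # Fraction of variant-supporting reads that reflect a real variant.
    fraction_real = true_reads / (true_reads + false_reads)
    return true_reads, false_reads, fraction_real


# A variant at 0.1% frequency, sequenced to a depth of 10,000 reads
# with a 1% error rate (the upper end of the range cited above).
true_reads, false_reads, fraction_real = expected_variant_support(
    depth=10_000, variant_freq=0.001, error_rate=0.01
)
print(f"true: {true_reads:.1f}, erroneous: {false_reads:.1f}, "
      f"fraction real: {fraction_real:.2f}")
```

Under these assumptions, a variant present at 0.1% yields about 10 genuine supporting reads against about 33 erroneous ones at the same position, so fewer than one in four reads showing the variant base reflects a real variant.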
There are many reasons for detecting rare sequence variants. For example, detecting rare characteristic sequences can be used to identify and distinguish harmful environmental contaminants, such as bacterial taxa. A common way of characterizing bacterial taxa is to identify differences in highly conserved sequences, such as rRNA sequences. However, typical sequencing-based approaches face challenges relating to the sheer number of different genomes in a given sample and the degree of homology between members, presenting a complex problem for already laborious procedures. Improved procedures would have the potential to enhance contamination detection in a variety of settings. For example, the clean rooms used to assemble components of satellites and other spacecraft can be surveyed with the present systems and methods to understand what microbial communities are present. Such surveys can be used to develop better decontamination and cleaning techniques that prevent the introduction of terrestrial microbes to other planets or samples thereof, or to develop methodologies that distinguish data generated by putative extraterrestrial microorganisms from data generated by contaminating terrestrial microorganisms. Food monitoring applications include periodic testing of production lines at food processing plants, surveying slaughterhouses, and inspecting the kitchens and food storage areas of restaurants, hospitals, schools, correctional facilities, and other institutions for foodborne pathogens. Water reserves and processing plants may be similarly monitored.
Rare variant detection can also be important for the early detection of pathological mutations. For instance, detection of cancer-associated point mutations in clinical samples can improve the identification of minimal residual disease during chemotherapy and detect the appearance of tumor cells in relapsing patients. The detection of rare point mutations is also important for assessing exposure to environmental mutagens, monitoring endogenous DNA repair, and studying the accumulation of somatic mutations in aging individuals. Additionally, more sensitive methods to detect rare variants can enhance prenatal diagnosis by enabling the characterization of fetal cells present in maternal blood.