Recent years have seen a dramatic change in the ability of science to comprehend vast amounts of data. Pioneering technologies such as nucleic acid arrays allow scientists to delve into the world of genetics in far greater detail than ever before. Exploration of genomic DNA has long been a dream of the scientific community. Held within the complex structures of genomic DNA lies the potential to identify, diagnose, or treat diseases such as cancer, Alzheimer's disease, or alcoholism. Exploitation of genomic information from plants and animals may also provide answers to the world's food distribution problems.
Genome-wide assays, however, must contend with the complexity of genomes; the human genome, for example, is estimated to have a complexity of 3×10⁹ base pairs. Because of their abundance, single nucleotide polymorphisms (SNPs) have generally emerged as the marker of choice for genome-wide association studies and genetic linkage studies.
More recently, an abundance of indels has been discovered in the genome, for example by the 1000 Genomes Project. See, e.g., The 1000 Genomes Project Consortium, “An integrated map of genetic variation from 1,092 human genomes,” Nature, 491, 56-65 (November 2012), which is hereby incorporated by reference in its entirety. Indels refer to the insertion or deletion of generally up to about 50 base pairs (bps), often 10 or fewer bps, at a given genomic location. Larger insertions or deletions, such as those associated with duplications, deletions, inversions, and translocations involving hundreds to thousands of bps, are usually referred to as structural variations (SVs).
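The size-based distinction between SNPs, indels, and SVs described above can be illustrated with a minimal sketch. It assumes that a variant is given as a reference/alternate allele pair (in the style of a VCF record) and treats "about 50 bps" as a hard cutoff; the function name `classify_variant` and the `INDEL_MAX_BP` constant are hypothetical and are not part of any method described in this disclosure.

```python
"""Hypothetical sketch: classify a simple variant by its REF/ALT allele lengths."""

# Assumed hard cutoff; the text above says indels are "generally up to about 50 bps".
INDEL_MAX_BP = 50


def classify_variant(ref: str, alt: str, indel_max_bp: int = INDEL_MAX_BP) -> str:
    """Return 'SNP', 'indel', 'SV', or 'other' for one REF/ALT allele pair.

    Only the length difference between the alleles is considered, so
    balanced rearrangements such as inversions or translocations are not
    detected by this sketch.
    """
    ref, alt = ref.upper(), alt.upper()

    # A single differing base at one position is treated as a SNP.
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNP"

    # Number of inserted or deleted bases implied by the allele pair.
    size = abs(len(ref) - len(alt))
    if size == 0:
        # Identical alleles or a same-length multi-base substitution.
        return "other"
    if size <= indel_max_bp:
        return "indel"
    return "SV"


if __name__ == "__main__":
    print(classify_variant("A", "G"))            # single-base substitution
    print(classify_variant("ATG", "A"))          # 2-bp deletion
    print(classify_variant("A", "A" + "T" * 100))  # 100-bp insertion
```

Because the cutoff is a parameter, a reader who prefers a different indel/SV boundary can pass another value for `indel_max_bp` without changing the classification logic.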
To date, there have been few high-throughput screening methods or assays for detecting or identifying any but the simplest indels, such as single-base indels. Previous work to detect more complicated indels has included, for example, attempts to use next-generation sequencing data to make indel calls. See, e.g., Albers et al., “Dindel: Accurate indel calls from short-read data,” Genome Res., 21(6): 961-973 (2011), which is hereby incorporated by reference in its entirety.
All documents, i.e., publications and patent applications, cited in this disclosure, including the foregoing, are incorporated by reference herein in their entireties for all purposes to the same extent as if each of the individual documents were specifically and individually indicated to be so incorporated by reference herein in its entirety.