Genomic sequencing has advanced considerably in the last few years. Methods can now sequence a sample within a relatively short time period (e.g., days) and at relatively low cost (less than $10,000). This reduced cost has opened up the possibility of using gene sequencing for diagnostic purposes, as well as in more areas of research. To be suitable for diagnostic purposes, detailed information is needed, such as knowledge of the haploid genome (as opposed to just the diploid genome) and of structural variations.
Typical sequencing techniques provide information only about the diploid genome, e.g., the genotype at many locations. However, for two heterozygous genomic locations, it is not known which haplotype (e.g., chromosome copy) each of the alleles is on, and therefore which alleles occur together on the same haplotype. The specific sequence of each of the two haplotypes can be critical information. However, short sequence data (e.g., 35 bp from each end of a short fragment) does not lend itself to easily identifying the haploid genome.
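The ambiguity described above can be illustrated with a minimal sketch. It assumes that unphased genotypes are given as one pair of alleles per genomic location; the function name `consistent_phasings` and the example alleles are hypothetical and serve only as illustration. The sketch enumerates every pair of haplotypes that is consistent with the same diploid genotype data.

```python
from itertools import product


def consistent_phasings(genotypes):
    """Enumerate the haplotype pairs consistent with unphased genotypes.

    genotypes: list of (allele_a, allele_b) tuples, one per genomic location,
    in genomic order. The order of the two alleles within a tuple carries no
    phase information (this input format is an assumption of the sketch).
    Returns a set of frozensets, each holding the haplotypes of one phasing.
    """
    phasings = set()
    # For each location, choose which allele goes on the first haplotype;
    # the other allele goes on the second haplotype.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = tuple(g[c] for g, c in zip(genotypes, choice))
        hap2 = tuple(g[1 - c] for g, c in zip(genotypes, choice))
        # A frozenset makes (hap1, hap2) and (hap2, hap1) the same phasing.
        phasings.add(frozenset((hap1, hap2)))
    return phasings


# Two heterozygous locations: A/G at the first, C/T at the second.
two_sites = [("A", "G"), ("C", "T")]
for phasing in consistent_phasings(two_sites):
    print(sorted("".join(h) for h in phasing))
```

For the two heterozygous locations in this example, genotype data alone cannot distinguish the haplotype pair AC/GT from the pair AT/GC. Under the same assumptions, the number of candidate phasings doubles with each additional heterozygous location.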
To make the identification of the haploid genome even more difficult, the genome of the sample being tested can have structural variations (e.g., insertions or deletions), which can also be of critical importance. Discovering a structural variation of a genome relative to a general or localized population can provide valuable diagnostic and research information. For example, structural variations can be the result of disease, such as cancer, or can lead to a greater likelihood of cancer. Besides disease identification, accurate identification of structural variation can be important for other reasons, such as accurately tracking the heritage of a group of people, as the rearrangement might have occurred several generations previously.
In the last few years, structural rearrangements have emerged as an important driver of unchecked growth in many solid tumors. However, there are few easy-to-use, high-throughput methods for detecting these rearrangements on a clinical scale. Current next-generation sequencing (NGS) methods rely on paired-end sequencing of relatively short reads (<200 bp) to detect structural rearrangements, a class of structural variations (SVs). While these methods have successfully detected some SVs, they perform poorly in regions of the genome containing repetitive elements, which are common hotspots of genomic rearrangements. These mate-pair methods can also be cumbersome, require high coverage, and lose utility for long translocations.
The ideal method for detecting structural rearrangements would be a long single-molecule sequencing technology that spans the chromosome breakpoint, which usually involves long repeats, and that contains sufficient unique flanking sequence to allow for accurate mapping. Currently, there are no commercial technologies that can achieve this on a scale and at a cost that is meaningful for genetic analysis in humans.
It is therefore desirable to provide methods, systems, and apparatuses that accurately identify long DNA fragments and structural variations in a genome.