The Human Genome Project has resulted in a remarkable reduction in sequencing costs, from about $10 to less than $0.00001 per finished base. Exome sequencing can now be routinely used in both research and clinical settings for the detection of inherited or acquired mutations related to disease, and the FDA has listed over 100 drugs that have genotype information on their labels. In addition, the use of whole genome sequencing (WGS) has become widespread. However, current nucleic acid sequencing technology can be limited by read length, and this limitation can severely restrict the feasibility and utility of WGS for many studies. Namely, the read length of these “Next-Generation Sequencing” (NGS) technologies can be relatively short. The industry standard for sequencing may arguably be the Illumina HiSeq2500, which can sequence paired 150-base reads. With this read length, whole genome re-sequencing studies can generally be quite useful for identifying single nucleotide variants (SNVs); however, such short reads can be notoriously unreliable for identifying large insertions/deletions (indels) as well as structural variants.
In addition, it can often be difficult to phase variants using short reads without considerable additional experimentation. Thus, many clinical applications require, or could benefit from, long-read sequencing.
Current technologies that can generate long reads have low accuracy and low throughput, and are costly. Therefore, they are not viable options for whole genome sequencing. Finally, other sequencing technologies do not provide detailed sequence information.
To address these issues, the methods, compositions, systems and kits described herein are provided to produce very long reads, i.e., reads in the megabase range, to accurately identify many, if not all, genetic variants (e.g., single nucleotide polymorphisms, insertions/deletions, polyploidy, transpositions, repeats and/or structural variants), and to phase any identified variant to the appropriate homologous chromosome.