Next-generation sequencing (NGS) can be utilized for a wide variety of life science applications, such as diagnostics, epidemiology, and forensics. Widely used NGS technologies commonly involve sequencing of short reads (e.g., 100 nt reads), which can then be mapped to a reference genome. While such technologies can be useful for the detection of sequence variants (e.g., mutations), they are also associated with certain limitations. For example, NGS can be used to sequence genomes or genomic regions of diploid organisms. A genome of a diploid organism can comprise sets of chromosome pairs, wherein each chromosome pair comprises a maternal chromosome and a paternal chromosome. For example, a diploid organism can inherit a maternal allele and a paternal allele at a single locus. Thus, alleles detected at a plurality of loci which map to a particular chromosome might exist within a single chromosome (maternal or paternal) of a chromosome pair, or across both chromosomes of a chromosome pair. While many NGS platforms are useful for detecting alleles at a plurality of loci, they currently do not provide phasing and/or haplotype information, e.g., to distinguish whether alleles detected at a plurality of loci that map to a particular chromosome are co-located on the same chromosome or are located on separate chromosomes in a chromosome pair. Determining whether a plurality of alleles are co-located on the same (maternal or paternal) chromosome or are located on different chromosomes can be useful for a variety of reasons, as discussed further below.
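The phasing ambiguity described above can be illustrated with a minimal sketch. It assumes a hypothetical input format (one unordered pair of alleles per locus) and a hypothetical function name, `possible_phasings`; neither comes from any particular NGS tool. The sketch simply enumerates every pair of haplotypes that is consistent with the same unphased genotype calls.

```python
from itertools import product


def possible_phasings(genotypes):
    """Return every distinct pair of haplotypes consistent with unphased genotypes.

    genotypes: list of (allele_1, allele_2) tuples, one per locus, in
    chromosomal order. This input format is an assumption for illustration.
    """
    phasings = set()
    # For each locus, choose which of the two alleles sits on chromosome 1.
    for flips in product((False, True), repeat=len(genotypes)):
        hap1 = tuple(b if flip else a for (a, b), flip in zip(genotypes, flips))
        hap2 = tuple(a if flip else b for (a, b), flip in zip(genotypes, flips))
        # A frozenset ignores which chromosome is labeled 1 and which 2.
        phasings.add(frozenset((hap1, hap2)))
    return phasings


# Two heterozygous loci that map to the same chromosome (hypothetical alleles).
unphased = [("A", "G"), ("C", "T")]
for pair in possible_phasings(unphased):
    print(sorted(pair))
```

With two heterozygous loci the sketch reports two arrangements: the alleles A and C share a chromosome, or A and T do. Unphased genotype calls alone cannot distinguish between them, and with n heterozygous loci the number of candidate arrangements grows to 2^(n-1).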
The pattern of alleles within each individual chromosome can be referred to as a haplotype. Haplotyping can have many diagnostic and clinical applications. For example, two inactivating mutations at different loci within a single gene might be of little or no consequence if present on the same individual chromosome (i.e., a chromosome of either maternal or paternal origin), because the copy of the gene on the other chromosome will remain functional. On the other hand, if one of the inactivating mutations is present in the maternal chromosome and the other in the paternal chromosome, there can be no functional copy of the gene product, which can result in a negative phenotype (e.g., non-viability or increased risk for disease). Haplotyping can also be used to predict risk or susceptibility to specific genetic diseases, as many genetic associations are tied to haplotypes. For example, the various haplotypes of the human leukocyte antigen (HLA) system can be associated with genetic diseases ranging from autoimmune disease to cancers.
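The two-mutation example above can be expressed as a short sketch. It assumes a deliberately simplified model in which a gene copy is functional only if it carries no inactivating mutation; the function name `functional_copies` and the mutation labels are hypothetical.

```python
def functional_copies(maternal_mutations, paternal_mutations):
    """Count gene copies that carry no inactivating mutation.

    Simplified model (an assumption for illustration): any inactivating
    mutation on a chromosome abolishes the function of that gene copy.
    """
    copies = (maternal_mutations, paternal_mutations)
    return sum(1 for mutations in copies if not mutations)


# Same two mutations, different phase (hypothetical mutation labels).
same_chromosome = functional_copies({"mut1", "mut2"}, set())
different_chromosomes = functional_copies({"mut1"}, {"mut2"})

print(same_chromosome)        # 1: the paternal copy is untouched
print(different_chromosomes)  # 0: each copy carries one mutation
```

Both cases yield identical unphased genotype calls (heterozygous at both loci), yet only the second leaves no functional copy. This is why the phase of the two mutations, and not merely their presence, can determine the phenotype.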
Another instance in which phasing information can be useful is distinguishing between functional genes and their non-functional pseudogene counterparts within the genome. One well-known functional gene/pseudogene pair is SMN1 and SMN2, which differ in sequence by only five nucleotides over many kb of sequence, yet one of the nucleotide differences renders the SMN2 gene almost completely non-functional. Using short-read sequencing, a mutation may be found in one of the two genes, but unless the mutation happens to lie within a sequencing read that also covers one of the known nucleotide differences between SMN1 and SMN2, it will be difficult to know which of the genes (the functional gene or the non-functional pseudogene) is mutated.
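The condition described above, that a single read must cover both the mutation and a site that distinguishes the two genes, can be sketched as a simple interval check. The function name `read_can_assign_gene` and all coordinates below are hypothetical and chosen only for illustration; they are not actual SMN1 or SMN2 positions.

```python
def read_can_assign_gene(read_start, read_length, mutation_pos, distinguishing_positions):
    """Return True if one read covers the mutation and a distinguishing site.

    Coordinates are 0-based positions on a shared alignment of the two genes;
    the read covers the half-open interval [read_start, read_start + read_length).
    """
    read_end = read_start + read_length

    def covered(pos):
        return read_start <= pos < read_end

    return covered(mutation_pos) and any(covered(p) for p in distinguishing_positions)


# Hypothetical 100 nt read starting at position 1000, mutation at position 1050.
print(read_can_assign_gene(1000, 100, 1050, [1200, 5300]))  # False
print(read_can_assign_gene(1000, 100, 1050, [1080, 5300]))  # True
```

When only a handful of distinguishing sites are spread over many kb, most 100 nt reads that contain a given mutation will fail this check, which is the difficulty the paragraph describes.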
Current NGS methods can employ short-read sequencing to query regions of variable DNA sequence (e.g., polymorphisms) interspersed within regions of conserved DNA sequence. As significant blocks of conserved sequence can be interspersed between the variable regions, a single short read often cannot span two variable regions, and short-read sequencing therefore does not lend itself to phasing analysis. Although methods have been developed to obtain phasing information, these methods (for example, Sanger sequencing and subcloning) can be labor intensive and/or costly.
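The effect of conserved blocks on phasing can be illustrated with a rough sketch that groups variable sites into blocks whose neighboring sites could, in principle, be covered by one read. The function name `linkable_blocks`, the positions, and the simplifying assumption that two sites can be linked only when a single read covers both are all hypothetical choices made for illustration.

```python
def linkable_blocks(variant_positions, read_length):
    """Group variant positions into blocks linkable by single reads.

    Simplifying assumption: two neighboring sites can be linked only if one
    read of `read_length` nt can cover both, i.e. they are fewer than
    `read_length` positions apart.
    """
    blocks = []
    for pos in sorted(variant_positions):
        if blocks and pos - blocks[-1][-1] < read_length:
            # Close enough to the previous site to share a read.
            blocks[-1].append(pos)
        else:
            # A conserved stretch longer than a read starts a new block.
            blocks.append([pos])
    return blocks


# Hypothetical variable sites separated by long conserved stretches.
sites = [100, 150, 5000, 5040, 20000]
print(linkable_blocks(sites, read_length=100))
# [[100, 150], [5000, 5040], [20000]]
```

In this toy example, alleles within a block could be phased relative to one another, but nothing connects one block to the next, so chromosome-scale haplotypes remain unresolved.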
There is a need for improved NGS methods that provide phasing information. Such methods can provide a highly parallel platform for performing multiple sequencing reactions from the same immobilized templates. Methods described herein fulfill this need.