Rapid progress in DNA shotgun sequencing technologies has enabled systematic identification of the genetic variants of an individual (Wheeler et al., Nature 452, 872-876 (2008); Pushkarev et al., Nature Biotechnology 27, 847-850 (2009); Kitzman et al., Science Translational Medicine 4, 137ra176 (2012); and Levy et al., PLoS Biology 5, e254 (2007)). However, as the human genome consists of two homologous sets of chromosomes, understanding the true genetic makeup of an individual requires delineation of the maternal and paternal copies, or haplotypes, of the genetic material. The utility of obtaining a haplotype in an individual is several-fold. First, haplotypes are useful clinically in predicting outcomes for donor-host matching in organ transplantation (Crawford et al., Annual Review Of Medicine 56, 303-320 (2005) and Petersdorf et al., PLoS Medicine 4, e8 (2007)) and are increasingly used as a means to detect disease associations (Studies et al., Nature 447, 655-660 (2007); Cirulli et al., Nature Reviews. Genetics 11, 415-425 (2010); and Ng et al., Nature Genetics 42, 30-35 (2010)). Second, in genes that show compound heterozygosity, haplotypes provide information as to whether two deleterious variants are located on the same or different alleles, greatly impacting the prediction of whether inheritance of these variants is deleterious (Musone et al., Nature Genetics 40, 1062-1064 (2008); Erythematosus et al., Nature Genetics 40, 204-210 (2008); and Zschocke, Journal of Inherited Metabolic Disease 31, 599-618 (2008)). In complex genomes such as the human genome, compound heterozygosity may involve genetic or epigenetic variations at non-coding cis-regulatory sites located far from the genes they regulate (Sanyal et al., Nature 489, 109-113 (2012)), underscoring the importance of obtaining chromosome-span haplotypes. Third, haplotypes from groups of individuals have provided information on population structure (International HapMap, C. 
et al., Nature 449, 851-861 (2007); Genomes Project, C. et al., Nature 467, 1061-1073 (2010); and Genomes Project, C. et al., Nature 491, 56-65 (2012)), and the evolutionary history of the human race (Meyer et al., Science 338, 222-226 (2012)). Lastly, recently described widespread allelic imbalances in gene expression suggest that genetic or epigenetic differences between alleles may contribute to quantitative differences in expression (Gimelbrant et al., Science 318, 1136-1140 (2007); Kong et al., Nature 462, 868-874 (2009); Xie et al., Cell 148, 816-831 (2012); and McDaniell et al., Science 328, 235-239 (2010)). An understanding of haplotype structure will therefore be critical for delineating the mechanisms of variants that contribute to these allelic imbalances. Taken together, knowledge of complete haplotype structure in individuals is essential for advancing personalized medicine.
Recognizing the importance of haplotypes, several groups have sought to expand the understanding of haplotype structures both at the level of populations and individuals. Initiatives such as the International HapMap Project and the 1000 Genomes Project have attempted to systematically reconstruct haplotypes through linkage disequilibrium measures based on sequencing data from populations of unrelated individuals or by genotyping family trios. However, the average length of accurately phased haplotypes generated using this approach is limited to ~300 kb (Fan et al., Nature Biotechnology 29, 51-57 (2011) and Browning et al., American Journal of Human Genetics 81, 1084-1097 (2007)). Numerous experimental methods have also been developed to facilitate haplotype phasing of an individual, including LFR sequencing, mate-pair sequencing, fosmid sequencing, and dilution-based sequencing (Levy et al., PLoS Biology 5, e254 (2007); Bansal et al., Bioinformatics 24, i153-159 (2008); Kitzman et al., Nature Biotechnology 29, 59-63 (2011); Suk et al., Genome Research 21, 1672-1685 (2011); Duitama et al., Nucleic Acids Research 40, 2041-2053 (2012); and Kaper et al., Proc Natl Acad Sci USA 110, 5552-5557 (2013)). At best, these methods can reconstruct haplotypes ranging from several kilobases to about a megabase, but none can achieve chromosome-span haplotypes. Whole chromosome haplotype phasing has been achieved using fluorescence-activated cell sorting (FACS) based sequencing, chromosome segregation followed by sequencing, and chromosome micro-dissection based sequencing (Fan et al., Nature Biotechnology 29, 51-57 (2011); Yang et al., Proceedings of the National Academy of Sciences of the United States of America 108, 12-17 (2011); and Ma et al., Nature Methods 7, 299-301 (2010)). 
However, these methods have low resolution because they can phase only a fraction of the heterozygous variants in an individual and, more importantly, they are technically challenging to perform or require specialized instruments. Recently, whole genome haplotyping has been performed using genotyping from sperm cells (Kirkness et al., Genome Research 23, 826-832 (2013)). Although this approach can generate chromosome-span haplotypes at high resolution, it is not applicable to the general population and requires deconvolution of complex meiotic recombination patterns.
Along with whole-genome haplotyping, targeted haplotyping is also important. In particular, targeted haplotyping of the HLA (human leukocyte antigen) locus can aid in host-donor matching for organ transplantation and in elucidating the roles of cis-regulatory elements in gene activity.
Computational analysis has shown that an important factor in haplotype reconstruction from previously established DNA shotgun sequencing methods is the length of the sequenced genomic fragment (Tewhey et al., Nature Reviews. Genetics 12, 215-223 (2011)). For example, longer haplotypes can be obtained by mate pair sequencing (fragment or insert size ~5 kb) compared with conventional genome sequencing (fragment or insert size ~500 bp). However, there are technical limitations on how long these fragments can be. For instance, it is difficult to clone DNA fragments that are longer than what is obtained using fosmid clones. Hence, using existing shotgun sequencing approaches, it is difficult to generate haplotype blocks beyond 1 million bases, even at ultra-deep sequencing coverage.
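The dependence of haplotype block length on fragment length described above can be illustrated with a minimal sketch. It is not drawn from any of the cited methods; it is a toy model under stated assumptions: heterozygous sites are evenly spaced, each fragment is assumed to report the allele at every heterozygous site it spans, and two sites are placed in the same phased block whenever a chain of such fragments connects them. The names `haplotype_blocks` and `tile` are hypothetical and introduced here only for illustration.

```python
def haplotype_blocks(het_sites, fragments):
    """Group heterozygous sites into phased blocks (toy model).

    Assumption: a fragment (start, end) links every heterozygous site
    with start <= position <= end. Sites joined by a chain of
    fragments end up in the same block (union-find).
    """
    sites = sorted(het_sites)
    parent = {s: s for s in sites}

    def find(s):
        # Follow parent pointers to the block representative,
        # compressing the path along the way.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for start, end in fragments:
        covered = [s for s in sites if start <= s <= end]
        # A fragment covering fewer than two sites links nothing.
        for s in covered[1:]:
            parent[find(s)] = find(covered[0])

    blocks = {}
    for s in sites:
        blocks.setdefault(find(s), []).append(s)
    return sorted(blocks.values())


def tile(length, step, span):
    """Hypothetical helper: fragments of a fixed length starting every `step` bases."""
    return [(start, start + length) for start in range(0, span, step)]


# Ten heterozygous sites, one every 1 kb (an illustrative density only).
het_sites = list(range(0, 10_000, 1_000))

short_fragments = tile(length=500, step=250, span=10_000)     # ~500 bp inserts
long_fragments = tile(length=5_000, step=2_500, span=10_000)  # ~5 kb inserts

short_blocks = haplotype_blocks(het_sites, short_fragments)
long_blocks = haplotype_blocks(het_sites, long_fragments)

print(len(short_blocks), "blocks with 500 bp fragments")
print(len(long_blocks), "block(s) with 5 kb fragments")
```

In this toy setting, no 500 bp fragment spans two heterozygous sites, so every site remains unphased relative to its neighbors, whereas overlapping 5 kb fragments chain all ten sites into a single block. Real data differ in several respects (uneven variant density, paired reads that sequence only fragment ends, sequencing errors), so the sketch conveys only the qualitative trend.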
Thus, there is a need for a method for reconstructing haplotypes at the whole genome level, as well as a method for targeted haplotyping.