The present invention is directed generally to genomic sequencing, and more specifically to efficiently obtaining accurate sequences of the different haplotypes of a chromosome by sequencing a biological sample.
Current high-throughput genotyping technologies, when applied to DNA from a diploid individual, are able to determine which two alleles are present at each locus, but not the haplotype information (i.e., which combinations of alleles are present on each of the two homologous chromosomes). Knowledge of the haplotypes carried by sampled individuals would be helpful in many settings, including linkage-disequilibrium mapping, inference of population evolutionary history, and assessment of loss of gene function, e.g., because genetic inheritance operates through the transmission of chromosomal segments.
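The ambiguity described above can be illustrated with a minimal Python sketch. It is only an illustration and not part of any described method: the function name `possible_haplotype_pairs` and the three example loci are hypothetical. The sketch enumerates every pair of haplotypes consistent with a set of unphased genotypes, showing that the genotypes alone do not identify which pair the individual carries.

```python
from itertools import product


def possible_haplotype_pairs(genotypes):
    """Enumerate every unordered haplotype pair consistent with unphased genotypes.

    genotypes: list of 2-tuples of single-character alleles, one tuple per locus.
    (Hypothetical helper, for illustration only.)
    """
    pairs = set()
    # At each locus, either allele may sit on the first chromosome copy.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap_a = "".join(g[c] for g, c in zip(genotypes, choice))
        hap_b = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # Sort so that (A, B) and (B, A) count as the same resolution.
        pairs.add(tuple(sorted((hap_a, hap_b))))
    return sorted(pairs)


# Hypothetical genotypes: two heterozygous loci and one homozygous locus.
genotypes = [("A", "G"), ("C", "T"), ("G", "G")]
for pair in possible_haplotype_pairs(genotypes):
    print(pair)
# ('ACG', 'GTG')
# ('ATG', 'GCG')
```

With k heterozygous loci there are 2^(k-1) such candidate pairs, so the number of possible resolutions grows quickly with the length of the region considered.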
Haplotypes are typically determined using information from the general population rather than data from the individual. For example, such methods rely on the observation that certain haplotypes are common in certain genomic regions. Thus, given a set of possible haplotype resolutions, these methods choose those that use fewer different haplotypes overall. As a result, differences from the general population and recombinations specific to the individual are not identified, leading to these and other errors in the determined haplotypes.
It is therefore desirable to provide methods that determine haplotypes of an organism (e.g., a person) from sequencing information of that individual with increased accuracy, and that allow for efficient sequencing.