The present invention generally relates to the field of computational biology, and more particularly relates to exact haplotype reconstruction of F2 populations.
An important question when studying the genetics of an individual is that of determining its haplotype organization, i.e., determining the parental origin of each region in the individual's genome. When diploid organisms reproduce, crossovers frequently occur during meiosis. Therefore, progeny do not always receive complete copies of their parents' chromosomes. Instead, the genetic material inherited from a parent is often a combination of segments from the two chromosomes present in that parent, i.e., a combination of the two haplotypes of the parent (and similarly for material inherited from the other parent).
In practice, haplotypes are often constructed from genotype data. The genotype of an individual represents a sequence of unordered pairs of allele values associated with the diploid genome. Genotypes are often sampled as sequences of SNP (single-nucleotide polymorphism) marker values at locations spread across the genome. Determining the haplotypes for an individual at a given marker involves assigning each of the two unordered allele values to the correct parent from which it was inherited, and more precisely, to the correct haplotype of that parent.
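As an illustration only, the assignment of unordered allele values to parents at a single marker can be sketched as follows. This is a minimal sketch and not the method of the invention: the function name `phase_marker`, the `Genotype` alias, and the single-letter allele encoding are assumptions made for the example. The sketch resolves only which parent contributed each allele; it does not attempt the further step of identifying which haplotype of that parent the allele came from.

```python
from typing import List, Tuple

# Hypothetical encoding: a genotype at one marker is an unordered pair of alleles.
Genotype = Tuple[str, str]


def phase_marker(child: Genotype, mother: Genotype, father: Genotype) -> List[Tuple[str, str]]:
    """Return every (maternal_allele, paternal_allele) assignment that is
    consistent with the child's and both parents' genotypes at one marker."""
    a, b = child
    # The child's pair is unordered, so both orientations must be tried.
    candidates = {(a, b), (b, a)}
    return sorted((m, p) for m, p in candidates if m in mother and p in father)


# Both parents homozygous: the assignment is unique.
print(phase_marker(("A", "G"), ("A", "A"), ("G", "G")))
# All three heterozygous: both orientations remain possible, so this marker
# alone cannot be phased.
print(phase_marker(("A", "G"), ("A", "G"), ("A", "G")))
```

An empty result indicates that the child's genotype is inconsistent with the parental genotypes at that marker, for example because of a genotyping error. A result with two entries indicates an ambiguous marker that must be resolved using neighboring markers.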
Haplotype studies are extensively applied in the field of population genetics. For example, one can reconstruct so-called ancestral founder haplotypes and represent the current population as a mosaic of those sequences. In another example, when the diploid parental genomes are known, one can separate them into haplotypes and represent the progeny as a mosaic of those haplotypes. The challenges of such applications include (a) accurately collecting genomic data on the studied population, and (b) constructing efficient and accurate algorithms for solving the associated combinatorially challenging problems.