Normal human somatic cells are diploid (i.e., each nucleus has two copies of the genome: a paternal set of chromosomes and a maternal set of chromosomes). Within each individual, these two sets of chromosomes have different nucleotide sequences (single-nucleotide polymorphisms (SNPs)) at multiple loci. Conventional genotyping assays analyze a mixture of these two sets of chromosomes, which leads to uncertainty and complexity. For example, for any two SNP loci that are both heterozygous, there are four possible haplotypes across these two SNPs. However, because the phase information is erased when each SNP is genotyped individually on conventional platforms, none of these four possible haplotypes can be eliminated. One way to solve this problem is to find a reliable method to re-establish or reconstruct the phase information after genotyping. Another way is to extract the phase information before genotyping.
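The ambiguity described above can be illustrated with a minimal sketch. The allele letters and the function names `possible_haplotypes` and `phase_configurations` are hypothetical and chosen only for illustration. The sketch lists the four candidate haplotypes for two heterozygous SNPs and the two haplotype pairs that are equally consistent with the unphased genotypes.

```python
from itertools import product

def possible_haplotypes(snp1_alleles, snp2_alleles):
    """Return every allele combination across two heterozygous SNPs."""
    return [a + b for a, b in product(snp1_alleles, snp2_alleles)]

def phase_configurations(snp1_alleles, snp2_alleles):
    """Return the two haplotype pairs consistent with the unphased genotypes."""
    a1, a2 = snp1_alleles
    b1, b2 = snp2_alleles
    return [(a1 + b1, a2 + b2), (a1 + b2, a2 + b1)]

# Hypothetical individual: heterozygous A/G at SNP 1 and C/T at SNP 2.
haps = possible_haplotypes("AG", "CT")
configs = phase_configurations("AG", "CT")
print(haps)     # ['AC', 'AT', 'GC', 'GT']
print(configs)  # [('AC', 'GT'), ('AT', 'GC')]
```

Unphased genotype data cannot distinguish between the two configurations, which is why all four haplotypes remain possible.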
Skilled artisans in this field have used various statistical algorithms to re-establish the phase information. These algorithms include Clark's algorithm, the expectation-maximization (EM) algorithm, coalescence-based algorithms (pseudo-Gibbs sampler and perfect/imperfect phylogeny), and partition-ligation algorithms implemented by a fully Bayesian model (Haplotype) or by EM (PLEM) (Liu N, et al., Advances in Genetics, 60: 325-405, 2008). Statistical inference of haplotypes from unphased genotype data usually yields a large number of uncertain haplotypes, which significantly reduces the power in genetic applications. In addition, it is still controversial whether the inferred haplotypes should be treated as objective observations of genotypes and phenotypes in these studies. While genotypes from family members can often help to determine the haplotypes, haplotype inference from family data is often limited by uninformative or missing data. Moreover, the late age of onset of most common human diseases can preclude collection of DNA samples from previous generations. Therefore, these methods are not suitable for molecular diagnosis in future personalized medicine.
In parallel, some researchers have developed experimental methods to extract the phase information from genomic DNA samples before genotyping. These methods are all based on the physical separation of the two homologous genomic DNAs before genotyping. The challenge is how to separate two almost identical copies of chromosomes in diploid cells. Several strategies/technologies have been developed for separating diploid samples into their haploid components, such as 1) Long-range allele-specific genomic PCR (Michalatos-Beloin S, et al., Nucleic Acids Res 24: 4841-4843, 1996; and Yu C E, et al., Genomics 84: 600-612, 2004); 2) Haplotype-Specific Extraction (HSE) (Nagy M, et al., Tissue Antigens 69: 176-180, 2007); 3) Generation of somatic haploid cells, such as GMP conversion (Douglas J A, et al., Nat Genet 28: 361-364, 2001); 4) Polony (Mitra et al., Proc Natl Acad Sci USA 100: 5926-5931, 2003; Zhang K, et al., Nat Genet 38:382-387, 2006); 5) Clone-based systematic haplotyping (CSH) (Burgtorf C, et al., Genome Res 13: 2717-2724, 2003); 6) single molecule dilution (SMD) (Ding C, et al., Proc Natl Acad Sci USA 100: 7449-7453, 2003); and 7) Sperm typing.
Long-range Allele-Specific Genomic PCR uses specifically designed PCR primers to selectively amplify the target region from only one of the two homologous chromosomes. Selective amplification is achieved by designing a primer whose 3′-end matches one allele of a SNP and mismatches the other. Thus, the primer cannot efficiently amplify the mismatched chromosomal DNA template. Genotyping is subsequently performed on the amplification products. Because these PCR products are obtained from only one of the chromosomes, the alleles of the different SNPs along these PCR products reveal the haplotype.
In this method, the maximal distance between the genetic markers in the haplotype is determined by the maximal length that PCR can reach and by the chromosome integrity in the DNA preparation. Therefore, the haplotype length is restricted by the PCR capacity, which is about 40 kb for long PCR. This method is often technically challenging and requires extensive optimization of the PCR conditions for every primer pair to improve the amplification efficiency of long PCR. Different combinations of several primer pairs and buffers are usually recommended to optimize the PCR conditions. Consequently, this method is not applicable to high-throughput analysis of haplotypes.
Haplotype-Specific Extraction (HSE) uses specifically designed probes to selectively capture the fragments from only one of the two homologous chromosomes. Selective binding is achieved by designing a probe that specifically recognizes one allele of a SNP. If an individual is heterozygous at this SNP, the probe, when added to the denatured genomic DNA sample, will seek and bind only the genomic DNA fragments containing its target allele. The probe-bound DNA fragments are then captured on magnetic beads, and the unbound DNA fragments carrying the other allele of this SNP are washed away. The genomic DNA is thereby reduced from the diploid state to the haploid state and is ready for all subsequent analyses, including genotyping/haplotyping. Because distinct polymorphic differences always exist between the two parental chromosomes, HSE can distinguish and separate the two parental copies of any chromosomal segment.
In this method, the maximal distance between the genetic markers in a haplotype is determined by the chromosome integrity in the DNA preparation and by the DNA denaturation. So far, this method can resolve haplotypes within a distance of <50 kb. If molecular haplotypes over extended distances are needed, multiplexed haploseparations have to be carried out.
GMP Conversion Technology is built upon the construction of cell hybrids from viable human cells (typically lymphocytes or fibroblasts) and a rodent cell line. Because these hybrid cells retain only a subset of the human chromosomes, they can be null, monosomic, or disomic for each pair of human chromosomes. The monosomic cells are haploid for the corresponding chromosomes and are ready for subsequent genotyping assays to determine the haplotype.
In this technology, cells are electrofused and then propagated under selective conditions, for example, using the HPRT1/HAT (hypoxanthine, aminopterin, and thymidine) system. After 2-4 weeks of growth, fused clones are harvested, and DNA is prepared for analysis. The monosomic clones can be identified by genotyping a few highly polymorphic markers per chromosome, which minimally requires a single heterozygous genotype in the donor. Nonetheless, there are still some technical challenges in conversion-based haplotyping, including low DNA concentrations, preferential amplification, and insertions or deletions of chromosomal segments (Douglas J A et al., Nat Genet 28: 361-364, 2001).
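The clone-screening logic described above can be sketched as follows. This is only an illustrative sketch, not the procedure of the cited work: the function name `classify_clone`, the marker names, and the data layout (a mapping from marker to the set of human alleles detected) are all assumptions. At markers where the donor is heterozygous, a clone showing exactly one human allele is taken to be monosomic for that chromosome, a clone showing both alleles is taken to be disomic, and a clone showing none is taken to be null.

```python
def classify_clone(donor_genotypes, clone_genotypes):
    """Classify a hybrid clone for one chromosome from its marker genotypes.

    donor_genotypes: dict mapping marker -> set of alleles in the donor
    clone_genotypes: dict mapping marker -> set of human alleles seen in the clone
    """
    # Only markers that are heterozygous in the donor are informative.
    informative = [m for m, g in donor_genotypes.items() if len(g) == 2]
    if not informative:
        return "uninformative"
    counts = [len(clone_genotypes.get(m, set())) for m in informative]
    if all(c == 0 for c in counts):
        return "null"
    if all(c == 1 for c in counts):
        return "monosomic"
    if all(c == 2 for c in counts):
        return "disomic"
    # Mixed results may reflect allele drop-out or a partial chromosome.
    return "ambiguous"

# Hypothetical donor: heterozygous at M1 and M2, homozygous at M3.
donor = {"M1": {"A", "G"}, "M2": {"C", "T"}, "M3": {"T"}}
clone_a = {"M1": {"A"}, "M2": {"T"}, "M3": {"T"}}
clone_b = {"M1": {"A", "G"}, "M2": {"C", "T"}, "M3": {"T"}}

print(classify_clone(donor, clone_a))  # monosomic
print(classify_clone(donor, clone_b))  # disomic
```

For a clone classified as monosomic, the alleles observed across its markers (here A at M1 and T at M2) directly give one haplotype of the donor for that chromosome.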
It has been observed that whole chromosomes, rather than chromosomal fragments, are generally retained in the hybrid cells (Douglas 2001, supra). Therefore, this method does not place any restriction on the distance between SNPs in a haplotype. However, the application of GMP Conversion Technology is restricted to a very limited number of subjects and chromosomal regions because of the inefficiencies and variations in the fusion and selection conditions. Numerous cell lines are required for each individual. Conversion-based haplotyping therefore remains very time-consuming and very costly.
Polony Technology uses a polyacrylamide gel to amplify single molecules of chromosomal DNA in situ. In this technology, genomic DNA from an individual is first diluted to a very low concentration, and then mixed with acrylamide and spread onto a glass microscope slide to form a thin DNA-containing polyacrylamide gel. Because the DNA concentration is so low, the DNA molecules are well separated from each other. An in-gel PCR is then performed directly on this gel, with two pairs of PCR primers to amplify the two SNP loci of interest from a single DNA molecule. Because the acrylamide matrix restricts the diffusion of linear DNA molecules, PCR products accumulate around their amplification template, forming two overlapping PCR colonies (polonies). The genotypes of these two SNPs are determined in situ by a separate single-base extension (SBE) assay for each SNP, and the gels are read by a laser scanner. After the two SBE images are overlaid, the alleles observed at the same spot indicate the allele combination (haplotype) of these two SNPs in the sample.
The maximal haplotype length of Polony is determined by the DNA fragmentation or degradation that occurs before, during, and after the acrylamide polymerization. Haplotypes as long as 45 kb have been reported with this method so far (Mitra, et al., PNAS USA 100: 5926-5931, 2003; Zhang K, et al., Nat Genet 38:382-387, 2006).
There are several inherent caveats in the Polony method. First, a major limitation of Polony haplotyping is that it does not scale efficiently to larger numbers of SNPs, whereas it is often desirable to haplotype a large number (100-10,000) of SNPs along a chromosome. Second, the DNA molecules may overlap in the gel; therefore, the DNA concentration and plating conditions are critical. Third, the PCR coamplification efficiency is low (4-15% for samples from buccal swabs, 15-34% for samples from the other collection methods). The coamplification efficiency is related to the presence of ungelled acrylamide in the Polony gel during thermal cycling and to DNA fragmentation or degradation. Technical optimization (such as of the degassing and polymerization conditions) may be required. Lastly, this technology requires metaphase cells.
Clone-based Systematic Haplotyping (CSH) uses fosmid/cosmid cloning to isolate a single copy from diploid chromosomes. Because each vector molecule can hold only one insert molecule, each colony derived from a successful vector-insert ligation will hold only a haploid chromosomal segment. By screening the colony library, the clones that contain the target chromosomal segments are obtained for subsequent haplotyping analysis. Because the vector cannot accept inserts larger than its maximal cloning capacity, CSH can separate haploid fragments of only ~50 kilobases. In addition, this method is very time-consuming and costly.
Single Molecule Dilution (SMD) is built upon the idea that a single DNA molecule is necessarily a haploid fragment, because a diploid state requires two DNA molecules, one from each homologous chromosome. To obtain a single molecule in each reaction tube, genomic DNA samples are diluted to an extremely low concentration. Each human diploid genome is known to weigh ~6.7 pg, so if a tube contains only ~3.3 pg of genomic DNA, it must hold single molecules for some chromosomal regions, because the DNA amount is not sufficient for every chromosomal region to be present in two copies in that tube. This very low DNA concentration is achieved by serial dilutions. After serial dilution, for any given chromosomal segment, each tube may contain no DNA, one molecule of DNA for that region, or two molecules of DNA for that region. The tiny amount of DNA in these tubes is then amplified and genotyped; allele drop-out at previously identified heterozygous SNP loci of this individual is used to identify the “single-molecule” tubes for further experiments. The caveat of this method is that it relies on the statistical isolation of single DNA molecules, so there is no experimental guarantee of its success.
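The statistical nature of this isolation can be made concrete with a rough sketch. It assumes, beyond what the text states, that the number of copies of a given chromosomal region in a tube follows a Poisson distribution whose mean is two copies per diploid genome equivalent of DNA. The function name `copy_number_probabilities` is hypothetical; the 6.7 pg and 3.3 pg figures are taken from the paragraph above.

```python
import math

DIPLOID_GENOME_PG = 6.7  # mass of one human diploid genome, as given in the text

def copy_number_probabilities(dna_pg):
    """Estimate how many copies of a given region a tube holds.

    Assumption: copies are sampled independently, so the count is Poisson
    distributed with mean 2 * (dna_pg / DIPLOID_GENOME_PG).
    """
    mean_copies = 2.0 * dna_pg / DIPLOID_GENOME_PG
    p_none = math.exp(-mean_copies)
    p_single = mean_copies * p_none
    return {
        "none": p_none,
        "single": p_single,
        "two_or_more": 1.0 - p_none - p_single,
    }

probs = copy_number_probabilities(3.3)
for outcome, p in probs.items():
    print(f"{outcome}: {p:.2f}")
```

Under this assumption, only a fraction of the tubes hold exactly one molecule of the region of interest, which is consistent with the need to screen the tubes by allele drop-out.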
In this method, genomic DNA is broken down by the frequent shearing that occurs during serial dilution. The maximal haplotyping distance reported so far is 24 kb (Ding C, et al., PNAS USA 100: 7449-7453, 2003).
Sperm Typing is built upon the fact that a sperm is a product of meiosis and contains only a haploid genome. Although sperm are haploid, a sperm haplotype is not simply equal to either of the donor's haplotypes: because of meiotic recombination, the haploid genome of a sperm is not identical to either of the parental chromosomes of this individual. However, by genotyping several sperm from one individual and then analyzing the haplotype data from these sperm, the haplotypes of this individual can be inferred. Therefore, sperm typing differs from the above molecular haplotyping methods because it is not a direct haplotyping method.
Different sperm have gone through different crossing-over events in meiotic recombination, so sperm from the same individual will have different haplotypes. In crossing over, two chromatids exchange the distal arms of their chromosomes; in humans, the distal end of a chromosome is usually exchanged only once, and sometimes twice or more. Therefore, it is possible to infer the haplotypes of the original individual from a number of sperm under the assumption that only one crossing-over event occurred in the studied sperm. However, sperm typing is limited to males, the procedure is tedious and costly, and the haplotypes are inferred results rather than direct observations; consequently, sperm typing is not widely used for molecular haplotyping.
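The inference step can be illustrated with a simplified sketch. It is not a published sperm-typing algorithm: it assumes that every marker is heterozygous in the donor, that both alleles are observed among the typed sperm, and that recombinant sperm are a minority between any two adjacent markers, so that the majority phase between neighbors reflects the donor's chromosomes. The function name `infer_donor_haplotypes` and the example data are hypothetical.

```python
from collections import Counter

def infer_donor_haplotypes(sperm_haplotypes):
    """Infer the two donor haplotypes by chaining majority phases.

    sperm_haplotypes: list of equal-length strings, one allele per marker.
    Assumes every marker is heterozygous in the donor and both alleles
    appear among the typed sperm.
    """
    n_markers = len(sperm_haplotypes[0])
    hap_a = [sperm_haplotypes[0][0]]
    for i in range(1, n_markers):
        # Among sperm carrying hap_a's allele at the previous marker,
        # take the most common allele at this marker as being in phase.
        counts = Counter(
            s[i] for s in sperm_haplotypes if s[i - 1] == hap_a[-1]
        )
        hap_a.append(counts.most_common(1)[0][0])
    # The second haplotype carries the other allele at every marker.
    hap_b = []
    for i in range(n_markers):
        observed = {s[i] for s in sperm_haplotypes}
        hap_b.append((observed - {hap_a[i]}).pop())
    return "".join(hap_a), "".join(hap_b)

# Hypothetical sperm typed at three markers; two of them are recombinants.
sperm = ["ACG", "GTA", "ACG", "GTA", "ACA", "GTA", "ACG", "GCG"]
donor_haps = infer_donor_haplotypes(sperm)
print(donor_haps)  # ('ACG', 'GTA')
```

The result depends on how many sperm are typed, which illustrates why these haplotypes are inferred results rather than direct observations.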
In summary, the currently available experimental methods for chromosome separation often cause chromosome breakage, so they cannot yield long-range haplotypes. In addition, they are extremely time-consuming and labour-intensive, so they are not practical in research laboratories and clinics. There still exists a need for a haplotyping method that can be performed quickly and at low cost.