Most human traits are genetically complex: oligogenic, polygenic, or multifactorial. This is true of most human diseases and of other medically relevant phenotypes such as drug response.
A need exists for a method to rapidly identify the relevant genes. Such a method would lead to improved understanding, prognosis, diagnosis, treatment, and prevention of disease, to the establishment of markers for individualized medicine, and to the development of new drugs. Complex diseases (or traits) are usually affected by two or more interacting genes. Sometimes more than 100 genes are involved, each contributing a small effect to the risk of, or susceptibility to, the disease. Many of the related alleles appear at low frequency even in the patient population (some <10%, most <30%), since a given gene may not be absolutely required for the disease to occur. However, the related alleles will tend to occur at a somewhat higher frequency in the DNA of diseased individuals than in the DNA of control subjects. These allele frequency differences form the basis for strategies to identify the genes linked to the disease.
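The comparison of allele frequencies between patients and controls can be illustrated with a minimal sketch. The counts below are hypothetical and chosen only for illustration, and the Pearson chi-square test on a 2×2 table is one common way to compare the two groups, not a procedure stated in this document.

```python
def allele_frequency(allele_count, total_alleles):
    """Fraction of chromosomes in a cohort that carry the allele."""
    return allele_count / total_alleles


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical counts of chromosomes carrying / not carrying one allele.
case_with, case_without = 60, 340   # patient cohort
ctrl_with, ctrl_without = 30, 370   # control cohort

case_freq = allele_frequency(case_with, case_with + case_without)
ctrl_freq = allele_frequency(ctrl_with, ctrl_with + ctrl_without)
statistic = chi_square_2x2(case_with, case_without, ctrl_with, ctrl_without)

print(f"patients: {case_freq:.3f}  controls: {ctrl_freq:.3f}  chi-square: {statistic:.2f}")
```

In this made-up example the allele is present on 15% of patient chromosomes and 7.5% of control chromosomes, which matches the situation described above: the allele is rare even among patients, yet enriched relative to controls.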
Humans are an outbred species. Many polymorphisms, or sequence variations, exist, many of which may have no relation to the trait of interest. In other words, variants exist not only between the diseased and control cohorts but also within each group. In fact, most human polymorphism exists within each population. A parallel comparison of the variation in the trait population with that in its control population is therefore critical when the associated genetic markers or genes are sought.
The approaches currently available for gene discovery, such as functional cloning, which depends on information about the protein, and positional cloning, which relies on information about gene position, have been successful mainly for simple Mendelian diseases. A few successful efforts to map human complex traits have been reported, based on a limited number of genetic markers and some prior knowledge, for example using the candidate gene approach or genome scanning with a limited number of microsatellites or other markers, yet many obstacles remain. New approaches are required for identifying the genes underlying complex diseases.
One previously proposed whole-genome screening approach, genomic mismatch scanning (GMS), is a two-step method for biochemically enriching the regions of the genome at which two individuals share identical alleles; it is designed to map all regions of genetic identity by descent (IBD) between two related individuals. First, heterohybrid DNA molecules, formed by solution hybridization between genomic DNA fragment pools from the two individuals, are purified by a procedure based on differential methylation and restriction endonuclease digestion. A DNA methylase is used to methylate the DNA of one individual but not that of the second. The DNAs are then mixed, denatured, and reannealed to form a mixture of heterohybrid and homohybrid DNA. The heterohybrids are hemimethylated and therefore resistant to certain restriction endonucleases, whereas homohybrid DNA can be eliminated by digestion. Second, mismatch-containing hybrids formed between nonidentical alleles are eliminated by treatment with the Escherichia coli mismatch repair enzymes MutH, MutL, and MutS, which bind and modify hybrids containing base mispairs in the presence of a “GATC” site. The remaining mismatch-free heterohybrids, representing loci at which the two individuals share identical alleles, can then be mapped in a single genome-wide hybridization step. Studies have shown that GMS can be used to map the regions of IBD between two strains of yeast, two mice, or two human individuals (Nelson S F, Nature Genetics, 4:11, 1993; Mirzayans F, Am J Hum Genet, 61:111, 1997; Cheung V G, Genomics, 47:1-6, 1998; McAllister L, Genomics, 47:7, 1998; Cheung V G, Nature Genetics, 18:225, 1998; Gerton J L, PNAS, 97:11383, 1999).
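The two selection steps of GMS described above can be pictured as two successive filters over a pool of reannealed duplexes. The following is a purely conceptual sketch: the `Duplex` record, the field names, and the toy sequences are all hypothetical, and the biochemistry (methylation, digestion, MutHLS treatment) is reduced to boolean tests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Duplex:
    locus: str
    top_source: str      # individual that contributed the top strand
    bottom_source: str   # individual that contributed the bottom strand
    top_allele: str      # allele sequence carried by the top strand
    bottom_allele: str   # allele sequence carried by the bottom strand


def is_heterohybrid(d):
    """Strands from two different individuals (hemimethylated in GMS)."""
    return d.top_source != d.bottom_source


def has_mismatch(d):
    """Strands carry nonidentical alleles, so the duplex contains a mispair."""
    return d.top_allele != d.bottom_allele


def gms_select(duplexes):
    # Step 1: homohybrids are digested; only heterohybrids survive.
    hetero = [d for d in duplexes if is_heterohybrid(d)]
    # Step 2: mismatch-containing heterohybrids with a GATC site are eliminated.
    return [d for d in hetero if not (has_mismatch(d) and "GATC" in d.top_allele)]


pool = [
    Duplex("locus1", "A", "B", "AAGATCTT", "AAGATCTT"),  # shared allele, kept
    Duplex("locus1", "A", "A", "AAGATCTT", "AAGATCTT"),  # homohybrid, removed
    Duplex("locus2", "A", "B", "CCGATCGA", "CCGATCGT"),  # mismatch, removed
    Duplex("locus3", "B", "B", "GGGATCAA", "GGGATCAA"),  # homohybrid, removed
]

shared_loci = [d.locus for d in gms_select(pool)]
print(shared_loci)
```

Only the heterohybrid at `locus1`, where both individuals carry the same allele, passes both filters; this is the fraction that GMS then maps by hybridization.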
Unlike GMS, the genetic analysis of this invention is based on the frequencies of single nucleotide polymorphisms (SNPs). With the human genome project nearing completion, SNPs have in recent years come to be regarded as the best gene-mapping markers. SNP markers have several advantages over previously used genetic markers such as RFLPs (restriction fragment length polymorphisms), STRs (short tandem repeats), or the IBD regions used in GMS. SNPs are the most abundant, stable, and evenly distributed bi-allelic polymorphisms in the human genome. They occur at a rate of one per 300 to 1,000 bp between two genome samples (>3×10^6 SNPs) and one per 2,000 bp between two coding sequences (cSNPs). About 2×10^7 SNPs are expected in human populations. Because coding regions make up 2.5-5% of the genome, the cSNPs are estimated to account for >2.5% of all SNPs in the human genome (>5×10^5 cSNPs), an average of about 6 per gene, with about half of them resulting in non-synonymous codon changes (Collins F S, Genome Res., 1998, 8:1229-1231; Brookes A J, Gene, 1999, 234:177-186). SNPs account for 90% of the sequence variants in humans.
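The SNP counts quoted above follow from simple arithmetic, which the sketch below reproduces. The genome size of 3×10^9 bp is an assumption introduced here (it is not stated in this document), and the implied gene count is only what the quoted figures yield when divided, not an independent estimate.

```python
# Assumption (not from this document): haploid human genome of about 3e9 bp.
GENOME_BP = 3_000_000_000

# One SNP per 1,000 bp to one per 300 bp between two genome samples.
snps_between_two_genomes_low = GENOME_BP // 1000
snps_between_two_genomes_high = GENOME_BP // 300

# About 2e7 SNPs expected in human populations; coding regions are at least 2.5%.
POPULATION_SNPS = 20_000_000
csnps_lower_bound = POPULATION_SNPS * 25 // 1000   # 2.5% written as 25/1000

# "About 6 cSNPs per gene" then implies this many genes.
CSNPS_PER_GENE = 6
implied_gene_count = csnps_lower_bound / CSNPS_PER_GENE

print(snps_between_two_genomes_low, snps_between_two_genomes_high)
print(csnps_lower_bound, round(implied_gene_count))
```

Under the assumed genome size, the lower rate gives the 3×10^6 figure and 2.5% of 2×10^7 gives the 5×10^5 cSNP figure quoted in the text.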
Currently, the major technique for applying SNPs in gene mapping is SNP typing, which relies on prior knowledge of the individual SNPs. A variety of approaches based on different mechanisms have been developed to accomplish SNP typing. These include overlapping genomic sequencing or minisequencing, oligonucleotide ligation assays (OLA), primer extension assays, allele-specific oligonucleotide (ASO) hybridization, exonuclease assays or the 5′ nuclease assay, single-base chain extension, and so on. Only some of them, such as microarray techniques and mass spectrometry, have the features necessary for industrial-scale SNP typing. Although these techniques have succeeded in some applications, few convincing examples of their use to map a typical complex disease have been reported. Gene mapping of complex traits with SNPs faces two major obstacles. First, the SNPs available for trait mapping, now or in the near future, are too few to satisfy the requirement of genome coverage. Second, thousands of individual samples may be required according to the theoretical predictions of geneticists. In addition, the intrinsic genetic complexity of complex traits and the complexity of the human genome add to the difficulty.
One previously proposed approach to using SNPs in genetic analysis is based on mixing the DNAs of two individuals, denaturing them, and then reannealing the strands. The DNA strands of the different individuals can thereby base pair with each other. In this case a mixed double-stranded DNA, called a heterohybrid, is formed, in which one strand of the double helix is contributed by one individual and the complementary strand by a different individual. Where the individuals have different DNA sequences (polymorphisms), the strands of the heterohybrid will not form correct base pairs. A high rate of polymorphism between the individuals will therefore result in many mispaired bases in the heterohybrid DNA, and a low rate of polymorphism will result in more perfectly matched bases. This difference forms the basis for methods to rapidly measure disease-related identical-by-descent (IBD) sequences. (IBD refers to sequences that individuals have in common, i.e., sequences having low polymorphism, as a result of inheriting a trait from a common ancestor. IBD is used to associate sequences of low polymorphism with the trait.)
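The relationship between polymorphism and mispaired bases in a heterohybrid can be made concrete at the base-pair level. In this minimal sketch the two sequences are hypothetical, and the bottom strand is written 3′ to 5′ so that it lines up position by position with the top strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(seq):
    """Complementary strand, written 3'->5' so index i pairs with index i."""
    return "".join(COMPLEMENT[base] for base in seq)


def mismatch_positions(top, bottom):
    """Indices at which the two strands do not form a Watson-Crick pair."""
    if len(top) != len(bottom):
        raise ValueError("strands must have equal length")
    return [i for i, (t, b) in enumerate(zip(top, bottom)) if COMPLEMENT[t] != b]


# Hypothetical sequences of the same locus in two individuals;
# they differ by a single nucleotide at index 5 (A versus T).
individual_1 = "ATGCGATCAA"
individual_2 = "ATGCGTTCAA"

# Homohybrid: both strands from individual 1.
homohybrid = mismatch_positions(individual_1, complement_strand(individual_1))

# Heterohybrid: top strand from individual 1, bottom strand from individual 2.
heterohybrid = mismatch_positions(individual_1, complement_strand(individual_2))

print(homohybrid, heterohybrid)
```

The homohybrid has no mispairs, while the heterohybrid has one at the polymorphic position; more polymorphic sites between the two individuals would yield proportionally more mispairs.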
A class of DNA repair enzymes, MutHLS, was used to identify the mispaired bases in heterohybrid DNA. These enzymes are capable of binding and modifying base mispairs. The repair enzymes were used to remove the mispaired bases and thereby reveal the IBD sequences shared by two or more individuals. Such a strategy, using yeast as the test organism, was presented by Nelson and associates in 1993 and referred to as genomic mismatch scanning (GMS) (Nelson S. F., et al. Nature Genetics, 1993, 4:11-18 and related subsequent papers). GMS has been modified and successfully used to screen traits related by IBD in yeast and in human chromosomes, in conjunction with putative disease gene localization information. A critical step in the GMS procedure is the enrichment of heterohybrid DNA away from homohybrid DNA. A relatively complex and laborious approach requiring multiple steps was used (Nelson S. F., et al. Nature Genetics, 1993, 4:11-18; Cheung, V. G., et al. (1998) Nature Genetics, vol. 18, 224-230). A DNA methylase was used to methylate the DNA of one individual but not the DNA of the second individual. The DNAs were then mixed, denatured, and reannealed to form a mixture of heterohybrid and homohybrid DNA. This yields hemimethylated heterohybrid DNA that is resistant to certain restriction endonucleases. Homohybrid DNA, in contrast, can be eliminated by digestion.
Some repair enzymes, such as MutHLS, have been used to detect the presence of mismatch-containing DNA fragments (Taylor G R, Deeble J. Genet Anal. 1999;14(5-6):181-6; Marra G, Schar P. Biochem. J. 1999;338 (Pt 1):1-13). DNA glycosylases have also been tested for detecting DNA damage or mutation (Dennog C, et al. Mutat Res. 1999; 17;431(2):351-9; Gualillo O, et al. Vaughan P, et al. Genet Anal. 1999; 14(5-6):169-75).
Most of the research relying on these MRS enzymes has focused on the signal-to-noise ratio of a test sample compared with a control sample for one or a limited number of known SNP-containing fragments, so as to determine whether any of these potential SNPs really exist, or on detecting mutations in one or a limited number of known genes. No report was found of applying this kind of enzyme to separate a pool of mismatch-containing fragments from a pool of perfectly matched fragments.