The DNA that makes up human chromosomes provides the instructions that direct the production of all proteins in the body. These proteins carry out the vital functions of life. Variations in DNA often produce variations in the encoded proteins and thereby affect the function of cells. Although environment often plays a significant role, variations or mutations in DNA are directly related to almost all human diseases, including infectious disease, cancer, inherited disorders, and autoimmune disorders. Moreover, knowledge of human genetics has led to the realization that many diseases result either from complex interactions of several genes or from any one of a number of different mutations within a single gene. For example, Type I and Type II diabetes have been linked to multiple genes, each with its own pattern of mutations. In contrast, cystic fibrosis can be caused by any one of over 300 different mutations in a single gene.
Additionally, knowledge of human genetics has led to a limited understanding of how individuals differ in their response to drugs, the field of pharmacogenetics. More than half a century ago, adverse drug responses were first correlated with amino acid variations in two drug-metabolizing enzymes, plasma cholinesterase and glucose-6-phosphate dehydrogenase. Since then, careful genetic analyses have linked sequence polymorphisms in over 35 drug-metabolizing enzymes, 25 drug targets, and 5 drug transporters with compromised drug efficacy or safety. In the clinic, such information is being used to prevent drug toxicities; for example, patients are often screened for variants of the thiopurine methyltransferase gene that decrease metabolism of 6-mercaptopurine or azathioprine. Yet only a small percentage of observed drug toxicities have been adequately explained by the set of pharmacogenetic markers validated to date. In addition, insufficient therapeutic efficacy or unanticipated side effects in “outlier” individuals who receive drugs previously shown to be both safe and efficacious in clinical trials are a tremendous problem for health care practitioners and present a significant dilemma to the pharmaceutical industry.
Validation of disease-related and pharmacogenetic genes relies on elements of population and quantitative genetics and on robust statistical metrics; however, the first step is normally the identification of a candidate target gene. To date, various biotechnological methods have been employed to identify candidate genes. For example, differential gene expression analysis has been employed to look for differences in gene expression between affected and unaffected individuals or between treated and untreated individuals. Protein-protein interaction maps have also been used to identify drug receptors and their immediate effectors. Another approach involves mining human sequence databases for sequences similar to accepted disease-related regulators or to accepted pharmacokinetic or pharmacodynamic regulators.
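Differential expression screening of the kind described above is commonly done gene by gene with a two-sample statistic. The following is a minimal sketch, assuming expression values have already been measured and normalized for each individual, and using Welch's t-statistic as one illustrative choice rather than any method specified here. The function names `welch_t` and `rank_candidate_genes`, and the gene names in the usage note, are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(affected, unaffected):
    """Welch's t-statistic for the difference in mean expression of one gene.

    affected, unaffected: lists of expression values (e.g., log-scale intensities),
    one value per individual; each group needs at least two values.
    """
    n1, n2 = len(affected), len(unaffected)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two measurements")
    diff = mean(affected) - mean(unaffected)
    # Standard error without assuming equal group variances (Welch's form).
    se = math.sqrt(variance(affected) / n1 + variance(unaffected) / n2)
    if se == 0:
        # Both groups are constant: either no difference or an unbounded one.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def rank_candidate_genes(expr_affected, expr_unaffected):
    """Rank genes by |t|, largest first, as a crude candidate-gene shortlist.

    expr_affected, expr_unaffected: dicts mapping a gene name to its expression
    values in affected and unaffected individuals, respectively.
    """
    scores = {
        gene: welch_t(expr_affected[gene], expr_unaffected[gene])
        for gene in expr_affected
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

For example, with hypothetical genes `GENE_A` (affected `[5.0, 6.0, 7.0]`, unaffected `[1.0, 2.0, 3.0]`) and `GENE_B` (affected `[2.0, 3.0, 4.0]`, unaffected `[2.5, 3.0, 3.5]`), `GENE_A` ranks first because its group means differ while those of `GENE_B` do not. In practice a ranking like this would be paired with a multiple-testing correction before any gene is treated as a candidate.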
Any two humans are 99.9% similar in their genetic makeup, so most of the DNA sequence of their genomes is identical. However, there are variations in DNA sequence between individuals. For example, there are deletions of multi-base stretches of DNA, insertions of stretches of DNA, variations in the number of repetitive DNA elements in noncoding regions, and changes at single nitrogenous base positions in the genome called “single nucleotide polymorphisms” or “SNPs”. It is estimated that there are 3 to 4 million common SNPs that occur in at least 10 percent of people. These common SNPs do not occur independently but are passed from generation to generation in variable-length blocks of multiple SNPs, forming patterns across the genome. Such blocks of SNPs are called SNP haplotype blocks herein.
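To make the notions of common SNPs and SNP haplotype blocks concrete, the following is a minimal sketch under several assumptions: phased haplotypes are already available as strings of `0`/`1` allele calls (one string per chromosome, one character per SNP position); “common” is taken to mean a minor allele frequency of at least 10 percent, which is an illustrative reading of the 10 percent figure above (that figure is stated in terms of people, not chromosomes); and block boundaries are supplied by the caller rather than inferred. All function names are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the less common allele at one SNP.

    alleles: sequence of 0/1 calls, one per chromosome (0 = one allele, 1 = the other).
    """
    p = sum(alleles) / len(alleles)
    return min(p, 1 - p)


def common_snps(haplotypes, threshold=0.10):
    """Indices of SNP positions whose minor allele frequency is at least `threshold`.

    haplotypes: equal-length strings of '0'/'1', one string per chromosome,
    with one character per SNP position.
    """
    n_positions = len(haplotypes[0])
    keep = []
    for j in range(n_positions):
        column = [int(h[j]) for h in haplotypes]
        if minor_allele_frequency(column) >= threshold:
            keep.append(j)
    return keep


def block_haplotype_patterns(haplotypes, block):
    """Count the distinct allele patterns seen across the SNP positions of one block.

    block: list of SNP position indices assumed (by the caller) to form one block.
    """
    return Counter("".join(h[j] for j in block) for h in haplotypes)
```

Because SNPs within a block are inherited together, a block of four common SNPs, which could in principle show 2^4 = 16 allele patterns, may show only a few of them across a population; `block_haplotype_patterns` simply tallies which patterns actually occur.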
The candidate gene identification strategy most relevant to SNPs is whole-genome association analysis in various populations of individuals, that is, scanning the entire genomes of populations of individuals to correlate SNPs with disease or drug response. Such whole-genome analyses would provide a fine degree of genetic mapping and pinpoint specific regions of linkage. Methods for use in connection with whole-genome analysis have been proposed and are in use. For example, the methods described in U.S. Ser. No. 60/327,006, filed Oct. 5, 2001, “Identifying Human SNP Haplotypes, Informative SNPs and Uses Thereof,” assigned to the assignee of the present invention and incorporated herein by reference for all purposes, have been proposed for use in such applications. U.S. Ser. No. 10/166,341, filed Sep. 18, 2001, “Human Genomic Polymorphisms,” assigned to the assignee of the present invention and incorporated herein by reference for all purposes, provides the identity of SNPs and SNP haplotype blocks across one representative chromosome, i.e., chromosome 21.
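The step of correlating SNPs with disease or drug response in a whole-genome scan can be illustrated with a simple case-control allelic test applied SNP by SNP. This is a minimal sketch, not the method of the applications cited above: it assumes allele counts have already been tallied for cases and controls, uses a Pearson chi-square test with one degree of freedom (no continuity correction) as one common choice, and uses hypothetical function names and SNP identifiers.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square (1 d.f., no continuity correction) for a 2x2 allele-count table.

    case_counts, control_counts: (count of allele 1, count of allele 2) observed in
    the chromosomes of cases and of controls, respectively.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # An empty row or column carries no association information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi_square_1df_p_value(stat):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom.

    For 1 d.f., P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))


def scan_snps(snp_counts):
    """Test every SNP and return (snp_id, chi2, p) tuples sorted by increasing p.

    snp_counts: dict mapping a SNP identifier to (case_counts, control_counts).
    """
    results = []
    for snp_id, (cases, controls) in snp_counts.items():
        stat = allelic_chi_square(cases, controls)
        results.append((snp_id, stat, chi_square_1df_p_value(stat)))
    return sorted(results, key=lambda r: r[2])
```

For example, a hypothetical SNP with allele counts (30, 70) in cases and (10, 90) in controls gives a chi-square of 12.5, while one with (50, 50) in both groups gives 0. Because a genome-wide scan may test millions of SNPs, a multiple-testing correction (for example, Bonferroni, which divides the significance level by the number of SNPs tested) would normally be applied before any association is declared.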
Although these methods have met with success, it is desirable to increase the speed and efficiency with which such analyses can be performed, as well as to decrease the cost of performing them.