1. Field of the Invention
The invention relates to the role of genes in human diseases. More particularly, the invention relates to compositions and methods for identifying genes that are involved in human disease conditions.
2. Summary of the Related Art
During the past two decades, remarkable developments in molecular biology and genetics have produced a revolutionary growth in understanding of the implication of genes in human disease. Genes have been shown to be directly causative of certain disease states. For example, it has long been known that sickle cell anemia is caused by a single mutation in the human beta globin gene. In many other cases, genes play a role together with environmental factors and/or other genes to either cause disease or increase susceptibility to disease. Prominent examples of such conditions include the role of DNA sequence variation in ApoE in Alzheimer's disease; CKR5 in susceptibility to infection by HIV; Factor V in risk of deep venous thrombosis; MTHFR in cardiovascular disease and neural tube defects; p53 in HPV infection; various cytochrome p450s in drug metabolism; and HLA in autoimmune disease.
Surprisingly, the genetic variations that lead to gene involvement in human disease are relatively small. Approximately 1% of the DNA bases which comprise the human genome contain polymorphisms that vary at least 1% of the time in the human population. The genomes of all organisms, including humans, undergo spontaneous mutation in the course of their continuing evolution. The majority of such mutations create polymorphisms; thus, the mutated sequence and the initial sequence co-exist in the species population. However, the majority of DNA base differences are functionally inconsequential in that they affect neither the amino acid sequence of encoded proteins nor the expression levels of the encoded proteins. Some polymorphisms that lie within genes or their promoters do have a phenotypic effect, and it is this small proportion of the genome's variation that accounts for the genetic component of all differences between individuals, e.g., physical appearance, disease susceptibility, disease resistance, and responsiveness to drug treatments.
The relation between human genetic variability and human phenotype is a central theme in modern human genetic studies. The human genome comprises approximately 3 billion bases of DNA. The Human Genome Project is uncovering more and more of the consensus sequence of this genome. However, there remains a need to identify the nature and location of genetic variations that are implicated in human disease conditions.
Sequence variation in the human genome consists primarily of single nucleotide polymorphisms (“SNPs”), with the remainder of the sequence variations being short tandem repeats (including microsatellites), long tandem repeats (minisatellites), and other insertions and deletions. A SNP is a position at which two alternative bases occur at appreciable frequency (i.e., >1%) in the human population. A SNP is said to be “allelic” in that, due to the existence of the polymorphism, some members of a species may have the unmutated sequence (i.e., the original “allele”) whereas other members may have a mutated sequence (i.e., the variant or mutant allele). In the simplest case, only one mutated sequence may exist, and the polymorphism is said to be diallelic. The occurrence of alternative mutations can give rise to triallelic polymorphisms, etc. SNPs are widespread throughout the genome, and SNPs that alter the function of a gene may be direct contributors to phenotypic variation. Due to their prevalence and widespread nature, SNPs have the potential to be important tools for locating genes that are involved in human disease conditions. Wang et al., Science 280: 1077-1082 (1998), discloses a pilot study in which 2,227 SNPs were mapped over a 2.3 megabase region of DNA.
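By way of illustration only, the frequency-based definition of a SNP given above may be expressed as a short computation. The following Python sketch is not part of any disclosed method; the function name `classify_site`, the parameter `min_freq`, and the sample data are hypothetical and are introduced solely for this example. It assumes that one base has been observed per sampled chromosome at a single genomic position, and it counts how many alternative bases occur at a frequency above the 1% threshold.

```python
from collections import Counter


def classify_site(bases, min_freq=0.01):
    """Classify one genomic position from the bases observed at it.

    `bases` holds one observed base per sampled chromosome (hypothetical input).
    Returns a label and the frequencies of the alleles exceeding `min_freq`.
    """
    if not bases:
        raise ValueError("no observations at this position")

    # Tally each base, ignoring letter case.
    counts = Counter(b.upper() for b in bases)
    total = sum(counts.values())

    # Keep only alleles occurring at appreciable frequency (strictly > 1% by default).
    common = {b: n / total for b, n in counts.items() if n / total > min_freq}

    labels = {1: "monomorphic", 2: "diallelic", 3: "triallelic", 4: "tetra-allelic"}
    return labels.get(len(common), "unclassified"), common


if __name__ == "__main__":
    # 97 chromosomes carry A and 3 carry G: both bases exceed 1%.
    print(classify_site(["A"] * 97 + ["G"] * 3))
    # G occurs on only 5 of 1000 chromosomes (0.5%), below the threshold.
    print(classify_site(["A"] * 995 + ["G"] * 5))
```

Under these assumptions, a position at which exactly two bases exceed the threshold is reported as diallelic, and a position at which a rare variant falls below the threshold is treated as monomorphic rather than as a SNP.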
For SNPs to be useful for locating and identifying genetic variations linked to human disease, however, it is necessary to identify and map a much larger number of SNPs, and to do so throughout the human genome. There is therefore a need for the identification and mapping of a very large number of SNPs throughout the entire human genome.