Humans differ from their closest evolutionary relatives, the non-human primates such as chimpanzees, in certain physiological and functional traits that relate to areas important to human health and well-being. For example, (1) humans have unique or enhanced brain function (e.g., cognitive skills) compared to chimpanzees; (2) humans have a longer life-span than non-human primates; (3) chimpanzees are resistant to certain infectious diseases that afflict humans, such as AIDS and hepatitis C; (4) chimpanzees appear to have a lower incidence of certain cancers than humans; (5) chimpanzees do not suffer from acne or alopecia (baldness); (6) chimpanzees have a higher ratio of muscle to fat; (7) chimpanzees are more resistant to malaria; (8) chimpanzees are less susceptible to Alzheimer's disease; and (9) chimpanzees have a lower incidence of atherosclerosis. At the present time, the genes underlying the above human/chimpanzee differences are not known, nor, more importantly, are the specific changes that have evolved in these genes to provide these capabilities. Understanding the basis of these differences between humans and our close evolutionary relatives will provide useful information for developing effective treatments for related human conditions and diseases.
Classic evolution analysis, which compares mainly the anatomic features of animals, has revealed dramatic morphological and functional differences between human and non-human primates; yet, the human genome is known to share remarkable sequence similarities with that of other primates. For example, it is generally concluded that human DNA sequence is roughly 98.5% identical to chimpanzee DNA and only slightly less similar to gorilla DNA. McConkey and Goodman (1997) TIG 13:350-351. Given the relatively small percentage of genomic difference between humans and closely related primates, it is possible, if not likely, that a relatively small number of changes in genomic sequences may be responsible for traits of interest to human health and well-being, such as those listed above. Thus, it is desirable and feasible to identify the genes underlying these traits and to glean information from the evolved changes in the proteins they encode to develop treatments that could benefit human health and well-being. Identifying and characterizing these sequence changes is crucial in order to benefit from evolutionary solutions that have eliminated or minimized diseases or that provide unique or enhanced functions.
Recent developments in the human genome project have provided a tremendous amount of information on human gene sequences. Furthermore, the structures and activities of many human genes and their protein products have been studied either directly in human cells in culture or in several animal model systems, such as the nematode, fruit fly, zebrafish and mouse. These model systems have great advantages in being relatively simple, easy to manipulate, and having short generation times. Because the basic structures and biological activities of many important genes have been conserved throughout evolution, homologous genes can be identified in many species by comparing macromolecule sequences. Information obtained from lower species on important gene products and functional domains can be used to help identify the homologous genes or functional domains in humans. For example, the homeo domain with DNA binding activity first discovered in the fruit fly Drosophila was used to identify human homologues that possess similar activities.
Although comparison of homologous genes or proteins between human and a lower model organism may provide useful information with respect to evolutionarily conserved molecular sequences and functional features, this approach is of limited use in identifying genes whose sequences have changed due to natural selection. With the advent of the development of sophisticated algorithms and analytical methods, much more information can be teased out of DNA sequence changes. The most powerful of these methods, “KA/KS”, involves pairwise comparisons between aligned protein-coding nucleotide sequences of the ratios of

$$\frac{\text{nonsynonymous nucleotide substitutions per nonsynonymous site } (K_A)}{\text{synonymous substitutions per synonymous site } (K_S)}$$

(where nonsynonymous means substitutions that change the encoded amino acid and synonymous means substitutions that do not change the encoded amino acid).
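The ratio defined above can be illustrated with a short Python sketch. It is not the procedure of any paper cited here; it is a simplified counting scheme in the general style of Nei and Gojobori, written under several assumptions: the two sequences are already aligned, gap-free, in frame and upper case; the standard genetic code applies; changes to or through stop codons are simply counted as nonsynonymous; and a Jukes-Cantor correction is used for multiple hits. All function names (`synonymous_sites`, `count_differences`, `jukes_cantor`, `ka_ks`) are hypothetical.

```python
from itertools import permutations, product
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Expected number of synonymous sites in one codon (between 0 and 3)."""
    count = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    count += 1 / 3
    return count


def count_differences(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over mutation paths."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    if not positions:
        return 0.0, 0.0
    paths = list(permutations(positions))
    syn = non = 0.0
    for path in paths:
        current = c1
        for pos in path:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            # Assumption: steps involving stop codons count as nonsynonymous.
            if CODON_TABLE[current] == CODON_TABLE[step]:
                syn += 1
            else:
                non += 1
            current = step
    return syn / len(paths), non / len(paths)


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple substitutions."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for Jukes-Cantor correction")
    return 0.0 if p == 0 else -0.75 * log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (KA, KS, KA/KS); the ratio is None when KS is zero."""
    if not seq1 or len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("expected aligned, gap-free, in-frame coding sequences")
    s_sites = n_sites = s_diffs = n_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        s_sites += s
        n_sites += 3 - s
        syn, non = count_differences(c1, c2)
        s_diffs += syn
        n_diffs += non
    ka = jukes_cantor(n_diffs / n_sites)
    ks = jukes_cantor(s_diffs / s_sites) if s_sites else 0.0
    return ka, ks, (ka / ks if ks else None)


# Toy sequences (made up): one synonymous and one nonsynonymous difference.
ka, ks, ratio = ka_ks("GCTGCTGCTGCT", "GCCGATGCTGCT")
print(f"KA={ka:.3f} KS={ks:.3f} KA/KS={ratio:.3f}")
```

In this toy pair the ratio comes out below 1. A ratio well above 1 is the pattern conventionally read as a sign of positive selection, while a ratio near 1 is what neutral change would produce. Real analyses use much longer alignments and established software rather than a sketch like this one.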
The term “KA/KS-type methods” includes this and similar methods. These methods have been used to demonstrate the occurrence of Darwinian molecular-level positive selection, resulting in amino acid differences in homologous proteins. Several groups have used such methods to document that a particular protein has evolved more rapidly than the neutral substitution rate, thus supporting the existence of Darwinian molecular-level positive selection. For example, McDonald and Kreitman (1991) Nature 351:652-654 propose a statistical test of the neutral protein evolution hypothesis based on comparison of the number of amino acid replacement substitutions to synonymous substitutions in the coding region of a locus. When they apply this test to the Adh locus of three Drosophila species, they conclude that the locus has instead undergone adaptive fixation of selectively advantageous mutations and that selective fixation of adaptive mutations may be a viable alternative to the clocklike accumulation of neutral mutations as an explanation for most protein evolution. Jenkins et al. (1995) Proc. R. Soc. Lond. B 261:203-207 use the McDonald & Kreitman test to investigate whether adaptive evolution is occurring in sequences controlling transcription (non-coding sequences).
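A test of the McDonald and Kreitman kind can be sketched as a 2x2 contingency comparison. The sketch below assumes the usual layout of such a test, which the text above does not spell out: replacement and synonymous changes are counted separately for differences fixed between species and for polymorphisms within species, and under neutrality the two replacement-to-synonymous ratios should be similar. The use of Fisher's exact test is a choice made for this sketch, the function names `fisher_exact_two_sided` and `mk_test` are hypothetical, and the counts in the usage lines are made up for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, total = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))


def mk_test(fixed_repl, fixed_syn, poly_repl, poly_syn):
    """P-value for equal replacement:synonymous ratios among fixed and polymorphic sites."""
    return fisher_exact_two_sided(fixed_repl, fixed_syn, poly_repl, poly_syn)


# Hypothetical counts, not data from any cited study.
fixed_repl, fixed_syn, poly_repl, poly_syn = 8, 16, 2, 40
p_value = mk_test(fixed_repl, fixed_syn, poly_repl, poly_syn)
print(f"fixed ratio={fixed_repl / fixed_syn:.2f} "
      f"polymorphic ratio={poly_repl / poly_syn:.2f} p={p_value:.4f}")
```

An excess of fixed replacement changes relative to polymorphic ones, together with a small p-value, is the pattern such a test interprets as adaptive fixation rather than neutral accumulation.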
Nakashima et al. (1995) Proc. Natl. Acad. Sci. USA 92:5606-5609 use the method of Miyata and Yasunaga to perform pairwise comparisons of the nucleotide sequences of ten PLA2 isozyme genes from two snake species; this method involves comparing the number of nucleotide substitutions per site for the noncoding regions including introns (KN) with KA and KS. They conclude that the protein coding regions have been evolving at much higher rates than the noncoding regions including introns. The highly accelerated substitution rate is responsible for Darwinian molecular-level evolution of PLA2 isozyme genes to produce new physiological activities that must have provided strong selective advantage for catching prey or for defense against predators. Endo et al. (1996) Mol. Biol. Evol. 13(5):685-690 use the method of Nei and Gojobori, wherein dN is the number of nonsynonymous substitutions and dS is the number of synonymous substitutions, for the purpose of identifying candidate genes on which positive selection operates. Metz and Palumbi (1996) Mol. Biol. Evol. 13(2):397-406 use the McDonald & Kreitman test as well as a method attributed to Nei and Gojobori, Nei and Jin, and Kumar, Tamura, and Nei, examining the average proportions of Pn, the replacement substitutions per replacement site, and Ps, the silent substitutions per silent site, to look for evidence of positive selection on bindin genes in sea urchins and to investigate whether they have rapidly evolved as a prelude to species formation. Goodwin et al. (1996) Mol. Biol. Evol. 13(2):346-358 use similar methods to examine the evolution of a particular murine gene family and conclude that the methods provide important fundamental insights into how selection drives genetic divergence in an experimentally manipulatable system.
Edwards et al. (1995) use degenerate primers to pull out MHC loci from various species of birds and an alligator species, which are then analyzed by the Nei and Gojobori methods (dN:dS ratios) to extend MHC studies to nonmammalian vertebrates. Whitfield et al. (1993) Nature 364:713-715 use KA/KS analysis to look for directional selection in the regions flanking a conserved region in the SRY gene (which determines male sex). They suggest that the rapid evolution of SRY could be a significant cause of reproductive isolation, leading to new species. Wettstein et al. (1996) Mol. Biol. Evol. 13(1):56-66 apply the MEGA program of Kumar, Tamura and Nei and phylogenetic analysis to investigate the diversification of MHC class I genes in squirrels and related rodents. Parham and Ohta (1996) Science 272:67-74 state that a population biology approach, including tests for selection as well as for gene conversion and neutral drift, is required to analyze the generation and maintenance of human MHC class I polymorphism. Hughes (1997) Mol. Biol. Evol. 14(1):1-5 compared over one hundred orthologous immunoglobulin C2 domains between human and rodent, using the method of Nei and Gojobori (dN:dS ratios) to test the hypothesis that proteins expressed in cells of the vertebrate immune system evolve unusually rapidly. Swanson and Vacquier (1998) Science 281:710-712 use dN:dS ratios to demonstrate concerted evolution between the lysin and the egg receptor for lysin and discuss the role of such concerted evolution in forming new species (speciation).
Due to the distant evolutionary relationships between humans and these lower animals, the adaptively valuable genetic changes fixed by natural selection are often masked by the accumulation of neutral, random mutations over time. Moreover, some proteins evolve in an episodic manner; such episodic changes could be masked, leading to inconclusive results, if the two genomes compared are not close enough. Messier and Stewart (1997) Nature 385:151-154. In fact, studies have shown that the occurrence of adaptive selection in protein evolution is often underestimated when predominantly distantly related sequences are compared. Endo et al. (1996) Mol. Biol. Evol. 37:441-456; Messier and Stewart (1997) Nature 385:151-154.
Molecular evolution studies within the primate family have been reported, but these mainly focus on the comparison of a small number of known individual genes and gene products to assess the rates and patterns of molecular changes and to explore the evolutionary mechanisms responsible for such changes. See generally, Li, Molecular Evolution, Sinauer Associates, Sunderland, Mass., 1997. Furthermore, sequence comparison data are used for phylogenetic analysis, wherein the evolutionary history of primates is reconstructed based on the relative extent of sequence similarities among examined molecules from different primates. For example, the DNA and amino acid sequence data for the enzyme lysozyme from different primates were used to study protein evolution in primates and the occurrence of adaptive selection within specific lineages. Malcolm et al. (1990) Nature 345:86-89; Messier and Stewart (1997). Other genes that have been subjected to molecular evolution studies in primates include hemoglobin, cytochrome c oxidase, and major histocompatibility complex (MHC). Nei and Hughes in: Evolution at the Molecular Level, Sinauer Associates, Sunderland, Mass. 222-247, 1991; Lienert and Parham (1996) Immunol. Cell Biol. 74:349-356; Wu et al. (1997) J. Mol. Evol. 44:477-491. Many non-coding sequences have also been used in molecular phylogenetic analysis of primates. Li, Molecular Evolution, Sinauer Associates, Sunderland, Mass. 1997. For example, the genetic distances among primate lineages were estimated from orthologous non-coding nucleotide sequences of beta-type globin loci and their flanking regions, and the evolutionary tree constructed for the nucleotide sequence orthologues depicted a branching pattern that is largely congruent with the picture from phylogenetic analyses of morphological characters. Goodman et al. (1990) J. Mol. Evol. 30:260-266.
Zhou and Li (1996) Mol. Biol. Evol. 13(6):780-783 applied KA/KS analysis to primate genes. It had previously been reported that gene conversion events likely have occurred in introns 2 and 4 between the red and green retinal pigment genes during human evolution. However, intron 4 sequences of the red and green retinal pigment genes from one European human were completely identical, suggesting a recent gene conversion event. In order to determine if the gene conversion event occurred in that individual, or a common ancestor of Europeans, or an even earlier hominid ancestor, the authors sequenced intron 4 of the red and green pigment gene from a male Asian human, a male chimpanzee, and a male baboon, and applied KA/KS analysis. They observed that the divergence between the two genes is significantly lower in intron 4 than in surrounding exons, suggesting that strong natural selection has acted against sequence homogenization.
Wolinsky et al. (1996) Science 272:537-542 used comparisons of nonsynonymous to synonymous base substitutions to demonstrate that HIV itself (i.e., not the host species) is subject to adaptive evolution within individual human patients. Their goal was simply to document the occurrence of positive selection in a short time frame (that of a human patient's course of disease). Niewiesk and Bangham (1996) J. Mol. Evol. 42:452-458 used the dN/dS approach to ask a related question about the HTLV-1 virus, i.e., what are the selective forces acting on the virus itself. Perhaps because of an insufficient sample size, they were unable to resolve the nature of the selective forces. In both of these cases, although KA/KS-type methods were used in relation to a human virus, no attempt was made to use these methods for therapeutic goals (as in the present application), but rather to pursue narrow academic goals.
As can be seen from the papers cited above, analytical methods of molecular evolution to identify rapidly evolving genes (KA/KS-type methods) can be applied to achieve many different purposes, most commonly to confirm the existence of Darwinian molecular-level positive selection, but also to assess the frequency of Darwinian molecular-level positive selection, to understand phylogenetic relationships, to elucidate mechanisms by which new species are formed, or to establish single or multiple origin for specific gene polymorphisms. What is clear from the papers cited above and others in the literature is that none of the authors applied KA/KS-type methods to identify evolutionary solutions, that is, specific evolved changes, that could be mimicked or used in the development of treatments to prevent or cure human conditions or diseases or to modulate unique or enhanced human functions. They have not used KA/KS-type analysis as a systematic tool for identifying human or non-human primate genes that contain evolutionarily significant sequence changes and exploiting such genes and the identified changes in the development of treatments for human conditions or diseases.
The identification of human genes that have evolved to confer unique or enhanced human functions compared to homologous chimpanzee genes could be applied to developing agents to modulate these unique human functions or to restore function when the gene is defective. The identification of the underlying chimpanzee (or other non-human primate) genes and the specific nucleotide changes that have evolved, and the further characterization of the physical and biochemical changes in the proteins encoded by these evolved genes, could provide valuable information, for example, on what determines susceptibility and resistance to infectious viruses, such as HIV and HCV, what determines susceptibility or resistance to the development of certain cancers, what determines susceptibility or resistance to acne, how hair growth can be controlled, and how to control the formation of muscle versus fat. This valuable information could be applied to developing agents that cause the human proteins to behave more like their chimpanzee homologues.
All references cited herein are hereby incorporated by reference in their entirety.