The prolific output from numerous genomic sequencing efforts, including the Human Genome Project, is creating an ever-expanding foundation for large-scale study of protein function. Indeed, this emerging field of proteomics can appropriately be viewed as a bridge that connects DNA sequence information to the physiology and pathology of intact organisms. As such, proteomics, the large-scale study of protein function, will likely be the starting point for the development of many future pharmaceuticals. The efficiency of drug development will therefore depend on the diversity and robustness of the methods used to elucidate protein function, i.e., the proteomic tools that are available.
Several approaches are generally known in the art for studying protein function. One method is to analyze the DNA sequence of a particular gene and the amino acid sequence coded by the gene in the context of sequences of genes with known functions. Generally, similar functions can be predicted based on sequence homologies. This “homology method” has been widely used, and powerful computer programs have been designed to facilitate homology analysis. See, e.g., Altschul et al., Nucleic Acids Res., 25:3389–3402 (1997). However, this method is useful only when the function of a homologous protein is known.
Another useful approach is to interfere with the expression of a particular gene in a cell or organism and examine the consequent phenotypic effects. For example, Fire et al., Nature, 391:806–811 (1998) disclose an “RNA interference” assay in which double-stranded RNA transcripts corresponding to a particular target gene are injected into cells or organisms to determine the phenotype associated with the disrupted expression of that gene. Alternatively, transgenic technologies can be utilized to delete or “knock out” a particular gene in an organism, and the effect of the gene knockout can then be determined. See, e.g., Winzeler et al., Science, 285:901–906 (1999); Zambrowicz et al., Nature, 392:608–611 (1998). The phenotypic effects resulting from the disruption of expression of a particular gene can shed some light on the functions of the gene. However, the techniques involved are complex and the time required for a phenotype to appear can be long, especially in mammals. In addition, in many cases disruption of a particular gene may not cause any detectable phenotypic effect.
Gene functions can also be uncovered by genetic linkage analysis. For example, genes responsible for certain diseases may be identified by positional cloning. Alternatively, gene function may be inferred by comparing genetic variations among individuals in a population and correlating particular phenotypes with the genetic variations. Such linkage analyses are powerful tools, particularly when genetic variations exist in a traceable population from which samples are readily obtainable. However, readily identifiable genetic diseases are rare and samples from a large population with genetic variations are not easily accessible. In addition, it is also possible that a gene identified in a linkage analysis does not contribute to the associated disease or symptom but rather is simply linked to unknown genetic variations that cause the phenotypic defects.
With the advance of bioinformatics and publication of the full genome sequences of many organisms, computational methods have also been developed to assign protein functions by comparative genome analysis. For example, Pellegrini et al., Proc. Natl. Acad. Sci. USA 96:4285–4288 (1999) disclose a method that constructs a “phylogenetic profile” that summarizes the presence or absence of a particular protein across a number of organisms as determined by analyzing the genome sequences of the organisms. A protein's function is predicted to be linked to another protein's function if the two proteins share the same phylogenetic profile. Another method, the Rosetta Stone method, is based on the theory that separate proteins in one organism are often expressed as separate domains of a fusion protein in another organism. Because the separate domains in the fusion protein are predictably associated with the same function, it can be reasonably predicted that the separate proteins are associated with the same functions. Therefore, by discovering separate proteins corresponding to a fusion protein, i.e., the “Rosetta Stone sequence,” functional linkage between proteins can be established. See Marcotte et al., Science, 285:751–753 (1999); Enright et al., Nature, 402:86–90 (1999). Another computational method is the “gene neighbor method.” See Dandekar et al., Trends Biochem. Sci., 23:324–328 (1998); Overbeek et al., Proc. Natl. Acad. Sci. USA 96:2896–2901 (1999). This method is based on the likelihood that if two genes are found to be neighbors in several different genomes, the proteins encoded by the genes share a common function.
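By way of illustration only, the phylogenetic profile idea described above can be sketched in a few lines of Python. The sketch is not the implementation of Pellegrini et al.; it assumes that presence or absence of each protein in each genome has already been determined, and the organism names, protein names, and function names (`build_profiles`, `group_by_profile`) are hypothetical placeholders.

```python
from collections import defaultdict


def build_profiles(presence):
    """Build a presence/absence vector for every protein.

    `presence` maps an organism name to the set of proteins found in
    its genome (assumed to have been determined beforehand).
    """
    organisms = sorted(presence)  # fixed column order for every profile
    proteins = set().union(*presence.values())
    return {
        protein: tuple(int(protein in presence[org]) for org in organisms)
        for protein in proteins
    }


def group_by_profile(profiles):
    """Return groups of proteins that share an identical profile."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    # Only groups with two or more members suggest a functional link.
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Hypothetical toy data: four genomes and four proteins.
presence = {
    "org_A": {"P1", "P2", "P3"},
    "org_B": {"P1", "P2"},
    "org_C": {"P3", "P4"},
    "org_D": {"P1", "P2", "P4"},
}

profiles = build_profiles(presence)
linked = group_by_profile(profiles)
print(linked)
```

In this toy data set, P1 and P2 are present in exactly the same genomes and are therefore the only pair predicted to be functionally linked. A practical analysis would likely tolerate near-identical profiles rather than require an exact match; exact matching is used here only to keep the sketch short.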
While the methods described above are useful in analyzing protein functions, they are constrained by various practical limitations such as unavailability of suitable samples, inefficient assay procedures, and limited reliability. The computational methods are useful in linking proteins by function. However, they are only applicable to certain proteins, and the linkage maps established therewith are sketchy. That is, the maps lack specific information that describes how proteins function in relation to each other within the functional network. Indeed, none of the methods places the identified protein functions in the context of protein-protein interactions.
In contrast with the traditional view of protein function, which focuses on the action of a single protein molecule, a modern expanded view of protein function defines a protein as an element in an interaction network. See Eisenberg et al., Nature, 405:823–826 (2000). That is, a full understanding of the functions of a protein will require knowledge of not only the characteristics of the protein itself, but also its interactions or connections with other proteins in the same interacting network. In essence, protein-protein interactions form the basis of almost all biological processes, and each biological process is composed of a network of interacting proteins. For example, cellular structures such as cytoskeletons, nuclear pores, centrosomes, and kinetochores are formed by complex interactions among a multitude of proteins. Many enzymatic reactions are associated with large protein complexes formed by interactions among enzymes, protein substrates, and protein modulators. In addition, protein-protein interactions are also part of the mechanisms for signal transduction and other basic cellular functions such as DNA replication, transcription, and translation. For example, the complex transcription initiation process generally requires protein-protein interactions among numerous transcription factors, RNA polymerase, and other proteins. See, e.g., Tjian and Maniatis, Cell, 77:5–8 (1994).
Because most proteins function through their interactions with other proteins, if a test protein interacts with a known protein, one can reasonably predict that the test protein is associated with the functions of the known protein, e.g., in the same cellular structure or same cellular process as the known protein. Thus, interaction partners can provide an immediate and reliable understanding of the functions of the interacting proteins. By identifying interacting proteins, a better understanding of disease pathways and the cellular processes that result in diseases may be achieved, and important regulators and potential drug targets in disease pathways can be identified.
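The inference described above, in which a test protein is assigned the functions of its known interaction partners, can be sketched as follows. This is a minimal illustration under stated assumptions, not a method disclosed in the cited references: the interaction list, the annotations, the protein names, and the function name `predict_functions` are all hypothetical, and ranking by a simple count of supporting partners is only one possible scoring choice.

```python
from collections import Counter


def predict_functions(test_protein, interactions, annotations):
    """Rank candidate functions for `test_protein`.

    `interactions` is a list of (protein_a, protein_b) pairs and
    `annotations` maps a known protein to a set of function labels.
    Each candidate function is scored by the number of annotated
    interaction partners that carry it.
    """
    partners = set()
    for a, b in interactions:
        if a == test_protein:
            partners.add(b)
        elif b == test_protein:
            partners.add(a)

    counts = Counter()
    for partner in partners:
        for function in annotations.get(partner, ()):
            counts[function] += 1

    # Highest support first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy data.
interactions = [
    ("test_X", "known_1"),
    ("test_X", "known_2"),
    ("known_3", "test_X"),
    ("other_Y", "known_1"),
]
annotations = {
    "known_1": {"cell cycle"},
    "known_2": {"cell cycle"},
    "known_3": {"cell cycle", "transcription"},
}

predictions = predict_functions("test_X", interactions, annotations)
print(predictions)
```

Here all three partners of the hypothetical test protein are annotated with the cell cycle, so that function ranks first, while transcription is supported by only one partner.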
There has been much interest in protein-protein interactions in the field of proteomics. A number of biochemical approaches have been used to identify interacting proteins. These approaches generally employ the affinities between interacting proteins to isolate proteins in a bound state. Examples of such methods include coimmunoprecipitation and copurification, optionally combined with cross-linking to stabilize the binding. Identities of the isolated protein interacting partners can be characterized by, e.g., mass spectrometry. See, e.g., Rout et al., J. Cell. Biol., 148:635–651 (2000); Houry et al., Nature, 402:147–154 (1999); Winter et al., Curr. Biol., 7:517–529 (1997). A popular approach useful in large-scale screening is the phage display method, in which filamentous bacteriophage particles are made by recombinant DNA technologies to express a peptide or protein of interest fused to a capsid or coat protein of the bacteriophage. A whole library of peptides or proteins of interest can be expressed, and a bait protein can be used to screen the library to identify peptides or proteins capable of binding to the bait protein. See, e.g., U.S. Pat. Nos. 5,223,409; 5,403,484; 5,571,698; and 5,837,500. Notably, the phage display method only identifies those proteins capable of interacting in an in vitro environment, while the coimmunoprecipitation and copurification methods are not amenable to high throughput screening.
The yeast two-hybrid system is a genetic method that overcomes certain shortcomings of the above approaches. The yeast two-hybrid system has proven to be a powerful method for the discovery of specific protein interactions in vivo. See generally, Bartel and Fields, eds., The Yeast Two-Hybrid System, Oxford University Press, New York, N.Y., 1997. The yeast two-hybrid technique is based on the fact that the DNA-binding domain and the transcriptional activation domain of a transcriptional activator contained in different fusion proteins can still activate gene transcription when they are brought into proximity to each other. In a yeast two-hybrid system, two fusion proteins are expressed in yeast cells. One has a DNA-binding domain of a transcriptional activator fused to a test protein. The other includes a transcriptional activation domain of the transcriptional activator fused to another test protein. If the two test proteins interact with each other in vivo, the two domains of the transcriptional activator are brought together, reconstituting the transcriptional activator and activating a reporter gene controlled by the transcriptional activator. See, e.g., U.S. Pat. No. 5,283,173.
Because of its simplicity, efficiency and reliability, the yeast two-hybrid system has gained tremendous popularity in many areas of research. In addition, because yeast cells are eukaryotic cells, the interactions between mammalian proteins detected in the yeast two-hybrid system typically are bona fide interactions that occur in mammalian cells under physiological conditions. As a matter of fact, numerous mammalian protein-protein interactions have been identified using the yeast two-hybrid system. The identified proteins have contributed significantly to the understanding of many signal transduction pathways and other biological processes. For example, the yeast two-hybrid system has been successfully employed in identifying a large number of novel mammalian cell cycle regulators that are important in complex cell cycle regulation. Using known proteins that are important in cell cycle regulation as baits, other proteins involved in cell cycle control were identified by virtue of their ability to interact with the baits. See generally, Hannon et al., in The Yeast Two-Hybrid System, Bartel and Fields, eds., pages 183–196, Oxford University Press, New York, N.Y., 1997. Examples of mammalian cell cycle regulators identified by the yeast two-hybrid system include CDK4/CDK6 inhibitors (e.g., p16, p15, p18 and p19), Rb family members (e.g., p130), Rb phosphatase (e.g., PP1-α2), Rb-binding transcription factors (e.g., E2F-4 and E2F-5), general CDK inhibitors (e.g., p21 and p27), CAK cyclin (e.g., cyclin H), and CDK Thr161 phosphatase (e.g., KAP and CDI1). See id. at page 192. “[T]he two-hybrid approach promises to be a useful tool in our ongoing quest for new pieces of the cell cycle puzzle.” See id. at page 193.
The yeast two-hybrid system can be employed to identify proteins that interact with a specific known protein involved in a disease pathway, and thus provide a valuable understanding of the disease mechanism. The identified proteins and the protein-protein interactions in which they participate are potential targets for use in identifying new drugs for treating the disease.