The present invention is directed to polypeptides having a functional domain of interest or functional equivalents thereof. Methods of identifying these polypeptides are described, along with various methods of their use, including but not limited to targeted drug discovery.
Combinatorial libraries represent exciting new tools in basic science research and drug design. It is possible through synthetic chemistry or molecular biology to generate libraries of complex polymers, with many subunit permutations. There are many guises to these libraries: random peptides, which can be synthesized on plastic pins (Geysen et al., 1987, J. Immunol. Meth. 102:259-274), beads (Lam et al., 1991, Nature 354:82-84) or in a soluble form (Houghten et al., 1991, Nature 354:84-86) or expressed on the surface of viral particles (Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA 87:6378-6382; Kay et al., 1993, Gene 128:59-65; Scott and Smith, 1990, Science 249:386-390); nucleic acids (Ellington and Szostak, 1990, Nature 346:818-822; Gao et al., 1994, Proc. Natl. Acad. Sci. USA 91:11207-11211; Tuerk and Gold, 1990, Science 249:505-510); and small organic molecules (Gordon et al., 1994, J. Med. Chem. 37:1385-1401). These libraries are very useful in mapping protein-protein interactions and discovering drugs.
Phage display has become a powerful method for screening populations of peptides, mutagenized proteins, and cDNAs for members that have affinity to target molecules of interest. It is possible to generate 10^8-10^9 different recombinants from which one or more clones can be selected with affinity to antigens, antibodies, cell surface receptors, protein chaperones, DNA, metal ions, etc. Screening libraries is versatile because the displayed elements are expressed on the surface of the virus as capsid-fusion proteins. The most important consequence of this arrangement is that there is a physical linkage between phenotype and genotype. There are several other advantages as well: 1) virus particles which have been isolated from libraries by affinity selection can be regenerated by simple bacterial infection, and 2) the primary structure of the displayed binding peptide or protein can be easily deduced by DNA sequencing of the cloned segment in the viral genome.
Combinatorial peptide libraries have been expressed in bacteriophage. Synthetic oligonucleotides, fixed in length but with multiple unspecified codons, can be cloned into genes III, VI, or VIII of bacteriophage M13, where they are expressed as a plurality of peptide:capsid fusion proteins. The libraries, often referred to as random peptide libraries, can be screened for binding to target molecules of interest. Usually, three to four rounds of screening can be accomplished in a week's time, leading to the isolation of one to hundreds of binding phage.
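By way of illustration only, the relationship between peptide length and library size can be estimated with simple arithmetic. The sketch below assumes 20 possible amino acids at each unspecified position and treats the library size as an upper bound on the number of distinct sequences present; it ignores codon bias, stop codons, and duplicate clones. The function names are hypothetical and are not part of the disclosure.

```python
def peptide_diversity(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct peptides of a given length (assumes 20 amino acids)."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


def library_coverage(length: int, library_size: float) -> float:
    """Upper bound on the fraction of all possible peptides a library can hold.

    Assumes every recombinant encodes a different peptide, which real
    libraries will not achieve; the true coverage is therefore lower.
    """
    return min(1.0, library_size / peptide_diversity(length))


if __name__ == "__main__":
    # Compare short peptides against a library of 10^9 recombinants.
    for n in (6, 7, 8, 10):
        print(n, peptide_diversity(n), library_coverage(n, 1e9))
```

Under these assumptions, a library of 10^9 recombinants could in principle contain every hexamer, but only a fraction of the possible sequences once the random segment reaches seven or more residues.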
The primary structure of the binding peptides is then deduced by nucleotide sequencing of individual clones. Inspection of the peptide sequences sometimes reveals a common motif, or consensus sequence. Generally, this motif when synthesized as a soluble peptide has the full binding activity. Random peptide libraries have successfully yielded peptides that bind to the Fab site of antibodies (Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA 87:6378-6382; Scott and Smith, 1990, Science 249:386-390), cell surface receptors (Doorbar and Winter, 1994, J. Mol. Biol. 244:361-369; Goodson et al., 1994, Proc. Natl. Acad. Sci. USA 91:7129-7133), cytosolic receptors (Blond-Elguindi et al., 1993, Cell 75:717-728), intracellular proteins (Daniels and Lane, 1994, J. Mol. Biol. 243:639-652; Dedman et al., 1993, J. Biol. Chem. 268:23025-23030; Sparks et al., 1994, J. Biol. Chem. 269:23853-23856), DNA (Krook et al., 1994, Biochem. Biophys. Res. Comm. 204:849-854), and many other targets (Winter, 1994, Drug Dev. Res. 33:71-89).
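For purposes of illustration, the inspection of selected peptide sequences for a common motif can be sketched as a position-by-position majority vote. The sketch assumes the peptides are of equal length and already aligned, which is a simplification; the function name, the `min_fraction` threshold, and the use of "x" for a non-conserved position are hypothetical choices made here, not features of the cited studies.

```python
from collections import Counter
from typing import Sequence


def consensus(peptides: Sequence[str], min_fraction: float = 0.5) -> str:
    """Return a consensus string for equal-length, pre-aligned peptides.

    A position is reported as its most common residue when that residue
    occurs in at least `min_fraction` of the peptides, otherwise as "x".
    """
    if not peptides:
        raise ValueError("at least one peptide is required")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must all have the same length")

    motif = []
    for column in zip(*peptides):
        residue, count = Counter(column).most_common(1)[0]
        motif.append(residue if count / len(peptides) >= min_fraction else "x")
    return "".join(motif)


if __name__ == "__main__":
    # Invented sequences used only to exercise the function.
    print(consensus(["AAA", "ABA", "ACA"]))
```

In practice, selected clones often differ in length and register, so an alignment step would be needed before any such vote is taken.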
Most vital cellular processes are regulated by the transmission of signals throughout the cell in the form of complex interactions between proteins. As the study of signal transduction, or the flow of information throughout the cell, has broadened and matured, it has become apparent that these protein-protein interactions are often mediated by modular domains within signalling proteins. Src, both the first proto-oncogene product and the first tyrosine kinase discovered (Taylor and Shalloway, 1993, Current Opinion in Genetics and Development 3:26-34), is the prototypic modular domain-containing protein.
Src is a protein tyrosine kinase of 60 kilodaltons and is located at the plasma membrane of cells. It was first discovered in the 1970s to be the oncogenic element of Rous sarcoma virus, and in the 1980s, it was appreciated to be a component of the signal transduction system in animal cells. However, since the identification of viral and cellular forms of Src (i.e., v-Src and c-Src), their respective roles in oncogenesis, normal cell growth, and differentiation have not been completely understood.
In addition to its tyrosine kinase region (sometimes called a Src Homology 1 domain), Src contains two regions that have been found to have functionally and structurally homologous counterparts in a large number of proteins. These regions have been designated the Src Homology 2 (SH2) and Src Homology 3 (SH3) domains. SH2 and SH3 domains are modular in that they fold independently of the protein that contains them, their secondary structure places N- and C-termini close to one another in space, and they appear at variable locations (anywhere from N- to C-terminal) from one protein to the next (Cohen et al., 1995, Cell 80:237-248). SH2 domains have been well-studied and are known to be involved in binding to phosphorylated tyrosine residues (Pawson and Gish, 1992, Cell 71:359-362).
The Src-homology region 3 (SH3) of Src is a domain that is 60-70 amino acids in length and is present in many cellular proteins (Cohen et al., 1995, Cell 80:237-248; Pawson, 1995, Nature 373:573-580). Within Src, the SH3 domain is considered to be a negative regulatory domain, because c-Src can be activated (i.e., rendered transforming) through mutations in this domain (Jackson et al., 1993, Oncogene 8:1943-1956; Seidel-Dugan et al., 1992, Mol Cell Biol 12:1835-1845).
To deduce the binding specificity of the Abl SH3 domain, a group led by David Baltimore screened cDNA libraries with radiolabeled GST-Abl SH3 fusion protein and identified two binding cDNA clones (Cicchetti et al., 1992, Science 257:803-806). Both clones encoded proteins with proline rich regions that were later shown to be SH3 binding domains.
Subsequently, others have screened combinatorial peptide libraries and identified peptides that bound to the Src SH3 domain (Yu et al., 1994, Cell 76:933-945; Cheadle et al., 1994, J. Biol. Chem. 269:24034-24039). Using the SH3 domain of Src, Sparks et al., 1994, J. Biol. Chem. 269:23853-23856 screened phage-display random peptide libraries and identified a consensus peptide sequence that binds with specificity and high affinity to the Src SH3 domain.
The consensus from these various studies is that the optimal Src SH3 peptide ligand is RPLPPLP (SEQ ID NO:45). Recently, the structures of the peptide-SH3 domain complexes have been deduced by NMR and the peptides have been shown to bind in two possible orientations with respect to the SH3 domain (Feng et al., 1994, Science 266:1241-1247; Lim et al., 1994, Nature 372:375-379).
Since SH3 domains have been found to have such important roles in the function of crucial signalling and structural elements in the cell, a method of identifying proteins containing SH3 regions is of great interest. In this regard, it is important to note that such a method is unavailable because of the low sequence similarity of modular functional domains, including SH3. See, e.g., FIG. 6, which illustrates the minimal primary sequence homology among various known SH3 domains.
Sequence homology searches can potentially identify known proteins containing not yet recognized functional domains of interest; however, sequence homology generally needs to be greater than 40% for this procedure to be successful. Functional domains generally are less than 40% homologous, and therefore many would be missed in a sequence homology search. In addition, homology searches do not identify novel proteins; they only identify proteins already defined by nucleotide or amino acid sequence and present in the database.
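The limitation described above can be illustrated with a minimal sketch that scores two pre-aligned sequences and applies the 40% cutoff. It assumes that percent identity over ungapped aligned positions serves as a stand-in for "sequence homology", which is a simplification; the function names and the gap character "-" are hypothetical conventions adopted here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def detectable_by_homology(a: str, b: str, threshold: float = 40.0) -> bool:
    """True when identity exceeds the threshold (40% per the text above)."""
    return percent_identity(a, b) > threshold


if __name__ == "__main__":
    # Invented aligned strings used only to exercise the functions.
    print(percent_identity("ABCDE", "ABXYZ"))
    print(detectable_by_homology("ABCDE", "ABXYZ"))
```

A pair of domains sharing a binding function but scoring at or below the cutoff would be passed over by such a filter, which is the gap the present method is intended to address.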
Another approach is to use hybridization techniques using nucleotide probes to search expression libraries for novel proteins. This method would have limited applicability to finding novel proteins containing functional domains due to the low sequence homology of the functional domains.
Methods for isolating partner proteins involved in protein-protein interactions have generally focused on finding a ligand to a protein that has been found and characterized. Such approaches have included using anti-idiotypic antibodies that mimic the known protein to screen cDNA expression libraries for a binding ligand (Jerne, 1974, Ann. Immunol. (Inst. Pasteur) 125c:373-389; Sudol, 1994, Oncogene 9:2145-2152). Skolnick et al., 1991, Cell 65:83-90 isolated a binding partner for PI3-kinase by screening a cDNA expression library with the 32P-labeled tyrosine phosphorylated carboxyl terminus of the epidermal growth factor receptor (EGFR).
An easy method for isolating operationally defined ligands involved in protein-protein interactions and for optimally identifying an exhaustive set of modular domain-containing proteins implicated in binding with the ligands would be highly desirable.
Moreover, if such a method were available, it would be useful for the isolation of any polypeptide having a functioning version of any functional domain of interest. Such a general method would be of tremendous utility in that whole families of related proteins, each with its own version of the functional domain of interest, could be identified. Knowledge of such related proteins would contribute greatly to our understanding of various physiological processes, including cell growth or death, malignancy, and immune reactions, to name a few. Such a method would also contribute to the development of increasingly more effective therapeutic, diagnostic, or prophylactic agents having fewer side effects.
According to the present invention, just such a method is provided.
Regarding SH3 domain-containing proteins, the method of the present invention will contribute greatly to our understanding of cell growth (Zhu et al., 1993, J. Biol. Chem. 268:1775-1779; Taylor and Shalloway, 1994, Nature 368:867-871), malignancy (Wages et al., 1992, J. Virol. 66:1866-1874; Bruton and Workman, 1993, Cancer Chemother. Pharmacol. 32:1-19), subcellular localization of proteins to the cytoskeleton and/or cellular membranes (Weng et al., 1993, J. Biol. Chem. 268:14956-14963; Bar-Sagi et al., 1993, Cell 74:83-91), signal transduction (Duchesne et al., 1993, Science 259:525-528), cell morphology (Wages et al., 1992, J. Virol. 66:1866-1874; McGlade et al., 1993, EMBO J. 12:3073-3081), neuronal differentiation (Tanaka et al., 1993, Mol. Cell. Biol. 13:4409-4415), T cell activation (Reynolds et al., 1992, Oncogene 7:1949-1955), and cellular oxidase activity (McAdara and Babior, 1993, Blood 82:A28).
Citation of a reference hereinabove shall not be construed as an admission that such is prior art to the present invention.
In general, the present invention is directed to a method of using isolated, operationally defined ligands involved in binding interactions for optimally identifying an exhaustive set of compounds binding to such ligands. In one embodiment, the isolated ligands are peptides involved in specific protein-protein interactions and are used to identify a set of novel modular domain-containing proteins that bind to the ligands. Using this method, proteins sharing only modest similarities but a common function can be found.
The present invention is directed to a method of identifying a polypeptide or family of polypeptides having a functional domain of interest. The basic steps of the method comprise: (a) choosing a recognition unit or set of recognition units having a selective affinity for a target molecule with a functional domain of interest; (b) contacting the recognition unit with a plurality of polypeptides; and (c) identifying a polypeptide having a selective binding affinity for the recognition unit, which polypeptide includes the functional domain of interest or a functional equivalent thereof.
In one particular embodiment of the invention, exhaustive screening of proteins having a desired functional domain involves an iterative process by which ligands or recognition units for SH3 domains identified in the first round of screening are used to detect SH3 domain-containing proteins in successive expression library screens.
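The iterative process described above can be sketched, in a non-limiting way, as a loop that alternates between screening a library with the current recognition units and deriving further recognition units from the newly found polypeptides. The callables `binds` and `derive_units` are hypothetical placeholders for laboratory steps (an expression library screen and a ligand selection, respectively), and the `max_rounds` limit is an assumption added so that the loop always terminates.

```python
from typing import Callable, Hashable, Iterable, Set


def iterative_screen(
    seed_units: Iterable[Hashable],
    library: Iterable[Hashable],
    binds: Callable[[Hashable, Hashable], bool],
    derive_units: Callable[[Hashable], Iterable[Hashable]],
    max_rounds: int = 5,
) -> Set[Hashable]:
    """Repeat library screens until no new polypeptides are found.

    `binds(unit, polypeptide)` stands in for a binding assay, and
    `derive_units(polypeptide)` stands in for selecting new recognition
    units against a newly identified polypeptide.
    """
    library = list(library)
    units = set(seed_units)
    found: Set[Hashable] = set()
    for _ in range(max_rounds):
        new_hits = {
            p for p in library
            if p not in found and any(binds(u, p) for u in units)
        }
        if not new_hits:
            break  # the screen has stopped yielding new polypeptides
        found |= new_hits
        for p in new_hits:
            units |= set(derive_units(p))
    return found


if __name__ == "__main__":
    # Toy data: unit "u1" binds P1 and P2; screening P1 yields unit "u2",
    # which in turn binds P3. P4 is bound by nothing.
    table = {"u1": {"P1", "P2"}, "u2": {"P3"}}
    derived = {"P1": ["u2"]}
    hits = iterative_screen(
        ["u1"],
        ["P1", "P2", "P3", "P4"],
        binds=lambda u, p: p in table.get(u, set()),
        derive_units=lambda p: derived.get(p, []),
    )
    print(sorted(hits))
```

The toy data show why successive rounds can matter: a polypeptide that the seed recognition unit does not bind may still be recovered through a recognition unit derived from an earlier hit.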
More particularly, the method of the present invention includes choosing a recognition unit having a selective affinity for a target molecule with a functional domain of interest. With this recognition unit (particularly under the multivalent recognition unit screening conditions taught by the present invention), it has further been discovered that a plurality of polypeptides from various sources can be examined such that certain polypeptides having a selective binding affinity for the recognition unit can be identified. The polypeptides so identified have been shown to include the functional domain of interest; that is, the functional domains found are working versions that are capable of displaying the same binding specificity as the functional domain of interest. Hence, the polypeptides identified by the present method also possess those attributes of the functional domain of interest which allow these related polypeptides to exhibit the same, similar, or analogous (but functionally equivalent) selective affinity characteristics as the domain of interest of the initial target molecule. By screening the plurality of peptides for recognition unit binding, the methods of the present invention circumvent the limitations of conventional DNA-based screening methods and allow for the identification of highly disparate protein sequences possessing functionally equivalent functional domains.
In specific embodiments of the present invention, the plurality of polypeptides is obtained from the proteins present in a cDNA expression library. The specificity of the polypeptides which bear the functional domain of interest or a functional equivalent thereof for various peptides or recognition units can subsequently be examined, allowing for a greater understanding of the physiological role of particular polypeptide/recognition unit interactions. Indeed, the present invention provides a method of targeted drug discovery based on the observed effects of a given drug candidate on the interaction between a recognition unit-polypeptide pair or a recognition unit and a “panel” of related polypeptides each with a copy or a functional equivalent of (e.g., capable of displaying the same binding specificity and thus binding to the same recognition unit as) the functional domain of interest.
The present invention also provides polypeptides comprising certain amino acid sequences. Moreover, the present invention also provides nucleic acids, including certain DNA constructs comprising certain coding sequences. Using the methods of the present invention, more than eighteen different SH3 domain-containing proteins have been identified, over half of which have not been previously described.
The present inventors have found, unexpectedly, that the valency (i.e., whether it is a monomer, dimer, tetramer, etc.) of the recognition unit that is used to screen an expression library or other source of polypeptides apparently has a marked effect upon the specificity of the recognition unit-functional domain interaction. The present inventors have discovered that recognition units in the form of small peptides, in multivalent form, have a specificity that is eased but not forfeited. In particular, biotinylated peptides bound to a multivalent (believed to be tetravalent) streptavidin-alkaline phosphatase complex have an unexpected generic specificity. This allows such peptides to be used to screen libraries to identify classes of polypeptides containing functional domains that are similar but not identical in sequence to the peptides' original target functional domains.
The present invention also provides methods for identifying potential new drug candidates (and potential lead compounds) and determining the specificities thereof. For example, knowing that a polypeptide with a functional domain of interest and a recognition unit, e.g., a binding peptide, exhibit a selective affinity for each other, one may attempt to identify a drug that can exert an effect on the polypeptide-recognition unit interaction, e.g., either as an agonist or as an antagonist (inhibitor) of the interaction. With this assay, then, one can screen a collection of candidate “drugs” for the one exhibiting the most desired characteristic, e.g., the most efficacious in disrupting the interaction or in competing with the recognition unit for binding to the polypeptide.
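As a non-limiting illustration of how candidates from such an assay might be compared, the sketch below ranks candidates by percent inhibition of a binding signal. It assumes that each candidate yields a single numeric signal measured under the same conditions as an untreated control; the function names and the choice of percent inhibition as the ranking metric are hypothetical and are not prescribed by the present disclosure.

```python
from typing import Dict, List, Tuple


def percent_inhibition(control_signal: float, treated_signal: float) -> float:
    """Percent reduction of the binding signal relative to an untreated control."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * (control_signal - treated_signal) / control_signal


def rank_candidates(
    control_signal: float, treated_signals: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Sort candidates from most to least inhibitory."""
    scored = [
        (name, percent_inhibition(control_signal, signal))
        for name, signal in treated_signals.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented readings used only to exercise the functions.
    for name, score in rank_candidates(100.0, {"A": 80.0, "B": 25.0, "C": 100.0}):
        print(name, score)
```

A negative value under this metric would indicate a candidate that enhances rather than disrupts the interaction, which corresponds to the agonist case mentioned above.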
In addition, the present invention also provides certain assay kits and methods of using these assay kits for screening drug candidates for their ability to affect the binding of a polypeptide containing a functional domain to a recognition unit. In a particular aspect of the present invention, the assay kit comprises: (a) a polypeptide containing a functional domain of interest; and (b) a recognition unit having a selective binding affinity for the polypeptide. Yet another assay kit may comprise a plurality of polypeptides, each polypeptide containing a functional domain of interest, in which the functional domain of interest is a domain selected from the group consisting of an SH1, SH2, SH3, PH, PTB, LIM, armadillo, Notch/ankyrin repeat, zinc finger, leucine zipper, and helix-turn-helix, and at least one recognition unit having a selective affinity for each of the plurality of polypeptides.
Other objects of the present invention will be apparent to those of ordinary skill upon further consideration of the following detailed description.