Combinatorial libraries represent exciting new tools in basic science research and drug design. It is possible through synthetic chemistry or molecular biology to generate libraries of complex polymers, with many subunit permutations. There are many guises to these libraries: random peptides, which can be synthesized on plastic pins (Geysen et al., 1987, J. Immunol. Meth. 102:259-274), beads (Lam et al., 1991, Nature 354:82-84) or in a soluble form (Houghten et al., 1991, Nature 354:84-86) or expressed on the surface of viral particles (Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA 87:6378-6382; Kay et al., 1993, Gene 128:59-65; Scott and Smith, 1990, Science 249:386-390); nucleic acids (Ellington and Szostak, 1990, Nature 346:818-822; Gao et al., 1994, Proc. Natl. Acad. Sci. USA 91:11207-11211; Tuerk and Gold, 1990, Science 249:505-510); and small organic molecules (Gordon et al., 1994, J. Med. Chem. 37:1385-1401). These libraries are very useful in mapping protein-protein interactions and discovering drugs.
Phage display has become a powerful method for screening populations of peptides, mutagenized proteins, and cDNAs for members that have affinity to target molecules of interest. It is possible to generate 10^8 to 10^9 different recombinants from which one or more clones can be selected with affinity to antigens, antibodies, cell surface receptors, protein chaperones, DNA, metal ions, etc. Screening libraries is versatile because the displayed elements are expressed on the surface of the virus as capsid-fusion proteins. The most important consequence of this arrangement is that there is a physical linkage between phenotype and genotype. There are several other advantages as well: 1) virus particles which have been isolated from libraries by affinity selection can be regenerated by simple bacterial infection, and 2) the primary structure of the displayed binding peptide or protein can be easily deduced by DNA sequencing of the cloned segment in the viral genome.
Combinatorial peptide libraries have been expressed in bacteriophage. Synthetic oligonucleotides, fixed in length, but with multiple unspecified codons can be cloned into genes III, VI, or VIII of bacteriophage M13 where they are expressed as a plurality of peptide:capsid fusion proteins. The libraries, often referred to as random peptide libraries, can be screened for binding to target molecules of interest. Usually, three to four rounds of screening can be accomplished in a week's time, leading to the isolation of one to hundreds of binding phage.
The primary structure of the binding peptides is then deduced by nucleotide sequencing of individual clones. Inspection of the peptide sequences sometimes reveals a common motif, or consensus sequence. Generally, this motif, when synthesized as a soluble peptide, retains the full binding activity. Random peptide libraries have successfully yielded peptides that bind to the Fab site of antibodies (Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA 87:6378-6382; Scott and Smith, 1990, Science 249:386-390), cell surface receptors (Doorbar and Winter, 1994, J. Mol. Biol. 244:361-369; Goodson et al., 1994, Proc. Natl. Acad. Sci. USA 91:7129-7133), cytosolic receptors (Blond-Elguindi et al., 1993, Cell 75:717-728), intracellular proteins (Daniels and Lane, 1994, J. Mol. Biol. 243:639-652; Dedman et al., 1993, J. Biol. Chem. 268:23025-23030; Sparks et al., 1994, J. Biol. Chem. 269:23853-23856), DNA (Krook et al., 1994, Biochem. Biophys. Res. Comm. 204:849-854), and many other targets (Winter, 1994, Drug Dev. Res. 33:71-89).
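By way of illustration only, the inspection of selected peptide sequences for a common motif can be sketched as a position-by-position tally. The sketch below assumes that the peptides are of equal length and already aligned. The function name `consensus`, the `min_fraction` cutoff, and the example peptides are hypothetical and are not taken from any cited study.

```python
from collections import Counter


def consensus(peptides, min_fraction=0.5):
    """Return a per-position consensus for equal-length, pre-aligned peptides.

    A position is reported as its most common residue when that residue
    occurs in at least min_fraction of the peptides; otherwise it is
    reported as 'x' (no clear preference).
    """
    if not peptides:
        raise ValueError("need at least one peptide")
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be pre-aligned to the same length")

    motif = []
    # zip(*peptides) yields one tuple of residues per aligned position.
    for column in zip(*peptides):
        residue, count = Counter(column).most_common(1)[0]
        motif.append(residue if count / len(peptides) >= min_fraction else "x")
    return "".join(motif)


# Hypothetical peptides recovered from affinity selection (illustrative only).
selected = ["RPLPPLP", "RALPALP", "KPLPSLP", "RPLPTLP"]
print(consensus(selected))
```

In practice, selected peptides may differ in length or in the register of the motif, so an alignment step would ordinarily precede a tally of this kind.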
Most vital cellular processes are regulated by the transmission of signals throughout the cell in the form of complex interactions between proteins. As the study of signal transduction, or the flow of information throughout the cell, has broadened and matured, it has become apparent that these protein-protein interactions are often mediated by modular domains within signalling proteins. Src, both the first proto-oncogene product and the first tyrosine kinase discovered (Taylor and Shalloway, 1993, Current Opinion in Genetics and Development 3:26-34), is the prototypic modular domain-containing protein.
Src is a protein tyrosine kinase of 60 kilodaltons and is located at the plasma membrane of cells. It was first discovered in the 1970's to be the oncogenic element of Rous sarcoma virus, and in the 1980's, it was appreciated to be a component of the signal transduction system in animal cells. However, since the identification of viral and cellular forms of Src (i.e., v-Src and c-Src), their respective roles in oncogenesis, normal cell growth, and differentiation have not been completely understood.
In addition to its tyrosine kinase region (sometimes called a Src Homology 1 domain), Src contains two regions that have been found to have functionally and structurally homologous counterparts in a large number of proteins. These regions have been designated the Src Homology 2 (SH2) and Src Homology 3 (SH3) domains. SH2 and SH3 domains are modular in that they fold independently of the protein that contains them, their secondary structure places N- and C-termini close to one another in space, and they appear at variable locations (anywhere from N- to C-terminal) from one protein to the next (Cohen et al., 1995, Cell 80:237-248). SH2 domains have been well-studied and are known to be involved in binding to phosphorylated tyrosine residues (Pawson and Gish, 1992, Cell 71:359-362).
The Src-homology region 3 (SH3) of Src is a domain that is 60-70 amino acids in length and is present in many cellular proteins (Cohen et al., 1995, Cell 80:237-248; Pawson, 1995, Nature 373:573-580). Within Src, the SH3 domain is considered to be an inhibitory domain, because c-Src can be activated (i.e., rendered transforming) through mutations in this domain (Jackson et al., 1993, Oncogene 8:1943-1956; Seidel-Dugan et al., 1992, Mol Cell Biol 12:1835-1845).
To deduce the binding specificity of the Abl SH3 domain, a group led by David Baltimore screened cDNA libraries with radiolabeled GST-Abl SH3 fusion protein and identified two binding cDNA clones (Cicchetti et al., 1992, Science 257:803-806). Both clones encoded proteins with proline rich regions that were later shown to be SH3 binding domains.
Subsequently, others have screened combinatorial peptide libraries and identified peptides that bound to the Src SH3 domain (Yu et al., 1994, Cell 76:933-945; Cheadle et al., 1994, J. Biol. Chem. 269:24034-24039). Using the SH3 domain of Src, Sparks et al., 1994, J. Biol. Chem. 269:23853-23856 screened phage-display random peptide libraries and identified a consensus peptide sequence that binds with specificity and high affinity to the Src SH3 domain.
The consensus from these various studies is that the optimal Src SH3 peptide ligand is RPLPPLP (SEQ ID NO:45). Recently, the structures of the peptide-SH3 domain complexes have been deduced by NMR and the peptides have been shown to bind in two possible orientations with respect to the SH3 domain (Feng et al., 1994, Science 266:1241-1247; Lim et al., 1994, Nature 372:375-379).
Since SH3 domains have been found to have such important roles in the function of crucial signalling and structural elements in the cell, a method of identifying proteins containing SH3 regions is of great interest. In this regard, it is important to note that such a method is unavailable because of the low sequence similarity of modular functional domains, including SH3. See, e.g., FIG. 6, which illustrates the minimal primary sequence homology among various known SH3 domains.
Sequence homology searches can potentially identify known proteins containing not yet recognized functional domains of interest; however, sequence homology generally needs to be >40% for this procedure to be successful. Functional domains generally are less than 40% homologous and therefore many would be missed in a sequence homology search. In addition, homology searches do not identify novel proteins; they only identify proteins already defined by nucleotide or amino acid sequence and present in the database.
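The 40% figure discussed above can be made concrete with a simple percent-identity calculation over two aligned sequences. The following is a minimal sketch under stated assumptions: the two sequences are already aligned and of equal length, gaps are written as '-', and identity is counted only over positions where neither sequence has a gap. The function name, the constant `HOMOLOGY_CUTOFF`, and the two sequence fragments are hypothetical and serve only to illustrate the arithmetic. Actual database search programs score similarity in more elaborate ways.

```python
def percent_identity(seq_a, seq_b):
    """Percent of compared positions at which two aligned sequences agree.

    Both strings must have the same length (i.e., be already aligned).
    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # gap position: not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


HOMOLOGY_CUTOFF = 40.0  # percent; the threshold quoted in the text above

# Hypothetical aligned fragments of two functional domains (illustrative only).
domain_a = "ALYDYEARTE"
domain_b = "ALFDFNGQDK"

identity = percent_identity(domain_a, domain_b)
print(f"{identity:.1f}% identical; found by homology search: "
      f"{identity > HOMOLOGY_CUTOFF}")
```

With these illustrative fragments, only three of ten positions agree, so the pair falls below the cutoff and would be missed by a search that relies on sequence homology alone.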
Another approach is to use hybridization techniques using nucleotide probes to search expression libraries for novel proteins. This method would have limited applicability to finding novel proteins containing functional domains due to the low sequence homology of the functional domains.
Methods for isolating partner proteins involved in protein-protein interactions have generally focused on finding a ligand to a protein that has been found and characterized. Such approaches have included using anti-idiotypic antibodies that mimic the known protein to screen cDNA expression libraries for a binding ligand (Jerne, 1974, Ann. Immunol. (Inst. Pasteur) 125c:373-389; Sudol, 1994, Oncogene 9:2145-2152). Skolnick et al., 1991, Cell 65:83-90 isolated a binding partner for PI3-kinase by screening a cDNA expression library with the 32P-labeled tyrosine phosphorylated carboxyl terminus of the epidermal growth factor receptor (EGFR).
An easy method for isolating operationally defined ligands involved in protein-protein interactions and for optimally identifying an exhaustive set of modular domain-containing proteins implicated in binding with the ligands would be highly desirable.
If such a method were available, it would be useful for the isolation of any polypeptide having a functioning version of any functional domain of interest. Such a general method would be of tremendous utility in that whole families of related proteins, each with its own version of the functional domain of interest, could be identified. Knowledge of such related proteins would contribute greatly to our understanding of various physiological processes, including cell growth or death, malignancy, and immune reactions, to name a few. Such a method would also contribute to the development of increasingly more effective therapeutic, diagnostic, or prophylactic agents having fewer side effects.
According to the present invention, just such a method is provided.
Regarding SH3 domain-containing proteins, the method of the present invention will contribute greatly to our understanding of cell growth (Zhu et al., 1993, J. Biol. Chem. 268:1775-1779; Taylor and Shalloway, 1994, Nature 368:867-871), malignancy (Wages et al., 1992, J. Virol. 66:1866-1874; Bruton and Workman, 1993, Cancer Chemother. Pharmacol. 32:1-19), subcellular localization of proteins to the cytoskeleton and/or cellular membranes (Weng et al., 1993, J. Biol. Chem. 268:14956-14963; Bar-Sagi et al., 1993, Cell 74:83-91), signal transduction (Duchesne et al., 1993, Science 259:525-528), cell morphology (Wages et al., 1992, J. Virol. 66:1866-1874; McGlade et al., 1993, EMBO J. 12:3073-3081), neuronal differentiation (Tanaka et al., 1993, Mol. Cell. Biol. 13:4409-4415), T cell activation (Reynolds et al., 1992, Oncogene 7:1949-1955), and cellular oxidase activity (McAdara and Babior, 1993, Blood 82:A28).
Citation of a reference hereinabove shall not be construed as an admission that such is prior art to the present invention.