It is estimated that the human protein-protein interaction (PPI) interactome contains as many as 650,000 different PPIs, and understanding them is expected to lead to new therapeutic targets (Stumpf, M P H et al. Proc. Natl. Acad. Sci. U.S.A. 105, 6959-6964, 2008). Proteins are the fundamental functional components of most of the cellular machinery, and formation of specific protein complexes mediated by the respective PPI underpins many cellular processes. Aberrant PPI, either through the loss of function or through formation and/or stabilization of a protein-protein complex at an inappropriate time or location, is implicated in many diseases such as cancer and autoimmune disorders. Identifying and characterizing regions that drive the PPI of proteins involved in such disease will help in understanding the proteins' functions and in designing drugs that target such regions (Thanos et al. Proc. Natl. Acad. Sci. U.S.A. 103, 15422-15427, 2006; and Bullock et al. J. Am. Chem. Soc. September 14; 133(36):14220-3, 2011).
In the last decade, a large number of protein structures have been solved, and the number of structures of protein-protein complexes is also increasing. These structures of complexes yield information on the residues present in the protein-protein binding region. These residues constitute the PPI structural epitope of the protein. However, not all the residues present in the binding region contribute equally to the binding energy of the complex. For example, work on the binding of human growth hormone (GH) to its receptor identified a region of energetically important residues on the protein surface that were critical to the binding (Cunningham and Wells Science 244, 1081-1085, 1989). It has thus become evident that only a few of the binding-region residues contribute a significant fraction of the binding energy. These residues, which constitute the PPI functional epitope, are termed hot-spot residues. One of the rigorous thermodynamic characteristics of a hot-spot residue is that the residue contributes more than 1.3 kcal/mol to the binding energy of the PPI (Ofran and Rost Plos Computational Biology 3, 1169-1176, 2007). An operational characteristic of a hot-spot residue is that mutating the residue to alanine leads to at least a 10-fold increase in the dissociation constant (KD) of the protein-protein complex.
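The thermodynamic and operational characteristics above can be related through the standard expression ΔΔG = RT ln(KD,mutant / KD,wild-type). The following is a minimal sketch of that relation, not a method taken from the cited references. The function names, the default temperature of 298.15 K, and the example KD values are assumptions chosen for illustration.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3


def ddg_from_kd_ratio(kd_ratio, temperature_k=298.15):
    """Return the change in binding free energy (kcal/mol) for a given
    fold change in KD, using ddG = R * T * ln(KD_mutant / KD_wild_type).

    The default temperature is an assumption (about 25 degrees C).
    """
    if kd_ratio <= 0:
        raise ValueError("kd_ratio must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(kd_ratio)


def is_operational_hot_spot(kd_wild_type, kd_mutant, fold_threshold=10.0):
    """Apply the operational criterion: the alanine mutation raises KD by
    at least `fold_threshold`-fold (10-fold in the text above)."""
    if kd_wild_type <= 0 or kd_mutant <= 0:
        raise ValueError("dissociation constants must be positive")
    return kd_mutant / kd_wild_type >= fold_threshold


if __name__ == "__main__":
    # Free-energy change corresponding to a 10-fold increase in KD.
    print(f"{ddg_from_kd_ratio(10.0):.2f} kcal/mol")
    # Hypothetical KD values in mol/L, for illustration only.
    print(is_operational_hot_spot(kd_wild_type=1e-9, kd_mutant=5e-8))
```

Under these assumptions, a 10-fold increase in KD corresponds to roughly 1.36 kcal/mol at 298.15 K, so the 10-fold operational criterion and the 1.3 kcal/mol thermodynamic criterion are approximately equivalent.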
Experimentally, site-directed mutagenesis has been widely used to analyze how protein-protein interfaces function. In this method, subsets of the protein residues are systematically mutated, typically one at a time, and the effect of mutation on the protein-protein binding energy is analyzed. Preferably, the residue is replaced with alanine, as the alanine amino acid lacks a side chain beyond the β-carbon. Accordingly, binding assays performed in conjunction with alanine mutagenesis can identify hot-spot residues, based on the above operational characteristic. One problem with this technique is that it is tacitly assumed that mutation of a residue to alanine does not lead to structural perturbations of the protein. However, it has been demonstrated that mutating to alanine can in fact result in structural changes to a protein (Rao and Brooks Biochemistry 50, 1347-1358, 2011).
One solution to the above problem is to use computational techniques to identify hot-spot residues. However, only a few such tools have been developed. These tools can be broadly classified into two categories: (1) tools that utilize the structure of the protein-protein complex, and (2) tools that utilize the sequence/structure of the unbound protein.
The first category includes tools that perform in silico alanine scanning mutagenesis of protein-protein interfaces (Kortemme and Baker Proc. Natl. Acad. Sci. U.S.A. 99, 14116-14121, 2002; Lise et al. Plos One February 28; 6(2):e16774, 2011; Xia et al. BMC Bioinformatics April 8; 11:174, 2010; and Tuncbag et al. Bioinformatics 25, 1513-1520, 2009). These tools simulate the effect that mutating an interface residue to alanine has on the protein-protein binding free energy (ΔG). Using the structure of the protein-protein complex as an input, these tools calculate the change in binding free energy (ΔΔG) upon mutation. The parameters of the energy function used to calculate ΔΔG are often obtained by fitting the computed ΔΔG to the experimentally observed ΔΔG for a set of proteins. However, while these tools are able to identify hot-spot residues with reasonable accuracy, they require protein-protein complex structures. This requirement is problematic, as it severely limits the application of such tools.
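The parameter-fitting step described above can be sketched as an ordinary least-squares problem. This is a simplified illustration under the assumption that the energy function is a weighted linear sum of per-mutation energy-term changes; it is not the procedure of any of the cited tools, and the function names and the toy numbers in the usage example are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix copy so the inputs are not modified.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; energy terms may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_energy_weights(term_changes, ddg_experimental):
    """Fit weights w minimizing sum((X w - y)^2) via the normal equations.

    term_changes: one row per mutation, one column per energy term
                  (hypothetical terms, e.g. packing or solvation changes).
    ddg_experimental: measured ddG for each mutation (kcal/mol).
    """
    n_terms = len(term_changes[0])
    xtx = [[sum(row[i] * row[j] for row in term_changes)
            for j in range(n_terms)] for i in range(n_terms)]
    xty = [sum(row[i] * y for row, y in zip(term_changes, ddg_experimental))
           for i in range(n_terms)]
    return solve_linear_system(xtx, xty)


def predict_ddg(term_changes_for_mutation, weights):
    """Computed ddG for one mutation as a weighted sum of its term changes."""
    return sum(w * t for w, t in zip(weights, term_changes_for_mutation))


if __name__ == "__main__":
    # Toy, made-up numbers used only to exercise the functions.
    terms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    measured = [0.5, 2.0, 2.5, 3.0]
    weights = fit_energy_weights(terms, measured)
    print(weights, predict_ddg([1.0, 2.0], weights))
```

Once fitted, the weights can be applied to the energy-term changes of a new alanine mutation, and the resulting ΔΔG compared against a hot-spot threshold. Energy functions used in practice may be non-linear or fitted with regularization, which this sketch does not attempt to capture.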
The second category of computational tools overcomes the need for protein-protein complex structures by utilizing the sequence or structure of the unbound protein to identify hot-spot residues. However, the vast majority of these tools appear to identify only binding-region residues, using protein structures (Fernandez-Recio WIREs Comput Mol Sci, 2011, 1:680-698; and Tuncbag et al. Briefings in Bioinformatics 2009, 10, 217-232). To date, only the ISIS tool purports to identify hot-spot residues using protein sequences alone (Ofran and Rost Plos Computational Biology 2007, 3, 1169-1176; and Ofran and Rost Bioinformatics 2007, Jan. 15; 23(2):e13-6).
ISIS is a machine-learning based tool (Ofran and Rost Plos Computational Biology 2007, 3, 1169-1176; and Ofran and Rost Bioinformatics 2007, Jan. 15; 23(2):e13-6). For each residue in the protein sequence, ISIS bases its predictions on the sequence environment of the residue, its evolutionary profile, its predicted secondary structure, and its solvent accessibility. However, one problem with the ISIS tool is that it does not take into account hydrophobic patches and polar residues within the vicinity of the patches in identifying hot spot residues. It has been shown that the detection of hydrophobic patches on the surfaces of proteins can be used to identify protein binding regions (Lijnzaad, P and Argos, P. Proteins-Structure Function and Genetics 28, 333-343, 1997; Chennamsetty et al. Proc. Natl. Acad. Sci. U.S.A. 106, 11937-11942, 2009; Trout et al. Proteins-Structure Function and Bioinformatics 79, 888-897, 2011; WO 2009/155518; and U.S. patent application Ser. No. 13/000,353). Moreover, it has been demonstrated that protein hot spots are characterized by regions patterned with hydrophobic and polar residues (Kozakov, D et al. Proc. Natl. Acad. Sci. U.S.A. August 16; 108(33):13528-33, 2011).
Another example of a protein sequence/structure-based tool is meta-PPISP, which identifies binding-region residues from the protein structure (Qin and Zhou Bioinformatics 23, 3386-3387, 2007). Meta-PPISP is built on three individual methods: cons-PPISP (Chen and Zhou Proteins-Structure Function and Bioinformatics 61, 21-35, 2005), Promate (Neuvirth et al. J. Mol. Biol. 338, 181-199, 2004), and PINUP (Zhou et al. Nucleic Acids Research 34, 3698-3707, 2006). All three of these methods use sequence conservation along with various other attributes as inputs to predict binding-region residues. Cons-PPISP is based on a neural network and uses evolutionary profiles and solvent accessibility of spatially neighboring residues as inputs. Promate is based on a composite probability calculated from 13 different properties that distinguish between binding and non-binding region residues. These properties include, among others, evolutionary profile, secondary structure, chemical composition (e.g., amino acid propensities in binding regions), and hydrophobic patch rank. PINUP is based on an empirical energy function, which is a linear sum of three terms: a side-chain energy score, a residue conservation score, and a residue interface propensity, the last being a function of the residue solvent-accessible area. Meta-PPISP combines the raw scores of these three methods via a linear equation whose coefficients are obtained by fitting to a database of interacting proteins. However, the meta-PPISP tool is designed to identify binding-region residues rather than hot-spot residues. Moreover, similar to the problems with ISIS, the meta-PPISP tool does not take into account polar residues within the vicinity of hydrophobic patches.
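The linear combination of raw scores described above can be illustrated as follows. This is a sketch of the general idea only. The coefficients, intercept, threshold, residue labels, and function names are hypothetical placeholders and are not the fitted values or interface of meta-PPISP.

```python
def combine_scores(cons_ppisp, promate, pinup, coefficients, intercept=0.0):
    """Combine three per-residue raw scores with a linear equation.

    coefficients: (c1, c2, c3), hypothetical weights that would in practice
    be obtained by fitting to a database of interacting proteins.
    """
    c1, c2, c3 = coefficients
    return intercept + c1 * cons_ppisp + c2 * promate + c3 * pinup


def predict_binding_residues(per_residue_scores, coefficients,
                             intercept=0.0, threshold=0.5):
    """Return residue identifiers whose combined score meets the threshold.

    per_residue_scores: dict mapping a residue identifier to a tuple of
    (cons_ppisp, promate, pinup) raw scores. The threshold is an assumption.
    """
    predicted = []
    for residue, (s1, s2, s3) in per_residue_scores.items():
        combined = combine_scores(s1, s2, s3, coefficients, intercept)
        if combined >= threshold:
            predicted.append(residue)
    return predicted


if __name__ == "__main__":
    # Made-up scores for three hypothetical residues.
    scores = {
        "A45": (1.0, 1.0, 0.0),
        "A46": (0.0, 0.0, 1.0),
        "A47": (1.0, 1.0, 1.0),
    }
    print(predict_binding_residues(scores, coefficients=(0.25, 0.5, 0.25)))
```

A combined score of this kind ranks residues by their likelihood of lying in the binding region. It does not by itself distinguish hot-spot residues from other binding-region residues, which is the limitation noted above.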
A further example of a protein sequence/structure-based tool is ConSurf, which maps evolutionarily conserved residues on protein surfaces (Armon, A et al. J. Mol. Biol. 307, 447-463, 2001). It is widely accepted that the residues buried in the protein core, which are required for proper folding of the protein, are conserved throughout evolution. The ConSurf tool is based on the belief that the residues present on the protein surface that are involved in protein-protein interactions are also evolutionarily conserved. However, similar to meta-PPISP, the ConSurf tool identifies residues likely to be at the interface rather than hot-spot residues. Moreover, as ConSurf only utilizes data on evolutionarily conserved residues, it does not take into account hydrophobic patches and polar residues within the vicinity of the patches in identifying hot-spot residues.