Despite significant progress over the last few years, predicting 3-D protein structure and protein-ligand binding remain difficult problems to solve. Research in this area has focused on complex physics-based models using a large number of particles to describe not only the amino acids in the proteins, but also the solvent that surrounds them.
A protein-ligand interaction of particular interest to researchers is the binding of a peptide to a Major Histocompatibility Complex (MHC) molecule. One structural model that can be used to predict peptide-MHC affinity is the threading model. It is based on the premise that proteins fold in a finite number of ways and that changing the short peptide bound to the MHC molecule does not dramatically alter the 3-D binding configuration. Therefore, instead of screening every theoretically possible way in which a particular sequence can fold and bind to another peptide in order to select the sequence's 3-D structure, the threading model uses protein binding configurations that are already known to compute the binding energy (or affinity).
Many structures of MHC-peptide binding configurations have been obtained by crystallographers. Since X-ray crystallography reveals that MHC-peptide complexes exhibit a finite number of conformations, the threading approach can be applied to the problem of predicting MHC-peptide binding. The threading approach assumes that energy is additive, and it further introduces a simplification that allows the binding energy of a peptide with an MHC molecule to be estimated when the 3-D configuration of that MHC molecule bound to some other peptide is known. In particular, the binding energy is assumed to be dominated by the potentials of pairwise amino acid interactions that occur when the amino acids are in close proximity (e.g., at a distance smaller than 4.5 Å). Another assumption underlying the threading approach is that the proximity pattern of the peptide in the groove (i.e., the MHC binding site) does not change dramatically with the peptide's amino acid content. Because the pairwise potentials are assumed to depend only on the amino acids themselves and not on their context in the molecule, the energy becomes a sum of entries taken from a symmetric 20×20 matrix of pairwise potentials between amino acids. The entries of this matrix are computed from the physics of amino acid binding, and several published sets, derived in different ways, are available.
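Under the assumptions described above, the additive energy can be written compactly. The notation is introduced here only for illustration and is not taken from the source: p_i denotes the amino acid at peptide position i, n the peptide length, C(i) the set of MHC residues that lie within the contact distance of position i in the known template structure, m_j the amino acid at MHC residue j, and M the symmetric 20×20 matrix of pairwise potentials.

```latex
E(p) \;=\; \sum_{i=1}^{n} \; \sum_{j \in C(i)} M_{p_i,\, m_j},
\qquad M_{a,b} = M_{b,a}
```

Because C(i) is read from the template and is assumed not to change with the peptide's amino acid content, scoring a new peptide against a known structure reduces to table lookups and a sum.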
The MHC-peptide threading procedure uses three inputs: solved MHC-peptide complexes that serve as threading templates, a definition of interacting residues, and a pairwise contact potential table. To predict MHC-peptide binding, the query sequence is “threaded” through the various known MHC structures to find the best fit. Structural data files for such complexes are available, for instance, from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB). The algorithm for the threading model proceeds as follows. Given a known structure of an MHC-peptide complex, the contacting MHC residues for each peptide position are determined. The amino acid-amino acid pairwise potentials are then used to score the interaction of the peptide amino acid at each position with all of its contacting residues. Finally, assuming position independence, the peptide's score is the sum of the per-position amino acid scores.
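The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation. The function names (contacts, potential, thread_score), the toy coordinates and the potential values are all hypothetical. A real workflow would read atom coordinates from a PDB structure file and use one of the published potential tables. Two residues are assumed here to be in contact when any pair of their atoms is closer than 4.5 Å.

```python
import math
from itertools import product

CUTOFF = 4.5  # angstroms; the example threshold mentioned in the text


def contacts(template_peptide, mhc, cutoff=CUTOFF):
    """For each peptide position, list the MHC residue types within `cutoff`.

    template_peptide: one list of (x, y, z) atom coordinates per peptide position.
    mhc: list of (amino_acid, atom_coordinates) pairs, one per MHC residue.
    """
    result = []
    for pos_atoms in template_peptide:
        near = [
            aa
            for aa, atoms in mhc
            if any(math.dist(a, b) < cutoff for a, b in product(pos_atoms, atoms))
        ]
        result.append(near)
    return result


def potential(table, a, b):
    """Look up a symmetric pairwise potential stored under either key order."""
    return table[(a, b)] if (a, b) in table else table[(b, a)]


def thread_score(peptide, contact_lists, table):
    """Sum pairwise potentials over all contacts, treating positions independently."""
    if len(peptide) != len(contact_lists):
        raise ValueError("peptide length must match the template")
    return sum(
        potential(table, aa, m)
        for aa, near in zip(peptide, contact_lists)
        for m in near
    )


# Toy template (hypothetical): a two-position peptide and three MHC residues,
# each reduced to a single atom for brevity.
template_peptide = [[(0.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)]]
mhc = [
    ("Y", [(3.0, 0.0, 0.0)]),
    ("E", [(12.0, 0.0, 0.0)]),
    ("W", [(50.0, 0.0, 0.0)]),
]

# Toy potentials (made-up numbers, not a published table).
table = {("A", "Y"): -1.0, ("K", "E"): -2.5, ("A", "E"): -0.5, ("K", "Y"): -0.25}

contact_lists = contacts(template_peptide, mhc)
print(contact_lists)                              # [['Y'], ['E']]
print(thread_score("AK", contact_lists, table))   # -3.5
print(thread_score("KA", contact_lists, table))   # -0.75
```

The contact lists are computed once per template and reused for every query peptide, which reflects the assumption that the proximity pattern in the groove does not depend on the peptide's amino acid content. Threading a query through several templates would amount to repeating the scoring step with each template's contact lists.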