Living organisms possess various mechanisms for preventing disease. For instance, the vertebrate immune system provides both humoral and cell-mediated immunological defenses. As part of the cellular arm, cytotoxic CD8+ T cells kill infected cells when they recognize short peptides (amino acid subsequences) derived from a pathogen's proteins and presented by Major Histocompatibility Complex class I (MHC-I) molecules on the cell surface. Human cells generate such short peptides through a process that trims proteins down to lengths of 8-11 amino acids, suitable for binding to MHC-I molecules, or around 20 amino acids, suitable for binding to MHC-II molecules. The MHC molecules bind some of the processed peptides (referred to as epitopes) and present them on the cell surface, where cells of the immune system can encounter and recognize them. The particular epitopes a cell can present depend on the types of MHC molecules expressed by the organism.
The human MHC molecules are also referred to as Human Leukocyte Antigen (HLA) molecules. MHC-I (HLA class I) molecules are encoded at three loci of the human genome, labeled A, B, and C. Since each individual inherits one copy of each locus from each parent, an individual expresses from three to six different MHC-I molecules, depending on whether the two alleles inherited at each locus are identical or different. The regions of the genome that code for MHC molecules are among the most polymorphic in the human genome, and this diversity is concentrated in the nucleotide sequences coding for the groove region of the MHC molecule, where the epitope binds.
Since different MHC molecules typically bind different peptides, classifying (typing) MHC molecules is clinically important. For example, organ transplant recipients may reject organs from donors with different MHC types, because cells in the transplanted organ present MHC-peptide complexes that are foreign to the recipient's immune system. Modern MHC typing is performed by sequencing, and the sequences of all known MHC variants are publicly available.
The interaction between an MHC molecule and a peptide (or between any two molecules) can be characterized by a binding free energy, defined as the difference between the free energies of the bound and unbound states. The lower the binding free energy, the greater the affinity between the two molecules. The binding energy of an MHC-peptide complex can be measured directly in competition experiments with a standard peptide, and is expressed through the ratio of the half-maximal inhibitory concentration (IC50) of the standard peptide to that of the test peptide. In the context of MHC-peptide binding, the IC50 of a test peptide is the concentration of that peptide required to inhibit binding of the standard peptide to the MHC molecule by 50%. Such experiments yield a set of relative binding energies (negative logarithms of the relative concentrations) for different MHC-peptide combinations.
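As a minimal sketch of the conversion described above, the relative binding energy of a test peptide can be computed as the negative logarithm of the ratio IC50(standard) / IC50(test). This sketch assumes the natural logarithm and leaves the result unitless; multiplying by a factor such as RT to obtain an energy scale is an assumption not stated in the text. The function name and the example concentrations are hypothetical.

```python
import math


def relative_binding_energy(ic50_standard: float, ic50_test: float) -> float:
    """Negative log of the ratio IC50(standard) / IC50(test).

    Assumption: natural log, unitless result. A test peptide that reaches
    50% inhibition at a lower concentration than the standard (a stronger
    binder) gives a ratio > 1 and therefore a lower (negative) energy.
    """
    if ic50_standard <= 0 or ic50_test <= 0:
        raise ValueError("IC50 values must be positive concentrations")
    ratio = ic50_standard / ic50_test
    return -math.log(ratio)


# Hypothetical concentrations (same units for both), for illustration only.
strong = relative_binding_energy(ic50_standard=50.0, ic50_test=5.0)
weak = relative_binding_energy(ic50_standard=50.0, ic50_test=500.0)
print(f"strong binder: {strong:.3f}, weak binder: {weak:.3f}")
```

Because only the ratio enters the formula, the concentration units cancel, and a peptide that competes exactly as well as the standard scores zero.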
Despite significant progress over the last few years, 3-D protein structure prediction and binding prediction remain difficult problems. Research in this area has focused on complex physics-based models that use a large number of particles to describe not only the amino acids in the proteins but also the surrounding solvent. One example of a structural model that can be used to predict peptide-MHC affinity is the threading model. The threading model rests on the premise that proteins fold in a finite number of ways and that changes in the sequence of the short peptide bound to the MHC molecule do not dramatically alter the 3-D binding configuration. Therefore, instead of screening all theoretically possible ways in which a sequence could fold and bind to another molecule in order to determine its 3-D structure, the threading model uses already known binding configurations to compute the binding energy (or affinity).
Because of the importance of MHC complexes, crystallographers have solved many structures of MHC-peptide complexes. Since x-ray crystallography shows that MHC-peptide complexes adopt a finite number of conformations, the threading approach can be applied to predicting MHC-peptide binding. The threading approach assumes that the energy is additive and introduces a simplification that allows the binding energy of a peptide to be estimated for an MHC molecule whose 3-D binding configuration with some other peptide is known. In particular, it assumes that the binding energy is dominated by pairwise interaction potentials between amino acids in close proximity (e.g., at a distance smaller than 4.5 Å). A further assumption is that the proximity pattern of the peptide in the groove (i.e., the MHC binding site) does not change dramatically with the peptide's amino acid content. Because the pairwise potentials are assumed to depend only on the amino acids themselves and not on their context in the molecule, the energy reduces to a sum of pairwise potentials taken from a symmetric 20×20 matrix of potentials between amino acids. These parameters are computed from the physics of amino acid binding, and several published sets, derived in different ways, exist.
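The additive energy model above can be sketched as a lookup into a symmetric 20×20 table followed by a plain sum over contacting pairs. This is a minimal illustration under stated assumptions: the table values are placeholders (NOT any published potential set), unlisted pairs default to 0.0, and the helper names `symmetric_table` and `additive_energy` are hypothetical.

```python
# The 20 standard amino acids (one-letter codes) index rows and columns.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def symmetric_table(pairs: dict[tuple[str, str], float]) -> list[list[float]]:
    """Build a symmetric 20x20 potential matrix from one entry per pair.

    Pairs that are not listed default to 0.0 (an assumption of this sketch).
    """
    table = [[0.0] * len(AMINO_ACIDS) for _ in AMINO_ACIDS]
    for (a, b), value in pairs.items():
        i, j = INDEX[a], INDEX[b]
        table[i][j] = value
        table[j][i] = value  # symmetry: V(a, b) == V(b, a)
    return table


def additive_energy(contacts: list[tuple[str, str]],
                    table: list[list[float]]) -> float:
    """Sum pairwise potentials over all contacting (peptide residue, MHC residue)
    pairs; each term depends only on the two amino acids, not on context."""
    return sum(table[INDEX[p]][INDEX[m]] for p, m in contacts)


# Placeholder values for illustration only, NOT a published potential set.
toy_table = symmetric_table({("L", "L"): -1.0, ("K", "E"): -0.8, ("Y", "L"): -0.5})

# Hypothetical contacts: (peptide residue, MHC residue) pairs within the cutoff.
contacts = [("L", "L"), ("E", "K"), ("L", "Y")]
print(round(additive_energy(contacts, toy_table), 3))
```

Storing the table symmetrically means a contact can be looked up in either order, which mirrors the assumption that the potential depends only on the identities of the two amino acids.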
The MHC-peptide threading procedure uses three ingredients: solved MHC-peptide complexes as threading templates, a definition of interacting residues, and a pairwise contact potential table. To predict MHC-peptide binding, the query sequence is “threaded” through the various known MHC structures to find the best fit. These structural data files are available, for instance, from the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank. The threading algorithm proceeds as follows. Given a known structure of an MHC-peptide complex, the contacting MHC residues for each peptide position are determined. The pairwise amino acid potentials are then used to score the interaction of the peptide amino acid at each position with all of its contacting residues. Assuming position independence, the peptide's score is the sum of these per-position scores.
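The scoring steps above can be sketched as follows, assuming each template has already been reduced to a contact map from peptide position to the one-letter codes of its contacting MHC residues. The contact-map format, the helper names, the placeholder potential, and the choice of the lowest total score as the "best fit" are all assumptions of this sketch rather than details given in the text.

```python
from typing import Callable

# Hypothetical template format: peptide position -> one-letter codes of the
# MHC residues in contact with that position in a solved structure.
ContactMap = dict[int, list[str]]
Potential = Callable[[str, str], float]


def score_peptide(peptide: str, contacts: ContactMap, potential: Potential) -> float:
    """Position-independent score: for each peptide position, sum the pairwise
    potentials with all contacting MHC residues, then sum over positions."""
    total = 0.0
    for pos, aa in enumerate(peptide):
        for mhc_aa in contacts.get(pos, []):
            total += potential(aa, mhc_aa)
    return total


def best_template(peptide: str, templates: dict[str, ContactMap],
                  potential: Potential) -> tuple[str, float]:
    """Thread the peptide through every template; lowest energy taken as best fit."""
    if not templates:
        raise ValueError("at least one template is required")
    scores = {name: score_peptide(peptide, cmap, potential)
              for name, cmap in templates.items()}
    name = min(scores, key=scores.__getitem__)
    return name, scores[name]


HYDROPHOBIC = set("AVILMFWY")


def toy_potential(a: str, b: str) -> float:
    """Placeholder potential, NOT a published table: favours hydrophobic pairs."""
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.1


# Hypothetical templates and query peptide, for illustration only.
templates = {
    "template_1": {1: ["Y", "F"], 8: ["W", "L"]},
    "template_2": {1: ["E", "K"], 8: ["D"]},
}
print(best_template("ALAKAAAAL", templates, toy_potential))
```

Passing the potential in as a function keeps the scoring loop independent of which published potential set is used.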
An example of an MHC-peptide complex is given in FIG. 1, which shows the 3-D structure of MHC A0201 bound to a peptide. The centroids of the peptide amino acids are marked in 3-D space by triangles, and the centroids of the MHC amino acids by circles. MHC amino acids within 4 Å of the peptide are marked by filled circles.
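The proximity rule illustrated in FIG. 1 can be sketched as a centroid-to-centroid distance test. This sketch assumes that residues have already been parsed into lists of atom coordinates (parsing structure files is not shown), that contact is defined by residue centroids rather than, say, closest atoms, and that the cutoff is a strict 4 Å as in the figure. The helper names and coordinates are hypothetical.

```python
import math

Point = tuple[float, float, float]


def centroid(atoms: list[Point]) -> Point:
    """Mean position of a residue's atoms."""
    n = len(atoms)
    return (sum(a[0] for a in atoms) / n,
            sum(a[1] for a in atoms) / n,
            sum(a[2] for a in atoms) / n)


def contacting_residues(peptide_centroids: list[Point],
                        mhc_centroids: list[Point],
                        cutoff: float = 4.0) -> dict[int, list[int]]:
    """For each peptide position, list the indices of MHC residues whose
    centroid lies strictly within `cutoff` angstroms of that position's centroid."""
    contacts = {}
    for i, p in enumerate(peptide_centroids):
        contacts[i] = [j for j, m in enumerate(mhc_centroids)
                       if math.dist(p, m) < cutoff]
    return contacts


# Hypothetical centroid coordinates in angstroms, for illustration only.
peptide = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
mhc = [(3.0, 0.0, 0.0), (0.0, 5.0, 0.0), (11.0, 2.0, 0.0)]
print(contacting_residues(peptide, mhc))
```

The resulting per-position index lists are the kind of contact definition that the threading score sums over; switching to the 4.5 Å example cutoff only requires changing the `cutoff` argument.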