Cytotoxic T-cells (TC or CD8-T lymphocytes) and helper T-cells (TH or CD4-T lymphocytes) can recognize short, processed fragments of a protein antigen, referred to as antigenic peptides or T-cell epitopes. However, recognition does not occur by direct binding to free peptides. Specific receptor molecules on T-cells (T-cell receptors or TCRs) recognize a peptide antigen only when it is bound to another receptor known as a major histocompatibility complex (MHC) molecule. Such MHC-peptide complexes serve as cell markers: when the MHC contains an endogenous (self) peptide, it marks the cell as “healthy”; when it contains a foreign peptide, the cell is marked as “infected”. The MHC-mediated presentation of antigenic peptides to the repertoire of T-cells can thus be seen as the primary stimulus that elicits an immune response. Depending on the type of MHC presenting an antigen, which is correlated with the type of cell expressing it, the immune system is triggered either to destroy the antigen-presenting cell or to produce antibodies directed against the infectious agent.
MHC molecules are divided into classes I and II. Although their general function (presenting antigen) is the same, they differ in several respects. MHC class I is expressed on the cell surface as a heterodimeric complex of a 46-kDa heavy chain (the α-chain) and a 12-kDa light chain (β2-microglobulin or β2m). The α-chain consists of three domains, α1, α2 and α3; the α1 and α2 domains bind the peptide ligand, while the α3 domain is membrane-proximal and involved in CD8 co-receptor binding. Class II MHC molecules have the same overall shape, although they are composed of two membrane-bound chains: an α-chain of ~35 kDa and a β-chain of ~28 kDa. Each chain forms two domains (α1 and α2 for the α-chain, β1 and β2 for the β-chain). The α1 and β1 domains jointly form the peptide-binding domain, and the β2 domain is involved in CD4 co-receptor binding.
Both MHC class I and class II molecules are highly polymorphic and have been further subdivided into different subtypes. The existence of different MHC allotypes underlies the capacity of MHCs to bind a broad range of peptides while still preserving some specificity. Given this polymorphism, the ability to predict which peptides bind specifically to which MHC subtypes is thought to be of great value in vaccination strategies and de-immunization programs. The recent burst of experimentally determined 3D structures has provided valuable insights into the determinants of peptide-binding specificity. This, in turn, has led to the idea that structure-based prediction of potentially antigenic peptides (or T-cell epitopes) is within reach.
Functional human leukocyte antigens (HLAs, i.e. human MHCs) are characterized by a deep binding groove to which endogenous as well as potentially antigenic peptides bind. The groove has a well-defined shape and well-defined physico-chemical properties. HLA class I binding sites are closed: the peptide termini are pinned down into the ends of the groove, where they take part in a network of hydrogen bonds with conserved HLA residues (Madden, D. R. et al., (1992) Cell 70, 1035-1048). Because of these restraints, the length of bound peptides is limited to 8-10 residues. Superposition of the structures of different HLA complexes confirmed a general mode of binding in which peptides adopt a relatively linear, extended conformation. At the same time, significant variability in the conformation of different peptides was also observed, ranging from minor structural differences to notably different binding modes. Such variation is not unexpected, given that class I molecules can bind thousands of different peptides varying in length (8-10 residues) and in amino acid sequence. The different class I allotypes bind peptides that share one or two conserved amino acid residues at specific positions. These residues are referred to as anchor residues and are accommodated in complementary pockets (Falk, K. et al., (1991) Nature 351, 290-296). Besides primary anchors, there are also secondary anchor residues, accommodated in shallower pockets (Matsumura, M. et al., (1992) Science 257, 927-934). In total, six allele-specific pockets, termed A-F, have been characterized (Saper, M. A. et al., (1991) J. Mol. Biol. 219, 277-312; Latron, F. et al., (1992) Science 257, 964-967). The constitution of these pockets varies with the polymorphism of class I molecules, giving rise to a high degree of specificity (limited cross-reactivity) while preserving a broad binding capacity.
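The anchor-residue principle described above can be illustrated with a minimal sketch in Python. It checks whether a peptide has a class I-compatible length (8-10 residues) and carries an allowed residue at each anchor position. The motif dictionary ILLUSTRATIVE_MOTIF and the function matches_class_i_motif are hypothetical; the allowed residues are loosely modeled on the Leu/Met (P2) and Val/Leu (C-terminal) preferences commonly reported for HLA-A2 and are not a curated motif.

```python
from __future__ import annotations

# Hypothetical, simplified motif: anchor position -> allowed residues.
# Positive keys count from the N-terminus (1-based); negative keys count
# from the C-terminus (-1 = last residue), since class I peptides vary in length.
ILLUSTRATIVE_MOTIF: dict[int, set[str]] = {
    2: {"L", "M"},   # primary anchor at P2 (illustrative values)
    -1: {"V", "L"},  # primary anchor at the C-terminus (illustrative values)
}


def matches_class_i_motif(
    peptide: str,
    motif: dict[int, set[str]],
    min_len: int = 8,
    max_len: int = 10,
) -> bool:
    """Return True if the peptide length fits the closed class I groove
    and every anchor position holds an allowed residue."""
    if not min_len <= len(peptide) <= max_len:
        return False
    for pos, allowed in motif.items():
        residue = peptide[pos - 1] if pos > 0 else peptide[pos]
        if residue not in allowed:
            return False
    return True


print(matches_class_i_motif("LLFGYPVYV", ILLUSTRATIVE_MOTIF))  # True
print(matches_class_i_motif("LAFGYPVYV", ILLUSTRATIVE_MOTIF))  # False: A at P2
```

Such a filter can serve at most as a pre-screen: secondary anchors and non-anchor positions also contribute to binding, so passing the motif check does not by itself establish binding.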
In contrast to HLA class I binding sites, class II sites are open at both ends. This allows peptides to extend beyond the actual binding region, “hanging out” at both ends (Brown, J. et al., (1993) Nature 364, 33-39). Class II HLAs can therefore bind peptide ligands of variable length, ranging from 9 to more than 25 amino acid residues. As for HLA class I, the affinity of a class II ligand is determined by a “constant” and a “variable” component. The constant part again results from a network of hydrogen bonds formed between conserved residues in the HLA class II groove and the main chain of the bound peptide. However, this hydrogen-bond pattern is not confined to the N- and C-terminal residues of the peptide but is distributed over the whole chain. This is important because it restricts complexed peptides to a strictly linear binding mode, which is common to all class II allotypes. The second component of binding affinity is variable because of polymorphic positions within class II binding sites. Different allotypes form different complementary pockets within the groove, which accounts for the subtype-dependent selection of peptides, i.e. specificity. Importantly, the constraints on the amino acid residues held within class II pockets are generally “softer” than for class I, and there is much more cross-reactivity of peptides among different HLA class II allotypes. Unlike for class I, it has not been possible to identify highly conserved residue patterns in peptide ligands (so-called motifs) that correlate with the class II allotypes.
The different characteristics of class I and class II MHC molecules give rise to specific problems in predicting potential T-cell epitopes. As discussed above, class I molecules bind short peptides that exhibit well-defined residue-type patterns. This has led to various prediction methods based on experimentally determined statistical preferences for particular residue types at specific peptide positions. Although these methods work relatively well, uncertainties associated with non-conserved positions limit their accuracy. Prediction methods for MHC class II-mediated T-cell epitopes essentially follow the same strategy, but are hampered by the open binding groove: in a pool of peptides identified as binders, it is difficult to locate the 9-residue segment that is actually responsible for binding. This, combined with the intrinsically weaker constraints of the complementary pockets in class II binding grooves, makes the establishment of (pseudo-)motifs very difficult (Mallios, R. R. (2001) Bioinformatics 17, 942-948). On the other hand, class II peptide-binding motifs generally include more anchor residues than class I motifs.
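The core-location problem for class II can be made concrete with a minimal sketch: slide a 9-residue window along a longer peptide, score each candidate core, and keep the best one. The function best_binding_core and the toy scorer toy_core_score are hypothetical. The toy scorer simply counts hydrophobic residues at core positions P1, P4, P6 and P9, which are among the pocket positions commonly described for HLA-DR, using invented unit weights; a realistic scorer would be allotype-specific.

```python
from __future__ import annotations

from typing import Callable

CORE_LENGTH = 9  # length of the class II binding core discussed above
HYDROPHOBIC = set("FILMVWY")


def toy_core_score(core: str) -> float:
    """Hypothetical scorer: +1 per hydrophobic residue at core positions
    P1, P4, P6 and P9 (0-based indices 0, 3, 5, 8). Weights are invented."""
    return float(sum(1 for i in (0, 3, 5, 8) if core[i] in HYDROPHOBIC))


def best_binding_core(
    peptide: str,
    score_core: Callable[[str], float],
    core_length: int = CORE_LENGTH,
) -> tuple[int, str, float]:
    """Slide a core_length window along the peptide and return
    (start index, core sequence, score) of the highest-scoring window.
    Ties are resolved in favor of the window closest to the N-terminus."""
    if len(peptide) < core_length:
        raise ValueError("peptide is shorter than the binding core")
    best = (0, peptide[:core_length], score_core(peptide[:core_length]))
    for start in range(1, len(peptide) - core_length + 1):
        core = peptide[start:start + core_length]
        score = score_core(core)
        if score > best[2]:
            best = (start, core, score)
    return best


print(best_binding_core("GGGGFGGLGIGGVGG", toy_core_score))
# (4, 'FGGLGIGGV', 4.0)
```

Because class II pocket preferences are weak, several windows of a real peptide often score almost equally, which is why the best-scoring window is an uncertain estimate of the true binding core.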
Methods for MHC/peptide binding prediction can be roughly divided into two categories: “statistical methods”, which are driven by experimentally obtained affinity data, and “structure-related methods”, which are based on available 3D structural information on MHC molecules.
The development of statistical methods has been driven by the growing amount of binding data. Typical sources of binding information are elution and pool sequencing of peptides bound naturally to MHC molecules inside cells (Falk, K. et al., (1994) Immunogenetics 39, 230-242), phage display of peptide libraries (Hammer, J. et al., (1993) Cell 74, 197-203; Fleckenstein, B. et al., (1999) Sem. Immunol. 11, 405-416), and data sets compiled from reports in the literature (Brusic, V. et al., (1998) Nucleic Acids Res. 26, 368-371; Rammensee, H. G. et al., (1999) Immunogenetics 50, 213-219). A common approach is to decompose the available experimental information statistically into MHC type-specific and peptide position-specific numerical values that reflect the preference for individual amino acid types at each position (Parker, K. C. et al., (1994) J. Immunol. 152, 163-175). The resulting matrices can then serve as profiles from which the binding affinity of a peptide sequence of interest is estimated.
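The matrix-profile idea can be sketched as follows, assuming an additive matrix: each position contributes the preference value of the residue found there, and the per-position values are summed into a single score. The matrix ILLUSTRATIVE_MATRIX, its values and the function pssm_score are hypothetical; a matrix of multiplicative coefficients can be brought into this additive form by taking logarithms.

```python
from __future__ import annotations

DEFAULT_SCORE = 0.0  # neutral value for residues not listed at a position

# Hypothetical 9-mer profile: one dict per peptide position (P1..P9),
# mapping amino acid -> log-scale preference. Values are invented.
ILLUSTRATIVE_MATRIX: list[dict[str, float]] = [
    {},                       # P1
    {"L": 2.0, "M": 1.5},     # P2 (primary anchor)
    {"F": 0.5, "Y": 0.5},     # P3 (secondary preference)
    {}, {}, {}, {}, {},       # P4-P8
    {"V": 2.0, "L": 1.5},     # P9 (primary anchor)
]


def pssm_score(
    peptide: str,
    matrix: list[dict[str, float]],
    default: float = DEFAULT_SCORE,
) -> float:
    """Sum the position-specific preference of each residue in the peptide."""
    if len(peptide) != len(matrix):
        raise ValueError(f"expected a {len(matrix)}-mer, got {len(peptide)} residues")
    return sum(column.get(aa, default) for column, aa in zip(matrix, peptide))


print(pssm_score("LLFGYPVYV", ILLUSTRATIVE_MATRIX))  # 4.5
print(pssm_score("AAAAAAAAA", ILLUSTRATIVE_MATRIX))  # 0.0
```

The additive form treats positions as independent, which is exactly where the uncertainty at non-conserved positions mentioned earlier enters.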
Structure-based methods generally include a first step in which the structure of a specific MHC/peptide complex is modeled and a second step in which the binding strength of the peptide is estimated from the modeled complex using an empirical scoring function. Examples include WO 98/59244, Altuvia, Y. et al., (1995) J. Mol. Biol. 249, 244-250, and Doytchinova, I. A. and Flower, D. R. (2001) J. Med. Chem. 44, 3572-3581. Alternatively, a molecular dynamics simulation is sometimes performed to model a peptide within an MHC binding groove (Lim, J. S. et al. (1996) Mol. Immunol. 33, 221-230). Another approach combines loop modeling with simulated annealing (Rognan, D. et al., (1999) J. Med. Chem. 42, 4650-4658). Most research groups emphasize the importance of the scoring function used in the affinity prediction step. Schueler-Furman et al. (Schueler-Furman, O. et al., (2000) Prot. Sci. 9, 1838-1864) apply a statistical potential to evaluate the contacts between the peptide and the MHC receptor. Rognan et al. (1999) rely on a quantification of physicochemical effects (such as H-bond formation, lipophilic contacts and desolvation). Swain et al. (Swain, M. T., et al., (2001) Proceedings of the second IEEE International Symposium on Bioinformatics and Biomedical Engineering. IEEE Computer Society Press, Bethesda, Md., pp. 81-88) also apply a heuristic scoring function based on inter-atomic contacts, electrostatic interactions and H-bond formation. Doytchinova and Flower (2001) consider essentially the same contributions but use a quantitative structure-affinity relationship (QSAR) method to assess the binding affinity. Logean et al. (Logean, A., et al., (2001) Bioorg. & Med. Chem. Letters 11, 675-679) analyzed the performance of seven universal scoring functions and found that many of them correlate poorly with experiment, in contrast to their own “Fresno” scoring function.
However, it was also recognized that the Fresno function cannot be universally applied but requires recalibration for different protein-ligand systems.
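The second step of the structure-based scheme, and the recalibration issue raised for Fresno, can be illustrated with a minimal sketch of a linear empirical scoring function: interaction terms measured on a modeled complex are combined with weights, and the weights are refitted by ordinary least squares when experimental affinities for a new system become available. The term names in TERMS, the functions empirical_score and recalibrate, and all numbers are hypothetical; this is not the Fresno function or any other published scoring function, and the modeling step that would produce the terms is assumed rather than shown.

```python
from __future__ import annotations

# Hypothetical interaction terms measured on a modeled MHC/peptide complex.
TERMS = ("hbonds", "lipophilic", "desolvation", "rotors")


def empirical_score(terms: dict[str, float], weights: list[float]) -> float:
    """Linear score: weights[0] is an intercept, weights[1:] multiply TERMS in order."""
    return weights[0] + sum(w * terms[name] for w, name in zip(weights[1:], TERMS))


def recalibrate(complexes: list[dict[str, float]], affinities: list[float]) -> list[float]:
    """Refit [intercept, *term weights] to experimental affinities by ordinary
    least squares, solving the normal equations (X^T X) w = X^T y with
    Gauss-Jordan elimination and partial pivoting."""
    rows = [[1.0] + [c[name] for name in TERMS] for c in complexes]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, affinities))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear terms or too few complexes")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for k in range(n):
            if k != col:
                factor = aug[k][col]
                aug[k] = [a - factor * b for a, b in zip(aug[k], aug[col])]
    return [aug[i][n] for i in range(n)]


# Synthetic training set generated from known weights, then recovered by refitting.
true_weights = [2.0, -1.0, -0.2, 0.5, 0.3]
training = [dict(zip(TERMS, v)) for v in
            [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1), (1, 1, 1, 1)]]
measured = [empirical_score(c, true_weights) for c in training]
fitted = recalibrate(training, measured)
print([round(w, 6) for w in fitted])  # [2.0, -1.0, -0.2, 0.5, 0.3]
```

Refitting in this way needs at least as many independent complexes with measured affinities as there are weights, which is precisely the information that is missing for alleles with scarce experimental data.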
There is a need to substantially improve both the structure prediction and the affinity assessment steps of methods that predict the affinity of a peptide for a major histocompatibility complex (MHC) class I or class II molecule. The main problem in this field is the poor performance of prediction algorithms for MHC alleles for which experimental data (both binding and structural information) are scarce. It is an aim of the present invention to provide a novel method for predicting the affinity of a peptide for an MHC class I or class II molecule, also in cases where experimental information is scarce.