The search for correlations in many types of data, such as biological data, can be difficult if the data are not exchangeable or independent and identically distributed (IID). For example, a set of DNA or amino acid sequences is rarely exchangeable because the sequences are derived from a phylogeny (e.g., an evolutionary tree). In other words, some sequences are very similar to each other but not to others because of their positions in the evolutionary tree. This phylogenetic structure can confound the statistical identification of associations. For instance, although a number of candidate disease genes have been identified by genome-wide association (GWA) studies, the inability to reproduce these results in other studies is likely due in part to confounding by phylogeny. Other areas in which phylogeny may confound the statistical identification of associations include the identification of coevolving residues in proteins given a multiple sequence alignment and the identification of Human Leukocyte Antigen (HLA) alleles that mediate escape mutations of the Human Immunodeficiency Virus (HIV).
The human adaptive immune response is composed of two core elements: the antibody-mediated response (sometimes called the humoral response) and the T-cell-mediated response (sometimes called the cellular response). To date, essentially all human vaccines have been made by exploiting the underlying mechanisms of the antibody-mediated response, for example for diseases such as polio and measles. For these diseases, however, it was known that people could recover upon acquisition of humoral immunity. In contrast, for certain viruses, such as HIV, there are no documented cases of a person recovering from the infection, and it is highly unlikely that the same principles of vaccine design could be successfully applied in these cases. In particular, it is thought that vaccines for diseases such as HIV must prime the cellular immune response rather than, or in addition to, the humoral response.
Generally, cellular response mechanisms can be characterized by an ability of certain antigen-presenting cells to ingest and digest viral proteins into smaller peptides, and then to present these peptides, known as epitopes, at the surface of the cell. This process is mediated by HLA molecules which form a complex with the epitope before it is presented. The epitope/HLA complexes can then be recognized by a T-cell, thereby activating the T-cell to subsequently recognize and kill virally infected cells. Several types of T-cells exist, each playing its own role. In ongoing HIV vaccine research, the elicitation of a CD8+ T-cell response has shown promise.
T-cell epitopes are presented on the surface of an antigen-presenting cell, where they are bound to Major Histocompatibility Complex (MHC) molecules. T-cell epitopes presented by MHC class I molecules are typically peptides between 8 and 11 amino acids in length, while MHC class II molecules present longer peptides, and non-classical MHC molecules also present non-peptidic epitopes such as glycolipids.
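The length range given above for MHC class I epitopes suggests a simple way to enumerate candidate peptides from a protein sequence: slide windows of 8 to 11 residues along it. The following Python fragment is a minimal sketch of that enumeration only; the function name `candidate_peptides`, its parameters, and the 12-residue `toy_protein` string are hypothetical and chosen for illustration, and the sketch makes no claim about which candidates are actually presented or recognized.

```python
from typing import Iterator, Tuple


def candidate_peptides(
    protein: str, min_len: int = 8, max_len: int = 11
) -> Iterator[Tuple[int, str]]:
    """Yield (start_index, peptide) for every contiguous window of
    min_len..max_len residues in the protein sequence."""
    for length in range(min_len, max_len + 1):
        # The range is empty when the protein is shorter than the window.
        for start in range(len(protein) - length + 1):
            yield start, protein[start:start + length]


# Hypothetical 12-residue sequence, used only to exercise the function.
toy_protein = "MGARASVLSGGE"
peptides = list(candidate_peptides(toy_protein))
```

Each candidate produced this way would still need to be scored by a prediction method or tested experimentally against a particular HLA allele.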
Due to specificity in a number of sequential mechanisms, only certain epitopes are both presented at the surface of antigen-presenting cells and subsequently recognized by T-cells. This specificity is determined in part by the sequence and properties of the presented epitope and by the genetic background (i.e., allelic diversity) of the host (humans have up to six HLA class I alleles arising from the A, B and C loci). A crucial task in vaccine development is the identification of epitopes and the alleles that present them, since it is thought that a good vaccine will include a robust set of epitopes (robust in the sense of broad coverage and of covering regions that are essential for viral fitness in a given population characterized by a particular distribution of HLA alleles).
Because the experiments required to prove that a peptide is an epitope for a particular HLA allele are time-consuming and expensive, epitope prediction can be of tremendous help in identifying potential new epitopes, whose identity can then be confirmed experimentally. Beyond vaccine design, epitope prediction may have other important applications, such as predicting infectious disease susceptibility and transplantation success.