The function of a protein is related to its three-dimensional structure. The three-dimensional structure of some proteins can be determined using X-ray crystallography or Nuclear Magnetic Resonance (NMR). For other proteins, however, these methods cannot be used. In such cases, computer-based protein modeling techniques have been used with some success. These protein modeling techniques use the known three-dimensional structure of a homologous protein to approximate the structure of another protein.
In one such technique, the known three-dimensional structures of the proteins in a given family are superimposed to define the structurally conserved regions in that family. Among the members of a given family, there is considerable variation in the conformations of regions located between two consecutive structurally conserved regions, and thus, these regions are called the variable regions. These variable regions essentially contribute to the identity of a protein in its family. However, the modeling of the variable regions has been unsatisfactory.
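The idea of separating structurally conserved regions from variable regions can be illustrated with a minimal sketch. It assumes the family members have already been superimposed and aligned position by position, with one representative atom (for example, the alpha carbon) per position. The function name `conserved_regions`, the 1.0 Å spread threshold, and the minimum run length of three positions are illustrative assumptions, not values taken from the technique described above.

```python
import math

def conserved_regions(structures, max_spread=1.0, min_length=3):
    """Return (start, end) index ranges of structurally conserved positions.

    structures: list of equal-length lists of (x, y, z) coordinates, one
    list per family member, assumed to be superimposed and aligned already.
    max_spread and min_length are illustrative thresholds (assumptions).
    """
    n_positions = len(structures[0])
    conserved = []
    for i in range(n_positions):
        points = [s[i] for s in structures]
        # Centroid of the equivalent atoms across all family members.
        centroid = tuple(sum(p[k] for p in points) / len(points) for k in range(3))
        # Largest deviation of any member from the centroid at this position.
        spread = max(math.dist(p, centroid) for p in points)
        conserved.append(spread <= max_spread)

    regions, start = [], None
    for i, flag in enumerate(conserved + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                regions.append((start, i - 1))
            start = None
    return regions

# Toy data: two hypothetical family members with eight aligned positions.
# Positions 3 and 4 diverge by 5 Å; the others differ by only 0.2 Å.
member_a = [(i * 3.8, 0.0, 0.0) for i in range(8)]
member_b = [(i * 3.8, 5.0 if i in (3, 4) else 0.2, 0.0) for i in range(8)]
print(conserved_regions([member_a, member_b]))
```

In this toy case, the positions lying between the two reported ranges correspond to a variable region in the sense used above.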
Conventional homology modeling techniques have been used routinely to build models of proteases and antibodies. These techniques generally model the three-dimensional position of amino acids in structurally conserved regions by taking the Cartesian coordinates from homologous amino acids in a template protein with known three-dimensional structure. For the amino acids in the variable regions, these techniques take suitable loops from the Protein Data Bank (PDB). (The PDB is a collection of information relating to the known structures of proteins. The PDB is administered by Brookhaven National Laboratory.) Although these techniques are generally successful in modeling the structurally conserved regions for structurally undefined members of the family, they have been unsuccessful in modeling the variable regions. Since the variable regions and the structurally conserved regions of the model protein come from different protein structures, high energy short contacts are often found in the models. These high energy contacts usually occur between variable regions that are grafted from different known protein structures. In cases where the sequence identity in the structurally conserved regions between the template and the model protein is weak, the interior amino acids are also susceptible to short contacts. Generally, these short contacts are removed by performing rotations around single bonds using interactive graphics, which is a tedious and, at times, impractical procedure. Energy minimization has been used to relax strains in a model. However, the minimization procedure leads to structures that are trapped in local minima and relies entirely on the integrity of the starting structure.
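The short contacts that arise when regions are grafted from different structures can be located with a simple distance check. The following is a minimal sketch, assuming one representative atom per residue and a label recording which source structure each residue was taken from. The function name `short_contacts`, the 2.5 Å cutoff, and the labels and coordinates in the toy data are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

def short_contacts(atoms, cutoff=2.5):
    """Find pairs of atoms from different source regions closer than cutoff.

    atoms: list of (region_label, residue_index, (x, y, z)) tuples.
    cutoff: distance in angstroms below which a pair is flagged
            (an illustrative value, not one given in the text).
    Returns a list of (residue_a, residue_b, distance) tuples.
    """
    contacts = []
    for (reg_a, res_a, xyz_a), (reg_b, res_b, xyz_b) in combinations(atoms, 2):
        if reg_a == reg_b:
            continue  # only compare atoms grafted from different sources
        d = math.dist(xyz_a, xyz_b)
        if d < cutoff:
            contacts.append((res_a, res_b, round(d, 2)))
    return contacts

# Toy model: two loops taken from two hypothetical source structures.
atoms = [
    ("loop1", 30, (0.0, 0.0, 0.0)),
    ("loop1", 31, (3.8, 0.0, 0.0)),
    ("loop2", 52, (0.0, 1.5, 0.0)),
    ("loop2", 53, (3.8, 6.0, 0.0)),
]
print(short_contacts(atoms))
```

A check of this kind only reports where the clashes are. Relieving them still requires adjusting the conformation, which is the step the passage above describes as tedious when done interactively.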
Certain proteins with very weak sequence identities fold into similar three-dimensional structures. For example, a number of helical cytokines fold into a similar three-dimensional topology in spite of weak sequence homology. The members of the helical cytokine family not only show diversity in their disulfide topology, but their disulfide crosslinks are also part of the variable regions. Besides the helical cytokines, there are other protein families with weak sequence identities and non-homologous disulfides in the variable regions. The prior homology modeling techniques produce unsatisfactory models of these families because of the absence of sequence homology in the structurally conserved regions and the presence of non-homologous disulfide crosslinks in the variable regions.
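For reference, sequence identity between two aligned sequences can be computed as sketched below. This is one common convention, assumed here for illustration: columns containing a gap in either sequence are skipped, and the remaining columns form the denominator. Other conventions use a different denominator and give different percentages. The function name `percent_identity` and the two short sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap.

    aligned_a, aligned_b: equal-length strings from a pairwise alignment.
    The choice of denominator (ungapped columns) is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical aligned fragments: 5 identical residues in 6 ungapped columns.
print(round(percent_identity("ACDE-FGH", "ACNEKF-H"), 1))
```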