Bioinformatics is an area of research that applies computer science, mathematics and physics to solve biological problems. Structural bioinformatics refers to the use of bioinformatics to solve the set of biological problems that relate to the three-dimensional structures of polypeptide or protein sequences, herein referred to as protein structures. A protein structure is a set of atomic coordinates representing the three-dimensional structure of a protein. Atomic coordinates may be determined computationally or experimentally using a variety of techniques such as X-ray crystallography, electron microscopy and nuclear magnetic resonance spectroscopy.
Conservation is the phenomenon by which certain residues or polypeptides in homologous protein structures are subject to lower rates of substitution than other parts of the protein structure. Conservation is thought to reflect the structural and functional importance of these residues and polypeptides. Obtaining an accurate characterization of conservation in a protein structure is therefore critical for addressing biological problems such as targeted drug design and pathogen detection.
Conservation is a relative value because substitution rates for residues are determined relative to a set of homologous protein structures. Consequently, identifying a proper set of homologous protein structures for a given protein structure is a prerequisite for obtaining a good characterization of conservation in the protein structure.
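The dependence of conservation on the chosen homolog set can be illustrated with a minimal sketch. It assumes the homologs are already available as equal-length aligned sequences with "-" marking gaps, and it uses a normalized Shannon entropy per alignment column as the conservation score. The function name `column_conservation`, the entropy-based score and the toy sequences are all assumptions made for illustration; they are not taken from the method described herein.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Return one score in [0, 1] per alignment column (1 = fully conserved).

    Assumption: `alignment` is a list of equal-length aligned sequences,
    with '-' marking gaps. The score is 1 minus the Shannon entropy of the
    column, normalized by the maximum entropy over 20 amino acids.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for col in range(length):
        # Ignore gaps when counting residues in this column.
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column: treat as not conserved
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Hypothetical toy data: the same query scored against two homolog sets.
query = "MKVL"
close_homologs = [query, "MKVL", "MKIL"]
distant_homologs = [query, "MRAL", "MQTL", "MEGL"]

close_scores = column_conservation(close_homologs)
distant_scores = column_conservation(distant_homologs)
```

In this toy case the second position of the query appears fully conserved against the close homologs but only moderately conserved against the distant ones, which is why the choice of homolog set matters.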
Identifying a set of homologous protein structures for a given protein structure is complicated by the fact that a single metric usually does not provide an optimal indication of protein homology. This is largely due to the variability of conservation across different domains of protein structures. For instance, proteins with overall similarity in structure, herein referred to as global similarity, may not have good local correspondence between domains. Conversely, proteins that have a high degree of local similarity due to evolutionarily conserved domains may not have good global similarity because of structurally variable or unstructured regions, such as loops.
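The gap between global and local similarity can be sketched numerically. The example below assumes two structures have already been superposed and that a per-residue deviation (in ångströms) between equivalent atoms is available; it then compares a root-mean-square deviation (RMSD) over all residues with the lowest RMSD found in any contiguous window. The helper names, the window size and the deviation values are hypothetical, and reusing a single global superposition for the local score is a simplification.

```python
import math


def rmsd(deviations):
    """Root-mean-square of per-residue deviations (in angstroms)."""
    if not deviations:
        raise ValueError("deviations must not be empty")
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def best_local_rmsd(deviations, window):
    """Lowest RMSD over any contiguous window of `window` residues."""
    if window <= 0 or window > len(deviations):
        raise ValueError("window must be between 1 and len(deviations)")
    return min(
        rmsd(deviations[i:i + window])
        for i in range(len(deviations) - window + 1)
    )


# Hypothetical deviations: a well-matched six-residue domain followed by
# a four-residue loop that differs strongly between the two structures.
deviations = [0.5] * 6 + [8.0] * 4

global_score = rmsd(deviations)                      # dominated by the loop
local_score = best_local_rmsd(deviations, window=6)  # sees only the domain
```

Here the global score is above 5 Å while the best local window scores 0.5 Å, so a threshold on either metric alone would classify the same pair differently.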
Therefore, one of the best methods of characterizing conservation in a protein structure is to determine the family or category of related protein structures to which the protein structure belongs. However, identifying that family of protein structures is complicated for the same reasons.
Thus, there is a need in the art for improved methods of characterizing conservation in protein structures. The present invention addresses these and other shortcomings of the prior art.