The bioactive properties of a protein are primarily a function of its structure. Though research may focus on one particular application, a single protein may interact with many different proteins within the body. When such a protein is being developed as a potential drug, these other interactions can result in reduced efficiency due to blood plasma binding, harmful side effects, or even useful off-label uses.
Drug discovery has traditionally required in vitro experimentation to study interactions between molecules and proteins under controlled test conditions, with the later possibility of in vivo experiments in animals. Computer-modeled experimentation, or “in silico” experimentation, has been shown to be a faster, cheaper and safer alternative to in vitro or in vivo experimentation.
To aid in silico research, the Protein Data Bank (PDB) has been established and is freely accessible to researchers. PDB files give three-dimensional spatial coordinates for each atom in a given protein. Due to the breadth of the database and its ease of access, the PDB file format is a widely accepted convention for in silico research.
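As an illustration of how per-atom coordinates are laid out in a PDB file, the following is a minimal sketch that reads the fixed-width ATOM and HETATM records. The names `Atom`, `parse_atoms` and `SAMPLE` are hypothetical and chosen only for this example, and the two sample records are made-up values in PDB column layout, not data from any particular entry.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA" for an alpha carbon
    res_name: str  # three-letter residue name
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float       # orthogonal coordinates in angstroms
    y: float
    z: float


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Extract atom records from PDB-format text using fixed column slices."""
    atoms = []
    for line in pdb_text.splitlines():
        # Coordinate records begin with the record name in columns 1-6.
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append(Atom(
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


# Illustrative records only (assumed values, not taken from a real entry).
SAMPLE = (
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N\n"
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38           C\n"
)

if __name__ == "__main__":
    for atom in parse_atoms(SAMPLE):
        print(atom)
```

Because the format is column-based rather than delimiter-based, the sketch slices each line by position instead of splitting on whitespace, which would fail when adjacent fields run together.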
Effective computer modeling requires highly accurate modeling of the structure of the individual proteins. The natural structure of a protein is determined by its chemical bond structure, called the primary structure, and the folded arrangement the protein occupies in three-dimensional space, conventionally called the tertiary structure. This folded arrangement is generally the most compact one possible, with the lowest energy state. Protein structures are generally composed of subunits, where each subunit has a characteristic structure and profile. The structure of the entire protein can be described as a composite of the individual subunits.
Existing superpositioning software is capable of atomic-level distance minimization of proteins to attempt to predict bonding structure, but the accuracy of this software depends on a preliminary protein sequence alignment. Because of this dependence, existing superpositioning software cannot be used for superimposing divergent homologous protein structures, whose sequences cannot be reliably aligned.
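The dependence on a preliminary alignment can be illustrated with a small sketch. Atomic-level distance minimization is commonly scored by the root-mean-square deviation (RMSD) over paired atoms, and the pairing itself must be supplied in advance. The function names below are hypothetical, and the sketch removes only the translational offset between the two structures; actual superpositioning software would also optimize rotation, for example with the Kabsch algorithm, which is omitted here for brevity.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def centroid(coords: Sequence[Coord]) -> Coord:
    """Mean position of a set of atoms."""
    n = len(coords)
    return (
        sum(c[0] for c in coords) / n,
        sum(c[1] for c in coords) / n,
        sum(c[2] for c in coords) / n,
    )


def centered(coords: Sequence[Coord]) -> List[Coord]:
    """Translate atoms so that their centroid sits at the origin."""
    cx, cy, cz = centroid(coords)
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def paired_rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between two structures after removing translation only.

    Atom i of `a` is compared with atom i of `b`, so the result is only
    meaningful if the caller has already established that correspondence.
    """
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and equally sized")
    ca, cb = centered(a), centered(b)
    squared = sum(
        (p - q) ** 2 for u, v in zip(ca, cb) for p, q in zip(u, v)
    )
    return math.sqrt(squared / len(a))


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    b = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in a]  # same shape, shifted
    print(paired_rmsd(a, b))                   # correct pairing
    print(paired_rmsd(a, [b[1], b[0], b[2]]))  # two atoms mis-paired
```

With the correct pairing the two identical shapes score an RMSD of essentially zero, while swapping two atoms in the pairing yields a clearly nonzero score for the same geometry. This is the sense in which an atomic-level fit is only as good as the alignment it starts from.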
Existing structural alignment software is able to predict protein alignment without a preliminary sequence alignment. However, such software generally calculates distance minimization at the fold level, by pairing a short list of oligopeptides which are known to have high affinity, rather than at the atomic level, leading to bonding structure predictions which undermine the individual proteins' secondary structures. Further, because sequence alignment is based on matching oligopeptide pairs, it is not possible to obtain a complete sequence alignment of the protein. Additionally, because structural alignment software does not perform calculations at the atomic level, existing structural alignment software is incapable of generating PDB format files.
Another method is to model atomic-level alignments and use statistical analysis to determine the most likely arrangement. Though computationally intensive due to the enormous number of possible configurations, calculating the possible arrangements of atoms in a three-dimensional protein structure is not prohibitively difficult. The challenge lies in determining which configurations are most likely to be correct.