Although three-dimensional structures of proteins are available at the atomic level, for example from experimental measurements such as X-ray crystallography and nuclear magnetic resonance (NMR) or from computational simulations, describing protein folding and the resulting shapes remains challenging. In a folded protein, some local fragments can be described as α-helices and β-strands, which arise from hydrogen bond formation. The remaining local fragments, however, are commonly irregular coils, loops, and other shapes and conformations that are difficult to identify and describe.
Several methods have been developed to compare protein structures by aligning their secondary structures, such as Dali (see Holm L, Sander C. J. Mol. Biol. 1993a; 233: 123-138), STRUCTAL (see Gerstein M, Levitt M. In Proc. Fourth Int. Conf. on Intell. Sys. for Mol. Biol. Menlo Park, Calif.: AAAI Press. 1996. p 59-67), VAST (see Gibrat J F, Madej T, Bryant S H. Curr. Opin. Struct. Biol. 1996; 6: 377-385), LOCK (see Singh A P, Brutlag D L. In Proc. Fifth Int. Conf. on Intell. Sys. for Mol. Biol. Menlo Park, Calif.: AAAI Press. 1997. p 284-293), 3DSearch (see Singh A, Brutlag D. 3dSearch http://gene.stanford.edu/3dSearch), CE (see Shindyalov I N, Bourne P E. Protein Eng. 1998; 11(9): 739-47), SSM (see Krissinel E, Henrick K. Acta Crystallogr D Biol Crystallogr. 2004; 60(Pt 12 Pt 1): 2256-2268), PALI (see Balaji S, Sujatha S, Kumar S S C, Srinivasan N. Nucleic Acids Res. 2001; 29: 61-65), and the like, all of which are hereby incorporated by reference. Structural classifications of proteins are defined and stored in the SCOP and CATH databases (see Park J H, Ryu S Y, Kim C L, Park I K J. Genome Informatics 2001; 12: 350-351; and Hadley C, Jones D T. Structure 1999; 7(9): 1099-112).
A significant challenge in the study of protein folding is the need to describe and compare the possible types of folding motifs. It has been estimated that there can be as many as 4,000 possible types of folding in proteins, of which about 2,000 are known in naturally-occurring proteins (see Govindarajan S, Recabarren R, Goldstein R A. Proteins. 1999; 35(4): 408-414). Because so many rare and unnatural types of folds exist, a comprehensive database of all the existing types of folding is difficult to compile. The lack of knowledge regarding protein folding and conformation has led to the development of many technologies.
For example, U.S. Pat. No. 5,265,030 to Skolnick et al. is directed to a method for determining a protein's tertiary structure from its primary sequence of amino acid residues. Specifically, the method in the '030 patent considers the free, unconstrained interactions between residues and between side chains, and tracks the entire folding operation from the protein's unfolded state to its fully folded state. The '030 patent does not use torsion angles and pitch distances within overlapping elements, wherein each element consists of five consecutive amino acids, to describe protein folding.
U.S. Pat. No. 5,680,319 to Rose et al. is directed to a computer-assisted method for predicting the three-dimensional structure of a protein fragment from its amino acid sequence. This method starts with a polypeptide chain of defined sequence, preferably in a fully extended conformation, and uses idealized geometry and highly simplified energy functions to fold the chain in hierarchic stages to predict both secondary and super-secondary structures. The '319 patent does not use torsion angles and pitch distances within overlapping elements, wherein each element consists of five amino acids, to describe protein folding.
U.S. Pat. Nos. 6,345,235 and 6,516,277 to Edgecombe et al. are directed to determining the multi-dimensional topology of a substance within a volume. Specifically, the methods determine molecular shape and structural information of proteins using van der Waals surfaces, electrostatic potentials, or electron density. The '235 and '277 patents do not use torsion angles and pitch distances within overlapping elements, wherein each element consists of five amino acids, to describe protein folding.
U.S. Pat. No. 6,512,981 to Eisenberg et al. is directed to a computer-assisted method for assigning an amino acid probe sequence to a known three-dimensional protein structure. Specifically, the method uses the amino acid sequence of the probe and sequence-derived properties of the probe sequence, such as secondary structure and solvent accessibility, to compute an alignment score. The '981 patent does not use torsion angles and pitch distances within overlapping elements, wherein each element consists of five amino acids, to describe protein folding.
U.S. Pat. No. 6,792,355 to Hansen et al. is directed to a method for separating two or more subsets of polypeptides within a set of polypeptides. The method comprises the steps of selecting a sequence comparison signature for each amino acid sequence, constructing a distance arrangement according to the distance between each of the sequence comparison signatures, and identifying a first and a second cluster of sequence comparison signatures. The '355 patent does not use torsion angles and pitch distances within overlapping elements, wherein each element consists of five amino acids, to describe protein folding.
U.S. Pat. No. 6,832,162 to Floudas et al. is directed to an ab initio prediction of secondary and tertiary protein structures that uses selected force fields to calculate, first, the low-energy conformations of overlapping pentapeptides and, then, the total free energy of the entire system. The '162 patent does not use torsion angles and pitch distances within overlapping elements, wherein each element consists of five amino acids, to describe protein folding.
U.S. Pat. No. 7,158,888 to McRee et al. is directed to determining the structure of a target biomolecule, such as a protein, from X-ray diffraction data. Specifically, the method in the '888 patent performs multiple molecular replacement searches on the X-ray data using a search model, compares the molecular replacement solutions thus derived, and predicts which search model biomolecule has superior structural identity with the target biomolecule. The '888 patent does not use torsion angles and pitch distances within overlapping elements, wherein each element consists of five amino acids, to describe protein folding.
U.S. Pat. No. 7,288,382 to Harbury et al. is directed to a method for structural analysis of proteins, including mapping of the sites for ligand binding and protein-protein interactions. Specifically, the method in the '382 patent introduces cysteine residues by translational misincorporation such that the misincorporated cysteines serve as targets for modification. The '382 patent does not use torsion angles and pitch distances within overlapping elements, wherein each element consists of five amino acids, to describe protein folding.
In sum, none of the conventional methods for describing protein folding and conformations are satisfactory. Therefore, there remains a need for a method to describe all possible types of folding in proteins. There also remains a need for an algorithm to compare folding among different proteins or different conformations of the same protein.