The 3-dimensional structure of a folded and functioning protein is determined by its primary sequence of amino acids. This structure is commonly referred to as the native state, and it is the protein conformation that allows the protein to perform its designated function. The native state is considered to be deterministic, because a given primary sequence of a protein has a unique 3-dimensional structure in the native state (under ambient conditions). Recent investigations aimed at a better understanding of proteins in their natural environment have led to a slightly refined definition of the native state: the average structure of an ensemble of structures that deviate from one another within small limits (N. Furnham et al., Nature Structural & Molecular Biology (2006), 13, 184-185; M. A. DePristo et al., Structure (2004) 12, 831-838). Other definitions describe the native state as the most populated state in an ensemble of similar structures (M. A. DePristo et al., Structure (2004) 12, 831-838). An algorithm to predict the folded structure (hereinafter called the native state) of a protein has immense potential applications, including drug design for protein targets whose structure is not available from experimental studies or is expensive to determine experimentally. Further, the mechanism of protein folding remains very poorly understood at a fundamental level. A better understanding of protein folding mechanisms has broad implications for understanding diseases, such as Alzheimer's and Parkinson's diseases, that result from misfolded or disordered proteins.
Numerous approaches have been proposed to predict the 3-dimensional native structure of a protein from its primary sequence; these have been surveyed in detail elsewhere (C. D. Snow et al., Ann. Rev. Biophysics Biomolecular Structure (2005), 34:43-69; B. Kuhlman et al., Current Opinion in Structural Biol. (2004), 14(1):89-95). Some of these approaches are known as “ab-initio”, as they require only knowledge of the primary sequence. Other approaches use information from experimentally solved structures, held in some form of database or reference set. The latter approaches have made significant progress in recent years; however, they have had limited success in cases where relevant protein folds are not available (i.e., where the database is limited, as in the case of membrane proteins). Further, these methods cannot reveal the mechanism of protein folding.
In most ab-initio approaches, some form of atomistic model of the protein is used (or a reduced model of protein structure, for example one that considers only the protein backbone or one that merges hydrogen atoms into the heavy atoms to which they are bonded); this is commonly referred to as molecular mechanics. The challenge of folding a protein through computer modeling and simulation using molecular mechanics lies in finding the global minimum of the protein conformational energy landscape. There is sufficient experimental and theoretical evidence that the folded structure, or native state, corresponds to a region close to the global energy minimum. A popular description of the protein conformational energy landscape is a “folding funnel” (K. A. Dill et al., Nature Structural Biol., 1997, 4(1):10-9). The bottom of this funnel represents the native state, the lowest-energy conformation, and the surface of the funnel is not smooth but consists of local energy minima and maxima (see FIG. 1). Different extended conformations start from different points near the top of the funnel but eventually reach the same native state by following alternative pathways.
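To illustrate why locating the global minimum of a rugged landscape is difficult, the following minimal Python sketch uses a hypothetical one-dimensional “rugged funnel” (an overall quadratic well with superimposed cosine ripples) rather than any real protein energy function. Steepest descent, a purely local minimizer, stops in whichever local minimum lies downhill from its starting point; only an exhaustive grid search, feasible here solely because the toy landscape has one dimension, is guaranteed to find the global minimum. The functions `energy`, `local_minimize` and `global_minimum`, the coefficients, and the step size are all illustrative assumptions.

```python
import math

def energy(x: float) -> float:
    """Hypothetical 1-D 'rugged funnel': broad quadratic funnel plus cosine ruggedness."""
    return 0.1 * x**2 - math.cos(3.0 * x)

def gradient(x: float) -> float:
    # Analytic derivative of energy(x)
    return 0.2 * x + 3.0 * math.sin(3.0 * x)

def local_minimize(x0: float, step: float = 0.01, n_steps: int = 5000) -> float:
    """Plain steepest descent: only moves downhill, so it stays in the starting basin."""
    x = x0
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x

def global_minimum(lo: float = -5.0, hi: float = 5.0, n: int = 100_001) -> float:
    # Brute-force grid search; only practical because this toy has a single dimension
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(xs, key=energy)

x_glob = global_minimum()
print(f"global minimum: x = {x_glob:+.3f}, E = {energy(x_glob):.3f}")
for x0 in (-2.5, 0.5, 2.5, 4.5):
    x_min = local_minimize(x0)
    print(f"start {x0:+.1f} -> x = {x_min:+.3f}, E = {energy(x_min):.3f}")
```

In this toy, the starts at ±2.5 and 4.5 end in higher-energy local minima, while the start at 0.5 reaches the global minimum at x = 0. On a real landscape with 3N−6 dimensions, the grid-search fallback is no longer available.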
In the molecular mechanics approach, the conformational energy is described as a mathematical function of geometrical variables of the protein. The function and the variables are referred to as the potential function and the degrees of freedom of the protein, respectively. A typical potential function is shown in Equation 1, where the first term corresponds to bond stretching, the second to angle bending, and the third to torsion (dihedral) rotation. The last term is commonly referred to as the non-bonded term; its two parts represent the van der Waals and electrostatic energy components, respectively.
$$V(r^N) = \sum_{\text{bonds}} K_l \left(l - l_{eq}\right)^2 + \sum_{\text{angles}} K_\theta \left(\theta - \theta_{eq}\right)^2 + \sum_{\text{torsions}} \frac{V_n}{2}\left(1 + \cos\left[n\phi - \gamma\right]\right) + \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left( \left[\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}\right] + \frac{q_i q_j}{\varepsilon\, r_{ij}} \right) \qquad \text{(Eq. 1)}$$
Here l, θ and φ are bond lengths, bond angles and dihedral angles, with force constants K_l and K_θ, equilibrium values l_eq and θ_eq, torsional barrier V_n, periodicity n and phase γ; r_ij is the distance between atoms i and j, A_ij and B_ij are the van der Waals (Lennard-Jones) parameters, q_i and q_j are partial atomic charges, and ε is the dielectric constant.
A protein with N atoms has 3N−6 internal degrees of freedom; in other words, the protein conformational space has 3N−6 dimensions, and the conformational energy function is correspondingly highly multi-dimensional. If only ten points are used to sample each dimension (an oversimplified description of the protein conformational space), the total number of points to be sampled is 10^(3N−6). For a small protein (about 100 amino acid residues), N is typically between 1,000 and 2,000. More generally, if M points on average are evaluated per dimension, the total number of points is M^(3N−6), so the size of the conformational space grows exponentially with the number of atoms. It is therefore impossible to exhaustively explore the complete conformational space, even for a small protein, in a reasonable time-frame.
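The following Python/NumPy sketch evaluates the four groups of terms in Eq. 1 for a toy four-atom chain, to make the role of each term concrete. The coordinates, force constants, van der Waals parameters A and B, and charges are hypothetical values in arbitrary units, not parameters from any published force field, and the helper names `bend_angle`, `dihedral` and `potential_energy` are illustrative. Following Eq. 1 literally, the non-bonded sum runs over every pair i < j; production force fields usually exclude or scale pairs separated by only one to three bonds.

```python
import numpy as np

def bend_angle(a, b, c):
    """Angle a-b-c in radians, measured at the central atom b."""
    u, v = a - b, c - b
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos_t, -1.0, 1.0)))

def dihedral(a, b, c, d):
    """Torsion angle a-b-c-d in radians, in [-pi, pi], about the b-c bond."""
    b0, b1, b2 = a - b, c - b, d - c
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1  # components perpendicular to the b-c axis
    w = b2 - np.dot(b2, b1) * b1
    return float(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w)))

def potential_energy(x, bonds, angles, torsions, A, B, q, eps=1.0):
    """Evaluate the terms of Eq. 1 for coordinates x with shape (N, 3)."""
    e_bond = sum(K * (np.linalg.norm(x[i] - x[j]) - l_eq) ** 2
                 for i, j, K, l_eq in bonds)
    e_angle = sum(K * (bend_angle(x[i], x[j], x[k]) - th_eq) ** 2
                  for i, j, k, K, th_eq in angles)
    e_tors = sum(0.5 * Vn * (1.0 + np.cos(n * dihedral(x[i], x[j], x[k], x[m]) - gamma))
                 for i, j, k, m, Vn, n, gamma in torsions)
    e_vdw = e_elec = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):  # every pair i < j, as written in Eq. 1
            r = np.linalg.norm(x[i] - x[j])
            e_vdw += A[i, j] / r**12 - B[i, j] / r**6
            e_elec += q[i] * q[j] / (eps * r)
    return {"bond": e_bond, "angle": e_angle, "torsion": e_tors,
            "vdw": e_vdw, "elec": e_elec}

# Hypothetical planar trans chain with unit bond lengths and 90-degree angles
x = np.array([[0, 1, 0], [0, 0, 0], [1, 0, 0], [1, -1, 0]], dtype=float)
bonds = [(0, 1, 100.0, 1.0), (1, 2, 100.0, 1.0), (2, 3, 100.0, 1.0)]
angles = [(0, 1, 2, 50.0, np.pi / 2), (1, 2, 3, 50.0, np.pi / 2)]
torsions = [(0, 1, 2, 3, 2.0, 3, 0.0)]  # V_n = 2, n = 3, gamma = 0
A, B = np.full((4, 4), 1e-3), np.full((4, 4), 1e-2)
q = np.array([0.5, -0.5, 0.5, -0.5])
terms = potential_energy(x, bonds, angles, torsions, A, B, q)
print(terms, "total:", sum(terms.values()))
```

Because every bond length and angle in this geometry equals its assumed equilibrium value, and the threefold torsion at 180 degrees gives cos(3π) = −1, the bonded terms vanish and only the non-bonded terms contribute.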
In nature, proteins are known to fold quickly from an extended or misfolded state to the native state, often within milliseconds. However, sampling a number of points as large as that discussed above, even at the rate at which a protein can change conformation, would take an astronomically long time (far longer than millions of years). This is referred to in the literature as Levinthal's Paradox (Levinthal, C., Journal de Chimie Physique et de Physico-Chimie Biologique 1968, 65, 44). Therefore, it has been widely suggested that a protein goes from extended conformations to the native state through a process referred to as the mechanism of protein folding. During this process, the higher-energy extended conformation follows pathways consisting of local minima and local maxima to eventually reach the global energy minimum that corresponds to the native state (see FIG. 1). In most cases it has been shown that the information required to achieve the native state is present in the protein sequence; in some cases, molecules known as chaperones assist proteins in reaching their native state. Experimentally, it has been difficult to obtain detailed information about the mechanism of protein folding because the intermediate states are transient (C. R. Matthews et al., Ann. Rev. Biochem. (1992), 62: 653-83; A. C. Apetri et al., J. Am. Chem. Society (2006);128(35):11673-8).
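The scale of the sampling problem behind Levinthal's Paradox can be checked with a few lines of arithmetic on the count M^(3N−6). The Python sketch below works in log10 units because such numbers overflow ordinary floating-point values. The sampling rate of 10^13 conformations per second is an assumed, deliberately generous figure, and the function names are hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year

def log10_grid_points(n_atoms: int, points_per_dim: int = 10) -> float:
    """log10 of M**(3N - 6): grid points over all internal degrees of freedom."""
    return (3 * n_atoms - 6) * math.log10(points_per_dim)

def log10_years_to_enumerate(n_atoms: int, points_per_dim: int = 10,
                             samples_per_second: float = 1e13) -> float:
    """log10 of the years needed to visit every grid point at an assumed sampling rate."""
    return (log10_grid_points(n_atoms, points_per_dim)
            - math.log10(samples_per_second)
            - math.log10(SECONDS_PER_YEAR))

for n_atoms in (1_000, 2_000):
    print(f"N = {n_atoms}: 10^{log10_grid_points(n_atoms):.0f} grid points, "
          f"roughly 10^{log10_years_to_enumerate(n_atoms):.0f} years to enumerate")
```

For N = 1,000 atoms and ten points per dimension, the count alone is 10^2994 grid points; dividing by even a very high sampling rate leaves an exponent in the thousands, which is the essence of the paradox.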
Some previous investigations have explored displacing protein structures along normal modes to predict folded protein structures. However, these studies suffer from serious limitations that prevent them from predicting the three-dimensional structure of proteins. First, they have only explored fast vibrations on the femtosecond time-scale, using peptides of 15 amino acid residues (C. Chen et al., Biophysical J., (2005), 88: 3276-3285). In that study, the modes were computed by principal component analysis of very short molecular dynamics trajectories, and the more important vibrational modes on longer time-scales were not explored. Further, past studies have not proposed using protein vibrational modes computed by quasi-harmonic analysis of a set of structures drawn from different areas of the protein conformational space, an approach that allows large energy barriers between different conformations to be overcome. More importantly, past studies have only explored very small model peptides (15-20 amino acid residues) and have not shown results for actual proteins (Z. Zhang et al., Biophysical J., (2003) 84: 3583-3593; P. Carnevali et al., J. Am. Chem. Soc. (2003) 125(47): 14245). Finally, these studies have not reported any information on the mechanism of protein folding.
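For readers unfamiliar with quasi-harmonic analysis, the following Python/NumPy sketch shows the textbook calculation on an ensemble of structures. The mass-weighted covariance matrix of atomic coordinates is diagonalized, and the modes are ordered from largest to smallest variance λ_k, that is, from lowest to highest quasi-harmonic frequency ω_k = sqrt(k_B T / λ_k). A structure is then displaced along a chosen mode. The sketch assumes that the structures are already superimposed (no fitting step is shown), uses a synthetic ensemble rather than real protein data, and the names `quasi_harmonic_modes` and `displace_along_mode` are hypothetical. It is not presented as the procedure of the cited studies.

```python
import numpy as np

def quasi_harmonic_modes(frames, masses):
    """Quasi-harmonic analysis of pre-aligned structures.

    frames: (n_frames, N, 3) coordinates; masses: (N,).
    Returns the mean structure (N, 3), eigenvalues in descending order,
    and Cartesian mode vectors as columns of a (3N, 3N) array.
    """
    n_frames, N, _ = frames.shape
    X = frames.reshape(n_frames, 3 * N)
    mean = X.mean(axis=0)
    sqrt_m = np.repeat(np.sqrt(masses), 3)  # one entry per Cartesian coordinate
    Y = (X - mean) * sqrt_m                 # mass-weighted fluctuations
    C = Y.T @ Y / (n_frames - 1)            # mass-weighted covariance matrix
    evals, evecs = np.linalg.eigh(C)        # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]         # largest variance (lowest frequency) first
    evals, evecs = evals[order], evecs[:, order]
    modes = evecs / sqrt_m[:, None]         # back to Cartesian displacement directions
    return mean.reshape(N, 3), evals, modes

def displace_along_mode(structure, modes, k, amplitude):
    """Move a structure (N, 3) by `amplitude` along the unit vector of mode k."""
    d = modes[:, k] / np.linalg.norm(modes[:, k])
    return structure + amplitude * d.reshape(-1, 3)

# Synthetic ensemble: 5 atoms whose fluctuations are dominated by one direction u
rng = np.random.default_rng(0)
n_frames, n_atoms = 200, 5
u = rng.standard_normal(3 * n_atoms)
u /= np.linalg.norm(u)
base = rng.standard_normal((n_atoms, 3))
t = rng.standard_normal(n_frames)
frames = (base + 2.0 * t[:, None, None] * u.reshape(n_atoms, 3)
          + 0.05 * rng.standard_normal((n_frames, n_atoms, 3)))
masses = np.full(n_atoms, 12.0)  # equal masses for simplicity

mean, evals, modes = quasi_harmonic_modes(frames, masses)
moved = displace_along_mode(mean, modes, k=0, amplitude=1.5)
print("overlap of mode 0 with u:", abs(modes[:, 0] @ u) / np.linalg.norm(modes[:, 0]))
```

The lowest-frequency mode recovers the dominant direction built into the synthetic ensemble. Computing the covariance from structures spread across different regions of conformational space, rather than from one short trajectory, is what lets such modes describe large-scale rearrangements.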
Therefore, there is considerable interest in, and need for, a methodology that can efficiently predict the three-dimensional structure of a protein from its primary amino acid sequence and elucidate the mechanism by which the protein folds.