1. Field of the Invention
This invention is concerned with a novel algorithm for determining the spatial-arrangement topology of a protein from its secondary structure. The invention is an essential aid in finding the topology and ultimately the three-dimensional (3D) structure of proteins whose structures are unknown. The invention also employs the algorithm to estimate the structural stability of a protein with a given secondary structure and sequence using a global entropy evaluation method combined with thermodynamic parameters obtained from experimental binding free energy (FE) data. By using this global entropy evaluation method, the method also predicts the dominant folding pathway of a protein.
2. Description of the Related Art
Determination of the three-dimensional structure of a protein remains a very difficult process (1-5). The most successful approach is x-ray crystallography, which involves fitting diffraction data obtained at very large government-funded synchrotron facilities (1). Although academic professionals are able to apply for government funding to measure and analyze protein structures at such facilities, the costs of maintaining and operating the facilities that provide the beam time for these experiments are prohibitively expensive for commercial enterprises to fund. Yet biology-related commercial enterprises need this information, particularly pharmaceutical companies, where new drugs are always under development.
Furthermore, obtaining protein structures by x-ray crystallography requires skilled techniques to express and crystallize a given protein before a measurement can be made (7, 8). It remains questionable whether all proteins can be crystallized and whether the crystalline structures fully represent the in vivo features of many biologically relevant proteins. Whereas many enzymes remain active even in this crystalline geometry (9), the true dynamics of these structures and the range of conformations can only be inferred from the x-ray data because the protein structures are rigidly locked in a crystal. The in vivo structures of protein subunits are even more difficult to assess as crystals.
A second approach is NMR spectroscopy (10, 11). NMR spectroscopy is cost efficient for a company to carry out. However, this technique is often fraught with difficulties due to the time resolution of NMR experiments, the effects of solvent exchange and other complex coupling effects (7, 8). In addition, the same problems that hamper x-ray crystallography research—protein expression, isolation, and characterization—also render this approach costly.
The easiest information to obtain accurately with NMR spectroscopy is the protein secondary structure (11). However, the protein secondary structure carries insufficient information to unambiguously identify the topology of a given protein (12).
The most important topological information gained by NMR experiments is the nuclear Overhauser effect (NOE) constraints (10-12, see also U.S. Pat. No. 6,512,997). One must first obtain many unambiguous NOE-constraints to obtain a successful prediction. However, many proteins have highly ambiguous NOE-constraints or the NOE signals are too weak and broad banded to properly assign. In such cases, the protein structure cannot be resolved by NMR and the only remaining option is to turn to x-ray crystallography.
A third approach is protein threading (13-20, see also U.S. Pat. Nos. 6,512,981; 5,878,373; 5,884,230 and 6,377,893). However, many proteins still have less than 25% homology with known protein structures in the Protein Data Bank (PDB). To find a plausible template structure for a protein of 25% homology, considerably more information is needed to ensure the accuracy of the prediction (13, 17), and there is no objective method for deducing which structures make acceptable threads.
A remaining option is to carry out a molecular dynamics (MD) simulation (21-26). Currently, however, the time frame for a full protein refolding experiment by molecular dynamics remains intractable because of the long calculation times required on even the fastest computers (thousands of years even on a parallel processor supercomputer to achieve one ms of biological simulation time). Moreover, the uncertainties and ambiguities of even state-of-the-art MD simulation programs render questionable whatever conclusions can be drawn from such a long simulation (21, 23-26).
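The scale of this estimate can be illustrated with simple arithmetic. The sketch below is not taken from the cited references: the 2 fs integration time step and the throughput of 1 ns of simulated time per day of computing are assumed, illustrative values, and the function names are hypothetical.

```python
# Back-of-the-envelope cost of a long MD run.
# ASSUMPTIONS (not from the source): 2 fs time step, 1 ns/day throughput.

FS_PER_MS = 1e12       # femtoseconds in one millisecond
NS_PER_MS = 1e6        # nanoseconds in one millisecond
DAYS_PER_YEAR = 365.25


def md_steps(sim_time_ms, timestep_fs=2.0):
    """Number of integration steps needed to cover sim_time_ms."""
    return sim_time_ms * FS_PER_MS / timestep_fs


def wall_clock_years(sim_time_ms, ns_per_day=1.0):
    """Years of computing needed at an assumed throughput in ns/day."""
    days = sim_time_ms * NS_PER_MS / ns_per_day
    return days / DAYS_PER_YEAR


if __name__ == "__main__":
    print(f"steps for 1 ms: {md_steps(1.0):.1e}")
    print(f"years at 1 ns/day: {wall_clock_years(1.0):.0f}")
```

Under these assumed values, one ms of simulated time corresponds to 5 x 10^11 integration steps and roughly 2,700 years of computing, which is of the same order as the estimate given above. A different assumed throughput scales the result proportionally.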
Combinatorial folding models of secondary structure alone (27, 28) yield an intractable number of structural topologies to test in an MD simulation in explicit water (15, 26). If the correct topology can be obtained, the computational cost of an MD simulation is drastically reduced and the confidence level of the predictions is improved to a root mean square (RMS) deviation of no more than 3 Å (29).
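For reference, the RMS deviation between a predicted and a reference structure can be computed as sketched below. This is a minimal illustration, not the procedure of reference (29): it assumes the two structures list the same atoms in the same order and have already been superimposed, and the function name rmsd is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of
    (x, y, z) coordinates, in the same units as the input (e.g. angstroms).

    ASSUMPTION: the structures are already optimally superimposed and the
    atoms are listed in the same order; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total_sq = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total_sq += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total_sq / len(coords_a))


if __name__ == "__main__":
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
    print(rmsd(predicted, reference))  # every atom is displaced by 3 units
```

In practice the two structures are first aligned by a least-squares superposition so that the RMS deviation reflects differences in shape and not in placement.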
What is needed is an intermediate, cost-effective, and objective approach that can infer the topology without waiting several millennia for the answer to be produced, applying for large grants to budget synchrotron machine time, spending long hours in the lab searching for ways to isolate proteins, or utilizing subjective methodologies to infer the protein structure. The topology indicates how the secondary structure of a protein is arranged spatially and is the main juncture between the secondary structure and the full 3D structure. The topology (spatial arrangement) cannot be obtained from the three-state secondary structure alone.
The invention is a semi-dynamic thermodynamic model of protein folding that we developed from RNA research (30) to account for the entropy of folding. Once the topology is known, a protein can be tested for its 3D structure with only a small fraction of the computer simulation time required for a complete protein refolding MD simulation. The invention is intended to aid NMR spectroscopists and x-ray crystallographers in finding the 3D structure of an unknown protein based upon partially determined structural information, specifically the protein secondary structure.
The importance of gaining a foothold on protein topology cannot be emphasized enough. First, the experimental conditions that complicate NMR experiments on proteins are generally the norm. Highly flexible proteins may have marginally stable secondary structures that make their structures difficult to determine experimentally with high precision by NMR. Second, functional proteins are dynamic entities, not static crystals (31, 32). For regions of structure that exhibit a high degree of flexibility and polar regions where there is rapid solvent exchange, NMR spectroscopy is limited by its time resolution (10, 11). X-ray crystallography can obtain the structure of a protein that can be crystallized; however, the overall dynamics of the protein in solution are less clear. Topology prediction offers an independent tool to guide the structural determination and improve our understanding of the physics of protein structure and folding dynamics.
The folding model considers the direction in which biological proteins are synthesized and transported through the cell as a basis for considering the step-by-step thermodynamics of folding.