The present invention relates to methods of identifying binding sites on proteins, methods for identifying classes of compounds suitable for binding a protein, and methods of conducting experiments to identify compounds that interact with a protein to affect a biological process.
Determinations of protein structures have to date been conducted by isolating crystals of the protein of interest and analyzing the structure by X-ray crystallography. Typically, the protein has been co-crystallized with a heavy metal component, or subjected to multiple co-crystallizations, with the heavy metal providing a reference for solving the crystallographic data.
With a determination of the structure of a protein, or the structure of another macromolecule having significant tertiary structure, such as a DNA or RNA, workers often seek to identify the binding sites that are or may be of significance to a biological process, such as an enzyme active site or a site for interacting with another macromolecule or with itself. Computational efforts have focused on sampling the surface of a molecule to find good fits with known binding agents. These methods have had modest success, and are dependent on knowledge of (a) the structure of good binding agents and, often, (b) the function of the protein. A more traditional approach has sought to co-crystallize binding substances with the macromolecule to identify binding sites. With the binding site identified, educated guesses can be made as to new molecules that could bind the site. These educated guesses can guide synthetic methods, including combinatorial chemistry methods, to make and test new molecules. When such prospective binding agents prove to be effective binding agents, and possibly are also found effective in an appropriate biological model, the structural correlations drawn from the results can be tied to information about the binding site to make still further inferences about the structure important to a biological function. This co-crystallization approach depends on an initial knowledge of active agents, and is experimentally difficult and time-consuming.
The present inventor has found a method of identifying, from a three-dimensional structural solution of a macromolecule, the binding sites for molecules. The structural solution used as the basis for the method can be derived from crystallography, spectroscopic analyses such as NMR, computational derivations, or any other method of determining the structure of a macromolecule. The method does not require or typically use information on the function of the macromolecule, as the method avoids subjective biases and instead depends purely on physical parameters. Further, the method can be refined to narrow the possible choices of binding sites and identify the functionalities, i.e., organic fragments or "ORFs," that effectively interact with the binding site(s). The data obtained for ORFs also identify the orientations of the functionalities useful in a candidate binding agent, thereby providing a tool for searching chemical databases to identify candidate binding agents. Where the methods described herein identify more than one potential binding site, the data generated through these methods can be used to energetically rank the binding sites, and thereby quantitatively determine which site has the potential to more strongly bind molecules.
The computational method described here generates maps of binding site preferences that are nearly identical with maps produced by compiling data generated by traditional methods, but with one important difference: the experimentally produced data took many years to produce, while the data produced as described herein can be produced in no more than a few weeks. The invention provides an important development in unbiased simulation methods for predicting the character of agents that bind to biological macromolecules to affect the function of the macromolecules.
In one embodiment, provided is a method of identifying binding sites on a macromolecule comprising: (a) for at least one organic fragment (ORF), conducting, at separate values of parameter B, two or more simulated annealing of chemical potential calculations using the ORF as the inserted solvent; and (b) comparing converged solutions from step (a) to identify first locations at which the relevant ORF is strongly bound, thereby identifying candidate sites for binding ligand molecules. In one preferred aspect, the method further comprises: (c) identifying clusters of sites that strongly bind an ORF. In another preferred aspect, the method further comprises: (d) conducting steps (a) and (b) for each of two or more ORFs and identifying clusters where two or more distinct ORFs bind. Preferably, a cluster that binds three or more distinct ORFs is identified. By reducing the binding stringency in the vicinity of a cluster, the method can identify additional functionalities that would contribute to the binding of a bioactive agent.
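Steps (b) through (d) above can be illustrated with a minimal sketch. It is not the inventor's implementation. It assumes that each converged solution has already been reduced to a list of fragment coordinates, that the lowest value of B is the most stringent condition, and that proximity within a fixed distance is an adequate test for "the same location." The function names, the tolerance, and the clustering radius are hypothetical choices made for illustration only.

```python
import math

def persistent_sites(runs, tol=1.0):
    """Return positions that survive at every sampled value of B.

    runs: dict mapping a B value to the list of (x, y, z) fragment positions
    in the converged solution at that B. ASSUMPTION: the lowest B is the most
    stringent run, so its positions are the candidates; a candidate is kept
    only if every other run has a fragment within `tol` of it.
    """
    b_values = sorted(runs)
    strict = runs[b_values[0]]
    others = [runs[b] for b in b_values[1:]]
    return [p for p in strict
            if all(any(math.dist(p, q) <= tol for q in run) for run in others)]

def multi_orf_clusters(sites_by_orf, radius=3.0, min_orfs=2):
    """Group strongly bound sites and keep groups hit by several distinct ORFs.

    sites_by_orf: dict mapping an ORF name to its strongly bound positions.
    Sites are linked when they lie within `radius` of each other (simple
    single-linkage grouping, a hypothetical choice). Returns, for each
    qualifying cluster, the sorted list of ORF names found in it.
    """
    labelled = [(orf, p) for orf, pts in sites_by_orf.items() for p in pts]
    unvisited = set(range(len(labelled)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        stack, members = [seed], [seed]
        while stack:
            i = stack.pop()
            near = [j for j in unvisited
                    if math.dist(labelled[i][1], labelled[j][1]) <= radius]
            for j in near:
                unvisited.remove(j)
            stack.extend(near)
            members.extend(near)
        orfs = {labelled[m][0] for m in members}
        if len(orfs) >= min_orfs:
            clusters.append(sorted(orfs))
    return clusters
```

Calling `multi_orf_clusters(..., min_orfs=3)` corresponds to the preferred case in which a cluster binds three or more distinct ORFs, and enlarging `tol` or `radius` is one loose analogue of reducing the binding stringency in the vicinity of a cluster.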
In another preferred aspect, the method further comprises: (e) conducting, at separate values of a measure of chemical potential, two or more simulated annealing of chemical potential calculations using water as the inserted solvent; (f) comparing converged solutions from step (e) to identify locations at which water is strongly bound, thereby identifying locations on the protein which are not candidate sites for binding ligand molecules; and (g) identifying first locations that are not water locations.
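The final step, identifying first locations that are not water locations, can be sketched as a simple distance filter. This is an illustrative sketch only: it assumes the strongly bound ORF sites and the strongly bound water sites are each available as lists of coordinates, and the function name and the separation threshold are hypothetical.

```python
import math

def exclude_water_sites(orf_sites, water_sites, min_separation=1.5):
    """Keep ORF sites that do not coincide with a strongly bound water.

    orf_sites, water_sites: lists of (x, y, z) positions.
    ASSUMPTION: an ORF site counts as a water location when a strongly
    bound water lies within `min_separation` of it.
    """
    return [p for p in orf_sites
            if all(math.dist(p, w) > min_separation for w in water_sites)]
```

With no water sites supplied, every ORF site passes the filter unchanged.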
In still another preferred aspect, the simulated annealing of chemical potential calculations comprise multiple steps of sampling, wherein in a number of the sampling steps the ORF's position is changed by a small amount and the resulting new position is accepted or rejected based on the change in energy resulting from the attempted change.
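A sampling step of this kind can be sketched as follows. The text states only that a move is accepted or rejected based on the change in energy; the use of the standard Metropolis criterion here is an assumption, as are the function names, the `kT` parameter, the maximum shift, and the treatment of the ORF as a single point with a user-supplied energy function.

```python
import math
import random

def metropolis_accept(delta_energy, kT, rng=random):
    """ASSUMED Metropolis rule: always accept a move that does not raise the
    energy; accept an uphill move with probability exp(-delta_energy / kT)."""
    if delta_energy <= 0.0:
        return True
    return rng.random() < math.exp(-delta_energy / kT)

def displacement_step(position, energy_fn, kT, max_shift=0.1, rng=random):
    """Attempt one small random displacement of a fragment position.

    position: (x, y, z) tuple; energy_fn: hypothetical callable returning the
    interaction energy at a position. Returns the trial position if the move
    is accepted, otherwise the original position.
    """
    trial = tuple(x + rng.uniform(-max_shift, max_shift) for x in position)
    delta = energy_fn(trial) - energy_fn(position)
    return trial if metropolis_accept(delta, kT, rng) else position
```

A full calculation would also involve insertion and deletion attempts for the fragment and, for a real ORF, rotational moves; those are omitted from this sketch.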
Further provided is a method of identifying the chemical characteristics of compounds that bind a macromolecule comprising examining the functionalities and relative orientations of the ORFs found in a cluster pursuant to the binding site identifying method outlined above.
Also provided is a method of conducting combinatorial chemistry to identify compounds that interact with a macromolecule comprising: (a) identifying classes of reactants that are modeled by the functionalities of the ORFs found in a cluster pursuant to the binding site identifying method for the macromolecule; (b) designing a combinatorial synthetic protocol that calls for two or more synthetic procedures that react reagents of at least two of the classes identified in step (a); and (c) conducting the combinatorial synthetic protocol to create candidate binding molecules.
Further provided is a method of conducting a bioactive agent discovery process comprising: (a) from a group of established combinatorial synthetic protocols, collections of chemical compounds, or pools of chemical compounds, identifying those members of the group that provide a high density of compounds that meet, for a macromolecule, selection criteria identified from the binding site identifying method for the macromolecule; and (b) conducting binding or functional assays to identify compounds obtained from the identified collections or protocols which bind or affect the function of the macromolecule.
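Step (a) of this discovery process can be illustrated with a minimal sketch that ranks collections by the fraction of their compounds meeting the selection criteria. The representation is an assumption made for illustration: each compound is reduced to a set of functional-group labels, and the selection criteria are reduced to a set of required labels. A practical screen would also test the relative orientations of the functionalities, which this sketch ignores. All names are hypothetical.

```python
def hit_density(library, required_groups):
    """Fraction of compounds in a library that carry every required group.

    library: list of sets of functional-group labels, one set per compound.
    required_groups: set of labels derived from the ORFs found in a cluster.
    """
    if not library:
        return 0.0
    hits = sum(1 for compound in library if required_groups <= compound)
    return hits / len(library)

def rank_libraries(libraries, required_groups):
    """Order library names from highest to lowest hit density
    (ties broken alphabetically by name)."""
    scored = [(hit_density(compounds, required_groups), name)
              for name, compounds in libraries.items()]
    return [name for density, name
            in sorted(scored, key=lambda item: (-item[0], item[1]))]
```

The collections at the head of the returned list would be the ones carried forward to the binding or functional assays of step (b).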