This invention is related to the fields of drug design and computational chemistry. More particularly, the invention relates to methods and devices for calculating and predicting the functional behavior of different molecules based on molecular similarity.
Structure-based drug design has produced successes in the prospective rational design of high-affinity ligands, particularly where an X-ray structure of the biological target protein is available and docking approaches such as Hammerhead and DOCK are applicable. More often, however, no high-resolution structure of the target protein of interest is available. In these cases, computational techniques for molecular diversity optimization of screening libraries, as well as techniques for three-dimensional quantitative structure-activity relationship (3D QSAR) modeling, become important. These techniques require methods of quantitative and/or geometric comparison between pairs of molecules.
Molecular diversity optimization requires a metric for estimating the relative redundancy of one molecule compared to another. For the problem of designing a library of molecules to screen against a variety of biological targets, the relevant notion of redundancy is the degree to which one molecule is likely to bind the same sites as another. Several popular methods for molecular diversity optimization rely on topological diversity in the space of two-dimensional molecular representations and are based on the work of Willett et al., J. Chem. Inf. Comput. Sci. (1986) 26:109-18.
FIG. 1 illustrates the problem with an example reported by Y. Martin et al., “Experience with the Application of Computers to Library Design”, Cambridge Healthtech Institute's Second Annual Conference on Chemoinformatics (1998). Nicotine and several analogs are shown along with a known oxazole-containing nicotinic agonist and acetylcholine, the natural ligand. The molecules are listed in order of decreasing similarity, according to the Tanimoto coefficient of their 2D fingerprints, as implemented in the Daylight software package (referred to herein as the “topological method”). Note that the simple nicotine analogs show high computed similarity to nicotine. However, a known, potent, competitive ligand with obvious structural similarity has low computed similarity, and the natural ligand is judged to be unrelated using this metric.
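For orientation, the Tanimoto coefficient used by the topological method can be sketched as follows. This is a minimal illustration that assumes each fingerprint is represented as the set of its "on" bit indices; the function name `tanimoto` and the bit sets are hypothetical and are not actual Daylight fingerprints or values from FIG. 1.

```python
from typing import Set


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by total distinct 'on' bits."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Assumed convention: two empty fingerprints are treated as identical.
        return 1.0
    return len(fp_a & fp_b) / union


# Hypothetical bit sets chosen only to illustrate the arithmetic.
parent = {1, 4, 7, 9, 12}
close_analog = {1, 4, 7, 9, 13}   # shares 4 of 6 distinct bits with parent
distant_ligand = {2, 4, 15}       # shares 1 of 7 distinct bits with parent

sim_analog = tanimoto(parent, close_analog)
sim_distant = tanimoto(parent, distant_ligand)
```

Because the coefficient counts only shared substructure bits, two molecules that present similar shapes to a binding site but are built from different fragments can receive a low score, which is the behavior FIG. 1 illustrates.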
While there have been some attempts at three-dimensional approaches to the diversity optimization problem, none has successfully addressed the fundamental issue: the pairwise distance measure and its relationship to the biological functional relatedness of molecules. To the extent that a method can predict likely geometric relationships of molecules in the context of binding to protein active sites, it may also have applicability to the 3D QSAR problem.
A new method for rapidly comparing two molecules and determining a measure of similarity having biological relevance has now been invented.
One aspect of the invention is a method for comparing two molecules to predict if they will exhibit similar biological activities, by providing a set of reference points having reference coordinates, computing a molecular surface for a first molecule, determining the distance from each reference point to the molecular surface to provide a first set of distances, computing a molecular surface for a second molecule, determining the distance from each reference point to the second molecular surface to provide a second set of distances, and calculating the difference between the first set and second set of distances to determine the difference between the first molecular surface and the second molecular surface.
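The steps of this method can be sketched in a few lines. The sketch below rests on several assumptions that are not taken from the text: each molecular surface is approximated as the union of atom-centered spheres, both molecules are already posed in the same coordinate frame as the reference points, the reference points lie outside both surfaces, and the two sets of distances are compared by a root-mean-square difference. All names (`surface_distance`, `distance_profile`, `profile_difference`) and coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Atom = Tuple[float, float, float, float]  # x, y, z, sphere radius


def surface_distance(point: Point, atoms: Sequence[Atom]) -> float:
    """Distance from an exterior point to the nearest atom-sphere surface."""
    return min(math.dist(point, (x, y, z)) - r for x, y, z, r in atoms)


def distance_profile(points: Sequence[Point], atoms: Sequence[Atom]) -> List[float]:
    """One surface distance per reference point, in reference-point order."""
    return [surface_distance(p, atoms) for p in points]


def profile_difference(first: Sequence[float], second: Sequence[float]) -> float:
    """Root-mean-square difference between two distance sets (assumed measure)."""
    if len(first) != len(second):
        raise ValueError("distance sets must use the same reference points")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / len(first))


# Six hypothetical reference points on the coordinate axes, 5 units from the origin.
reference_points: List[Point] = [
    (5.0, 0.0, 0.0), (-5.0, 0.0, 0.0),
    (0.0, 5.0, 0.0), (0.0, -5.0, 0.0),
    (0.0, 0.0, 5.0), (0.0, 0.0, -5.0),
]

# Two toy "molecules": single spheres of different radius centered at the origin.
first_molecule: List[Atom] = [(0.0, 0.0, 0.0, 1.5)]
second_molecule: List[Atom] = [(0.0, 0.0, 0.0, 2.0)]

first_distances = distance_profile(reference_points, first_molecule)
second_distances = distance_profile(reference_points, second_molecule)
difference = profile_difference(first_distances, second_distances)
```

A smaller `difference` indicates that the two surfaces look more alike from the shared reference points; comparing a molecule with itself gives zero. How the reference points are chosen and how the molecules are posed relative to one another are left open in this sketch.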
Another aspect of the invention is a system for comparing two molecules to predict if they will exhibit similar biological activities, comprising an input means for providing a set of reference points having reference coordinates; computation means for computing a molecular surface for a first molecule and determining the distance from each reference point to the molecular surface to provide a first set of distances, computing a molecular surface for a second molecule, determining the distance from each reference point to the second molecular surface to provide a second set of distances, and calculating the difference between the first set and second set of distances to determine the difference between the first molecular surface and the second molecular surface; storage means for storing intermediate and final results; and output means for displaying the results.
Another aspect of the invention is a machine-readable medium having stored a set of instructions capable of causing an appropriate machine to accept a set of reference points having reference coordinates, compute a molecular surface for a first molecule, determine the distance from each reference point to the molecular surface to provide a first set of distances, compute a molecular surface for a second molecule, determine the distance from each reference point to the second molecular surface to provide a second set of distances, and calculate the difference between the first set and second set of distances to determine the difference between the first molecular surface and the second molecular surface, thereby determining the morphological similarity between two molecules.