1. Technical Field
The invention relates to the field of molecular similarity searching and, more specifically, to similarity searching in databases of three-dimensional molecular structures.
2. Description of the Related Art
In the field of drug design, where one is attempting to expand the number of lead compounds that show activity toward a particular therapeutic target, structural information about the target is often lacking or unavailable. Similarity searching in files of chemical compounds is a common way to uncover new leads in such situations. Typically, one or more compounds that are known to be active toward the target of interest are selected, and a feature scheme is defined that characterizes the molecular properties of interest. Features are derived from the selected structures and used to search against a database of structures that have been keyed under the same feature scheme.
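The search workflow described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each structure has already been keyed as a set of feature labels and that similarity is scored with the Tanimoto coefficient; neither the feature labels, the molecule names, nor the choice of coefficient comes from the text above.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient: shared features divided by all distinct features."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty feature sets are identical
    return len(a & b) / len(a | b)


def search(query: FrozenSet[str],
           database: Dict[str, FrozenSet[str]],
           top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank database entries by similarity to the query feature set."""
    scored = [(name, tanimoto(query, feats)) for name, feats in database.items()]
    # Highest similarity first; ties broken by name for a stable ordering.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Hypothetical database keyed under one feature scheme (labels are invented).
database = {
    "mol_a": frozenset({"donor", "acceptor", "aromatic"}),
    "mol_b": frozenset({"donor", "aromatic", "halogen"}),
    "mol_c": frozenset({"halogen"}),
}

# Features derived from a known active compound (also invented).
query = frozenset({"donor", "acceptor", "aromatic"})

hits = search(query, database, top_n=2)
print(hits)
```

In this toy example mol_a shares all three query features and scores 1.0, while mol_b shares two of four distinct features and scores 0.5. A real 3D feature scheme would replace the label sets with conformation-dependent descriptors.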
Feature schemes may be structural (three-dimensional) in nature or topological (derived solely from the molecular graph). Three-dimensional (3D) features characterize all or part of a particular conformation of a molecule, and thus depend on the particular conformations of the molecules stored in the database. 3D features can include a) pharmacophoric descriptors, such as distances, angles, or dihedral angular relationships between key groups (see Martin, Y. et al., A fast new approach to pharmacophore mapping and its application to dopaminergic and benzodiazepine agonists. J. Comput.-Aided Mol. Des. 1992, Vol 6, pp. 475-486); b) surface characterizations (see Perkins et al., Molecular surface-volume and property matching to superpose flexible dissimilar molecules. J. Comput.-Aided Mol. Des. 1995, Vol 9, pp. 479-490); or c) field-based properties that characterize regions of a molecule (see Willett et al., Similarity searching in files of three-dimensional chemical structures: Flexible field-based searching of molecular electrostatic potentials. J. Chem. Inf. Comput. Sci. 1996, Vol 36, pp. 900-908).
Similarity searching for compounds in 3D databases is an important part of lead generation and is commonly practiced in the drug design process (see Klebe, G., Structural Alignment of Molecules, in 3D QSAR in Drug Design. Theory, Methods, and Practice, and Kearsley, S. K. et al., An alternative method for the alignment of molecular structures: Maximizing electrostatic and steric overlap. Tetrahedron Comput. Methodol. 1990, Vol 3, pp. 615-633). It is useful in expanding the list of active compounds for a therapeutic target, finding new uses for existing compounds, getting around a competitor's patent, or gaining more insight into the nature of the therapeutic target under investigation. There are, however, several ways to conduct such searches, and no one method has proven superior or universally applicable. Novel procedures are therefore of current research interest.
A common problem in similarity searching is that of defining an appropriate distance metric. It arises when one must decide how to weight the relative importance of the descriptors when evaluating whether two features are similar. The problem is compounded by the fact that different contexts warrant different scalings of the descriptors: a distance metric appropriate in one context may not be suitable in another.
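The effect of descriptor weighting can be seen in a small sketch. It assumes a weighted Euclidean distance over two numeric descriptors; the metric, the descriptor values, and the weights are all illustrative choices and are not taken from the text above.

```python
import math
from typing import Sequence


def weighted_distance(x: Sequence[float],
                      y: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


# Hypothetical two-descriptor features for a query and two candidates.
query = (0.0, 0.0)
cand_a = (3.0, 0.0)   # differs from the query only in descriptor 1
cand_b = (0.0, 4.0)   # differs from the query only in descriptor 2

equal = (1.0, 1.0)    # both descriptors count equally
skewed = (4.0, 0.25)  # descriptor 1 emphasized, descriptor 2 de-emphasized

for label, w in (("equal", equal), ("skewed", skewed)):
    d_a = weighted_distance(query, cand_a, w)
    d_b = weighted_distance(query, cand_b, w)
    print(label, d_a, d_b)
```

With equal weights candidate A is the nearer neighbor (3.0 versus 4.0); with the skewed weights the order reverses (6.0 versus 2.0). The same pair of candidates is thus ranked differently depending solely on how the descriptors are scaled, which is the difficulty described above.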
The present invention embodies a novel procedure for 3D similarity searching that is based on the alignment of heuristic property fields. The particular novelty offered by the present invention lies in its independence from the particular property field used, and in a context-dependent scaling procedure that allows a training set to scale the descriptors.