That potent and selective enzyme inhibitors must fit snugly into their respective binding sites has been recognized since Emil Fischer first formulated his “Lock & Key” hypothesis in 1895, and the idea has since been extended to the interactions of agonists and antagonists with their target receptors. Researchers subsequently came to appreciate that complementary shapes alone are not enough to confer potency: the natures of the juxtaposed surfaces must also be more or less complementary. This added dimension led to the realization that ligands that bind tightly to a particular target often share groups of localized interaction features, namely hydrogen bond donors and acceptors as well as hydrophobic and ionized groups. Moreover, it was recognized that those groups of localized interaction features tend to be arrayed in a specific pattern in three-dimensional space, provided the ligand in question has been put into its binding conformation1. This concept was widely disseminated after 1970. The three-dimensional spatial distribution of the localized interaction features has been commonly referred to as a pharmacophore or pharmacophore pattern. Subsequently, software was designed to find pharmacophore patterns shared among groups of flexible analog molecules active against a common target2 and to use such patterns as 3-D search queries to identify additional candidate molecules for testing3 and further drug development.
Until recently, pharmacophore identification and searching have focused primarily on such feature complementarity, and to some degree the broader notion of shape complementarity underlying the lock-and-key hypothesis has been set aside. Progress made over the last several years in computer-implemented (in silico) docking, however, has served to re-emphasize the importance of minimizing surface mismatches between ligands and their binding sites.4 In the area of pharmacophore searching, this has resulted in the inclusion of exclusion volume constraints in queries derived from X-ray crystallographic data.5,6 The same approach can, in principle, be used for binding site structures based on homology modelling, though such structures are rarely defined precisely enough for this to be practical.
However, exclusion volumes suffer from some rather severe limitations in their ability to encapsulate steric information. In particular, they represent intrinsically negative boundary constraints. Worse yet, optimal solutions are represented by snug fits, where ligand atoms lie as close as possible to an exclusion surface approximated by an ensemble of exclusion volumes, rather than as far as possible away from those surfaces. Unfortunately, such sharply bounded negative constraints are ill-suited for incorporation into efficient search methods based on genetic algorithms, steepest descent, simplex or directed tweak methods.7 
In addition, exclusion volumes do not adequately reflect the plasticity of most binding sites, where some kind of partial match constraint would more appropriately reflect the observed dynamic nature of the interactions involved. The shape of the binding site “lock” usually changes to a greater or lesser extent depending on the exact nature of the ligand “key.”
Pharmacophore Multiplets:
Pharmacophore multiplet fingerprints were originally developed for assessing molecular diversity8,9 and were subsequently applied to assess molecular similarity.10 Such fingerprints capture the spatial relationship between features by decomposing the complete pharmacophoric pattern of a molecule into its constituent k-tuples of features—an ensemble of pairs, triplets and quartets, where k=2, 3 or 4, respectively. For f features, the maximum number Mmax of such constituent elements (multiplets) is given by:
$$M_{\max} = \frac{f!}{k!\,(f-k)!}.$$
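As a quick numerical check of this expression, the following minimal Python sketch evaluates the count for a hypothetical molecule with six features. The function name `max_multiplets` and the choice f = 6 are illustrative assumptions and do not come from the cited work.

```python
from math import comb  # binomial coefficient, available in Python 3.8+


def max_multiplets(f: int, k: int) -> int:
    """Maximum number of k-tuples (multiplets) that f features can form."""
    return comb(f, k)  # equals f! / (k! * (f - k)!)


# Hypothetical example: a conformation presenting six features.
for k, name in ((2, "pairs"), (3, "triplets"), (4, "quartets")):
    print(name, max_multiplets(6, k))
# pairs 15
# triplets 20
# quartets 15
```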
By characterizing each possible multiplet as a colored graph, i.e., by the feature types involved (vertex colors) and the binned inter-feature distances (edge lengths), it is possible to construct a bitstring (fingerprint) in which a particular bit is 1 if the corresponding pharmacophore multiplet was found in a molecular conformation of interest and is 0 otherwise. Comparing such fingerprints to each other then provides a quantitative measure of the pharmacophoric similarity of the molecules from which the fingerprints were generated. The number of distinct multiplets actually found in a given structure may be (and usually is) less than Mmax because of symmetry, because the feature type categorization is coarse, because the edge length binning has finite granularity, or because of some combination thereof.
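The colored-graph encoding can be illustrated for triplets with the minimal Python sketch below. The feature type labels, the distance bin edges, and the helper names (`bin_distance`, `triplet_key`, `fingerprint`) are assumptions made for illustration only. The fingerprint is held as a set of canonical keys, each of which would correspond to one bit of the bitstring.

```python
from itertools import combinations, permutations
from math import dist  # Euclidean distance, Python 3.8+

# Assumed upper bin edges in angstroms; the last bin is open-ended.
BIN_EDGES = (2.0, 4.0, 6.0, 8.0, 10.0)


def bin_distance(d):
    """Index of the first bin whose upper edge exceeds d."""
    for i, edge in enumerate(BIN_EDGES):
        if d < edge:
            return i
    return len(BIN_EDGES)


def triplet_key(triplet):
    """Canonical (vertex colors, binned edge lengths) key for three features.

    Each feature is a (type, (x, y, z)) pair. Taking the minimum over all
    vertex orderings makes the key independent of the input order.
    """
    keys = []
    for a, b, c in permutations(triplet):
        types = (a[0], b[0], c[0])
        bins = (bin_distance(dist(a[1], b[1])),
                bin_distance(dist(a[1], c[1])),
                bin_distance(dist(b[1], c[1])))
        keys.append((types, bins))
    return min(keys)


def fingerprint(features):
    """Set of distinct triplet keys found in one conformation."""
    return {triplet_key(t) for t in combinations(features, 3)}


# Hypothetical conformation: donor (D), two acceptors (A), one hydrophobe (H).
conformation = [("D", (0.0, 0.0, 0.0)), ("A", (3.0, 0.0, 0.0)),
                ("H", (0.0, 5.0, 0.0)), ("A", (3.0, 5.0, 0.0))]
fp = fingerprint(conformation)
```

Because symmetric or coarsely binned triplets collapse onto the same key, `len(fp)` can never exceed the binomial maximum for the number of features supplied.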
In early work on pharmacophoric diversity, the principal focus was on identifying unique pharmacophoric elements for preferential incorporation into combinatorial library designs.11 For such applications, it makes sense to examine multiple conformers for each molecule of interest and set a fingerprint bit if the corresponding multiplet is found in any conformation; this is mathematically equivalent to applying a Boolean OR (union) across the fingerprint obtained for each conformation. For similarity applications, better discrimination can be obtained in some cases by using count vectors rather than bitstring fingerprints; in such applications, each count can reflect the number of conformations in which each multiplet is found. Alternatively, the count can correspond to the number of occurrences in each conformation summed across all conformations considered. In general it is more efficient to generate and use compressed bitstrings (bitmaps) or compressed count vectors.
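The three ways of combining conformers described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each conformer is represented as a list of multiplet bit indices (repeats allowed when a multiplet occurs more than once), and the function names are hypothetical.

```python
from collections import Counter


def union_bitstring(conformers):
    """Boolean OR across conformers; the bitstring is held as a Python int."""
    bits = 0
    for multiplets in conformers:
        for index in multiplets:
            bits |= 1 << index
    return bits


def conformer_counts(conformers):
    """For each multiplet, the number of conformers in which it is found."""
    counts = Counter()
    for multiplets in conformers:
        counts.update(set(multiplets))  # count each conformer at most once
    return counts


def occurrence_counts(conformers):
    """For each multiplet, its occurrences summed across all conformers."""
    counts = Counter()
    for multiplets in conformers:
        counts.update(multiplets)  # repeats within a conformer are kept
    return counts


# Hypothetical molecule with two conformers; multiplet 3 occurs twice in the first.
conformers = [[0, 3, 3], [3, 5]]
print(bin(union_bitstring(conformers)))      # 0b101001
print(dict(conformer_counts(conformers)))    # {0: 1, 3: 2, 5: 1}
print(dict(occurrence_counts(conformers)))   # {0: 1, 3: 3, 5: 1}
```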
By examining a set of bitmaps derived from a collection of effective ligands which bind to a common target, it is possible to pick out bits corresponding to highly discriminating pharmacophore multiplets shared by an unexpectedly large fraction of the ligands. The bits set by these shared multiplets can then be used to construct a hypothesis bitmap that can be useful in screening virtual databases for ligands whose bitmaps are especially similar to the hypothesis.22 By construction, such hypotheses readily encompass partial match constraints, since fingerprints can be similar even if not identical. It is very unlikely that any candidate molecule/potential ligand will present all multiplets encoded in the query; it is enough that most of them are found in a potential ligand, and that the number of extraneous multiplets not be too large.
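One way such a hypothesis might be assembled and used is sketched below in Python. The selection rule (a bit is kept if it is frequent among the active ligands and rare in a background set), the threshold values, and the use of the Tanimoto coefficient as the similarity measure are all assumptions chosen for illustration; the cited work may use different criteria. Bitmaps are represented as sets of bit indices.

```python
from collections import Counter


def build_hypothesis(active_fps, background_fps,
                     min_active=0.6, max_background=0.2):
    """Bits shared by many actives but seldom set in the background set."""
    n_act, n_bg = len(active_fps), len(background_fps)
    act = Counter(bit for fp in active_fps for bit in fp)
    bg = Counter(bit for fp in background_fps for bit in fp)
    return {bit for bit, count in act.items()
            if count / n_act >= min_active
            and (bg[bit] / n_bg if n_bg else 0.0) <= max_background}


def tanimoto(a, b):
    """Shared bits divided by the bits set in either bitmap.

    Missing hypothesis bits and extraneous candidate bits both lower the
    score, so a partial match still scores well if most bits agree.
    """
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical data: three active ligands and four background molecules.
actives = [{1, 2, 3, 10}, {1, 2, 4, 10}, {1, 2, 5}]
background = [{1, 9}, {1, 8}, {7}, {6}]
hypothesis = build_hypothesis(actives, background)  # {2, 10}

# Rank hypothetical candidates by similarity to the hypothesis.
candidates = {"cand_a": {2, 10, 11}, "cand_b": {2}, "cand_c": {7, 8}}
ranked = sorted(candidates,
                key=lambda name: tanimoto(hypothesis, candidates[name]),
                reverse=True)
```

In this toy data, bit 1 is dropped despite appearing in every active because it is also common in the background, which is the sense in which the retained bits are discriminating.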
Applicants are unaware of any scientific literature that describes or anticipates the invention disclosed in this patent document. Additionally, applicants searched the USPTO database for abstracts containing the terms “molecular” and “shape”. A total of 386 hits were examined.
Silverman in U.S. Pat. No. 6,671,626 (Determination and use of three-dimensional moments of molecular property fields) and D. E. Platt & B. D. Silverman in U.S. Pat. No. 5,784,294 (System and method for comparative molecular moment analysis [CoMMA]) deal with the use of steric molecular moments to characterize molecules. These make use of every atom, hydrogen and non-hydrogen, in the molecule of interest, and characterize their distribution in space by a series of moments calculated from the full aggregate. The terms in such a characterization contain little or no local information about the structure and are not amenable to dynamic interpretation, i.e., they cannot be averaged usefully across the ensemble of possible conformations that a flexible molecule can take on. The extracted moments, like harmonics, are aggregate properties derived from the whole molecule.
The two CoMFA patents of Cramer and Wold, U.S. Pat. Nos. 5,307,287 and 5,025,388 (Comparative molecular field analysis), involve point-by-point comparisons between molecular fields calculated on a Cartesian lattice into which each molecule has been placed. Again, such comparisons are only meaningful when a single conformation is specified for each molecule and both molecules of interest are embedded in a common frame of reference. Related techniques have subsequently been described for identifying canonical conformations and orientations for generalizing such comparisons,12,13,14 but these cannot account for molecular flexibility and are not based on molecular connectivity.
In U.S. Pat. No. 6,182,016 (Molecular classification for property prediction) Liang and Edelsbrunner teach a topologically based approach that is primarily local in nature. The technology they describe involves characterizing a Voronoi partition15 of each molecule by applying a Delaunay triangulation16 to the heavy atoms in that molecule. This transforms each molecule into an assemblage of tetrahedra (mostly from quaternary carbon, nitrogen, phosphorus and sulfur), triangles (mostly tertiary carbon, nitrogen, and sulfur), and isolated edges (from other bonded heavy atom pairs). The invention's descriptor is then constructed as a list of the topological elements found in the molecule of interest and their frequencies. When greater resolution is desired, properties are attached to the topological elements by indicating the elemental types of the various atoms comprising them. Each topological element can also be characterized by the types of topological elements adjoined to it in the Delaunay triangulation and how they are adjoined; two tetrahedra, for example, can be joined corner to corner, edge to edge, or face to face.
The invention described in this patent document does take topological factors into account when deciding which heavy atoms and combinations thereof should be used to define steric features, but unlike Liang and Edelsbrunner, it does not utilize local topological elements in constructing its descriptors. Rather, the spatial relationship between the steric features is characterized in terms of all triangles (triplets) or tetrahedra (quartets) formed among them.
The Liang and Edelsbrunner patent also mentions extension to “groups” such as amino acid residues, but does not provide any systematic way to define such groups, nor does it explain how the Voronoi partition is defined in such a case.
Useful as the application of multiplets was to localized pharmacophoric interaction features, the problem remained in the prior art of how to incorporate the broader notion of shape complementarity into multiplets; that is, how to encapsulate the steric information about the whole 3-D structure of a molecule into a steric multiplet which could then be used for searching and comparison purposes. Previous attempts to incorporate steric definitions utilizing all atoms in a molecule, or all heavy atoms not assigned to a pharmacophore, did not solve the problem. The present invention solves this problem through the development of a method by which multiplets encompass steric features. The inventors of the method disclosed in this patent document have determined that implementing a useful steric multiplets methodology is critically dependent upon finding an effective way of defining the steric features. The feature definitions and their application to multiplet methodology constitute the basis of the present invention.