1. Field of the Invention
This invention relates generally to a method for performing the powerful CoMFA shape analysis methodology on certain classes of molecular structures, and, in particular, on molecular structures which may be decomposed/viewed as assemblies of discrete identifiable subunits such as those structures formed by combinatorial synthesis. In this invention, alignment of the molecular subunits for CoMFA analysis is achieved by a rule based procedure. The fields of the aligned subunits used in the CoMFA can be used to search a Virtual Library of precomputed fields for other subunits accessible in the chemical universe which have a similar shape and could be substituted as subunits in the molecular structures for which the CoMFA was derived. The likely activity for the molecules assembled using the subunits identified in the Virtual Library can be predicted using the CoMFA derived coefficients.
2. Description of Related Art
Since its introduction approximately a decade ago, Comparative Molecular Field Analysis (CoMFA) has become recognized¹ as a superior 3D-QSAR methodology. One recent reference² notes that, for the years 1989 to 2000, over 5,000 publications are indexed using the keyword “CoMFA.” A wide variety of problems in medicinal chemistry have been the subject of CoMFA modelling. CoMFA, like earlier 3D-QSAR approaches, represents a relevant measured molecular parameter for each molecule (typically biological affinity when used in medicinal chemistry) as a linear combination of descriptors which reflect the three dimensional molecular shape. Typically, several molecules in a series possessing similar activity, but differing in molecular shape, are analyzed using CoMFA to determine those shape features associated with increased or decreased activity. Thus, CoMFA correlates the shapes of molecules with their (biological) activities. A full description of the CoMFA methodology is provided in U.S. Pat. No. 5,025,388 and U.S. Pat. No. 5,307,287.
In CoMFA, a quantitative description of the shape of a molecule is derived from the steric and electrostatic interaction energies between a test probe and each of the atoms comprising the molecule. Each molecule in the activity series is placed by the computer implemented methodology in a three dimensional lattice and the interaction energies determined as the probe is placed at all intersections of the lattice. The resulting interaction energies for each grid position are entered into column positions in a row of a data table associated with the measured parameter (activity) of each molecule. This procedure is repeated for all molecules in a series and is schematically illustrated in FIG. 1 of U.S. Pat. Nos. 5,025,388 and 5,307,287 which Figure is included in the present patent document as FIG. 1. After the data table is completed, Partial Least Squares (PLS) analysis using a cyclic cross-validation procedure is utilized to extract a set of coefficients for each column position (lattice point) that best reflects that position's contribution to the measured activity. The PLS procedure is schematically illustrated in FIG. 2 of U.S. Pat. Nos. 5,025,388 and 5,307,287 which Figure is included in the present patent document as FIG. 2.
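The construction of one row of the data table described above can be sketched as follows. This is a minimal illustration only, not the implementation of the cited patents: the function names, the atom tuple layout, the Lennard-Jones 6-12 and Coulomb functional forms, the unit positive probe charge, the minimum distance, and the energy cutoff are all assumptions chosen for the example.

```python
import math

def lattice_points(lo, hi, step):
    """Return every (x, y, z) intersection of a cubic lattice spanning [lo, hi]."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + i * step for i in range(n)]
    return [(x, y, z) for x in axis for y in axis for z in axis]

def field_row(atoms, points, min_dist=0.5, cutoff=30.0):
    """Build one data table row for one aligned molecule.

    atoms: list of (x, y, z, partial_charge, radius, well_depth) tuples.
           This layout is hypothetical and chosen only for this sketch.
    Returns the steric energies for all lattice points followed by the
    electrostatic energies for the same points.
    """
    steric, electro = [], []
    for p in points:
        s = 0.0
        e = 0.0
        for (x, y, z, q, r0, eps) in atoms:
            # Clamp the distance so a probe sitting on an atom does not divide by zero.
            d = max(math.dist(p, (x, y, z)), min_dist)
            ratio = r0 / d
            s += eps * (ratio ** 12 - 2.0 * ratio ** 6)  # assumed 6-12 steric term
            e += 332.0 * q / d  # assumed Coulomb term, probe charge +1
        steric.append(min(s, cutoff))                   # cap very large repulsions
        electro.append(max(min(e, cutoff), -cutoff))    # cap in both directions
    return steric + electro
```

Repeating `field_row` for every molecule in the series, with the same lattice, would yield the rows of the data table; the measured activity of each molecule would be stored alongside its row before the PLS step.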
An important consequence of the CoMFA method is that the likely activity of a molecule not included in the CoMFA model can be predicted using the column coefficients derived from the CoMFA analysis. The molecule of interest is aligned and positioned in the lattice, interaction energies are determined, and those interaction energies are placed in their respective columns. The predicted activity is then calculated by multiplying each interaction energy by the coefficients derived from the original CoMFA model data table:
$$V_{\mathrm{PREDICTED}} = b + A_{001}S_1(001) + A_{002}S_1(002) + \cdots + A_{N}S_1(N) + a_{001}E_1(001) + a_{002}E_1(002) + \cdots + a_{N}E_1(N)$$
where $V_{\mathrm{PREDICTED}}$ is the predicted activity for the proposed molecule; $b$ is the intercept for the CoMFA model; $A$ and $a$ are the coefficients of the steric and electrostatic terms which reflect the relative contribution of each spatial location, the subscripts indicating both different coefficient values and the lattice positions with which the values are associated; $S_x(N)$ and $E_x(N)$ are the steric and electrostatic interaction energies calculated at lattice position $N$ (where $N$ ranges from 1 to the maximum number of lattice intersection points) determined for the proposed molecule.
It is important to note that CoMFA does not tell a chemist/user what alterations to the molecular structure to test. CoMFA only indicates those volumes around the known structures which are associated with increased or decreased activity. The chemist/user decides what changes to the molecular structure to try. The results of the CoMFA analysis (column coefficient values) can then be used to predict the likely activity for the shape of the molecule specified by the chemist/user.
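The prediction equation above is a linear combination and can be sketched in a few lines. The function and parameter names are hypothetical; the sketch assumes the coefficients and the interaction energies are supplied as lists ordered by lattice position.

```python
def predict_activity(intercept, steric_coefs, electro_coefs, steric, electro):
    """Evaluate the CoMFA prediction: intercept plus the sum over lattice
    positions of coefficient times interaction energy, for both fields."""
    if not (len(steric_coefs) == len(steric) == len(electro_coefs) == len(electro)):
        raise ValueError("coefficients and energies must cover the same lattice points")
    total = intercept
    for coef, energy in zip(steric_coefs, steric):
        total += coef * energy      # steric contribution at one lattice position
    for coef, energy in zip(electro_coefs, electro):
        total += coef * energy      # electrostatic contribution at one lattice position
    return total
```

For example, `predict_activity(1.0, [0.5, -0.25], [2.0, 0.0], [2.0, 4.0], [0.5, 3.0])` evaluates 1.0 + 1.0 - 1.0 + 1.0 + 0.0. The energies passed in must come from a molecule aligned in the same lattice as the model, since each coefficient is tied to a specific lattice position.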
As noted in the extensive discussion in the cited patents, the alignment of a series of molecules in the three dimensional lattice is critical to obtaining good results. Two aspects of the alignment are crucial. First, recognizing that even for the same molecule a slight shift in its position in the lattice will produce different interaction energies at different lattice locations, it is important that similar parts of similar molecules be located at identical locations so as not to introduce meaningless differences. Second, it is important that, to the maximum extent possible, the major space occupying features of each molecule be aligned with each other. In this manner the CoMFA methodology can distinguish the three dimensional features which are relevant to the observed activity. An extensive literature has grown up describing different alignment techniques to employ with the CoMFA methodology.
As combinatorial synthesis techniques were developed over the past few years to generate libraries of compounds which could be screened against different (primarily biological) targets, a similar alignment problem arose for those trying to design the libraries. Depending on the requirements, a library of similar compounds might be desired having similar activities in a specific assay or a library of dissimilar compounds might be desired which could be used to look for compounds which might have an activity in a chosen assay. The problem was how to choose the molecules before synthesis so that a great deal of time and money would not be wasted on synthesizing and assaying compounds which did not have a high probability of providing useful information. Over the years a variety of molecular structural metrics had been devised with which to characterize molecular structures. However, in the absence of any methodology which would indicate which, if any, of the metrics behaved as desired, use of the metrics to design libraries was not much better than a random selection process.
In U.S. Pat. No. 6,185,506 a method of validating molecular metrics, the Patterson Plot methodology, is taught. The Patterson Plot methodology is based on the similarity principle which requires that any valid descriptor must have a neighborhood property; that is, the descriptor must meet the similarity principle's constraint that it measure the chemical universe in such a way that similar structures (as defined by the descriptor) have substantially similar properties (activities). This can also be stated to require that, within some radius in descriptor space of any given molecule possessing some property, there should be a high probability that other molecules found within that radius will also have the same property. Only descriptors which have the neighborhood property are “valid.” Here “validity” is based on a high probability, not a certainty, that compounds similar in descriptor space will have similar activity. The Patterson Plot validation methodology can be applied to any molecular structural descriptor. As a consequence of the metric validation methodology, a “neighborhood radius” for each type of descriptor is defined.
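The neighborhood property can be illustrated with a short sketch. This is not the Patterson Plot procedure of U.S. Pat. No. 6,185,506; it only shows, under assumed names and an assumed Euclidean distance in descriptor space, what it means for molecules within a radius to tend to share activity.

```python
import math

def neighbors_within(query, descriptors, radius):
    """Indices of all descriptors lying within `radius` of `query`."""
    return [i for i, d in enumerate(descriptors) if math.dist(query, d) <= radius]

def neighborhood_agreement(descriptors, activities, radius, tolerance):
    """Fraction of molecule pairs closer than `radius` in descriptor space whose
    activities differ by no more than `tolerance`. Returns None if no pair is
    close enough to be counted."""
    close = 0
    agree = 0
    n = len(descriptors)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(descriptors[i], descriptors[j]) <= radius:
                close += 1
                if abs(activities[i] - activities[j]) <= tolerance:
                    agree += 1
    return agree / close if close else None
```

A descriptor with the neighborhood property would be expected to give an agreement fraction near 1 for radii up to its neighborhood radius; the tolerance and the choice of distance are illustrative parameters, not values taken from the cited patent.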
In combinatorial syntheses, two or more reactants are combined to yield a product molecule. In the simplest case, reactant A and reactant B are joined by a common bond as in the molecule: A-B as shown in FIG. 3(a). In a slightly more complex case as shown in FIG. 3(e), reactant R1 and reactant R2 are joined by separate bonds to a common core or scaffold structure: R1-CORE-R2. In more complex cases as shown in FIG. 3(g), three or more reactants R1, R2, and R3 may be individually bonded to a common core. For library design, a metric was needed which would validly characterize combinatorially derived molecules. A further problem was how to define a metric that could take into account the fact that reactants may assume many conformations both before and after chemical combination. The solution was to define a rule based procedure for aligning the reactants which was uniformly applied to every reactant. [As will be more fully described below, it is the fragments derived from reactants which are aligned by this procedure.] The particular rule based alignment procedure taught in U.S. Pat. No. 6,185,506 is referred to as the “topomeric” alignment. The procedure specifies a unique orientation in space as well as a similar conformation for each reactant. While the topomeric alignment of any given reactant may or may not resemble the conformation the reactant might naturally assume when binding to a receptor as part of a ligand, the topomerically generated conformation turned out to be a valid alignment approach.
In particular, when a metric is defined by the steric interaction fields around each topomerically aligned reactant fragment in a three dimensional lattice, the resulting metric was shown to be valid by the Patterson Plot methodology by application across a wide range of biological activities. Use of a metric consisting of the steric fields about topomerically aligned fragments enabled the computer implemented virtual design of molecular libraries having either similar structures or diverse structures. Use of this metric enabled an estimation of the similarity of combinatorially assembled molecules. Molecules with similar structures within the metric neighborhood radius should have similar biological properties. Molecules with structures outside the metric neighborhood radius should not have highly similar properties.
Initially, the metric consisting of the steric fields about topomerically aligned fragments was used to design libraries involving few starting reactants and cores. However, it was soon discovered that searches through vast chemical spaces of molecules which could be combinatorially assembled could be achieved. The construction and searching of such a vast library (referred to as the “Virtual Library”) is taught in U.S. Pat. No. 6,240,374. Using metrics validated by the Patterson Plot methodology, it is possible to precompute the metric properties of the various component parts of molecules which could be combined in a combinatorial synthesis. A combination of the metric properties of the component parts yields a valid estimation of the properties of the resulting whole molecule. Potential combinatorially derived molecules can then be selected for similarity or dissimilarity before they are synthesized. At present the Virtual Library employed by the inventors contains precomputed metric data on sufficient component parts to characterize tens of trillions of possible combinatorially derivable molecules. The structure of the Virtual Library permits any characterizing data related to each component part to be associated with that part and searched for independently of any other data. For instance, in addition to the characterizing metric values, information on suppliers, cost, possible routes of synthesis for the molecules incorporating the component part, properties affecting bio-availability, etc. may all be associated with the component part in the Virtual Library by virtue of the manner of its construction.
One very important aspect of the characterization of the component parts of the Virtual Library with the metric consisting of the steric fields about topomerically aligned fragments is the ability to search through the vast chemical space of the Virtual Library to identify possible molecules which have a high probability of having the same activities as a molecule of interest. In addition, since the overall shape similarity (similarity in steric fields) is searched, it is possible that molecules arising from different chemistries may well be found to possess sufficiently similar shapes to display activity at the same target. In practice, searches of the Virtual Library for similarly shaped component parts, and molecules derived therefrom, amongst the trillions of molecules possible can be accomplished in relatively short time. Depending on where the cut-off level for identifying similarly shaped fragments is set, searches of a chemical space of billions of possibilities may take only a few hours.
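A similarity search over precomputed fields with a cut-off, as described above, can be sketched as follows. The dictionary layout of the library, the fragment names, and the use of a root sum of squared field differences as the distance are assumptions made for the example; the actual Virtual Library structure is taught in U.S. Pat. No. 6,240,374.

```python
import math

def field_distance(field_a, field_b):
    """Root sum of squared differences between two steric fields sampled
    on the same lattice (assumed distance measure for this sketch)."""
    if len(field_a) != len(field_b):
        raise ValueError("fields must be sampled on the same lattice")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(field_a, field_b)))

def search_library(query_field, library, cutoff):
    """Return (name, distance) for every library fragment whose precomputed
    field lies within `cutoff` of the query, nearest first.

    library: dict mapping a fragment name to its precomputed field
             (hypothetical layout).
    """
    hits = []
    for name, field in library.items():
        d = field_distance(query_field, field)
        if d <= cutoff:
            hits.append((d, name))
    hits.sort()  # nearest fragments first
    return [(name, d) for d, name in hits]
```

Because the fields are computed once and stored, each comparison reduces to arithmetic on two stored vectors. Raising the cut-off returns more, and less similar, fragments at the cost of a longer hit list to evaluate.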
The use of the metric consisting of the steric fields about topomerically aligned fragments has proven to be very fruitful in the design of combinatorial libraries and in searching a vast combinatorially accessible molecular structural universe. However, due to the inherently artificial structures generated by the rule based topomeric alignment procedure, further use outside the combinatorial design field has not been previously implemented. In particular, nothing in the prior art of CoMFA alignments suggests that molecular shapes generated artificially by such a rule based procedure would be useful or valid in generating a CoMFA model.