The present invention relates to mathematical descriptors of molecules and, more particularly, relates to the determination and use of three-dimensional moments of molecular property fields.
The three-dimensional characterization of molecular physical and chemical properties has been a subject of interest because numerous procedures attempt to correlate this characterization with molecular biological activity. The expectation is that three-dimensional molecular features should be central to the delivery and binding of a drug molecule to its targeted receptor site. Molecules with similar three-dimensional features should therefore interact with a receptor in a similar way. Thus, if a first drug molecule binds well to a targeted receptor site and a second drug molecule has particular three-dimensional features similar to those of the first drug molecule, the second drug molecule is also expected to bind well to that receptor site.
A variety of three-dimensional molecular analysis procedures are in use that attempt to compare two molecules for molecular similarity. Molecular analysis procedures that involve descriptions of molecular properties, which in turn can relate to the biological activity of the molecules, are often called Quantitative Structure-Activity Relationships (QSAR).
Some of these three-dimensional molecular analysis procedures involve the detailed enumeration of molecular properties over a set of grid points. To be able to properly compare two molecules, these procedures subsequently require an alignment or superposition step. This step attempts to align two molecules so that features of the molecules may be compared. This alignment step is required if there is a detailed characterization of a three-dimensional molecular property field, whether the property field is steric, electrostatic, or hydrophobic.
One problem with these procedures, or with any procedure that requires an alignment, is that the alignment may not be correct, which can lead to an incorrect analysis. Additionally, for complex structures it is very hard to determine where and how to align two structures. Finally, alignment can be time-consuming and numerically intensive, because three-dimensional rotations and translations must be performed in order to align two molecules, and these translations and rotations take processing power and time. Moreover, after each translation or rotation, the similarity between the two molecules must be determined again.
Another procedure for comparing two molecules is to create a similarity matrix. While similarity matrices significantly reduce the number of descriptors compared with grid-based procedures, they still require a molecular alignment step.
There have, however, been a number of characterizations, dependent upon three-dimensional structure, that capture molecular features in ways not requiring an alignment or superposition step for the assignment of molecular similarity. These alignment-free procedures also generate a relatively small set of three-dimensional descriptors. The descriptors are essentially mathematical terms, derived from a molecule's three-dimensional structure, that allow comparisons between molecules. Because there are relatively few of them, these descriptors make statistical analysis easier.
Other procedures use molecular moments, which do not require alignment. Molecular moments descriptive of some molecular property provide a small set of alignment-free descriptors that can be utilized in QSAR. For instance, one technique examines moments of the shape and charge distributions of neutrally charged molecules. The molecular charge distribution is responsible for the electrostatic field external to the molecule. In this technique, a particular moment representation of the charge distribution was developed that utilized a special feature of this electrostatic field. This led to the definition of the "center-of-dipole," about which quadrupole descriptors were obtained. The procedure required that the zeroth-order moment, or net molecular charge, be identically equal to zero, a condition that neutrally charged molecules satisfy by definition. This method is explained in greater detail in U.S. Pat. No. 5,784,294, "System and Method for Comparative Molecular Moment Analysis (COMMA)," the disclosure of which is incorporated herein by reference.
While the COMMA method tremendously improved drug discovery techniques, there are some areas in which the method could be improved. First, the method can be used only on neutral molecules, so the zeroth-order moment is always zero; if the zeroth-order moment is not zero, the technique cannot be used at all. This makes the zeroth-order moment of little use for comparison purposes. Second, the first-order moment is invariant: it has the same value regardless of the reference point about which it is calculated. Additionally, the technique does not generalize well to third-order and higher-order moments.
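The origin invariance of the first-order moment for a neutral distribution can be checked numerically. The following Python sketch is illustrative only and is not the COMMA procedure: it assumes the charge distribution is represented by hypothetical point charges at atomic positions and computes the first-order moment about two different origins. Shifting the origin by a vector d changes the first-order moment by -Q d, where Q is the zeroth-order moment (net charge), so the first-order moment is unchanged only when Q is zero.

```python
import numpy as np

def first_moment(charges, coords, origin):
    """First-order moment sum_i q_i (r_i - origin) of point charges."""
    q = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float) - np.asarray(origin, dtype=float)
    return (q[:, None] * r).sum(axis=0)

# Hypothetical atomic positions and partial charges (illustration only)
coords = np.array([[0.0, 0.0, 0.0],
                   [1.2, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
neutral = np.array([-0.8, 0.4, 0.4])   # zeroth-order moment (net charge) is 0
charged = np.array([-0.8, 0.4, 1.4])   # zeroth-order moment is +1

origin_a = np.zeros(3)
origin_b = np.array([2.0, -1.0, 0.5])

for label, q in (("neutral", neutral), ("charged", charged)):
    m_a = first_moment(q, coords, origin_a)
    m_b = first_moment(q, coords, origin_b)
    # The change equals -Q * (origin_b - origin_a), where Q = q.sum()
    print(label, "net charge:", q.sum(), "change in first moment:", m_b - m_a)
```

For the neutral set the printed change is zero to rounding error; for the charged set it is proportional to the origin shift.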
If the zeroth-order moment of the property field does not vanish, the nature of the expansion changes. In this case, neither the first-order nor the second-order moments are invariant with respect to the choice of the origin of the expansion; that is, the moments change with the choice of origin. For such an expansion, the first-order or linear moment is generally non-vanishing. Linear moments of the hydrophobic property fields of alpha-helical secondary structures have provided a measure of the amphiphilicity of such helices. This measure has been used to identify the helical regions of proteins that bind to the surface of biological membranes. For more information on this, see Eisenberg et al., "The helical hydrophobic moment: a measure of the amphiphilicity of a helix," Nature 1982, 299, 371-374, the disclosure of which is incorporated herein by reference.
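As an illustration of a linear moment whose zeroth-order term does not vanish, the helical hydrophobic moment treats each residue's hydrophobicity as a vector pointing at the residue's angular position on a helical wheel, with successive residues about 100 degrees apart in an alpha helix. The Python sketch below computes the length of that vector sum. The hydrophobicity values are made-up numbers rather than a published scale, and indexing residues from zero instead of one only rotates the summed vector without changing its length.

```python
import math

def helical_hydrophobic_moment(hydrophobicities, delta_deg=100.0):
    """Length of the vector sum of per-residue hydrophobicities, where residue n
    points in direction n * delta_deg on the helical wheel."""
    delta = math.radians(delta_deg)
    x = sum(h * math.cos(n * delta) for n, h in enumerate(hydrophobicities))
    y = sum(h * math.sin(n * delta) for n, h in enumerate(hydrophobicities))
    return math.hypot(x, y)

# Illustrative (made-up) hydrophobicities for an 18-residue helix.
# 18 residues * 100 degrees = 5 full turns, so a uniform helix cancels out.
uniform = [1.0] * 18
# Hydrophobic residues on one face of the wheel, polar residues on the other
amphipathic = [1.0 if math.cos(math.radians(100.0 * n)) > 0 else -0.5
               for n in range(18)]

print(helical_hydrophobic_moment(uniform))      # essentially zero
print(helical_hydrophobic_moment(amphipathic))  # clearly non-zero
```

Both inputs have a non-zero sum of hydrophobicities, yet only the sequence with one hydrophobic face has a large linear moment, which is why the moment measures amphiphilicity rather than overall hydrophobicity.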
The expansion about the centroid of the molecule yields second-order moments that can be written as the elements of a Weighted Holistic Invariant Molecule (WHIM) covariance matrix. For more information on WHIM, see Todeschini et al., "New 3D Molecular Descriptors: The Whim theory and QSAR Applications," 3D QSAR in Drug Design, 1998, Vol. 2, Part 3, 355; and Gancia et al., "Global 3D-QSAR methods: MS-WHIM and autocorrelation," J. Comput.-Aided Mol. Des. 2000, 14, 293-306, the disclosures of which are incorporated herein by reference. The centroid is generally calculated by determining the spatial locations of the atoms in a molecule and then determining the center of mass of the molecule, with each atom assigned a mass of one. The WHIM covariance matrix yields a number of descriptors that can be used to compare molecules. The WHIM descriptors change if the centroid changes. Thus, the WHIM descriptors involve an explicit relationship between the property field and the underlying structure of the molecule. While the WHIM descriptors are beneficial, they are written in a way that has no molecular shape frame of reference; the only reference is the centroid.
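A minimal Python sketch of such second-order moments follows, assuming hypothetical atom-centered property weights and taking the moments about the unweighted centroid as described above. Dividing by the sum of the weights is an assumed normalization, and the full WHIM procedure derives further descriptors that are not reproduced here. The example also shows the dependence on the centroid: adding an atom with zero weight leaves the property field unchanged but moves the centroid, and therefore changes the matrix.

```python
import numpy as np

def centroid(coords):
    """Geometric centroid: every atom is given unit mass."""
    return np.asarray(coords, dtype=float).mean(axis=0)

def weighted_covariance(coords, weights):
    """Property-weighted second-order moments about the unweighted centroid,
    divided by the sum of the weights (assumed normalization)."""
    r = np.asarray(coords, dtype=float) - centroid(coords)
    w = np.asarray(weights, dtype=float)
    return np.einsum("i,ij,ik->jk", w, r, r) / w.sum()

coords = np.array([[0.0, 0.0, 0.0],
                   [1.5, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
weights = np.array([1.0, 2.0, 0.5])   # hypothetical atomic property values

s = weighted_covariance(coords, weights)
print(np.linalg.eigvalsh(s)[::-1])    # principal second moments, largest first

# Add an atom whose property value is zero: the property field is unchanged,
# but the centroid moves, so the matrix (and its descriptors) change.
coords_extra = np.vstack([coords, [4.0, 0.0, 0.0]])
weights_extra = np.append(weights, 0.0)
print(np.allclose(s, weighted_covariance(coords_extra, weights_extra)))  # False
```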
Thus, what is needed is a way of overcoming the problems of alignment, a zero zeroth-order moment, an invariant first-order moment, and an explicit relationship between the property field and the underlying structure of the molecule.
Generally, the present invention provides and uses a set of descriptors of three-dimensional molecular property fields. A portion of the descriptors are calculated in such a way as to separate the property field from the underlying structure of the molecule. These descriptors are calculated with reference to a property field center. Thus, the descriptors need to be recalculated only if the property field changes, such as when an atom having a non-zero property value is moved.
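The formula for the property field center is not given here. One plausible choice, assumed purely for illustration, is the property-weighted mean of the atomic positions, which is defined only when the zeroth-order moment is non-zero. Under that assumption, the second-order moments taken about the property field center do not change when an atom with a zero property value is moved, nor when the whole molecule is translated, as the following Python sketch shows. The function names and property values are hypothetical.

```python
import numpy as np

def property_center(coords, props):
    """Hypothetical property field center: property-weighted mean position."""
    p = np.asarray(props, dtype=float)
    total = p.sum()                      # zeroth-order moment
    if np.isclose(total, 0.0):
        raise ValueError("center undefined when the zeroth-order moment is zero")
    return np.einsum("i,ij->j", p, np.asarray(coords, dtype=float)) / total

def second_moments_about_center(coords, props):
    """Second-order moment tensor of the property field about its own center."""
    p = np.asarray(props, dtype=float)
    r = np.asarray(coords, dtype=float) - property_center(coords, p)
    return np.einsum("i,ij,ik->jk", p, r, r)

coords = np.array([[0.0, 0.0, 0.0],
                   [1.4, 0.0, 0.0],
                   [0.0, 1.1, 0.0],
                   [0.7, 0.7, 0.9]])
props = np.array([0.6, -0.2, 0.9, 0.0])   # last atom carries no property

q = second_moments_about_center(coords, props)

moved = coords.copy()
moved[3] = [3.0, -2.0, 1.5]                     # move only the zero-property atom
shifted = coords + np.array([5.0, 1.0, -2.0])   # translate the whole molecule

print(np.allclose(q, second_moments_about_center(moved, props)))    # True
print(np.allclose(q, second_moments_about_center(shifted, props)))  # True
```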
Additionally, a portion of the descriptors do relate to the underlying molecular structure, but these descriptors contain information from more than one reference point. In particular, a displacement is determined between a property field center and the centroid of a molecule. This descriptor contains information from two reference points. Furthermore, components of a property field are mapped onto a principal geometric frame, which essentially references the property field to the molecular shape. These descriptors thus contain information relating to the geometric frame of the molecule.
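The following Python sketch illustrates these two kinds of structure-referenced descriptors under stated assumptions: the property field center is taken to be the property-weighted mean of atomic positions, and the principal geometric frame is taken to be the eigenvectors of the unweighted coordinate covariance about the centroid. Neither definition is confirmed by this passage, and all names are hypothetical. Because eigenvector signs are arbitrary, the sketch reports only sign-independent quantities: the absolute components of the displacement along each principal axis, and the diagonal of the property second-moment tensor expressed in that frame. These quantities are unchanged by a rigid rotation and translation of the molecule, provided the geometric eigenvalues are distinct.

```python
import numpy as np

def property_center(coords, props):
    """Hypothetical property field center: property-weighted mean position."""
    return np.einsum("i,ij->j", props, coords) / props.sum()

def principal_geometric_frame(coords):
    """Principal axes of the unweighted atomic positions about the centroid."""
    r = coords - coords.mean(axis=0)
    evals, evecs = np.linalg.eigh(r.T @ r / len(coords))
    return evecs[:, np.argsort(evals)[::-1]]   # columns: largest extent first

def frame_descriptors(coords, props):
    coords = np.asarray(coords, dtype=float)
    props = np.asarray(props, dtype=float)
    axes = principal_geometric_frame(coords)
    center = property_center(coords, props)
    # Displacement between the property field center and the centroid
    displacement = center - coords.mean(axis=0)
    r = coords - center
    q = np.einsum("i,ij,ik->jk", props, r, r)   # property second moments
    q_frame = axes.T @ q @ axes                 # expressed in the shape frame
    # abs() and diag() remove dependence on the arbitrary eigenvector signs
    return np.abs(axes.T @ displacement), np.diag(q_frame)

coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.5], [1.0, 0.8, 0.3]])
props = np.array([0.5, 1.2, -0.3, 0.8, 0.4])   # hypothetical values

# Rigid rotation (rx applied first, then rz) followed by a translation
a, b = 0.7, 0.4
rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a), np.cos(a), 0.0],
               [0.0, 0.0, 1.0]])
rx = np.array([[1.0, 0.0, 0.0],
               [0.0, np.cos(b), -np.sin(b)],
               [0.0, np.sin(b), np.cos(b)]])
moved = coords @ (rz @ rx).T + np.array([3.0, -1.0, 2.0])

d0, q0 = frame_descriptors(coords, props)
d1, q1 = frame_descriptors(moved, props)
print(np.allclose(d0, d1), np.allclose(q0, q1))   # True True
```

Reporting only sign-independent quantities is one simple way to handle eigenvector sign ambiguity; a fixed sign convention for the axes would be an alternative that also preserves the signed displacement components.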
A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.