The number of known chemical compounds is vast, and the number is increasing constantly because methods for isolating and synthesizing molecules continue to improve. For instance, chemists are now able to employ the techniques of combinatorial chemistry to synthesize thousands of different chemical compounds at one time using a mixture of only a few interchangeable chemical building blocks. Furthermore, chemists are now able to use combinatorial computer models to generate large numbers of chemical structures that may in theory be synthesized.
While there are many chemical compounds, only a relative few of those compounds may exhibit a particular desirable property, such as pharmaceutical activity. Random testing of known compounds to find those that show pharmaceutical activity is very expensive and time-consuming. Similarly, there is a need to screen compounds for toxicity, so that rational decisions can be made regarding the use and regulation of compounds that have toxic potential. At present, only a fraction of known compounds have been thoroughly tested for their toxicological and potential therapeutic properties. To address this problem, scientists have developed methods which attempt to predict which compounds are likely to exhibit a particular property.
Methods for predicting the properties of chemical compounds are generally based upon the related observations that the structure of a compound is related to its biological, chemical, and physical properties, and that compounds of similar structure exhibit similar properties. These observations have been used to search for new compounds exhibiting a particular property. For instance, a benzene ring is present in both acetaminophen and salicylamide, both of which are analgesics. Although incorporating a benzene ring into a new molecule increases the likelihood that it too will exhibit analgesic activity, this deduction only narrows the compounds to be tested. This approach is still basically one of trial and error, because many compounds with a benzene ring are not analgesics. Moreover, analgesics without a benzene ring will be missed in the search.
Quantitative structure-property relationships and quantitative structure-activity relationships (collectively QSAR) are attempts to quantify the observed relationships between the structure of chemical compounds and the extent to which those compounds exhibit certain properties. For instance, a QSAR might attempt to quantify how the analgesic activity of known analgesics that contain a benzene ring (such as acetaminophen and salicylamide) depends upon the number and identity of substituents on their benzene rings. Once established, such a QSAR could be used to predict the analgesic activity of other compounds that contain benzene rings, and identify those compounds that warrant further investigation as analgesics based on their predicted analgesic activity.
In the terminology associated with the QSAR method, the property for which a prediction is sought, such as analgesic activity, is termed the “endpoint.” In general the endpoint may be any measurable biological, chemical or physical property.
To establish a QSAR, endpoint values are obtained for a set of compounds and a correlation is then sought between the endpoint values and some measure(s) of structure available for each of the compounds. The measures used to describe or reflect the structure of the compounds for which a correlation is sought are termed structure descriptors. Structure descriptors may be defined directly with reference to the known structures of the compounds, or may indirectly reflect the structure through a property of the molecule that is sensitive to changes in structure. For example, an investigator might try to correlate the analgesic activity of compounds that contain a benzene ring with either a direct measure of the structure, such as the number of hydroxyl groups attached to the benzene ring, or an indirect measure of the structure of the compounds, such as water solubility. If the direct measure is chosen, the attempted correlation could only include those compounds with hydroxyl groups on the benzene ring, while the indirect measure is more general and could be used to include all benzene-ring-containing compounds in the attempted correlation.
The endpoint data and the structure descriptor(s) for the set of compounds that are chosen to establish a QSAR are termed the training set. In general, the reliability of a QSAR increases as the number of compounds in the training set increases. If possible, the training set desirably includes compounds that exhibit a wide range of endpoint values and possess diverse structures.
If the endpoint and structure descriptor(s) are sufficiently correlated, a mathematical or graphical representation of the relationship may be obtained. For example, the growth inhibition of certain Gram-negative bacteria by aromatic amines is correlated in a linear fashion with the logarithm of the octanol-water partition coefficient, and the correlation indicates that as the aromatic amines become more hydrophobic they are more likely to inhibit the growth of these bacteria (Hansch and Leo, Exploring QSAR: Fundamentals and Applications in Chemistry and Biology, American Chemical Society, 1995, p. 416).
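A single-descriptor linear QSAR of the kind just described can be sketched as an ordinary least-squares fit. The following is a minimal illustration only: the function names `fit_line` and `predict` are hypothetical, and the training values are invented placeholders, not data from Hansch and Leo or any other source.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, descriptor: float) -> float:
    """Predict an endpoint value from one structure descriptor value."""
    return slope * descriptor + intercept


# Hypothetical training set: descriptor = log of a partition coefficient,
# endpoint = some log-scaled measure of growth inhibition (placeholder numbers).
train_log_p = [0.9, 1.4, 2.0, 2.6, 3.1]
train_endpoint = [1.1, 1.5, 2.1, 2.5, 3.0]

slope, intercept = fit_line(train_log_p, train_endpoint)
print(f"endpoint = {slope:.3f} * log P + {intercept:.3f}")
print(f"predicted endpoint at log P = 1.8: {predict(slope, intercept, 1.8):.3f}")
```

A positive fitted slope corresponds to the qualitative statement that the endpoint increases with hydrophobicity; the fit itself says nothing about whether the relationship holds outside the descriptor range of the training set.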
The QSAR representation may be used to predict endpoint values for other compounds from their structure descriptor(s), and the reliability of a QSAR may be tested using a validation set of data. A validation set includes structural and endpoint data for compounds that were not part of the training set. The validation set desirably exhibits a diversity of endpoint values and structures that is commensurate with the training set. The QSAR is tested by how reliably it predicts endpoint data for the validation set compounds from the validation set structural data. For example, α-naphthylamine, an aromatic amine, is about two times less toxic than predicted by its octanol-water partition coefficient, indicating that the simple linear relationship described above is not always reliable and that some specific interaction is responsible for its behavior.
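Testing a QSAR against a validation set amounts to comparing predicted and observed endpoint values for compounds that were not used in the fit. The sketch below assumes a previously fitted linear model; the coefficients, the validation values, and the function names `rmse_and_residuals` and `flag_outliers` are all hypothetical. It also assumes the endpoint is expressed on a log10 scale, so that a factor-of-two deviation corresponds to about 0.30 log units.

```python
import math
from typing import List, Sequence, Tuple


def rmse_and_residuals(
    slope: float,
    intercept: float,
    descriptors: Sequence[float],
    observed: Sequence[float],
) -> Tuple[float, List[float]]:
    """Return the root-mean-square error and the per-compound residuals
    (observed minus predicted) of a linear QSAR on a validation set."""
    if len(descriptors) != len(observed) or not descriptors:
        raise ValueError("descriptors and observed must be non-empty and paired")
    residuals = [obs - (slope * d + intercept) for d, obs in zip(descriptors, observed)]
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return rmse, residuals


def flag_outliers(residuals: Sequence[float], threshold: float) -> List[int]:
    """Indices of compounds whose absolute residual exceeds the threshold."""
    return [i for i, r in enumerate(residuals) if abs(r) > threshold]


# Hypothetical fitted model and validation set (placeholder numbers).
model_slope, model_intercept = 0.85, 0.35
val_descriptors = [1.2, 2.2, 2.8]
val_observed = [1.40, 2.20, 2.40]

rmse, residuals = rmse_and_residuals(
    model_slope, model_intercept, val_descriptors, val_observed
)
# On a log10 endpoint scale, a factor of two is log10(2), about 0.30 units.
flagged = flag_outliers(residuals, threshold=math.log10(2.0))
print(f"RMSE = {rmse:.3f}; compounds deviating by more than 2x: {flagged}")
```

A flagged compound, like the α-naphthylamine example, suggests that the single descriptor does not capture some interaction specific to that compound.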
The Hammett equation is another example of a simple QSAR that relates a single structure descriptor (in this case, derived directly from the structure of the compounds) to an endpoint (in this case, the equilibrium or rate constant for a particular type of reaction). The Hammett equation is a linear equation of the form:
ln Kx = ρσ + ln KH
The electronic parameter (σ) is a measure of the ability of a group of atoms (typically a substituent at a particular structural position) to donate (negative value) or withdraw (positive value) electron density to or from the reaction center of the molecule to which it is attached. The slope parameter (ρ) is a measure of the sensitivity of the reaction to the withdrawal or release of electron density, and is constant for a particular type of reaction. Kx and KH are, respectively, the equilibrium constant for reaction of the substituted molecule and the equilibrium constant for reaction of the unsubstituted parent molecule. The reaction rate constants, kx and kH, may replace the equilibrium constants. A plot of ln Kx (or ln kx) versus σ is often linear and may be used, for example, to predict the equilibrium constant (or rate constant) for other structurally similar compounds from only the σ values of their substituents.
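Using the linear relationship to predict a constant for a new substituent can be sketched as follows. This illustration follows the natural-logarithm form given above; the function names and every numeric value (ρ, the σ values, and KH) are hypothetical placeholders chosen only to show the arithmetic.

```python
import math


def hammett_ln_k(rho: float, sigma: float, ln_k_h: float) -> float:
    """ln Kx = rho * sigma + ln KH (natural-log form used in the text)."""
    return rho * sigma + ln_k_h


def hammett_k(rho: float, sigma: float, k_h: float) -> float:
    """Predicted Kx for a substituted compound, given KH of the parent."""
    return math.exp(hammett_ln_k(rho, sigma, math.log(k_h)))


# Hypothetical reaction sensitivity and parent equilibrium constant.
rho = 1.0
k_h = 1.0e-4

# Hypothetical substituent constants: positive = electron-withdrawing,
# negative = electron-donating.
sigmas = {"withdrawing substituent": 0.5, "donating substituent": -0.3}

for label, sigma in sigmas.items():
    print(f"{label}: sigma = {sigma:+.2f}, predicted Kx = {hammett_k(rho, sigma, k_h):.3e}")
```

With a positive ρ, a positive σ raises the predicted constant relative to KH and a negative σ lowers it; σ = 0 returns KH unchanged.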
The Hammett parameter σ is an example of a structure descriptor that is derived from experimental data for compounds of known structure. As described above, σ is a measure of the electronic properties of a substituent. Specifically, σ measures the ability of a substituent at a particular position to donate or withdraw electron density and may be defined according to the following equation:
σ = log Kx − log KH
where Kx and KH are, respectively, the acid ionization constants for a benzoic acid derivative substituted on the aromatic ring and for unsubstituted benzoic acid. The parameter σ for amino group substitution in a position para to the reaction center is determined, for example, from the measured acid ionization constants of p-aminobenzoic acid and benzoic acid using the above equation. Although the magnitude of the effect depends on the position of substitution, electron-withdrawing substituents generally tend to stabilize the anion formed when benzoic acid ionizes, making Kx larger than KH and σ positive. Conversely, electron-donating substituents tend to destabilize the anion and generally have negative σ values that are also dependent upon the position of substitution. Acid dissociation data for benzoic acid derivatives with various substituents in ortho, meta or para positions have been used to generate σ values for many substituents in these structural positions. Unfortunately, the σ values derived for substituents in this manner are typically not valid for multiply substituted molecules because σ values may not be additive.
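The definition of σ translates directly into a small calculation. Because pKa = −log Ka, the same quantity can also be written as the pKa of benzoic acid minus the pKa of the substituted acid. The sketch below assumes base-10 logarithms; the function names and the numeric inputs are hypothetical placeholders, not measured ionization constants.

```python
import math


def sigma_from_ka(k_x: float, k_h: float) -> float:
    """sigma = log10(Kx) - log10(KH), from acid ionization constants."""
    if k_x <= 0 or k_h <= 0:
        raise ValueError("ionization constants must be positive")
    return math.log10(k_x) - math.log10(k_h)


def sigma_from_pka(pka_x: float, pka_h: float) -> float:
    """Equivalent form: sigma = pKa(parent) - pKa(substituted)."""
    return pka_h - pka_x


# Hypothetical pKa values for illustration only.
pka_parent = 4.20       # placeholder for the unsubstituted acid
pka_substituted = 3.40  # placeholder for an acid with a withdrawing group

sigma = sigma_from_pka(pka_substituted, pka_parent)
print(f"sigma = {sigma:+.2f}")  # positive: the substituted acid is stronger
```

A substituted acid that is stronger than the parent (lower pKa, larger Kx) gives a positive σ, and a weaker one gives a negative σ, matching the sign convention described above.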
Due to the inability of σ values to accurately reflect the cumulative electronic effect of multiple substituents, alternative structure descriptors have been investigated. For example, Ijzerman et al. derived structure descriptors from the assigned 13C NMR shifts of the aromatic carbons in a family of N-tert-butylphenylethanolamines (Ijzerman et al., J. Med. Chem., 29: 549-554, 1986). Ijzerman et al. subtracted the value of the chemical shift for the carbon atoms in benzene (128.5 ppm) from the assigned chemical shift values for each of the carbons in the benzene ring of each of the substituted N-tert-butylphenylethanolamines to yield sets of structure descriptors for the compounds. A correlation was found between certain linear combinations of these structure descriptors and the β2-adrenoceptor intrinsic activity of the substituted N-tert-butylphenylethanolamines. Since these structure descriptors are defined in a manner similar to the Hammett σ parameter, they necessarily rely upon knowledge of structure beforehand. Also, the scope of usefulness for this type of structure descriptor is limited to compounds containing a benzene ring. Furthermore, these structure descriptors ignore the aliphatic carbon atoms of the molecules and their contribution to the intrinsic activity (a valid assumption only when comparing the intrinsic activities of compounds that differ minimally in structure). Additionally, the correlations of Ijzerman et al. have been questioned because the structure descriptors they derived from the chemical shifts of the aromatic carbons in these compounds are themselves highly correlated, and their use leads to the unrealistic result that substitution at one ortho position of the benzene ring affects intrinsic activity in a manner opposite to substitution at the other ortho position (see Cramer et al., Quant. Struct.-Act. Relat., 7: 18-25, 1988).
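The shift-difference descriptors described above, and the kind of collinearity check that motivates the criticism, can be sketched as follows. Only the 128.5 ppm benzene reference comes from the text; the assigned shifts are invented placeholders, the function names are hypothetical, and the Pearson coefficient is used here simply as one common way to quantify how strongly two descriptor columns track each other.

```python
import math
from typing import List, Sequence

BENZENE_SHIFT_PPM = 128.5  # reference value stated in the text


def shift_descriptors(
    assigned_shifts_ppm: Sequence[float], reference_ppm: float = BENZENE_SHIFT_PPM
) -> List[float]:
    """Descriptor for each aromatic carbon: assigned shift minus benzene shift."""
    return [shift - reference_ppm for shift in assigned_shifts_ppm]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need at least two paired values")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a descriptor column has zero variance")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical assigned 13C shifts (ppm) for two ring positions across
# four compounds; placeholder numbers, not data from Ijzerman et al.
position_a = shift_descriptors([127.0, 129.5, 131.0, 133.5])
position_b = shift_descriptors([126.4, 128.9, 130.8, 132.9])

print(f"descriptors at position A: {position_a}")
print(f"correlation between positions: {pearson(position_a, position_b):.3f}")
```

When two descriptor columns are nearly collinear, as in this contrived example, a regression can assign them large coefficients of opposite sign without either coefficient being physically meaningful.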
A similar attempt to utilize spectroscopically derived parameters as structure descriptors was made by Nishikawa and Tori (Nishikawa and Tori, J. Med. Chem., 27: 1657-1663, 1984). The reactivity of the β-lactam ring of 3-substituted and 3-methylene substituted cephalosporins toward alkaline hydrolysis showed a correlation with changes in the 13C NMR chemical shift of selected carbon atoms, and with changes in the infrared (IR) stretching frequency observed for the β-lactam carbon-oxygen bond. Selection of these measures was guided by knowledge of the structure of these compounds, and the role β-lactam reactivity plays in their antibiotic activity. Like the method of Ijzerman et al., the method of Nishikawa defines its structure descriptors in terms based upon a reference compound of known structure and utilizes spectral features assigned to particular atoms of known structures. Furthermore, the method, through its definition of the structure descriptors, is similarly restricted to narrow classes of compounds that differ only by simple substitutions.
Structure descriptors may be obtained theoretically or experimentally. Dipole moments and lowest unoccupied molecular orbital (LUMO) energies are examples of structure descriptors that may be obtained from theoretical quantum mechanical calculations on known structures. Experimental data correlated with a specific structural feature, common to a set of closely related compounds, may also be used to generate measures of structure (for example, the Hammett σ parameter). Bulk experimental measures of structure, such as partition coefficients (as a measure of polarity) and molar refractivities (as a measure of steric size), can also be utilized as structure descriptors. Structure descriptors based upon bulk physical properties have the advantage that they do not require structural knowledge beforehand; however, such descriptors lack specificity. For example, compounds with vastly different biological activities may have very similar partition coefficients.
A particularly important type of biological QSAR uses, as an endpoint, the ability of one molecule to bind to another molecule. An example of such an endpoint would be the ability of a series of molecules to act as ligands for a regulatory protein, such as a hormone receptor. In recognition of the important role played by three-dimensional structure, especially for biochemical reactions where molecular recognition is an important factor, the field of 3D-QSAR has emerged. The 3D-QSAR technique is exemplified by the Comparative Molecular Field Analysis (CoMFA) method of Cramer and Wold (U.S. Pat. No. 5,025,388). The CoMFA method attempts to correlate the three-dimensional steric and electrostatic properties of a series of molecules with their relative endpoint values. The steric and electrostatic properties of a molecule are obtained from quantum mechanical or electrostatic calculations based upon known molecular structures and serve as structure descriptors. The calculations, in effect, map the electron density distribution around a molecule to create a 3-D picture of its steric and electrostatic fields (collectively, the molecular field). Those steric features (e.g., bulky substituents) and/or electrostatic properties (e.g., a strong molecular dipole) that are most important in determining endpoint values are revealed by comparing the molecular fields of the molecules in the training set to their endpoints. The advantage of calculated 3D-QSAR molecular field structure descriptors is that unlike structure descriptors referenced to a certain structural feature, molecular field structure descriptors enable the identification of structurally dissimilar molecules that have similar steric and electrostatic properties.
However, a particular problem associated with the CoMFA method and other 3D-QSAR techniques is that these methods generally require some assumptions about how molecules orient themselves relative to each other upon binding. Selecting the common alignment of a training set containing diverse structures may be problematic, leading to incorrect predictions of binding ability. Furthermore, QSAR based upon quantum mechanical or electrostatic potential calculations also suffers to some extent from the inaccuracy of the calculations themselves. These calculations are by nature approximate, and become less and less reliable as molecular size increases. In addition, the effects of solvation on the quantum mechanical properties of a molecule are often difficult to calculate and typically are ignored. Finally, these calculations are time-consuming and require knowledge of the molecular structures beforehand.
Isolation and structure elucidation of molecules is an expensive and time-consuming process that is desirably avoided, especially when screening large numbers of molecules (e.g., those generated in a combinatorial library). Furthermore, many molecular structures are proprietary, so there may be reluctance on the part of investigators to share training set structural and endpoint data with competitors in a common effort to improve the reliability of predictive QSARs.
It would be advantageous to have QSAR methods that utilize structure descriptors which may be obtained without knowledge of or disclosure of molecular structures. Such structure descriptors could be more specific than bulk physical properties, and enable the QSAR methods to differentiate dissimilar molecular structures that exhibit similar bulk physical properties. Furthermore, the structure descriptors could avoid definition with reference to a particular structural element (as is required, e.g., for the Hammett σ parameter), and therefore be useful for establishing QSARs for more structurally diverse sets of compounds. Additionally, such QSAR methods could eliminate the costly step of structure elucidation and obviate the need for isolation of the subject compounds.
QSAR methods that utilize structure descriptors that are inherently reflective of the steric and electrostatic properties of molecules and the effects of solvation thereon are also needed. Such methods would also desirably eliminate the necessity for assumptions regarding molecular orientation in relationship to intermolecular binding and obviate the need to rely upon approximate calculations.