The number of known chemical compounds is vast and increasing constantly because methods for isolating and synthesizing molecules continue to improve. For instance, chemists are now able to employ the techniques of combinatorial chemistry to synthesize thousands of different chemical compounds at once from a mixture of only a few interchangeable chemical building blocks. Furthermore, chemists are now able to use combinatorial computer models to generate large numbers of chemical structures in silico.
Methods for predicting the properties of chemical compounds are generally based upon two connected observations: that the structure of a compound is related to its biological, chemical, and physical properties, and that compounds of similar structure exhibit similar properties. These observations are used to search for new compounds exhibiting a particular property. For example, dimethylstilbestrol and estradiol both have a phenol ring and both bind strongly to the estrogen receptor. From this observation, a reasonable deduction is that the presence of a phenol ring in a molecule increases the likelihood that the molecule will bind to the estrogen receptor. The deduction, which is a simple structure-activity relationship (SAR), narrows the scope of the search, but identification of estrogen receptor binders amongst phenolic compounds remains a matter of trial and error. Furthermore, compounds that bind to the estrogen receptor but do not contain a phenol ring are missed.
Quantitative structure-property relationships and quantitative structure-activity relationships (collectively QSAR) are attempts to quantify the observed relationships between the structure of chemical compounds and the magnitude of their properties. The property for which a model is sought is termed the “endpoint.” In general, the endpoint may be any measurable biological, chemical or physical property. QSAR models are established by correlating the endpoint values of a group of compounds with some measure(s) of structure available for each of the compounds. The measure(s) used to describe or reflect structures are termed “descriptors.” Descriptors may reflect structure directly. For example, useful direct QSAR descriptors include fragments of structure (i.e., particular groups of atoms) which appear amongst the compounds of interest. Descriptors also may indirectly reflect structure. Indirect descriptors are useful because they may be measured for compounds of unknown structure. Indirect descriptors include physical properties that vary with molecular structure, for example, partition coefficients. Structure descriptors are obtained for a group of molecules exhibiting a range of endpoint values (called the “training set”) and a correlation is made between the descriptors and the endpoint. In some instances only a few of the descriptors shared amongst the training set of molecules will be important for determining a particular property. In others, a large number of descriptors may be required to adequately describe the dependence of a property on molecular structure. If one or more descriptors are sufficiently correlated with the endpoint, a mathematical or graphical QSAR representation of the dependence of the endpoint on the descriptor values can be obtained. Descriptor values for a compound of unknown endpoint may then be used along with the QSAR representation to predict an endpoint for the compound.
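The QSAR workflow described above (obtain descriptors for a training set, correlate them with the endpoint, then predict the endpoint of a new compound) may be illustrated with a minimal sketch. The sketch assumes a single indirect descriptor (a partition coefficient) and an ordinary least-squares line as the QSAR representation; the training values, the function names, and the choice of a linear model are hypothetical and are given for illustration only.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of endpoint = slope * descriptor + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """Fraction of endpoint variance explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical training set: one descriptor value and one measured
# endpoint value per compound (values invented for illustration).
training_log_p = [1.0, 2.0, 3.0, 4.0, 5.0]
training_endpoint = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(training_log_p, training_endpoint)
fit_quality = r_squared(training_log_p, training_endpoint, slope, intercept)

# Predict the endpoint of a compound whose endpoint is unknown but whose
# descriptor value (here 3.5) has been measured.
predicted_endpoint = slope * 3.5 + intercept
print(f"slope={slope:.2f} intercept={intercept:.2f} r2={fit_quality:.4f}")
print(f"predicted endpoint: {predicted_endpoint:.3f}")
```

Where many descriptors are required, the same idea extends to multivariate regression or pattern-recognition methods; the single-descriptor case is shown only because it makes the correlation step easy to follow.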
Qualitative spectral data-activity relationships (SDAR) and quantitative spectral data-activity relationships (QSDAR) are derived using spectral data as molecular descriptors. Spectrometric data-activity relationships directly correlate patterns of spectral data with molecular properties, rather than correlating structural features with molecular properties. Spectral data reflects the quantum mechanical states of the atoms and/or groups of atoms in a molecule and can be highly sensitive to changes in structure. For this reason, SDAR and QSDAR models reliably describe a wide variety of molecular properties (see, for example, Beger and Wilkes, “Developing 13C NMR Quantitative Spectrometric Data-Activity Relationship (QSDAR) Models of Steroid Binding to the Corticosteroid Binding Globulin,” J. Comput.-Aided Mol. Design (2001, in press), Beger and Wilkes, “Models of Polychlorinated Dibenzodioxins, Dibenzofurans, and Biphenyls Binding Affinity to the Aryl Hydrocarbon Receptor Developed using 13C NMR data,” J. Chem. Inf. Comput. Sci. (2001, in press), Beger et al., “13C NMR and EI Mass Spectrometric Data-Activity Relationship (SDAR) Model of Estrogen Receptor Binding,” Toxicol. Appl. Pharmacol., 169: 17-25 (2000), Beger et al., “The Use of 13C NMR Spectrometric Data to produce a Predictive Model of Estrogen Receptor Binding Activity,” J. Chem. Inf. Comput. Sci., 41: 219-224, (2001), Beger et al., “Producing 13C NMR, Infrared Absorption and EI Mass spectrometric Data Monodechlorination Models of Chlorobenzenes, Chlorophenols, and Chloroanilines,” J. Chem. Inf. Comput. Sci., 40: 1449-1455 (2000), and U.S. patent application Ser. No. 09/629,557, each of which is incorporated by reference herein.). SDAR and QSDAR methods are based in part upon a correlation between a molecular property and the presence, absence, and/or strength of spectral signals at particular energies. 
Therefore, since a number of diverse structures can give rise to similar spectral features, SDAR and QSDAR methods permit modeling of molecular properties amongst groups of structurally dissimilar molecules. Furthermore, SDAR and QSDAR methods do not require prior knowledge of molecular structure, since spectra may be just as conveniently recorded for unknown compounds as they can be for known compounds. On the other hand, SDAR and QSDAR methods based on experimental spectra may be limited where the spectral features correlated with the endpoint are not readily distinguishable from noise.
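As a hedged illustration of how spectral signals at particular energies can serve as descriptors, the following sketch divides a 13C NMR chemical shift range into fixed-width bins and records which bins contain signal above a noise threshold. The bin width, shift range, threshold, and peak lists are assumptions chosen for illustration and are not taken from the cited references.

```python
def bin_spectrum(peaks, lo=0.0, hi=240.0, width=1.0):
    """Sum peak intensities into fixed-width chemical-shift bins.

    peaks is a list of (shift_ppm, intensity) pairs; peaks outside
    the [lo, hi) range are ignored.
    """
    n_bins = int((hi - lo) / width)
    bins = [0.0] * n_bins
    for shift, intensity in peaks:
        if lo <= shift < hi:
            bins[int((shift - lo) / width)] += intensity
    return bins


def occupied_bins(bins, threshold):
    """Indices of bins whose summed intensity exceeds the noise threshold."""
    return {i for i, value in enumerate(bins) if value > threshold}


# Hypothetical peak lists for two structurally dissimilar compounds.
spectrum_a = [(155.2, 1.0), (115.6, 2.0), (30.1, 1.0)]
spectrum_b = [(155.8, 1.0), (128.4, 2.0), (30.9, 1.0), (77.0, 0.02)]

NOISE_THRESHOLD = 0.05  # assumed; weaker signals are treated as noise

bins_a = occupied_bins(bin_spectrum(spectrum_a), NOISE_THRESHOLD)
bins_b = occupied_bins(bin_spectrum(spectrum_b), NOISE_THRESHOLD)

# Bins occupied in both spectra are candidate descriptors shared by the
# two compounds, even though no structure was supplied for either one.
shared_bins = bins_a & bins_b
print(sorted(shared_bins))
```

The weak peak at 77.0 ppm in the second spectrum falls below the assumed threshold and is discarded, which mirrors the limitation noted above: a spectral feature that cannot be distinguished from noise cannot contribute to the model.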
A successful and widely used approach to modeling structure-activity relationships in silico is to correlate molecular properties with calculated descriptions of the three-dimensional (3D) arrangements of atoms. Three-dimensional descriptions are especially important for modeling intermolecular binding properties such as drug-receptor interactions, where contact between drug and target molecule may take place in a specific pattern over a significant portion of the three-dimensional molecular surface of the drug. An exemplary 3D-QSAR technique is the Comparative Molecular Field Analysis (CoMFA) method of Cramer and Wold (U.S. Pat. No. 5,025,388). The CoMFA method is based upon quantum mechanical calculations of the steric and electrostatic properties of molecules from their known structures. The calculations, in effect, map the electron density distribution around a molecule to create a 3D picture of its steric and electrostatic fields (collectively, the molecular field). The 3D molecular field maps are used as descriptors in a structure-activity relationship. Successful CoMFA models may be used to visualize and identify molecular features (for example, steric features due to bulky groups of atoms and electrostatic features such as the direction and magnitude of the molecular dipole) that are important for a particular drug-target interaction. Since a particular molecular field pattern may be the result of a number of underlying molecular structures, molecular field descriptors are more general than the actual structures and permit identification of structurally dissimilar molecules that exhibit similar properties by virtue of their similar 3D molecular fields. On the other hand, CoMFA methods and other known 3D-QSAR techniques generally require making assumptions about how molecules orient themselves relative to each other upon binding.
Selecting the correct common alignment of a training set containing diverse structures may be problematic, leading, for example, to incorrect predictions of binding ability. Furthermore, quantum mechanical molecular field calculations are computationally intensive.
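The dependence of molecular field descriptors on alignment may be illustrated with a simplified sketch. The sketch does not reproduce the CoMFA calculation; as a stated assumption, it substitutes classical Lennard-Jones (steric) and Coulomb (electrostatic) potentials felt by a charged probe placed at each point of a regular grid. The atom parameters, grid dimensions, and function names are hypothetical.

```python
import math

COULOMB_CONSTANT = 332.0  # approximate, in kcal*angstrom/(mol*e^2)


def field_at(point, atoms, probe_charge=1.0, min_distance=0.5):
    """Return (steric, electrostatic) energy of a probe at one grid point.

    Each atom is (x, y, z, charge, sigma, epsilon). Distances are clamped
    to min_distance to avoid singularities at atom centers.
    """
    steric = 0.0
    electrostatic = 0.0
    for x, y, z, charge, sigma, epsilon in atoms:
        r = max(math.dist(point, (x, y, z)), min_distance)
        sr6 = (sigma / r) ** 6
        steric += 4.0 * epsilon * (sr6 * sr6 - sr6)
        electrostatic += COULOMB_CONSTANT * probe_charge * charge / r
    return steric, electrostatic


def field_descriptors(atoms, half_width=4.0, spacing=2.0):
    """Flatten the field sampled on a cubic grid into one descriptor list."""
    steps = int(2 * half_width / spacing) + 1
    axis = [-half_width + i * spacing for i in range(steps)]
    descriptors = []
    for gx in axis:
        for gy in axis:
            for gz in axis:
                descriptors.extend(field_at((gx, gy, gz), atoms))
    return descriptors


# Hypothetical two-atom molecule, and the same molecule translated by
# 1 angstrom along x to mimic a different choice of alignment.
molecule = [(0.0, 0.0, 0.0, -0.4, 3.0, 0.1), (1.2, 0.0, 0.0, 0.4, 3.4, 0.1)]
shifted = [(x + 1.0, y, z, q, s, e) for x, y, z, q, s, e in molecule]

descriptors_aligned = field_descriptors(molecule)
descriptors_shifted = field_descriptors(shifted)
largest_change = max(
    abs(a - b) for a, b in zip(descriptors_aligned, descriptors_shifted)
)
print(len(descriptors_aligned), largest_change > 0)
```

Because the grid is fixed in space, the identical molecule yields a different descriptor list once it is moved, which is why the choice of a common alignment for the training set directly affects any model built on such descriptors.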
A spectral data-activity method that attempts to combine the quantum mechanical information inherent in spectral data with a description of molecular structure is the comparative structurally assigned spectral analysis (CoSASA) method. In the CoSASA method, only the spectral features exhibited by the atoms of a structural moiety that is shared amongst a group of molecules (e.g., a particular ring system) are used as descriptors. For example, Beger and Wilkes used the assigned 13C NMR chemical shifts of the steroid ring atoms to model steroid binding affinities to the aromatase enzyme and the corticosteroid binding globulin (see Beger and Wilkes, “13C NMR Quantitative Spectrometric Data-activity Relationship (QSDAR) Models of Steroid Binding to the Aromatase Enzyme,” J. Chem. Inf. Comput. Sci., 41: 1360-1366 (2001) and Beger and Wilkes, “Developing 13C NMR Quantitative Spectrometric Data-activity Relationship (QSDAR) Models of Steroid Binding to the Corticosteroid Binding Globulin,” J. Comput. Aided Molec. Design, 15: 659-669 (2001)). Addition of structural information through use of assigned spectral features was expected to improve the reliability of SDAR models. Surprisingly, however, CoSASA models of estrogen receptor binding using structurally assigned spectral data are no better than SDAR models that use unassigned spectral data as descriptors. Furthermore, CoSASA and related methods that rely on spectral data assigned to a common structural feature cannot be used to model properties of structurally dissimilar molecules.