Modeling a phenomenon in any field, such as biology, chemistry, physics, engineering, finance, sociology, or toxicology, provides insight into the factors that control the phenomenon and enables the prediction of future trends without lengthy and costly experimental studies. For instance, a toxicological experiment to evaluate the carcinogenic potential of a chemical can take several years, cost millions of dollars, and cause unnecessary animal suffering.
A robust and predictive model that requires only the structure of the chemical may save this expenditure of time, money, and life. However, a predictive model generated from a limited set of available data is representative of a closed system. Every model therefore has a fixed domain of possible application. Outside that domain the model is not applicable, and predictions made there may not be reliable.
Quantitative Structure-Activity Relationship (QSAR) modeling is a known technique for establishing quantitative statistical models that relate the structures of chemicals to their properties. A number of QSAR models have been reported in the prior art to predict a variety of toxicological endpoints. However, no effort has been made to define the application domain of these models quantitatively. Some recommendations have been made for avoiding QSARs that result from chance correlations, and some preliminary steps, employing univariate checking of independent variables, have been taken to define the application domain of a model. The prior art fails because the application domain is a multivariate space and cannot be identified by univariate approaches.
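The limitation of univariate checking can be illustrated with a minimal sketch. It assumes two correlated descriptors and uses the squared Mahalanobis distance as one possible multivariate measure. That measure, the function names, and the data values are illustrative assumptions and are not taken from the prior art or from any particular QSAR model.

```python
# Illustrative sketch only. The Mahalanobis distance is an assumed stand-in
# for a multivariate domain check, and the descriptor values are made up.

def univariate_ok(train, query):
    """True if each descriptor of the query lies within the training min/max."""
    return all(min(col) <= q <= max(col) for col, q in zip(zip(*train), query))

def mahalanobis_sq_2d(train, query):
    """Squared Mahalanobis distance of a two-descriptor query from the training data."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    # Sample variances and covariance of the two descriptors.
    sxx = sum((x - mx) ** 2 for x, _ in train) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in train) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in train) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = query[0] - mx, query[1] - my
    # Quadratic form using the closed-form inverse of the 2x2 covariance matrix.
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

# Hypothetical training set in which the two descriptors are strongly correlated.
train = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
inside = (3.0, 3.0)    # follows the trend of the training data
outlier = (1.5, 4.8)   # each value is within range, but the combination is unusual

passes_univariate = univariate_ok(train, outlier)
d_inside = mahalanobis_sq_2d(train, inside)
d_outlier = mahalanobis_sq_2d(train, outlier)
d_train_max = max(mahalanobis_sq_2d(train, p) for p in train)
```

In this sketch the outlying query passes the range check on each descriptor taken separately, yet its multivariate distance from the training data is far larger than that of any training object. A univariate check alone would therefore accept it.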
Predictive models have always been sought in the physical, biological, and social sciences, but the application of such models may not produce reliable results. Before accepting a prediction from any model, it is essential, in a data processing system, to ascertain that the model is applicable to the query object or point, and to compare the performance of the predictive model on the query with its performance on the existing object that has the shortest property-sensitive similarity index from the query.
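The comparison described above can be sketched as follows. The property-sensitive similarity index is not defined in this passage, so the sketch substitutes a weighted Euclidean distance whose weights are hypothetical. The model, the existing objects, and the observed values are likewise invented for illustration only.

```python
import math

# Illustrative sketch only. The weighted Euclidean distance is an assumed
# stand-in for the similarity index; weights, model, and data are hypothetical.

def weighted_distance(a, b, weights):
    """Distance in which each descriptor is scaled by an assumed weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def nearest_existing(query, objects, weights):
    """Return (index, distance) of the existing object closest to the query."""
    dists = [weighted_distance(query, obj, weights) for obj in objects]
    i = min(range(len(objects)), key=dists.__getitem__)
    return i, dists[i]

def check_prediction(query, objects, observed, model, weights):
    """Pair the query prediction with the model's error on its nearest existing object."""
    i, dist = nearest_existing(query, objects, weights)
    return {
        "prediction": model(query),
        "neighbor_index": i,
        "distance": dist,
        "neighbor_error": abs(model(objects[i]) - observed[i]),
    }

# Hypothetical existing objects with known property values, and a toy model.
objects = [(1.0, 2.0), (2.0, 0.5), (4.0, 3.0)]
observed = [3.1, 2.4, 7.5]
weights = (1.0, 0.5)

def model(x):
    return x[0] + x[1]

report = check_prediction((3.5, 3.0), objects, observed, model, weights)
```

A small distance combined with a small error on the nearest existing object lends some support to the prediction for the query. A large value of either would be a reason to treat the prediction with caution.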