The application of statistical methods to the treatment of disease, through drug therapy, for example, provides valuable tools to researchers and practitioners for effective treatment methodologies based not only on the treatment regimen, but taking into account the patient profile as well. Using statistical methodologies, physicians and research scientists have been able to identify sources, behaviors, and treatments for a wide variety of illnesses. Thus, for example, in the developed world, diseases such as cholera have been virtually eliminated due in great part to the understanding of the causes of, and treatments for, these diseases using statistical analysis of the various risk and treatment factors associated with these diseases.
The most widely used statistical methods currently used in the medical and drug discovery fields are generally limited to conventional regression methods which relate clinical variables obtained from patients being treated for a disease with the probable treatment outcomes for those patients, based upon data relating to the particular drug, drugs or treatment methodology being performed on that patient. For example, logistic regression methods are used to estimate the probability of defined outcomes as impacted by associated information. Typically, these methods utilize a sigmoidal logistic probability function that is used to model the treatment outcome. The values of the model's parameters are determined using maximum likelihood estimation methods. The non-linearity of the parameters in the logistic probability function, coupled with the use of the maximum likelihood estimation procedure, makes logistic regression methods complicated. Thus, such methods are often ineffective for complex models in which interactions among the various clinical variables being studied are present, or where multivariable characterizations of the outcomes are desired, such as when characterizing an experimental drug. In addition, the coupling of logistic and maximum likelihood methods limits the validation of logistic models to retrospective predictions that can overestimate the model's true abilities.
Such conventional regression models can be combined with discriminant analysis to consider the relationships among the clinical variables being studied to provide a linear statistical model that is effective to discriminate among patient categories (e.g., responder and non-responder). Often these models comprise multivariate products of the clinical data being studied and utilize modifications of the methods commonly used in the purely regression-based models. In addition, the combined regression/discriminant models can be validated using prospective statistical methods in addition to retrospective statistical methods to provide a more accurate assessment of the model's predictive capability. However, these combined models are effective only for limited degrees of interactions among clinical variables and thus are inadequate for many applications.
The Similarity Least Square Modeling Method (SMILES) disclosed in U.S. Pat. No. 5,860,917 (of which the present inventor is a co-inventor), and which is hereby incorporated, in its entirety, by reference thereto, is capable of predicting an outcome (Y) as a function of a profile (X) of related measurements and observations based on a viable definition of similarity between such profiles. SMILES fails, however, to provide a means to effectively handle multiple outcome variables or outcomes of different types. For multiple outcome variables, or Y-variables, SMILES analyzes each Y-variable separately as independent measurements or observations. Thus, one obtains a separate model for each Y-variable. When the Y-variables measure the same phenomena, they likely have induced interdependencies or communalities. It becomes difficult to perform analysis with separate independent models. Nuisance and noise factors complicate this task even further.
What is needed, therefore, are methods of providing statistically meaningful models for analyzing the Y-variables as an ensemble of related observations, to produce a a common model for all Y-variables as a function of multiple X-variables to obtain a more efficient model with better leverage on common phenomena and less noise.