Empirical studies in many fields of science and technology collect a set of data and use the data to examine a theoretical model related to that field. Such empirical studies are conducted in a variety of areas, such as healthcare, drug development, finance, or stock market modeling and prediction.
In one example, a cancer drug development group identifies a family of drugs as candidates for inhibiting the growth of cancerous tumor cells. The drugs can be administered in one of three candidate carrier solutions. The group wants to identify an effective drug and a corresponding suitable family of carrier solutions. To that end, the group compares the drugs, two at a time, by performing a bioassay experiment on multiple equivalent tumor cells placed in multiple isolated slots of a bioassay plate. The group treats randomly selected subsets of the slots with each of the two drugs administered in one of the three carrier solutions. After a period of time, the cells are examined to determine the number of tumor cells that are alive in each slot. The effectiveness of each drug is measured by the number of cells that are killed during the time period. The group would like to determine whether there exists any variability in the effectiveness of drugs across the drug family, in the effectiveness of carrier solutions across the carrier family, and in the interactions between different drug and carrier solution combinations.
In another example, a clinical trial group studies a newly developed antihypertensive drug for lowering blood pressure in humans. To that end, the clinical trial selects a group of patients with a history of hypertension. Each patient visits the clinic five times. During each visit, the patient randomly receives one of the five options: a placebo and four different doses of the drug. The clinical trial monitors the blood pressure and other health parameters of each patient during the study. Based on the results, the scientists would like to determine the effectiveness of the drug and, if effective, a proper dose of the drug.
Many of these studies apply statistical modeling to solve the problem. In statistical modeling, researchers often attempt to describe the observed data with a model. In many cases, the researchers know of multiple candidate models that may describe the data and want to find, from among those candidates, the model that best describes the data. In many cases, a candidate model is a parametric model, which includes some variable parameters.
For example, the cancer drug development example above may use a two-way completely randomized analysis of variance (ANOVA) model with interactions. In this parametric model, the measured data are explained by equation (1):

$$y_{ijk} = \mu + a_i + b_j + c_{ij} + \varepsilon_{ijk} \tag{1}$$

In equation (1), the index $i$ runs over the values 1 and 2 for the two drugs; the index $j$ runs over the values 1, 2, and 3 for the three carrier solutions; and the index $k$ runs over the measurements. Further, the parameters and values are defined as follows:
$y_{ijk}$ = $k$th measurement of response after treatment with drug $i$ in carrier solution $j$
$\mu$ = mean response
$a_i \sim N(0, \sigma_a^2)$ = contribution of drug $i$ to $y_{ijk}$
$b_j \sim N(0, \sigma_b^2)$ = contribution of carrier $j$ to $y_{ijk}$
$c_{ij} \sim N(0, \sigma_c^2)$ = contribution of the (drug $i$, carrier $j$) interaction to $y_{ijk}$
$\varepsilon_{ijk} \sim N(0, \sigma^2)$ = contribution of the error term to $y_{ijk}$
Moreover, the parameters $a_i$, $b_j$, $c_{ij}$, and $\varepsilon_{ijk}$ are independent of each other.
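To make the structure of equation (1) concrete, the following minimal sketch draws synthetic measurements from the model: one random contribution per drug, one per carrier, one per (drug, carrier) pair, and an independent error per measurement. The function name `simulate_two_way`, its parameters, and the numeric values in the usage note are illustrative assumptions, not part of the studies described above.

```python
import random


def simulate_two_way(n_reps, mu, sigma_a, sigma_b, sigma_c, sigma,
                     n_drugs=2, n_carriers=3, seed=0):
    """Draw y[i][j][k] = mu + a_i + b_j + c_ij + eps_ijk as in equation (1).

    Hypothetical helper for illustration only. The sigma_* arguments are
    standard deviations (random.gauss takes a standard deviation, not a
    variance).
    """
    rng = random.Random(seed)
    # One independent normal draw per drug, per carrier, and per pair.
    a = [rng.gauss(0.0, sigma_a) for _ in range(n_drugs)]
    b = [rng.gauss(0.0, sigma_b) for _ in range(n_carriers)]
    c = [[rng.gauss(0.0, sigma_c) for _ in range(n_carriers)]
         for _ in range(n_drugs)]
    # Each measurement adds its own independent error term.
    return [[[mu + a[i] + b[j] + c[i][j] + rng.gauss(0.0, sigma)
              for _ in range(n_reps)]
             for j in range(n_carriers)]
            for i in range(n_drugs)]
```

For instance, `simulate_two_way(n_reps=4, mu=50.0, sigma_a=5.0, sigma_b=2.0, sigma_c=1.0, sigma=3.0)` returns a 2 by 3 by 4 nested list. Setting one of the standard deviations to zero removes the corresponding source of variability, which is the kind of reduced model a researcher might want to compare against the full one.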
The clinical trial example above, on the other hand, may use the quadratic model shown in equation (2):

$$y_{ij} = \alpha_i + \beta_i I_{ij} + \gamma_i I_{ij}^2 + \varepsilon_{ij} \tag{2}$$

In equation (2), the index $i$ runs over the patients and the index $j$ runs from 1 to 5 for the five visits. Further, the parameters and values are defined as follows:
$y_{ij}$ = reduction in blood pressure on the $j$th visit for subject $i$
$I_{ij}$ = concentration of drug administered to subject $i$ on the $j$th visit ($I_{ij} = 0$ for placebo)
$\alpha_i$ = subject-specific intercept
$\beta_i$ = subject-specific slope
$\gamma_i$ = subject-specific quadratic coefficient
$\varepsilon_{ij} \sim N(0, \sigma^2)$ = error term, which is independent for different $i$ or $j$
In one example, the subject-specific parameters above may be derived from the population-level values as $\alpha_i = \alpha + a_i$, $\beta_i = \beta + b_i$, and $\gamma_i = \gamma + c_i$, in which $\alpha$ is the population-level intercept, $\beta$ is the population-level slope, and $\gamma$ is the population-level quadratic coefficient. Moreover,

$$\begin{pmatrix} a_i \\ b_i \\ c_i \end{pmatrix} \sim N(0, \sigma^2 D)$$

where $D$ is a symmetric positive semi-definite matrix, and $(a_i, b_i, c_i)^T$ is independent for various $i$ and also independent of $\varepsilon_{ij}$ in equation (2).
Substituting the above values in equation (2) results in the model shown in equation (3):

$$y_{ij} = (\alpha + a_i) + (\beta + b_i) I_{ij} + (\gamma + c_i) I_{ij}^2 + \varepsilon_{ij} \tag{3}$$
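One consequence of equation (3) is that the visits of a single subject are correlated, because they share the same subject-level deviations. Under the stated independence and normality assumptions, the responses of one subject have mean $\alpha + \beta I + \gamma I^2$ at dose $I$ and covariance $\sigma^2 (Z D Z^T + I_n)$, where each row of $Z$ is $(1, I, I^2)$ for one visit and $I_n$ is the identity matrix. The sketch below computes both quantities; the function names and argument layout are illustrative assumptions rather than part of the described study.

```python
def mean_response(doses, alpha, beta, gamma):
    """Population-level mean of equation (3) at each dose (hypothetical helper)."""
    return [alpha + beta * d + gamma * d * d for d in doses]


def marginal_covariance(doses, D, sigma2):
    """Return sigma2 * (Z D Z^T + I) for one subject (hypothetical helper).

    doses  : dose given at each visit (0 for placebo)
    D      : 3x3 symmetric positive semi-definite matrix (list of lists)
    sigma2 : error variance sigma^2
    """
    # Design rows for the random intercept, slope, and quadratic term.
    Z = [[1.0, d, d * d] for d in doses]
    n = len(doses)
    cov = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for s in range(n):
            zdz = sum(Z[r][p] * D[p][q] * Z[s][q]
                      for p in range(3) for q in range(3))
            # The independent error term contributes only on the diagonal.
            cov[r][s] = sigma2 * (zdz + (1.0 if r == s else 0.0))
    return cov
```

If `D` is the zero matrix, the covariance reduces to $\sigma^2$ on the diagonal and zero elsewhere, which corresponds to a model with no subject-level variation.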
In some studies, a researcher may propose different models. A first model is said to be nested in a second model if the first model can be derived from the second model by imposing constraints on the parameters of the second model. In some embodiments, a researcher chooses between a base model and an alternative model. The base model can be more complex or less complex than the alternative model. In some embodiments, the base model is nested in the alternative model. In some other embodiments, the alternative model is nested in the base model. Further, in some embodiments, neither the base model nor the alternative model is nested in the other.
For instance, in the first example, a researcher may hypothesize that one or more of the parameters $a_i$, $b_j$, or $c_{ij}$ are not needed and thus can be set to zero. Similarly, in the second example, a researcher may define a reduced model for the clinical trial study by hypothesizing that the population average of $\gamma_i$ is not needed in the model. The researcher may desire to compare these alternative reduced models with each other and with the full model of equation (1) or equation (2). In particular, the researcher may desire to compare the models for their accuracy in explaining the data.
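As a rough illustration of comparing a reduced model with a full one, the sketch below fits a straight line (quadratic coefficient constrained to zero) and a full quadratic to pooled dose-response data by ordinary least squares, then forms the classical F statistic for nested linear models. This is a deliberate simplification: it ignores the subject-level random effects of equation (3), and it is not presented as the comparison method used in the studies above. The function names are hypothetical.

```python
def fit_polynomial(doses, responses, degree):
    """Least-squares polynomial fit; returns (coefficients, residual sum of squares).

    Hypothetical helper: solves the normal equations (X^T X) b = X^T y by
    Gaussian elimination, where column c of X holds dose ** c.
    """
    p = degree + 1
    xtx = [[sum(d ** (r + c) for d in doses) for c in range(p)]
           for r in range(p)]
    xty = [sum((d ** r) * y for d, y in zip(doses, responses))
           for r in range(p)]
    aug = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * p
    for r in reversed(range(p)):
        tail = sum(aug[r][c] * coeffs[c] for c in range(r + 1, p))
        coeffs[r] = (aug[r][p] - tail) / aug[r][r]
    rss = sum((y - sum(b * d ** k for k, b in enumerate(coeffs))) ** 2
              for d, y in zip(doses, responses))
    return coeffs, rss


def nested_f_statistic(rss_reduced, rss_full, p_reduced, p_full, n):
    """F statistic for a reduced model nested in a full linear model."""
    numerator = (rss_reduced - rss_full) / (p_full - p_reduced)
    denominator = rss_full / (n - p_full)
    return numerator / denominator
```

A large F statistic suggests that the extra quadratic term explains variation that the reduced model leaves in its residuals. For data with repeated measurements per subject, a comparison that accounts for the random effects would be more appropriate than this pooled fit.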