Mathematical/statistical models are standard tools in determining how to best use drugs. Models are developed of the time course of the concentration of drugs (referred to as pharmacokinetic models) in various tissues, and the effects of drugs (referred to as pharmacodynamic models). There models are then used to understand the most appropriate dose and dosing interval, as well as whether and how to adjust doses for special populations (elderly, pediatric, patients with various diseases). In addition, these models can be used to simulate a variety of clinical applications (e.g., treatment of different population, different algorithms for adjusting doses and evaluating patient responses), in order to evaluate clinical trial designs (clinical trial simulation) or clinical practice. The current method for identification of the mathematical model that best describes the data (the optimal model) is a complex process based on knowledge of the properties of the drug and trial and error. The current process is best described as a manual binary tree search using forward addition.
NONMEM (Non linear Mixed Effect Model) is a software package developed by the University of California at San Francisco. This software is used to develop mathematical models. Typically these models are of biological response, particularly pharmacokinetic and pharmacodynamic models. NONMEM was the first, and remains the industry standard for developing complex pharmacokinetic/pharmacodynamic models.
Since NONMEM was introduced, a number of other software applications have been developed that have similar capabilities. These include WinNonMix (Pharsight Corporation), Kinetica 2000 Population (Innaphase Corporation), and a procedure in SAS (SAS Institute) called NLMIXED. These applications do essentially the same thing as NONMEM (mixed effect modeling), and the method described herein could be applied to those as well. In addition, these methods could be applied to non-linear regression and logistic regression.
The process of defining a near optimal model in mixed effect non linear regression, non linear regression and logistic regression is commonly called model building. In pharmacokinetic/pharmacodynamic modeling, data from the system of interest (usually a population of patients or normal subjects) is used to estimate the parameters of a mathematical model. Occasionally, attempts are made to break the system of interest down into smaller parts, each of which is then used to estimate the parameters of a model. NONMEM is the industry standard software for estimating parameters for a model, given a data set and a “model”. The model is a set of equations (algebraic or differential) that are intended to describe the system of interest. Once the set of equations is identified, the parameters of those equations are estimated (by “fitting”) by NONMEM or similar software. The “model building” part consists of an often long process of testing various models (sets of equations) for their ability to describe the observed data. The model is then modified, or rejected and a new model is tested. An example is described below.
Example of Pharmacokinetic Model Building
A study is done in which a single dose of a drug is given to 24 subjects. Plasma samples are collected over the subsequent 24 hours, and the plasma is assayed to determine the concentration of drug. A typical plot of the data from a single subject is shown in FIG. 1. Other subjects might have quite different plots. A mathematical model is sought to describe these data. Specifically, the model that best describes these data (defined as, among other factors the model with the smallest residual error) is sought. The standard pharmacokinetic models for such data consist of a series of linear differential equations. These equations describe the mass transfer of drug from and to one or more “compartments”. Compartments in a pharmacokinetic model are hypothetical volumes that contain drug. The differential equations describe the quantity (mass) of drug in the compartment as a function of time. The quantity of drug in a compartment is rarely observed. Rather, the concentration is observed by collecting a sample of a representative tissue (usually blood or plasma) and assaying that sample for the drug.
The model is then used to predict the concentration in the compartment by dividing the quantity of drug by the volume of distribution of the compartment. The volume of distribution of the compartment is a parameter estimated by fitting a model to observed data, using non-linear regression. The compartments used in these models may or may not correspond to any physiologic tissue. The “central compartment” describes the volume from which a plasma sample is collected. This central compartment may correspond to the blood volume, for some drugs (i.e, gentamicin). For other drugs, the central compartment may be larger and correspond to the blood and tissues that equilibrate rapidly with the blood (i.e., mass transfer rate constants are large). The central compartment and any peripheral compartments are defined by the equations that describe the time course of the concentration of drug, not by any physiologic properties.
If the data shown in FIG. 1 are plotted on a log scale, it is noted that a linear plot results. This plot is characteristic of a one compartment model, described by the differential equation:             ⅆ      A              ⅆ      t        =            -      k        ·    A  Where A is the amount of drug in the (single) compartment and k is the mass transfer rate constant out of this compartment (hence the− sign).
The observed plasma concentration then is given as A/V where V is the volume of distribution of this hypothetical compartment. Integrating the equation above, and dividing by V to get the observed drug concentration gives:   Concentration  =            Dose      V        ·          ⅇ              -        kt            Where Dose is the administered dose of drug (units of mass), and t is time. This model has two parameters, k and V. NONMEM provides estimates of these parameters, given a data set, and this model.
Other drugs may show two compartment pharmacokinetics. Two compartment pharmacokinetics are described by the differential equations given below.                                           ⅆ                          A              ⁡                              (                1                )                                                          ⅆ            t                          =                ⁢                                            -              k                        ·                          A              ⁡                              (                1                )                                              -                      k12            ·                          A              ⁡                              (                1                )                                              +                      k21            ·                          A              ⁡                              (                2                )                                                                                                  ⅆ                          A              ⁡                              (                2                )                                                          ⅆ            t                          =                ⁢                              k12            ·                          A              ⁡                              (                1                )                                              -                      k21            ·                          A              ⁡                              (                2                )                                                        This system of two compartments has two volumes, one for the central compartment (1) and one for the peripheral compartment (2). Typically, only the concentration in the central compartment can be observed, since it is, by definition, in rapid equilibrium with the blood. Peripheral compartments may correspond physiological to muscle, adipose tissue, brain tissue etc, or some combination of these.
The two compartment system has five parameters, k (mass transfer rate constant out of compartment 1), k12 (mass transfer rate constant from compartment 1 to 2), k21 (mass transfer rate constant from compartment 2 to 1), V(1) (volume of compartment 1) and V(2) (volume of compartment 2). Other, more complex system (3, 4 or occasionally more compartments) may be appropriate to describe other drugs. In addition to the selection of the number of compartments, a sub-model may best describe each parameter of the model. For example, it is frequently observed that the volume distribution is well described by a linear function of weight, of the form Volume=Θ•weight where Θ is a constant. If this is found to be the case, the equation for a one compartment model becomes:   Concentration  =            Dose              θ        ·        weight              ·          ⅇ              -        kt            Where Θ is a constant describing the relationship between weight and volume.
Similarly, the elimination rate constant (k) may be best described by an expression that includes renal function, liver function, age, race and/or gender. These relationships can be important in understanding how best to administer drugs to special populations such as the elderly, children, or to modify doses to target therapeutic concentrations. In a typical trial, standard demographic descriptors are collected including gender, age, weight, race, as well as clinical laboratory data describing renal function and liver function. Each of these descriptors is typically examined as a potential predictor of at least some of the parameters of the model.
Occasionally, pharmacokinetic data are not well described by a system of linear equations, and non linear equations are employed. As with linear models, there are a finite number of nonlinear models, each with a set of parameters. Again, each parameter is typically examined for a relationship with demographic and laboratory values descriptors (referred to as covariates in the model). Typical non linear kinetics are described by a Michaelis-Menton relationship.i This relationship describes a saturable elimination of the drug, with clearance follow the formula:   Clearance  =            V      ⁢                           ⁢              max        ·        Concentration                    Km      +      Concentration      Where Vmax is the maximum amount of drug that can be eliminated, and Km (Michaelis constant) is the concentration at which one half of the maximum clearance is observed. Vmax and Km may then be functions of covariate (age, weight, renal or liver function).
Linear pharmacokinetic models may be parameterized in more than one way. The simplest may be as rate constants and volumes. Commonly, a clearance can be described instead. Clearance is defined as the product of the volume and the rate constant. Units of clearance are volume/time. The one compartment pharmacokinetic model, parameterized in clearance and volume is given below:   Concentration  =            Dose      V        ·          ⅇ                        -                      (                          CL              /              V                        )                          ·        t            Where CL is clearance (units of volume/time) and t is time.
Occasionally, a clearance may be found to correspond to a physiologic process that eliminates drug. For example, gentamicin is eliminated essentially entirely by the kidney. Gentamicin clearance is found to correlate very well with a physiologic flow in the kidney known as the glomerular filtration rate. Other drugs have a clearance essentially equal to kidney blood flow, or liver blood flow. However, in general clearances are regarded as simply parameters that are estimated in fitting a pharmacokinetic model to data.
The equations discussed above describe the “structural model”, that is the model that takes a set of inputs (dose, time, weight, age, race etc) and results in a prediction for the observed value (a drug concentration for a pharmacokinetic model). The models described above are mutually exclusive, exactly one can be used in a given model. Presumably, exactly one will be the best of the group. For practical purposes, a number of useful models are enumerated in current pharmacokinetic software (e.g., NONMEM, WinNonMix). NONMEM for example has 12 libraries of pharmacokinetic models. These include one compartment, one compartment with first order absorption, two compartment, two compartment with first order absorption, three compartment, three compartment with first order absorption, a general linear model (1-10 compartments) and a general nonlinear (1-10 compartments) and Michaelis-Menten kinetics.
In practice, the predicted value is rarely equal to the observed value. Rather, the predicted value differs from the observed value by a random variable known in the statistical literature as the residual error (referred to as intra individual error in NONMEM documentation). The mean of all the residual errors (one for each observation) is by definition zero. That is, on average the prediction is equal to the observed value. However, any individual observed value may be higher or lower than the predicted value. Thus, the residual error will be greater or less than zero, but the average of all residual errors will be zero. A statistical model is used to describe the distribution of the residual errors. Typically, for statistical reasons in model fitting, the residual error is assumed to be normally distributed. However, the actual distribution of residual error from the data may be more consistent with other (e.g., skewed) distributions. Thus, it is usually necessary to examine a number models for the residual error as well. Five models of residual error represent the vast majority of work done in pharmacokinetic/pharmacodynamic modeling. These are the additive error, the constant coefficient of variation (CCV), the log normal error, the power model and a combination of the additive and log normal. The functional forms of these models are:
ModelFunctional formAdditiveY = F + εCCVY = F · (1 + ε)Log normalY = F · eεPower modelY = F + ε · {square root over ((1 + Θ · F2))}Combined Additive/logY = F · eε(1) + ε(2)Where Y is the observed value, F is the predicted value Θ is an estimated parameter and ε is a random variable with mean zero. The first three models have a single parameter to be fitted, the variance of ε, the fourth model has two parameters to be estimated Θ and the variance of ε, and the fifth has two parameters to be fitted, the variance of ε(1) and the variance of ε(2). Additionally, models for autocorrelated residuals can be implemented.
NONMEM is an acronym for NON linear Mixed Effect Model. The mixed effect part of the software refers to the combination of random and fixed effects. In practice this means that the values of parameters are permitted to vary from on person to another. This random effect is statistically entirely analogous to the random effect associated with the residual error. Physiologically, it means that the parameters can vary from one subject in a clinical trial to another. That is, one person will have a larger than average value for the volume of distribution, and another will have a smaller than average value. Frequently some, but not all of this variation is explained by differences in demographic variables (e.g., weight). Similar distributions of parameters can be applied to all the parameters of the model.
Three of the standard models that are applied to the residual error can be applied to this error, referred to in NONMEM as the inter (between individual) error. Unlike residual error however, the observed data may be consistent with no inter individual variability. Therefore, there may be four possible models of inter individual variability.
ModelFunctional formNo variabilityP = {tilde over (P)}AdditiveP = {tilde over (P)} + ηCCVP = {tilde over (P)} · (1 + η)Log normalP = {tilde over (P)} · eηWhere P is the parameter value for a given individual in the population and {tilde over (P)} is the mean value for the population. The random variable η (ETA) again has a mean of zero and a single parameter, the variance.
The variances of the ETAs are described by the variance-covariance matrix, called OMEGA in NONMEM. This matrix may be diagonal (all off diagonal elements constrained to be 0) or non-diagonal, with some or all the off diagonal elements estimated. Off diagonal elements of OMEGA describe the covariance of the individual parameter values between subjects.
Pharmacodynamic modeling is approached in a similar fashion to pharmacokinetic modeling. A model is sought that describes a given set of observed data. These observed data will include measurements such as blood pressure, cholesterol, HIV viral counts or other quantity that are effected by the administration of drugs. Often, a model consistent with current understanding of the physiology of the drug is sought. However, the underlying goal of the process is to describe the observed data. While some degree of creativity is more frequently observed with pharmacodynamic modeling, a well-defined set of standard model is found to adequately describe the vast majority of continuous systems. These models are:
ModelTypeFunctional formLinearAlgebraicY = M · C + BLogAlgebraicY = M · Log(C)Sigmoid EmaxAlgebraicY = (Emax · CH)/)(EC50H + CH)Indirect responseDifferentialdY/dt = Kin · (1 − (C/ic50 + C)) −model 1iiKout · YIndirect responseDifferentialdY/dt = Kin − Kout*model 2(1 − (C/(ic50 + C)) · YIndirect responseDifferentialdY/dt = Kin · (1 + (Emax*C)/model 3(EC50 + C)) − Kout · YIndirect responseDifferentialdY/dt = Kin −model 4Kout*(1 + (Emax*C)/(EC50 + C)) · YWhere Y is the observed response, C is the drug concentration, t is time and all other variables are fitted parameters. It may be observed that each of these responses may be modeled as mediated through an “effect compartment”. An effect compartment provides a mechanism to describe the commonly observed time delay in drug effects. The concentration (C in the equations) is not the concentration observed in the plasma of blood, but rather the concentration in a hypothetical effect compartment, whose time course is delayed by a linear rate constant compared to the central compartment. The differential equation that describes this is             ⅆ              C        e                    ⅆ      t        =            k      ·      C        -          k      ·              C        e            Where Cc is the hypothetical concentration in the effect compartment, and C is the concentration in the blood or plasma.Traditional “Model Building”
The literature documents a well-defined process to “build” these models. The conventional process is a very linear process, with one hypothesis tested at a time, and that hypothesis either accepted or rejected, then the next hypothesis tested, as is described by the techniques of forward addition and backwards elimination.iii Combinations of features are typically not tested together. This process consists of various plots to examine the raw data, as well as diagnostics (statistical and graphical) to select likely models and covariate relationships to examine. This process is very time consuming and labor intensive, often requiring weeks or months of work from an experienced practitioner.
A traditional approach to model building includes the techniques of forward-addition-backward elimination. This consists of starting with a base model, usually a simple model, and adding features to it, one at a time and testing whether each feature are statistically supportable (usually P<0.05 or P<0.01). This approach is widely applied to linear regression, logistic regression and forms the basis for the standard model building approach in mixed effects non linear regression. For example, in multiple linear regression, one might start with a base case or no linear effects, that isY=b
Then a single linear effect is added:Y=m(1)•x(1)+bWhere m(1) is an estimated parameters and x(1) is the first element of the x vector. A formal statistical test is done to determine if this is statistically significant. If it is, this term (feature) remains in the model and another element of x is added:Y=m(1)•x(1)+m(2)•x(2)+b
This is forward addition. Another technique is backward elimination. In this method, all candidate features are added in the base model. Each is removed, one at a time and the significance is assessed.
Pharmacokinetic/pharmacodynamic models have traditionally been developed using the forward addition approach, by starting with simple models (e.g., one compartment) and adding features.iv,v,vi,vii. For example, a data set would become available that included the data (time, dose, observed concentration) as well as potentially relevant covariates (age, weight, gender, race, renal and liver function). A one compartment model would be fitted to the data. Once this was done, graphics would be created to examine the data for other relationships that might be added to the model. For example, a plot of time vs residual error might be created. This plot would be examined for patterns characteristic of more complex time course of concentration (e.g., two compartment models).
Alternatively, a two compartment model could simply be fitted to the data as well and compared to the results of the one compartment fit. The comparison would be made based on (among other factors) a statistical test known as the log likelihood test. The log likelihood test requires that twice the log likelihood value (a measure of the goodness of fit) change by at least 3.84 unit for each parameter added to the model for the addition of that parameter to be statistically significant at the P<0.05 level. The addition of a pharmacokinetic compartment adds two parameters, a rate constant to describe the transfer of drug into the compartment and one to describe the transfer of drug out of the compartment. Therefore, the log likelihood value for a two compartment model must be at least 7.68 units (2•3.84) less than log likelihood value for the one compartment model. Additional compartments can be tested in the same way. Other features (absorption lag time, for example) can be tested similarly.
In order to apply the log likelihood test the models must be hierarchical. For models to be hierarchical, the smaller, less complex model must be a special case of the larger model. A one compartment model is a special case of a two compartment model, where the rate constants for transfer to and from the second compartment are equal to zero (or infinity). Therefore one and two compartment models are hierarchical. For models that are not hierarchical, other statistics can be used for comparison of models. These criteria include the Akaike information criteria and the Schwartz criteria.viii,ix 
Covariate relationships can be developed similarly. Graphics can be developed to examine relationships between post hoc estimates of parameters and demographics.x Post hoc estimates of parameters are estimates of individual's parameter values, based on Bayesian inference. Examination of these plots may suggest to the modeler that a relationship exists, which can then be incorporated into the model and formally tested. For example, one might find that a plot suggests that the volume of distribution is larger in people who have a greater body weight. This would then be modeled as:Volume=THETA(1)+THETA(2)*WeightWhere THETA(1) and THETA(2) are estimated parameters.
Note that this is a hierarchical model in comparison toVolume=THETA(1)If the value for THETA(2) is set to 0, the larger model becomes the simpler model. Therefore, the addition of this feature to the model can be tested for statistically significance.
Random effects (interindividual and intraindividual) can also be examined. There are no specific graphics to be examined for random interindividual effects. Typically there are relatively few of these to be tested (one for each basic parameter, e.g., rate constants and volumes). The initial model often includes a log normal inter individual error on each basic parameter. Based on the estimates of these, and how well they are estimated (the standard error of the estimate), they may be eliminated or retained in the model.
One can apply a less rigorous test to the addition of random effects, by comparing the Akaike information criteria to the model. The Akaike information criteria (AIC) is the log likelihood objective function+2•(# of estimated parameters). Each random effect adds one estimated parameter to the model (the variance of the distribution). Therefore, a decrease in the objective function (−2•log likelihood) of 2 when a single random effect is added would lead one to prefer that model. This is not a formal test of hypothesis however.
The literature contains one effort to automate this process.xi This work is derived from the technique of plotting post hoc estimates for parameters vs covariate values. Similarly to the current invention, each potential covariate relationship must be explicitly listed. The automated algorithm of Jonsson et al. then examines relationships between post hoc estimates of parameters and covariates. Each potential relationship is examined numerically. The most likely relationship is then incorporated in the next candidate model using the forward addition technique.
Limitations of Current Model Building Approach
Traditional statistical model selection in linear regression is based on backwards elimination. In backwards elimination, all postulated effects are entered into the model and then removed one at a time and tested for significance. This approach is not practical in NONMEM, for several reasons. First, some of the models are mutually exclusive. One may describe the relationship between weight and volume of distribution as linear or log-linear, it cannot be both. So, both effects cannot be in the model at one time. In traditional, linear regression, all effects are simple linear effects, no alternatives are available, and so the effect is either present or not present. Second, non-linear regression is computationally much more difficult than linear regression. Such a large model (with all possible effects) would invariably lead to computational difficulties. As a result of these problems using backwards elimination in NONMEM, forward addition is invariably used. The primary limitation being that it has been shown that the forward addition approach to model building has the potential to miss important interactions between effects, when effects are added one at a timexii.