Research, development, and engineering studies typically provide data sets that contain two coordinated data elements:
1. One or more independent variables (study factors) at two or more levels.
2. One or more associated quality attributes or performance characteristics (response variables) obtained at each level or level combination of the study factors.
Numerical analysis techniques can be applied to such coordinated data to obtain a prediction model (equation) for each response. The word prediction indicates that the model predicts the response that would be obtained at a given input level of each included study factor. Linear regression analysis is one example of a numerical analysis approach that can yield a prediction model. The simplest form of such a model is the equation of a straight line presented in Equation 1.

$$\hat{Y} = mX + b \qquad \text{(Equation 1)}$$
In Equation 1, Ŷ is the predicted response obtained from the model, m is the slope of the prediction line, X is an input level of the study factor, and b is a constant corresponding to the Y-intercept. The linear model presented in Equation 1 can be expanded to include any number of study factors and factor effects (e.g. interaction effects, simple curvature effects, nonlinear effects, etc.). A quadratic model that includes simple pairwise interaction effects ($X_i X_j$, $i \neq j$) and simple curvature effects ($X_i^2$) of two study factors ($X_1$, $X_2$) is presented in Equation 2. The quadratic model is the model underlying most commonly used statistical optimization experiment designs, also referred to as response surface designs.

$$\hat{Y} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{12} X_1 X_2 + \beta_{11} X_1^2 + \beta_{22} X_2^2 \qquad \text{(Equation 2)}$$
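As a minimal sketch of how a two-factor quadratic model of the form in Equation 2 yields a predicted response, the following Python function evaluates the model at one input combination. The function name, the dictionary layout for the coefficients, and the coefficient values are all assumptions made for illustration; the values are not fitted to any data.

```python
def predict_quadratic(x1, x2, beta):
    """Evaluate the two-factor quadratic model of Equation 2.

    beta maps the coefficient subscripts ("0", "1", "2", "12", "11", "22")
    to their values. This layout is a hypothetical convenience.
    """
    return (
        beta["0"]                # constant term
        + beta["1"] * x1         # linear effect of X1
        + beta["2"] * x2         # linear effect of X2
        + beta["12"] * x1 * x2   # pairwise interaction effect
        + beta["11"] * x1 ** 2   # curvature effect of X1
        + beta["22"] * x2 ** 2   # curvature effect of X2
    )


# Made-up coefficients, for illustration only.
example_beta = {"0": 80.0, "1": 2.0, "2": -1.0, "12": 0.5, "11": -3.0, "22": -2.0}

# Predicted response at X1 = 1.0, X2 = -1.0.
prediction = predict_quadratic(1.0, -1.0, example_beta)
```

Evaluating the same function over a grid of input levels is one way such a model could be used to search for the factor settings that most improve the response.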
The models obtained from analysis of research, development, and engineering experiments are used to obtain predicted responses at given input combinations of the study factor level settings. In this way the user can identify the study factor level settings that will most improve the responses corresponding to the prediction models. These studies are typically undertaken to meet one of two overarching improvement goals for each response:
Mean performance goal—achieve a specific target value of the response, or a response that exceeds some minimum requirement.
Process robustness goal—minimize variation in response quality attributes or performance characteristics over time.
A statistically designed experiment is the most rigorous and correct approach to obtaining accurate response prediction models from which the study factor level settings that most improve the response(s) can be determined. To illustrate, consider a batch drug synthesis process that consists of three process parameters (study factors): the process operating time per batch (run time), the mixing speed of the rotor in the material blending tank (stir rate), and the temperature maintained in the blending tank (mixing temp.). These process parameters are presented in Table 1, along with their current operating setpoint levels and appropriate study ranges for a statistical optimization experiment.
TABLE 1

| Process Control Parameter (Study Factor) | Current Process Operating Setpoint | Experiment Range Around Setpoint |
|---|---|---|
| Run time (minutes) | 40 | 30-50 |
| Stir rate (rpm) | 14 | 8-20 |
| Mixing temp. (Deg. C.) | 50 | 40-60 |
For this process the critical response is the measured amount of drug produced (% Yield). To quantitatively define the effects of changes to the three process parameters on the % yield response requires conducting a statistically designed experiment in which the parameters are varied through their ranges in a controlled fashion. A typical response surface experiment to study the three process parameters would require 17 experiment runs—eight runs that collectively represent lower bound and upper bound level setting combinations, six runs that collectively represent combinations of the lower bound and upper bound of one variable with the midpoints of the other two variables, and three repeat runs of the current process operating setpoints (range midpoints). This experiment design is shown in Table 2 (in non-random order).
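TABLE 2

| Experiment Run | Run Time (minutes) | Stir Rate (rpm) | Mixing Temp. (Deg. C.) |
|---|---|---|---|
| Run 1 | 30 | 8 | 40 |
| Run 2 | 50 | 8 | 40 |
| Run 3 | 30 | 20 | 40 |
| Run 4 | 50 | 20 | 40 |
| Run 5 | 30 | 8 | 60 |
| Run 6 | 50 | 8 | 60 |
| Run 7 | 30 | 20 | 60 |
| Run 8 | 50 | 20 | 60 |
| Run 9 | 30 | 14 | 50 |
| Run 10 | 50 | 14 | 50 |
| Run 11 | 40 | 8 | 50 |
| Run 12 | 40 | 20 | 50 |
| Run 13 | 40 | 14 | 40 |
| Run 14 | 40 | 14 | 60 |
| Run 15 | 40 | 14 | 50 |
| Run 16 | 40 | 14 | 50 |
| Run 17 | 40 | 14 | 50 |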
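The 17-run layout described above (eight corner runs, six runs with one factor at a bound and the others at their midpoints, and three repeats of the midpoint) can be generated programmatically. The sketch below is one possible construction in Python using only the standard library; the function name and the factor keys are hypothetical, and the run order simply mirrors the non-random order of Table 2.

```python
from itertools import product


def face_centered_design(factors, n_center=3):
    """Build a response surface design from {name: (low, high)} ranges.

    Returns a list of run tuples in the same factor order as `factors`.
    """
    names = list(factors)
    ranges = [factors[n] for n in names]
    mids = [(lo + hi) / 2 for lo, hi in ranges]

    # Corner runs: every low/high combination. product() varies its last
    # argument fastest, so the ranges are reversed going in and each run is
    # reversed coming out to make the first factor vary fastest.
    corners = [tuple(reversed(c)) for c in product(*reversed(ranges))]

    # Face runs: one factor at a bound, all other factors at their midpoints.
    faces = []
    for i, (lo, hi) in enumerate(ranges):
        for level in (lo, hi):
            run = list(mids)
            run[i] = level
            faces.append(tuple(run))

    # Repeat runs at the range midpoints (the current operating setpoints).
    centers = [tuple(mids)] * n_center
    return corners + faces + centers


design = face_centered_design(
    {"run_time": (30, 50), "stir_rate": (8, 20), "mixing_temp": (40, 60)}
)
```

In practice the run order would be randomized before execution; the fixed order here is only for comparison against the table.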
Assuming reasonably good data, analysis of the response surface experiment results can provide accurate quadratic prediction models for each response evaluated in the study. These models can then be used to identify the study factor level settings that will most improve the responses.
Process Robustness Goal
A statistically designed experiment is the most rigorous and correct approach to accurately defining the process robustness associated with any given measured response. To illustrate, consider again the batch drug synthesis process described previously. The process parameters are again presented in Table 3, along with their current operating setpoint levels and study ranges. However, note that in this case each study range is defined by the variation around the parameter's setpoint expected during normal operation (random error range).
TABLE 3

| Process Control Parameter (Study Factor) | Current Process Operating Setpoint | Expected Variation Around Setpoint |
|---|---|---|
| Run time (minutes) | 40 | 38-42 |
| Stir rate (rpm) | 14 | 13.5-14.5 |
| Mixing temperature (Deg. C.) | 50 | 49.5-50.5 |
To quantitatively define process robustness for the % yield response at the defined parameter setpoints (current operating setpoints in this case) requires conducting a statistically designed experiment in which the parameters are varied through their error ranges in a controlled fashion. A typical statistical experiment to define robustness for the three parameters would require 11 experiment runs—eight runs that collectively represent all combinations of the error range lower and upper bound settings, and three repeat runs of the current process operating setpoints. This experiment design is shown in Table 4 (in non-random order).
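TABLE 4

| Experiment Run | Run Time (minutes) | Stir Rate (rpm) | Mixing Temp. (Deg. C.) |
|---|---|---|---|
| Run 1 | 38 | 13.5 | 49.5 |
| Run 2 | 42 | 13.5 | 49.5 |
| Run 3 | 38 | 14.5 | 49.5 |
| Run 4 | 42 | 14.5 | 49.5 |
| Run 5 | 38 | 13.5 | 50.5 |
| Run 6 | 42 | 13.5 | 50.5 |
| Run 7 | 38 | 14.5 | 50.5 |
| Run 8 | 42 | 14.5 | 50.5 |
| Run 9 | 40 | 14 | 50 |
| Run 10 | 40 | 14 | 50 |
| Run 11 | 40 | 14 | 50 |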
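The 11-run robustness layout (all eight combinations of the error range bounds plus three repeats of the setpoints) can likewise be sketched in a few lines of Python. The function name and the idea of passing each error range as a half-width around its setpoint are assumptions made for this illustration.

```python
from itertools import product


def robustness_design(setpoints, half_widths, n_center=3):
    """Two-level full factorial over setpoint +/- half-width, plus repeats.

    `half_widths` holds the expected variation on either side of each
    setpoint (a hypothetical parameterization of the error ranges).
    """
    ranges = [(s - h, s + h) for s, h in zip(setpoints, half_widths)]

    # All lower/upper bound combinations, with the first factor varying
    # fastest to follow the non-random order used in the table.
    corners = [tuple(reversed(c)) for c in product(*reversed(ranges))]

    # Repeat runs at the operating setpoints.
    centers = [tuple(setpoints)] * n_center
    return corners + centers


# Run time 40 +/- 2 min, stir rate 14 +/- 0.5 rpm, mixing temp. 50 +/- 0.5 deg C.
runs = robustness_design(setpoints=(40, 14, 50), half_widths=(2, 0.5, 0.5))
```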
Again assuming reasonably good data, analysis of the robustness experiment can define the effect of each process parameter studied, individually and in combination, on each response evaluated. The magnitude of their cumulative effects on the response is an indirect indication of the process robustness at the one defined parameter setpoint combination.
It must be understood that analysis of the % yield data obtained from the statistically designed robustness experiment presented in Table 4 can only define the process robustness for % yield associated with the current setpoint level setting combination of the process parameters. There are two critical limitations in the information available from the experiment:
1. The experiment ranges of the variables are those expected due to random error. The ranges are therefore too small to provide the data from which an accurate prediction model of the % yield response can be developed. This same problem is inherent in the analysis of historical data sets, as will be discussed shortly.
2. The data obtained from the designed experiment cannot be used to predict what the process robustness for % yield will be at any other level setting combination of the process parameters.
To address the process robustness goal, historical data are sometimes used as an alternative to a designed experiment. However, this approach is extremely unlikely to provide an acceptable result due to the statistical inadequacy of most historical data sets. Put simply, quantitatively defining robustness requires development of accurate response prediction models. Historical data are obtained from monitoring process operation and output over time. In these data the changes to the process operating parameters during process operation are not done in a controlled fashion. Instead, the changes are due to random variation in the process parameters about their setpoints (random error). It is normally not possible to obtain accurate prediction models from such data sets due to two fundamental flaws:
1. The response variation is due to random error variation in the process parameters. Therefore the magnitudes of the response changes are small, normally in the range of measurement error, and so cannot be accurately modeled. This condition is referred to as a low signal-to-noise ratio.
2. The process parameters are varied in an uncontrolled fashion. This normally results in data sets in which the process parameters are not represented as independent. The lack of independence severely compromises the ability to develop accurate response prediction models.
Another alternative to using a designed experiment is to conduct a simulation study using Monte Carlo methods. In this approach a random variation data set is first created for each process parameter; the random setpoint combinations are then input into a mean performance model to generate a predicted response data set. The final step involves statistically characterizing the response data set distribution and using that result to define the process robustness, again at the defined parameter setpoints. This is an extremely computationally intensive approach, and it suffers from the same two limitations presented above for the statistically designed process robustness experiment.
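The three Monte Carlo steps can be illustrated with a short Python sketch using only the standard library. Everything specific here is an assumption: the mean performance model is a made-up stand-in rather than a fitted model, the parameters are assumed to vary independently and normally, and each expected variation range is assumed to span roughly three standard deviations on either side of its setpoint.

```python
import random
import statistics


def mean_performance_model(run_time, stir_rate, mixing_temp):
    """Hypothetical stand-in for a fitted % yield prediction model."""
    return (
        90.0
        - 0.02 * (run_time - 42.0) ** 2
        - 0.10 * (stir_rate - 15.0) ** 2
        - 0.05 * (mixing_temp - 52.0) ** 2
    )


def simulate_robustness(setpoints, sigmas, n_trials=10_000, seed=0):
    """Return the mean and standard deviation of the simulated responses."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    responses = []
    for _ in range(n_trials):
        # Step 1: draw a random level for each parameter about its setpoint.
        levels = [rng.gauss(mu, sd) for mu, sd in zip(setpoints, sigmas)]
        # Step 2: feed the random combination into the mean performance model.
        responses.append(mean_performance_model(*levels))
    # Step 3: characterize the distribution of the predicted responses.
    return statistics.mean(responses), statistics.stdev(responses)


mean_yield, sd_yield = simulate_robustness(
    setpoints=(40.0, 14.0, 50.0),
    sigmas=(2.0 / 3, 0.5 / 3, 0.5 / 3),  # assumed: range half-width = 3 sigma
)
```

The resulting spread describes robustness only at the one setpoint combination passed in; a different combination requires a new simulation.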
Current Practice—Sequential Experimentation
As the discussion above points out, both the statistically designed process robustness experiment and the Monte Carlo simulation approach only define the response robustness at a single set of conditions. Therefore, to meet both the mean performance goal and the process robustness goal simultaneously, a complete statistical robustness experiment must be carried out at each set of process parameter conditions (each experiment run) in the statistical optimization experiment. This would require 187 experiments (17×11). This is almost universally impractical. Therefore, the two experiment goals are normally addressed sequentially.
In the sequential approach the mean performance goal experiment is conducted first, and the response prediction models are used to define the optimum process parameter settings. The process robustness goal experiment is then carried out to define the robustness at the optimum process parameter settings. There is obviously a tremendous limitation to this approach:
When the optimum process parameter settings addressed in the process robustness experiment do not meet the robustness goal, the experimenter must start over. However, time and budget restrictions invariably do not allow for additional iterations of the sequential experiment approach. The result is that most process systems are sub-optimal in terms of robustness, and the consequence is a major cost in terms of significant process output being out of specification.
Accordingly, what is needed is a system and method to overcome the above-identified issues. The present invention addresses these needs.