1. Field of the Invention
The present invention relates, in general, to process control and, more particularly, to the analysis of process data in pharmaceutical and other capital-intensive manufacturing processes whose batch data are distributed across a plurality of disparate databases, in order to identify and demonstrate interactions of process variables that are likely to strongly affect desired outcomes.
2. Relevant Background
Great effort is expended to control manufacturing processes so as to avoid process parameter combinations that result in unsatisfactory product and to increase the likelihood of producing superior product. Nowhere is this more true than in the manufacture of pharmaceuticals, foods and food supplements, and health care products, and in capital-intensive manufacturing processes in general. Defective products increase the risk to consumers, increase costs, and waste resources that could otherwise be applied to making cost-efficient, effective drugs and other products.
There is a class of data analysis problems, frequently encountered in manufacturing, that arises when a specific sequence or combination of events or process variables causes an undesirable outcome, usually unsatisfactory product. In these situations, it is desirable to find the variables, and the ranges of their values, that in combination are associated with the undesirable outcome. Traditional statistical methods are limited in their ability to provide this information. It would therefore be desirable to have a method that solves these types of data analysis problems, and software implementing the method on a computer that process operators, supervisors and engineers can use.
Principal component regression (PCR) techniques promise more effective analysis of manufacturing data. PCR subjects the process data to a two-step procedure, Principal Component Analysis (PCA) followed by regression, and has proven highly valuable for focusing attention on those controllable process parameters most likely to affect manufacturing outcomes. Most large data sets, such as those produced during pharmaceutical and other capital-intensive manufacturing, have properties that prevent effective modeling from the raw process data: the variables are usually numerous, and the independent variables are usually highly correlated with one another. PCA turns these attributes to advantage by transforming the data so that the large number of correlated variables is replaced by a much smaller set of uncorrelated variables. The new, smaller set of variables (called "principal components") contains substantially the same information as the larger set of raw data, to within measurement error.
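The two-step procedure described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming NumPy and a purely linear model; the helper names `pcr_fit` and `pcr_predict`, the choice of two components, and the synthetic data are hypothetical and are not taken from the invention. The principal components are computed here from the singular value decomposition of the mean-centered data, which is one standard way to obtain them.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression (hypothetical helper).

    Step 1 (PCA): center X and keep its leading principal directions.
    Step 2 (regression): least-squares fit of y on the component scores.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions; s holds the singular values.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T          # shape (n_vars, k)
    scores = Xc @ loadings                  # k uncorrelated replacement variables
    # Fraction of total variance retained by the k components.
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    beta = loadings @ gamma                 # coefficients on the original variables
    return {"x_mean": x_mean, "y_mean": y_mean, "beta": beta,
            "scores": scores, "explained": explained}

def pcr_predict(model, X):
    """Predict the response for rows of X using a fitted PCR model."""
    return model["y_mean"] + (X - model["x_mean"]) @ model["beta"]

# Synthetic example: 10 process variables driven by 2 hidden factors,
# so the raw variables are numerous and strongly correlated.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(200, 10))
y = latent[:, 0] - 2.0 * latent[:, 1] + 0.01 * rng.normal(size=200)

model = pcr_fit(X, y, n_components=2)
rmse = np.sqrt(np.mean((pcr_predict(model, X) - y) ** 2))
print(f"variance retained: {model['explained']:.4f}, RMSE: {rmse:.4f}")
```

Here ten correlated variables are reduced to two uncorrelated scores that retain nearly all of the variance, and the regression on those scores still predicts the response closely. In practice the number of retained components would typically be chosen from the retained-variance fraction or by cross-validation rather than fixed in advance.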
Traditional statistical software tools are available to apply statistical process control techniques to process data. Examples include SAS, S-PLUS, StatServer, Statistica, Matlab, Impromptu, Mathematica and JMP. Several of these packages have good 2D display capabilities for standard trend and bar charts. However, each requires some command-line programming to perform PCR, and none supports robust, flexible visualization. These packages are generally difficult to learn because they include a comprehensive set of statistical techniques, most of which must be sidestepped to perform PCR. They lack sufficiently powerful, flexible and intuitive graphic display capabilities for visual pattern recognition, and most have limited ability to handle the very large data sets common in pharmaceutical and other capital-intensive manufacturing environments. In summary, they are limited in their ability to provide useful point-and-click workflows tailored to the requirements of pharmaceutical and other capital-intensive manufacturers.
Moreover, available methods have deficiencies at both the front end and the back end. On the front end, it is often difficult to access the process data needed for meaningful statistical analysis. On the back end, tools to format and display the results of statistical analysis in a readily discernible form are lacking.
Access to relevant process data is impeded because process data is gathered and stored in a variety of disparate data storage systems. Hardware for gathering and storing data has become steadily less expensive and more widely available in recent years, and manufacturing companies have installed a very large number of measurement systems that gather and store vast amounts of raw batch data. These raw data have little utility on their own, yet they can be a strategic asset for manufacturing process improvement, troubleshooting and control. Their value can be realized only once the information content is extracted and used for decision-making. This is a widespread problem in the pharmaceutical industry, where the lack of a well-integrated data analysis and visualization system that scientific professionals who are not programmers can readily use hampers their ability to extract information from manufacturing data. There is a need for systems to assist with lost-batch avoidance, process improvement, troubleshooting and technology transfer.
Process data is often stored in legacy systems as well as in modern database architectures, and is often distributed across a variety of hardware, sometimes at different geographical locations. In practice, an analyst wishing to gather a particular set of data must request it from each of these sources and then condition, reformat, import and export it so that it is compatible with the analysis tools. These steps delay the analysis and limit the variety of analyses that can be employed. Unfortunately, relevant data is often left out of the analysis because of the difficulty and delay in obtaining it.
On the back end, statistical analysis tools often provide only raw numeric output or, at best, two-dimensional and sometimes three-dimensional representations of various process parameters. These static representations are useful, but they limit the analyst's ability to detect and demonstrate the process interactions that affect product outcome. Particularly in regulated industries such as the pharmaceutical, food and food supplement industries, there is a need to demonstrate the process control techniques used and the results provided by statistical process control. There is also a need for process analysis tools that enable an analyst to readily access relevant process data and visualize the results of statistical analyses.
There are some very capable three-dimensional and four-dimensional graphics applications on the market, for example PV-WAVE and Advanced Visual Systems (AVS). These applications can quickly draw a wide range of sophisticated displays, and they can handle the large data sets common in pharmaceutical manufacturing. However, they lack the needed range of statistical capabilities, have steep learning curves (command-line programming in the case of PV-WAVE, object configuration in the case of AVS), or both.
Regulated industries have unique needs for retrospective analysis of batch process data, demanding powerful capabilities for statistical analysis, pattern recognition and data visualization that currently available systems do not satisfy. These industries need continuous process improvement as well as the ability to demonstrate process equivalence, to make demonstrations related to product lot release, and to avoid product specification failures.