The execution steps in most research, development, and engineering experiments generally involve manual operations carried out on unconnected technology platforms. The scientist or engineer works in what are essentially isolated technology islands, with manual operations providing the only bridges. To illustrate, when there is a Standard Operating Practice (SOP) Guide for the experimental work, it is often an electronic document, for example in Microsoft Word. The experimental plan (Step 1) within the SOP Guide has to be transferred to the target device (instrument, instrument platform, or component module) for execution (Step 2) by manually re-keying the experiment into the device's instrument control program (ICP), which is the device's controlling application software. In a few cases the statistical analysis of results (Step 3a) can be done within the ICP, but it is most often done within a separate statistical analysis software package or spreadsheet program such as Microsoft Excel. This also requires manually transferring the results data from the ICP to the analysis software package. Reporting of results (Step 3b) is usually carried out in Microsoft Word, and therefore requires the manual transfer of all results tables and graphs from the separate statistical analysis software package. The manual operations within the general execution sequence steps are presented below. The isolated technology islands are illustrated in FIGS. 1 and 2.
FIG. 1 illustrates the manual tools and operations involved in carrying out a research and development experiment. In this work a statistical experiment design protocol is first generated, via step 12. This protocol is developed manually and off-line using non-validated tools such as Microsoft Word. The protocol then must be approved, once again manually and off-line, via step 14. When required, sample amounts are then calculated using non-validated tools such as Microsoft Excel, via step 16. Thereafter the samples are prepared, via step 18, and the experiment is run on a target device, for example a high-performance liquid chromatograph (HPLC), via step 20. Running the experiment requires manually re-constructing the statistical design within the target device's ICP. When this software does not exist, or does not allow for full instrument control, the experiment must be carried out in a fully manual mode by manually adjusting instrument settings between experiment runs.
FIG. 2 illustrates the manual tools and operations involved in analyzing the data and reporting the results of the research and development experiment, via step 22. The analysis and reporting of data is accomplished by first statistically analyzing and interpreting the experiment data, off-line, using non-validated tools such as Microsoft Excel. Next, it is determined whether or not there is a need for more experiments, possibly using off-line generic Design of Experiments (DOE) software, via step 24. Then, data are entered and a report is written, via step 26. Finally, the report is archived, via step 28. As is seen from the above, the research, development, and engineering experimentation process involves a series of activities that are currently conducted in separate “technology islands” that require manual data exchanges among the tools that are used for each activity. To date, however, no overarching automation technology exists that brings together all the individual activities under a single integrated-technology platform that is adapted to multiple devices and data systems.
Method development activities encompass the planning and experimental work involved in developing and optimizing an analytical method for its intended use. These activities are often captured in company Standard Operating Procedure (SOP) documents that may incorporate Food and Drug Administration (FDA) and International Conference on Harmonization (ICH) requirements and guidances. Method development SOP documents include a description of all aspects of the method development work for each experiment type (e.g. early phase analytical column screening, late phase method optimization, method robustness) within a framework of three general execution sequence steps: (1) experimental plan, (2) instrumental procedures, and (3) analysis and reporting of results. The individual elements within these three general steps are presented below.
Step 1: Generate Experimental Plan
    Select experiment type
    Select target instrument
    Define study variables:
        analyte concentrations
        instrument parameters
        environmental parameters
    Specify number of levels per variable
    Specify number of preparation replicates per sample
    Specify number of injections per preparation replicate
    Integrate standards
    Include system suitability injections
    Define acceptance criteria
Step 2: Construct Instrumental Procedures
    Define required transformations of the experiment plan into the native file or data formats of the instrument's controlling ICP software (construction of Sample Sets and Method Sets or Sequence and Method files).
        Specify number of injections (rows)
        Specify type of each injection (e.g., sample, standard)
Step 3: Analyze Data and Report Results
    Specify analysis calculations and report content and format
    Carry out numerical analyses
    Compare analysis results to acceptance criteria (FDA & ICH requirements)
    Specify graphs and plots that should accompany the analysis
    Construct graphs and plots
    Compile final report
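By way of illustration only, the transformation of an experimental plan (Step 1) into injection rows (Step 2) can be sketched in Python as follows. The sketch assumes a full-factorial design, which the steps above do not require. The function name, the factor names, and the level values are hypothetical and are not taken from any particular ICP or CDS.

```python
from itertools import product


def build_injection_sequence(factors, prep_replicates, injections_per_prep):
    """Expand a full-factorial plan into one row per sample injection.

    factors: dict mapping a study variable name to its list of levels.
    The full-factorial expansion is an assumption made for illustration.
    """
    names = list(factors)
    rows = []
    # One trial per combination of factor levels.
    for trial, levels in enumerate(product(*(factors[n] for n in names)), start=1):
        settings = dict(zip(names, levels))
        for prep in range(1, prep_replicates + 1):
            for injection in range(1, injections_per_prep + 1):
                rows.append({
                    "trial": trial,
                    "type": "sample",
                    "preparation": prep,
                    "injection": injection,
                    **settings,
                })
    return rows


# Hypothetical study variables and levels.
plan = {
    "column": ["C18", "Phenyl"],
    "pH": [2.5, 7.0],
    "flow_mL_min": [1.0, 1.5],
}
sequence = build_injection_sequence(plan, prep_replicates=2, injections_per_prep=3)
print(len(sequence))  # 8 trials x 2 preparations x 3 injections = 48 rows
```

Standards and system suitability injections are omitted here. In practice they would be interleaved with the sample rows according to the SOP.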
The execution steps in analytical method development generally involve manual operations carried out on unconnected technology platforms. To illustrate, an SOP Guide for the development of an HPLC analytical method is often an electronic document in Microsoft Word. The experimental plan (Step 1) within the SOP Guide has to be transferred to the HPLC instrument for execution (Step 2) by manually re-keying the experiment into the instrument platform's ICP; in the case of an HPLC this is typically referred to as a chromatography data system (CDS). In a few cases the statistical analysis of results (Step 3a) can be performed within the CDS, but it is most often carried out within a separate statistical analysis software package or spreadsheet program such as Microsoft Excel. This also requires manually transferring the results data from the CDS to the analysis software package. Reporting of results (Step 3b) is usually carried out in Microsoft Word, and therefore requires the manual transfer of all results tables and graphs from the separate statistical analysis software package. The manual operations within the three general execution sequence steps are presented below.
Step 1: Experimental Plan
    Development plan developed in Microsoft Word.
    Experimental design protocol developed in off-line DOE software.
Step 2: Instrumental Procedures
    Manually build the Sequences or Sample Sets and instrument methods in the CDS.
    Raw peak (x, y) data reduction calculations performed by the CDS (e.g. peak area, resolution, retention time, concentration).
Step 3a: Statistical Analysis
    Calculated results manually transferred from the CDS to Microsoft Excel.
    Statistical analysis usually carried out manually in Microsoft Excel.
    Some graphs generated manually in Microsoft Excel, some obtained from the CDS.
Step 3b: Reporting of Results
    Reports manually constructed from template documents in Microsoft Word.
    Graphs and plots manually integrated into report document.
Prior art systems in this area do not address the overarching problem of removing the manually intensive steps required to bridge the separate technology islands. The prior art also recognizes that inherent data loss occurs in the sampling of experimental results, that this loss affects quantitative effect estimates, and that it thereby degrades, and typically renders inaccurate, the statistical confidences derived from those results. However, the prior art is not instructive in overcoming these problems so as to improve the accuracy or analyzability of experimental results and sampling. Nor is it instructive in developing a more readily obtainable solution that overcomes inherent data loss, provides an identifiable metric for separate experimental undertakings, or provides information about resulting effects where experimental samples contain inherent data losses.
For instance, trial runs of research and development (R&D) experiments are often carried out by making changes to one or more controllable parameters of a process or system, with other factors remaining constant, and then measuring test samples obtained from in-process sampling or process output. As used herein, controllable parameters may include, but are not limited to, study factors, instrument settings, controllable parameters of instrumentation, a set of discrete process events, or other experimentation factors; the factors held constant are, as used herein, the controlled portion of an experiment or experimental run or trial. Typically, an objective of a researcher in these undertakings is to identify and quantify the effects of the parameter changes on the identified important process output quality attributes or performance characteristics that are being measured. The quantified effects can then be used to define the parameter settings that will give the desired process output results.
FIG. 3 illustrates a generalized flow diagram (300) of a process in a predetermined process flow direction (305) consisting of four discrete elements: base material input (310), key reactant input (320), heating (330), and chemical reaction (340). For the avoidance of doubt, FIG. 3 and its related embodiments are foundational to the present invention herein. The flow diagram 300 also contains a process endpoint measurement step at 350. In this generalized process 300, the base material element may have one or more controllable parameters, such as material feed rate or, where the base material is a blend of two or more components, the base material formulation.
The process 300 of FIG. 3 has a direct analogue in a chemical separation process performed by instrumentation such as an HPLC. FIG. 4A shows such an adaptation of the general process flow diagram 300 of FIG. 3 to an HPLC. In FIG. 4A, the flow diagram 400 comprises three primary HPLC process elements: solvent delivery (410), sample injection (420), and a separation chamber (430).
In FIG. 4A, method development experiments may be performed on controllable parameters within the HPLC to identify the parameter settings that are optimum for the separation of a given mixture of compounds. In such experiments, one critical performance characteristic being measured may be, for example, the degree of separation of the mixture into isolated pure individual compounds, as is further defined by the legend at 440. More particularly, in typical practical applications such as those within the pharmaceutical industry, the active pharmaceutical ingredient (API) and one or more impurities in a drug product often represent a normal mixture of compounds for which an HPLC method must be developed. As is known from practical applications under traditional methods, accurately measuring the amount of API in a test sample (or actual sample) with an HPLC requires that the instrument first separate the API from the impurities.
As used herein, the term “impurities” is defined to include, but not be limited to, components of the drug product formulation, which may also be termed excipients, and contaminants that come from various points or stages in the process or even from the product packaging of an affected product or sample. For example, an impurity may be a plastic compound from a product container that contaminates the surface of the drug tablet. By further example, a test sample may be a dissolved tablet (i.e., the solid dosage form of the drug product) that contains the API and impurities.
Therefore a critical HPLC method development experiment objective in a traditional practice application may include identifying the instrument operating conditions that separate the API from the impurities in a test mixture to the degree required (i.e., accuracy level) to accurately measure the API amount. Further in separation method development experiments, for example, some of the HPLC parameter settings used in the experiment trials can result in the inability to accurately measure a critical performance characteristic, such as compound separation. These issues are known to be a significant challenge for researchers and commercial entities alike.
The consequence of these limitations, recognized by many in the field, is inherent data loss in one or more experiment trials, which can in turn make it impossible to quantitatively analyze the experiment results and draw any meaningful conclusions.
FIG. 4B depicts an instrument hardware framework 450 associated with an HPLC instrument system. The HPLC framework 450 comprises several process elements with controllable parameters that can be experimentally addressed. The process elements include: solvent formulation and solvent pH (CVM—Solvent Switching) (451), the solvent flow rate (Pump Module) (452), the type of separation column (CVM—Column Switching) (453), a sampler (454) and a detector (455).
For FIG. 4B, a typical experiment (i.e., method development experiment) may comprise one or more trials, where a trial consists of operating the HPLC instrument at one or more predetermined settings of the study parameters, injecting a small amount of the sample mixture into the solvent stream, and measuring critical performance characteristics, such as the degree of separation of the individual sample compounds, at the endpoint of the process (detector 455).
By exemplar, objectives of experimentation under the framework of FIG. 4B in view of the process set forth in FIG. 4A, may include attempting to separate out one or more APIs from impurities. In such experiments, for example, the controllable parameters of the CVM module (451) and the Pump Module (452) may be selected for experimental study. In such experiments, CVM solvent switching parameters may be adjusted between experiment trials to deliver a solvent mix at a different pH and the results captured. In such experiments, CVM column switching parameters may also be adjusted so as to employ a different column, for example, in each experimental trial undertaken. Similarly, in such experiments, pump module parameters may be adjusted between trials to both change the rate at which the solvent formulation is changed (i.e., proportion of organic solvent increased) during a trial run and to deliver the solvent formulation at a different flow rate. However, as will become further evident, in these types of experiments, despite the objectives of experimentation including attempts to separate out one or more APIs from impurities by selecting predetermined controllable parameters for experimental study, the results can be inaccurate.
FIG. 4C depicts a graphical chromatogram representation 460 of experimental results data obtained from a particular trial in one of the experimental runs under assessment herein (e.g. trial run 11), wherein the “raw” results depicted in the figure are in the form of “absorbance peaks.” A peak typically occurs when a compound absorbs light transmitted through the solvent stream and is detected by the detector as the compound passes the detector at a given time X, wherein the baseline condition represents zero absorbance of the light.
As used herein, an “absorbance peak” or “peak” generally means a vertical spike (Y axis deviation) along a horizontal line in the graph from baseline conditions (where Y=zero) occurring at a given X axis time interval. As also used herein, a compound's “retention time” is defined as the time from injection to detection, and, in the chromatogram, this time is the X-axis value corresponding to the peak's maximum Y value.
In FIG. 4C, poorly separated peaks are apparent at 461 and 462. Interpretatively, each peak in FIG. 4C corresponds to at least one compound (i.e., the API or an impurity). It should also be readily recognized that the area under a given peak is proportional to the amount of absorbed light, which is in turn proportional to the amount of the corresponding compound in the solvent stream passing the detector at the time indicated on the X axis in the chromatogram.
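As an illustrative sketch only, the retention time and peak area described above can be computed from raw (x, y) chromatogram points as follows. The sketch assumes the points are already baseline-corrected and belong to a single isolated peak. The function names and the sample values are hypothetical, and a CDS may use a different integration method than the trapezoidal rule used here.

```python
def retention_time(times, absorbances):
    """Return the X-axis time at which the peak reaches its maximum Y value."""
    apex = max(range(len(absorbances)), key=absorbances.__getitem__)
    return times[apex]


def peak_area(times, absorbances):
    """Approximate the area under a baseline-corrected peak (trapezoidal rule)."""
    area = 0.0
    for i in range(1, len(times)):
        mean_height = 0.5 * (absorbances[i] + absorbances[i - 1])
        area += mean_height * (times[i] - times[i - 1])
    return area


# Hypothetical points for one narrow peak centered at 15 minutes.
times = [14.8, 14.9, 15.0, 15.1, 15.2]
absorbances = [0.0, 0.4, 1.0, 0.4, 0.0]

print(retention_time(times, absorbances))
print(round(peak_area(times, absorbances), 4))
```

Converting the area into an amount of compound would additionally require a calibration against standards, which is outside the scope of this sketch.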
However, translating the measured area of a given peak into an amount of the corresponding compound is typically accurate only where the peak in a chromatogram is the result of only one compound. As a result, accurately measuring the amount of an individual compound in a sample using traditional approaches is difficult and often impossible when two or more compounds pass through the detector at the same time due to lack of separation (e.g., at 461 and 462). Unfortunately, such co-elution of two or more compounds is quite common in method development experiments.
To attempt to compensate for this limitation, a primary goal of many HPLC method development experiments is to identify the instrument settings that result in a chromatogram with the following two critical characteristics: (1) an observable peak is present for each compound in the sample; and (2) each peak is separated from all other peaks (i.e., no overlap) to a degree at least minimally necessary to accurately quantify the amount of the corresponding compound in the sample. The degree of separation between a given pair of adjacent compound peaks in a chromatogram is defined herein as the “peak resolution.”
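For illustration, peak resolution can be sketched with one widely used calculation, in which the retention-time difference of two adjacent peaks is normalized by the mean of their baseline widths. The text above does not specify a formula, so the choice of this formula is an assumption, and the function name and example values are hypothetical.

```python
def peak_resolution(rt_first, width_first, rt_second, width_second):
    """Resolution of two adjacent peaks: Rs = 2 * (t2 - t1) / (w1 + w2).

    Retention times and baseline peak widths must share the same time unit.
    The formula is an assumed, commonly used definition, not one stated in
    the surrounding text.
    """
    if rt_second < rt_first:
        raise ValueError("peaks must be given in elution order")
    if width_first <= 0 or width_second <= 0:
        raise ValueError("peak widths must be positive")
    return 2.0 * (rt_second - rt_first) / (width_first + width_second)


# Hypothetical well-separated pair: 1 minute apart, each 0.5 minutes wide.
print(peak_resolution(12.0, 0.5, 13.0, 0.5))  # 2.0

# Hypothetical overlapping pair: 0.4 minutes apart, each 1.0 minute wide.
print(round(peak_resolution(22.0, 1.0, 22.4, 1.0), 3))  # 0.4
```

A threshold of about 1.5 is often cited as the point at which two peaks of similar size are separated down to the baseline, although the degree of separation actually required depends on the method.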
In a traditional approach to HPLC method development, the effect of instrument setting changes on the resolution of sample compounds is therefore typically relied on as one of the most important experiment results. As a result, traditional practice is to carry out the following steps:
    a. change one or more instrument settings, inject a sample, and obtain a resulting chromatogram;
    b. associate each peak in the chromatogram with one of the sample compounds;
    c. compute the peak resolution results for all adjacent peak (compound) pairs;
    d. determine if the compounds are sufficiently separated, as represented by the adjacent peak pair resolution data, to accurately determine the amount of each compound in the sample to the required level of precision; and
    e. repeat Steps (a)-(d) above if the compounds are not sufficiently resolved.
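Steps (a) through (e) of the traditional practice can be sketched as a simple loop. This is a sketch under stated assumptions only: `run_trial` is a hypothetical stand-in for running the instrument and assigning peaks to compounds, the resolution formula is the assumed width-normalized form, and the 1.5 threshold is an illustrative default.

```python
def develop_method(candidate_settings, run_trial, min_resolution=1.5):
    """Try settings in turn until every adjacent peak pair is resolved."""
    for settings in candidate_settings:
        # (a) change settings, inject, obtain a chromatogram;
        # (b) run_trial is assumed to return peaks already assigned to compounds.
        peaks = sorted(run_trial(settings), key=lambda p: p["rt"])
        # (c) compute resolution for all adjacent peak pairs.
        resolutions = [
            2.0 * (later["rt"] - earlier["rt"]) / (earlier["width"] + later["width"])
            for earlier, later in zip(peaks, peaks[1:])
        ]
        # (d) check whether every pair is sufficiently separated.
        if resolutions and min(resolutions) >= min_resolution:
            return settings, resolutions
        # (e) otherwise repeat with the next candidate settings.
    return None, []


def fake_run(settings):
    """Hypothetical instrument stand-in: the peak gap grows with solvent pH."""
    gap = 0.2 * settings["pH"]
    return [
        {"compound": "API", "rt": 10.0, "width": 0.5},
        {"compound": "impurity", "rt": 10.0 + gap, "width": 0.5},
    ]


candidates = [{"pH": 2.0}, {"pH": 4.0}, {"pH": 6.0}]
best, best_resolutions = develop_method(candidates, fake_run)
print(best)  # {'pH': 4.0}
```

The sketch hides the difficulty discussed in the text: step (b) is assumed to succeed, whereas in practice the peak-to-compound assignment is the step most likely to fail.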
Unfortunately, the correct assignment of the sample compounds to the chromatogram peaks as in Step (b) above is critical to accurately interpret experiment trial results in accordance with traditional practice. Such traditional practice characteristics may include current numerical analysis approaches and the like. Since, as is often the situation, current analysis and interpretation approaches target the interactions of each compound with the HPLC system elements that result from the specific chemical and structural nature of the compound, determining specifically and precisely which compound each resolution result associates with, in a given chromatogram, is effectively the only way to track the effects of instrument changes on the separation of that compound.
A further complication especially common to early HPLC method development experiments that involve analytical column and pH screening has been that it may not be readily determinable as to how many compounds are in an experimental sample, and therefore how many peaks an experimenter is to expect in a chromatogram obtained from sample analysis by HPLC. This particular complication is further illustrated by comparing FIG. 4C with FIG. 4D.
FIG. 4D is a chromatogram 470 obtained from the same sample as FIG. 4C, analyzed under different trial settings of the HPLC instrument. The chromatogram of FIG. 4D shows twelve well separated peaks along the X axis time interval of 10 to 34 minutes (see for example representative peaks at 471 and 472), whereas an uncertain or undefinable number of peaks exists in this same interval in FIG. 4C (see for example representative points at 461 and 462).
However, additional complications can result even where the number and identity of all compounds in a test sample are known. Such knowledge does not necessarily simplify the work of correctly associating each peak with a sample compound in each trial chromatogram, since instrument changes between trials can affect both peak shape (i.e., broad-flat versus narrow-spiked) and the column transit time of the corresponding compound (i.e., peak retention time).
For example, for a particular experimental trial, a peak arising in a resulting chromatogram corresponding to a given compound may occur at 15 minutes and appear narrow and spiked. In a second trial with different instrument settings, the peak corresponding to the same compound may occur at 12 minutes and may appear as being broad and flat. Contradistinctively, a third trial's settings may cause a second peak to also occur at the 12 minute location in the chromatogram resulting in a combined peak that differs greatly in shape and area from the others. By further example, in FIG. 4C at 461, overlapping peaks corresponding to incompletely separated compounds can be seen, and again at 462, while peaks with the same or very similar shape and area in FIG. 4D occur at approximately 22, 23, and 24 minutes (473, 474, and 475 respectively).
Exemplary Experimental Data
FIG. 13 is a table that presents a data set from an experiment to develop an HPLC method for a drug product sample containing two APIs and several impurities. In the data set of FIG. 13 the peak resolution responses are used directly in data analysis according to the current practice (i.e., traditional) approach. As used herein, it is understood that the standard calculation of resolution for a given compound represents the normalized distance (i.e., degree of separation) of the compound's peak from the peak directly in front of it in the solvent stream, which corresponds to the peak directly to the left of the subject peak in the chromatogram, since that peak has an earlier X-axis time point. Therefore, for example, in the data set presented in FIG. 13, the “3—Resolution” column response represents the degree of separation of Compound 3 from Compound 2 (where Compound 2 is the compound directly ahead of it in the solvent stream). Similarly, the “4—Resolution” column response represents the degree of separation of Compound 4 from Compound 3, and the remaining columns of FIG. 13 are similarly defined.
As is apparent from FIG. 13, numerous resolution values are absent from the data set for Compounds 3 and 4a, two impurities that must be separable from Compounds 4 and 5, the two APIs in this drug product sample. The trials in which the resolution values for these impurities are missing correspond to instrument settings that were unable to separate the impurities from the APIs. For Compound 4a, this can be seen by comparing the chromatograms in FIGS. 4E and 4F, which correspond to the results obtained from two distinct experiment trials, 11 and 12 respectively, as identified in FIG. 13. FIG. 4E is a chromatogram 479 resulting from an experiment run of trial 11, in which there is no peak corresponding to Compound 4a. FIG. 4F is a chromatogram 485 resulting from an experiment run of trial 12.
FIG. 4G is a chromatogram 490 resulting from an experiment run of trial 22. A comparison of FIGS. 4F and 4G illustrates an entirely different kind of inherent data loss that also severely compromises the current practice approach. Both trials represent instrument conditions in which Compound 4a is resolved. However, the resolution result in trial 12 (FIG. 4F) is a measure of the separation of Compound 4a from Compound 3, while in trial 22 (FIG. 4G) the result is a measure of the separation of Compound 4a from Compound 5 (in part due to Compounds 3 and 5 overlapping in this particular trial).
Unfortunately, this type of resulting change in what the data represent across trials, which represents inherent loss in terms of information content of the data, is a common consequence of the change in peak locations in response to the changing instrument settings, and represents a challenging problem.
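The two kinds of inherent data loss described above, missing values and a changing reference compound, can be illustrated with a short sketch. All retention times and widths below are invented for illustration and are not taken from FIG. 13. The function name is hypothetical, and the resolution formula is the assumed width-normalized form.

```python
def resolution_vs_preceding_peak(peaks):
    """Map each compound to (resolution, reference compound) for one trial.

    peaks: list of dicts with 'compounds' (names sharing one peak), 'rt',
    and 'width'. A compound that co-elutes with another, or that elutes
    first, gets (None, None), i.e. a missing value in the data set.
    """
    ordered = sorted(peaks, key=lambda p: p["rt"])
    table = {}
    for i, peak in enumerate(ordered):
        for name in peak["compounds"]:
            if len(peak["compounds"]) > 1 or i == 0:
                table[name] = (None, None)
            else:
                prev = ordered[i - 1]
                rs = 2.0 * (peak["rt"] - prev["rt"]) / (peak["width"] + prev["width"])
                table[name] = (rs, "+".join(prev["compounds"]))
    return table


# Hypothetical trials, not taken from FIG. 13.
trial_a = [{"compounds": ["3"], "rt": 20.0, "width": 0.5},
           {"compounds": ["4a"], "rt": 21.0, "width": 0.5},
           {"compounds": ["5"], "rt": 23.0, "width": 0.5}]
trial_b = [{"compounds": ["3", "5"], "rt": 20.0, "width": 0.8},
           {"compounds": ["4a"], "rt": 21.0, "width": 0.5}]
trial_c = [{"compounds": ["3"], "rt": 20.0, "width": 0.5},
           {"compounds": ["4", "4a"], "rt": 22.0, "width": 0.9}]

for label, trial in (("A", trial_a), ("B", trial_b), ("C", trial_c)):
    rs, reference = resolution_vs_preceding_peak(trial)["4a"]
    print(label, rs, reference)
```

In trial A the value for Compound 4a is measured against Compound 3, in trial B against a merged peak containing Compounds 3 and 5, and in trial C it is missing altogether. A single column of such values therefore does not describe one consistent quantity across trials.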
The result of inherent data loss in HPLC method development experimental work is that the data typically do not accurately represent a compound's actual chemistry-based behavior and, as a consequence, cast doubt on the legitimacy of any analysis and the accuracy of any interpretation of the results. This impact on the integrity of the results is further observable via regression analysis (equation-fitting) of the Compound 4a data, the results of which are set forth in FIG. 14. FIG. 14 presents the regression statistics for Compound 4a.
For instance, the adjusted R² (R²-Adj.; see “Adj. R Square” in FIG. 14) is the critical measure of equation predictive accuracy. The value of 0.0639 shown in FIG. 14 is not statistically different from zero, meaning that the equation has questionable, if any, predictive accuracy. However, the study parameters included Column Type (e.g., two very different columns) and a wide range of Final % Organic (i.e., the gradient endpoint percent organic solvent), two instrument parameters known to greatly affect compound separation under almost all conditions. Additionally, the observed changes in the resolution data across trials are substantially greater than can be accounted for by HPLC operating error. The near-zero value therefore cannot be attributed to an absence of real effects or to measurement noise alone.
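For reference, the adjusted R² statistic is conventionally obtained from the ordinary R² by penalizing for the number of fitted terms. The sketch below uses this standard formula. The function name and the example inputs are hypothetical and do not reproduce the FIG. 14 analysis, whose number of observations and model terms are not stated here.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    n_obs is the number of observations (n) and n_predictors the number of
    model terms excluding the intercept (p).
    """
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than model terms")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


# Hypothetical example: R^2 of 0.5 from 11 observations and 2 predictors.
print(adjusted_r_squared(0.5, n_obs=11, n_predictors=2))  # 0.375
```

Because of the penalty term, the adjusted value is always at or below the ordinary R², and it can fall to zero or below when the fitted terms explain little more than chance would.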
Therefore, it can readily be determined that inherent data loss is often the cause of the inability to derive statistically valid results from numerical analysis of current practice data.
The problems described here are systemic to current HPLC method development experiment practice. In part, the complications and limitations arise because the traditional approach starts the method development process by studying the factors known or expected to have the greatest effect on peak shape and compound retention time, and therefore on peak separation. However, these are precisely the changes that make correct compound assignments between trials extremely difficult. As a result, critical information sought from the experiment is normally not readily available due to the limitations inherent in the practice itself.