The Food and Drug Administration (“FDA”) sets guidelines for testing methods used in the pharmaceutical and related industries. The FDA evaluates whether a particular analytical method is suitable for its intended purpose. Once the method is established as suitable, it is “validated.”
In order to validate a testing method, the FDA requires an applicant to evaluate many different characteristics of the method. Although not every characteristic must be demonstrated in each case, the linearity of the relationship between the actual analyte concentration and the test result from the method must be demonstrated for all quantitative methods.
Linearity is independent of the technology used to ascertain the analyte concentration. Even the most modern instrumental methods, which rely on multivariate chemometric computation, must ultimately produce a single number that represents the final answer for the analyte; that number is the test result from the instrument. The term “linearity” therefore applies to all types of analytical methodology, from manual wet chemistry to the latest high-tech instrument.
The FDA guidelines provide various definitions of the term “linearity”. One definition is: “. . . ability (within a given range) to obtain test results which are directly proportional to the concentration (amount) of analyte in the sample.” This definition is essentially unattainable in practice once noise and error are taken into account, as FIG. 1 illustrates. FIG. 1 shows a set of hypothetical data points that most would agree represents a substantially linear relationship between a test result and an analyte concentration. However, even though there is a line that meets the criterion that “test results are directly proportional to the concentration of analyte in the sample”, none of the data points actually falls on that line. Under a strict reading of the FDA definition, therefore, none of the test results can be said to be proportional to the analyte concentration.
The guidelines also describe linearity in other ways. One recommendation is visual examination of a plot (unspecified, but presumably of the method response versus the analyte concentration). Because this approach relies on visual inspection, it is inherently subjective and not amenable to statistical testing, so an objective mathematical evaluation is unattainable. It is also open to differing interpretations and is unsuitable for computerized or automated screening.
A further recommendation in the guidelines is to use “statistical methods”, specifically the calculation of a linear regression line. This, however, is not so much a definition of linearity as an attempt to evaluate it. If regression is performed, the correlation coefficient, slope, y-intercept and residual sum of squares are determined, but the guidelines do not say how these quantities are to be related to linearity. F. J. Anscombe (Amer. Stat. 27, pp. 17-21 (1973)) presents several synthetic data sets, each fitted to a straight line by Least Squares regression. One data set is substantially linear, while another is clearly non-linear. Yet when linear regression is performed on these data sets as the guidelines recommend, all of the recommended regression statistics are essentially identical (they agree to two or three significant figures). Because the regression results are the same for both cases, linear regression alone cannot distinguish between them.
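A minimal sketch of the point, assuming Anscombe's published values for his data sets I and II (set I roughly linear with scatter, set II a smooth curve). The helper name `simple_regression` is hypothetical; it computes the four statistics the guidelines mention for an ordinary least-squares straight line.

```python
# Anscombe's data sets I and II (Anscombe, 1973); both share the same x values.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y_I = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y_II = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def simple_regression(x, y):
    """Ordinary least-squares straight line y = intercept + slope * x.

    Returns (slope, intercept, correlation coefficient, residual sum of squares).
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return slope, intercept, r, rss


stats_I = simple_regression(X, Y_I)
stats_II = simple_regression(X, Y_II)
for name, s in (("I", stats_I), ("II", stats_II)):
    print(name, "slope=%.3f intercept=%.2f r=%.3f RSS=%.2f" % s)
```

Both lines print a slope of about 0.500, an intercept of about 3.00 and r of about 0.816, with residual sums of squares that differ only in the second decimal place, even though a plot of set II is plainly curved.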
Other linearity tests exist in addition to those in the official guidelines. One is the Durbin-Watson (“DW”) statistic, proposed as a statistically based test for evaluating linearity. However, it has been determined that the DW statistic gives unsatisfactory results. For residuals from regression that are random, independent, normally distributed and represent a linear relation between two variables, DW has an expected value of approximately two (see Draper, N., Smith, H., “Applied Regression Analysis”, 3rd ed., John Wiley and Sons, New York (1998), pp. 180-185). A fatal flaw in using DW for this purpose can be shown by calculating it for the data sequence 0, 1, 0, −1, 0, 1, 0, −1, 0, 1, 0, −1, . . ., which also yields a value approaching two, even though this sequence is non-random, non-independent, not normally distributed and not linear. Sets of residuals showing similar cyclic behavior likewise yield DW values that erroneously indicate satisfactory behavior of the residuals.
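The cyclic counterexample can be checked with a few lines of Python. This sketch uses the standard DW definition (sum of squared successive differences of the residuals divided by their sum of squares); the function name `durbin_watson` is our own, not a library call. For this pattern every successive difference has magnitude one while only half the terms are non-zero, so DW = 2(n − 1)/n, which tends to two as the sequence grows.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# The cyclic, clearly non-random sequence 0, 1, 0, -1 repeated 100 times (n = 400).
cyclic = [0, 1, 0, -1] * 100
dw = durbin_watson(cyclic)
print(round(dw, 3))  # prints 1.995, i.e. 2 * 399 / 400
```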
Another test is a statistical F-test, which compares sample estimates to the overall error of the analysis. This test is undesirable because it is both insensitive and non-specific. For instance, any bias in the estimates of the concentration will inflate the F-value, which will be taken as an indicator of non-linearity when some other phenomenon may be affecting the data. Furthermore, it requires multiple readings of every sample by both the method under test and the method used to determine the actual analyte concentration, which makes it impractical for routine use and inapplicable to already existing data.
Still another method is disclosed by Hald, A., “Statistical Theory with Engineering Applications”, John Wiley and Sons, Inc., New York (1952). Hald recommends testing whether the residuals are normally distributed, since they are unlikely to be normally distributed if there is appreciable non-linearity in the relationship between concentration and test results. However, this test is again insensitive to actual non-linearity (especially for small numbers of samples), and it suffers from the same difficulty as the F-test: other types of problems with the data may be erroneously attributed to non-linearity.
None of the above-mentioned methods is completely satisfactory for use in the pharmaceutical and related industries. In fact, the official guidelines for evaluating linearity, both their definitions and their recommended method(s) for assessing it, are themselves not suitable for their intended purpose.
Therefore, what is desired is a new method for reliably testing the linearity of data.
It is further desired to present the statistical results that the current FDA test procedure recommends in a context that makes those statistics more meaningful.
It is further desired to provide the derivation of the new method for evaluating data, together with details of its operation.
It is also desired to disclose a report on the ability of the new method to test linearity by applying it to data from a real analytical method.
It is further desired to disclose a report on the ability of the new method to test linearity of Near Infra-Red (“NIR”) spectroscopic analysis using diffuse transmittance measurements.
It is still further desired to disclose a report on the ability of the new method to test linearity of NIR spectroscopic analysis using diffuse reflectance measurements.
We propose a method of detecting non-linearity (or demonstrating linearity) that, like the current approach, fits a straight line to the data and assesses the fit. As shown above, however, the currently recommended way of assessing the fit is faulty because it cannot distinguish linear from non-linear data.
Examination of the FDA requirements for evaluating the linearity of an analytical method reveals them to be unsatisfactory, both in the definition of linearity and in the specifications for testing this property of an analytical method. Therefore, we first define linearity as follows: linear data is “data where the relationship between analyte concentrations and test results can be fitted (in the Least-Squares sense) as well by a straight line as by any other function.”
At first glance, the proposed definition of linearity may seem similar to the FDA approach. The difference is that the new method fits other functions to the data and compares the fits, whereas the FDA guidelines specify fitting only a straight line. The new method is also consistent with the proposed definition: functions other than a straight line are fitted to the data, and if none of them gives an improved fit, the data are concluded to be linear.
It is possible to fit other functions to a set of data using least-squares mathematics. In fact, the Savitzky-Golay (“S-G”) algorithm is based on fitting polynomials to data. The new method differs from S-G, however: whereas S-G fits a polynomial to small sections of the data, a few points at a time, we fit the polynomial to the entire data set simultaneously.
For the new method, the data are assumed to be univariate and to follow the form of some mathematical function, although the nature of that function may be unknown. By Taylor’s theorem, however, any sufficiently smooth function may be approximated by a polynomial, although the required degree of the polynomial may also not be known a priori (the “degree” of a polynomial being the highest power to which the variable is raised in that polynomial).
Based on the foregoing, we do not need to approximate the relationship between test results and analyte concentration as accurately as possible; we need only ascertain whether a straight line fits the data as well as a polynomial does. It has been determined that polynomials of high degree are not necessary for this purpose.
Accordingly, in one advantageous embodiment of the present invention, a method for determining the linearity of data points is provided comprising the steps of ascertaining an actual concentration (Y) of a sample of an analyte and measuring the sample to generate a result (X). The method further comprises the steps of computing a value of Z from the following formula:

$$Z = \frac{\sum_{i=1}^{N} X_i^{2}\,(X_i - \bar{X})}{2\sum_{i=1}^{N} X_i\,(X_i - \bar{X})}$$

and computing a new variable (X − Z)² from each value of X. The method still further comprises the steps of regressing X and (X − Z)² against Y to generate coefficients having t-values, and evaluating the t-values of the coefficients of X and (X − Z)² to determine if the linear term exceeds a threshold value, and thereby whether non-linearity exists.
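A short derivation, which is our reading rather than something the text spells out, shows what this choice of Z accomplishes. Expanding the cross-product between the mean-centred X and the new variable gives

$$\sum_{i=1}^{N}(X_i-\bar{X})(X_i-Z)^2 = \sum_{i=1}^{N}(X_i-\bar{X})X_i^{2} - 2Z\sum_{i=1}^{N}(X_i-\bar{X})X_i + Z^{2}\sum_{i=1}^{N}(X_i-\bar{X})$$

The last sum is zero, because deviations from the mean always sum to zero. The cross-product therefore vanishes exactly when

$$Z = \frac{\sum_{i=1}^{N} X_i^{2}\,(X_i - \bar{X})}{2\sum_{i=1}^{N} X_i\,(X_i - \bar{X})}$$

which is the formula for Z. With this Z, the new variable (X − Z)² is uncorrelated with X across the data set, so adding it to the regression leaves the coefficient of X unchanged, and the coefficient of (X − Z)² isolates the curvature in the data.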
In another advantageous embodiment, a method for determining the linearity of data points is provided comprising the steps of ascertaining an actual concentration (Y) of a sample of an analyte and measuring the sample to generate a result (X). The method further comprises the steps of computing Z from the following formula:

$$Z = \frac{\sum_{i=1}^{N} X_i^{2}\,(X_i - \bar{X})}{2\sum_{i=1}^{N} X_i\,(X_i - \bar{X})}$$

and computing a new variable (X − Z)² from each value of X. The method still further comprises the steps of regressing X and (X − Z)² against Y as a multiple regression analysis utilizing an Inverse Least Squares algorithm to generate coefficients having t-values, and evaluating the t-values of the coefficients of X and (X − Z)² to determine if the linear term exceeds a threshold value, and thereby whether non-linearity exists.
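A minimal Python sketch of the Z-based regression, under stated assumptions. We read “regressing X and (X − Z)² against Y” as Y being the dependent variable and X and (X − Z)² the independent variables, which is how Inverse Least Squares is usually set up in chemometrics (concentration expressed as a function of the measurements); we also include an intercept. Both readings are our assumptions. The function names `compute_z` and `linearity_t_values` are hypothetical, and Anscombe's data sets I and II stand in for real concentration and test-result data. Because Z makes the centred (X − Z)² column orthogonal to the centred X column, each coefficient and its standard error can be computed on its own, without a general matrix solver.

```python
import math

# Anscombe (1973) data sets I and II, used only as stand-in data:
# X plays the role of the test result, Y the actual concentration.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y_I = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y_II = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def compute_z(x):
    """Z = sum X_i^2 (X_i - mean) / (2 * sum X_i (X_i - mean))."""
    mean_x = sum(x) / len(x)
    num = sum(xi ** 2 * (xi - mean_x) for xi in x)
    den = 2 * sum(xi * (xi - mean_x) for xi in x)
    return num / den


def linearity_t_values(x, y):
    """Least-squares fit of Y on an intercept, X and (X - Z)^2.

    Returns (t_linear, t_quadratic). Needs at least 4 points, at least
    3 distinct X values and a non-zero residual sum of squares.
    """
    n = len(x)
    z = compute_z(x)
    q = [(xi - z) ** 2 for xi in x]
    mean_x, mean_q, mean_y = sum(x) / n, sum(q) / n, sum(y) / n
    xc = [xi - mean_x for xi in x]
    qc = [qi - mean_q for qi in q]
    yc = [yi - mean_y for yi in y]
    sxx = sum(v * v for v in xc)
    sqq = sum(v * v for v in qc)
    # Centred columns are orthogonal, so each slope is a simple ratio.
    b_lin = sum(a * b for a, b in zip(xc, yc)) / sxx
    b_quad = sum(a * b for a, b in zip(qc, yc)) / sqq
    # Residuals of the full model (the intercept is absorbed by centring).
    rss = sum((yy - b_lin * a - b_quad * b) ** 2 for a, b, yy in zip(xc, qc, yc))
    s2 = rss / (n - 3)  # residual variance with 3 fitted parameters
    t_lin = b_lin / math.sqrt(s2 / sxx)
    t_quad = b_quad / math.sqrt(s2 / sqq)
    return t_lin, t_quad


for name, y in (("I", Y_I), ("II", Y_II)):
    t_lin, t_quad = linearity_t_values(X, y)
    print(f"set {name}: t(X) = {t_lin:.2f}, t((X - Z)^2) = {t_quad:.2f}")
```

The sketch reports both t-values and leaves the decision rule open. Under one common reading (again an assumption), an absolute t-value for the (X − Z)² coefficient above the two-sided 5% critical value of Student's t with N − 3 degrees of freedom (2.306 for N = 11) is taken as evidence of non-linearity; set I falls well below that value and set II far above it, unlike the straight-line statistics, which cannot tell the two sets apart.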
In still another advantageous embodiment a method for determining the linearity of data points is provided comprising the steps of determining an actual concentration of a sample of an analyte and generating concentration data and testing the sample and generating test data. The method further comprises the steps of comparing the concentration data to the test data to generate related data, fitting a non-linear function to the related data, and fitting a straight line to the related data. Finally, the method further comprises the step of determining whether the straight line fits the related data as well as the non-linear function and concluding that the related data is linear when the straight line fits the related data as well as the non-linear function.
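The more general embodiment can be sketched as follows, with two assumptions we state plainly. First, the non-linear function is taken to be a quadratic fitted by least squares to the entire data set. Second, because the text does not fix a criterion for “fits as well as”, we swap in the standard extra-sum-of-squares F comparison for nested least-squares models, which needs no replicate readings. For a single added term this F equals the square of that term's t-value, so it agrees with the t-value test of the preceding embodiment. The names `solve`, `polyfit_rss` and `extra_ss_f` are hypothetical helpers, and Anscombe's data again serve as stand-in data.

```python
# Anscombe (1973) data sets I and II as stand-in data.
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y_I = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y_II = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def solve(a, b):
    """Solve the small linear system a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        coef[r] = (m[r][n] - sum(m[r][c] * coef[c] for c in range(r + 1, n))) / m[r][r]
    return coef


def polyfit_rss(x, y, degree):
    """Fit a polynomial of the given degree to the whole data set; return its RSS."""
    mean_x = sum(x) / len(x)
    # Centre x before forming powers to keep the normal equations well conditioned.
    rows = [[(xi - mean_x) ** k for k in range(degree + 1)] for xi in x]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(degree + 1)] for i in range(degree + 1)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(degree + 1)]
    coef = solve(ata, aty)
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2 for r, yi in zip(rows, y))


def extra_ss_f(x, y):
    """Extra-sum-of-squares F for adding a quadratic term to a straight line."""
    n = len(x)
    rss_line = polyfit_rss(x, y, 1)
    rss_quad = polyfit_rss(x, y, 2)
    return (rss_line - rss_quad) / (rss_quad / (n - 3))


for name, y in (("I", Y_I), ("II", Y_II)):
    print(f"set {name}: F = {extra_ss_f(X, y):.2f}")
```

If F stays below the chosen critical value (for example the 5% point of F with 1 and N − 3 degrees of freedom, about 5.32 for N = 11), the straight line is judged to fit as well as the quadratic and the data are treated as linear.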
The invention and its particular features and advantages will become more apparent from the following detailed description considered with reference to the accompanying drawings.