Clinical laboratories perform tests for doctors and healthcare professionals. The laboratories perform tests on human blood, urine, plasma, serum or other body fluids in order to measure chemical or physical properties of the specimens. The results of these tests are used by doctors and healthcare professionals to make clinical decisions related to patient care and treatment. Because results are used to make clinical decisions for patient care, dependable test results are of the utmost importance.
Clinical laboratories purchase supplies and products in order to perform these tests. For example, blood collection tubes, needles, diagnostic instruments, chemical reagents and other supplies are used during testing, and therefore must be periodically replenished. From time to time, some element of a testing procedure may change for a variety of reasons. For example, a new blood collection tube type may replace an older version, new blood collection tubes may include a new additive, or a new blood collection tube could be made of plastic rather than glass. Chemical reagents may be ordered from a different supplier, or even a new batch of reagents could be considered a change in the testing procedure. Furthermore, the diagnostic instruments used to perform the testing themselves may change. Newer models may replace older testing equipment. Also, hardware, software and firmware updates may be applied to the equipment.
Of course, the above-described list of variables in testing procedures is merely exemplary, and the list of possible variables is endless. It is important to recognize, however, that any change in testing procedure can potentially affect test results. Therefore, because the accuracy of test results is so important, there is a need for a way to gather and analyze empirical data to show that the testing procedure using the new method, device or system does not significantly affect the testing results.
There is a certain degree of variability in any testing procedure. By analyzing test data, it is possible to measure the variability in test results. In addition, a new test procedure or method may give results that are, on average, different from those of a “reference” test procedure. This average difference is called bias. If the bias between a new test procedure and a reference method is small enough, and the variability in the results using the new procedure is no greater than the variability of the old test procedure, the new test procedure can be considered clinically equivalent to the old test procedure. There is presently specialty software on the market for evaluating and validating testing methods. However, the existing software products fall short in several respects.
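The equivalence criterion described above can be illustrated with a minimal Python sketch. The function names, the sample values, and the acceptance limit `max_abs_bias` are hypothetical and chosen only for illustration; variability is assumed here to be the standard deviation of replicate measurements on one specimen, which is one possible measure among several.

```python
from statistics import mean, stdev

def mean_bias(reference, evaluation):
    """Average difference (evaluation minus reference) over paired results."""
    if len(reference) != len(evaluation):
        raise ValueError("paired results must have the same length")
    return mean(e - r for r, e in zip(reference, evaluation))

def replicate_sd(replicates):
    """Sample standard deviation of repeated measurements of one specimen."""
    return stdev(replicates)

def clinically_equivalent(bias, sd_new, sd_ref, max_abs_bias):
    """Apply the two conditions from the text: the bias is small enough and
    the new procedure is no more variable than the reference procedure.
    max_abs_bias is an assumed, user-supplied acceptance limit."""
    return abs(bias) <= max_abs_bias and sd_new <= sd_ref

# Hypothetical paired results for the same specimens.
reference_results = [180.0, 200.0, 220.0]
evaluation_results = [182.0, 201.0, 223.0]
bias = mean_bias(reference_results, evaluation_results)

# Hypothetical replicate measurements of a single specimen by each procedure.
sd_ref = replicate_sd([100.0, 102.0, 98.0])
sd_new = replicate_sd([100.0, 101.0, 99.0])

print(bias, sd_new, sd_ref, clinically_equivalent(bias, sd_new, sd_ref, 3.0))
```

In practice the acceptance limit would be set per analyte from clinical requirements rather than fixed in code.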
Currently, most if not all clinical laboratories rely on a statistical technique called linear regression to compare testing methods, systems or products. The linear regression analysis is almost always accompanied by a graphical representation called a scatter diagram. In a scatter diagram, the results from one method, system or product are plotted against the results from the “reference” method or system on a chart, and linear regression analysis is used to determine a best-fit line on the chart to represent the data points. A perfect result on a scatter diagram would be a line having a slope of one and a vertical axis intercept of zero. Unfortunately, the degree to which the best-fit line fits the observed data depends on the number and frequency distribution of data values used. Therefore, the apparent accuracy and usefulness of the best-fit line may be manipulated by selecting individuals at either end of some analytic spectrum and including their results in the data. Thus, while scatter diagrams and linear regressions may be helpful in determining the similarity of results between a reference and evaluation method, system or product, they are not sufficient.
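For reference, the best-fit line discussed above can be computed by ordinary least squares, as in the following Python sketch. The function name `fit_line` and the sample values are hypothetical, and ordinary least squares is assumed because the text does not name a specific regression variant.

```python
def fit_line(reference, evaluation):
    """Ordinary least-squares fit of evaluation = slope * reference + intercept."""
    n = len(reference)
    if n != len(evaluation) or n < 2:
        raise ValueError("need at least two paired results")
    mean_x = sum(reference) / n
    mean_y = sum(evaluation) / n
    sxx = sum((x - mean_x) ** 2 for x in reference)
    if sxx == 0:
        raise ValueError("reference results must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(reference, evaluation))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical reference results and two hypothetical evaluation methods.
reference_results = [100.0, 150.0, 200.0, 250.0]
perfect_agreement = [100.0, 150.0, 200.0, 250.0]   # identical to the reference
constant_offset = [105.0, 155.0, 205.0, 255.0]     # reads 5 units high throughout

print(fit_line(reference_results, perfect_agreement))
print(fit_line(reference_results, constant_offset))
```

A method that agrees perfectly with the reference gives a slope of one and an intercept of zero, while a constant offset leaves the slope at one and appears in the intercept.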
A commonly used quantity calculated by existing software packages is called R², sometimes referred to as the coefficient of determination. R² can have a value between 0 and 1, and represents the degree to which a straight line fits the data, relative to the total variability observed. A value of 1 indicates that all the points fit exactly on the same line. Often, R² is seen as a measure of equivalence between the reference and evaluation methods, systems or products. Unfortunately, R² is susceptible to a priori manipulation. For example, suppose two tests designed to measure cholesterol values in human blood are to be compared. Some patients may have very high cholesterol values while others may have very low cholesterol values. If two methods for measuring cholesterol are being compared using a linear regression best-fit line, then a high value of R² may be falsely interpreted as indicating equivalence of the two methods. In fact, the high value of R² may only be indicating that there are patients included in the study whose cholesterol values are at the high and low ends of the human spectrum. Because R² is susceptible to manipulation, it is not a good quantity to be depended upon for measuring the clinical equivalence of a new test method.
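The sensitivity of R² to the range of the data can be demonstrated with a small Python sketch. All values below are invented for illustration and are not taken from any study; the function name `r_squared` is hypothetical. Every evaluation result differs from its reference result by at most 5 units in both data sets, and the only change is the addition of two specimens at the low and high ends of the range.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)

def max_abs_difference(x, y):
    """Largest per-specimen disagreement between the two methods."""
    return max(abs(b - a) for a, b in zip(x, y))

# Invented cholesterol-like results from specimens in a narrow range.
narrow_ref = [195.0, 198.0, 200.0, 202.0, 205.0]
narrow_eval = [200.0, 193.0, 205.0, 197.0, 205.0]

# The same specimens plus one very low and one very high specimen,
# each with the same size of disagreement as before.
wide_ref = narrow_ref + [120.0, 300.0]
wide_eval = narrow_eval + [125.0, 295.0]

r2_narrow = r_squared(narrow_ref, narrow_eval)
r2_wide = r_squared(wide_ref, wide_eval)

print("narrow range:", r2_narrow, max_abs_difference(narrow_ref, narrow_eval))
print("wide range:  ", r2_wide, max_abs_difference(wide_ref, wide_eval))
```

With these invented values, R² rises sharply once the two extreme specimens are included, even though the largest disagreement between the methods is unchanged.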
Still another disadvantage of current test validation methods is that they typically validate only a single test method at a time. Thus, for a testing device capable of testing 30 separate analytes, 30 separate validations must be performed using previous test validation methods. Accordingly, it would be advantageous to have a single software package which could validate all 30 testing methods at one time.
Therefore, there is a need for a test method validation system which reliably measures the accuracy and precision of a new testing method, determines whether the new testing method is clinically equivalent to a previous testing method, and is capable of validating a plurality of test methods at one time.