There are a number of applications in which a widely dispersed array of laboratory instruments, sometimes of different kinds, is used to perform the same quantitative tests on unknown materials. In theory, a given unknown sample should produce the same results when analyzed on any system. To keep such systems in calibration, they are typically operated to produce calibration control data periodically, such as once per day.
Quality monitoring systems have been available for collecting the control data (calibration data on identical control samples) from a large number of instruments and processing it to rate the performance of each instrument with respect to the performance of the group as a whole. This procedure is widely employed in laboratories which analyze test samples for medical purposes. Hospitals collect large quantities of samples of various types and direct them to the hospital laboratory, where they are automatically analyzed in special purpose laboratory instruments. For example, such measurements are made on triglycerides, glucose, cholesterol, total protein, and liver enzymes, to name just a few. At least once per day, and sometimes once per shift, a set of known control samples, taken from a lot obtained from a central standard agency, is processed in the instrument and a set of control data measurements is obtained. Usually, the standards are at 2 or 3 points, a typical example utilizing 3 points for low, medium, and high values of, for example, triglycerides. The control readings are stored and later dispatched to an organization which collects like data from a large number of (preferably all) laboratories making the same tests and using the same control materials. The data is then processed utilizing certain statistical techniques to determine performance ratings for the laboratories. Typically, the data from all of the laboratories is averaged or otherwise combined to provide a peer group mean, and the control data from each laboratory is evaluated against the peer group mean to determine the performance rating for that laboratory. But, considering that there might be three control levels for each laboratory, and that each period may encompass multiple days (or shifts) of operation, it is not a simple matter to provide an easily understandable and reliable rating to advise a laboratory of how it is performing with respect to its peers.
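The core of the evaluation described above can be sketched briefly. The function name and the use of a standard-deviation-based index below are illustrative assumptions, not a procedure specified by this text; the sketch simply shows one laboratory's control readings being compared against a pooled peer group mean.

```python
from statistics import mean, stdev

def peer_group_rating(lab_readings, all_lab_readings):
    """Rate one laboratory's control readings for a single control
    level against the peer group (illustrative sketch).

    lab_readings: control measurements from one laboratory.
    all_lab_readings: the same control level's readings pooled from
    every participating laboratory.
    """
    peer_mean = mean(all_lab_readings)   # peer group mean
    peer_sd = stdev(all_lab_readings)    # peer group standard deviation
    lab_mean = mean(lab_readings)
    # Express the lab's deviation in units of the peer group SD:
    # how many peer SDs the lab mean sits from the peer group mean.
    return (lab_mean - peer_mean) / peer_sd
```

In practice such a rating would be computed per test and per control level (low, medium, high), and the per-level results combined into an overall score.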
An additional problem with such techniques is that the quality of the peer group readings is sometimes compromised by utilizing all of the control data from all of the participating laboratories in setting the standard for the peer group determination. As a result, the standard might be skewed by the inclusion of a number of badly performing laboratories, and the standard deviation (SD) from the average may be widened, giving a false sense of "goodness" to a marginally performing laboratory.
Attempts are sometimes made to minimize the effect of outliers (data points significantly different from the majority of readings), particularly in the computation of target values for the peer group. However, this is typically attempted by manual editing of the data, intended to identify readings containing outliers and remove them from consideration. This not only introduces an aspect of subjective variability into the overall process, but also introduces additional effort and further delay into the rating process.
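The manual editing described above can be contrasted with an automatic exclusion rule. The sketch below is one common approach (not one prescribed by this text): drop readings more than a fixed number of peer SDs from the pooled mean before recomputing the target value. The cutoff `k=2.0` is an assumed value for illustration.

```python
from statistics import mean, stdev

def trimmed_peer_mean(readings, k=2.0):
    """One round of automatic outlier exclusion (illustrative):
    drop readings more than k sample SDs from the mean, then
    recompute the peer target value from the surviving readings.
    The cutoff k=2.0 is an assumption, not from the source text."""
    m, s = mean(readings), stdev(readings)
    kept = [r for r in readings if abs(r - m) <= k * s]
    return mean(kept)
```

Because it is deterministic, such a rule removes the subjective variability and the delay introduced by manual editing; in practice the exclusion step is often iterated until no further readings are dropped.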
In many cases, hospitals and laboratories will edit their data before transmitting it to the agency which produces the ratings, further compromising the integrity of the overall system. In some cases, certain hospitals will run a larger number of control samples than others, and in some systems those hospitals will be rewarded with undeserved weight being given to their control readings as opposed to those of hospitals which take fewer control readings.
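The weighting problem just described arises when every reading is pooled directly, so a laboratory running more control samples contributes more to the peer mean. A hypothetical corrective (not a method stated in this text) is to average per-laboratory means, giving each laboratory equal weight regardless of how many control samples it ran:

```python
from statistics import mean

def unweighted_peer_mean(readings_by_lab):
    """Peer group target that weights each laboratory equally
    (illustrative): average the per-lab means instead of pooling
    every reading, so a lab running many control samples gets no
    extra influence over the target value."""
    return mean(mean(readings) for readings in readings_by_lab.values())
```

For example, if lab "A" submits four readings of 10 and lab "B" submits a single reading of 20, pooling all five readings gives a mean of 12, while the per-laboratory average gives 15, with each laboratory counted once.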
A further shortcoming of rating systems used in the past is the lack of timeliness in reporting. The data collection and editing process can take an inordinate amount of time, particularly in view of the fact that at least some of the data must be reported manually, and most of the data requires manual editing (as pointed out above). The sheer quantity of data can be better appreciated when one recognizes that data is being collected and ratings are being made for literally dozens of tests, each defining its own peer group. The final ratings for any test cannot be determined until all of the data for that test has been obtained and input. This typically results in a situation where ratings, when they are distributed to the laboratories being rated, are based on data which can be on the order of six weeks old. It would be useful for laboratories, particularly those performing medical tests, to have a more current indication of their performance with respect to their peers, so that if a laboratory is performing out of control, that fact can be brought to light at the earliest possible moment.