Spectral analysis is widely used in identifying and quantitating analytes in a sample of a material. One form of spectral analysis measures the amount of electromagnetic radiation that is absorbed by a sample. For example, an infrared spectrophotometer directs a beam of infrared radiation towards a sample, and then measures the amount of radiation absorbed by the sample over a range of infrared wavelengths. An absorbance spectrum may then be plotted which depicts sample absorbance as a function of wavelength. The shape of the absorbance spectrum, including relative magnitudes and wavelengths of peak absorbances, serves as a characteristic "fingerprint" of particular analytes in the sample.
The absorbance spectrum may furnish information useful in identifying analytes present in a sample. In addition, the absorbance spectrum can also be of use for quantitative analysis of the concentration of individual analytes in the sample. In many instances, the absorbance of an analyte in a sample is approximately proportional to the concentration of the analyte in the sample. In those cases where an absorbance spectrum represents the absorbance of a single analyte in a sample, the concentration of the analyte may be determined by comparing the absorbance of the sample to the absorbance of a reference sample at the same wavelengths, where the reference sample contains a known concentration of the analyte.
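The single-analyte comparison described above can be sketched as follows. This is a minimal illustration only, assuming that absorbance is strictly proportional to concentration at every measured wavelength; the function name and the example values are hypothetical and do not come from the text. The scale factor between the two spectra is estimated by least squares over the shared wavelengths rather than from a single wavelength.

```python
def concentration_from_reference(sample_abs, ref_abs, ref_conc):
    """Estimate analyte concentration by scaling against a reference spectrum.

    Assumes (hypothetically) that absorbance is proportional to concentration,
    so that sample_abs is approximately k * ref_abs and the sample
    concentration is k * ref_conc.
    """
    if len(sample_abs) != len(ref_abs):
        raise ValueError("spectra must be sampled at the same wavelengths")
    # Least-squares estimate of k in sample_abs ~ k * ref_abs.
    numerator = sum(s * r for s, r in zip(sample_abs, ref_abs))
    denominator = sum(r * r for r in ref_abs)
    if denominator == 0:
        raise ValueError("reference spectrum has no absorbance signal")
    return ref_conc * numerator / denominator


# Hypothetical absorbances at three wavelengths; the reference sample
# contains a known concentration of 100 (arbitrary units).
reference = [0.2, 0.4, 0.1]
sample = [0.3, 0.6, 0.15]
estimate = concentration_from_reference(sample, reference, 100.0)
```

Because the proportionality is only approximate in practice, an estimate of this kind is only as good as that assumption and the quality of the reference measurement.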
One fundamental goal of a near-infrared spectroscopic method for measuring analyte concentrations in biological fluids, such as blood glucose levels, is to collect high quality data. Although great care may be taken to ensure reliable measurements by consistent sample preparation and data acquisition, data generated by instrumentation and clinical reference testing, like all data, are susceptible to the inclusion of errors from a number of sources. In large sets of data, it is not uncommon to have a number of measurements that are extremely deviant from the expected distribution of measurements, commonly referred to as outliers. Whether outliers result from statistical errors or systematic errors, outlier detection identifies samples containing such errors with sufficient confidence that such samples can be considered unique with respect to the sampled population. Inclusion of even a small number of outliers within a set of measurements can degrade or destroy a calibration model that would otherwise be obtained from the measurements.
Referring to the method and apparatus of the present invention, there are at least four potential sources of error in the chemometric analysis for biological fluid analyte measurements such as measurements of blood glucose levels.
A first source of error is related to sample preparation. Blood serum samples require a great deal of preparation before chemometric analysis. During this preparation, a number of factors can affect the sample. For example, the amount of time that blood samples are allowed to clot may affect sample continuity in terms of fibrinogen content. The level of clotting also impacts the quality of centrifugation and ultimately the decanting of serum from cells. Samples prepared for clinical assays determine the quality of the data used for reference and calibration. Great care must therefore be exercised with the samples, since these data will ultimately define the limit of prediction abilities.
A second source of error may result from the spectral measurement process. For example, the use of a flowcell for sample containment during data acquisition is susceptible to problems such as bubbles in the optical path as well as dilution effects from reference saline solution carryover. These dilution effects are usually negligible, but bubbles in the optical path are not infrequent and have a severe impact on data quality. In addition, errors produced by mechanical or electronic problems occurring within the analysis instrumentation can have important effects on data quality.
A third source of error is also related to the reference tests. Errors due to out-of-specification instrumental controls and low sample volume during clinical assays have similar effects to errors related to sample preparation, described above.
A fourth source of error, and probably the most difficult to identify and control, relates to sources of the samples, that is, to the individuals providing the biological fluids. A sample taken from an individual may at first seem to be unique with respect to a previously sampled population, but may in fact be an ordinary sample when a larger sample population is considered. In other words, a putatively unique sample may be only an artifact of undersampling.
All of these errors, alone or in combination, can lead to a calculated value of biological fluid analyte concentration that is at great variance with respect to measurements from samples taken from the same individual at approximately the same time. These extremely deviant values, which can be orders of magnitude greater or less than a predicted mean value, are outliers that should be identified prior to constructing a model for predicting biological fluid analyte concentrations.
The removal of outliers from a data set can be accomplished in a qualitative and subjective sense by graphical inspection of plotted data in those cases when the dimensionality is low, that is, where the number of data points associated with each measurement is small. In those instances where the number of data points associated with each measurement is large, however, outlier detection may be accomplished more quickly and efficiently by a number of automatable procedures such as residual analysis. However, such procedures are often subject to a number of errors, or at least subject to errors in interpretation, especially in the relatively high dimensional spaces that are typically associated with multifactorial chemometric analyses.
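As an illustration of what an automatable residual analysis might look like in the simplest, one-dimensional case, the following sketch fits a straight calibration line by ordinary least squares and flags measurements whose residuals are extreme. The text does not specify a particular procedure, so every choice here is an assumption made for illustration: the function names, the use of a modified z-score based on the median absolute deviation, and the conventional cutoff of 3.5. The sketch does not address the high dimensional difficulties noted above.

```python
from statistics import mean, median


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def flag_residual_outliers(x, y, threshold=3.5):
    """Return indices of points whose residual has a large modified z-score.

    The modified z-score is 0.6745 * (r - median) / MAD, where MAD is the
    median absolute deviation of the residuals. The threshold is an
    illustrative assumption, not a value taken from the text.
    """
    slope, intercept = fit_line(x, y)
    residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    med = median(residuals)
    mad = median([abs(r - med) for r in residuals])
    if mad == 0:
        # No measurable spread in the residuals, so no scale to judge against.
        return []
    return [
        i
        for i, r in enumerate(residuals)
        if abs(0.6745 * (r - med) / mad) > threshold
    ]


# Hypothetical reference values (x) and predicted values (y);
# the measurement at index 6 is grossly deviant.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.9, 7.0, 9.1, 10.9, 13.0, 30.0, 17.1]
suspects = flag_residual_outliers(x, y)
```

Note that the deviant point also pulls the fitted line toward itself, which shrinks its own residual and inflates the others. This is one reason residual-based screening can be misleading, and the effect is harder to see when many variables are involved.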