Data representative of a plurality of complex samples is generated by modern instruments for use in a wide variety of quantitative and qualitative data analyses. Often, at least two goals can be identified for such data analysis: (1) comparing one or more samples to a standard having a known, or approved, composition so as to classify each sample; and, with regard to a sample that has been classified, (2) providing an accurate identification of the component(s) in the sample that caused the sample to be classified as a differentiated, or anomalous, complex sample.
To accomplish these goals, modern pattern recognition techniques are sometimes used to interpret the data. The purpose of such pattern recognition is usually to aid in classification of the sample (e.g., Is the sample of acceptable quality? Is the sample consistent with a previous run?). The advent of pattern recognition software has simplified methods development and automated the routine use of robust pattern matching in chromatography and similar analytical methods.
The field of study that encompasses this type of pattern recognition technology is called chemometrics. For example, a mass spectrogram or a chromatogram can be thought of as a data matrix representative of a "chemical fingerprint" wherein a pattern can emerge from the relative intensities of the sequence of peaks in the data matrix. Chromatographic fingerprinting, whether interpreted by human intervention or by automated pattern recognition in software, has been used to infer a property of interest (typically adherence to a performance standard) or to classify the sample into one of several categories (good versus bad, Type A versus Type B, etc.).
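By way of illustration only, the notion of classifying a sample from the relative intensities of its peaks can be sketched as follows. The sketch assumes that each fingerprint is a list of peak intensities already aligned peak-for-peak with the standard, and it uses a Euclidean distance with an arbitrary threshold of 0.1; the function names, the distance measure, the threshold, and the example values are all hypothetical and are not drawn from any particular chemometric method described above.

```python
import math


def normalize(peaks):
    """Convert raw peak intensities to relative intensities that sum to 1."""
    total = sum(peaks)
    if total <= 0:
        raise ValueError("peak intensities must sum to a positive value")
    return [p / total for p in peaks]


def classify(sample, standard, threshold=0.1):
    """Compare a sample fingerprint to a standard fingerprint.

    Returns a (label, distance) pair. The Euclidean distance and the
    default threshold are illustrative assumptions, not a prescribed method.
    """
    if len(sample) != len(standard):
        raise ValueError("fingerprints must contain the same number of peaks")
    s = normalize(sample)
    ref = normalize(standard)
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(s, ref)))
    label = "anomalous" if distance > threshold else "consistent"
    return label, distance


# Hypothetical fingerprints: four aligned peaks per sample.
standard = [10.0, 40.0, 30.0, 20.0]
good = [11.0, 39.0, 31.0, 19.0]
bad = [10.0, 40.0, 5.0, 45.0]

print(classify(good, standard))
print(classify(bad, standard))
```

A sketch of this kind addresses only the classification of a sample against the standard; it does not identify which component caused a sample to be classified as anomalous.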
Some examples of the application of chemometrics to problems in chromatographic pattern recognition, drawn from different industries, are as follows. In the food and beverage industry, sensory evaluation is sometimes coupled with instrumented analysis to classify samples according to geographical/varietal origin, for competitor evaluation, for determining a change in process or raw material or similar constituents, and in general for quality control and classification. In the medical and clinical industries, improved data analysis is required for identification of microbial species by evaluation of cell wall material, for cancer profiling and classification, and for predicting disease states. For example, a prime concern of clinical diagnosis is to classify disorders rapidly and accurately, and pattern recognition techniques have been applied to chromatographic data to develop models allowing clinicians to distinguish among disease states based on the patterns in body fluids or cellular material. In the field of environmental monitoring, improved data systems are now required for the evaluation of trace organics and pollutants, for performing pollution monitoring where multiple sources are present, and for effective extraction of information from large environmental databases.
Furthermore, instrumentation for carrying out gas chromatographic and mass spectrometric analyses is well known in the art for identifying one or more specific chemical components of a sample mixture. For example, chromatography is a method of analyzing a sample comprised of several components to qualitatively determine the identity of the sample components as well as quantitatively determine the concentration of the components.
Some of the above-described approaches have been successful in achieving an accurate comparison of a plurality of samples to a standard having a known, or approved, composition so as to classify each sample; others of the above-described approaches have been successful in identifying a specific chemical component of a sample mixture. However, none of the above-described approaches has been completely successful in achieving both of the above-described data analysis goals in one integrated methodology, namely, the integration of: a comparison of a plurality of samples to a standard having a known, or approved, composition so as to classify each sample; and an accurate identification of the component(s) present in a classified sample that caused the sample to be classified as anomalous.
Accordingly, there is a need for an integrated method for achieving not only classification of a plurality of complex samples, but also for providing an accurate identification of the component(s) present in a sample that caused that sample to be classified as anomalous.