The identification and quantification of chemical entities is largely the domain of analytical chemistry. Both the identification and quantification tasks are made easier with the use of multi-element analytical instrumentation since more analytical information is available to aid the analysis. Examples of contemporary analytical instrumentation capable of producing multi-element (vector) data include multiwavelength infrared and Raman spectrometers, mass spectrometers, nuclear magnetic resonance (NMR) spectrometers, and chromatographic separation-detection systems. Conveniently, as these techniques became more prevalent in the analytical laboratory, computational power also became more affordable and available, and analysts were quick to recognize that computer-aided methods could dramatically speed up the identification and quantification tasks.
In the computer-aided identification task, which is the focus of this patent, the analytical data is submitted to a system (the search appliance) which scours a library of known materials looking for similarities in the instrument response of the unknown material to the stored responses for known materials. Typically, the search appliance returns to the user a list of materials in the library along with their associated similarity to the submitted data. This entire process is usually termed “spectral library searching”. The vast majority of proposed similarity measures cannot be interpreted absolutely, but the relative similarity of the measured data to the various library records is deemed meaningful for ranking purposes. This is akin to today's web search utilities that return to the user a list of sites, ordered by a similarity measure of site-to-query. As with web search utilities, the critical differentiation among competing methods is usually the definition of the similarity measure.
The most common similarity measure in use today for spectral library searching is correlation based (see S. R. Lowry, “Automated Spectral Searching In Infrared, Raman And Near-Infrared Spectroscopy”, J. Wiley & Sons, pp. 1948-1961). This approach exploits a linear instrument response, assuming that a chemical species and its spectrum (infrared, Raman, mass spectrum, etc.) are immutably tied, and that the vector orientation of the spectrum does not depend on the concentration of the species. Other well-known measures of similarity include Euclidean distance and least-squares methodologies (see S. R. Lowry, “Automated Spectral Searching In Infrared, Raman And Near-Infrared Spectroscopy”, J. Wiley & Sons, pp. 1948-1961), which are equivalent to the correlation similarity to within elementary scalar manipulations. These similarity measures are implemented in many commercial spectral library search software packages.
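By way of illustration only, the correlation similarity and its relationship to Euclidean distance can be sketched as follows. This is a minimal sketch, not the implementation of any particular commercial package: the function names (correlation_similarity, normalized_sq_distance, search_library) are hypothetical, and the spectra are assumed to be sampled on a common axis. After mean-centering and scaling to unit length, the squared Euclidean distance d² and the correlation r satisfy d² = 2 − 2r, which is one instance of the "elementary scalar manipulations" noted above.

```python
import numpy as np

def _center_and_scale(x):
    """Mean-center a spectrum and scale it to unit Euclidean length."""
    xc = np.asarray(x, dtype=float)
    xc = xc - xc.mean()
    return xc / np.linalg.norm(xc)

def correlation_similarity(x, y):
    """Pearson correlation between two spectra on a common axis."""
    return float(_center_and_scale(x) @ _center_and_scale(y))

def normalized_sq_distance(x, y):
    """Squared Euclidean distance between the centered, unit-length spectra."""
    diff = _center_and_scale(x) - _center_and_scale(y)
    return float(diff @ diff)

def search_library(measured, library):
    """Rank library records (name -> spectrum) by descending correlation.

    Hypothetical helper: returns a list of (name, similarity) pairs.
    """
    scores = {name: correlation_similarity(measured, spectrum)
              for name, spectrum in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because the spectrum is scaled to unit length before comparison, multiplying a spectrum by a positive constant (a change in concentration under a linear response) leaves the correlation unchanged. The ranking returned by search_library is therefore relative only; the scores carry no absolute interpretation.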
In web searching, there are minimal end-user consequences (other than wasted time and frustration) if a page is suggested that does not actually pertain to the query (a “false-positive”). However, many applications of spectral library searching are used to guide actions, such as how chemicals are to be treated in hazardous materials situations, so it is critical to know when an evidence-based decision can be made, and when it cannot. The correlation similarity measure does not suffice to guide actions, as we will illustrate by way of example.
FIGS. 1a and 1b illustrate the challenges posed by spectral library search methods using non-absolute similarity measures such as correlation. In both FIGS. 1a and 1b, the measured material is in fact kerosene, a mixture of petroleum distillates in the C12 to C15 range, but due to different measurement conditions, it is apparent that the precision states of the two measurements are quite different. In FIG. 1a, the measured kerosene is compared to a library record spectrum of kerosene, yielding a correlation similarity measure of 0.950. In FIG. 1b, the measured kerosene spectrum is compared to a library record spectrum of Japan Drier, a common solvent for painting (a mixture of lighter petroleum distillates), yielding a correlation similarity measure of 0.945. Recall that for any case at hand, the analyst needs to make one of the following judgments based on the similarity measure:
(i) the measured material is likely the top-ranked library material;
(ii) the measured material is likely one of several top-ranked library materials; or
(iii) the measured material is not any of the top-ranked materials (i.e., there is no library match).
FIGS. 1a and 1b illustrate the complication in such a decision based on the correlation similarity measure. The different precision states of the two measurements mean that even though the similarity measure is nearly the same in the two cases (0.950 versus 0.945), one is a valid match (i.e., FIG. 1a), while the other is an invalid match (i.e., FIG. 1b). A simple rule cannot be formulated based on correlation that allows one to reliably decide between judgments (i), (ii) and (iii) above. This is because the correlation similarity measure (and equivalently, least-squares or Euclidean distance measures) does not account for the precision state of the measurement, and therefore does not consistently reflect the amount of scientific evidence favoring a judgment.
Counter-intuitively, when the signal-to-noise ratio is poor, similarity measures in the art tend to more emphatically suggest that the measured material is not in the library; in reality, the evidence provided by the data in such a circumstance is weak, and little can be said about whether the material is or is not in the library. Furthermore, when the signal-to-noise ratio increases, the similarity measure tends to increase for all records in the library, when the analyst knows intuitively that with higher quality data, it should be easier to distinguish one library component from another. Indeed, even in FIG. 1b with a very high correlation similarity measure, several obviously mismatched spectral features can be identified (indicated by the arrows in the figure).
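The dependence of the correlation similarity on the precision state can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than a reproduction of FIGS. 1a and 1b: the "spectrum" is a synthetic sum of two Gaussian bands, the noise is assumed to be independent and normally distributed, and all names (gaussian_band, mean_match_correlation) are hypothetical. The measured material is always the library material itself, so any drop in correlation is attributable to noise alone.

```python
import numpy as np

def correlation_similarity(x, y):
    """Pearson correlation between two spectra on a common axis."""
    xc = np.asarray(x, dtype=float) - np.mean(x)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

def gaussian_band(axis, center, width):
    """Synthetic spectral band (assumed shape, for illustration only)."""
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

def mean_match_correlation(noise_sd, trials=200, seed=0):
    """Average correlation of noisy measurements against their own true record."""
    rng = np.random.default_rng(seed)
    axis = np.linspace(0.0, 100.0, 500)
    reference = gaussian_band(axis, 30.0, 3.0) + 0.6 * gaussian_band(axis, 70.0, 5.0)
    scores = [
        correlation_similarity(reference + rng.normal(0.0, noise_sd, axis.size),
                               reference)
        for _ in range(trials)
    ]
    return float(np.mean(scores))

# Same material, same library record; only the noise level differs.
high_precision = mean_match_correlation(noise_sd=0.01)
low_precision = mean_match_correlation(noise_sd=0.30)
```

Under these assumptions the correct library record scores markedly lower at the higher noise level, even though the material is unchanged. A fixed correlation threshold would therefore reject a true match in the low-precision case, which is the behavior described above.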
What is needed in spectral library search algorithms, and to date is lacking, are similarity measures that are directly interpretable in terms of the scientific evidence supporting one library “hit” over another.