An application of this invention to spectroscopic data involves its use in Raman spectroscopy.
Raman spectroscopy has historically been used to obtain vibrational spectroscopic data from a large number of chemical systems. Its versatility, due to ease of sampling via coupling to fibre optics and microscopes, allied to the ability to sample through glass or plastic, has made it a very practical technique for use by law enforcement agencies in the detection of illicit materials. It also has the highly desirable properties of being non-invasive, non-destructive and very often highly selective. The analytical applications of Raman spectroscopy continue to grow, and typical applications are in structure determination, multi-component qualitative analysis and quantitative analysis.
The Raman spectrum of an analyte (i.e. a substance or chemical constituent that is undergoing analysis) may be compared against reference spectra of known substances to identify the presence of the analyte. For more complex (or poorly resolved) spectra, the process of identification is more difficult. The current norm is to develop test sets of known samples and use chemometric methods such as Principal Component Analysis (PCA) and multivariate regression to produce statistical models to classify and/or quantify the analyte from the spectroscopic data. These statistically based models are, however, limited in performance for complex systems that have poorly resolved peaks and/or comprise complex mixtures.
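By way of illustration, the dimensionality-reduction step of the chemometric workflow described above can be sketched as follows. This is a minimal sketch only: the synthetic spectra and the number of principal components are illustrative assumptions and do not correspond to any particular dataset.

```python
import numpy as np

def pca_fit(X, n_components=2):
    """Fit PCA by singular value decomposition of mean-centred data.

    X: (n_samples, n_points) matrix of spectra.
    Returns the mean spectrum and the top principal components.
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_transform(X, mean, components):
    """Project spectra onto the principal components (the PCA 'scores')."""
    return (X - mean) @ components.T

# Illustrative data: 20 synthetic "spectra" of 100 spectral points each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
mean, comps = pca_fit(X, n_components=2)
scores = pca_transform(X, mean, comps)
print(scores.shape)  # (20, 2)
```

In a typical chemometric model, the low-dimensional scores (rather than the raw spectra) are then fed to a classifier or a multivariate regression.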
Recent advances in machine learning have led to new techniques capable of outperforming these chemometric methods. Machine learning techniques are more robust and therefore more capable of overcoming the above-mentioned problems. These techniques have been successfully employed in the past to identify and quantify compounds based on other forms of spectroscopic data, such as the use of artificial neural networks (ANNs) to identify bacteria from their IR spectra and the use of ANNs to classify plant extracts from their mass spectra. A more recent machine learning method is the kernel-based Support Vector Machine (SVM), a powerful classification and regression tool that is also suited to handling the problems encountered in the spectral analysis of complex mixtures.
There are very few machine learning packages on the market that are specifically dedicated to analysing spectra. Gmax-bio (Aber Genomic Computing) is an application that is designed for use in many scientific areas including spectroscopy. Gmax-bio uses genetic programming to evolve solutions to problems and is claimed by its developers to outperform most other machine learning techniques. However, due to its diverse problem applicability, the user requires some prior knowledge of both genetic programming and spectroscopy. Neurodeveloper (Synthon GmbH), an application designed specifically for spectral analysis, uses standard chemometric tools and pre-processing techniques, and also uses ANNs for the deconvolution of spectra.
There follows a discussion of prior art related to the present invention. U.S. Pat. No. 5,649,068 describes the Support Vector Machine. The SVM maps a set of training patterns to a kernel-defined space (with a linear kernel, this space corresponds to the original input space) and finds a linear decision surface in this space that realizes the maximum margin of separation between the two classes of samples in the training set. The decision surface is defined by a set of weights, each of which is associated with a training pattern, and the goal of the SVM training process is to find the set of weights that results in optimum separation of the two classes of data. During the SVM training process, training patterns or kernel-transformed training patterns that are not used to determine the decision function are identified and removed, allowing the training process to continue with the remaining training patterns. Those training patterns that remain in the final decision function are known as support vectors. The use of a relatively small set of support vectors (compared to the training set size) results in a more computationally efficient method than previous maximum margin separation methods. A precursor to the SVM patent is the Soft Margin Classifier (U.S. Pat. No. 5,640,492), which incorporates the use of slack variables that allow erroneous or difficult training patterns to be taken into account in the determination of the optimal hyperplane. As the SVM is a kernel method, i.e. is based on an underlying kernel function, the SVM invention can be used in combination with the WS Kernel invention described here. The WS Kernel improves the performance of SVMs in the classification of materials or chemicals and in the quantification of properties of materials or chemicals based on spectroscopic data. Another SVM-related system is described in U.S. Pat. No. 6,427,141, which discloses a system for enhancing knowledge discovery using multiple support vector machines.
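The maximum-margin training process described above, in which only a subset of training patterns survives as support vectors, can be illustrated with a minimal sketch, assuming the scikit-learn library. The synthetic two-class training set is an illustrative assumption, not data from the cited patent.

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative training set: two classes of synthetic 50-point "spectra",
# separated by an offset in their mean intensity.
rng = np.random.default_rng(1)
class_a = rng.normal(loc=0.0, size=(10, 50))
class_b = rng.normal(loc=1.0, size=(10, 50))
X = np.vstack([class_a, class_b])
y = np.array([0] * 10 + [1] * 10)

# A linear-kernel SVM finds the maximum-margin separating hyperplane;
# only the support vectors carry non-zero weights in the decision function.
clf = SVC(kernel="linear").fit(X, y)
print(len(clf.support_), "support vectors out of", len(X), "training patterns")
```

Because the decision function depends only on the support vectors, prediction cost scales with the (typically small) number of support vectors rather than with the full training set size.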
Another example of a kernel method that can be used in conjunction with the WS Kernel invention is the k-Nearest Neighbour classification/regression technique. This well-known technique has been used in many previous patents, including: U.S. Pat. No. 6,592,033 (Item recognition method and apparatus), U.S. Pat. No. 6,011,990 (Method and device for evaluating an EEG carried out in the context of anaesthesia or intensive care) and U.S. Pat. No. 6,198,846 (Character recognition system).
A number of patents have been published that disclose techniques designed specifically for spectral analysis. As mentioned previously, PCA is widely used in the field of chemometrics; U.S. Pat. Nos. 6,675,137 and 5,822,219 disclose the use of PCA for spectral analysis. The use of other chemometric techniques, such as Partial Least Squares, classical least squares, and hybrids of these, has been disclosed in U.S. Pat. Nos. 6,415,233, 6,711,503 and 6,096,533. Other approaches are based on the use of spectral pre-processing techniques, such as those described in U.S. Pat. Nos. 4,783,754, 5,311,445, 5,435,309, 5,652,653, 6,683,455 and 6,754,543. There is a need to provide a prediction method that is robust to noise, removing the need for such spectral pre-processing techniques.
Patents have also been published that disclose systems designed for the detection or quantification of a specific category of analytes, e.g. U.S. Pat. No. 6,762,060, which describes an apparatus and set of methods for monitoring the concentration of hazardous airborne substances, such as lead.
In the field of machine learning, the ANN is the most popular technique used for spectral analysis, e.g. U.S. Pat. Nos. 5,631,469, 5,553,616, 5,660,181 (which uses an ANN in combination with PCA), 5,900,634, 5,218,529, 6,135,965 and 6,477,516. A limitation of existing techniques based on the ANN is that they produce predictions that are not particularly amenable to interpretation, due to the ‘black box’ nature of the ANN technique. This is in stark contrast to the situation in which a domain expert (e.g. an analytical chemist or forensic scientist) manually inspects spectra and classifies them based on the position and size of peaks. Naturally, this manual approach is not feasible for the scenarios targeted by this invention, the analysis of complex mixtures in particular. As such, domain experts using machine learning methods are typically at a disadvantage in that they are provided with no insight into the classification or quantification models used, or the data under analysis.
Another method for classifying spectra is disclosed in U.S. Pat. No. 6,421,553. This system uses a k-NN style classification technique and is based on the distance of an unknown sample from a set of training samples (of known condition). The unknown sample is classified based on a distance relationship with at least two samples, provided that at least one distance is less than a predetermined maximum distance.
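A minimal sketch of such a distance-thresholded classification scheme follows. The Euclidean distance metric, the value of k and the maximum distance threshold are illustrative assumptions, not parameters taken from the cited patent.

```python
import numpy as np

def classify_by_distance(unknown, train_X, train_y, k=2, max_dist=5.0):
    """Classify an unknown sample by its distance to known training samples.

    Returns the majority label among the k nearest training samples, but
    only if at least one of them lies within max_dist of the unknown
    sample; otherwise returns None (no confident classification).
    """
    dists = np.linalg.norm(train_X - unknown, axis=1)
    nearest = np.argsort(dists)[:k]
    if dists[nearest[0]] > max_dist:
        return None  # nothing in the training set is close enough
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Illustrative use with tiny synthetic "spectra" of two points each.
train_X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
train_y = np.array(["A", "A", "B"])
print(classify_by_distance(np.array([0.05, 0.0]), train_X, train_y))  # A
```

The distance threshold provides the rejection behaviour described above: an unknown sample far from all training samples receives no classification at all.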
In addition to the ANN, there are many machine learning techniques that could be used for the analysis of spectra, such as techniques for classification (Decision Trees, Naïve Bayes, etc.) and techniques for regression (Model Trees, Least Median Squares, etc.). Kernel-based learning algorithms (such as the SVM for classification and for regression) present a unified approach to the problems of classification, regression and clustering, and can also be used in database search applications. The SVM is a machine learning technique that is suited to handling noisy data and is also suited to the high dimensionality that is characteristic of spectral data. However, as with the ANN, such techniques are typically deployed in a way that does not provide experts with added insight.
Furthermore, kernel methods have not been tailored specifically for spectral data, as has been done in other application domains, e.g. the string kernel for text classification. A key component of all kernel methods is the kernel, which acts as a similarity measure for comparing two objects of a dataset that is being used to build a prediction model. In the problem domain of this invention, the kernel compares two sample spectra and returns a value that indicates how similar they are; the higher this value, the more similar they are. The way in which this similarity measure or kernel is used to build a prediction model differs between kernel methods. For example, k-NN uses the kernel to derive a distance metric, i.e. to measure the distance between two data samples. When applied to spectral data, a conventional kernel considers each spectral point in isolation: whatever calculation is performed at a given spectral point, e.g. the dot-product calculation of the Linear kernel or the Euclidean distance calculation of the RBF kernel, operates on just that spectral point in the two samples being compared.
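The pointwise behaviour of such conventional kernels can be illustrated with minimal implementations of the Linear and RBF kernels. The example spectra and the bandwidth parameter gamma are illustrative assumptions.

```python
import numpy as np

def linear_kernel(s1, s2):
    """Dot product: each spectral point in s1 is multiplied only by the
    point at the same position in s2, i.e. points are treated in isolation."""
    return float(np.dot(s1, s2))

def rbf_kernel(s1, s2, gamma=0.1):
    """RBF kernel based on the squared Euclidean distance; again, each
    spectral point is compared only with its counterpart in the other
    spectrum. gamma is an illustrative bandwidth parameter."""
    return float(np.exp(-gamma * np.sum((s1 - s2) ** 2)))

# Two similar three-point "spectra".
s1 = np.array([0.0, 1.0, 0.5])
s2 = np.array([0.0, 0.9, 0.6])
print(linear_kernel(s1, s2))  # ≈ 1.2
print(rbf_kernel(s1, s2))     # ≈ 0.998: close to 1, the spectra are very similar
```

In both cases the comparison at each spectral point involves no neighbouring points, which is the limitation that a kernel tailored to spectral data would need to address.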