The present invention concerns pattern classification, notably in the field of the characterisation or the identification of odours.
In odour characterisation or identification apparatus (so-called “electronic noses”), a sample of an unknown product is identified by classifying it in one of several classes which the apparatus recognises thanks to a preliminary learning operation. Apparatus of this type comprises two main parts, a data acquisition portion and a data processing portion. In general, the data acquisition portion comprises a sampling head enabling the sampling of volatile components which are given off by the tested product, and a battery of sensors to which the sampled components are supplied and which produce output signals which, taken together, are characteristic of the volatile components given off by the product and, thus, are characteristic of the product. In general, the data processing portion comprises automatic data processing means which record the sensors' output signals and perform analyses thereon. By detecting or analysing the patterns inherent in the different sets of data relating to particular products, the data processing portion detects the class of the product (which can correspond to the nature of the product, for example, “a wine”, “a whisky”, etc., or to a quality of a product, for example, “fresh milk”, “fermented milk”, etc.).
The goals pursued in acquiring and in processing the data are different depending upon whether the apparatus is in learning mode or in identification mode.
In the learning mode of the apparatus, the products from which the various samples are taken are known and are analysed several times. The processing methods which are used at this time are those which enable, on the one hand, the detection and characterisation of the different products and, on the other hand, the construction of a robust identification model. The methods used are generally statistical methods (Principal Component Analysis (PCA), Discriminant Factor Analysis (DFA), etc.) and neural networks (see, for example, J. W. Gardner and P. N. Bartlett, “Sensors and sensory systems for an Electronic Nose”, NATO ASI Series E, Vol. 212, pp. 161-180 (1992)). In the identification phase, the origin of the samples is unknown and the goal is to identify this origin by using the data and the model which has already been constructed.
The statistical methods used in processing the data recorded by an electronic nose are linear methods. The variables calculated according to such methods are so-called “synthetic variables” and convey two items of information: quantitative information, indicating the percentage of inertia in the initial cloud of points that is explained by this variable, and qualitative information regarding the structure and discrimination of the cloud which it discloses. By way of contrast, neural networks are among non-linear methods and are capable of learning the behaviour of the different sensors. The two types of methods directly employ the recorded raw data. In general, they are used separately.
Now, it is recalled that the expression “synthetic variables” refers to combinations (whether or not linear) of the measured variables. In general they are obtained by optimisation of a precisely defined criterion. The environment of this optimisation process can be conditioned by additional knowledge or variables. For example, in PCA a criterion is optimised without any additional information whereas, in DFA, there is available knowledge regarding each sample's affiliation to a group, and in PLS (“partial least squares”) one or several additional variables are measured for each sample (for example, the concentration of a particular chemical element).
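By way of illustration only, the following Python sketch computes one linear synthetic variable of the PCA type together with the percentage of inertia which it explains. It is a minimal sketch and not the processing actually performed by any particular apparatus: the function names `first_synthetic_variable` and `synthetic_value` are hypothetical, the principal axis is found by simple power iteration (an assumption made to keep the example free of external libraries), and the raw data is assumed to be supplied as one row per sample and one column per channel.

```python
import math

def first_synthetic_variable(rows, iterations=200):
    """Return (means, axis, explained) for the first principal component.

    rows: list of samples, each a list of channel values.
    explained: fraction of the total inertia carried by this variable.
    """
    n, d = len(rows), len(rows[0])
    # Centre every channel on its mean.
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    centred = [[r[k] - means[k] for k in range(d)] for r in rows]
    # Covariance matrix of the channels.
    cov = [[sum(c[a] * c[b] for c in centred) / n for b in range(d)]
           for a in range(d)]
    # Power iteration: converges to the axis of largest variance,
    # provided the starting vector is not orthogonal to that axis.
    axis = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * axis[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        axis = [x / norm for x in nxt]
    # Variance along the axis (Rayleigh quotient) over total inertia.
    eigenvalue = sum(axis[a] * cov[a][b] * axis[b]
                     for a in range(d) for b in range(d))
    total = sum(cov[k][k] for k in range(d))
    explained = eigenvalue / total if total > 0.0 else 0.0
    return means, axis, explained

def synthetic_value(row, means, axis):
    """Value of the synthetic variable for one sample."""
    return sum((row[k] - means[k]) * axis[k] for k in range(len(axis)))
```

The value of `explained` corresponds to the quantitative information mentioned above; it says nothing, by itself, about how well the variable separates the classes.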
The document WO96/26492 describes a particular neural network structure capable, amongst other things, of being used in electronic noses. This neural network comprises, in the output layer thereof, fuzzy logic neurones that enable it to be indicated that the tested product belongs to an unknown class. In this document, it is also proposed to increase the number of neurones in the input layer of the neural network so as to input not only the sensor output signals but also signals representing variables calculated in a Principal Component Analysis of the original signals, for example, the two or three first components of the PCA.
However, by adding wholesale, to the measured signals representing the data, signals resulting from a Principal Component Analysis of the data, in order to constitute the input signals to the neural network, WO96/26492 follows the approach of conventional techniques which superpose statistical methods and neural networks. Moreover, even though the variables obtained by a principal component analysis convey substantial quantitative information, their contribution in terms of qualitative information is not necessarily great, particularly when a large number of distinct products is being handled. Thus, the systematic use of a certain number of predetermined variables derived from a statistical analysis does not automatically ensure the discrimination of the different products. Furthermore, these variables are correlated with certain of the initial variables (channels) and, consequently, their use as input signals to the neural network leads to redundancy.
The present invention relates to a pattern classification apparatus notably for use in the field of odour recognition, in which the statistical methods and the neural techniques are intertwined so as to exploit to the greatest possible extent the synergy between them, and to a method for optimising such an apparatus during the learning phase thereof.
More particularly, the present invention provides a classification apparatus, notably for use in the recognition or the characterisation of odours, comprising: means for acquiring, during a learning phase and by a plurality of channels, raw data representing a plurality of instances of a plurality of different classes; a processing unit for processing said raw data so as to determine, for each class (j), characteristics common to different instances of the class; and a neural network for outputting a classification, into one of the classes, of an instance presented to said classification apparatus, said neural network comprising an input layer, an output layer and at least one intermediate layer disposed between the input and output layers, the output layer comprising a plurality of neurones each being adapted to recognise instances of a respective class from the plurality of classes; characterised in that the processing unit determines for each class (j) a sub-space (SEj) in which the instances of this class (j), for which raw data has been obtained, are optimally separated from the instances of all the other classes, said sub-space (SEj) being defined by a set (VDj) of one or a plurality of synthetic variables (V), and determines the discrimination sub-space (SED) which comprises the sub-spaces (SEj) of all of the classes, said sub-space (SED) being defined by the plurality of variables (VD) comprised in the sets (VDj) of synthetic variables of the plurality of classes; and in that the input layer of the neural network comprises a plurality of neurones each corresponding to a respective variable (V) of the plurality of synthetic variables (VD) which define the discrimination sub-space (SED).
The present invention also provides a learning method of a classification apparatus comprising means for acquiring, by a plurality of channels, raw data representing instances of a plurality of different classes; a unit for processing said raw data; and a neural network comprising an input layer, an output layer and at least one intermediate layer disposed between the input and output layers, the output layer comprising a plurality of neurones each being adapted to recognise instances of a respective class from the plurality of classes, this apparatus being notably for use in the recognition or the characterisation of odours, the method comprising the steps of: applying to the raw data acquisition means, during a learning phase, a plurality of instances of the plurality of classes, the classes to which the instances belong being known; and processing said raw data, by the data processing unit, so as to determine for each class (j) characteristics common to the different instances of the class; characterised in that the processing step comprises the determination, by the data processing unit, for each class (j), of a sub-space (SEj) in which the instances of this class (j), for which raw data has been obtained, are optimally separated from the instances of all the other classes, said sub-space (SEj) being defined by a set (VDj) of one or a plurality of synthetic variables (V), and the determination of the discrimination sub-space (SED) which comprises the sub-spaces (SEj) of all the classes, said sub-space (SED) being defined by the plurality of variables (VD) comprised in the sets (VDj) of synthetic variables of the plurality of classes; and in that the neural network is provided with an input layer comprising a plurality of neurones for receiving, respectively, as input a value of a respective variable of the plurality of synthetic variables (VD) defining the discrimination sub-space (SED).
The apparatus and the method according to the present invention use the intertwining of the results of a statistical analysis and a neural network so as to draw maximum benefit from the advantages provided by each of these. More particularly, by choosing for input into the neural network the synthetic variables which best discriminate the different classes, there is a significant increase in the contribution in terms of qualitative information, which leads to a reduction in the duration of the learning phase and an increase in the speed of identification of samples during the identification phase.
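The relationship between the per-class sets (VDj), the plurality of variables (VD) defining the discrimination sub-space (SED) and the input layer can be pictured with the following Python sketch. It is given as an assumption-laden illustration only: the function names, the variable labels such as "V1" and the class labels are hypothetical, and each set (VDj) is assumed to have been determined beforehand.

```python
def discrimination_variables(vd_by_class):
    """Collect the variables VD defining SED from the per-class sets VDj.

    vd_by_class: dict mapping a class label to the list of names of the
    synthetic variables in its set VDj.
    A variable shared by several classes is kept once only, so that the
    input layer holds exactly one neurone per distinct variable.
    """
    ordered = []
    for label in vd_by_class:
        for name in vd_by_class[label]:
            if name not in ordered:
                ordered.append(name)
    return ordered

def input_vector(sample, vd):
    """Values presented to the input layer for one instance.

    sample: dict mapping a synthetic variable name to its value.
    """
    return [sample[name] for name in vd]
```

For example, with sets {"wine": ["V1", "V3"], "whisky": ["V3", "V4"], "milk": ["V2"]}, the input layer of this sketch would comprise four neurones, one each for V1, V3, V4 and V2.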
The choice of synthetic variables which best discriminate the different classes can be made manually or using an automated search. A preferred automated search method consists, for each class, in determining, in each of a plurality of different sub-spaces corresponding to respective different combinations of variables, a region encompassing the points which represent the different instances of this class, and in determining the sub-space in which this region is both the furthest from the regions representing the other classes and of the smallest possible size.
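An automated search of this kind might be sketched in Python as follows. The sketch rests on several assumptions which are not taken from the present description: each region is approximated by a ball centred on the class centroid, the two requirements (distance from the other regions, small size) are merged into a single hypothetical score equal to the smallest gap to another region minus the radius of the region of the class considered, and the search is limited to combinations of at most `max_dim` variables. The names `region` and `best_subspace` are likewise hypothetical.

```python
import itertools
import math

def region(points):
    """Centroid and radius of the ball enclosing the given points."""
    d = len(points[0])
    centre = [sum(p[k] for p in points) / len(points) for k in range(d)]
    radius = max(math.dist(p, centre) for p in points)
    return centre, radius

def best_subspace(data, target, max_dim=2):
    """Search the variable combination which best isolates one class.

    data: dict mapping a class label to a list of rows, each row holding
    the values of all candidate synthetic variables for one instance.
    Returns (variable_indices, score) for the class named target.
    """
    n_vars = len(next(iter(data.values()))[0])
    best = None
    for dim in range(1, max_dim + 1):
        for combo in itertools.combinations(range(n_vars), dim):
            # Region of every class in the sub-space spanned by combo.
            regions = {label: region([[row[k] for k in combo] for row in rows])
                       for label, rows in data.items()}
            centre_j, radius_j = regions[target]
            # Smallest gap between the target region and any other region.
            gap = min(math.dist(centre_j, centre) - radius_j - radius
                      for label, (centre, radius) in regions.items()
                      if label != target)
            # Hypothetical criterion: large gap, small region.
            score = gap - radius_j
            if best is None or score > best[1]:
                best = (combo, score)
    return best
```

Because a candidate replaces the current best only when its score is strictly greater, a sub-space of lower dimension is retained in the event of a tie.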
In a preferred embodiment of the invention, the neural network comprises, as intermediate layer, a plurality of individual neural networks each corresponding to a respective class of the plurality of classes. By using this structure of the neural network, it is possible to reduce the duration of the learning phase to a significant extent, which can be of extreme importance when classification of instances of a large number of classes is involved. The training of such a neural network involves the separate training of the individual neural networks.
In certain applications it can be useful, or necessary, to eliminate from the analysis raw data corresponding to channels which do not contribute to the differentiation of instances of different classes. It can also be necessary to eliminate from the calculations the data concerning abnormal instances of a class, that is, the instances which, in terms of synthetic variables, are far away from other instances of the same class. This improves the reliability of the identification model established during the learning phase.
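The elimination of abnormal instances could, for example, be sketched as below. The rejection rule used here (distance to the class centroid greater than a chosen multiple of the median distance) is an assumption adopted for the sake of the example, as are the name `drop_abnormal_instances` and the default factor of 3; the present description does not impose any particular rule.

```python
import math
from statistics import median

def drop_abnormal_instances(points, factor=3.0):
    """Keep the instances of one class which lie close to the others.

    points: list of rows of synthetic-variable values for one class.
    An instance is rejected when its distance to the class centroid
    exceeds factor times the median of all such distances.
    """
    d = len(points[0])
    centre = [sum(p[k] for p in points) / len(points) for k in range(d)]
    distances = [math.dist(p, centre) for p in points]
    limit = factor * median(distances)
    return [p for p, dist in zip(points, distances) if dist <= limit]
```

It will be noted that, in this simple sketch, the centroid is itself displaced by the abnormal instances; a more robust centre, or a repeated application of the function, may be preferable when several such instances are present.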
Preferably, the neural networks of the invention are supervised networks the neurones of which apply sigmoid functions. Preferably, the neural networks according to the invention are trained using the back propagation algorithm.
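Purely as an illustration of a supervised network with sigmoid neurones trained by back propagation, the following Python sketch trains a small network having a single output neurone, such as might recognise the instances of one class. The number of hidden neurones, the learning rate, the number of epochs, the squared-error criterion and the name `train_network` are all assumptions chosen for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_network(samples, targets, hidden=3, epochs=2000, rate=0.5, seed=0):
    """Train a one-hidden-layer sigmoid network by back propagation.

    samples: list of input vectors (values of synthetic variables).
    targets: 1 for instances of the class to recognise, 0 otherwise.
    Returns (predict, initial_error, final_error).
    """
    rng = random.Random(seed)
    n_in = len(samples[0])
    # The last weight of every neurone is its bias.
    w_h = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(w[i] * x[i] for i in range(n_in)) + w[n_in])
             for w in w_h]
        y = sigmoid(sum(w_o[k] * h[k] for k in range(hidden)) + w_o[hidden])
        return h, y

    def mean_error():
        return sum((forward(x)[1] - t) ** 2
                   for x, t in zip(samples, targets)) / len(samples)

    initial_error = mean_error()
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            h, y = forward(x)
            # Error terms, propagated from the output back to the hidden layer.
            delta_o = (y - t) * y * (1.0 - y)
            delta_h = [delta_o * w_o[k] * h[k] * (1.0 - h[k])
                       for k in range(hidden)]
            # Gradient-descent update of the output neurone.
            for k in range(hidden):
                w_o[k] -= rate * delta_o * h[k]
            w_o[hidden] -= rate * delta_o
            # Gradient-descent update of the hidden neurones.
            for k in range(hidden):
                for i in range(n_in):
                    w_h[k][i] -= rate * delta_h[k] * x[i]
                w_h[k][n_in] -= rate * delta_h[k]

    def predict(x):
        return forward(x)[1]

    return predict, initial_error, mean_error()
```

The function `predict` returns a value between 0 and 1 which may be read as the degree to which the network recognises the presented instance as belonging to its class.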
The raw data processing unit applies statistical processing such as a principal component analysis, a discriminant factor analysis, or other similar methods.
So as to check the reliability of the identification model established during the learning phase of the apparatus, it is preferable to determine the sub-spaces (SEj), the sets of synthetic variables and the discrimination sub-space (SED) based on raw data relating to only a certain percentage of the data acquired during the learning phase. The other data is applied to the processing unit so as to calculate values corresponding to the synthetic variables, these values being applied to the neural network and the classifications performed by the neural network being compared with the true classes to which the instances involved belong. If the class identification rate exceeds a threshold value, for example, 80%, then the training of the apparatus is ended and the apparatus can be used for identifying instances of unknown class.
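The verification described above can be sketched in Python as follows. The fraction of the data reserved for building the model (70% here), the random partition and the function names are assumptions; only the comparison of the identification rate with a threshold such as 80% is taken from the present description.

```python
import random

def split_learning_data(instances, model_fraction=0.7, seed=0):
    """Partition the learning data into a model part and a check part."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * model_fraction)
    return shuffled[:cut], shuffled[cut:]

def identification_rate(predicted, actual):
    """Fraction of check instances assigned to their true class."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)

def training_complete(predicted, actual, threshold=0.80):
    """True when the identification rate exceeds the threshold."""
    return identification_rate(predicted, actual) > threshold
```

In this sketch a rate exactly equal to the threshold does not end the training, in keeping with the requirement that the rate exceed the threshold value.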