The present invention concerns pattern classification, notably in the field of the characterisation or the identification of odours and, more particularly, a pattern classification apparatus which selects, from a list, the appropriate classification rule dependent upon the application.
In odour characterisation or identification apparatus (so-called “electronic noses”) a sample of an unknown product is identified by classifying it, according to a classification rule, in one of several classes which the apparatus recognises thanks to a preliminary learning operation during which the characteristics of the classes are determined.
Apparatus of this type comprises two main parts, a data acquisition portion and a data processing portion. In general, the data acquisition portion comprises a sampling head enabling the sampling of volatile components which are given off by the tested product, and a battery of sensors to which the sampled components are supplied and which produce output signals which, taken together, are characteristic of the volatile components given off by the product and, thus, are characteristic of the product. In general, the data processing portion comprises automatic data processing means which record the sensors' output signals and perform analyses thereon. By detecting or analysing the patterns inherent in the different sets of data relating to particular products, the data processing portion detects the class of the product, which can correspond either to the nature of the product (for example, “a wine”, “a whisky”, etc.), or to a concentration or quality of the product, for example, “fresh milk”, “fermented milk”, etc.
The goals pursued in acquiring and in processing the data are different depending upon whether the apparatus is in learning mode or in identification mode.
In the learning mode of the apparatus, the products from which the various samples are taken are known and are analysed several times. The processing methods which are used at this time are those which enable, on the one hand, the detection and characterisation of the different products and, on the other hand, the construction of a robust identification model. In general, an identification model consists of two elements, a method of extracting information from the sensors' output signals and an identification rule enabling data obtained for a sample of unknown nature to be classified into a particular class. The information-extraction methods used are, generally, statistical methods, notably a Principal Component Analysis (PCA) or Discriminant Factor Analysis (DFA) (see for example, J. W. Gardner and P. N. Bartlett, “Sensors and sensory systems for an Electronic Nose”, Nato Asi, Series E, Vol. 212, pp.161-180 (1992)). In the identification phase, the nature of the samples is unknown and the goal is to identify this nature by using the data, and the model which has already been constructed.
Typically, apparatuses of this kind implement a fixed type of analysis of the data obtained during the learning phase so as to produce certain definitions of the classes and, during the identification phase, they apply a fixed rule to allocate the samples of unknown products into the appropriate one of the defined classes.
The present invention relates to a pattern classification apparatus, notably for use in the field of odour recognition, in which there is a choice between several different allocation rules and/or between several different information-extraction methods, so as to enable the identification model to be adapted to the intended application.
More particularly, the present invention provides a classification apparatus, notably for use in the recognition or the characterisation of odours, comprising: means for acquiring, by a plurality of channels, during a learning phase, raw data representing a plurality of instances of a plurality of different classes, it being known to which respective classes the instances analysed during the learning phase belong; a processing unit for processing the data provided by the data acquisition means so as to determine, during the learning phase, an identification model defining the different classes and, during an identification phase, to classify, using a particular rule, an instance of unknown class into one of the classes defined during the learning phase; characterised in that the processing unit is adapted to decide which of a plurality of possible identification models, the characteristics of which are stored in a register, should be applied.
By choosing the allocation rule and/or the information-extraction method dynamically, dependent upon the intended application, the apparatus of the present invention reaches a rate of recognition of unknown products which is improved compared with the conventional apparatuses.
It is preferable to select in a dynamic fashion both the allocation rule and the method used for extracting the information from the measured raw data. Both of these elements form part of the identification model established during the learning phase of the apparatus. Preferably, the raw data processing unit uses, as information-extraction method, statistical processing such as a principal component analysis or a discriminant factor analysis. In general, these methods serve to determine respective sub-spaces in which the different classes are well-discriminated from each other.
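By way of illustration only, the extraction of a synthetic variable by principal component analysis can be sketched as follows. The sketch assumes a simple power iteration on the covariance matrix of the raw channel data and keeps only the first principal axis; the function names `first_principal_axis` and `project`, the iteration count and the sample values are hypothetical and are not taken from the apparatus described here.

```python
import math

def first_principal_axis(rows, iterations=200):
    """Estimate the channel means and the first principal axis of the raw data.

    Assumption: power iteration on the sample covariance matrix. The start
    vector is slightly asymmetric; a start vector exactly orthogonal to the
    dominant axis, or data with zero variance, is not handled in this sketch.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    axis = [1.0 + 0.1 * j for j in range(d)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        axis = [v / norm for v in nxt]
    return means, axis

def project(row, means, axis):
    """Synthetic variable: coordinate of one instance along the principal axis."""
    return sum((row[j] - means[j]) * axis[j] for j in range(len(row)))

# Hypothetical two-channel measurements lying on the line channel_1 = 2 * channel_0.
raw = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
means, axis = first_principal_axis(raw)
scores = [project(r, means, axis) for r in raw]
```

In this sketch each instance is reduced to a single coordinate; a fuller treatment would retain several axes, or use a discriminant factor analysis, so that the classes are well separated in the resulting sub-space.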
Preferably, several different identification models are determined on the basis of only a percentage of the data obtained during the learning phase. The identification model which will be chosen is the one which gives the best recognition rate when the other data obtained during the learning phase is submitted thereto.
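The selection just described can be sketched as follows, under stated assumptions. Each candidate identification model is fitted on part of the learning data and scored on the remainder, and the candidate with the best recognition rate is retained. The candidates shown (a centre-of-gravity rule applied to different subsets of channels), the one-in-three hold-out split, the function names and the data are all hypothetical illustrations; the proportion of data held back is not specified above.

```python
import math

def fit_nearest_centroid(samples, channels):
    """Fit a centre-of-gravity classifier using only the given channel indices."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(channels))
        for i, c in enumerate(channels):
            acc[i] += features[c]
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def classify(features):
        picked = [features[c] for c in channels]
        return min(centroids, key=lambda lab: math.dist(picked, centroids[lab]))
    return classify

def recognition_rate(classify, samples):
    """Fraction of samples allocated to their known class."""
    return sum(1 for f, label in samples if classify(f) == label) / len(samples)

def select_model(learning_data, candidates, holdout_every=3):
    """Fit each candidate on part of the learning data; score it on the rest."""
    held_out = learning_data[::holdout_every]
    fitting = [s for i, s in enumerate(learning_data) if i % holdout_every != 0]
    rates = {name: recognition_rate(fit(fitting), held_out)
             for name, fit in candidates.items()}
    return max(rates, key=rates.get), rates

# Hypothetical learning data: channel 0 is uninformative, channel 1 separates A from B.
learning_data = [
    ((0.0, 1.0), "A"), ((3.0, 5.0), "B"), ((4.0, 1.1), "A"), ((1.0, 5.1), "B"),
    ((2.0, 0.9), "A"), ((5.0, 4.9), "B"), ((6.0, 1.2), "A"), ((1.0, 5.2), "B"),
    ((0.0, 0.8), "A"), ((2.0, 4.8), "B"),
]
candidates = {
    "both channels": lambda s: fit_nearest_centroid(s, [0, 1]),
    "channel 0 only": lambda s: fit_nearest_centroid(s, [0]),
}
best, rates = select_model(learning_data, candidates)
print(best, rates)
```

The same loop would apply unchanged if the candidates differed in their allocation rule or in their information-extraction method rather than in the channels used.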
Among the different possible allocation rules there is a rule which defines the centre of gravity of the points representing the instances of a class and which allocates an unknown sample to the class for which the distance between the point representing the unknown instance and the centre of gravity is a minimum. There is also an allocation rule according to which the boundary delimiting the points representing the instances of one class from the points representing the instances of the other classes is determined, and an unknown instance is allocated to the class for which the expansion of the boundary needed in order to encompass the point corresponding to this unknown instance is a minimum. A third allocation rule consists in the establishment of a probability distribution, based on a weighted sum of Gaussian functions representing the instances of the different classes, and an instance of an unknown class is allocated to one of the classes based on the positioning of the point which represents it relative to this probability distribution.
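As a minimal sketch, the first and third of these allocation rules might be expressed as follows. The centre-of-gravity rule follows the description directly. For the Gaussian rule, the sketch assumes equal weights, one isotropic Gaussian centred on each learning instance and a common width `sigma`; none of these choices is specified above, and the function names and sample coordinates are hypothetical.

```python
import math

def centroid(points):
    """Centre of gravity of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def allocate_by_centroid(point, training):
    """Allocate to the class whose centre of gravity is nearest (Euclidean)."""
    centres = {label: centroid(pts) for label, pts in training.items()}
    return min(centres, key=lambda label: math.dist(point, centres[label]))

def gaussian_score(point, points, sigma=1.0):
    """Equal-weight sum of isotropic Gaussians centred on a class's instances."""
    return sum(math.exp(-math.dist(point, p) ** 2 / (2 * sigma ** 2))
               for p in points) / len(points)

def allocate_by_density(point, training, sigma=1.0):
    """Allocate to the class whose Gaussian sum is largest at the point."""
    return max(training, key=lambda label: gaussian_score(point, training[label], sigma))

# Hypothetical instances expressed in two synthetic variables.
training = {
    "fresh milk": [(1.0, 1.2), (0.8, 1.0), (1.2, 0.8)],
    "fermented milk": [(4.0, 4.2), (3.8, 4.0), (4.2, 3.8)],
}
print(allocate_by_centroid((1.1, 0.9), training))  # fresh milk
print(allocate_by_density((3.5, 3.9), training))   # fermented milk
```

The boundary-expansion rule is omitted from the sketch because its form depends on how the class boundary is represented, which is not detailed at this point.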
In certain applications it can prove useful, or indeed necessary, to eliminate from the analyses the raw data relating to channels which do not contribute to differentiating the instances of different classes. It can also prove necessary to eliminate from the calculations the data relating to abnormal instances of a class, that is, instances which, in terms of synthetic variables, are far removed from the other instances of the same class. This increases the reliability of the identification model established during the learning phase. In the same way, the data obtained during the learning phase, relating to abnormal instances of a class, can be deleted.
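Both eliminations can be sketched as follows, purely by way of example. The criteria used are assumptions, since none is fixed above: a channel is scored by the spread of the class means divided by the average spread within the classes, and an instance is treated as abnormal when its distance from the centre of its class exceeds a chosen multiple of the median distance. The function names, the thresholds and the data are hypothetical.

```python
import math
import statistics

def channel_scores(training):
    """Per channel: spread of the class means over the mean within-class spread."""
    n_channels = len(next(iter(training.values()))[0])
    scores = []
    for c in range(n_channels):
        columns = [[p[c] for p in pts] for pts in training.values()]
        between = statistics.pstdev([statistics.fmean(col) for col in columns])
        within = statistics.fmean([statistics.pstdev(col) for col in columns])
        scores.append(between / within if within > 0 else math.inf)
    return scores

def drop_abnormal(points, factor=3.0):
    """Keep instances within factor x the median distance from the class centre."""
    centre = [statistics.fmean(col) for col in zip(*points)]
    dists = [math.dist(p, centre) for p in points]
    limit = factor * statistics.median(dists)
    return [p for p, d in zip(points, dists) if d <= limit]

# Hypothetical raw data: channel 0 separates the classes, channel 1 does not.
training = {
    "A": [(1.0, 5.0), (1.2, 1.0), (0.8, 3.0)],
    "B": [(4.0, 3.0), (4.2, 5.0), (3.8, 1.0)],
}
scores = channel_scores(training)
kept_channels = [c for c, s in enumerate(scores) if s > 1.0]

# Hypothetical class containing one instance far removed from the others.
cluster = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.1), (9.0, 9.0)]
cleaned = drop_abnormal(cluster)
```

Because the class centre in this sketch is computed with the abnormal instance still included, a more robust centre, such as a per-channel median, may be preferable when abnormal instances are numerous.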