A shape recognition system conventionally comprises two main processing modules. The nature of the inputs of the system may vary widely: data represented in the form of vectors, time signals, images, or videos. The data may be obtained directly from sensors, or may already have undergone processing operations such as filtering or data fusion, for example.
The first module performs a preprocessing step and codes the inputs of the system. The aim is to present the information contained in the inputs as explicitly as possible. The second module performs the actual recognition using the code generated by the first module. This operation, also called discrimination, is made easier when the preprocessing is effective, that is to say when the coding performed is as informative as possible. The choice of preprocessing method is therefore key in optimizing a recognition system.
The preprocessing methods usually used in such a system depend on the type of input to be processed.
In the case of recognition systems that process data, the input of the preprocessing module consists of a vector. This vector can be linearly decomposed into a series of components. For a given vector s and a base of N components ψj of the same dimension as s, the linear decomposition leads to the decomposition equation expressed below:
    s = ∑_{j=1}^{N} β_j ψ_j        (1)
The variables βj are associated with the components ψj and are the result of the decomposition. The components ψj are usually orthonormal, which considerably simplifies the calculation of each variable βj by reducing it to a scalar product. It is also usual for the base not to generate the whole space; such is the case when N is less than the dimension of the data. In this case, a reconstruction error e is added to equation (1):
    s = ∑_{j=1}^{N} β_j ψ_j + e        (2)
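As a minimal illustrative sketch (a hypothetical example, not part of the system described here), the decomposition of equations (1) and (2) can be written directly: with an orthonormal base, each coefficient βj reduces to the scalar product ⟨s, ψj⟩, and when N is less than the dimension of s, the part of the vector that the base cannot span remains as the reconstruction error e.

```python
# Illustrative sketch: decomposing a vector s onto an orthonormal base
# of N components psi_j, as in equations (1) and (2).

def dot(u, v):
    """Scalar product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def decompose(s, basis):
    """Return the coefficients beta_j and the reconstruction error e."""
    betas = [dot(s, psi) for psi in basis]            # beta_j = <s, psi_j>
    recon = [sum(b * psi[i] for b, psi in zip(betas, basis))
             for i in range(len(s))]
    e = [si - ri for si, ri in zip(s, recon)]         # residual term of eq. (2)
    return betas, e

# Orthonormal base of N = 2 components in a 3-dimensional space (N < dim)
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
s = [3.0, -2.0, 5.0]
betas, e = decompose(s, basis)
# betas == [3.0, -2.0]; e == [0.0, 0.0, 5.0], the part the base cannot span
```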
The conventional methods for decomposing data such as, for example, principal component analysis (PCA), independent component analysis (ICA) or linear discriminant analysis (LDA), determine the components ψj to be used through learning, using databases of input vectors. A drawback limiting the effectiveness of these methods is that the components they produce have no physical meaning specific to the phenomenon being processed.
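To make the learning step concrete, the following sketch (an illustrative example, not the method of this document) learns the first principal component of a small two-dimensional database of vectors by power iteration on the covariance matrix, which is one common way PCA determines its components:

```python
# Illustrative sketch: learning the first PCA component from a database of
# input vectors by power iteration on the covariance matrix.
import math

def pca_first_component(vectors, iters=200):
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    centered = [[v[i] - mean[i] for i in range(dim)] for v in vectors]
    # Covariance matrix C[i][j] of the centered data
    cov = [[sum(v[i] * v[j] for v in centered) / n for j in range(dim)]
           for i in range(dim)]
    w = [1.0] * dim                      # initial guess
    for _ in range(iters):
        w = [sum(cov[i][j] * w[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return w  # unit vector along the direction of greatest variance

# Points scattered along the diagonal y = x: the learned component
# should point roughly along (1/sqrt(2), 1/sqrt(2)).
data = [(t, t + 0.01 * ((t * 7) % 3 - 1)) for t in range(-5, 6)]
w = pca_first_component(data)
```

The learned component reflects only the statistics of the database, not any physical property of the underlying phenomenon, which is precisely the drawback noted above.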
In the case of recognition systems that process signals, the source signal s(t) to be processed can be expressed as a linear combination of atoms. If the sampled signal s(t) is one-dimensional and of length T, it is decomposed using a base consisting of N atoms ψj(t) of length T. The signal can then be expressed as:
    s(t) = ∑_{j=1}^{N} β_j ψ_j(t)        (3)
The result of the decomposition is the value of all the variables βj associated with the atoms ψj(t).
This notation extends to the decomposition of multidimensional signals. The signal s(t) is then expressed as a linear combination of atoms ψj(t) of the same dimensions as s(t). Just as for vectors, the atoms may not generate the whole space, and a reconstruction error e(t) is taken into account, giving:
    s(t) = ∑_{j=1}^{N} β_j ψ_j(t) + e(t)        (4)
Taking the example of a decomposition by Fourier transform, the atoms are complex exponential functions, or else sine and cosine functions. Each atom then corresponds to a pure single-frequency signal, and these functions are indexed by frequency.
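The Fourier case can be sketched as follows (an illustrative example): each atom is the complex exponential ψ_k(t) = exp(2πi·k·t/T) indexed by the frequency k, and the coefficients β_k are obtained by scalar products, so a pure single-frequency signal excites exactly one atom (and its conjugate frequency).

```python
# Illustrative sketch: decomposition by discrete Fourier transform, where
# each atom is a complex exponential indexed by frequency k.
import cmath
import math

def dft(signal):
    """Coefficients beta_k = <s, psi_k> for atoms psi_k(t) = exp(2*pi*i*k*t/T)."""
    T = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / T)
                for t in range(T))
            for k in range(T)]

# A pure single-frequency signal (frequency k = 2, length T = 8):
T = 8
s = [math.cos(2 * math.pi * 2 * t / T) for t in range(T)]
betas = dft(s)
# The energy concentrates at the atoms k = 2 and k = T - 2 = 6;
# all other coefficients are (numerically) zero.
```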
The decomposition base may also be generated from shorter signals, called kernels, which are made to undergo various transformations to generate all the atoms. This principle is used, for example, by wavelet transform: the atoms are constructed from a single kernel called mother wavelet. This kernel undergoes, on the one hand, a change of scale and, on the other hand, a time shift. These operations conducted on the same kernel lead to a base of several atoms used for the decomposition of the signal. Each atom is then associated with a scale and with a delay value.
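The construction of atoms from a single kernel can be sketched as follows (an illustrative example using a Haar-like mother wavelet, chosen here only for simplicity): the kernel undergoes a change of scale and a time shift, and each resulting atom is indexed by its (scale, delay) pair.

```python
# Illustrative sketch: generating a base of atoms from a single kernel
# (a Haar-like mother wavelet) by dilation (change of scale) and time shift.

def atom(T, scale, delay):
    """Sampled atom of length T: the mother kernel dilated and shifted."""
    a = [0.0] * T
    for t in range(T):
        u = (t - delay) / scale          # rescaled, shifted time
        # Mother wavelet: +1 on the first half of its support, -1 on the second
        if 0 <= u < 0.5:
            a[t] = 1.0
        elif 0.5 <= u < 1.0:
            a[t] = -1.0
    return a

# One atom per (scale, delay) pair, all built from the same kernel
base = {(s, d): atom(16, s, d) for s in (2, 4, 8) for d in range(0, 16, 4)}
```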
The usual algorithms performing the decomposition of signals often have a mainly mathematical meaning, or a very generic physical meaning such as the presence of a frequency when the Fourier transform is used. In the large majority of cases, the relevant information after preprocessing is still presented in a distributed manner. One consequence is that the coding performed is not very sparse, and the algorithm used for the recognition phase is left with much of the work because of the poor effectiveness of the preprocessing.
Experience from extensively studied fields, such as speech processing, shows the extent to which specializing the preprocessing to the problem concerned improves performance.
The works of E. C. Smith and M. S. Lewicki presented in the articles "Efficient coding of time-relative structure using spikes", Neural Computation, Vol. 17, p. 19-45, 2005, and "Efficient auditory coding", Nature, Vol. 439, No. 23, p. 978-982, 2006, deal with the case of speech processing. Rather than use an a priori defined kernel base, the proposed solution determines the kernels relevant for the coding by a learning mechanism. The learning of the kernels relies on a learning base containing signals specific to the source, such as ambient sounds, animal or human vocalizations, or a mixture of the two. Following this learning phase, the sounds can be decomposed into discrete acoustic elements that are characteristic of the structure of the signals and optimal for their coding. The kernels obtained differ according to the type of signals that make up the learning base. In the case of learning signals consisting of ambient sounds and vocalizations, the kernels found correspond to the impulse responses of the filters of the cochlea in mammals. The coding produced is more efficient than conventional codings such as those produced by Fourier or wavelet transforms, in that it produces a sparse code, also called a "hollow code".
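The idea of a sparse code can be sketched with a generic greedy matching-pursuit decomposition (an illustrative example, not Smith and Lewicki's exact algorithm): at each step, the kernel with the largest scalar product against the residual is selected, so only a few (kernel, coefficient) pairs are needed to represent the signal.

```python
# Illustrative sketch: sparse coding by greedy matching pursuit over a
# base of unit-norm kernels.

def matching_pursuit(signal, kernels, n_iters=3):
    """Greedy sparse decomposition; returns ([(kernel_index, coefficient), ...], residual)."""
    residual = list(signal)
    code = []
    for _ in range(n_iters):
        # Select the kernel best matching the current residual
        scores = [sum(r * k for r, k in zip(residual, ker)) for ker in kernels]
        j = max(range(len(kernels)), key=lambda i: abs(scores[i]))
        beta = scores[j]
        code.append((j, beta))
        # Subtract the selected atom's contribution
        residual = [r - beta * k for r, k in zip(residual, kernels[j])]
    return code, residual

# Two orthonormal kernels; the signal is 2 * kernel0 + 1 * kernel1.
kernels = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
s = [2.0, 1.0, 0.0, 0.0]
code, residual = matching_pursuit(s, kernels, n_iters=2)
# code == [(0, 2.0), (1, 1.0)]; the residual is all zeros.
```

The sparsity of the resulting code, a handful of active kernels rather than a dense set of coefficients, is what makes the subsequent recognition step easier.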
Some existing preprocessing methods make it possible to select a subset of kernels from a kernel base so that the coding is adapted, for example, to the instantaneous nature of the signal. However, in this case, the kernel base is defined a priori and the kernels are not adapted to the physical reality of the signal. The coding algorithm proposed by E. C. Smith and M. S. Lewicki makes it possible to adapt the kernels to the source, but it is not applied to the field of shape recognition and is not optimized for the processing of signals or data represented in several dimensions. Shape recognition applications such as, for example, seismic signal analysis, handwriting recognition or, more generally, movement recognition require multidimensional processing. Such is also the case, for example, for medical applications such as electrocardiography (ECG), electroencephalography (EEG) or magnetoencephalography (MEG).