In the field of biometric authentication, the biometrics that allow an individual to be authenticated include, for example, his/her voice, body movements, fingerprints or palm prints, iris structure, the venous networks of his/her retina or palm, the morphology of his/her hand, and his/her facial features.
Each biometry is collected by at least one biometric sensor of a determined type, for example a microphone for detecting the voice of an individual. The sensor transforms the biometry that it detects into an analog signal. Owing to the technical features of the sensor (physical value detected, precision), the analog signal is defined over a continuous and limited observation range. In a known manner, this signal is sampled during a sampling step, for example over time; then, during a parameterization step, a contiguous set of samples is transformed, for example by applying a Fourier transform, in order to obtain at least one multidimensional vector of determined parameters defined in a first area of representation. An analog signal is thus represented in the first area of representation by a sequence of multidimensional parameter vectors. In the rest of the text, each occurrence of the term “biometric data” refers to a multidimensional parameter vector and each occurrence of the term “set or collection of biometric data” refers to one or several sequences of multidimensional parameter vectors.
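By way of non-limiting illustration, the sampling and parameterization steps described above may be sketched as follows. The frame length, the hop size, the Hann window and the choice of log-magnitude spectra as parameters are assumptions made for this example only; the function name `parameterize` and the synthetic tone are hypothetical.

```python
import numpy as np

def parameterize(signal, frame_len=256, hop=128):
    """Turn a sampled 1-D signal into a sequence of parameter vectors.

    Each contiguous frame of samples is windowed and mapped to the
    log-magnitude of its Fourier transform (an illustrative choice of
    parameters, not the only possible one).
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    vectors = []
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame))      # frame_len // 2 + 1 bins
        vectors.append(np.log(spectrum + 1e-10))   # small offset avoids log(0)
    return np.array(vectors)

# Synthetic stand-in for a sampled sensor signal:
# one second of a 440 Hz tone sampled at 8 kHz.
rate = 8000
t = np.arange(rate) / rate
features = parameterize(np.sin(2 * np.pi * 440 * t))
print(features.shape)  # (number of vectors, parameters per vector)
```

Each row of `features` plays the role of one “biometric data” in the sense defined above, and the whole array that of a sequence of multidimensional parameter vectors.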
The main methods of biometric authentication are based on the statistical modeling of the first area of representation in a second area of representation. This statistical modeling rests in particular on the hypothesis that a set of biometric data may be represented and classified in the second area of representation in the form of a probability distribution. This hypothesis is reasonable within certain limits that are not discussed here, and it has the advantage of defining each set of biometric data in a form that is easily manipulated mathematically. It is possible to define the distribution of biometric data in the second area of representation by a set of simple statistical distributions, each characterized by a limited set of parameters. In the case of a universal collection of biometric data, the set of distributions representing said universal collection constitutes a mean statistical model, called the universal statistical model. For example, in the field of voice recognition, the classification methods are based on the statistical modeling of the acoustic space in an area of representation defined by a mixture of Gaussian distributions, or GMM (Gaussian Mixture Model), in which a universal collection of voice signatures from a large number of speakers is classified. The universal statistical model thus constructed is well known by the initials UBM (Universal Background Model). The universal model resulting from this statistical modeling constitutes a mean reference from which individual models may be derived by means of an adapter or estimator known as Maximum A Posteriori (MAP), which adapts or estimates all or part of the parameters describing the universal model. In general, only the parameters corresponding to the means of the Gaussian distributions are adapted. The model thus adapted to an individual or a class of individuals represents the specificities of said individual or class of individuals.
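By way of non-limiting illustration, a MAP adaptation restricted to the means of the Gaussian distributions may be sketched as follows. The sketch assumes diagonal covariances and the commonly used relevance-factor form of the update; the relevance value of 16, the function names and the four-component one-dimensional universal model are assumptions made for the example, not values taken from the present description.

```python
import numpy as np

def log_gauss_diag(x, means, variances):
    """Log density of each frame under each diagonal-covariance Gaussian.

    x: (T, D); means, variances: (K, D). Returns an array of shape (T, K).
    """
    diff = x[:, None, :] - means[None, :, :]
    log_norm = np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
    return -0.5 * (log_norm + np.sum(diff ** 2 / variances[None, :, :], axis=2))

def map_adapt_means(x, weights, means, variances, relevance=16.0):
    """Adapt only the component means of a universal model to the data x."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(x, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)        # numerical stability
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)          # responsibilities, (T, K)
    n_k = post.sum(axis=0)                           # soft counts, (K,)
    data_mean = (post.T @ x) / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None]       # per-component adaptation weight
    return alpha * data_mean + (1.0 - alpha) * means

# Hypothetical universal model: four unit-variance components in one dimension.
weights = np.full(4, 0.25)
means = np.array([[-6.0], [-2.0], [2.0], [6.0]])
variances = np.ones((4, 1))

# Hypothetical individual data concentrated near a single component.
rng = np.random.default_rng(0)
x = rng.normal(2.8, 0.3, size=(50, 1))
adapted = map_adapt_means(x, weights, means, variances)
print(adapted.ravel())
```

In this example, only the component whose region contains most of the individual data receives a substantial soft count, so only its mean moves appreciably; components far from the data keep essentially the universal means.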
Because the universal model must comprise a large number of statistical distributions in order to be generic, and because an individual or class of individuals is characterized by only a limited collection of biometric data, only a small part of the statistical distributions, that is to say, of the parameters describing these distributions, is adapted to a given speaker or class. The statistical distributions composing the universal model are called “components” of said model. When a collection of biometric data must be compared with a given individual model, a likelihood score is obtained by comparing the likelihood of these data under the individual model with the likelihood of the same data under the universal model. The function corresponding to these likelihood scores is called the “likelihood ratio” and is generally expressed in the logarithmic domain in order to define the function called the “log-likelihood ratio” or LLR.
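By way of non-limiting illustration, the computation of a log-likelihood ratio between an individual model and the universal model may be sketched as follows. Averaging the log-likelihood over the frames is a normalization chosen for the example; the model values and function names are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Average per-frame log-likelihood of x (T, D) under a diagonal GMM."""
    diff = x[:, None, :] - means[None, :, :]
    log_comp = (np.log(weights)[None, :]
                - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
                - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))
    m = log_comp.max(axis=1, keepdims=True)
    frame_ll = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))  # log-sum-exp
    return frame_ll.mean()

def llr(x, individual, universal):
    """Log-likelihood ratio: individual model versus universal model."""
    return gmm_log_likelihood(x, *individual) - gmm_log_likelihood(x, *universal)

# Hypothetical models: (weights, means, variances); one mean differs.
universal = (np.full(4, 0.25),
             np.array([[-6.0], [-2.0], [2.0], [6.0]]),
             np.ones((4, 1)))
individual = (universal[0],
              np.array([[-6.0], [-2.0], [2.6], [6.0]]),
              universal[2])

score_near = llr(np.array([[2.8], [2.7], [2.9]]), individual, universal)
score_far = llr(np.array([[1.4], [1.5], [1.3]]), individual, universal)
print(score_near, score_far)
```

A positive score indicates that the data are more likely under the individual model than under the universal model, and a negative score the opposite; a decision is typically obtained by comparing the score with a threshold.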
Although this approach based on statistical modeling is powerful and widely used, it has limitations. FIG. 1 illustrates both the general principle of statistical modeling and its main limitations. More particularly, FIG. 1 illustrates the probability distribution function F of a universal statistical model such as a UBM (composed of only four components in a one-dimensional area of representation) and the probability distribution function G of an individual model derived from this universal model by MAP adaptation (adapting a single component of the universal model and, among the parameters describing this component, only its mean). FIG. 1 also illustrates the LLR function, referenced L, relating the considered individual model to the universal model. The comments below, made with reference to FIG. 1, help in understanding the limitations of the statistical modeling approach.
Point M in FIG. 1 represents the mean of the adapted component of the individual model. This mean represents information specific to the individual, obtained from the individual's training biometric data. Point E1a represents the local maximum of the probability distribution function G of the individual model nearest to point M, and point E1b represents the local minimum of the probability distribution function G of the individual model nearest to point M. It is observed that points E1a and E1b do not have the same abscissa as point M, nor are they symmetrically distributed around point M. This is due not only to the MAP adaptation, but also to the limitation of the number of components of the universal statistical model. The latter is directly linked to the need for reliable statistical estimates, which make it possible to obtain an individual model in which a significant proportion of the components have parameters that vary significantly with respect to those of the universal model from which it is derived.
Still with reference to FIG. 1, the log-likelihood ratio shows a maximum E2a and a minimum E2b that are distant from point M. Two points H and E of the log-likelihood ratio, with slightly different abscissas x(H) and x(E) such that the distance x(H)-x(E) is of the order of magnitude of the distance x(E2a)-x(M), give a positive score and a negative score, respectively. Thus, a small variation or error on the x-axis may result in different decisions as to whether the individual model, described by the probability distribution function G, is likely or not. It thus appears that the decision is not directly linked to the information specific to the individual.
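By way of non-limiting illustration, the sensitivity described above may be reproduced numerically on a one-dimensional, four-component model of the kind shown in FIG. 1. All numerical values below are hypothetical and are not read from FIG. 1; the two test abscissas 2.2 and 2.4 merely play the roles of points E and H.

```python
import numpy as np

def mixture_pdf(x, means, var=1.0):
    """Equal-weight 1-D Gaussian mixture density evaluated at each x."""
    x = np.asarray(x, dtype=float)[:, None]
    comp = np.exp(-0.5 * (x - means[None, :]) ** 2 / var) / np.sqrt(2 * np.pi * var)
    return comp.mean(axis=1)

# Hypothetical models: the individual model differs by one adapted mean, M = 2.6.
universal_means = np.array([-6.0, -2.0, 2.0, 6.0])
individual_means = np.array([-6.0, -2.0, 2.6, 6.0])

def llr_curve(x):
    """Log-likelihood ratio of the individual model against the universal model."""
    return np.log(mixture_pdf(x, individual_means)) - np.log(mixture_pdf(x, universal_means))

grid = np.linspace(-8.0, 8.0, 1601)
curve = llr_curve(grid)
x_max = grid[np.argmax(curve)]   # abscissa of the LLR maximum
x_min = grid[np.argmin(curve)]   # abscissa of the LLR minimum

# Two nearby abscissas on either side of the sign change of the curve.
score_e, score_h = llr_curve([2.2, 2.4])
print(f"M = 2.6, LLR maximum at x = {x_max:.2f}, minimum at x = {x_min:.2f}")
print(f"LLR(2.2) = {score_e:+.3f}, LLR(2.4) = {score_h:+.3f}")
```

With these assumed values, the extrema of the curve lie well away from the adapted mean, and the curve changes sign midway between the original and adapted means, so that two abscissas separated by a distance smaller than the offset between M and the maximum yield scores of opposite sign.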
These observations illustrate that the standard statistical modeling method lacks robustness with respect to a shift caused by a small variation or error in the biometric data, since the effect of this shift on the decision may be critical. It is to be noted that such a shift may simply be due to one of the sources of variability (or noise), for example the use of different microphones to collect voice signatures or the detection of voice signatures in different acoustic environments (enclosed or outside). These sources of variability are well known in the biometric authentication field, and many prior art documents, for example international application WO 2010/049695 or international application WO 2007/131530, put forward solutions intended to increase the robustness of the standard statistical modeling approach with respect to these sources of variability, without however calling this approach into question.
Various developments of the statistical modeling approach presented above are also known from the prior art. These developments are briefly described below, particularly with reference to FIG. 2.
A first development of the standard statistical modeling approach consists in considering the universal statistical model (USM) as defining an area of representation for new data, each obtained by concatenating the mean parameters of each of the components constituting said model, these new data being known as super vectors (SV). This area, called the super vector space, makes it possible to use support vector machines (SVM), which are a set of supervised learning techniques intended to solve discrimination and regression problems.
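By way of non-limiting illustration, the construction of a super vector by concatenation of the component means may be sketched as follows. The three two-dimensional means are hypothetical; in practice the means would be those of a model adapted to a set of biometric data.

```python
import numpy as np

def supervector(means):
    """Concatenate the K component means, given as (K, D), into one vector of length K * D."""
    return np.asarray(means, dtype=float).reshape(-1)

# Hypothetical adapted means: 3 components in a 2-dimensional area of representation.
adapted_means = np.array([[0.0, 1.0],
                          [2.0, 3.0],
                          [4.0, 5.0]])
sv = supervector(adapted_means)
print(sv.shape)
```

A whole set of biometric data is thus represented by a single point whose dimension is the number of components multiplied by the dimension of a parameter vector, which is the form of input expected by a support vector machine.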
A second development of the standard statistical modeling approach consists in directly modeling the variability of sessions in the super vector space by using joint factor analysis, known by the initials JFA.
More recently, the concept of total variability space (TS) has been introduced, which consists in modeling the total variability in the super vector space so as to construct a smaller space that concentrates the information and in which it is easier to jointly model the variability of sessions and of speakers. More particularly, each vector of the super vector space corresponds to a vector, called an iVector (iV), in the total variability space.
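By way of non-limiting illustration, the extraction of a low-dimensional vector from the statistics of a set of biometric data may be sketched as follows, using the posterior-mean estimate commonly employed in the total variability literature. The present description does not specify the extraction formula, so this form is an assumption; the total variability matrix used here is a random placeholder rather than a trained matrix, and all names and dimensions are hypothetical.

```python
import numpy as np

def extract_ivector(n_k, f_centered, t_matrix, variances):
    """Point estimate of the low-dimensional vector for one set of data.

    n_k:        (K,)      soft counts per component
    f_centered: (K, D)    first-order statistics centered on the universal means
    t_matrix:   (K*D, R)  total variability matrix (assumed already trained)
    variances:  (K, D)    diagonal covariances of the universal model
    """
    k, d = f_centered.shape
    r = t_matrix.shape[1]
    inv_var = (1.0 / variances).reshape(-1)        # diagonal of the inverse covariance
    n_rep = np.repeat(n_k, d)                      # each count repeated over its D dimensions
    tt_sigma = t_matrix.T * inv_var[None, :]       # (R, K*D)
    precision = np.eye(r) + (tt_sigma * n_rep[None, :]) @ t_matrix
    return np.linalg.solve(precision, tt_sigma @ f_centered.reshape(-1))

# Synthetic check: statistics generated from a known low-dimensional vector.
rng = np.random.default_rng(0)
K, D, R = 4, 3, 2
T = rng.normal(size=(K * D, R))                    # placeholder, not a trained matrix
w_true = np.array([0.5, -1.0])
counts = np.full(K, 1e6)                           # many frames per component
stats = counts[:, None] * (T @ w_true).reshape(K, D)
w = extract_ivector(counts, stats, T, np.ones((K, D)))
print(w)
```

The super vector space here has K * D dimensions whereas the extracted vector has only R, which reflects the concentration of information into a smaller space described above.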
Several comments can be made regarding the latter approaches. Firstly, the super vector space has a very large number of dimensions (currently hundreds of thousands of dimensions), which makes its joint factor analysis difficult. Secondly, the extraction of the iVector corresponds precisely to a reduction of the dimensionality to only a few thousand dimensions, which mainly makes it possible to apply simpler approaches in order to differentiate the variability of sessions from the variability of speakers; the iVector extraction techniques are similar to the joint factor analysis techniques because both types of technique work in the same type of space, the super vector space. Thirdly, the super vector space in the approach by statistical modeling of a universal collection of biometric data with a Gaussian mixture model, and the reduced-dimensionality space associated with the extraction of the iVector, are constructed according to the same scheme: a re-parameterization step that takes a set of biometric data (BD) as input and produces as output a single vector representing the input set. FIG. 2 shows these two re-parameterization steps in simplified form and illustrates the associated reduction in the size of the data.
The statistical modeling approach using super vectors and/or the total variability space thus has advantages, but also has at least two limitations, which are discussed hereinafter.
First, as a set of biometric data is represented by a single point in the area of representation, it is difficult with this approach to exploit temporal or sequential information, for example in an unsupervised learning setting, or to apply this approach to other tasks, such as indexing or segmenting the analog signal provided by the biometric sensor.
Second, the super vector space and the total variability space are both based on the concept of global and general information: information is considered important because it appears frequently. Consequently, these spaces do not intrinsically take into account the discriminatory specificities of an individual or a class of individuals, whereas the purpose of any biometric classification is precisely to take these discriminatory specificities into consideration. In this regard, these spaces have the drawbacks stated in the comments made with reference to FIG. 1.