In the description, the following definitions and parameters are used.
The term “cepstrum” denotes a vector representing the spectral content extracted from a speech or audio signal, under the hypothesis that the signal is produced by a source-filter model. Centroids are vectors, each representative of a class of cepstral vectors. The set of these centroids constitutes a dictionary, obtained for example by applying a learning algorithm known to those skilled in the art; example algorithms are given in the following description.
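The definitions above can be illustrated with a minimal sketch. It assumes that the cepstral vector is the real cepstrum of a signal frame (the inverse DFT of the log-magnitude spectrum, truncated to its first coefficients), and it uses k-means (Lloyd's algorithm) as a stand-in learning algorithm for building the centroid dictionary. Neither choice is taken from this description, which gives its own example algorithms later. The function names `real_cepstrum`, `build_dictionary` and `quantize` are hypothetical, and a direct O(N²) DFT is used only to keep the sketch free of dependencies.

```python
import cmath
import math
import random


def real_cepstrum(frame, n_coeffs):
    """Real cepstrum c[q] = IDFT(log|DFT(x)|)[q], truncated to n_coeffs terms (illustrative)."""
    n = len(frame)
    # Log-magnitude spectrum via a direct DFT (adequate for short illustrative frames).
    log_mag = []
    for k in range(n):
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        log_mag.append(math.log(abs(s) + 1e-12))  # small floor avoids log(0)
    # Inverse DFT of the log-magnitude; for a real frame the result is real up to rounding.
    ceps = []
    for q in range(n_coeffs):
        v = sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)) / n
        ceps.append(v.real)
    return ceps


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_dictionary(vectors, n_centroids, n_iter=20, seed=0):
    """Stand-in learning algorithm (k-means): returns n_centroids centroid vectors."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_centroids)]
    for _ in range(n_iter):
        # Assignment step: each cepstral vector joins the class of its nearest centroid.
        clusters = [[] for _ in range(n_centroids)]
        for v in vectors:
            j = min(range(n_centroids), key=lambda i: squared_distance(v, centroids[i]))
            clusters[j].append(v)
        # Update step: each centroid becomes the mean of its class; empty classes keep the old one.
        for i, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                centroids[i] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


def quantize(vector, centroids):
    """Index of the dictionary centroid nearest to the given cepstral vector."""
    return min(range(len(centroids)), key=lambda i: squared_distance(vector, centroids[i]))


if __name__ == "__main__":
    # Synthetic frames from two tone classes plus a little noise (not real speech).
    rng = random.Random(1)
    frames = [
        [math.sin(2 * math.pi * (0.05 if f % 2 else 0.2) * t) + 0.01 * rng.gauss(0, 1)
         for t in range(32)]
        for f in range(20)
    ]
    cepstra = [real_cepstrum(fr, 12) for fr in frames]
    dictionary = build_dictionary(cepstra, 2)
    print([quantize(c, dictionary) for c in cepstra])
```

In practice the cepstra would be computed with an FFT on windowed frames of real speech, often on a mel-warped spectrum (MFCCs), and the dictionary would contain many more centroids; the sketch only shows how cepstral vectors, classes and centroids relate.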
Identification and authentication systems using biometric parameters are currently very widespread. Of all biometric modalities, speaker recognition is the one most readily accepted by users, because it is non-intrusive and does not require any contact with the system reader. The same holds for authentication systems based on iris recognition. Moreover, speaker recognition is particularly suitable for applications implemented over telephone networks, since it permits remote and centralized processing on a server. The variation between the voices of different individuals originates from three distinct factors: morphological differences, physiological differences and socio-cultural differences. The first of these factors changes during adolescence but then stabilizes. The other factors are not stable and can vary over time. These factors, combined with distortions due to environmental noise and with the quality of the voice recognition device or of the voice recording, cause large variations between recordings of the same speaker. This makes it more difficult to recognize an individual during authentication.
Despite these limiting factors, there are many applications for which voice-based authentication remains the most appropriate. Examples include voice recognition on mobile phones and the associated services, such as consulting bank details, carried out in complete security without any fear that a malicious individual will obtain data characterizing the user. (The use of biometric data imposes a stringent requirement that the user be physically present, and is therefore more robust than the use of a password alone.)
There is currently a need for a system allowing precise authentication of the speaker without storing data likely to reveal his identity or information about his private life.
The article by Monrose et al. entitled “Cryptographic Key Generation from Voice”, which appeared in the Proceedings of the 2001 IEEE Symposium on Security and Privacy (May 2001) and is incorporated herein by reference, describes a system for generating cryptographic keys from the voice of an individual. Although effective, this system has the disadvantage of requiring a database in which information characterizing the speaker can be stored.