A speaker verification device has to ascertain that the identity claimed by a speaker corresponds to the speaker's true identity. The speaker utters a standard sentence, from which the device extracts characteristic speech parameters. These are compared with the average speech parameters of the same speaker, obtained in a previous training phase in which that speaker repeated the same standard sentence many times. The comparison is carried out by calculating the probability that the sentence just spoken belongs to that speaker. If the probability value is greater than a certain threshold, the device considers the speaker to be verified.
Known speaker verification devices, for example the one described in the paper "A low cost speaker verification device" presented by M. H. Kuhn and R. Geppert at the Carnahan Conference on Crime Countermeasures, held at the Univ. of Kentucky, Lexington, 14-16 May 1980, generally consist of the following circuit units:
a parameter extraction unit, which divides each sentence just spoken into time intervals of suitable fixed duration and calculates the energy associated with the signal for each interval and for each frequency band of the speech spectrum, obtaining an energy vector for each time interval. It then averages the energy vectors of all intervals, thus obtaining a vector of average parameters in which each component relates to one frequency band;
a unit determining the distribution of the average speaker parameters over several repetitions of the same sentence. This unit works during the training phase and prepares distribution histograms, one for each frequency band, of the average energy levels obtained for each sentence. A histogram memory is created for each speaker;
a unit for probability calculation. This unit works during the verification phase. For each frequency band, it determines at which point of the related histogram, read from the speaker memory, the new average parameter value just calculated by the parameter extraction unit falls, and it assigns a corresponding probability that the new value belongs to the speaker. It then multiplies all the probability values and compares the product with a fixed threshold value.
The known devices have the following disadvantages:
it is difficult to establish the real instants of beginning and end of the sentence, which is necessary to prevent the parameter extraction unit from also considering time intervals in which only noise is present;
the real sentence duration changes at each repetition by the speaker, so that the fixed number of acoustical events (characteristic of a given sentence) is divided into a variable number of time intervals. During the various repetitions of the same sentence, the same events therefore have different weights, and the validity of the time average function is reduced; and
a fixed probability threshold disadvantages, in the verification process, those speakers whose histograms of average parameter distribution have a higher variance, i.e. a higher dispersion of average parameter values during the repetitions of the same sentence, so that for them the probability of failed verification increases.
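The three circuit units listed above (parameter extraction, histogram training and probability calculation) might be sketched in software as follows. This is only an illustrative sketch, not the circuit of the cited paper: it assumes that the per-band energies of each time interval are already available as lists of numbers, that all bands share the same histogram bin edges, and that the probability assigned to a value is the relative frequency of the bin it falls into. All function names, the bin edges and the threshold are hypothetical.

```python
from bisect import bisect_right
from math import prod


def average_band_energies(frames):
    """Average the per-band energy vectors over all time intervals of one sentence."""
    n_bands = len(frames[0])
    return [sum(f[b] for f in frames) / len(frames) for b in range(n_bands)]


def bin_index(value, edges):
    """Index of the histogram bin containing value (out-of-range values are clamped)."""
    i = bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


def train_histograms(sentence_averages, edges):
    """Training phase: one normalized histogram per frequency band."""
    n_bands = len(sentence_averages[0])
    hists = [[0.0] * (len(edges) - 1) for _ in range(n_bands)]
    for avg in sentence_averages:
        for b, value in enumerate(avg):
            hists[b][bin_index(value, edges)] += 1.0 / len(sentence_averages)
    return hists


def verify(avg, hists, edges, threshold):
    """Verification phase: multiply the per-band probabilities, compare with a fixed threshold."""
    score = prod(hists[b][bin_index(v, edges)] for b, v in enumerate(avg))
    return score, score > threshold


# Assumed toy data: four training repetitions, two frequency bands, two intervals each.
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
repetitions = [
    [[1.2, 2.5], [1.6, 2.7]],
    [[1.1, 2.2], [1.5, 2.4]],
    [[1.8, 3.1], [2.4, 3.3]],
    [[1.0, 2.0], [1.4, 2.8]],
]
training = [average_band_energies(r) for r in repetitions]
hists = train_histograms(training, edges)
accepted = verify([1.5, 2.5], hists, edges, threshold=0.3)
rejected = verify([3.5, 0.5], hists, edges, threshold=0.3)
```

In this toy run the first test vector falls into the most populated bin of both bands and is accepted, while the second falls into empty bins and is rejected. Because the per-band probabilities are multiplied, a single empty bin drives the product to zero.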
Some known methods for determining the beginning and end points of the sentence are essentially based on measuring the speech signal energy.
A first method compares the energy of the speech signal with a threshold value, possibly adapted to the background noise initially present in the environment.
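A minimal sketch of such a single-threshold comparison is given below. It assumes that the energy of each time interval is already available, that the first few intervals contain only background noise, and that the threshold is a fixed multiple of the average noise energy. The function name and the parameters `noise_frames` and `factor` are hypothetical choices made for illustration.

```python
def detect_endpoints_single(energies, noise_frames=3, factor=4.0):
    """Return (first, last) interval index whose energy exceeds a noise-adapted threshold."""
    # Estimate the background noise from the initial intervals (assumed speech-free).
    noise = sum(energies[:noise_frames]) / noise_frames
    threshold = factor * noise
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]


clean = detect_endpoints_single([1, 1, 1, 2, 9, 12, 10, 8, 2, 1])
with_spike = detect_endpoints_single([1, 1, 1, 20, 1, 9, 12, 10, 1, 1])
```

With the clean input the detected sentence spans intervals 4 to 7. In the second input a single high-energy interval at index 3 moves the detected beginning to that interval, which illustrates how sensitive a pure energy comparison is to isolated noise peaks.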
Another method, described in the article "An algorithm for determining the endpoints of isolated utterances" by L. R. Rabiner and M. R. Sambur, The Bell System Technical Journal, Vol. 54, No. 2, February 1975, proposes a comparison between the speech signal energy and two thresholds of different value. The beginning or end of the sentence is placed at the point where the lower threshold is crossed, provided that the higher threshold is crossed before the lower threshold is crossed again.
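The two-threshold rule as summarized here can be sketched as follows. This is a simplified reading of the description above, not the complete published algorithm; the per-interval energies, the threshold values `low` and `high`, and the function names are assumptions. The end point is found by applying the same search to the time-reversed energy sequence.

```python
def find_start(energies, low, high):
    """First index where energy rises above `low` and reaches `high` before dropping back."""
    i, n = 0, len(energies)
    while i < n:
        if energies[i] > low:
            j = i
            # Follow the excursion above the lower threshold.
            while j < n and energies[j] > low:
                if energies[j] > high:
                    return i  # candidate confirmed by the higher threshold
                j += 1
            i = j  # excursion ended without reaching `high`; keep searching
        else:
            i += 1
    return None


def find_endpoints(energies, low, high):
    """Apply the same rule forward for the beginning and backward for the end."""
    start = find_start(energies, low, high)
    if start is None:
        return None
    end = len(energies) - 1 - find_start(energies[::-1], low, high)
    return start, end


endpoints = find_endpoints([0, 0, 3, 1, 0, 3, 6, 9, 7, 3, 1, 0], low=2, high=5)
```

In this example the brief excursion at index 2 exceeds only the lower threshold and is discarded, so the sentence is placed between indices 5 and 9.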
If the sentence begins and/or ends with a consonant, the time intervals adjacent to the sentence so delimited are added to it. These intervals are identified from the number of zero crossings of the acoustical signal within them.
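The zero-crossing refinement can be illustrated with the sketch below, which moves an energy-based starting point backward over adjacent intervals whose zero-crossing count is high, as is typical of unvoiced consonants. The zero-crossing threshold, the maximum look-back and the rule of stopping at the first interval that does not exceed the threshold are assumptions made for this example; the text above does not specify them. The end point can be extended forward in the same way.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples of one time interval."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def extend_start(frames, start, zc_threshold, max_lookback=25):
    """Move the start back over contiguous intervals with many zero crossings."""
    first = max(start - max_lookback, 0)
    new_start = start
    for i in range(start - 1, first - 1, -1):
        if zero_crossings(frames[i]) <= zc_threshold:
            break  # interval looks like silence or noise, stop extending
        new_start = i
    return new_start


# Assumed toy signal: silence, two consonant-like intervals, then a voiced interval.
frames = [
    [1, 1, 1, 1, 1, 1],
    [1, -1, 1, -1, 1, -1],
    [1, -1, 1, -1, 1, -1],
    [5, 6, 5, -5, -6, -5],
]
new_start = extend_start(frames, start=3, zc_threshold=3)
```

Here the energy-based start at interval 3 is moved back to interval 1, since intervals 1 and 2 each contain five zero crossings while interval 0 contains none.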
All these methods have the drawback that an unexpected peak of high-energy noise is interpreted as a beginning or end point of a sentence.