My invention relates to speech analysis and, more particularly, to recognition systems for identification or verification of a speaker on the basis of selected acoustic parameters unique to his person.
It is often important to identify or verify the identity of an individual from physical characteristics related to his speech. Such a procedure is desirable for transactions conducted over the telephone, for rapid credit verification, or for security arrangements such as controlled admittance to secured areas. Priorly, automatic speaker recognition systems have been based on the comparison of a predetermined spoken message with a previously stored reference of the same or a similar message, or on a comparison of selected parameters of particular utterances made by the individual with previously stored parameters of corresponding utterances. Such parameters may be derived from speech characteristics such as pitch period, intensity, a particular formant frequency or its bandwidth, or some property of the glottal wave.
In one system, such as that disclosed in U.S. Pat. No. 3,466,394, issued to W. K. French on Sept. 9, 1969, selected peaks and valleys of each pitch period are utilized to obtain characteristic coordinates of a voiced input of an unknown speaker, which coordinates are selectively compared against one or more sets of previously stored reference coordinates. As a result of the comparison, a decision is made as to the identity of the unknown speaker. This arrangement, however, requires that the characteristic coordinates be normalized with respect to intensity to prevent errors occasioned by the individual's use of an intensity different from that used when the reference coordinates were obtained.
Another arrangement, such as that disclosed in G. R. Doddington et al. U.S. Pat. No. 3,700,815, issued Oct. 24, 1972 and assigned to the same assignee, compares the characteristic way an individual utters a test sentence with a previously stored utterance of the same sentence. This comparison, however, requires a temporal alignment of the test and reference utterances. Accordingly, the time scale of the test utterance is warped to bring it into time registration with the reference sentence before the comparison is made.
These and other techniques presently used are based on characteristics of speech that are dependent on the content of the utterance. A more effective method can be based on a speaker recognition feature that reflects the unique properties of the speaker's vocal apparatus and not the content of the utterance. Speech analysis based on the linear predictability of the speech waveform provides a set of characteristics that are desirable for automatic speaker recognition. These characteristics represent combined information about the formant frequencies, their bandwidths, and the glottal wave, and are substantially independent of pitch and intensity information.
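By way of illustration only, the following is a minimal sketch of how linear prediction characteristics might be computed for one frame of sampled speech. It assumes the conventional autocorrelation method solved by the Levinson-Durbin recursion, which is one common formulation and is not stated in the text above. The function name lpc_coefficients, the frame length, the prediction order, and the synthetic test signal are all hypothetical choices made for the example.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Estimate predictor coefficients a[1..order] so that
    s[n] is approximated by sum_k a[k] * s[n - k].

    Illustrative sketch: autocorrelation method with the
    Levinson-Durbin recursion. Assumes the frame is not all zeros.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)

    # Autocorrelation of the frame at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])

    # Coefficients of the prediction-error filter 1 + a[1] z^-1 + ...
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction-error energy at order 0

    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err

        # Update the lower-order coefficients, then append the new one.
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k

    # Predictor coefficients are the negated error-filter taps.
    return -a[1:]


if __name__ == "__main__":
    # Hypothetical test signal: a second-order autoregressive process.
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(4000)
    s = np.zeros(4000)
    for t in range(2, 4000):
        s[t] = 1.3 * s[t - 1] - 0.6 * s[t - 2] + noise[t]

    quiet = lpc_coefficients(s, order=2)
    loud = lpc_coefficients(5.0 * s, order=2)
    print("coefficients:", quiet)
    print("unchanged when the signal is scaled:", np.allclose(quiet, loud))
```

In this sketch, multiplying the frame by a constant scales every autocorrelation lag by the same factor, so the reflection coefficients and hence the predictor coefficients are unaffected. This is consistent with the statement above that the linear prediction characteristics are substantially independent of intensity.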
A speaker recognition arrangement based on comparison of linear prediction characteristics of an unidentified speaker with previously stored linear prediction characteristics of known speakers is not restricted to selected speech features such as formant frequencies and the glottal wave. Thus, the linear prediction characteristics can form a more complete basis for speaker recognition. The use of linear prediction characteristics for speaker recognition, however, generally requires segmentation or time normalization, since the characteristics include both linguistic and speaker-dependent information.
It is an object of the invention to provide speaker recognition which is substantially independent of the linguistic content of the speech signal and avoids alignment of signal characteristics.