1. Technical Field of the Invention
The present invention relates to a technology for estimating the degree of similarity between voices using so-called inter-band correlation matrices, and to a technology for authenticating or identifying speakers using that estimation technology.
2. Description of the Related Art
To authenticate or identify a speaker, it is necessary to estimate the degree of similarity between a voice generated by the speaker and voices that have been previously obtained from specific speakers. In a general method for estimating the degree of similarity between voices, features of the voices to be compared are quantified into feature quantities, and the degree of similarity between the voices is estimated by comparing the feature quantities obtained from the voices. Non-Patent Reference 1 describes a technology in which inter-band correlation matrices are used as the feature quantities of voices to perform speaker identification. The inter-band correlation matrix obtained from a voice is a matrix whose elements are correlation values between envelope components of the voice in multiple bands into which the spectral data of the voice is divided. The elements of an inter-band correlation matrix obtained from a speaker's voice are not substantially affected by what is uttered and, instead, depend significantly on the speaker. Inter-band correlation matrices with similar elements are therefore obtained from voices uttered by the same speaker, regardless of what is uttered. Accordingly, speakers can be authenticated or identified using the inter-band correlation matrices as feature quantities of their voices.
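The definition above can be illustrated with a minimal sketch. It is not the method of Non-Patent Reference 1; it assumes that the spectral data is already available as a list of magnitude frames, that the bands are of equal width, that the envelope of a band is the sum of its magnitudes in each frame, and that the correlation value is the Pearson coefficient. The function names are hypothetical.

```python
import math


def band_envelopes(spectrogram, num_bands):
    """Return one envelope (a time series) per band.

    spectrogram: list of frames, each a list of non-negative magnitudes.
    Assumption: equal-width bands; leftover bins at the top are ignored.
    """
    num_bins = len(spectrogram[0])
    width = num_bins // num_bands  # number of bins per band
    envelopes = []
    for b in range(num_bands):
        lo, hi = b * width, (b + 1) * width
        # Envelope value of band b in a frame: sum of its magnitudes.
        envelopes.append([sum(frame[lo:hi]) for frame in spectrogram])
    return envelopes


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def inter_band_correlation_matrix(spectrogram, num_bands):
    """Matrix whose element (i, j) correlates the envelopes of bands i and j."""
    env = band_envelopes(spectrogram, num_bands)
    return [[pearson(env[i], env[j]) for j in range(num_bands)]
            for i in range(num_bands)]
```

The resulting matrix is symmetric, and each diagonal element is 1 whenever the corresponding envelope is not constant over time.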
[Non-Patent Reference 1] Kazama Michiko, Higashiyama Mikio, and Yamazaki Yoshio, “Talker Identification Using Narrow-Band Envelope Correlation Matrix,” the Institute of Electronics, Information and Communication Engineers, March 2002.
[Non-Patent Reference 2] K.-P. Li and G. W. Hughes, “Talker differences as they appear in correlation matrices of continuous speech spectra,” J. Acoust. Soc. Am., Vol. 55, No. 4, April 1974.
The inter-band correlation matrix used in the technology described in Non-Patent Reference 1 includes, as its elements, many correlation values between envelope components of the voice in bands that are contiguous (not discrete) along the frequency axis. However, the correlation between envelope components of the voice in bands that are adjacent to each other along the frequency axis is high regardless of the speaker. The inter-band correlation matrix used in Non-Patent Reference 1 therefore includes elements that do not express differences between individuals, which reduces the accuracy with which the degree of similarity between voices is estimated.
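To make concrete which elements are meant, the following sketch separates the adjacent-band elements of such a matrix from the remaining off-diagonal elements. It assumes that the rows and columns are ordered by frequency and that the bands are contiguous, so that the adjacent-band correlations lie on the first off-diagonal. The function name is hypothetical, and the example matrix contains made-up values, not measured data.

```python
def split_by_adjacency(matrix):
    """Split the upper-triangle elements into adjacent-band and other pairs.

    Assumption: row/column k corresponds to the k-th band along the
    frequency axis, so element (i, i + 1) correlates two adjacent bands.
    """
    n = len(matrix)
    adjacent = [matrix[i][i + 1] for i in range(n - 1)]
    non_adjacent = [matrix[i][j] for i in range(n) for j in range(i + 2, n)]
    return adjacent, non_adjacent


# Illustrative 4-band matrix with made-up values (not measured data).
example = [
    [1.0, 0.9, 0.3, 0.1],
    [0.9, 1.0, 0.8, 0.2],
    [0.3, 0.8, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
adjacent, non_adjacent = split_by_adjacency(example)
print(adjacent)      # [0.9, 0.8, 0.7]
print(non_adjacent)  # [0.3, 0.1, 0.2]
```

In terms of the passage above, the values in `adjacent` are the ones that tend to be high for any speaker and therefore carry little information about the individual.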