The invention relates to speech analysis and more particularly to arrangements for generating signals representative of acoustic features of speech patterns of known speakers.
Digital speech coding is widely used to compress speech signals for efficient storage and transmission over communication channels and for use in automatic speech and voice recognition as well as speech synthesis. Coding arrangements generally involve partitioning a speech pattern into short time frame intervals and forming a set of speech parameter signals for each successive interval. One such digital speech coding system is disclosed in U.S. Pat. No. 3,624,302 issued to B. S. Atal, Nov. 30, 1971. The arrangement therein includes linear prediction analysis of a speech signal in which the speech signal is partitioned into successive time frame intervals of 5 to 20 milliseconds duration, and a set of parameters representative of the speech portion of each time interval is generated. The parameter signal set includes linear prediction coefficient signals representative of the spectral envelope of the speech in the time interval, and pitch and voicing signals corresponding to the speech excitation. These parameter signals are encoded at a much lower bit rate than the speech waveform itself for efficient storage, transmission, or comparison with previously stored templates to identify the speech pattern or the speaker. A replica of the original speech pattern may be formed from the parameter signal codes by synthesis in apparatus that generally comprises a model of the vocal tract in which the excitation pulses of each successive interval are modified by the interval spectral envelope prediction coefficients in an all-pole predictive filter. Spectral or other types of speech parameter signals may be utilized in speech coding using similar techniques.
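By way of illustration only, the frame-by-frame linear prediction analysis described above can be sketched as follows. The sketch assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common way of obtaining linear prediction coefficients and is not necessarily the procedure of the cited patent. The function names, the 8 kHz sampling rate, and the 20 millisecond frame length are hypothetical choices made for the example.

```python
def split_frames(signal, frame_len):
    """Partition a signal into successive, non-overlapping frames."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def autocorrelation(frame, order):
    """Return the autocorrelation lags r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation lags r.

    Returns (a, err): a is the prediction-error filter
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order with a[0] == 1.0,
    and err is the residual (prediction error) energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


# Hypothetical usage: 8 kHz sampling, 20 ms frames (160 samples), order 2.
SAMPLE_RATE_HZ = 8000
FRAME_LEN = SAMPLE_RATE_HZ * 20 // 1000
signal = [0.9 ** t for t in range(2 * FRAME_LEN)]  # synthetic decaying signal
frames = split_frames(signal, FRAME_LEN)
lpc_per_frame = [levinson_durbin(autocorrelation(f, 2), 2)[0] for f in frames]
```

Each entry of `lpc_per_frame` is one coefficient vector per frame. A practical coder would typically also window each frame and estimate pitch and voicing, which the sketch omits.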
Further compression of the speech pattern waveform may be achieved through vector quantization techniques well known in the art. In linear prediction analysis, the speech parameter signals for a particular time frame interval form a multidimensional vector and a large collection of such feature signal vectors can be used to generate a much smaller set of vector quantized feature signals that cover the range of the larger collection. One such arrangement is described in the article "Vector Quantization in Speech Coding" by John Makhoul et al. appearing in the Proceedings of the IEEE, Vol. 73, No. 11, November 1985, pp. 1551-1588. Vector quantization is particularly useful in speaker recognition arrangements where feature signals obtained from a number of individuals must be stored so that they may be identified by an analysis of voice characteristics. A set of vector quantized feature signals may be generated from sets of I feature vectors a(1), a(2), . . . , a(I) obtained from many processed utterances. The feature vector space can be partitioned into subspaces $S_1, S_2, \ldots, S_M$. $S$, the whole feature space, is then represented as
$$S = S_1 \cup S_2 \cup \cdots \cup S_M. \qquad (1)$$
Each subspace $S_i$ forms a nonoverlapping region, and every feature vector inside $S_i$ is represented by a corresponding centroid feature vector b(i) of $S_i$. The partitioning is performed so that the average distortion
$$D = \frac{1}{I} \sum_{j=1}^{I} \min_{1 \le i \le M} d\bigl(a(j), b(i)\bigr) \qquad (2)$$
is minimized over the whole set of original feature vectors. Using linear prediction coefficient (LPC) vectors as acoustic features, the likelihood ratio distortion between any two LPC vectors a and b may be expressed as
$$d_{LR}(a, b) = \frac{b^{T} R_a\, b}{a^{T} R_a\, a} - 1 \qquad (3)$$
where $R_a$ is the autocorrelation matrix of the speech input associated with vector a. The distortion measure of equation 3 may be used to generate speaker-based VQ codebooks of different sizes. Such codebooks of quantized feature vectors may be used as reference features to which other feature vectors are compared for speech recognition, speech transmission, or speaker verification.
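As a hedged illustration of how such a distortion measure and codebook might be evaluated, the sketch below assumes the commonly used form of the likelihood ratio distortion, d(a, b) = (b' R_a b) / (a' R_a a) - 1, with R_a taken as the symmetric Toeplitz matrix built from the autocorrelation lags of the input frame. The function names and the representation of a frame as an (LPC vector, autocorrelation lags) pair are assumptions made for the example.

```python
def quadratic_form(v, r):
    """Compute v' R v, where R is the Toeplitz matrix R[i][j] = r[|i - j|]."""
    n = len(v)
    return sum(v[i] * v[j] * r[abs(i - j)]
               for i in range(n) for j in range(n))


def likelihood_ratio_distortion(a, b, r_a):
    """Distortion of codebook vector b against input LPC vector a.

    r_a holds the autocorrelation lags of the speech frame from which a
    was derived; both a and b are prediction-error filters with leading 1.
    """
    return quadratic_form(b, r_a) / quadratic_form(a, r_a) - 1.0


def nearest_codeword(a, r_a, codebook):
    """Return (index, distortion) of the closest codebook entry to a."""
    dists = [likelihood_ratio_distortion(a, b, r_a) for b in codebook]
    idx = min(range(len(dists)), key=dists.__getitem__)
    return idx, dists[idx]


def average_distortion(frames, codebook):
    """Average nearest-codeword distortion over (a, r_a) frame pairs."""
    total = sum(nearest_codeword(a, r_a, codebook)[1] for a, r_a in frames)
    return total / len(frames)


# Hypothetical usage with two frames and a two-entry codebook.
frames = [([1.0, -0.9, 0.0], [1.0, 0.9, 0.81]),
          ([1.0, -0.5, 0.0], [1.0, 0.5, 0.25])]
codebook = [[1.0, -0.9, 0.0], [1.0, -0.5, 0.0]]
print(average_distortion(frames, codebook))
```

Codebook design would then iterate between assigning each training vector to its nearest entry and recomputing the entries so that this average decreases; the sketch covers only the evaluation step.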
One problem encountered in the use of speech feature signals relates to the fact that speech patterns of individuals change over time. Such changes can result from temporary or permanent changes in vocal tract characteristics or from environmental effects. Consequently, stored speech feature signals that are used as references for speech processing equipment may differ significantly from feature signals later obtained from the same individuals even for the same speech messages. Where vector quantization is utilized, changes may be such that a codebook entry cannot be found within a reasonable distortion range of one or more input speech feature signals. In this event, prior art arrangements have generally required that a new codebook be formed. But formation of a new codebook in the aforementioned manner requires complex and time-consuming processing and may temporarily disable the speech processing equipment.
The article "An 800 BPS Adaptive Vector Quantization Vocoder Using a Perceptual Distance Measure" by Douglas B. Paul appearing in the Proceedings of ICASSP 83, pp. 73-76, discloses a technique for adapting a vector quantization codebook comprising spectral templates to changes in speech feature signals as a normal part of vocoder operation rather than generating a completely new codebook. The arrangement declares an incoming spectrum to be a new template if its distance from the nearest neighbor in the codebook template set exceeds a threshold that depends on the voicing decision. The least useful template, as determined by the longest-time-since-use algorithm, is replaced by the new template. The training and time-of-use recording devices are gated by a speech activity detector to prevent loading the codebook template set with templates representing background noise.
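The template replacement scheme summarized above can be sketched roughly as follows. This is a simplified illustration rather than the cited vocoder: it assumes a Euclidean distance in place of the perceptual distance measure, a single fixed threshold in place of the threshold that depends on the voicing decision, and a boolean flag standing in for the speech activity detector. The class and method names are hypothetical.

```python
import math


class AdaptiveCodebook:
    """Codebook that replaces its least recently used template on a mismatch."""

    def __init__(self, templates, threshold):
        self.templates = [list(t) for t in templates]
        self.threshold = threshold                   # assumed fixed threshold
        self.last_used = [0] * len(self.templates)   # time-of-use records
        self.clock = 0

    def encode(self, spectrum, speech_active=True):
        """Return the index of the template chosen for the incoming spectrum."""
        dists = [math.dist(spectrum, t) for t in self.templates]
        idx = min(range(len(dists)), key=dists.__getitem__)
        if not speech_active:
            # Gate: no training and no time-of-use update on background noise.
            return idx
        self.clock += 1
        if dists[idx] > self.threshold:
            # Mismatch: overwrite the template unused for the longest time.
            idx = min(range(len(self.last_used)),
                      key=self.last_used.__getitem__)
            self.templates[idx] = list(spectrum)
        self.last_used[idx] = self.clock
        return idx


# Hypothetical usage with two-dimensional "spectra".
cb = AdaptiveCodebook([[0.0, 0.0], [10.0, 10.0]], threshold=1.0)
cb.encode([0.1, 0.0])   # close to template 0, so no replacement
cb.encode([5.0, 5.0])   # too far from both, so template 1 is replaced
```

The sketch makes visible that a single out-of-range input suffices to overwrite an existing template.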
While the foregoing adaptive technique provides updating of a vector quantization codebook without disrupting operation of speech processing equipment, it does so by discarding codebook entries originally generated by careful analysis of a large collection of speech feature signals and inserting new codebook entries accepted on the basis of a single mismatch of an incoming speech signal to the codebook set. Thus, a valid and useful codebook entry may be replaced by an entry that is at the outer limit of expected feature signals. It is the object of the invention to provide an improved adaptation arrangement for vector quantized speech processing systems in which codebook entries are modified based on relative changes in speech features.