The present invention relates to vector quantization (VQ) in speech coding systems using waveform interpolation.
In recent years, there has been increasing interest in achieving toll-quality speech coding at rates of 4 kbps and below. Currently, there is an ongoing 4 kbps standardization effort conducted by an international standards body, the International Telecommunication Union Telecommunication Standardization Sector (ITU-T). The expanding variety of emerging applications for speech coding, such as third generation wireless networks and Low Earth Orbit (LEO) systems, is motivating increased research efforts. The speech quality produced by waveform coders such as code-excited linear prediction (CELP) coders degrades rapidly at rates below 5 kbps; see B. S. Atal and M. R. Schroeder, (1984), “Stochastic Coding of Speech at Very Low Bit Rate”, Proc. Int. Conf. Comm., Amsterdam, pp. 1610–1613.
On the other hand, parametric coders, such as the waveform-interpolative (WI) coder, the sinusoidal-transform coder (STC), and the multiband-excitation (MBE) coder, produce good quality at low rates but do not achieve toll quality; see Y. Shoham, (1993), IEEE ICASSP'93, Vol. II, pp. 167–170; I. S. Burnett and R. J. Holbeche, (1993), IEEE ICASSP'93, Vol. II, pp. 175–178; W. B. Kleijn, (1993), IEEE Trans. Speech and Audio Processing, Vol. 1, No. 4, pp. 386–399; W. B. Kleijn and J. Haagen, (1994), IEEE Signal Processing Letters, Vol. 1, No. 9, pp. 136–138; W. B. Kleijn and J. Haagen, (1995), IEEE ICASSP'95, pp. 508–511; W. B. Kleijn and J. Haagen, (1995), in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 5, pp. 175–207; I. S. Burnett and G. J. Bradley, (1995), IEEE ICASSP'95, pp. 261–263; I. S. Burnett and G. J. Bradley, (1995), IEEE Workshop on Speech Coding for Telecommunications, pp. 23–24; I. S. Burnett and D. H. Pham, (1997), IEEE ICASSP'97, pp. 1567–1570; W. B. Kleijn, Y. Shoham, D. Sen, and R. Haagen, (1996), IEEE ICASSP'96, pp. 212–215; Y. Shoham, (1997), IEEE ICASSP'97, pp. 1599–1602; Y. Shoham, (1999), International Journal of Speech Technology, Kluwer Academic Publishers, pp. 329–341; R. J. McAulay and T. F. Quatieri, (1995), in Speech Coding and Synthesis by W. B. Kleijn and K. K. Paliwal, Elsevier Science B. V., Chapter 4, pp. 121–173; and D. Griffin and J. S. Lim, (1988), IEEE Trans. ASSP, Vol. 36, No. 8, pp. 1223–1235. This shortfall in quality is largely due to the lack of robustness of speech parameter estimation, which is commonly done in open loop, and to inadequate modeling of non-stationary speech segments.
In WI coding, the similarity between successive rapidly evolving waveform (REW) magnitudes is commonly exploited by downsampling and interpolation and by constrained bit allocation; see W. B. Kleijn and J. Haagen, (1995), IEEE ICASSP'95, pp. 508–511. In a previous Enhanced Waveform Interpolative (EWI) coder, the REW magnitude was quantized on a waveform-by-waveform basis; see O. Gottesman and A. Gersho, (1999), “Enhanced Waveform Interpolative Coding at 4 kbps”, IEEE Speech Coding Workshop, pp. 90–92, Finland; and O. Gottesman and A. Gersho, (1999), “Enhanced Analysis-by-Synthesis Waveform Interpolative Coding at 4 kbps”, EUROSPEECH'99, pp. 1443–1446, Hungary.