Recent advances in speech coding, coupled with a dramatic increase in the performance-to-price ratio for Digital Signal Processor (DSP) devices, have significantly improved the perceptual quality of compressed speech in speech processing systems such as voice store-and-forward systems or voice messaging systems. Typical applications of such voice processing systems are described in S. Rangnekar and M. Hossain, "AT&T Voice Mail Service," AT&T Technology, Vol. 5, No. 4, 1990 and in A. Ramirez, "From the Voice-Mail Acorn, a Still-Spreading Oak," NY Times, May 3, 1992.
Speech coders used in voice messaging systems provide speech compression, reducing the number of bits required to represent a voice waveform. In voice messaging, speech coding reduces the number of bits that must be transmitted to send a voice message to a distant location, or the number of bits that must be stored to recover a voice message at some future time. Decoders in such systems provide the complementary function of expanding stored or transmitted coded voice signals in such a manner as to permit reproduction of the original voice signals.
Salient attributes of a speech coder optimized for transmission include low bit rate, high perceptual quality, low delay, robustness to multiple encodings (tandeming), robustness to bit-errors, and low cost of implementation. A coder optimized for voice messaging, on the other hand, advantageously emphasizes the same low bit rate, high perceptual quality, robustness to multiple encodings (tandeming) and low cost of implementation, but also provides resilience to mixed-encodings (transcoding).
These differences arise because, in voice messaging, speech is encoded and stored using mass storage media for recovery at a later time. Delays of up to a few hundred milliseconds in encoding or decoding are unobservable to a voice messaging system user. Such large delays in transmission applications, on the other hand, can cause major difficulties for echo cancellation and disrupt the natural give-and-take of two-way real-time conversations. Furthermore, highly reliable mass storage media achieve bit error rates several orders of magnitude lower than those observed on many contemporary transmission facilities. Hence, robustness to bit errors is not a primary concern for voice messaging systems.
Prior art systems for voice storage typically employ the CCITT G.721 standard 32 kb/s ADPCM speech coder or a 16 kbit/s Sub-Band coder (SBC) as described in J. G. Josenhans, J. F. Lynch, Jr., M. R. Rogers, R. R. Rosinski, and W. P. VanDame, "Report: Speech Processing Application Standards," AT&T Technical Journal, Vol. 65, No. 5, September/October 1986, pp. 23-33. More generalized aspects of SBC are described, e.g., in N. S. Jayant and P. Noll, "Digital Coding of Waveforms-Principles and Applications to Speech and Video", and in U.S. Pat. No. 4,048,443 issued to R. E. Crochiere et al. on Sep. 13, 1977.
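To put the two bit rates named above in storage terms, the following is a minimal sketch of the arithmetic only. The 32 kb/s and 16 kbit/s figures come from the text; the 60-second message length and the names `storage_bytes`, `CODERS`, and `storage_table` are assumptions introduced purely for illustration, and the calculation ignores any framing or file-system overhead.

```python
def storage_bytes(bit_rate_bps: int, duration_s: float) -> float:
    """Bytes needed to hold duration_s seconds of speech coded at bit_rate_bps."""
    return bit_rate_bps * duration_s / 8  # 8 bits per byte


# Bit rates taken from the text; the message length is an assumed example value.
CODERS = {"G.721 ADPCM": 32_000, "SBC": 16_000}
MESSAGE_SECONDS = 60


def storage_table(coders=CODERS, duration_s=MESSAGE_SECONDS):
    """Map each coder name to the bytes required for one message."""
    return {name: storage_bytes(rate, duration_s) for name, rate in coders.items()}


if __name__ == "__main__":
    for name, size in storage_table().items():
        print(f"{name}: {size / 1000:.0f} kB per {MESSAGE_SECONDS} s message")
```

Under these assumptions, halving the bit rate halves the storage needed per message, which is why bit rate weighs so heavily in the choice of coder for voice storage.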
While 32 kb/s ADPCM gives very good speech quality, its bit-rate is higher than desired. On the other hand, while 16 kbit/s SBC has half the bit-rate and has offered a reasonable tradeoff between cost and performance in prior art systems, recent advances in speech coding and DSP technology have rendered SBC less than optimum for many current applications. In particular, new speech coders are often superior to SBC in terms of perceptual quality and tandeming/transcoding performance. Such new coders are typified by so-called code excited linear predictive coders (CELP) disclosed, e.g., in U.S. patent application Ser. No. 07/298,451 by J-H. Chen, filed Jan. 17, 1989, now abandoned; U.S. patent application Ser. No. 07/757,168 by J-H. Chen, filed Sep. 10, 1991; U.S. patent application Ser. No. 07/837,509 by J-H. Chen et al., filed Feb. 18, 1992; and U.S. patent application Ser. No. 07/837,522 by J-H. Chen et al., filed Feb. 18, 1992, assigned to the assignee of the present application. Each of these applications is hereby incorporated by reference in the present application as if set forth in its entirety herein. Related coders and decoders are described in J-H. Chen, "A robust low-delay CELP speech coder at 16 kbit/s," Proc. GLOBECOM, pp. 1237-1241 (November 1989); J-H. Chen, "High-quality 16 kb/s speech coding with a one-way delay less than 2 ms," Proc. ICASSP, pp. 453-456 (April 1990); and J-H. Chen, M. J. Melchner, R. V. Cox and D. O. Bowker, "Real-time implementation of a 16 kb/s low-delay CELP speech coder," Proc. ICASSP, pp. 181-184 (April 1990); all of which papers are hereby incorporated herein by reference as if set forth in their entirety. A further description of the candidate 16 kbit/s LD-CELP standard system was presented in a document entitled "Draft Recommendation on 16 kbit/s Voice Coding" (hereinafter the Draft CCITT Standard Document), submitted to the CCITT Study Group XV at its meeting in Geneva, Switzerland, during Nov. 11-22, 1991, which document is incorporated herein by reference in its entirety. In the sequel, systems of the type described in the Draft CCITT Standard Document will be referred to as LD-CELP systems.