Many techniques have been developed in recent years for reducing the amount of information that must be provided to communicate speech to a remote location or to store speech information for subsequent retrieval and reproduction. An important consideration is the rate at which such code information must be generated to adequately meet the high quality requirements of the coding scheme. For example, in some important applications speech is represented by digital signals occurring at 32 kilobits per second (kbit/s). It is, of course, desirable to represent speech with as few digital signals as possible to minimize storage and transmission bandwidth requirements.
Among the most common techniques currently used are those collectively known as linear predictive coding techniques. Within this broad category, Code Excited Linear Predictive (CELP) coding has received much attention in recent years. An early overview of the CELP approach is provided in M. R. Schroeder and B. S. Atal, "Code Excited Linear Prediction (CELP): High-Quality Speech at Very Low Bit Rates," Proc. IEEE Int. Conf. Acoust. Speech, Signal Processing, pp. 937-940 (1985).
Another coding constraint that arises in many circumstances is the delay needed to perform the coding of speech. For example, low delay coding is highly effective in reducing the effects of echoes and imposes lesser demands on echo suppressors in communication links. Further, in circumstances such as cellular communication systems, where the permitted total delay is limited and where channel coding delays are an important aspect of channel error control, it is highly desirable that the original speech coding not consume a significant portion of the available total delay "resource."
To date, most speech coders for use at or below 16 kbit/s buffer a large block of speech samples in seeking to achieve good speech quality. This block of samples typically includes samples of speech over approximately a 20 millisecond (ms) interval, to permit the application of well known transform, prediction, or sub-band techniques to exploit the redundancy in the buffered speech. However, with processing delay and bit transmission delay added to the buffering delay, the total one-way coding delay of these conventional coders is typically around 50 to 60 ms. As noted, such a long delay is not desirable, or even tolerable, in many applications.
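The delay figures above can be checked with a little arithmetic. The following Python sketch is illustrative only: the 8 kHz sampling rate and the individual processing and transmission delays are assumptions chosen so that the total falls in the 50 to 60 ms range quoted above. They are not values taken from any of the cited coders.

```python
SAMPLE_RATE_HZ = 8000  # assumption: telephone-band sampling, not stated in the text


def frame_samples(frame_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of speech samples buffered in one frame."""
    return int(round(frame_ms * sample_rate_hz / 1000))


def frame_bits(frame_ms, bit_rate_kbps):
    """Number of coded bits produced per frame (kbit/s times ms gives bits)."""
    return int(round(frame_ms * bit_rate_kbps))


def total_delay_ms(buffer_ms, processing_ms, transmission_ms):
    """One-way coding delay as the sum of its three components."""
    return buffer_ms + processing_ms + transmission_ms


# A 20 ms buffer with hypothetical processing and transmission delays.
low = total_delay_ms(buffer_ms=20, processing_ms=15, transmission_ms=15)
high = total_delay_ms(buffer_ms=20, processing_ms=20, transmission_ms=20)
print(frame_samples(20), "samples and", frame_bits(20, 16), "bits per frame")
print("total one-way delay between", low, "and", high, "ms")
```

Under these assumptions the buffering delay alone accounts for only about a third of the total, which is why shrinking the buffer is the main lever for a low-delay coder.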
An international standards group has recently focused on the problem of low-delay CELP coding for 16 kbit/s speech coding. See CCITT Study Group XVIII, Terms of reference of the ad hoc group on 16 kbits/s speech coding (Annex 1 to question U/XV), June, 1988. The requirement posed by the CCITT group was that coding delay not exceed 5 ms, with a goal of 2 ms. Solutions to the problem posed by the CCITT group have been provided, e.g., in J.-H. Chen, "A robust low-delay CELP speech coder at 16 kbits/s," Proc. IEEE Global Commun. Conf., pp. 1237-1241 (November 1989); J.-H. Chen, "High-quality 16 kb/s speech coding with a one-way delay less than 2 ms," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 453-456 (April, 1990); and J.-H. Chen, M. J. Melchner, R. V. Cox, and D. O. Bowker, "Real-time implementation of a 16 kb/s low-delay CELP speech coder," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 181-184 (April 1990).
Recently, the CCITT went one step further and planned to standardize an 8 kb/s speech coding algorithm. Again, all candidate algorithms are required to have low delay, but this time the one-way delay requirement has been relaxed somewhat to about 10 ms.
At 8 kb/s, it is much more difficult to achieve good speech quality with low delay than at 16 kb/s. This is, in part, because current low-delay CELP coders update their predictor coefficients based on previously coded speech, the so-called "backward adaptation" technique. See, for example, N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice-Hall, Inc., Englewood Cliffs, N.J. (1984). Additionally, the higher coding noise level in 8 kb/s coded speech makes backward adaptation significantly less effective than at 16 kb/s.
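The idea of backward adaptation can be illustrated with a minimal Python sketch. It assumes the textbook autocorrelation method with the Levinson-Durbin recursion, which is not necessarily the procedure used in the cited coders. The function names, the predictor order, and the toy signal are all hypothetical. The point is only that the predictor is derived from previously decoded samples, which the encoder and decoder both possess, so no predictor coefficients need to be transmitted.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag], with r[k] = sum over n of x[n] * x[n - k]."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x)))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a[1..order] in x[n] ~ sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate input: stop early
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def backward_adapt(decoded_history, order=2):
    """Derive predictor coefficients from past decoded samples only."""
    return levinson_durbin(autocorrelation(decoded_history, order), order)


# Encoder and decoder hold the same decoded history (absent channel errors),
# so each derives the same predictor without any coefficient bits being sent.
history = [0.9 ** n for n in range(200)]  # toy first-order decaying signal
encoder_coeffs = backward_adapt(history)
decoder_coeffs = backward_adapt(list(history))
print(encoder_coeffs == decoder_coeffs)
```

Because the analysis runs on decoded rather than original speech, any coding noise in the history feeds directly into the predictor estimate, which is consistent with the observation above that backward adaptation becomes less effective as the coding noise level rises.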
Prior to the 8 kbit/s low delay coder challenge posed by the CCITT, little or nothing was published in the literature on the subject. Since the challenge, T. Moriya, in "Medium-delay 8 kbit/s speech coder based on conditional pitch prediction", Proc. of Int. Conf. Spoken Language Processing, (November, 1990), has proposed a 10 ms delay 8 kb/s CELP coder based on the backward adaptation techniques of 16 kb/s LD-CELP described, e.g., in the above-cited 1989 Chen paper. This 8 kb/s coder was reportedly capable of outperforming the conventional 8 kb/s CELP coders described in the above-cited Schroeder and Atal 1985 paper and in P. Kroon and B. S. Atal, "Quantization procedures for 4.8 kbps CELP coders," Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, pp. 1650-1654 (1987). However, such performance was possible only if delayed decision coding of the excitation vector was used, at the price of very high computational complexity. If delayed decision was not used, the speech quality degraded and became slightly inferior to that of conventional 8 kb/s CELP.
The Moriya coder first performed backward adaptive pitch analysis to determine 8 pitch candidates, and then transmitted 3 bits to specify the selected candidate. Since backward pitch analysis is known to be very sensitive to channel errors (see the Chen 1989 reference, above), this coder is likely to be very sensitive to channel errors as well.
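The relationship between the 8 candidates and the 3 transmitted bits can be sketched as follows. This is not Moriya's actual procedure, whose details are not given here. It is a hypothetical illustration in which candidate lags are ranked by normalized correlation over previously decoded samples, and the lag range, history length, and function names are assumptions.

```python
import math


def candidate_lags(history, min_lag, max_lag, count=8):
    """Rank lags by normalized correlation over past decoded samples."""
    scores = []
    for lag in range(min_lag, max_lag + 1):
        num = sum(history[n] * history[n - lag] for n in range(lag, len(history)))
        den = sum(history[n - lag] ** 2 for n in range(lag, len(history)))
        scores.append((num / den if den > 0 else 0.0, lag))
    scores.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by lag
    return [lag for _, lag in scores[:count]]


def encode_index(index, bits=3):
    """Pack a candidate index into a fixed-width bit string."""
    return format(index, "0{}b".format(bits))


def decode_index(code):
    """Recover the candidate index from the received bit string."""
    return int(code, 2)


# Toy decoded history: a sinusoid with a 40-sample period.
history = [math.sin(2 * math.pi * n / 40) for n in range(320)]
cands = candidate_lags(history, min_lag=20, max_lag=147)

chosen = cands.index(40)          # suppose the encoder selects lag 40
code = encode_index(chosen)       # 3 bits suffice for 8 candidates
received_lag = cands[decode_index(code)]
print(code, received_lag)
```

In a scheme of this kind the decoder must rebuild the same candidate list from its own decoded history. If channel errors corrupt that history, the decoder's list can differ from the encoder's, and the same 3-bit index may then select a different lag, which is one way to picture the error sensitivity noted above.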