1. Field of the Invention
The present invention generally relates to systems that encode audio signals, such as music and speech signals, for transmission or storage and/or that decode encoded audio signals for playback.
2. Background
Audio coding refers to the application of data compression to audio signals such as music and speech signals. In audio coding, a “coder” encodes an input audio signal into a digital bit stream for transmission or storage, and a “decoder” decodes the bit stream into an output audio signal. The combination of the coder and the decoder is called a “codec.” The goal of audio coding is usually to reduce the encoding bit rate while maintaining a certain degree of perceptual audio quality. For this reason, audio coding is sometimes referred to as “audio compression.”
Traditional audio codecs are typically transform audio codecs that employ a large transform window size of between 20 and 50 milliseconds (ms). The large transform window size results in a fairly long coding delay. In certain applications of audio coding, such as tele-presence, in-game voice chat, and on-line live music performance by musicians in different places, it is necessary to maintain a low end-to-end delay. Some of these applications also require low codec complexity, especially when a battery-operated wireless device such as a Bluetooth™ stereo headset is involved. There exist low-delay and low-complexity transform audio codecs that use small transform window sizes below 10 ms to achieve low coding delays and low codec complexity. Examples of such low-delay transform audio codecs include the Constrained Energy Lapped Transform (CELT) codec (http://www.celt-codec.org) as described by J.-M. Valin, et al. in “A High-Quality Speech and Audio Codec With Less Than 10 ms delay,” IEEE Transactions on Audio, Speech, and Language Processing, Vol. 18, No. 1, January 2010, and the HF64 audio codec described by J.-H. Chen in “A High-Fidelity Speech and Audio Codec With Low Delay and Low Complexity,” Proceedings of 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. II-1161 to II-1164 and in U.S. Pat. No. 6,351,730.
An inherent limitation of such low-delay transform audio codecs employing small transform window sizes is that the frequency resolution of such transforms is insufficient to resolve the pitch harmonics of some of the nearly periodic segments of music and speech signals. As a result, such low-delay transform codecs tend to produce more audible coding distortion when encoding nearly periodic music and speech signals, even though the coding performance may be fine for other, non-periodic signals. Increasing the transform window size will enable the pitch harmonics to be resolved and thus exploited to reduce such distortion for periodic music and speech signals, but will also increase the coding delay and codec complexity.
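The trade-off between window size and frequency resolution can be illustrated with rough arithmetic. The sketch below is illustrative only and is not taken from CELT, HF64, or any other codec: it assumes the rule of thumb that a window lasting T seconds resolves frequency components spaced no closer than about 1/T Hz, ignoring the exact window shape and transform type. The function names and the 100 Hz example pitch are hypothetical.

```python
def approx_resolution_hz(window_ms: float) -> float:
    """Rough frequency resolution (Hz) of a window lasting window_ms ms.

    Assumption: resolution is about 1/T for a window of T seconds. Real
    transforms differ by a window-dependent factor.
    """
    return 1000.0 / window_ms


def can_resolve_harmonics(window_ms: float, pitch_hz: float) -> bool:
    """Pitch harmonics are spaced pitch_hz apart, so they can be separated
    only if the approximate resolution is no coarser than that spacing."""
    return approx_resolution_hz(window_ms) <= pitch_hz


if __name__ == "__main__":
    pitch_hz = 100.0  # hypothetical pitch of a voiced speech segment
    for window_ms in (5.0, 10.0, 20.0, 50.0):
        print(
            f"{window_ms:4.0f} ms window: "
            f"~{approx_resolution_hz(window_ms):5.1f} Hz resolution, "
            f"resolves {pitch_hz:.0f} Hz harmonics: "
            f"{can_resolve_harmonics(window_ms, pitch_hz)}"
        )
```

Under these assumptions, a 5 ms window gives a resolution of roughly 200 Hz, which is too coarse to separate harmonics spaced 100 Hz apart, whereas a 20 ms window gives roughly 50 Hz and can separate them, at the cost of a longer coding delay.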
What is needed, then, is a technique to improve the output audio quality of an audio codec that cannot effectively exploit pitch redundancy in an input audio signal to reduce distortion when such signal exhibits significant pitch periodicity. As noted above, such audio codecs may include low-delay transform audio codecs such as CELT and HF64.