1. Field of the Invention
The present invention generally relates to systems that encode audio signals, such as speech signals, for transmission or storage and/or that decode encoded audio signals for playback.
2. Background
Speech coding refers to the application of data compression to audio signals that contain speech, which are referred to herein as “speech signals.” In speech coding, a “coder” encodes an input speech signal into a digital bit stream for transmission or storage, and a “decoder” decodes the bit stream into an output speech signal. The combination of the coder and the decoder is called a “codec.” The goal of speech coding is usually to reduce the encoding bit rate while maintaining a certain degree of speech quality. For this reason, speech coding is sometimes referred to as “speech compression” or “voice compression.”
The encoding of a speech signal typically involves applying signal processing techniques to estimate parameters that model the speech signal. In many coders, the speech signal is processed as a series of time-domain segments, often referred to as “frames” or “sub-frames,” and a new set of parameters is calculated for each segment. Data compression algorithms are then utilized to represent the parameters associated with each segment in a compact bit stream. Different codecs may utilize different parameters to model the speech signal. By way of example, the BROADVOICE16™ (“BV16”) codec, which is described by J.-H. Chen and J. Thyssen in “The BroadVoice Speech Coding Algorithm,” Proceedings of 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. IV-537-IV-540, April 2007, is a two-stage noise feedback codec that encodes Line-Spectrum Pair (LSP) parameters, a pitch period, three pitch taps, excitation gain and excitation vectors associated with each 5 ms frame of an audio signal. Other codecs may encode different parameters.
As noted above, the goal of speech coding is usually to reduce the encoding bit rate while maintaining a certain degree of speech quality. There are many practical reasons for seeking to reduce the encoding bit rate. Motivating factors may include, for example, the conservation of bandwidth in a two-way speech communication scenario or the reduction of memory requirements in an application that stores encoded speech for subsequent playback. To this end, codec designers are often tasked with reducing the number of bits required to encode a parameter associated with a segment of a speech signal without sacrificing too much in terms of the resulting quality of the decoded speech signal.
Like the BV16 codec mentioned above, many speech codecs in use today encode a pitch period associated with each segment of a speech signal. Generally speaking, a pitch period is a measure of the lag between repeating cycles of a quasi-periodic or periodic signal. The pitch period is an important parameter for speech coding because voiced regions of a speech signal are often periodic in nature and thus can be modeled by estimating a pitch period associated therewith. The pitch period of a voiced region of a speech signal typically does not change abruptly but rather evolves smoothly over time. The pitch period is often used in codecs that perform long-term prediction of a speech signal.
In the BV16 codec, the encoder uses 7-bit instantaneous uniform quantization to generate a quantized representation of a pitch period that may range from 10 samples to 136 samples for each 5 ms frame. (As used herein, the term “instantaneous” quantization means that each parameter or sample is quantized based solely on its own value, without delayed-decision coding and without relying on previous states (memory).) Because 7 bits are spent on the pitch period every 5 ms, pitch period encoding in BV16 consumes 1400 bits per second (bps) of the total 16 kb/s encoding bit rate, or less than 10% of the total encoding bit rate. While this is a relatively small share of the total encoding bit rate, if the same pitch period encoding method were used in a codec having a significantly lower encoding bit rate, the percentage consumed would be much higher. For example, if the same pitch period encoding method were used in a codec required to have an encoding bit rate of 4 kb/s to 5 kb/s, it would consume roughly a third (28% to 35%) of the available bit rate.
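The bit rate figures above follow from simple arithmetic, which may be sketched as shown below. The sketch is illustrative only and is not taken from the BV16 codec. The function names are hypothetical. The quantizer assumes integer pitch periods and a step size of one sample, neither of which is stated above.

```python
# Illustrative sketch only. Function names are hypothetical, and the
# one-sample step size and integer pitch periods are assumptions.

def parameter_bit_rate(bits_per_frame: int, frame_ms: float) -> float:
    """Bits per second spent on a parameter coded with a fixed number of bits per frame."""
    frames_per_second = 1000.0 / frame_ms
    return bits_per_frame * frames_per_second


def share_of_total(parameter_bps: float, total_bps: float) -> float:
    """Fraction of the total encoding bit rate consumed by the parameter."""
    return parameter_bps / total_bps


def quantize_pitch_period(period: int, min_period: int = 10,
                          max_period: int = 136, bits: int = 7) -> int:
    """Memoryless (instantaneous) uniform quantizer: the index depends only
    on the current pitch period, not on any previous frame."""
    if not min_period <= period <= max_period:
        raise ValueError("pitch period outside the representable range")
    index = period - min_period  # assumed step size of one sample
    if index >= (1 << bits):
        raise ValueError("range does not fit in the given number of bits")
    return index


pitch_bps = parameter_bit_rate(bits_per_frame=7, frame_ms=5.0)
print(f"pitch period bit rate: {pitch_bps:.0f} bps")
for total_bps in (16000, 5000, 4000):
    percent = 100.0 * share_of_total(pitch_bps, total_bps)
    print(f"share of {total_bps} bps codec: {percent:.2f}%")
```

Under these assumptions, the same 1400 bps cost amounts to 8.75% of a 16 kb/s codec but 28% to 35% of a codec operating at 5 kb/s to 4 kb/s.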
One obvious approach to reducing the encoding bit rate associated with BV16 would be to simply reduce the fixed number of bits used to generate the quantized representation of the pitch period, either by narrowing the range of pitch periods represented, by reducing the number of levels represented, or both. However, this approach would tend to result in a corresponding degradation of the decoded speech signal generated by the BV16 decoder, which would be forced to decode the speech signal with more limited and/or less accurate pitch period data.
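The trade-off described above may be illustrated with a minimal sketch that keeps the 10 to 136 sample range but reduces the number of quantization levels. The sketch is hypothetical and does not represent any actual codec. It assumes integer pitch periods, an integer step size, and mid-cell reconstruction, and all function names are introduced solely for illustration.

```python
# Illustrative sketch only. The integer step size, mid-cell reconstruction,
# and all names below are assumptions made for this example.

MIN_PERIOD = 10
MAX_PERIOD = 136
FRAMES_PER_SECOND = 200  # one frame every 5 ms


def step_size(bits: int) -> int:
    """Smallest integer step that covers the whole range with 2**bits levels."""
    values = MAX_PERIOD - MIN_PERIOD + 1
    levels = 1 << bits
    return -(-values // levels)  # ceiling division


def reconstruct(period: int, bits: int) -> int:
    """Quantize a pitch period and return the value a decoder would recover."""
    step = step_size(bits)
    index = (period - MIN_PERIOD) // step
    return MIN_PERIOD + index * step + step // 2


def worst_case_error(bits: int) -> int:
    """Largest reconstruction error, in samples, over the whole range."""
    return max(abs(reconstruct(p, bits) - p)
               for p in range(MIN_PERIOD, MAX_PERIOD + 1))


for bits in (7, 6, 5):
    rate = bits * FRAMES_PER_SECOND
    print(f"{bits} bits: {rate} bps, worst-case error "
          f"{worst_case_error(bits)} sample(s)")
```

Under these assumptions, each bit removed saves 200 bps but roughly doubles the step size, so the decoder receives progressively less accurate pitch period data.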
What is needed, then, are systems and methods for reducing the bit rate required to encode a pitch period associated with a segment of a speech signal in a manner that will result in relatively little or no degradation of a decoded speech signal generated using the encoded pitch period. The desired systems and methods should be applicable to the BV16 codec or any other speech codec that encodes a pitch period associated with a segment of a speech signal.