The present invention relates generally to audio coding techniques, and more particularly, to perceptually-based coding of audio signals, such as speech and music signals.
Perceptual audio coders (PAC) attempt to minimize the bit rate requirements for the storage or transmission (or both) of digital audio data by the application of sophisticated hearing models and signal processing techniques. Perceptual audio coders (PAC) are described, for example, in D. Sinha et al., “The Perceptual Audio Coder,” Digital Audio, Section 42, 42-1 to 42-18, (CRC Press, 1998), incorporated by reference herein. In the absence of channel errors, a PAC is able to achieve near stereo compact disk (CD) audio quality at a rate of approximately 128 kbps. At a lower rate of 96 kbps, the resulting quality is still fairly close to that of CD audio for many important types of audio material.
Perceptual audio coders reduce the amount of information needed to represent an audio signal by exploiting human perception and minimizing the perceived distortion for a given bit rate. Perceptual audio coders first apply a time-frequency transform, which provides a compact representation, followed by quantization of the spectral coefficients. FIG. 1 is a schematic block diagram of a conventional perceptual audio coder 100. As shown in FIG. 1, a typical perceptual audio coder 100 includes an analysis filterbank 110, a perceptual model 120, a quantization and coding block 130 and a bitstream encoder/multiplexer 140.
The analysis filterbank 110 converts the input samples into a sub-sampled spectral representation. The perceptual model 120 estimates a masked threshold of the signal. For each spectral coefficient, the masked threshold gives the maximum coding error that can be introduced into the audio signal while still maintaining perceptually transparent signal quality. The quantization and coding block 130 quantizes and codes the spectral values according to the precision corresponding to the masked threshold estimate. Thus, the quantization noise is hidden by the respective transmitted signal. Finally, the coded spectral values and additional side information are packed into a bitstream and transmitted to the decoder by the bitstream encoder/multiplexer 140.
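The relationship between the masked threshold and the quantizer precision can be illustrated with a minimal sketch. It assumes a plain uniform rounding quantizer and the usual fine-quantization noise model, in which a step size d yields an error power of about d²/12; neither assumption is taken from the coder described above, and the function names are hypothetical.

```python
import math


def quantize_band(coeffs, masked_threshold):
    """Quantize spectral coefficients with a step size chosen so that the
    expected quantization noise power roughly equals `masked_threshold`.

    Assumption: uniform quantizer, noise power ~ step**2 / 12.
    """
    step = math.sqrt(12.0 * masked_threshold)
    indices = [round(c / step) for c in coeffs]
    return indices, step


def dequantize_band(indices, step):
    """Inverse quantization as performed on the decoder side."""
    return [i * step for i in indices]


coeffs = [0.9, -2.3, 4.1]
indices, step = quantize_band(coeffs, masked_threshold=0.01)
decoded = dequantize_band(indices, step)
```

A larger masked threshold gives a larger step, hence fewer distinct indices and fewer bits, while the rounding error of each coefficient stays within half a step.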
FIG. 2 is a schematic block diagram of a conventional perceptual audio decoder 200. As shown in FIG. 2, the perceptual audio decoder 200 includes a bitstream decoder/demultiplexer 210, a decoding and inverse quantization block 220 and a synthesis filterbank 230. The bitstream decoder/demultiplexer 210 parses and decodes the bitstream, yielding the coded spectral values and the side information. The decoding and inverse quantization block 220 performs the decoding and inverse quantization of the quantized spectral values. The synthesis filterbank 230 transforms the spectral values back into the time domain.
In perceptual audio coders, such as the perceptual audio coder 100 shown in FIG. 1, the masked threshold is used to control the quantization and encoding of subband signals by the quantization and coding block 130. FIG. 3 illustrates a masked threshold 310 computed according to a psychoacoustic model and the corresponding approximation 320 used by a conventional perceptual audio coder. As shown in FIG. 3, the masked threshold is usually approximated with a step function that is encoded and transmitted to the perceptual audio decoder as side information. Due to limited bandwidth in the side information, however, only a coarse approximation of the masked threshold is transmitted. Inadequate accuracy of the masked threshold representation degrades the perceptual quality.
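The step-function approximation can be sketched as follows. The band edges, the use of the per-band minimum (a conservative choice that keeps the approximation at or below the true threshold) and the function name are assumptions made for illustration; the conventional coder may derive the step values differently.

```python
def step_approximation(threshold, band_edges):
    """Approximate a per-coefficient masked threshold by one value per band.

    `band_edges` lists the first coefficient index of each band plus the
    end index, e.g. [0, 2, 6] describes the bands [0, 2) and [2, 6).
    Returns (band_values, expanded_step_function).
    """
    band_values = []
    expanded = []
    for start, end in zip(band_edges, band_edges[1:]):
        value = min(threshold[start:end])   # assumed: conservative minimum
        band_values.append(value)
        expanded.extend([value] * (end - start))
    return band_values, expanded


band_values, steps = step_approximation([4, 3, 5, 1, 2, 2], [0, 2, 6])
```

Only `band_values` would need to be sent as side information. The wider the bands, the fewer values are sent and the further the step function can fall from the true threshold inside each band.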
A need therefore exists for methods and apparatus for representing the masked threshold more accurately. A further need exists for methods and apparatus for representing the masked threshold more accurately with as few bits as possible.
Generally, a method and apparatus are disclosed for representing the masked threshold in a perceptual audio coder, using line spectral frequencies (LSF) or another representation for linear prediction (LP) coefficients. The present invention calculates LP coefficients for the masked threshold using known LPC analysis techniques. In one embodiment, the masked thresholds are optionally transformed to a non-linear frequency scale suitable for auditory properties. The LP coefficients are converted to line spectral frequencies (LSF) or a similar representation in which they can be quantized for transmission.
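One possible way to obtain LP coefficients and line spectral frequencies from a masked threshold is sketched below. It treats the masked threshold as a power spectrum, derives autocorrelation values from it, and applies the Levinson-Durbin recursion. This is one well-known LPC analysis technique and is assumed here rather than prescribed by the text. The optional transformation to a non-linear frequency scale is omitted, the line spectral frequencies are located by a simple grid search with bisection, and all function names are hypothetical.

```python
import math


def autocorr_from_threshold(threshold, order):
    """Autocorrelation lags 0..order, treating `threshold` as a power
    spectrum sampled at the mid-points of n equal bins on [0, pi]."""
    n = len(threshold)
    return [
        sum(t * math.cos(math.pi * (k + 0.5) * lag / n)
            for k, t in enumerate(threshold)) / n
        for lag in range(order + 1)
    ]


def levinson_durbin(r, order):
    """Return (a, err) with a[0] == 1, so that the threshold envelope is
    modeled as err / |A(e^{jw})|^2, A(z) = sum_j a[j] z^-j."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_to_lsf(a, grid=1024, iters=60):
    """Line spectral frequencies in radians, ascending: the unit-circle
    roots of P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z),
    excluding the trivial roots at 0 and pi. Roots closer than pi/grid to
    either end or to each other may be missed by this simple search."""
    m = len(a)                              # m = p + 1
    ext = list(a) + [0.0]
    sym = [ext[k] + ext[m - k] for k in range(m + 1)]
    asym = [ext[k] - ext[m - k] for k in range(m + 1)]

    def f_sym(w):   # P(e^{jw}) with its linear-phase factor removed
        return sum(c * math.cos(w * (m / 2 - k)) for k, c in enumerate(sym))

    def f_asym(w):  # Q(e^{jw}) / j with its linear-phase factor removed
        return sum(c * math.sin(w * (m / 2 - k)) for k, c in enumerate(asym))

    ws = [math.pi * i / grid for i in range(1, grid)]
    lsf = []
    for f in (f_sym, f_asym):
        for lo, hi in zip(ws, ws[1:]):
            if f(lo) * f(hi) < 0.0:         # sign change brackets a root
                for _ in range(iters):
                    mid = 0.5 * (lo + hi)
                    if f(lo) * f(mid) <= 0.0:
                        hi = mid
                    else:
                        lo = mid
                lsf.append(0.5 * (lo + hi))
    return sorted(lsf)
```

For a threshold of N values, `levinson_durbin(autocorr_from_threshold(threshold, p), p)` yields p coefficients plus a gain term, and `lpc_to_lsf` turns the coefficients into p ordered frequencies between 0 and pi. The ordering property is one reason the LSF form is convenient to quantize.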
According to one aspect of the invention, the masked threshold is represented more accurately in a perceptual audio coder using an LSF notation previously applied in speech coding techniques. According to another aspect of the invention, the masked threshold is transmitted only if the masked threshold is significantly different from the previous masked threshold. Between transmitted masked thresholds, the masked threshold is approximated using interpolation schemes. The present invention decides which masked thresholds to transmit based on the change of consecutive masked thresholds, as opposed to the variation of short-term spectra.
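The transmit decision can be sketched as follows. The distance measure (mean absolute difference in dB), the tolerance parameter, the comparison against the most recently transmitted threshold, and the function name are all assumptions chosen for illustration; the text above only requires that the decision depend on the change between masked thresholds.

```python
def select_thresholds_to_transmit(thresholds_db, max_change_db):
    """Return the indices of the frames whose masked threshold is sent.

    `thresholds_db` holds one masked threshold (a list of dB values) per
    frame. A frame is sent when its threshold differs from the last
    transmitted one by more than `max_change_db` on average.
    """
    if not thresholds_db:
        return []
    sent = [0]                              # the first frame is always sent
    reference = thresholds_db[0]
    for i, current in enumerate(thresholds_db[1:], start=1):
        change = sum(abs(c - r) for c, r in zip(current, reference)) / len(current)
        if change > max_change_db:
            sent.append(i)
            reference = current
    return sent


frames = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
sent = select_thresholds_to_transmit(frames, max_change_db=1.0)
```

With these example values only frames 0 and 2 are sent. Frames 1 and 3 are close enough to a transmitted threshold that the decoder can approximate them.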
The present invention provides a number of options for modeling variations in the masked threshold over time. For signal parts that gradually change, the masked threshold changes gradually as well and can be approximated by interpolation. For a generally stationary signal part, followed by a sudden change, the masked threshold can be approximated by a constant masked threshold that changes at once. A relatively constant masked threshold that later changes gradually can be modeled by a combination of a constant masked threshold followed by interpolation. A stationary signal part with a short transient in the middle has a masked threshold that temporarily changes to another value but returns to the initial value. This case can be modeled efficiently by setting the masked threshold after the transient to the masked threshold before the transient, and thus not transmitting the masked threshold after the transient.
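The four cases above can be sketched on the decoder side as follows. The signaling is hypothetical: each transmitted entry carries a frame index, a parameter vector (for example, quantized LSFs) and a mode flag, and a vector of `None` stands for "reuse the threshold that was in force before the transient". Linear interpolation of the parameter vectors is likewise an assumption.

```python
def reconstruct_thresholds(num_frames, keyframes):
    """Rebuild one parameter vector per frame from sparse keyframes.

    `keyframes` is a list of (frame_index, params, mode), sorted by index
    and starting at frame 0. The mode of a keyframe describes how it is
    reached from the previous one:
      'interp' - gradual change: linear interpolation up to this frame
      'hold'   - sudden change: previous value held, then switched at once
    params=None means "restore the vector from two keyframes back", so no
    vector has to be transmitted after a short transient.
    """
    resolved = []
    for index, params, mode in keyframes:
        if params is None:
            params = resolved[-2][1]
        resolved.append((index, params, mode))

    out = [None] * num_frames
    for (i0, p0, _), (i1, p1, mode) in zip(resolved, resolved[1:]):
        for t in range(i0, i1):
            if mode == 'interp':
                w = (t - i0) / (i1 - i0)
                out[t] = [(1 - w) * x + w * y for x, y in zip(p0, p1)]
            else:
                out[t] = list(p0)
    last_index, last_params, _ = resolved[-1]
    for t in range(last_index, num_frames):
        out[t] = list(last_params)
    return out


keyframes = [
    (0, [0.0], 'hold'),     # start
    (4, [4.0], 'interp'),   # gradual change over frames 0..4
    (6, [10.0], 'hold'),    # constant, then a short transient at frame 6
    (7, None, 'hold'),      # back to the pre-transient value, nothing sent
]
frames = reconstruct_thresholds(9, keyframes)
```

In this example a one-element vector stands in for a full parameter set. The reconstructed sequence ramps from 0 to 4, stays at 4, jumps to 10 for the transient frame and returns to 4 without a further transmission.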
A more complete understanding of the present invention, as well as further features and advantages of the present invention, will be obtained by reference to the following detailed description and drawings.