Many audio processing systems operate by dividing streams of audio information into frames and further dividing the frames into blocks of sequential data, each representing a portion of the audio information in a particular time interval. Some type of signal processing is applied to each block in the stream. Two examples of audio processing systems that apply a perceptual encoding process to each block are systems that conform to the Advanced Audio Coding (AAC) standard, which is described in ISO/IEC 13818-7, “MPEG-2 advanced audio coding, AAC,” International Standard, 1997, and in ISO/IEC JTC1/SC29, “Information technology—very low bitrate audio-visual coding,” ISO/IEC IS-14496 (Part 3, Audio), 1996; and so-called AC-3 systems that conform to the coding standard described in the Advanced Television Systems Committee (ATSC) A/52A document entitled “Revision A to Digital Audio Compression (AC-3) Standard,” published Aug. 20, 2001.
One type of signal processing that is applied to blocks in many audio processing systems is a form of perceptual coding. This coding analyzes the audio information in the block to obtain a representation of its spectral components, estimates the perceptual masking effects of the spectral components, quantizes the spectral components in such a way that the resulting quantization noise is either inaudible or as little audible as possible, and assembles a representation of the quantized spectral components into an encoded signal that may be transmitted or recorded. A set of control parameters that is needed to recover a block of audio information from the quantized spectral components is also assembled into the encoded signal.
The spectral analysis may be performed in a variety of ways but an analysis using a time-domain to frequency-domain transformation is common. Upon transformation of blocks of audio information into a frequency-domain representation, the spectral components of the audio information are represented by a sequence of vectors in which each vector represents the spectral components for a respective block. The elements of the vectors are frequency-domain coefficients and the index of each vector element corresponds to a particular frequency interval. The width of the frequency interval represented by each transform coefficient is either fixed or variable. The width of the frequency interval represented by transform coefficients generated by a Fourier-based transform such as the Discrete Fourier Transform (DFT) or a Discrete Cosine Transform (DCT) is fixed. The width of the frequency interval represented by transform coefficients generated by a wavelet or wavelet-packet transform is variable and typically grows larger with increasing frequency. For example, see A. Akansu, R. Haddad, “Multiresolution Signal Decomposition, Transforms, Subbands, Wavelets,” Academic Press, San Diego, 1992.
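The fixed-width case can be illustrated with a minimal Python sketch that splits a signal into blocks and applies a naive DFT to each block, so that each block yields one vector of coefficients. The function names, the 8 kHz sample rate, the 8-sample block length and the use of non-overlapping blocks are assumptions chosen for illustration; they are not taken from the AAC or AC-3 standards.

```python
import cmath
import math

def split_into_blocks(samples, block_length):
    """Divide samples into consecutive, non-overlapping blocks.
    Non-overlapping blocks are an assumption made for simplicity."""
    last_start = len(samples) - block_length
    return [samples[i:i + block_length]
            for i in range(0, last_start + 1, block_length)]

def dft(block):
    """Naive O(N^2) Discrete Fourier Transform of one block."""
    n = len(block)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(block))
            for k in range(n)]

def bin_width_hz(sample_rate, block_length):
    """Every DFT coefficient covers the same frequency interval."""
    return sample_rate / block_length

def peak_bin(spectrum):
    """Index of the largest-magnitude coefficient up to the Nyquist bin."""
    half = len(spectrum) // 2 + 1
    return max(range(half), key=lambda k: abs(spectrum[k]))

# Hypothetical test signal: a 1 kHz cosine sampled at 8 kHz.
SAMPLE_RATE = 8000
BLOCK_LENGTH = 8
samples = [math.cos(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(32)]

blocks = split_into_blocks(samples, BLOCK_LENGTH)
spectra = [dft(block) for block in blocks]  # one coefficient vector per block

for spectrum in spectra:
    k = peak_bin(spectrum)
    print(f"peak at index {k}, about {k * bin_width_hz(SAMPLE_RATE, BLOCK_LENGTH):.0f} Hz")
```

Here the index of each vector element maps to a frequency interval of the same width, the sample rate divided by the block length. A wavelet or wavelet-packet transform would not have this property.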
One type of signal processing that may be used to recover a block of audio information from the perceptually encoded signal obtains a set of control parameters and a representation of quantized spectral components from the encoded signal and uses this set of parameters to derive spectral components for synthesis into a block of audio information. The synthesis is complementary to the analysis used to generate the encoded signal. A synthesis using a frequency-domain to time-domain transformation is common.
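The role of a control parameter in recovery can be sketched with a toy uniform quantizer. This is only an illustration under stated assumptions: the single `step_size` parameter, the dictionary layout of the encoded block and the coefficient values are all hypothetical, and the transform and the perceptual masking model are omitted entirely.

```python
def quantize(coefficients, step_size):
    """Encoder side: map each spectral coefficient to an integer code."""
    return [round(c / step_size) for c in coefficients]

def dequantize(codes, step_size):
    """Decoder side: rebuild approximate coefficients from the codes.
    The step size must be known, so it travels as a control parameter."""
    return [q * step_size for q in codes]

# Hypothetical spectral coefficients for one block.
coefficients = [0.93, -0.41, 0.07, 0.0, -1.26]
STEP_SIZE = 0.25  # hypothetical control parameter

# The encoded block carries both the codes and the control parameter.
encoded = {
    "control": {"step_size": STEP_SIZE},
    "codes": quantize(coefficients, STEP_SIZE),
}

# The decoder reads the control parameter first, then derives the coefficients.
recovered = dequantize(encoded["codes"], encoded["control"]["step_size"])
max_error = max(abs(a - b) for a, b in zip(coefficients, recovered))
print(encoded["codes"], recovered, max_error)
```

The quantization error of this toy scheme is bounded by half the step size. In a real decoder the recovered coefficients would then pass through the frequency-domain to time-domain synthesis.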
In many coding applications, the bandwidth or space that is available to transmit or record an encoded signal is limited and this limitation imposes severe constraints on the amount of data that may be used to represent the quantized spectral components. Data needed to convey sets of control parameters are an overhead that further reduces the amount of data that may be used to represent the quantized spectral components.
In some coding systems, one set of control parameters is used to encode each block of audio information. One known technique for reducing the overhead in these types of coding systems is to control the encoding processes in such a way that only one set of control parameters is needed to recover multiple blocks of audio information from an encoded signal. If the encoding process is controlled so that ten blocks share one set of control parameters, for example, the overhead for these parameters is reduced by ninety percent. Unfortunately, audio signals are not stationary, and the efficiency of the encoding process for all blocks of audio information in a frame may not be optimal if the control parameters are shared by too many blocks. What is needed is a way to optimize the signal processing efficiency by controlling that processing to reduce the overhead needed to convey control parameters.
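The ninety percent figure follows from simple arithmetic, which the short Python sketch below reproduces. The function names and the 64-bit size of a parameter set are assumptions chosen only for illustration; the sketch counts overhead bits and does not model the loss of coding efficiency that sharing may cause.

```python
import math

def control_overhead_bits(num_blocks, blocks_per_set, bits_per_set):
    """Total bits spent on control parameters when each parameter set
    is shared by up to blocks_per_set consecutive blocks."""
    num_sets = math.ceil(num_blocks / blocks_per_set)
    return num_sets * bits_per_set

def overhead_reduction(num_blocks, blocks_per_set):
    """Fraction of control-parameter overhead saved relative to
    sending one parameter set per block."""
    num_sets = math.ceil(num_blocks / blocks_per_set)
    return (num_blocks - num_sets) / num_blocks

BITS_PER_SET = 64  # hypothetical size of one set of control parameters

per_block = control_overhead_bits(10, 1, BITS_PER_SET)   # one set per block
shared = control_overhead_bits(10, 10, BITS_PER_SET)     # ten blocks share one set
print(per_block, shared, overhead_reduction(10, 10))
```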