Coding systems are often used to reduce the amount of information required to adequately represent a source signal. By reducing information capacity requirements, a signal representation can be transmitted over channels having lower bandwidth or stored on media using less space. Coding can reduce the information capacity requirements of a source signal by removing either redundant components or irrelevant components in the signal. So-called perceptual coding methods and systems often use filter banks to reduce redundancy by decorrelating a source signal using a basis set of spectral components, and reduce irrelevancy by adaptive quantization of the spectral components according to psycho-perceptual criteria.
Many perceptual coding systems implement the filter banks by block transforms. In an audio coding system, for example, a source audio signal, which is represented by time segments or blocks of time-domain samples, is transformed into sets of frequency-domain coefficients representing the spectral content of the source signal. The length of the segments establishes both the time resolution and the frequency resolution of the filter bank. Time resolution increases as the segment length decreases. Frequency resolution increases as the segment length increases. Because of this relationship, the choice of segment length imposes a trade-off between the time and frequency resolution of a block transform filter bank.
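By way of illustration only, the relationship between segment length and resolution can be sketched numerically. The 48 kHz sample rate, the function name and the three example lengths are assumptions chosen for the example. The sketch assumes a block transform whose analysis frequencies are spaced at the sample rate divided by the segment length.

```python
def resolution(segment_length, sample_rate_hz=48000.0):
    """Return (segment duration in ms, coefficient spacing in Hz).

    Assumes a block transform whose analysis frequencies are spaced
    sample_rate / segment_length apart; the 48 kHz default is illustrative.
    """
    duration_ms = 1000.0 * segment_length / sample_rate_hz
    spacing_hz = sample_rate_hz / segment_length
    return duration_ms, spacing_hz


for length in (256, 512, 2048):
    duration_ms, spacing_hz = resolution(length)
    print(f"{length:5d} samples: {duration_ms:6.2f} ms per segment, "
          f"{spacing_hz:7.2f} Hz between coefficients")
```

The product of the two figures is the same for every segment length, which is the trade-off described above: shortening the segment narrows the time span only by widening the spacing between coefficients.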
No single choice of segment length can provide an optimum trade-off between resolutions for all of the source signal conditions that are encountered by typical coding systems. Slowly varying or stationary source signals generally can be encoded more efficiently if the filter bank has a higher frequency resolution, which can be provided if a longer segment length is used. Rapidly varying or highly non-stationary source signals generally can be encoded more efficiently if the filter bank has a higher time resolution, which can be provided if a shorter segment length is used. By adapting the segment length in response to changing source signal conditions, a block transform filter bank can optimize the trade-off between its time and frequency resolution.
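A minimal sketch of such adaptation follows. The energy-jump test, the threshold of 8, the number of sub-blocks and the two candidate lengths are all hypothetical choices made for the example. They are not the detection method of any coding system discussed in this section.

```python
import math


def choose_segment_length(samples, long_length=2048, short_length=256,
                          sub_blocks=8, ratio_threshold=8.0):
    """Pick a segment length using a crude, hypothetical energy-jump test."""
    size = len(samples) // sub_blocks
    # Energy of each sub-block; the small offset avoids division by zero.
    energies = [sum(x * x for x in samples[i * size:(i + 1) * size]) + 1e-12
                for i in range(sub_blocks)]
    # A sharp rise in energy between neighbouring sub-blocks is treated
    # as a transient, for which higher time resolution is preferred.
    for previous, current in zip(energies, energies[1:]):
        if current / previous > ratio_threshold:
            return short_length
    return long_length


# A steady tone keeps the longer segment; a sudden onset selects the shorter.
steady = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(2048)]
onset = [0.0] * 1024 + [1.0] * 1024
print(choose_segment_length(steady), choose_segment_length(onset))
```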
A large variety of transforms may be used to implement filter banks in audio coding systems, but a particular Modified Discrete Cosine Transform (MDCT) is widely used because it has several properties that are very attractive for audio coding, including the ability to provide critical sampling while allowing adjacent source signal segments to overlap one another. The MDCT is also attractive because it is able to remove substantially all redundant components in a source signal that is substantially stationary within a segment. Proper operation of the MDCT filter bank requires the use of overlapped source-signal segments and window functions that satisfy certain criteria described in Princen et al., “Subband/Transform Coding Using Filter Bank Designs Based on Time Domain Aliasing Cancellation,” Proc. of the 1987 International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 1987, pp. 2161-64. Unfortunately, it is difficult to adapt the time and frequency resolution of MDCT filter banks in response to signal conditions because of the requirements imposed on the window functions that must be applied to overlapping source signal segments.
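The overlap and window requirements can be illustrated with a direct, unoptimized sketch of an MDCT analysis and synthesis pair. The sine window, the function names and the block size of 8 coefficients are assumptions chosen for the example. The sine window is used because its squared values in the two overlapping halves sum to one, which is one way of meeting the window criteria mentioned above.

```python
import math


def sine_window(length):
    # w[n]^2 + w[n + length/2]^2 == 1, so overlapping halves are complementary.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block, window):
    """Transform 2*M windowed samples into M coefficients."""
    m = len(block) // 2
    phase = 0.5 + m / 2.0
    return [sum(window[n] * block[n]
                * math.cos(math.pi / m * (n + phase) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]


def imdct(coefficients, window):
    """Transform M coefficients into 2*M windowed, time-aliased samples."""
    m = len(coefficients)
    phase = 0.5 + m / 2.0
    return [window[n] * (2.0 / m)
            * sum(coefficients[k]
                  * math.cos(math.pi / m * (n + phase) * (k + 0.5))
                  for k in range(m))
            for n in range(2 * m)]


def analyze_synthesize(signal, m):
    """Overlap-add round trip; len(signal) must be a multiple of m."""
    window = sine_window(2 * m)
    padded = [0.0] * m + list(signal) + [0.0] * m
    output = [0.0] * len(padded)
    # Segments of 2*m samples advance by m samples, so each step of m new
    # samples yields m coefficients (critical sampling).
    for start in range(0, len(padded) - 2 * m + 1, m):
        rebuilt = imdct(mdct(padded[start:start + 2 * m], window), window)
        for n, value in enumerate(rebuilt):
            output[start + n] += value
    return output[m:m + len(signal)]


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
rebuilt = analyze_synthesize(signal, 8)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

Each inverse-transformed segment on its own contains time-domain aliasing. The aliasing terms cancel only when adjacent overlapped segments are added, which is why the window function cannot be changed freely from one segment to the next.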
One known technique sometimes referred to as “window switching” is able to adapt the time resolution of a MDCT filter bank by adaptively switching between two different window functions in response to the detection of certain signal conditions such as abrupt signal changes or amplitude transients. According to this technique, which is described in U.S. Pat. No. 5,214,742 by Edler, issued May 25, 1993, and incorporated herein by reference, segment lengths are not changed but the time resolution is adapted by switching between different window function shapes to reduce the number of non-zero samples in each segment that are transformed by the filter bank. Unfortunately, this technique does not adapt the frequency resolution of the filter bank and the frequency selectivity of the filter bank is seriously degraded whenever the time resolution is reduced because the shape of the window functions needed for window switching must be suboptimal to satisfy the requirements for proper operation of the MDCT.
Another known technique sometimes referred to as “block switching” is similar to the window-switching technique mentioned above in that it also switches between different window function shapes, but the block-switching technique is able to adapt both time and frequency resolutions of a MDCT filter bank by also adaptively switching between two different segment lengths in response to the detection of certain signal conditions such as abrupt signal changes or amplitude transients. This technique is used in the Advanced Audio Coder (AAC), which is described in Bosi et al., “ISO/IEC MPEG-2 Advanced Audio Coding,” J. Audio Eng. Soc., vol. 45, no. 10, October 1997, pp. 789-814, and incorporated herein by reference.
In AAC, a MDCT filter bank is applied to stationary source signal segments having a length equal to 2048 samples and is applied to non-stationary source signal segments having a length equal to 256 samples. Block switching is achieved in AAC by using “long window functions” that are appropriate for the longer segments, “short window functions” that are appropriate for the shorter segments, a “long-to-short bridging window function” that allows switching from the longer segment length to the shorter segment length, and a “short-to-long bridging window function” that allows switching from the shorter segment length to the longer segment length. The two bridging window functions allow switching between different segment lengths while satisfying the criteria necessary for proper operation of the MDCT. A switch from a longer segment length to a shorter segment length and back to the longer length is accomplished by applying the MDCT to a long segment using the long-to-short bridging window function, applying the MDCT to an integer multiple of eight short segments using the short window function, and applying the MDCT to a long segment using the short-to-long bridging window function. Immediately thereafter, the MDCT must be applied to a long segment, but either the long window function may be used or, if another block switch is desired, the long-to-short bridging window function may be used.
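The ordering rules described in the preceding paragraph can be sketched as a small transition check. The string labels, the table name and the function name are hypothetical and are not identifiers from the AAC standard. The check treats the list as an excerpt of a longer sequence and examines only the transitions within it.

```python
# Which window function may follow each window function, per the text above.
ALLOWED_NEXT = {
    "long": {"long", "long-to-short"},
    "long-to-short": {"short"},
    "short": {"short", "short-to-long"},
    "short-to-long": {"long", "long-to-short"},
}


def is_valid_sequence(windows, shorts_per_group=8):
    """Return True if the window sequence obeys the switching rules."""
    short_run = 0
    for index, current in enumerate(windows):
        if current not in ALLOWED_NEXT:
            return False
        if current == "short":
            short_run += 1
        else:
            # Short segments must occur in whole groups of eight.
            if short_run % shorts_per_group != 0:
                return False
            short_run = 0
        if index + 1 < len(windows):
            if windows[index + 1] not in ALLOWED_NEXT[current]:
                return False
    return short_run % shorts_per_group == 0


switch = ["long", "long-to-short"] + ["short"] * 8 + ["short-to-long", "long"]
too_soon = ["long-to-short"] + ["short"] * 8 + ["short-to-long", "short"]
print(is_valid_sequence(switch), is_valid_sequence(too_soon))
```

The second sequence is rejected because short segments cannot immediately follow the short-to-long bridging window function; a long segment must come first.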
Although block switching does provide a way to adapt the time and frequency resolution of a MDCT filter bank, it is not an ideal solution for several reasons. One reason is that the frequency selectivity of the transform is degraded during a switch of block lengths because the shape of the bridging window functions must be suboptimal to allow segment-length switching and to satisfy requirements for proper operation of the MDCT. Another reason is that a switch cannot occur at any arbitrary time. As explained above, the MDCT must be applied to another long segment immediately after switching to the longer segment length. An immediate switch to the shorter length is not possible. This block switching technique also is not an ideal solution because the switching mechanism provides only two segment lengths, which are not optimum for all signal conditions. For example, neither the longer nor the shorter segment length in AAC is optimum for most speech signal segments. The 2048-sample segments are usually too long for the non-stationary nature of speech and the 256-sample segments are usually too short to remove redundant components effectively. Furthermore, there are many stationary signals for which a segment length longer than 2048 samples would be closer to optimum. As a result, the performance of AAC is impaired by the limited ability of block switching to adapt the time and frequency resolution of a MDCT filter bank.
Another form of block switching is used in coding systems that conform to the Dolby Digital encoded bit stream standard. This coding standard, sometimes referred to as AC-3, is described in the Advanced Television Systems Committee (ATSC) A/52A document entitled “Revision A to Digital Audio Compression (AC-3) Standard” published Aug. 20, 2001, and incorporated herein by reference. The form of block switching used in AC-3 coding systems applies a MDCT to source signal segments of either 512 samples for stationary signals or 256 samples for non-stationary signals. The block switching technique used in AC-3 coding systems provides more flexibility in choosing when length switches are made. Furthermore, coding performance is reasonably good for non-stationary source signals like speech; however, the coding performance for signals that are more stationary is limited by the relatively low frequency resolution provided by the longer segment.
Other techniques for adaptive control of the time and frequency resolution of a MDCT filter bank are described in U.S. Pat. No. 5,394,473 by Davidson, which issued Feb. 28, 1995, and is incorporated herein by reference. Some of these techniques allow a MDCT filter bank to be applied to segments of essentially any length using window functions that provide a much better frequency response than is possible by other known techniques. Unfortunately, these techniques must adapt the kernel or basis functions of the MDCT and are, therefore, incompatible with existing bit stream standards like the AC-3 standard mentioned above. These techniques are also computationally intensive.