The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches.
The transmission and storage of computer data increasingly rely on codecs (coder-decoders) that compress and decompress digital media files, reducing file sizes to manageable levels and conserving transmission bandwidth and memory resources. Transform coding is a common type of data compression for data such as audio signals or graphic images that helps reduce signal bandwidth through the elimination of certain information in the signal. However, this type of compression is typically lossy in that the output is of lower quality than the original input. The specific compression techniques that are deployed may depend on the type of signal being processed. For example, a color graphic image may be compressed by examining small blocks of the image and averaging out the color using a discrete cosine transform (DCT) to form an image with far fewer colors in total; and an audio signal may be compressed by analyzing the transformed data according to a psychoacoustic model or other techniques that describe or model the human ear's sensitivity to parts of the signal. Although in many cases the reduction in quality from the compression may be imperceptible upon decompression and playback, certain types of content, such as high contrast (large transitions in the frequency domain) or transient (fast transitions in the time domain) signals, may pose problems.
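By way of illustration only, the lossy nature of transform coding described above may be sketched as follows. The sketch assumes an orthonormal DCT-II applied to a single block and a uniform quantizer with a single step size; the function names `dct2`, `idct2`, and `transform_code` and the `step` parameter are hypothetical and are not taken from any particular codec.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def transform_code(block, step):
    """Quantize the DCT coefficients of one block (assumed uniform quantizer).

    Returns the integer indices that would be stored or transmitted and the
    block a decoder would reconstruct from them.
    """
    coeffs = dct2(block)
    quantized = [round(c / step) for c in coeffs]          # information is discarded here
    decoded = idct2([q * step for q in quantized])
    return quantized, decoded


if __name__ == "__main__":
    block = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
    indices, decoded = transform_code(block, step=4.0)
    print(indices)
    print([round(v, 2) for v in decoded])
```

For a smooth block such as the ramp in the usage example, most quantized indices are zero, which is where the size reduction comes from, while the decoded block differs slightly from the input, which is the loss in quality.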
Many present compression techniques do not adequately address the problem of compression artifacts, which are noticeable distortions caused by the application of lossy data compression. Such artifacts can be manifested as pre-echo, warbling, or ringing in audio signals, or ghost images in video data. They are often encountered when conventional transform coding schemes are applied to signals that vary greatly over time, such as speech or music. Such a signal may change drastically within a transform block, yet the level of quantization noise will remain constant within this block. Without a switch to shorter transform lengths, the equal distribution of quantization noise in compressing a transient signal can generate audible artifacts. One known approach to address this problem is temporal noise shaping, which uses a prediction approach in the frequency domain to shape the quantization noise over time. Temporal noise shaping applies a filter to the original spectrum and quantizes the filtered signal. The quantized filter coefficients are transmitted in the bitstream and used in the decoder to undo the filtering, leading to a temporally shaped distribution of quantization noise in the decoded audio signal. The temporal noise shaping method is essentially a parametric method that requires the system to transmit the temporal shape based on a prediction of the shape, thus adding a degree of processing overhead to the overall coding/decoding process.
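The filter, quantize, and inverse-filter sequence of temporal noise shaping may be sketched, purely as an illustration, with a first-order predictor running across the spectral coefficients of one block. Deployed codecs generally use higher-order filters and their own coefficient quantization; the first-order filter, the autocorrelation-based coefficient estimate, the names `tns_encode` and `tns_decode`, and the `step` and `coef_step` parameters are all assumptions made for this sketch.

```python
def tns_encode(spectrum, step, coef_step=0.125):
    """Filter a block's spectrum along frequency, then quantize the residual.

    Returns (coef_index, residual_indices). Both would be carried in the
    bitstream in this hypothetical layout.
    """
    r0 = sum(c * c for c in spectrum)
    r1 = sum(spectrum[k] * spectrum[k - 1] for k in range(1, len(spectrum)))
    a = r1 / r0 if r0 > 0.0 else 0.0          # lag-1 predictor, |a| <= 1
    coef_index = round(a / coef_step)          # quantized filter coefficient
    a_hat = coef_index * coef_step
    # Open-loop prediction: the filter runs on the original spectrum.
    residual = [spectrum[0]]
    for k in range(1, len(spectrum)):
        residual.append(spectrum[k] - a_hat * spectrum[k - 1])
    residual_indices = [round(e / step) for e in residual]
    return coef_index, residual_indices


def tns_decode(coef_index, residual_indices, step, coef_step=0.125):
    """Undo the filtering using the transmitted coefficient."""
    a_hat = coef_index * coef_step
    out = []
    prev = 0.0
    for q in residual_indices:
        cur = q * step + a_hat * prev          # inverse (all-pole) filter
        out.append(cur)
        prev = cur
    return out


if __name__ == "__main__":
    # Alternating-sign coefficients, loosely resembling the spectrum of a transient.
    spectrum = [1.0, -0.9, 0.8, -0.7, 0.6, -0.5, 0.4, -0.3]
    coef_index, indices = tns_encode(spectrum, step=0.05)
    print(coef_index, indices)
    print([round(v, 3) for v in tns_decode(coef_index, indices, step=0.05)])
```

Because the decoder needs `coef_index` in addition to the residual, the sketch also shows the side information that the parametric approach adds to the bitstream.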
A common technique to reduce the quality degradation associated with compression processes is sub-band coding, which breaks a signal into a number of different frequency bands and encodes each one separately. Traditional sub-band audio codecs divide the signal into overlapping blocks and use a filter bank to extract the content of the signal at varying frequencies that are grouped into bands. In the audio spectrum, the size of the bands may vary to match properties of the human ear. One difficulty with this framework is selecting the right trade-off of time resolution (the size of the blocks) against frequency resolution (the size of the filter bank). For example, for transient sounds, it is preferable to have good time resolution (small blocks), while for tonal signals, it is preferable to have good frequency resolution (large blocks). In some cases, transients and tones may be present at the same time and in different regions of the spectrum. Present sub-band coding systems typically cannot accommodate both cases simultaneously. Thus, it would be useful to have the ability to select the resolution on a per-band basis in a sub-band based codec.
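The trade-off between time resolution and frequency resolution may be illustrated with simple arithmetic. The sketch below assumes a transform that produces one coefficient per input sample of a block, with the coefficients evenly spaced between 0 Hz and half the sample rate; the function name `block_resolution` and the example values of 48 kHz and 960 samples are hypothetical.

```python
def block_resolution(sample_rate_hz, frame_size, num_blocks):
    """Return (block_size, time_resolution_ms, frequency_resolution_hz).

    A frame of frame_size samples is split into num_blocks equal blocks.
    Assumes one transform coefficient per sample, spanning 0 Hz to Nyquist.
    """
    if num_blocks <= 0 or frame_size % num_blocks != 0:
        raise ValueError("frame_size must be a positive multiple of num_blocks")
    block_size = frame_size // num_blocks
    time_resolution_ms = 1000.0 * block_size / sample_rate_hz
    frequency_resolution_hz = sample_rate_hz / (2.0 * block_size)
    return block_size, time_resolution_ms, frequency_resolution_hz


if __name__ == "__main__":
    for blocks in (1, 2, 4, 8):
        print(blocks, block_resolution(48000, 960, blocks))
```

Under these assumptions, one 960-sample block at 48 kHz gives 20 ms of time resolution and 25 Hz per coefficient, while eight 120-sample blocks give 2.5 ms and 200 Hz. Each gain in one resolution is paid for by an equal loss in the other, which is why a single choice for the whole spectrum cannot suit a transient and a tone at the same time.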
It is also desirable to use certain available coding information to optimize the cost of time-frequency (T-F) resolution changes. For instance, although each band is typically coded as a separate entity, there may still be dependencies between the bands. For example, one known codec predicts the energy level of a band from the coded energy level of the previous band. In this case, the coding cost for each possible T-F resolution in one band may depend on the actual coded T-F resolution in the previous band. Such information can be used to optimize the coding cost of different coding options.
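One way such inter-band dependencies could be exploited is a dynamic-programming (Viterbi-style) search over the bands. This is offered only as a sketch of a generic technique and is not asserted to be the method of any known codec. The cost table `band_costs`, the single `change_cost` penalty charged when the resolution differs from that of the previous band, and the function name `choose_resolutions` are all hypothetical.

```python
def choose_resolutions(band_costs, change_cost):
    """Pick one T-F resolution per band so that the total cost is minimal.

    band_costs[b][r] is the assumed cost of coding band b at resolution r.
    change_cost is the assumed extra cost when a band's resolution differs
    from the resolution coded for the previous band.
    Returns (total_cost, list_of_resolution_indices).
    """
    num_res = len(band_costs[0])
    best = list(band_costs[0])            # best[r]: cheapest cost ending at r
    back_pointers = []
    for costs in band_costs[1:]:
        new_best = []
        pointers = []
        for r in range(num_res):
            candidates = [
                best[p] + (0 if p == r else change_cost) for p in range(num_res)
            ]
            p_min = min(range(num_res), key=candidates.__getitem__)
            new_best.append(candidates[p_min] + costs[r])
            pointers.append(p_min)
        best = new_best
        back_pointers.append(pointers)
    # Trace the cheapest path back from the last band to the first.
    r = min(range(num_res), key=best.__getitem__)
    total = best[r]
    path = [r]
    for pointers in reversed(back_pointers):
        r = pointers[r]
        path.append(r)
    path.reverse()
    return total, path


if __name__ == "__main__":
    costs = [[1, 5], [4, 3], [1, 6]]      # three bands, two candidate resolutions
    print(choose_resolutions(costs, change_cost=3))
    print(choose_resolutions(costs, change_cost=0))
```

In the usage example, choosing the cheapest resolution for each band in isolation would switch resolution twice. Once the assumed change penalty is included, keeping the same resolution across all three bands is cheaper overall.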