1. Technical Field
This invention is directed toward a system and method for encoding and decoding data. More specifically, the invention is directed toward a system and method for encoding and/or decoding data, such as, for example, audio or video data, by employing a reversible transform obtained via matrix lifting.
2. Background Art
High-performance audio codecs have made digital music a reality. Popular audio compression technologies, such as MPEG-1 Layer 3 (MP3), MPEG-4 audio, Real Audio and Windows Media Audio (WMA), are lossy in nature. In these compression technologies, the audio waveform is distorted in exchange for a higher compression ratio. In quality-critical applications such as a professional recording/editing studio, it is imperative to preserve the original audio. That is, the audio should be compressed in a lossless fashion. An especially attractive form of lossless audio codec is the progressive-to-lossless codec, in which the audio is compressed into a lossless bitstream that may be further truncated at an arbitrary point to provide a lossy bitstream of lower bitrate without re-encoding. Thus, a progressive-to-lossless media codec offers the greatest flexibility in compression. During initial encoding, the media may be compressed losslessly, which preserves all of the information of the original media. Later, if the transmission bandwidth or the storage space is insufficient to accommodate the full lossless media, the compressed media bitstream may be truncated to whatever bitrate is desired. The state-of-the-art image compression algorithm, JPEG 2000 [1], has a progressive-to-lossless compression mode. However, no existing audio codec operates in the progressive-to-lossless mode.
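The truncation property described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the coefficients are integers coded as sign bits plus magnitude bitplanes sent most significant plane first; the function names and sample values are hypothetical and are not taken from any codec cited in this specification.

```python
def encode_bitplanes(coeffs, num_planes):
    """Split integer coefficients into signs plus magnitude bitplanes, MSB first."""
    signs = [c < 0 for c in coeffs]
    planes = [[(abs(c) >> p) & 1 for c in coeffs]
              for p in range(num_planes - 1, -1, -1)]
    return signs, planes


def decode_bitplanes(signs, planes, num_planes):
    """Rebuild coefficients from however many leading bitplanes were received."""
    mags = [0] * len(signs)
    for i, plane in enumerate(planes):
        shift = num_planes - 1 - i  # bit position carried by this plane
        mags = [m | (b << shift) for m, b in zip(mags, plane)]
    return [-m if s else m for m, s in zip(mags, signs)]


NUM_PLANES = 7  # assumed: enough for magnitudes up to 127
coeffs = [37, -5, 12, 0, -64, 3]
signs, planes = encode_bitplanes(coeffs, NUM_PLANES)

# Full bitstream: exact reconstruction.
lossless = decode_bitplanes(signs, planes, NUM_PLANES)
# Truncated bitstream (first 3 planes only): coarser values, no re-encoding.
lossy = decode_bitplanes(signs, planes[:3], NUM_PLANES)
```

In this sketch the same encoded data serves both cases: decoding every plane returns the original coefficients, while decoding only a prefix returns a coarser approximation.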
A primary reason for the lack of a progressive-to-lossless audio codec is the absence of a high-quality reversible transform. Most lossless audio coding approaches, such as [8][9][10], are built upon a lossy audio coder. The audio is first encoded with an existing lossy codec, and then the residue error between the original audio and the lossy coded audio is encoded. The resultant compressed bitstream has two rate points, the lossy base bitrate and the lossless bitrate. It cannot be scaled to other bitrate points. Since the quantization noise in the lossy coder is difficult to model, such approaches usually lead to a drop in the lossless compression efficiency. Moreover, this coding approach is also more complex, as it requires the implementation of both a base coder and a residue coder. Some other approaches, e.g., [11], build the lossless audio coder directly through a predictive filter and then encode the prediction residue. These approaches may achieve good lossless compression performance. However, the resultant bitstream still offers no scalability.
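The two-layer structure of the lossy-plus-residue approach can be sketched as follows. The lossy base coder here is only a stand-in (uniform integer quantization with an assumed step size); it is not the coder used in [8][9][10], and all names and sample values are hypothetical.

```python
def encode_two_layer(samples, step):
    """Return (base layer, residue layer) for integer samples."""
    # Stand-in lossy coder: round each sample to the nearest multiple of step.
    base = [(s + step // 2) // step for s in samples]
    decoded_base = [q * step for q in base]
    # Residue between the original and the lossy-coded signal.
    residue = [s - d for s, d in zip(samples, decoded_base)]
    return base, residue


def decode_lossy(base, step):
    """Decode the base layer alone (first rate point)."""
    return [q * step for q in base]


def decode_lossless(base, residue, step):
    """Decode base plus residue (second rate point)."""
    return [q * step + r for q, r in zip(base, residue)]


samples = [100, -37, 8, 15]
STEP = 16  # assumed quantization step of the stand-in base coder
base, residue = encode_two_layer(samples, STEP)
lossy_out = decode_lossy(base, STEP)
lossless_out = decode_lossless(base, residue, STEP)
```

The sketch makes the limitation visible: a decoder can use the base layer alone or the base layer plus the whole residue layer, and nothing in this structure provides an intermediate rate point.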
There are many existing schemes for encoding audio files. Several such schemes attempt to achieve higher compression ratios by using known human psychoacoustic characteristics to mask coding noise in the audio file. A psychoacoustic coder is an audio encoder which has been designed to take advantage of human auditory masking by dividing the audio spectrum of one or more audio channels into narrow frequency bands of different sizes optimized with respect to the frequency selectivity of human hearing. This makes it possible to sharply filter coding noise so that it is forced to stay very close in frequency to the frequency components of the audio signal being coded. By reducing the level of coding noise wherever there are no audio signals to mask it, and increasing the level of coding noise wherever there are strong audio signals, the sound quality of the original signal can be subjectively preserved. Using human psychoacoustic hearing characteristics in audio file compression allows fewer bits to be used to encode the audio components that are less audible to the human ear. Conversely, more bits can then be used to encode the components of the audio file to which the human ear is more sensitive. Such psychoacoustic coding makes it possible to greatly improve the quality of encoded audio at a given bit rate.
Psychoacoustic characteristics are typically incorporated into an audio coding scheme in the following way. First, the encoder explicitly computes auditory masking thresholds of a group of audio coefficients, usually a “critical band,” to generate an “auditory mask.” These thresholds are then transmitted to the decoder in some form, such as, for example, the quantization step size of the coefficients. Next, the encoder quantizes the audio coefficients according to the auditory mask. For auditory sensitive coefficients, those to which the human ear is more sensitive, a smaller quantization step size is typically used. For auditory insensitive coefficients, those to which the human ear is less sensitive, a larger quantization step size is typically used. The quantized audio coefficients are then typically entropy encoded, either through a Huffman coder such as the MPEG-4 AAC quantization and coding, a vector quantizer such as the MPEG-4 TwinVQ, or a scalable bitplane coder such as the MPEG-4 BSAC coder.
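The mask-driven quantization step described above can be sketched in a few lines. The band contents and step sizes below are assumed values chosen for illustration; a real encoder would derive the step sizes from computed masking thresholds, which this sketch does not attempt.

```python
def quantize_bands(bands, step_sizes):
    """Quantize each band of coefficients with that band's step size."""
    return [[round(c / step) for c in band]
            for band, step in zip(bands, step_sizes)]


def dequantize_bands(indices, step_sizes):
    """Decoder side: the step sizes arrive as side information."""
    return [[q * step for q in band]
            for band, step in zip(indices, step_sizes)]


def max_error(original, decoded):
    """Largest absolute reconstruction error within one band."""
    return max(abs(o - d) for o, d in zip(original, decoded))


# Same coefficients placed in two hypothetical bands for comparison.
bands = [[40, -23, 7], [40, -23, 7]]
# Assumed mask: band 0 is auditory sensitive (small step),
# band 1 is auditory insensitive (large step).
step_sizes = [2, 16]

indices = quantize_bands(bands, step_sizes)
decoded = dequantize_bands(indices, step_sizes)
errors = [max_error(b, d) for b, d in zip(bands, decoded)]
```

The sensitive band ends up with a small reconstruction error and larger quantization indices (more bits), while the insensitive band tolerates a larger error in exchange for smaller indices.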
In each of the aforementioned conventional audio coding schemes, the auditory masking is applied before the process of entropy coding. Consequently, the masking threshold is transmitted to the decoder as overhead information. As a result, the quality of the encoded audio at a given bit rate is reduced to the extent of the bits required to encode the auditory masking threshold information. Additionally, these audio coding schemes typically use floating-point values in their calculations. Floating-point arithmetic varies across platforms, and thus coding schemes that rely on it are not readily portable across different types of platforms.
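By way of contrast with floating-point processing, the following sketch shows a generic integer-to-integer rotation built from three lifting steps with fixed-point multipliers. It is a well-known textbook construction offered only to illustrate what a reversible integer operation looks like; it is not the matrix lifting of this invention. The 16-bit fixed-point precision and all function names are assumptions, and in practice the integer constants would be tabulated rather than computed at run time.

```python
import math

SHIFT = 16  # assumed fixed-point precision of the lifting multipliers


def _constants(theta):
    """Integer fixed-point lifting constants for a rotation by theta."""
    p = int(round((math.cos(theta) - 1) / math.sin(theta) * (1 << SHIFT)))
    s = int(round(math.sin(theta) * (1 << SHIFT)))
    return p, s


def _mul(k, v):
    """Multiply integer v by fixed-point constant k, rounding to an integer."""
    return (k * v + (1 << (SHIFT - 1))) >> SHIFT


def forward_rotate(x, y, theta):
    """Approximate rotation of (x, y) using integer-only lifting steps."""
    p, s = _constants(theta)
    x += _mul(p, y)
    y += _mul(s, x)
    x += _mul(p, y)
    return x, y


def inverse_rotate(x, y, theta):
    """Undo forward_rotate exactly by reversing the steps in opposite order."""
    p, s = _constants(theta)
    x -= _mul(p, y)
    y -= _mul(s, x)
    x -= _mul(p, y)
    return x, y


THETA = math.pi / 4
rotated = forward_rotate(1000, 0, THETA)
restored = inverse_rotate(*rotated, THETA)
```

Each step changes one variable by a rounded function of the other, so subtracting the same rounded quantity recovers the input exactly, regardless of how much rounding error the forward pass introduced.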
Therefore, what is needed is a system and method for encoding or decoding media data, such as, for example, audio or video data, wherein the bitstream can be scaled to whatever bitrate is desired. This system and method should be computationally efficient, while minimizing quantization noise. This encoding and decoding scheme should be portable across different types of platforms and operate in lossy and progressive-to-lossless modes.
It is noted that in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.