Directly digitized still images and video require many “bits”. Accordingly, it is common to compress images and video for storage, transmission, and other uses. Most image and video compressors share a basic architecture, with variations. The basic architecture has three stages: a transform stage, a quantization stage, and an entropy coding stage, as shown in FIG. 1.
Video “codecs” (compressors/decompressors) are used to reduce the data rate required for data communication streams by balancing image quality, processor requirements (i.e. cost/power consumption), and compression ratio (i.e. resulting data rate). The currently available compression approaches offer different ranges of trade-offs and have spawned a plurality of codec profiles, each of which is optimized to meet the needs of a particular application.
The intent of the transform stage in a video compressor is to gather the energy or information of the source picture into as compact a form as possible by taking advantage of local similarities and patterns in the picture or sequence. Compressors are designed to work well on “typical” inputs and ignore their failure to compress “random” or “pathological” inputs.
Many image compression and video compression methods, such as MPEG-2, use the discrete cosine transform (DCT) as the transform stage.
Some newer image compression and video compression methods, such as MPEG-4 textures, use various wavelet transforms as the transform stage.
A wavelet transform comprises the repeated application of wavelet filter pairs to a set of data, either in one dimension or in more than one. For image compression, a 2D wavelet transform (horizontal and vertical) can be used. For video data streams, a 3D wavelet transform (horizontal, vertical, and temporal) can be used.
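As a minimal sketch of "repeated application of wavelet filter pairs" in one dimension, the following uses the Haar pair (pairwise average and pairwise difference). The Haar filter is chosen here only because it is the simplest pair to illustrate; it is an assumption and is not named in the text above. The function names are hypothetical, and the input length is assumed to be divisible by 2 raised to the number of levels.

```python
def haar_step(data):
    """Apply one low-pass/high-pass Haar filter pair to an even-length list."""
    lows = [(data[i] + data[i + 1]) / 2 for i in range(0, len(data), 2)]
    highs = [(data[i] - data[i + 1]) / 2 for i in range(0, len(data), 2)]
    return lows, highs


def haar_transform(data, levels):
    """Repeatedly apply the filter pair to the low-pass output of the previous level."""
    lows = list(data)
    details = []  # detail bands, coarsest first
    for _ in range(levels):
        lows, highs = haar_step(lows)
        details.insert(0, highs)
    return lows, details


def haar_inverse(lows, details):
    """Undo haar_transform by rebuilding each finer level from averages and differences."""
    lows = list(lows)
    for highs in details:
        rebuilt = []
        for a, d in zip(lows, highs):
            rebuilt.extend([a + d, a - d])
        lows = rebuilt
    return lows


signal = [8, 6, 7, 5, 3, 1, 4, 2]
approx, details = haar_transform(signal, levels=3)
restored = haar_inverse(approx, details)
```

For this smooth-ish input most of the detail coefficients are small, which is the energy compaction the transform stage is meant to provide. A 2D transform would apply the same step along rows and then columns; a 3D transform would add the temporal axis.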
Prior Art FIG. 2 shows an example 100 of trade-offs among the various compression algorithms currently available. As shown, such compression algorithms include wavelet-based codecs 102, and DCT-based codecs 104 that include the various MPEG video distribution profiles.
2D and 3D wavelets, as opposed to DCT-based codec algorithms, have been highly regarded due to their pleasing image quality and flexible compression ratios, prompting the JPEG committee to adopt a wavelet algorithm for its JPEG2000 still image standard. Unfortunately, most wavelet implementations use very complex algorithms, requiring a great deal of processing power, relative to DCT alternatives. In addition, wavelets present unique challenges for temporal compression, making 3D wavelets particularly difficult.
For these reasons, wavelets have never offered a cost-competitive advantage over high volume industry standard codecs like MPEG, and have therefore only been adopted for niche applications. There is thus a need for a commercially viable implementation of 3D wavelets that is optimized for low power and low cost, focusing on three major market segments.
For example, small video cameras are becoming more widespread, and the advantages of handling their signals digitally are obvious. For instance, the fastest-growing segment of the cellular phone market in some countries is for phones with image and video-clip capability. Most digital still cameras have a video-clip feature. In the mobile wireless handset market, transmission of these still pictures and short video clips demands even more capacity from the device battery. Existing video coding standards and digital signal processors put even more strain on the battery.
Another new application is the Personal Video Recorder (PVR), which allows a viewer to pause live TV and time-shift programming. These devices use digital hard disk storage to record the video, and require video compression of analog video from a cable. In order to offer such features as picture-in-picture and watch-while-record, these units require multiple video compression encoders.
Another growing application area is the Digital Video Recorder (DVR) for surveillance and security video. Again, compression encoding is required for each channel of input video to be stored. In order to take advantage of convenient, flexible digital network transmission architectures, the video is often digitized at the camera. Even with the older multiplexing recorder architecture, multiple channel compression encoders are used.
Of course, there are a vast number of other markets which would benefit from a commercially viable compression scheme that is optimized for low power and low cost.
Entropy Coding
The goal of entropy coding (also known as “Source Coding” in the literature) is generally to produce, from a message or source of information, a shorter message that can later be decoded back into the original message, preferably exactly as the original. Typically this is done by dividing the source message into “symbols” and processing the message symbol-by-symbol, rather than by looking up larger blocks or even the entire input message (such as an image or a video GOP) in an excessively large codebook.
The class of entropy coders that works on fixed-size input symbols, and produces for each a variable-length bit string, is known in the literature as “block to variable coders”.
Two Typical Ways to Encode a Symbol
Given an input symbol to encode, one way to do the encoding is to take the symbol as an index and look it up in a table called a “codebook”. The entry found in the codebook is the encoded output for the symbol. The codebook is typically large enough to provide an entry for every possible symbol.
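A minimal sketch of codebook encoding follows. The four-entry table and the function name are hypothetical and chosen only for illustration; a real codebook would have one entry for every possible symbol value.

```python
# Hypothetical codebook: the symbol value is the index, the entry is its codeword.
CODEBOOK = ["0", "10", "110", "111"]


def encode_with_codebook(symbols, codebook=CODEBOOK):
    """Encode each symbol with a single table lookup and concatenate the results."""
    return "".join(codebook[symbol] for symbol in symbols)


encoded = encode_with_codebook([0, 1, 0, 3, 2])
```

Each symbol costs exactly one table access, but the table grows with the number of distinct symbol values.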
In some implementations, a single random access to a table is very fast and efficient. However, in other implementations, random access to a large table is either relatively slow (because of cache memory loading) or relatively expensive (because of the cost of on-chip memory, as in an FPGA or ASIC).
A second typical scheme for encoding a symbol is to do some computational operations on its representation, usually a binary bit string, that produce the encoded output as their result. In this way, the output is produced without the need for a large codebook.
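One well-known computed code is the Rice code, sketched below purely as an illustration of this second scheme; the text above does not name a specific code, so the choice is an assumption. The codeword is derived from the bits of the non-negative integer symbol `n` using a parameter `k` (both names are hypothetical), with no table at all.

```python
def rice_encode(n, k):
    """Compute a Rice codeword for non-negative integer n with parameter k.

    The quotient n >> k is written in unary (that many 1 bits, then a 0),
    followed by the low k bits of n in plain binary.
    """
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    quotient = n >> k
    remainder = n & ((1 << k) - 1)
    remainder_bits = format(remainder, "b").zfill(k) if k > 0 else ""
    return "1" * quotient + "0" + remainder_bits


codewords = [rice_encode(n, 2) for n in range(6)]
```

A code of this kind suits distributions in which small values are much more likely than large ones; for other distributions the computed codeword lengths can be a poor match.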
In some implementations, such computation is reasonably fast and efficient. However, in other implementations, multiple steps of computation may be needed and are relatively slow.
A decoder must be able to determine the length of each variable-size bit string (i.e. codeword) that is to be decoded back into a symbol. This is generally done by arranging for the codewords to have the “Huffman prefix property”: that no codeword is a prefix of any other codeword.
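The prefix property can be checked, and then relied on for decoding, roughly as follows. This is a sketch under assumptions: codewords are represented as strings of "0" and "1", the codebook is the same hypothetical four-entry table indexed by symbol value, and the function names are illustrative.

```python
def is_prefix_free(codewords):
    """Return True if no codeword is a prefix of another.

    After sorting, any codeword that is a prefix of another is
    necessarily a prefix of its immediate successor.
    """
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode_prefix_code(bits, codebook):
    """Decode a bit string by emitting a symbol as soon as a codeword matches."""
    inverse = {codeword: symbol for symbol, codeword in enumerate(codebook)}
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # unambiguous because the code is prefix-free
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return symbols


codebook = ["0", "10", "110", "111"]  # hypothetical example codebook
decoded = decode_prefix_code("0100111110", codebook)
```

Because no codeword is a prefix of another, the decoder never needs explicit length fields or separators between codewords.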
Distributions
Entropy coding as described above works by taking advantage of non-uniform probability among the symbols. When a symbol has high probability of occurrence (meaning it occurs frequently in the message or source), it is encoded with a short codeword. When a symbol has a low probability of occurrence (meaning it occurs rarely in the message or source), it is encoded with a longer codeword. Thus the encoded output, with many short codewords and few long codewords, is usually shorter than the input.
An optimum encoding, as described by Shannon (C. E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal, July & October 1948), makes the length of each output codeword proportional to the negative logarithm of the probability of occurrence of its corresponding symbol in the source input. This is usually not achieved exactly, but encoder designs try to approximate it.
Therefore, the probability distribution of the symbols must be known, measured, approximated, or assumed in order to design an entropy code that is effective.
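The relationship between probability and ideal codeword length can be illustrated with a short calculation. The sketch below assumes binary codewords (so logarithms are base 2), strictly positive probabilities, and a hypothetical four-symbol distribution and codebook; the function names are illustrative.

```python
import math


def ideal_lengths(probabilities):
    """Ideal codeword length in bits for each symbol: -log2(p)."""
    return {symbol: -math.log2(p) for symbol, p in probabilities.items()}


def entropy(probabilities):
    """Average ideal length in bits per symbol."""
    return sum(-p * math.log2(p) for p in probabilities.values() if p > 0)


def average_length(probabilities, codebook):
    """Average length in bits per symbol actually produced by a given codebook."""
    return sum(p * len(codebook[symbol]) for symbol, p in probabilities.items())


# Hypothetical distribution and codebook, chosen so the code is exactly optimal.
probabilities = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
codebook = {0: "0", 1: "10", 2: "110", 3: "111"}

lengths = ideal_lengths(probabilities)
bits_ideal = entropy(probabilities)
bits_actual = average_length(probabilities, codebook)
```

Here every probability is a power of one half, so whole-bit codewords match the ideal lengths and the average length equals the entropy. For other distributions the ideal lengths are fractional, and a code with whole-bit codewords can only approximate them.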
For some distributions, the computational method of encoding can be done with very few steps, while for others many steps are needed to compute a good encoding.
In video compression work, the probability distribution of quantized coefficients can sometimes be awkward. In other words, the distribution is not one with a known fast computational encoding, and yet the number of possible values would require a codebook too large to fit in the available lookup storage.
Therefore, what is needed is an encoding scheme that is optimally matched to a known or measured probability distribution, but that does not require an excessively large lookup table.