Directly digitized still images and video require many "bits". Accordingly, it is common to compress images and video for storage, transmission, and other uses. Most image and video compressors share a basic architecture, with variations. The basic architecture has three stages: a transform stage, a quantization stage, and an entropy coding stage, as shown in FIG. 1.
Video "codecs" (compressors/decompressors) are used to reduce the data rate required for data communication streams by balancing image quality, processor requirements (i.e., cost/power consumption), and compression ratio (i.e., resulting data rate). The currently available compression approaches offer different ranges of trade-offs, and spawn a plurality of codec profiles, each optimized to meet the needs of a particular application.
The intent of the transform stage in a video compressor is to gather the energy or information of the source picture into as compact a form as possible by taking advantage of local similarities and patterns in the picture or sequence. Compressors are designed to work well on “typical” inputs and ignore their failure to compress “random” or “pathological” inputs.
Many image compression and video compression methods, such as MPEG-2, use the discrete cosine transform (DCT) as the transform stage.
Some newer image compression and video compression methods, such as MPEG-4 textures, use various wavelet transforms as the transform stage.
A wavelet transform comprises the repeated application of wavelet filter pairs to a set of data, either in one dimension or in more than one. For image compression, a 2D wavelet transform (horizontal and vertical) can be used. For video data streams, a 3D wavelet transform (horizontal, vertical, and temporal) can be used.
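The filter-pair structure described above can be sketched with the Haar pair, the simplest wavelet filter pair. This is an illustration only, under the assumption of a simple averaging/differencing pair; it is not the transform of any particular codec or standard, which typically use longer filters.

```python
import numpy as np

def haar_1d(data):
    """One level of a Haar wavelet filter pair along the last axis.

    Returns (lowpass, highpass): pairwise averages ("sum"/scale
    coefficients) and pairwise differences ("difference"/wavelet
    coefficients).
    """
    even = data[..., 0::2]
    odd = data[..., 1::2]
    low = (even + odd) / 2.0   # lowpass: common information
    high = (even - odd) / 2.0  # highpass: how the pixels differ
    return low, high

def haar_2d(image):
    """One level of a 2D wavelet transform: horizontal, then vertical.

    Produces the four familiar subbands LL, LH, HL, HH.
    """
    low, high = haar_1d(image)            # horizontal pass
    ll, lh = haar_1d(low.swapaxes(0, 1))  # vertical pass on lowpass
    hl, hh = haar_1d(high.swapaxes(0, 1)) # vertical pass on highpass
    return (ll.swapaxes(0, 1), lh.swapaxes(0, 1),
            hl.swapaxes(0, 1), hh.swapaxes(0, 1))
```

On a completely flat image, all information collects into the LL subband and the three detail subbands come out zero, illustrating the energy-gathering intent of the transform stage.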
Prior Art FIG. 2 shows an example 100 of trade-offs among the various compression algorithms currently available. As shown, such compression algorithms include wavelet-based codecs 102, and DCT-based codecs 104 that include the various MPEG video distribution profiles.
2D and 3D wavelets, as opposed to DCT-based codec algorithms, have been highly regarded due to their pleasing image quality and flexible compression ratios, prompting the JPEG committee to adopt a wavelet algorithm for its JPEG2000 still image standard. Unfortunately, most wavelet implementations use very complex algorithms, requiring a great deal of processing power, relative to DCT alternatives. In addition, wavelets present unique challenges for temporal compression, making 3D wavelets particularly difficult.
For these reasons, wavelets have never offered a cost-competitive advantage over high-volume industry-standard codecs like MPEG, and have therefore only been adopted for niche applications. There is thus a need for a commercially viable implementation of 3D wavelets that is optimized for low power and low cost, focusing on three major market segments.
For example, small video cameras are becoming more widespread, and the advantages of handling their signals digitally are obvious. For instance, the fastest-growing segment of the cellular phone market in some countries is for phones with image and video-clip capability. Most digital still cameras have a video-clip feature. In the mobile wireless handset market, transmission of these still pictures and short video clips demands even more capacity from the device battery. Existing video coding standards and digital signal processors put even more strain on the battery.
Another new application is the Personal Video Recorder (PVR), which allows a viewer to pause live TV and time-shift programming. These devices use digital hard disk storage to record the video, and require video compression of analog video from a cable. In order to offer such features as picture-in-picture and watch-while-record, these units require multiple video compression encoders.
Another growing application area is the Digital Video Recorder (DVR) for surveillance and security video. Again, compression encoding is required for each channel of input video to be stored. In order to take advantage of convenient, flexible digital network transmission architectures, the video is often digitized at the camera. Even with the older multiplexing recorder architecture, multiple channel compression encoders are used.
Of course, there are a vast number of other markets which would benefit from a commercially viable compression scheme that is optimized for low power and low cost.
Temporal Compression
Video compression methods normally do more than compress each image of the video sequence separately. Images in a video sequence are often similar to other images in the sequence that are nearby in time. Compression can be improved by taking this similarity into account. Doing so is called "temporal compression". One conventional method of temporal compression, used in MPEG, is motion search. In this method, each region of an image being compressed is used as a pattern to search a range in a previous image. The closest match is chosen, and the region is represented by compressing only its difference from that match.
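The motion-search step just described can be sketched as a full-search block match using the sum of absolute differences (SAD) as the match criterion. Real encoders use more elaborate search strategies and criteria; the function name and parameters here are hypothetical, for illustration only.

```python
import numpy as np

def motion_search(block, prev_frame, top, left, search_range=2):
    """Full-search block matching: find the position in prev_frame whose
    block best matches `block` (minimum sum of absolute differences).

    Returns (dy, dx, residual). Only the motion vector and the residual
    (difference from the best match) need to be compressed.
    """
    bh, bw = block.shape
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the previous frame.
            if y < 0 or x < 0 or y + bh > prev_frame.shape[0] or x + bw > prev_frame.shape[1]:
                continue
            cand = prev_frame[y:y + bh, x:x + bw]
            sad = np.abs(block.astype(int) - cand.astype(int)).sum()
            if best is None or sad < best[0]:
                best = (sad, dy, dx, block.astype(int) - cand.astype(int))
    _, dy, dx, residual = best
    return dy, dx, residual
```

When the searched block appears unchanged but displaced in the previous image, the residual is all zeros and compresses to almost nothing.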
Another method of temporal compression is to use wavelets, just as in the spatial (horizontal and vertical) directions, but now operating on corresponding pixels or coefficients of two or more images. This is called 3D wavelets, for the three “directions” horizontal, vertical, and temporal.
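A single temporal wavelet step on two corresponding frames can be sketched with the same Haar averaging/differencing pair, operating on corresponding pixels rather than neighboring ones. This is a minimal illustration under the Haar assumption, not the filter of any particular 3D wavelet codec.

```python
import numpy as np

def temporal_haar(frame_a, frame_b):
    """Temporal wavelet step on two corresponding frames (Haar pair).

    The lowpass frame carries what the two frames share; the highpass
    frame carries their difference, which is mostly zero when the
    frames are similar.
    """
    low = (frame_a.astype(float) + frame_b.astype(float)) / 2.0
    high = (frame_a.astype(float) - frame_b.astype(float)) / 2.0
    return low, high
```

Applying a 2D spatial transform to each frame and then this temporal step to corresponding coefficients yields the three "directions" of a 3D wavelet transform.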
Temporal compression, by either method or any other, compresses an image and a previous image together. In general, a number of images are compressed together temporally. As embodied in the present invention, this set of images is called a Group of Pictures, or GOP.
Subbands
The output of a wavelet transform contains coefficients that represent "lowpass" or "scale" or "sum" information, that is, information generally common over several pixels. The output also contains coefficients that represent "highpass" or "wavelet" or "difference" information, which generally represents how the pixels differ from their common information. The repeated application of wavelet filters results in numerous different combinations of these types of information in the output. Each distinct combination is referred to as a "subband". The terminology arises from a frequency-domain point of view, but in general does not exactly correspond to a frequency band.
The wavelet transform produces very different value distributions in the different subbands of its output. The information that was spread across the original pixels is concentrated into some of the subbands, leaving others mostly zero. This is desirable for compression.
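This concentration of information can be demonstrated with one Haar level on a synthetic image (a hypothetical example, not drawn from any real codec): an image whose rows are constant puts all of its information into the vertical-detail and lowpass subbands, while the horizontal-detail subbands come out exactly zero.

```python
import numpy as np

def haar_level(img):
    """One horizontal+vertical Haar level, returning subbands LL, LH, HL, HH."""
    e, o = img[:, 0::2], img[:, 1::2]
    lo, hi = (e + o) / 2.0, (e - o) / 2.0   # horizontal pass
    def vert(x):
        e2, o2 = x[0::2, :], x[1::2, :]
        return (e2 + o2) / 2.0, (e2 - o2) / 2.0  # vertical pass
    ll, lh = vert(lo)
    hl, hh = vert(hi)
    return ll, lh, hl, hh

# A horizontally smooth image: every row is a constant value.
img = np.repeat(np.arange(8.0)[:, None], 8, axis=1)
ll, lh, hl, hh = haar_level(img)
# hl and hh are entirely zero; the information sits in ll and lh.
```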
Run-of-Zeros Compression
An intermediate step in some image and video compression algorithms is run-of-zeros elimination, which can be implemented by “piling” (see co-pending U.S. Patent Application 2003/0229773). In the run-of-zeros step, the coefficients of a subband (or a group of subbands) are compressed, crudely but very efficiently. The run-of-zeros step removes runs of zero values from the data, while preserving a record of where these zero values occurred. Run-of-zeros elimination can be applied at any point in the algorithm. In one embodiment, it is applied just following the quantization stage, before entropy coding. After run-of-zeros, the succeeding steps can be computed much faster because they only need to operate on significant (non-zero) information.
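The "piling" method of the co-pending application is not reproduced here; the following is a generic run-of-zeros sketch, with hypothetical function names, showing only the essential property: zero runs are removed, a record of their positions is kept, and the original data is exactly recoverable.

```python
def eliminate_zero_runs(coeffs):
    """Remove runs of zeros, keeping a record of where they occurred.

    Returns a list of (zeros_before, value) pairs for each nonzero
    coefficient, plus a trailing zero count, so positions are fully
    recoverable. Later stages then touch only the nonzero values.
    """
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs, run  # trailing zeros recorded separately

def restore_zero_runs(pairs, trailing):
    """Invert eliminate_zero_runs, reinserting every zero run."""
    out = []
    for zeros, value in pairs:
        out.extend([0] * zeros)
        out.append(value)
    out.extend([0] * trailing)
    return out
```

After this step, entropy coding iterates over the short list of (run, value) pairs instead of every coefficient, which is where the speedup described below comes from.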
Piling has great value on computing engines that process multiple values in parallel, as it is a way to do zero-elimination that takes advantage of the available parallelism. In contrast, other methods of run-of-zeros elimination (such as run-length coding) typically take as much time as it would take to eliminate the zeros during the entropy encoding.
Storage Area Per Subband
In some compression implementations according to the present invention, it is advantageous to construct a separate pile or run-of-zeros compressed storage area for each subband, or for a group of similar subbands, or in some cases multiple areas for a single subband. An advantage arises out of the sequence in which the subband results become available and other details of the algorithm. Thus instead of a single storage area as an intermediate representation for a picture or GOP, there is a set of storage areas or piles.
Rate Control
One method of adjusting the amount of compression, the rate of output bits produced, is to change the amount of information discarded in the quantization stage of the computation. Quantization is conventionally done by dividing each coefficient by a pre-chosen number, the “quantization parameter”, and discarding the remainder of the division. Thus a range of coefficient values comes to be represented by the same single value, the quotient of the division.
When the compressed image or GOP is decompressed, the inverse quantization step multiplies the quotient by the (known) quantization parameter. This restores the coefficients to their original magnitude range for further computation.
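The quantization and inverse quantization steps just described reduce to a division that discards the remainder, and a multiplication. A minimal sketch, assuming non-negative integer coefficients for simplicity:

```python
def quantize(coeff, q):
    """Divide by the quantization parameter q, discarding the remainder.

    A whole range of coefficient values maps to one quotient; the
    discarded remainder is the information lost to quantization.
    """
    return coeff // q

def dequantize(quotient, q):
    """Inverse quantization: restore the original magnitude range."""
    return quotient * q
```

For example, with q = 8, every coefficient from 32 through 39 quantizes to the quotient 4, and dequantization restores the single representative value 32.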
However, division (or equivalently multiplication) is an expensive operation in many implementations, in terms of power and time consumed, and in hardware cost. Note that the quantization operation is applied to every coefficient, and that there are usually as many coefficients as input pixels.
In another method, instead of division (or multiplication), quantization is limited to divisors that are powers of 2. This has the advantage that it can be implemented by a bit-shift operation on binary numbers. Shifting is a much less expensive operation in many implementations. An example is an integrated circuit (FPGA or ASIC) implementation: a multiplier circuit is very large, but a shifter circuit is much smaller. Also, on many computers, multiplication requires longer time to complete, or offers less parallelism in execution, compared to shifting.
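For a power-of-2 divisor 2^shift, the division becomes a right shift and the inverse multiplication becomes a left shift. A minimal sketch, again assuming non-negative integer coefficients:

```python
def quantize_shift(coeff, shift):
    """Quantization restricted to power-of-2 divisors: a right shift.

    Equivalent to dividing by 2**shift and discarding the remainder,
    but implementable with a small, fast shifter instead of a divider.
    """
    return coeff >> shift

def dequantize_shift(quotient, shift):
    """Inverse quantization by left shift: multiply by 2**shift."""
    return quotient << shift
```

The shift version produces exactly the same quotients as division whenever the quantization parameter is a power of 2, which is the trade-off discussed next: efficiency in exchange for only coarse, factor-of-2 steps in the divisor.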
While quantization by shifting is very efficient with computation, it has a disadvantage for some purposes: it only allows coarse adjustment of the compression rate (output bit rate). According to aspects of the present invention, it is observed in practice that changing the quantization shift parameter by the smallest possible amount, +1 or −1, results in nearly a 2-fold change in the resulting bit rate. For some applications of compression, this is quite acceptable. For other applications, finer rate control is required.
Therefore, there is a need for finer rate control that does not abandon quantization by shifting and its associated efficiency.