Video (e.g., television) consists of a sequence of image frames. In modern video storage and transmission systems, the image frames are converted to digital bits (“encoded”). Various encoding techniques are employed to reduce the number of bits used (or to improve image quality for the same number of bits). To view a video, the digital bits are converted back to image frames (“decoded”), which are presented on a display.
A common encoding technique uses motion compensation, in which the magnitude and direction of motion of an image from one frame to the next are estimated to provide an estimate of the next frame, and only the difference between the estimate and the actual frame (the “motion-compensated residual”) is encoded. The amplitude of the residual will generally be much smaller than the intensity of the image itself, so fewer bits are needed for accurate encoding. Motion compensation can be used for all or just a portion of a frame.
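By way of illustration only, the motion-compensated residual described above can be sketched as follows. The sketch assumes a single whole-frame displacement found by exhaustive search with a sum-of-absolute-differences cost; the function names, the search range, and the toy 6×6 frames are hypothetical and are not taken from any particular encoder.

```python
def shift_frame(frame, dx, dy, fill=0):
    """Translate a frame by (dx, dy) pixels; uncovered pixels take `fill`."""
    h, w = len(frame), len(frame[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = frame[sy][sx]
    return out


def sad(a, b):
    """Sum of absolute differences between two equally sized frames."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def estimate_motion(prev, cur, search=2):
    """Try every displacement within +/-search pixels and keep the cheapest."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cost = sad(shift_frame(prev, dx, dy), cur)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def motion_compensated_residual(prev, cur, dx, dy):
    """Difference between the actual frame and its motion-based estimate."""
    prediction = shift_frame(prev, dx, dy)
    return [[c - p for c, p in zip(rc, rp)] for rc, rp in zip(cur, prediction)]


# Toy data (assumed): a bright 2x2 square moves one pixel to the right,
# and one of its pixels also brightens slightly.
prev_frame = [[0] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (1, 2):
        prev_frame[y][x] = 200
cur_frame = shift_frame(prev_frame, 1, 0)
cur_frame[1][2] += 5

dx, dy = estimate_motion(prev_frame, cur_frame)
residual = motion_compensated_residual(prev_frame, cur_frame, dx, dy)
peak_intensity = max(max(row) for row in cur_frame)
peak_residual = max(abs(v) for row in residual for v in row)
print((dx, dy), peak_intensity, peak_residual)
```

In this toy case the residual is nonzero only where the scene actually changed, and its peak amplitude is far below the peak image intensity, which is why it can be encoded with fewer bits.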
Another common encoding technique, often used in conjunction with motion compensation, is transforming the intensity of the image (or the amplitude of the residual) to, e.g., a spatial frequency domain, and then digitizing the transform coefficients. For the same number of bits, transform encoding generally produces a higher quality image.
When image intensity is transformed and digitized, the process is called “intra-frame” coding, as only information from the same frame is used. When residual amplitude is transformed and coded, the process is called “inter-frame” coding, as information from the current frame and at least one prior frame is used.
There are various advantages in using intra-frame coding. For example, because intra-frame coding does not involve other frames, it can be decoded without decoding other frames. This can be useful when a viewer changes television channels. In the United States digital television standard, an entire frame, referred to as an I-frame, is periodically encoded using intra-frame coding. When a channel change occurs, the television receiver can wait for the I-frame and begin decoding from that frame. It is also useful for VCR or DVD type applications, wherein only I-frames may be decoded to provide images during fast-forwarding. Also, intra-frame coding reduces the effect of error propagation, because errors that occurred in other frames do not affect intra-frame coded regions.
On the other hand, inter-frame encoding is very useful in reducing the bit rate for some image regions. As noted above, motion compensation takes advantage of the fact that scenes often do not change substantially from one frame to the next, and thus once the previous frame is decoded, it can be used to predict portions of the current frame. By encoding only image aspects that cannot be predicted, the bit rate used can be significantly reduced. Except for the I-frame, in which the entire frame uses only intra-frame coding, some portions of a typical video frame are encoded using intra-frame coding (when motion prediction is not good) and other regions are encoded using inter-frame coding.
The transform that is used in video compression is typically the discrete cosine transform (DCT), which is a block-based transform. The image is divided into many non-overlapping blocks (typically 8×8 or 16×16 pixels), and the DCT coefficients of each block are quantized. The coefficients are computed from the image intensity in the case of intra-frame coding, or from the motion-compensated residual in the case of inter-frame coding. An advantage of using a non-overlapping region transform (non-ORT) such as the DCT is that each block can be treated separately from other blocks, thus making it simple to mix intra-frame coded blocks and inter-frame coded blocks to form a complete frame.
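A minimal sketch of block-based DCT coding of a single 8×8 block is given below. It assumes an orthonormal DCT-II and a single uniform quantizer step for all coefficients; practical coders typically use frequency-dependent quantization, and the function names, step size, and test block here are illustrative assumptions only.

```python
import math


def _scale(k, n):
    """Normalization that makes the DCT-II orthonormal."""
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]


def idct_1d(c):
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


def transpose(m):
    return [list(r) for r in zip(*m)]


def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    return transpose([dct_1d(c) for c in transpose(rows)])


def idct_2d(coeffs):
    rows = [idct_1d(r) for r in coeffs]
    return transpose([idct_1d(c) for c in transpose(rows)])


def quantize(coeffs, step):
    """Map each coefficient to an integer level (this is the lossy step)."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    return [[q * step for q in row] for row in levels]


# Assumed test block: a smooth intensity gradient.
block = [[100 + 2 * x + y for x in range(8)] for y in range(8)]
STEP = 16  # hypothetical quantizer step size

coeffs = dct_2d(block)
levels = quantize(coeffs, STEP)
decoded = idct_2d(dequantize(levels, STEP))

nonzero_levels = sum(1 for row in levels for q in row if q != 0)
max_error = max(abs(d - b) for rd, rb in zip(decoded, block) for d, b in zip(rd, rb))
print(nonzero_levels, "of 64 levels are nonzero; max pixel error:", round(max_error, 2))
```

Because the block is processed without reference to its neighbors, the same routine could be applied to an intensity block or to a residual block, which is what makes mixing intra-frame and inter-frame coded blocks straightforward.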
A significant disadvantage of non-ORTs is the occurrence of blocking artifacts. Because the blocks are treated separately and the image is not perfectly reconstructed after compression, discontinuities can occur along the block boundaries. This becomes particularly evident when the bit rate is low (compression is high). Once the blocking effects occur, they can propagate to other frames as a result of inter-frame coding.
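The origin of the discontinuity can be seen in a one-dimensional sketch. It assumes an extreme case of coarse quantization in which only the DC coefficient of each block survives; for an orthonormal DCT this is equivalent to replacing every block by its mean value. The function name and the ramp signal are hypothetical.

```python
def dc_only_reconstruction(signal, block_size):
    """Decode each non-overlapping block from its DC coefficient alone,
    i.e. replace every sample in the block by the block mean."""
    out = []
    for start in range(0, len(signal), block_size):
        block = signal[start:start + block_size]
        mean = sum(block) / len(block)
        out.extend([mean] * len(block))
    return out


# Assumed input: a smooth ramp covering two 8-sample blocks.
ramp = [float(i) for i in range(16)]
decoded = dc_only_reconstruction(ramp, 8)

original_step = abs(ramp[8] - ramp[7])            # step in the source signal
step_inside = abs(decoded[4] - decoded[3])        # step within a decoded block
step_at_boundary = abs(decoded[8] - decoded[7])   # step across the block boundary
print(original_step, step_inside, step_at_boundary)
```

The source signal changes by the same small amount everywhere, but after independent coding of the two blocks the entire change is concentrated at the block boundary, which is perceived as a blocking artifact.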
An approach to reducing the effects of blocking artifacts is to use an overlapping region transform (ORT), in which there is overlap in the regions transformed. This can increase the number of transform coefficients to encode, however, as the overlapped regions are represented more than once. But some ORTs, e.g., the lapped orthogonal transform (LOT), utilize overlapping regions without increasing the number of coefficients relative to the DCT.
Transform representations such as the DCT and LOT are related to subband representation. For a subband representation, a signal, such as an image or a residual, is filtered by a set of filters, and the results are subsampled. The filtered and subsampled signals are the subband representation of the signal. We will refer to the filtered and subsampled signals as the subband coefficients. The filters used are called analysis filters because they are used in analyzing the signal. A different set of filters, of course, results in different subband coefficients for the same signal. In an image or video compression system based on a subband representation, the subband coefficients are quantized and the quantized coefficients are transmitted in applications such as digital television or stored in a storage medium in applications such as DVD.
The quantized subband coefficients can be used to reconstruct an estimate of the original signal by interpolation and filtering with a set of filters. The filters used in this process are called the synthesis filters. If we choose an appropriate set of analysis and synthesis filters, and perform the appropriate subsampling and interpolation functions, it is possible to reconstruct the original signal exactly from the unquantized subband coefficients. Because of the quantization process, which is necessary in a typical image or video compression application, the reconstructed signal is only an estimate (approximation) of the original signal. A block diagram of a conventional signal compression system based on subband representation is shown in FIG. 1 and FIG. 2.
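The analysis, subsampling, interpolation, and synthesis steps can be sketched for a one-dimensional signal as follows. The sketch assumes a two-band system using the two-tap Haar filter pair, chosen here only because it is the simplest pair that gives exact reconstruction; the filter choice, the quantizer step, and all names are assumptions and are not taken from FIG. 1 or FIG. 2.

```python
import math

S = 1.0 / math.sqrt(2.0)
H0, H1 = [S, S], [-S, S]   # analysis filters (low-pass, high-pass)
G0, G1 = [S, S], [S, -S]   # synthesis filters (low-pass, high-pass)


def fir(x, h):
    """Causal FIR filtering: y[n] = sum_k h[k] * x[n - k]."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def analyze(x):
    """Filter with each analysis filter, then subsample by two."""
    return fir(x, H0)[1::2], fir(x, H1)[1::2]


def upsample(c):
    """Interpolation step: insert a zero after every coefficient."""
    out = []
    for v in c:
        out.extend([v, 0.0])
    return out


def synthesize(low, high):
    """Upsample each band, filter with its synthesis filter, and sum."""
    a = fir(upsample(low), G0)
    b = fir(upsample(high), G1)
    return [p + q for p, q in zip(a, b)]


def quantize(c, step):
    return [round(v / step) * step for v in c]


x = [10.0, 12.0, 14.0, 13.0, 9.0, 7.0, 8.0, 11.0]  # assumed even-length signal
low, high = analyze(x)

exact = synthesize(low, high)                                 # unquantized
approx = synthesize(quantize(low, 2.0), quantize(high, 2.0))  # quantized

exact_error = max(abs(a - b) for a, b in zip(exact, x))
quant_error = max(abs(a - b) for a, b in zip(approx, x))
print(exact_error, quant_error)
```

With unquantized coefficients the reconstruction error is at the level of floating-point rounding, whereas with quantized coefficients the output is only an approximation of the input.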
The transform coefficients and subband coefficients of the same signal may be very simply related to each other. For example, the DCT coefficients of a signal can be simply related one-to-one to the subband coefficients by choosing an appropriate set of analysis filters. For this reason, we will refer to the transform representation and subband representation collectively as the transform/subband representation. The transform coefficients and subband coefficients will be collectively referred to as transform/subband coefficients. The DCT and LOT are examples of transform/subband representations.
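One way to see the one-to-one relationship is the following one-dimensional sketch. It assumes an orthonormal DCT-II with block length N = 4 and takes, as the k-th analysis filter, the time-reversed k-th DCT basis function followed by subsampling by N. This is one possible choice of analysis filters, and the names and test signal are hypothetical.

```python
import math

N = 4  # assumed block length (and number of subbands)


def basis(k):
    """k-th orthonormal DCT-II basis function of length N."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * N)) for i in range(N)]


def block_dct(x):
    """Transform view: DCT of each non-overlapping block of N samples."""
    return [[sum(b * v for b, v in zip(basis(k), x[m:m + N])) for k in range(N)]
            for m in range(0, len(x), N)]


def subband(x, k):
    """Subband view: filter with the time-reversed k-th basis function,
    then keep every N-th output sample, starting at index N - 1."""
    h = basis(k)[::-1]
    y = [sum(h[j] * x[n - j] for j in range(N) if n - j >= 0) for n in range(len(x))]
    return y[N - 1::N]


x = [3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0, 5.0, 0.0, 2.0, 7.0]  # length is a multiple of N

transform_view = block_dct(x)
subband_view = [subband(x, k) for k in range(N)]

# Coefficient k of block m should equal sample m of subband k.
max_mismatch = max(abs(transform_view[m][k] - subband_view[k][m])
                   for m in range(len(x) // N) for k in range(N))
print(max_mismatch)
```

The two views hold the same numbers arranged differently: the transform view groups coefficients by block, and the subband view groups them by filter.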