Data compression occurs in a number of contexts. It is very commonly used in communications and computer networking to store, transmit, and reproduce information efficiently. It finds particular application in the encoding of images, audio and video. Video presents a significant challenge to data compression because of the large amount of data required for each video frame and the speed with which encoding and decoding often need to occur. The current state-of-the-art for video encoding is the ITU-T H.264/AVC video coding standard. It defines a number of different profiles for different applications, including the Main profile, the Baseline profile and others. A next-generation video coding standard, termed High Efficiency Video Coding (HEVC/H.265), is currently under development through a joint initiative of MPEG and the ITU-T.
There are a number of standards for encoding/decoding images and videos, including H.264 and HEVC/H.265, that use block-based coding processes. In these processes, the image or frame is partitioned into blocks and the blocks are spectrally transformed into coefficients, quantized, and entropy encoded. In many cases, the data being transformed is not the actual pixel data, but is residual data following a prediction operation. Predictions can be intra-frame, i.e. block-to-block within the frame/image, or inter-frame, i.e. between frames (also called motion prediction).
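As a much-simplified illustration of the block-based process described above, the following Python sketch partitions a frame into 4x4 blocks and forms prediction residuals using a hypothetical DC intra predictor. The block size, the predictor value of 128 for blocks with no neighbour, and the use of original rather than reconstructed samples are simplifying assumptions for illustration, not the rules of any particular standard:

```python
def dc_intra_residual(frame, bs=4):
    # Partition the frame into bs-by-bs blocks. For each block, predict
    # every pixel as the mean of the row directly above the block (128
    # when there is no top neighbour), then emit the prediction residual.
    # This is a hypothetical simplification: real codecs use left as well
    # as top neighbours, and predict from reconstructed samples.
    h, w = len(frame), len(frame[0])
    residuals = {}
    for y in range(0, h, bs):
        for x in range(0, w, bs):
            if y == 0:
                pred = 128  # assumed default when no neighbour exists
            else:
                top = frame[y - 1][x:x + bs]
                pred = sum(top) // len(top)
            residuals[(y, x)] = [[frame[y + i][x + j] - pred
                                  for j in range(bs)] for i in range(bs)]
    return residuals
```

For a flat region whose neighbouring row has already been coded, the residual is zero, which is what makes the subsequent transform and entropy coding stages effective.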
When spectrally transforming residual data, many of these standards prescribe the use of a discrete cosine transform (DCT) or some variant thereof. The resulting DCT coefficients are then quantized using a quantizer to produce quantized transform domain coefficients, or indices.
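The transform and quantization steps can be sketched as follows. This is a textbook orthonormal 2-D DCT-II with a uniform scalar quantizer, offered only as an illustration; the standards mentioned above actually specify integer transform approximations and quantization scaling tables rather than this floating-point form:

```python
import math

def dct_matrix(n):
    # Orthonormal DCT-II basis: row k holds the k-th cosine basis vector.
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def dct2(block):
    # 2-D DCT of a square block X, computed as C * X * C-transpose.
    n = len(block)
    c = dct_matrix(n)
    tmp = [[sum(c[k][i] * block[i][j] for i in range(n)) for j in range(n)]
           for k in range(n)]
    return [[sum(tmp[k][j] * c[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]

def quantize(coeffs, qstep):
    # Uniform scalar quantizer: index = round(coefficient / step size).
    return [[round(v / qstep) for v in row] for row in coeffs]
```

A constant residual block transforms to a single non-zero DC coefficient, so after quantization almost every index in the block is zero, which is the sparsity the entropy coding stage exploits.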
The block or matrix of quantized transform domain coefficients (sometimes referred to as a “transform unit”) is then entropy encoded using a particular context model. In H.264/AVC and HEVC/H.265, the quantized transform coefficients are encoded by (a) encoding a last significant coefficient position indicating the location of the last non-zero coefficient in the transform unit, (b) encoding a significance map indicating the positions in the transform unit (other than the last significant coefficient position) that contain non-zero coefficients, (c) encoding the magnitudes of the non-zero coefficients, and (d) encoding the signs of the non-zero coefficients.
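The four quantities listed in steps (a) through (d) can be derived from a scan-ordered block of quantized coefficients as sketched below. This is only an illustrative decomposition of the syntax elements; the actual binarization and context-adaptive arithmetic coding defined by H.264/AVC and HEVC/H.265 are omitted:

```python
def describe_coefficients(scan):
    # scan: quantized transform coefficients in scan order.
    # Returns the pieces encoded in steps (a)-(d):
    #   (a) position of the last non-zero coefficient,
    #   (b) significance map for positions before the last one,
    #   (c) magnitudes of the non-zero coefficients,
    #   (d) signs of the non-zero coefficients (0 = positive, 1 = negative,
    #       an assumed convention for this sketch).
    last = max(i for i, c in enumerate(scan) if c != 0)
    sig_map = [1 if scan[i] != 0 else 0 for i in range(last)]
    nonzero = [c for c in scan if c != 0]
    magnitudes = [abs(c) for c in nonzero]
    signs = [0 if c > 0 else 1 for c in nonzero]
    return last, sig_map, magnitudes, signs
```

Note that the significance map need not cover the last significant position itself, since that position is already known to be non-zero from step (a).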
Scalable video coding involves encoding a reference layer and an enhancement layer (and, in some cases, additional enhancement layers, some of which may also serve as reference layers). The reference layer is encoded using a given video codec. The enhancement layer is encoded using the same video codec, but the encoding of the enhancement layer may take advantage of information from the reconstructed reference layer to improve its compression. In particular, in the case of spatial scalable video compression (where the reference layer is a scaled-down version of the enhancement layer), a temporally co-located reconstructed reference layer frame may be used as the reference frame for a prediction in the equivalent frame at the enhancement layer. This is termed “inter-layer” prediction.
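The inter-layer prediction described above can be sketched as follows, assuming a spatial scaling factor of two and a nearest-neighbour upsampler. Real scalable codecs specify particular interpolation filters for this step, so the filter choice here is purely illustrative:

```python
def upsample2x(ref):
    # Nearest-neighbour 2x spatial upsampling of a reconstructed
    # reference-layer frame (illustrative; standards define specific
    # interpolation filters for this).
    out = []
    for row in ref:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def inter_layer_residual(enh, ref):
    # Use the upsampled, temporally co-located reference-layer frame as
    # the prediction for the enhancement-layer frame, and return the
    # residual that remains to be transformed and coded.
    pred = upsample2x(ref)
    return [[e - p for e, p in zip(er, pr)] for er, pr in zip(enh, pred)]
```

When the enhancement-layer frame closely resembles the upsampled reference layer, the residual is small, which is the source of the compression gain from inter-layer prediction.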
It would be advantageous to develop scalable video coding and decoding processes that improve compression at the enhancement layer.
Similar reference numerals may have been used in different figures to denote similar components.