Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless communication devices, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, and the like. Digital video devices may implement block-based video compression techniques, such as those defined by the MPEG-2, MPEG-4, ITU-T H.261, H.263, or H.264/MPEG-4, Part 10, Advanced Video Coding (AVC) standards, to transmit and receive digital video more efficiently. Video compression techniques perform spatial and temporal prediction to reduce or remove redundancy inherent in video sequences.
Spatial prediction reduces redundancy between neighboring video blocks within a given video frame. Temporal prediction, also known as motion estimation and compensation, reduces temporal redundancy between video blocks in past and/or future video frames of a video sequence. For temporal prediction, a video encoder performs motion estimation to track the movement of matching video blocks between two or more adjacent frames. Motion vectors indicate the displacement of video blocks relative to corresponding prediction video blocks in one or more reference frames. Motion compensation uses the motion vectors to identify prediction video blocks from a reference frame. A residual video block is formed by subtracting the prediction video block from the original video block to be coded. The residual video block can be sent to a video decoder along with the motion vector, and the decoder can use this information to reconstruct the original video block or an approximation of the original video block. The video encoder may apply transform, quantization and entropy coding processes to further reduce the bit rate associated with the residual block.
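The temporal prediction steps described above can be illustrated with a minimal sketch. It assumes frames are small 2-D lists of luma samples, uses an exhaustive search with a sum-of-absolute-differences (SAD) cost, and treats the motion vector as the (row, column) offset from the current block position to the prediction block in the reference frame. The function names, the search radius parameter, and the SAD metric are illustrative assumptions, not taken from any of the standards named above.

```python
def get_block(frame, top, left, size):
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def motion_search(cur, ref, top, left, size, radius):
    """Motion estimation: return the (dy, dx) offset with the lowest SAD.

    Hypothetical full search over a square window; real encoders
    typically use faster search patterns.
    """
    target = get_block(cur, top, left, size)
    best_mv, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r, c = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if r < 0 or c < 0 or r + size > len(ref) or c + size > len(ref[0]):
                continue
            cost = sad(target, get_block(ref, r, c, size))
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dy, dx), cost
    return best_mv


def residual(cur_block, pred_block):
    """Residual block = original block minus prediction block."""
    return [[c - p for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(cur_block, pred_block)]


def reconstruct(pred_block, res_block):
    """Decoder side: prediction block plus residual block."""
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(pred_block, res_block)]
```

In this sketch the encoder would send the motion vector and the residual. The decoder fetches the prediction block from its own copy of the reference frame (motion compensation) and adds the residual. The sketch omits transform, quantization, and entropy coding, so reconstruction here is exact rather than approximate.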
Some video coding makes use of scalable coding techniques, which may be particularly desirable for wireless communication of video data. In general, scalable video coding (SVC) refers to video coding in which video data is represented by a base layer and one or more enhancement layers. For SVC, a base layer typically carries video data with a base spatial, temporal and/or signal-to-noise ratio (SNR) level. One or more enhancement layers carry additional video data to support higher spatial, temporal and/or SNR levels.
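As a simple illustration of the layering idea, the sketch below shows one possible form of temporal scalability. It assumes the base layer keeps every other frame and a single enhancement layer carries the remaining frames. The function names and the even/odd split are hypothetical choices made for illustration; actual SVC systems may organize temporal layers differently.

```python
def split_temporal_layers(frames):
    """Base layer keeps even-indexed frames; enhancement layer keeps the rest."""
    return frames[0::2], frames[1::2]


def merge_temporal_layers(base, enhancement=None):
    """Return base-rate video, or interleave both layers for the full rate."""
    if enhancement is None:
        # Only the base layer was received: play back at the base frame rate.
        return list(base)
    merged = []
    for i, frame in enumerate(base):
        merged.append(frame)
        if i < len(enhancement):
            merged.append(enhancement[i])
    return merged
```

A receiver that obtains only the base layer can still display the sequence at half the frame rate, while a receiver that obtains both layers recovers the full frame rate.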
For spatial scalability, enhancement layers add spatial resolution to frames of the base layer. In SVC systems that support spatial scalability, inter-layer prediction may be used to reduce the amount of data needed to convey the enhancement layer. In inter-layer prediction, enhancement layer video blocks may be coded using predictive techniques that are similar to motion estimation and motion compensation. In particular, enhancement layer video residual data blocks may be coded using reference blocks in the base layer. However, the base and enhancement layers have different spatial resolutions. Therefore, the base layer video data may be upsampled to the spatial resolution of the enhancement layer video data, e.g., to form reference blocks for generation of the enhancement layer residual data.
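The inter-layer prediction described above can be sketched as follows. The sketch assumes frames are 2-D lists of samples, an integer resolution ratio between the layers, and nearest-neighbor upsampling (sample replication). The replication step is a deliberately simple stand-in chosen for illustration; practical systems generally use interpolation filters, and the function names here are hypothetical.

```python
def upsample_nearest(base, factor):
    """Upsample a 2-D frame by repeating each sample `factor` times
    horizontally and each row `factor` times vertically."""
    out = []
    for row in base:
        wide = [sample for sample in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def enhancement_residual(enh, base, factor):
    """Encoder side: enhancement frame minus the upsampled base frame."""
    pred = upsample_nearest(base, factor)
    return [[e - p for e, p in zip(row_e, row_p)]
            for row_e, row_p in zip(enh, pred)]


def reconstruct_enhancement(base, res, factor):
    """Decoder side: upsampled base frame plus the enhancement residual."""
    pred = upsample_nearest(base, factor)
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(pred, res)]
```

Because the upsampled base layer already approximates the enhancement layer, the residual values tend to be small, which is what allows the enhancement layer to be conveyed with less data.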