Digital multimedia capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless communication devices, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, digital media players, and the like. Digital multimedia devices may implement video coding techniques, such as MPEG-2, ITU-T H.263, MPEG-4, or ITU-T H.264/MPEG-4 Part 10, Advanced Video Coding (AVC), to transmit and receive or store and retrieve digital video data more efficiently. Video encoding techniques may perform video compression via spatial and temporal prediction to reduce or remove redundancy inherent in video sequences.
In video encoding, the compression often includes spatial prediction, motion estimation and motion compensation. Intra-coding relies on spatial prediction and transform coding, such as discrete cosine transform (DCT), to reduce or remove spatial redundancy between video blocks within a given video frame. Inter-coding relies on temporal prediction and transform coding to reduce or remove temporal redundancy between video blocks of successive video frames of a video sequence. Intra-coded frames (“I-frames”) are often used as random access points as well as references for the inter-coding of other frames. I-frames, however, typically exhibit less compression than other frames. The term I-units may refer to I-frames, I-slices or other independently decodable portions of an I-frame.
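As an illustration of the transform coding mentioned above, the following is a minimal sketch of an orthonormal two-dimensional DCT-II applied to a small block. The function names `dct_1d` and `dct_2d` are hypothetical, and the direct floating-point formulation is an assumption chosen for clarity; practical codecs typically use fast or integer approximations of the transform.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a one-dimensional sequence (illustrative helper)."""
    n = len(values)
    out = []
    for k in range(n):
        # The k == 0 basis function gets a smaller scale so the transform is orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, v in enumerate(values))
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result is indexed [vertical_freq][horizontal_freq].
    return [list(row) for row in zip(*cols)]


# A perfectly flat block is spatially redundant: after the transform,
# all of its energy sits in the single DC coefficient at [0][0].
flat_block = [[10] * 4 for _ in range(4)]
coefficients = dct_2d(flat_block)
```

For the flat block, every coefficient other than `coefficients[0][0]` is zero up to floating-point rounding, which is why a smooth region can be represented with very few nonzero values.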
For inter-coding, a video encoder performs motion estimation to track the movement of matching video blocks between two or more adjacent frames or other coded units, such as slices of frames. Inter-coded frames may include predictive frames (“P-frames”), which may include blocks predicted from a previous frame, and bidirectional predictive frames (“B-frames”), which may include blocks predicted from a previous frame and a subsequent frame of a video sequence. Conventional motion-compensated video coding techniques compare a video block to other video blocks of a previous or subsequent video frame in order to identify predictive video data that may be used to encode the current video block. A video block may be broken into sub-block partitions to facilitate higher quality coding.
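The block comparison described above can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD). This is only an illustrative sketch: the names `sad` and `estimate_motion`, the square block shape, the full-search strategy, and the SAD cost are all assumptions, and frames are assumed to be lists of rows of integer pixel values.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def estimate_motion(current, reference, bx, by, size, search_range):
    """Full search around (bx, by); returns ((mvx, mvy), cost) of the best match."""
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = (0, 0), None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = bx + mvx, by + mvy
            # Skip candidates that fall partly outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, bx, by, reference, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv, best_cost
```

The returned displacement plays the role of a motion vector: it points from the position of the current block to the best-matching block in the reference frame.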
A coded video block may be represented by prediction information that can be used to create or identify a predictive block, and a residual block of data indicative of differences between the block being coded and the predictive block. The prediction information may comprise the one or more motion vectors that are used to identify the predictive block of data. Given the motion vectors, the decoder is able to reconstruct the predictive blocks that were used to code the residual. Thus, given a set of residual blocks and a set of motion vectors (and possibly some additional syntax), the decoder may be able to reconstruct a video frame that was originally encoded. An encoded video sequence may comprise blocks of residual data, motion vectors, and possibly other types of syntax.
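The relationship between the predictive block, the residual block, and the motion vector can be sketched as follows. The helper names `fetch_block`, `encode_block`, and `decode_block` are hypothetical, and the sketch assumes the residual is carried losslessly; a practical codec would also transform and quantize the residual, so its reconstruction would generally be approximate.

```python
def fetch_block(frame, x, y, size):
    """Copy the size x size block whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def encode_block(current, reference, bx, by, size, mv):
    """Encoder side: residual = block being coded - predictive block."""
    predictive = fetch_block(reference, bx + mv[0], by + mv[1], size)
    block = fetch_block(current, bx, by, size)
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(block, predictive)]


def decode_block(reference, bx, by, size, mv, residual):
    """Decoder side: rebuild the block from the motion vector and the residual."""
    predictive = fetch_block(reference, bx + mv[0], by + mv[1], size)
    return [[p + r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(predictive, residual)]
```

The decoder never sees the original block. It locates the same predictive block in its own copy of the reference frame by using the motion vector, then adds the residual.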
Template matching is a technique that can be used to eliminate motion vectors, yet still provide advantages of motion-compensated video coding. In template matching, neighboring pixels relative to the video block being coded can define a template, and this template (rather than the video block being coded) can be compared to the data of a previous or subsequent video frame. Both the video encoder and the video decoder can perform the template matching process to identify motion without the use of motion vectors. Thus, with template matching, the motion vectors are not coded into the bitstream. Rather, the motion vectors are essentially derived from the template matching process as the frame is encoded and decoded.
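A minimal sketch of the template matching idea follows. The inverted-L template shape (rows above and columns to the left of the block), the SAD cost, the full search, and the names `template_offsets` and `derive_motion` are all assumptions made for illustration. The point of the sketch is that the search reads only neighboring pixels, never the pixels of the block being coded, so an encoder and a decoder running the same search on the same data would derive the same displacement.

```python
def template_offsets(size, thickness):
    """Offsets (dx, dy) from the block's top-left corner that form the template:
    `thickness` rows above and `thickness` columns left of the block."""
    return [(dx, dy)
            for dy in range(-thickness, size)
            for dx in range(-thickness, size)
            if dx < 0 or dy < 0]  # excludes every pixel inside the block itself


def derive_motion(current, reference, bx, by, size, search_range, thickness=2):
    """Return the displacement (mvx, mvy) whose template best matches, or None."""
    if bx < thickness or by < thickness:
        raise ValueError("block needs neighboring pixels above and to the left")
    offsets = template_offsets(size, thickness)
    height, width = len(reference), len(reference[0])
    best_mv, best_cost = None, None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = bx + mvx, by + mvy
            # The displaced template must lie fully inside the reference frame.
            if (rx - thickness < 0 or ry - thickness < 0
                    or rx + size > width or ry + size > height):
                continue
            cost = sum(abs(current[by + dy][bx + dx] - reference[ry + dy][rx + dx])
                       for dx, dy in offsets)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    return best_mv
```

Because the result depends only on the template, overwriting the block's own pixels in `current` leaves the derived displacement unchanged, which mirrors the situation at a decoder where the block has not yet been reconstructed.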