Template matching prediction (TMP) can increase coding efficiency for both inter and intra prediction by avoiding the transmission of motion/displacement information (motion vectors, reference indices, and displacement vectors). However, the prediction obtained using TMP is highly dependent on the correlation between a target block and its neighboring pixels (i.e., the template). Thus, template matching usually uses a relatively small block, such as 2×2, or even a single pixel, as the processing unit. In many block-based encoders and decoders, the basic block size is determined by the transform size, which is usually larger than the block unit handled by template matching. The basic block is therefore further divided into sub-blocks for template matching prediction. Since template matching uses the neighboring pixels (e.g., the upper and left neighbors) as the template for the search, the template of some sub-blocks can include pixels from neighboring sub-blocks. However, the residual can only be added to the prediction after the whole basic block is predicted, since the transform can only be applied once the basic block is fully predicted. Thus, some part of a template can include predicted data that has not yet been reconstructed by adding residuals. Although using the predicted data in a template can avoid possible blockiness at sub-block boundaries, such use can cause a low-quality matching result, since the predicted data can have more information loss than the reconstructed data.
Basic template matching prediction is based on the assumption that many repetitive patterns exist in video pictures. Accordingly, template matching searches for similar patterns in decoded video pictures by matching the neighboring pixels. Because it is a backward prediction technique, template matching can avoid the transmission of overhead such as motion vectors or displacement vectors and, thus, improve coding performance. Moreover, template matching can be used in both inter and intra prediction.
Template Matching Prediction in Inter Prediction
Template matching prediction in inter prediction is one way to predict target pixels without sending motion vectors. Given a target block of a frame, a target pixel in the block is determined by finding an optimum pixel from a set of reference samples, where the pixels adjacent to the optimum pixel have the highest correlation with the pixels adjacent to the target pixel. The pixels adjacent to the target pixels are called the template. In the prior art, the template is usually taken from the reconstructed pixels surrounding the target pixels. Turning to FIG. 1, an example of a template matching prediction scheme for inter prediction is indicated generally by the reference numeral 100. The template matching prediction scheme 100 involves a reconstructed reference frame 110 having a search region 111, a prediction 112 within the search region 111, and a neighborhood 113 with respect to the prediction 112. The template matching prediction scheme 100 also involves a current frame 150 having a target block 151, a template 152 with respect to the target block 151, and a reconstructed region 153. In the case of inter prediction, the template matching process can be seen as a motion vector search at the decoder side. Here, template matching is performed very similarly to traditional motion estimation techniques. Namely, motion vectors are evaluated by calculating a cost function for correspondingly displaced template-shaped regions in the reference frames. The best motion vector for the template is then used to predict the target area. Only those areas of the image where a reconstruction or at least a prediction signal already exists are accessed for the search. Thus, the decoder is able to execute the template matching process and predict the target area without additional side information.
Template matching can predict pixels in a target block without transmission of motion vectors. The prediction performance of template matching prediction is expected to be comparable to that of the traditional block matching scheme if the correlation between a target block and its template is high. In the prior art, the template is taken from the reconstructed spatially neighboring pixels of the target pixels. These neighboring pixels sometimes have a low correlation with the target pixels. In such cases, the performance of template matching prediction can be lower than that of the traditional block matching scheme.
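The decoder-side search described above can be sketched as follows. This is a minimal illustration only, not the scheme of FIG. 1 itself: the function names, the inverse-L template shape, the template thickness, the square search window, and the use of the sum of absolute differences (SAD) as the cost function are all assumptions made for the example.

```python
from typing import List, Tuple

Frame = List[List[int]]  # frame[row][col], luma samples (assumed representation)


def template_offsets(block: int, thickness: int) -> List[Tuple[int, int]]:
    """Offsets (dy, dx), relative to the block's top-left pixel, of an
    inverse-L template lying above and to the left of the block (assumed shape)."""
    return [(dy, dx)
            for dy in range(-thickness, block)
            for dx in range(-thickness, block)
            if dy < 0 or dx < 0]


def template_matching_search(cur: Frame, ref: Frame, y: int, x: int,
                             block: int, thickness: int,
                             search_range: int) -> Tuple[Tuple[int, int], Frame]:
    """Find the displacement whose template-shaped region in `ref` best matches
    the template of the target block at (y, x) in `cur`; return the displacement
    and the resulting prediction. Only template pixels of `cur` are read, so a
    decoder could run the same search without side information."""
    offs = template_offsets(block, thickness)
    height, width = len(ref), len(ref[0])
    best = None  # (cost, my, mx)
    for my in range(-search_range, search_range + 1):
        for mx in range(-search_range, search_range + 1):
            cy, cx = y + my, x + mx
            # Skip candidates whose block or template leaves the reference frame.
            if (cy - thickness < 0 or cx - thickness < 0
                    or cy + block > height or cx + block > width):
                continue
            # SAD between the target template and the displaced template (assumed cost).
            cost = sum(abs(cur[y + dy][x + dx] - ref[cy + dy][cx + dx])
                       for dy, dx in offs)
            if best is None or cost < best[0]:
                best = (cost, my, mx)
    if best is None:
        raise ValueError("no valid candidate inside the reference frame")
    _, my, mx = best
    # The block attached to the best-matching template becomes the prediction.
    pred = [[ref[y + my + i][x + mx + j] for j in range(block)]
            for i in range(block)]
    return (my, mx), pred
```

Because the cost is computed on the template only, the quality of the result depends on how well the template correlates with the target block, which is the limitation noted in this section.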
Template Matching Prediction in Intra Prediction
In intra prediction, template matching is one of the available non-local prediction approaches, since the prediction can be generated from pixels far away from the target block. In intra template matching, the template definition is similar to that in inter template matching. However, one difference is that the search range is limited to the decoded part of the current picture. Turning to FIG. 2, an example of a template matching prediction scheme for intra prediction is indicated generally by the reference numeral 200. The template matching prediction scheme 200 involves a decoded part 210 of a picture 277. The decoded part 210 of the picture 277 has a search region 211, a candidate prediction 212 within the search region 211, and a neighborhood 213 with respect to the candidate prediction 212. The template matching prediction scheme 200 also involves an un-decoded part 220 of the picture 277. The un-decoded part 220 of the picture 277 has a target block 221 and a template 222 with respect to the target block 221. For simplicity, the following description is based on intra template matching. However, one of ordinary skill in this and related arts will appreciate that the description can be readily extended to the inter template matching counterpart.
Residual Update Scheme in Template Matching
Since template matching can avoid additional overhead, it has an advantage in predicting relatively grainy areas in a video picture by minimizing the additional overhead usually required for such areas. In order to exploit the advantages of template matching, the target block used during prediction is usually smaller than the basic coding block. For example, in the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) Standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 Recommendation (hereinafter the “MPEG-4 AVC Standard”), the basic coding block size can be 4×4 or 8×8. In one prior art approach, the target block size of template matching is defined as 2×2, which is half the width and half the height of the 4×4 basic coding block. Turning to FIG. 3, an example of a sub-divided block for intra template matching prediction is indicated generally by the reference numeral 300. The intra template matching prediction involves a decoded part 310 of a picture 377. The decoded part 310 of the picture 377 has a search region 311, a candidate prediction 312 within the search region 311, and a neighborhood 313 with respect to the candidate prediction 312. The intra template matching prediction also involves an un-decoded part 320 of the picture 377. The un-decoded part 320 of the picture 377 has a target block 321 (also referred to as the sub-divided block for intra template matching prediction in the example of FIG. 3) and a template 322 with respect to the target block 321.
Since the transform size is usually equal to the basic block size (there are 4×4 transforms and 8×8 transforms in the MPEG-4 AVC Standard), the prediction residual can only be transformed, quantized, and updated after at least a whole basic block is predicted. Thus, in template matching with a smaller target block size, a template may include some prediction data from previously predicted sub-blocks. We refer to this approach as “Method 1”.
Turning to FIGS. 4A-D, an example of sub-blocks and their templates is indicated generally by the reference numeral 400. The example of FIGS. 4A-D corresponds to the aforementioned “Method 1”. In the example, a basic block 410 is divided into 4 sub-blocks (indicated by the reference numerals 1, 2, 3, and 4, respectively), each of which is a respective target block of the template matching prediction. Referring to FIG. 4A, the template of target block 1 is indicated by the reference numeral 411, the target block of the template matching prediction (as applied to the basic block) is indicated by the reference numeral 412, and the basic block size is indicated by the reference numeral 413. Referring to FIG. 4B, the template of target block 2 is indicated by the regions represented by the reference numerals 421 and 422. Referring to FIG. 4C, the template of target block 3 is indicated by the regions represented by the reference numerals 431 and 432. Referring to FIG. 4D, the template of target block 4 is indicated by the regions represented by the reference numerals 441, 442, and 443. The prediction process of the four target blocks involves the following steps:
(1) generate a template for a target block (e.g., for target block 1) by using neighboring pixels;
(2) search for the matches and generate a prediction for this target block;
(3) update the target block with its prediction;
(4) repeat steps (1) through (3) for target blocks 2, 3, and 4;
(5) get the residual block of the whole basic block;
(6) transform the whole residual block, quantize the coefficients, and inverse transform the quantized coefficients;
(7) generate a reconstructed block by adding the prediction block and the quantized residual block;
(8) finish the basic block.
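Steps (1) through (7) of “Method 1” can be sketched as follows. This is a simplified illustration under stated assumptions, not the codec's actual processing: the function names are hypothetical, the template is assumed to be an inverse-L of thickness 2, the search is assumed to cover only the fully decoded rows above the basic block, the cost is assumed to be the sum of absolute differences, and the transform, quantization, and inverse transform of step (6) are replaced by a plain scalar quantization of the residual.

```python
from typing import List

Picture = List[List[int]]  # picture[row][col] (assumed representation)


def predict_sub_block(work: Picture, y: int, x: int, n: int, t: int,
                      top_limit: int) -> Picture:
    """Steps (1)-(2): build the inverse-L template of the n x n target block at
    (y, x) from `work` and search candidates lying entirely in rows above
    `top_limit` (assumed search region). Returns the best candidate block."""
    offs = [(dy, dx) for dy in range(-t, n) for dx in range(-t, n)
            if dy < 0 or dx < 0]
    width = len(work[0])
    best_cost, best_pos = None, None
    for cy in range(t, top_limit - n + 1):
        for cx in range(t, width - n + 1):
            cost = sum(abs(work[y + dy][x + dx] - work[cy + dy][cx + dx])
                       for dy, dx in offs)
            if best_cost is None or cost < best_cost:
                best_cost, best_pos = cost, (cy, cx)
    if best_pos is None:
        raise ValueError("no candidate available in the decoded rows")
    cy, cx = best_pos
    return [[work[cy + i][cx + j] for j in range(n)] for i in range(n)]


def code_basic_block_method1(recon: Picture, orig: Picture, by: int, bx: int,
                             size: int = 4, n: int = 2, t: int = 2,
                             qstep: int = 8) -> Picture:
    """Predict all sub-blocks first, then update the residual once for the
    whole basic block. `recon` is modified in place; the prediction block is
    returned."""
    work = [row[:] for row in recon]  # reconstructed pixels plus predictions
    for sy in range(by, by + size, n):          # step (4): raster order 1, 2, 3, 4
        for sx in range(bx, bx + size, n):
            pred = predict_sub_block(work, sy, sx, n, t, by)
            for i in range(n):                  # step (3): later templates will
                for j in range(n):              # read this *predicted* data
                    work[sy + i][sx + j] = pred[i][j]
    prediction = [work[by + i][bx:bx + size] for i in range(size)]
    for i in range(size):
        for j in range(size):
            residual = orig[by + i][bx + j] - prediction[i][j]   # step (5)
            # Stand-in for step (6): scalar quantization only (assumption).
            quantized = int(round(residual / qstep)) * qstep
            value = prediction[i][j] + quantized                 # step (7)
            recon[by + i][bx + j] = max(0, min(255, value))
    return prediction
```

In this sketch the working picture is updated with predictions only, so the templates of target blocks 2, 3, and 4 read un-reconstructed data from earlier target blocks, which is the defining property of “Method 1”.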
Since target block 1 is the first sub-block of the basic block, the pixels in its template are all from the reconstructed picture. However, for the other target blocks, such as target block 2, the template needs to include some pixels from target block 1. When target block 2 is predicted, target block 1 has been predicted but not yet reconstructed, because transformation and quantization are not performed until step (6), after all four target blocks have been predicted. Thus, some portion of the template of target block 2 comes from the prediction data of target block 1. The same issue exists when predicting target blocks 3 and 4 by template matching.
Incorporating the prediction data of previous target blocks into the template of the current target block can make the prediction of the whole basic block vary smoothly. Moreover, the advantages of a large transform are retained, since the large transform (and the attendant quantization) is applied only after all of the target blocks have been completely predicted. However, incorporating the prediction data into the template used for matching can propagate the prediction error and worsen the prediction.
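The dependence of each template on predicted data can be made concrete with a small count. The sketch below is illustrative only; the function name, the inverse-L template shape, and the template thickness of 2 are assumptions, since the template dimensions are not specified here. For each target block of a 4×4 basic block, it counts how many template pixels fall inside the basic block itself and are therefore available only as prediction data under “Method 1”.

```python
from typing import Dict


def predicted_template_pixels(size: int = 4, n: int = 2, t: int = 2) -> Dict[int, int]:
    """For each n x n target block (numbered in raster order from 1) of a
    size x size basic block, count the inverse-L template pixels (assumed
    thickness t) that lie inside the basic block, i.e., in earlier target
    blocks that are predicted but not yet reconstructed."""
    offs = [(dy, dx) for dy in range(-t, n) for dx in range(-t, n)
            if dy < 0 or dx < 0]
    counts: Dict[int, int] = {}
    index = 1
    for sy in range(0, size, n):        # coordinates relative to the
        for sx in range(0, size, n):    # top-left pixel of the basic block
            counts[index] = sum(
                1 for dy, dx in offs
                if 0 <= sy + dy < size and 0 <= sx + dx < size)
            index += 1
    return counts
```

Under these assumptions each template has 12 pixels, of which 0, 4, 4, and 12 are prediction data for target blocks 1, 2, 3, and 4, respectively. Target block 4 is thus matched entirely against un-reconstructed data in this configuration.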