Video compression is about reducing and removing redundant information from the video data. Typically, information from neighboring image elements or pixels, both within a picture and from previously coded pictures, is used to make a prediction of the video. Because the compression process is lossy, i.e., information about the video sequence is lost, the reconstructed video will always differ from the original video in some way. A major goal of any video codec standard is to provide tools that hide or minimize those distortions while still maintaining a high compression ratio, so that the video file is as small as possible.
Pixel or image element prediction is an important part of video coding standards such as H.261, H.263, MPEG-4 and H.264 (ITU-T Rec. H.264 and ISO/IEC 14496-10, “Advanced Video Coding,” 2003). H.264 uses three pixel prediction methods, namely intra-prediction, inter-prediction and bi-prediction. Intra-prediction provides a spatial prediction of the current block from previously decoded pixels of the current frame. Inter-prediction gives a temporal prediction of the current block using a corresponding but displaced block in a previously decoded frame. Bi-prediction combines two such temporal predictions obtained from two independent reference blocks.
In state-of-the-art video codecs, intra-prediction is an important method for creating a prediction of image elements for the current block. Since intra-coding tends to carry most of the signal energy in the video bit stream, any improvement in the prediction and coding methods is important for reducing the number of bits needed to compress a video sequence.
Intra-prediction uses reference image elements neighboring the current block to predict blocks within the same frame. Blocks are encoded starting from the upper left corner and then row-wise through the whole frame. Therefore, the already encoded image elements in the frame lie above and to the left of the next block. Intra-prediction takes this into account by using the image elements to the left of and above the block to predict the image elements within the block. In the latest standard, HEVC, intra-prediction consists of three steps: reference image element array construction, image element prediction, and post-processing. Intra-prediction methods can be classified into two categories: angular prediction methods and DC/planar prediction methods. The first category, illustrated in FIG. 1, is intended to model structures with directional edges, while the second category estimates smooth image content.
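As an illustration of the two categories, the following is a simplified Python sketch of the image element prediction step for DC, planar, and the two simplest angular modes (pure vertical and pure horizontal) on an N x N block. It assumes the reference image element arrays have already been constructed and omits reference filtering, post-processing, and the remaining angular directions. The DC and planar formulas are modeled on the integer forms used in HEVC, but this is not a conformant implementation, and the function name `intra_predict` and its parameters are hypothetical.

```python
import numpy as np


def intra_predict(top, left, top_right, bottom_left, mode):
    """Predict an N x N block from its reference image elements (hypothetical helper).

    top:         N samples in the row directly above the block
    left:        N samples in the column directly left of the block
    top_right:   the sample above and to the right of the block
    bottom_left: the sample below and to the left of the block
    """
    top = np.asarray(top, dtype=np.int64)
    left = np.asarray(left, dtype=np.int64)
    n = top.size
    log2n = n.bit_length() - 1  # assumes N is a power of two
    if mode == "dc":
        # Rounded average of the reference row and column.
        dc = (top.sum() + left.sum() + n) >> (log2n + 1)
        return np.full((n, n), dc, dtype=np.int64)
    if mode == "vertical":
        # Angular mode that copies the row above straight down.
        return np.tile(top, (n, 1))
    if mode == "horizontal":
        # Angular mode that copies the left column straight across.
        return np.tile(left[:, None], (1, n))
    if mode == "planar":
        x = np.arange(n)[None, :]  # column index
        y = np.arange(n)[:, None]  # row index
        horiz = (n - 1 - x) * left[:, None] + (x + 1) * top_right
        vert = (n - 1 - y) * top[None, :] + (y + 1) * bottom_left
        return (horiz + vert + n) >> (log2n + 1)
    raise ValueError(f"unknown mode: {mode}")
```

The vertical and horizontal modes reproduce directional edges exactly, while DC and planar produce flat or smoothly varying blocks.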
The idea of reusing blocks within the same frame to remove redundant data has later also proven efficient for screen content coding. Intra-Block Copy (IntraBC) is a method in state-of-the-art video codecs where a block in an image is predicted as a displacement from already reconstructed blocks in the same image. It removes redundancy from repeating patterns, which typically occur in text and graphic regions; IntraBC is therefore today mostly used for compressing screen content and computer graphics. Compared to intra-prediction, encoding takes longer because of the search involved in intra-block matching: the most similar block in a specified search area next to the current block is found by comparing the blocks with some metric, which often includes the sum of squared errors or differences (SSD). This method is similar to the inter-prediction method in HEVC, where blocks from other reference frames are reused to predict blocks in the current frame, the major difference being that in IntraBC the referenced blocks come from within the same frame as the current block.
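The intra-block matching search can be sketched as an exhaustive SSD search over a square search window. This is a hypothetical sketch rather than any codec's actual search: it assumes equal-size square blocks coded in plain raster order (real codecs use hierarchical scan orders and more elaborate validity rules), and the helper names `is_reconstructed` and `intra_block_copy_search` are made up for illustration.

```python
import numpy as np


def is_reconstructed(ry, rx, cy, cx, b, height, width):
    """Return True if the b x b block at (ry, rx) lies inside the picture and
    is already reconstructed when the block at (cy, cx) is being coded.

    Assumes b x b blocks coded in plain raster order (a simplification)."""
    if ry < 0 or rx < 0 or ry + b > height or rx + b > width:
        return False
    # Either entirely within earlier block rows, or reaching into the current
    # block row but lying entirely to the left of the current block.
    return ry + b <= cy or (ry <= cy and rx + b <= cx)


def intra_block_copy_search(original, recon, cy, cx, b, search_range):
    """Exhaustive block vector search; returns ((dy, dx), ssd) of the best match."""
    original = np.asarray(original, dtype=np.int64)
    recon = np.asarray(recon, dtype=np.int64)
    height, width = recon.shape
    current = original[cy:cy + b, cx:cx + b]  # samples to be predicted
    best_vector, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = cy + dy, cx + dx
            if not is_reconstructed(ry, rx, cy, cx, b, height, width):
                continue
            candidate = recon[ry:ry + b, rx:rx + b]
            cost = int(((current - candidate) ** 2).sum())  # SSD
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost
```

The double loop over the search window is the extra encoder work compared to intra-prediction; the decoder only needs the resulting block vector.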
More specifically, in intra-prediction, image elements neighboring the current block are used to create a prediction of the current block according to the intra-prediction mode. In intra-block copy prediction, reference image elements positioned relative to the current block by a block vector are copied to create a prediction of the current block. In inter-prediction, reference image elements from previously decoded pictures, positioned relative to the current block by a motion vector, are either copied directly or interpolated to predict the current block. Inter-prediction also allows bi-prediction from two independent reference blocks using two independent motion vectors; the reference blocks, potentially interpolated, are then combined. The intra- and inter-predictions can be regenerated on the decoder side because the intra-prediction mode and the displacement vector are typically included in the coded bit stream.
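Both intra-block copy and integer-sample inter-prediction amount to copying a displaced block, as the sketch below shows under the assumption of integer vectors (fractional-sample interpolation is omitted). The function `copy_prediction` is hypothetical; whether `reference` is the reconstructed current picture or a previously decoded picture is what distinguishes the two methods.

```python
import numpy as np


def copy_prediction(reference, cy, cx, b, vy, vx):
    """Predict the b x b block at (cy, cx) by copying the block that the
    integer vector (vy, vx) points to in `reference`.

    Intra block copy: `reference` is the reconstructed current picture and
    (vy, vx) is a block vector. Inter-prediction: `reference` is a previously
    decoded picture and (vy, vx) is a motion vector.
    """
    reference = np.asarray(reference)
    ry, rx = cy + vy, cx + vx
    height, width = reference.shape
    if ry < 0 or rx < 0 or ry + b > height or rx + b > width:
        raise ValueError("displaced block lies outside the reference picture")
    return reference[ry:ry + b, rx:rx + b].copy()
```

Since the decoder receives the same vector in the bit stream, calling this function with identical arguments on both sides yields identical predictions.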
In current state-of-the-art video codecs, template matching is a technique that allows the encoder to reference a block of previously coded samples without having to signal a displacement vector indicating its position. For this to work, a template area of image elements neighboring the current block is selected by both the encoder and the decoder using pre-determined information, which could, e.g., be signaled in a slice header or Picture Parameter Set (PPS). A search area, whose size has also been pre-determined, e.g., from a slice header or a PPS or defined during a decoding process in a codec specification, is then searched. For each location in the search area, an error metric is computed between the image elements at the search location, taken in the shape of the template area, and the image elements in the template area. The location that results in the lowest error metric is selected as the final location, and the image elements at that location are then used to create a prediction of the current block. This process is performed by both the encoder and the decoder to ensure that the same image elements are used for the prediction.
In template matching, both the encoder and the decoder determine from which reference image elements the current block shall be predicted. Template matching is used to find previously coded blocks that are similar to the current one by finding locations where the neighboring image elements are similar to the neighboring image elements of the current block. Image elements from the found location can then be used without having to send a displacement vector to indicate the position of the reference block.
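A minimal sketch of the template matching search, assuming an L-shaped template of `t` rows above and `t` columns to the left of the block, SSD as the error metric, and a fully decoded reference picture to search in (as in inter template matching; searching the current picture would additionally require restricting candidates to reconstructed samples). The template size and search range, pre-determined in the text, are plain parameters here, and all function names are hypothetical.

```python
import numpy as np


def template(picture, y, x, b, t):
    """L-shaped template of the b x b block at (y, x): t rows above it
    (including the top-left corner) and t columns to its left."""
    above = picture[y - t:y, x - t:x + b]
    left = picture[y:y + b, x - t:x]
    return np.concatenate([above.ravel(), left.ravel()])


def template_match(current, reference, y, x, b, t, search_range):
    """Find the location in `reference` whose template best matches (lowest SSD)
    the template of the block at (y, x) in `current`; return that location and
    the b x b block there, which serves as the prediction."""
    current = np.asarray(current, dtype=np.int64)
    reference = np.asarray(reference, dtype=np.int64)
    height, width = reference.shape
    target = template(current, y, x, b, t)  # decoded neighbors of current block
    best_pos, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = y + dy, x + dx
            if ry - t < 0 or rx - t < 0 or ry + b > height or rx + b > width:
                continue  # template or block would fall outside the picture
            cost = int(((template(reference, ry, rx, b, t) - target) ** 2).sum())
            if best_cost is None or cost < best_cost:  # first minimum wins ties
                best_pos, best_cost = (ry, rx), cost
    if best_pos is None:
        raise ValueError("no valid location in the search area")
    ry, rx = best_pos
    return best_pos, reference[ry:ry + b, rx:rx + b]
```

Because the function reads only already decoded samples and uses a deterministic tie-break, encoder and decoder arrive at the same location without any vector being signaled.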
Multiple reference pictures may be used for inter-prediction, with a reference picture index indicating which of the multiple reference pictures is used. In P-type inter encoding, only unidirectional prediction is used, and the allowable reference pictures are managed in list 0. In B-type inter encoding, however, two lists of reference pictures are managed, list 0 and list 1. In such B-type pictures, unidirectional prediction using either list 0 or list 1 is allowed, or bi-prediction using an average of a reference picture from list 0 and another reference picture from list 1 may be used.
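The list structure can be sketched as follows, assuming integer motion vectors and representing each reference picture list as a Python list of decoded pictures indexed by the reference picture index. The mode names `L0`, `L1` and `BI` and the function names are hypothetical; the bi-predictive case uses the rounded integer average (P0 + P1 + 1) >> 1, which is how H.264 forms its default (unweighted) bi-prediction.

```python
import numpy as np


def fetch(ref_list, ref_idx, y, x, b, mv):
    """Copy the b x b block displaced by the integer motion vector mv from the
    reference picture selected by ref_idx in the given list."""
    picture = np.asarray(ref_list[ref_idx], dtype=np.int64)
    ry, rx = y + mv[0], x + mv[1]
    if ry < 0 or rx < 0 or ry + b > picture.shape[0] or rx + b > picture.shape[1]:
        raise ValueError("displaced block lies outside the reference picture")
    return picture[ry:ry + b, rx:rx + b]


def inter_predict(mode, y, x, b, list0, list1=None,
                  ref_idx0=0, mv0=(0, 0), ref_idx1=0, mv1=(0, 0)):
    """Mode 'L0' is allowed in P and B pictures; 'L1' and 'BI' only in B pictures."""
    if mode == "L0":
        return fetch(list0, ref_idx0, y, x, b, mv0)
    if mode == "L1":
        return fetch(list1, ref_idx1, y, x, b, mv1)
    if mode == "BI":
        p0 = fetch(list0, ref_idx0, y, x, b, mv0)
        p1 = fetch(list1, ref_idx1, y, x, b, mv1)
        return (p0 + p1 + 1) >> 1  # rounded average of the two predictors
    raise ValueError(f"unknown prediction mode: {mode}")
```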
Weighted prediction in H.264 signals, in the slice header, a weight for each prediction direction of bi-directional prediction as well as a DC offset for the weighted combination. The general formula for using weighting factors in inter-prediction is:

$$P = \big( (w_0 P_0 + w_1 P_1) \gg \mathrm{Shift} \big) + DC, \qquad (1)$$

where $P_0$ and $w_0$ respectively represent the list 0 initial predictor and weighting factor, and $P_1$ and $w_1$ respectively represent the list 1 initial predictor and weighting factor. $DC$ represents an offset that is defined on a per-frame basis, $\mathrm{Shift}$ represents a shifting factor, and $\gg \mathrm{Shift}$ represents a right shift by $\mathrm{Shift}$. In the case of default (unweighted) bi-directional prediction, $w_0 = w_1 = 0.5$.
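Equation (1) can be evaluated on integer samples as follows. With integer weights, w0 = w1 = 0.5 corresponds to, for example, w0 = w1 = 1 with Shift = 1. The optional `rounding` term added before the shift and the final clip to the valid sample range are assumptions beyond Equation (1) (practical codecs typically include both), and the function name `weighted_prediction` is hypothetical.

```python
import numpy as np


def weighted_prediction(p0, p1, w0, w1, shift, dc, rounding=0, bit_depth=8):
    """Evaluate Equation (1), P = ((w0*P0 + w1*P1) >> Shift) + DC, on integer
    samples. `rounding` (added before the shift) and the final clip to the
    valid sample range are assumptions beyond Equation (1)."""
    p0 = np.asarray(p0, dtype=np.int64)
    p1 = np.asarray(p1, dtype=np.int64)
    p = ((w0 * p0 + w1 * p1 + rounding) >> shift) + dc
    return np.clip(p, 0, (1 << bit_depth) - 1)


# w0 = w1 = 1 with Shift = 1 is the integer form of w0 = w1 = 0.5.
print(weighted_prediction([100, 50], [102, 51], w0=1, w1=1, shift=1, dc=0).tolist())  # [101, 50]
```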
PCT publication WO 2004/064255, titled “Mixed Inter/Intra Video Coding of Macroblock Partitions” and filed 6 Jan. 2004, suggests a hybrid intra-inter bi-predictive coding mode that allows intra and inter frame predictions to be combined for hybrid-encoding a macroblock. In this hybrid coding, an average of selected intra- and inter-predictions, or a differently weighted combination of the intra- and inter-predictions, is used. The hybrid coding suggested in WO 2004/064255 basically either sums the two input intra- and inter-predictions or uses slice-specific weights. Thus, the same weight, used for the inter- and/or intra-prediction, is applied to all pixels in all macroblocks of a slice. Such an approach is far from optimal from an image quality point of view.
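To make the limitation concrete, the sketch below blends intra- and inter-predictions with one weight applied uniformly to every sample, which is the effect of a slice-level weight. It illustrates that kind of combination and is not the method of WO 2004/064255; the 6-bit weight precision, the rounding offset, and the function name `hybrid_prediction` are assumptions.

```python
import numpy as np


def hybrid_prediction(intra_pred, inter_pred, w_intra, shift=6):
    """Blend intra- and inter-predictions with one weight for every sample.

    The intra weight is w_intra / 2**shift and the inter weight is the
    complement, so the same blend applies everywhere, as with a slice-level
    weight. The 6-bit precision and rounding offset are assumptions."""
    intra_pred = np.asarray(intra_pred, dtype=np.int64)
    inter_pred = np.asarray(inter_pred, dtype=np.int64)
    w_inter = (1 << shift) - w_intra
    offset = 1 << (shift - 1)  # rounding offset added before the right shift
    return (w_intra * intra_pred + w_inter * inter_pred + offset) >> shift
```

Because `w_intra` is a single scalar, a sample where one prediction is clearly more accurate receives exactly the same blend as every other sample in the slice.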
Further, intra-prediction can only predict simple structures in the original block, as only one row and one column of image elements from the neighboring blocks are used. Intra-prediction therefore provides useful low-frequency information, but the intra-prediction modes in state-of-the-art video codecs (the current angular directions, planar, and DC prediction) cannot represent more complex structures and high-frequency information. Template matching and intra-block copy can retain more structure and higher-frequency information, but often lead to large discontinuities at the border between the current block and neighboring blocks.
For at least these reasons, alternate solutions are desired to improve the encoding and decoding of video sequences.