High Efficiency Video Coding (HEVC) is a standardized block-based video codec that utilizes both temporal and spatial prediction. Spatial prediction is achieved using intra prediction from within a current picture, while temporal prediction is achieved using inter prediction or bi-directional inter prediction on a block level from previously decoded reference pictures. The difference between the original pixel data and the predicted pixel data, referred to as the residual (or prediction error), is transformed into the frequency domain, quantized, and entropy encoded using e.g. context-adaptive variable-length coding (CAVLC) or context-adaptive binary arithmetic coding (CABAC). The residual is thereafter transmitted to a receiving decoder together with necessary prediction parameters, such as mode selections and motion vectors, all of which are entropy encoded. By quantizing the transformed residuals, the tradeoff between bitrate and quality of the video may be controlled, wherein the level of quantization is determined by a quantization parameter (QP). The receiving decoder performs entropy decoding, inverse quantization and inverse transformation to obtain the residual. The decoder then adds the residual to an intra prediction or inter prediction in order to reconstruct a picture.
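The residual and quantization steps described above may be illustrated by the following minimal sketch. It is a simplification made for illustration only: the transform and the entropy coding are omitted, and the function names (`qstep`, `encode_residual`, `reconstruct`) are hypothetical. The mapping from QP to step size is the commonly cited approximation in which the step size doubles for every increase of 6 in QP, and it is used here as an assumption rather than as the normative scaling process.

```python
def qstep(qp: int) -> float:
    """Approximate quantizer step size (assumed mapping): doubles every 6 QP, 1.0 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


def encode_residual(original, prediction, qp):
    """Encoder side: form the residual and quantize it to integer levels.

    The transform to the frequency domain is omitted in this sketch.
    """
    step = qstep(qp)
    return [round((o - p) / step) for o, p in zip(original, prediction)]


def reconstruct(levels, prediction, qp):
    """Decoder side: inverse quantize the levels and add them to the prediction."""
    step = qstep(qp)
    return [p + level * step for level, p in zip(levels, prediction)]


original = [100, 104, 98, 97]
prediction = [101, 101, 99, 99]

# Low QP: fine quantization, the small residual survives.
fine = encode_residual(original, prediction, qp=4)
# High QP: coarse quantization, the small residual is quantized away.
coarse = encode_residual(original, prediction, qp=22)

print(fine, reconstruct(fine, prediction, qp=4))
print(coarse, reconstruct(coarse, prediction, qp=22))
```

In this sketch a higher QP yields levels that are cheaper to code (here all zero) at the cost of a reconstruction that falls back to the prediction alone, which is the bitrate and quality tradeoff referred to above.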
The Video Coding Experts Group (VCEG) of the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) has recently started the development of a successor to HEVC. In a first exploration phase, an experimental software codec called Key Technical Areas (KTA) is being used. KTA is based on the HEVC reference codec software HEVC Test Model (HM).
One tool that has been proposed and included in the KTA software is frame-rate up-conversion (FRUC). The FRUC tool is a motion estimation tool that derives the motion information at the decoder side. FRUC has two different modes, template matching and bilateral matching.
FIG. 1 illustrates the principle of template matching. Template matching is a digital image processing technique for finding small parts of an image that match a template image. A current block B of a current picture is to be decoded and a search image (template A) is therefore selected. The decoder derives a motion vector by matching a template area A of the current picture (denoted Cur Pic) that is neighboring the current block B, with a correspondingly shaped template area A in a reference picture (denoted Ref0). The prediction area in the reference picture Ref0 with the best matching template area is selected as the prediction for the current block B.
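The template matching principle may be sketched as follows. This is a minimal illustration under stated assumptions and not the KTA implementation: pictures are plain two-dimensional lists of integer samples, the template is an L-shaped area of thickness `t` above and to the left of the block, the matching cost is the sum of absolute differences (SAD), only integer motion vectors within a small search range are tested, and the current block is assumed to lie at least `t` samples from the picture border. All function and parameter names are hypothetical.

```python
def template_coords(x, y, size, t):
    """(column, row) positions of an L-shaped template around the block at (x, y)."""
    above = [(x + i, y - j) for j in range(1, t + 1) for i in range(-t, size)]
    left = [(x - j, y + i) for j in range(1, t + 1) for i in range(size)]
    return above + left


def template_match(cur, ref, x, y, size, t=1, search=2):
    """Return (motion vector, cost) of the best template match in the reference picture."""
    coords = template_coords(x, y, size, t)
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose template or block would leave the reference picture.
            if not (t <= x + dx and x + dx + size <= w
                    and t <= y + dy and y + dy + size <= h):
                continue
            # SAD between the template in the current picture and the displaced
            # template in the reference picture.
            cost = sum(abs(cur[cy][cx] - ref[cy + dy][cx + dx]) for cx, cy in coords)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


def predict_block(ref, x, y, size, mv):
    """Fetch the prediction area in the reference picture pointed to by the motion vector."""
    dx, dy = mv
    return [row[x + dx:x + dx + size] for row in ref[y + dy:y + dy + size]]
```

Because only already reconstructed samples (the template) enter the cost, the same search can be repeated at the decoder, and the motion vector need not be transmitted.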
FIG. 2 illustrates the principle of bilateral matching. In bilateral matching a block (or picture) is predicted from a previous picture (Ref0) and a following picture (Ref1). A continuous motion trajectory (indicated by a dotted line in the figure) along the current block (denoted Cur block) between two blocks of the reference pictures (Ref0 and Ref1) is assumed to model linear motion. The displacement between a current block and a best matching block is the motion vector. The motion vector between the previous picture Ref0 and the current picture CurPic (the pictures having temporal difference TD0) is indicated by MV0 and the motion vector between the current picture CurPic and following picture Ref1 (the pictures having temporal difference TD1) is indicated by MV1. The motion vectors MV0 and MV1 are proportional to the temporal differences TD0 and TD1. The motion vectors along the motion trajectory that minimizes the prediction error are selected, and their corresponding reference prediction blocks are used to interpolate (or extrapolate) the prediction for the current block of the current picture CurPic.
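The bilateral matching principle may be sketched in the same manner. The sketch below is illustrative only and rests on several assumptions: pictures are two-dimensional lists of integer samples, the linear trajectory is parametrized by a hypothetical integer velocity per unit of temporal distance so that MV0 and MV1 remain integer and proportional to TD0 and TD1, the prediction error is measured as the SAD between the two reference blocks, and the prediction is their rounded average. All names are hypothetical.

```python
def block(pic, x, y, size):
    """Extract the size x size block whose top-left sample is at column x, row y."""
    return [row[x:x + size] for row in pic[y:y + size]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def bilateral_match(ref0, ref1, x, y, size, td0=1, td1=1, search=2):
    """Return (cost, MV0, MV1, prediction) for the best linear trajectory, or None."""
    h, w = len(ref0), len(ref0[0])

    def inside(px, py):
        return 0 <= px and px + size <= w and 0 <= py and py + size <= h

    best = None
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            # Linear motion: MV0 points back into Ref0, MV1 forward into Ref1,
            # each scaled by its temporal distance from the current picture.
            mv0 = (-vx * td0, -vy * td0)
            mv1 = (vx * td1, vy * td1)
            if not (inside(x + mv0[0], y + mv0[1]) and inside(x + mv1[0], y + mv1[1])):
                continue
            b0 = block(ref0, x + mv0[0], y + mv0[1], size)
            b1 = block(ref1, x + mv1[0], y + mv1[1], size)
            cost = sad(b0, b1)
            if best is None or cost < best[0]:
                # Prediction: rounded average of the two reference blocks.
                pred = [[(p + q + 1) // 2 for p, q in zip(r0, r1)]
                        for r0, r1 in zip(b0, b1)]
                best = (cost, mv0, mv1, pred)
    return best
```

Note that the current block itself never enters the cost. Only the two reference pictures are compared, which is why the derivation can be carried out at the decoder side.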
The above described motion compensating prediction methods may give more or less accurate predictions depending on the video at hand, e.g. for videos having fast and complex picture changes, the predictions may be less accurate. For instance, the prediction for natural geometrical transformations in the video may be far from optimal and result in worse quality for a given bitrate. Having the encoder side signal information, such as scaling factors, in a bitstream to the decoder side is generally expensive in terms of bits.