High Efficiency Video Coding (HEVC) is a standardized block-based video codec that utilizes both temporal and spatial prediction. Spatial prediction is achieved using intra prediction from within the current picture, while temporal prediction is achieved using uni-directional or bi-directional inter prediction at block level from previously decoded reference pictures. The difference between the original pixel data and the predicted pixel data, referred to as the residual (or prediction error), is transformed into the frequency domain, quantized and entropy encoded. HEVC uses context-adaptive binary arithmetic coding (CABAC) for entropy coding; its predecessor H.264/AVC also supports context-adaptive variable-length coding (CAVLC). The entropy-coded residual is then transmitted to a receiving decoder together with the necessary prediction parameters, such as mode selections and motion vectors, which are also entropy encoded. Quantizing the transformed residual controls the tradeoff between bitrate and video quality, and the level of quantization is determined by a quantization parameter (QP). The receiving decoder performs entropy decoding, inverse quantization and inverse transformation to obtain the residual, and then adds the residual to an intra or inter prediction in order to reconstruct a picture.
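The residual, transform, quantization and reconstruction steps described above can be sketched in Python as follows. This is a simplified illustration, not the HEVC algorithm: it assumes a floating-point orthonormal DCT-II instead of HEVC's integer transforms, plain rounding instead of HEVC's scaling and rounding offsets, and the commonly cited approximation that the quantization step size doubles for every increase of 6 in QP (Qstep = 2^((QP - 4)/6)). All function names are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix; row k holds basis function k."""
    m = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        m.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return m


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def qstep(qp):
    # Assumed approximation: step size 1 at QP 4, doubling every 6 QP steps.
    return 2.0 ** ((qp - 4) / 6.0)


def encode_block(orig, pred, qp):
    """Residual -> 2-D transform -> quantized levels (which would then be entropy coded)."""
    n = len(orig)
    resid = [[orig[i][j] - pred[i][j] for j in range(n)] for i in range(n)]
    c = dct_matrix(n)
    coeffs = matmul(matmul(c, resid), transpose(c))  # forward transform: C X C^T
    step = qstep(qp)
    return [[round(v / step) for v in row] for row in coeffs]


def decode_block(levels, pred, qp):
    """Inverse quantization -> inverse transform -> add prediction."""
    n = len(levels)
    step = qstep(qp)
    coeffs = [[lv * step for lv in row] for row in levels]
    c = dct_matrix(n)
    resid = matmul(matmul(transpose(c), coeffs), c)  # inverse transform: C^T Y C
    return [[pred[i][j] + resid[i][j] for j in range(n)] for i in range(n)]
```

Encoding the same block at a higher QP yields a larger step size, hence fewer non-zero levels to transmit and a larger reconstruction error, which is the bitrate/quality tradeoff controlled by QP.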
The Video Coding Experts Group (VCEG) of the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) has recently started the development of a successor to HEVC. In a first exploration phase, an experimental software codec called Key Technical Areas (KTA) is being used. KTA is based on the HEVC reference software, the HEVC Test Model (HM).
One tool that has been proposed and included in the KTA software is frame-rate up-conversion (FRUC). FRUC is a motion estimation tool that derives the motion information at the decoder side rather than receiving it in the bit stream. FRUC has two modes: template matching and bilateral matching.
FIG. 1 illustrates the principle of template matching. Template matching is a digital image processing technique for finding small parts of an image that match a template image. Here, a current block B of a current picture (denoted Cur Pic) is to be decoded, and the already decoded area A neighboring block B is used as the template. The decoder derives a motion vector by matching template area A against correspondingly shaped areas in a reference picture (denoted Ref0). The block in Ref0 associated with the best-matching template area is selected as the prediction for the current block B.
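The template search can be sketched in Python as below. The following are assumptions, not taken from the source: integer-pel displacements, an inverted-L template made of t rows above and t columns to the left of the block, the sum of absolute differences (SAD) as the matching cost, and an exhaustive search over a window of plus/minus search pixels. Pictures are lists of rows of integer samples, and all function and parameter names are hypothetical. Only the template, never block B itself, is read from the current picture, since B has not been decoded yet.

```python
def sad(a, b):
    """Sum of absolute differences between two equally long pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def template_pixels(pic, x, y, size, t):
    """Inverted-L template: t rows above and t columns left of the size x size block at (x, y)."""
    pix = []
    for r in range(y - t, y):
        pix.extend(pic[r][x - t:x + size])  # top strip, including the top-left corner
    for r in range(y, y + size):
        pix.extend(pic[r][x - t:x])  # left strip
    return pix


def template_match(cur, ref, x, y, size, t, search):
    """Return the motion vector (dx, dy) and the prediction block taken from ref."""
    target = template_pixels(cur, x, y, size, t)
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates whose template or block falls outside the reference picture.
            if rx - t < 0 or ry - t < 0 or rx + size > w or ry + size > h:
                continue
            cost = sad(target, template_pixels(ref, rx, ry, size, t))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    # Assumes at least one candidate lies inside the picture.
    _, dx, dy = best
    pred = [row[x + dx:x + dx + size] for row in ref[y + dy:y + dy + size]]
    return (dx, dy), pred
```

The block under the best-matching template in the reference picture becomes the prediction, so the decoder obtains a motion vector without one being transmitted.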
FIG. 2 illustrates the principle of bilateral matching. In bilateral matching, a block (or picture) is predicted from a previous picture (Ref0) and a following picture (Ref1). A continuous motion trajectory (indicated by a dotted line in the figure) is assumed to pass through the current block (denoted Cur block) and through one block in each of the reference pictures Ref0 and Ref1, modeling linear motion. The displacement between the current block and a matching block is the motion vector. The motion vector between the previous picture Ref0 and the current picture CurPic (the pictures having temporal difference TD0) is denoted MV0, and the motion vector between the current picture CurPic and the following picture Ref1 (the pictures having temporal difference TD1) is denoted MV1. Under the linear-motion assumption, the motion vectors MV0 and MV1 are proportional to the temporal differences TD0 and TD1. The pair of motion vectors along the trajectory that minimizes the prediction error, measured as the difference between the two reference blocks, is selected, and the corresponding reference prediction blocks are used to interpolate (or extrapolate) the prediction for the current block of the current picture CurPic.
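The bilateral search can be sketched in Python as follows. This is an illustration under stated assumptions, not the KTA implementation: it uses integer-pel motion only, SAD between the two reference blocks as the matching cost, a rounded average of the two blocks as the interpolated prediction, and a sign convention in which MV0 points from the current block into Ref0 and MV1 points into Ref1, so that linear motion gives MV1 = -MV0 * TD1 / TD0 (rounded to integers). All names are hypothetical.

```python
def block(pic, x, y, size):
    """Extract the size x size block whose top-left corner is at (x, y)."""
    return [row[x:x + size] for row in pic[y:y + size]]


def bilateral_match(ref0, ref1, x, y, size, td0, td1, search):
    """Search MV0 within +/-search; MV1 follows from MV0 under the linear-motion assumption."""
    h, w = len(ref0), len(ref0[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Mirrored vector toward the following picture, scaled by TD1 / TD0.
            mv1 = (round(-dx * td1 / td0), round(-dy * td1 / td0))
            x0, y0 = x + dx, y + dy
            x1, y1 = x + mv1[0], y + mv1[1]
            if not (0 <= x0 <= w - size and 0 <= y0 <= h - size
                    and 0 <= x1 <= w - size and 0 <= y1 <= h - size):
                continue
            b0, b1 = block(ref0, x0, y0, size), block(ref1, x1, y1, size)
            # The current block is not available at the decoder, so compare the two references.
            cost = sum(abs(p - q) for r0, r1 in zip(b0, b1) for p, q in zip(r0, r1))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy), mv1, b0, b1)
    _, mv0, mv1, b0, b1 = best
    # Interpolated prediction: rounded average of the two reference blocks.
    pred = [[(p + q + 1) // 2 for p, q in zip(r0, r1)] for r0, r1 in zip(b0, b1)]
    return mv0, mv1, pred
```

With TD0 = TD1 the two vectors are simply mirrored; unequal temporal distances scale MV1 relative to MV0, which is the proportionality stated above.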
Natural images captured with either digital cameras or conventional film cameras pick up noise from a variety of sources, such as a low number of photons per pixel in the image sensor, dust inside the camera, etc. There are several types of noise. Salt-and-pepper noise, for instance, is characterized by noisy pixels that differ greatly in intensity or color from their surrounding pixels. In Gaussian noise, by contrast, each pixel typically changes by only a small amount from its original or intended value.
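The two noise types can be contrasted with a short Python sketch. It assumes 8-bit grayscale pictures stored as lists of rows, replaces a pixel by pure black or white for salt-and-pepper noise, and adds zero-mean Gaussian noise clipped to the 8-bit range; the function names and parameter values are hypothetical.

```python
import random


def add_salt_and_pepper(img, prob, rng):
    """With probability prob, replace a pixel by black (0) or white (255)."""
    out = []
    for row in img:
        new_row = []
        for v in row:
            if rng.random() < prob:
                new_row.append(0 if rng.random() < 0.5 else 255)
            else:
                new_row.append(v)
        out.append(new_row)
    return out


def add_gaussian(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma, clipped to 0..255."""
    return [[min(255, max(0, round(v + rng.gauss(0.0, sigma)))) for v in row] for row in img]


flat = [[128] * 8 for _ in range(8)]
sp = add_salt_and_pepper(flat, 0.1, random.Random(1))
gn = add_gaussian(flat, 2.0, random.Random(1))
```

On a flat gray picture, salt-and-pepper noise leaves most pixels untouched but pushes a few to the extremes, whereas Gaussian noise nudges every pixel by a small amount.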
Noisy images may decrease the prediction accuracy of both template matching and bilateral matching, since the noise in a given pixel typically changes from one picture to the next even when the underlying content does not. Denoising the video in a pre-processing step is one way to address this problem, but it is difficult to balance the amount of noise removed against the level of detail preserved. There is a risk that the picture is denoised too much in some areas, so that real details are removed.
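The difficulty can be illustrated with a 3x3 median filter, a common denoising filter chosen here only as an example (the source does not prescribe a particular filter). The sketch below, with hypothetical names and border pixels simply copied, removes an isolated bright pixel; it would remove a genuine one-pixel detail in exactly the same way, because the filter cannot tell the two apart.

```python
def median3x3(img):
    """3x3 median filter; border pixels are copied unchanged for simplicity."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = sorted(img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            out[r][c] = window[4]  # median of the nine samples
    return out


# An isolated bright pixel: impulse noise or a real detail, the filter treats both alike.
spot = [[100] * 5 for _ in range(5)]
spot[2][2] = 255

# A vertical step edge, which a median filter preserves.
edge = [[50, 50, 200, 200, 200] for _ in range(5)]
```

Filtering spot restores a flat picture, which is desirable for noise but destroys a real small detail, while filtering edge leaves the edge intact.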
Another way to address the problem is to make a rate-distortion decision on the strength of a denoising filter at the encoder side and to signal the chosen filter strength in the bit stream. This, however, costs additional bits, which reduces compression efficiency.
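A rate-distortion decision is commonly expressed as minimizing a Lagrangian cost J = D + lambda * R, where D is the distortion, R the rate in bits and lambda a Lagrange multiplier. The Python sketch below assumes that cost function, hypothetical trial measurements per candidate strength, and a fixed number of bits for signaling the strength; the names are hypothetical. It shows that the signaling bits enter the rate term of every candidate, which is the overhead described above.

```python
def rd_cost(distortion, residual_bits, signal_bits, lam):
    """Lagrangian cost J = D + lambda * R, with R including the signaling bits."""
    return distortion + lam * (residual_bits + signal_bits)


def choose_filter_strength(trials, signal_bits, lam):
    """trials maps strength -> (distortion, residual_bits) measured by the encoder."""
    best_strength, best_cost = None, float("inf")
    for strength, (dist, bits) in sorted(trials.items()):
        cost = rd_cost(dist, bits, signal_bits, lam)
        if cost < best_cost:
            best_strength, best_cost = strength, cost
    return best_strength, best_cost


trials = {0: (1000.0, 200), 1: (900.0, 180), 2: (950.0, 150)}  # hypothetical measurements
strength, cost = choose_filter_strength(trials, signal_bits=2, lam=2.0)  # strength 2, cost 1254.0
```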
There is thus a tradeoff between the amount of noise that can be removed and the level of detail that can be kept, and it is difficult to find a suitable balance.