In state-of-the-art video coding schemes, block-based motion compensated prediction (MCP) is used to exploit temporal redundancy. Multi-view video coding (MVC) is the compression framework for encoding multi-view sequences, where a multi-view sequence is a set of two or more video sequences that capture the same scene from different viewpoints. For inter-view coding in an MVC scenario, a block matching procedure can also be applied to perform disparity compensated prediction (DCP), thus exploiting inter-view redundancy.
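The block matching procedure mentioned above can be illustrated with a minimal full-search sketch. It is illustrative only: the function names `sad` and `block_match`, the sum-of-absolute-differences cost, and the toy frames are assumptions made for this example and are not taken from any particular codec. The same search yields a motion vector when the reference is a temporally adjacent frame and a disparity vector when the reference comes from a neighboring view.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def block_match(cur, ref, bx, by, n, search):
    """Full search over displacements in [-search, search].
    Returns (dx, dy, cost) for the lowest-cost candidate."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block falls outside the reference frame.
            if bx + dx < 0 or bx + dx + n > w or by + dy < 0 or by + dy + n > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Toy frames (assumed data): the current frame is the reference shifted
# right by one pixel, so the best displacement into the reference is dx = -1.
ref = [[(5 * x + 11 * y) % 31 for x in range(8)] for y in range(8)]
cur = [[ref[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]
result = block_match(cur, ref, bx=2, by=2, n=4, search=2)
print(result)
```

A purely translational search of this kind cannot compensate for blur or illumination differences between the block and its reference.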
However, there exist mismatches in the video content that go beyond translational displacement, such as focus changes, motion blur in monoscopic video, and illumination and/or focus mismatches across different views in multi-view video coding. Furthermore, the exhibited mismatches may be localized, such that different portions of a video frame undergo different types of change with respect to the corresponding areas in one or more frames used as reference. For example, with heterogeneous camera settings among the cameras used in multi-view video coding, different types of blurriness/sharpness mismatches will be associated with objects at different depths. As for motion blur in monoscopic video, objects moving in different directions could result in directional blurring. These non-translational mismatches lower the coding efficiency of motion compensated prediction and disparity compensated prediction.
Without prior information about the mismatch in the video content, a two-pass encoding scheme can be utilized, in which an initial search and filter estimation are performed first to adaptively design filters based on the differences between the current frame and the reference frame(s). This two-pass encoding scheme achieves higher coding efficiency because new references are created using the estimated filters. However, such a scheme significantly increases encoding complexity and also increases the overhead, since filter coefficients are transmitted for every frame encoded with this scheme.
In the context of video coding, reference frame filtering approaches have been proposed, in which new reference frames are generated to improve coding efficiency.
For focus changes and/or camera panning, a technique referred to as blur compensation was proposed, in which a fixed set of blurring (lowpass) filters is used to generate blurred reference frames for video coding. This technique has two shortcomings for the scenarios we consider. First, the filter selection is made only at the frame level, i.e., applying different filters to different parts of a frame was not considered. Second, this method relies on a very limited pre-defined filter set (lowpass only).
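The frame-level selection described above can be sketched as follows. This is a minimal illustration, not the published blur compensation method: the 3x3 kernels, the sum-of-squared-errors criterion, and the names `filter_frame`, `sse`, `FILTERS`, and `select_frame_filter` are all assumptions chosen for this example.

```python
def filter_frame(frame, kernel):
    """Apply a 3x3 kernel with edge replication; returns a new frame of floats."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(3):
                for i in range(3):
                    yy = min(max(y + j - 1, 0), h - 1)
                    xx = min(max(x + i - 1, 0), w - 1)
                    acc += kernel[j][i] * frame[yy][xx]
            out[y][x] = acc
    return out


def sse(a, b):
    """Sum of squared errors between two frames of equal size."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


# Assumed fixed filter set: no filtering plus two lowpass kernels.
IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
BINOMIAL = [[1 / 16, 2 / 16, 1 / 16], [2 / 16, 4 / 16, 2 / 16], [1 / 16, 2 / 16, 1 / 16]]
BOX = [[1 / 9] * 3 for _ in range(3)]
FILTERS = {"none": IDENTITY, "binomial": BINOMIAL, "box": BOX}


def select_frame_filter(cur, ref):
    """Pick ONE filter for the whole frame: the one whose filtered
    reference is closest to the current frame."""
    costs = {name: sse(cur, filter_frame(ref, k)) for name, k in FILTERS.items()}
    return min(costs, key=costs.get)


# Toy data (assumed): the current frame is a box-blurred copy of the reference.
ref = [[float((x * 37 + y * 59) % 101) for x in range(8)] for y in range(8)]
cur = filter_frame(ref, BOX)
chosen = select_frame_filter(cur, ref)
print(chosen)
```

Because a single filter is chosen per frame, a frame in which only some regions are blurred would be served poorly by any entry in the set, which is the first shortcoming noted above.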
To capture the actual mismatch in the video content more efficiently, we have previously proposed an adaptive reference filtering (ARF) approach, which is a two-pass encoding scheme. For example, to encode a frame with inter-view prediction, it was proposed to first perform an initial disparity estimation. By exploiting the disparity fields as an estimate of scene depth, video frames are partitioned into regions that correspond to different scene-depth levels. For each depth level, a spatial filter is adaptively designed based on the difference between the current frame and the reference frame so as to minimize the residue energy. Such a design approach is able to address depth-dependent focus mismatches exhibited across different views. The estimated filters are applied to the reference frame to create filtered references. For each block, the encoder selects the predictor (filtered or unfiltered) that provides the lowest rate-distortion cost (RD-cost), thus ensuring the highest coding efficiency. In this ARF method, the overhead (frame-wise filter coefficients) is larger than in fixed filter set approaches. More importantly, this two-pass method significantly increases encoding complexity. The additional steps (initial search and filter estimation) are necessary if we do not have prior knowledge about the mismatch.
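The per-depth-level filter design and the block-wise predictor selection can be written compactly as follows. This is one plausible formalization under stated assumptions, not a definition taken from the text: the symbols $I_{\mathrm{cur}}$, $I_{\mathrm{ref}}$, $P_k$, $S$, $h_k$, $(d_x, d_y)$, $D$, $R$, and $\lambda$ are introduced here for illustration, and the Lagrangian form of the RD-cost is the conventional one rather than something the description specifies.

```latex
% Assumed notation:
%   P_k        : set of pixels assigned to depth level k
%   S          : spatial support of the filter (e.g. a small square window)
%   (d_x, d_y) : disparity at pixel (x, y) from the initial disparity estimation
%
% Filter for depth level k, chosen to minimize the residue energy:
\[
  h_k = \arg\min_{h} \sum_{(x,y) \in P_k}
        \Bigl( I_{\mathrm{cur}}(x,y)
        - \sum_{(i,j) \in S} h(i,j)\,
          I_{\mathrm{ref}}(x + d_x + i,\; y + d_y + j) \Bigr)^{2}
\]
%
% Block-wise choice among the unfiltered and filtered references,
% using the conventional Lagrangian rate-distortion cost:
\[
  J = D + \lambda R
\]
% where D is the distortion of the block, R the rate needed to code it,
% and \lambda the Lagrange multiplier.
```

Under these assumptions the first expression is a linear least-squares problem in the coefficients $h(i,j)$, so each $h_k$ can be obtained in closed form once the first-pass disparity field is available. That dependence on the first pass is why the initial search cannot be skipped.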