In current encoder and/or decoder designs, motion compensation is used to exploit temporal redundancy by subtracting a prediction block from the current block. However, there are various situations in which motion compensated prediction is not efficient. One such situation arises when the current frame to be encoded and the reference frame differ in sharpness or blurriness. This may be caused by, for example, a change of focus, a camera pan with a hand-held device, and/or a special effect created for a scene change. This phenomenon can often be observed in dramas when the camera is first focused on one character and then shifts its focus to another character, the two characters appearing at different scene/focus depths. The first person looks sharper in the reference frame while the second person is blurred in the reference frame. Herein, we denote this focus-changing example in regular single-view video coding as “Case 1”.
Another source of discrepancy that degrades the quality of the prediction signal appears in multi-view video sequences. In multi-view video coding systems, scenes are captured simultaneously by multiple cameras from different viewpoints. Disparity compensation is applied from view to view to exploit the redundancy among the pictures of different views. Higher coding efficiency can be achieved by performing both motion compensation and disparity compensation than by encoding each view independently. Multi-view video systems may be built with heterogeneous cameras, or with cameras that have not been perfectly calibrated. This leads to discrepancies such as, for example, illumination mismatch, color mismatch, and/or focus mismatch among different views. The efficiency of cross-view disparity compensation may deteriorate due to such discrepancies. Furthermore, objects at different depths may exhibit different kinds of incongruity between two views. For example, object A may be in focus in view 1 while object B is in focus in view 2. When disparity compensation is performed from view 1 to view 2, object A is sharper in the reference frame while object B is blurred. Herein, we denote such camera focus mismatch in multi-view systems as “Case 2”.
Most of the previous literature on adaptive reference frame filtering focuses on generating a sub-pixel reference for motion compensation.
For example, in one conventional adaptive interpolation filtering approach, an adaptive interpolation filter is designed on a frame basis. The motion vectors are first obtained using a reference frame interpolated with the fixed six-tap filter of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 recommendation (hereinafter the “MPEG-4 AVC standard”). The coefficients of the adaptive interpolation filter are then calculated by minimizing a matching error measure such as the sum of squared differences (SSD) or the sum of absolute differences (SAD). The adaptive filter is used to generate the interpolated reference picture, which is then used for motion compensation. The process does not carry out further motion estimation with the newly interpolated sub-pixel reference. The filter design is constrained to be separable in the vertical and horizontal directions and is cascaded with bilinear filters.
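The coefficient calculation described above can be illustrated with a minimal sketch. It is not the method of the cited approach: it assumes a one-dimensional signal, zero motion, an odd number of taps, and an unconstrained (non-separable, non-cascaded) filter, and it solves the SSD minimization through the normal equations. The names `solve_linear`, `fit_filter_ssd`, and the sample data are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_filter_ssd(reference, current, taps):
    """Return the `taps` coefficients h that minimize the SSD between
    current[n] and sum_k h[k] * reference[n + k - taps // 2].

    Assumption: `taps` is odd and the two signals are already aligned
    (motion vectors have been applied beforehand).
    """
    half = taps // 2
    rows, targets = [], []
    for n in range(half, len(reference) - half):
        rows.append(reference[n - half:n + half + 1])
        targets.append(current[n])
    # Normal equations: (R^T R) h = R^T c.
    rtr = [[sum(r[i] * r[j] for r in rows) for j in range(taps)]
           for i in range(taps)]
    rtc = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(taps)]
    return solve_linear(rtr, rtc)


# Hypothetical data: the current signal is the reference blurred by (1/4, 1/2, 1/4).
reference = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
current = reference[:]
for n in range(1, len(reference) - 1):
    current[n] = (0.25 * reference[n - 1] + 0.5 * reference[n]
                  + 0.25 * reference[n + 1])

coefficients = fit_filter_ssd(reference, current, taps=3)
```

In this toy case the fitted coefficients come out close to the blur kernel that produced the current signal. A SAD criterion would require a different solver, since it has no closed-form normal equations.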
Another approach, an improvement on the adaptive interpolation filtering approach described above, first obtains the motion vectors with standard interpolation filters. Different interpolation filters are then designed for different sub-pixel positions, depending upon the sub-pixel portion of the motion vector. The filters employed are two-dimensional and non-separable, with certain symmetry constraints to reduce the number of coefficients to be solved. A second motion estimation/compensation is then performed with these new filters to generate a sub-pixel reference.
In the above-described prior art approaches relating to adaptive interpolation filtering, the integer pixels in the reference frame remain unchanged, since interpolation preserves the original data points. However, such approaches may not be efficient for predictive video coding when there are inherent discrepancies between the current frame and the reference frame, such as the focus mismatches of Case 1 and Case 2.
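The observation that interpolation leaves the integer pixels untouched can be seen in a minimal one-dimensional sketch of half-sample interpolation using the six-tap kernel (1, -5, 20, 20, -5, 1)/32 of the MPEG-4 AVC standard. The function name, the 8-bit sample range, and the edge handling by sample replication are assumptions made for illustration only.

```python
SIX_TAP = (1, -5, 20, 20, -5, 1)  # half-sample taps, normalized by 32


def clip8(value):
    """Clip to the 8-bit sample range (assumed bit depth)."""
    return max(0, min(255, value))


def interpolate_half_sample(row):
    """Return `row` at half-sample resolution.

    Even output indices hold the original integer samples, copied unchanged.
    Odd output indices hold the interpolated half-sample values.
    """
    n = len(row)
    out = []
    for i in range(n):
        out.append(row[i])  # integer position: never modified
        if i == n - 1:
            break
        # The taps cover samples i-2 .. i+3; out-of-range positions reuse
        # the nearest edge sample (assumed padding rule).
        acc = 0
        for k, tap in enumerate(SIX_TAP):
            j = min(max(i - 2 + k, 0), n - 1)
            acc += tap * row[j]
        out.append(clip8((acc + 16) >> 5))  # round, divide by 32, clip
    return out


row = [0, 50, 200, 255, 30]  # hypothetical sample values
upsampled = interpolate_half_sample(row)
```

Whatever filter is used for the odd positions, `upsampled[::2]` equals the input row, so a sharpness mismatch at the integer positions cannot be compensated by interpolation alone.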
In a prior art approach to video compression using blur compensation, a blurring filter is used to generate a blurred reference frame. However, this approach is directed only to situations where the current frame is a blurred version of the reference frame, and not to the reverse case. The set of filters from which the encoder may select is predefined, which is quite restrictive and suboptimal as compared to an adaptive filter design approach. Moreover, for each frame to be encoded, only one filter from the predefined set is selected, based on frame-level rate reduction.
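The limitation of a predefined filter set can be illustrated with a minimal one-dimensional sketch. The filter set, its names, and the sample data are hypothetical, and the selection criterion used here is the SSD between the filtered reference and the current signal, which stands in for the frame-level rate reduction criterion mentioned above and is not the criterion of the cited approach.

```python
# Hypothetical predefined set of low-pass (blurring) kernels.
PREDEFINED_BLUR_FILTERS = {
    "identity": (0.0, 1.0, 0.0),
    "mild": (0.25, 0.5, 0.25),
    "strong": (1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0),
}


def blur(signal, taps):
    """Filter `signal` with `taps`, replicating edge samples (assumed padding)."""
    n = len(signal)
    half = len(taps) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for k, tap in enumerate(taps):
            j = min(max(i + k - half, 0), n - 1)
            acc += tap * signal[j]
        out.append(acc)
    return out


def ssd(a, b):
    """Sum of squared differences between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_blur_filter(reference, current, filters=PREDEFINED_BLUR_FILTERS):
    """Pick the single predefined filter whose output best matches `current`."""
    costs = {name: ssd(blur(reference, taps), current)
             for name, taps in filters.items()}
    best = min(costs, key=costs.get)
    return best, costs


sharp = [0.0, 0.0, 0.0, 100.0, 100.0, 100.0, 0.0, 0.0]
blurred = blur(sharp, PREDEFINED_BLUR_FILTERS["mild"])

# Current frame is a blurred version of the reference: a set member fits.
forward_choice, forward_costs = select_blur_filter(sharp, blurred)

# Reverse case: the reference is the blurred one, and no blurring filter
# in the set can restore the sharper current signal.
reverse_choice, reverse_costs = select_blur_filter(blurred, sharp)
```

In the forward case one member of the set reproduces the current signal exactly, whereas in the reverse case every member leaves a nonzero residual, which is the restriction noted above.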