The advent of digital multimedia such as digital images, speech/audio, graphics, and video has significantly improved various applications and opened up brand new ones, owing to the relative ease with which it enables reliable storage, communication, transmission, and search and access of content. Overall, the applications of digital multimedia have been many, encompassing a wide spectrum including entertainment, information, medicine, and security, and have benefited society in numerous ways. Multimedia as captured by sensors such as cameras and microphones is often analog, and the process of digitization in the form of Pulse Code Modulation (PCM) renders it digital. However, just after digitization, the amount of resulting data can be quite significant, as is necessary to recreate the analog representation needed by speakers and/or a TV display. Thus, efficient communication, storage, or transmission of the large volume of digital multimedia content requires its compression from raw PCM form to a compressed representation, and many techniques for compression of multimedia have been invented. Over the years, video compression techniques have grown sophisticated to the point that they can achieve high compression factors between 10 and 100 while retaining high psychovisual quality, often similar to uncompressed digital video.
While tremendous progress has been made to date in the art and science of video compression (as exhibited by the plethora of video coding standards driven by standards bodies, such as MPEG-1, MPEG-2, H.263, MPEG-4 Part 2, MPEG-4 AVC/H.264, MPEG-4 SVC and MVC, as well as industry-driven proprietary standards such as Windows Media Video, RealVideo, On2 VP, and the like), the ever-increasing appetite of consumers for even higher quality, higher definition, and now 3D (stereo) video, available for access whenever and wherever, has necessitated delivery via various means such as DVD/BD, over-the-air broadcast, cable/satellite, and wired and mobile networks, to a range of client devices such as PCs/laptops, TVs, set top boxes, gaming consoles, portable media players/devices, and smartphones. This in turn fuels the desire for even higher levels of video compression. Among the standards driven by standards bodies, this is evidenced by the recently started effort by ISO MPEG in High Efficiency Video Coding (HEVC), which is expected to combine new technology contributions with technology from the last couple of years of exploratory work on H.265 video compression by the ITU-T standards committee.
All of the aforementioned standards employ a general interframe predictive coding framework that reduces temporal redundancy by compensating for motion between frames (or fields) of video. A frame to be coded is first divided into blocks, and each block is assigned one or more motion vectors with respect to a past decoded frame. These motion vectors are then transmitted to the decoder and used to generate a motion compensated prediction frame; the difference between the frame being coded and this prediction is coded block by block, often by transform coding. For higher coding efficiency, it has been recognized that motion vectors should have a higher precision than integer pixel, so MPEG-1 and MPEG-2 allow ½ pixel accuracy, while more recent standards such as MPEG-4 Part 2 (version 2) video and H.264 use ¼ pixel accuracy motion compensation. However, since actual pixels of a frame are only available at integer pixel precision, special filters are needed to interpolate a block of the previous frame to a subpel location as needed for generating the motion compensated prediction. The H.264 standard specifies a fixed filter set of separable filters that can be used for generating all 16 phases needed for ¼ pel interpolation. This fixed filter set is theoretically optimum, as it is derived from Wiener theory for maximum gain; however, some filters in the fixed filter set are limited to 6 taps, while for others as many as 9 taps are allowed. Furthermore, there is some loss in accuracy in the integerization process due to precision limitations. While this type of prediction generally works on average, this or any other single fixed filter can, for specific pictures or scenes, be mismatched with the characteristics of the content, so there is room for improvement.
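As a rough illustration of fixed-filter subpel interpolation, the sketch below applies the 6-tap kernel (1, -5, 20, 20, -5, 1)/32 that H.264 uses for luma half-sample positions along a single row, and forms a quarter-pel sample as the rounded average of two neighbouring samples. It is a one-dimensional simplification, not the full two-dimensional process of the standard; the function names and the border handling (replicating the edge pixel) are assumptions made for this example.

```python
# Illustrative 1-D sketch only; names and border handling are assumptions.
TAPS = (1, -5, 20, 20, -5, 1)  # H.264 luma half-sample kernel, sum = 32


def clip8(value):
    """Clamp a value to the 8-bit pixel range."""
    return max(0, min(255, value))


def half_pel_row(row):
    """Return the half-pel sample lying between row[i] and row[i + 1] for each i."""
    n = len(row)
    out = []
    for i in range(n - 1):
        acc = 0
        for k, tap in enumerate(TAPS):
            # Filter support spans row[i - 2] .. row[i + 3]; replicate edges.
            j = min(max(i - 2 + k, 0), n - 1)
            acc += tap * row[j]
        # Add 16 and shift by 5 to divide by 32 with rounding.
        out.append(clip8((acc + 16) >> 5))
    return out


def quarter_pel(a, b):
    """Quarter-pel sample as the rounded average of two neighbouring samples."""
    return (a + b + 1) >> 1
```

For a flat row the interpolated samples equal the input value, since the taps sum to 32; on a linear ramp the half-pel sample falls midway between its two integer neighbours, subject to integer rounding.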
Over the last few years there has been substantial research in the area of filtering for ¼ pel motion compensation, leading to the development of adaptive motion filtering. In adaptive filtering, coefficients are not fixed and thus are not known a priori; i.e., the coefficients are computed from the content itself and vary with time. The mathematical procedure used to compute filter coefficients is based on the Wiener-Hopf equation.
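For orientation, the Wiener-Hopf equation can be stated in its generic least-squares form as follows. The symbols are assumptions introduced here and do not come from the text above: s_n is a pixel of the frame being coded, p_n is the vector of reference-frame pixels in the filter support for that pixel, and h is the vector of filter coefficients.

```latex
% Generic statement; s_n, \mathbf{p}_n and \mathbf{h} are notation assumed for this sketch.
\begin{align}
  e_n &= s_n - \mathbf{h}^{\mathsf{T}} \mathbf{p}_n
      && \text{(prediction error for pixel } n\text{)} \\
  \mathbf{h}^{\star} &= \arg\min_{\mathbf{h}} \; \mathrm{E}\!\left[ e_n^{2} \right]
      && \text{(minimum mean squared error)} \\
  \mathbf{R}\,\mathbf{h}^{\star} &= \mathbf{r},
  \qquad \mathbf{R} = \mathrm{E}\!\left[ \mathbf{p}_n \mathbf{p}_n^{\mathsf{T}} \right],
  \quad \mathbf{r} = \mathrm{E}\!\left[ s_n \mathbf{p}_n \right]
      && \text{(Wiener-Hopf equation)}
\end{align}
```

Setting the gradient of the mean squared error with respect to h to zero gives the third line, so the coefficients follow from the autocorrelation of the reference pixels and their cross-correlation with the frame being coded.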
While the Wiener-Hopf mathematical procedure can calculate an optimum filter coefficient set, there are significant issues in the practical integration of this approach into a video coding system. Thus, in the context of H.265 and/or HEVC development, a number of proposals have been made to address the various shortcomings as well as to increase adaptivity for improved coding gain. These proposals can be briefly listed as follows.
Choice between nonseparable vs. separable filters: Computing nonseparable filters can be more computationally intensive, but in theory nonseparable filters can yield higher quality. However, nonseparable filters also require roughly twice as many coefficients as separable filters, so their coding overhead can be high. Overall, for motion filtering, separable filters can provide reasonable coding gain and thus offer a better overall tradeoff.
Tradeoff of number of iterations vs. gain per iteration: The iterative solution to the Wiener-Hopf equation takes a number of iterations to converge to a good result. The number of iterations needed depends on how far the default filter set (used for the first iteration) is from the optimum. If the two are close, up to 4 iterations are often enough, but if they are very different, 16 to 20 iterations may be needed. If the difference between the default filter and the optimal filter is large, stopping at 4 iterations will not capture most of the available gain.
Accuracy of various coefficients for best tradeoff of bit cost vs. quality: By keeping certain coefficients in a filter set at lower accuracy, some bit savings can result. In fact, the maximum precision of a filter coefficient can often be limited to 8 bits, and in some exceptional cases the precision used may be limited to 10 bits.
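The effect of limiting coefficient precision can be sketched as below. This is a minimal example under stated assumptions: "precision" is read here as the number of fractional bits in a fixed-point representation, the tap values are hypothetical, and the helper names are invented for illustration.

```python
# Sketch only: "precision" is interpreted as fractional bits (an assumption).
def quantize(coeffs, frac_bits):
    """Map real-valued coefficients to integers with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return [round(c * scale) for c in coeffs]


def dequantize(q_coeffs, frac_bits):
    """Recover approximate real-valued coefficients from the integer form."""
    scale = 1 << frac_bits
    return [q / scale for q in q_coeffs]


def max_error(coeffs, frac_bits):
    """Largest absolute coefficient error introduced by quantization."""
    rec = dequantize(quantize(coeffs, frac_bits), frac_bits)
    return max(abs(c - r) for c, r in zip(coeffs, rec))


# Hypothetical 6-tap filter, chosen only to illustrate the tradeoff.
taps = [0.03, -0.16, 0.63, 0.63, -0.16, 0.03]
coarse = quantize(taps, 5)  # fewer bits per coefficient, larger error
fine = quantize(taps, 8)    # more bits per coefficient, smaller error
```

Fewer fractional bits give smaller integers that are cheaper to transmit, at the price of a larger deviation from the computed coefficients; `max_error` makes that deviation explicit for a given precision.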
Coefficient bit cost reduction for transmission to decoder (actual values, differential, limits on updates): The number of coefficients sent to the decoder must be limited. For instance, a nonseparable filter set may typically require sending as many as 120 coefficients (with a bit cost of 650-950 bits per frame), while even a separable filter set may require sending 45-60 coefficients (with a bit cost of 400-550 bits per frame). With differential coding (encoding the filter set of the current frame differentially with respect to the previous frame's filter set, or even with respect to the default filter set) or by placing limits on updates, this bit count can be reduced somewhat, albeit with some loss in quality.
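The saving from differential coding of coefficients can be sketched as follows. The text above does not name an entropy code, so this example assumes signed Exp-Golomb codes purely to put a bit count on each value; the coefficient values and function names are likewise hypothetical.

```python
# Sketch only: signed Exp-Golomb is an assumed entropy code, not one
# named in the text; the coefficient values are hypothetical.
def se_golomb_bits(value):
    """Length in bits of the signed Exp-Golomb code for an integer."""
    # Map signed values to code numbers: 0, 1, -1, 2, -2 -> 0, 1, 2, 3, 4.
    code_num = 2 * value - 1 if value > 0 else -2 * value
    return 2 * (code_num + 1).bit_length() - 1


def filter_set_bits(values):
    """Total bit cost of sending a list of integer values."""
    return sum(se_golomb_bits(v) for v in values)


def differential(current, reference):
    """Coefficient-wise difference between the current and a reference filter set."""
    return [c - r for c, r in zip(current, reference)]


def reconstruct(diff, reference):
    """Decoder side: add the received differences back onto the reference."""
    return [d + r for d, r in zip(diff, reference)]


previous = [8, -41, 161, 161, -41, 8]  # filter set of the previous frame
current = [7, -40, 162, 160, -41, 8]   # filter set of the current frame

direct_bits = filter_set_bits(current)
diff = differential(current, previous)
diff_bits = filter_set_bits(diff)
```

When consecutive filter sets are similar, the differences are small integers with short codes, so `diff_bits` comes out well below `direct_bits`. This sketch is lossless; the quality loss mentioned above arises when updates are limited or differences are coarsened.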
Multiple filter sets within a picture to improve overall gains: While even one Wiener filter set computed every frame can give gains over, say, the H.264 standard filter set, having a choice of multiple filter sets within a frame (such as on a block or slice basis) can result in higher gains. One problem, however, is that the bit cost of even 2 filter sets per frame may be excessive compared to the additional gains. Thus, managing the bit cost of filter coefficients is necessary when using multiple filter sets.
Rate Distortion Optimization (“RDO”) complexity, as it may otherwise involve multiple iterations on the filter switching map to derive the best results: When using multiple filters, obtaining high gain often requires applying rate distortion optimization iteratively to find the block size for filter selection that yields a good tradeoff between the frequency of filter switching and the gains.
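A single pass of such a rate distortion decision might look like the sketch below, which uses the common Lagrangian cost J = D + λR. It is a simplification under stated assumptions: only two filters are considered, the distortion values are hypothetical, the cost of sending the filter coefficients themselves is ignored, and the function names are invented for this example.

```python
# Sketch only: two filters, hypothetical distortions, coefficient bits ignored.
def rd_cost(distortion, bits, lam):
    """Lagrangian rate distortion cost J = D + lambda * R."""
    return distortion + lam * bits


def choose_mode(block_dist, lam, map_bits_per_block=1):
    """Compare one filter for the whole frame against per-block switching.

    block_dist holds one (distortion_filter0, distortion_filter1) pair per block.
    Returns (mode, cost), where mode is ("single", filter_index) or
    ("switched", None).
    """
    costs = {}
    # Single filter for the whole frame: no switching map is sent.
    for f in (0, 1):
        costs[("single", f)] = rd_cost(sum(d[f] for d in block_dist), 0, lam)
    # Per-block switching: best filter per block, plus the switching map bits.
    switched_dist = sum(min(d) for d in block_dist)
    map_bits = map_bits_per_block * len(block_dist)
    costs[("switched", None)] = rd_cost(switched_dist, map_bits, lam)
    best = min(costs, key=costs.get)
    return best, costs[best]


blocks = [(100, 90), (80, 120), (60, 50), (70, 70)]  # hypothetical distortions
```

With a small λ the distortion saving from switching outweighs the map bits, while a large λ favours a single filter. An iterative RDO of the kind described above would repeat this comparison over several candidate block sizes, which is where the complexity comes from.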
Block map overhead bit cost when using multiple filters: When using multiple filters (including the case of switching between a standard filter and a computed filter), the switching map cost can be substantial. For instance, if one were to switch between two filters on a macroblock basis using 1 bit per macroblock, then for a Common Intermediate Format (“CIF”) sequence, 396 bits of extra overhead would be added per frame, in addition to the bit cost of possibly sending two filter sets.
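The 396-bit figure follows from the CIF frame size of 352x288 pixels and 16x16 macroblocks, as the short calculation below shows. The function name and the smaller block size used for comparison are assumptions made for illustration.

```python
# Sketch only: the function name and the 8x8 comparison are illustrative.
def switching_map_bits(width, height, block=16, bits_per_block=1):
    """Bits per frame needed to signal a filter choice for every block."""
    blocks_x = -(-width // block)   # ceiling division
    blocks_y = -(-height // block)
    return blocks_x * blocks_y * bits_per_block


cif_map_bits = switching_map_bits(352, 288)           # 22 x 18 macroblocks
finer_map_bits = switching_map_bits(352, 288, block=8)  # 44 x 36 blocks
```

Halving the block size in each dimension quadruples the map cost, which is why the block size for filter switching has to be chosen with the overhead in mind.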
Additional gains by using integer position filters, and filters with offsets: Some attempts have been made to squeeze out gains by computing a filter for the integer position for cases where the best ¼ pel position is the integer position. Further, some experiments have been conducted using filters with offsets for higher gains. Both of these cases require additional bits of overhead.
As noted earlier, while many techniques and variations have been suggested, the coefficient bit count overhead of current approaches, even after differential coding, is too high. Additionally, the existing techniques are computationally complex because they must calculate multiple iterations of coefficient sets on the fly. Moreover, the only way for a current system to be highly adaptive is through extensive use of RDO for multiple switched filters, which means extra overhead and complexity.
Additional information related to adaptive motion-compensation filtering may be found in the following references, each of which is incorporated fully by reference, for all purposes:
T. Wedi, “Adaptive Interpolation Filter For Motion Compensated Prediction,” Proc. Int. Conf. on Image Processing (ICIP) 2002, pp. 509-511, 2002.
V. Vatis et al., “Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter,” Proc. SPIE Visual Communications and Image Processing (VCIP) 2005, July 2005.
V. Vatis and Joern Ostermann, “Locally Adaptive Non-Separable Interpolation Filter for H.264/AVC,” Proc. Int. Conf. on Image Processing (ICIP) 2006, October 2006.
S. Wittmann and T. Wedi, “Separable Adaptive Interpolation Filter for Video Coding,” Proc. Int. Conf. on Image Processing (ICIP) 2008, pp. 2500-2503, 2008.