1. Field of the Invention
The present invention relates generally to video compression and to filters and filtering methods used in video compression, and more particularly, to a deblocking filter and method (and devices incorporating such a filter or performing such a method) that provide an in-loop filter implementation of the deblocking filter algorithm of the H.264/MPEG-4 AVC high compression digital video CODEC standard that is adapted for parallel processing and predication using the VLIW processing architecture.
2. Relevant Background
Advances in video compression techniques have revolutionized the ways and places in which video information is communicated and displayed. Applications that use video compression include broadcast television and home entertainment, including high definition (HD) television, and other video devices adapted for exchanging digital video information, especially those that call for high-definition picture resolutions, such as computers, DVD players, gaming consoles and systems, and wireless and cellular devices. These applications and many more are made possible by the standardization of video compression technology. To address limitations in prior standards, video coding experts in the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) and the Moving Picture Experts Group (MPEG) have produced a new video compression standard. The standard defines the syntax of an encoded video bit stream together with the method of decoding this bit stream, but it does not explicitly define a CODEC (encoder/decoder pair), which allows considerable variation in implementing the standard in working devices. The new standard is labeled H.264 or MPEG-4 AVC (Advanced Video Coding).
Generally, compression allows video content to be transferred and stored using much lower data rates while still providing desirable picture quality, e.g., providing relatively pristine video at low data rates or at rates that use less bandwidth. To this end, compression identifies and eliminates redundancies in a signal to produce a compressed bit stream and provides instructions for reconstituting the bit stream into a picture when the bits are uncompressed. Video compression techniques today follow a common set of steps. Video compression involves segmenting a video frame into blocks of pixels. An estimate is made of the frame-to-frame motion of each block to identify temporal or spatial redundancy within the frame. An algorithmic transform decorrelates the motion-compensated data to produce an expression having a low number of coefficients to reduce spatial redundancy. Then, the frequency coefficients are quantized based on psycho-visual redundancy to reduce the average number of bits necessary to represent the compressed video.
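By way of illustration only, the motion estimation step described above may be sketched as an exhaustive block-matching search. The sketch assumes a sum-of-absolute-differences (SAD) cost, a 4×4 block, and a search range of two pixels; these choices, the function names `sad` and `best_motion_vector`, and the test frames are hypothetical and are not taken from the H.264 standard, which does not mandate any particular search method.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n-by-n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def best_motion_vector(cur, ref, bx, by, n=4, search=2):
    """Test every displacement within +/- `search` pixels and return
    (cost, dx, dy) for the lowest-cost candidate."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block leaves the frame.
            if not (0 <= by + dy <= height - n and 0 <= bx + dx <= width - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Hypothetical 8x8 frames: the current frame is the reference frame
# shifted one pixel to the right.
ref_frame = [[10 * y + x * x for x in range(8)] for y in range(8)]
cur_frame = [[row[x - 1] if x > 0 else 0 for x in range(8)] for row in ref_frame]

print(best_motion_vector(cur_frame, ref_frame, bx=2, by=2))  # (0, -1, 0)
```

A zero-cost match at displacement (-1, 0) means the block can be predicted exactly from the reference frame, so only the motion vector, and no residual data, would need to be coded for that block.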
Video compression techniques may introduce artifacts or discontinuities that need to be filtered or corrected to decode the compressed video to near its original state. Most video compression standards, including the new H.264, divide each input field or frame into blocks and macroblocks of fixed size. Pixels within these macroblocks are considered as a group without reference to pixels in other macroblocks. Compression may involve transformation of the pixel data into a spatial frequency domain, such as via an integer transform. This frequency domain data is quantized and encoded from low frequency to high frequency. Since much of the energy in the frequency domain data is usually concentrated in the low frequencies, an end of block symbol enables truncation of coding high frequency symbols. The resulting quantized data is typically entropy coded. In entropy coding more frequently used symbols are coded with fewer bits than less frequently used symbols. The net result is a reduction in the amount of data needed to encode video. This coding in separate macroblocks can create coding artifacts at the block and macroblock boundaries. Because adjacent macroblocks may be encoded differently, the image may not mesh well at the macroblock boundary. For example, other features of the macroblock may cause a different quantization parameter. Upon decoding, the same color or gray-scale value at the macroblock boundary may be displayed differently based upon this different quantization which may appear as block or edge artifacts in the displayed video image.
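The origin of such a boundary artifact may be illustrated with a deliberately simplified sketch. It assumes a uniform scalar quantizer applied directly to a gray-scale value, whereas an actual H.264 coder quantizes transform coefficients; the step sizes of 8 and 24 and the function names are hypothetical and chosen only to make the mismatch visible.

```python
def quantize(value: int, step: int) -> int:
    """Map a non-negative value to the nearest quantization level."""
    return (value + step // 2) // step


def reconstruct(value: int, step: int) -> int:
    """Quantize and then dequantize, as a decoder would see the value."""
    return quantize(value, step) * step


# The same gray level on both sides of a macroblock boundary...
original = 100

# ...coded with a fine step on the left and a coarse step on the right.
left = reconstruct(original, step=8)
right = reconstruct(original, step=24)

print(left, right, abs(left - right))  # 104 96 8
```

Although both macroblocks started from the same value, the decoded picture shows a step of 8 gray levels at the boundary, which is the kind of discontinuity a deblocking filter is intended to smooth.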
To eliminate these artifacts, H.264 defines a deblocking method that operates on 16×16 macroblocks and 4×4 block boundaries. In the case of the macroblocks, the deblocking filter eliminates artifacts resulting from motion or intraframe estimation or different quantizer scales. For the 4×4 blocks, the deblocking filter removes artifacts that are caused by transformation/quantization and motion-vector differences between adjacent blocks. Generally, an in-loop filter modifies pixels on either side of the boundary using a content-adaptive, nonlinear filter, with both the encoder and decoder complying with H.264 using the same deblocking filter to provide a “loop.” As a result of the use of the deblocking filter, the decoded or de-compressed stream has significantly improved visual quality.
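The content-adaptive, nonlinear behavior described above may be sketched for a single line of samples crossing a block edge. The sketch is loosely modeled on the general form of the normal-strength H.264 luma filter, but it is a simplification under stated assumptions: the thresholds `alpha`, `beta`, and `tc` are passed in as arbitrary numbers rather than derived from the quantization parameter and the standard's tables, and boundary-strength derivation and the modification of the outer samples are omitted.

```python
def clip3(lo: int, hi: int, x: int) -> int:
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def filter_edge(p1, p0, q0, q1, alpha, beta, tc):
    """Filter the two samples nearest a block edge laid out as p1 p0 | q0 q1.

    Returns the (possibly modified) pair (p0, q0).
    """
    # Content-adaptive test: a large step across the edge, or strong
    # activity on either side, is treated as real picture detail.
    is_artifact = (
        abs(p0 - q0) < alpha
        and abs(p1 - p0) < beta
        and abs(q1 - q0) < beta
    )
    if not is_artifact:
        return p0, q0

    # Nonlinear step: the correction is clipped to +/- tc so the filter
    # can soften the edge but never over-smooth it.
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    return clip3(0, 255, p0 + delta), clip3(0, 255, q0 - delta)


# Small step across the edge (a likely blocking artifact): softened.
print(filter_edge(104, 104, 96, 96, alpha=20, beta=6, tc=2))   # (102, 98)

# Large step across the edge (a likely real edge): left untouched.
print(filter_edge(200, 200, 50, 50, alpha=20, beta=6, tc=2))   # (200, 50)
```

Because the encoder and the decoder both apply the same filter before a frame is used as a reference, the filtered samples, not the unfiltered ones, feed subsequent prediction, which is what places the filter inside the coding loop.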
Video compression compliant with H.264 provides greatly improved rate-distortion performance when compared with prior compression standards, such as MPEG-4, and several studies have indicated that H.264 provides video quality comparable to MPEG-2 while requiring less than half the bit rate. In order to obtain better compression, H.264 calls for a directional spatial prediction scheme to find more redundancies among pixels within a video frame. For inter coding, H.264 implements multiple frame reference, weighted prediction, a deblocking filter, variable block size, and quarter sample accurate motion compensation. For transformation, H.264 uses a small, block-based integer and hierarchical transform. For entropy coding, H.264 adopts two coding techniques (i.e., context-adaptive arithmetic coding for the main profile and context-adaptive variable length coding for baseline, main, and extended profiles). From the high level architectural viewpoint, the H.264 coding scheme is similar to the architectures of other video CODECs. However, the basic functional blocks of H.264, such as prediction, transformation, quantization, and entropy coding, are significantly different from those in prior video coding schemes. As a result, hardware designed to comply with prior video compression standards is not compatible with H.264 and cannot be used to implement this new compression standard. Consequently, new implementations of software and/or hardware are required to code video streams according to H.264.
Although H.264 provides a lower bit rate, its relatively high computational complexity creates implementation problems. Recent studies and research suggest that real-time implementations of H.264 may require a powerful processor or multiple processors, two-level caches, and special memory interfaces. Some early proposed video processing systems involved multicore architectures that included one MIPS processor and eight 600 MHz Trimedia processors, each with three-level caches. Such implementations are likely not cost effective and may not be suited for mainstream consumer electronics, and there remains a need for effective implementations of H.264 before it will be readily adopted and used by consumers.
Particularly, there is a need for a hardware solution for efficiently implementing a deblocking filter complying with the requirements of H.264. The deblocking filter plays an important role in H.264 video compression. It is used to reduce blocking artifacts that are created during the motion compensation process and/or by the coarse quantization of transform coefficients. The deblocking filter is an advanced tool of the H.264 standard used to maximize the coding performance. Loop or in-loop filters that operate within the coding loop have been shown by empirical testing to significantly improve both objective and subjective quality of video streams compared with post filters, and as a result, in-loop filters are likely to be used to implement the H.264 deblocking filter. Unfortunately, in-loop or loop filters increase computational complexity for both an encoder and a decoder that may be implemented to comply with H.264. Research has shown that even with tremendous speed optimization, the deblocking filter process or algorithm specified by H.264 may consume about one third of the processor time of a software implementation of the H.264 decoder.
Hence, there remains a need for an efficient implementation of an in-loop deblocking filter complying with the H.264 video compression standard. Preferably, such a filter and associated filtering process would be adapted or designed to significantly reduce the amount of processor time consumed during deblocking while being useful with existing processor architectures, such as the very long instruction word (VLIW) architecture.