Video compression reduces the bandwidth required to transmit video and the storage required to hold video data on a recording medium. Applications include motion picture transmission and playback, video storage, videoconferencing, television broadcasting, video streaming over the Internet, and video communications generally. Lossless compression, although providing superior reproduction quality, has not proved to be viable in these applications. Lossy compression algorithms, on the other hand, which are specified by most video compression standards, produce objectionable visual artifacts, such as “blocking” or a checkerboard pattern in the perceived video. This phenomenon is more pronounced at low bandwidths or during low bit-rate transmission. In the context of predictive video coding specified under the MPEG-1/2/4 and H.263/+/4 compression standards, for example, prediction chains typically span a large number of video frames. Since these standards process video information in macroblocks of 16×16 pixels, video quality degrades progressively as the cumulative error introduced by artifacts increases with the length of the prediction chain.
To reduce unwanted visual artifacts, filtering or de-blocking routines may be applied at any stage during compression or decompression (e.g., encoding or decoding). Pre-filtering occurs before the video information is compressed. Dynamic pre-filters may be used in coordination with video encoding by modulating the degree of filtering in response to one or more control signals or to certain statistical characteristics of the video information generated during the encoding stage. Post-filtering, on the other hand, occurs after the video information is decompressed (or decoded) but before it is stored, transmitted, or displayed on a monitor. In addition, the degree of post-filtering may be modulated by one or more control signals responsive to the degree of perceived artifacts in the decompressed video information. It is known in the art, however, that pre-filtering reduces unwanted visual artifacts more satisfactorily than post-filtering. Routines that filter block-processed video information in a prediction chain require intense, high-speed processing, since the handling or transformation of the individual pixel elements within a macroblock may differ widely. The problem is exacerbated in SIMD (single instruction, multiple data) architectures, where multiple pixel elements are processed by a single instruction.
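The idea of modulating the degree of filtering with a control signal can be illustrated with a minimal sketch. The function name `deblock_edge`, the `strength` parameter (standing in for a control signal between 0.0 and 1.0), and the quarter-step smoothing rule are all assumptions chosen for illustration; they are not taken from any of the standards named above.

```python
def deblock_edge(row, edge, strength):
    """Toy de-blocking of one row of 8-bit pixels across a block boundary.

    row      -- list of pixel values in the range 0..255
    edge     -- index of the first pixel of the right-hand block
    strength -- hypothetical control signal, 0.0 (off) to 1.0 (full)
    """
    out = list(row)
    p0, q0 = row[edge - 1], row[edge]   # pixels on either side of the boundary
    step = q0 - p0                      # size of the discontinuity at the edge
    # Move each boundary pixel toward the other by up to a quarter of the step.
    delta = int(strength * step / 4)    # int() truncates toward zero
    out[edge - 1] = min(255, max(0, p0 + delta))
    out[edge] = min(255, max(0, q0 - delta))
    return out


if __name__ == "__main__":
    row = [100] * 8 + [120] * 8         # two flat blocks with a visible step
    print(deblock_edge(row, 8, 1.0)[6:10])
    print(deblock_edge(row, 8, 0.0)[6:10])
```

With `strength` at 0.0 the row passes through unchanged; larger values narrow the step at the boundary, which is the sense in which a control signal might modulate the degree of filtering.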
Loop filtering, which is defined under the H.263+ standard and also adopted in the recently ratified JVT-AVC H.264 standard, provides another filtering technique. These standards specify filtering of video information within a prediction loop. Loop filtering differs from pre-filtering in that the video information is compressed before being filtered. During loop filtering, however, any prediction derived from previously compressed video information and used in subsequent compression steps is also filtered. Loop filtering implemented at the decoder is believed to produce the best reduction in compression artifacts. However, a standard that specifies loop filtering forces every compliant video decoder (in addition to the encoder) to perform the filtering, since such filtering cannot be excluded or separated from the video compression process.
Loop filtering defined under the JVT-AVC (Joint Video Team-Advanced Video CODEC) standard is particularly complex in that each pixel or picture element (luminance and/or chrominance value) in a video frame may potentially be filtered at a different level, and the process that determines the level of filtering may itself be quite complex. The JVT-AVC standard specifies filtering of macroblocks comprising a matrix of 16×16 picture elements. It has been estimated that loop filtering in an optimized JVT-AVC codec may consume up to 50% of the codec's processing cycles, depending on the profile and level of the standard being employed. Thus, in a video decoder implementing a SIMD instruction set, it is advantageous to provide a loop filter that performs real-time filtering robustly in order to avoid processing or transmission delays in the video stream.
As is known, SIMD instructions enable logical operations on multiple picture elements contained in a macroblock, but do not necessarily provide instructions for branching or looping. Although some SIMD architectures provide limited branching capability, the performance penalty introduced by branching, in terms of processing delays and breaks in the flow of instructions during pipeline processing, means that such instructions should be used only in exceptional cases.
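One common way to apply a per-pixel decision without branching is to compute a mask for every lane and combine two candidate results with logical operations. The sketch below models this in plain Python, with each list element standing in for one 8-bit SIMD lane. The function names `make_mask` and `select_branchless` and the threshold test are hypothetical and serve only to illustrate the general mask-and-select idiom, not a method described in this document or in any standard.

```python
def make_mask(diffs, threshold):
    """Return 0xFF for lanes where |diff| < threshold, otherwise 0x00.

    Models a SIMD compare, which yields an all-ones or all-zeros lane.
    """
    # -(True) is -1 (all bits set) and -(False) is 0; keep the low 8 bits.
    return [(-(abs(d) < threshold)) & 0xFF for d in diffs]


def select_branchless(mask, filtered, original):
    """Per-lane select using only AND, OR, and NOT: (m & f) | (~m & o)."""
    return [((m & f) | (~m & o)) & 0xFF
            for m, f, o in zip(mask, filtered, original)]


if __name__ == "__main__":
    original = [10, 20, 30, 40]
    filtered = [11, 22, 33, 44]      # hypothetical filter output per lane
    diffs = [1, 10, -2, 30]          # hypothetical edge differences per lane
    mask = make_mask(diffs, 4)
    print(mask)
    print(select_branchless(mask, filtered, original))
```

Both candidate values are computed for every lane and the mask picks one of them, so the same instruction sequence runs regardless of the data. The cost is that work is done even for lanes whose filtered value is discarded.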