The transmission of moving pictures in real time is known and used in applications such as video conferencing, net meetings, TV broadcasting and video telephony. However, representing moving pictures as digital video requires large amounts of data: each pixel in a picture is typically represented with 8 bits (1 byte). Such uncompressed video produces large bit volumes, which can be difficult to transfer in real time over conventional communication networks and transmission lines because of their limited bandwidth.
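As a rough illustration of the bandwidth problem, the following Python sketch computes the bit rate of uncompressed video. It assumes 4:2:0 chroma subsampling with 8 bits per sample; the function name `raw_bitrate_bps` and the 1080p, 30 frames-per-second example are illustrative choices, not figures taken from the text above.

```python
def raw_bitrate_bps(width: int, height: int, fps: int, bits_per_sample: int = 8) -> int:
    """Uncompressed bit rate in bits per second, assuming 4:2:0 sampling."""
    luma_samples = width * height
    # 4:2:0: two chroma planes, each subsampled by 2 horizontally and vertically
    chroma_samples = 2 * (width // 2) * (height // 2)
    bits_per_frame = (luma_samples + chroma_samples) * bits_per_sample
    return bits_per_frame * fps

# Illustrative example: 1920x1080 at 30 frames per second
print(raw_bitrate_bps(1920, 1080, 30) / 1e6)  # 746.496 (Mbit/s)
```

Roughly 746 Mbit/s before any compression is far beyond what typical network links used for conferencing can carry, which is the gap that compression has to close.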
Thus, real-time video transmission may require a high degree of data compression. Data compression may, however, compromise picture quality. Therefore, efforts have been made to develop compression techniques that allow real-time transmission of high-quality video over bandwidth-limited data connections.
A number of algorithmic capabilities are common to multiple video encoding/decoding standards, such as MPEG-*, H.26* and SMPTE-VC-1. De-blocking filtering and motion estimation/compensation are two typical examples of such general algorithms required for video encoding. Coding is performed block-wise on the video picture. A macroblock consists of several sub-blocks for luminance (luma) as well as for chrominance (chroma).
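The block structure can be sketched as follows, assuming the H.264 layout with 4:2:0 sampling: each macroblock holds a 16x16 luma block and two 8x8 chroma blocks, each divided into 4x4 sub-blocks. The helper names are hypothetical, and other standards and sampling formats use different block sizes.

```python
MB_SIZE = 16   # luma samples per macroblock side (H.264)
SUB_SIZE = 4   # assumed sub-block size (H.264 4x4 transform blocks)

def sub_block_origins(width, height, sub=SUB_SIZE):
    """Top-left (x, y) corner of every sub-block, in raster order."""
    return [(x, y) for y in range(0, height, sub) for x in range(0, width, sub)]

def macroblock_layout():
    """Sub-block origins per plane for one macroblock with 4:2:0 sampling."""
    chroma = MB_SIZE // 2  # each chroma plane is subsampled by 2 in both directions
    return {
        "Y": sub_block_origins(MB_SIZE, MB_SIZE),
        "Cb": sub_block_origins(chroma, chroma),
        "Cr": sub_block_origins(chroma, chroma),
    }

layout = macroblock_layout()
print({plane: len(blocks) for plane, blocks in layout.items()})  # {'Y': 16, 'Cb': 4, 'Cr': 4}
```

Every internal boundary between these sub-blocks, as well as the macroblock's own borders, is a potential location for the blocking artifacts discussed below.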
Block-based coding/decoding has proven to be very efficient. However, one drawback is that the reconstructed image may contain visible artifacts at the boundaries of the blocks used for prediction and residual signal coding. This phenomenon is usually referred to as blocking or blocking artifacts.
One way of reducing blocking artifacts is to integrate a de-blocking filter in the coding loop, which is the preferred solution in the specification ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC. This loop-integrated solution is processor-intensive because it requires a test procedure for each line of pixels crossing the block edges to be smoothed.
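The per-line test can be sketched as below. It follows the sample-level decision in H.264, where a line of samples p2, p1, p0 | q0, q1, q2 across an edge is filtered only if the boundary strength bS is non-zero and the local sample differences fall below the thresholds alpha and beta. In the standard, alpha and beta are read from QP-dependent tables that are not reproduced here, so this sketch takes them as parameters; the function name is hypothetical.

```python
def edge_line_needs_filtering(p, q, bs, alpha, beta):
    """Decide whether one line of samples across a block edge is filtered.

    p = [p0, p1, ...] are samples on one side of the edge, nearest first;
    q = [q0, q1, ...] are samples on the other side, nearest first.
    """
    return (bs != 0
            and abs(p[0] - q[0]) < alpha   # step across the edge is small
            and abs(p[1] - p[0]) < beta    # p side is locally smooth
            and abs(q[1] - q[0]) < beta)   # q side is locally smooth

# Small step across the edge: likely a blocking artifact, so it is filtered
print(edge_line_needs_filtering([100, 101, 102], [106, 105, 104], bs=2, alpha=15, beta=4))  # True
# Large step: likely a real image edge, so it is left untouched
print(edge_line_needs_filtering([60, 61, 62], [160, 159, 158], bs=2, alpha=15, beta=4))  # False
```

The test is cheap on its own, but it has to be evaluated for every line crossing every edge, which is where the cost accumulates.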
Although the de-blocking filter per se is not complex, almost every pixel of a reconstructed picture frame must be accessed by the filtering algorithm. The de-blocking operation is therefore quite processor-intensive.
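To show how little arithmetic each line needs, the following sketch implements the H.264 luma filter for edges with bS < 4 on a single line of samples. It is simplified: tC0 is passed in rather than looked up from the standard's table, and the strong filter used for bS = 4 and the chroma variant are omitted. The function names are hypothetical.

```python
def clip3(lo, hi, x):
    """Clamp x to the range [lo, hi]."""
    return max(lo, min(hi, x))

def filter_line_bs_lt4(p, q, tc0, beta, bit_depth=8):
    """Normal (bS < 4) luma filtering of one line crossing an edge.

    p and q hold at least three samples each, nearest to the edge first.
    Returns new (p, q) lists; only p0, p1, q0 and q1 can change.
    """
    max_val = (1 << bit_depth) - 1
    p0, p1, p2 = p[0], p[1], p[2]
    q0, q1, q2 = q[0], q[1], q[2]
    ap = abs(p2 - p0)
    aq = abs(q2 - q0)
    tc = tc0 + (1 if ap < beta else 0) + (1 if aq < beta else 0)
    # Correction applied to the two samples adjacent to the edge
    delta = clip3(-tc, tc, (((q0 - p0) << 2) + (p1 - q1) + 4) >> 3)
    new_p, new_q = list(p), list(q)
    new_p[0] = clip3(0, max_val, p0 + delta)
    new_q[0] = clip3(0, max_val, q0 - delta)
    # Second samples are adjusted only where that side is smooth
    avg = (p0 + q0 + 1) >> 1
    if ap < beta:
        new_p[1] = p1 + clip3(-tc0, tc0, (p2 + avg - (p1 << 1)) >> 1)
    if aq < beta:
        new_q[1] = q1 + clip3(-tc0, tc0, (q2 + avg - (q1 << 1)) >> 1)
    return new_p, new_q

p, q = filter_line_bs_lt4([100, 101, 102], [106, 105, 104], tc0=2, beta=4)
print(p, q)  # [103, 102, 102] [103, 103, 104]
```

Each line needs only a handful of additions, shifts and clamps; the expense comes from repeating this for nearly every line crossing every filtered edge in the frame.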
Video encoding for high-definition (HD) formats increases the demands on memory and data processing, and requires efficient, high-bandwidth memory organization coupled with compute-intensive capabilities. Because of these combined demands, a flexible parallel processing approach is needed to meet them cost-effectively.
To efficiently support de-blocking filtering algorithms and other complex programmable functions, whose requirements may vary across the multiple standards, a processor alone would require significant parallelism and very high clock rates. A processor of this capability would be cost-prohibitive for commercial products.
In many cases the de-blocking filter is one of the main bottlenecks in both the encoding and the decoding process, especially for high-resolution images such as HD. Filtering is performed on a per-macroblock basis: horizontal filtering of the vertical edges is performed first, followed by vertical filtering of the horizontal edges, and both directions of filtering must be completed on each macroblock before moving on to the next.
Video codecs are typically installed on customized hardware in video endpoints with digital signal processing (DSP) based processors. Recently, however, it has become more common to install video codecs on general-purpose processors with a SIMD (single instruction, multiple data) environment. When implemented on typical general-purpose processors, each macroblock must usually be loaded twice from random access memory (RAM) into the processor's registers, once for vertical de-blocking filtering and once for horizontal de-blocking filtering, and possibly transposed in several small, inefficient steps. This is particularly demanding computationally because it involves loading data from memory locations that are far apart.
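The memory-access problem can be made concrete with a short sketch. Assuming 8-bit samples stored row by row in a frame buffer with a stride of `frame_stride` bytes, the eight samples p3..q3 on a line crossing a vertical edge are contiguous, whereas those crossing a horizontal edge lie a full row apart. The sketch also lists a simplified luma edge order within one macroblock (4-sample edge spacing, 8x8 transforms and chroma ignored) and a plain transpose, which turns columns into rows. All names are hypothetical; this illustrates the access pattern, not a particular implementation.

```python
def vertical_edge_offsets(frame_stride, x_edge, y):
    """Offsets of p3..q3 on row y crossing the vertical edge at column x_edge."""
    row_start = y * frame_stride
    return [row_start + x_edge + k for k in range(-4, 4)]  # contiguous bytes

def horizontal_edge_offsets(frame_stride, x, y_edge):
    """Offsets of p3..q3 in column x crossing the horizontal edge at row y_edge."""
    return [(y_edge + k) * frame_stride + x for k in range(-4, 4)]  # one row apart

def macroblock_edge_order(mb_size=16, spacing=4):
    """Simplified luma order in one macroblock: vertical edges left to right
    (horizontal filtering), then horizontal edges top to bottom (vertical filtering)."""
    vertical = [("vertical", x) for x in range(0, mb_size, spacing)]
    horizontal = [("horizontal", y) for y in range(0, mb_size, spacing)]
    return vertical + horizontal

def transpose(block):
    """Transpose a block given as a list of rows, so its columns become rows."""
    return [list(column) for column in zip(*block)]

stride = 1920  # one 1080p luma row, 1 byte per sample
print(vertical_edge_offsets(stride, x_edge=16, y=0))
# [12, 13, 14, 15, 16, 17, 18, 19]
print(horizontal_edge_offsets(stride, x=0, y_edge=16))
# [23040, 24960, 26880, 28800, 30720, 32640, 34560, 36480]
```

Transposing a block lets horizontal edges be processed with the same contiguous row accesses as vertical edges; whether that pays off depends on how cheaply the transpose can be done on the target SIMD hardware.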
In light of the above deficiencies in the art, there is a need for a video de-blocking process that reduces both the number of times macroblocks are loaded from memory and the computational overhead.