1. Field of the Invention
Embodiments relate to methods for deblocking filtering a macroblock.
2. Background of the Related Art
In many applications, standardised methods are used to code image information. The video compression method according to the standard H.264 is used, for example, for:
    high-resolution television such as High Definition Television (HDTV) with video data storage on high-definition digital versatile disc (HD DVD) and Blu-ray disc,
    mobile terminals such as mobile telephones, personal digital assistants (PDAs), portable games consoles and MP3 players on which videos can be played back,
    multimedia,
    video-conference technology, and
    video cameras and digital cameras.
In video compression, video data is coded in a transmitter, transmitted to a receiver in coded form, and decoded in the receiver. The standards used are therefore also called codecs, a combination of the English words “code” and “decode”.
In standard H.264 and its predecessor, standard H.263, individual video images, also called frames, are split into blocks and coded block by block. Both standards use a deblocking filter inside the coding loop, whereas other video compression standards apply a deblocking filter to the coded frames in a post-processing stage. The deblocking filter increases the perceived image quality: it smooths optically perceptible transitions between adjacent blocks, also called block artefacts, by filtering the image points of each frame saved in the blocks. Below, “filtering” means “deblocking filtering”. “Filtering a block” means “filtering the image points, also called pixels, saved in this block”.
The blocks, which are adjacent horizontally and vertically and arranged in a multiplicity of rows and columns, can be pictured as the fields of a chessboard in a Cartesian coordinate system. According to standard H.264, the filtering of a block at position (x/y) of the Cartesian coordinate system depends on the filtered pixels of the blocks at positions (x−1/y) and (x/y−1). When image information is coded according to standard H.264, the blocks are combined to form macroblocks. Each macroblock is made of blocks arranged adjacent to each other horizontally in block rows and adjacent to each other vertically in block columns, with four blocks in each block row and four blocks in each block column. “Filtering a macroblock” means “filtering the pixels saved in the blocks of this macroblock”.
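The data dependence described above can be illustrated with a minimal sketch. The function names `block_dependencies` and `macroblock_dependencies` and the constant `MB_SIZE` are hypothetical and chosen only for illustration; the sketch assumes that block and macroblock positions are counted from zero and models nothing but the two neighbour positions named in the text.

```python
MB_SIZE = 4  # blocks per block row and per block column of a macroblock (from the text)

def block_dependencies(x, y):
    """Return the positions whose filtered pixels the block at (x, y) needs."""
    deps = []
    if x > 0:
        deps.append((x - 1, y))  # neighbour at position (x-1/y)
    if y > 0:
        deps.append((x, y - 1))  # neighbour at position (x/y-1)
    return deps

def macroblock_dependencies(mx, my):
    """Return the other macroblocks that the blocks of macroblock (mx, my) depend on."""
    needed = set()
    for y in range(my * MB_SIZE, (my + 1) * MB_SIZE):
        for x in range(mx * MB_SIZE, (mx + 1) * MB_SIZE):
            for dep_x, dep_y in block_dependencies(x, y):
                owner = (dep_x // MB_SIZE, dep_y // MB_SIZE)
                if owner != (mx, my):
                    needed.add(owner)
    return needed
```

In this sketch only the blocks on the border of a macroblock reach into a neighbouring macroblock, yet that is enough to make the filtering of the whole macroblock depend on the macroblocks at (mx−1/my) and (mx/my−1).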
Filtering is carried out via a sequence of calculation steps, also called an algorithm, wherein the calculation steps normally proceed in the context of a filter program on a computer. These calculation steps are normally carried out by a processor with one computation core, or simply core. There is currently a trend towards processors with a number of cores, also called “multi-core processors” or “multi-core systems”. Even architectures of processors with a multiplicity of cores, also called “many-core processors” or “many-core systems”, are being developed, as evidenced by the “Terascale” project or the “Larrabee” project from the company Intel. Graphics processors, called “graphics processing units” (GPUs), for example from the company NVidia, are already many-core processors today, and because of their high computing power and ease of programming they are increasingly used for high-performance computing applications. To fully use the computing power available, processors with several or many cores need algorithms which are parallelisable. In a parallelisable algorithm, the calculation steps that are to run at the same time do not depend on each other's results. If a calculation step depends on the results of previous calculation steps, these calculation steps must be carried out in series, i.e. consecutively, and cannot proceed in parallel with each other.
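The difference between independent and dependent calculation steps can be shown with a small sketch. The functions `independent_steps` and `dependent_steps` are hypothetical examples and are unrelated to the filter itself; the thread pool only shows that independent steps may be handed to several workers, and the running sum stands in for any calculation whose step needs the previous result.

```python
from concurrent.futures import ThreadPoolExecutor

def square(value):
    """One calculation step that needs only its own input."""
    return value * value

def independent_steps(values):
    """No step needs another step's result, so the steps may be dispatched to several workers."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(square, values))

def dependent_steps(values):
    """Each step needs the previous step's result, so this loop runs in series."""
    total = 0
    results = []
    for value in values:
        total += value  # depends on the total produced by the previous step
        results.append(total)
    return results
```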
A problematic area in the implementation of standard H.264 on many-core processors is the deblocking filter. Because the filtering of a block at position (x/y) depends on the filtered pixels of the blocks at positions (x−1/y) and (x/y−1), there is no provision for filtering a macroblock independently of another macroblock.
A known method for the partial parallelisation of the calculation steps of the filtering, taking into account the data dependence described above, consists of combining a number of blocks in the image of the Cartesian coordinate system into a diagonal. The diagonals formed from blocks are filtered consecutively in series, while the blocks within a diagonal are filtered in parallel with each other. However, filtering diagonals of blocks on a GPU from the company NVidia has drawbacks:
    1. The algorithm that implements the filter programs on the GPU is also called a “kernel”. Running the filter programs for the parallel filtering of blocks within a diagonal incurs a high cost in terms of time to initialise the GPU, also called the startup overhead. The kernel, which becomes extensive to meet the requirements of the parallel filtering of blocks within a diagonal, has to be transferred onto the graphics card which contains the GPU, and the computation cores of the GPU have to be configured according to the kernel before the calculation steps start.
    2. Kernels carried out in parallel cannot be synchronised with each other, or can only be synchronised with long delays, also called latency times.
    3. The diagonals of blocks are often not long enough to fully utilise all the computation cores of the GPU, and in this case some of the GPU's computing power remains unused.
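The diagonal grouping can be sketched as follows. This is an illustration under stated assumptions, not the known method's actual implementation: a diagonal is assumed to contain all blocks with the same sum x + y, and `diagonals`, `respects_dependencies` and `utilisation` are hypothetical names. The `utilisation` model is a deliberate simplification in which every core filters one block per step and all cores wait at the end of each diagonal.

```python
def diagonals(width, height):
    """Group block positions (x, y) into diagonals with a constant sum x + y."""
    return [
        [(x, d - x) for x in range(width) if 0 <= d - x < height]
        for d in range(width + height - 1)
    ]

def respects_dependencies(diags):
    """Check that the blocks at (x-1/y) and (x/y-1) always lie in an earlier diagonal."""
    index = {pos: d for d, diag in enumerate(diags) for pos in diag}
    for (x, y), d in index.items():
        for dep in ((x - 1, y), (x, y - 1)):
            if dep in index and index[dep] >= d:
                return False
    return True

def utilisation(diags, cores):
    """Fraction of core steps doing useful work (simplified model, see lead-in)."""
    work = sum(len(diag) for diag in diags)
    # A diagonal of n blocks occupies ceil(n / cores) steps on all cores.
    capacity = sum(-(-len(diag) // cores) * cores for diag in diags)
    return work / capacity
```

For a frame of 3 by 2 blocks, `diagonals(3, 2)` yields diagonals of lengths 1, 2, 2 and 1. With two cores, the simplified model gives a utilisation of 0.75, because the first and last diagonals each leave one core idle. This corresponds to the third drawback listed above.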
Another option for parallelising the calculation steps of the filtering consists of filtering macroblocks without taking account of the data dependence arising during filtering according to standard H.264; the inventor describes this type of filtering as a naive filter method. A filtered video frame is used to predict the video frames which follow the filtered frame. If the adjoining macroblocks needed to filter a macroblock have not yet been filtered, the filtering result of the naive filter method deviates from the filtering result according to the standard. The resulting pixel discrepancies are also described as a drift effect, which clearly adversely affects the image quality of the decoded video frame compared to the image quality with filtering according to the standard.
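How ignoring the data dependence produces pixel discrepancies can be shown with a toy example. The function `toy_filter` is a plain average and is explicitly not the H.264 deblocking filter; each block is reduced to a single value, and all names are hypothetical. The sketch only compares filtering in an order that respects the dependence with filtering that reads unfiltered neighbours.

```python
def toy_filter(value, left, top):
    """Toy smoothing step (NOT the H.264 filter): average a value with its available neighbours."""
    neighbours = [v for v in (left, top) if v is not None]
    return (value + sum(neighbours)) / (1 + len(neighbours))

def filter_in_order(frame):
    """Filter row by row so that each block reads already-filtered neighbours."""
    out = [row[:] for row in frame]
    for y in range(len(frame)):
        for x in range(len(frame[0])):
            left = out[y][x - 1] if x > 0 else None
            top = out[y - 1][x] if y > 0 else None
            out[y][x] = toy_filter(frame[y][x], left, top)
    return out

def filter_naive(frame):
    """Filter every block independently, reading unfiltered neighbours."""
    return [
        [
            toy_filter(
                frame[y][x],
                frame[y][x - 1] if x > 0 else None,
                frame[y - 1][x] if y > 0 else None,
            )
            for x in range(len(frame[0]))
        ]
        for y in range(len(frame))
    ]

def max_discrepancy(a, b):
    """Largest absolute difference between two filtered frames."""
    return max(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))
```

For the frame `[[0, 8], [8, 0]]`, the two results agree in the first row and first column but differ in the last block, whose neighbours had already been changed by the in-order filtering.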