The MPEG-4 AVC video coding standard divides each video frame into 16×16 pixel blocks called macroblocks. This block-based processing may lead to artifacts at the macroblock boundaries upon decoding. A deblocking filter improves the visual quality of the decoded frames by reducing these artifacts. The deblocking filter is applied to all the edges of the 4×4 pixel blocks in each macroblock except the edges on the boundary of a frame or a slice.
For each block, vertical edges are filtered from left to right, and then horizontal edges are filtered from top to bottom. This process is repeated for all the macroblocks in a frame. A major challenge is the detection of true edges in an image. Blindly applying a low-pass filter would remove most of the blocking artifacts, but would blur the image as well. Analysis of run-time profiles of decoder sub-functions shows that the deblocking filter process is the most computationally intensive part of the decoder. Deblocking takes as much as one-third of the computational resources of the decoder.
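The filtering order described above can be sketched as follows. This is a minimal illustration, not the standard's normative process; the function name `edge_filter_order` and the boundary flags are hypothetical, and the sketch assumes the order is applied per 16×16 macroblock with edges every 4 pixels.

```python
def edge_filter_order(mb_x, mb_y, mb_size=16, block=4,
                      left_is_boundary=False, top_is_boundary=False):
    """Return the edges of one macroblock in filtering order.

    Each entry is (direction, position) in frame pixel coordinates:
    ("V", x) is a vertical edge at column x, ("H", y) is a horizontal
    edge at row y. The boundary flags (hypothetical) mark a macroblock
    whose left or top edge lies on a frame or slice boundary.
    """
    x0, y0 = mb_x * mb_size, mb_y * mb_size
    edges = []
    # Vertical edges first, from left to right.
    for i in range(0, mb_size, block):
        if i == 0 and left_is_boundary:
            continue  # edge on a frame or slice boundary is not filtered
        edges.append(("V", x0 + i))
    # Then horizontal edges, from top to bottom.
    for j in range(0, mb_size, block):
        if j == 0 and top_is_boundary:
            continue  # edge on a frame or slice boundary is not filtered
        edges.append(("H", y0 + j))
    return edges
```

For the top-left macroblock of a frame both flags would be set, leaving three vertical and three horizontal internal edges.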
A deblocking filter usually makes multiple passes over an image. In embedded applications, on-chip memory can hold only a portion of the image, and external memory must hold the entire image. A straightforward implementation of deblocking thus incurs significant memory access time and power consumption due to external memory accesses.
FIG. 1 illustrates the role of the deblocking filter in an MPEG-4 AVC decoder. The multiple passes involved in deblocking are performed by block 105. The decoder accepts an encoded bitstream at entropy decoding block 101. Entropy decoding block 101 translates the bitstream to the frequency domain. Inverse scan and dequantization block 102 scales the frequency-domain information back to the original scale. Higher frequency components are often scaled down during encoding to take advantage of the property that human vision is less sensitive to changes in the higher frequency components and thus tolerates larger errors there. Inverse transformation block 103 converts the frequency-domain information to spatial-domain image pixel values.
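The role of inverse scan and dequantization block 102 can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the scan is the conventional 4×4 zigzag order, and the step table that grows with frequency is invented for the example rather than taken from the standard.

```python
# Conventional 4x4 zigzag order: scan position -> raster index (row * 4 + col).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def inverse_scan_and_dequantize(levels, base_step):
    """Turn 16 scanned coefficient levels into a 4x4 block of coefficients."""
    block = [[0] * 4 for _ in range(4)]
    for scan_pos, level in enumerate(levels):
        row, col = divmod(ZIGZAG_4X4[scan_pos], 4)
        # Hypothetical step table: coarser steps at higher frequencies,
        # mirroring the stronger scaling applied to those components.
        step = base_step * (1 + row + col)
        block[row][col] = level * step
    return block
```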
A block of pixels can be intra-coded, spatial-predicted or motion-compensated. For an intra-coded block, macroblock mode switch 108 supplies a zero predictor to prediction adder 104. Thus the output of inverse transformation block 103 passes through unaltered to deblocking filter 105. Deblocking filter 105 performs deblocking. For a spatial-predicted block, spatial compensation block 107 retrieves an already-decoded block in the same frame from frame store 106 to construct a predictor signal. Macroblock mode switch 108 then feeds this intra-frame prediction signal to prediction adder 104. For a motion-compensated block, motion compensation block 109 retrieves an already-decoded block in another frame from frame store 106 to construct a predictor signal. Macroblock mode switch 108 feeds this motion-compensated signal to prediction adder 104. One output of deblocking filter 105 is the decoded frame. A second output of deblocking filter 105 is stored back into frame store 106 for future reference.
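The interaction of macroblock mode switch 108 and prediction adder 104 can be sketched as follows. The function name, the mode strings and the clipping to an 8-bit sample range are assumptions made for this illustration.

```python
def reconstruct_block(residual, mode, spatial_pred=None, motion_pred=None):
    """Add the predictor selected by the mode to the residual block.

    residual is the output of the inverse transform; mode is one of
    "intra", "spatial" or "motion" (hypothetical labels).
    """
    rows, cols = len(residual), len(residual[0])
    # Mode switch: select the predictor signal.
    if mode == "intra":
        predictor = [[0] * cols for _ in range(rows)]  # zero predictor
    elif mode == "spatial":
        predictor = spatial_pred   # built from the same frame
    elif mode == "motion":
        predictor = motion_pred    # built from another frame
    else:
        raise ValueError("unknown mode: %r" % (mode,))
    # Prediction adder: residual plus predictor, clipped to 0..255
    # (8-bit samples are assumed here).
    return [[max(0, min(255, residual[r][c] + predictor[r][c]))
             for c in range(cols)] for r in range(rows)]
```

The returned block is what would be passed on to the deblocking filter.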
Because the video encoder performs the spatial-to-frequency-domain transform and quantization in blocks (typically 8×8 in size), there are often abrupt transitions at block boundaries. The deblocking filter in a video encoder and decoder evens out such block boundary transitions and improves the quality of decoded video. The video encoder employs a deblocking filter in the encoding flow to accurately predict the reference frames in the decoder.
Deblocking algorithms normally use complex mathematical derivations to identify and remove block artifacts. They can achieve significant improvement in subjective and objective quality, but their high computation and implementation complexity prohibits their direct adoption in a real-time MPEG-4 decoder.
There are a number of known deblocking algorithms which reduce the block artifacts in block DCT-based compressed images with minimal smoothing of true edges. They can be classified as: (a) regression-based algorithms; (b) wavelet-based algorithms; (c) anisotropic diffusion based algorithms; (d) weighted sum of pixels across block boundaries based algorithms; (e) iterative algorithms based on projection on convex sets (POCS); and (f) adaptive algorithms. These algorithms operate in the spatial domain. Other proposed algorithms work in the DCT transform domain. There are three key classes of frequency domain deblocking algorithms: (a) projection on convex sets (POCS); (b) weighted sum of pixels across the block boundaries; and (c) adaptively applying different filters.
Projection on convex sets (POCS) iterative algorithms originate from early work on image restoration. A number of constraints, usually two, are imposed on an image to restore it from its corrupted version. After defining the transformations between the constraints, the algorithm starts at an arbitrary point in one of the sets, and projects iteratively among them until convergence occurs. The mean square error (MSE) is used as a metric of closeness between two consecutive projections. Convergence is reached when the MSE falls below an assigned threshold.
If the constraints are closed convex sets, convergence is guaranteed provided the intersection of the sets is non-empty. The constraint sets generally chosen are frequency band limits in both the vertical and horizontal directions (known as the filtering constraint) and quantization intervals of the transform coefficients (referred to as the quantization constraint). In the first step, the image is band-limited by applying a low-pass filter. The image is then transformed to obtain the transform coefficients, which are subjected to the quantization constraint. The coefficients lying outside of the quantization interval are mapped back into the interval.
For example, the coefficients can be clipped to the minimum and maximum value if outside the interval. The algorithm iterates this two-step process until convergence. The algorithm typically converges after about twenty iterations.
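The two-step POCS iteration described above can be sketched in one dimension as follows. This is a simplified illustration, not a published POCS deblocking algorithm: the 8-sample block length, the 3-tap low-pass filter standing in for the filtering constraint, the uniform quantization step, the function names and the default thresholds are all assumptions.

```python
import math

N = 8  # block length in samples (assumed)


def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(block):
    """Orthonormal DCT-II of one block."""
    n = len(block)
    return [_scale(k, n) * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(block))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct()."""
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for i in range(n)]


def low_pass(signal):
    """Filtering constraint, approximated by a [1, 2, 1] / 4 filter."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + 2 * signal[i] + signal[min(i + 1, n - 1)]) / 4.0
            for i in range(n)]


def project_quantization(signal, levels, step):
    """Quantization constraint: clip each coefficient into its interval."""
    out = []
    for b in range(0, len(signal), N):
        coeffs = dct(signal[b:b + N])
        clipped = [min(max(c, (q - 0.5) * step), (q + 0.5) * step)
                   for c, q in zip(coeffs, levels[b:b + N])]
        out.extend(idct(clipped))
    return out


def pocs_deblock(decoded, levels, step, mse_threshold=1e-4, max_iter=50):
    """Alternate the two steps until the MSE between passes is small."""
    current = list(decoded)
    for iteration in range(1, max_iter + 1):
        candidate = project_quantization(low_pass(current), levels, step)
        mse = sum((a - b) ** 2 for a, b in zip(candidate, current)) / len(current)
        current = candidate
        if mse < mse_threshold:
            break
    return current, iteration
```

Here `levels` holds the quantized coefficient indices of each block, so the interval for a coefficient with index `q` is taken to be `(q - 0.5) * step` to `(q + 0.5) * step`. Because the quantization projection is applied last in each pass, the returned signal always satisfies the quantization constraint.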
In weighted sum of symmetrically aligned pixels algorithms, the value of each pixel is recomputed as a weighted sum of itself and the other pixel values symmetrically aligned with the block boundaries. Some schemes include three other pixels, which are taken from the block above, the block to the left and the block above the left block. The weights are determined empirically and can be either linear or quadratic. The combined effect of these weighted sums on the pixels is an interpolation across the block boundaries.
However, there is a problem in this approach when a weighted sum of a pixel in a smooth block takes the pixels in the adjacent high-detail blocks into account. The texture details leak into the smooth region and a vague image of the high-detail blocks can be seen. This new artifact is called ghosting. A scheme of grading each block according to its level of detail with a grading matrix seeks to minimize this new artifact. The weights on each of the four pixels are then increased or reduced according to the grades.
The execution time of weighted sum of symmetrically aligned pixels algorithms is guaranteed, as the operations are well defined. Since the pictures must be graded before applying the filter on the pixels, this requires a four-pass scheme. The grading process essentially consists of matrix filtering operations. A very high performance processor is required to implement this algorithm in real time.
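The weighted sum of a pixel and its three symmetrically aligned pixels can be sketched as follows. This is an illustrative simplification that omits the grading step; the function name, the 8×8 block size and the linear weights are assumptions, since the weights in practice are determined empirically.

```python
def weighted_sum_pixel(frame, y, x, n=8):
    """Recompute one pixel from itself and its mirror pixels.

    The mirror pixels lie in the block above, the block to the left
    and the block above the left block, reflected about the top and
    left boundaries of the pixel's own block.
    """
    iy, ix = y % n, x % n                    # offsets inside the block
    my, mx = y - 2 * iy - 1, x - 2 * ix - 1  # mirrored row and column
    # Hypothetical linear weights: close to 1/2 next to the boundary,
    # close to 1 far from it; 1 when there is no neighboring block.
    wy = 0.5 + 0.5 * (iy + 0.5) / n if my >= 0 else 1.0
    wx = 0.5 + 0.5 * (ix + 0.5) / n if mx >= 0 else 1.0
    value = wy * wx * frame[y][x]
    if my >= 0:
        value += (1 - wy) * wx * frame[my][x]          # block above
    if mx >= 0:
        value += wy * (1 - wx) * frame[y][mx]          # block to the left
    if my >= 0 and mx >= 0:
        value += (1 - wy) * (1 - wx) * frame[my][mx]   # block above the left block
    return value
```

The four weights always sum to one, so a flat region is left unchanged. The function reads from `frame` without modifying it, so a caller would write the results into a separate output image.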
In the adaptive deblocking filter algorithm, the deblocking process is separated into two stages. In the first stage, the edge is classified into different boundary strengths using the pixels along the normal to the edge. In the second stage, a different filtering scheme is applied according to the strength obtained in the first stage. In some applications the edges are classified into three types, to which no filter, a weak 3-tap filter or a strong 5-tap filter is applied. The algorithm is adaptive because the thresholds for edge classification are based on the quantization parameters included in the relevant blocks. An edge will only be filtered if the difference between the pixel values along the normal to the edge, but not across the edge, is smaller than the threshold. For high-detail blocks on either side of an edge, the differences are usually larger, and so strong filtering is seldom applied, which preserves detail. As the threshold increases with the quantization parameters, the edges across high-detail blocks will be filtered eventually because a high coding error is assumed for large quantization parameters. Since the edges are classified before processing, strong filtering can be replaced by weak filtering or even skipped. Also, the filtering is not applied to every pixel but only to those across the edges. A significant amount of computation can be saved through the classification. A disadvantage of this algorithm is the high complexity of its control flow.
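The two-stage adaptive scheme can be sketched for one line of pixels across an edge as follows. This is a hedged illustration rather than the filter of any particular standard: the function names, the threshold formula `2 + qp // 4`, the rule separating weak from strong filtering and the filter taps are assumptions chosen for the example.

```python
def classify_edge(p, q, qp):
    """Stage 1: classify an edge as "none", "weak" or "strong".

    p = [p0, p1, p2] and q = [q0, q1, q2] are the pixels on each side
    of the edge, listed moving away from it; qp is the quantization
    parameter.
    """
    threshold = 2 + qp // 4  # hypothetical threshold, grows with qp
    # Differences along the normal on each side, not across the edge.
    activity = max(abs(p[1] - p[0]), abs(p[2] - p[1]),
                   abs(q[1] - q[0]), abs(q[2] - q[1]))
    if activity >= threshold:
        return "none"    # high detail: leave the edge untouched
    if 2 * activity < threshold:
        return "strong"  # very smooth on both sides
    return "weak"


def filter_edge(p, q, qp):
    """Stage 2: return (strength, new_p0, new_q0)."""
    strength = classify_edge(p, q, qp)
    p0, p1, p2 = p
    q0, q1, q2 = q
    if strength == "weak":      # 3-tap filter, weights 1-2-1
        return strength, (p1 + 2 * p0 + q0 + 2) >> 2, (p0 + 2 * q0 + q1 + 2) >> 2
    if strength == "strong":    # 5-tap filter, weights 1-2-2-2-1
        return (strength,
                (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3,
                (p1 + 2 * p0 + 2 * q0 + 2 * q1 + q2 + 4) >> 3)
    return strength, p0, q0
```

In this sketch only the two pixels adjacent to the edge are modified, which reflects the point that filtering is limited to pixels across the edges.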
Table 1 summarizes the relative computation and implementation complexity of these three key classes of algorithms. POCS-based algorithms are considered the most complex because their flow is complex and their major operations are much more intensive than those of the other two.
The major operations performed in the weighted sum based algorithm and the adaptive algorithm are similar. For 4×4 pixel blocks, the major operations performed by the adaptive algorithm are only about half of those performed by the weighted sum based algorithm. The adaptive algorithm is considered more difficult to implement because of the complexity of adaptive filtering.
TABLE 1

| Algorithm | POCS based | Weighted | Adaptive |
|---|---|---|---|
| Algorithm Flow | Iteratively projecting back and forth between two sets on whole picture | Grading blocks with grading matrix, iterative on every pixel | Iteratively classify and apply filter on every block edge |
| Major Operations | Low pass filtering, Discrete Cosine Transform | Weighted sum of four pixels for each pixel | 3-tap or 5-tap filter on pixels across edges |
| Relative Computation Complexity | High | Medium | Low |
| Relative Implementation Complexity | High | Low | Medium |