This invention relates to video compression, and more particularly to spatial de-blocking methods.
Video data can greatly enhance the quality of a computing experience. The consumer use of the Internet took off once graphics were linked to earlier text-based web pages. Portable consumer devices such as cell phones and personal digital assistants (PDAs) are being equipped with small cameras to allow for capture of still or even video pictures. Efficient transmission of captured images over limited-bandwidth links requires some sort of compression of the images.
A number of video-compression techniques are known. Compression standards, such as those developed by the Moving Picture Experts Group (MPEG), have been widely adopted. These compression techniques are lossy techniques, since some of the picture information is discarded to increase the compression ratio. However, compression ratios of 99% or more have been achieved with minimal noticeable picture degradation.
Next-generation compression standards have been developed for transmitting video over wireless networks. The MPEG-4 standard provides a robust compression technique for transmission over wireless networks. Recovery can occur when parts of the MPEG-4 bit stream are corrupted.
These MPEG standards ultimately break the image up into small 16×16 pixel macroblocks or even smaller 8×8 pixel blocks. Each block can then be compressed more or less independently of other blocks, and movement of blocks can be described as highly compressed “motion vectors” rather than large bitmaps of pixels.
FIG. 1 shows an image frame divided into rows and columns of blocks. The MPEG standard uses a divide-and-conquer technique in which the video sequence is divided into individual image frames known as video object planes (VOPs), and each frame is divided into rows and columns of macroblocks. Each macroblock is a rectangle of 16 by 16 pixels. Each macroblock can be further divided into 8×8 blocks.
Various window sizes and image resolutions can be supported by MPEG standards. For example, one common format is an image frame of 176 by 144 pixels. The image frame is divided into 18 rows of 8×8 blocks, with each row having 22 blocks each of 8×8 pixels. A total of 396 blocks are contained in each frame.
The blocks are arranged in a predetermined order, starting in the upper left with the first block (BLK #0). The second block, BLK #1, is to the right of BLK #0 in the first row, followed by BLK #2 to BLK #21 in the first row. The second row contains BLK #22 to BLK #43. The last row contains BLK #374 to BLK #395. Of course, other image sizes and formats can have the blocks in rows of various lengths, and various numbers of rows.
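The block numbering for the 176 by 144 example can be checked with a few lines of arithmetic. The following is a minimal sketch only; the function names `block_grid` and `block_position` are hypothetical and are not taken from any MPEG specification.

```python
BLOCK = 8  # block edge length in pixels, as in the 8x8 blocks described above


def block_grid(width, height, block=BLOCK):
    """Return (rows, cols, total) for a frame whose sides are multiples of `block`."""
    if width % block or height % block:
        raise ValueError("frame dimensions must be multiples of the block size")
    cols = width // block
    rows = height // block
    return rows, cols, rows * cols


def block_position(index, cols):
    """Map a raster-order block number to its (row, column) in the grid."""
    return divmod(index, cols)


rows, cols, total = block_grid(176, 144)
print(rows, cols, total)          # 18 22 396
print(block_position(22, cols))   # (1, 0): first block of the second row
print(block_position(395, cols))  # (17, 21): last block of the last row
```

Other frame sizes simply yield a different `cols` value, which changes where each row of blocks begins.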
When an image frame is encoded, each block is encoded in order, starting with BLK #0 in the first row, and continuing with BLK #1, BLK #2 to BLK #21 in the first row, then BLK #22 to BLK #43 in the second row, and on until the last row with BLK #374 to BLK #395. The blocks are arranged in the bit stream into one or more video packets (VP) with a header.
Since each block is compressed separately from other blocks, there can be noticeable discontinuities at block edges. For example, a highly compressed image of a blue sky may have visible color-change bands caused by blocking artifacts. While the actual sky changes gradually from one shade of blue to another, only a few blue shades may be used to represent the sky in the compressed image. Several rows of blocks may code all pixels as a dark shade of blue while a next row codes all pixels as a lighter shade of blue. An abrupt color change may be noticeable at the edge of the row. When the color change boundary moves diagonally or crosses rows, the row crossing may cause a stair-step of jagged block edges to be visible.
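The banding effect can be illustrated with a deliberately crude stand-in for heavy compression. The sketch below is not an MPEG encoder: it assumes each 8-pixel run of a smooth ramp is coded as a single value (its integer mean), and the name `code_blocks_as_means` is hypothetical.

```python
BLOCK = 8  # run length standing in for one block width


def code_blocks_as_means(row, block=BLOCK):
    """Replace every `block`-pixel run by its integer mean (a crude lossy coder)."""
    out = []
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        mean = sum(chunk) // len(chunk)
        out.extend([mean] * len(chunk))
    return out


ramp = [100 + x for x in range(24)]   # gradual change, one level per pixel
coded = code_blocks_as_means(ramp)

print(ramp[7], ramp[8])    # 107 108: a step of 1 at the block edge
print(coded[7], coded[8])  # 103 111: a step of 8 at the same edge
```

The gradual ramp becomes three flat bands, and the whole change in value is concentrated at the block edges, which is where a de-blocking filter operates.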
MPEG decoders may use a de-blocking filter to reduce the visibility of such blocking artifacts. Filters can be applied along block edges, both vertical and horizontal edges. See for example the ISO/IEC JTC1/SC29/WG11 standard's working group document m4960.doc: section 9.1 De-Blocking Filter. FIGS. 2A-B show a prior-art de-blocking filter applied to vertical and horizontal block edges. In FIG. 2A, a row of pixels V0, V1, . . . V9 crosses a vertical boundary between two 8×8 blocks, BLK #N and BLK #N+1. Pixel V4 is the last pixel in BLK #N, while pixel V5 is the first pixel in BLK #N+1.
Due to compression, a noticeable color difference may appear between pixels V4 and V5. The prior-art de-blocking filter combines pixels V1, V2, V3, V4 in BLK #N using the S1 smoothing filter, pixels V5, V6, V7, V8 in BLK #N+1 using the S2 smoothing filter, and pixels V3, V4, V5, V6 that cross the boundary using the S0 smoothing filter.
In FIG. 2B, a column of pixels V0, V1, . . . V9 crosses a horizontal boundary between two 8×8 blocks, BLK #N and BLK #N+22 in the next row. Pixel V4 is the last pixel in BLK #N, while pixel V5 is the first pixel in BLK #N+22.
Due to compression, a noticeable color difference may appear between pixels V4 and V5. The prior-art de-blocking filter combines pixels V1, V2, V3, V4 in BLK #N using the S1 smoothing filter, pixels V5, V6, V7, V8 in BLK #N+22 using the S2 smoothing filter, and pixels V3, V4, V5, V6 that cross the boundary using the S0 smoothing filter.
The smoothing filters can be re-applied for each of the 8 columns and each of the 8 rows in each 8×8 block, for all blocks in a frame. The pixel inputs V1, V2 . . . V8 can be shifted to the right by one column, or down by one row, and the operations repeated on the new inputs.
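One way the ten pixels V0 . . . V9 might be gathered for a single row or column is sketched below. The frame is assumed to be a list of rows of integer pixel values, and all function names are hypothetical. The edge-crossing group is labeled S0, following the naming used in the description of FIG. 3.

```python
def line_across_vertical_edge(frame, y, x_edge):
    """Row segment V0..V9; x_edge is the column of V5, the first pixel right of the edge."""
    return [frame[y][x] for x in range(x_edge - 5, x_edge + 5)]


def line_across_horizontal_edge(frame, x, y_edge):
    """Column segment V0..V9; y_edge is the row of V5, the first pixel below the edge."""
    return [frame[y][x] for y in range(y_edge - 5, y_edge + 5)]


def pixel_groups(v):
    """Split V0..V9 into the three 4-pixel groups used by the filter."""
    return {"S0": v[3:7], "S1": v[1:5], "S2": v[5:9]}


# Synthetic 16x16 frame in which each value encodes its own position.
frame = [[100 * y + x for x in range(16)] for y in range(16)]

row = line_across_vertical_edge(frame, y=3, x_edge=8)
col = line_across_horizontal_edge(frame, x=2, y_edge=8)
print(row[4], row[5])            # 307 308: V4 is column 7, V5 is column 8
print(col[4], col[5])            # 702 802: V4 is row 7, V5 is row 8
print(pixel_groups(row)["S0"])   # [306, 307, 308, 309]
```

Stepping `y` through the 8 rows of a block, or `x` through its 8 columns, gives the 16 pixel lines that are filtered for each block.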
FIG. 3 is a diagram highlighting computational steps performed during a prior-art de-blocking process. The three 4-pixel groups S0, S1, S2 are input to vector multipliers 11, 14, 16. Vector multiplier 11 receives pixels V3, V4, V5, V6 that cross the horizontal or vertical boundary, and generates frequency component A0 as the inner product of the discrete cosine transform (DCT) kernel [2 −5 5 −2] and the transposed ([ . . . ]T) S0 pixel vector:
A0 = ([2 −5 5 −2] * [V3 V4 V5 V6]T) / 8
Vector multiplier 14 receives pixels V1, V2, V3, V4 in BLK #N, and generates frequency component A1 as the inner product of the DCT kernel and the transposed S1 pixel vector:
A1 = ([2 −5 5 −2] * [V1 V2 V3 V4]T) / 8
Vector multiplier 16 receives pixels V5, V6, V7, V8 in BLK #N+1 or BLK #N+22, and generates frequency component A2 as the inner product of the DCT kernel and the transposed S2 pixel vector:
A2 = ([2 −5 5 −2] * [V5 V6 V7 V8]T) / 8
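The three inner products can be sketched as follows. Two points are assumptions rather than statements from the text: the kernel coefficients are taken as (2, −5, 5, −2), the zero-sum form for which a flat run of pixels gives a zero component, and the divide by 8 is taken as integer division truncating toward zero. The kernel is a parameter so that other coefficients can be substituted. The helper names are hypothetical.

```python
KERNEL = (2, -5, 5, -2)  # assumed coefficients; pass another tuple to override


def div_trunc(n, d):
    """Integer division truncating toward zero, for d > 0 (assumed rounding)."""
    q = abs(n) // d
    return q if n >= 0 else -q


def frequency_component(pixels, kernel=KERNEL):
    """Inner product of the 4-tap kernel with 4 pixels, divided by 8."""
    acc = sum(k * p for k, p in zip(kernel, pixels))
    return div_trunc(acc, 8)


def components(v, kernel=KERNEL):
    """Return (A0, A1, A2) for a 10-pixel line V0..V9."""
    a0 = frequency_component(v[3:7], kernel)  # V3..V6, crossing the edge
    a1 = frequency_component(v[1:5], kernel)  # V1..V4, inside BLK #N
    a2 = frequency_component(v[5:9], kernel)  # V5..V8, inside the next block
    return a0, a1, a2


step = [100] * 5 + [110] * 5   # flat blocks with a 10-level jump at the edge
flat = [100] * 10
print(components(step))  # (3, 0, 0)
print(components(flat))  # (0, 0, 0)
```

With this kernel only the edge-crossing group responds to the step, while the two groups lying wholly inside flat blocks give zero.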
These three inner products are frequency-domain components of the pixel arrays within the two blocks and crossing the block edge. The absolute values of frequency components A0, A1, A2 are generated by absolute-value generators 22, 24, 26, such as by dropping the sign bit. Minimum selector 10 then selects the minimum absolute value of A0, A1, A2. The original sign of A0 is applied to this selected minimum to generate A0′.
The original A0 is then subtracted from A0′, and the difference is multiplied by 5 and integer-divided by 8 by calculator 28. The result is input to clipper 20, which clips it to a value between 0 and DE, where DE is half of the edge-pixel difference, (V4−V5)/2, generated by pixel differencer 18. Thus clipper 20 limits extreme values to a range of 0 to (V4−V5)/2. The clipped difference value output by clipper 20 is D.
Applicator 30 then adds clipped difference D to edge pixel V5 to generate the filtered pixel V5. Clipped difference D is also subtracted from edge pixel V4 to generate the new filtered pixel V4. The filtered values of V4 and V5 replace the old values in the image to be displayed. The edge difference between pixels V4 and V5 is thus smoothed by D, reducing the change in pixel value from V4 to V5. This reduces visible color change at the block edge.
When the absolute value of the frequency component A0 (from the edge-crossing pixels) is greater than the quantization parameter QP, then condition checker 32 disables filtering for the current row or column. Large pixel differences may be caused by a real edge in the image, while smaller pixel differences are more likely caused by compression noise. When the absolute value of A0 is less than or equal to QP, then applicator 30 is enabled to add D to pixel V5 and subtract D from pixel V4.
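The complete per-line computation of FIG. 3 can be sketched as below. This is an illustration under stated assumptions, not a reference implementation: the kernel is taken as (2, −5, 5, −2), every integer division truncates toward zero, and the clip range runs between 0 and DE whichever of the two is smaller. The function names are hypothetical.

```python
KERNEL = (2, -5, 5, -2)  # assumed kernel coefficients


def div_trunc(n, d):
    """Integer division truncating toward zero, for d > 0 (assumed rounding)."""
    q = abs(n) // d
    return q if n >= 0 else -q


def component(pixels, kernel=KERNEL):
    """One vector multiplier: 4-tap inner product divided by 8."""
    return div_trunc(sum(k * p for k, p in zip(kernel, pixels)), 8)


def clip(x, a, b):
    """Limit x to the range spanned by a and b, in either order."""
    return max(min(a, b), min(max(a, b), x))


def deblock_line(v, qp, kernel=KERNEL):
    """Filter one 10-pixel line V0..V9; returns a new list with V4 and V5 updated."""
    if len(v) != 10:
        raise ValueError("expected the ten pixels V0..V9")
    a0 = component(v[3:7], kernel)   # edge-crossing group
    a1 = component(v[1:5], kernel)   # group inside BLK #N
    a2 = component(v[5:9], kernel)   # group inside the next block
    if abs(a0) > qp:                 # condition checker: likely a real image edge
        return list(v)
    smallest = min(abs(a0), abs(a1), abs(a2))
    a0_prime = smallest if a0 >= 0 else -smallest   # sign of A0 re-applied
    de = div_trunc(v[4] - v[5], 2)                  # half the edge-pixel difference
    d = clip(div_trunc(5 * (a0_prime - a0), 8), 0, de)
    out = list(v)
    out[4] = v[4] - d
    out[5] = v[5] + d
    return out


print(deblock_line([100] * 5 + [110] * 5, qp=8))
# [100, 100, 100, 100, 101, 109, 110, 110, 110, 110]
print(deblock_line([100] * 5 + [140] * 5, qp=8))
# unchanged, because |A0| = 15 exceeds QP
```

In the first call the 10-level jump between V4 and V5 shrinks to 8 levels; in the second, the larger jump is treated as a real edge and left alone.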
A large amount of computational work is required for each pair of edge pixels that are smoothed. In particular, vector multipliers 11, 14, 16 each perform four multiplies and three adds, and a final divide-by-eight or 3-bit right-shift. While one arithmetic-logic-unit (ALU) or multiplier/adder/divider could be re-used three times, either the hardware required or the number of clock cycles to perform the three vector multiplies is significant. A total of 12 multiply operations is needed, and each multiply can be a full integer multiply rather than a simple shift.
Since there are 300 or more blocks in each frame, and the de-blocking process may be repeated 16 times for each block, many computations may be performed by the de-blocking filter. It is therefore desirable to reduce computational complexity of the de-blocking filter. A more streamlined de-blocking filter is desirable.
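The scale of the workload can be estimated from the figures given above. The sketch below is a rough upper bound only: it assumes every block is filtered on 16 pixel lines with 12 multiplies per line, and it ignores blocks on the frame border, which have fewer neighboring edges. The function name is hypothetical.

```python
def multiplies_per_frame(width, height, block=8,
                         lines_per_block=16, multiplies_per_line=12):
    """Rough count of multiplies spent on de-blocking one frame."""
    blocks = (width // block) * (height // block)
    return blocks * lines_per_block * multiplies_per_line


print(multiplies_per_frame(176, 144))  # 76032
```

For the 176 by 144 format this is 396 blocks times 16 lines times 12 multiplies, before any adds, shifts, comparisons, or clipping are counted.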