In block-based video coding schemes such as, for example, those standardized by MPEG or ITU-T, blocking artifacts often occur and smoothness between adjacent blocks in the decoded images is lost. De-blocking techniques are used to restore smoothness in such images.
A de-blocking filter may be utilized to smooth out the edges between adjacent blocks within an image. In AVC (also known as ITU-T H.264 and MPEG-4 Part 10), for example, the de-blocking filter processes a current macroblock together with the macroblocks to its left and above it (a macroblock may be a block of 16×16 pixels or smaller). The left, top and current macroblocks are stored in a macroblock buffer, and conditional filtering is applied to all 4×4 block edges of a picture.
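As an illustration of this edge layout, the following Python sketch enumerates the 4×4 block edges of one 16×16 luma macroblock. It assumes macroblock-relative pixel offsets and marks the edge at offset 0 as the boundary shared with the left macroblock (vertical edges) or the macroblock above (horizontal edges). The names `Edge` and `block_edges` are hypothetical, and neither chroma edges nor the conditions that decide whether an edge is actually filtered are modeled.

```python
from typing import List, NamedTuple


class Edge(NamedTuple):
    direction: str     # "vertical" or "horizontal"
    offset: int        # pixel offset of the edge inside the macroblock
    mb_boundary: bool  # True if the edge is shared with a neighboring macroblock


def block_edges(mb_size: int = 16, block_size: int = 4) -> List[Edge]:
    """List every 4x4 block edge of one macroblock (hypothetical helper)."""
    edges = []
    for direction in ("vertical", "horizontal"):
        for offset in range(0, mb_size, block_size):
            # offset 0 is the left edge (vertical) or the top edge (horizontal),
            # i.e. the boundary with the left or the above macroblock
            edges.append(Edge(direction, offset, offset == 0))
    return edges


if __name__ == "__main__":
    for edge in block_edges():
        print(edge)
```

Under these assumptions there are 4 vertical and 4 horizontal edges, each filtered along 16 lines of pixels, giving 128 filtering operations per luma macroblock.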
In vertical edge filtering, the 4 pixels to the left of an edge of a 4×4 block (the 4 columns on the left) and the 4 pixels to the right of the edge (the 4 columns on the right) in a given row are filtered. If the edge is at a macroblock boundary, the left and right pixels come from different rows of the macroblock buffer, since the right pixels are from a row in the current macroblock and the left pixels are from a row in the left macroblock.
Similarly, in horizontal edge filtering, the 4 pixels above an edge of a 4×4 block (the 4 rows above) and the 4 pixels below the edge (the 4 rows below) in a given column are filtered. If the edge is at a macroblock boundary, the pixels above and below the edge come from different rows of the macroblock buffer, since the pixels below are from a column in the current macroblock and the pixels above are from a column in the macroblock above.
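The pixel selection described in the two preceding paragraphs can be sketched in Python as follows. The sketch assumes the macroblock buffer is modeled as three 16×16 arrays keyed `"left"`, `"top"` and `"current"` (hypothetical names), and it uses the H.264 convention of calling the pixels before the edge p3..p0 and those after it q0..q3. The function `edge_pixels` is hypothetical and only returns coordinates; it does not perform any filtering.

```python
from typing import List, Tuple

MB = 16  # macroblock width/height in luma pixels

# (source macroblock, row, column) of one pixel in the hypothetical buffer
PixelRef = Tuple[str, int, int]


def edge_pixels(direction: str, offset: int, line: int) -> List[PixelRef]:
    """Return p3, p2, p1, p0, q0, q1, q2, q3 for one filtering operation.

    direction: "vertical" (line is a row) or "horizontal" (line is a column)
    offset:    position of the edge inside the current macroblock (0, 4, 8, 12)
    """
    refs = []
    for k in range(-4, 4):  # k = -4..-1 -> p3..p0, k = 0..3 -> q0..q3
        pos = offset + k
        if pos < 0:
            # the edge is a macroblock boundary: p pixels come from the neighbor
            src = "left" if direction == "vertical" else "top"
            pos += MB
        else:
            src = "current"
        if direction == "vertical":
            refs.append((src, line, pos))  # same row, consecutive columns
        else:
            refs.append((src, pos, line))  # same column, consecutive rows
    return refs
```

For a boundary edge the first four references point into the neighboring macroblock and the last four into the current one, which is why, in the buffer, they belong to different macroblocks; for an internal edge all eight references fall inside the current macroblock.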
When performing horizontal edge filtering, for example, 8 pixels have to be read out of the macroblock buffer, 4 above an edge and 4 below it. Each pixel resides in a different row of pixels in a macroblock, so the pixels must be read one at a time from a memory unit such as, for example, a static random access memory (SRAM), at one clock cycle per pixel. After the pixels are read out, the filter is applied to them, and each pixel is then written back to the macroblock buffer in the SRAM, again at one clock cycle per pixel. As a result, each filtering operation across an edge of the 16×16 block, whether horizontal or vertical, requires 8 clock cycles to read the pixels, plus 8 clock cycles to write them back, plus the clock cycles needed for the filtering itself.
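The cycle cost described above can be expressed as a small Python model. It is a rough sketch under stated assumptions: every filtering operation reads and writes all 8 pixels, operations run strictly one after another, and the filter latency is unknown, so `filter_cycles` is a hypothetical parameter defaulting to 0. The function names are likewise hypothetical.

```python
from math import ceil


def cycles_per_line(pixels_per_side: int = 4,
                    filter_cycles: int = 0,
                    reads_per_cycle: int = 1,
                    writes_per_cycle: int = 1) -> int:
    """Clock cycles for one filtering operation across one edge."""
    pixels = 2 * pixels_per_side                 # e.g. 4 above + 4 below the edge
    read = ceil(pixels / reads_per_cycle)        # one pixel per cycle by default
    write = ceil(pixels / writes_per_cycle)      # write-back, one pixel per cycle
    return read + write + filter_cycles


def macroblock_cycles(edges_per_direction: int = 4,
                      lines_per_edge: int = 16,
                      **kwargs) -> int:
    """Cycles for all luma edges of one 16x16 macroblock, filtered sequentially."""
    operations = 2 * edges_per_direction * lines_per_edge  # vertical + horizontal
    return operations * cycles_per_line(**kwargs)


if __name__ == "__main__":
    print(cycles_per_line())                  # 8 reads + 8 writes
    print(macroblock_cycles())                # 128 operations
    print(macroblock_cycles(filter_cycles=2)) # with a hypothetical 2-cycle filter
```

With one pixel per cycle and no filter latency, this model gives 16 cycles per operation and 2,048 cycles per luma macroblock; raising `reads_per_cycle` or `writes_per_cycle` shows how much the memory access pattern dominates the total.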
One way to make the process faster is to store the components associated with each field on a different RAM and read the needed values simultaneously. However, this solution is costly in terms of implementation area, since multiple RAMs occupy more space than a single one. There is therefore a trade-off between the area and the speed of the process.
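To illustrate why multiple RAMs speed up the reads, the Python sketch below uses skewed (diagonal) interleaving, a common banking technique that is swapped in here for illustration and is not necessarily the arrangement the text refers to. Pixel (row, col) is assumed to live in RAM number `(row + col) % num_banks`, so 8 consecutive pixels along either a row or a column land in 8 different RAMs. The names `bank_of` and `read_cycles` are hypothetical, and per-RAM addressing is omitted.

```python
from typing import Iterable, Tuple


def bank_of(row: int, col: int, num_banks: int = 8) -> int:
    """Skewed interleaving: pixel (row, col) lives in RAM (row + col) % num_banks."""
    return (row + col) % num_banks


def read_cycles(coords: Iterable[Tuple[int, int]], num_banks: int = 8) -> int:
    """Cycles to fetch all pixels when each RAM returns one pixel per cycle."""
    load = [0] * num_banks
    for row, col in coords:
        load[bank_of(row, col, num_banks)] += 1
    return max(load)  # the most heavily used RAM sets the read time


if __name__ == "__main__":
    column = [(r, 7) for r in range(0, 8)]  # 8 pixels across a horizontal edge
    row = [(5, c) for c in range(4, 12)]    # 8 pixels across a vertical edge
    print(read_cycles(column), read_cycles(row))              # eight RAMs
    print(read_cycles(column, 1), read_cycles(row, 1))        # a single RAM
```

Under these assumptions a read drops from 8 cycles to 1, but the design needs eight RAM instances plus a rotation network to put the fetched pixels back in p3..q3 order, which reflects the area cost noted above.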
The macroblock buffer may be implemented with a single-port SRAM. In each clock cycle, a single-port SRAM can be either read or written, but not both, which can slow down the process. A dual-port SRAM allows a read and a write in the same clock cycle, but at the expense of a large increase in area.
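The difference between the two SRAM types can be sketched with a simple Python schedule. The sketch assumes one pixel per port per cycle, a filter with no added latency, and that consecutive filtering operations are independent (for example, the 16 lines along one edge), so that reading one line can overlap writing back the previous one. The function names are hypothetical.

```python
def single_port_cycles(lines: int, pixels_per_line: int = 8) -> int:
    """One port shared by reads and writes: every access costs its own cycle."""
    return lines * 2 * pixels_per_line


def dual_port_cycles(lines: int, pixels_per_line: int = 8) -> int:
    """Separate read and write ports: reading line i+1 overlaps writing line i."""
    read_done = 0   # cycle at which the read port has fetched the current line
    write_done = 0  # cycle at which the write port has stored the current line
    for _ in range(lines):
        read_done += pixels_per_line  # the read port streams lines back-to-back
        # a line can be written only after it is read and the previous write ends
        write_done = max(read_done, write_done) + pixels_per_line
    return write_done


if __name__ == "__main__":
    print(single_port_cycles(128), dual_port_cycles(128))
```

For 128 filtering operations this model gives 2,048 cycles with a single port and 1,032 with two ports, so the dual-port SRAM roughly halves the access time in exchange for its larger area.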
Another way to make the process faster is a flip-flop implementation, in which all of the data is stored in flip-flops. Flip-flops provide the flexibility of reading data out at any time, without the limitation of reading one pixel at a time. However, a complete flip-flop implementation would require a large number of flip-flops, which would in turn require a large area.
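To give a sense of scale, the following Python estimate counts the flip-flops needed to hold the whole macroblock buffer. It assumes the left, top and current macroblocks are each stored in full, 4:2:0 sampling (one 16×16 luma block plus two 8×8 chroma blocks per macroblock) and 8-bit samples; these are illustrative assumptions, and `flip_flops_for_buffer` is a hypothetical helper.

```python
def flip_flops_for_buffer(num_macroblocks: int = 3,
                          luma_samples: int = 16 * 16,
                          chroma_samples: int = 2 * 8 * 8,
                          bits_per_sample: int = 8) -> int:
    """Storage bits (one flip-flop each) for a fully flip-flop macroblock buffer."""
    samples_per_mb = luma_samples + chroma_samples  # 4:2:0 by default
    return num_macroblocks * samples_per_mb * bits_per_sample


if __name__ == "__main__":
    print(flip_flops_for_buffer())                  # luma + chroma
    print(flip_flops_for_buffer(chroma_samples=0))  # luma only
```

Under these assumptions the buffer needs 9,216 flip-flops (6,144 for luma alone), each considerably larger than an SRAM bit cell.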
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with some aspects of the present invention as set forth in the remainder of the present application with reference to the drawings.