Block-based video compression causes inconsistent visual quality at the block boundaries. Contemporary video compression technology utilizes a post-processing device, such as a deblocking filter, to reduce the blocking effect and improve the compression efficiency. As shown in the exemplary embodiment of FIG. 1A, a deblocking filter that is not included inside the coding/decoding loop of the video codec is called an out-loop filter 110. On the other hand, H.264/advanced video coding (AVC) technology uses deblocking filters inside the coding loop and the decoding loop, called in-loop filters 122, 124, respectively, as shown in FIG. 1B, to remove the blocking effect and improve the compression efficiency.
As shown in the video sequence of FIG. 2A, the H.264/AVC video compression standard allows a compressed video sequence 200 to be an arbitrary combination of interlaced frame-pictures 210 and field-pictures 220. As shown in the exemplar of FIG. 2B, a frame-picture 230 may be composed of a top-field 232 and a bottom-field 234. Top-field 232 is composed of the even pixel rows and bottom-field 234 is composed of the odd pixel rows. This format of interlaced video sequence 200 is called the picture adaptive frame field (PICAFF) format.
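The even/odd row split described above can be illustrated with a minimal sketch. The function names `split_fields` and `merge_fields` are hypothetical and are used here only for illustration; a frame is modeled simply as a list of pixel rows.

```python
def split_fields(frame_rows):
    """Split a frame (a list of pixel rows) into a top field and a bottom field."""
    top_field = frame_rows[0::2]     # even-numbered rows: 0, 2, 4, ...
    bottom_field = frame_rows[1::2]  # odd-numbered rows: 1, 3, 5, ...
    return top_field, bottom_field


def merge_fields(top_field, bottom_field):
    """Interleave a top field and a bottom field back into a frame."""
    frame_rows = []
    for top_row, bottom_row in zip(top_field, bottom_field):
        frame_rows.append(top_row)
        frame_rows.append(bottom_row)
    return frame_rows


# A toy 6-row frame in which every pixel holds its own row number.
frame = [[row] * 4 for row in range(6)]
top, bottom = split_fields(frame)
```

In this toy frame, `top` holds rows 0, 2 and 4, `bottom` holds rows 1, 3 and 5, and `merge_fields(top, bottom)` restores the original frame.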
As shown in the macroblock format exemplar of FIG. 3, a frame-picture 300 may be partitioned into a plurality of macroblocks (MBs) 302, with each MB being composed of a 16×16 pixels luma component 310 and two chroma components, such as chroma components 312, 314. The 16×16 pixels luma component 310 is composed of 16 4×4 blocks. The chroma components of H.264/AVC have three types of formats, 4:2:0, 4:2:2 and 4:4:4, respectively. As shown in the exemplar of FIG. 3, a chroma component of a 4:2:0 format is composed of 8×8 pixels, such as chroma components 312, 314. A chroma component of a 4:2:2 format is composed of 16×8 pixels, such as chroma components 322, 324. A chroma component of a 4:4:4 format is composed of 16×16 pixels, such as chroma components 332, 334.
As shown in FIG. 4, an MB pair in a frame-picture 400 may be a frame-MB pair 410 or a field-MB pair 420, where the two MBs of an MB pair have the same horizontal position and adjacent vertical positions in the frame. Top-field MB 422 of field-MB pair 420 is composed of the even-numbered pixel rows in field-MB pair 420 and bottom-field MB 424 is composed of the odd-numbered pixel rows in field-MB pair 420. The H.264/AVC compression standard allows a frame-picture 400 to be an arbitrary combination of frame-MB pairs 410 and field-MB pairs 420. This type of format is called the macroblock adaptive frame field (MBAFF) format.
For an MB with 4:2:0 chroma components, the deblocking filter needs to process 48 block edges, including 24 vertical edges and 24 horizontal edges. For an MB with 4:2:2 chroma components, the deblocking filter needs to process 64 block edges, and for an MB with 4:4:4 chroma components, the deblocking filter needs to process 96 block edges. As shown in FIG. 5, when the deblocking filter performs deblocking for H.264/AVC compression, vertical edge 510 is processed before horizontal edge 520. Furthermore, the pixels obtained by filtering the vertical edges are used as the input data for filtering horizontal edge 520. In FIG. 5, v denotes the pixels of a vertical edge, where the pixels on the left of edge 510 are the filtered pixels of the left neighboring block and the pixels on the right of edge 510 are the filtered pixels of the current block; h denotes the pixels of a horizontal edge, where the pixels above edge 520 are the filtered pixels of the top neighboring block and the pixels underneath edge 520 are the filtered pixels of the current block. In other words, the pixels output by filtering the vertical edges are used as the input for filtering the horizontal edges.
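The per-MB edge counts above can be reproduced with a simple counting model, sketched below under the assumption that every 4×4 block of the luma plane and of each chroma plane contributes one vertical edge (its left side) and one horizontal edge (its top side). The names `plane_edges` and `macroblock_edges` are hypothetical.

```python
# Plane sizes as (width, height) in pixels for one macroblock.
LUMA = (16, 16)
CHROMA = {
    "4:2:0": (8, 8),
    "4:2:2": (8, 16),   # the 16x8 block of the text; orientation does not change the count
    "4:4:4": (16, 16),
}


def plane_edges(width, height, block=4):
    """Return (vertical, horizontal) 4-pixel block edges in one plane.

    Counting model (an assumption): each 4x4 block owns its left edge
    and its top edge.
    """
    blocks = (width // block) * (height // block)
    return blocks, blocks


def macroblock_edges(chroma_format):
    """Return (vertical, horizontal) edge counts for one MB: luma plus two chroma planes."""
    luma_v, luma_h = plane_edges(*LUMA)
    chroma_v, chroma_h = plane_edges(*CHROMA[chroma_format])
    return luma_v + 2 * chroma_v, luma_h + 2 * chroma_h


for fmt in ("4:2:0", "4:2:2", "4:4:4"):
    vertical, horizontal = macroblock_edges(fmt)
    print(fmt, vertical, horizontal, vertical + horizontal)
```

Under this model the totals come out to 48, 64 and 96 edges for the 4:2:0, 4:2:2 and 4:4:4 formats, respectively, with 24 vertical and 24 horizontal edges in the 4:2:0 case.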
As shown in FIG. 6, for a vertical edge 510 or a horizontal edge 520, the four lines related to a block edge, 32 pixels in total, are processed line by line. Each line is composed of 8 pixels across the block edge, named p3, p2, p1, p0, q0, q1, q2, q3, where p0 and q0 are the two adjacent pixels located on either side of the block edge. The 8 pixels and related parameters, such as boundary strength, are used as input data for the deblocking filter. The 8 pixels after being deblocked by the deblocking filter are named p′3, p′2, p′1, p′0, q′0, q′1, q′2, q′3. A deblocking filter that processes one line of pixels at a time is called a line filter.
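The way the four 8-pixel lines are gathered for a line filter can be sketched as follows. This only illustrates the pixel addressing, not the filtering arithmetic itself; the function names and the toy `plane` are hypothetical, and a plane is modeled as a list of rows indexed as `plane[row][column]`.

```python
NAMES = ["p3", "p2", "p1", "p0", "q0", "q1", "q2", "q3"]


def lines_across_vertical_edge(plane, x, y):
    """Four 8-pixel lines across the vertical edge at column x, covering rows y..y+3.

    Each line runs horizontally: p3..p0 lie left of the edge, q0..q3 right of it.
    """
    return [plane[row][x - 4:x + 4] for row in range(y, y + 4)]


def lines_across_horizontal_edge(plane, x, y):
    """Four 8-pixel lines across the horizontal edge at row y, covering columns x..x+3.

    Each line runs vertically: p3..p0 lie above the edge, q0..q3 below it.
    """
    return [[plane[row][col] for row in range(y - 4, y + 4)]
            for col in range(x, x + 4)]


# Toy 8x8 plane in which each pixel value encodes its position as 10*row + column.
plane = [[10 * r + c for c in range(8)] for r in range(8)]

v_lines = lines_across_vertical_edge(plane, x=4, y=0)
h_lines = lines_across_horizontal_edge(plane, x=0, y=4)

first_v = dict(zip(NAMES, v_lines[0]))  # p0 is column 3, q0 is column 4 of row 0
first_h = dict(zip(NAMES, h_lines[0]))  # p0 is row 3, q0 is row 4 of column 0
```

Either call returns four lines of 8 pixels, that is, the 32 pixels associated with one block edge.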
Accordingly, for a full HD video sequence with a frame rate of 30 frames per second, if the chroma components use the 4:2:0 format, the deblocking filter needs to process up to 11,705,280 block edges per second. If the deblocking filter is implemented in software on a processor, the working clock of the processor will exceed 500 MHz. When the video sequence is compressed with the MBAFF format, as shown in FIG. 7, if the top neighboring macroblock of the current frame-MB 705 is a field-MB, the deblocking filter must additionally process the boundaries of the two top MBs (top-field MB 710 and bottom-field MB 720). For such a video sequence, up to 12,194,880 edges must be processed per second. Furthermore, when performing vertical processing, the memory reading and writing access is up to 32×195840×2 bytes per frame, and when performing horizontal processing, the memory reading and writing access is up to 32×(195840+16320)×2 bytes per frame. In other words, the memory bandwidth requirement for the deblocking filter is up to 780,472,320 bytes per second.
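The figures above can be checked with a short calculation. The sketch below rests on several assumptions that the text does not state explicitly: a coded frame size of 1920×1088 pixels (120×68 MBs), one byte per pixel, and the exclusion of the edges lying on the left and top picture boundaries, which have no neighboring block. The value 16,320 for the additional MBAFF edges per frame is taken from the text as given.

```python
WIDTH, HEIGHT = 1920, 1088           # assumption: full HD coded as 68 MB rows
FPS = 30
EDGES_PER_MB = 48                    # 4:2:0 format: 24 vertical + 24 horizontal
BOUNDARY_EDGES_PER_MB = 8            # one MB side: 4 luma + 2 + 2 chroma edges
MBAFF_EXTRA_EDGES_PER_FRAME = 16320  # taken from the text, not derived here
PIXELS_PER_EDGE = 32                 # four lines of 8 pixels
BYTES_PER_PIXEL = 1                  # assumption: 8-bit samples
READ_AND_WRITE = 2                   # each pixel is read once and written once

mb_cols, mb_rows = WIDTH // 16, HEIGHT // 16
mbs_per_frame = mb_cols * mb_rows

# Assumption: edges on the left and top picture boundaries are not filtered.
picture_boundary_edges = (mb_cols + mb_rows) * BOUNDARY_EDGES_PER_MB
edges_per_frame = mbs_per_frame * EDGES_PER_MB - picture_boundary_edges

edges_per_second = edges_per_frame * FPS
mbaff_edges_per_second = (edges_per_frame + MBAFF_EXTRA_EDGES_PER_FRAME) * FPS
bytes_per_second = (mbaff_edges_per_second * PIXELS_PER_EDGE
                    * BYTES_PER_PIXEL * READ_AND_WRITE)

print(mbs_per_frame)           # 8160
print(edges_per_second)        # 11705280
print(mbaff_edges_per_second)  # 12194880
print(bytes_per_second)        # 780472320
```

Note that 8160 MBs × 24 edges gives the 195,840 edges per direction that appear in the memory access expressions.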
U.S. Patent Publication No. 2008/0043853 discloses a deblocking filter. As shown in FIG. 8, deblocking filter 801 uses a processing unit 802 to concurrently perform column-direction-edge filtering on a plurality of groups of pixels which are arranged in rows across edge 812 of a current MB 804. For example, in a first clock cycle, pixels E4-E7 and pixels G4-G7 are processed, and in a second clock cycle, pixels F4-F7 and pixels H4-H7 are processed. Also, a rearrangement unit 803 is used to rearrange the processed pixels of the respective rows into columns. For example, pixels E4-E7 of row E are rearranged into pixels E4, F4, G4, H4 of column 4, pixels F4-F7 of row F are rearranged into pixels E5, F5, G5, H5 of column 5, pixels G4-G7 of row G are rearranged into pixels E6, F6, G6, H6 of column 6, and pixels H4-H7 of row H are rearranged into pixels E7, F7, G7, H7 of column 7. In this manner, the time spent waiting to read the pixels necessary for deblocking horizontal edges may be reduced.
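The row-to-column rearrangement described above amounts to transposing a 4×4 block of pixels. The following minimal sketch shows only that data movement, using pixel labels in place of pixel values; it is not a model of the circuitry of the cited publication, and `transpose_4x4` is a hypothetical name.

```python
def transpose_4x4(block):
    """Turn the rows of a 4x4 block into columns (and vice versa)."""
    return [list(column) for column in zip(*block)]


# Rows E..H, columns 4..7, with each pixel labeled by its row and column.
rows = [[f"{r}{c}" for c in range(4, 8)] for r in "EFGH"]
columns = transpose_4x4(rows)

print(rows[0])     # ['E4', 'E5', 'E6', 'E7']  (row E as filtered)
print(columns[0])  # ['E4', 'F4', 'G4', 'H4']  (column 4 after rearrangement)
```

After the transposition, each output line holds the four pixels of one column, which is the order needed when the subsequent horizontal edges are filtered.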
Cheng-An Chien et al. disclosed an in-loop deblocking filter with high throughput in 2008 and 2009. As shown in FIG. 9, deblocking filter 910 uses a 4×4/8×8 line filter 912 and a buffer management scheme for supporting various video coding tools of H.264/AVC, such as the PICAFF format and the MBAFF format. The buffer management scheme uses two types of internal buffers to store the data of a reference MB pair, so that the internal pixels are not required to be written into external memory 920 when the deblocking filter switches between horizontal and vertical edge processing and rearranges the internal pixels for processing.