Motion compensated transform coding has been widely adopted in various coding standards, where block transform is applied to motion-compensated residues. The motion compensated inter-frame coding system also uses intra-frame mode periodically or adaptively. During the coding process, transform coefficients are quantized in order to reduce bitrate and consequently artifacts are introduced. The artifacts are more visible at boundaries around transform blocks. In order to alleviate the coding artifacts, a technique called deblocking has been developed, which applies in-loop filtering across block boundaries adaptively. The deblocking technique is also called deblocking filter in the field of video coding.
The H.264 coding standard also adopts block-based motion-compensation, where the deblocking filter is applied to reduce the distortion most visible across block boundaries. FIG. 1 shows an exemplary decoding flow used in H.264. The input coded bitstream provided to decoder 100 is processed by variable length decoding (VLD) 110. The decoded data is then processed by Intra block decoding 120 or Inter block decoding 130 for reconstruction depending on whether the block is Intra-coded or Inter-coded. The reconstructed data is stored and used for motion compensation (MC) 140 by other frames. In order to improve video quality, in-loop filtering (LF) 150 is applied to the reconstructed video. The in-loop filter is applied across boundaries of 4×4 blocks. The horizontal deblocking filter is applied to vertical block boundaries first in the order from left to right and the vertical deblocking filter is then applied to the horizontal boundaries in the order from top to bottom. The deblocked video data is then stored in the frame buffer (not explicitly shown in FIG. 1) and used for motion compensation by other frames.
FIG. 2 illustrates an example of filtering 4×4 block boundaries of the luma component in a macroblock according to the H.264 coding standard. The deblocking operation is applied to vertical boundaries in the order of 211, 212, 213 and 214. The deblocking operation is also applied to horizontal boundaries in the order of 215, 216, 217 and 218. FIG. 3 illustrates an example of filtering block boundaries of a macroblock for the chroma component. Similarly, the deblocking filter processes vertical boundaries in the order of 311 and 312. Horizontal boundaries 313 and 314 are then filtered.
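The edge ordering described above can be sketched as follows. This is a minimal illustration, not the standard's normative procedure. It assumes a 16×16 luma macroblock and an 8×8 chroma block (4:2:0 sampling), both with a 4-sample edge spacing. The function name `edge_order` and the use of sample offsets in place of the reference numerals 211 to 218 and 311 to 314 are hypothetical choices made for this example.

```python
def edge_order(mb_size=16, block_size=4):
    """Return the deblocking order of the edges inside one macroblock.

    Each entry is (direction, offset): "V" marks a vertical boundary at
    column `offset`, "H" marks a horizontal boundary at row `offset`.
    The sizes are assumptions for illustration (16x16 luma, 4x4 blocks).
    """
    offsets = range(0, mb_size, block_size)
    vertical = [("V", x) for x in offsets]     # filtered first, left to right
    horizontal = [("H", y) for y in offsets]   # filtered next, top to bottom
    return vertical + horizontal


luma_edges = edge_order(16, 4)     # four vertical, then four horizontal edges
chroma_edges = edge_order(8, 4)    # two vertical, then two horizontal edges
```

Under these assumptions, `luma_edges` lists four vertical edges followed by four horizontal edges, matching the order given for FIG. 2, and `chroma_edges` lists two of each, matching FIG. 3.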
The filtering operation on every block boundary of the luma component updates 0 to 3 pixels on each side of the boundary. FIG. 4A shows an example of filtering a pixel line across a vertical boundary. In the pixel line across vertical boundary 410, four pixels on each side of the boundary, labeled (p3v, p2v, p1v, p0v, q0v, q1v, q2v, q3v), are used to derive the filter parameters. The pixels immediately next to the block boundary, i.e., p0 and q0, are named the first boundary pixels. Similarly, p1 and q1 are named the second boundary pixels, p2 and q2 are named the third boundary pixels, and p3 and q3 are named the fourth boundary pixels. For filtering the vertical boundary of the luma component, the deblocking filter updates at most 3 pixels, from the first boundary pixel to the third boundary pixel, on each side of the vertical boundary, depending on the boundary strength assigned to the vertical boundary. For the chroma component, either no pixel or only the first boundary pixel on each side of vertical boundary 410 may be modified by the deblocking process, depending on the boundary strength. Therefore, at most 1 pixel on each side of the vertical boundary may be affected by the deblocking operation.
FIG. 4B shows an example of filtering a pixel line across horizontal boundary 420. Similar to filtering a vertical boundary as shown in FIG. 4A, the deblocking filter may update up to 3 pixels, from the first boundary pixel to the third boundary pixel, on each side of the horizontal boundary for the luma component. In other words, only pixels p2h, p1h, p0h, q0h, q1h, and q2h may be modified by the deblocking process for the luma component. For the chroma component, only one pixel on each side of the boundary, i.e., p0h and q0h, may be updated.
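The pixel-line labeling of FIG. 4A and FIG. 4B can be illustrated with a short sketch. It assumes the picture is held as a list of rows and that the boundary is identified by the index of the first q-side sample. The names `boundary_line` and `MODIFIABLE` are hypothetical and are used here only for illustration.

```python
def boundary_line(frame, edge, pos, vertical=True):
    """Return [p3, p2, p1, p0, q0, q1, q2, q3] across a block boundary.

    frame : list of rows of sample values (assumed layout)
    edge  : column index (vertical boundary) or row index (horizontal
            boundary) of the first q-side sample, q0
    pos   : row (vertical boundary) or column (horizontal boundary)
            of the pixel line being filtered
    """
    if vertical:
        return [frame[pos][edge + k] for k in range(-4, 4)]
    return [frame[edge + k][pos] for k in range(-4, 4)]


# Indices within the eight-sample line that the filter may modify.
MODIFIABLE = {
    "luma": [1, 2, 3, 4, 5, 6],   # p2, p1, p0, q0, q1, q2
    "chroma": [3, 4],             # p0, q0
}
```

The remaining samples of the line (p3 and q3 for luma) are read to derive the filter parameters but are never written.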
The number of pixels to be updated for deblocking on each side of a block boundary is determined based on the boundary strength. The boundary strength parameter Bs is estimated according to the information of the current macroblock (MB) to be processed. The information used to determine Bs includes the Intra/Inter prediction mode information, the coded block pattern, the motion vector, the pixel values or other information of the MB. The boundary strength parameter Bs(Cx, Cy) for filtering the chroma block boundary can be derived from the boundary strength parameter Bs(Yx, Yy) for filtering block boundaries of the luma component in the same MB. The relationship between Bs(Cx, Cy) and Bs(Yx, Yy) can be represented by the following three equations: Bs(Cx, Cy) = Bs(Yx, Yy), Yx = subwidthC*Cx, and Yy = subheightC*Cy, where Yx and Yy denote the location of the current block boundary of the luma component in x and y directions respectively, and Cx and Cy denote the locations of the current block boundary of the chroma component in x and y directions respectively. The parameters subwidthC and subheightC are used to map the location of the current block boundary of the chroma component to the corresponding location of the current block boundary of the luma component.
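The three equations above amount to a table lookup, which can be sketched as follows. The sketch assumes that the luma boundary strengths are stored in a dictionary keyed by (Yx, Yy) and that subwidthC = subheightC = 2 (4:2:0 sampling). The function name `chroma_bs` and the dictionary layout are hypothetical.

```python
def chroma_bs(luma_bs, cx, cy, subwidth_c=2, subheight_c=2):
    """Look up Bs for a chroma boundary position from the luma Bs map.

    luma_bs maps a luma position (Yx, Yy) to its boundary strength.
    The default scale factors of 2 assume 4:2:0 sampling.
    """
    yx = subwidth_c * cx     # Yx = subwidthC * Cx
    yy = subheight_c * cy    # Yy = subheightC * Cy
    return luma_bs[(yx, yy)]  # Bs(Cx, Cy) = Bs(Yx, Yy)


# Example: the chroma boundary at Cx = 4 reuses the Bs of luma boundary Yx = 8.
example_luma_bs = {(0, 0): 4, (8, 0): 2}
example_result = chroma_bs(example_luma_bs, 4, 0)
```

In this example, `example_result` is 2, the strength stored for luma position (8, 0).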
FIG. 5 illustrates an exemplary diagram for determining Bs for filtering block boundaries of the luma component according to H.264. For a block boundary to be filtered, a test is performed in step 510 to determine whether the samples to be filtered belong to an Intra-coded MB or to a slice of SI (Switching I-Picture) or SP (Switching P-Picture) type. If the samples to be filtered are in an Intra-coded block or an SI/SP slice, step 520 is performed to determine whether the current block boundary is an MB boundary. When the current block boundary is also an MB boundary, significant blocking distortion may exist at the current block boundary, and step 530 is used to further determine Bs for the block boundary. In step 530, Bs is set to 4 if either of the following two conditions is met: (1) the samples to be filtered are in frame macroblocks; or (2) the samples are in a macroblock pair or in a field picture, and the samples are associated with a vertical block edge. If neither of the two conditions is met in step 530, the boundary strength parameter Bs is set to 3. In step 520, when the current block boundary is not an MB boundary, Bs is also set to 3.
If the samples to be filtered are not in an Intra-coded block or an SI/SP slice, a further test is performed in step 521 to determine whether the Coded Block Pattern (CBP) value is equal to 1 (i.e., CBP=1), which implies that at least one of the two adjacent 4×4 blocks on the two sides of the current boundary contains coded coefficients. If the CBP value is equal to 1, Bs is set to 2. When neither of the two adjacent 4×4 blocks contains coded coefficients (i.e., CBP=0), a further test is performed in step 531 to determine the value of Bs. In step 531, if any of the following three conditions is met, Bs is set to 1: (1) the two first boundary pixels belong to different macroblock pairs as indicated by mixedModeEdgeFlag=1; (2) the two adjacent 4×4 blocks on the two sides of the current boundary have different reference frames or a different number of reference frames as indicated by Ref(p)!=Ref(q); or (3) the two adjacent 4×4 blocks on the two sides of the current boundary use a different number of motion vectors as indicated by #mv(p)!=#mv(q). If none of the above three conditions in step 531 is met, a further test is performed in step 541 to determine the value of Bs. In step 541, if the absolute difference between the respective horizontal or vertical components of the motion vectors used for the two adjacent blocks is greater than or equal to 4 in units of quarter luma frame samples (i.e., |mv(p)−mv(q)|>=4 (quarter pel)), Bs is set to 1. Otherwise, Bs is set to 0.
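The decision flow of FIG. 5 described above can be sketched as a single function. This is a simplified illustration, not the normative H.264 derivation. It assumes that each test of steps 510 to 541 has already been reduced to a boolean or a small value by the caller. All parameter names are hypothetical, and the motion vectors are assumed to be (x, y) pairs in quarter-sample units.

```python
def boundary_strength(*, intra_or_si_sp=False, mb_boundary=False,
                      frame_mbs=False, mb_pair_or_field=False,
                      vertical_edge=False, cbp=0,
                      mixed_mode_edge=False, ref_mismatch=False,
                      mv_mismatch=False, mv_p=(0, 0), mv_q=(0, 0)):
    """Return Bs (0 to 4) for one luma block boundary, following FIG. 5."""
    # Step 510: samples in an Intra-coded block or an SI/SP slice?
    if intra_or_si_sp:
        # Step 520: a boundary that is not an MB boundary gets Bs = 3.
        if not mb_boundary:
            return 3
        # Step 530: either condition yields Bs = 4, otherwise Bs = 3.
        if frame_mbs or (mb_pair_or_field and vertical_edge):
            return 4
        return 3
    # Step 521: coded coefficients on either side (CBP = 1).
    if cbp == 1:
        return 2
    # Step 531: mixedModeEdgeFlag = 1, Ref(p) != Ref(q) or #mv(p) != #mv(q).
    if mixed_mode_edge or ref_mismatch or mv_mismatch:
        return 1
    # Step 541: a motion vector component differs by 4 or more quarter samples.
    if abs(mv_p[0] - mv_q[0]) >= 4 or abs(mv_p[1] - mv_q[1]) >= 4:
        return 1
    return 0
```

For example, `boundary_strength(intra_or_si_sp=True, mb_boundary=True, frame_mbs=True)` returns 4, while calling the function with all defaults returns 0, meaning the boundary is not filtered.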
The filter mode for deblocking is selected based on the boundary strength of neighboring blocks and the gradient of samples across the boundary. When the current block boundary to be filtered is an MB boundary, the deblocking filter may update at most 3 pixels on each side of the current block boundary. When the current block boundary to be filtered is not an MB boundary, the deblocking filter updates fewer than 3 pixels on each side of the current block boundary.
The decoding method with the deblocking operation described above is usually implemented using a single processor or core to decode one slice of a video image. However, dual-core and multi-core processors are becoming the trend in personal computer, notebook, tablet and smartphone environments. Such processors can help provide the processing power needed to decode an ultra-high definition (UHD) video bitstream. Each coded picture/image in UHD can be divided into at least two independent slices. It is desirable to use dual-core or multi-core processors to decode independent slices or other picture units concurrently. However, the deblocking process is configured as in-loop processing, where the deblocking of a subsequent adjacent macroblock cannot be performed until the previous macroblock has been deblocked. Because of this data dependency on the adjacent previous macroblock, a current slice cannot be processed until the deblocking process for the previous slice is completed. Accordingly, the data dependency associated with the deblocking process poses a challenge to decoding based on a dual-core or multi-core processor.