Motion estimation is an effective inter-frame coding technique to exploit temporal redundancy in video sequences. Motion-compensated inter-frame coding has been widely used in various international video coding standards. The motion estimation adopted in various coding standards is often a block-based technique, where motion information such as coding mode and motion vector is determined for each macroblock or similar block configuration. In addition, intra-coding is also adaptively applied, where the picture is processed without reference to any other picture. The inter-predicted or intra-predicted residues are usually further processed by transformation, quantization, and entropy coding to generate a compressed video bitstream. During the encoding process, coding artifacts are introduced, particularly in the quantization process. In order to alleviate the coding artifacts, additional processing has been applied to reconstructed video to enhance picture quality in newer coding systems. The additional processing is often configured in an in-loop operation so that the encoder and decoder may derive the same reference pictures to achieve improved system performance.
FIG. 1A illustrates an exemplary adaptive inter/intra video coding system incorporating in-loop processing. For inter-prediction, Motion Estimation (ME)/Motion Compensation (MC) 112 is used to provide prediction data based on video data from other picture or pictures. Switch 114 selects Intra Prediction 110 or inter-prediction data and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transformation (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to form a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion, mode, and other information associated with the image area. The side information may also be subject to entropy coding to reduce required bandwidth. Accordingly, the data associated with the side information are provided to Entropy Encoder 122 as shown in FIG. 1A. When an inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
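The reconstruction loop described above can be illustrated with a minimal Python sketch. It assumes an identity transform and a uniform scalar quantizer with a hypothetical step size `qstep`; the function names are illustrative only and are not taken from any codec. The sketch is meant to show only that the encoder rebuilds its reference from the quantized levels, which is the same data the decoder receives.

```python
def quantize_residues(current, prediction, qstep):
    """Form residues (current - prediction) and quantize them to integer levels."""
    return [round((c - p) / qstep) for c, p in zip(current, prediction)]


def reconstruct(levels, prediction, qstep):
    """Inverse-quantize the levels and add them back to the prediction."""
    return [p + level * qstep for p, level in zip(prediction, levels)]


current = [100, 104, 97, 90]    # source samples of one block (made-up values)
prediction = [98, 98, 98, 98]   # intra- or inter-prediction data (made-up values)
qstep = 5                       # assumed quantization step

# The levels are what would be passed on to the entropy encoder.
levels = quantize_residues(current, prediction, qstep)

# Encoder and decoder run the same reconstruction on the same levels,
# so both sides hold an identical (lossy) reference block.
encoder_reference = reconstruct(levels, prediction, qstep)
decoder_output = reconstruct(levels, prediction, qstep)
```

Because quantization is lossy, the reconstructed block generally differs from the source block, which is why the encoder cannot simply use the original picture as its reference.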
As shown in FIG. 1A, incoming video data undergoes a series of processing steps in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments introduced by these processing steps. Accordingly, various in-loop processing is applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. In the High Efficiency Video Coding (HEVC) standard being developed, Deblocking Filter (DF) 130, Sample Adaptive Offset (SAO) 131 and Adaptive Loop Filter (ALF) 132 have been developed to enhance picture quality. The in-loop filter information may have to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, in-loop filter information from SAO and ALF is provided to Entropy Encoder 122 for incorporation into the bitstream. In FIG. 1A, DF 130 is applied to the reconstructed video first; SAO 131 is then applied to DF-processed video; and ALF 132 is applied to SAO-processed video. However, the processing order among DF, SAO and ALF can be re-arranged.
A corresponding decoder for the encoder of FIG. 1A is shown in FIG. 1B. The video bitstream is decoded by Video Decoder 142 to recover the transformed and quantized residues, SAO/ALF information and other system information. At the decoder side, only Motion Compensation (MC) 113 is performed instead of ME/MC. The decoding process is similar to the reconstruction loop at the encoder side. The recovered transformed and quantized residues, SAO/ALF information and other system information are used to reconstruct the video data. The reconstructed video is further processed by DF 130, SAO 131 and ALF 132 to produce the final enhanced decoded video.
The coding process in HEVC encodes or decodes a picture using a block structure named Largest Coding Unit (LCU). The LCU is adaptively partitioned into coding units (CUs) using a quadtree. In each leaf CU, DF is performed for each 8×8 block; in HEVC Test Model Version 4.0 (HM-4.0), the DF is applied to 8×8 block boundaries. For each 8×8 block, horizontal filtering across vertical block boundaries (also called vertical edges) is first applied, and then vertical filtering across horizontal block boundaries (also called horizontal edges) is applied. During processing of a luma block boundary, four pixels on each side of the boundary are involved in filter parameter derivation, and up to three pixels on each side of the boundary may be changed after filtering. FIG. 2A illustrates the pixels involved in the DF process for a vertical edge 210 between two blocks, where each smallest square represents one pixel. The pixels on the left side (i.e., pixel columns p0 to p3 as indicated by 220) of the edge are from one 8×8 block, and the pixels on the right side (i.e., pixel columns q0 to q3 as indicated by 230) of the edge are from another 8×8 block. In the DF process according to HM-4.0, the coding information of the two 8×8 blocks is used to calculate the boundary strength of the edge first. However, there are also variations where the boundary strength is determined using other schemes. After the boundary strength is determined, columns p0-p3 and q0-q3 of the reconstructed pixels are used to derive filter parameters including filter on/off decision and strong/weak filter selection as shown in FIG. 2B and FIG. 2C respectively. FIG. 2B illustrates an example of filter on/off decision based on pixels from the third line 240 (counted from top) and the sixth line 250 according to HM-4.0. FIG. 2C illustrates an example of filter strong/weak decision for each line based on respective boundary pixels as indicated by the thick-lined boxes 260-267.
In HM-4.0, the derivation is only required for the luma component. Finally, reconstructed pixels are horizontally filtered to generate DF intermediate pixels. During the luma filtering horizontally across the vertical boundary 210, pixels in columns p0-p3 and q0-q3 are referenced, but only pixels in columns p0-p2 and q0-q2 may be modified (i.e., filtered).
For horizontal filtering across vertical block boundaries, unfiltered reconstructed pixels (i.e., pre-DF pixels) are used for filter parameter derivation and also used as source pixels for the filter operation. For vertical filtering across horizontal block boundaries, unfiltered reconstructed pixels (i.e., pre-DF pixels) are used for filter parameter derivation, and DF intermediate pixels (i.e., pixels after horizontal filtering) are used as source pixels for the vertical filtering. For the DF process of a chroma block boundary, two pixels on each side are involved in filter parameter derivation, and at most one pixel on each side may be modified after filtering. During chroma filtering, pixels in columns p0-p1 and q0-q1 are referenced, but only pixels in columns p0 and q0 are filtered.
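The data flow of the two filtering passes can be sketched as follows. This is a minimal illustration, not the HM-4.0 filter: `toy_decision` and `toy_filter` are hypothetical stand-ins that touch only p0 and q0, and the 4×4 sample array is made up. What the sketch is intended to show is which picture each pass reads: both passes derive their decisions from the pre-DF pixels, while the vertical pass takes its source pixels from the DF intermediate pixels.

```python
def toy_decision(p0, q0, threshold=8):
    """Hypothetical on/off rule: filter only small steps across the edge."""
    return abs(p0 - q0) < threshold


def toy_filter(p0, q0):
    """Hypothetical smoothing: move p0 and q0 toward each other."""
    delta = (q0 - p0) // 4
    return p0 + delta, q0 - delta


def deblock_two_pass(recon, edge_x, edge_y):
    """Filter one vertical edge (column edge_x) and one horizontal edge (row edge_y)."""
    height, width = len(recon), len(recon[0])

    # Pass 1: horizontal filtering across the vertical edge.
    # Decisions and source pixels both come from the pre-DF picture.
    intermediate = [row[:] for row in recon]
    for y in range(height):
        p0, q0 = recon[y][edge_x - 1], recon[y][edge_x]
        if toy_decision(p0, q0):
            intermediate[y][edge_x - 1], intermediate[y][edge_x] = toy_filter(p0, q0)

    # Pass 2: vertical filtering across the horizontal edge.
    # Decisions still come from the pre-DF picture, but the source
    # pixels are the DF intermediate pixels produced by pass 1.
    output = [row[:] for row in intermediate]
    for x in range(width):
        if toy_decision(recon[edge_y - 1][x], recon[edge_y][x]):
            p0, q0 = intermediate[edge_y - 1][x], intermediate[edge_y][x]
            output[edge_y - 1][x], output[edge_y][x] = toy_filter(p0, q0)

    return intermediate, output


recon = [
    [10, 10, 14, 14],
    [10, 10, 14, 14],
    [14, 14, 18, 18],
    [14, 14, 18, 18],
]
intermediate, output = deblock_two_pass(recon, edge_x=2, edge_y=2)
```

Keeping the pre-DF picture available during the vertical pass is what lets the decisions stay independent of the horizontal filtering result, at the cost of holding both versions of the boundary pixels.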
FIG. 3 illustrates the boundary pixels involved in the DF process for a horizontal edge 310, where each smallest square represents one pixel. The pixels on the upper side (i.e., pixel rows p0 to p3 as indicated by 320) of the edge are from one 8×8 block, and the pixels on the lower side (i.e., pixel rows q0 to q3 as indicated by 330) of the edge are from another 8×8 block. The DF process for the horizontal edge is similar to the DF process for the vertical edge. First, the coding information of the two 8×8 blocks is used to calculate the boundary strength of the edge. Next, rows p0-p3 and q0-q3 of reconstructed pixels are used to derive filter parameters including filter on/off decision and strong/weak filter selection. Again, this is only required for luma. In HM-4.0, reconstructed pixels are used for deriving filter decisions. Finally, DF intermediate pixels are vertically filtered to generate DF output pixels. During the luma filtering, pixels in rows p0-p3 and q0-q3 are referenced, but only pixels in rows p0-p2 and q0-q2 are filtered. During chroma filtering, pixels in rows p0-p1 and q0-q1 are referenced, but only pixels in rows p0 and q0 are filtered.
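The referenced and modifiable pixel positions stated above for luma and chroma can be summarized in a short sketch. The function name and the convention that `edge_pos` is the index of q0 (the first row or column on the far side of the edge) are assumptions made for illustration.

```python
def df_sample_ranges(edge_pos, component="luma"):
    """Return (referenced, modifiable) index ranges around one block edge.

    edge_pos is the index of q0; p0 is therefore at edge_pos - 1.
    The same offsets apply to columns (vertical edge) and rows (horizontal edge).
    """
    if component == "luma":
        n_referenced, n_modifiable = 4, 3   # p0-p3 / q0-q3 read, p0-p2 / q0-q2 written
    elif component == "chroma":
        n_referenced, n_modifiable = 2, 1   # p0-p1 / q0-q1 read, p0 / q0 written
    else:
        raise ValueError("component must be 'luma' or 'chroma'")
    referenced = range(edge_pos - n_referenced, edge_pos + n_referenced)
    modifiable = range(edge_pos - n_modifiable, edge_pos + n_modifiable)
    return referenced, modifiable


luma_ref, luma_mod = df_sample_ranges(8, "luma")
chroma_ref, chroma_mod = df_sample_ranges(8, "chroma")
```

For an edge whose q0 sits at index 8, the luma filter reads indices 4 through 11 and may write indices 5 through 10, so the outermost referenced pixel on each side (p3 and q3) is read but never changed.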
When DF is processed on an LCU by LCU basis in a raster scan order, there will be data dependency between LCUs as shown in FIG. 4A through FIG. 4D. Vertical edges in each LCU are horizontally filtered first and horizontal edges are then vertically filtered. The rightmost vertical edge of the current LCU cannot be horizontally filtered until the involved boundary pixels from the next LCU become available. Similarly, the lowest horizontal edge of the current LCU cannot be vertically filtered until the involved boundary pixels from the LCU below become available. Accordingly, data buffers are required to accommodate the filtering operation due to the data dependency. For the horizontal DF process of the vertical boundary between two adjacent LCUs, four reconstructed pixel columns of one LCU height are required for the luma component and two reconstructed pixel columns of one LCU height are required for the chroma component. FIG. 4A illustrates the pixels involved in the DF process of the vertical boundary between a current LCU 410 and an adjacent LCU 412 on the left, where four pixel columns from the adjacent LCU 412 are required. Similarly, four pixel rows from the above LCUs will also be buffered for the vertical DF process. Accordingly, four pixel rows for the adjacent LCUs 410a, 420a, 412a and 422a corresponding to LCUs 410, 420, 412 and 422 respectively are buffered. In FIGS. 4A-4D, an unfiltered pixel 401 is indicated by a non-shaded smallest square. On the other hand, a horizontally filtered pixel 402, a vertically filtered pixel 403, and a horizontally and vertically filtered pixel 404 are indicated by different shaded patterns. As shown in FIG. 4A, the three pixel columns on each side of the vertical boundaries may be changed after the horizontal DF filtering.
After horizontal filtering of the vertical edges of LCU 410, the vertical DF process can be applied to the horizontal edges of LCU 410 except for the bottom edge. The horizontally filtered pixels, vertically filtered pixels, and horizontally and vertically filtered pixels after the vertical DF filtering are shown in FIG. 4B. The DF process is then moved to the next LCU 420. The horizontal DF process is applied to the vertical edges of LCU 420 except for the rightmost edge, and the horizontally filtered pixels are indicated by respective shaded areas in FIG. 4C. The boundary pixels of LCU 410 corresponding to the vertical edge between LCU 410 and LCU 420 are also processed by the horizontal DF process during this step. After the horizontal DF process of the vertical edges of LCU 420, the vertical DF process is applied to the horizontal edges of LCU 420 except for the bottom edge. The corresponding processed pixels are shown in FIG. 4D. The DF process shown in FIG. 4A to FIG. 4D is intended to illustrate an example of data dependency associated with the DF process. Depending on the particular DF process used, the line and column buffer requirements due to data dependency may be different.
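The deferral of the rightmost and bottom edges can be expressed as a small sketch along one axis. The function name, the 8-pixel edge grid parameter, and the choice to skip the edge at position 0 (treated here as the picture boundary) are assumptions for illustration; the small LCU size of 16 is used only to keep the output short.

```python
def edges_filterable_now(lcu_origin, lcu_size, grid=8):
    """Edge positions along one axis that can be filtered while this LCU is processed.

    The edge at lcu_origin is shared with the previous LCU and can be filtered
    now because the buffered pixels of that LCU are available. The far edge at
    lcu_origin + lcu_size is deferred until the next LCU is processed.
    """
    # Assumption: position 0 is the picture boundary and is not filtered.
    start = lcu_origin if lcu_origin > 0 else grid
    return list(range(start, lcu_origin + lcu_size, grid))


first_lcu_edges = edges_filterable_now(0, 16)    # far edge at 16 is deferred
second_lcu_edges = edges_filterable_now(16, 16)  # picks up the deferred edge at 16
```

The edge at position 16 does not appear in the first list and appears in the second, which mirrors how the boundary between LCU 410 and LCU 420 is filtered only when LCU 420 is processed.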
In addition to pixel line buffers for unfiltered and filtered pixels of neighboring LCUs, there is also a need for storing other information to support LCU-based DF process.
For hardware-based implementations, the column buffers for the preceding pixel columns are often implemented as on-chip registers or SRAMs since the storage requirement is relatively small. For example, four reconstructed pixel columns of one LCU height and two reconstructed pixel columns of one LCU height are required for processing DF on luma and chroma respectively. On the other hand, the line buffers for storing the four pixel rows of one picture width for luma and two pixel rows of one picture width for chroma corresponding to the LCUs above may be sizeable, particularly for large size pictures. Line buffer implementation based on on-chip memory (e.g., Static Random Access Memory (SRAM)) may significantly increase the chip cost. Conversely, line buffer implementation based on off-chip memory (e.g., Dynamic Random Access Memory (DRAM)) may significantly increase power consumption and system bandwidth. Therefore, it is desirable to reduce line buffers required for the DF process.
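The difference in scale between the column buffers and the line buffers can be made concrete with a rough calculation. The sketch below assumes 4:2:0 sampling (two chroma planes, each at half the luma width and height), an LCU size of 64, and counts samples rather than bytes; none of these values is stated in the text above, and the function name is hypothetical.

```python
def df_buffer_samples(picture_width, lcu_size, chroma_div=2, chroma_planes=2):
    """Estimate DF buffer sizes in samples under the stated assumptions."""
    # Column buffers: 4 luma columns and 2 chroma columns of one LCU height.
    luma_columns = 4 * lcu_size
    chroma_columns = chroma_planes * 2 * (lcu_size // chroma_div)
    # Line buffers: 4 luma rows and 2 chroma rows of one picture width.
    luma_rows = 4 * picture_width
    chroma_rows = chroma_planes * 2 * (picture_width // chroma_div)
    return {
        "column_buffer": luma_columns + chroma_columns,
        "line_buffer": luma_rows + chroma_rows,
    }


hd = df_buffer_samples(picture_width=1920, lcu_size=64)
uhd = df_buffer_samples(picture_width=3840, lcu_size=64)
```

Under these assumptions a 1920-pixel-wide picture needs 384 samples of column buffer but 11520 samples of line buffer, and doubling the picture width doubles the line buffer while leaving the column buffer unchanged.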