Motion estimation is an effective inter-frame coding technique for exploiting temporal redundancy in video sequences. Motion-compensated inter-frame coding has been widely used in various international video coding standards. The motion estimation adopted in these coding standards is often a block-based technique, where motion information such as coding mode and motion vector is determined for each macroblock or similar block configuration. In addition, intra-coding is also adaptively applied, where the picture is processed without reference to any other picture. The inter-predicted or intra-predicted residues are usually further processed by transformation, quantization, and entropy coding to generate a compressed video bitstream. During the encoding process, coding artifacts are introduced, particularly in the quantization process. In order to alleviate the coding artifacts, newer coding systems apply additional processing to the reconstructed video to enhance picture quality. The additional processing is often configured as an in-loop operation so that the encoder and decoder derive the same reference pictures, which improves system performance.
FIG. 1A illustrates an exemplary adaptive inter/intra video coding system incorporating in-loop processing. For inter-prediction, Motion Estimation (ME)/Motion Compensation (MC) 112 is used to provide prediction data based on video data from another picture or other pictures. Switch 114 selects Intra Prediction 110 or inter-prediction data, and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transformation (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to form a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion, mode, and other information associated with the image area. The side information may also be subject to entropy coding to reduce the required bandwidth. Accordingly, the data associated with the side information are provided to Entropy Encoder 122 as shown in FIG. 1A. In the Intra mode, a reconstructed block may be used to form Intra prediction for a spatially neighboring block. Therefore, a reconstructed block from REC 128 may be provided to Intra Prediction 110. When an inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct the video data. The reconstructed video data can be stored in Reference Picture Buffer 134 and used for prediction of other frames.
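The reconstruction loop described above can be illustrated with a minimal sketch. It assumes a uniform scalar quantizer with a hypothetical step size `qstep` and omits the transform and entropy coding for brevity; the function names are illustrative and do not come from any standard or reference software.

```python
def encode_block(block, prediction, qstep):
    """Form residues (Adder 116) and quantize them (Q 120). Transform omitted."""
    residues = [x - p for x, p in zip(block, prediction)]
    # Uniform quantization by rounding; qstep is a hypothetical parameter.
    return [int(round(r / qstep)) for r in residues]


def reconstruct_block(levels, prediction, qstep):
    """Inverse quantization (IQ 124), then add the prediction back (REC 128)."""
    return [p + level * qstep for p, level in zip(prediction, levels)]


block = [101, 105, 97, 90]          # original samples
prediction = [98, 98, 98, 98]       # prediction data from intra or inter prediction
levels = encode_block(block, prediction, qstep=4)
recon = reconstruct_block(levels, prediction, qstep=4)
print(levels)  # quantized residues sent to the entropy encoder
print(recon)   # reconstructed samples, identical at encoder and decoder
```

Because the decoder receives only `levels` and forms the same prediction, it arrives at the same `recon` values as the encoder. The difference between `block` and `recon` is the quantization error that in-loop processing aims to reduce.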
As shown in FIG. 1A, incoming video data undergoes a series of processing steps in the encoding system. As a result, the reconstructed video data from REC 128 may be subject to various impairments. Accordingly, various in-loop processing is applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. In the High Efficiency Video Coding (HEVC) standard being developed, Deblocking Filter (DF) processing module 130, Sample Adaptive Offset (SAO) processing module 131 and Adaptive Loop Filter (ALF) processing module 132 have been developed to enhance picture quality. The in-loop filter information may have to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, in-loop filter information from SAO and ALF is provided to Entropy Encoder 122 for incorporation into the bitstream. In FIG. 1A, DF 130 is applied to the reconstructed video first; SAO 131 is then applied to DF-processed video (i.e., deblocked video); and ALF 132 is applied to SAO-processed video. However, the processing order among DF, SAO and ALF may be re-arranged.
A corresponding decoder for the encoder in FIG. 1A is shown in FIG. 1B. The video bitstream is decoded by Video Decoder 142 to recover the transformed and quantized residues, SAO/ALF information and other system information. At the decoder side, only Motion Compensation (MC) 113 is performed instead of ME/MC. The decoding process is similar to the reconstruction loop at the encoder side. The recovered transformed and quantized residues, SAO/ALF information and other system information are used to reconstruct the video data. The reconstructed video is further processed by DF 130, SAO 131 and ALF 132 to produce the final enhanced decoded video.
SAO processing adopted by HEVC consists of two methods: Band Offset (BO) and Edge Offset (EO). BO classifies pixels into multiple bands according to pixel intensities, and an offset is applied to the pixels in each band. EO classifies pixels into categories according to the relations between a current pixel and its respective neighbors, and an offset is applied to the pixels in each category. In HM-4.0, a pixel can select among seven SAO types: two BO groups (outer group and inner group), four EO directional patterns (0°, 90°, 135°, and 45°) and no processing (OFF). The four EO types are shown in FIG. 2.
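The band classification used by BO can be sketched as follows. This is an illustration under stated assumptions rather than the HM-4.0 implementation: it assumes 32 uniform intensity bands, 8-bit samples by default, and clipping of the result to the valid sample range. The names `band_index`, `apply_band_offset` and the `offsets` mapping are hypothetical.

```python
NUM_BANDS_LOG2 = 5  # assumption: 32 uniform bands


def band_index(pixel, bit_depth=8):
    """Map a pixel intensity to its band by keeping the top 5 bits."""
    return pixel >> (bit_depth - NUM_BANDS_LOG2)


def apply_band_offset(pixels, offsets, bit_depth=8):
    """Add the offset of each pixel's band; bands without an entry get 0."""
    max_val = (1 << bit_depth) - 1
    out = []
    for p in pixels:
        shifted = p + offsets.get(band_index(p, bit_depth), 0)
        out.append(min(max(shifted, 0), max_val))  # assumed clipping step
    return out


# Hypothetical offsets for bands 0, 12 and 31.
print(apply_band_offset([100, 101, 255, 7], {12: 2, 31: 3, 0: -9}))
```

With 8-bit samples each band spans eight consecutive intensity values, so pixels 100 and 101 fall into the same band and receive the same offset.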
Upon classification of all pixels in a picture or a region, one offset is derived and transmitted for the pixels in each category. In HM-4.0, SAO processing is applied to luma and chroma components, and each of the components is independently processed. One offset is derived for all pixels of each category except for category 4 of EO, which is forced to use zero offset. Table 1 below lists the EO pixel classification, where “C” denotes the pixel to be classified. As shown in Table 1, the conditions for determining a category involve comparing the current pixel value with the two respective neighbor values according to the EO type. The category is determined from the comparison results (i.e., “>”, “<” or “=”).
TABLE 1
Category    Condition
0           C < two neighbors
1           C < one neighbor && C == one neighbor
2           C > one neighbor && C == one neighbor
3           C > two neighbors
4           None of the above
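The classification in Table 1 can be expressed as a short sketch. The category numbering follows Table 1. The neighbor positions for each EO type are an assumption based on the usual geometry of the four directions (FIG. 2 is not reproduced here), and the names `eo_category`, `EO_NEIGHBORS` and `classify_pixel` are hypothetical.

```python
def eo_category(c, a, b):
    """Classify current pixel c against its two neighbors a and b per Table 1."""
    if c < a and c < b:
        return 0  # C < two neighbors
    if (c < a and c == b) or (c == a and c < b):
        return 1  # C < one neighbor && C == one neighbor
    if (c > a and c == b) or (c == a and c > b):
        return 2  # C > one neighbor && C == one neighbor
    if c > a and c > b:
        return 3  # C > two neighbors
    return 4      # none of the above (zero offset)


# Assumed (dy, dx) positions of the two neighbors for each EO type in degrees.
EO_NEIGHBORS = {
    0: ((0, -1), (0, 1)),      # left and right
    90: ((-1, 0), (1, 0)),     # above and below
    135: ((-1, -1), (1, 1)),   # upper-left and lower-right
    45: ((-1, 1), (1, -1)),    # upper-right and lower-left
}


def classify_pixel(img, y, x, eo_type):
    """Return the Table 1 category of an interior pixel img[y][x]."""
    (dy0, dx0), (dy1, dx1) = EO_NEIGHBORS[eo_type]
    return eo_category(img[y][x], img[y + dy0][x + dx0], img[y + dy1][x + dx1])


img = [[5, 9, 1],
       [7, 5, 5],
       [3, 2, 9]]
print([classify_pixel(img, 1, 1, t) for t in (0, 90, 135, 45)])
```

The sketch handles interior pixels only; how pixels on a picture or region boundary are treated is not covered here and would need to be decided by the caller.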
In the HEVC reference software, the deblocking filter (DF) processes a whole picture before SAO processing is applied to the deblocked picture. This means that a frame buffer is necessary between DF and SAO. FIG. 3A illustrates an example of software-based implementation, where a frame buffer 312 is used to store a picture processed by DF 310. SAO processing 314 then reads DF-processed data (i.e., deblocked data) from frame buffer 312. For hardware-based implementation, the frame buffer can be implemented using an external memory; however, this would result in a high bandwidth overhead. On the other hand, implementing the frame buffer in an internal memory (i.e., on-chip memory) would result in higher chip cost.
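To give a sense of scale for the bandwidth overhead, the following back-of-the-envelope sketch estimates the memory traffic of a frame buffer placed between DF and SAO. The picture size, frame rate, 4:2:0 chroma format and 8-bit sample depth are assumptions chosen for illustration, and `frame_buffer_traffic` is a hypothetical helper.

```python
def frame_buffer_traffic(width, height, fps, bytes_per_sample=1):
    """Return (bytes per picture, bytes per second) for a DF-to-SAO frame buffer.

    Assumes 4:2:0 sampling, i.e. 1.5 samples per luma position.
    """
    frame_bytes = width * height * 3 // 2 * bytes_per_sample
    # DF writes each deblocked picture once; SAO reads it back once.
    return frame_bytes, 2 * frame_bytes * fps


frame_bytes, traffic = frame_buffer_traffic(1920, 1080, fps=30)
print(frame_bytes)  # size of one buffered picture in bytes
print(traffic)      # write plus read traffic in bytes per second
```

Under these assumptions one buffered picture occupies about 3.1 MB, and writing and re-reading every picture at 30 frames per second adds roughly 187 MB/s of traffic if the buffer is held in external memory.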
For hardware-based implementation, system cost is a sensitive issue and neither the external frame memory nor the internal frame memory can offer an affordable solution. In addition, the high bandwidth associated with the external memory approach not only increases system design complexity, but also causes high power consumption. In conventional video coding systems, operations such as motion estimation/compensation and DCT/IDCT have been implemented using block-based processing. In block-based implementation, the picture may be partitioned into MBs (macroblocks) or LCUs (largest coding units). Picture processing is based on rows of LCUs/MBs or tiles, where a tile comprises Nx×Ny LCUs (or MBs), and Nx and Ny are positive integers. A hardware-based coding system incorporating DF and SAO is shown in FIG. 3B. An LCU buffer 322 is used to store LCUs processed by DF 320. Usually some LCUs in the boundary region of two LCU rows or two tiles need to be buffered due to data dependency associated with DF and SAO. SAO processing 324 then reads the DF-processed LCU (i.e., deblocked LCU) and stores the output in an output buffer 326. The overhead associated with block-based processing corresponds to the video data that must be buffered for LCUs between any two neighboring block rows or two neighboring tiles. Therefore, it is desirable to reduce the buffer requirement for an encoder or a decoder incorporating DF and SAO. In a conventional block-based system with a pipeline structure, processing for a block in a current stage usually needs to be finished before the processing moves to the next stage. It is also desirable to improve the efficiency of the pipeline processing.
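The partitioning described above can be made concrete with a small arithmetic sketch. It assumes a 1920×1080 picture, 64×64 LCUs, 4:2:0 sampling and 8-bit samples, none of which are fixed by the text, and the helper names are hypothetical. It counts only the storage for a single LCU and ignores the additional boundary data that must be kept between LCU rows or tiles.

```python
def lcu_grid(width, height, lcu_size=64):
    """Return (LCUs per row, LCU rows), rounding partial LCUs up."""
    cols = -(-width // lcu_size)   # ceiling division
    rows = -(-height // lcu_size)
    return cols, rows


def lcu_bytes(lcu_size=64, bytes_per_sample=1):
    """Storage for one LCU, assuming 4:2:0 sampling."""
    return lcu_size * lcu_size * 3 // 2 * bytes_per_sample


cols, rows = lcu_grid(1920, 1080)
print(cols, rows, cols * rows)  # LCU grid dimensions and total LCU count
print(lcu_bytes())              # bytes needed to hold one deblocked LCU
```

Under these assumptions a single LCU needs a few kilobytes of storage, compared with several megabytes for a full picture, which is why the size of the boundary data buffered between LCU rows or tiles becomes the dominant cost in a block-based design.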