In video coding, various technologies have been developed to improve coding efficiency by reducing the compressed video bitrate at comparable video quality. With growing demand for higher-resolution video (high definition and beyond), there is an increasing need for even higher coding efficiency than previous standards provide. This led to the development of the High-Efficiency Video Coding (HEVC) standard. HEVC incorporates many video coding tools that substantially improve coding efficiency and meet the requirements of diverse applications. A typical HEVC decoder includes entropy decoding, inverse scaling and quantization, inverse transformation, intra-picture prediction, inter-picture prediction, and in-loop filtering.
The Deblocking Filter (DF) and the Sample Adaptive Offset (SAO) filter are in-loop filters used in HEVC. An SAO filter adds adaptive offsets to reconstructed pixels to compensate for distortion in the reconstructed video.
FIG. 1 illustrates an exemplary HEVC decoding system incorporating sample adaptive offset (SAO), as used in HEVC Test Model 7.0 (HM-7.0). The bitstream is decoded by entropy decoding 110, whose output includes intra mode information, which is fed to intra prediction 111; inter mode information, which is fed to motion compensation (MC) 112; adaptive loop filter information, which is fed to adaptive loop filter (ALF) 132; sample adaptive offset information, which is fed to SAO 131; and residues, which are fed to inverse quantization (IQ) 120. For intra prediction, intra prediction 111 provides intra prediction data based on the intra mode information from entropy decoding 110. For inter prediction, MC 112 provides reference picture data based on the inter mode information and previously reconstructed video data from one or more other pictures. Depending on the mode, either the intra prediction data or the inter prediction data is provided to reconstruction (REC) 122 for reconstructing the video data. The entropy-decoded residues are processed by IQ 120 followed by inverse transformation (IT) 121 to recover the residues. The recovered residues are then supplied to REC 122, where they are combined with the predicted data to reconstruct the video data. The reconstructed video data from REC 122 is used to reconstruct subsequent blocks in the same picture (intra mode) or other pictures (inter mode). For inter mode, the reconstructed video data is stored in reference picture buffer 133. However, loop filters are usually applied to the reconstructed video data before it is stored. In FIG. 1, the reconstructed video data is processed by three filters: deblocking filter (DF) 130, SAO 131, and ALF 132. DF 130 is applied to the reconstructed video data first, and SAO 131 is then applied to the deblocked video data from DF 130.
Sample adaptive offset information from entropy decoding 110 is provided to SAO 131 for proper SAO operation. ALF 132 is applied to the processed video data from SAO 131, and adaptive loop filter information from entropy decoding 110 is provided to ALF 132 for proper ALF operation. The processed reconstructed video data from ALF 132 is then stored in reference picture buffer 133 and used by MC 112 to generate reference pictures for predicting other frames.
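The data flow of FIG. 1 can be summarized in a short sketch. It assumes pictures stored as 2-D lists of integer samples and uses hypothetical callables `deblock`, `sao`, and `alf` as stand-ins for DF 130, SAO 131, and ALF 132. Clipping the sum of prediction and residue to the valid sample range is a standard reconstruction step that FIG. 1 does not spell out; the function names are illustrative only.

```python
from typing import Callable, List

Picture = List[List[int]]          # rows of integer samples (assumed layout)
Filter = Callable[[Picture], Picture]


def reconstruct(pred: Picture, resid: Picture, bit_depth: int = 8) -> Picture:
    """REC 122: add recovered residues to predicted samples, clipped to the sample range."""
    max_val = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, resid_row)]
        for pred_row, resid_row in zip(pred, resid)
    ]


def in_loop_filter(rec: Picture, deblock: Filter, sao: Filter, alf: Filter) -> Picture:
    """Apply the in-loop filters in the FIG. 1 order: DF 130, then SAO 131, then ALF 132.

    `deblock`, `sao` and `alf` are hypothetical stand-ins; each receives the
    output of the previous stage. The result is what would be stored in
    reference picture buffer 133.
    """
    deblocked = deblock(rec)
    sao_out = sao(deblocked)
    return alf(sao_out)
```

For example, `reconstruct([[250, 3]], [[10, -5]])` yields `[[255, 0]]` for 8-bit samples. Keeping each filter as a separate callable mirrors the separate DF, SAO, and ALF blocks of FIG. 1. Intra prediction of later blocks in the same picture uses the output of `reconstruct`, while the output of `in_loop_filter` is what feeds MC 112 for other pictures.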
As shown in FIG. 1, the High Efficiency Video Coding (HEVC) decoder applies three in-loop filters, DF, SAO, and ALF, to the reconstructed video data to improve its quality. DF 130 is applied to block boundary pixels, and its processing depends on the underlying pixel data of the reconstructed video and on coding information associated with the corresponding blocks. SAO and ALF processing, on the other hand, is adaptive: filter information such as filter parameters and filter type may be determined dynamically at the encoder by analyzing the underlying video data. Therefore, the filter information associated with SAO and ALF is incorporated into the video bitstream so that the decoder can recover the information required for SAO and ALF. During decoding, this filter information is decoded and provided to SAO and ALF, respectively, for proper operation.
The decoding process, like the coding process, in HEVC is performed on a Largest Coding Unit (LCU) basis. Each LCU is adaptively partitioned into coding units (CUs) using a quadtree. In each leaf CU, DF filtering is first applied to the boundary pixels of each block, and SAO filtering is then applied to all applicable pixels of each block. In HEVC Test Model Version 7.0 (HM-7.0), DF is applied to the block boundaries of each 8×8 block. For each 8×8 block, horizontal filtering across vertical block boundaries is applied first, followed by vertical filtering across horizontal block boundaries. FIG. 2A illustrates an example of a vertical block boundary 210 with 4 boundary pixels on each side. The boundary pixels are designated q0, q1, q2, q3 and p0, p1, p2, p3, where q0 and p0 are the two pixels immediately adjacent to the vertical boundary. FIG. 2B illustrates an example of a horizontal block boundary 220 with 4 boundary pixels on each side, again designated q0, q1, q2, q3 and p0, p1, p2, p3, where q0 and p0 are the two pixels immediately adjacent to the horizontal boundary. For each picture, pixel rows crossing one or more vertical boundaries can be filtered horizontally in parallel to improve processing speed. After horizontal filtering across vertical boundaries, pixel columns crossing one or more horizontal boundaries can be filtered vertically in parallel.
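The boundary geometry above can be made concrete with a small sketch that lists the 8×8 edge positions and the p0..p3 / q0..q3 sample coordinates for each boundary. It assumes p samples lie to the left of (or above) the boundary and q samples to the right of (or below) it, which is the usual HEVC convention; FIG. 2A and FIG. 2B may label the sides differently. Only the geometry and pass order are shown, not the filter decisions or taps, and all names are hypothetical.

```python
from typing import List, Tuple

GRID = 8  # HM-7.0 applies DF to the boundaries of each 8x8 block
Coord = Tuple[int, int]  # (x, y)


def vertical_edge_columns(width: int) -> List[int]:
    """x positions of the vertical 8x8 block boundaries inside the picture."""
    return list(range(GRID, width, GRID))


def horizontal_edge_rows(height: int) -> List[int]:
    """y positions of the horizontal 8x8 block boundaries inside the picture."""
    return list(range(GRID, height, GRID))


def vertical_boundary_samples(x: int, y: int) -> Tuple[List[Coord], List[Coord]]:
    """(p0..p3, q0..q3) on row y for a vertical boundary at column x.

    Assumed convention: p samples lie to the left, q samples to the right.
    """
    p = [(x - 1 - i, y) for i in range(4)]  # p0 = (x-1, y) touches the boundary
    q = [(x + i, y) for i in range(4)]      # q0 = (x, y) touches the boundary
    return p, q


def horizontal_boundary_samples(x: int, y: int) -> Tuple[List[Coord], List[Coord]]:
    """(p0..p3, q0..q3) in column x for a horizontal boundary at row y (p above, q below)."""
    p = [(x, y - 1 - i) for i in range(4)]
    q = [(x, y + i) for i in range(4)]
    return p, q


def deblocking_passes(width: int, height: int) -> List[Tuple[str, List[int]]]:
    """The two DF passes for one picture, in order.

    Pass 1 filters horizontally across every vertical edge; pixel rows are
    independent, so they can run in parallel. Pass 2 then filters vertically
    across every horizontal edge, and pixel columns can run in parallel.
    """
    return [
        ("horizontal filtering across vertical edges", vertical_edge_columns(width)),
        ("vertical filtering across horizontal edges", horizontal_edge_rows(height)),
    ]
```

With an 8-pixel grid and 4 pixels examined on each side, the p3 sample of one vertical edge (at x-4) never coincides with the q3 sample of the previous edge (at x-5), so the edges along a row also do not share samples.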
Sample adaptive offset (SAO) is also adopted in HM-7.0, as shown in FIG. 1. SAO is a per-pixel in-loop filtering process. SAO can divide a picture into multiple LCU-aligned regions, and for each region one SAO type is selected from the following: two Band Offset (BO) types, four Edge Offset (EO) types, and no processing (OFF). Each SAO type uses a different filtering method. For the BO types, each to-be-processed pixel is mapped to a band based on its intensity, where the full pixel intensity range is divided equally into 32 bands. One offset is derived for all pixels in each band, and the offsets are selected and coded. For the EO types, pixels are first classified into different groups (also called categories or classes). The classification of each pixel is based on a gradient calculated over a 3×3 window, as shown in FIG. 3, where four configurations corresponding to 0°, 90°, 135°, and 45° are used for classification.
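Because the 32 bands are equal in width, the band of a sample can be found with a single right shift by (bit depth - 5). The sketch below assumes integer samples of a given bit depth and represents the decoded offsets as a hypothetical dictionary from band index to offset; it does not model how the two BO types in HM-7.0 select which bands carry offsets.

```python
from typing import Dict, List

NUM_BANDS = 32  # full intensity range split into 32 equal bands


def band_index(sample: int, bit_depth: int = 8) -> int:
    """Band of a sample: each band spans 2**(bit_depth - 5) intensity values."""
    return sample >> (bit_depth - 5)


def apply_band_offset(samples: List[int], offsets: Dict[int, int], bit_depth: int = 8) -> List[int]:
    """Add the offset of each sample's band; bands without an offset are unchanged.

    `offsets` maps band index -> signed offset (a hypothetical representation of
    the decoded BO parameters). Results are clipped to the valid sample range.
    """
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        adjusted = s + offsets.get(band_index(s, bit_depth), 0)
        out.append(min(max(adjusted, 0), max_val))
    return out
```

For 8-bit video each band covers 8 intensity values, so a sample of 100 falls in band 12, and `apply_band_offset([100, 200], {12: 3})` returns `[103, 200]`.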
Upon classification of all pixels in a picture or region, one offset is derived and transmitted for each group of pixels. In HM-7.0, SAO is applied to both the luma and chroma components, and each component is processed independently. For EO, one offset is derived for all pixels of each category except category 4, which is forced to use a zero offset. Table 1 below lists the EO pixel classification, where “C” denotes the pixel to be classified.
TABLE 1
Category    Condition
0           C < two neighbors
1           C < one neighbor && C == one neighbor
2           C > one neighbor && C == one neighbor
3           C > two neighbors
4           None of the above
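Table 1 can be expressed as a small classification function. The sketch pairs it with neighbor positions for the four EO configurations; those positions (left and right for 0°, above and below for 90°, and the two diagonals for 135° and 45°) are an assumption based on common descriptions of HEVC EO, since FIG. 3 is not reproduced here. The names `eo_category`, `apply_edge_offset`, and `EO_NEIGHBORS` are hypothetical.

```python
from typing import Dict, List, Tuple

Picture = List[List[int]]

# (dx, dy) of the two neighbors compared with C for each EO configuration.
# Assumed layout following common HEVC descriptions; FIG. 3 is not reproduced.
EO_NEIGHBORS: Dict[int, Tuple[Tuple[int, int], Tuple[int, int]]] = {
    0: ((-1, 0), (1, 0)),      # horizontal
    90: ((0, -1), (0, 1)),     # vertical
    135: ((-1, -1), (1, 1)),   # top-left / bottom-right diagonal
    45: ((1, -1), (-1, 1)),    # top-right / bottom-left diagonal
}


def eo_category(c: int, a: int, b: int) -> int:
    """Classify pixel C against its two neighbors a and b according to Table 1."""
    if c < a and c < b:
        return 0
    if (c < a and c == b) or (c < b and c == a):
        return 1
    if (c > a and c == b) or (c > b and c == a):
        return 2
    if c > a and c > b:
        return 3
    return 4  # none of the above: zero offset


def apply_edge_offset(pic: Picture, angle: int, offsets: List[int], bit_depth: int = 8) -> Picture:
    """Apply EO with one offset per category 0..3 (category 4 always gets zero).

    Classification reads the unmodified input so results do not depend on scan
    order. Samples on the picture border are left unchanged (an assumption).
    """
    (dx0, dy0), (dx1, dy1) = EO_NEIGHBORS[angle]
    max_val = (1 << bit_depth) - 1
    h, w = len(pic), len(pic[0])
    out = [row[:] for row in pic]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            cat = eo_category(pic[y][x], pic[y + dy0][x + dx0], pic[y + dy1][x + dx1])
            if cat != 4:
                out[y][x] = min(max(pic[y][x] + offsets[cat], 0), max_val)
    return out
```

In this layout every configuration except 0° compares C with a pixel in the row above and a pixel in the row below, which is why EO needs neighboring lines of decoded data.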
In an HEVC system, SAO can substantially improve coding efficiency. However, SAO references multiple neighboring pixels when calculating the gradient for each pixel. For example, for the EO types, the gradient is calculated over a 3×3 window of pixels centered on the to-be-processed pixel. Because of this neighboring-pixel referencing, the decoding system must buffer decoded video data from neighboring lines for SAO. This additional line buffer must be implemented as either additional internal memory or external memory. In the HEVC standard, the LCUs in a picture may be divided into tiles so that the picture can be processed tile by tile. The LCUs in a picture may also be grouped into LCU rows for LCU-row-based processing. The boundaries between tiles or between LCU rows may require a larger line buffer for SAO processing. Additional internal or external memory increases the hardware cost of the decoding system. Therefore, it is desirable to reduce the data size required for SAO processing across tile boundaries or LCU-row boundaries.
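The memory cost described above can be estimated with simple arithmetic. The sketch assumes 4:2:0 chroma sampling, samples stored in whole bytes, and a caller-supplied number of buffered rows per component; the number of rows actually required at an LCU-row or tile boundary depends on the decoder design and is not specified here. The function names are hypothetical, and `needs_row_above` relies on the assumed EO neighbor layout in which only the 0° configuration stays within the current row.

```python
import math


def line_buffer_bytes(width: int, lines: int, bit_depth: int = 8, chroma_420: bool = True) -> int:
    """Bytes needed to keep `lines` rows of each color component across a boundary.

    Assumes 4:2:0 chroma (two chroma planes of half the luma width) when
    `chroma_420` is True, and ceil(bit_depth / 8) bytes per stored sample.
    """
    bytes_per_sample = math.ceil(bit_depth / 8)
    luma = width * lines
    chroma = 2 * (width // 2) * lines if chroma_420 else 0
    return (luma + chroma) * bytes_per_sample


def needs_row_above(angle: int, y_in_lcu: int) -> bool:
    """True if EO at this row of an LCU reads a sample from the LCU row above.

    Only the first row of an LCU reaches across the LCU-row boundary, and the
    0-degree (horizontal) configuration never does (assumed neighbor layout).
    """
    return y_in_lcu == 0 and angle != 0
```

For a 1920-pixel-wide, 8-bit 4:2:0 picture, each buffered row costs `line_buffer_bytes(1920, 1)` = 3840 bytes (1920 luma samples plus 2 × 960 chroma samples), and 10-bit samples stored in two bytes double that. For a vertical tile boundary, a column buffer of the same form can be estimated by passing the picture height in place of the width.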