Motion estimation is an effective inter-frame coding technique to exploit temporal redundancy in video sequences. Motion-compensated inter-frame coding has been widely used in various international video coding standards. The motion estimation adopted in various coding standards is often a block-based technique, where motion information such as the coding mode and motion vector is determined for each macroblock or similar block configuration. In addition, intra-coding is also adaptively applied, where the picture is processed without reference to any other picture. The inter-predicted or intra-predicted residues are usually further processed by transformation, quantization, and entropy coding to generate a compressed video bitstream. During the encoding process, coding artifacts are introduced, particularly by the quantization process. In order to alleviate the coding artifacts, additional processing has been applied to the reconstructed video to enhance picture quality in newer coding systems. The additional processing is often configured as an in-loop operation so that the encoder and decoder may derive the same reference pictures to achieve improved system performance. High Efficiency Video Coding (HEVC) is the new-generation international video coding standard developed under the Joint Collaborative Team on Video Coding (JCT-VC).
FIG. 1 illustrates an exemplary system block diagram for a video encoder based on High Efficiency Video Coding (HEVC) using adaptive Inter/Intra prediction. In the system, a picture is divided into multiple non-overlapped largest coding units, also called coding tree blocks (CTBs). For Inter-prediction, Motion Estimation (ME)/Motion Compensation (MC) 112 is used to provide prediction data based on video data from another picture or pictures. Switch 114 selects Intra-prediction data from Intra Prediction 110 or Inter-prediction data from ME/MC 112. The selected prediction data (136) are supplied to Adder 116, where they are subtracted from the input video data to form prediction errors, also called residues. The prediction errors are then processed by Transformation (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to form a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion, mode, and other information associated with the image area. The side information may also be subject to entropy coding to reduce the required bandwidth. Accordingly, the data associated with the side information are provided to Entropy Encoder 122 as shown in FIG. 1. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to the prediction data 136 at Reconstruction (REC) 128 to reconstruct the video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for the prediction of other frames.
As shown in FIG. 1, the incoming video data undergo a series of processing steps in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments due to this series of processing. Accordingly, various in-loop processing is applied to the reconstructed video data before the reconstructed video data are stored in Reference Picture Buffer 134 in order to improve video quality. In the High Efficiency Video Coding (HEVC) standard being developed, Deblocking Filter (DF) 130 and Sample Adaptive Offset (SAO) 131 have been developed to enhance picture quality. The in-loop filter information may have to be incorporated into the bitstream so that a decoder can properly recover the required information. Therefore, the in-loop filter information from SAO 131 is provided to Entropy Encoder 122 for incorporation into the bitstream.
The current HEVC standard supports only the 4:0:0 and 4:2:0 picture sampling formats with a pixel depth equal to 8 bits or 10 bits for each color component. However, the range extensions of HEVC are being developed for emerging video coding applications at a high fidelity level, such as UHDTV (Ultra-High Definition Television). The extended HEVC standard is expected to further support the YUV4:2:2, YUV4:4:4 and RGB4:4:4 picture formats, with pixel depths of 12 bits and 16 bits for each color component.
In the HEVC standard, sample-adaptive offset (SAO) processing is utilized to reduce the distortion of reconstructed pictures. The SAO processing is performed after deblocking filtering (DF) and is part of the non-deblock filtering (NDF) operations. FIG. 2 illustrates a system block diagram of an exemplary HEVC-based decoder including the deblocking filter (DF) and SAO. Since the encoder also contains a local decoder for reconstructing the video data, some decoder components are also used in the encoder. For a decoder, Entropy Decoder 222 is used to parse and recover the coded syntax elements related to residues, motion information and other control data. Switch 214 selects intra-prediction or inter-prediction and the selected prediction data are supplied to Reconstruction (REC) 228 to be combined with recovered residues. Besides performing entropy decoding on compressed video data, Entropy Decoder 222 is also responsible for entropy decoding of side information and provides the side information to respective blocks. For example, intra mode information is provided to Intra-Prediction 210, inter mode information is provided to Motion Compensation 212, sample-adaptive offset information is provided to SAO 232, and residues are provided to Inverse Quantization 224. The residues are processed by IQ 224, IT 226 and the subsequent reconstruction process to reconstruct the video data. Again, the reconstructed video data from REC 228 have undergone a series of processing, including IQ 224 and IT 226, as shown in FIG. 2, and are subject to intensity shift. The reconstructed video data are further processed by Deblocking Filter (DF) 230 and Sample Adaptive Offset (SAO) 232.
The concept of SAO is to classify the reconstructed pixels into categories according to their neighboring pixel values. Each category is then assigned an offset value coded in the bitstream and the distortion of the reconstructed signal is reduced by adding the offset to the reconstructed pixels in each category. In the HEVC standard, the SAO tool supports two kinds of pixel classification methods: band offset (BO) and edge offset (EO).
For BO, the reconstructed pixels are classified into bands by quantizing the pixel magnitude, as shown in FIG. 3. An offset value can then be derived for each band to reduce the distortion of the reconstructed pixels in the band. A group of offsets, identified by the starting band position, is selected and coded into the bitstream. For each color component (luma or chroma), the SAO algorithm can divide a picture into non-overlapped regions, and each region can select one SAO type among BO (with a starting band position), four EO types (classes), and no processing (OFF). The SAO partitioning can be aligned with the CTB boundaries to facilitate CTB-based processing. The total number of offset values in one picture depends on the number of region partitions and the SAO type selected by each region.
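The band classification described above can be sketched as follows. This is a minimal illustrative sketch rather than HEVC reference code; the function names are hypothetical, and it assumes the 32 equal-width bands of HEVC BO, where the band index is obtained by keeping the five most significant bits of the pixel value and the four signaled bands may wrap around band 31.

```python
def bo_band_index(pixel, bit_depth=8):
    # Quantize a pixel into one of 32 equal-width bands by
    # keeping its five most significant bits.
    return pixel >> (bit_depth - 5)

def apply_band_offset(pixel, band_position, offsets, bit_depth=8):
    """Add the signaled offset if the pixel falls in one of the four bands.

    band_position : starting band index (sao_band_position)
    offsets       : four signed offsets for the consecutive bands
    """
    band = bo_band_index(pixel, bit_depth)
    delta = (band - band_position) & 31  # bands may wrap past band 31
    if delta < 4:
        pixel += offsets[delta]
    # Clip the result to the valid sample range.
    return max(0, min(pixel, (1 << bit_depth) - 1))
```

For example, with 8-bit video a pixel value of 100 falls in band 12 (100 >> 3), so with a starting band position of 12 it receives the first of the four offsets.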
For EO, the reconstructed pixels are classified into categories by comparing the current pixel with its neighboring pixels along the direction identified by the EO type as shown in FIG. 4. Table 1 lists the decision for the EO pixel classification according to HEVC, where “c” denotes a current pixel to be classified. The category index, cat_idx, according to the existing HEVC standard for the current pixel “c” is determined by:
cat_idx = sign(c − c1) + sign(c − c−1) + 2,  (1)

where

sign(x) = { 1, x > 0; 0, x = 0; −1, x < 0 },  (2)

and "c1" and "c−1" are the neighboring pixels corresponding to a given EO type as shown in FIG. 4. The four EO types with selections of neighboring pixels for different orientations are also shown in FIG. 4. An offset value is derived for all pixels in each category. Four offset values, corresponding to category indices 1 through 4 respectively, are coded for one coding tree block (CTB) in HEVC.
TABLE 1

  Category    Condition
  1           c < two neighbors
  2           c < one neighbor && c == one neighbor
  3           c > one neighbor && c == one neighbor
  4           c > two neighbors
  0           None of the above
The category of EO classification has some physical sense related to the three consecutive samples. As shown in FIG. 5, the scenarios of three neighboring pixels are shown for corresponding categories. For example, category 1 corresponds to a valley and category 4 corresponds to a peak.
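The EO classification of equations (1) and (2), together with the category mapping of Table 1, can be sketched as follows. The function names are hypothetical; this is an illustrative sketch rather than reference decoder code.

```python
def sign(x):
    # Equation (2): returns 1, 0, or -1.
    return (x > 0) - (x < 0)

def eo_category(c, c1, c_m1):
    """Classify pixel c against its two neighbors c1 and c_m1.

    Computes the raw index of equation (1) and maps it to the
    categories of Table 1: raw 0 is a valley (category 1), raw 4 is a
    peak (category 4), and raw 2 (flat or monotonic) is category 0.
    """
    idx = sign(c - c1) + sign(c - c_m1) + 2
    return {0: 1, 1: 2, 2: 0, 3: 3, 4: 4}[idx]
```

For example, a pixel lower than both neighbors (a valley) yields sign values of −1 and −1, a raw index of 0, and hence category 1, while a pixel higher than both neighbors (a peak) yields category 4.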
In HEVC, a picture is divided into multiple non-overlapped Coding Tree Units (CTUs); each CTU consists of multiple CTBs, and each CTB corresponds to one color component. Each CTB can select no processing (SAO-off) or apply one of the SAO types or classes (i.e., BO with a starting band position index, 0-degree EO, 90-degree EO, 135-degree EO, and 45-degree EO). To further reduce side information, the SAO parameters of a current CTB can reuse those of its upper or left CTB by using the Merge syntax as shown in FIG. 6. The SAO syntax consists of sao_merge_left_flag, sao_merge_up_flag, sao_type_idx_luma, sao_type_idx_chroma, sao_eo_class_luma, sao_eo_class_chroma, sao_band_position, sao_offset_abs, and sao_offset_sign. The syntax sao_merge_left_flag indicates that the current CTB reuses the parameters of the left CTB. The syntax sao_merge_up_flag indicates that the current CTB reuses the parameters of the upper CTB. The syntax sao_type_idx represents the selected SAO type (i.e., sao_type_idx_luma and sao_type_idx_chroma for the luma component and chroma components respectively). The syntaxes sao_eo_class_luma and sao_eo_class_chroma represent the selected EO type for luma and chroma respectively. The syntax sao_band_position represents the starting band position of the selected bands. cIdx indicates one of the three color components. Furthermore, the SAO processing is applied separately to the different color components of the video data. The color components may correspond to (Y, Cb, Cr), (Y, U, V) or (R, G, B).
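The Merge syntax described above can be illustrated with a small sketch. The function decode_ctb_sao_params and its arguments are hypothetical names; the sketch only shows the parameter-reuse decision, not actual bitstream parsing.

```python
def decode_ctb_sao_params(sao_merge_left_flag, sao_merge_up_flag,
                          left_params, up_params, parse_new):
    """Resolve the SAO parameters of the current CTB.

    Per the Merge syntax, the current CTB either reuses the left CTB's
    parameters, reuses the upper CTB's parameters, or carries its own
    newly coded parameters (obtained here via the parse_new callback).
    """
    if sao_merge_left_flag:
        return left_params
    if sao_merge_up_flag:
        return up_params
    return parse_new()
```

The merge flags are tested in the order they appear in the bitstream: sao_merge_left_flag first, then sao_merge_up_flag, and new parameters are parsed only when both flags are zero.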
The syntax sao_offset_abs represents the offset magnitude and the syntax sao_offset_sign represents the offset sign. The offset value, SaoOffsetVal, is determined according to:

SaoOffsetVal = offsetSign * sao_offset_abs << (bitDepth − Min(bitDepth, 10)),  (3)

where bitDepth is the number of bits used for each component of a raw pixel, and offsetSign is equal to −1 when sao_offset_sign is 1, and equal to 1 otherwise. The syntax sao_offset_abs is entropy coded according to the existing HEVC standard using the truncated Rice (TR) binarization process with the parameter given by

cMax = (1 << (Min(bitDepth, 10) − 5)) − 1,  (4)

and with the Rice parameter, cRiceParam, equal to 0. The truncated Rice code is well known in the field of video coding. The TR codes comprise a prefix part represented by truncated unary (TU) codes and a remainder part represented by fixed-length codewords without truncation.
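Equations (3) and (4) can be sketched as follows, assuming C-style operator precedence in equation (3) (the multiplication binds tighter than the left shift). The function names are hypothetical.

```python
def sao_offset_val(sao_offset_abs, sao_offset_sign, bit_depth):
    # Equation (3): the coded magnitude is scaled up for bit depths
    # above 10; for 8-bit and 10-bit video the shift is zero.
    offset_sign = -1 if sao_offset_sign == 1 else 1
    return (offset_sign * sao_offset_abs) << (bit_depth - min(bit_depth, 10))

def sao_offset_cmax(bit_depth):
    # Equation (4): largest codable offset magnitude for the
    # truncated Rice binarization of sao_offset_abs.
    return (1 << (min(bit_depth, 10) - 5)) - 1
```

For instance, at 8-bit depth the maximum offset magnitude is 7, while at 10-bit depth (and above) it is 31; at 12-bit depth a coded magnitude of 3 with a negative sign yields an offset of −12 after the 2-bit shift.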
FIG. 7 illustrates the coding process for the CTU-level SAO information when the current CTU is not merged with the left or above CTU. Note that the EO class and band position are each a kind of sub-class or sub-type describing the SAO type information.
As shown in FIG. 7, the SAO type decision is made in step 710. If the SAO type is EO, the unsigned luma offsets (712) and luma EO class (714) are coded in the bitstream. If the SAO type is BO, the signed luma offsets (716) and luma band position (718) are coded in the bitstream. If the SAO type is off, no other SAO information is signaled and the process goes to step 720. Similar SAO information for the chroma components follows. If the chroma components select EO, the unsigned Cb offsets (722), chroma EO class (724) and unsigned Cr offsets (726) are signaled. If the chroma components select BO, the signed Cb offsets (732), Cb band position (734), signed Cr offsets (736) and Cr band position (738) are signaled. If the SAO type is off for the chroma components, no other SAO information is signaled and the process goes to step 740.
While the SAO process in the existing HEVC standard is able to improve performance by adaptively compensating for the local intensity offset, it is desirable to further improve the performance whenever possible in order to achieve an overall efficiency target.