Motion estimation is an effective Inter-frame coding technique to exploit temporal redundancy in video sequences. Motion-compensated Inter-frame coding has been widely used in various international video coding standards. The motion estimation adopted in various coding standards is often a block-based technique, where motion information such as coding mode and motion vector is determined for each macroblock or similar block configuration. In addition, Intra-coding is also adaptively applied, where the picture is processed without reference to any other picture. The Inter-predicted or Intra-predicted residues are usually further processed by transformation, quantization, and entropy coding to generate a compressed video bitstream. During the encoding process, coding artifacts are introduced, particularly in the quantization process. In order to alleviate the coding artifacts, additional processing has been applied to reconstructed video to enhance picture quality in newer coding systems. The additional processing is often configured as an in-loop operation so that the encoder and decoder may derive the same reference pictures to achieve improved system performance.
FIG. 1A illustrates an exemplary adaptive Inter/Intra video coding system incorporating in-loop processing. For Inter-prediction, Motion Estimation (ME)/Motion Compensation (MC) 112 is used to provide prediction data based on video data from other picture or pictures. Switch 114 selects Intra Prediction 110 or Inter-prediction data and the selected prediction data is supplied to Adder 116 to form prediction errors, also called residues. The prediction error is then processed by Transformation (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to form a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion, mode, and other information associated with the image area. The side information may also be subject to entropy coding to reduce required bandwidth. Accordingly, the data associated with the side information are provided to Entropy Encoder 122 as shown in FIG. 1A. When an Inter-prediction mode is used, a reference picture or pictures have to be reconstructed at the encoder end as well. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the residues. The residues are then added back to prediction data 136 at Reconstruction (REC) 128 to reconstruct video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
As shown in FIG. 1A, incoming video data undergoes a series of processing in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments due to a series of processing. Accordingly, various in-loop processing is applied to the reconstructed video data before the reconstructed video data are stored in the Reference Picture Buffer 134 in order to improve video quality. In the High Efficiency Video Coding (HEVC) standard being developed, Deblocking Filter (DF) 130, Sample Adaptive Offset (SAO) 131 and Adaptive Loop Filter (ALF) 132 have been developed to enhance picture quality. The in-loop filter information may have to be incorporated in the bitstream so that a decoder can properly recover the required information. Therefore, in-loop filter information from SAO and ALF is provided to Entropy Encoder 122 for incorporation into the bitstream. In FIG. 1A, DF 130 is applied to the reconstructed video first; SAO 131 is then applied to DF-processed video; and ALF 132 is applied to SAO-processed video. However, the processing order among DF, SAO and ALF can be re-arranged. The system in FIG. 1A may correspond to the High Efficiency Video Coding (HEVC) system (except for the ALF) or AVS2, which is a video coding standard developed by the Audio and Video Coding Standard Workgroup of China. The ALF process has been evaluated during HEVC development. However, ALF is not adopted in the current HEVC standard.
FIG. 1B illustrates a system block diagram of a corresponding video decoder including deblocking filter, sample adaptive offset and adaptive loop filter. Since the encoder also contains a local decoder for reconstructing the video data, some decoder components are already used in the encoder except for the entropy decoder 142. Furthermore, only motion compensation 144 is required for the decoder side. The switch 146 selects Intra-prediction or Inter-prediction and the selected prediction data are supplied to reconstruction (REC) 128 to be combined with recovered residues. Besides performing entropy decoding on compressed video data, entropy decoding 142 is also responsible for entropy decoding of side information and provides the side information to respective blocks. For example, Intra mode information is provided to Intra-prediction 110, Inter mode information is provided to motion compensation 144, adaptive offset information is provided to SAO 131, adaptive loop filter information is provided to ALF 132 and residues are provided to inverse quantization 124. The residues are processed by IQ 124, IT 126 and subsequent reconstruction process to reconstruct the video data. Again, reconstructed video data from REC 128 undergo a series of processing including IQ 124 and IT 126 as shown in FIG. 1B and are subject to intensity shift. The reconstructed video data are further processed by DF 130, SAO 131 and ALF 132.
The coding process in HEVC is applied according to Largest Coding Unit (LCU), also called Coding Tree Unit (CTU). The LCU is adaptively partitioned into coding units using a quadtree. In HEVC, the DF is applied to 8×8 block boundaries. For each 8×8 block, horizontal filtering across vertical block boundaries is first applied, and then vertical filtering across horizontal block boundaries is applied. FIG. 2A illustrates an example of DF processing for the luma component in HEVC, where block boundary 210 with 4 boundary pixels on each side of the block boundary is involved. The boundary may correspond to a vertical boundary or a horizontal boundary. The boundary pixels are designated as q0, q1, q2 and q3, and p0, p1, p2 and p3, where q0 and p0 are the two pixels immediately adjacent to the boundary. During processing of a luma block boundary, 4 pixels on each side are involved in filter parameter derivation, and up to 3 pixels on each side (i.e., p0, p1, p2 or q0, q1, q2) can be modified after filtering. For horizontal filtering across vertical block boundaries, unfiltered reconstructed pixels are used for filter parameter derivation and are used as source pixels for filtering. For vertical filtering across horizontal block boundaries, DF processed intermediate pixels (i.e., pixels after horizontal filtering) are used for filter parameter derivation and are also used as source pixels for filtering. For DF processing of the chroma component in HEVC, 2 boundary pixels on each side of the block boundary are involved and only 1 pixel on each side may be modified (i.e., p0 or q0).
FIG. 2B illustrates an example of DF processing for the luma component in AVS2, where block boundary 220 with 3 boundary pixels on each side of the block boundary is involved. The boundary pixels are designated as q0, q1 and q2, and p0, p1 and p2, where q0 and p0 are the two pixels immediately adjacent to the boundary. For DF processing of a chroma block boundary, two pixels on each side are involved in filter parameter derivation. For AVS2, the DF processing may modify all involved boundary pixels. In other words, 3 luma pixels and 2 chroma pixels on each side of the block boundary may be modified.
Sample adaptive offset (SAO) types according to HEVC and AVS2 are shown in FIG. 3, where four SAO types are used corresponding to four orientations at 0°, 90°, 135°, and 45°. SAO is a per-pixel in-loop filtering process. SAO parameters are updated for each LCU or CTU. For an SAO orientation type, pixel classification is first performed to classify pixels into different groups (also called categories or classes) according to the classification conditions shown in Table 1. After classification, each reconstructed and DF processed pixel is compensated by an offset value based on the orientation type selected and the classification result.
TABLE 1
Category    Condition
1           C < two neighbors
2           C < one neighbor && C == one neighbor
3           C > one neighbor && C == one neighbor
4           C > two neighbors
0           None of the above
The conditions for the SAO classification as shown in Table 1 can be implemented by comparing the center pixel with each of the two neighboring pixels individually. Each comparison checks whether the center pixel is greater than, smaller than or equal to the neighboring pixel. Since each comparison has three possible results, each comparison result may be represented by 2-bit data.
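The classification in Table 1 can be sketched as follows. This is a minimal illustration, not text from either standard: the function names and the `offsets` dictionary are hypothetical, the two neighbors `a` and `b` are assumed to have already been selected according to the orientation type, and clipping of the compensated value to the valid sample range is omitted.

```python
def sign(v):
    """Three-way comparison result: -1, 0 or +1."""
    return (v > 0) - (v < 0)


# Table 1 category indexed by sign(c - a) + sign(c - b).
# A sum of 0 covers both "equal to both neighbors" and
# "smaller than one, greater than the other", i.e. category 0.
_CATEGORY_BY_SIGN_SUM = {-2: 1, -1: 2, 0: 0, 1: 3, 2: 4}


def sao_category(c, a, b):
    """Classify center pixel c against its two neighbors a and b."""
    return _CATEGORY_BY_SIGN_SUM[sign(c - a) + sign(c - b)]


def apply_sao(c, a, b, offsets):
    """Add the offset of the pixel's category (hypothetical mapping
    category -> offset); category 0 leaves the pixel unchanged."""
    return c + offsets.get(sao_category(c, a, b), 0)
```

For instance, a center pixel of 5 between neighbors 7 and 7 falls into category 1, while a center pixel of 6 between neighbors 5 and 7 falls into category 0 and is left unmodified.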
The SAO parameters such as pixel offset values and SAO types can be determined adaptively for each CTU. For HEVC, the SAO parameter boundary is the same as the CTU boundary. Within the parameter boundary, the SAO process for all pixels shares the same SAO types and offset values. Since SAO is applied to DF processed pixels, the SAO process for a current CTU has to wait for the DF process to complete for the current CTU. However, the pixels around the CTU boundary cannot be processed by DF until the reconstructed video data on the other side of the CTU boundary are ready. Due to such data dependency, AVS2 adopted shifted SAO parameter boundaries. FIG. 4 illustrates an example of SAO parameter boundary shift according to the AVS2 standard. The SAO parameter boundary example 410 corresponds to the HEVC case, where the SAO parameter boundary is aligned with the CTU boundary. The SAO parameter boundary 420 corresponds to the AVS2 case, where the SAO parameter boundary is shifted left and up with respect to the CTU boundary by xS and yS respectively. In particular, AVS2 uses xS=4 and yS=4.
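The effect of the boundary shift can be illustrated with a small sketch that maps a pixel position to the CTU whose SAO parameters apply to it. The function name and arguments are hypothetical, and the clamping at the right and bottom picture edges is an assumption made only to keep the example self-contained; it is not taken from either standard.

```python
def sao_region(x, y, ctu_size, pic_w, pic_h, x_shift=0, y_shift=0):
    """Return (ctu_col, ctu_row) of the CTU whose SAO parameters cover
    pixel (x, y). A shift of (0, 0) models the HEVC case; a shift of
    (4, 4) models the AVS2 case with xS = yS = 4."""
    cols = -(-pic_w // ctu_size)  # ceiling division: CTU columns
    rows = -(-pic_h // ctu_size)  # ceiling division: CTU rows
    # Shifting the boundary left/up by the shift amount means that a pixel
    # within that many samples of the next CTU already uses the next CTU's
    # parameters. Assumption: clamp to the last CTU at the picture edge.
    col = min((x + x_shift) // ctu_size, cols - 1)
    row = min((y + y_shift) // ctu_size, rows - 1)
    return col, row
```

With a 64×64 CTU and a shift of 4, the pixel at x=59 still uses the parameters of CTU column 0, while the pixel at x=60 already uses those of CTU column 1. Without the shift, both pixels use CTU column 0.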
Adaptive Loop Filtering (ALF) 132 is a video coding tool to enhance picture quality. ALF has been evaluated during the development stage of HEVC. However, ALF is not adopted in the current HEVC standard. Nevertheless, it is being incorporated into AVS2. In particular, a 17-tap symmetric ALF filter is being used for AVS2 as shown in FIG. 5. The 17-tap symmetric ALF filter implies that the filter operation for a current pixel may require data from 3 following lines. When these lines are from another CTU, particularly the CTU in a following CTU row, the ALF process has to be delayed until the following related data are available. This implies the need for a line buffer to temporarily store the related data in the current CTU for later processing. In order to overcome this data dependency issue, AVS2 adopts an ALF virtual boundary to restrict ALF processing so that it does not cross the virtual boundary. FIG. 6 illustrates an example of the ALF virtual boundary for the luma component according to AVS2, where the ALF processing for selected pixels (i.e., a, b, c and d) is shown. Line 610 represents the CTU boundary between CTU X and CTU Y. Line 620 represents the luma ALF virtual boundary, which is located 4 lines (i.e., yC-4) above the CTU boundary (i.e., yC) according to AVS2. For the chroma component, the ALF virtual boundary is located 3 lines (i.e., yC-3) above the CTU boundary according to AVS2 (Information Technology—Advanced Media Coding Part 2: Video Final Committee Draft, Audio and Video Coding Standard Workgroup of China, Feb. 7, 2015, Document: N2120.D3). For pixels a, b and c, the ALF process is applied during the CTU X processing stage. Furthermore, the ALF process for pixels a, b and c only uses information above the virtual boundary. For pixel d below the virtual boundary, the ALF process is applied during the CTU Y processing stage and only uses information below the virtual boundary.
The use of a virtual boundary to restrict data dependency can help to reduce the requirement on the line buffer capacity.
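The virtual boundary restriction can be sketched as follows. The function name is hypothetical, and replacing out-of-bounds rows by the nearest permitted row is only one assumed way of keeping the filter on one side of the virtual boundary; the text above does not specify how the restriction is realized.

```python
def alf_stage_and_rows(y, yc, vb_offset=4, reach=3):
    """For a pixel in row y near the CTU boundary at row yc, return the
    processing stage ("X" or "Y") and the rows the filter may read.

    vb_offset: distance of the virtual boundary above the CTU boundary
               (4 for luma, 3 for chroma according to the text).
    reach:     number of lines the filter extends above and below the
               current line (3 for the 17-tap filter described above).
    """
    vb = yc - vb_offset  # first row below the virtual boundary
    rows = range(y - reach, y + reach + 1)
    if y < vb:
        # Above the virtual boundary: filtered during the CTU X stage,
        # reading only rows above the virtual boundary (assumed clamping).
        return "X", [min(r, vb - 1) for r in rows]
    # On or below the virtual boundary: filtered during the CTU Y stage,
    # reading only rows below the virtual boundary (assumed clamping).
    return "Y", [max(r, vb) for r in rows]
```

With yc=64 and the luma offset of 4, a pixel in row 58 is handled in the CTU X stage and reads no row beyond 59, whereas a pixel in row 60 is deferred to the CTU Y stage and reads no row above 60.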
As mentioned above, the DF, SAO and ALF processes involve neighboring data. In HEVC and AVS2, the CTU has been used as a unit for the coding process. When the DF, SAO and ALF processes are applied to data across a CTU boundary, the data dependency has to be managed carefully to minimize the line buffer. Since the DF, SAO and ALF processes are applied to each CTU sequentially, the corresponding hardware implementation may be arranged in a pipeline fashion. FIG. 7 illustrates an example of data dependency associated with the DF, SAO and ALF processes for an AVS2 decoder. The CTU based processing order 700 is shown in FIG. 7 and the CTU boundary between CTU X and CTU Y is indicated by reference number 705. As shown in FIG. 7, the reconstructed video from reconstruction block 710 is processed by DF 720, SAO 730 and ALF 740. The output from ALF 740 is stored in a decoded frame buffer.
The processing status for the corresponding DF 720, SAO 730 and ALF 740 processes is indicated by respective reference numbers 725, 735 and 745. Diagram 725 illustrates the DF processing status at the end of the DF processing stage for CTU X. Luma pixels above line 722 and chroma pixels above line 724 are DF processed. Luma pixels below line 722 and chroma pixels below line 724 cannot be processed during the DF processing stage for CTU X since the involved pixels on the other side of the block boundary (i.e., below CTU boundary 705) are not available yet. Diagram 735 illustrates the SAO processing status at the end of the SAO processing stage for CTU X. Luma pixels above line 732 and chroma pixels above line 734 are SAO processed, where line 732 and line 734 are aligned. Again, the luma pixels below line 732 and the chroma pixels below line 734 cannot be processed by SAO for CTU X yet since doing so involves SAO parameters signaled in CTU Y, which has not yet been processed by variable length decoding (VLD). Diagram 745 illustrates the ALF processing status at the end of the ALF processing stage for CTU X. Luma pixels above line 742 (luma ALF virtual boundary) are ALF processed. Chroma pixels above line 744 (chroma ALF virtual boundary) would be ALF processed according to the AVS2 draft standard. Nevertheless, the ALF process for the chroma component cannot be performed for chroma lines A through D during the CTU X processing stage. For example, the ALF process for pixel 746 will use pixel 748. Since chroma pixel 748 is below the chroma SAO parameter boundary 734, chroma pixel 748 is not SAO processed yet during the CTU X processing stage. Therefore, even though it is above the chroma ALF virtual boundary, chroma pixel 746 cannot be ALF processed.
Accordingly, 6 chroma SAO processed lines above pixel 748 (i.e., above line D) have to be stored in a buffer for the later ALF process on lines A through D during the CTU Y processing stage, wherein the three lines above line A have been ALF processed in the CTU X processing stage but are also required by the ALF process on line A.
For a hardware based implementation, the 6 lines of chroma samples spanning the picture width have to be stored in a line buffer, which is usually implemented using embedded memory, and such an implementation would result in high chip cost. Therefore, it is desirable to develop a method and apparatus that can reduce the required line buffer associated with the DF, SAO and ALF processes. Furthermore, for different SAO parameter boundaries, the system will switch between different SAO parameters. This will increase system complexity and power consumption. Therefore, it is desirable to develop DF, SAO and ALF processes with proper system parameter design to reduce line buffer requirement, system complexity, system power consumption, or any combination thereof. In yet another aspect, it is desirable to develop a method and apparatus for performance and cost efficient loop filter processing including DF, SAO and ALF for any video coding system incorporating such loop filter processing.
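To give a feel for the storage involved, the line buffer size can be estimated with a short calculation. The function name and all default values are illustrative assumptions that do not come from the text above: 4:2:0 sampling (chroma width equal to half the luma width), two chroma components, and whole-byte storage per sample.

```python
def chroma_line_buffer_bytes(pic_width, lines=6, bit_depth=8,
                             chroma_width_div=2, components=2):
    """Estimate the bytes needed to hold `lines` chroma lines across the
    full picture width.

    Assumptions (illustrative only): chroma width is
    pic_width / chroma_width_div, both chroma components are buffered,
    and each sample occupies a whole number of bytes.
    """
    bytes_per_sample = (bit_depth + 7) // 8
    chroma_width = pic_width // chroma_width_div
    return lines * chroma_width * components * bytes_per_sample
```

Under these assumptions, a 3840-sample-wide picture with 8-bit samples would need 6 × 1920 × 2 = 23040 bytes for the 6 chroma lines, and the figure grows in proportion to picture width and sample storage size.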