Motion estimation is an effective inter-frame coding technique to exploit temporal redundancy in video sequences. Motion-compensated inter-frame coding has been widely used in various international video coding standards. The motion estimation adopted in various coding standards is often a block-based technique, where motion information such as coding mode and motion vector is determined for each macroblock or similar block configuration. In addition, intra-coding is also adaptively applied, where the picture is processed without reference to any other picture. The inter-predicted or intra-predicted residues are usually further processed by transformation, quantization, and entropy coding to generate a compressed video bitstream. During the encoding process, coding artifacts are introduced, particularly in the quantization process. In order to alleviate the coding artifacts, additional processing has been applied to reconstructed video to enhance picture quality in newer coding systems. The additional processing is often configured in an in-loop operation so that the encoder and decoder may derive the same reference pictures to achieve improved system performance.
FIG. 1 illustrates an exemplary adaptive inter/intra video coding system incorporating in-loop filtering process. For inter-prediction, Motion Estimation (ME)/Motion Compensation (MC) 112 is used to provide prediction data based on video data from other picture or pictures. Switch 114 selects Intra Prediction 110 or inter-prediction data from ME/MC 112 and the selected prediction data is supplied to Adder 116 to form prediction errors, also called prediction residues or residues. The prediction error is then processed by Transformation (T) 118 followed by Quantization (Q) 120. The transformed and quantized residues are then coded by Entropy Encoder 122 to form a video bitstream corresponding to the compressed video data. The bitstream associated with the transform coefficients is then packed with side information such as motion, mode, and other information associated with the image unit. The side information may also be processed by entropy coding to reduce required bandwidth. Accordingly, the side information data is also provided to Entropy Encoder 122 as shown in FIG. 1 (the motion/mode paths to Entropy Encoder 122 are not shown). When the inter-prediction mode is used, a previously reconstructed reference picture or pictures have to be used to form prediction residues. Therefore, a reconstruction loop is used to generate reconstructed pictures at the encoder end. Consequently, the transformed and quantized residues are processed by Inverse Quantization (IQ) 124 and Inverse Transformation (IT) 126 to recover the processed residues. The processed residues are then added back to prediction data 136 by Reconstruction (REC) 128 to reconstruct the video data. The reconstructed video data may be stored in Reference Picture Buffer 134 and used for prediction of other frames.
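The reconstruction loop described above can be illustrated with a minimal sketch. It is not the encoder of FIG. 1: the transform (T/IT) is omitted, the quantizer is assumed to be a plain uniform quantizer with a hypothetical step size `qstep`, and all function names are illustrative. The sketch only shows why the encoder reconstructs from the quantized residues, so that its reference data matches what a decoder derives.

```python
def quantize(residues, qstep):
    """Q: map each residue to an integer level (uniform quantizer, assumed)."""
    return [int(round(r / qstep)) for r in residues]

def dequantize(levels, qstep):
    """IQ: recover approximate residues from the levels."""
    return [level * qstep for level in levels]

def encode_block(source, prediction, qstep):
    """Return (levels for the entropy coder, encoder-side reconstruction)."""
    residues = [s - p for s, p in zip(source, prediction)]   # Adder 116
    levels = quantize(residues, qstep)                       # Q 120 (T omitted)
    processed = dequantize(levels, qstep)                    # IQ 124 (IT omitted)
    recon = [p + r for p, r in zip(prediction, processed)]   # REC 128
    return levels, recon

def decode_block(levels, prediction, qstep):
    """Decoder side: same IQ and reconstruction steps, no access to the source."""
    processed = dequantize(levels, qstep)
    return [p + r for p, r in zip(prediction, processed)]

source = [53, 55, 61, 67]
prediction = [50, 50, 60, 60]
levels, enc_recon = encode_block(source, prediction, qstep=4)
dec_recon = decode_block(levels, prediction, qstep=4)
```

In this sketch the encoder and decoder reconstructions are identical while both differ from the source, which is the quantization loss that the loop processing is intended to alleviate.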
As shown in FIG. 1, incoming video data undergoes a series of processing steps in the encoding system. The reconstructed video data from REC 128 may be subject to various impairments due to this processing. Accordingly, various loop processing is applied to the reconstructed video data before it is used as prediction data in order to improve video quality. In the High Efficiency Video Coding (HEVC) standard being developed, Deblocking Filter (DF) 130, Sample Adaptive Offset (SAO) 131 and Adaptive Loop Filter (ALF) 132 have been developed to enhance picture quality. The Deblocking Filter (DF) 130 is applied to boundary pixels and the DF processing is dependent on the underlying pixel data and coding information associated with corresponding blocks. No DF-specific side information needs to be incorporated in the video bitstream. On the other hand, the SAO and ALF processing are adaptive, where filter information such as filter parameters and filter type may be dynamically changed according to underlying video data. Therefore, filter information associated with SAO and ALF is provided to Entropy Encoder 122 and incorporated in the video bitstream so that a decoder can properly recover the required information. In FIG. 1, DF 130 is applied to the reconstructed video first; SAO 131 is then applied to DF-processed video; and ALF 132 is applied to SAO-processed video. However, the processing order among DF, SAO and ALF may be re-arranged. In the H.264/AVC video standard, the loop filtering process includes only DF. In the HEVC video standard being developed, the loop filtering process includes DF, SAO and ALF. In this disclosure, in-loop filter refers to loop filter processing that operates on underlying video data without the need for side information incorporated in the video bitstream. 
On the other hand, adaptive filter refers to loop filter processing that operates on underlying video data adaptively using side information incorporated in the video bitstream. For example, deblocking is considered an in-loop filter while SAO and ALF are considered adaptive filters.
A corresponding decoder for the encoder of FIG. 1 is shown in FIG. 2. The video bitstream is decoded by Entropy Decoder 142 to recover the processed (i.e., transformed and quantized) prediction residues, SAO/ALF information and other system information. At the decoder side, only Motion Compensation (MC) 113 is performed instead of ME/MC. The decoding process is similar to the reconstruction loop at the encoder side. The recovered transformed and quantized prediction residues, SAO/ALF information and other system information are used to reconstruct the video data. The reconstructed video is further processed by DF 130, SAO 131 and ALF 132 to produce the final enhanced decoded video, which can be used as decoder output for display and is also stored in the Reference Picture Buffer 134 to form prediction data.
The coding process in H.264/AVC is applied to 16×16 processing units or image units, called macroblocks (MB). The coding process in HEVC is applied according to Largest Coding Unit (LCU). The LCU is adaptively partitioned into coding units using quadtree. In each image unit (i.e., MB or leaf CU), DF is performed on the basis of 8×8 blocks for the luma component (4×4 blocks for the chroma component) and deblocking filter is applied across 8×8 luma block boundaries (4×4 block boundaries for the chroma component) according to boundary strength. In the following discussion, the luma component is used as an example for loop filter processing. However, it is understood that the loop processing is applicable to the chroma component as well. For each 8×8 block, horizontal filtering across vertical block boundaries is applied first, and then vertical filtering across horizontal block boundaries is applied. During processing of a luma block boundary, four pixels of each side are involved in filter parameter derivation, and up to three pixels on each side can be changed after filtering. For horizontal filtering across vertical block boundaries, pre-in-loop video data (i.e., unfiltered reconstructed video data or pre-DF video data in this case) is used for filter parameter derivation and also used as source video data for filtering. For vertical filtering across horizontal block boundaries, pre-in-loop video data (i.e., unfiltered reconstructed video data or pre-DF video data in this case) is used for filter parameter derivation, and DF intermediate pixels (i.e. pixels after horizontal filtering) are used for filtering. For DF processing of a chroma block boundary, two pixels of each side are involved in filter parameter derivation, and at most one pixel on each side is changed after filtering. For horizontal filtering across vertical block boundaries, unfiltered reconstructed pixels are used for filter parameter derivation and as source pixels for filtering. 
For vertical filtering across horizontal block boundaries, DF-processed intermediate pixels (i.e., pixels after horizontal filtering) are used for filter parameter derivation and are also used as source pixels for filtering.
The DF process can be applied to the blocks of a picture. In addition, the DF process may also be applied to each image unit (e.g., MB or LCU) of a picture. In the image-unit based DF process, the DF process at the image unit boundaries depends on data from neighboring image units. The image units in a picture are usually processed in a raster scan order. Therefore, data from an upper or left image unit is available for DF processing on the upper side and left side of the image unit boundaries. However, for the bottom or right side of the image unit boundaries, the DF processing has to be delayed until the corresponding data becomes available. The data dependency issue associated with DF complicates system design and increases system cost due to data buffering of neighboring image units.
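The data dependency described above can be sketched as follows. The sketch is a simplification under stated assumptions: image units are reconstructed in raster scan order, and the DF of a unit is treated as complete once its right and lower neighbors (where they exist) have been reconstructed. Diagonal neighbors and the exact number of boundary pixels involved are ignored, and the function name is hypothetical.

```python
def df_complete_units(cols, rows, num_reconstructed):
    """List the (row, col) units whose deblocking can be completed after
    `num_reconstructed` units have been reconstructed in raster scan order."""
    def is_reconstructed(r, c):
        return r * cols + c < num_reconstructed

    done = []
    for r in range(rows):
        for c in range(cols):
            if not is_reconstructed(r, c):
                continue
            # Right and bottom boundaries need data from units that come later.
            right_ok = c == cols - 1 or is_reconstructed(r, c + 1)
            below_ok = r == rows - 1 or is_reconstructed(r + 1, c)
            if right_ok and below_ok:
                done.append((r, c))
    return done

# A picture of 4 x 3 image units with 6 units reconstructed so far.
finished = df_complete_units(cols=4, rows=3, num_reconstructed=6)
```

With six of the twelve units reconstructed, only the first two units of the top row can finish DF in this model; the remaining four reconstructed units have to stay buffered until their neighbors arrive.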
In a system with subsequent adaptive filters, such as SAO and ALF, that operate on data processed by an in-loop filter (e.g., DF), the additional adaptive filter processing further complicates system design and increases system cost/latency. For example, in HEVC Test Model Version 4.0 (HM-4.0), SAO and ALF are applied adaptively, which allows SAO parameters and ALF parameters to be adaptively determined for each picture (“WD4: Working Draft 4 of High-Efficiency Video Coding”, Bross et al., Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 6th Meeting: Torino, IT, 14-22 Jul. 2011, Document: JCTVC-F803). During SAO processing of a picture, SAO parameters of the picture are derived based on DF output pixels and the original pixels of the picture, and then SAO processing is applied to the DF-processed picture with the derived SAO parameters. Similarly, during the ALF processing of a picture, ALF parameters of the picture are derived based on SAO output pixels and the original pixels of the picture, and then the ALF processing is applied to the SAO-processed picture with the derived ALF parameters. The picture-based SAO and ALF processing requires frame buffers to store a DF-processed frame and an SAO-processed frame. Such systems incur higher system cost due to the additional frame buffer requirement and also suffer from long encoding latency.
FIG. 3 illustrates a system block diagram of the sequential SAO and ALF processes at the encoder side. Before SAO 320 is applied, the SAO parameters have to be derived as shown in block 310. The SAO parameters are derived based on DF-processed data. After SAO is applied to DF-processed data, the SAO-processed data is used to derive the ALF parameters as shown in block 330. Upon the determination of the ALF parameters, ALF is applied to the SAO-processed data as shown in block 340. As mentioned before, frame buffers are required to store DF output pixels for the subsequent SAO processing since the SAO parameters are derived based on a whole frame of DF-processed video data. Similarly, frame buffers are also required to store SAO output pixels for subsequent ALF processing. These buffers are not shown explicitly in FIG. 3. In more recent HEVC development, LCU-based SAO and ALF are used to reduce the buffer requirement as well as to reduce encoder latency. Nevertheless, the same processing flow as shown in FIG. 3 is used for LCU-based loop processing. In other words, the SAO parameters are determined from DF output pixels and the ALF parameters are determined from SAO output pixels on an LCU by LCU basis. As discussed earlier, the DF processing for a current LCU cannot be completed until required data from neighboring LCUs (the LCU below and the LCU to the right) becomes available. Therefore, the SAO processing for a current LCU will be delayed by about one picture-row worth of LCUs, and a corresponding buffer is needed to store the one picture-row worth of LCUs. There is a similar issue for the ALF processing.
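The difference between the picture-based and LCU-based buffer requirements can be made concrete with a short calculation. The figures below are assumptions chosen for illustration only and do not come from HM-4.0 or HM-5.0: a 1920×1080 picture, 4:2:0 sampling, 8 bits per sample, a 64×64 LCU, and one buffer each for the SAO and ALF stages.

```python
import math

def frame_buffer_bytes(width, height):
    """Bytes for one 4:2:0, 8-bit picture (luma plus two quarter-size chroma planes)."""
    return width * height * 3 // 2

def lcu_row_buffer_bytes(width, lcu_size=64):
    """Bytes for one picture-row worth of LCUs in 4:2:0, 8-bit format."""
    lcus_per_row = math.ceil(width / lcu_size)
    return lcus_per_row * lcu_size * lcu_size * 3 // 2

# Picture-based processing: one frame of DF output plus one frame of SAO output.
picture_based = 2 * frame_buffer_bytes(1920, 1080)

# LCU-based processing: about one LCU row for SAO and, assumed here, one for ALF.
lcu_based = 2 * lcu_row_buffer_bytes(1920)
```

Under these assumptions the picture-based flow needs about 6.2 MB of buffering while the LCU-based flow needs about 0.37 MB, which is smaller but still substantial for on-chip storage.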
For LCU-based processing, the compressed video bitstream is structured to ease the decoding process as shown in FIG. 4 according to HM-5.0. The bitstream 400 corresponds to compressed video data of one picture region, which may be a whole picture or a slice. The bitstream 400 is structured to include a frame header 410 (or a slice header if slice structure is used) for the corresponding picture followed by compressed data for individual LCUs in the picture. The data for each LCU comprises an LCU header 410 and LCU residual data. The LCU header is located at the beginning of each LCU bitstream and contains information common to the LCU such as SAO parameters and ALF control information. Therefore, a decoder can be properly configured according to information embedded in the LCU header before decoding of the LCU residues starts, which can reduce the buffering requirement at the decoder side. However, it is a burden for an encoder to generate a bitstream compliant with the bitstream structure of FIG. 4 since the LCU residues may have to be buffered until the header information to be incorporated in the LCU header is ready.
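The encoder-side burden described above can be sketched with a small packer. The class name, method names and byte contents are hypothetical and do not reflect the HM-5.0 syntax; the sketch only shows that residual data produced before the LCU header is known has to be held until the header can be written in front of it.

```python
class LcuPacker:
    """Builds a stream laid out as: frame header, then (LCU header, LCU residual) pairs."""

    def __init__(self, frame_header: bytes):
        self.stream = bytearray(frame_header)
        self.pending = {}  # LCU index -> residual bytes waiting for their header

    def add_residual(self, lcu_index: int, residual: bytes) -> None:
        # Residual data is available early but cannot be emitted yet.
        self.pending[lcu_index] = residual

    def add_header(self, lcu_index: int, header: bytes) -> None:
        # Once the header (e.g., SAO parameters) is ready, emit header then residual.
        residual = self.pending.pop(lcu_index)
        self.stream += header + residual

    def buffered_bytes(self) -> int:
        return sum(len(data) for data in self.pending.values())

packer = LcuPacker(b"F")
packer.add_residual(0, b"rr")
held_before_header = packer.buffered_bytes()
packer.add_header(0, b"h")
held_after_header = packer.buffered_bytes()
```

The value of `held_before_header` corresponds to the residual data an encoder would need to buffer while the header information is still being derived.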
As shown in FIG. 4, the LCU header is inserted in front of the LCU residual data. The SAO parameters for the LCU are included in the LCU header. The SAO parameters for the LCU are derived based on the DF-processed pixels of the LCU. Therefore, the DF-processed pixels of the whole LCU have to be buffered before the SAO processing can be applied to the DF-processed data. Furthermore, the SAO parameters include an SAO filter On/Off decision regarding whether SAO is applied to the current LCU. The SAO filter On/Off decision is derived based on the original pixel data for the current LCU and the DF-processed pixel data. Therefore, the original pixel data for the current LCU also has to be buffered. When an On decision is selected for the LCU, the SAO filter type, i.e., either Edge Offset (EO) or Band Offset (BO), will be further determined. For the selected SAO filter type, the corresponding EO or BO parameters will be determined. The On/Off decision, EO/BO decision, and corresponding EO/BO parameters are embedded in the LCU header as described in HM-5.0. At the decoder side, SAO parameter derivation is not required since the SAO parameters are incorporated in the bitstream. The situation for the ALF process is similar to that of the SAO process. However, while the SAO process is based on the DF-processed pixels, the ALF process is based on the SAO-processed pixels.
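The dependence of the SAO decision on both the original and the DF-processed pixels of the whole LCU can be illustrated with a simplified sketch. This is not the HM-5.0 derivation: it assumes a band-offset-like scheme with 32 equal bands for 8-bit samples, takes each offset as the rounded mean error within a band, ignores Edge Offset, clipping and bit-rate cost, and decides On/Off purely by comparing squared error. All names are hypothetical.

```python
def band_index(pixel, num_bands=32, bit_depth=8):
    """Band of a sample when the sample range is split into equal bands."""
    return (pixel * num_bands) >> bit_depth

def derive_sao(original, deblocked):
    """Return (sao_on, offsets) from the original and DF-processed samples of an LCU."""
    sums, counts = {}, {}
    for o, d in zip(original, deblocked):
        b = band_index(d)
        sums[b] = sums.get(b, 0) + (o - d)
        counts[b] = counts.get(b, 0) + 1
    offsets = {b: round(sums[b] / counts[b]) for b in sums}

    def sse(pixels):
        return sum((o - p) ** 2 for o, p in zip(original, pixels))

    filtered = [d + offsets[band_index(d)] for d in deblocked]
    # Turn SAO on only if it reduces the error against the original samples.
    sao_on = sse(filtered) < sse(deblocked)
    return sao_on, offsets

original = [100, 102, 101, 200, 201]
deblocked = [98, 100, 99, 200, 201]
sao_on, offsets = derive_sao(original, deblocked)
```

Both input lists must cover the whole LCU before `derive_sao` can return, which mirrors the buffering of original and DF-processed pixel data discussed above.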
As mentioned previously, the DF process is deterministic, where the operations rely on underlying reconstructed pixels and information already available. No additional information needs to be derived by the encoder and incorporated in the bitstream. Therefore, in a video coding system without adaptive filters such as SAO and ALF, the encoder processing pipeline can be relatively straightforward. FIG. 5 illustrates an exemplary processing pipeline associated with key processing steps for an encoder. Inter/Intra Prediction block 510 represents the motion estimation/motion compensation for inter prediction and intra prediction corresponding to ME/MC 112 and Intra Pred. 110 of FIG. 1 respectively. Reconstruction 520 is responsible for forming reconstructed pixels, which corresponds to T 118, Q 120, IQ 124, IT 126 and REC 128 of FIG. 1. Inter/Intra Prediction 510 is performed on each LCU to generate the residues first and Reconstruction 520 is then applied to the residues to form reconstructed pixels. The Inter/Intra Prediction 510 block and the Reconstruction 520 block are performed sequentially. However, Entropy Coding 530 and Deblocking 540 can be performed in parallel since there is no data dependency between Entropy Coding 530 and Deblocking 540. FIG. 5 is intended to illustrate an exemplary encoder pipeline to implement a coding system without adaptive filter processing. The processing blocks for the encoder pipeline may be configured differently.
When adaptive filter processing is used, the processing pipeline needs to be configured carefully. FIG. 6A illustrates an exemplary processing pipeline associated with key processing steps for an encoder with SAO 610. As mentioned before, SAO operates on DF-processed pixels. Therefore, SAO 610 is performed after Deblocking 540. Since SAO parameters will be incorporated in the LCU header, Entropy Coding 530 needs to wait until the SAO parameters are derived. Accordingly, Entropy Coding 530 shown in FIG. 6A starts after the SAO parameters are derived. FIG. 6B illustrates an alternative pipeline architecture for an encoder with SAO, where Entropy Coding 530 starts at the end of SAO 610. The LCU size can be as large as 64×64 pixels. When an additional delay occurs in a pipeline stage, the LCU data needs to be buffered, and the required buffer may be quite large. Therefore, it is desirable to shorten the delay in the processing pipeline.
FIG. 7A illustrates an exemplary processing pipeline associated with key processing steps for an encoder with SAO 610 and ALF 710. As mentioned before, ALF operates on SAO-processed pixels. Therefore, ALF 710 is performed after SAO 610. Since ALF control information will be incorporated in the LCU header, Entropy Coding 530 needs to wait until the ALF control information is derived. Accordingly, Entropy Coding 530 shown in FIG. 7A starts after the ALF control information is derived. FIG. 7B illustrates an alternative pipeline architecture for an encoder with SAO and ALF, where Entropy Coding 530 starts at the end of ALF 710.
As shown in FIGS. 6A-B and FIGS. 7A-B, a system with adaptive filter processing will incur longer processing latency due to the sequential nature of the adaptive filter processing. It is desirable to develop a method and apparatus that can reduce the processing latency and buffer size associated with adaptive filter processing.
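The latency growth across the pipelines of FIG. 5, FIGS. 6A-B and FIGS. 7A-B can be illustrated with a simple timing model. The stage durations below are arbitrary units assumed for illustration and are not taken from any figure; the split of SAO and ALF into a derivation step and a filtering step, and all names, are likewise assumptions of the sketch.

```python
from dataclasses import dataclass

@dataclass
class StageTimes:
    prediction: int
    reconstruction: int
    deblocking: int
    sao_derive: int
    sao_apply: int
    alf_derive: int
    alf_apply: int
    entropy: int

def lcu_latency(t, use_sao, use_alf, entropy_after_filtering):
    """Time until both loop processing and entropy coding of one LCU are finished."""
    clock = t.prediction + t.reconstruction
    if not use_sao and not use_alf:
        # FIG. 5 style: entropy coding and deblocking run in parallel.
        return clock + max(t.entropy, t.deblocking)
    clock += t.deblocking
    header_ready = clock
    if use_sao:
        header_ready = clock + t.sao_derive      # SAO parameters known
        clock = header_ready + t.sao_apply       # SAO filtering finished
    if use_alf:
        header_ready = clock + t.alf_derive      # ALF control information known
        clock = header_ready + t.alf_apply       # ALF filtering finished
    # FIG. 6A/7A style starts entropy coding once the header data is ready;
    # FIG. 6B/7B style waits until the last filter has finished.
    entropy_start = clock if entropy_after_filtering else header_ready
    return max(clock, entropy_start + t.entropy)

times = StageTimes(prediction=4, reconstruction=2, deblocking=2,
                   sao_derive=1, sao_apply=1, alf_derive=2, alf_apply=2, entropy=3)
baseline = lcu_latency(times, use_sao=False, use_alf=False, entropy_after_filtering=False)
sao_early = lcu_latency(times, use_sao=True, use_alf=False, entropy_after_filtering=False)
sao_late = lcu_latency(times, use_sao=True, use_alf=False, entropy_after_filtering=True)
sao_alf_early = lcu_latency(times, use_sao=True, use_alf=True, entropy_after_filtering=False)
sao_alf_late = lcu_latency(times, use_sao=True, use_alf=True, entropy_after_filtering=True)
```

With these assumed durations the per-LCU latency rises from 9 units without adaptive filters to 12 or 13 units with SAO and to 15 or 17 units with SAO and ALF, and residual data has to be held for correspondingly longer before entropy coding can start.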
While the in-loop filters can significantly enhance picture quality, the associated processing requires multi-pass access to picture-level data at the encoding side in order to perform parameter generation and filter operation. FIG. 8 illustrates an exemplary HEVC encoder incorporating deblocking, SAO and ALF. The encoder in FIG. 8 is based on the HEVC encoder of FIG. 1. However, the SAO parameter derivation 831 and ALF parameter derivation 832 are shown explicitly. SAO parameter derivation 831 needs to access original video data and DF processed data to generate SAO parameters. SAO 131 then operates on DF processed data based on the SAO parameters derived. Similarly, the ALF parameter derivation 832 needs to access original video data and SAO processed data to generate ALF parameters. ALF 132 then operates on SAO processed data based on the ALF parameters derived. If on-chip buffers (e.g. SRAM) are used for picture-level multi-pass encoding, the chip area will be very large. Therefore, off-chip frame buffers (e.g. DRAM) are used to store the pictures. The external memory bandwidth and power consumption will be increased substantially. Accordingly, it is desirable to develop a scheme that can relieve the high memory access requirement.
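The memory access burden of picture-level multi-pass processing can be estimated with a rough count of frame-sized transfers. The sketch assumes that every access listed goes to off-chip memory, that pictures are 4:2:0 with 8 bits per sample, and that the resolution and frame rate are 1920×1080 at 30 frames per second; none of these figures come from the disclosure, and the table of accesses is an interpretation of the data flow of FIG. 8.

```python
# Frame-sized transfers per picture, following the data flow described for FIG. 8.
FRAME_ACCESSES = [
    ("DF output write", 1),
    ("SAO parameter derivation reads (original + DF output)", 2),
    ("SAO filtering (read DF output, write SAO output)", 2),
    ("ALF parameter derivation reads (original + SAO output)", 2),
    ("ALF filtering (read SAO output, write ALF output)", 2),
]

def loop_filter_traffic(width, height, fps):
    """Return (frame passes per picture, bytes per second) for 4:2:0, 8-bit video."""
    frame_bytes = width * height * 3 // 2
    passes = sum(count for _, count in FRAME_ACCESSES)
    return passes, passes * frame_bytes * fps

passes, bytes_per_second = loop_filter_traffic(1920, 1080, fps=30)
```

Under these assumptions the loop processing alone accounts for nine frame-sized transfers per picture, or roughly 840 MB per second of external memory traffic.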