Many video compression formats, such as H.263, H.264, MPEG-1, MPEG-2, MPEG-4 and SVC, use block-based discrete cosine transform (DCT) and motion compensation to remove spatial and temporal redundancies. They are often referred to as predictive video formats. Each frame or image in the video signal is identified by an index known as the POC (picture order count). Each frame or image is divided into at least one slice, which is encoded and can be decoded independently. A slice is typically a rectangular portion of the frame or, more generally, a portion of a frame or an entire frame. Further, each slice may be divided into macroblocks (MBs), and each macroblock is further divided into blocks, typically blocks of 64×64, 32×32, 16×16 or 8×8 pixels.
In High Efficiency Video Coding (HEVC), blocks ranging from 64×64 down to 4×4 pixels may be used. The partitioning is organized according to a quad-tree structure based on largest coding units (LCUs). An LCU corresponds, for example, to a square block of 64×64 pixels. If an LCU needs to be divided, a split flag indicates that the LCU is split into four 32×32 blocks. In the same way, if any of these four blocks needs to be split, its split flag is set to true and the 32×32 block is divided into four 16×16 blocks, and so on. When a split flag is set to false, the current block is a coding unit (CU), which is the frame entity to which the encoding process described below is applied. A CU has a size equal to 64×64, 32×32, 16×16 or 8×8 pixels.
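The quad-tree partitioning described above can be sketched as a short recursion. This is only an illustrative sketch: the names `partition_lcu`, `should_split` and `demo_split` are hypothetical, and the split decision, which a real encoder would derive from its own analysis of the image, is replaced here by a hard-coded rule.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block, in pixels


def partition_lcu(x: int, y: int, size: int,
                  should_split: Callable[[int, int, int], bool],
                  min_cu_size: int = 8) -> List[Block]:
    """Recursively split an LCU into coding units following split flags."""
    # Split flag false (or minimum CU size reached): the block is a CU.
    if size <= min_cu_size or not should_split(x, y, size):
        return [(x, y, size)]
    # Split flag true: divide the block into four quadrants of half the size.
    half = size // 2
    cus: List[Block] = []
    for dy in (0, half):
        for dx in (0, half):
            cus.extend(partition_lcu(x + dx, y + dy, half,
                                     should_split, min_cu_size))
    return cus


def demo_split(x: int, y: int, size: int) -> bool:
    # Hypothetical rule: split the 64x64 LCU, then only its top-left 32x32 block.
    return size == 64 or (size == 32 and x == 0 and y == 0)


cus = partition_lcu(0, 0, 64, demo_split)
for cu in cus:
    print(cu)
```

With this rule the LCU yields four 16×16 CUs in its top-left quadrant and three 32×32 CUs elsewhere, which together cover the full 64×64 area.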
Each CU can be further split into four or more transform units (TUs), which are the frame entities on which DCT and quantization operations are performed. A TU has a size equal to 32×32, 16×16, 8×8 or 4×4 pixels.
There are two families of coding modes for coding blocks of an image: coding modes based on spatial prediction, referred to as INTRA prediction, and coding modes based on temporal prediction, referred to as INTER prediction. In both spatial and temporal prediction modes, a residual is computed by subtracting the predictor from the original block.
An INTRA block is generally predicted by an INTRA prediction process from the encoded pixels at its causal boundary. In INTRA prediction, a prediction direction is encoded.
Temporal prediction consists in finding, in a reference frame (either a previous or a future frame of the video sequence), the image portion or reference area that is closest to the block to be encoded. This step is typically known as motion estimation. Next, the block to be encoded is predicted using the reference area in a step typically referred to as motion compensation. The difference, known as the residual, between the block to be encoded and the reference portion is encoded in a bitstream, along with an item of motion information relative to the motion vector that indicates the reference area to use for motion compensation. In temporal prediction, at least one motion vector is encoded.
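As an illustration of motion estimation followed by motion compensation, the sketch below performs an exhaustive block-matching search. It rests on assumptions not stated in the text: the sum of absolute differences (SAD) is used as the "closest" criterion, the search is limited to whole-pixel displacements within a small window, and the function names and the test frames are hypothetical.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between two n x n blocks."""
    return sum(abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))


def motion_search(cur, ref, bx, by, n, search_range):
    """Return the motion vector (dx, dy) and residual for the block at (bx, by)."""
    height, width = len(ref), len(ref[0])
    best = None
    # Motion estimation: test every displacement inside the search window.
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate area falls outside the reference frame
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    _, dx, dy = best
    # Motion compensation: the residual is the block minus its predictor.
    residual = [[cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i]
                 for i in range(n)] for j in range(n)]
    return (dx, dy), residual


# Hypothetical data: a gradient reference frame, and a current frame whose
# content is the reference displaced by one pixel right and one pixel up.
ref = [[7 * x + 13 * y for x in range(8)] for y in range(8)]
cur = [[ref[y - 1][x + 1] if y >= 1 and x + 1 < 8 else 0
        for x in range(8)] for y in range(8)]

mv, residual = motion_search(cur, ref, bx=2, by=2, n=4, search_range=2)
print(mv, residual)
```

Only the motion vector and the residual need to be written to the bitstream; here the match is exact, so the residual is all zeros.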
Effective coding chooses, for each coding unit in an image, the better of INTER and INTRA coding so as to provide the best trade-off between image quality at the decoder and the amount of data needed to represent the original data.
The residual resulting from the prediction is then subjected to a DCT and quantization.
Both the encoding and decoding processes generally involve a decoding process of an encoded frame. This process, called closed-loop decoding, is typically performed at the encoder side in order to produce at the encoder the same reference frames as those used by the decoder during the decoding process.
To reconstruct the encoded frame, the residual is inverse quantized and inverse transformed in order to provide the “decoded” residual in the pixel domain. The “decoded” residual is added to the spatial or temporal predictor used above, to obtain a first reconstruction of the frame.
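The transform, quantization and reconstruction steps can be sketched on a single 4×4 block as follows. This is a simplified illustration, not the integer transform of any particular standard: it assumes a floating-point orthonormal DCT-II, a uniform quantizer with a single step `qstep`, and 8-bit samples. All function names and sample values are hypothetical.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    return [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
             * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
             for i in range(n)] for k in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def encode_residual(residual, qstep):
    """Forward 2-D DCT followed by uniform quantization."""
    c = dct_matrix(len(residual))
    coeffs = matmul(matmul(c, residual), transpose(c))
    return [[round(v / qstep) for v in row] for row in coeffs]


def decode_residual(levels, qstep):
    """Inverse quantization followed by inverse 2-D DCT."""
    c = dct_matrix(len(levels))
    dequant = [[level * qstep for level in row] for row in levels]
    rec = matmul(matmul(transpose(c), dequant), c)
    return [[round(v) for v in row] for row in rec]


def reconstruct(predictor, levels, qstep):
    """Add the decoded residual to the predictor and clip to 8 bits."""
    decoded = decode_residual(levels, qstep)
    return [[min(255, max(0, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(predictor, decoded)]


original = [[52, 55, 61, 66], [63, 59, 55, 90],
            [62, 59, 68, 113], [63, 58, 71, 122]]
predictor = [[60] * 4 for _ in range(4)]
qstep = 4

residual = [[o - p for o, p in zip(orow, prow)]
            for orow, prow in zip(original, predictor)]
levels = encode_residual(residual, qstep)
recon = reconstruct(predictor, levels, qstep)
max_err = max(abs(o - r) for orow, rrow in zip(original, recon)
              for o, r in zip(orow, rrow))
print(levels, recon, max_err)
```

Because `reconstruct` depends only on the predictor and the quantized levels, running it at the encoder yields the same first reconstruction that the decoder obtains, which is the point of closed-loop decoding.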
The first reconstruction is then filtered by one or several kinds of post filtering processes. These post filters are applied to the reconstructed frame at both the encoder side and the decoder side, so that the same reference frame is used at both sides.
The aim of this post filtering is to remove compression artifacts and improve image quality. For example, H.264/AVC uses a deblocking filter. This filter can remove blocking artifacts due to the DCT quantization of the residual and to block motion compensation. These artifacts are visually important at low bitrates. The deblocking filter operates to smooth the block boundaries according to the characteristics of the two neighboring blocks. In the current HEVC standard, two types of loop filters are generally used consecutively: the deblocking filter and sample adaptive offset (SAO).
The aim of the SAO loop filter is to improve frame reconstruction by sending additional data, as opposed to a deblocking filter, where no information is transmitted.
Conventional SAO filtering uses a rate distortion criterion to find the best SAO parameters, e.g. the SAO filtering type, the Edge Offset direction or Band Offset start, and the offsets. Such a rate distortion criterion usually cannot be implemented at the decoder.
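To make the rate distortion selection concrete, the sketch below chooses Band Offset parameters for a list of samples. It is a simplified illustration under several assumptions: only the Band Offset type is considered, the sample range is divided into 32 bands with offsets applied to four consecutive bands (as in HEVC), offsets are not clipped to a maximum magnitude, and the rate term `abs(h) + 1` is a toy model rather than an actual entropy-coder estimate. The function names and the multiplier `lam` are hypothetical. Note that the selection reads the original samples, which are not available at the decoder.

```python
def choose_band_offset(original, reconstructed, lam=1.0, bit_depth=8):
    """Pick the Band Offset start and offsets minimizing J = dD + lam * R."""
    shift = bit_depth - 5            # 32 equal bands over the sample range
    err_sum = [0] * 32               # per band: sum of (original - reconstructed)
    count = [0] * 32                 # per band: number of samples
    for o, r in zip(original, reconstructed):
        band = r >> shift
        err_sum[band] += o - r
        count[band] += 1
    offsets = [round(err_sum[b] / count[b]) if count[b] else 0
               for b in range(32)]
    best_cost, best_start = 0.0, None   # a cost of 0 stands for SAO disabled
    for start in range(29):             # four consecutive bands per candidate
        delta_d, rate = 0, 0
        for b in range(start, start + 4):
            h = offsets[b]
            # Change in squared error when adding h to every sample of band b.
            delta_d += count[b] * h * h - 2 * h * err_sum[b]
            rate += abs(h) + 1          # toy rate model (assumption)
        cost = delta_d + lam * rate
        if cost < best_cost:
            best_cost, best_start = cost, start
    if best_start is None:
        return None
    return best_start, offsets[best_start:best_start + 4]


def apply_band_offset(reconstructed, start, offsets, bit_depth=8):
    """Decoder-side filtering: needs only the transmitted parameters."""
    shift, max_val = bit_depth - 5, (1 << bit_depth) - 1
    out = []
    for r in reconstructed:
        band = r >> shift
        if start <= band < start + 4:
            r = min(max_val, max(0, r + offsets[band - start]))
        out.append(r)
    return out


original = [100, 101, 102, 103, 110, 111, 200, 201]
reconstructed = [97, 98, 99, 100, 107, 108, 200, 201]
params = choose_band_offset(original, reconstructed)
filtered = apply_band_offset(reconstructed, *params)
print(params, filtered)
```

In this example the samples in two adjacent bands were reconstructed three levels too low, and the selected offsets restore them, while the samples in the unaffected band are left unchanged.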
Implementing SAO loop filtering at the encoder thus requires that the SAO parameters be transmitted in the bitstream to the decoder. Since SAO parameters are determined for each frame area, often each LCU, a great number of SAO parameters have to be transmitted.
This has a non-negligible rate cost with regard to the transmitted bitstream, and also requires a SAO memory buffer at the decoder that is large enough to receive and store the useful SAO parameters.
In addition, the current way of determining the best SAO parameters appears quite complex and resource-demanding for real-time applications at the encoder and decoder.