Transform coding has proven very useful in video coding for removing redundancy in prediction errors. Such prediction errors arise when a current block of samples or pixels, typically denoted coding block (CB) or coding unit (CU) in the art, is predicted from spatially neighboring, previously coded samples (intra prediction) or from temporally neighboring, previously coded samples (inter prediction). In High Efficiency Video Coding (HEVC), also denoted H.265 in the art, the current block is divided into prediction blocks (PBs) or prediction units (PUs), each of which covers either the whole current block or a portion of it. A respective intra prediction mode or inter prediction is selected for each such prediction block. A transform is then applied to the prediction errors in transform blocks (TBs) or transform units (TUs), each covering either all prediction errors of the current block (the maximum TU size in HEVC is 32×32 samples) or a portion of them (the minimum TU size in HEVC is 4×4 samples), to obtain transform coefficients. The transform coefficients are then quantized and entropy encoded, e.g. by Context Adaptive Binary Arithmetic Coding (CABAC) in HEVC. HEVC also supports transform skip, in which the prediction errors are coded without a transform.
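The residual, transform, quantization and reconstruction steps described above can be sketched for a single 4×4 transform block. This is a minimal floating-point illustration under stated assumptions, not HEVC itself: HEVC uses integer approximations of the DCT and derives the quantization step from a quantization parameter, whereas this sketch uses an orthonormal DCT-II and a plain uniform rounding quantizer with a hypothetical `qstep` argument. Entropy coding (CABAC) is omitted; `levels` stands in for what would be entropy encoded, and all function names and sample values are illustrative.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k holds the k-th basis function."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def code_transform_block(original, prediction, qstep):
    """Residual -> 2-D transform -> quantize -> dequantize -> inverse -> reconstruct."""
    t = dct_matrix(len(original))
    residual = [[o - p for o, p in zip(ro, rp)]
                for ro, rp in zip(original, prediction)]
    coeffs = matmul(matmul(t, residual), transpose(t))       # separable: T X T^T
    levels = [[round(c / qstep) for c in row] for row in coeffs]  # would be entropy coded
    dequant = [[lv * qstep for lv in row] for row in levels]
    rec_residual = matmul(matmul(transpose(t), dequant), t)  # T orthonormal: T^-1 = T^T
    recon = [[p + r for p, r in zip(rp, rr)]
             for rp, rr in zip(prediction, rec_residual)]
    return levels, recon


# Hypothetical 4x4 block and a flat prediction of it.
original = [[52, 55, 61, 66],
            [63, 59, 55, 90],
            [62, 59, 68, 113],
            [63, 58, 71, 122]]
prediction = [[64] * 4 for _ in range(4)]
levels, recon = code_transform_block(original, prediction, qstep=10.0)
```

Because the transform is orthonormal, a rounding error of at most `qstep / 2` per coefficient bounds the per-sample reconstruction error of a 4×4 block by `2 * qstep`; a larger `qstep` yields fewer non-zero levels and a coarser reconstruction.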
A problem with transform coding is that it can produce visual artifacts in the form of transform basis patterns when strong quantization is used, e.g. at low bitrates. This problem is illustrated in FIGS. 1A and 1B. FIG. 1A illustrates a residual block of samples, each having a respective prediction error value, before transformation and quantization. FIG. 1B illustrates a reconstructed version of the residual block in FIG. 1A, obtained by transforming and quantizing the residual block, followed by dequantization and inverse transformation. Artifacts known as “ringing” are clearly visible in the top part of FIG. 1B.
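The ringing effect can be reproduced in one dimension. The sketch below is an illustrative assumption, not the procedure used to produce FIGS. 1A and 1B: it codes an 8-sample residual row containing a sharp edge with an orthonormal DCT-II and a coarse uniform quantizer (the hypothetical `qstep`). Coarse quantization zeroes or coarsely rounds the coefficients needed to represent the edge sharply, so the originally flat half of the row comes back as a sum of a few cosine basis functions that oscillates around zero instead of staying at zero.

```python
import math


def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    return [_scale(k, n) * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(x))
            for k in range(n)]


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k, v in enumerate(c))
            for i in range(n)]


def code_row(x, qstep):
    """Transform, quantize, dequantize and inverse transform one row."""
    return idct([round(c / qstep) * qstep for c in dct(x)])


residual = [0, 0, 0, 0, 100, 100, 100, 100]  # flat half followed by an edge
reconstructed = code_row(residual, qstep=80)
flat_part = reconstructed[:4]  # no longer all zeros: ringing
```

With a very small `qstep` the edge is reproduced almost exactly; with a coarse one only a few low-frequency basis functions survive, and their sum oscillates on both sides of the edge.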
In current video coding standards, all frequencies in a residual block are coded at the same block size. In practice, a large area in a picture or frame of a video stream may contain smooth gradients as well as local high-frequency details. An encoder then has to choose between small transform blocks, which risk coding the smooth gradient many times over, and a large transform block, which must still represent the local high frequencies. The former leads to inefficient coding, whereas the latter leads to ringing artifacts as shown in FIG. 1B.
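The trade-off can be made concrete with a 16-sample row coded either as one 16-point transform or as four independent 4-point transforms. In this sketch (orthonormal DCT-II, uniform rounding quantizer, hypothetical helper names and test signals), the number of non-zero quantized levels serves as a rough proxy for coding cost. A smooth ramp needs fewer levels with the large transform, because each small block has to re-code its own DC and slope; an isolated spike coded with the large transform spreads its quantization error to samples far from the spike, which the small blocks avoid.

```python
import math


def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    n = len(x)
    return [_scale(k, n) * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(x))
            for k in range(n)]


def idct(c):
    n = len(c)
    return [sum(_scale(k, n) * v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k, v in enumerate(c))
            for i in range(n)]


def code_in_blocks(x, block, qstep):
    """Code x as independent transform blocks; return (non-zero levels, reconstruction)."""
    nonzero, recon = 0, []
    for start in range(0, len(x), block):
        levels = [round(c / qstep) for c in dct(x[start:start + block])]
        nonzero += sum(1 for lv in levels if lv != 0)
        recon.extend(idct([lv * qstep for lv in levels]))
    return nonzero, recon


# Smooth gradient: one large block vs. four small blocks.
ramp = [float(n) for n in range(16)]
large_nz, _ = code_in_blocks(ramp, 16, qstep=1.0)
small_nz, _ = code_in_blocks(ramp, 4, qstep=1.0)

# Local high-frequency detail: error outside the spike's 4-sample neighborhood.
spike = [0.0] * 16
spike[5] = 100.0
_, large_rec = code_in_blocks(spike, 16, qstep=30.0)
_, small_rec = code_in_blocks(spike, 4, qstep=30.0)
far = [i for i in range(16) if not 4 <= i < 8]
large_far_err = max(abs(large_rec[i] - spike[i]) for i in far)
small_far_err = max(abs(small_rec[i] - spike[i]) for i in far)
```

Neither block size wins on both signals, which is the dilemma the encoder faces when one area contains both kinds of content.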
Alternatively, the encoder can choose not to use the transform and code the prediction errors of the residual block directly, without any transformation. However, this approach is relatively inefficient for natural video content, which typically contains spatial correlation between samples.
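The effect of spatial correlation on transform skip can be checked with a small count. The sketch below (orthonormal DCT-II, uniform rounding quantizer, non-zero level count as a crude cost proxy; all names and signals are hypothetical) compares quantizing the residual samples directly, as with transform skip, against quantizing DCT coefficients. For a smooth, correlated residual the transform concentrates the energy into a few coefficients, whereas for an isolated sample the transform spreads it over many coefficients and skipping the transform is cheaper.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(x)))
    return out


def nonzero_levels(values, qstep):
    """Number of values that survive uniform quantization with rounding."""
    return sum(1 for v in values if round(v / qstep) != 0)


qstep = 3.0
smooth = [10.0 + 2.0 * n for n in range(8)]  # spatially correlated residual
isolated = [0.0] * 8
isolated[3] = 12.0                           # a single uncorrelated error

skip_smooth = nonzero_levels(smooth, qstep)        # transform skip
dct_smooth = nonzero_levels(dct(smooth), qstep)    # with transform
skip_isolated = nonzero_levels(isolated, qstep)
dct_isolated = nonzero_levels(dct(isolated), qstep)
```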
U.S. Pat. No. 8,077,991 discloses a technique that combines the energy compaction of transform coding with the localization properties of spatial coding. In more detail, transform coding is applied to the prediction errors of a residual block to create a first representation of the prediction errors, and spatial coding is applied to the same prediction errors to create a second representation. The two representations are joined to form a coded prediction error signal.
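A two-layer scheme in the spirit of the technique described above can be sketched as follows. This is a hypothetical illustration, not the method claimed in U.S. Pat. No. 8,077,991: the first representation keeps only the two lowest-frequency quantized DCT coefficients, the second representation quantizes the remaining error directly in the sample domain and drops small levels (an assumed threshold, `min_level`), and the decoder adds the two. With `min_level=2`, a level is only kept where the remaining error is at least 1.5 quantization steps, so the spatial layer never makes a sample worse, and it sends only a few samples.

```python
import math


def _scale(k, n):
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(x):
    n = len(x)
    return [_scale(k, n) * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, v in enumerate(x))
            for k in range(n)]


def idct(c):
    n = len(c)
    return [sum(_scale(k, n) * v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k, v in enumerate(c))
            for i in range(n)]


def transform_representation(residual, qstep, keep):
    """Coarse transform layer: quantize, keep only the first `keep` coefficients."""
    levels = [round(c / qstep) if k < keep else 0 for k, c in enumerate(dct(residual))]
    return idct([lv * qstep for lv in levels])


def spatial_representation(error, qstep, min_level=2):
    """Sample-domain layer: quantize the remaining error, drop small levels."""
    levels = [round(e / qstep) for e in error]
    return [lv * qstep if abs(lv) >= min_level else 0.0 for lv in levels]


residual = [2.0 * n for n in range(8)]  # smooth gradient ...
residual[5] += 40.0                      # ... plus a local high-frequency detail
first = transform_representation(residual, qstep=4.0, keep=2)
second = spatial_representation([r - f for r, f in zip(residual, first)], qstep=4.0)
joined = [f + s for f, s in zip(first, second)]  # decoder adds both representations

transform_only_err = max(abs(r - f) for r, f in zip(residual, first))
joined_err = max(abs(r - j) for r, j in zip(residual, joined))
```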
There is therefore a need for improvements in transform block encoding and decoding.