The present invention concerns scalable video coding.
In non-scalable coding, intra coding refers to coding techniques that do not reference data of already coded pictures, but exploit only data (e.g., reconstructed samples, coding modes, or symbol statistics) of already coded parts of the current picture. Intra-coded pictures (or intra pictures) are used, for example, in broadcast bitstreams in order to allow decoders to tune into a bitstream at so-called random access points. Intra pictures are also used to limit error propagation in error-prone environments. In general, the first picture of a coded video sequence has to be coded as an intra picture, since no picture is yet available that can be used as a reference picture. Often, intra pictures are also used at scene cuts, where temporal prediction typically cannot provide a suitable prediction signal.
Furthermore, intra coding modes are also used for particular areas/blocks in so-called inter pictures, where they might perform better in terms of rate-distortion efficiency than inter coding modes. This is often the case in flat regions as well as in regions where temporal prediction performs rather poorly (occlusions, partial dissolves, or fading objects).
In scalable coding, the concept of intra coding (coding of intra pictures and coding of intra blocks in inter pictures) can be extended to all pictures that belong to the same access unit or time instant. Therefore, intra coding modes for a spatial or quality enhancement layer can also make use of inter-layer prediction from a lower layer picture at the same time instant to increase the coding efficiency. That means that not only already coded parts inside the current enhancement layer picture can be used for intra prediction, but also already coded lower layer pictures at the same time instant can be exploited. The latter concept is also referred to as inter-layer intra prediction.
In the state-of-the-art hybrid video coding standards (such as H.264/AVC or HEVC), the pictures of a video sequence are divided into blocks of samples. The block size can either be fixed or the coding approach can provide a hierarchical structure which allows blocks to be further subdivided into blocks with smaller block sizes. The reconstruction of a block is typically obtained by generating a prediction signal for the block and adding a transmitted residual signal. The residual signal is typically transmitted using transform coding, which means that the quantization indices for transform coefficients (also referred to as transform coefficient levels) are transmitted using entropy coding techniques, and at the decoder side, these transmitted transform coefficient levels are scaled and inverse transformed to obtain the residual signal, which is added to the prediction signal. The prediction signal is generated either by intra prediction (using only already transmitted data for the current time instant) or by inter prediction (using already transmitted data for different time instants).
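The decoder-side reconstruction described above can be sketched as follows. This is a minimal illustration only: it assumes a uniform quantization step size (qstep) and a floating-point orthonormal inverse DCT, whereas H.264/AVC and HEVC specify integer transforms and more elaborate scaling. The function names idct_2d and reconstruct_block are hypothetical.

```python
import math


def idct_2d(coeffs):
    """Orthonormal 2-D inverse DCT of a square block (list of lists)."""
    n = len(coeffs)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            s = 0.0
            for v in range(n):
                for u in range(n):
                    s += (alpha(u) * alpha(v) * coeffs[v][u]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[y][x] = s
    return out


def reconstruct_block(prediction, levels, qstep, bit_depth=8):
    """Reconstruction = prediction + inverse transform of the scaled levels."""
    n = len(levels)
    # Scale the transmitted transform coefficient levels (assumed uniform step).
    coeffs = [[levels[v][u] * qstep for u in range(n)] for v in range(n)]
    residual = idct_2d(coeffs)
    max_val = (1 << bit_depth) - 1
    recon = []
    for y in range(n):
        row = []
        for x in range(n):
            value = int(math.floor(prediction[y][x] + residual[y][x] + 0.5))
            row.append(min(max_val, max(0, value)))  # clip to the sample range
        recon.append(row)
    return recon
```

For example, with a flat 4×4 prediction of 100, a single DC level of 2 and qstep equal to 8, the residual is a constant 4 and every reconstructed sample equals 104.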
If inter prediction is used, the prediction block is derived by motion-compensated prediction using samples of already reconstructed frames. This can be done by unidirectional prediction (using one reference picture and one set of motion parameters), or the prediction signal can be generated by multi-hypothesis prediction. In the latter case, two or more prediction signals are superimposed, i.e., for each sample, a weighted average is constructed to form the final prediction signal. The multiple prediction signals (which are superimposed) can be generated by using different motion parameters for the different hypotheses (e.g., different reference pictures or motion vectors). For unidirectional prediction, it is also possible to multiply the samples of the motion-compensated prediction signal by a constant factor and add a constant offset in order to form the final prediction signal. Such a scaling and offset correction can also be used for all or selected hypotheses in multi-hypothesis prediction.
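The superposition of hypotheses, together with the scaling and offset correction, can be sketched as follows. The sketch assumes that the motion-compensated prediction blocks have already been fetched from the reference pictures; the function name weighted_prediction and its parameters are hypothetical, and real codecs use fixed-point weights rather than floating-point ones.

```python
import math


def weighted_prediction(hypotheses, weights, offset=0.0, bit_depth=8):
    """Form the final prediction as a weighted sum of hypotheses plus an offset.

    hypotheses: list of equally sized sample blocks (lists of lists).
    weights:    one weight per hypothesis.
    """
    if len(hypotheses) != len(weights):
        raise ValueError("one weight per hypothesis is required")
    height = len(hypotheses[0])
    width = len(hypotheses[0][0])
    max_val = (1 << bit_depth) - 1
    pred = []
    for y in range(height):
        row = []
        for x in range(width):
            acc = sum(w * h[y][x] for h, w in zip(hypotheses, weights)) + offset
            value = int(math.floor(acc + 0.5))       # round to nearest integer
            row.append(min(max_val, max(0, value)))  # clip to the sample range
        pred.append(row)
    return pred


# Two hypotheses averaged with equal weights (bi-prediction).
bi_pred = weighted_prediction([[[100, 120]], [[110, 100]]], [0.5, 0.5])

# One hypothesis with a constant scaling factor and offset.
uni_pred = weighted_prediction([[[100, 120]]], [0.5], offset=10)
```

Here bi_pred is [[105, 110]] and uni_pred is [[60, 70]].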
In current state-of-the-art video coding techniques, the intra prediction signal for a block is obtained by predicting samples from the spatial neighborhood of the current block (which was reconstructed before the current block according to the block processing order). In the most recent standards, various prediction methods are utilized that perform prediction in the spatial domain. There are fine-granular directional prediction modes in which filtered or unfiltered samples of neighboring blocks are extended along a specific angle to generate the prediction signal. Furthermore, there are also plane-based and DC-based prediction modes that use neighboring block samples to generate flat prediction planes or DC prediction blocks.
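Three simple spatial-domain modes can be sketched as follows: a DC mode and the two axis-aligned directional modes (vertical and horizontal). This is an illustrative simplification; the standards define many more angles, sub-sample interpolation, and reference sample filtering, none of which is shown. The function name intra_predict and the mode strings are hypothetical.

```python
def intra_predict(top, left, mode):
    """Predict an n x n block from reconstructed neighboring samples.

    top:  the n samples directly above the block.
    left: the n samples directly to the left of the block.
    """
    n = len(top)
    if len(left) != n:
        raise ValueError("top and left must have the same length")
    if mode == "dc":
        # Rounded mean of all neighboring samples fills the whole block.
        dc = (sum(top) + sum(left) + n) // (2 * n)
        return [[dc] * n for _ in range(n)]
    if mode == "vertical":
        # Each column repeats the sample above it.
        return [list(top) for _ in range(n)]
    if mode == "horizontal":
        # Each row repeats the sample to its left.
        return [[left[y]] * n for y in range(n)]
    raise ValueError("unknown mode: " + mode)
```

With top samples [10, 20, 30, 40] and left samples [50, 60, 70, 80], the DC mode yields a block in which every sample equals 45.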
In older video coding standards (e.g., H.263, MPEG-4), intra prediction was performed in the transform domain. In this case, the transmitted coefficients were inverse quantized, and for a subset of the transform coefficients, the transform coefficient value was predicted using the corresponding reconstructed transform coefficient of a neighboring block. The inverse quantized transform coefficients were added to the predicted transform coefficient values, and the reconstructed transform coefficients were used as input to the inverse transform. The output of the inverse transform formed the final reconstructed signal for a block.
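The transform-domain prediction described above can be sketched as follows. The sketch assumes a uniform quantization step size and leaves the choice of predicted positions to the caller; the actual standards additionally specify how the neighboring block is selected and how its coefficients are scaled, which is omitted here. The function name and parameters are hypothetical.

```python
def reconstruct_coefficients(levels, qstep, neighbor_coeffs, predicted_positions):
    """Reconstruct transform coefficients with transform-domain prediction.

    levels:              transmitted quantization indices of the current block.
    neighbor_coeffs:     reconstructed coefficients of the neighboring block.
    predicted_positions: (row, column) positions whose values are predicted.
    """
    n = len(levels)
    # Inverse quantization (assumed uniform step size).
    coeffs = [[levels[v][u] * qstep for u in range(n)] for v in range(n)]
    # Add the predictor for the subset of predicted coefficients only.
    for v, u in predicted_positions:
        coeffs[v][u] += neighbor_coeffs[v][u]
    return coeffs


# Predict the DC coefficient and the first row of AC coefficients.
positions = [(0, 0), (0, 1)]
current = reconstruct_coefficients([[1, 0], [2, 0]], 4, [[80, 6], [3, 1]], positions)
```

Here current is [[84, 6], [8, 0]]; the result would then be passed to the inverse transform to obtain the reconstructed block.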
In scalable video coding, the base layer information can also be utilized to support the prediction process for the enhancement layer. In the state-of-the-art video coding standard for scalable coding, the SVC extension of H.264/AVC, there is one additional mode for improving the coding efficiency of the intra prediction process in an enhancement layer. This mode is signaled at the macroblock level (a macroblock being a block of 16×16 luma samples) and is only supported if the co-located samples in the lower layer are coded using an intra prediction mode. If this mode is selected for a macroblock in a quality enhancement layer, the prediction signal is formed from the co-located samples of the reconstructed lower layer signal before the deblocking filter operation. If the inter-layer intra prediction mode is selected in a spatial enhancement layer, the prediction signal is generated by upsampling the co-located reconstructed base layer signal (after the deblocking filter operation). For upsampling, FIR filters are used. In general, for the inter-layer intra prediction mode, an additional residual signal is transmitted by transform coding. The transmission of the residual signal can also be omitted (the residual then being inferred to be equal to zero) if this is correspondingly signaled inside the bitstream. The final reconstruction signal is obtained by adding the reconstructed residual signal (obtained by scaling the transmitted transform coefficient levels and applying an inverse spatial transform) to the prediction signal.
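Inter-layer intra prediction in a spatial enhancement layer can be sketched as follows for a 2:1 resolution ratio. The 2-tap averaging filter used here is an assumption chosen for brevity and is not the FIR filter specified by SVC; the function names are hypothetical. A residual of None models the case in which the residual is inferred to be equal to zero.

```python
def upsample_row_2x(row):
    """Upsample a 1-D sample row by 2 with a 2-tap averaging FIR filter."""
    out = []
    for i, s in enumerate(row):
        nxt = row[min(i + 1, len(row) - 1)]  # replicate the last sample at the border
        out.append(s)                        # even phase: copy the sample
        out.append((s + nxt + 1) >> 1)       # odd phase: rounded average
    return out


def upsample_2x(block):
    """Separable 2x upsampling: first along rows, then along columns."""
    rows = [upsample_row_2x(r) for r in block]
    cols = [upsample_row_2x(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def inter_layer_intra_reconstruct(base_recon, residual=None, bit_depth=8):
    """Prediction = upsampled base layer; reconstruction = prediction + residual."""
    pred = upsample_2x(base_recon)
    if residual is None:  # residual inferred to be zero
        return pred
    max_val = (1 << bit_depth) - 1
    return [[min(max_val, max(0, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, residual)]
```

For a 2×2 base layer block [[10, 20], [30, 40]], the upsampled prediction is [[10, 15, 20, 20], [20, 25, 30, 30], [30, 35, 40, 40], [30, 35, 40, 40]].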
However, it would be favorable to be able to achieve a higher coding efficiency in scalable video coding.