The present invention relates to video-signal layered coding and decoding methods, apparatuses, and programs.
Several spatial-resolution, temporal-resolution, and SNR (signal-to-noise ratio) scalable video coding schemes have been proposed and employed in a variety of fields. In particular, spatial-resolution scalable video coding schemes are the most applicable to still and moving pictures.
A known spatial-resolution scalable video coding scheme with two layers, a base layer and an enhancement layer, is disclosed in, for example, Japanese Unexamined Patent Publication No. 2007-162870. In a coding apparatus, an input video signal having the spatial resolution of the enhancement layer is decimated into a signal having the spatial resolution of the base layer. The decimated signal is then coded at the base layer (base-layer coding). The decoded signal produced in the base-layer coding is spatially interpolated to give a signal having the spatial resolution of the enhancement layer, and prediction is performed using the correlation between this interpolated signal and the input video signal. A predictive error signal produced in the prediction is coded into a bitstream. This bitstream and the other bitstreams produced in the base-layer coding are multiplexed. The multiplexed bitstream is sent to a decoding apparatus and decoded in the reverse order.
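The two-layer flow described above can be sketched in a few lines. This is only an illustrative model under stated assumptions, not the method of the cited publication: the video is reduced to a one-dimensional list of samples whose length is a multiple of the decimation factor, base-layer and enhancement-layer coding are stood in for by uniform quantization, decimation is pair averaging, interpolation is sample repetition, and multiplexing is a plain tuple. All function names and parameters are hypothetical.

```python
def decimate(signal, factor=2):
    """Average each group of `factor` samples (crude low-pass plus downsample)."""
    return [sum(signal[i:i + factor]) / factor
            for i in range(0, len(signal) - factor + 1, factor)]


def quantize(signal, step):
    """Uniform quantization, an assumed stand-in for lossy coding."""
    return [round(x / step) for x in signal]


def dequantize(levels, step):
    """Map quantization levels back to sample values."""
    return [q * step for q in levels]


def interpolate(signal, factor=2):
    """Repeat each sample `factor` times to reach enhancement resolution."""
    return [x for x in signal for _ in range(factor)]


def encode(video, base_step, enh_step, factor=2):
    """Return (base_levels, enh_levels), standing in for the multiplexed bitstream."""
    # Base-layer coding of the decimated input.
    base_levels = quantize(decimate(video, factor), base_step)
    # Decoded base-layer signal, interpolated to serve as the predictive signal.
    prediction = interpolate(dequantize(base_levels, base_step), factor)
    # Predictive error signal, coded at the enhancement layer.
    residual = [v - p for v, p in zip(video, prediction)]
    return base_levels, quantize(residual, enh_step)


def decode(base_levels, enh_levels, base_step, enh_step, factor=2):
    """Reverse of encode: rebuild the prediction, then add the decoded residual."""
    prediction = interpolate(dequantize(base_levels, base_step), factor)
    residual = dequantize(enh_levels, enh_step)
    return [p + r for p, r in zip(prediction, residual)]


if __name__ == "__main__":
    video = [10, 12, 20, 24, 30, 28, 16, 8]
    base_levels, enh_levels = encode(video, base_step=4, enh_step=1)
    print(decode(base_levels, enh_levels, base_step=4, enh_step=1))
```

In this sketch the decoder can reconstruct the base layer from `base_levels` alone, and the enhancement layer only has to carry whatever the interpolated base-layer signal failed to predict.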
In the known spatial-resolution scalable video coding scheme, the decoded signal produced in the base-layer coding is interpolated and used as a predictive signal in enhancement-layer coding. This is because the input video signal at the enhancement layer and the decoded signal at the base layer are correlated to some degree. In other words, the base-layer decoded signal has some of the high-frequency components carried by the enhancement-layer input video signal.
In theory, the higher the correlation between the input video signal at the enhancement layer and the decoded signal at the base layer, the higher the coding efficiency. In reality, however, the decoded signal at the base layer has a lower spatial resolution because the input video signal has been decimated, and it therefore lacks high-frequency components of the input video signal. Moreover, a larger quantization step size could give a decoded signal having an even lower correlation with the input video signal.
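The two effects noted above, loss from decimation and loss from coarse quantization, can be observed on a toy signal. The following is a minimal sketch under assumptions that are not part of the known scheme: a one-dimensional sample list, decimation by pair averaging, uniform quantization as the base-layer coder, and sample repetition as interpolation. The function name `prediction_correlation` is hypothetical.

```python
from math import sqrt


def prediction_correlation(video, step, factor=2):
    """Pearson correlation between the input and the interpolated base-layer decode."""
    # Decimate by averaging groups of `factor` samples (assumed low-pass filter).
    base = [sum(video[i:i + factor]) / factor
            for i in range(0, len(video) - factor + 1, factor)]
    # Quantize and dequantize with the given step (assumed base-layer coding).
    decoded = [round(x / step) * step for x in base]
    # Interpolate by sample repetition to enhancement-layer resolution.
    prediction = [x for x in decoded for _ in range(factor)]

    n = len(prediction)
    mean_v = sum(video[:n]) / n
    mean_p = sum(prediction) / n
    cov = sum((v - mean_v) * (p - mean_p) for v, p in zip(video, prediction))
    var_v = sum((v - mean_v) ** 2 for v in video[:n])
    var_p = sum((p - mean_p) ** 2 for p in prediction)
    return cov / sqrt(var_v * var_p)


if __name__ == "__main__":
    video = [10, 12, 20, 24, 30, 28, 16, 8]
    for step in (1, 16):
        print(step, prediction_correlation(video, step))
```

For this particular signal the correlation is roughly 0.95 with a step size of 1 and roughly 0.78 with a step size of 16. Even the fine step stays below 1, since the detail removed by decimation cannot be restored by interpolation alone.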
It is thus required, in addition to interpolating the base-layer decoded signal, to perform an estimation procedure (a spatial-resolution enhancing procedure) so that the predictive signal has a higher correlation with the input video signal in terms of spatial resolution.