The present invention relates to layered coding and decoding methods, apparatuses, and programs for moving pictures based on spatial-domain, time-domain, and interlayer correlation.
Several scalable video coding schemes, offering spatial-domain resolution, time-domain resolution, or SNR (signal-to-noise ratio) scalability, have been proposed and employed in a variety of fields. In particular, spatial-domain resolution scalable video coding schemes are most applicable to still and moving pictures.
A spatial-domain resolution scalable layered video coding scheme produces interlayer predictive signals for use in coding enhancement layers, which exhibit higher spatial resolutions than the base layers that are also to be coded. The interlayer predictive signals are produced through enhancement-layer motion compensation and time-domain prediction, together with base-layer spatial decimation and interpolation.
The known layered video coding scheme, however, suffers from the elimination, through base-layer spatial decimation, of high-frequency components that are required in production of interlayer predictive signals.
In detail, the layered video coding scheme employs interframe prediction in enhancement-layer coding based on correlation between the video frames subjected to enhancement-layer coding and the locally decoded frames produced by decoding the base-layer coded frames.
In this procedure, high-frequency components, like those carried by the video frames to be subjected to enhancement-layer coding, are eliminated from the video frames subjected to base-layer coding. The elimination may occur due to the bandwidth limitation applied when base-layer frames are produced with a spatial scale-down procedure. It may also occur through a coding procedure, e.g., quantization, when such high-frequency components are treated as less important components in base-layer coding.
The elimination of high-frequency components makes predictive coding insufficient, so that the enhancement-layer predictive frames produced from the correlation between enhancement and base layers are inaccurate. This occurs mostly in zones of a video frame that carry high-frequency components continuously, such as an edge portion, because such high-frequency components are eliminated from the video frames subjected to base-layer coding, as discussed above.
Even when predictive coding is performed (although insufficiently) despite the elimination of such high-frequency components through base-layer spatial decimation, another problem arises: enhancement-layer predictive-coded frames carrying high-frequency components suffer an increase in signal level or data amount, especially in zones of a video frame that carry high-frequency components continuously, compared to zones that mainly carry low-frequency components.
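The effect described above can be illustrated with a minimal one-dimensional sketch. It is not the coding scheme itself: the pair-averaging decimation filter, the sample-repetition interpolation, the sample values, and all function names are assumptions chosen only for illustration, and the row length is assumed to be even. The sketch forms an interlayer prediction by decimating and then interpolating a row of samples, and compares the prediction residual in a flat zone with that around an edge.

```python
def decimate(frame_row):
    """Hypothetical 2:1 spatial decimation: average each pair of samples."""
    return [(frame_row[i] + frame_row[i + 1]) / 2
            for i in range(0, len(frame_row), 2)]


def interpolate(base_row):
    """Hypothetical 1:2 spatial interpolation: repeat each base-layer sample."""
    out = []
    for value in base_row:
        out.extend([value, value])
    return out


def interlayer_residual(frame_row):
    """Difference between the original row and its interlayer prediction."""
    prediction = interpolate(decimate(frame_row))
    return [s - p for s, p in zip(frame_row, prediction)]


def energy(values):
    """Sum of squared sample values, used here as a rough signal-level measure."""
    return sum(v * v for v in values)


# Illustrative row: a flat dark zone followed by a flat bright zone,
# with the edge falling between samples 4 and 5.
row = [10] * 5 + [200] * 5

residual = interlayer_residual(row)
flat_energy = energy(residual[:4])   # zone carrying only low-frequency content
edge_energy = energy(residual[4:6])  # the two samples around the edge

print(residual)
print(flat_energy, edge_energy)
```

In this sketch the residual is zero everywhere except at the two samples around the edge, so all of the residual energy that the enhancement layer would have to code is concentrated in the edge zone.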