The present invention relates to a moving-picture temporal scalable coding method and a moving-picture temporal scalable coding apparatus, a moving-picture temporal scalable decoding method and a moving-picture temporal scalable decoding apparatus, and also a computer program for performing the coding or the decoding method.
Moving-picture coding is classified into simple one-layer coding and scalable coding, which encodes a two-layer bitstream. Scalable coding allows a decoder either to decode the base-layer bitstream alone or to decode the enhancement-layer bitstream as well, in which case the decoded base-layer and enhancement-layer pictures are combined to reproduce high-quality pictures.
Scalable coding is classified into SNR (Signal-to-Noise Ratio), spatial, and temporal scalable coding. Temporal scalable coding decimates, for example, a 60-fps (fields per second) interlaced image field by field to obtain a 30-fps image and encodes this 30-fps image. The remaining, non-encoded fields are predicted by using a locally decoded image of the encoded fields, and the prediction residuals are encoded.
In known moving-picture temporal scalable coding, a 60-fps interlaced moving-picture video signal is divided into even-number fields and odd-number fields.
The even-number fields are subjected to coding while the odd-number fields are subjected to delay.
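The field division described above can be sketched as follows. This is only an illustration: the function name `split_fields` and the use of a plain list to hold the fields in display order are assumptions, not part of the known apparatus.

```python
def split_fields(fields):
    """Split a field sequence (display order) into even- and odd-number fields."""
    even = fields[0::2]  # fields 0, 2, 4, ...: sent to coding (30 fps)
    odd = fields[1::2]   # fields 1, 3, 5, ...: sent to the delay (30 fps)
    return even, odd


fields = [f"field{i}" for i in range(6)]  # six fields of a 60-fps signal
even, odd = split_fields(fields)
print(even)  # ['field0', 'field2', 'field4']
print(odd)   # ['field1', 'field3', 'field5']
```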
In coding, a video signal carrying 30-fps even-number fields is coded into a bitstream and quantization resultants (signal components that have at least been quantized but have not been formed into a bitstream). The coding technique may be MPEG inter-picture predictive coding or intra-field coding.
The quantization resultants are subjected to local decoding to be reproduced into a locally decoded picture. The locally decoded picture is subjected to inter-picture prediction to produce a predictive signal for each odd-number field.
In delaying, each odd-number field is delayed until the predictive signal is produced based on each even-number field, as explained above.
The predictive signal is subtracted from an odd-number-field delayed signal to obtain a prediction residual.
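A minimal sketch of the prediction and subtraction steps is given below. The predictor shown, a rounded average of two neighbouring locally decoded even-number fields without motion compensation, is a simplifying assumption chosen for illustration; the names `predict_field` and `prediction_residual` are hypothetical.

```python
def predict_field(prev_even, next_even):
    """Hypothetical predictor: rounded average of two decoded even-number fields."""
    return [[(a + b + 1) // 2 for a, b in zip(row_p, row_n)]
            for row_p, row_n in zip(prev_even, next_even)]


def prediction_residual(delayed_odd, prediction):
    """Subtract the predictive signal from the delayed odd-number field."""
    return [[o - p for o, p in zip(row_o, row_p)]
            for row_o, row_p in zip(delayed_odd, prediction)]


prev_even = [[100, 102], [104, 106]]
next_even = [[104, 106], [108, 110]]
delayed_odd = [[103, 104], [105, 109]]

prediction = predict_field(prev_even, next_even)
print(prediction)                                    # [[102, 104], [106, 108]]
print(prediction_residual(delayed_odd, prediction))  # [[1, 0], [-1, 1]]
```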
The prediction residual is subjected to DCT (Discrete Cosine Transform). The resultant 8×8 DCT coefficients are subjected to quantization at a given step width. The resultant fixed-length coefficients (prediction residual) are subjected to variable-length coding to obtain a bitstream.
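The transform and quantization steps can be illustrated with the sketch below. It assumes an orthonormal 8×8 DCT-II and a single uniform step width applied to every coefficient; the function names are hypothetical, and the variable-length coding stage is omitted.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of an N x N block."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            acc = 0.0
            for x in range(N):
                for y in range(N):
                    acc += (block[x][y]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * acc
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide by the step width and round to an integer."""
    return [[int(round(c / step)) for c in row] for row in coeffs]


# A flat residual block of value 4 has only a DC coefficient (8 * 4 = 32).
residual = [[4] * N for _ in range(N)]
levels = quantize(dct_2d(residual), step=8)
print(levels[0][0])  # 4
```

With a step width of 8, the DC coefficient 32 becomes level 4 and all other levels are zero, which is the kind of sparse output that variable-length coding then exploits.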
This bitstream is multiplexed with the bitstream already obtained from the even-number fields, as an output moving-picture bitstream under temporal scalable coding.
In summary, under the known temporal scalable coding, an interlaced moving-picture video signal is divided into even-number fields and odd-number fields. The even-number fields are converted into a base-layer bitstream while the odd-number fields are converted into an enhancement-layer bitstream, or vice versa.
The base-layer bitstream and the enhancement-layer bitstream are multiplexed with each other to form an output moving-picture bitstream under temporal scalable coding, as illustrated in FIG. 1.
In FIG. 1, each label “field” indicates one field of an interlaced video. The number attached to each label indicates the order in which the pictures are coded. Base-layer pictures are coded before enhancement-layer pictures for bi-directional prediction of the enhancement-layer pictures, even though the former pictures come after the latter pictures in the time domain. The reverse order is further required among the base-layer pictures when bi-directional prediction is performed for these pictures.
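One possible coding order consistent with this description can be sketched as follows. It assumes that even-number fields form the base layer, that each odd-number field is predicted from the two neighbouring even-number fields, and that base-layer fields themselves use forward prediction only. The function `coding_order` is hypothetical, and the result is not necessarily the order shown in FIG. 1.

```python
def coding_order(num_fields):
    """Return display indices of fields in an assumed coding order."""
    order = []
    for i in range(0, num_fields, 2):
        order.append(i)          # base-layer (even-number) field
        if i >= 2:
            order.append(i - 1)  # enhancement field lying between i-2 and i
    if num_fields > 0 and num_fields % 2 == 0:
        # Last odd-number field has no later base field; code it last.
        order.append(num_fields - 1)
    return order


print(coding_order(6))  # [0, 2, 1, 4, 3, 5]
```

Field 2 is coded before field 1 even though it is displayed later, because field 1 needs both field 0 and field 2 as prediction references.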
In known moving-picture temporal scalable decoding, a moving-picture bitstream obtained from a 60-fps interlaced moving-picture video signal by temporal scalable coding is divided into a base-layer bitstream, an enhancement-layer bitstream, and a scale factor.
The base-layer bitstream is decoded so that a 30-fps video signal is reproduced. The reproduced signal carries even-number fields of the 60-fps interlaced moving-picture video signal. The reproduced signal is subjected to inter-picture prediction to produce a prediction signal for odd-number fields of the interlaced moving-picture video signal.
The enhancement-layer bitstream is subjected to variable-length decoding so that the variable-length codes of the prediction residual are reconverted into fixed-length codes.
The fixed-length codes are subjected to dequantization at a given quantization parameter to be reproduced into DCT coefficients of prediction residual.
The DCT coefficients are subjected to inverse DCT so that 8×8 DCT coefficients are converted into a decoded prediction-residual signal.
The decoded prediction-residual signal is added to the prediction signal already produced to form a 30-fps decoded video signal. This decoded signal carries the odd-number fields of the 60-fps interlaced moving-picture video signal.
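The enhancement-layer reconstruction path (dequantization, inverse DCT, and addition of the prediction signal) can be sketched as below. The sketch assumes an orthonormal 8×8 inverse DCT, a single uniform step width, and 8-bit samples clipped to 0..255; all function names are hypothetical, and variable-length decoding is omitted.

```python
import math

N = 8  # block size


def dequantize(levels, step):
    """Scale quantized levels back to DCT coefficients."""
    return [[lv * step for lv in row] for row in levels]


def idct_2d(coeffs):
    """Orthonormal two-dimensional inverse DCT of an N x N block."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for x in range(N):
        for y in range(N):
            acc = 0.0
            for u in range(N):
                for v in range(N):
                    acc += (scale(u) * scale(v) * coeffs[u][v]
                            * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                            * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[x][y] = acc
    return out


def reconstruct(levels, step, prediction):
    """Add the decoded residual to the prediction; clip to 8 bits (assumed)."""
    residual = idct_2d(dequantize(levels, step))
    return [[min(255, max(0, int(round(p + r)))) for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(prediction, residual)]


levels = [[0] * N for _ in range(N)]
levels[0][0] = 4                          # only a DC level is present
prediction = [[100] * N for _ in range(N)]
decoded = reconstruct(levels, step=8, prediction=prediction)
print(decoded[0][0])  # 104
```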
The odd-number fields of the 30-fps decoded video signal and the even-number fields of the 30-fps video signal are selected in synchronism with the scale factor. The latter video signal carrying the even-number fields has already been decoded and delayed until the former video signal is decoded.
The odd-/even-number field selection reproduces the 60-fps interlaced moving-picture video signal.
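The field selection can be pictured as a simple interleaving of the two 30-fps field sequences, as in the sketch below. It assumes that both sequences are already decoded, are in display order, and hold the same number of fields; the function name `interleave_fields` is hypothetical, and the role of the scale factor is not modelled.

```python
def interleave_fields(even_fields, odd_fields):
    """Alternate even- and odd-number fields to rebuild the 60-fps sequence."""
    output = []
    for even, odd in zip(even_fields, odd_fields):
        output.append(even)  # delayed, already-decoded base-layer field
        output.append(odd)   # enhancement-layer field decoded afterwards
    return output


even_fields = ["field0", "field2", "field4"]
odd_fields = ["field1", "field3", "field5"]
print(interleave_fields(even_fields, odd_fields))
# ['field0', 'field1', 'field2', 'field3', 'field4', 'field5']
```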
As explained, under the known temporal scalable coding, an interlaced moving-picture video signal is divided into even-number fields and odd-number fields. The even-number fields are converted into a base-layer bitstream while the odd-number fields are converted into an enhancement-layer bitstream, or vice versa.
The known temporal scalable coding, however, has several drawbacks.
Base-layer coding causes many prediction errors in motion-compensated inter-picture prediction because field pictures contain many aliasing components.
Enhancement-layer coding suffers from inaccurate inter-picture prediction because the fields to be coded differ in parity (even/odd) from the prediction reference pictures.
These two factors drastically lower the coding efficiency of the known temporal scalable coding compared to other coding techniques.