The invention relates to a method and an apparatus for coding and decoding a picture sequence.
Motion-compensated hybrid codecs (an encoder with a simulated decoder contained therein) are predominantly used in data compression for moving picture sequences, for example in the MPEG1 or MPEG2 standard. By virtue of the regular insertion of intraframe-coded pictures (I frames), these compression methods enable access to any desired individual picture in the entire bit stream, or playback of the bit stream from virtually any desired location. An intraframe-coded picture can be decoded on its own from the associated data and does not require any data from other pictures for its reconstruction. In contrast, interframe-coded pictures (P frames) cannot be decoded on their own but require in each case at least one reference picture for their reconstruction. This reference picture (anchor frame) must already have been decoded beforehand.
By virtue of the insertion of intraframe-coded pictures into a video bit stream, each picture of a picture sequence can be decoded, starting from such an intraframe-coded picture, without the video bit stream of the entire picture sequence having to be decoded. Each intraframe-coded picture can be decoded immediately, and each interframe-coded picture can be decoded by decoding the chronologically nearest preceding intraframe-coded picture and subsequently decoding the picture sequence as far as the desired interframe-coded target picture.
A disadvantage of such a method is the high bit outlay necessary for intraframe-coded pictures. The ratio of the bits required for intraframe-coded pictures I to the bits required for singly forward-predicted interframe-coded pictures P is approximately 10:1 in the typical MPEG2 format (M=3; N=12). In this case, N is the interval separating one intraframe-coded picture from the next, and M is the interval separating an I picture and the succeeding P picture (or vice versa) and the interval separating one P picture and the next P picture. Situated in between there may be B pictures, which may be bi-directionally predicted for example.
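The meaning of M and N can be illustrated with a small sketch in Python. The function name `gop_pattern` is hypothetical, and the sketch assumes display order with every picture between two anchor pictures coded as a B picture; neither detail is prescribed by the text above.

```python
def gop_pattern(n, m):
    """Return the picture types of one group of n pictures in display order.

    n: interval from one I picture to the next.
    m: interval between successive anchor pictures (I or P).
    Assumption: every non-anchor picture is coded as a B picture.
    """
    types = []
    for index in range(n):
        if index == 0:
            types.append("I")      # the group starts with an intraframe-coded picture
        elif index % m == 0:
            types.append("P")      # every m-th picture is a forward-predicted anchor
        else:
            types.append("B")      # pictures situated between two anchors
    return "".join(types)


print(gop_pattern(12, 3))
```

For M=3 and N=12 this gives one I picture, three P pictures and eight B pictures per group.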
The invention is based on the object of specifying a method for picture sequence coding and decoding in which it is possible to dispense with such I pictures or with their relatively frequent transmission, but decoding from virtually any desired location in a picture sequence can nevertheless take place.
The invention is based on the further object of specifying an apparatus for coding and for decoding a picture sequence with application of the method according to the invention.
As mentioned above, a codec usually contains a simulation of the receiver-end decoder. Coding errors of that decoder can thus also be taken into account by the encoder. The decoder simulation is usually arranged in a feedback loop of the hybrid codec. By virtue of the inventive insertion of an attenuation element into this feedback loop, the insertion and coding and the receiver-end decoding of intraframe-coded I pictures become superfluous. The attenuation element effects a decrease in the amplitude values of predicted coefficients. The inventive attenuation in the feedback path of the hybrid codec therefore initially effects an artificial, actually undesirable deterioration in the prediction and leads to an increase in the prediction error to be coded. If the attenuation inserted in the feedback path is chosen suitably in terms of its size, however, then it is surprisingly possible to dispense entirely with the use of intraframe-coded pictures, without thereby losing the property of being able to decode the video bit stream at virtually any desired location.
The receiver-end decoding is started at any desired location in the bit stream or in a picture sequence as follows: A grey-scale picture, preferably a grey-scale picture of average brightness, is used as the first prediction picture or as the reference picture. The first prediction error signal decoded in the receiver is combined with this grey-scale picture. The subsequent interframe-coded pictures are then decoded in a known manner. As a result of the insertion of the attenuation element in the encoder, the receiver-end reconstruction error generated in this way at the beginning is progressively reduced in the course of the decoding of the subsequent pictures. This is because, on account of the attenuation element, errors which the encoder attempts to reduce have been artificially introduced in the encoder-end simulation of the decoder function as well.
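The progressive reduction of the start-up error can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the codec described here: transform, quantisation, motion compensation and entropy coding are omitted, the attenuation is modelled as multiplication of the prediction by a factor `gain` between 0 and 1, the grey level is taken as 128, and the decoder is assumed to apply the same attenuation as the encoder-end decoder simulation. All function names are hypothetical.

```python
GREY = 128.0  # assumed grey level of average brightness for 8-bit samples


def encode(frames, gain):
    """Toy P-only encoder: returns one prediction error picture per frame."""
    recon = [GREY] * len(frames[0])          # assumed start reference of the encoder
    residuals = []
    for frame in frames:
        pred = [gain * r for r in recon]     # attenuated prediction
        residual = [f - p for f, p in zip(frame, pred)]
        residuals.append(residual)
        # Simulated decoder in the feedback loop (lossless in this sketch).
        recon = [e + p for e, p in zip(residual, pred)]
    return residuals


def decode_from(residuals, gain, start):
    """Start decoding at picture index `start`, using grey as first reference."""
    recon = [GREY] * len(residuals[0])
    pictures = []
    for residual in residuals[start:]:
        recon = [e + gain * r for e, r in zip(residual, recon)]
        pictures.append(recon)
    return pictures


def max_error(a, b):
    """Largest absolute sample difference between two pictures."""
    return max(abs(x - y) for x, y in zip(a, b))


frames = [[10.0 * k + j for j in range(4)] for k in range(20)]
decoded = decode_from(encode(frames, 0.9), 0.9, start=5)
errors = [max_error(pic, frames[5 + k]) for k, pic in enumerate(decoded)]
```

In this linear model the difference between the decoder's reference and the encoder's reference is multiplied by `gain` with every decoded picture, so the values in `errors` shrink geometrically.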
The effect of the invention is that after the receiver-end decoding of L pictures, a viewer cannot perceive a difference between picture sequences which are coded in a known manner and contain I pictures at relatively short intervals, and picture sequences which are coded according to the invention and contain no I pictures at least over a relatively long period of time. B pictures may be arranged between the P pictures. On the one hand, these B pictures cause no error propagation; on the other hand, they also bring about no reduction of visible reconstruction errors.
The receiver-end video decoder can therefore display decoded pictures after the decoding of L pictures, or even a few pictures beforehand.
The value of the parameter L depends on the setting of the attenuation D of an attenuation element in the feedback of the codec and also determines the resulting bit rate. When L is small, the attenuation D is large and the convergence of the receiver-end decoding error is faster, but the bit rate is increased. When L is large, the attenuation D is small and the convergence of the receiver-end decoding error is slower, but the bit rate is low.
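The trade-off between L and D can be made concrete with a rough estimate. The text gives no formula; the sketch below assumes a simplified model in which the start-up error is multiplied by (1 - D) with each decoded P picture, and in which "converged" means that the error has fallen to a chosen threshold. The function name and its parameters are hypothetical.

```python
import math


def pictures_until_converged(attenuation, initial_error, threshold):
    """Estimate L, the number of P pictures until the start-up error is small.

    Assumed model: error after L pictures = initial_error * (1 - attenuation) ** L.
    """
    if not 0.0 < attenuation < 1.0:
        raise ValueError("attenuation must lie strictly between 0 and 1")
    if initial_error <= threshold:
        return 0                                  # already converged
    gain = 1.0 - attenuation
    # Smallest integer L with initial_error * gain ** L <= threshold.
    return math.ceil(math.log(threshold / initial_error) / math.log(gain))


for d in (0.05, 0.1, 0.3):
    print(d, pictures_until_converged(d, initial_error=128.0, threshold=1.0))
```

Under this model a larger D gives a smaller L, in line with the qualitative relationship stated above; the associated increase in bit rate is not modelled.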
In principle, the inventive method for coding a picture sequence consists in the fact that transformed video data coefficients formed from difference values relating to pixel values of the picture sequence are entropy-encoded, with the transformed video data coefficients being subjected to inverse transformation, and being used in predicted form for the formation of the difference values relating to the pixel values, and in that the predicted pixel values are attenuated in terms of their amplitude prior to the formation of the difference values.
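The coding steps named above can be sketched as follows. This is a minimal illustration under several assumptions that the text does not fix: pictures are one-dimensional blocks of samples, the transform is an orthonormal DCT, the coefficients are uniformly quantised with step size `step`, the prediction is simply the previous reconstruction (no motion compensation), the attenuation is a multiplication by `gain`, and the entropy encoder is left out. All names are hypothetical.

```python
import math


def _scale(k, n):
    # Normalisation factor of the orthonormal DCT-II.
    return math.sqrt((1.0 if k == 0 else 2.0) / n)


def dct(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    return [sum(_scale(k, n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


def encode_sequence(frames, gain, step, grey=128.0):
    """Return the quantised coefficient levels per picture and the last reconstruction."""
    recon = [grey] * len(frames[0])               # assumed initial reference
    stream = []
    for frame in frames:
        pred = [gain * r for r in recon]          # attenuated predicted pixel values
        diff = [f - p for f, p in zip(frame, pred)]
        levels = [round(c / step) for c in dct(diff)]
        stream.append(levels)                     # these would go to the entropy encoder
        # Feedback loop: inverse transformation and reconstruction.
        decoded_diff = idct([lv * step for lv in levels])
        recon = [d + p for d, p in zip(decoded_diff, pred)]
    return stream, recon
```

Because the attenuation is applied before the difference is formed, the encoder's own reconstruction still tracks the input picture to within the quantisation error; only the size of the coded difference grows.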
In principle, the inventive method for decoding a picture sequence of transformed and coded video data coefficients formed from difference values relating to pixel values of the picture sequence consists in the fact that the entropy-decoded, transformed video data coefficients are subjected to inverse transformation and are combined in predicted form with the difference values, the difference values on which the transformed and coded video data coefficients are based being derived from predicted pixel values attenuated in terms of their amplitude, and in that, at the start of decoding, the inverse-transformed video data coefficients are combined with a grey-scale picture.
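A matching sketch of the decoding steps is given below. It assumes that the entropy decoding has already produced quantised coefficient levels, that the transform is an orthonormal DCT with uniform quantisation step `step`, that the prediction is the previous reconstruction without motion compensation, and that the decoder attenuates its prediction by the same factor `gain` as the encoder. None of these details, nor the names used, is taken from the text.

```python
import math


def idct(c):
    """Orthonormal inverse DCT (DCT-III) of a list of coefficients."""
    n = len(c)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


def decode_sequence(stream, gain, step, grey=128.0):
    """Decode a list of coefficient-level blocks, starting from a grey picture."""
    recon = [grey] * len(stream[0])               # grey-scale picture as first reference
    pictures = []
    for levels in stream:
        diff = idct([lv * step for lv in levels])  # inverse-transformed difference values
        recon = [d + gain * r for d, r in zip(diff, recon)]
        pictures.append(recon)
    return pictures


# A DC-only block of four coefficients yields a constant difference of 20 / 2 = 10.
pictures = decode_sequence([[20, 0, 0, 0], [0, 0, 0, 0]], gain=0.9, step=1.0)
```

The first decoded picture is the difference combined with the attenuated grey picture; every later picture uses the previous reconstruction as its reference.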
In principle, the inventive apparatus for coding a picture sequence is provided with:
means for forming difference values, relating to the picture sequence, from pixel values, to which input data of the picture sequence are fed;
means for forming transformed video data coefficients which are derived from the difference values;
an entropy-encoder for the transformed video data coefficients;
means for forming inverse-transformed video data coefficients which are derived from the transformed video data coefficients;
means for forming predicted pixel values, whose output signal is used to form the difference values, an attenuation unit attenuating the predicted pixel values before they are used for forming the difference values.
In principle, the inventive apparatus for decoding a picture sequence is provided with:
an entropy-decoder for transformed video data coefficients;
means for forming inverse-transformed, decoded video data coefficients which contain difference values of pixel values;
means for forming predicted pixel values, whose output signal is combined with the difference values and constitutes the decoded picture sequence,
in which the difference values are derived from predicted pixel values, attenuated in terms of their amplitude at the encoder end, and, at the start of decoding, the inverse-transformed video data coefficients are combined with a grey-scale picture in combination means.