The invention relates to a coder for segmented coding of an input signal by means of a controllable quantizer and a buffer memory in which data of the input signal, which are quantized and VLC-coded in operation, are buffered.
Such coders are used, for example, in videophones constructed in accordance with the CCITT standard H.261 (cf. for example: Draft Revision of Recommendation H.261: Video Codec for Audiovisual Services at p×64 kbit/s. Signal Processing: Image Communication 2 (1990), pp. 221-239, Elsevier). This publication will hereinafter be quoted as (I).
A further possibility of use is in coding still pictures, insofar as the coding is performed in accordance with the CCITT Recommendation T.81. Coding of a still picture is performed in several steps with an increasing picture quality, while in each step the previously coded and decoded still picture is used as a prediction picture. The Recommendation T.81 will hereinafter be quoted as (II) (cf.: International Standard DIS 10918-1. CCITT Recommendation T.81. Digital Compression and Coding of Continuous-tone Still Images. Part I: Requirements and Guidelines (JPEG)).
Finally, coders having the features described in the opening paragraph are also used in audio coding, as is clear from, for example, the following document: Brandenburg, K. et al.: Aspec: Adaptive spectral entropy coding of high-quality music signals. An Audio Engineering Society preprint. Presented at the 90th Convention 1991, Feb. 19-22, Paris. Preprint 3011 (A-4). It will hereinafter be quoted as (III).
However, the present invention will mainly be explained with reference to a video coder which is compatible with the H.261 standard. Its transfer to other possibilities of use will then be evident from this special explanation.
The video data to be coded are applied image by image to a coder in accordance with (I). The signal segments mentioned hereinbefore are generally (however, see below) the prediction errors associated with a video picture in the CIF format or in the QCIF format (cf. (I)).
In accordance with (I), video data in the CIF format are hierarchically divided into further sub-data, namely into twelve groups of blocks (GOBs). Each GOB is split up into thirty-three macroblocks and each macroblock is split up into six blocks. A block consists either of 64 luminance values (Y values) of a square picture section of 8×8 pixels or of 64 values of one of the two colour difference components (C_B or C_R values). These chrominance values result from a picture section of 16×16 pixels which with respect to the chrominance is horizontally and vertically sub-sampled in a 1:2 ratio. A macroblock comprises both luminance and chrominance information components of a picture section of 16×16 pixels.
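The hierarchy described above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the constant and function names are hypothetical, and the luminance size of 352×288 pixels for CIF is taken from H.261 rather than from the text above.

```python
# Hierarchy of a CIF picture as described in the text.
GOBS_PER_CIF_PICTURE = 12
MACROBLOCKS_PER_GOB = 33
BLOCKS_PER_MACROBLOCK = 6   # four 8x8 Y blocks, one C_B block, one C_R block
VALUES_PER_BLOCK = 64       # 8 x 8 values


def cif_counts():
    """Return (macroblocks, blocks, values) for one CIF picture."""
    macroblocks = GOBS_PER_CIF_PICTURE * MACROBLOCKS_PER_GOB
    blocks = macroblocks * BLOCKS_PER_MACROBLOCK
    values = blocks * VALUES_PER_BLOCK
    return macroblocks, blocks, values


def macroblocks_from_luma_size(width, height):
    """Count the 16x16 picture sections in a luminance image of the given size."""
    return (width // 16) * (height // 16)


if __name__ == "__main__":
    print(cif_counts())
    # Assumption: CIF luminance resolution of 352 x 288 pixels (from H.261).
    print(macroblocks_from_luma_size(352, 288))
```

Both routes give the same number of macroblocks, which confirms that twelve GOBs of thirty-three macroblocks each tile the whole picture.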
A coder in accordance with (I) operates in two main modes, namely the intermode and the intramode. In the intermode, the differences of the pixel data between an input picture (current picture) and its decoded and possibly motion-compensated previous picture (prediction picture) are coded. In the intramode, the input picture itself, instead of the prediction error, is coded without reference to the previous picture.
Coding is performed in blocks. The block data or the block data differences are subjected to a two-dimensional discrete cosine transform. The transform coefficients then determined are applied to a controllable quantizer, VLC coded after their quantization and stored in a buffer memory. Moreover, the coded picture is decoded again (in blocks) in a feedback branch of the coder, stored in a picture memory and used as a prediction picture for the next input picture.
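The transform step can be illustrated as follows. This is a minimal sketch of an orthonormal two-dimensional DCT-II on an 8×8 block, written for readability rather than speed; the function name `dct_2d` is hypothetical, and the sketch does not claim to reproduce the accuracy requirements that (I) places on the transform.

```python
import math

N = 8  # block size: 8 x 8 values


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of rows."""
    def scale(k):
        # Normalisation factor of the DC basis function.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coefficients = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coefficients[u][v] = 0.25 * scale(u) * scale(v) * total
    return coefficients


if __name__ == "__main__":
    flat_block = [[10] * N for _ in range(N)]
    coefficients = dct_2d(flat_block)
    # A flat block has all its energy in the DC coefficient.
    print(round(coefficients[0][0], 6))
```

For a block without detail, only the DC coefficient differs from zero, which is why the subsequent quantization and VLC coding of such blocks produce few bits.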
The quantizer of the coder is controllable insofar as its quantization curve can be adjusted. Different quantization curves can be distinguished by the length of their quantization intervals. Within a quantization curve, the length of the quantization interval (also referred to as quantization step) is usually constant. A coarse quantization is obtained with characteristic curves having a large quantization step and a fine quantization is obtained with characteristic curves having a small quantization step. If a coarse quantization is carried out, a large quantization error is accepted and generally fewer VLC data are stored in the buffer memory. If a fine quantization is carried out, the number of bits generally increases; these bits are buffered in the form of VLC codewords in the buffer memory before they are transmitted.
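The trade-off between quantization step and quantization error can be made concrete with a plain uniform quantizer. The sketch below is an assumption for illustration only: it rounds to the nearest multiple of the step and does not reproduce the reconstruction rule prescribed in (I). The number of non-zero levels is used as a rough stand-in for the amount of VLC data, and all names are hypothetical.

```python
def quantize(coefficient, step):
    """Map a transform coefficient to an integer level (uniform quantizer)."""
    return round(coefficient / step)


def dequantize(level, step):
    """Reconstruct the coefficient value represented by a level."""
    return level * step


def compare_steps(coefficients, steps):
    """For each step, report the non-zero levels and the largest error."""
    report = {}
    for step in steps:
        levels = [quantize(c, step) for c in coefficients]
        errors = [abs(c - dequantize(level, step))
                  for c, level in zip(coefficients, levels)]
        report[step] = {
            "nonzero_levels": sum(1 for level in levels if level != 0),
            "max_error": max(errors),
        }
    return report


if __name__ == "__main__":
    # Hypothetical coefficient values of one block.
    sample = [80.0, -23.5, 12.0, 7.2, -3.1, 1.4, 0.6, -0.2]
    for step, result in compare_steps(sample, [2, 62]).items():
        print(step, result)
```

With the small step most coefficients survive as non-zero levels and the error stays below half a step; with the large step only the largest coefficient survives, at the price of a much larger error.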
The buffer memory is emptied at a bitrate which is constant in time, whereas it may be filled at a bitrate which varies strongly in time. Whether many or few bits are stored per second in the buffer memory depends on the adjustment of the quantizer as well as on the size of the prediction error, hence on the picture data.
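The behaviour of the buffer memory can be sketched with a simple discrete-time model. The model is an assumption for illustration: time is divided into equal intervals, a fixed number of bits is removed per interval, and an empty buffer simply stays empty. The function name and the sample figures are hypothetical.

```python
def simulate_buffer(bits_in_per_interval, drain_per_interval, initial_fill=0):
    """Return the filling state after each interval.

    bits_in_per_interval: bits written by the VLC coder in each interval.
    drain_per_interval:   bits removed per interval (constant channel rate).
    """
    fill = initial_fill
    history = []
    for bits_in in bits_in_per_interval:
        fill += bits_in
        # The channel removes a constant amount; the fill cannot go negative.
        fill = max(0, fill - drain_per_interval)
        history.append(fill)
    return history


if __name__ == "__main__":
    # Hypothetical, strongly varying input against a constant drain of 100 bits.
    print(simulate_buffer([100, 0, 300, 50], drain_per_interval=100))
```

A burst of coded data raises the filling state immediately, whereas it takes several intervals of low input for the constant drain to bring the filling state down again.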
Reference (I) does not prescribe which quantization curve is to be used for which data of a video picture. It only prescribes that a single quantization curve is to be used within a macroblock, i.e. this curve may at most be changed from macroblock to macroblock, and that the same quantization curve should always be used for the DC coefficients in the intramode. The possible quantization steps are the even integers 2, 4, 6, ..., up to 62.
The quantizer is generally controlled in dependence upon the filling state P of the buffer memory, as shown, for example, in EP 0 363 682 or EP 0 284 168. Such a control is used to avoid overflow or underflow of the buffer memory. Memory capacity is hardly a technical problem these days: sufficiently large memories can be used, so that the risk of overflow and the attendant loss of information no longer exists. It is nevertheless desirable that the filling state P of the buffer memory does not exceed a threshold value PS. The reason is that otherwise the temporal distance between coding a video picture at the transmitter end and decoding and building up the same video picture at the receiver end, which is quite distinct at low transmission bitrates, becomes unacceptably large. For example, this temporal distance should be substantially unnoticeable to the participants in a videophone dialogue; a delay in the range of one tenth of a second is acceptable. Therefore, even for memory capacities which are technically almost unlimited, the problem remains that the filling state P of the buffer memory should not exceed the threshold value PS.
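The relation between the filling state and the delay can be estimated as follows. Since the buffer memory is emptied at a constant bitrate, a bit written when the filling state is P leaves the buffer roughly P divided by the bitrate later. The sketch below covers only this buffer contribution to the overall delay; the function names are hypothetical, and the figures of 64 kbit/s and one tenth of a second are example values, not a threshold given in the text.

```python
def buffer_delay_seconds(fill_bits, bitrate_bits_per_second):
    """Time a newly written bit waits before it leaves the buffer."""
    return fill_bits / bitrate_bits_per_second


def threshold_for_delay(max_delay_seconds, bitrate_bits_per_second):
    """Filling state PS (in bits) corresponding to a maximum buffer delay."""
    return max_delay_seconds * bitrate_bits_per_second


if __name__ == "__main__":
    # Assumed example: p = 1, i.e. 64 kbit/s, and an acceptable delay of 0.1 s.
    print(threshold_for_delay(0.1, 64_000))
    print(buffer_delay_seconds(32_000, 64_000))
```

At low bitrates even a few thousand buffered bits correspond to a noticeable delay, which is why the threshold PS matters irrespective of the physical memory size.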
The control of the quantizer by means of the filling state of the buffer memory only (hereinafter referred to as filling state control) generally has the result that macroblocks which belong to the same picture or to the same picture difference are quantized with different quantization curves, because the filling state of the buffer memory may be subject to large fluctuations during coding.
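A filling state control of this kind can be sketched as a mapping from the filling state P to one of the permitted quantization steps. The linear mapping below is purely an assumption for illustration and is not the control law of the cited documents; the names and the example threshold of 6400 bits are hypothetical.

```python
# Quantization steps permitted by (I): 2, 4, 6, ..., 62.
ALLOWED_STEPS = list(range(2, 63, 2))


def step_from_fill(fill_bits, threshold_bits):
    """Choose a quantization step from the filling state alone.

    An empty buffer selects the finest step, a buffer at or above the
    threshold PS selects the coarsest step (assumed linear mapping).
    """
    ratio = min(max(fill_bits / threshold_bits, 0.0), 1.0)
    index = round(ratio * (len(ALLOWED_STEPS) - 1))
    return ALLOWED_STEPS[index]


if __name__ == "__main__":
    # Hypothetical filling states observed while coding successive
    # macroblocks of the same picture, with an assumed PS of 6400 bits.
    fills = [0, 1600, 3200, 4800, 6400, 8000]
    print([step_from_fill(fill, 6400) for fill in fills])
```

Because the step depends on the filling state only, macroblocks of one and the same picture receive different steps as soon as the filling state fluctuates during coding.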
The change from one quantization curve to another during coding of a picture is arbitrary in the filling state control mode in the sense that this change is effected without taking the information contents of the macroblocks still to be coded into account. Thus, parts of the picture which are important to the viewer may be coarsely quantized and other, less important parts may be quantized in an unnecessarily fine way. Consequently, the filling state control leads to a subjectively and objectively unsatisfactory picture quality.