1. Field of Application
The present invention relates to an apparatus for encoding a video signal to produce an encoded signal for transmission or recording, the encoded signal containing substantially less data than the original video signal. In particular, the invention relates to an apparatus for inter-frame predictive encoding of a video signal.
2. Prior Art Technology
Various methods have been proposed in the prior art for converting a digital video signal to a signal having a lower data rate, for example in order to reduce the bandwidth requirements of a communications link or to reduce the storage capacity required for recording the video signal. Such methods are used, for example, in moving-image video telephone systems. Basically, such methods can be divided into three types: those which utilize the generally high degree of correlation between successive frames of a video signal (as in inter-frame predictive encoding); those which utilize the generally high correlation between each pixel of a frame and closely adjacent pixels on the same scanning line or on closely adjacent scanning lines (as in intra-frame or intra-field encoding in units of blocks of each frame or each field); and those which use a combination of these two types of correlation. One method known in the prior art for exploiting the close correlation between successive frames is to transmit only certain frames, selected periodically (i.e. at intervals of a fixed number of frames), and to omit the frames intermediate between them, with the omitted frames being restored by interpolation at the receiving system based on the information contained in the transmitted frames. Intra-frame block encoding of the transmitted frames may be executed prior to transmission, to further reduce the data rate. An example of such a method is described in U.S. Pat. No. 4,651,207, in which the amounts of change between portions of each transmitted frame and the preceding transmitted frame are derived as motion vectors by the receiving system, and these motion vectors are utilized for interpolating the omitted frames.
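The frame-omission approach can be sketched as follows. This is a minimal illustration only, assuming each frame is a flat list of pixel values and that omitted frames are rebuilt by plain linear interpolation between the nearest transmitted frames; unlike the method of U.S. Pat. No. 4,651,207, no motion vectors are derived. The function names are hypothetical.

```python
def select_transmitted(frames, interval):
    """Keep every `interval`-th frame; the others are omitted before transmission."""
    return {i: f for i, f in enumerate(frames) if i % interval == 0}


def interpolate_omitted(transmitted, num_frames):
    """Rebuild every frame, linearly interpolating the omitted ones.

    Simplification: no motion compensation; each pixel is interpolated in place.
    """
    keys = sorted(transmitted)
    restored = []
    for k in range(num_frames):
        if k in transmitted:
            restored.append(list(transmitted[k]))
            continue
        prev = max(i for i in keys if i < k)
        nxt = min((i for i in keys if i > k), default=None)
        if nxt is None:  # no later transmitted frame: repeat the last one
            restored.append(list(transmitted[prev]))
            continue
        t = (k - prev) / (nxt - prev)  # temporal position between the two frames
        restored.append([(1 - t) * a + t * b
                         for a, b in zip(transmitted[prev], transmitted[nxt])])
    return restored


# Simple, uniform movement: interpolation recovers the omitted frames exactly.
moving = [[0, 0], [1, 2], [2, 4], [3, 6], [4, 8]]
print(interpolate_omitted(select_transmitted(moving, 4), len(moving)))
# [[0, 0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4, 8]]

# Abrupt change (e.g. a scene cut) between transmitted frames 0 and 4.
cut = [[0], [0], [100], [100], [100]]
print(interpolate_omitted(select_transmitted(cut, 4), len(cut)))
# [[0], [25.0], [50.0], [75.0], [100]]
```

In the second example the restored frames 1 to 3 (25, 50, 75) differ from the actual values (0, 100, 100), illustrating why frame omission fails when the picture contents change substantially between transmitted frames.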
If the movement within the picture conveyed by the video signal is comparatively simple, such a method may provide sufficient accuracy together with a high degree of encoding efficiency. However, if any substantial change in the picture contents occurs in the interval between two successive transmitted frames, interpolation will be unsuccessful, so that such a method is of limited application. In addition, if any complex movement occurs in the picture in the interval between two successive transmitted frames, then again the omitted frames will not be accurately interpolated.
Another known method is to periodically utilize certain frames as reference frames and, for each of the other frames, to derive prediction error values with respect to the preceding reference frame prior to transmission. Here, "prediction error value" signifies the amount of difference between a (digital) value in the original video signal representing a pixel luminance (Y) or color difference (B-Y) or (R-Y) value and the corresponding value of the preceding reference frame. These reference frames, i.e. independent frames, are then encoded and transmitted, while for the remaining frames (i.e. dependent frames) only the prediction error values are encoded and transmitted.
An example of a known method of prediction error encoding of a video signal is recursive inter-frame prediction error encoding. With that method, the prediction error values for each frame are derived recursively, based on accumulations of past prediction error values. Specifically, a set of prediction values held in a frame memory is subtracted from the data values of each frame to obtain the prediction error values for that frame, and the resultant prediction error values are encoded and transmitted. At the same time, the encoded prediction error values are decoded in the same way as at the receiving apparatus, the recovered prediction error values are added to the corresponding prediction values that were used in obtaining them, and the results are stored in the frame memory for use as prediction values for the next frame of the video signal. Thus, basically, only prediction error values are derived and transmitted with this method, and at the receiving apparatus each frame is recovered by superimposing the received prediction error values on the previously recovered frame. Such a recursive inter-frame predictive encoding apparatus is based on a closed loop, which supplies a prediction signal to be subtracted from the signal of the current frame of the input video signal.
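The closed loop described above can be sketched as follows. This is a minimal illustration, assuming frames are flat lists of pixel values, a simple uniform quantizer with a hypothetical step size `step`, and a frame memory initialized to zero (so the first frame is effectively sent as its quantized values). The encoder updates its frame memory with exactly the values the receiver will decode, so the two stay in step.

```python
def quantize(error, step):
    """Uniform quantizer: map a prediction error to an integer level.

    Uses Python's round(), so exact ties go to the even level.
    """
    return round(error / step)


def dequantize(level, step):
    return level * step


def encode_recursive(frames, step):
    """Recursive (closed-loop) inter-frame prediction error encoder."""
    memory = [0] * len(frames[0])  # frame memory holding the prediction values
    levels_per_frame = []
    for frame in frames:
        # prediction error = current value - prediction value, then quantize
        levels = [quantize(x - p, step) for x, p in zip(frame, memory)]
        levels_per_frame.append(levels)
        # local decoding, as at the receiver; the result is the next prediction
        memory = [p + dequantize(q, step) for p, q in zip(memory, levels)]
    return levels_per_frame


def decode_recursive(levels_per_frame, step):
    """Receiver: add each decoded prediction error to the stored prediction."""
    memory = [0] * len(levels_per_frame[0])
    frames = []
    for levels in levels_per_frame:
        memory = [p + dequantize(q, step) for p, q in zip(memory, levels)]
        frames.append(memory)
    return frames


frames = [[12, 20], [13, 21], [19, 27]]
sent = encode_recursive(frames, step=4)
print(sent)                       # [[3, 5], [0, 0], [2, 2]]
print(decode_recursive(sent, 4))  # [[12, 20], [12, 20], [20, 28]]
```

The small change in the second frame is quantized to zero, but because the frame memory still holds the old values, the larger error in the third frame is measured against them and transmitted; this self-correcting behaviour is a property of the closed loop.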
Such a predictive encoding method utilizes only the correlation between successive frames of the video signal along the forward direction of the time axis, i.e. between each dependent frame and the preceding independent frame. However, there is of course similar correlation between each dependent frame and the succeeding independent frame. A predictive encoding apparatus which makes use of this fact to achieve more accurate predictive encoding, by using both the forward and reverse directions of the time axis, has been disclosed by the assignee of the present invention in U.S. Pat. No. 4,985,768, filed Jan. 18, 1990. The basic principles of such an encoding apparatus are illustrated in FIG. 1. Here, for each of the dependent frames 2, 3, 4 and 6, 7, 8, respective prediction error values are derived based on a combination of data values obtained from the preceding and succeeding independent frames, as indicated by the arrows. For example, inter-frame predictive encoding of frame 2 is executed based on the independent frames 1 and 5, and the same applies to frames 3 and 4. More precisely, a first prediction signal for frame 2 is derived using frame 1 as a reference frame, and a second prediction signal for frame 2 is derived using frame 5 as a reference frame. These two prediction signals are multiplied by respective weighting factors and combined to obtain a final prediction signal (i.e. a train of prediction values for frame 2), which is subtracted from the signal of frame 2 to obtain a corresponding prediction error signal. In this case greater weight is given to the first prediction signal, since frame 2 will have greater correlation with frame 1 than with frame 5. Prediction signals for the other dependent frames are derived in a similar manner.
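The weighted combination for a dependent frame can be sketched as below. The text states only that the nearer independent frame receives the greater weight; the linear, distance-based weights used here are an assumption, and the reference frames' values are used directly as the two prediction signals (no motion compensation). Function names are hypothetical.

```python
def bidirectional_prediction(prev_ref, next_ref, k, a, b):
    """Predict dependent frame k from independent frames at positions a < k < b.

    Assumed weights: linear in temporal distance, so the nearer frame dominates.
    """
    w_prev = (b - k) / (b - a)  # weight of the first (forward) prediction signal
    w_next = (k - a) / (b - a)  # weight of the second (backward) prediction signal
    return [w_prev * p + w_next * n for p, n in zip(prev_ref, next_ref)]


def prediction_error(frame, prediction):
    return [x - p for x, p in zip(frame, prediction)]


frame1 = [100, 40]  # independent frame 1
frame5 = [60, 80]   # independent frame 5
frame2 = [92, 49]   # dependent frame 2
pred2 = bidirectional_prediction(frame1, frame5, k=2, a=1, b=5)
print(pred2)                            # [90.0, 50.0]
print(prediction_error(frame2, pred2))  # [2.0, -1.0]
```

For frame 2 the weights are 0.75 (frame 1) and 0.25 (frame 5); for frame 4 they would be reversed, so each dependent frame leans on its nearer independent frame.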
Since in this case the correlation of each dependent frame with both the preceding and the succeeding independent frames is utilized to obtain its prediction error values, a substantially greater degree of prediction accuracy is attained than with methods in which only inter-frame correlation along the forward direction of the time axis is utilized.
To increase the coding efficiency of each of these video signal encoding methods, intra-frame processing is also generally utilized, whereby both the data values of the independent frames and the prediction error values are subjected to orthogonal transform processing in units of blocks, to obtain coefficient values which are quantized, with the results then being encoded for transmission. Each block (e.g. an 8×8 array of values) may consist of a set of luminance values (corresponding to respective pixels) of an independent frame, or chrominance values of an independent frame, or may consist of a set of luminance or chrominance prediction error values (corresponding to respective pixels) of a dependent frame. The characteristics of such a predictive encoding system are basically determined by the block size and by the quantization threshold level, which determines the size of the quantization steps. The larger the block size and the higher the quantization threshold level, the greater will be the encoding efficiency, i.e. the lower will be the data rate of the output signal produced from the encoding apparatus. More specifically, in the case of orthogonal transform processing and quantization of prediction error values, if the quantization step size is made relatively large by using a high threshold level, small prediction error values will be excluded from encoding and transmission, with only relatively large prediction errors being encoded. That is to say, the quantization operation is executed such that small-value coefficients produced by the orthogonal transform processing are effectively reduced to zero in the encoded output signal.
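The block transform and threshold quantization can be sketched for a single 8×8 block. The orthonormal DCT-II is assumed here as the orthogonal transform, and the quantizer is assumed to be a dead-zone quantizer whose threshold equals its step size, so that raising the threshold both enlarges the steps and discards more small coefficients. The test block (a uniform offset of 8 plus a ±1 checkerboard ripple) is made up for illustration, and the function names are hypothetical.

```python
import math

N = 8  # block size


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def c(u):
        return math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize_block(coeffs, step):
    """Dead-zone quantizer: int() truncates toward zero, so |c| < step -> 0."""
    return [[int(cf / step) for cf in row] for row in coeffs]


def count_nonzero(levels):
    return sum(1 for row in levels for q in row if q != 0)


# Large uniform prediction error (DC) plus a small +/-1 high-frequency ripple.
block = [[8 + (1 if (x + y) % 2 == 0 else -1) for y in range(N)] for x in range(N)]
coeffs = dct2(block)
print(round(coeffs[0][0], 6))                          # 64.0
print(count_nonzero(quantize_block(coeffs, 10)))       # 1 (only DC survives)
print(count_nonzero(quantize_block(coeffs, 0.5)) > 1)  # True
```

With the coarse step every ripple coefficient falls into the dead zone and only the DC level is sent, which is the data-rate reduction (and loss of small detail) described above.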
Since these small coefficients correspond to small spatial displacements (in the picture represented by the video signal), which are not visually conspicuous, eliminating them has no conspicuous effect on a television picture obtained by receiving and decoding such a transmitted encoded video signal. Thus, a small amount of displacement between a data value of a frame and the corresponding data value of a preceding frame will not be encoded as a prediction error value. Only when such a displacement (prediction error) has accumulated to a relatively large value over a number of successive frames (i.e. as a result of continuing movement within the picture) will the prediction error become large enough to be encoded after orthogonal transform conversion and quantization.
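The accumulation effect can be illustrated with a single value standing in for one transform coefficient. This sketch assumes the closed-loop arrangement of the recursive encoder (the prediction is the previously decoded value) and a dead-zone quantizer whose threshold equals its step; the steadily increasing input models continuing slow movement. The function name is hypothetical.

```python
def track_slow_change(values, step):
    """Closed-loop coding of one slowly changing value with a dead-zone quantizer.

    Returns the decoded value per frame and the frames in which a nonzero
    prediction error was actually encoded.
    """
    reference = values[0]  # assumed already known at the receiver
    decoded = [reference]
    encoded_frames = []
    for n, x in enumerate(values[1:], start=1):
        level = int((x - reference) / step)  # |error| < step falls in the dead zone
        if level != 0:
            reference += level * step        # receiver applies the same update
            encoded_frames.append(n)
        decoded.append(reference)
    return decoded, encoded_frames


decoded, encoded_frames = track_slow_change([0, 1, 2, 3, 4, 5, 6, 7, 8], step=4)
print(decoded)         # [0, 0, 0, 0, 4, 4, 4, 4, 8]
print(encoded_frames)  # [4, 8]
```

A change of 1 per frame is never sent on its own; only after four frames has the accumulated error reached the threshold and been encoded.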
However, the above methods of encoding a video signal to obtain an output signal having a lower data rate than that of the video signal have various disadvantages. The disadvantages of those methods in which only periodically selected frames are transmitted, with the intermediate frames of the video signal being omitted, have been described above. In the case of the recursive type of inter-frame predictive encoding, whereby only prediction error values are encoded and transmitted for each frame, the output data flow is irregular, so that the output data must be transferred through an output buffer, and in practice some means must be provided for ensuring that this buffer will not overflow. In the prior art, the only way to control the data rate of the output encoded signal, and thereby prevent buffer overflow when necessary, has been to increase the quantization threshold level. However, this has the disadvantage of causing distortion of the contents of each block of the transmitted encoded data if the increase in the quantization threshold level is large in relation to the optimum threshold level.
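The prior-art form of buffer control, in which the only remedy is to raise the quantization threshold, can be sketched as follows. Everything here is a hypothetical model: each frame is a list of coefficient magnitudes, every coefficient surviving a dead-zone quantizer costs a fixed number of bits, the channel drains a fixed number of bits per frame, and the step is doubled whenever the buffer is more than half full.

```python
def bits_for_frame(coeffs, step, bits_per_coeff):
    """Bits for one frame: each coefficient outside the dead zone costs bits_per_coeff."""
    return bits_per_coeff * sum(1 for c in coeffs if int(c / step) != 0)


def run_buffer(frames, step, drain, capacity, control=True, bits_per_coeff=8):
    """Simulate output-buffer occupancy; returns (step used, occupancy) per frame."""
    occupancy = 0
    history = []
    for coeffs in frames:
        occupancy += bits_for_frame(coeffs, step, bits_per_coeff)
        occupancy = max(0, occupancy - drain)  # channel removes a fixed amount per frame
        if occupancy > capacity:
            raise OverflowError(f"buffer overflow: {occupancy} > {capacity}")
        history.append((step, occupancy))
        if control and occupancy > capacity // 2:
            step *= 2  # coarser quantization: more coefficients fall into the dead zone
    return history


frame = [1] * 8 + [2] * 4 + [4] * 2 + [8] * 2  # many small, few large coefficients
frames = [frame] * 6

print(run_buffer(frames, step=1, drain=64, capacity=256))
# [(1, 64), (1, 128), (1, 192), (2, 192), (4, 160), (8, 112)]

try:
    run_buffer(frames, step=1, drain=64, capacity=256, control=False)
except OverflowError as exc:
    print(exc)  # buffer overflow: 320 > 256
```

Overflow is avoided here only by driving the step to eight times its initial value, the kind of large departure from the optimum threshold that is identified above as a source of block distortion. The sketch never lowers the step again; a realistic controller would presumably also relax it as the buffer drains.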
In the case of methods in which independent frames are periodically encoded and transmitted, and only prediction error values derived from the independent frames are encoded and transmitted for each of the intermediate frames, there is the disadvantage that most of the high-frequency components of the prediction error values are also encoded and transmitted (i.e. represented in the encoded output signal). There is often only a low degree of correlation between the high-frequency components of adjacent frames of a video signal, and in practice there is little loss in accuracy of recovery of the dependent frames if these high-frequency components are omitted from the prediction error values. However, if low-pass filtering were utilized to eliminate these high-frequency components, then since that filtering would also be applied to the independent frames, a loss of resolution would result in the image obtained by decoding an output signal transmitted from such a system.
Furthermore, for both the recursive type of inter-frame predictive encoding and the method in which only periodic independent frames and prediction error values derived from them are encoded, it is very difficult to establish a balance between time axis resolution and spatial resolution, since the encoding characteristics depend upon the orthogonal transform and quantization operations. If the proportion of the prediction error values that are actually encoded (and transmitted) is reduced, e.g. by increasing the quantization threshold level, thereby lowering the time axis resolution, then the spatial resolution of the finally obtained reproduced picture is lowered as well.
There is therefore a requirement for an inter-frame predictive encoding apparatus whereby the high frequency components of the prediction error signal can be eliminated, without significantly reducing the resolution of an image obtained by decoding an output signal produced from the apparatus.