1. Field of Application
The present invention relates to an apparatus for encoding a video signal to produce an encoded signal for transmission or recording, with the encoded signal containing substantially lower amounts of data than the original video signal. In particular, the invention relates to an apparatus for inter-frame predictive encoding of a video signal, which is especially applicable to television conferencing systems or to moving-image video telephone systems.
2. Prior Art Technology
Various methods have been proposed in the prior art for converting a digital video signal to a signal containing smaller amounts of data, for example in order to reduce the bandwidth requirements of a communications link, or to reduce the storage capacity required for recording the video signal. Such methods are especially applicable to television conferencing or moving image video telephone systems, and utilize the fact that there is generally a high degree of correlation between successive frames of a video signal, and hence some degree of redundancy if all of the frames are transmitted. One basic method, described for example in U.S. Pat. No. 4,651,207, is to periodically omit one or more frames from being transmitted, and to derive information at the receiving end for interpolating the omitted frames (based on movement components in the transmitted frames). Such a method will provide satisfactory operation only so long as successive frames contain only relatively small amounts of change between one frame and the next. Another basic method known in the prior art is to periodically transmit (i.e. at fixed numbers of frame intervals) frames which are independently encoded, these being referred to in the following as independent frames, while, for each frame occurring between successive independent frames (these being referred to in the following as dependent frames), only amounts of difference between that frame and the preceding independent frame are encoded and transmitted, i.e. inter-frame predictive encoding is executed with the independent frames being used as reference frames. With a more practical form of that method, known as adaptive predictive encoding, such inter-frame predictive encoding is executed only when it is appropriate, that is to say only when there is no great difference between successive frames. When a large difference is detected, intra-frame encoding is executed instead.
Examples of such inter-frame encoding are described in the prior art, for example in "15/30 Mb/s Motion-Compensated Inter-frame, Inter-field and Intrafield Adaptive Prediction Coding" (Oct. 1985) Bulletin of the Society of Television Engineers (Japan), Vol. 39, No. 10. With that method, a television signal is encoded at a comparatively high data rate. Movement-compensation inter-frame prediction, intra-field prediction, and inter-field (i.e. intra-frame) prediction are utilized. Another example is described in "Adaptive Hybrid Transform/Predictive Image Coding" (March 1987) Document D-1115 of the 70th Anniversary National Convention of the Society of Information and Communication Engineers (Japan). With that method, switching is executed between inter-frame prediction of each dependent frame based on a preceding independent frame (which is the normal encoding method) and prediction that is based on adjacent blocks of pixels, prediction that is based on the image background, and no prediction (i.e. direct encoding of the original video signal). In the case of the "no-prediction" processing, orthogonal transform intra-frame encoding is executed, while in the case of background prediction, a special type of prediction is utilized which is suitable for a video signal to be used in television conferencing applications. Processing operation is switched between pixel blocks varying in size from 16×16 to 8×8 elements, as block units.
With such prior art adaptive predictive encoding methods, when a dependent frame is to be decoded (at the receiving end of the system, or after playback from a recording medium) the required data are obtained by cumulative superposition of past data relating to that frame, so that all of the related past data are required. It is necessary to use storage media for decoding which will enable random access operation, to obtain such data. This sets a limit to the maximum size of period of repetition of the independent frames (alternatively stated, the period of resetting of inter-frame predictive encoding operation), since if that period is excessively long then decoding storage requirements and operation will be difficult. However the shorter this resetting period is made, the greater will be the amounts of data contained in the encoded output signal and hence the lower will become the encoding efficiency. Typically, a period of 4 to 8 frames has been proposed for the prior art methods.
FIGS. 1A and 1B are simple conceptual diagrams to respectively illustrate the basic features of the aforementioned inter-frame predictive encoding methods and the method used in the aforementioned U.S. patent application by the assignee of the present invention. A succession of frames of a video signal are indicated as rectangles numbered 1, 2, . . . The shaded rectangles denote independent frames (i.e. independently encoded frames that are utilized as reference frames) which occur with a fixed period of four frame intervals, i.e. inter-frame predictive encoding is assumed to be reset once in every four frames. As indicated by the arrows, prediction operation is executed only along the forward direction of the time axis, so that difference values between a dependent frame and an independent frame (referred to in the following as prediction error values) are always obtained by using a preceding independent frame as a reference frame. Thus, independent frame No. 1 is used to derive prediction error values for each of frames 2, 3, and 4, which are encoded and transmitted as data representing these frames.
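The forward-only prediction of FIG. 1A can be illustrated by the following minimal sketch. It assumes, purely for illustration, that each frame is a flat list of pixel values and that the prediction signal for a dependent frame is simply the preceding independent frame itself (no movement compensation). The function names are hypothetical and do not come from the cited prior art.

```python
# Illustrative sketch only: frames are modeled as flat lists of pixel values.

def prediction_errors(reference, frame):
    """Per-pixel difference between a dependent frame and its reference frame."""
    return [p - r for p, r in zip(frame, reference)]

def encode_group_forward(frames):
    """Encode one reset period: the first frame is the independent frame,
    and every later frame is represented by its errors against that frame."""
    independent = list(frames[0])
    errors = [prediction_errors(independent, f) for f in frames[1:]]
    return independent, errors

def decode_group_forward(independent, errors):
    """Rebuild all frames by adding each error list back onto the reference."""
    frames = [list(independent)]
    for e in errors:
        frames.append([r + d for r, d in zip(independent, e)])
    return frames

# Frames 1 to 4 of one reset period (frame 1 is the independent frame).
frames = [
    [10, 10, 10, 10],
    [10, 11, 10, 10],
    [10, 12, 11, 10],
    [10, 13, 12, 10],
]
independent, errors = encode_group_forward(frames)
assert decode_group_forward(independent, errors) == frames
```

In this sketch the error values grow as the dependent frame moves further from frame 1, which is consistent with correlation falling off with distance along the time axis.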
Such a prior art prediction method has a basic disadvantage. Specifically, only the correlation between successive frames of the video signal along the forward direction of the time axis is utilized. However in fact there is generally also strong correlation between successive frames in the opposite direction. The operation of the aforementioned related patent application by the assignee of the present invention utilizes that fact, as illustrated in FIG. 1B. Here, each frame occurring between two successive independent frames is subjected to inter-frame predictive encoding based on these two independent frames, as indicated by the arrows. For example, inter-frame predictive encoding of frame 2 is executed based on the independent frames 1 and 5. This is also true for frames 3 and 4. More precisely, a first prediction signal for frame 2 is derived based on frame 1 as a reference frame, and a second prediction signal for frame 2 is derived based on frame 5 as a reference frame. These two prediction signals are then multiplied by respective weighting factors and combined to obtain a final prediction signal for frame 2, with greater weight being given to the first prediction signal (since frame 2 will have greater correlation with frame 1 than with frame 5). Prediction signals for the other dependent frames are similarly derived, and differences between the prediction signal and a signal of a current frame are derived as prediction errors, then encoded and transmitted. Since in this case correlation between a preceding independent frame and a succeeding independent frame is utilized to obtain prediction signals for each dependent frame, a substantially greater degree of accuracy of prediction is attained than is possible with prior art methods in which only inter-frame correlation along the forward direction of the time axis is utilized.
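The weighted combination of two prediction signals described for FIG. 1B may be sketched as follows. The text states only that the nearer independent frame receives the greater weight. The linear weighting by frame distance used here is an assumption made for illustration, as are the function names and the use of each reference frame directly as its prediction signal.

```python
# Illustrative sketch only: the linear distance weighting is an assumption.

def weights_by_distance(index, prev_index, next_index):
    """Return (w_prev, w_next) so that the nearer independent frame weighs more."""
    span = next_index - prev_index
    w_prev = (next_index - index) / span
    w_next = (index - prev_index) / span
    return w_prev, w_next

def bidirectional_prediction(prev_frame, next_frame, w_prev, w_next):
    """Combine the two prediction signals pixel by pixel."""
    return [w_prev * a + w_next * b for a, b in zip(prev_frame, next_frame)]

def prediction_errors(frame, prediction):
    """Difference between the current frame and its prediction signal."""
    return [p - q for p, q in zip(frame, prediction)]

frame1 = [100.0, 100.0, 100.0]   # preceding independent frame
frame5 = [120.0, 100.0, 80.0]    # succeeding independent frame
frame2 = [105.0, 100.0, 95.0]    # dependent frame to be encoded

w_prev, w_next = weights_by_distance(2, 1, 5)
prediction = bidirectional_prediction(frame1, frame5, w_prev, w_next)
bidirectional_errors = prediction_errors(frame2, prediction)
forward_only_errors = prediction_errors(frame2, frame1)
```

With these example values the weights for frame 2 are 0.75 and 0.25, and the bidirectional errors are all zero while the forward-only errors are not, because the example frames were chosen to change linearly over time. Real video will not generally behave so neatly.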
Prior art methods of adaptive inter-frame predictive encoding can overcome the basic disadvantages described above referring to FIG. 1A, as will be described referring to FIGS. 2A, 2C. In FIGS. 2A and 2C (and also in FIGS. 2B, 2D, described hereinafter) respective numbered rectangles represent successive frames of a video signal. The frames indicated by the # symbol represent independently encoded frames. Of these, frames 1 and 5 are independent frames which occur with a fixed period of four frame intervals, i.e. inter-frame predictive encoding is reset once in every four successive frame intervals in these examples. The white rectangles denote frames whose image contents are mutually comparatively similar. The dark rectangles denote frames whose image contents are mutually comparatively similar, but are considerably different from the contents of the "white rectangle" frames. In FIG. 2A, frame 1 is an independent frame, and frame 2 is a dependent frame whose contents are encoded by inter-frame predictive encoding using frame 1 as a reference frame. There is a significant change (e.g. resulting from a "scene change", or resulting from a new portion of the background of the image being uncovered, for example due to the movement of a person or object within the scene that is being televised) in the video signal contents between frames 2 and 3 of FIG. 2A, so that it becomes impossible to execute inter-frame predictive encoding of frame 3 by using frame 2 as a reference frame. With a prior art method of adaptive inter-frame predictive encoding, this is detected, and results in frame 3 being independently encoded. Frame 3 is then used as a reference frame for inter-frame predictive encoding of frame 4.
Thus, each time that a scene change or other very considerable change occurs in the video signal, which does not coincide with the start of a (periodically occurring) independent frame, independent encoding of an additional frame must be executed instead of inter-frame predictive encoding, thereby resulting in a corresponding increase in the amount of encoded data which must be transmitted or recorded.
In the example of FIG. 2C, with a prior art method of adaptive inter-frame predictive encoding, it is assumed that only one frame (frame 3) is considerably different from the preceding and succeeding frames 1, 2 and 4, 5. This is detected, and frame 3 is then independently encoded instead of being subjected to inter-frame predictive encoding. However since frame 4 is now very different in content from frame 3, it is not possible to apply inter-frame predictive encoding to frame 4, so that it is necessary to independently encode that frame also. Hence, each time that a single frame occurs which is markedly different from preceding and succeeding frames, it is necessary to independently encode an additional two frames, thereby increasing the amount of encoded data that must be transmitted. Such isolated, conspicuously different frames, such as frame 3 in FIG. 2C, can occur, for example, each time that a photographic flash is generated within the images that constitute the video signal.
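The frame counts in the examples of FIGS. 2A and 2C can be reproduced with the following minimal sketch. It models each frame by a label ("white" or "dark") and treats any label change from the preceding frame as the "large difference" that forces independent encoding. The function name, the label representation, and the assumption that frame 5 of FIG. 2A belongs to the "dark" group are all illustrative and not taken from the prior art itself.

```python
# Illustrative sketch only: image content is reduced to a label per frame.

def independently_encoded(scene_labels, reset_period=4):
    """Return the 1-based numbers of frames that are independently encoded.

    A frame is independently encoded if it falls on the periodic reset,
    or if its label differs from that of the preceding frame."""
    result = []
    for i, label in enumerate(scene_labels):
        periodic = i % reset_period == 0
        large_change = i > 0 and label != scene_labels[i - 1]
        if periodic or large_change:
            result.append(i + 1)
    return result

# FIG. 2A: a lasting change occurs between frames 2 and 3.
fig_2a = ["white", "white", "dark", "dark", "dark"]
# FIG. 2C: only frame 3 differs (e.g. a photographic flash).
fig_2c = ["white", "white", "dark", "white", "white"]

print(independently_encoded(fig_2a))  # [1, 3, 5]
print(independently_encoded(fig_2c))  # [1, 3, 4, 5]
```

Under these assumptions, the lasting change of FIG. 2A costs one additional independent frame (frame 3), while the isolated frame of FIG. 2C costs two (frames 3 and 4).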
These factors result in the amount of data that must be encoded and transmitted in actual practice being much larger than that for the ideal case in which only the periodically occurring independent frames (i.e. frames 1, 5, etc.) are independently encoded, and in which all other frames are transmitted after inter-frame predictive encoding based on these independent frames.
Another basic disadvantage of such a prior art method of adaptive inter-frame predictive encoding occurs when the encoded output data are to be recorded (e.g. by a video tape recorder) and subsequently played back and decoded to recover the original video signal. Specifically, when reverse playback operation of the recorded encoded data is to be executed, in which playback is executed with data being obtained in the reverse sequence along the time axis with respect to normal playback operation, it would be very difficult to apply such a prior art method, due to the fact that predictive encoding is always based upon a preceding frame. That is to say, in the case of reverse playback operation, prediction values are not contained in the playback signal in the correct sequence for use in decoding the playback data.
The aforementioned related patent application by the assignee of the present invention overcomes this problem of difficulty of use with reverse playback operation, since each dependent frame is predictively encoded based on both a preceding and a succeeding independent frame. However since the described apparatus is not of adaptive type, i.e. inter-frame predictive encoding is always executed for the dependent frames irrespective of whether or not large image content changes occur between successive ones of the dependent frames, it has the disadvantage of a deterioration of the resultant final display image in the event of frequent occurrences of scene changes, uncovering of the background, or other significant changes in the image content.
With a prior art method of adaptive inter-frame predictive encoding as described above, when scene changes occur, or movement of people or objects within the image conveyed by the video signal occurs, whereby new portions of the background of the image are uncovered, then large amounts of additional encoded data are generated, as a result of an increased number of frames being independently encoded rather than subjected to inter-frame predictive encoding. Various methods have been proposed for executing control such as to suppress the amount of such additional data. However this results in loss of image quality.