Field of the Invention
The present invention relates to a video encoding method, a video decoding method, a video encoding apparatus, a video decoding apparatus, a video processing system, a video encoding program, and a video decoding program.
Related Background Art
Video signal encoding techniques are used for transmission and for storage and reproduction of video signals. The well-known techniques include, for example, the international standard video coding methods such as ITU-T Recommendation H.263 (hereinafter referred to as H.263), ISO/IEC International Standard 14496-2 (MPEG-4 Visual, hereinafter referred to as MPEG-4), and so on. Another known newer encoding method is a video coding method scheduled for joint international standardization by ITU-T and ISO/IEC: ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10 (Joint Final Committee Draft of Joint Video Specification, hereinafter referred to as H.26L).
Since a motion video signal consists of a series of images (frames) varying little by little with time, it is common practice in these video coding methods to implement interframe prediction between a frame retrieved as a target for encoding (current frame) and another frame (reference frame) and thereby reduce temporal redundancy in the video signal. In this case, where the interframe prediction is carried out between the current frame and a reference frame less different from the current frame, the redundancy can be reduced more and encoding efficiency can be increased.
For this reason, as shown in FIG. 6, the reference frame for the current frame A1 can be either a temporally previous frame A0 or a temporally subsequent frame A2 with respect to the current frame A1. The prediction with the previous frame is referred to as forward prediction, while the prediction with the subsequent frame is referred to as backward prediction. Bidirectional prediction is defined as a prediction in which one of the two prediction methods is arbitrarily selected, or as a prediction in which both methods are used simultaneously.
In general, when such bidirectional prediction is used, as in the example shown in FIG. 6, a temporally previous frame serving as a reference frame for forward prediction and a temporally subsequent frame serving as a reference frame for backward prediction are each stored in advance of the current frame.
FIGS. 7A and 7B are diagrams showing (A) decoding and (B) output of the frames in the case of the bidirectional prediction shown in FIG. 6. For example, in the decoding of MPEG-4, where the current frame A1 is decoded by bidirectional interframe prediction, frame A0 being one temporally previous frame and frame A2 being one temporally subsequent frame with respect to the current frame A1 are first decoded, prior to decoding of the current frame A1, either as frames decoded by intraframe prediction without use of interframe prediction or as frames decoded by forward interframe prediction, and they are retained as reference frames. Thereafter, the current frame A1 is decoded by bidirectional prediction using these two frames A0, A2 thus retained (FIG. 7A).
In this case, therefore, the order of decoding times of the temporally subsequent reference frame A2 and the current frame A1 is reverse to the order of output times of their respective decoded images. Each of these frames A0, A1, and A2 is attached with output time information 0, 1, or 2, and thus the temporal sequence of the frames can be known according to this information. For this reason, the decoded images are outputted in the right order (FIG. 7B). In MPEG-4, the output time information is described as absolute values.
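By way of illustration only, the reordering described above for FIGS. 7A and 7B can be sketched in Python as follows. The type name Frame and the function name output_order are hypothetical and do not appear in MPEG-4; the sketch merely assumes that each decoded frame carries its output time information and that the decoded images are sorted by it.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    name: str
    output_time: int  # output time information attached to the frame


def output_order(decoded):
    """Return frame names sorted by their attached output time."""
    return [f.name for f in sorted(decoded, key=lambda f: f.output_time)]


# Decoding order of FIG. 7A: A0, then A2 (backward reference), then A1.
decoding_order = [Frame("A0", 0), Frame("A2", 2), Frame("A1", 1)]
display_order = output_order(decoding_order)
```

In this sketch, A2 precedes A1 in the decoding order, whereas display_order places A1 ahead of A2.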
Some of the recent video coding methods permit the foregoing interframe prediction to be carried out using multiple reference frames, instead of one reference frame in the forward direction and one reference frame in the backward direction, so as to enable prediction from a frame with a smaller change from the current frame, as shown in FIG. 8. FIG. 8 shows an example using two temporally previous frames B0, B1 and two temporally subsequent frames B3, B4 with respect to the current frame B2, as reference frames for the current frame B2.
FIGS. 9A and 9B are diagrams showing (A) decoding and (B) output of the frames in the case of the bidirectional prediction shown in FIG. 8. For example, in the decoding of H.26L, a plurality of reference frames can be retained within a range up to a predetermined upper bound of the number of reference frames and, on the occasion of carrying out interframe prediction, an optimal reference frame is arbitrarily designated out of them. In this case, where the current frame B2 is decoded as a bidirectionally predicted frame, the reference frames are first decoded prior to the decoding of the current frame B2; the reference frames include a plurality of temporally previous frames (e.g., two frames B0, B1) and a plurality of temporally subsequent frames (e.g., two frames B3, B4) with respect to the current frame B2, which are decoded and retained as reference frames. The current frame B2 can be predicted from a frame arbitrarily designated as the one used for prediction out of those frames B0, B1, B3, and B4 (FIG. 9A).
In this case, therefore, the order of decoding times of the temporally subsequent reference frames B3, B4 and the current frame B2 becomes reverse to the order of their respective output times. Each of these frames B0-B4 is attached with output time information or output order information 0-4, and the temporal sequence of the frames can be known according to this information. For this reason, the decoded images are outputted in the right order (FIG. 9B). The output time information is often described as absolute values. The output order is used where frame intervals are constant.
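By way of illustration only, the use of output order information in the example of FIGS. 9A and 9B can be sketched as follows. The function name release_in_output_order is hypothetical, and the decoding order B0, B1, B3, B4, B2 is an assumption made for this sketch. A decoded image is released only after every image with a smaller output order has been released; actual output timing is ignored here.

```python
def release_in_output_order(decoded):
    """Return, for each decoding step, the list of frames released at that step.

    `decoded` is a list of (name, output_order) pairs in decoding order.
    A frame is released only once all frames with a smaller output order
    have already been released.
    """
    pending = {}      # output order -> frame name, held until releasable
    next_order = 0    # output order expected next
    log = []
    for name, order in decoded:
        pending[order] = name
        released = []
        while next_order in pending:
            released.append(pending.pop(next_order))
            next_order += 1
        log.append(released)
    return log


# Assumed decoding order: both subsequent reference frames precede B2.
decoding_order = [("B0", 0), ("B1", 1), ("B3", 3), ("B4", 4), ("B2", 2)]
release_log = release_in_output_order(decoding_order)
```

In this sketch, B3 and B4 are held after decoding and become releasable only once B2 has been decoded.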
For carrying out the decoding by the backward prediction using temporally subsequent frames as predictive frames, it is necessary to satisfy the condition that the decoding of the temporally subsequent frames is completed prior to the decoding of the current frame so as to be available as predictive frames. In this case, a delay is incurred before the decoded image of the current frame becomes available, as compared with a frame to which the backward prediction is not applied.
This will be specifically described below with reference to FIGS. 10A to 10C. FIGS. 10A to 10C correspond to the example shown in FIGS. 6, 7A, and 7B. First, encoded data of each frame A0-A2 is decoded in an order necessary for execution of interframe prediction, and it is assumed that intervals of the frames are constant time intervals according to a frame rate and that the time necessary for the decoding operation is negligible for each frame A0-A2, regardless of whether the interframe prediction is applied and regardless of the directions of interframe prediction (FIG. 10A). In practice, the decoding intervals of the frames A0-A2 do not have to be constant and can change depending upon such factors as variation in encoding bits of the frames A0-A2 or the like; however, they can be assumed to be constant on average. The time necessary for the decoding operation is not zero, either, but it will raise no significant problem in the description hereinafter if the difference thereof is not so large among the frames A0-A2.
A frame that incurs no delay due to backward prediction and no reversal between the order of decoding times and the order of output times with respect to any other frame will be referred to hereinafter as a backward-prediction-nonassociated frame. It is supposed herein that the time when the decoded image of such a frame A0 is obtained is defined as the output time correlated with that decoded image, and that the decoded image is outputted at that output time. Supposing the subsequent frame is the backward predicted frame A1, the decoded image thereof is obtained only after the temporally subsequent frame A2 has been decoded, and a delay is thus incurred before the decoded image is obtained.
For this reason, if the time when the decoded image is obtained for the backward-prediction-nonassociated frame A0 is defined as a reference of output time, the decoded image of the backward predicted frame A1 is not obtained by the output time correlated therewith (FIG. 10B). Namely, an output time interval between the decoded image of the backward-prediction-nonassociated frame A0 and the decoded image of the backward predicted frame A1 becomes longer by the delay time necessary for execution of backward prediction than the original interval, which leads to unnatural video output.
Therefore, in the case where the backward interframe prediction is applied in video coding, as shown in FIG. 10C, it is necessary to preliminarily delay the output time of the decoded image of the backward-prediction-nonassociated frame A0 by the delay time necessary for execution of the backward prediction as well so as to be able to correctly handle the output time interval to the backward predicted frame A1.
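By way of illustration only, the timing relationship of FIGS. 10A to 10C can be sketched as follows. The function name schedule and its parameters are hypothetical. The sketch assumes, as above, constant frame intervals and negligible decoding time, expresses all times in units of one frame interval, and assumes that an image that is not yet available at its nominal output time is outputted as soon as it is obtained.

```python
def schedule(decode_order, output_index, interval, delay):
    """Return {frame name: actual output time}.

    The k-th frame in `decode_order` becomes available at k * interval
    (decoding time treated as negligible). Its nominal output time is
    output_index[name] * interval + delay; an image cannot be outputted
    before it is available.
    """
    times = {}
    for k, name in enumerate(decode_order):
        available = k * interval
        nominal = output_index[name] * interval + delay
        times[name] = max(available, nominal)
    return times


T = 1  # one frame interval (arbitrary time unit)
decode_order = ["A0", "A2", "A1"]
output_index = {"A0": 0, "A1": 1, "A2": 2}

no_delay = schedule(decode_order, output_index, T, delay=0)    # FIG. 10B
with_delay = schedule(decode_order, output_index, T, delay=T)  # FIG. 10C

gap_no_delay = no_delay["A1"] - no_delay["A0"]
gap_with_delay = with_delay["A1"] - with_delay["A0"]
```

Under these assumptions, the output interval between A0 and A1 is two frame intervals when no delay is applied, and returns to one frame interval when the output time of A0 is delayed by one frame interval.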
Conventionally, the backward interframe prediction was applied to video encoding under the conditions that encoding was carried out at a high bit rate and that a fixed frame rate of 30 frames/second, equal to that of TV broadcast signals, was always used, as in TV broadcasting or storage thereof. This was because backward interframe prediction brings about more options for prediction, and hence an increase in computational complexity that makes implementation thereof difficult on simple equipment, and because the increase of delay time was not desired in real-time communication involving two-way conversation, such as video conferences.
In this case, for example, as in MPEG-4, where one temporally subsequent frame is used as a reference frame for backward prediction, the delay time necessitated in execution of the backward prediction is constant. For example, where the frame rate is 30 frames/second as described above, the delay time is the time interval of each frame, i.e., 1/30 second. Accordingly, the time by which the output time of the decoded image of the backward-prediction-nonassociated frame should be delayed can likewise be set to 1/30 second.
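By way of illustration only, the arithmetic above can be sketched as follows. The function names frame_interval and delayed_output_time are hypothetical. The sketch assumes a fixed frame rate and a single temporally subsequent reference frame, so that the delay equals one frame interval.

```python
from fractions import Fraction


def frame_interval(frame_rate):
    """Time interval of each frame, in seconds, at a fixed frame rate."""
    return Fraction(1, frame_rate)


def delayed_output_time(obtained_at, delay):
    """Output time of a backward-prediction-nonassociated frame after the delay."""
    return obtained_at + delay


# One subsequent reference frame at 30 frames/second: delay of one frame interval.
delay = frame_interval(30)
delay_ms = float(delay) * 1000  # the same delay expressed in milliseconds

# A decoded image obtained at time 0 is outputted one frame interval later.
a0_output = delayed_output_time(Fraction(0), delay)
```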