Video image data generally contains a large amount of data. Therefore, devices that handle video image data compress it by encoding it before transmitting it to another device or storing it in a storage device.
As representative standard technologies for encoding video images, MPEG-2 (Moving Picture Experts Group phase 2), MPEG-4, and MPEG-4 AVC/H.264 (H.264 MPEG-4 Advanced Video Coding), developed by ISO/IEC (International Organization for Standardization/International Electrotechnical Commission), are widely used.
The standard encoding technologies described above use two encoding methods: an inter encoding method, which encodes a picture by using information of the encoding target picture and information of pictures before and after it, and an intra encoding method, which encodes a picture by using only information of the encoding target picture.
Generally, the encoding amount of pictures or blocks encoded by the inter encoding method is smaller than that of pictures or blocks encoded by the intra encoding method. Therefore, depending on the selected encoding mode, the encoding amount varies unevenly from picture to picture within the same sequence. Similarly, depending on the selected encoding mode, the encoding amount varies unevenly from block to block within the same picture.
Therefore, to transmit a data stream containing encoded video images at a constant transmission rate even though the encoding amount varies over time, the transmission source device is provided with a transmitting buffer for the data stream, and the transmission destination device is provided with a receiving buffer for the data stream.
The delay caused by these buffers (hereinafter, "buffer delay") is the main component of the delay from when each picture is input to the encoding device until it is displayed by a decoding device (hereinafter, "codec delay"). Reducing the buffer size reduces both the buffer delay and the codec delay. However, as the buffer size decreases, the degree of freedom in allocating the encoding amount to each picture, i.e., the extent to which the encoding amount may vary, also decreases. Consequently, the image quality of the reproduced video image deteriorates.
MPEG-2 and MPEG-4 AVC/H.264 specify VBV (Video Buffering Verifier) and CPB (Coded Picture Buffer), respectively, which define the operation of the receiving buffer in an ideal decoding device.
A video image encoding device controls the encoding amount so that the receiving buffer of the ideal decoding device neither overflows nor underflows. The ideal decoding device is specified to perform instantaneous decoding, i.e., the time taken for the decoding process is zero. For example, there is a technology for controlling a video image encoding device in accordance with VBV.
Specifically, the video image encoding device controls the encoding amount so that all data of each picture is stored in the receiving buffer by the time the ideal decoding device decodes that picture, and so that the receiving buffer neither overflows nor underflows.
The receiving buffer underflows when the encoding amount of a picture is so large that, with the stream transmitted at a constant rate, transmission of the data used for decoding the picture is not completed by the time the video image decoding device decodes and displays the picture. In other words, underflow of the receiving buffer means that the data used for decoding a picture is not present in the receiving buffer of the decoding device at the decoding time. In this case, the video image decoding device cannot perform the decoding process, and a frame skip occurs.
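The receiving-buffer behavior described above can be illustrated with a small simulation. This is a simplified sketch, not the normative VBV or CPB procedure: it assumes the stream is sent back to back at a constant rate, that each picture is removed instantaneously at its decoding time dt(i), and the function name check_receiving_buffer and its parameters are hypothetical.

```python
def check_receiving_buffer(bits, dt, rate, buffer_size, start=0.0):
    """Simplified ideal receiving buffer (VBV/CPB-like, not normative).

    bits[i]     : encoding amount of picture i, in bits
    dt[i]       : decoding time of picture i (instantaneous removal)
    rate        : constant transmission rate, in bits per second
    buffer_size : receiving buffer size, in bits
    Returns (underflow_pictures, overflow_flag).
    """
    total = sum(bits)
    underflow = []
    overflow = False
    needed = 0   # cumulative bits of pictures 0..i
    removed = 0  # cumulative bits of pictures removed before picture i
    for i, (b, d) in enumerate(zip(bits, dt)):
        needed += b
        # Bits received by time d: constant rate, capped at the stream length.
        arrived = min(total, max(0.0, (d - start) * rate))
        if arrived < needed:
            underflow.append(i)  # data of picture i incomplete at dt(i): frame skip
        if arrived - removed > buffer_size:
            overflow = True      # occupancy only grows between removals, so it peaks just before one
        removed += b             # simplification: removal is counted even on underflow
    return underflow, overflow
```

For example, with rate=1000 bits/s, decoding times [0.2, 0.3, 0.4, 0.5] s and encoding amounts [100, 100, 300, 100] bits, the call returns ([2, 3], False): picture 2 is too large to arrive by dt(2) = 0.4 s, and because transmission is back to back, picture 3 is delayed as well.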
To decode without causing the receiving buffer to underflow, the video image decoding device delays the stream by a predetermined length of time from its reception before decoding and displaying each picture.
As described above, the ideal decoding device is specified to complete the decoding process instantaneously, with a processing time of zero. Therefore, letting t(i) be the time at which the i-th picture is input to the video image encoding device and dt(i) be the time at which the ideal decoding device decodes the i-th picture, the i-th picture can be displayed at its decoding time, i.e., at dt(i).
For all pictures, the display period at the input, t(i+1)−t(i), equals the display period at the decoder, dt(i+1)−dt(i). Therefore, the decoding time is dt(i)=t(i)+dly, i.e., it is delayed by a fixed time dly from the input time t(i). Accordingly, the video image encoding device has to finish transmitting the data used for decoding the i-th picture to the receiving buffer of the video image decoding device by the time dt(i).
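The step from equal display periods to a fixed offset can be written out by rearranging and telescoping, taking the first picture (i = 0) as the reference:

```latex
\begin{aligned}
dt(i+1) - dt(i) = t(i+1) - t(i)
  &\;\Longrightarrow\; dt(i+1) - t(i+1) = dt(i) - t(i) \\
  &\;\Longrightarrow\; dt(i) - t(i) = dt(0) - t(0) \equiv dly \quad \text{for all } i \\
  &\;\Longrightarrow\; dt(i) = t(i) + dly .
\end{aligned}
```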
FIG. 1 illustrates an example of the transition of the buffer occupancy amount of the receiving buffer according to the conventional technology. In FIG. 1, the horizontal axis indicates time and the vertical axis indicates the buffer occupancy amount of the receiving buffer. The solid line 300 indicates the buffer occupancy amount at each time point.
In the receiving buffer, the buffer occupancy amount increases at the predetermined transmission rate, and at the decoding time of each picture, the data used for decoding that picture is removed from the buffer. In the example of FIG. 1, data of the i-th picture starts to be input to the receiving buffer at a time at(i), and the last data of the i-th picture is input at a time ft(i). The ideal decoding device completes decoding the i-th picture at the time dt(i), and the i-th picture can be displayed at the time dt(i).
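Under a constant transmission rate, the times at(i) and ft(i) follow directly from the encoding amount of each picture. The sketch below assumes, as a simplification, that pictures are sent back to back without gaps from a given start time; the function names and example numbers are hypothetical. It uses exact fractions so that the check ft(i) ≤ dt(i), with dt(i) = t(i) + dly, is not affected by rounding.

```python
from fractions import Fraction


def arrival_times(bits, rate, start=0):
    """Return (at, ft): first- and last-bit arrival times of each picture.

    Assumes back-to-back transmission at a constant `rate` (bits per second)
    beginning at `start`, so at(i+1) = ft(i) and ft(i) = at(i) + bits[i] / rate.
    """
    at, ft = [], []
    t = Fraction(start)
    for b in bits:
        at.append(t)
        t += Fraction(b, rate)
        ft.append(t)
    return at, ft


def meets_decoding_times(bits, input_times, dly, rate, start=0):
    """True if every picture has fully arrived by dt(i) = t(i) + dly.

    Pass input_times and dly as ints or Fractions to keep the check exact.
    """
    _, ft = arrival_times(bits, rate, start)
    dt = [Fraction(t) + Fraction(dly) for t in input_times]
    return all(f <= d for f, d in zip(ft, dt))
```

With input times t(i) = i/10 s, dly = 1/5 s and rate = 1000 bits/s, four 100-bit pictures meet every decoding time, whereas [100, 100, 300, 100] does not: ft(2) = 1/2 s is later than dt(2) = 2/5 s.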
The ideal decoding device performs instantaneous decoding, whereas an actual video image decoding device takes a certain length of time to perform the decoding process. The decoding time for one picture is generally shorter than the display period of a picture, but an actual video image decoding device may take an amount of time close to the display period to decode a picture.
The data of the i-th picture is input to the receiving buffer from the time at(i) to the time ft(i). However, the time within this interval at which the data used for decoding each block arrives is not guaranteed. Therefore, the actual video image decoding device starts decoding the i-th picture at the time ft(i). Accordingly, letting ct be the maximum processing time taken to decode one picture, it can only be guaranteed that the actual video image decoding device completes the decoding process by the time ft(i)+ct.
The video image encoding device guarantees that the data used for decoding a picture arrives at the receiving buffer by the time dt(i), i.e., that ft(i) does not come after dt(i). Thus, in the latest case, ft(i) equals dt(i).
In this case, the time by which completion of the decoding process is guaranteed is dt(i)+ct. To display all pictures at equal intervals, the video image decoding device must delay the display time of each picture by at least the time ct relative to the ideal decoding device.
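The worst-case reasoning above can be checked numerically. In this hedged sketch, min_display_shift (a hypothetical helper) returns the smallest constant delay s such that an actual decoder that starts picture i at ft(i) and needs at most ct finishes by dt(i) + s. Because the encoder guarantees only that ft(i) does not exceed dt(i), the decoder has to plan for the case ft(i) = dt(i), where the shift equals ct.

```python
from fractions import Fraction


def min_display_shift(ft, dt, ct):
    """Smallest constant shift s >= 0 with ft(i) + ct <= dt(i) + s for all i.

    ft[i]: time the last data of picture i arrives (decoding starts then)
    dt[i]: decoding/display time of picture i in the ideal decoding device
    ct   : maximum time an actual decoder needs for one picture
    """
    return max(Fraction(0), max(f + ct - d for f, d in zip(ft, dt)))
```

For ft = [1/10, 1/5, 2/5], dt = [1/5, 3/10, 2/5] and ct = 1/30 (all in seconds), the result is 1/30: the third picture arrives exactly at its decoding time, so the full ct must be added to every display time.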
In VBV of MPEG-2 and CPB of MPEG-4 AVC/H.264, the difference between the time at which each encoded picture arrives at the video image decoding device and the time at which the decoded picture is displayed is expressed as (ft(i)−at(i)+ct). Consequently, it is difficult to achieve a codec delay, i.e., the delay from when each picture is input to the encoding device to when it is output from the decoding device, of less than the time ct. Because ct is usually the processing time for one picture, it is difficult to make the codec delay shorter than the processing time for one picture.
In the arithmetic encoding systems of MPEG-4 AVC/H.264 and of HEVC (High-Efficiency Video Coding), which is undergoing standardization, compressed data of blocks, such as quantized orthogonal transform coefficients, is binarized, arithmetic encoding is performed for each bin, and the resulting bits are output.
Patent document 1: Japanese Laid-Open Patent Publication No. 2003-179938
Non-patent document 1: JCTVC-G1103, “High-Efficiency Video Coding (HEVC) text specification Working Draft 5”, Joint Collaborative Team on Video Coding of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, December 2011
In the conventional technology, it is difficult to make the codec delay shorter than the processing time for one picture. However, the following method can achieve this. In this method, each block in a picture is assigned to one of a plurality of groups, and a decode start time is assigned to each group. A group is, for example, one block line, i.e., a row of blocks in the horizontal direction of the picture.
If the amount of information generated in each group is made uniform, the difference between the decode start times of consecutive groups matches the processing time for one group, and the time ct becomes the processing time for one group. As a result, the codec delay can be reduced to the processing time for one group.
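The effect of group-wise decode start times can be illustrated with a small timing model. All assumptions here are illustrative: a picture is split into N groups with uniform information, so each group takes ct/N to decode, and the decoder starts a group once all of its data has arrived and the previous group is finished. The function names are hypothetical.

```python
from fractions import Fraction


def completion_groupwise(group_arrivals, group_time):
    """Finish time of the last group when decoding group by group.

    group_arrivals[k]: time at which all data of group k is in the buffer
    group_time       : decoding time of one group (ct / N, assumed uniform)
    """
    done = group_arrivals[0]
    for arrival in group_arrivals:
        # Group k starts when its data is present and group k-1 is finished.
        done = max(done, arrival) + group_time
    return done


def completion_picturewise(group_arrivals, group_time):
    """Finish time when decoding waits for the whole picture (start at ft(i))."""
    return group_arrivals[-1] + len(group_arrivals) * group_time
```

With ct = 1/30 s, N = 4 (group_time = 1/120 s) and groups arriving at 1/120, 2/120, 3/120 and 4/120 s, group-wise decoding finishes at 5/120 s, only ct/N after the last data arrives, whereas picture-wise decoding finishes at 8/120 s, a full ct later.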
In MPEG-4 AVC/H.264, an arithmetic encoding system is applied as the entropy encoding system. In the arithmetic encoding system, the compressed data of a given block in the encoding target picture is binarized, each bin is then processed, and a stream is generated. The delay (hereinafter, "entropy encoding delay") from when entropy encoding of the last bin in the target block starts to when the corresponding encoded bit string is output from the entropy encoding unit as a stream is generally not zero.
In some cases, the encoded bit string corresponding to the last bin of the target block is not output into the stream until entropy encoding has started for some of the bins of the compressed data of the next block. Consequently, the bit amount transmitted to the transmitting buffer from the start to the end of encoding all blocks in a certain group (the generated information amount of a pseudo group) differs from the generated information amount of the actual group by an amount corresponding to the entropy encoding delay.
For this reason, even if the encoding device controls the information amount, based on the generated information amount of the pseudo group, so that each group meets its decoding time, the time at which all bits of the actual group arrive at the receiving buffer is delayed by an amount corresponding to the entropy encoding delay.
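The shift can be quantified under a simple constant-rate model, which is an illustrative assumption rather than part of any standard: if d_k bits belonging to group k are still held in the arithmetic encoder when its last block is finished, they enter the stream only while group k+1 is encoded, so the last bit of the actual group arrives d_k / R later than the pseudo-group accounting predicts, where R is the transmission rate. The function names and parameters are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def last_bit_arrivals(pseudo_bits, held_bits, rate, start=0):
    """Arrival time of the last bit of each group: (pseudo, actual).

    pseudo_bits[k]: bits written to the transmitting buffer while the blocks
                    of group k are encoded (generated amount of the pseudo group)
    held_bits[k]  : bits of group k still held in the arithmetic encoder when
                    its last block is finished (the entropy encoding delay)
    rate          : constant transmission rate, in bits per second
    """
    pseudo = [Fraction(start) + Fraction(c, rate) for c in accumulate(pseudo_bits)]
    # Held bits are the first bits written while group k+1 is encoded,
    # so in the stream they directly follow the pseudo group.
    actual = [p + Fraction(h, rate) for p, h in zip(pseudo, held_bits)]
    return pseudo, actual


def late_groups(arrivals, decode_start_times):
    """Indices of groups whose last bit arrives after their decode start time."""
    return [k for k, (a, d) in enumerate(zip(arrivals, decode_start_times)) if a > d]
```

If the encoder sets each decode start time exactly at the pseudo-group arrival time, then with pseudo_bits = [1000, 1000, 1000], held_bits = [0, 40, 0] and rate = 120000 bits/s, group 1 arrives 1/3000 s late and late_groups reports [1].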
Accordingly, depending on the magnitude of the entropy encoding delay, the restriction imposed by the operation specification of the receiving buffer may not be satisfied.
As noted above, in the arithmetic encoding systems of MPEG-4 AVC/H.264 and HEVC, compressed data of blocks, such as quantized orthogonal transform coefficients, is binarized, arithmetic encoding is performed for each bin, and the resulting bits are output. However, the maximum delay from when one bin is input until the corresponding bit is output is theoretically unbounded.
Arithmetic encoding is performed by narrowing the interval [0, 1) to a probability interval corresponding to the bin series, based on the probability of 0 or 1 for each bin, expressing that interval by bits, and outputting the bits. For example, when the probability of 0 is 0.8 for every bin (with 0 assigned to the lower part of each interval), the probability interval of the bin series (0, 1, 0) is [0.64, 0.768), and the shortest binary fraction within this interval, 0.11 (= 0.75), gives the bit output "11" (the digits after the binary point).
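The interval arithmetic in this example can be reproduced with exact fractions. This is a textbook-style sketch assuming that a 0 bin takes the lower part of the current interval; practical coders such as CABAC work with finite-precision integer ranges and renormalization instead, and the function names here are hypothetical.

```python
import math
from fractions import Fraction


def probability_interval(bins, p0):
    """Narrow [0, 1) to the interval of a bin series.

    Convention assumed here: bin value 0 takes the lower part of the current
    interval, with width proportional to p0 (the probability of 0).
    """
    low, high = Fraction(0), Fraction(1)
    for b in bins:
        split = low + (high - low) * p0
        if b == 0:
            high = split
        else:
            low = split
    return low, high


def shortest_bits(low, high):
    """Shortest bit string b1..bn such that 0.b1..bn (binary) lies in [low, high)."""
    n = 1
    while True:
        k = math.ceil(low * 2**n)  # smallest n-bit numerator >= low
        if Fraction(k, 2**n) < high:
            return format(k, f"0{n}b")
        n += 1
```

probability_interval([0, 1, 0], Fraction(4, 5)) gives (16/25, 96/125), i.e., [0.64, 0.768), and shortest_bits on that interval gives "11".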
Owing to the characteristics of the arithmetic encoding system, when the probability interval straddles 0.5 at some point in the bin series, the next output bit cannot be determined, because subsequent bins may narrow the interval to either side of 0.5; the bit output remains undetermined until the interval no longer straddles 0.5. If the probability interval of the input bin series keeps straddling 0.5, the delay grows without bound.
In practice, when the last block of a picture is encoded, all non-output bits retained (accumulated) in the arithmetic encoder are flushed (output), so the upper limit of the delay is the processing time for one picture.
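Both effects, bits withheld while the interval straddles 0.5 and the flush at the end of a picture, can be seen in a small exact-arithmetic encoder. This is an illustrative sketch under the same lower-part-for-0 convention, not the CABAC procedure; practical encoders typically track such undetermined bits with an outstanding-bit counter rather than exact fractions. The helper straddling_bins is a hypothetical worst-case input generator.

```python
import math
from fractions import Fraction

HALF = Fraction(1, 2)


def encode_with_delay(bins, p0):
    """Exact-arithmetic encoder that emits each bit as soon as it is determined.

    Returns (bits, emitted_after): emitted_after[j] is the index of the bin
    whose encoding released bit j; len(bins) marks bits released by the flush.
    Bin value 0 takes the lower part of the interval (assumed convention).
    """
    low, high = Fraction(0), Fraction(1)
    bits, emitted_after = [], []
    for i, b in enumerate(bins):
        split = low + (high - low) * p0
        low, high = (low, split) if b == 0 else (split, high)
        # A bit is determined only when [low, high) lies entirely on one side of 1/2.
        while high <= HALF or low >= HALF:
            if high <= HALF:
                bits.append(0)
                low, high = 2 * low, 2 * high
            else:
                bits.append(1)
                low, high = 2 * low - 1, 2 * high - 1
            emitted_after.append(i)
    # Flush (e.g. after the last block of a picture): emit the fewest bits
    # that name a value inside the remaining interval.
    n = 0
    while math.ceil(low * 2**n) >= high * 2**n:
        n += 1
    k = math.ceil(low * 2**n)
    for ch in (format(k, f"0{n}b") if n > 0 else ""):
        bits.append(int(ch))
        emitted_after.append(len(bins))
    return bits, emitted_after


def straddling_bins(n, p0):
    """Hypothetical adversarial series: each bin is chosen so that 1/2 stays
    strictly inside the interval, so no bit is determined before the flush."""
    low, high, out = Fraction(0), Fraction(1), []
    for _ in range(n):
        split = low + (high - low) * p0
        if split > HALF:
            out.append(0)
            high = split
        else:
            out.append(1)
            low = split
    return out
```

Encoding (0, 1, 0) with p0 = 4/5 releases the first bit after the second bin and the second bit only at the flush, giving "11" as in the example above. For a series from straddling_bins(30, Fraction(4, 5)), every bit is released by the flush, which is why the delay is bounded only by the flush at the end of the picture.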
In the worst case, the bit corresponding to the last bin of the compressed data of the last block in each group is output only when the last bin of the compressed data of the last block of the picture is encoded. Accordingly, decoding of the first group in the picture cannot start until the last bin of the compressed data of the last block in the picture has been encoded, and therefore the actual codec delay becomes greater than or equal to the processing time for one picture.
One method of reliably and quickly outputting the bits of the last bin of the compressed data of the last block in a group is to insert a slice header at the boundary between groups and flush (output) all non-output bits (undetermined bits) retained in the arithmetic encoder.
However, inserting a slice header at each boundary between groups degrades the encoding efficiency, which is particularly undesirable at low bit rates.