Generally, moving image data is very large. Therefore, when an apparatus that handles moving image data transmits the data to another apparatus or stores it in a storage device, the apparatus reduces the amount of data by encoding it. Typical moving image encoding standards in wide use include Moving Picture Experts Group phase 2 (MPEG-2), MPEG-4, and MPEG-4 Advanced Video Coding (MPEG-4 AVC/H.264), established by the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC). These standards employ an inter-coding method, which encodes a target picture using information from the target picture and from pictures before and after it, and an intra-coding method, which encodes a picture using only information contained in that picture itself.
The inter-coding method searches for regions that are highly correlated between the encoding target picture and a reference picture. It then encodes a motion vector, which represents the displacement between a block in the encoding target picture and the corresponding block in the reference picture, together with the differences between the values of corresponding pixels in the two blocks (referred to as the “prediction error”). Because the correlation between pictures in a moving image is generally high, the prediction error is smaller than the original pixel values, and the inter-coding method can therefore achieve a high compression ratio. However, in the inter-coding method, an error produced when encoding a certain picture propagates to the pictures encoded after it. Consequently, when many pictures are inter-coded consecutively, pictures encoded later in the sequence have poorer quality. In addition, when the reference picture is not available, for example when decoding starts partway through the encoded moving image data, the moving image decoding apparatus cannot decode an inter-coded picture (inter-picture) that depends on that reference picture.
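The search for a highly correlated region and the resulting motion vector and prediction error can be illustrated with exhaustive block matching. The sketch below is a minimal illustration under assumptions not stated in the text: pictures are 2-D lists of luma samples, the matching criterion is the sum of absolute differences (SAD), and only integer-pixel displacements inside a small window are tried. The function names are hypothetical.

```python
import random


def sad(target, reference, ty, tx, ry, rx, n):
    """Sum of absolute differences between an n x n target block and a reference block."""
    return sum(
        abs(target[ty + i][tx + j] - reference[ry + i][rx + j])
        for i in range(n)
        for j in range(n)
    )


def full_search(target, reference, top, left, n=4, search=2):
    """Return the motion vector (dy, dx) with minimum SAD and the resulting prediction error."""
    height, width = len(reference), len(reference[0])
    best_cost, best_mv = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            # Skip candidates that would fall outside the reference picture.
            if ry < 0 or rx < 0 or ry + n > height or rx + n > width:
                continue
            cost = sad(target, reference, top, left, ry, rx, n)
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    dy, dx = best_mv
    # Prediction error: target pixels minus the matched reference pixels.
    residual = [
        [target[top + i][left + j] - reference[top + dy + i][left + dx + j] for j in range(n)]
        for i in range(n)
    ]
    return best_mv, residual


# Toy example: the target is the reference shifted down 1 row and left 2 columns.
rng = random.Random(1)
ref = [[rng.randrange(256) for _ in range(12)] for _ in range(12)]
cur = [[ref[(y - 1) % 12][(x + 2) % 12] for x in range(12)] for y in range(12)]
mv, err = full_search(cur, ref, top=4, left=4)
print(mv)  # (-1, 2); err is all zeros
```

Only the motion vector and the (here all-zero) prediction error would need to be entropy-coded for this block, which is the source of the compression gain described above.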
By contrast, the pixel values of an intra-picture, i.e., a picture encoded by the intra-coding method, can be decoded correctly without referencing the decoding results of other pictures. Consequently, the intra-coding method is used to encode the first picture of the moving image data and, at a predetermined interval, pictures partway through it. Because such intermediate pictures are intra-pictures, even when the moving image decoding apparatus starts decoding partway through the encoded moving image data, it can correctly reconstruct every picture from that intra-picture onward. Note that when decoding is started from a given picture and the pixel values can be decoded correctly either in that picture or in a picture a predetermined time later, the picture at which decoding starts is referred to as a “refresh picture.” However, the compression ratio of the intra-coding method is generally lower than that of the inter-coding method. Consequently, to keep the decoded image quality of intra-pictures and inter-pictures approximately equal, an intra-picture generally requires more information than an inter-picture.
Encoding and decoding moving image data introduces a coding delay, which is the time from when a picture is input to the moving image encoding apparatus to when that picture is output from the moving image decoding apparatus. In applications that transmit and reconstruct moving image data in real time, such as bidirectional communication applications like remote video conferencing systems, reducing the coding delay is very important. For users to use such applications comfortably, the coding delay should preferably be approximately 100 milliseconds or less.
To keep the coding delay at approximately 100 milliseconds or less, it is preferable to make the encoding order of pictures match their temporal order and to minimize the buffering delay, which is one of the major causes of delay in transmitting and receiving encoded moving image data. A buffering delay is introduced so that an encoded moving image, in which the amount of information varies from picture to picture, can be transmitted over a network whose transmission speed is approximately equal to the average amount of information per unit time in the encoded moving image. The buffering delay is the time needed to transmit the picture with the largest amount of information in the encoded moving image stream, and amounts to, for example, several to twenty frame periods. Consequently, to minimize the buffering delay, the amount of information must be made substantially equal for every picture. However, as mentioned earlier, to make the image quality of every picture equal, an intra-picture needs more information than an inter-picture. Simply reducing the amount of information of an intra-picture to about that of an inter-picture would therefore significantly degrade the image quality of the intra-picture. The image quality of the inter-pictures that use that intra-picture as a reference picture would then also drop significantly, and, as a result, the image quality of the entire moving image data would drop significantly.
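The relationship between picture sizes and buffering delay can be made concrete with a little arithmetic. The sketch below assumes, as a simplification, a constant-rate channel whose rate equals the stream's average number of bits per picture, so the buffering delay is roughly the largest picture size divided by that rate. The picture sizes and frame rate in the example are made-up illustrative numbers, not measurements, and the function name is hypothetical.

```python
def buffering_delay(picture_bits, fps):
    """Approximate buffering delay, in frame periods and in milliseconds, assuming the
    channel carries exactly the average number of bits per picture each frame period."""
    avg_bits_per_frame = sum(picture_bits) / len(picture_bits)
    frames = max(picture_bits) / avg_bits_per_frame
    return frames, frames * 1000.0 / fps


# Hypothetical stream: one 100 kbit intra-picture followed by nine 10 kbit inter-pictures.
frames, ms = buffering_delay([100_000] + [10_000] * 9, fps=30)
print(f"{frames:.2f} frames, {ms:.0f} ms")  # 5.26 frames, 175 ms

# Equal-sized pictures: the largest picture takes exactly one frame period.
print(buffering_delay([19_000] * 10, fps=30)[0])  # 1.0
```

Even this modest size ratio between the intra-picture and the inter-pictures pushes the delay well past the roughly 100 ms target at 30 frames per second, which is why equal picture sizes are the goal.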
As a method of making the amount of information substantially equal for every picture without significantly degrading image quality, an intra-slice scheme has been proposed (see, for example, Japanese Laid-Open Patent Publication No. 2003-179938, Japanese Laid-Open Patent Publication No. 6-113286, Japanese Laid-Open Patent Publication No. 2005-260936, International Publication WO09/037,726, and Japanese Examined Patent Publication No. 6-101841). In the intra-slice scheme, no picture other than the first picture of the encoding target moving image data is made an intra-picture; instead, a set of macroblocks to be intra-coded is inserted into each picture, and its position circulates through the pictures with a predetermined cycle. These macroblocks to be intra-coded are referred to as an “intra-slice.” Because the technique makes pictures correctly decodable one part at a time, it is also referred to as “step-by-step refresh.”
For example, when the intra-slice circulates from the top to the bottom of the pictures, the picture in which the intra-slice is at the top is the refresh picture, and the pixel values are reconstructed correctly in order from the upper edge of the pictures. In the picture where the intra-slice is at the bottom, the pixel values of the entire picture are reconstructed correctly. The interval between refresh pictures, i.e., the refresh cycle, equals the circulation cycle of the intra-slices.
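The top-to-bottom circulation can be sketched as a schedule over macroblock rows. For illustration only, the sketch assumes that the intra-slice consists of a fixed number of whole macroblock rows, that the picture height in macroblock rows is a multiple of it, and that pictures are counted from a refresh picture; the function name is hypothetical.

```python
def intra_slice_schedule(mb_rows, slice_rows, k):
    """Return (intra-slice rows, clean-region rows) for picture k, counting from a
    refresh picture, when the intra-slice moves from top to bottom. The clean rows are
    those guaranteed correct for a decoder that started at the most recent refresh picture."""
    if mb_rows % slice_rows:
        raise ValueError("this sketch assumes mb_rows is a multiple of slice_rows")
    cycle = mb_rows // slice_rows  # refresh cycle, in pictures
    phase = k % cycle              # phase 0 is a refresh picture (intra-slice at the top)
    intra = list(range(phase * slice_rows, (phase + 1) * slice_rows))
    clean = list(range(0, (phase + 1) * slice_rows))  # grows downward picture by picture
    return intra, clean


# 9 macroblock rows (144 luma lines), 3-row intra-slice -> 3-picture refresh cycle.
for k in range(4):
    print(k, intra_slice_schedule(9, 3, k))
```

At phase 2 the intra-slice reaches the bottom and the clean region covers the whole picture; at k = 3 the cycle restarts with the next refresh picture, matching the statement that the refresh cycle equals the circulation cycle.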
Note that, in practice, macroblocks in intra-slices do not necessarily have to be intra-coded. They need only be encoded such that, when decoding is started from the refresh picture, the pixel values of the subsequent pictures are guaranteed to be reconstructed correctly. For example, the macroblocks in the intra-slices may be inter-coded using a motion vector that references a region of a picture in which the pixel values are guaranteed to be reconstructed correctly (hereinafter referred to as the “clean region”).
However, when moving image data is encoded using the intra-slice scheme, the pictures that precede the picture in which the refresh by the intra-slice completes contain a region where the pixel values are not guaranteed to be reconstructed correctly. This region is hereinafter referred to as the “non-clean region.” To guarantee that the pixel values in the clean region are reconstructed correctly, the intra-slice scheme imposes the constraint that macroblocks in the clean region must not reference information of pixels in the non-clean region. For example, when an encoding target macroblock in the clean region is inter-coded, the entire block on the reference picture that the macroblock references must also be included in the clean region.
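The inter-coding constraint reduces to a containment test on the reference block. The sketch assumes integer-pixel motion vectors and a clean region that is a horizontal band of whole pixel rows (as with top-to-bottom circulation). With fractional motion vectors, the interpolation filter reads extra pixels around the block, so the test would need a margin. The function and parameter names are hypothetical.

```python
def reference_in_clean_region(y, x, mv, n, clean_top, clean_bottom, width):
    """True if the n x n reference block addressed by the integer motion vector
    mv = (dy, dx) from the block at (y, x) lies entirely inside the clean rows
    [clean_top, clean_bottom) and inside the picture width."""
    dy, dx = mv
    ry0, rx0 = y + dy, x + dx
    return (
        clean_top <= ry0
        and ry0 + n <= clean_bottom
        and 0 <= rx0
        and rx0 + n <= width
    )


# Clean region = top 48 pixel rows (3 macroblock rows) of a 176-pixel-wide picture.
print(reference_in_clean_region(32, 16, (-4, 3), 16, 0, 48, 176))  # True: rows 28..43
print(reference_in_clean_region(32, 16, (8, 0), 16, 0, 48, 176))   # False: rows 40..55
```

An encoder applying the intra-slice scheme would discard any candidate for which this test fails, even if that candidate has the lowest matching cost.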
In MPEG-4 AVC/H.264, to encode a motion vector, the median of the motion vectors of the neighboring blocks to the left of, above, and above and to the right of the encoding target block is calculated as a motion vector predicted value PMVMED. The difference between the predicted value PMVMED and the motion vector of the encoding target macroblock is then entropy-coded.
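The median predictor can be sketched component by component. This follows the basic H.264 rule of taking the median of the left (A), above (B), and above-right (C) neighbors' motion vectors, but omits the standard's special cases (unavailable neighbors, 16x8 and 8x16 partitions, differing reference indices). Motion vectors are plain integer tuples, and the function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three integers."""
    return max(min(a, b), min(max(a, b), c))


def pmv_med(mv_a, mv_b, mv_c):
    """Component-wise median of the three neighboring motion vectors."""
    return tuple(median3(a, b, c) for a, b, c in zip(mv_a, mv_b, mv_c))


def mv_difference(mv, pmv):
    """Motion vector difference that would be passed to the entropy coder."""
    return tuple(m - p for m, p in zip(mv, pmv))


pred = pmv_med((4, -2), (6, 0), (3, 1))
print(pred, mv_difference((5, -1), pred))  # (4, 0) (1, -1)
```

Because neighboring motion vectors tend to be similar, the difference is usually small and cheap to entropy-code.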
Furthermore, a technique has been proposed that takes the motion vector of the macroblock located at the same position in the reference picture as the encoding target macroblock as another predicted value PMVCOL, and uses whichever of PMVMED and PMVCOL yields the smaller error as the predicted value (see, for example, J. Jung and G. Laroche, “Competition-Based Scheme for Motion Vector Selection and Coding,” VCEG-AC06, ITU-T SG16/Q.6, 29th Meeting). In this technique, a flag indicating which of PMVMED and PMVCOL was selected is included in the encoded moving image data.
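The competition between the two predictors, and the decoder's dependence on the transmitted flag, can be sketched as follows. The cost used to pick a predictor here (sum of absolute differences of the motion vector components) is an assumption standing in for the selection criterion of the cited proposal; the flag values (0 for PMVMED, 1 for PMVCOL) and the function names are hypothetical.

```python
def encode_mv(mv, pmv_med, pmv_col):
    """Pick the predictor closer to mv; return (flag, motion vector difference)."""
    def cost(p):
        return abs(mv[0] - p[0]) + abs(mv[1] - p[1])

    flag, pmv = (1, pmv_col) if cost(pmv_col) < cost(pmv_med) else (0, pmv_med)
    return flag, (mv[0] - pmv[0], mv[1] - pmv[1])


def decode_mv(flag, mvd, pmv_med, pmv_col):
    """Rebuild the motion vector from the flag, the difference, and the predictors."""
    pmv = pmv_col if flag else pmv_med
    return (pmv[0] + mvd[0], pmv[1] + mvd[1])


flag, mvd = encode_mv((5, -1), pmv_med=(4, 0), pmv_col=(5, -1))
print(flag, mvd)                              # 1 (0, 0)
print(decode_mv(flag, mvd, (4, 0), (5, -1)))  # (5, -1)
```

The decoder must hold the same PMVCOL as the encoder. If it cannot obtain the true collocated vector, as happens when decoding starts at a refresh picture, the same flag and difference yield a different vector; for example, substituting (0, 0) for PMVCOL above reconstructs (0, 0) instead of (5, -1).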
With this technique, when a moving image decoding apparatus starts decoding from a refresh picture, the reference picture of the refresh picture is not available on the decoding side, so PMVCOL cannot be obtained and, as a result, the motion vectors in the non-clean region of the refresh picture cannot be decoded correctly. Note that the clean region of the refresh picture is intra-coded, so motion vectors are not incorrectly reconstructed there.
The pictures after the refresh picture also reference the motion vectors of their reference pictures (including the refresh picture), so the effect of incorrectly reconstructed motion vectors propagates. Consequently, when a macroblock in the clean region uses PMVMED or PMVCOL as its motion vector predicted value, its motion vector cannot be decoded correctly, because PMVMED and PMVCOL are not guaranteed to be decoded correctly, and, as a result, the correct pixel values are not reconstructed. The intra-slice scheme therefore cannot use the encoding method that takes one of PMVMED and PMVCOL as the motion vector predicted value.
MPEG-4 AVC/H.264 also employs intra-prediction coding, which predictively encodes the encoding target macroblock using information of previously encoded neighboring reference macroblocks. When an encoding target macroblock included in the clean region is intra-prediction coded, its reference macroblocks must also be included in the clean region.
Furthermore, MPEG-4 AVC/H.264 employs a deblocking filter process, which removes the block distortion produced at block boundaries by applying a low-pass filter across the boundaries with neighboring blocks. Consequently, with the intra-slice scheme, all of the pixels that the deblocking filter references when it is applied to pixels in the clean region must also be included in the clean region.
As described above, in the intra-slice scheme, all of the pixels referenced by an encoding target macroblock in the clean region must also be included in the clean region. Because of these constraints, applying the intra-slice scheme to conventional coding schemes reduces coding efficiency. One reason is the increase in the amount of information caused by including, in the encoded moving image data stream, information that represents the boundary between the clean region and the non-clean region (hereinafter referred to as the “clean region boundary”). For example, in MPEG-4 AVC/H.264, slice headers representing the boundary are inserted between the macroblocks located on the clean region boundary. Inserting such slice headers increases the amount of information in the encoded moving image stream and therefore reduces coding efficiency.
Furthermore, context-adaptive binary arithmetic coding (CABAC) may be used as the entropy coding scheme for variable-length encoding of the quantized coefficients and motion vector of each macroblock. In this case, the contexts are initialized at each position where a slice header is inserted, which reduces the efficiency of entropy coding.
Furthermore, when a motion vector is encoded, the motion vector of a neighboring block is used as a predicted value, and the difference between the motion vector and the predicted value is encoded. However, the motion vector of a block that a slice header places in a different slice cannot be used as a predicted value. As a result, motion vector prediction becomes less effective, and coding efficiency decreases further.
Furthermore, to reduce the buffering delay, intra-slices may be inserted so as to span the picture vertically and set to circulate from left to right or from right to left. With intra-slices set in this way, the macroblocks to be intra-coded are distributed over every macroblock line, so the amount of information in each macroblock line becomes approximately uniform. Consequently, the capacity of the buffer in the moving image decoding apparatus can be limited to approximately one macroblock line.
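The effect of slice orientation on the information per macroblock line can be shown by counting intra-slice macroblocks in each line. The sketch assumes a one-macroblock-wide (or one-row-high) intra-slice and an 11 x 9 macroblock picture (176 x 144 pixels), chosen purely for illustration; the function names are hypothetical.

```python
def intra_mask(mb_cols, mb_rows, phase, slice_size, vertical):
    """Boolean map of the macroblocks in the intra-slice at a given circulation phase.
    A vertical slice covers whole columns; a horizontal slice covers whole rows."""
    start = phase * slice_size
    return [
        [start <= (c if vertical else r) < start + slice_size for c in range(mb_cols)]
        for r in range(mb_rows)
    ]


def intra_per_line(mask):
    """Number of intra-slice macroblocks in each macroblock line."""
    return [sum(row) for row in mask]


print(intra_per_line(intra_mask(11, 9, 3, 1, vertical=True)))   # [1, 1, 1, 1, 1, 1, 1, 1, 1]
print(intra_per_line(intra_mask(11, 9, 3, 1, vertical=False)))  # [0, 0, 0, 11, 0, 0, 0, 0, 0]
```

With the vertical slice every macroblock line carries the same intra-coded load, whereas the horizontal slice concentrates it in a single line, which is what drives the larger buffer requirement.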
However, when the profiles widely used in MPEG-4 AVC/H.264, such as Main Profile and High Profile, are used, macroblocks are encoded in raster-scan order, line by line from the top of the picture to the bottom. Consequently, when intra-slices span the picture vertically, slice headers must be inserted in every macroblock line, which increases the number of slice headers needed. As a result, coding efficiency decreases further.
Furthermore, to comply with the above constraints, the moving image encoding apparatus either refrains from applying, anywhere in the picture, the method of encoding the motion vector of the encoding target macroblock using predicted values calculated from the motion vectors of macroblocks in the reference pictures, or inserts slice headers into the encoded moving image stream to separate the macroblocks to which the method cannot be applied from the other macroblocks. In either case, coding efficiency decreases, because either the amount of information needed to code motion vectors or the amount of slice header information increases.
Furthermore, when an encoding target block included in the clean region is inter-coded, the above constraints may prevent the moving image encoding apparatus from selecting the optimal reference block, i.e., the block most similar to the encoding target block. For example, when the intra-slice circulation direction differs from the direction in which the image moves across the picture, part or all of the region of the reference picture that is most similar to the encoding target block in the clean region lies in the non-clean region. In this case, the difference in pixel values between the encoding target block and the reference block cannot necessarily be made small, and coding efficiency therefore decreases.
A technique for solving this problem is disclosed in International Publication WO09/037,726. In this technique, when part of the region of a reference picture referenced by a motion vector is included in the non-clean region of that reference picture, the moving image decoding apparatus does not reference the pixel values in the non-clean region. Instead, it uses reference values calculated by extrapolation from the pixel values at one edge of the intra-slice. However, when most of the region of the reference picture referenced by the motion vector lies in the non-clean region, the differences between the extrapolated reference values and the corresponding pixel values of the encoding target picture become large, so motion prediction does not improve coding efficiency.
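The substitution of non-clean reference pixels can be sketched with the simplest possible extrapolation, repeating the last clean row (edge replication). This is an assumption for illustration; the extrapolation actually specified in WO09/037,726 may differ. The sketch also assumes top-to-bottom circulation, so the clean region is rows [0, clean_bottom), and the function name is hypothetical.

```python
def clean_reference_block(ref, ry, rx, n, clean_bottom):
    """n x n reference block at (ry, rx); rows at or below clean_bottom are replaced
    by the last clean row, so no pixel of the non-clean region is read."""
    last_clean = clean_bottom - 1
    return [[ref[min(ry + i, last_clean)][rx + j] for j in range(n)] for i in range(n)]


ref = [[10 * y + x for x in range(4)] for y in range(6)]  # toy 6 x 4 picture
print(clean_reference_block(ref, 1, 0, 3, clean_bottom=3))
# [[10, 11, 12], [20, 21, 22], [20, 21, 22]]
```

When the referenced block starts at or below clean_bottom, every row becomes a copy of the same edge row, which illustrates why prediction degrades when most of the referenced region is non-clean.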