(1) Field of the Invention
The present invention relates to a video encoding apparatus and particularly to the video encoding apparatus that adjusts the amount of codes by skipping encoding of image data.
(2) Related Art
In video encoding according to the MPEG standards, the amount of codes is controlled by predicting the amount of pictures to be accumulated in a receiver buffer of a decoding apparatus. This is referred to as control of the code amount performed by the Video Buffering Verifier (VBV) model.
FIG. 1A shows changes of the predictive accumulation amount of the receiver buffer. As shown in the figure, pictures are inputted in the receiver buffer at a predetermined bit rate. At the time indicated by DTS (Decoding Time Stamp), one picture is outputted from the receiver buffer for being decoded. If the decoding apparatus has a display unit adopting NTSC (National Television System Committee) format, DTS is set at every 1/30 second when one frame is allocated to one picture, and at every 1/60 second when one field is allocated to one picture. Also, if the decoding apparatus has a display unit adopting PAL (Phase Alternation Line) format, DTS is set at every 1/25 second when one frame is allocated to one picture, and at every 1/50 second when one field is allocated to one picture.
Normally, when D1 denotes the amount of bits generated for decoding a picture at DTS1, the predictive accumulation amount of the receiver buffer decreases from V1 to V1* (=V1−D1) at DTS1 as shown in FIG. 1A.
The VBV model controls the amount of codes so that the predictive accumulation amount of the receiver buffer does not cause an overflow or an underflow.
In some circumstances, e.g. during scene changes, an underflow of the receiver buffer may occur due to a sequence of pictures which each require a large amount of bits, as shown in FIG. 1B. When D3 denotes the amount of bits generated for decoding a picture at DTS3, the predictive accumulation amount of the receiver buffer is below zero at DTS3 since V3−D3<0. This phenomenon happens because the picture to be decoded at DTS3 is yet to be inputted in the receiver buffer. To avoid such an underflow of the receiver buffer, the quantization scale is increased so as to decrease the generation bit amount.
On the other hand, an overflow of the receiver buffer may occur due to a sequence of pictures which each require a small amount of bits, as shown in FIG. 1C. When the predictive accumulation amount of the receiver buffer prior to decoding at DTS3 is V3 (=V2*+R, where R is the amount of bits inputted into the receiver buffer of the decoding apparatus, in other words, the amount of bits transmitted from the encoding apparatus to the receiver buffer, during each time interval of DTS (the time interval between decoding of two consecutive pictures)), V3 exceeds the storage capacity of the receiver buffer. To avoid such an overflow of the receiver buffer, the quantization scale is decreased so as to increase the generation bit amount.
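The buffer bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the normative VBV procedure of the MPEG standards: the function name `simulate_vbv` and the sample numbers are hypothetical, and only the two relations given in the text are modeled, namely V* = V − D at each DTS and V(next) = V* + R.

```python
def simulate_vbv(initial_fill, rate_per_interval, picture_bits, capacity):
    """Track the predicted receiver-buffer occupancy at each DTS.

    initial_fill      -- predicted occupancy just before the first DTS (V1)
    rate_per_interval -- bits arriving between two consecutive DTS (R)
    picture_bits      -- bits removed for each decoded picture (D1, D2, ...)
    capacity          -- storage capacity of the receiver buffer

    Returns a list of (fill_before_decode, fill_after_decode, status).
    """
    fill = initial_fill
    trace = []
    for bits in picture_bits:
        before = fill
        after = before - bits          # V* = V - D
        if before > capacity:
            status = "overflow"        # buffer already too full before decoding
        elif after < 0:
            status = "underflow"       # picture not fully received at its DTS
        else:
            status = "ok"
        trace.append((before, after, status))
        fill = after + rate_per_interval   # V(next) = V* + R
    return trace


if __name__ == "__main__":
    # Hypothetical numbers: a large second picture causes an underflow.
    for row in simulate_vbv(100, 50, [60, 120], capacity=200):
        print(row)
```

In the first call the second picture needs 120 bits while only 90 are predicted to be in the buffer, which corresponds to the situation of FIG. 1B.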
However, when the generation bit amount is significantly decreased by increasing the quantization scale to prevent an underflow, image quality will be deteriorated. To prevent such deterioration, the following methods are conventionally employed along with the adjustment of the quantization scale.
The first method uses so-called skipped macroblocks. According to the MPEG standards, encoding is performed on a macroblock basis, the macroblock being a block of 16×16 pixels. A skipped macroblock is a macroblock composed of a special code indicating to display an image that is identical to the reference image at the location, and its data amount is extremely small. Accordingly, when an underflow is likely to occur, skipped macroblocks are transmitted instead of performing normal encoding of the original image.
However, the above method is problematic because a macroblock that is not a skipped macroblock shows a part of the original image, and a skipped macroblock shows a part of the reference image, which makes the reconstructed image lack in consistency as a whole.
FIG. 2A shows examples of images displayed in a normal case.
FIG. 2B shows examples of images displayed when a skipped macroblock is used. In FIG. 2B, as a skipped macroblock is not used for the upper macroblock of the second frame, a part of the original image of the second frame is displayed in the upper macroblock. On the other hand, as a skipped macroblock is used for the lower macroblock of the second frame, a part of an image displayed in the lower macroblock of the first frame is also displayed in the lower macroblock of the second frame. This makes the reconstructed image in the second frame lack in consistency as a whole.
The second method is to encode pseudo image data. Pseudo image data is image data whose pixel values are medians of possible pixel values. As one example, when a pixel value is expressed in eight-bit, its median is 128. According to the MPEG standards, the difference between each pixel value of the image data and its median is encoded, and so the amount of data resulting from encoding such image data whose pixel values are medians becomes minimum. Accordingly, when an underflow is likely to occur, on a macroblock basis, image data of the macroblock whose pixel values are medians may be encoded instead of performing normal encoding on the macroblock.
However, the problem with the above method is that a gray-colored image is displayed where the image data whose pixel values are medians has been encoded.
FIG. 2C shows examples of images displayed when a pseudo image is used. In the second frame, a part of the original image is displayed in the upper macroblock since normal encoding has been performed thereon. On the other hand, a gray-colored image is displayed in the lower macroblock where pseudo image data has been encoded. This makes the reconstructed image in the second frame lack in consistency as a whole.
By the way, Japanese Patent No. 2871316 discloses a method for skipping encoding of image data of one field or one frame (hereafter referred to as “skipping a picture”). FIG. 3 shows the construction of a video encoding apparatus disclosed in Japanese Patent No. 2871316. Referring to the overall construction of this video encoding apparatus, input moving image data is subjected to an encoding process performed by an orthogonal transformation circuit 6 and other units, and generated pictures are stored in a buffer memory 20. When a transmission rate excess judging circuit 24 judges that a transmission rate of each picture exceeds a predetermined threshold, a SKIP code stored in a SKIP code storage memory 22 is outputted, or when the judgment result of the transmission rate excess judging circuit 24 is negative, the picture stored in the buffer memory 20 is outputted.
According to the method described above, when a picture is to be skipped, not only macroblocks occupying a part of the picture but also all the macroblocks except a first and a last macroblock in each slice layer are replaced with skipped macroblocks. This enables a decoding apparatus to display an image that is identical to a previously decoded image, avoiding displaying such an image that lacks in consistency as described above.
However, the above method has the following problems.
The first problem is that the buffer memory 20 shown in FIG. 3 is necessary for temporarily storing generated pictures for the judgment whether the amount of each generated picture exceeds the transmission rate. In more detail, when the judgment is performed on each macroblock as to whether encoding an image of each macroblock is to be skipped as described above, a buffer with a small capacity can be provided for storing data encoded on a macroblock basis. However, with the present method, each picture as a whole is subjected to the judgment as to whether the picture is to be skipped, which requires a buffer with a large capacity for storing encoded data of all the macroblocks included in the picture.
The second problem is that there is a case where the order of images displayed in an interlaced scan format is reversed when some of the pictures are skipped, in a case where one frame is allocated to one picture (frame structure).
FIG. 4A shows images displayed in a normal case where no picture is skipped. Reference numerals 1t and 1b respectively represent a top field and a bottom field of the first frame. In the frame structure, an encoding process is performed on a frame basis, each frame being composed of a top field and a bottom field. In this case, the decoding apparatus performs a decoding process on one frame at every 1/30 second. As the interlaced format is applied to its display, one field is displayed at every 1/60 second. That is to say, fields 1t, 1b, 2t, 2b, 3t, 3b, 4t, and 4b are displayed in the stated order.
FIG. 4B shows images displayed in a case where a picture is skipped.
When skipping occurs to a picture B(3) in the frame 2, an image identical to the picture I(1) that is referred to by the picture B(3) is displayed in the frame 2 where the original image is encoded to be the picture B(3).
Accordingly, the top field of the frame 2 displays the same image as the top field 1t of the frame 1. The bottom field of the frame 2 displays the same image as the bottom field 1b of the frame 1. As they are displayed in the interlaced format, the fields 1t, 1b, 1t, 1b, 3t, 3b, 4t, 4b are displayed in the stated order, each for 1/60 second. In this way, the field 1t is displayed after the field 1b, causing a reversal in the display order of images.
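The reversal can be reproduced with a small Python sketch. The helper names `displayed_fields`, `field_time`, and `has_reversal` are hypothetical, and the model is deliberately simple: a skipped frame repeats both fields of the frame it refers to, as in FIG. 4B.

```python
def displayed_fields(frames, reference):
    """Return the field sequence shown on an interlaced display.

    frames    -- frame numbers in display order
    reference -- maps each skipped frame to the frame whose image is repeated
    """
    out = []
    for frame in frames:
        src = reference.get(frame, frame)   # skipped frames show their reference
        out += [f"{src}t", f"{src}b"]       # top field first, then bottom field
    return out


def field_time(name):
    """Capture-time key of a field name such as '1t' or '3b'."""
    return (int(name[:-1]), 0 if name.endswith("t") else 1)


def has_reversal(fields):
    """True if some field is followed by a field captured earlier."""
    return any(field_time(b) < field_time(a) for a, b in zip(fields, fields[1:]))


if __name__ == "__main__":
    normal = displayed_fields([1, 2, 3, 4], {})
    skipped = displayed_fields([1, 2, 3, 4], {2: 1})   # frame 2 skipped, refers to frame 1
    print(normal, has_reversal(normal))
    print(skipped, has_reversal(skipped))
```

With frame 2 skipped, the sequence becomes 1t, 1b, 1t, 1b, 3t, 3b, 4t, 4b, and the check reports a reversal because 1t follows 1b.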
In view of the above problems, the object of the present invention is to provide a video encoding apparatus that makes a special buffer for the judgment whether a whole picture is to be skipped unnecessary and that does not cause a reversal in the display order of images even when a whole picture is skipped.
The above object can be achieved by a video encoding apparatus that encodes a sequence of image data while predicting an accumulation amount of a receiver buffer in a decoding apparatus, each image data forming one frame or one field, the video encoding apparatus including: a comparing unit for comparing, prior to encoding of each image data, a predictive accumulation amount with a predetermined threshold, the predictive accumulation amount being an amount of data predicted to be accumulated in the receiver buffer by the time when data obtained by encoding the image data is decoded; and a skipping unit for (a) canceling the encoding of the image data and (b) using a proxy code as data that is fetched from the receiver buffer at the decoding time, if the amount of data is below the predetermined threshold, the proxy code indicating to display image data that is identical to previously decoded image data.
With this construction, the judgment as to whether encoding of each image data is to be skipped or not is performed prior to the encoding of the image data, and so a special buffer for temporarily storing encoded data of the image data for the judgment as to whether the predictive accumulation amount of the receiver buffer is below a threshold becomes unnecessary.
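The order of operations in this construction can be illustrated with the following Python sketch. It is a hedged illustration only: the function `choose_skips`, the per-type threshold table, and all numbers are hypothetical. The point it shows is that the skip decision uses only the predictive accumulation amount and a threshold, so the encoded size of the picture is not needed when the decision is made.

```python
def choose_skips(initial_fill, rate, pictures, thresholds, proxy_bits):
    """Decide, before encoding, which pictures are replaced by a proxy code.

    initial_fill -- predictive accumulation amount at the first decoding time
    rate         -- bits transmitted during each decoding time interval (R)
    pictures     -- list of (picture_type, bits_if_encoded); the second value
                    stands in for the encoder output and is NOT consulted
                    when deciding whether to skip
    thresholds   -- threshold per picture type, e.g. {"I": ..., "P": ..., "B": ...}
    proxy_bits   -- code amount of the proxy code

    Returns a list of (picture_type, skipped, bits_used).
    """
    fill = initial_fill
    decisions = []
    for ptype, bits_if_encoded in pictures:
        skip = fill < thresholds[ptype]        # comparison made prior to encoding
        used = proxy_bits if skip else bits_if_encoded
        decisions.append((ptype, skip, used))
        fill = fill - used + rate              # update the prediction for the next picture
    return decisions


if __name__ == "__main__":
    seq = [("I", 90), ("B", 30), ("P", 70)]
    print(choose_skips(100, 40, seq, {"I": 80, "P": 65, "B": 20}, proxy_bits=2))
```

In this run the predicted occupancy before the P-picture is 60, which is below its threshold of 65, so the P-picture is replaced by the 2-bit proxy code without ever being encoded.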
Here, the video encoding apparatus may further include a threshold setting unit for setting a threshold for a picture type of a picture that is obtained by encoding the image data, the picture type being one of an I-picture, a P-picture, and a B-picture.
With this construction, an appropriate threshold for each picture type can be set in consideration of a different code amount due to a different encoding method of each of an I-picture, a P-picture, and a B-picture.
Here, the skipping unit (a) uses an all-skip B-picture as the proxy code when the picture type is a B-picture, the all-skip B-picture being a B-picture in which macroblocks in all slice layers except a first macroblock and a last macroblock in each slice layer are skipped macroblocks, and (b) uses an all-skip P-picture as the proxy code when the picture type is an I-picture or a P-picture, the all-skip P-picture being a P-picture in which macroblocks in all slice layers except a first macroblock and a last macroblock in each slice layer are skipped macroblocks.
With this construction, using skipped macroblocks of MPEG, a proxy code indicating to display image data that is identical to previously decoded image data can be generated.
Here, the threshold is a predictive code amount of the picture.
According to this construction, a predictive code amount of a picture of each of an I-picture, a P-picture, and a B-picture is set as the threshold for the picture. This ensures that an underflow of the receiver buffer is prevented.
Here, the threshold setting unit calculates a variance of pixel values of the image data, and sets a higher threshold for a higher calculated variance.
With this construction, the higher the variance of the pixels contained in the image data, the larger the code amount of the image data in general. Therefore, by setting a higher threshold for a higher variance, preventing an underflow of the receiver buffer is further ensured.
Here, the threshold setting unit sets a higher threshold for a higher activity ACT of an original image, the activity ACT being a sum of activities actj of all macroblocks included in the original image, an activity actj being expressed by the equation actj=1+VARj, where VARj is a minimum value among variances of pixel values of the original image in each of eight blocks that form a macroblock j, the eight blocks being composed of four blocks in a frame DCT mode and four blocks in a field DCT mode, and actj is an activity of the macroblock j.
With this construction, the higher the activity of image data, the larger the code amount of the image data in general. Therefore, by setting a higher threshold for a higher activity of the image data, preventing an underflow of the receiver buffer is further ensured.
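The activity measure can be sketched in Python as follows. Several details here are assumptions, because the text only states that there are four blocks in each DCT mode: the sketch treats a macroblock as 16×16 luminance samples, takes the frame-mode blocks to be the four 8×8 quadrants, takes the field-mode blocks to be 8×8 blocks built from lines of the same parity, and uses the population variance. The function names are hypothetical.

```python
from statistics import pvariance


def macroblock_activity(mb):
    """Return actj = 1 + VARj for one macroblock.

    mb -- 16 rows of 16 pixel values (assumed layout, see lead-in)
    """
    def block(rows, col0):
        # Collect an 8x8 block from the given 8 rows and 8 columns from col0.
        return [mb[r][c] for r in rows for c in range(col0, col0 + 8)]

    # Frame DCT mode: four quadrants of consecutive lines.
    frame_blocks = [block(range(r0, r0 + 8), c0) for r0 in (0, 8) for c0 in (0, 8)]
    # Field DCT mode: even lines (top field) and odd lines (bottom field).
    field_blocks = [block(range(p, 16, 2), c0) for p in (0, 1) for c0 in (0, 8)]

    return 1 + min(pvariance(b) for b in frame_blocks + field_blocks)


def picture_activity(macroblocks):
    """Return ACT, the sum of actj over all macroblocks of the picture."""
    return sum(macroblock_activity(mb) for mb in macroblocks)


if __name__ == "__main__":
    flat = [[128] * 16 for _ in range(16)]
    striped = [[0 if c % 2 == 0 else 10 for c in range(16)] for _ in range(16)]
    print(macroblock_activity(flat), macroblock_activity(striped))
    print(picture_activity([flat, striped]))
```

A flat macroblock yields the minimum activity of 1, while the vertically striped one yields 26 because every block has variance 25.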
Here, the threshold setting unit sets a predictive code amount of each of an I-picture and a P-picture as the threshold for each of an I-picture and a P-picture, and sets a value larger than a predictive code amount of a B-picture as the threshold for a B-picture.
With this construction, the threshold for a B-picture is set as a value larger than the predictive code amount of a B-picture. This makes a B-picture more likely to be skipped, and accordingly, it becomes less likely that an I-picture or a P-picture that is referred to by other pictures is skipped. This can prevent the same image from being displayed consecutively.
Here, when M≥2, M representing an appearance cycle of an I-picture or a P-picture, the threshold setting unit sets: a threshold Ti of an I-picture as Ti=Ei; a threshold Tp of a P-picture as Tp=Ep; a threshold Tb(i) of a B-picture immediately preceding an I-picture in an encoding order as Tb(i)=Eb+(Ei−R) when (Ei−R)≥0, and Tb(i)=Eb when (Ei−R)<0; and a threshold Tb(p) of a B-picture immediately preceding a P-picture in the encoding order as Tb(p)=Eb+(Ep−R) when (Ep−R)≥0, and Tb(p)=Eb when (Ep−R)<0, where Ei is the predictive code amount of an I-picture, Ep is the predictive code amount of a P-picture, Eb is the predictive code amount of a B-picture, and R is a transmission bit amount during each decoding time interval.
With this construction, the threshold for a B-picture immediately preceding an I-picture or a P-picture in the encoding order is set based on a predictive code amount of an I-picture or a B-picture, to prevent an I-picture or a P-picture from being skipped.
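The threshold rules for M≥2 can be written out as a short Python sketch. The function names and the sample values are hypothetical; the formulas are those stated above. One way to read them: if the predictive accumulation amount V is at least Eb+(Ei−R) before the B-picture, then after the B-picture consumes Eb bits and R bits arrive, at least Ei bits remain for the following I-picture.

```python
def b_threshold(eb, e_next, r):
    """Threshold of a B-picture immediately preceding an I- or P-picture.

    eb     -- predictive code amount of a B-picture (Eb)
    e_next -- predictive code amount of the following I- or P-picture (Ei or Ep)
    r      -- transmission bit amount during each decoding time interval (R)
    """
    surplus = e_next - r
    return eb + surplus if surplus >= 0 else eb


def thresholds_m2(ei, ep, eb, r):
    """Return the four thresholds Ti, Tp, Tb(i), Tb(p) for M >= 2."""
    return {
        "Ti": ei,
        "Tp": ep,
        "Tb(i)": b_threshold(eb, ei, r),
        "Tb(p)": b_threshold(eb, ep, r),
    }


if __name__ == "__main__":
    # Hypothetical code amounts in arbitrary units.
    print(thresholds_m2(ei=400, ep=200, eb=80, r=250))
```

With these sample values Tb(i) is raised to 230 because Ei exceeds R by 150, whereas Tb(p) stays at Eb=80 because Ep is smaller than R.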
Here, when M≥3, M representing an appearance cycle of an I-picture or a P-picture, the threshold setting unit sets: a threshold Ti of an I-picture as Ti=Ei; a threshold Tp of a P-picture as Tp=Ep; a threshold Tb2(i) of a B-picture B2(i) immediately preceding an I-picture in an encoding order as Tb2(i)=Eb+(Ei−R) when (Ei−R)≥0, and Tb2(i)=Eb when (Ei−R)<0; a threshold Tb1(i) of a B-picture B1(i) immediately preceding a B-picture B2(i) in the encoding order as Tb1(i)=Eb+(Tb2(i)−R) when (Tb2(i)−R)≥0, and Tb1(i)=Eb when (Tb2(i)−R)<0; a threshold Tb2(p) of a B-picture B2(p) immediately preceding a P-picture in the encoding order as Tb2(p)=Eb+(Ep−R) when (Ep−R)≥0, and Tb2(p)=Eb when (Ep−R)<0; and a threshold Tb1(p) of a B-picture B1(p) immediately preceding a B-picture B2(p) in the encoding order as Tb1(p)=Eb+(Tb2(p)−R) when (Tb2(p)−R)≥0, and Tb1(p)=Eb when (Tb2(p)−R)<0, where Ei is the predictive code amount of an I-picture, Ep is the predictive code amount of a P-picture, Eb is the predictive code amount of a B-picture, and R is a transmission bit amount during each decoding time interval.
With this construction, the threshold for a B-picture immediately preceding an I-picture or a P-picture in the encoding order and for a B-picture immediately preceding the B-picture in the encoding order are set based on a predictive code amount of an I-picture or a B-picture, to prevent an I-picture or a P-picture from being skipped.
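For M≥3 the same rule is applied twice, once for B2 and once more for B1 using the threshold of B2 in place of the predictive code amount of the I-picture or P-picture. The Python sketch below shows this chaining; the function name `chained_b_thresholds` and the sample values are hypothetical.

```python
def chained_b_thresholds(e_anchor, eb, r):
    """Return (Tb1, Tb2) for the two B-pictures preceding an I- or P-picture.

    e_anchor -- predictive code amount of the I- or P-picture (Ei or Ep)
    eb       -- predictive code amount of a B-picture (Eb)
    r        -- transmission bit amount during each decoding time interval (R)
    """
    # B2 immediately precedes the I- or P-picture in the encoding order.
    tb2 = eb + (e_anchor - r) if (e_anchor - r) >= 0 else eb
    # B1 immediately precedes B2; the same rule is applied with Tb2 as the target.
    tb1 = eb + (tb2 - r) if (tb2 - r) >= 0 else eb
    return tb1, tb2


if __name__ == "__main__":
    print(chained_b_thresholds(e_anchor=400, eb=80, r=150))   # large anchor picture
    print(chained_b_thresholds(e_anchor=200, eb=80, r=250))   # anchor smaller than R
```

In the first call Tb2 is 80+250=330 and Tb1 is 80+180=260; in the second call both thresholds fall back to Eb=80.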
Here, when M≥3, M representing an appearance cycle of an I-picture or a P-picture, the threshold setting unit sets the threshold of a B-picture B2 immediately preceding an I-picture in an encoding order, higher than the threshold of a B-picture B1 immediately preceding the B-picture B2 in the encoding order.
With this construction, skipping a B-picture to avoid skipping an I-picture or a P-picture is likely to occur immediately before an I-picture or a P-picture. As a result, unnecessary skipping of a B-picture at a point where it is highly uncertain whether an I-picture or a P-picture needs to be skipped, that is to say, at a point where the prediction of the accumulation in the receiver buffer at the decoding time involves a number of predictions, can be avoided.
Here, when M≥3, M representing an appearance cycle of an I-picture or a P-picture, the threshold setting unit sets: a threshold Ti of an I-picture as Ti=Ei; a threshold Tp of a P-picture as Tp=Ep; a threshold Tb2(i) of a B-picture B2(i) immediately preceding an I-picture in an encoding order as Tb2(i)=Eb+(Ei−R) when (Ei−R)≥0, and Tb2(i)=Eb when (Ei−R)<0; a threshold Tb1(i) of a B-picture B1(i) immediately preceding a B-picture B2(i) in the encoding order as Tb1(i)=Dbskip+(Tb2(i)−R) when Dbskip+(Tb2(i)−R)≥Eb, and Tb1(i)=Eb when Dbskip+(Tb2(i)−R)<Eb; a threshold Tb2(p) of a B-picture B2(p) immediately preceding a P-picture in the encoding order as Tb2(p)=Eb+(Ep−R) when (Ep−R)≥0, and Tb2(p)=Eb when (Ep−R)<0; and a threshold Tb1(p) of a B-picture B1(p) immediately preceding a B-picture B2(p) in the encoding order as Tb1(p)=Dbskip+(Tb2(p)−R) when Dbskip+(Tb2(p)−R)≥Eb, and Tb1(p)=Eb when Dbskip+(Tb2(p)−R)<Eb, where Ei is the predictive code amount of an I-picture, Ep is the predictive code amount of a P-picture, Eb is the predictive code amount of a B-picture, R is a transmission bit amount during each decoding time interval, and Dbskip is a code amount of an all-skip B-picture.
With this construction, a B1-picture is skipped only when a B2-picture is skipped to avoid skipping an I-picture or a P-picture. Therefore, unnecessary skipping of a B1-picture at a point where it is highly uncertain whether an I-picture or a P-picture needs to be skipped, that is to say, at a point where the prediction of the accumulation in the receiver buffer at the decoding time of the I-picture or the P-picture involves a number of predictions, can be avoided.
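The variant that uses Dbskip can be sketched in the same way. The function name and the numbers below are hypothetical. A possible reading of the formula: B1 is skipped only when, even if B1 itself consumed no more than the code amount Dbskip of an all-skip B-picture, the predictive accumulation amount would still fall below Tb2 at the decoding time of B2.

```python
def b_thresholds_with_skip(e_anchor, eb, r, db_skip):
    """Return (Tb1, Tb2) using the code amount of an all-skip B-picture.

    e_anchor -- predictive code amount of the I- or P-picture (Ei or Ep)
    eb       -- predictive code amount of a B-picture (Eb)
    r        -- transmission bit amount during each decoding time interval (R)
    db_skip  -- code amount of an all-skip B-picture (Dbskip)
    """
    # Tb2 follows the same rule as in the earlier constructions.
    tb2 = eb + (e_anchor - r) if (e_anchor - r) >= 0 else eb
    # Tb1 is based on Dbskip, with Eb as the lower bound.
    candidate = db_skip + (tb2 - r)
    tb1 = candidate if candidate >= eb else eb
    return tb1, tb2


if __name__ == "__main__":
    print(b_thresholds_with_skip(e_anchor=400, eb=80, r=150, db_skip=5))
    print(b_thresholds_with_skip(e_anchor=200, eb=80, r=250, db_skip=5))
```

For the first set of values Tb1 is 5+180=185, lower than the 260 that results when Eb is used in place of Dbskip, so B1 is skipped less readily.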
The above object can also be achieved by a video encoding apparatus that encodes a sequence of image data while predicting an accumulation amount of a receiver buffer in a decoding apparatus, each image data forming one frame or one field, the video encoding apparatus including: a threshold setting unit for setting, after encoding each image data, a threshold according to a picture type of the encoded image data in a case where M≥2, M representing an appearance cycle of an I-picture or a P-picture, as follows: a threshold Ti in a case where the image data is encoded to be an I-picture is set as Ti=Di, where Di is a predictive code amount of the I-picture, a threshold Tp in a case where the image data is encoded to be a P-picture is set as Tp=Dp, where Dp is a predictive code amount of the P-picture, a threshold Tb(i) in a case where the image data is encoded to be a B-picture immediately preceding an I-picture in an encoding order is set as Tb(i)=Db+(Ei−R) when (Ei−R)≥0, and Tb(i)=Db when (Ei−R)<0, where Db is a predictive code amount of the B-picture, Ei is a predictive code amount of the I-picture, and R is a transmission bit amount during each decoding time interval, and a threshold Tb(p) in a case where the image data is encoded to be a B-picture immediately preceding a P-picture in the encoding order is set as Tb(p)=Db+(Ep−R) when (Ep−R)≥0, and Tb(p)=Db when (Ep−R)<0, where Db is a predictive code amount of the B-picture, Ep is a predictive code amount of the P-picture, and R is a transmission bit amount during each decoding time interval; a comparing unit for comparing, after encoding the image data, a predictive accumulation amount with the threshold, the predictive accumulation amount being an amount of data predicted to be accumulated in the receiver buffer by the time when data obtained by encoding the image data is decoded; and a skipping unit for using a proxy code as data that is fetched from the receiver buffer at the decoding time, if the amount of data is below the threshold, the proxy code indicating to display image data that is identical to previously decoded image data.
With this construction, it becomes less likely that an I-picture or a P-picture is skipped. Also, a threshold is not set as a predictive code amount of a picture but as a value larger than an actual code amount of the picture. This ensures that an underflow of the receiver buffer is prevented.
The above object can further be achieved by a video encoding apparatus that encodes a sequence of image data in a frame structure, each image data forming one frame, including: a comparing unit for comparing, prior to encoding of each image data, one of (a) a predictive accumulation amount of a receiver buffer in a decoding apparatus and (b) an accumulation amount of an output buffer, with a predetermined standard value, the predictive accumulation amount being an amount of data predicted to be accumulated in the receiver buffer; and a skipping unit for (a) canceling the encoding of the image data in the frame structure and (b) substituting a proxy code indicating to display two fields that each are identical to one of a top field and a bottom field of previously decoded image data, for data that is obtained by encoding a top field and a bottom field of the image data, if the amount of data is below the predetermined standard value.
Usually, when encoding of image data is skipped, the allocation of pictures is switched to the field structure, and so each field of skipped image data can refer to either a top field or a bottom field. According to the above construction, however, the frame structure is maintained even when the encoding of the image data is skipped, and therefore, a reversal in the display order of images can be prevented.
Here, the skipping unit uses the proxy code indicating to display the two fields that each are identical to the field that is the nearest, in a display order, to each of the top field and the bottom field of the image data which has been canceled to be encoded in the frame structure, the field being selected out of the top field and the bottom field of the previously decoded image data.
With this construction, each field in the image data for which encoding has been skipped uses a nearest field in the display order as a reference field. As a reference field of the top field in the image data for which encoding has been skipped is never displayed after a reference field of the bottom field in the skipped image data, the reversal in the display order of images can be surely prevented.
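The choice of the nearest field can be illustrated with the following Python sketch. It is a simplified model under assumptions: display times are counted in field periods, the candidate reference fields are passed in with their display times, and the function names are hypothetical.

```python
def nearest_reference_field(skipped_time, reference_fields):
    """Pick the reference field closest in display time to a skipped field.

    skipped_time     -- display time of the skipped field, in field periods
    reference_fields -- maps field name to display time, e.g. {"1t": 0, "1b": 1}
    """
    return min(reference_fields,
               key=lambda name: abs(reference_fields[name] - skipped_time))


def proxy_fields(skipped_times, reference_fields):
    """Return the reference field shown in place of each skipped field."""
    return [nearest_reference_field(t, reference_fields) for t in skipped_times]


if __name__ == "__main__":
    # Frame 2 (fields at times 2 and 3) is skipped.
    # Referring to the previously displayed frame 1: both fields repeat 1b.
    print(proxy_fields([2, 3], {"1t": 0, "1b": 1}))
    # Referring to the subsequently displayed frame 3: both fields repeat 3t.
    print(proxy_fields([2, 3], {"3t": 4, "3b": 5}))
```

In the first case the display shows 1t, 1b, 1b, 1b instead of 1t, 1b, 1t, 1b, so no field is followed by an earlier one.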
Here, when the image data which has been canceled to be encoded in the frame structure is encoded to be a B-picture, the skipping unit uses two all-skip B-pictures that each are a B-picture in which macroblocks in all slice layers except a first macroblock and a last macroblock in each slice layer are skipped macroblocks as the proxy code, and when the image data which has been canceled to be encoded in the frame structure is encoded to be an I-picture or a P-picture, the skipping unit uses two all-skip P-pictures that each are a P-picture in which macroblocks in all slice layers except a first macroblock and a last macroblock in each slice layer are skipped macroblocks as the proxy code.
With this construction, using skipped macroblocks of MPEG, a proxy code indicating to display image data that is identical to previously decoded image data can be generated.
Here, (a) each all-skip P-picture used by the skipping unit when the image data which has been canceled to be encoded in the frame structure is encoded to be an I-picture or a P-picture uses a bottom field of a previously displayed I-picture or P-picture as a reference field, (b) each all-skip B-picture used by the skipping unit when the image data which has been canceled to be encoded in the frame structure is encoded to be a B-picture B1 uses a top field of a previously displayed I-picture or P-picture as a reference field, and (c) each all-skip B-picture used by the skipping unit when the image data which has been canceled to be encoded in the frame structure is encoded to be a B-picture B2 uses a top field of a successively displayed I-picture or P-picture as a reference field.
With this construction, an all-skip picture that designates an appropriate reference field is selected depending on a picture type, and so the reversal in the display order can be avoided easily and appropriately.
Here, when the image data which has been canceled to be encoded in the frame structure is encoded to be an I-picture or a P-picture, the skipping unit cancels encoding of image data encoded to be a B-picture immediately following the I-picture or the P-picture in the display order, and substitutes two all-skip B-pictures that each use a bottom field of a previously displayed picture as a reference field, for data obtained by encoding the top field and the bottom field of the image data.
With this construction, when skipping a P-picture or an I-picture occurs, each field in image data of a B-picture that refers to the P-picture or the I-picture refers to same fields that are referred to by the fields in the skipped image data. Accordingly, the reversal in the display order can be prevented.