1. Technical Field
The present invention relates to an image encoding apparatus, an image encoding method, and an image encoding program using image encoding technology. In particular, the present invention relates to an image encoding apparatus, an image encoding method, and an image encoding program capable of selecting an entropy encoding scheme.
2. Background Art
An image encoding apparatus encodes externally inputted image data in accordance with a predetermined image encoding scheme to generate a bit stream. H.264/AVC is known as one of the image encoding schemes used for such an encoding process (see Non-patent document 1). This scheme conforms to the MPEG (Moving Picture Experts Group)-4 Part 10 standard, and the Joint Model scheme is known as its encoding reference model. An image encoding apparatus based on the Joint Model scheme, which is related to the present invention, is called the “related art image encoding apparatus” hereinafter.
FIG. 9 shows a structure of a related art image encoding apparatus. The related art image encoding apparatus 100 includes an image frame buffer 102 to successively store the image frames which constitute target image data 101 to be compressed. Image data 103 is divided into macro blocks having a predetermined image area size, outputted from the image frame buffer 102, and inputted to a macro block encoder 104, in which the image data is encoded in units of macro blocks. A code amount controller 105 and a decoded picture buffer 106 are connected to the macro block encoder 104. The macro block encoder 104 outputs an encoded bit stream 107. The definitions of the macro block and the picture are explained later.
The macro block encoder 104 of this image encoding apparatus 100 includes a macro block buffer 111 to receive the image data 103, a predicting device 112 connected to the macro block buffer 111, a calculating device 113 to subtract the output of the predicting device 112 from the output of the macro block buffer 111, a conversion and quantization device 114 to convert and quantize the calculation result from the calculating device 113 under the control of the code amount controller 105, an entropy encoder 115 and an inverse-conversion and inverse-quantization device 116 arranged on the output side of the conversion and quantization device 114, and an adder 117 arranged on the output side of the inverse-conversion and inverse-quantization device 116.
Assume that the image data 101 inputted to the image encoding apparatus 100 is in QCIF (Quarter Common Intermediate Format). QCIF is one of the image signal formats defined by the ITU (International Telecommunication Union).
FIG. 10 shows an image frame in the QCIF image format. The image frame in QCIF is 176 pixels wide by 144 pixels high and is composed of macro blocks. One image frame is composed of one frame picture in progressive scanning, and of two field pictures in interlace scanning. These are simply called “picture(s)” in the following explanation.
Each macro block, which is the unit constituting the picture, is composed of brightness pixels in 16×16 pixels, and color-difference pixels of Cr (color-difference signals) and Cb (color-difference signals), each in 8×8 pixels. FIG. 10 shows a brightness position (x) and color-difference position (o) of an 8×8 pixel block on a pixel-by-pixel basis when the macro block is divided into 16 parts.
The macro block encoder 104 shown in FIG. 9 encodes the image data 103 in units of macro blocks. In this case, the encoding is successively performed in raster scan order from the upper left of the picture to the lower right, in a manner similar to the raster scan of a television system.
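The raster scan order described above can be illustrated with a minimal sketch. It assumes that the QCIF figures of 176 and 144 are luma pixel dimensions and that a macro block covers 16×16 luma pixels; the function names are hypothetical and are not part of the apparatus 100.

```python
MB_SIZE = 16  # luma pixels per macro block side (assumption stated above)


def macro_block_grid(width, height, mb_size=MB_SIZE):
    """Return (columns, rows) of macro blocks needed to cover a picture."""
    # Round up so that a partially covered edge still gets a macro block.
    cols = (width + mb_size - 1) // mb_size
    rows = (height + mb_size - 1) // mb_size
    return cols, rows


def raster_scan(width, height, mb_size=MB_SIZE):
    """Yield (index, x, y) for each macro block, left to right, top to bottom.

    x and y are the luma coordinates of the macro block's upper-left pixel.
    """
    cols, rows = macro_block_grid(width, height, mb_size)
    for row in range(rows):
        for col in range(cols):
            yield row * cols + col, col * mb_size, row * mb_size


if __name__ == "__main__":
    cols, rows = macro_block_grid(176, 144)
    print(cols, rows, cols * rows)
    for index, x, y in raster_scan(176, 144):
        if index in (0, 11, 98):
            print(index, x, y)
```

Under these assumptions a QCIF picture holds 11 columns by 9 rows of macro blocks, and the macro block following the last one of a row is the first one of the next row.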
Firstly, the macro block buffer 111 of the macro block encoder 104 reads target image data to be encoded in macro blocks, temporarily stores it, and supplies it toward the conversion and quantization device 114 arranged at the subsequent stage. At this point, the calculating device 113 subtracts a prediction image 122 outputted from the predicting device 112 from the image 121 read from the macro block buffer 111 in macro blocks, and supplies the prediction error image 123 (i.e., the calculation result) to the conversion and quantization device 114.
Incidentally, there are two types of the prediction image 122, i.e., a prediction image generated based on inter-frame prediction and a prediction image generated based on intra-frame prediction. The inter-frame prediction image is generated using the correlation between separate image frames, i.e., using the image of an image frame which was encoded and reconstructed before and which has a different display time from the current target image frame to be encoded. On the other hand, the intra-frame prediction image is generated using the correlation within the image frame, i.e., using an image which was encoded and reconstructed before and which has the same display time as the current target image frame to be encoded.
A set (slice) of macro blocks which can be encoded using the intra-frame prediction alone is called “I-slice” hereinafter. Furthermore, a slice of macro blocks which can be encoded using both the intra-frame prediction and inter-frame prediction is called “P-slice”. Furthermore, a slice of macro blocks which can be encoded using the inter-frame prediction where not only the image of one image frame but also the images of two image frames can be used simultaneously is called “B-slice”.
Furthermore, a picture which can be encoded using the I-slice is called “I-picture”, and a picture which can be encoded using both the I-slice and P-slice is called “P-picture”. Furthermore, a picture which can be encoded using not only the I-slice and P-slice but also the B-slice is called “B-picture”.
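The relation between slice types and picture types defined above can be expressed as a small sketch. The function name and the string labels are hypothetical; the function returns the most restrictive picture type that permits every slice type given to it.

```python
def picture_type(slice_types):
    """Classify a picture from the slice types ("I", "P", "B") it contains."""
    kinds = set(slice_types)
    if not kinds or not kinds <= {"I", "P", "B"}:
        raise ValueError("expected a non-empty collection of 'I', 'P', 'B'")
    if "B" in kinds:
        return "B-picture"  # I-, P- and B-slices are all permitted
    if "P" in kinds:
        return "P-picture"  # I- and P-slices are permitted
    return "I-picture"      # only I-slices are present


if __name__ == "__main__":
    print(picture_type(["I", "I"]))
    print(picture_type(["I", "P"]))
    print(picture_type(["P", "B", "I"]))
```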
The conversion and quantization device 114 frequency-converts the prediction error image 123 in units smaller than macro blocks, i.e., it converts the prediction error image 123 from a spatial domain to a frequency domain. In the AVC (Advanced Video Coding) Standards, a frequency conversion in units of 8×8 blocks or 4×4 blocks is applicable to brightness pixels. A prediction error image converted to a frequency domain is called a “conversion coefficient” hereinafter. This conversion coefficient is quantized based on a quantizing parameter 125 supplied from the code amount controller 105, and supplied to the entropy encoder 115 as code data 126. This quantized conversion coefficient is called a “quantized value” in the specification.
The code data 126 is also supplied to the inverse-conversion and inverse-quantization device 116. The inverse-conversion and inverse-quantization device 116 inverse-quantizes the quantized value supplied from the conversion and quantization device 114, and further inverse-frequency-converts it to the original spatial domain. Then, the adder 117 adds the prediction image 122 supplied from the predicting device 112 to the prediction error image which was restored to the spatial domain to obtain a decoded image 128. This decoded image 128 is stored in the decoded picture buffer 106 for subsequent encoding.
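The loop formed by the calculating device 113, the quantization, the inverse quantization, and the adder 117 can be sketched as follows. This is a simplified illustration only: the frequency conversion and its inverse are omitted, a single uniform step size stands in for the quantizing parameter 125, and 8-bit pixel values are assumed. All function names are hypothetical.

```python
def quantize_residual(block, prediction, step):
    """Subtract the prediction and quantize the prediction error with `step`."""
    return [[round((b - p) / step) for b, p in zip(b_row, p_row)]
            for b_row, p_row in zip(block, prediction)]


def reconstruct(levels, prediction, step):
    """Inverse-quantize the levels and add the prediction back (8-bit clamp)."""
    return [[max(0, min(255, p + q * step)) for q, p in zip(q_row, p_row)]
            for q_row, p_row in zip(levels, prediction)]


if __name__ == "__main__":
    block = [[53, 59], [61, 70]]       # input pixels (stands in for image 121)
    prediction = [[50, 50], [50, 50]]  # stands in for prediction image 122
    levels = quantize_residual(block, prediction, step=4)
    decoded = reconstruct(levels, prediction, step=4)
    print(levels)
    print(decoded)
```

The decoded block differs from the input by at most half a step per pixel in this sketch, which is why the encoder keeps the decoded image, rather than the input image, for subsequent prediction.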
The entropy encoder 115 entropy-encodes the inputted code data 126 and outputs a bit stream 107. The term “entropy-encoding” means compression of data in which codes of different lengths are assigned depending on the occurrence probability of the data. Since the present invention closely relates to the entropy encoder 115, its details will be explained later.
The predicting device 112 supplies a generating parameter of a prediction image to the entropy encoder 115 as code data 129. The generating parameter may include, for example, a prediction mode indicating the type of prediction such as inter-frame prediction and intra-frame prediction, an index of a decoded frame used in inter-frame prediction, a motion vector used in inter-frame prediction, and an intra-frame prediction direction used in intra-frame prediction.
As explained above, the decoded picture buffer 106 stores the decoded image 128 supplied from the adder 117. Then, it manages decoded image pictures reconstructed from the decoded image 128 (which are simply called “decoded pictures” hereinafter).
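As an illustration of assigning shorter codes to more probable values, the following sketch implements the unsigned Exp-Golomb code, a well-known variable length code that H.264/AVC uses for many syntax elements. It is shown here only as an example and is not named in this specification; the function name is hypothetical.

```python
def exp_golomb(code_num):
    """Return the unsigned Exp-Golomb code word for a non-negative integer.

    Small values, assumed to occur more often, receive shorter code words.
    """
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    binary = bin(code_num + 1)[2:]          # binary digits of code_num + 1
    return "0" * (len(binary) - 1) + binary  # prefix of zeros, then the digits


if __name__ == "__main__":
    for n in range(5):
        print(n, exp_golomb(n))
```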
The code amount controller 105 monitors a bit stream 131 outputted from the entropy encoder 115 to encode a picture with a desired bit number. Then, if the bit number of the bit stream 131 is larger than the desired bit number, it outputs a parameter indicating the increase of a quantizing step size as a quantizing parameter 125. On the other hand, if the bit number of the bit stream 131 is smaller than the desired bit number, it outputs a parameter indicating the decrease of a quantizing step size as the quantizing parameter 125.
In the case where CABAC (Context-based Adaptive Binary Arithmetic Coding) is selected as the entropy encoding by an entropy encoding mode selecting signal 132, the entropy encoder 115 also monitors the symbol number (bin number) inputted to an arithmetic encoder (the CABAC and the arithmetic encoder are explained in detail later). Then, the quantizing parameter is adjusted such that the ratio between the bit number and the bin number satisfies the ratio specified in the above-mentioned AVC Standards.
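The step size control by the code amount controller 105 described above can be sketched as follows. The sketch covers only the comparison of the bit number with the desired bit number; the adjustment based on the bin number is left out. It assumes a quantizing parameter in the range 0 to 51, where a larger value means a larger quantizing step size, and a hypothetical fixed adjustment of one per update.

```python
QP_MIN, QP_MAX = 0, 51  # assumed quantizing parameter range


def update_qp(qp, produced_bits, target_bits, delta=1):
    """Return the next quantizing parameter from the observed bit number."""
    if produced_bits > target_bits:
        qp += delta   # too many bits: coarser quantization
    elif produced_bits < target_bits:
        qp -= delta   # too few bits: finer quantization
    return max(QP_MIN, min(QP_MAX, qp))


if __name__ == "__main__":
    qp = 26
    for bits in (1200, 1100, 900, 1000):
        qp = update_qp(qp, bits, target_bits=1000)
        print(bits, qp)
```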
FIG. 11 shows the specific structure of this entropy encoder. The entropy encoder 115 includes a first selector 141 to receive the image code data 126 outputted from the conversion and quantization device 114 shown in FIG. 9, a CABAC device 142 having an input side connected to one of the output sides of the first selector 141, a VLC (Variable Length Coding) device 143 having an input side connected to the other output side of the first selector 141, and a second selector 144 to selectively receive one of the outputs from the CABAC device 142 and VLC device 143 which are used as the devices for these two types of coding.
An entropy encoding mode selecting signal 132 is provided to the entropy encoder 115 for the switching of the first selector 141 and the second selector 144. The entropy encoding mode selecting signal 132 is a signal to select one of the CABAC device 142 and the VLC device 143. In this manner, in the AVC Standards, entropy encoding is performed on the code data 126 of macro blocks on a picture-by-picture basis by selecting either the coding by the CABAC device 142 or the coding by the VLC device 143.
In the entropy encoder 115 shown in FIG. 11, the CABAC device 142 receives a code data 1261 outputted from the first selector 141, and outputs a bit stream 1071 to the second selector 144. The CABAC device 142 also outputs a bin number data 145 representing a bin number. The specific structure of the CABAC device 142 is explained later.
The VLC device 143 receives a code data 1262 outputted from the first selector 141, and outputs a bit stream 1072 to the second selector 144. The specific structure of the VLC device 143 is explained later. The second selector 144 also outputs a bit number data 146 representing a bit number as well as the bit stream 107.
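The routing performed by the first selector 141 and the second selector 144 can be sketched as below. The two encoders are passed in as callables, since their internals are explained later; the mode strings, the dictionary keys, and the placeholder encoders in the usage part are hypothetical and do not produce real CABAC or VLC output.

```python
def entropy_encode(code_data, mode, cabac, vlc):
    """Route code data to one encoder and collect the selected outputs.

    `cabac` must return (bit_stream, bin_number); `vlc` returns a bit_stream.
    Bit streams are represented as strings of '0' and '1' for readability.
    """
    if mode == "CABAC":
        bit_stream, bin_number = cabac(code_data)
    elif mode == "VLC":
        bit_stream, bin_number = vlc(code_data), None  # no bins without CABAC
    else:
        raise ValueError("mode must be 'CABAC' or 'VLC'")
    return {
        "bit_stream": bit_stream,
        "bit_number": len(bit_stream),
        "bin_number": bin_number,
    }


if __name__ == "__main__":
    # Placeholder encoders, for demonstration only.
    def fake_cabac(data):
        return "1" * len(data), 3 * len(data)

    def fake_vlc(data):
        return "10" * len(data)

    print(entropy_encode([5, 0, 2], "CABAC", fake_cabac, fake_vlc))
    print(entropy_encode([5, 0, 2], "VLC", fake_cabac, fake_vlc))
```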
Incidentally, the CABAC device 142 achieves higher encoding efficiency than the VLC device 143. However, the CABAC device 142 requires a larger processing amount than the VLC device 143. Therefore, in general, the CABAC device 142 is used for a higher profile (e.g., High profile or Main profile) which supports complicated processes. Meanwhile, the VLC device 143 is used for a lower profile (e.g., Baseline profile) which does not support complicated processes. However, since code data in layers higher than the macro block layer occupies a relatively small part of the bit stream, the VLC device 143 is used for code data in such layers to prioritize compatibility among the profiles.
FIG. 12 shows the specific structure of a CABAC device. The CABAC device 142 includes a binarization device 151 to receive code data 1261 through the first selector 141 shown in FIG. 11 and convert it to binary, and a switch 153 to switch the output of the binary outputted from the binarization device 151. A bin 155 outputted from the switch 153 as a binary symbol is supplied to an arithmetic encoder 156 and a context calculator 157.
The binarization device 151 is adapted to convert the inputted code data 1261 to a binary string in accordance with a procedure specified in the AVC Standards and output it as binary string data 152. The arithmetic encoder 156 encodes the binary string of the bin 155 which is successively supplied from the switch 153 to a binary arithmetic code by using a dominant symbol 158 and a state number 159 supplied from the context calculator 157. Furthermore, it successively supplies an updated dominant symbol 158 and an updated state number 159 to the context calculator 157. The term “state number 159” means a table number of a table storing a value corresponding to the occurrence probability of the dominant symbol specified in the AVC Standards.
The context calculator 157 supplies, to the arithmetic encoder 156, the stored dominant symbol 158 and state number 159 corresponding to the bin 155 which is successively supplied from the switch 153 as a symbol. Furthermore, it also stores the dominant symbol 158 and state number 159 which are updated by the binary arithmetic encoding at the arithmetic encoder 156.
The switch 153 outputs a bin number data 145 to the outside of the CABAC device 142. Furthermore, the arithmetic encoder 156 outputs a bit stream 1071 to the outside of the CABAC device 142. As shown in FIG. 11, the bit stream 1071 is supplied to the second selector 144 in parallel to a bit stream 1072 outputted from the VLC device 143.
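The interplay of binarization, bin counting, and context update can be illustrated with a toy sketch. It is not the procedure of the AVC Standards: unary binarization is chosen for brevity, a single context is used for every bin, the state transition is a simple increment and decrement rather than the standardized tables, and the arithmetic coding itself is omitted. All names are hypothetical.

```python
class Context:
    """Toy context: a dominant symbol and a state number."""

    MAX_STATE = 62  # toy cap, not taken from the AVC tables

    def __init__(self):
        self.dominant = 0
        self.state = 0  # higher state: dominant symbol treated as more probable

    def update(self, bin_value):
        if bin_value == self.dominant:
            self.state = min(self.state + 1, self.MAX_STATE)
        elif self.state == 0:
            self.dominant = 1 - self.dominant  # swap the dominant symbol
        else:
            self.state -= 1


def unary_binarize(value):
    """Binarize a non-negative integer as `value` ones followed by a zero."""
    return [1] * value + [0]


def feed(values):
    """Binarize each value, update one context per bin, and count the bins."""
    ctx = Context()
    bin_number = 0
    for value in values:
        for bin_value in unary_binarize(value):
            ctx.update(bin_value)
            bin_number += 1
    return bin_number, ctx.dominant, ctx.state


if __name__ == "__main__":
    print(unary_binarize(3))
    print(feed([0, 0, 0]))
    print(feed([2]))
```

The sketch shows why the bin number, rather than the amount of code data, governs the work of the arithmetic encoder: one large value alone expands into many bins.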
FIG. 13 shows the specific structure of a VLC device. The VLC device 143 includes a variable length encoder 161 to receive a code data 1262 through the first selector 141 shown in FIG. 11, and a table selector 162.
The variable length encoder 161 encodes the code data 1262 to variable length code in accordance with a table specified by the table selector 162, and outputs a bit stream 1072. The table selector 162 contains variable length encoding tables (not shown) corresponding to the types of code data 1262 such as a prediction mode, a quantized value, and the like. Then, it supplies a table 163 selected from variable length encoding tables to the variable length encoder 161.
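The cooperation of the table selector 162 and the variable length encoder 161 can be sketched as a table lookup. The tables below are hypothetical prefix-free examples made up for this illustration; they are not the variable length encoding tables of the AVC Standards.

```python
# Hypothetical prefix-free tables, one per type of code data.
TABLES = {
    "prediction_mode": {0: "1", 1: "01", 2: "001"},
    "quantized_value": {0: "1", 1: "010", -1: "011", 2: "00100", -2: "00101"},
}


def select_table(kind):
    """Return the table for a type of code data (role of the table selector)."""
    return TABLES[kind]


def vlc_encode(code_data):
    """Encode a sequence of (kind, value) pairs into a bit string."""
    return "".join(select_table(kind)[value] for kind, value in code_data)


if __name__ == "__main__":
    data = [("prediction_mode", 1), ("quantized_value", -1), ("quantized_value", 0)]
    print(vlc_encode(data))
```

Each element is encoded by a single lookup, so the work grows with the amount of code data rather than with a bin number.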
In accordance with the image encoding apparatus 100 described above which is related to the present invention, when the VLC device 143 (FIG. 11) is used for entropy encoding, the completion time of the picture encoding by this image encoding apparatus 100 is determined by the amount of inputted code data.
On the other hand, when the CABAC device 142 shown in FIG. 12 is used as the block for entropy encoding, the completion time of the picture encoding by this image encoding apparatus 100 is determined by the number of bins 155 inputted to the arithmetic encoder 156. Incidentally, in the case where a block other than the CABAC device 142 is used, the process will always be completed within a finite length of time since the amount of input data is finite.
In the H.264/AVC Standards, the following two items are used as restrictions to limit the processing amount of entropy encoding per picture on the decoding side. (a) Encoding must be controlled such that the bit number of a picture becomes equal to or less than a value specified in the H.264/AVC Standards. (b) Encoding must be controlled such that the ratio between the bit number and bin number of a picture becomes equal to or less than a value specified in the H.264/AVC Standards.
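Restrictions (a) and (b) above can be written as a simple check. The limit values are passed in as parameters because they are specified in the H.264/AVC Standards and are not reproduced here; the sketch also assumes that the ratio in (b) is expressed as bins per bit. The function and parameter names are hypothetical.

```python
def satisfies_limits(bit_number, bin_number, max_bits, max_bins_per_bit):
    """Check restrictions (a) and (b) for one picture."""
    if bit_number <= 0:
        raise ValueError("bit_number must be positive")
    within_bits = bit_number <= max_bits                       # restriction (a)
    within_ratio = bin_number <= max_bins_per_bit * bit_number  # restriction (b)
    return within_bits and within_ratio


if __name__ == "__main__":
    # Limit values below are placeholders, not values from the Standards.
    print(satisfies_limits(1000, 1200, max_bits=2000, max_bins_per_bit=1.5))
    print(satisfies_limits(1000, 1600, max_bits=2000, max_bins_per_bit=1.5))
```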
Since the VLC device 143 shown in FIG. 13 allows processing in units of the code data 1262, its process is simpler than that of the CABAC device 142 shown in FIG. 12. Therefore, assuming that the process is always to be completed within a certain time period, the encoding of the image encoding apparatus 100 needs to be controlled such that the bin number of a picture does not exceed the maximum bin number that the arithmetic encoder 156 can process during the processing time.
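The maximum bin number mentioned above follows from the throughput of the arithmetic encoder and the time available per picture. The sketch below assumes a constant throughput in bins per second and a fixed picture rate; both parameters and the function names are hypothetical, and the numbers in the usage part are illustrative only.

```python
def max_bins_per_picture(bins_per_second, pictures_per_second):
    """Largest bin number the arithmetic encoder can process for one picture."""
    return bins_per_second // pictures_per_second


def fits_in_time(bin_number, bins_per_second, pictures_per_second):
    """Return True if a picture's bins can be processed within its time slot."""
    return bin_number <= max_bins_per_picture(bins_per_second, pictures_per_second)


if __name__ == "__main__":
    print(max_bins_per_picture(90_000_000, 30))
    print(fits_in_time(2_500_000, 90_000_000, 30))
    print(fits_in_time(3_500_000, 90_000_000, 30))
```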
The following are exemplary methods for such encoding control. (i) For a picture having a large bin number, encoding is performed using a PCM (Pulse-Code Modulation) mode in which an input image in macro blocks is directly outputted as a bit stream (Patent document 1 and Non-patent document 2). (ii) In another method, the CABAC device 142 and the VLC device 143 are operated in parallel for the code data of a picture. If the process of the CABAC device 142 is completed within a predetermined time, the encoding output from the CABAC device 142 is selected. If not, the encoding output from the VLC device 143 is selected (Non-patent document 3).
[Patent document 1] Japanese unexamined application publication No. 2004-135251 (paragraphs 121-124, and FIG. 1).
[Non-patent document 1] ISO/IEC 14496-10 Advanced Video Coding.
[Non-patent document 2] “PCM ENCODING METHOD CONFORMING TO H.264 MB UPPER LIMIT BIT USING LOCAL DECODE IMAGE” by Chono et al., PCSJ (Picture Coding Symposium of Japan), pp. 5-17, November 2006.
[Non-patent document 3] “ENTROPY ENCODING TECHNIQUE IN H.264/AVC FOR HIGH-RESOLUTION” by Okamoto et al., The Institute of Electronics, Information and Communication Engineers General Conference, D-11-2, March 2007.