(1) Field of the Invention
The present invention relates to an image coding apparatus which performs orthogonal transformation, quantization, inverse quantization, inverse orthogonal transformation, and intra-prediction on blocks into which a macroblock has been divided.
(2) Description of the Related Art
H.264 (also called MPEG-4 AVC) has been standardized as a system for realizing nearly twice the coding efficiency of conventional image coding systems such as MPEG-2 and MPEG-4 (refer to ITU-T Recommendation H.264 (March 2005), “Advanced video coding for generic audiovisual services”, ITU-T). H.264 is a hybrid system based on orthogonal transformation and motion compensation, and in that respect is similar to conventional systems. However, with H.264, there is a high degree of freedom regarding which coding tools to use when coding each element (blocks, macroblocks, and so on), and high coding efficiency is realized through the collective effects of those coding tools.
FIG. 1 is a block diagram showing the configuration of a conventional image coding apparatus. To make the descriptions easier, only constituent elements related to intra prediction are shown in FIG. 1, and thus a motion prediction unit, a selection unit that selects either intra prediction or motion prediction, a deblocking filter, and the like are omitted from FIG. 1.
This conventional image coding apparatus includes a block division unit 11, a subtraction unit 12, an orthogonal transformation unit (T) 13, a quantization unit (Q) 14, a coding unit 15, an inverse quantization unit (iQ) 16, an inverse orthogonal transformation unit (iT) 17, an addition unit 18, a frame memory 19, an intra prediction unit (IPD) 20, and a rate control unit 21.
Considering a moving picture that is made up of continuous pictures (single coding units that include both frame and field), each picture is, as shown in FIGS. 2A to 2C, made up of a single luminance signal (a Y signal 31) and two chrominance signals (a Cr signal 32 and a Cb signal 33), in the case where the pictures are in 4:2:0 format; the image size of the chrominance signals is half that of the luminance signal in both the vertical and horizontal directions.
In addition, each picture is divided into blocks, and coding is performed on a block-by-block basis. These blocks are called “macroblocks.” A macroblock is made up of a single Y signal block 41, shown in FIG. 3A, which is of 16×16 pixels; and a Cr signal block 42 and a Cb signal block 43, shown in FIGS. 3B and 3C, which are each of 8×8 pixels and which spatially match the Y signal block 41 (refer to ITU-T Recommendation H.264 (March 2005), “Advanced video coding for generic audiovisual services”, ITU-T).
Each picture is divided by the block division unit 11 into input macroblocks, and the input macroblocks are inputted into the subtraction unit 12. For each pixel in each position, the subtraction unit 12 subtracts the pixel value in a predicted macroblock generated by the intra prediction unit (IPD) 20 from the pixel value in the inputted macroblock, and outputs the resultant as a differential macroblock. The differential macroblock is inputted into the orthogonal transformation unit (T) 13, which performs orthogonal transformation on the differential macroblock. It should be noted that while the size of the block on which orthogonal transformation is performed is 8×8 pixels in the MPEG system, 4×4 pixels is the basic size used in H.264.
The orthogonal transformation unit (T) 13 first divides the differential macroblock into 24 4×4 pixel blocks (“51-0” to “51-15”, “52-0” to “52-3” and “53-0” to “53-3”), as shown in FIGS. 4A to C, and then performs orthogonal transformation on each pixel block. Note that in the case where the differential macroblock is made up of intra 16×16 pixels, as shall be described later, the orthogonal transformation unit (T) 13 further configures, per signal element, orthogonal blocks (“51-16”, “52-4”, and “53-4”) in which only the DC element of each 4×4 orthogonally-transformed block is gathered, and performs orthogonal transformation on these blocks. Each transform coefficient within the orthogonally-transformed block is inputted into the quantization unit (Q) 14.
The quantization unit (Q) 14 quantizes the transform coefficients within each orthogonally-transformed block in accordance with quantization parameters inputted from the rate control unit 21. The quantized orthogonal transform coefficients are inputted into the coding unit 15 and coded. With H.264, the coding unit 15 codes the quantized orthogonal transform coefficients through entropy coding; for example, through Context-based Adaptive Variable-Length Coding (CAVLC) or Context-based Adaptive Binary Arithmetic Coding (CABAC).
The coding unit 15 codes the quantized orthogonal transform coefficients in the above manner, codes macroblock type information and a prediction mode, which shall be mentioned later, and outputs the resultant as a stream.
The quantized orthogonal transform coefficients are supplied to the coding unit 15, and are also inputted into the inverse quantization unit (iQ) 16. The inverse quantization unit (iQ) 16 performs inverse quantization on the quantized orthogonal transform coefficients in accordance with the quantization parameters inputted from the rate control unit 21. An orthogonally-transformed block is thereby reconstructed. The reconstructed orthogonally-transformed block is inverse orthogonally-transformed by the inverse orthogonal transformation unit (iT) 17, and a differential macroblock is thereby reconstructed. The reconstructed differential macroblock is inputted, along with the predicted macroblock generated by the intra prediction unit (IPD) 20, into the addition unit 18.
For each pixel in each position, the addition unit 18 adds the pixel value in the reconstructed differential macroblock to the pixel value in the predicted macroblock, thereby generating a reproduction macroblock. As this reproduction macroblock is used in intra prediction, it is stored in the frame memory 19.
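The local decoding loop described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the orthogonal transformation is omitted and that quantization is a uniform scalar quantizer with a hypothetical step size; neither assumption reflects the actual H.264 transform or quantizer. The names `quantize` and `encode_block` are likewise hypothetical.

```python
def quantize(value, step):
    # Hypothetical uniform quantizer: round to nearest, symmetric around zero.
    sign = -1 if value < 0 else 1
    return sign * ((abs(value) + step // 2) // step)


def encode_block(source, predicted, step):
    """Return (quantized levels, reproduction pixels) for one row of pixels."""
    # Subtraction unit: differential = input - prediction.
    residual = [s - p for s, p in zip(source, predicted)]
    # Quantization (the transform is omitted in this sketch).
    levels = [quantize(r, step) for r in residual]
    # Inverse quantization reconstructs an approximate differential.
    recon_residual = [level * step for level in levels]
    # Addition unit: reproduction = prediction + reconstructed differential.
    reconstructed = [p + r for p, r in zip(predicted, recon_residual)]
    return levels, reconstructed


levels, reconstructed = encode_block([100, 104, 97, 90], [98, 98, 98, 98], 4)
```

The reproduction pixels generally differ from the input pixels, which is why the reproduction macroblock, rather than the input macroblock, is stored for later prediction.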
Next, a prediction method and prediction modes used when the intra prediction unit (IPD) 20 generates a predicted macroblock shall be described.
Intra prediction is a method for predicting pixel values within a macroblock using coded pixels within a frame. With the H.264 coding system, two types of block sizes are prepared as basic units for prediction. These types are macroblock types called “intra 4×4 prediction” and “intra 16×16 prediction.”
Furthermore, there are 9 types of prediction modes for the intra 4×4 prediction macroblock type and 4 types of prediction modes for the intra 16×16 prediction macroblock type, and the prediction mode can be selected for each unit of prediction (for example, in intra 4×4 prediction, for every 4×4 pixel block).
FIG. 5A is a diagram showing, for the intra 4×4 prediction type, an arrangement of the pixels to be predicted (16 pixels, “a” to “p”) and the pixels used in prediction (12 reconstructed adjacent pixels, “A” to “L”, which have been decoded after coding and reproduced). Here, the pixels to be predicted (“a” to “p”) are pixels within the macroblock to be coded that has been outputted by the block division unit 11; the reconstructed adjacent pixels (“A” to “L”) are pixels of a block or macroblock reproduced after being decoded, and are read out from the frame memory 19.
FIG. 5B is a diagram showing prediction directions in intra 4×4 prediction. The pixel values of pixels to be predicted are calculated using pixel values of the reconstructed adjacent pixels, in accordance with a prediction direction, using a standardized arithmetic expression (refer to ITU-T Recommendation H.264 (March 2005), “Advanced video coding for generic audiovisual services”, ITU-T). Prediction directions are identified by mode numbers (mode 0 to mode 8). FIGS. 5C to 5K each show a mode number and a corresponding prediction direction. With a block 60 in mode 0 shown in FIG. 5C, the prediction direction is vertical; with a block 61 in mode 1 shown in FIG. 5D, the prediction direction is horizontal; and with a block 62 in mode 2 shown in FIG. 5E, the prediction uses an average (DC). In addition, with a block 63 in mode 3 shown in FIG. 5F, the prediction direction is diagonal down-left; with a block 64 in mode 4 shown in FIG. 5G, the prediction direction is diagonal down-right; and with a block 65 in mode 5 shown in FIG. 5H, the prediction direction is vertical-right. Finally, with a block 66 in mode 6 shown in FIG. 5I, the prediction direction is horizontal-down; with a block 67 in mode 7 shown in FIG. 5J, the prediction direction is vertical-left; and with a block 68 in mode 8 shown in FIG. 5K, the prediction direction is horizontal-up.
Intra 4×4 prediction is applied to the luminance signal. For example, if the prediction value of a pixel is “P”, the prediction values P in each mode are as shown below. Here, adjacent pixels “A” to “M” shown in FIGS. 5C to K and used in prediction are reconstructed pixels that have already been reproduced after being decoded. However, the value of the pixel “D” may be substituted for pixels “E” to “H” temporarily in the case where pixels “E” to “H” have not yet been reconstructed or belong to a different slice or different frame from the 4×4 block.
In mode 0 (vertical), as shown by the block 60 in FIG. 5C, it is possible to predict the values of each pixel within the block 60 when reference pixels “A”, “B”, “C”, and “D” are present; each prediction value P is calculated as follows:
a, e, i, m: P = A
b, f, j, n: P = B
c, g, k, o: P = C
d, h, l, p: P = D
In mode 1 (horizontal), as shown by the block 61 in FIG. 5D, it is possible to predict the values of each pixel within the block 61 when reference pixels “I”, “J”, “K”, and “L” are present; each prediction value P is calculated as follows:
a, b, c, d: P = I
e, f, g, h: P = J
i, j, k, l: P = K
m, n, o, p: P = L
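As an illustration of modes 0 and 1, the following minimal sketch copies the reference pixels into a 4×4 block stored as four rows (“a” to “d”, “e” to “h”, “i” to “l”, “m” to “p”). The function names and the sample pixel values are hypothetical and are not taken from the standard.

```python
def predict_vertical(top):
    """Mode 0: each column repeats the pixel above it (top = [A, B, C, D])."""
    return [list(top) for _ in range(4)]


def predict_horizontal(left):
    """Mode 1: each row repeats the pixel to its left (left = [I, J, K, L])."""
    return [[value] * 4 for value in left]


# Hypothetical sample values for A..D and I..L.
vertical = predict_vertical([10, 20, 30, 40])
horizontal = predict_horizontal([50, 60, 70, 80])
```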
In mode 2 (DC), as shown by the block 62 in FIG. 5E, the prediction value P for each pixel in the block 62 is as follows when reference pixels “A”, “B”, “C”, “D”, “I”, “J”, “K”, and “L” are present:
P = (A + B + C + D + I + J + K + L + 4) >> 3
The prediction value P for each pixel in the block 62 is as follows when only reference pixels “I”, “J”, “K”, and “L” are present:
P = (I + J + K + L + 2) >> 2
In addition, the prediction value P for each pixel in the block 62 is as follows when only reference pixels “A”, “B”, “C”, and “D” are present:
P = (A + B + C + D + 2) >> 2
Furthermore, the prediction value P for each pixel in the block 62 is as follows when none of reference pixels “A”, “B”, “C”, “D”, “I”, “J”, “K”, and “L” are present:
P = 128
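The four DC cases above can be gathered into one function. This is a minimal sketch; representing an absent set of reference pixels by `None` is an assumption made for illustration, as are the function name and the sample values.

```python
def predict_dc(top=None, left=None):
    """Mode 2 (DC). top = [A, B, C, D] or None; left = [I, J, K, L] or None."""
    if top is not None and left is not None:
        dc = (sum(top) + sum(left) + 4) >> 3
    elif left is not None:
        dc = (sum(left) + 2) >> 2
    elif top is not None:
        dc = (sum(top) + 2) >> 2
    else:
        dc = 128  # no reference pixels available
    return [[dc] * 4 for _ in range(4)]


both = predict_dc([10, 20, 30, 40], [50, 60, 70, 80])
```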
In mode 3 (diagonal down-left), as shown by the block 63 in FIG. 5F, the prediction value P for each pixel in the block 63 is as follows when reference pixels “A”, “B”, “C”, “D”, “E”, “F”, “G”, and “H” are present:
a: P = (A + 2B + C + 2) >> 2
b, e: P = (B + 2C + D + 2) >> 2
c, f, i: P = (C + 2D + E + 2) >> 2
d, g, j, m: P = (D + 2E + F + 2) >> 2
h, k, n: P = (E + 2F + G + 2) >> 2
l, o: P = (F + 2G + H + 2) >> 2
p: P = (G + 3H + 2) >> 2
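The mode 3 expressions depend only on the sum of the column and row positions of the pixel, so they can be written as a single loop. The following is a minimal sketch with a hypothetical function name and sample values; the block is stored as four rows, with x the column and y the row.

```python
def predict_diagonal_down_left(ref):
    """Mode 3. ref = [A, B, C, D, E, F, G, H]."""
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            i = x + y
            if i == 6:  # pixel p has no neighbor beyond H
                pred[y][x] = (ref[6] + 3 * ref[7] + 2) >> 2
            else:
                pred[y][x] = (ref[i] + 2 * ref[i + 1] + ref[i + 2] + 2) >> 2
    return pred


ddl = predict_diagonal_down_left([10, 20, 30, 40, 50, 60, 70, 80])
```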
In mode 4 (diagonal down-right), as shown by the block 64 in FIG. 5G, the prediction value P for each pixel in the block 64 is as follows when reference pixels “A”, “B”, “C”, “D”, “I”, “J”, “K”, “L”, and “M” are present:
a, f, k, p: P = (A + 2M + I + 2) >> 2
b, g, l: P = (M + 2A + B + 2) >> 2
c, h: P = (A + 2B + C + 2) >> 2
d: P = (B + 2C + D + 2) >> 2
e, j, o: P = (M + 2I + J + 2) >> 2
i, n: P = (I + 2J + K + 2) >> 2
m: P = (J + 2K + L + 2) >> 2
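In mode 4, every pixel on the same down-right diagonal shares one three-tap value. A minimal sketch follows; laying the reference pixels out along the block border as L, K, J, I, M, A, B, C, D is an implementation choice made here for illustration, and the function name and sample values are hypothetical.

```python
def predict_diagonal_down_right(top, left, corner):
    """Mode 4. top = [A, B, C, D], left = [I, J, K, L], corner = M."""
    # Border pixels in order L K J I M A B C D; M sits at index 4.
    e = list(left)[::-1] + [corner] + list(top)
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            c = 4 + x - y  # border pixel at the centre of the three-tap filter
            pred[y][x] = (e[c - 1] + 2 * e[c] + e[c + 1] + 2) >> 2
    return pred


ddr = predict_diagonal_down_right([10, 20, 30, 40], [50, 60, 70, 80], 5)
```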
In mode 5 (vertical-right), as shown by the block 65 in FIG. 5H, the prediction value P for each pixel in the block 65 is as follows when reference pixels “A”, “B”, “C”, “D”, “I”, “J”, “K”, “L”, and “M” are present:
a, j: P = (M + A + 1) >> 1
b, k: P = (A + B + 1) >> 1
c, l: P = (B + C + 1) >> 1
d: P = (C + D + 1) >> 1
e, n: P = (I + 2M + A + 2) >> 2
f, o: P = (M + 2A + B + 2) >> 2
g, p: P = (A + 2B + C + 2) >> 2
h: P = (B + 2C + D + 2) >> 2
i: P = (J + 2I + M + 2) >> 2
m: P = (K + 2J + I + 2) >> 2
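The mode 5 table can be reproduced from the pixel position alone. The sketch below is one possible formulation: the border layout L, K, J, I, M, A, B, C, D and the helper variables `z` and `i` are choices made here for illustration, and the function name and sample values are hypothetical.

```python
def predict_vertical_right(top, left, corner):
    """Mode 5. top = [A, B, C, D], left = [I, J, K, L], corner = M."""
    # Border pixels in order L K J I M A B C D; M sits at index 4.
    e = list(left)[::-1] + [corner] + list(top)
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            z = 2 * x - y
            i = x - (y >> 1)
            if z < -1:        # pixels i and m use the left column only
                pred[y][x] = (e[4 - y] + 2 * e[5 - y] + e[6 - y] + 2) >> 2
            elif z % 2 == 0:  # two-tap average
                pred[y][x] = (e[4 + i] + e[5 + i] + 1) >> 1
            else:             # three-tap filter
                pred[y][x] = (e[3 + i] + 2 * e[4 + i] + e[5 + i] + 2) >> 2
    return pred


vr = predict_vertical_right([10, 20, 30, 40], [50, 60, 70, 80], 5)
```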
In mode 6 (horizontal-down), as shown by the block 66 in FIG. 5I, the prediction value P for each pixel in the block 66 is as follows when reference pixels “A”, “B”, “C”, “D”, “I”, “J”, “K”, “L”, and “M” are present:
a, g: P = (M + I + 1) >> 1
e, k: P = (I + J + 1) >> 1
i, o: P = (J + K + 1) >> 1
m: P = (K + L + 1) >> 1
f, l: P = (M + 2I + J + 2) >> 2
j, p: P = (I + 2J + K + 2) >> 2
n: P = (J + 2K + L + 2) >> 2
b, h: P = (I + 2M + A + 2) >> 2
c: P = (B + 2A + M + 2) >> 2
d: P = (C + 2B + A + 2) >> 2
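Mode 6 mirrors mode 5 with the roles of rows and columns exchanged. The sketch below is one possible formulation: the border layout D, C, B, A, M, I, J, K, L and the helper variables `z` and `i` are choices made here for illustration, and the function name and sample values are hypothetical.

```python
def predict_horizontal_down(top, left, corner):
    """Mode 6. top = [A, B, C, D], left = [I, J, K, L], corner = M."""
    # Border pixels in order D C B A M I J K L; M sits at index 4.
    e = list(top)[::-1] + [corner] + list(left)
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            z = 2 * y - x
            i = y - (x >> 1)
            if z < -1:        # pixels c and d use the top row only
                pred[y][x] = (e[4 - x] + 2 * e[5 - x] + e[6 - x] + 2) >> 2
            elif z % 2 == 0:  # two-tap average
                pred[y][x] = (e[4 + i] + e[5 + i] + 1) >> 1
            else:             # three-tap filter
                pred[y][x] = (e[3 + i] + 2 * e[4 + i] + e[5 + i] + 2) >> 2
    return pred


hd = predict_horizontal_down([10, 20, 30, 40], [50, 60, 70, 80], 5)
```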
In mode 7 (vertical-left), as shown by the block 67 in FIG. 5J, the prediction value P for each pixel in the block 67 is as follows when reference pixels “A”, “B”, “C”, “D”, “E”, “F”, “G”, and “H” are present:
a: P = (A + B + 1) >> 1
b, i: P = (B + C + 1) >> 1
c, j: P = (C + D + 1) >> 1
d, k: P = (D + E + 1) >> 1
l: P = (E + F + 1) >> 1
e: P = (A + 2B + C + 2) >> 2
f, m: P = (B + 2C + D + 2) >> 2
g, n: P = (C + 2D + E + 2) >> 2
h, o: P = (D + 2E + F + 2) >> 2
p: P = (E + 2F + G + 2) >> 2
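In mode 7, even rows use a two-tap average and odd rows use a three-tap filter, with the starting reference pixel shifting right by one every two rows. A minimal sketch with a hypothetical function name and sample values:

```python
def predict_vertical_left(ref):
    """Mode 7. ref = [A, B, C, D, E, F, G, H] (H is not used by this mode)."""
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            i = x + (y >> 1)
            if y % 2 == 0:  # rows a-d and i-l: two-tap average
                pred[y][x] = (ref[i] + ref[i + 1] + 1) >> 1
            else:           # rows e-h and m-p: three-tap filter
                pred[y][x] = (ref[i] + 2 * ref[i + 1] + ref[i + 2] + 2) >> 2
    return pred


vl = predict_vertical_left([10, 20, 30, 40, 50, 60, 70, 80])
```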
In mode 8 (horizontal-up), as shown by the block 68 in FIG. 5K, the prediction value P for each pixel in the block 68 is as follows when reference pixels “I”, “J”, “K”, and “L” are present:
a: P = (I + J + 1) >> 1
e, c: P = (J + K + 1) >> 1
i, g: P = (K + L + 1) >> 1
b: P = (I + 2J + K + 2) >> 2
f, d: P = (J + 2K + L + 2) >> 2
j, h: P = (K + 3L + 2) >> 2
k, l, m, n, o, p: P = L
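Mode 8 uses only the left column of reference pixels, and positions that would require pixels below “L” simply take the value of “L”. A minimal sketch follows; the helper variable `z` is a choice made here for illustration, and the function name and sample values are hypothetical.

```python
def predict_horizontal_up(left):
    """Mode 8. left = [I, J, K, L]."""
    pred = [[0] * 4 for _ in range(4)]
    for y in range(4):
        for x in range(4):
            z = x + 2 * y
            i = z >> 1
            if z > 5:         # pixels k, l, m, n, o, p
                pred[y][x] = left[3]
            elif z == 5:      # pixels j and h
                pred[y][x] = (left[2] + 3 * left[3] + 2) >> 2
            elif z % 2 == 0:  # two-tap average
                pred[y][x] = (left[i] + left[i + 1] + 1) >> 1
            else:             # three-tap filter
                pred[y][x] = (left[i] + 2 * left[i + 1] + left[i + 2] + 2) >> 2
    return pred


hu = predict_horizontal_up([50, 60, 70, 80])
```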
In addition, regarding the luminance signal, 4 prediction modes (mode 0 (vertical) (A); mode 1 (horizontal) (B); mode 2 (DC average) (C); and mode 3 (plane) (D)) are defined for intra 16×16 prediction in the H.264 standard, and are shown in FIGS. 6A through 6D (refer to ITU-T Recommendation H.264 (March 2005), “Advanced video coding for generic audiovisual services”, ITU-T). Hence, there is a total of 13 prediction modes, including the intra 4×4 prediction modes mentioned above, from which the optimal prediction mode can be selected and used in coding.
Regarding the chrominance signals, 4 prediction modes (prediction modes using the same prediction directions as in intra 16×16 prediction for the luminance signal; however, mode 0 is DC, mode 1 is horizontal, mode 2 is vertical, and mode 3 is plane) are defined for an 8×8 pixel block, and it is possible to code the chrominance signals independently from the luminance signal.
In intra prediction, intra 8×8 prediction is added for the luminance signal as part of the Fidelity Range Extensions. Intra 8×8 prediction has been introduced in combination with the addition of encoding tools for orthogonal transformation of 8×8 pixels, with the goal of improving the coding efficiency of high-definition moving pictures. With intra 8×8 prediction, macroblocks are divided into 4 blocks, the reference pixels for each block are smoothed with a 3-tap low-pass filter, and prediction is carried out using one of the 9 modes, in the same manner as in intra 4×4 prediction (refer to ITU-T Recommendation H.264 (March 2005), “Advanced video coding for generic audiovisual services”, ITU-T).
Note that each predicted block predicted in each mode of each prediction type is compared with the target block of the same position and size outputted by the block division unit 11, and an evaluation value is calculated for each predicted block based on an evaluation function that, for example, sums the absolute values of the differences between the two blocks. Based on the calculated evaluation values, the predicted block of the best prediction mode, which is the prediction mode estimated to have the lowest coding amount, is selected, and that predicted block is outputted to the subtraction unit 12 and the addition unit 18.
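The mode selection described above can be sketched with the sum of absolute differences as the evaluation function. This is only one possible evaluation function; the function names, the tie-breaking rule (lowest mode number wins), and the sample blocks are assumptions made for illustration.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two blocks stored as rows."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def select_best_mode(target, candidates):
    """candidates maps mode number -> predicted block; returns (mode, cost)."""
    costs = [(mode, sad(target, pred)) for mode, pred in candidates.items()]
    # Lowest cost wins; ties go to the lowest mode number (an assumption).
    return min(costs, key=lambda item: (item[1], item[0]))


target = [[10, 20, 30, 40] for _ in range(4)]
candidates = {
    0: [[10, 20, 30, 40] for _ in range(4)],  # vertical prediction, exact match
    2: [[25] * 4 for _ in range(4)],          # flat DC prediction
}
best_mode, best_cost = select_best_mode(target, candidates)
```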
In addition, the intra prediction unit (IPD) 20 outputs information relating to the mode number of the selected prediction mode to the coding unit 15.
In H.264 coding, each 4×4 pixel block included in a macroblock is, by default, coded in the zigzag raster scan order indicated by the numbers in the blocks in FIG. 7. With intra prediction, it is necessary to code and decode the images in the surrounding blocks in advance in order to predict a certain block. For example, to carry out intra prediction on the number 6 block in FIG. 7 through all intra prediction modes (9 modes in intra 4×4 prediction), reference pixels of the decoded image in the number 3 (left), number 1 (upper left), number 4 (upper), and number 5 (upper right) blocks are necessary. In other words, in order to predict the abovementioned certain block, the series of processes consisting of intra prediction (IPD), orthogonal transformation (T), quantization (Q), inverse quantization (iQ), and inverse orthogonal transformation (iT) must first be completed for the surrounding blocks.
However, if coding is performed in the default zigzag raster scan order shown in FIG. 7, intra prediction of each of the shaded blocks shown in FIG. 8A, namely blocks 1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, and 15, cannot begin until decoding of the immediately preceding block has finished. Note that for blocks 3, 7, 11, 13, and 15, the upper-right block cannot, by nature, be referred to (coding and decoding of the upper-right block occurs later in time); therefore, the values of the pixels furthest to the right in the upper block may be used as the reference pixels for the upper-right block.
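The set of blocks affected can be checked with a short sketch. It assumes that the numbering in FIG. 7 is the usual H.264 4×4 luminance block order (blocks 0 to 3 in the upper-left 8×8 quadrant, 4 to 7 in the upper-right, and so on) and that a block is delayed when the immediately preceding block in the scan is one of its left, upper-left, upper, or upper-right neighbors within the same macroblock. The function names are hypothetical.

```python
def block_position(index):
    """Map a zigzag scan index (0..15) to (x, y) in units of 4x4 blocks."""
    x = (index & 1) | (((index >> 2) & 1) << 1)
    y = ((index >> 1) & 1) | (((index >> 3) & 1) << 1)
    return x, y


def stalled_blocks():
    """Blocks whose prediction depends on the immediately preceding block."""
    index_at = {block_position(i): i for i in range(16)}
    stalled = []
    for i in range(1, 16):
        x, y = block_position(i)
        neighbors = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
        if any(index_at.get(n) == i - 1 for n in neighbors):
            stalled.append(i)
    return stalled


result = stalled_blocks()
```

Under these assumptions, only blocks 0, 4, 8, and 12 can start without waiting for the block immediately before them.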
Accordingly, as shown by the processing timeline for the predicted blocks in FIG. 8B, downtime arises before starting the IPD (the intra prediction processing series) block processing and the TQiQiT (orthogonal transformation (T), quantization (Q), inverse quantization (iQ), and inverse orthogonal transformation (iT) processing series) block processing, respectively. This downtime impedes parallelization (pipelining) of the IPD and TQiQiT processing series, and is a problem when attempting to speed up coding in H.264.
As a response to these issues, the following technology has been disclosed in Patent Reference 1 (Japanese Laid-Open Patent Application No. 2004-140473): blocks are not sequentially processed in the default zigzag raster scan order shown in FIG. 7; rather, the blocks positioned to the left of and above the predicted block used in prediction are processed two or more places previous in order to the predicted block, which makes pipelining possible.
FIG. 9A can be given as an example of the processing order of predicted blocks in the technology denoted in Patent Reference 1. In FIG. 10, when a macroblock composed of 16 pixels×16 lines is divided into 16 blocks composed of 4 pixels×4 lines and the blocks are processed, assuming the position (address) of each block is defined as (X, Y), where X, Y=0, 1, 2, 3, FIG. 9A shows that the blocks should be processed in the following order: (0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (3, 0), (2, 1), (0, 2), (3, 1), (1, 2), (0, 3), (2, 2), (1, 3), (3, 2), (2, 3), (3, 3). However, even if processing is performed in the order shown in FIG. 9A, it can be seen in FIG. 9B that downtime arises when starting intra prediction for six of the blocks, or blocks 1, 2, 6, 10, 14, and 15. Moreover, even with a different processing order, if the rules concerning processing order denoted in Patent Reference 1 are followed, at least six blocks interfere with the parallelization (pipelining) when performing intra prediction with reference to surrounding blocks using all intra prediction modes.
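The ordering rule of Patent Reference 1 can be checked mechanically against the order listed above. The following minimal sketch tests only the left and upper neighbors, which are the neighbors named in the rule; it does not consider upper-right references and therefore does not attempt to reproduce the six-block count read from FIG. 9B. Identifying each block by its position in the processing order (0 to 15) is an assumption made here, as is the function name.

```python
ORDER = [
    (0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (3, 0), (2, 1), (0, 2),
    (3, 1), (1, 2), (0, 3), (2, 2), (1, 3), (3, 2), (2, 3), (3, 3),
]


def rule_violations(order):
    """Positions whose left or upper block was processed fewer than 2 places earlier."""
    position = {block: n for n, block in enumerate(order)}
    violations = []
    for n, (x, y) in enumerate(order):
        for neighbor in ((x - 1, y), (x, y - 1)):
            if neighbor in position and n - position[neighbor] < 2:
                violations.append(n)
                break
    return violations


violations = rule_violations(ORDER)
```

For the listed order, only the second and the last positions have a left neighbor processed immediately beforehand.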
Accordingly, considering the prediction mode, shown in FIG. 5A, that uses the reference pixels “E” to “H” positioned above the upper-right block, and which is one of the default prediction modes in intra prediction, interference with the parallelization (pipelining) processing on blocks 2, 6, 10, and 14 is avoided in the technology denoted in Patent Reference 1 by changing prediction modes to a mode that does not use the reference pixels “E” to “H”. In other words, interference with parallelization (pipelining) is avoided by using a prediction mode not specified in the H.264 standard.
Furthermore, a static value (for example, 128) is used as the prediction value for the remaining blocks 1 and 15, as can be seen in FIG. 11A. Or, pixel values of a decoded image in the block located two blocks to the left are used, as can be seen in FIG. 11B. Through this, interference with the parallelization (pipelining) processing is avoided.
FIG. 12 is a block diagram showing the configuration of the image coding apparatus disclosed in Patent Reference 1. A predicted block control unit 192 causes an intra prediction unit (IPD) 20 to perform intra prediction using a prediction mode not specified in the H.264 standard in the order shown in FIG. 18A.
It should be noted that with the technology denoted in Patent Reference 1, the image coding apparatus sequentially processes blocks using a prediction mode not specified in the H.264 standard and in an order different from the default zigzag shaped raster scan order, and outputs data. Therefore, the image decoding apparatus is provided with a means for restoring the data of each block to the default zigzag raster scan order and a means for decoding the data of the blocks predicted through the prediction mode not specified in the H.264 standard.
The technology denoted in Patent Reference 1 pipelines the intra prediction and speeds up processing in intra prediction in the H.264 coding system by processing the blocks located above and to the left of a predicted block used in intra prediction two or more places previous to the predicted block, rather than sequentially processing the predicted blocks in the default order. Furthermore, a prediction mode for intra prediction that does not use the reference pixels of a block to the upper-right of the predicted block is provided. For blocks that still cannot be predicted, two methods are suggested: giving the reference pixels a static value, or copying referable pixels located two blocks away and using the copied pixels in prediction.
The image coding apparatus denoted in Patent Reference 1 codes blocks by processing blocks in an order different from the default order of the H.264 standard, using an intra prediction mode different from those specified in the standard, and inserting reference values not allowed in the standard. Therefore, a means for decoding coded data that is different from that specified in the H.264 standard is also required in the image decoding apparatus. Accordingly, an image decoding apparatus compliant with the H.264 standard cannot decode data outputted from the image coding apparatus of Patent Reference 1.