Image encoding techniques are used in many familiar video devices. Examples of products to which the techniques are applied include DVDs for viewing video content such as movies, hard disk recorders for recording video content such as TV broadcasts, digital TV encoding systems, DVD cameras, and mobile phones capable of image processing. The image encoding techniques are thus applied very widely.
Non Patent Literature 1 discloses (i) an image encoding device that encodes an image by use of a spatial correlation or a temporal correlation in the image, and (ii) an image decoding device that decodes encoded data that is encoded by the image encoding device. The following explains the techniques of these devices that use the spatial correlation, with reference to FIGS. 65 through 70.
(Arrangement of Image Encoding Device 300)
With reference to FIG. 65, the following describes an image encoding device 300 according to a conventional technique.
FIG. 65 is a block diagram showing an essential part of an arrangement of the image encoding device 300 according to the conventional technique. The image encoding device 300 includes a difference computing section 1, an orthogonal transformation section 2, a quantization section 3, an entropy encoding section 304, an inverse quantization section 6, an inverse orthogonal transformation section 7, an addition computing section 8, a memory 9, an intra prediction section 310, and a prediction mode determining section 311 (FIG. 65). The following describes each of the constituent components of the image encoding device 300.
(Difference Computing Section 1)
The difference computing section 1 outputs prediction residual data, which is a difference between a target block to be encoded (M×M pixel block) and a predicted image generated from the intra prediction section 310.
(Orthogonal Transformation Section 2 and Inverse Orthogonal Transformation Section 7)
The orthogonal transformation section 2 carries out orthogonal transformation with respect to the prediction residual data received from the difference computing section 1, and then outputs the resultant. The inverse orthogonal transformation section 7 carries out inverse orthogonal transformation with respect to an orthogonal transform coefficient of the prediction residual data received from the inverse quantization section 6, and outputs the resultant. As a method of the orthogonal transformation and the inverse orthogonal transformation, discrete cosine transform, Hadamard transform, discrete Fourier transform, discrete sine transform, Haar transform, slant transform, or Karhunen-Loeve transform can be used.
(Quantization Section 3 and Inverse Quantization Section 6)
The quantization section 3 carries out quantization with respect to an orthogonal transformation coefficient of the prediction residual data received from the orthogonal transformation section 2, and then outputs the resultant. The inverse quantization section 6 carries out inverse quantization with respect to a quantization coefficient of the prediction residual data received from the quantization section 3, and then outputs the resultant. The quantization and the inverse quantization are carried out by use of scalar quantization or vector quantization.
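As an illustration of the scalar quantization case mentioned above, the following is a minimal sketch, not the method of Non Patent Literature 1. The function names `quantize` and `dequantize`, the uniform step size `step`, and the round-half-away-from-zero rule are all assumptions introduced for this example.

```python
def quantize(coeffs, step):
    """Uniform scalar quantization (assumed rule): round |c| / step to the
    nearest integer, rounding halves away from zero, and restore the sign."""
    return [
        [int(abs(c) / step + 0.5) * (1 if c >= 0 else -1) for c in row]
        for row in coeffs
    ]


def dequantize(levels, step):
    """Inverse quantization: scale each integer level back by the step size."""
    return [[lv * step for lv in row] for row in levels]


# Hypothetical transform coefficients of a 2x2 residual block.
coeffs = [[10, -7], [3, 0]]
levels = quantize(coeffs, step=4)
restored = dequantize(levels, step=4)
```

With this rule the restored coefficients differ from the originals by at most half a step, which is the information lost by quantization.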
(Entropy Encoding Section 304)
The entropy encoding section 304 carries out entropy encoding with respect to the prediction residual data and information on an encoding mode, such as a block type, prediction mode information, and a quantization parameter. The “entropy encoding” indicates: (a) variable length coding, such as arithmetic coding, Huffman coding, and Golomb coding; or (b) fixed length coding.
(Addition Computing Section 8)
The addition computing section 8 adds the prediction residual data subjected to the inverse quantization and the inverse orthogonal transformation, to a predicted image generated by the intra prediction section 310 so as to generate a locally decoded image. The addition computing section 8 then outputs the locally decoded image.
(Memory 9)
The memory 9 receives and then stores the locally decoded image therein.
(Intra Prediction Section 310)
The intra prediction section 310 carries out intra prediction indicated by prediction mode information, by use of a locally decoded image stored in the memory 9, and generates a predicted image. The “intra prediction” indicates prediction in a screen or prediction in a frame, that is, prediction carried out by use of a spatial correlation in the image. At this time, the intra prediction is carried out on a target block (M×M pixel block) to be encoded, by use of 9 types of predetermined prediction methods respectively indicated by Modes 0 through 8. FIG. 66 shows examples of the prediction modes for use in the intra prediction according to the conventional technique. In the examples in FIG. 66, a block size is a 4×4 pixel block. Further, pixels A through M are pixels which have been already encoded and which are used for prediction of the target block to be encoded.
The following describes each of the modes more specifically. Mode 0 carries out a spatial prediction toward a vertical direction with the use of the pixels A through D. Mode 1 carries out the spatial prediction toward a horizontal direction with the use of the pixels I through L. Mode 2 carries out a DC prediction with the use of the pixels A through D and the pixels I through L. Mode 3 carries out the spatial prediction toward a diagonal down left direction with the use of the pixels A through H. Mode 4 carries out the spatial prediction toward a diagonal down right direction with the use of the pixels A through D and the pixels I through M. Mode 5 carries out the spatial prediction toward a vertical right direction with the use of the pixels A through D, the pixels I through K, and the pixel M. Mode 6 carries out the spatial prediction toward a horizontal down direction with the use of the pixels A through C and the pixels I through M. Mode 7 carries out the spatial prediction toward a vertical left direction with the use of the pixels A through E. Mode 8 carries out the spatial prediction toward a horizontal up direction with the use of the pixels I, J, K, and L. The intra prediction section 310 generates prediction pixels by use of a prediction method corresponding to any one of the modes.
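The three simplest modes can be sketched as follows for a 4×4 block. This is an illustrative sketch only: the function name `predict_4x4`, the argument layout (`above` holding pixels A through D, `left` holding pixels I through L), and the rounding used for the DC value are assumptions, and the directional Modes 3 through 8 are not covered.

```python
def predict_4x4(mode, above, left):
    """Generate a 4x4 predicted block.

    above: [A, B, C, D], the already-encoded pixels above the block.
    left:  [I, J, K, L], the already-encoded pixels left of the block.
    """
    if mode == 0:
        # Vertical prediction: every row repeats the pixels A through D.
        return [list(above) for _ in range(4)]
    if mode == 1:
        # Horizontal prediction: row r is filled with the pixel to its left.
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:
        # DC prediction: mean of the eight neighbours.
        # Rounding to nearest with "+ 4" is an assumption of this sketch.
        dc = (sum(above) + sum(left) + 4) // 8
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only Modes 0 through 2 are sketched here")


above = [10, 20, 30, 40]   # hypothetical values for pixels A-D
left = [1, 2, 3, 4]        # hypothetical values for pixels I-L
vertical = predict_4x4(0, above, left)
horizontal = predict_4x4(1, above, left)
dc_block = predict_4x4(2, above, left)
```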
(Prediction Mode Determining Section 311)
The prediction mode determining section 311 determines which one of the plurality of prediction modes shown in FIG. 66 is to be used for prediction of the target block to be encoded, based on an inputted original image of the target block to be encoded and a locally decoded image received from the memory 9, and supplies information on a determined prediction mode (hereinafter referred to as prediction mode information) to the intra prediction section 310 and the entropy encoding section 304.
Generally, a method for evaluating a prediction residual cost (hereinafter referred to as a prediction residual cost minimization method) or a rate distortion optimization method is used for determining a prediction mode to be used.
The prediction residual cost minimization method is a method in which (i) a degree of similarity (hereinafter referred to as a prediction residual cost) is found between the inputted original image of the target block to be encoded and a predicted image corresponding to each of the prediction modes, generated on the basis of a locally decoded image received from the memory 9, and then (ii) a prediction mode having the smallest prediction residual cost is selected among the prediction modes. A measure S of the prediction residual cost is represented by any one of the following: a sum of absolute values of prediction residual data; a square sum of prediction residual data; a sum of absolute values of transform coefficients of prediction residual data; and a square sum of transform coefficients of prediction residual data. These values are calculated by use of the following equations (1) through (4).
Math. 1

$$S = \sum_{i,j \in \mathrm{block}} \left| f(x+i,\, y+j) - p(x+i,\, y+j) \right| \qquad (1)$$

$$S = \sum_{i,j \in \mathrm{block}} \left\{ f(x+i,\, y+j) - p(x+i,\, y+j) \right\}^{2} \qquad (2)$$

$$S = \sum_{i,j \in \mathrm{block}} \left| T\left\{ f(x+i,\, y+j) - p(x+i,\, y+j) \right\} \right| \qquad (3)$$

$$S = \sum_{i,j \in \mathrm{block}} \left( T\left\{ f(x+i,\, y+j) - p(x+i,\, y+j) \right\} \right)^{2} \qquad (4)$$
In the equations (1) through (4), f(x, y) indicates an original image, p(x, y) indicates a predicted image, x and y indicate a position of the target block to be encoded, and i and j indicate a position of a pixel in the target block to be encoded. T{ } indicates an orthogonal transformation operation, such as discrete cosine transform, discrete sine transform, or Hadamard transform.
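The four measures of equations (1) through (4) can be sketched as below for a 4×4 block. The sketch assumes an unnormalized 4×4 Hadamard transform for T{ }, which is only one of the transforms the text permits; the names `H4`, `hadamard`, and `residual_cost` are introduced here for illustration.

```python
# Unnormalized 4x4 Hadamard matrix (symmetric), assumed here as T{ }.
H4 = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
    [1, -1, 1, -1],
]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def hadamard(block):
    """Two-dimensional transform H4 * block * H4 (H4 equals its transpose)."""
    return matmul(matmul(H4, block), H4)


def residual_cost(f, p, equation):
    """Prediction residual cost S for equation number 1, 2, 3 or 4.

    f and p are the original and predicted 4x4 blocks, already cut out
    at the block position (x, y)."""
    r = [[fv - pv for fv, pv in zip(frow, prow)] for frow, prow in zip(f, p)]
    if equation in (3, 4):
        r = hadamard(r)          # costs on transform coefficients
    values = [v for row in r for v in row]
    if equation in (1, 3):
        return sum(abs(v) for v in values)   # sum of absolute values
    if equation in (2, 4):
        return sum(v * v for v in values)    # square sum
    raise ValueError("equation must be 1, 2, 3 or 4")


original = [[5] * 4 for _ in range(4)]    # hypothetical flat block
predicted = [[3] * 4 for _ in range(4)]
costs = [residual_cost(original, predicted, n) for n in (1, 2, 3, 4)]
```

The prediction mode determining section would evaluate one such cost per candidate mode and keep the mode with the smallest value.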
In the rate distortion optimization method, a prediction mode is selected according to the following steps. Initially, predicted images corresponding to the respective prediction modes are generated on the basis of a locally decoded image received from the memory 9. Then, prediction residual data is found from each of the predicted images thus generated and the inputted original image of the target block to be encoded. Subsequently, the prediction residual data thus obtained is temporarily encoded. Then, (a) a prediction error D between the original image of the target block and a decoded image of the target block and (b) an encoding amount R necessary for encoding the target block are calculated with respect to each of the prediction modes. Finally, a prediction mode that minimizes an encoding cost J calculated according to the following equation (5) based on D and R, is selected from the prediction modes.
Math. 2

$$J(\mathrm{mode} \mid q) = D(\mathrm{mode} \mid q) + \lambda(q) \cdot R(\mathrm{mode} \mid q) \qquad (5)$$

The prediction error D is an error between prediction residual data that has not been quantized and prediction residual data that has been quantized. The encoding amount R is a total of an encoding amount of prediction residual data and an encoding amount of prediction mode information.
In the equation (5), “mode” indicates a prediction mode, and q indicates a quantization parameter. Further, λ is a weighting factor dependent on the quantization parameter q, and is generally calculated according to the following equation (6):
Math. 3

$$\lambda(q) = 0.85 \times 2^{(q-12)/3} \qquad (6)$$
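Equations (5) and (6) can be sketched as follows. The function names `rd_lambda` and `select_mode`, and the D and R values in the example, are hypothetical; in a real encoder D and R would come from the temporary encoding of each mode described above.

```python
def rd_lambda(q):
    """Weighting factor of equation (6) for quantization parameter q."""
    return 0.85 * 2 ** ((q - 12) / 3)


def select_mode(candidates, q):
    """Return the mode minimizing J = D + lambda(q) * R (equation (5)).

    candidates maps a prediction mode number to a (D, R) pair."""
    lam = rd_lambda(q)
    return min(candidates, key=lambda m: candidates[m][0] + lam * candidates[m][1])


# Hypothetical (D, R) measurements for three candidate modes.
candidates = {0: (100, 10), 1: (90, 30), 2: (120, 5)}
best_low_q = select_mode(candidates, q=12)    # lambda = 0.85
best_high_q = select_mode(candidates, q=27)   # lambda = 27.2
```

Because λ grows with q, the selection shifts toward modes with a smaller encoding amount R as quantization becomes coarser.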
(Encoding Process of Image Encoding Device 300)
The following describes how the image encoding device 300 operates.
Initially, a target image to be encoded is divided into blocks of a predetermined block size (M×M pixel blocks), and each of the blocks (hereinafter referred to as a target block to be encoded) is supplied to the image encoding device 300 as an input image.
The prediction mode determining section 311 determines a prediction mode to be used for prediction for the target block to be encoded, in accordance with the prediction residual cost minimization method or the rate distortion optimization method, based on the target block to be encoded and a locally decoded image of an adjacent block that has been already encoded, which locally decoded image is received from the memory 9. The prediction mode determining section 311 then supplies prediction mode information to the intra prediction section 310 and the entropy encoding section 304.
The intra prediction section 310 carries out intra prediction indicated by the prediction mode information thus received, based on the locally decoded image of the adjacent block that has been encoded, which locally decoded image is received from the memory 9, so as to generate a predicted image (M×M pixel block) of the target block to be encoded. The intra prediction section 310 then supplies the predicted image to the difference computing section 1.
Subsequently, the difference computing section 1 calculates prediction residual data that is a difference between the target block to be encoded and the predicted image thus generated, and supplies the prediction residual data to the orthogonal transformation section 2.
The prediction residual data from the difference computing section 1 is supplied to the orthogonal transformation section 2 and then to the quantization section 3 so as to be subjected to orthogonal transformation and then quantization. The prediction residual data is then supplied to the entropy encoding section 304 and to the inverse quantization section 6.
The prediction residual data thus subjected to the orthogonal transformation and the quantization is supplied to the inverse quantization section 6 and then to the inverse orthogonal transformation section 7 so as to be subjected to inverse quantization and inverse orthogonal transformation, respectively. The prediction residual data is then supplied to the addition computing section 8.
The addition computing section 8 combines the prediction residual data thus subjected to the inverse quantization and the inverse orthogonal transformation, with the predicted image corresponding to the prediction mode applied to the target block, so as to synthesize a locally decoded image (M×M pixel block) of the target block. The locally decoded image thus synthesized is then supplied to the memory 9.
The locally decoded image of the target block, which is supplied from the addition computing section 8, is stored in the memory 9, and is used for intra prediction of an adjacent block to be encoded subsequently to the target block.
The entropy encoding section 304 carries out an encoding process, such as variable length coding, with respect to encode parameters, such as the prediction mode information, received from the prediction mode determining section 311 and the prediction residual data subjected to the orthogonal transformation and the quantization. Subsequently, the entropy encoding section 304 outputs encoded data of the target block.
The image encoding device 300 repeats the above process with respect to all target blocks to be encoded, which constitute a target image to be encoded.
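The per-block flow described above can be sketched as follows. To keep the sketch short, the orthogonal transformation is omitted (treated as the identity) and entropy encoding is left out; the function name `encode_block`, the uniform step size `step`, and the rounding rule are assumptions of this example.

```python
def encode_block(original, predicted, step):
    """Encode one block and rebuild its locally decoded image.

    Returns (levels, locally_decoded): the quantized residual that would go
    to the entropy encoding section, and the block that would be stored in
    the memory for predicting later blocks."""
    # Difference computing section: prediction residual data.
    residual = [
        [o - p for o, p in zip(orow, prow)]
        for orow, prow in zip(original, predicted)
    ]
    # Quantization section (transform omitted in this sketch).
    levels = [
        [int(abs(r) / step + 0.5) * (1 if r >= 0 else -1) for r in row]
        for row in residual
    ]
    # Inverse quantization section.
    restored = [[lv * step for lv in row] for row in levels]
    # Addition computing section: predicted image + restored residual.
    locally_decoded = [
        [p + r for p, r in zip(prow, rrow)]
        for prow, rrow in zip(predicted, restored)
    ]
    return levels, locally_decoded


levels, locally_decoded = encode_block([[12, 7]], [[10, 10]], step=4)
```

The locally decoded image, not the original image, is stored, so that the encoder predicts from the same pixels that a decoder will have available.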
As described above, the image encoding device 300 selects a corresponding one of the 9 types of predetermined prediction methods from Modes 0 through 8 illustrated in FIG. 66, depending on a characteristic of an image. This can attain high encoding efficiency. More specifically, the DC prediction of Mode 2 is effective for prediction for a planar image region. Further, Modes 0, 1, and 3 through 8 are effective for prediction for an image including an edge in a specific direction.
(a) and (b) of FIG. 67 illustrate concrete examples of the intra prediction in the image encoding device 300. (a) of FIG. 67 illustrates an example in which prediction residual data F43 is obtained as a difference between (i) an image F41 having vertical stripes in which edges are formed in a vertical direction and (ii) a predicted image F42 generated, based on a decoded image, according to a vertical direction prediction (Mode 0 in FIG. 66). Further, (b) of FIG. 67 illustrates an example in which prediction residual data F46 is obtained as a difference between (i) an image F44 having diagonal stripes in which edges are formed in a diagonal right direction and (ii) a predicted image F45 generated, based on a decoded image, according to a diagonal down right direction prediction (Mode 4 in FIG. 66).
In the cases of (a) and (b) of FIG. 67, the predicted images F42 and F45 respectively reproduce patterned parts in the images F41 and F44, which are the original images. Therefore, when a difference between the original image and the predicted image is calculated, the patterned part in the original image and the patterned part in the predicted image cancel each other out. As a result, only pixels in the original image that are not cancelled by the predicted image remain as prediction residual data. In (a) and (b) of FIG. 67, a prediction direction is identical with a direction of the pattern in the original image. Consequently, it is possible to effectively reduce the prediction residual data.
(Predictive Encoding Method of Prediction Mode Information)
The following deals with a predictive encoding method of prediction mode information.
The intra prediction of a 4×4 pixel block or an 8×8 pixel block in the image encoding device using a spatial correlation, disclosed in Non Patent Literature 1, uses the 9 types of prediction modes as illustrated in FIG. 68. In Non Patent Literature 1, when prediction mode information of a target block (B1) is encoded, an estimation value is determined as follows: (a) a value of a prediction mode applied to a block B2, which is adjacently positioned on the left side of the block B1, and (b) a value of a prediction mode applied to a block B4, which is adjacently positioned on the upper side of the block B1, are compared, and the smaller of the two values is taken as the estimation value. Then, the estimation value is compared with a value of a prediction mode of the block B1. If the values are identical, a flag “1” is encoded. Otherwise, a flag “0” is encoded together with relative prediction mode information indicating which one of the remaining 8 types of prediction modes (i.e., the prediction modes other than the estimation value) corresponds to the prediction mode of the block B1.
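The flag-based signalling described above can be sketched as follows. The text only states that the relative prediction mode information identifies one of the remaining 8 modes; the specific index mapping used here (modes above the estimation value are shifted down by one) is an assumption, as are the function names `encode_mode` and `decode_mode`.

```python
def encode_mode(mode, left_mode, upper_mode):
    """Return (flag, relative_info) for the prediction mode of block B1.

    left_mode and upper_mode are the modes of blocks B2 and B4."""
    estimate = min(left_mode, upper_mode)   # smaller value is the estimate
    if mode == estimate:
        return 1, None                      # flag "1": estimate is correct
    # Flag "0": index among the 8 remaining modes (assumed mapping).
    relative = mode if mode < estimate else mode - 1
    return 0, relative


def decode_mode(flag, relative, left_mode, upper_mode):
    """Invert encode_mode using the same neighbouring blocks."""
    estimate = min(left_mode, upper_mode)
    if flag == 1:
        return estimate
    return relative if relative < estimate else relative + 1


same = encode_mode(2, left_mode=2, upper_mode=5)
higher = encode_mode(7, left_mode=2, upper_mode=5)
lower = encode_mode(0, left_mode=2, upper_mode=5)
```

When neighbouring blocks tend to share a prediction direction, most blocks cost only the single flag, which is where the saving comes from.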
Further, in Patent Literature 1, when the prediction mode information of the target block is encoded, a prediction mode applied to the target block is estimated by use of an estimation value 1 determined by the after-mentioned first prediction means and an estimation value 2 determined by the after-mentioned second prediction means.
The first prediction means determines, as the estimation value 1, a prediction mode that is most frequently used among prediction modes applied to (i) blocks in an encoded region C2 (for example, a macro block or a slice) which is positioned on the left side of a target block C1, or alternatively (ii) blocks in an encoded region C3 which is positioned on the upper side of the target block C1. Further, the second prediction means uses a decoded image in an L-shaped region B5 (a shaded part with diagonal down-right lines) on encoded blocks B2, B3 and B4 which are adjacent to a target block B1, as illustrated in FIG. 69. More specifically, predicted images corresponding to the 9 types of intra prediction methods shown in FIG. 68 are generated with respect to a decoded image of an L-shaped region B6 (a shaded part with diagonal down-left lines), with the use of the decoded image of the L-shaped region B5. Then, the prediction mode for which the difference between the decoded image of the L-shaped region B6 and the predicted image is smallest among those prediction modes is regarded as an estimation value 2.
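The first prediction means can be sketched as a simple frequency count. The function name `estimate_most_used` and the tie-breaking rule (the smallest mode number wins) are assumptions, since the text does not say how ties are resolved; the second prediction means is not sketched here.

```python
from collections import Counter


def estimate_most_used(modes_in_region):
    """Estimation value 1: the prediction mode used most often among the
    blocks of an already-encoded region such as C2 or C3."""
    counts = Counter(modes_in_region)
    highest = max(counts.values())
    # Assumed tie-break: prefer the smallest mode number.
    return min(mode for mode, count in counts.items() if count == highest)


# Hypothetical modes applied to the blocks of region C2.
estimate_1 = estimate_most_used([0, 2, 2, 1, 2, 0])
tie_case = estimate_most_used([1, 0, 1, 0])
```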
(Arrangement of Image Decoding Device 350)
Finally, a conventional image decoding device is described below with reference to FIG. 71. FIG. 71 is a block diagram showing an arrangement of a conventional image decoding device 350. The image decoding device 350 includes an inverse quantization section 6, an inverse orthogonal transformation section 7, an addition computing section 8, a memory 9, an entropy decoding section 305, and an intra prediction section 310 (FIG. 71). Among these constituent components, the inverse quantization section 6, the inverse orthogonal transformation section 7, the addition computing section 8, the memory 9, and the intra prediction section 310 have been described above. Therefore, the following deals with only the entropy decoding section 305.
(Entropy Decoding Section 305)
The entropy decoding section 305 carries out a decoding process (for example, a variable length decoding process) with respect to encoded data indicative of prediction residual data of a target block to be decoded and encode parameters, such as prediction mode information, of the target block to be decoded.
(Decoding Process of Image Decoding Device 350)
The following describes a decoding process of an image in the image decoding device 350. Initially, the entropy decoding section 305 carries out entropy decoding with respect to received encoded data of the target block to be decoded, and outputs prediction residual data of the target block to be decoded and encode parameters, such as prediction mode information, of the target block to be decoded. The prediction residual data thus decoded is supplied to the inverse quantization section 6 and then to the inverse orthogonal transformation section 7 so as to be subjected to inverse quantization and inverse orthogonal transformation, respectively.
Then, the intra prediction section 310 generates a predicted image (M×M pixel block) of the target block to be decoded with the use of a locally decoded image, stored in the memory 9, of an adjacent block that has been already decoded.
Subsequently, the addition computing section 8 adds the prediction residual data thus subjected to the inverse quantization and the inverse orthogonal transformation, to the predicted image generated by the intra prediction section 310, so as to generate a decoded image of the target block. The memory 9 stores the decoded image (M×M pixel block) thus generated therein. The stored decoded image is used for intra prediction of an adjacent block(s). The image decoding device 350 repeats the above process with respect to all target blocks to be decoded, which constitute a target image to be decoded.
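The reconstruction step of the decoder can be sketched as below, again with the inverse orthogonal transformation omitted (treated as the identity) and entropy decoding assumed to have already produced the quantized residual. The function name `decode_block` and the uniform step size `step` are assumptions of this example.

```python
def decode_block(levels, predicted, step):
    """Rebuild a decoded block from quantized residual levels.

    levels:    quantized prediction residual data after entropy decoding.
    predicted: predicted image generated from already-decoded neighbours.
    """
    decoded = []
    for level_row, pred_row in zip(levels, predicted):
        # Inverse quantization, then addition to the predicted image.
        decoded.append([p + lv * step for lv, p in zip(level_row, pred_row)])
    return decoded


# Hypothetical quantized residual and predicted image for a 1x2 block.
decoded = decode_block([[1, -1]], [[10, 10]], step=4)
```

The decoded block is what the memory 9 would hold for predicting the next block, so the decoder's predictions match those made on the encoder side.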