The present invention relates to a coding device, a coding method, a program of the coding method, and a recording medium recorded with the program of the coding method, which are all applicable to cases of transmitting or recording image data compressed by orthogonal transform and motion compensation, for example. The invention aims to improve image quality beyond that of a previous technique by correcting the quantization scale of every macroblock based on an activity and a prediction residual, in such a manner as to exploit visual characteristics.
For transmission and recording of moving images in broadcast stations and general households, devices that efficiently transmit and store image data by exploiting the redundancy of the image data have recently become popular. Such devices are configured to compress the image data by orthogonal transform, e.g., discrete cosine transform, and motion compensation with an MPEG (Moving Picture Experts Group) compliant scheme, for example.
Such an MPEG compliant scheme includes MPEG2 (ISO/IEC 13818-2), which is defined as a general-purpose image coding scheme. Because MPEG2 is defined to handle both interlaced scanning and progressive scanning, and both standard-resolution images and high-definition images, it is currently popular for a wide range of professional and consumer uses. Specifically, MPEG2 promises high compression rates with high image quality by compressing standard-resolution image data, e.g., 720×480 pixels, of the interlaced scanning scheme to bit rates of 4 to 8 [Mbps], or by compressing high-resolution image data, e.g., 1920×1088 pixels, of the interlaced scanning scheme to bit rates of 18 to 22 [Mbps].
MPEG2, however, is a high-image-quality coding scheme suitable for broadcast use, and is not suitable as an enhanced-compression coding scheme with which the code amount is less than that of MPEG1. With the recent widespread use of portable terminals, the need for such an enhanced-compression coding scheme with a code amount less than that of MPEG1 is expected to increase. To deal with this need, the MPEG4 coding scheme was approved as an international standard, ISO/IEC (International Organization for Standardization/International Electrotechnical Commission) 14496-2, in December 1998.
For such a coding scheme, standardization has been promoted for H26L (ITU-T Q6/16 VCEG), which was originally developed for image coding for videoconferencing. Although H26L requires a large amount of computation compared with MPEG2 and MPEG4, it promises a high coding efficiency compared with MPEG2 and MPEG4. As a part of activities relating to MPEG4, standardization of another coding scheme was promoted as the Joint Model of Enhanced-Compression Video Coding, and the scheme was approved as an international standard in March 2003 under the name of H264 and MPEG4 Part10 (AVC: Advanced Video Coding). This scheme uses H26L as a basis on which various functions are established, and promises a much higher coding efficiency.
FIG. 4 is a block diagram showing a coding device based on the AVC. This coding device 1 subjects image data to a coding process by intra coding and inter coding. That is, the coding device 1 selects any optimum prediction mode from a plurality of intra prediction modes and a plurality of inter prediction modes. A prediction value of the selected prediction mode is then subtracted from image data so that differential data, i.e., prediction error data, is generated. The resulting differential data is subjected to orthogonal transform, quantization, and variable-length coding so that the image data is coded by intra coding and inter coding.
That is, in this coding device 1, an analog/digital conversion circuit (A/D) 2 subjects a video signal S1 to A/D conversion, and outputs image data D1. An image sorting buffer 3 receives the image data D1 provided by the A/D 2, applies frame sorting to the image data D1, and outputs the sorting result. Such frame sorting is applied based on the GOP (Group of Pictures) structure relating to a coding process in the coding device 1.
A subtracter 4 receives the image data D1 provided by the image sorting buffer 3, and with intra coding, generates and outputs differential data D2. The differential data D2 is of a difference between the image data D1 and a prediction value generated by an intra prediction circuit 5. With the inter coding, the subtracter 4 generates and outputs another differential data D2, which is a difference between the image data D1 and a prediction value generated by a motion prediction/compensation circuit 6. An orthogonal transform circuit 7 receives the output data D2 of the subtracter 4, and outputs transform coefficient data D3 being a process result of orthogonal transform, e.g., discrete cosine transform or Karhunen-Loeve transform.
A quantization circuit 8 quantizes the transform coefficient data D3, and outputs the quantization result. For such quantization, used is a quantization scale under the rate control of a rate control circuit 9. A reverse coding device 10 applies reverse coding, e.g., variable-length coding, arithmetic coding, or others to the data provided by the quantization circuit 8, and outputs the result. The reverse coding device 10 acquires information from the intra prediction circuit 5 and the motion prediction/compensation circuit 6, sets thus acquired information to header information of the output data D4, and outputs the result. The acquiring information includes information about an intra prediction mode relating to intra coding, information about a motion vector relating to inter coding, or others.
A storage buffer 11 stores therein the output data D4 provided by the reverse coding device 10, and outputs the data by the transmission speed of the subsequent transmission path. The rate control circuit 9 monitors the code amount to be generated as a result of the coding process by monitoring the free space of the storage buffer 11. Based on the monitor result, the rate control circuit 9 changes the quantization scale in the quantization circuit 8, thereby exercising control over the code amount to be generated by the coding device 1.
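The feedback between the storage buffer 11 and the quantization circuit 8 can be illustrated by a minimal Python sketch. It is not the control law of the rate control circuit 9, which is not specified in this description; the function name update_qscale, the occupancy thresholds of 25% and 75%, the unit step, and the scale range of 1 to 31 are all assumptions made for illustration only.

```python
def update_qscale(qscale, free_space, capacity, q_min=1, q_max=31):
    """Hypothetical rate-control step driven by buffer occupancy.

    A nearly full buffer means too many bits are being generated, so the
    quantization scale is raised (coarser quantization); a nearly empty
    buffer allows the scale to be lowered (finer quantization).
    """
    fullness = 1.0 - free_space / capacity
    if fullness > 0.75:      # assumed upper threshold
        qscale += 1
    elif fullness < 0.25:    # assumed lower threshold
        qscale -= 1
    return max(q_min, min(q_max, qscale))  # keep the scale in its assumed range
```

For example, under these assumptions a buffer with only 10% free space raises a scale of 10 to 11, while a buffer with 90% free space lowers it to 9.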
An inverse quantization circuit 13 applies inverse quantization to the output data of the quantization circuit 8, and reproduces the input data of the quantization circuit 8. An inverse orthogonal transform circuit 14 applies inverse orthogonal transform to the output data of the inverse quantization circuit 13, thereby reproducing the input data of the orthogonal transform circuit 7. A deblock filter 15 eliminates any block distortion observed in the output data of the inverse orthogonal transform circuit 14, and outputs the result. A frame memory 16 adds, as appropriate, a prediction value to the output data of the deblock filter 15, and records the result as reference image information. The prediction value is the one generated, as appropriate, by the intra prediction circuit 5 or the motion prediction/compensation circuit 6.
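The role of the inverse quantization path inside the coding device 1 can be seen in a small Python sketch. A plain uniform quantizer is assumed here in place of the actual AVC quantizer, and the names quantize and dequantize are hypothetical. The sketch only shows that the reproduced coefficients are an approximation of the originals, differing by at most half the quantization scale, which is why the reference image is built from the reproduced data, as on the decoding side.

```python
def quantize(coeffs, qscale):
    """Uniform quantization with rounding to the nearest level (illustrative)."""
    levels = []
    for c in coeffs:
        sign = -1 if c < 0 else 1
        levels.append(sign * ((abs(c) + qscale // 2) // qscale))
    return levels


def dequantize(levels, qscale):
    """Inverse quantization: scale each level back to the coefficient range."""
    return [level * qscale for level in levels]


coeffs = [100, -37, 12, 3]
levels = quantize(coeffs, 8)          # what the quantization circuit would output
reproduced = dequantize(levels, 8)    # what the inverse quantization would reproduce
```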
The motion prediction/compensation circuit 6 detects, with inter coding, a motion vector of the image data provided by the image sorting buffer 3 using a prediction frame of the reference image information stored in the frame memory 16. Based on thus detected motion vector, the motion prediction/compensation circuit 6 applies motion compensation to the reference image information stored in the frame memory 16 so that prediction image information is generated. A prediction value of the resulting prediction image information is then forwarded to the subtracter 4.
With intra coding, the intra prediction circuit 5 determines an intra prediction mode based on the reference image information stored in the frame memory 16. Based on the determination result, the intra prediction circuit 5 generates a prediction value for the prediction image information from the reference image information, and outputs the resulting value to the subtracter 4.
As such, in the coding scheme, the differential data D2 is generated by inter coding as a result of motion compensation relating to inter prediction, and another differential data D2 is generated by intra coding as a result of intra prediction. These differential data D2 are then subjected to orthogonal transform, quantization, and variable-length coding before transmission.
FIG. 5 is a block diagram showing a decoding device that decodes the coded data D4 generated by such a coding process. In this decoding device 20, a storage buffer 21 temporarily stores the coded data D4 provided over the transmission path before output. A reverse decoding circuit 22 applies a decoding process, e.g., variable-length decoding or arithmetic decoding, to the output data of the storage buffer 21, and reproduces the input data of the reverse coding device 10 in the coding device 1. At this time, if this output data is intra-coded data, the information stored in the header about the intra prediction mode is decoded for transmission to an intra prediction circuit 23. On the other hand, if this output data is inter-coded data, the information stored in the header about the motion vector is decoded for transmission to a motion prediction/compensation circuit 24.
The inverse quantization circuit 25 applies inverse quantization to the output data of the reverse decoding circuit 22, thereby reproducing the transform coefficient data D3 provided to the quantization circuit 8 of the coding device 1. The inverse orthogonal transform circuit 26 receives the transform coefficient data provided by the inverse quantization circuit 25, and applies thereto fourth-order inverse orthogonal transform. This accordingly reproduces the differential data D2 to be provided to the orthogonal transform circuit 7 of the coding device 1.
An adder 27 receives the differential data D2 provided by the inverse orthogonal transform circuit 26, and with intra coding, adds, to the differential data D2, a prediction value in a prediction image to be generated by the intra prediction circuit 23, and outputs the addition result. With inter coding, on the other hand, the adder 27 adds, to the differential data D2, a prediction value in a prediction image provided by the motion prediction/compensation circuit 24, and outputs the addition result. In this manner, the adder 27 reproduces the input data of the subtracter 4 in the coding device 1.
A deblock filter 28 eliminates any block distortion observed in the output data of the adder 27, and outputs the result. An image sorting buffer 29 applies frame sorting, based on the GOP structure, to the image data provided by the deblock filter 28, and outputs the result. A digital/analog conversion circuit (D/A) 30 subjects the output data of the image sorting buffer 29 to D/A conversion, and outputs the conversion result.
A frame memory 31 records and stores therein the output data of the deblock filter 28 as reference image information. The motion prediction/compensation circuit 24 applies motion compensation to the reference image information stored in the frame memory 31, and generates a prediction value of a prediction image. For such motion compensation, used is information about a motion vector notified by the reverse decoding circuit 22. The resulting prediction value is forwarded to the adder 27. The intra prediction circuit 23 generates, with intra coding, a prediction value of a prediction image based on the reference image information stored in the frame memory 31 in the intra prediction mode, which is notified by the reverse decoding circuit 22. The resulting prediction value is forwarded to the adder 27.
For intra coding relating to such a coding process, an intra 4×4 prediction mode and an intra 16×16 prediction mode are ready for use. Herein, with the AVC, the differential data D2 is subjected to orthogonal transform on the basis of a 4×4 pixel block, and the intra 4×4 prediction mode generates a prediction value relating to the intra prediction on the basis of one block for orthogonal transform. On the other hand, the intra 16×16 prediction mode generates a prediction value relating to the intra prediction on the basis of a plurality of blocks for orthogonal transform. A setting is so made that four of these blocks are arranged in the horizontal direction, and four in the vertical direction, so as to cover 16×16 pixels.
In the intra 4×4 prediction mode, as shown in FIG. 6, with respect to a block for generating a prediction value, i.e., a 4×4 pixel block including pixels a to p, some of the neighboring 13 pixels A to M are set as prediction pixels, and a prediction value is generated using these prediction pixels. Note here that these 13 pixels A to M include four pixels A to D, four pixels E to H, four pixels I to L, and a pixel M. The pixels A to D are vertically adjacent to the block and arranged in a row from a scanning start edge of the block, and the pixels E to H are arranged in a row subsequent to the pixel D located at the scanning end edge of the block. The pixels I to L are horizontally adjacent to the block and arranged in a column from the scanning start edge of the block, and the pixel M is located above the pixel I, being the one of the four pixels I to L at the scanning start edge.
In the intra 4×4 prediction mode, with the relative relationship among these 13 pixels A to M and 4×4 pixels a to p for use for generation of a prediction value, as shown in FIGS. 7 and 8, prediction modes of 0 to 8 are defined.
More specifically, as indicated by arrows in FIG. 9A, the mode 0 is of generating a prediction value using the vertically-adjacent pixels A to D. In the mode 0, out of 4×4 pixels a to p generating a prediction value, the vertically-adjacent pixels a, e, i, and m on the first column are set with the pixel A thereabove as a prediction pixel. For the pixels b, f, j, and n on the subsequent 2nd column, the pixel B thereabove is set as a prediction pixel. For the pixels c, g, k, and o on the subsequent third column, the pixel C thereabove is set as a prediction pixel, and for the pixels d, h, l, and p on the subsequent fourth column, the pixel D thereabove is set as a prediction pixel. The pixel values of these prediction pixels A to D are each set as a prediction value of their corresponding pixels a to p. Note here that the mode 0 is applied only when the prediction pixels A to D are considered significant in this mode.
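The mode 0 described above amounts to copying the row of pixels A to D down every line of the block. A minimal Python sketch follows; the function name and the list-of-rows layout (a b c d / e f g h / i j k l / m n o p) are illustrative assumptions.

```python
def predict_mode0(top):
    """Mode 0: each column of the 4x4 block takes the pixel above it.

    top = [A, B, C, D]; the result is a list of four rows.
    """
    return [list(top[:4]) for _ in range(4)]
```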
As shown in FIG. 9B, similarly, the mode 1 is of generating a prediction value using the horizontally-adjacent pixels I to L. In the mode 1, out of 4×4 pixels a to p generating a prediction value, the horizontally-adjacent pixels a to d on the first line are set with the pixel I adjacent to the left as a prediction pixel. For the pixels e to h on the subsequent 2nd line, the pixel J adjacent to the left is set as a prediction pixel. For the pixels i to l on the subsequent third line, the pixel K adjacent to the left is set as a prediction pixel, and for the pixels m to p on the subsequent fourth line, the pixel L adjacent to the left is set as a prediction pixel. The pixel values of these prediction pixels I to L are each set as a prediction value of their corresponding pixels a to p. Note here that the mode 1 is applied only when the prediction pixels I to L are considered significant in this mode.
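The mode 1 can be sketched in the same way, with each line of the block filled with the pixel to its left. The function name and the list-of-rows layout are illustrative assumptions.

```python
def predict_mode1(left):
    """Mode 1: each line of the 4x4 block takes the pixel adjacent to its left.

    left = [I, J, K, L]; the result is a list of four rows.
    """
    return [[value] * 4 for value in left[:4]]
```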
As shown in FIG. 9C, out of the 13 pixels A to M, the mode 2 is of generating a prediction value by the vertically-adjacent pixels A to D, and the horizontally-adjacent pixels I to L in the block. When these pixels A to D and I to L are all considered significant, the prediction values are generated for the pixels a to p by the following Equation 1.
$$(A+B+C+D+I+J+K+L+4) \gg 3 \tag{1}$$
In the mode 2, when the pixels A to D are not considered significant, a prediction value is generated by Equation 2; when the pixels I to L are not considered significant, a prediction value is generated by Equation 3; and when none of these pixels A to D and I to L are considered significant, a prediction value is set to 128.
$$(I+J+K+L+2) \gg 2 \tag{2}$$
$$(A+B+C+D+2) \gg 2 \tag{3}$$
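The case analysis of the mode 2 can be summarized in a short Python sketch. Passing None for a group of pixels that is not considered significant is an illustrative convention, as is the function name.

```python
def predict_mode2(top=None, left=None):
    """Mode 2 of the intra 4x4 prediction (Equations 1 to 3, or 128).

    top  = [A, B, C, D] or None when these pixels are not significant.
    left = [I, J, K, L] or None when these pixels are not significant.
    """
    if top is not None and left is not None:
        dc = (sum(top) + sum(left) + 4) >> 3   # Equation 1
    elif left is not None:
        dc = (sum(left) + 2) >> 2              # Equation 2
    elif top is not None:
        dc = (sum(top) + 2) >> 2               # Equation 3
    else:
        dc = 128                               # no significant neighbor
    return [[dc] * 4 for _ in range(4)]
```

The added constants 4 and 2 round the average to the nearest integer before the right shift.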
As shown in FIG. 9D, out of the 13 pixels A to M, the mode 3 is of generating a prediction value by the pixels A to H arranged in a horizontal row above the block. The mode 3 is applied only when the pixels A to D and I to L are all considered significant, and the prediction values are generated for the pixels a to p by the following Equation 4.
$$\begin{aligned}
a &: (A+2B+C+2) \gg 2 \\
b, e &: (B+2C+D+2) \gg 2 \\
c, f, i &: (C+2D+E+2) \gg 2 \\
d, g, j, m &: (D+2E+F+2) \gg 2 \\
h, k, n &: (E+2F+G+2) \gg 2 \\
l, o &: (F+2G+H+2) \gg 2 \\
p &: (G+3H+2) \gg 2
\end{aligned} \tag{4}$$
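Equation 4 can be transcribed almost literally into Python, as in the following sketch. The helper to_block, the function name, and the list-of-rows layout (a b c d / e f g h / i j k l / m n o p) are assumptions made for illustration; the significance check is left to the caller.

```python
def to_block(table):
    """Expand {"be": v, ...} into rows laid out as abcd / efgh / ijkl / mnop."""
    px = {name: v for names, v in table.items() for name in names}
    return [[px[n] for n in row] for row in ("abcd", "efgh", "ijkl", "mnop")]


def predict_mode3(A, B, C, D, E, F, G, H):
    """Mode 3 of the intra 4x4 prediction, written directly from Equation 4."""
    f3 = lambda x, y, z: (x + 2 * y + z + 2) >> 2  # 3-tap filter with rounding
    return to_block({
        "a": f3(A, B, C),
        "be": f3(B, C, D),
        "cfi": f3(C, D, E),
        "dgjm": f3(D, E, F),
        "hkn": f3(E, F, G),
        "lo": f3(F, G, H),
        "p": (G + 3 * H + 2) >> 2,
    })
```

Pixels sharing one formula lie on the same diagonal running from upper right to lower left, so a linear ramp in A to H is carried diagonally into the block.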
As shown in FIG. 9E, out of the 13 pixels A to M, the mode 4 is of generating a prediction value by the pixels A to D and I to M adjacent to the 4×4 pixel block including the pixels a to p. The mode 4 is applied only when these pixels A to D and I to M are all considered significant, and the prediction values are generated for the pixels a to p by the following Equation 5.
$$\begin{aligned}
m &: (J+2K+L+2) \gg 2 \\
i, n &: (I+2J+K+2) \gg 2 \\
e, j, o &: (M+2I+J+2) \gg 2 \\
a, f, k, p &: (A+2M+I+2) \gg 2 \\
b, g, l &: (M+2A+B+2) \gg 2 \\
c, h &: (A+2B+C+2) \gg 2 \\
d &: (B+2C+D+2) \gg 2
\end{aligned} \tag{5}$$
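A Python transcription of Equation 5 is sketched below. The helper to_block, the function name, and the list-of-rows layout (a b c d / e f g h / i j k l / m n o p) are illustrative assumptions, and the caller is assumed to have checked that the pixels A to D and I to M are significant.

```python
def to_block(table):
    """Expand {"in": v, ...} into rows laid out as abcd / efgh / ijkl / mnop."""
    px = {name: v for names, v in table.items() for name in names}
    return [[px[n] for n in row] for row in ("abcd", "efgh", "ijkl", "mnop")]


def predict_mode4(A, B, C, D, I, J, K, L, M):
    """Mode 4 of the intra 4x4 prediction, written directly from Equation 5."""
    f3 = lambda x, y, z: (x + 2 * y + z + 2) >> 2  # 3-tap filter with rounding
    return to_block({
        "m": f3(J, K, L),
        "in": f3(I, J, K),
        "ejo": f3(M, I, J),
        "afkp": f3(A, M, I),
        "bgl": f3(M, A, B),
        "ch": f3(A, B, C),
        "d": f3(B, C, D),
    })
```

Here the pixels sharing one formula lie on a diagonal running from upper left to lower right, with the main diagonal a, f, k, p derived from the corner pixel M.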
As shown in FIG. 9F, similarly to the mode 4, out of the 13 pixels A to M, the mode 5 is of generating a prediction value by the pixels A to D and I to M adjacent to the 4×4 pixel block including the pixels a to p. The mode 5 is applied only when these pixels A to D and I to M are all considered significant, and the prediction values are generated for the pixels a to p by the following Equation 6.
$$\begin{aligned}
a, j &: (M+A+1) \gg 1 \\
b, k &: (A+B+1) \gg 1 \\
c, l &: (B+C+1) \gg 1 \\
d &: (C+D+1) \gg 1 \\
e, n &: (I+2M+A+2) \gg 2 \\
f, o &: (M+2A+B+2) \gg 2 \\
g, p &: (A+2B+C+2) \gg 2 \\
h &: (B+2C+D+2) \gg 2 \\
i &: (M+2I+J+2) \gg 2 \\
m &: (I+2J+K+2) \gg 2
\end{aligned} \tag{6}$$
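Equation 6 mixes two-pixel averages with three-pixel filters, which the following Python sketch makes explicit. The helper to_block, the function name, and the list-of-rows layout (a b c d / e f g h / i j k l / m n o p) are illustrative assumptions.

```python
def to_block(table):
    """Expand {"aj": v, ...} into rows laid out as abcd / efgh / ijkl / mnop."""
    px = {name: v for names, v in table.items() for name in names}
    return [[px[n] for n in row] for row in ("abcd", "efgh", "ijkl", "mnop")]


def predict_mode5(A, B, C, D, I, J, K, L, M):
    """Mode 5 of the intra 4x4 prediction, written directly from Equation 6.

    L is accepted for a uniform signature but does not appear in Equation 6.
    """
    f2 = lambda x, y: (x + y + 1) >> 1             # 2-pixel average with rounding
    f3 = lambda x, y, z: (x + 2 * y + z + 2) >> 2  # 3-tap filter with rounding
    return to_block({
        "aj": f2(M, A), "bk": f2(A, B), "cl": f2(B, C), "d": f2(C, D),
        "en": f3(I, M, A), "fo": f3(M, A, B), "gp": f3(A, B, C), "h": f3(B, C, D),
        "i": f3(M, I, J), "m": f3(I, J, K),
    })
```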
As shown in FIG. 9G, similarly to the modes 4 and 5, out of the 13 pixels A to M, the mode 6 is of generating a prediction value by the pixels A to D and I to M adjacent to the 4×4 pixel block including the pixels a to p. The mode 6 is applied only when these pixels A to D and I to M are all considered significant, and the prediction values are generated for the pixels a to p by the following Equation 7.
$$\begin{aligned}
a, g &: (M+I+1) \gg 1 \\
b, h &: (I+2M+A+2) \gg 2 \\
c &: (M+2A+B+2) \gg 2 \\
d &: (A+2B+C+2) \gg 2 \\
e, k &: (I+J+1) \gg 1 \\
f, l &: (M+2I+J+2) \gg 2 \\
i, o &: (J+K+1) \gg 1 \\
j, p &: (I+2J+K+2) \gg 2 \\
m &: (K+L+1) \gg 1 \\
n &: (J+2K+L+2) \gg 2
\end{aligned} \tag{7}$$
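Equation 7 can be sketched in Python in the same table-driven manner. The helper to_block, the function name, and the list-of-rows layout (a b c d / e f g h / i j k l / m n o p) are illustrative assumptions.

```python
def to_block(table):
    """Expand {"ag": v, ...} into rows laid out as abcd / efgh / ijkl / mnop."""
    px = {name: v for names, v in table.items() for name in names}
    return [[px[n] for n in row] for row in ("abcd", "efgh", "ijkl", "mnop")]


def predict_mode6(A, B, C, D, I, J, K, L, M):
    """Mode 6 of the intra 4x4 prediction, written directly from Equation 7.

    D is accepted for a uniform signature but does not appear in Equation 7.
    """
    f2 = lambda x, y: (x + y + 1) >> 1             # 2-pixel average with rounding
    f3 = lambda x, y, z: (x + 2 * y + z + 2) >> 2  # 3-tap filter with rounding
    return to_block({
        "ag": f2(M, I), "bh": f3(I, M, A), "c": f3(M, A, B), "d": f3(A, B, C),
        "ek": f2(I, J), "fl": f3(M, I, J),
        "io": f2(J, K), "jp": f3(I, J, K),
        "m": f2(K, L), "n": f3(J, K, L),
    })
```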
As shown in FIG. 9H, out of the 13 pixels A to M, the mode 7 is of generating a prediction value by the four pixels A to D located above the 4×4 pixel block including the pixels a to p, and the three pixels E to G subsequent to the four pixels A to D. The mode 7 is applied only when these pixels A to D and I to M are all considered significant, and the prediction values are generated for the pixels a to p by the following Equation 8.
$$\begin{aligned}
a &: (A+B+1) \gg 1 \\
b, i &: (B+C+1) \gg 1 \\
c, j &: (C+D+1) \gg 1 \\
d, k &: (D+E+1) \gg 1 \\
l &: (E+F+1) \gg 1 \\
e &: (A+2B+C+2) \gg 2 \\
f, m &: (B+2C+D+2) \gg 2 \\
g, n &: (C+2D+E+2) \gg 2 \\
h, o &: (D+2E+F+2) \gg 2 \\
p &: (E+2F+G+2) \gg 2
\end{aligned} \tag{8}$$
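A Python transcription of Equation 8 is sketched below. The helper to_block, the function name, and the list-of-rows layout (a b c d / e f g h / i j k l / m n o p) are illustrative assumptions; only the pixels A to G enter the computation.

```python
def to_block(table):
    """Expand {"bi": v, ...} into rows laid out as abcd / efgh / ijkl / mnop."""
    px = {name: v for names, v in table.items() for name in names}
    return [[px[n] for n in row] for row in ("abcd", "efgh", "ijkl", "mnop")]


def predict_mode7(A, B, C, D, E, F, G):
    """Mode 7 of the intra 4x4 prediction, written directly from Equation 8."""
    f2 = lambda x, y: (x + y + 1) >> 1             # 2-pixel average with rounding
    f3 = lambda x, y, z: (x + 2 * y + z + 2) >> 2  # 3-tap filter with rounding
    return to_block({
        "a": f2(A, B), "bi": f2(B, C), "cj": f2(C, D), "dk": f2(D, E), "l": f2(E, F),
        "e": f3(A, B, C), "fm": f3(B, C, D), "gn": f3(C, D, E), "ho": f3(D, E, F),
        "p": f3(E, F, G),
    })
```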
As shown in FIG. 9I, out of the 13 pixels A to M, the mode 8 is of generating a prediction value by the four pixels I to L adjacent to the left of the 4×4 pixel block. The mode 8 is applied only when these pixels A to D and I to M are all considered significant, and the prediction values are generated for the pixels a to p by the following Equation 9.
$$\begin{aligned}
a &: (I+J+1) \gg 1 \\
b &: (I+2J+K+2) \gg 2 \\
c, e &: (J+K+1) \gg 1 \\
d, f &: (J+2K+L+2) \gg 2 \\
g, i &: (K+L+1) \gg 1 \\
h, j &: (K+3L+2) \gg 2 \\
k, l, m, n, o, p &: L
\end{aligned} \tag{9}$$
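Equation 9 can be sketched in Python as follows. The helper to_block, the function name, and the list-of-rows layout (a b c d / e f g h / i j k l / m n o p) are illustrative assumptions. In the test data used during drafting, the lower right part of the block simply repeats the pixel L, as the last line of Equation 9 states.

```python
def to_block(table):
    """Expand {"ce": v, ...} into rows laid out as abcd / efgh / ijkl / mnop."""
    px = {name: v for names, v in table.items() for name in names}
    return [[px[n] for n in row] for row in ("abcd", "efgh", "ijkl", "mnop")]


def predict_mode8(I, J, K, L):
    """Mode 8 of the intra 4x4 prediction, written directly from Equation 9."""
    f2 = lambda x, y: (x + y + 1) >> 1             # 2-pixel average with rounding
    f3 = lambda x, y, z: (x + 2 * y + z + 2) >> 2  # 3-tap filter with rounding
    return to_block({
        "a": f2(I, J), "b": f3(I, J, K),
        "ce": f2(J, K), "df": f3(J, K, L),
        "gi": f2(K, L), "hj": (K + 3 * L + 2) >> 2,
        "klmnop": L,
    })
```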
In the intra 16×16 prediction mode, as shown in FIG. 10, for a block B including the 16×16 pixels P(0,0) to P(15,15) for which prediction values are generated, the prediction pixels are the pixels P(0,−1) to P(15,−1) and P(−1,0) to P(−1,15) adjacent to above and to the left of the block B. By these prediction pixels, prediction values are generated.
In the intra 16×16 prediction mode, as shown in FIG. 11, prediction modes of 0 to 3 are defined. The mode 0 is applied only when the pixels adjacent to above the block B, i.e., P(0,−1) to P(15,−1) (P(x,−1); x,y=−1 to 15), are considered significant. As indicated by the following Equation 10, prediction values are generated for the pixels P(0,0) to P(15,15) configuring the block B. As shown in FIG. 12A, by the pixel values of pixels P(0,−1) to P(15,−1) adjacent to the block B, the prediction values are generated for the pixels vertically arranged in a row in the block B.
$$\mathrm{Pred}(x,y) = P(x,-1); \quad x,y = 0 \ldots 15 \tag{10}$$
The mode 1 is applied only when the pixels adjacent to the left of the block B, i.e., P(−1,0) to P(−1,15) (P(−1,y); x,y=−1 to 15), are considered significant. As indicated by the following Equation 11, prediction values are generated for the pixels P(0,0) to P(15,15) configuring the block B. As shown in FIG. 12B, by the pixel values of pixels P(−1,0) to P(−1,15) adjacent to the block B, the prediction values are generated for the pixels horizontally arranged in a row in the block B.
$$\mathrm{Pred}(x,y) = P(-1,y); \quad x,y = 0 \ldots 15 \tag{11}$$
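Equations 10 and 11 can be sketched in Python as below. The function names are hypothetical, and the prediction is returned as a list of 16 rows indexed as pred[y][x], which is an assumed layout.

```python
def predict_16x16_mode0(top):
    """Equation 10: Pred(x, y) = P(x, -1), with top[x] standing for P(x, -1)."""
    return [[top[x] for x in range(16)] for _ in range(16)]


def predict_16x16_mode1(left):
    """Equation 11: Pred(x, y) = P(-1, y), with left[y] standing for P(-1, y)."""
    return [[left[y]] * 16 for y in range(16)]
```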
In the mode 2, when the pixels adjacent to above and to the left of the block B, i.e., P(0,−1) to P(15,−1), and P(−1,0) to P(−1,15) are all considered significant, the prediction values are calculated by the following Equation 12. As shown in FIG. 12C, an average value of the pixel values of the pixels P(0,−1) to P(15,−1), and P(−1,0) to P(−1,15) is used as a basis to generate a prediction value for each of the pixels configuring the block B.
$$\mathrm{Pred}(x,y) = \left[ \sum_{x'=0}^{15} P(x',-1) + \sum_{y'=0}^{15} P(-1,y') + 16 \right] \gg 5 \quad \text{with } x,y = 0 \ldots 15 \tag{12}$$
In the mode 2, out of the pixels adjacent to above and to the left of the block B, i.e., P(0,−1) to P(15,−1) and P(−1,0) to P(−1,15), when the pixels adjacent to above, i.e., P(0,−1) to P(15,−1), are not considered significant, Equation 13 is applied so that a prediction value is calculated for each of the pixels configuring the block B using an average value of the adjacent pixels on the significant side, i.e., the left side. When the pixels P(−1,0) to P(−1,15) adjacent to the left are not considered significant, Equation 14 is applied so that a prediction value is also calculated for each of the pixels configuring the block B using an average value of the adjacent pixels on the significant side, i.e., the upper side. When none of the pixels adjacent to above or to the left of the block B, i.e., P(0,−1) to P(15,−1) and P(−1,0) to P(−1,15), are considered significant, a prediction value is set to 128.
$$\mathrm{Pred}(x,y) = \left[ \sum_{y'=0}^{15} P(-1,y') + 8 \right] \gg 4 \quad \text{with } x,y = 0 \ldots 15 \tag{13}$$
$$\mathrm{Pred}(x,y) = \left[ \sum_{x'=0}^{15} P(x',-1) + 8 \right] \gg 4 \quad \text{with } x,y = 0 \ldots 15 \tag{14}$$
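The four cases of the mode 2 of the intra 16×16 prediction (Equations 12 to 14 and the fixed value 128) can be gathered in one Python sketch. Passing None for a group of pixels that is not considered significant is an illustrative convention, as is the function name.

```python
def predict_16x16_mode2(top=None, left=None):
    """Mode 2 of the intra 16x16 prediction.

    top[x] stands for P(x, -1) and left[y] for P(-1, y); either may be None
    when the corresponding pixels are not considered significant.
    """
    if top is not None and left is not None:
        dc = (sum(top) + sum(left) + 16) >> 5   # Equation 12
    elif left is not None:
        dc = (sum(left) + 8) >> 4               # Equation 13 (upper pixels missing)
    elif top is not None:
        dc = (sum(top) + 8) >> 4                # Equation 14 (left pixels missing)
    else:
        dc = 128
    return [[dc] * 16 for _ in range(16)]
```

The shift by 5 divides by the 32 contributing pixels and the shift by 4 by 16, with the added constant supplying rounding in each case.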
The mode 3 is applied only when the pixels adjacent to above and to the left of the block B, i.e., P(0,−1) to P(15,−1) and P(−1,0) to P(−1,15), are all considered significant, and prediction values are calculated by the following Equation 15. As shown in FIG. 12D, a prediction value is generated for each of the pixels by computation in the diagonal direction. Herein, Clip1 denotes clipping for a value range of 0 to 255.
$$\begin{aligned}
\mathrm{Pred}(x,y) &= \mathrm{Clip1}\big( (a + b \cdot (x-7) + c \cdot (y-7) + 16) \gg 5 \big) \\
a &= 16 \cdot \big( P(-1,15) + P(15,-1) \big) \\
b &= (5 \cdot H + 32) \gg 6 \\
c &= (5 \cdot V + 32) \gg 6 \\
H &= \sum_{x=1}^{8} x \cdot \big( P(7+x,-1) - P(7-x,-1) \big) \\
V &= \sum_{y=1}^{8} y \cdot \big( P(-1,7+y) - P(-1,7-y) \big)
\end{aligned} \tag{15}$$
As such, for I, P, and B pictures, the intra prediction device 5 of the coding device 1 receives the image data D1 provided by the image sorting buffer 3, and selects an optimum prediction mode by so-called intra prediction using the reference image information stored in the frame memory 16. For intra coding with the selected prediction mode, a prediction value in this mode is generated using the reference image information, and the resulting value is forwarded to the subtracter 4. The prediction mode is also notified to the reverse coding device 10 for transmission together with the coded data D4. In response, the intra prediction device 23 of the decoding device 20 calculates a prediction value using the prediction mode information provided together with the coded data D4, and outputs the resulting value to the adder 27.
With inter coding, on the other hand, as shown in FIG. 13, multiple reference frames are used, and any of a plurality of reference frames Ref is selected for a process target frame Org for motion compensation. Motion compensation can thus be performed with high accuracy, and the data compression efficiency can be increased, even if the portion corresponding to the block for motion compensation is hidden in the immediately preceding frame, or even if all pixel values change temporarily in the immediately preceding frame because of a flash, for example.
As shown in (A1) of FIG. 14, a block is subjected to motion compensation with reference to a block of 16×16 pixels, and tree-structured motion compensation is supported by a variable MCBlock Size. As shown in (A2) to (A4) of FIG. 14, the block of 16×16 pixels can be divided into two in the horizontal direction, the vertical direction, or both. The resulting sub-blocks of 16×8 pixels, 8×16 pixels, and 8×8 pixels can each be set separately with a motion vector and a reference frame for motion compensation. As shown in (B1) to (B4) of FIG. 14, the sub-block of 8×8 pixels can be further divided into blocks of 8×8 pixels, 8×4 pixels, 4×8 pixels, and 4×4 pixels. These blocks can likewise be set separately with a motion vector and a reference frame for motion compensation.
The motion compensation is performed using a 6-tap FIR (Finite Impulse Response) filter with an accuracy of ¼ pixel. In FIG. 15, pixels denoted by uppercase characters are at integer-pixel positions, and pixels denoted by lowercase characters are at ½-pixel or ¼-pixel positions. With motion compensation, first of all, the tap inputs of the 6-tap FIR filter are weighted by the values 1, −5, 20, 20, −5, and 1, and are then subjected to the computation of the following Equation 16. In this manner, the pixel values b and h of pixels with an accuracy of ½ pixel are calculated between adjacent pixels in the horizontal or vertical direction.
$$
\begin{aligned}
b_1 &= E - 5F + 20G + 20H - 5I + J \\
h_1 &= A - 5C + 20G + 20M - 5R + T \\
b &= \mathrm{Clip1}\big((b_1 + 16) \gg 5\big) \\
h &= \mathrm{Clip1}\big((h_1 + 16) \gg 5\big)
\end{aligned}
\tag{16}
$$
Using the values b and h calculated as such, the tap inputs of the 6-tap FIR filter are again weighted by the values 1, −5, 20, 20, −5, and 1, and are then subjected to the computation of the following Equation 17. In this manner, a pixel value j of the pixel located at the ½-pixel position in both the horizontal and vertical directions is calculated.
$$
\begin{aligned}
j_1 &= cc - 5\,dd + 20h + 20m - 5\,ee + ff \quad \text{or} \\
j_1 &= aa - 5\,bb + 20b + 20s - 5\,gg + hh \\
j &= \mathrm{Clip1}\big((j_1 + 512) \gg 10\big)
\end{aligned}
\tag{17}
$$
By linear interpolation using the pixel values b, h, and j calculated as such with an accuracy of ½ pixel, pixel values a, d, e, and others are calculated with an accuracy of ¼ pixel. Note that the normalization for the weighted addition in Equations 16 and 17 is executed only after the interpolation has been completed in both the vertical and horizontal directions.
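The interpolation of Equations 16 and 17 can be illustrated by the following minimal Python sketch. The function names are hypothetical, 8-bit samples are assumed for Clip1, the inputs to Equation 17 are assumed to be the unnormalized 6-tap sums (in line with the note on deferred normalization), and the rounding average used for the ¼-pixel value is an assumption taken from common H.264 practice rather than from the text above.

```python
# Illustrative sketch only; names and the 8-bit sample range are assumptions.
TAPS = (1, -5, 20, 20, -5, 1)  # tap weights stated for the 6-tap FIR filter


def clip1(value, max_value=255):
    """Clamp a value to the sample range [0, max_value]."""
    return max(0, min(max_value, value))


def six_tap(samples):
    """Unnormalized weighted sum, e.g. b1 = E - 5F + 20G + 20H - 5I + J."""
    if len(samples) != 6:
        raise ValueError("the filter needs exactly six tap inputs")
    return sum(w * s for w, s in zip(TAPS, samples))


def half_pel(samples):
    """Half-pixel value b or h from six integer pixels (Equation 16)."""
    return clip1((six_tap(samples) + 16) >> 5)


def center_half_pel(intermediate_sums):
    """Value j from six unnormalized intermediate sums (Equation 17)."""
    return clip1((six_tap(intermediate_sums) + 512) >> 10)


def quarter_pel(p, q):
    """Quarter-pixel value as the rounded average of two neighbours (assumed rounding)."""
    return (p + q + 1) >> 1
```

On a flat area the filter reproduces the input level, since the tap weights sum to 32 and the normalization shifts by 5 bits (or by 10 bits after two passes).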
As shown in FIG. 16, for a color-difference signal, a pixel value with fractional-pixel accuracy is calculated directly from pixels of integer accuracy by the computation of the following Equation 18, i.e., by linear interpolation. In FIG. 16, the characters dx and dy denote interpolation coefficients in the horizontal and vertical directions, respectively, and the characters A to D each denote a pixel value.
$$
v = \frac{(s - d_x)(s - d_y)A + d_x (s - d_y) B + (s - d_x) d_y C + d_x d_y D + s^2/2}{s^2}
\tag{18}
$$
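As a hedged illustration of Equation 18, the following Python sketch evaluates the linear interpolation with integer arithmetic. The function name is hypothetical, and the default s = 8 is an assumption (the value of s is not stated in this passage); the pixel layout assumed is A and B on the upper row and C and D on the lower row, which follows from the weights in the equation.

```python
def chroma_sample(a, b, c, d, dx, dy, s=8):
    """Bilinear color-difference sample per Equation 18.

    a, b, c, d: integer-position pixel values A to D.
    dx, dy: interpolation coefficients, assumed to lie in 0 .. s-1.
    s: grid resolution; s=8 is an assumption for illustration.
    """
    if not (0 <= dx < s and 0 <= dy < s):
        raise ValueError("dx and dy must lie in the range 0 .. s-1")
    numerator = ((s - dx) * (s - dy) * a
                 + dx * (s - dy) * b
                 + (s - dx) * dy * c
                 + dx * dy * d
                 + (s * s) // 2)  # rounding offset s^2 / 2
    return numerator // (s * s)
```

With dx = dy = 0 the function returns A unchanged, and on a flat area it returns the common level, which is a quick sanity check of the weights.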
As such, in the P and B pictures, the motion prediction/compensation device 6 in the coding device 1 uses a plurality of prediction frames to detect a motion vector for every macroblock and sub-block with an accuracy of ¼ pixel. Herein, the prediction frames are those defined by the levels and profiles of the coding process, and the detection uses the reference image information stored in the frame memory 16. From the detection results, the motion vector and the reference frame showing the smallest prediction residual are selected. Using the reference frame selected as such, the reference image information stored in the frame memory 16 is subjected to motion compensation with an accuracy of ¼ pixel so that so-called inter prediction is executed. For inter coding by such inter prediction, the pixel value resulting from the motion compensation is forwarded to the subtracter 4 as a prediction value, and the reference frame, the block, and the motion vector are notified to the reverse coding device 10 for transmission together with the coded data D4.
On the other hand, the motion prediction/compensation device 24 of the decoding device 20 subjects the reference image information to motion compensation with the accuracy of ¼ pixel, and generates a prediction value. Such motion compensation is applied using the reference frame and the motion vector transmitted together with the coded data D4, and the reference image information is the one stored in the frame memory 16. In the P and B pictures, the coding device 1 selects either intra coding or inter coding based on the result of the intra prediction derived by the intra prediction device 5, and the result of the inter prediction derived by the motion prediction/compensation device 6, for example. Based on the selection result, the intra prediction device 5 and the motion prediction/compensation device 6 output prediction values derived by intra and inter prediction, respectively.
As shown in FIG. 17, for coding an interlaced video signal, the coding process with the AVC defines a pair of macroblocks adjacent to each other in the vertical direction in a frame, i.e., a macroblock pair. The macroblock pair can be subjected to the coding process in a field mode or a frame mode.
On the other hand, the rate control by the rate control device 9 is exercised by the technique of TM5 (MPEG-2 Test Model 5). Here, the rate control with TM5 is executed by controlling a quantization scale of the quantization device 8 by execution of the process procedure shown in FIG. 18. That is, in the rate control device 9, when the process is started, the procedure goes to step 1. In step 1, as to pictures configuring a GOP, a target code amount is calculated for any not-yet-processed picture so that a bit allocation is made to the picture. Herein, with TM5, the code amount for allocation to each of the pictures is calculated based on the following two assumptions.
The first assumption is that the product of the average quantization scale and the generated code amount remains constant for each picture type as long as the scene does not change. Herein, the average quantization scale is the one used for coding the picture. Accordingly, under the rate control, the parameters Xi, Xp, and Xb (global complexity measures), which represent the screen complexity, are updated by the following Equation 19 for every picture type after each picture is coded. Under the rate control with TM5, these parameters Xi, Xp, and Xb are used to estimate the relationship between the quantization scale code and the generated code amount for the coding process of the next picture.
$$
X_i = S_i Q_i, \qquad X_p = S_p Q_p, \qquad X_b = S_b Q_b
\tag{19}
$$
Here, the subscripts i, p, and b of the variables in Equation 19 denote the I picture, the P picture, and the B picture, respectively. The characters Si, Sp, and Sb each denote the coded bit amount generated by the coding process applied to the picture, and the characters Qi, Qp, and Qb each denote the average quantization scale code used at the time of coding the picture. The initial values of the parameters Xi, Xp, and Xb are calculated by the following Equation 20 using the target bit rate bit_rate [bit/sec].
$$
X_i = \frac{160 \times \mathrm{bit\_rate}}{115}, \qquad
X_p = \frac{60 \times \mathrm{bit\_rate}}{115}, \qquad
X_b = \frac{42 \times \mathrm{bit\_rate}}{115}
\tag{20}
$$
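The handling of the complexity parameters in Equations 19 and 20 can be sketched as follows in Python. The function names and the dictionary keyed by picture type are hypothetical choices made for illustration only.

```python
def initial_complexity(bit_rate):
    """Initial values of Xi, Xp, and Xb from the target bit rate (Equation 20)."""
    return {
        "I": 160 * bit_rate / 115,
        "P": 60 * bit_rate / 115,
        "B": 42 * bit_rate / 115,
    }


def update_complexity(generated_bits, avg_quant_scale):
    """Complexity X = S * Q of the picture just coded (Equation 19)."""
    return generated_bits * avg_quant_scale


# Example: start from the initial values, then refresh the I-picture entry
# after an I picture has produced 400,000 bits at an average scale of 10.
complexity = initial_complexity(1_150_000)
complexity["I"] = update_complexity(400_000, 10)
```

Only the entry of the picture type just coded is refreshed, so each parameter tracks the most recent picture of its own type.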
The second assumption is that the overall image quality is typically maximized when the following Equation 21 is satisfied by the ratios Kp and Kb. The ratio Kp is that of the quantization scale code of a P picture to the quantization scale code of an I picture, and the ratio Kb is that of the quantization scale code of a B picture to the quantization scale code of the I picture.
$$
K_p = 1.0, \qquad K_b = 1.4
\tag{21}
$$
That is, this assumption means that the overall image quality is maximized by keeping the quantization scale of the B picture at 1.4 times the quantization scale of the I and P pictures. The B picture is quantized more coarsely than the I and P pictures so that the code amount allocated to the B picture is saved. The code amount saved in this way is allocated to the I and P pictures so that their image quality is improved. This in turn improves the image quality of the B pictures, which use the I and P pictures for reference, so that the overall image quality can be maximized.
As such, the rate control device 9 calculates allocation bit amounts Ti, Tp, and Tb for each of the pictures by the computation of the following Equation 22. Note here that the characters Np and Nb denote the number of P and B pictures, respectively, which are not yet coded in a GOP being a process target.
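$$
\begin{aligned}
T_i &= \max\left\{ \frac{R}{1 + \dfrac{N_p X_p}{X_i K_p} + \dfrac{N_b X_b}{X_i K_b}},\ \frac{\mathrm{bit\_rate}}{8 \times \mathrm{picture\_rate}} \right\} \\
T_p &= \max\left\{ \frac{R}{N_p + \dfrac{N_b K_p X_b}{K_b X_p}},\ \frac{\mathrm{bit\_rate}}{8 \times \mathrm{picture\_rate}} \right\} \\
T_b &= \max\left\{ \frac{R}{N_b + \dfrac{N_p K_b X_p}{K_p X_b}},\ \frac{\mathrm{bit\_rate}}{8 \times \mathrm{picture\_rate}} \right\}
\end{aligned}
\tag{22}
$$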
Based on the two assumptions described above, the rate control device 9 estimates the code amount to be generated for each of the pictures. At this time, for any picture of a picture type different from that of the target of code amount allocation, the rate control device 9 estimates how many times larger the code amount generated by that picture is than the code amount of the target picture under the conditions of image quality maximization. From this result, it estimates how many pictures of the target picture type the not-yet-coded pictures in the GOP are equivalent to, and from that the bit amount for allocation to each of the pictures is calculated. In this calculation, the rate control device 9 sets a lower limit on the allocated bit amount in consideration of the code amount that is constantly needed for headers and the like.
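The bit allocation of Equation 22 can be sketched in Python as below. The function name, the argument names, and the dictionary of complexity values keyed by picture type are hypothetical; the constants follow Equation 21.

```python
K_P = 1.0  # ratio Kp from Equation 21
K_B = 1.4  # ratio Kb from Equation 21


def target_bits(picture_type, remaining_bits, n_p, n_b, x, bit_rate, picture_rate):
    """Target bit amount Ti, Tp, or Tb for the next picture (Equation 22).

    remaining_bits: R, bits still available for the GOP.
    n_p, n_b: numbers of not-yet-coded P and B pictures in the GOP.
    x: dict with the complexity values Xi, Xp, Xb under keys "I", "P", "B".
    """
    lower_limit = bit_rate / (8 * picture_rate)
    if picture_type == "I":
        denom = 1 + n_p * x["P"] / (x["I"] * K_P) + n_b * x["B"] / (x["I"] * K_B)
    elif picture_type == "P":
        denom = n_p + n_b * K_P * x["B"] / (K_B * x["P"])
    elif picture_type == "B":
        denom = n_b + n_p * K_B * x["P"] / (K_P * x["B"])
    else:
        raise ValueError("picture_type must be 'I', 'P', or 'B'")
    return max(remaining_bits / denom, lower_limit)
```

Each denominator expresses the not-yet-coded pictures as an equivalent number of pictures of the target type, which is the estimation described above; the second argument of max is the lower limit.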
As such, every time the coding of a picture is completed, the rate control device 9 performs the computation of the following Equation 23, and corrects the bit amount R for allocation to the not-yet-coded picture(s) in the GOP using the actually-generated code amount S.
$$
R = R - S_{i,p,b}
\tag{23}
$$
For the picture at the head of the GOP, instead of the computation of Equation 23, the following Equation 24 is used to calculate the bit amount R for allocation to the not-yet-coded picture(s) in the GOP. In Equation 24, the character N denotes the number of pictures in a GOP, and the character R on the right side denotes the bit amount left over from the preceding GOP, i.e., value 0 at the head of the sequence.
$$
R = \frac{\mathrm{bit\_rate} \times N}{\mathrm{picture\_rate}} + R
\tag{24}
$$
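The bookkeeping of the bit amount R in Equations 23 and 24 can be sketched as follows. The function names are hypothetical, and the numbers in the example are chosen only for illustration.

```python
def start_gop(carry_over, bit_rate, n_pictures, picture_rate):
    """Bit amount R at the head of a GOP (Equation 24).

    carry_over is the R left from the preceding GOP (0 at the head of the sequence).
    """
    return bit_rate * n_pictures / picture_rate + carry_over


def after_picture(remaining_bits, generated_bits):
    """Bit amount R after coding one picture (Equation 23)."""
    return remaining_bits - generated_bits


# Example: a 15-picture GOP at 4 Mbps and 30 pictures per second.
r = start_gop(0, 4_000_000, 15, 30)
r = after_picture(r, 350_000)  # the first picture generated 350,000 bits
```

Because the remainder is carried into the next GOP, a GOP that overspends leaves a negative carry-over and reduces the budget of the following GOP.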
Under the rate control with TM5, the procedure goes to step 2, and the rate control is exercised using virtual buffer control. With such rate control, three types of a virtual buffer are set separately to each of the picture types to establish a matching between the bit amounts Ti, Tp, and Tb calculated in step 1 for allocation to each of the pictures, and the actually-generated code amount. Based on the capacities of the virtual buffers, the quantization scale of the quantization device 8 is calculated by feedback control on a macroblock basis.
Calculated first is the occupancy of these three types of virtual buffer by the computation of the following Equation 25. Herein, the characters d0i, d0p, and d0b denote the initial occupancy of the respective virtual buffers, the character Bj, written in Equation 25 as the sum of the per-macroblock bit amounts Bit_n, denotes the bit amount generated from the head of the picture to the j-th macroblock, and MB_cnt denotes the number of macroblocks in a picture.
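$$
\begin{aligned}
d_j^i &= d_0^i + \sum_{n=0}^{j-1} \mathrm{Bit}_n - \frac{T_i \times (j-1)}{\mathrm{MB\_cnt}} \\
d_j^p &= d_0^p + \sum_{n=0}^{j-1} \mathrm{Bit}_n - \frac{T_p \times (j-1)}{\mathrm{MB\_cnt}} \\
d_j^b &= d_0^b + \sum_{n=0}^{j-1} \mathrm{Bit}_n - \frac{T_b \times (j-1)}{\mathrm{MB\_cnt}}
\end{aligned}
\tag{25}
$$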
Based on the calculation result of Equation 25, the quantization scale is calculated for the j-th macroblock by the following Equation 26.
$$
Q_j = \frac{d_j \times 31}{r}
\tag{26}
$$
Herein, a character r denotes a reaction parameter, which controls the feedback response. With TM5, the reaction parameter r, and the initial values d0i, d0p, and d0b of the virtual buffers at the head of the sequence are calculated by the following Equation 27.
$$
r = \frac{2 \times \mathrm{bit\_rate}}{\mathrm{picture\_rate}}, \qquad
d_0^i = \frac{10 \times r}{31}, \qquad
d_0^p = K_p\, d_0^i, \qquad
d_0^b = K_b\, d_0^i
\tag{27}
$$
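The virtual buffer control of Equations 25 to 27 can be sketched in Python as below. The function names and the list of per-macroblock bit counts are hypothetical, and the sum is taken over macroblocks 0 to j−1 exactly as written in Equation 25.

```python
def reaction_parameter(bit_rate, picture_rate):
    """Reaction parameter r (Equation 27)."""
    return 2 * bit_rate / picture_rate


def initial_fullness(r, k_p=1.0, k_b=1.4):
    """Initial occupancies d0i, d0p, d0b of the three virtual buffers (Equation 27)."""
    d0_i = 10 * r / 31
    return {"I": d0_i, "P": k_p * d0_i, "B": k_b * d0_i}


def buffer_fullness(d0, mb_bits, j, target, mb_cnt):
    """Occupancy d_j before coding macroblock j (Equation 25).

    mb_bits[n] is the bit count generated by macroblock n of the current picture.
    """
    return d0 + sum(mb_bits[:j]) - target * (j - 1) / mb_cnt


def quant_scale(d_j, r):
    """Quantization scale Q_j for macroblock j (Equation 26)."""
    return d_j * 31 / r
```

With these initial values the first quantization scale is about 10 for I pictures and about 14 for B pictures, which matches the ratios Kp and Kb of Equation 21. When the generated bits run ahead of the per-macroblock share of the target, the occupancy rises and the quantization becomes coarser.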
Under the rate control with TM5, the procedure goes to step 3. In step 3, the quantization scale derived in step 2 is corrected with consideration given to the visual characteristics, thereby performing optimum quantization with consideration given to the visual characteristics. Herein, the optimum quantization is performed by correcting the quantization scale derived in step 2 based on the activity of each of the macroblocks, i.e., any flat portion where quality deterioration is easily perceived is finely quantized, and any pattern-complicated portion where image deterioration is not relatively easily perceived is coarsely quantized.
Herein, the activity is calculated by the following Equation 28 for every macroblock of 16×16 pixels. The calculation uses the pixel values of eight blocks of 8×8 pixels obtained from the macroblock, i.e., the four blocks in the frame DCT mode and the four blocks in the field DCT mode. The resulting value indicates the smoothness of the brightness level in the corresponding macroblock.
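$$
\begin{aligned}
\mathrm{act}_j &= 1 + \min_{sblk = 1, \ldots, 8} \left( \mathrm{var\_sblk} \right) \\
\mathrm{var\_sblk} &= \frac{1}{64} \sum_{k=1}^{64} \left( P_k - \bar{P} \right)^2 \\
\bar{P} &= \frac{1}{64} \sum_{k=1}^{64} P_k
\end{aligned}
\tag{28}
$$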
In Equation 28, the character Pk denotes a pixel value in a brightness signal block of the original image. Equation 28 takes the minimum value so that, if the macroblock has any flat portion, quantization is performed with finer steps to prevent image quality deterioration.
Using the following Equation 29, the rate control device 9 normalizes the activity calculated by Equation 28, and derives a normalized activity Nactj whose value falls in the range from 0.5 to 2. Herein, avg_act denotes the average value of the activity actj in the previously coded picture.
$$
\mathrm{Nact}_j = \frac{2 \times \mathrm{act}_j + \mathrm{avg\_act}}{\mathrm{act}_j + 2 \times \mathrm{avg\_act}}
\tag{29}
$$
Using the normalized activity Nactj, the computation of the following Equation 30 is performed, and the quantization scale Qj derived in step 2 is corrected so as to control the quantization device 8.
$$
\mathrm{mquant}_j = Q_j \times \mathrm{Nact}_j
\tag{30}
$$
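The activity-based correction of Equations 28 to 30 can be sketched in Python as below. The function names are hypothetical, and each sub-block is assumed to be passed as a flat list of 64 brightness values; how the eight sub-blocks are extracted from the macroblock in frame and field arrangement is left outside the sketch.

```python
def variance(block):
    """Variance of one 8x8 brightness block given as 64 values (Equation 28)."""
    if len(block) != 64:
        raise ValueError("a sub-block must hold exactly 64 pixel values")
    mean = sum(block) / 64
    return sum((p - mean) ** 2 for p in block) / 64


def activity(sub_blocks):
    """Activity act_j: 1 plus the smallest variance among the eight sub-blocks."""
    if len(sub_blocks) != 8:
        raise ValueError("four frame-mode and four field-mode blocks are expected")
    return 1 + min(variance(b) for b in sub_blocks)


def normalized_activity(act, avg_act):
    """Normalized activity Nact_j, bounded between 0.5 and 2 (Equation 29)."""
    return (2 * act + avg_act) / (act + 2 * avg_act)


def mquant(q_j, n_act):
    """Corrected quantization scale mquant_j (Equation 30)."""
    return q_j * n_act
```

A macroblock containing a single flat sub-block receives the minimum activity of 1, so its quantization scale is reduced toward half of Qj when the average activity of the picture is large.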
As such, under the rate control with TM5, based on the two assumptions described above, the code amount is allocated to each of the pictures, and by extension, to each of the macroblocks. The quantization scale is then controlled under the feedback control with which the allocated code amounts are sequentially corrected based on the actually-generated code amount so that the coding process is successively executed.
As to such rate control with TM5 in step 2 of FIG. 18, Patent Document 1 (JP-A-2003-61096) describes a method of improving the image quality through control over the code amount to be allocated to each of the macroblocks using the residual of a motion vector.
Under the rate control in step 3 with TM5, the quantization scale derived in step 2 is corrected using the activity of a macroblock, and the quantization scale is corrected with consideration given to the visual characteristics so that the image quality is improved.
With such a method, I pictures are indeed increased in image quality with consideration fully given to the visual characteristics. With inter prediction for P and B pictures, however, there remains the problem that the image quality cannot always be improved appropriately. More specifically, even in a pattern-complicated portion where quality deterioration is normally not easily noticed, the deterioration becomes conspicuous and is perceived if the portion is not in motion. As a result, with the process in step 3 of the previous method with TM5, quality deterioration is easily perceived in letter portions such as captions.
It is thus desirable to provide a coding device, a coding method, a program of the coding method, and a recording medium recorded with the program of the coding method with which, with consideration given to the visual characteristics, the image quality can be improved beyond that of the previous technique.