1. Field of the Invention
The present invention relates to a method and apparatus for coding image information, a method and apparatus for decoding image information, a method and apparatus for coding and decoding image information, and a system for coding and transmitting image information, for use in receiving, via a network medium such as satellite broadcasting, cable television, or the Internet, image information (bit stream) compressed by means of an orthogonal transform such as a discrete cosine transform or a Karhunen-Loeve transform and motion compensation according to the MPEG (Moving Picture Experts Group) standard or the standard H.26x, or for use in processing image information on a storage medium such as an optical disk, a magnetic disk, or a flash memory.
2. Description of the Related Art
In recent years, techniques of transmitting or storing digital image information in a highly compressed form have been popular in various apparatuses used in information distribution such as broadcasting and also in home use apparatuses. In a typical technique based on the MPEG standard, image information is compressed using redundancy of the image information by means of an orthogonal transform such as a discrete cosine transform and motion compensation.
MPEG2 (ISO/IEC 13818-2) is a standard for general-purpose image information coding. The MPEG2 standard is designed to deal with image information in various forms and fashions such as an interlaced image, a sequentially scanned image, a standard-resolution image, and a high-resolution image, and MPEG2 is employed in a wide range of applications including professional applications and consumer applications. The MPEG2 compression scheme allows an interlaced standard-resolution image with 720×480 pixels to be converted into a compressed image at a bit rate of 4 to 8 Mbps and an interlaced high-resolution image with 1920×1088 pixels to be converted into a compressed image at a bit rate of 18 to 22 Mbps, with a high compression ratio while maintaining high image quality.
The MPEG2 standard has been designed to code image information with high quality for use mainly in broadcasting, and the MPEG2 standard does not support coding at lower bit rates (higher compression rates) than are supported by the MPEG1 standard. That is, coding with very high compression ratios is not supported by the MPEG2 standard. However, with increasing popularity of portable terminals, there is an increasing need for coding with high compression ratios at low bit rates. To meet such a need, the MPEG4 standard has been established. The image information coding scheme based on MPEG4 was adopted as an international standard (ISO/IEC 14496-2) in December 1998.
In recent years, work for establishing the H.26L standard (ITU-T Q6/16 VCEG) for coding of image information for use in video conferences has been done. It is known that the H.26L standard provides high coding efficiency compared with the conventional coding schemes such as MPEG2 or MPEG4 coding, although H.26L needs a greater amount of computation in coding and decoding. As one of the activities associated with MPEG4, efforts are now being made to establish a higher-compression coding standard (Joint Model of Enhanced-Compression Video Coding) based on H.26L, which will support some functions which are not supported by the H.26L standard.
Referring to FIG. 19, a conventional image information coding apparatus using an orthogonal transform such as a discrete cosine transform or a Karhunen-Loeve transform and motion compensation is described below.
As shown in FIG. 19, the conventional image information coding apparatus 201 includes an analog-to-digital converter 211, a frame rearrangement buffer 212, an adder 213, an orthogonal transformer 214, a quantizer 215, a lossless coder 216, a storage buffer 217, a dequantizer 218, an inverse orthogonal transformer 219, a frame memory 220, a motion prediction compensator 221, and a rate controller 222.
In FIG. 19, the analog-to-digital converter 211 converts an input image signal into a digital signal. The frame rearrangement buffer 212 rearranges frames depending on the GOP (Group of Pictures) structure of the compressed image information output from the image information coding apparatus 201. When the frame rearrangement buffer 212 receives a frame to be intra-coded, the frame rearrangement buffer 212 supplies the image information of the entire frame to the orthogonal transformer 214. The orthogonal transformer 214 performs an orthogonal transform such as a discrete cosine transform or a Karhunen-Loeve transform on the image information and supplies resultant transform coefficients to the quantizer 215. The quantizer 215 quantizes the transform coefficients received from the orthogonal transformer 214.
The lossless coder 216 performs lossless coding by means of variable length coding or arithmetic coding on the quantized transform coefficients and supplies the resultant coded transform coefficients to the storage buffer 217. The storage buffer 217 stores the received coded transform coefficients. The coded transform coefficients are output as compressed image information from the storage buffer 217.
The behavior of the quantizer 215 is controlled by the rate controller 222. The quantizer 215 also supplies the quantized transform coefficients to the dequantizer 218. The dequantizer 218 dequantizes the received transform coefficients. The inverse orthogonal transformer 219 performs an inverse orthogonal transform on the dequantized transform coefficients thereby producing decoded image information and stores the resultant decoded image information into the frame memory 220.
On the other hand, image information of those frames to be interframe-coded is supplied from the frame rearrangement buffer 212 to the motion prediction compensator 221. At the same time, the motion prediction compensator 221 reads image information to be referred to from the frame memory 220 and performs motion prediction compensation to produce reference image information. The motion prediction compensator 221 supplies the reference image information to the adder 213. The adder 213 produces a difference signal indicating the difference between the image information and the reference image information. At the same time, the motion prediction compensator 221 also supplies the motion vector information to the lossless coder 216.
The lossless coder 216 performs lossless coding by means of variable length coding or arithmetic coding on the motion vector information thereby producing information to be put in a header of the compressed image information. The other processes are performed in a manner similar to that for image information to be intra-coded, and thus they are not described herein in further detail.
Referring to FIG. 20, an image information decoding apparatus corresponding to the above image information coding apparatus 201 is described below.
As shown in FIG. 20, the image information decoding apparatus 241 includes a storage buffer 251, a lossless decoder 252, a dequantizer 253, an inverse orthogonal transformer 254, an adder 255, a frame rearrangement buffer 256, a digital-to-analog converter 257, a motion prediction compensator 258, and a frame memory 259.
In FIG. 20, compressed image information input to the storage buffer 251 is transferred to the lossless decoder 252 after being temporarily stored in the storage buffer 251. The lossless decoder 252 decodes the received compressed image information by means of variable length decoding or arithmetic decoding in accordance with the format of the compressed image information and supplies the resultant quantized transform coefficients to the dequantizer 253. In a case in which the frame supplied to the lossless decoder 252 is an interframe-coded frame, the lossless decoder 252 also decodes the motion vector information described in the header of the compressed image information and supplies the resultant decoded information to the motion prediction compensator 258.
The dequantizer 253 dequantizes the quantized transform coefficients supplied from the lossless decoder 252 and supplies the resultant transform coefficients to the inverse orthogonal transformer 254. The inverse orthogonal transformer 254 performs an inverse orthogonal transform such as an inverse discrete cosine transform or an inverse Karhunen-Loeve transform on the transform coefficients in accordance with the predetermined format of the compressed image information.
In a case in which a given frame is an intra-coded frame, the image information subjected to the inverse orthogonal transform is stored in the frame rearrangement buffer 256. The image information stored in the frame rearrangement buffer 256 is supplied to the digital-to-analog converter 257, which converts the received image information into analog form and outputs the resultant analog image information.
On the other hand, in a case in which the frame being processed is an interframe-coded frame, the motion prediction compensator 258 produces a reference image on the basis of the motion vector information subjected to the lossless decoding process and the image information stored in the frame memory 259. The resultant reference image is supplied to the adder 255. The adder 255 adds the received reference image to the output of the inverse orthogonal transformer 254. The other processes are performed in a similar manner to intraframe-coded frames, and thus they are not described in further detail herein.
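By way of illustration only, the relationship between the coding apparatus 201 and the decoding apparatus 241 can be sketched as follows. The sketch is a deliberately simplified model and not the apparatus itself: each "frame" is a one-dimensional list of 8 samples, the orthogonal transform is an orthonormal DCT, quantization is uniform with a single hypothetical step size `STEP`, motion is assumed to be zero, and the prediction is added back before the locally decoded frame is stored. All function names are assumptions introduced for this example. It shows why the coder contains its own dequantizer and inverse orthogonal transformer: the reference held by the coder stays identical to the one the decoder reconstructs.

```python
import math

N = 8        # samples per toy "frame" (assumption; real codecs use 2-D blocks)
STEP = 4.0   # hypothetical uniform quantization step


def _c(u):
    """Normalization factor of the orthonormal DCT-II."""
    return math.sqrt((1 if u == 0 else 2) / N)


def dct(x):
    """Orthonormal 1-D DCT-II (stands in for the orthogonal transformer)."""
    return [_c(u) * sum(x[n] * math.cos(math.pi * (2 * n + 1) * u / (2 * N))
                        for n in range(N)) for u in range(N)]


def idct(coef):
    """Inverse of dct() (stands in for the inverse orthogonal transformer)."""
    return [sum(_c(u) * coef[u] * math.cos(math.pi * (2 * n + 1) * u / (2 * N))
                for u in range(N)) for n in range(N)]


def reconstruct(levels, reference, step):
    """Decoder path: dequantize, inverse transform, add the reference."""
    residual = idct([level * step for level in levels])
    return [r + p for r, p in zip(residual, reference)]


def encode(frame, reference, step):
    """Coder path: subtract the reference, transform, quantize, decode locally."""
    residual = [f - p for f, p in zip(frame, reference)]
    levels = [round(c / step) for c in dct(residual)]
    return levels, reconstruct(levels, reference, step)


def run(frames, step=STEP):
    """Code a sequence; the first frame is intra (zero reference)."""
    enc_ref = dec_ref = [0.0] * N
    max_error = 0.0
    for frame in frames:
        levels, enc_ref = encode(frame, enc_ref, step)
        dec_ref = reconstruct(levels, dec_ref, step)
        max_error = max(max_error,
                        max(abs(a - b) for a, b in zip(frame, dec_ref)))
    return enc_ref, dec_ref, max_error


FRAMES = [[10, 12, 15, 20, 26, 30, 31, 31],
          [11, 13, 17, 22, 27, 30, 32, 33]]
enc_ref, dec_ref, max_error = run(FRAMES)
```

Because both sides reconstruct from the same quantized levels, `enc_ref` and `dec_ref` coincide after every frame, and the reconstruction error stays bounded by the quantization step rather than accumulating.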
The MPEG2 standard does not include detailed definition of quantization, and only dequantization is defined in detail. Therefore, in practical quantization processing, quantization characteristics are varied by varying some parameters associated with quantization so as to achieve high image quality or accomplish coding so as to reflect visual characteristics. The dequantization process according to the MPEG2 standard is described below.
In quantization of DC coefficients of intra macroblocks according to the MPEG2 video standard, the quantization accuracy can be specified on a picture-by-picture basis. In quantization of the other coefficients, the quantization accuracy of each coefficient can be controlled by multiplying each element of a quantization matrix, which can be specified on a picture-by-picture basis, by a quantization scale which can be specified on a macroblock-by-macroblock basis.
DC coefficients of each intra macroblock are dequantized in accordance with equation (1) described below.
F″[0][0] = intra_dc_mult × QF[0][0]  (1)
In equation (1), F″[0][0] denotes a representative quantization value of a DC coefficient, and QF[0][0] denotes a level number of the representative quantization value of the DC coefficient. intra_dc_mult denotes a value which is defined, as shown in FIG. 21, depending on a parameter intra_dc_precision which can be set to specify the quantization accuracy of DC coefficients on the picture-by-picture basis.
In the MPEG1 standard, intra_dc_precision is allowed only to be 0, and the corresponding accuracy (8 bits) is not high enough to code an image whose luminance level varies gradually while maintaining high image quality. In MPEG2, to avoid the above problem, quantization accuracy for DC coefficients as high as 8 to 11 bits can be specified via intra_dc_precision, as shown in FIG. 21. However, the highest quantization accuracy is allowed only in the 4:2:2 format, and the quantization accuracy is limited to the range from 8 to 10 bits except for the high profile for use in applications which need high image quality.
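Equation (1) can be illustrated by the following minimal sketch. FIG. 21 is not reproduced here; the table below assumes the mapping of intra_dc_precision values 0, 1, 2, and 3 to intra_dc_mult values 8, 4, 2, and 1 (that is, to 8-, 9-, 10-, and 11-bit accuracy) and should be checked against FIG. 21. The function names are hypothetical.

```python
# Assumed contents of FIG. 21: intra_dc_precision -> intra_dc_mult.
INTRA_DC_MULT = {0: 8, 1: 4, 2: 2, 3: 1}


def dc_accuracy_bits(intra_dc_precision):
    """Quantization accuracy of the DC coefficient in bits (8 to 11)."""
    return 8 + intra_dc_precision


def dequantize_intra_dc(qf_dc, intra_dc_precision):
    """Equation (1): F''[0][0] = intra_dc_mult * QF[0][0]."""
    return INTRA_DC_MULT[intra_dc_precision] * qf_dc


# The same representative value 1024 is reached from a finer level grid
# as the precision increases: levels 128, 256, 512, and 1024.
examples = [dequantize_intra_dc(1024 // INTRA_DC_MULT[p], p) for p in range(4)]
```

A higher intra_dc_precision therefore shrinks the spacing between adjacent representative DC values from 8 to 1, which is what makes gradual luminance changes reproducible.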
The other coefficients of each intra macroblock are dequantized in accordance with equation (2) described below.
F″[u][v] = ((2×QF[u][v]+k) × W[w][u][v] × quantiser_scale)/32  (2)
In equation (2), F″[u][v] denotes a representative quantization value of a (u, v)-coefficient and QF[u][v] denotes a level number of the representative quantization value of the (u, v)-coefficient. The value of k is given by the following equation (3).
$$k = \begin{cases} 0 & \text{for intra macroblocks} \\ \operatorname{Sign}(QF[u][v]) & \text{for non-intra macroblocks} \end{cases} \qquad (3)$$
In equation (2) described above, W[w][u][v] denotes a quantization matrix and quantiser_scale denotes a quantization scale. The quantization characteristics are controlled by those parameters.
The parameter k has a value of 1, 0, or −1 in non-intra macroblocks, depending on the sign of QF[u][v]. For example, when QF[u][v] has a value of −2, −1, 0, 1, or 2, F″[u][v] has a value of −5 m, −3 m, 0, 3 m, or 5 m (where m is a constant). Thus, there is a dead zone near 0.
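The dead zone described above can be reproduced with the following minimal sketch of equations (2) and (3). It assumes that the division by 32 truncates toward zero, and it omits the saturation and mismatch control that a complete decoder would also apply. The function names are hypothetical. With W = 16 and quantiser_scale = 2, the constant m equals 1.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def dequantize(qf, w, quantiser_scale, intra):
    """Equation (2) with k from equation (3), for one non-DC coefficient."""
    k = 0 if intra else sign(qf)
    product = (2 * qf + k) * w * quantiser_scale
    # Division by 32 truncating toward zero (assumption of this sketch).
    return sign(product) * (abs(product) // 32)


levels = [-2, -1, 0, 1, 2]
non_intra = [dequantize(q, w=16, quantiser_scale=2, intra=False) for q in levels]
intra = [dequantize(q, w=16, quantiser_scale=2, intra=True) for q in levels]
# non_intra -> [-5, -3, 0, 3, 5]   (gap of 3 on either side of zero)
# intra     -> [-4, -2, 0, 2, 4]   (uniform spacing of 2)
```

In the non-intra case the representative values adjacent to zero are 3m away, whereas all other neighbors are 2m apart, which is the dead zone near 0.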
The quantization matrix defines relative quantization accuracy for discrete cosine coefficients within a block. Use of the quantization matrix allows discrete cosine coefficients to be quantized with a greater quantization step in a high-frequency range, in which a large quantization step does not result in significant visually perceptible degradation, than in a low-frequency range in which a large quantization step results in visually perceptible degradation. That is, it becomes possible to vary the quantization characteristic so as to match the visual characteristics. The quantization matrix can be set on a picture-by-picture basis.
In the case of the 4:2:0 format according to MPEG1 or MPEG2, two types of quantization matrices can be set: one is for intra macroblocks and the other for non-intra macroblocks. In the 4:2:2 format and the 4:4:4 format, two types of quantization matrices can be defined independently for each of the luminance signal and the color difference signal, and thus a total of four quantization matrices can be defined. w(0, 1, 2, 3) in W[w][u][v] denotes one of 4 matrices.
In the MPEG2 standard, the default values of the quantization matrix for intra macroblocks are defined as shown in FIG. 22, and those for non-intra macroblocks as shown in FIG. 23. As described earlier, the quantization matrices can be set on the picture-by-picture basis. However, when no quantization matrix is set, the default values described above are employed. When the default values are employed, as can be seen from FIGS. 22 and 23, weighting is performed only for intra macroblocks.
In the MPEG2 Test Model 5 (ISO/IEC JTC1/SC29/WG11/N0400), the quantization matrix for non-intra macroblocks is defined as shown in FIG. 24. Unlike the quantization matrix shown in FIG. 23, the quantization matrix shown in FIG. 24 has weighted values.
The parameter quantiser_scale controls the amount of data generated by quantization by scaling the quantization characteristic. Its value is determined by a parameter q_scale_type, which is set on the picture-by-picture basis, and a parameter quantiser_scale_code, which is set on the macroblock-by-macroblock basis. FIG. 25 shows the relationships among those parameters.
As shown in FIG. 25, when q_scale_type=0, quantization is performed in a linear fashion. In this case, as with MPEG1, quantiser_scale (2 to 62) is set to be equal to 2 times quantiser_scale_code (1 to 31).
On the other hand, when q_scale_type=1, quantization is performed in a nonlinear fashion. In this mode, quantiser_scale is varied in small steps when quantiser_scale_code has a small value while quantiser_scale is varied in large steps when quantiser_scale_code has a large value, and thus quantiser_scale_code (1 to 31) is converted into quantiser_scale having a greater range (1 to 112) than in the linear quantization. This mode was newly introduced when the MPEG2 standard was established to make it possible to perform fine control of the quantization scale in a small quantization scale range at high rates, and to employ a large quantization scale when a very complicated image is coded. That is, the mode of q_scale_type=1 allows the bit rate to be controlled more precisely than is possible with MPEG1.
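The two modes can be sketched as follows. The linear case follows directly from the text. FIG. 25 is not reproduced here, so the nonlinear table below is an assumption based on the corresponding table of ISO/IEC 13818-2 and should be checked against FIG. 25. The function name is hypothetical.

```python
# Assumed nonlinear mapping for quantiser_scale_code = 1..31 (q_scale_type = 1).
NONLINEAR_SCALE = [
    1, 2, 3, 4, 5, 6, 7, 8,             # step 1
    10, 12, 14, 16, 18, 20, 22, 24,     # step 2
    28, 32, 36, 40, 44, 48, 52, 56,     # step 4
    64, 72, 80, 88, 96, 104, 112,       # step 8
]


def quantiser_scale(q_scale_type, quantiser_scale_code):
    """Map quantiser_scale_code (1..31) to quantiser_scale."""
    if not 1 <= quantiser_scale_code <= 31:
        raise ValueError("quantiser_scale_code must be in the range 1 to 31")
    if q_scale_type == 0:
        return 2 * quantiser_scale_code               # linear: 2 to 62
    return NONLINEAR_SCALE[quantiser_scale_code - 1]  # nonlinear: 1 to 112


linear = [quantiser_scale(0, c) for c in range(1, 32)]
nonlinear = [quantiser_scale(1, c) for c in range(1, 32)]
```

The nonlinear table spends its first eight codes on unit steps and its last seven on steps of 8, which is how it covers both finer and coarser scales than the linear mode with the same 5-bit code.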
In H.26L, in contrast to MPEG2, coding is performed on the basis of a 4×4 discrete cosine transform. More specifically, when quantized pixel values or quantized difference pixel values are given as (a, b, c, d), and transform coefficients are given as (A, B, C, D), a discrete cosine transform is performed in accordance with the following formula.
$$\begin{cases} A = 13a + 13b + 13c + 13d \\ B = 17a + 7b - 7c - 17d \\ C = 13a - 13b - 13c + 13d \\ D = 7a - 17b + 17c - 7d \end{cases} \qquad (4)$$
If coefficients obtained via the transform are represented by (a′, b′, c′, d′), processing corresponding to an inverse discrete cosine transform is performed according to equation (5).
$$\begin{cases} a' = 13A + 17B + 13C + 7D \\ b' = 13A + 7B - 13C - 17D \\ c' = 13A - 7B - 13C + 17D \\ d' = 13A - 17B + 13C - 7D \end{cases} \qquad (5)$$
Thus, between a′ and a, there is a relationship represented by equation (6).
a′ = 676a  (6)
The relationship between a′ and a represented by equation (6) arises from the fact that equations (4) and (5) are not normalized. Normalization is performed when a shift operation is performed after dequantization, as will be described in detail later.
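The factor 676 in equation (6) can be checked numerically with the following sketch. The row for B is taken as (17, 7, −7, −17), which is the sign pattern required for equation (6) to hold; with these rows every row has squared norm 676 and the rows are mutually orthogonal. The function names are hypothetical.

```python
# Rows of the forward transform of equation (4): A, B, C, D.
T = [
    [13, 13, 13, 13],
    [17, 7, -7, -17],
    [13, -13, -13, 13],
    [7, -17, 17, -7],
]


def forward(x):
    """Equation (4): (a, b, c, d) -> (A, B, C, D)."""
    return [sum(T[r][i] * x[i] for i in range(4)) for r in range(4)]


def inverse(coef):
    """Equation (5): (A, B, C, D) -> (a', b', c', d'), the transposed matrix."""
    return [sum(T[r][i] * coef[r] for r in range(4)) for i in range(4)]


def round_trip(x):
    return inverse(forward(x))


result = round_trip([1, 2, 3, 4])   # -> [676, 1352, 2028, 2704]
```

Each output is exactly 676 times the corresponding input, so a one-dimensional forward and inverse pass has a gain of 676 and a two-dimensional pass has a gain of 676², which is the factor that the later shift operation has to remove.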
In H.26L, a parameter QP used in quantization and dequantization is defined such that QP takes a value in the range of 0 to 31 and the quantization step size is increased by about 12% each time QP increases by 1. In other words, the quantization step size increases by a factor of approximately 2 each time QP increases by 6.
The values of QP embedded in compressed image information are for the luminance signal, and thus they are denoted by QPluma. On the other hand, QP for the color difference signal, that is, QPchroma, takes the following values in correspondence with QPluma.
QPluma: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
QPchroma: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 17, 18, 19, 20, 20, 21, 22, 22, 23, 23, 24, 24, 25, 26
Hereinafter, QPluma will be referred to simply as QP unless distinction is necessary.
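Read position by position, the two lists above define a lookup from QPluma to QPchroma. A minimal sketch, with a hypothetical function name, is as follows.

```python
# QPchroma indexed by QPluma (0..31), copied from the list in the text.
QP_CHROMA = [
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
    16, 17, 17, 18, 19, 20, 20, 21, 22, 22, 23, 23, 24, 24, 25, 26,
]


def chroma_qp(qp_luma):
    """Return QPchroma for a given QPluma in the range 0 to 31."""
    if not 0 <= qp_luma <= 31:
        raise ValueError("QPluma must be in the range 0 to 31")
    return QP_CHROMA[qp_luma]
```

The two parameters coincide up to QPluma = 17; above that, QPchroma grows more slowly and reaches only 26 at QPluma = 31, so the color difference signal is quantized less coarsely than the luminance signal at high QP.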
In H.26L, two arrays A(QP) and B(QP) for use in quantization/dequantization are defined as described below.
A(QP=0, . . . , 31): 620, 553, 492, 439, 391, 348, 310, 276, 246, 219, 195, 174, 155, 138, 123, 110, 98, 87, 78, 69, 62, 55, 49, 44, 39, 35, 31, 27, 24, 22, 19, 17
B(QP=0, . . . , 31): 3881, 4351, 4890, 5481, 6154, 6914, 7761, 8718, 9781, 10987, 12339, 13828, 15523, 17435, 19561, 21873, 24552, 27656, 30847, 34870, 38807, 43747, 49103, 54683, 61694, 68745, 77615, 89113, 100253, 109366, 126635, 141533
Between the arrays A(QP) and B(QP), there is a relationship represented by equation (7).
A(QP) × B(QP) × 676² = 2⁴⁰  (7)
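Because A(QP) and B(QP) are integers, equation (7) holds only approximately. The following sketch checks it for QP = 0 to 7, using only the first eight entries of B(QP) for brevity, and also checks that A(QP) halves approximately every 6 steps of QP, which corresponds to the doubling of the quantization step size. The function names are hypothetical.

```python
A = [620, 553, 492, 439, 391, 348, 310, 276, 246, 219, 195, 174, 155, 138,
     123, 110, 98, 87, 78, 69, 62, 55, 49, 44, 39, 35, 31, 27, 24, 22, 19, 17]
B_HEAD = [3881, 4351, 4890, 5481, 6154, 6914, 7761, 8718]   # B(0) .. B(7)


def eq7_ratio(qp):
    """A(QP) * B(QP) * 676^2 divided by 2^40; equation (7) predicts 1."""
    return A[qp] * B_HEAD[qp] * 676 ** 2 / 2 ** 40


def doubling_ratio(qp):
    """A(QP) / A(QP + 6); a doubling of the step size predicts 2."""
    return A[qp] / A[qp + 6]


eq7 = [eq7_ratio(qp) for qp in range(len(B_HEAD))]
doubling = [doubling_ratio(qp) for qp in range(len(A) - 6)]
```

In this sketch every entry of `eq7` lies within 0.1% of 1, and every entry of `doubling` lies between 1.9 and 2.1, the deviations being due to integer rounding of the table entries.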
Using the array A(QP) in equation (7), the coefficient K is quantized according to equation (8).
LEVEL = (K × A(QP) + f × 2²⁰)/2²⁰  (8)
In equation (8), |f| has a value in the range of 0 to 0.5, and the sign of f is the same as that of K.
Dequantization is performed as shown in equation (9).
K′ = LEVEL × B(QP)  (9)
After calculating equation (9), 20-bit shifting and rounding are performed on the coefficient K′. The sequential process including the orthogonal transform and the quantization is designed such that no overflow occurs when the process is performed in 32 bits.
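The following sketch runs one 4×4 block through the transform of equation (4), the quantization of equation (8), the dequantization of equation (9), the inverse transform of equation (5), and the final 20-bit shift, at QP = 0 with A(0) = 620 and B(0) = 3881. Several points are assumptions of the sketch and are not fixed by the text: the transform is applied to rows and columns, f is set to 0.5, quantization and rounding are applied to magnitudes so that they are symmetric around zero, and the 20-bit shift is applied to the result of the inverse transform of the dequantized coefficients. The row for B in equation (4) is taken as (17, 7, −7, −17). All function names are hypothetical.

```python
A0, B0 = 620, 3881   # A(0) and B(0)
SHIFT = 20
T = [[13, 13, 13, 13], [17, 7, -7, -17], [13, -13, -13, 13], [7, -17, 17, -7]]


def sign(v):
    return (v > 0) - (v < 0)


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(x):
    return [list(row) for row in zip(*x)]


def quantize(k, a=A0, f=0.5):
    """Equation (8), applied to |K| so that f takes the sign of K."""
    return sign(k) * ((abs(k) * a + int(f * (1 << SHIFT))) >> SHIFT)


def dequantize(level, b=B0):
    """Equation (9)."""
    return level * b


def shift_round(v):
    """20-bit shift with rounding, symmetric around zero (assumption)."""
    return sign(v) * ((abs(v) + (1 << (SHIFT - 1))) >> SHIFT)


def code_block(block):
    """Transform, quantize, dequantize, inverse transform, and normalize."""
    coef = matmul(matmul(T, block), transpose(T))
    deq = [[dequantize(quantize(k)) for k in row] for row in coef]
    out = matmul(matmul(transpose(T), deq), T)
    return [[shift_round(v) for v in row] for row in out]


flat = code_block([[255] * 4 for _ in range(4)])
ramp_in = [[10 * (4 * r + c + 1) for c in range(4)] for r in range(4)]
ramp_out = code_block(ramp_in)
ramp_error = max(abs(a - b) for ri, ro in zip(ramp_in, ramp_out)
                 for a, b in zip(ri, ro))
```

For the flat block of 255, the largest intermediate value under these assumptions is 169 × 408 × 3881 = 267,602,712, which fits in a signed 32-bit integer, and the block is reconstructed exactly. For the ramp block, the reconstruction differs from the input only by the quantization error.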
Note that the standard for the quantization/dequantization is provisional, and the overflow-free data length will probably be 16 bits in the final version of the standard.
In quantization/dequantization according to the H.26L standard, unlike the MPEG2 standard, weighting of orthogonal transform coefficients using a quantization matrix is not allowed, and thus it is impossible to efficiently perform quantization on the basis of visual characteristics.
In terms of MPEG2-based quantization, the above-described quantization according to H.26L corresponds to quantization scales of 2.5019, 2.8050, 3.1527, 3.5334, 3.9671, 4.4573, 5.0037, 5.6201, and 6.3055. However, in MPEG2, the dynamic range of the nonlinear quantization scale is 1 to 112, and thus the range of quantization according to MPEG2 cannot be entirely covered by quantization according to H.26L.
As a result, a spurious contour line is created in an image including a part whose pixel values vary gradually. Another problem is that highly efficient compression is impossible at low bit rates.
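The correspondence values listed above are reproduced, to four decimal places, by the expression 2²⁰/(676 × A(QP)) for QP = 0 to 8. This expression is inferred from the numbers and is not stated in the text, so the following sketch should be read under that assumption. The function name is hypothetical.

```python
A = [620, 553, 492, 439, 391, 348, 310, 276, 246, 219, 195, 174, 155, 138,
     123, 110, 98, 87, 78, 69, 62, 55, 49, 44, 39, 35, 31, 27, 24, 22, 19, 17]

# Values quoted in the text.
LISTED = [2.5019, 2.8050, 3.1527, 3.5334, 3.9671, 4.4573, 5.0037, 5.6201, 6.3055]


def equivalent_scale(qp):
    """Assumed MPEG2-equivalent quantization scale for a given QP."""
    return 2 ** 20 / (676 * A[qp])


computed = [equivalent_scale(qp) for qp in range(len(LISTED))]
smallest = equivalent_scale(0)    # about 2.5
largest = equivalent_scale(31)    # about 91.2
```

Under this assumption the H.26L scale runs from about 2.5 to about 91.2, so neither the finest MPEG2 scales (1 to 2) nor the coarsest ones (up to 112) are reachable, which is consistent with the two problems stated above.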
In view of the above, it is an object of the present invention to provide a technique of preventing a spurious contour line from being created in an image including a part with gradually varying pixel values and a technique of performing highly efficient compression at low bit rates.