The capture, storage, and transmission of digital images and digital video have become widespread. Compression has made this possible by significantly reducing the number of bits required to represent an image without seriously degrading it. But there still remains a need for greater compression and better image quality. The digital image and video compression standards, such as JPEG, MPEG, and H.263, are powerful tools that try to meet the need for compression. But there is a limit to how much compression is possible without artifacts becoming too annoying. Even at low levels of compression, when artifacts and distortions may not be visible to the human eye, they are still present in the reconstructed image and can cause problems when further processing is attempted.
Transforms, such as the discrete cosine transform (DCT), are a key part of many of today's image and video compression algorithms. At low levels of compression, the reconstructed images and frames of video are visually indistinguishable from the originals. However, as the degree of compression increases, the visibility of annoying artifacts and distortions increases. These artifacts limit the size to which an image or video can be compressed and still be pleasing to look at.
There exists a great need for ways to reduce or remove the coding artifacts and improve the quality of block transform coded images, particularly those compressed using the block DCT. There are methods for addressing these problems that are known in the art, but they have their shortcomings. New and better ways to effectively and efficiently reduce block-coding artifacts are required.
The term image is understood to mean a still image, a frame or field of video, a graphics image, or any other two-dimensional set of visual data. Grey-scale images are typically represented as one value for each pixel of a two-dimensional array of pixels. Color images are typically represented as a triplet of values at each pixel; for example, there is one value for each of the color components red, green, and blue. An alternative representation is a luminance component, denoted Y, and two color-difference components, such as Cb and Cr.
In a typical block DCT (discrete cosine transform) compression algorithm, the input image is partitioned into contiguous nonoverlapping blocks of size N1×N2 pixels, where N1 and N2 are each typically equal to 8. FIG. 1 illustrates an image partitioned into 16 nonoverlapping blocks in such a manner. Each block is transformed using the forward two-dimensional orthogonal type-2 even discrete cosine transform (DCT-2), hereafter referred to simply as the DCT, whose transform coefficients XII(m1,m2) are given by the expression:
$$X_{II}(m_1,m_2) = \frac{2}{\sqrt{N_1 N_2}}\,k(m_1)\,k(m_2)\sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1} x(n_1,n_2)\cos\!\left(\frac{\pi m_1\left(n_1+\frac{1}{2}\right)}{N_1}\right)\cos\!\left(\frac{\pi m_2\left(n_2+\frac{1}{2}\right)}{N_2}\right)$$
The multiplier k(μ) is 1/√2 when μ=0, and is 1 when μ=1, 2, . . . , N1−1 or μ=1, 2, . . . , N2−1. For a given block, the input data x(n1,n2), n1=0, 1, . . . , N1−1, n2=0, 1, . . . , N2−1, are transformed into coefficients XII(m1,m2), m1=0, 1, . . . , N1−1, m2=0, 1, . . . , N2−1, where N1 and N2 are the dimensions of the block. Because this is an orthogonal transform, the equation for the inverse transform is easily derived from the equation for the forward transform.
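The forward transform described above, and the inverse that follows from its orthogonality, can be sketched directly from the definition. This is a minimal, unoptimized illustration, assuming a block stored as a list of rows; the function names dct2_forward and dct2_inverse are chosen here for illustration and do not come from any standard.

```python
import math

def k(mu):
    """Multiplier k(mu): 1/sqrt(2) at index 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if mu == 0 else 1.0

def dct2_forward(x):
    """Forward 2-D DCT-2 of an N1 x N2 block given as a list of rows."""
    N1, N2 = len(x), len(x[0])
    scale = 2.0 / math.sqrt(N1 * N2)
    X = [[0.0] * N2 for _ in range(N1)]
    for m1 in range(N1):
        for m2 in range(N2):
            acc = 0.0
            for n1 in range(N1):
                c1 = math.cos(math.pi * m1 * (n1 + 0.5) / N1)
                for n2 in range(N2):
                    c2 = math.cos(math.pi * m2 * (n2 + 0.5) / N2)
                    acc += x[n1][n2] * c1 * c2
            X[m1][m2] = scale * k(m1) * k(m2) * acc
    return X

def dct2_inverse(X):
    """Inverse 2-D DCT-2: the transpose of the orthogonal forward transform."""
    N1, N2 = len(X), len(X[0])
    scale = 2.0 / math.sqrt(N1 * N2)
    x = [[0.0] * N2 for _ in range(N1)]
    for n1 in range(N1):
        for n2 in range(N2):
            acc = 0.0
            for m1 in range(N1):
                c1 = math.cos(math.pi * m1 * (n1 + 0.5) / N1)
                for m2 in range(N2):
                    c2 = math.cos(math.pi * m2 * (n2 + 0.5) / N2)
                    acc += k(m1) * k(m2) * X[m1][m2] * c1 * c2
            x[n1][n2] = scale * acc
    return x

# A flat 8x8 block of value 10 puts all of its energy in the DC coefficient.
flat = [[10.0] * 8 for _ in range(8)]
flat_coeffs = dct2_forward(flat)

# A non-flat block survives a forward/inverse round trip.
ramp = [[float(r + 2 * c) for c in range(8)] for r in range(8)]
ramp_restored = dct2_inverse(dct2_forward(ramp))
```

For the flat block, the DC coefficient equals 8 times the pixel value and every AC coefficient vanishes up to floating-point error. Practical codecs use fast separable algorithms rather than this direct four-fold loop.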
The coefficients XII(m1,m2) are quantized using an operation such as:
Xq(m1,m2) = round(XII(m1,m2)/Q(m1,m2))
where the Q(m1,m2), m1=0, 1, . . . , N1−1, m2=0, 1, . . . , N2−1, constitute a quantization matrix whose elements define the size of the quantization interval for each coefficient and round(•) is a function that rounds its argument to the nearest integer. The operation of quantization replaces each coefficient, e.g., XII(m1,m2), by a quantization level Xq(m1,m2). A quantization level is a value that represents the quantization interval into which the original coefficient falls.
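The quantization operation can be illustrated with a short sketch. The text does not say how round(•) treats exact ties, so rounding ties away from zero is an assumption made here, as are the function names and the sample values, which are not taken from any standard quantization table.

```python
import math

def round_half_away(v):
    """Round to the nearest integer; ties go away from zero (assumed tie rule)."""
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

def quantize(coeffs, qmatrix):
    """Element-wise Xq = round(X / Q) for matrices given as lists of rows."""
    return [[round_half_away(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qmatrix)]

# Illustrative 2x2 corner of a coefficient block and its quantization matrix.
coeffs = [[80.0, -13.0],
          [7.9, 3.0]]
qmatrix = [[16, 11],
           [12, 10]]
levels = quantize(coeffs, qmatrix)
```

Here 80/16 maps to level 5, −13/11 to level −1, 7.9/12 to level 1, and 3/10 to level 0, so small coefficients relative to their quantization interval are discarded entirely.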
Quantization is an irreversible operation and it is here that the losses of lossy compression occur. These quantization losses are responsible for many of the artifacts that will appear in the decoded image. The other cause of artifacts is the independent processing of the nonoverlapping blocks of the image. Because no data is shared between blocks that are quantized and coded independently of one another, significant disparities can arise between decoded blocks, and these disparities may be perceived as annoying blockiness.
While blockiness problems occur as a result of DCT coding in the above-described manner, similar blockiness problems arise when other transforms, e.g., the DFT (discrete Fourier transform) or the DST (discrete sine transform), are used to encode image data.
As part of many image encoding processes, quantization is followed by entropy coding, a lossless process that removes redundancy. The entropy-coded bits for the luminance and color components of a color image, for example, are then packaged together with some informational overhead to yield a compliant compressed image.
Decompression is normally achieved by performing the inverse of the compression operations in the reverse order. The entropy-coded bits are entropy decoded to recover the quantized coefficients. These coefficients are dequantized using an operation such as:
Xd(m1,m2) = Xq(m1,m2)Q(m1,m2)
where the Q(m1,m2), m1=0, 1, . . . , N1−1, m2=0, 1, . . . , N2−1, is the same Q(m1,m2) that was used for compression. The net effect of quantization and dequantization according to the above equations is to replace each original DCT coefficient with the value at the midpoint of its quantization interval. The dequantized DCT coefficients are transformed back to image data using the inverse DCT.
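The net effect of quantization followed by dequantization can be checked on a single coefficient. The sketch below is illustrative only: the tie rule in quantize (ties away from zero) and the sample values are assumptions, and the quantization interval size q is assumed positive.

```python
import math

def quantize(value, q):
    """Quantization level round(value / q); ties go away from zero (assumed)."""
    return int(math.copysign(math.floor(abs(value / q) + 0.5), value))

def dequantize(level, q):
    """Dequantized coefficient Xd = Xq * Q: the midpoint of the interval."""
    return level * q

q = 16
original = 37.0
level = quantize(original, q)     # 37/16 = 2.3125, so the level is 2
restored = dequantize(level, q)   # 2 * 16 = 32, midpoint of the interval 24..40
error = abs(original - restored)  # never exceeds q / 2
```

Every coefficient between 24 and 40 is restored as 32, which is why the loss cannot be undone by the decoder and why the reconstruction error is bounded by half the quantization interval.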
Also known in the art is the two-dimensional orthogonal type-1 even discrete cosine transform (DCT-1), whose transform coefficients XI(m1,m2) are given by the expression:
$$X_{I}(m_1,m_2) = \frac{2}{\sqrt{N_1 N_2}}\,k(m_1)\,k(m_2)\sum_{n_1=0}^{N_1}\sum_{n_2=0}^{N_2} k(n_1)\,k(n_2)\,x(n_1,n_2)\cos\!\left(\frac{\pi m_1 n_1}{N_1}\right)\cos\!\left(\frac{\pi m_2 n_2}{N_2}\right)$$
The multiplier k(μ) is 1/√2 when μ=0 or μ=N1 or μ=N2, and is 1 when μ=1, 2, . . . , N1−1 or μ=1, 2, . . . , N2−1. For a given block, the input data x(n1,n2), n1=0, 1, . . . , N1, n2=0, 1, . . . , N2, are transformed into coefficients XI(m1,m2), m1=0, 1, . . . , N1, m2=0, 1, . . . , N2, where N1 and N2 are the dimensions of the block. The DCT-1 is an orthogonal transform with the additional property that the same equation can be used to compute both forward and inverse transforms. The DCT-1 operates on a block of size (N1+1)×(N2+1). The DCT-1 is not used in typical block-DCT compression, but will be used in some embodiments of the present invention.
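The property that one equation serves as both forward and inverse DCT-1 can be verified numerically. The following is a minimal sketch of the two-dimensional DCT-1 as defined above, assuming the multiplier k is applied to both the input indices and the output indices; the name dct1_2d and the test data are illustrative.

```python
import math

def k(mu, N):
    """Multiplier k(mu): 1/sqrt(2) at the end indices 0 and N, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if mu in (0, N) else 1.0

def dct1_2d(x):
    """2-D DCT-1 of an (N1+1) x (N2+1) block given as a list of rows."""
    N1, N2 = len(x) - 1, len(x[0]) - 1
    scale = 2.0 / math.sqrt(N1 * N2)
    X = [[0.0] * (N2 + 1) for _ in range(N1 + 1)]
    for m1 in range(N1 + 1):
        for m2 in range(N2 + 1):
            acc = 0.0
            for n1 in range(N1 + 1):
                c1 = k(n1, N1) * math.cos(math.pi * m1 * n1 / N1)
                for n2 in range(N2 + 1):
                    c2 = k(n2, N2) * math.cos(math.pi * m2 * n2 / N2)
                    acc += x[n1][n2] * c1 * c2
            X[m1][m2] = scale * k(m1, N1) * k(m2, N2) * acc
    return X

# A 5x5 block corresponds to N1 = N2 = 4; applying the transform twice
# should return the original data, since the DCT-1 is its own inverse.
block = [[float((3 * r + 5 * c) % 7) for c in range(5)] for r in range(5)]
twice = dct1_2d(dct1_2d(block))
```

Because the transform matrix is both orthogonal and symmetric, the second application undoes the first, and twice matches block up to floating-point error.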
One known approach to improving the quality of compressed images is to apply some type of filter after decompression. A lowpass filter can be used to reduce the visibility of blockiness, but it also smooths details of the image and makes the image appear less sharp. The filtering can be restricted to operate only where blockiness occurs at the block boundaries, but doing so can add significant complexity to the implementation and will still cause unwanted smoothing of image detail that is at the block boundaries. The filtering can be done conditionally, but that adds complexity and reduces the effectiveness of the blockiness removal. Image quality improvements have been achieved with iterative restoration and other linear and nonlinear post-processing techniques, but these methods all add substantial complexity and latency that reduce their usefulness.
Another approach to reducing compression artifacts is to modify the encoder, such as by using overlapping blocks instead of the nonoverlapping blocks as defined in the standards. This approach has the disadvantages of increasing the complexity of the encoder and increasing the amount of data to be compressed. Another shortcoming is that the compressed image would no longer be compliant with the standard based on nonoverlapping blocks. As a result, a standard decoder would not be able to decode the encoded image. Instead, a more complex special purpose decoder would be required.
There are some approaches that can be integrated into the decoding process to reduce artifacts. One known noteworthy approach modifies the quantized DC coefficient, i.e., the coefficient at indices (m1,m2)=(0,0) of each DCT block. When using the known technique, each quantized DC coefficient is replaced with the average of itself and the quantized DC coefficients of the eight adjacent blocks. The modified coefficient is then clamped to stay within the quantization interval of the original value. FIG. 2 illustrates this processing. The processing shown in FIG. 2 reduces DC block artifacts but leaves many others unimproved.
The known method 200 shown in FIG. 2 begins in step 210 wherein JPEG-compressed encoded image data 260, e.g., blocks of entropy encoded quantized DCT coefficients, begins to be processed. In step 220 entropy decoding is performed on the JPEG compressed encoded image data 260 to yield 8×8 blocks of quantized DCT coefficients. Next, in step 230 each quantized DC coefficient in a block of quantized coefficients is replaced with the mean of itself and quantized DC coefficients of eight adjacent blocks. In step 240 each replaced quantized coefficient is clamped to the quantization interval of the original DC coefficient that was replaced by the mean value. Processing ends in step 250 with blocks of quantized DCT coefficients, including the clamped mean DC coefficient values, being available for further processing.
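Steps 230 and 240 of method 200 can be sketched as follows. This is an illustrative reading, not the reference implementation of any decoder: it assumes the quantized DC coefficients of all blocks are collected in a two-dimensional list, that values are kept in units of quantization levels so the interval of level q spans q−0.5 to q+0.5, and that blocks on the image border average over whichever neighbors exist. The border rule and the name smooth_dc are assumptions.

```python
def smooth_dc(dc_levels):
    """Replace each quantized DC level with a clamped 3x3 neighborhood mean."""
    rows, cols = len(dc_levels), len(dc_levels[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Step 230: mean of the block's own DC level and its neighbors
            # (fewer than eight neighbors exist on the image border).
            neighborhood = [dc_levels[rr][cc]
                            for rr in range(max(0, r - 1), min(rows, r + 2))
                            for cc in range(max(0, c - 1), min(cols, c + 2))]
            mean = sum(neighborhood) / len(neighborhood)
            # Step 240: clamp to the quantization interval of the original level.
            low, high = dc_levels[r][c] - 0.5, dc_levels[r][c] + 0.5
            out[r][c] = min(max(mean, low), high)
    return out

# One bright block surrounded by darker blocks.
dc = [[10, 10, 10],
      [10, 14, 10],
      [10, 10, 10]]
smoothed = smooth_dc(dc)
```

In this example the center mean is about 10.4, which the clamp pulls back to 13.5, the lower edge of the original interval. The clamp guarantees that the modified value would still quantize to the level that was actually transmitted.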
There is another approach, included in an informative annex of the JPEG image compression standard, that operates on the first five quantized AC coefficients, in zigzag scan order, of each block. A prediction is computed for each of these coefficients as a function of the DC coefficients of this block and the eight neighboring blocks. The values used for prediction are DC values that have been recovered directly from the data stream or they may be processed DC values, an example of such processing being the above mentioned filtering. In such an implementation the AC coefficient is replaced with its predicted value only if the original AC coefficient is zero. Also, the new coefficients are clamped to stay within the quantization interval centered on zero. FIG. 3 illustrates this processing. This approach reduces only some of the AC artifacts.
The method 300 illustrated in FIG. 3 begins in step 310 wherein JPEG compressed encoded image data 330, e.g., blocks of entropy encoded quantized DCT coefficients, begins to be processed. In step 312 entropy decoding is performed on the JPEG compressed encoded image data 330 to yield 8×8 blocks of quantized DCT coefficients including both AC and DC DCT coefficients. The blocks of AC coefficients are then processed beginning with step 313.
In step 313 a determination is made if there are more blocks of AC coefficients to be processed. If, in step 313 it is determined there are no more blocks to be processed, then processing ends in step 328. If, in step 313 it is determined there are more blocks to be processed, then processing continues with step 314 wherein the next block to be processed, hereafter called the current block, is obtained. From step 314 operation proceeds to step 316 wherein a determination is made if there are more coefficients in the current block to be processed.
Step 316 determines there are no more AC coefficients in the block to be processed if all the coefficients of the block have been processed or, in some embodiments, if a pre-selected number of coefficients, e.g., the first five AC coefficients in zigzag scan order, have already been processed. If, in step 316 it is determined that there are no additional AC coefficients in the current block to be processed, then processing of the current block stops and operation proceeds from step 316 back to step 313. However, if in step 316 it is determined that there are additional AC coefficients in the current block to be processed, then processing continues with step 318 wherein the next AC coefficient to be processed is obtained. Operation proceeds to step 320 wherein the AC coefficient is examined to determine if it is zero. If the retrieved AC coefficient is nonzero, operation proceeds from step 320 back to step 316 resulting in nonzero valued AC coefficients being left unaltered.
If in step 320 it was determined that the AC coefficient to be processed has a value of zero, operation proceeds to step 322 wherein a predicted AC coefficient value is computed from DC coefficients. Then, in step 324 the predicted AC coefficient value is clamped to a quantization interval centered on zero. Next, in step 326, the AC coefficient value being processed is replaced in the block of coefficients being processed with the clamped value produced in step 324. Operation proceeds from step 326 to step 316 with all of the AC coefficients to be processed in each block ultimately being subjected to the described processing.
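The control flow of steps 316 through 326 for one block can be sketched as below. The prediction formulas of the JPEG annex are not reproduced here; instead the predictor is passed in as a caller-supplied function, and the toy_predict function shown is purely hypothetical. Values are assumed to be in units of quantization levels, so the interval centered on zero spans −0.5 to 0.5. The coordinates listed are the (row, column) positions of the first five AC coefficients in zigzag scan order.

```python
# (row, column) positions of the first five AC coefficients in zigzag order.
FIRST_FIVE_AC = [(0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]

def fill_zero_ac(block, dc3x3, predict):
    """Replace zero-valued low-order AC levels with clamped predictions.

    block   -- 8x8 list of quantized coefficient levels for the current block
    dc3x3   -- 3x3 list of DC values for this block and its eight neighbors
    predict -- caller-supplied function (position, dc3x3) -> predicted level
    """
    out = [row[:] for row in block]
    for position in FIRST_FIVE_AC:          # steps 316 and 318
        r, c = position
        if out[r][c] != 0:                  # step 320: nonzero values are kept
            continue
        predicted = predict(position, dc3x3)        # step 322
        out[r][c] = min(max(predicted, -0.5), 0.5)  # steps 324 and 326
    return out

def toy_predict(position, dc3x3):
    """Hypothetical predictor, for illustration only: a scaled DC difference."""
    return (dc3x3[1][0] - dc3x3[1][2]) / 8.0

block = [[0] * 8 for _ in range(8)]
block[0][1] = 3                      # a nonzero AC level that must survive
dc3x3 = [[0, 0, 0],
         [4, 0, -4],
         [0, 0, 0]]
result = fill_zero_ac(block, dc3x3, toy_predict)
```

With the toy predictor every prediction is 1.0, so each zero-valued coefficient among the first five is set to the clamp limit 0.5, while the nonzero level 3 and all coefficients beyond the first five are left unchanged.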
Another noteworthy approach in the prior art involves replacing the typically used inverse DCT with the inverse DCT-1 and using the output from the inverse DCT-1 to reconstruct the image. This approach is described in the paper by S. A. Martucci, “A new approach for reducing blockiness in DCT image coders”, Proceedings of 1998 IEEE ICASSP. For each 8×8 block of the image, an additional row and column of zeros is appended to the block to increase the size of the block to 9×9. Then an inverse 9×9 DCT-1 is applied to generate a block of 9×9 pixels. The first and last rows are each scaled by the factor √2. Then, the first and last columns are scaled by the factor √2. The output image is assembled by overlapping by one row and one column each 9×9 block with its neighbors. The values at the overlaps are combined by averaging. The final output image is reduced to correct size by deleting either the top or bottom row of the image and the first or last column of the image. FIG. 4 illustrates this processing.
The method 400 shown in FIG. 4 begins in step 410 wherein JPEG-compressed encoded image data 430, e.g., blocks of entropy encoded quantized DCT coefficients, begins to be processed. In step 412 entropy decoding and dequantization are performed on the JPEG-compressed encoded image data 430 to yield 8×8 blocks of dequantized DCT coefficients. Next, in step 414 each block is increased in size to 9×9 by appending a row and a column of zeros.
In step 416 an inverse 9×9 DCT-1 is applied to each block of augmented DCT coefficients to generate a corresponding 9×9 block of image data. The first and last row of each 9×9 block of image data is scaled by the factor √2 in step 418 and then the first and last column of each resulting block is scaled by the factor √2 in step 420. In step 422 an image is reconstructed by overlapping each of the scaled blocks by one row or one column with each of its four neighbors. Values that overlap are replaced by the average of the overlapping values.
Assembling 9×9 blocks so that each overlaps its neighbors by one row or one column yields a reconstructed image that is one row and one column larger than the image that would result from assembling the corresponding original 8×8 blocks in nonoverlapping fashion. The larger reconstructed image must therefore be truncated by one row and one column to generate an output image of the correct size. In step 424 either the top or bottom row of the larger reconstructed image is deleted and then either the first or last column is deleted. Processing ends in step 426 with a reconstructed image ready for display or further processing.
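Steps 414 through 424 of method 400 can be sketched as follows, starting from blocks that have already been entropy decoded and dequantized. This is an illustrative reading of the described steps, not code from the cited paper. It assumes the image is given as a two-dimensional list of 8×8 coefficient blocks, computes the 9×9 DCT-1 separably from a one-dimensional transform, and chooses to delete the bottom row and the last column in step 424. All function names are hypothetical.

```python
import math

def dct1_1d(v):
    """Orthogonal 1-D DCT-1 of N+1 samples; it is its own inverse."""
    N = len(v) - 1
    def k(mu):
        return 1.0 / math.sqrt(2.0) if mu in (0, N) else 1.0
    return [math.sqrt(2.0 / N) * k(m) *
            sum(k(n) * v[n] * math.cos(math.pi * m * n / N) for n in range(N + 1))
            for m in range(N + 1)]

def idct1_2d(X):
    """Separable 2-D (inverse) DCT-1: transform the rows, then the columns."""
    rows = [dct1_1d(row) for row in X]
    cols = [dct1_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

def reconstruct(coeff_blocks):
    """coeff_blocks[br][bc] is an 8x8 list of dequantized DCT coefficients."""
    n_br, n_bc = len(coeff_blocks), len(coeff_blocks[0])
    height, width = 8 * n_br + 1, 8 * n_bc + 1
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    s = math.sqrt(2.0)
    for br in range(n_br):
        for bc in range(n_bc):
            # Step 414: append a column and a row of zeros to reach 9x9.
            padded = [row[:] + [0.0] for row in coeff_blocks[br][bc]] + [[0.0] * 9]
            pix = idct1_2d(padded)               # step 416
            for c in range(9):                   # step 418: first and last rows
                pix[0][c] *= s
                pix[8][c] *= s
            for r in range(9):                   # step 420: first and last columns
                pix[r][0] *= s
                pix[r][8] *= s
            for r in range(9):                   # step 422: accumulate overlaps
                for c in range(9):
                    total[8 * br + r][8 * bc + c] += pix[r][c]
                    count[8 * br + r][8 * bc + c] += 1
    averaged = [[total[r][c] / count[r][c] for c in range(width)]
                for r in range(height)]
    # Step 424: delete the bottom row and the last column (one permitted choice).
    return [row[:-1] for row in averaged[:-1]]

def flat_block(value):
    """8x8 DCT coefficients of a constant block: only the DC term, 8 * value."""
    block = [[0.0] * 8 for _ in range(8)]
    block[0][0] = 8.0 * value
    return block

# Two side-by-side flat blocks with pixel values 10 and 20.
image = reconstruct([[flat_block(10.0), flat_block(20.0)]])
```

The result is 8 rows by 16 columns. Pixels inside the left block are 10, pixels inside the right block are 20, and the shared column between them is their average, 15, which shows how the overlap softens the step at the block boundary.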
A side effect of the known method of DCT-1 processing shown in FIG. 4 is a one-half sample shift in each dimension of each component of the image. The reconstructed image will not be the same as the original before compression but a fractionally shifted version of it. Whether or not this shift is a problem is dependent on the application. Further processing of the reconstructed image will be affected by the shift.
Despite the usefulness of the known image processing techniques there remains a need for new and improved methods of processing images to reduce or eliminate coding artifacts including, e.g., image blockiness.