To electronically transmit a relatively high-resolution image over a relatively low-bandwidth channel, or to electronically store such an image in a relatively small memory space, it is often necessary to compress the digital data that represents the image. For example, High-Definition-Television (HDTV) video images are compressed to allow their transmission over existing television channels. Without compression, HDTV video images would require transmission channels having bandwidths much greater than the bandwidths of existing television channels. Furthermore, to reduce data traffic and transmission time to acceptable levels, an image may be compressed before being sent over the Internet. Or, to increase the image-storage capacity of a CD-ROM or server, an image may be compressed before being stored thereon.
Such image compression typically involves reducing the number of data bits necessary to represent an image. Unfortunately, many compression techniques are lossy. That is, visual information contained in the original image may be lost during compression. This loss of information may cause noticeable differences, often called visual artifacts, in the reconstructed image. In many cases, these artifacts are undesirable, and thus significantly reduce the visual quality of the reconstructed image as compared to the quality of the original image.
Referring to FIGS. 1-3, the basics of the popular block-based Moving Picture Experts Group (MPEG) compression standards, which include MPEG-1 and MPEG-2, are discussed. For purposes of illustration, the discussion is based on using an MPEG 4:2:0 format to compress images represented in a Y, C.sub.B, C.sub.R color space, although the basic concepts discussed also apply to other MPEG formats and images represented in other color spaces, and to other block-based compression standards such as the Joint Photographic Experts Group (JPEG) standard, which is often used to compress still images. Furthermore, although many details of the MPEG standards and the Y, C.sub.B, C.sub.R color space are omitted for brevity, these details are well-known and are disclosed in a large number of available references.
Referring to FIGS. 1-3, the MPEG standards are often used to compress temporal sequences of images--which are also called video frames--such as found in a television broadcast. Each video frame is divided into areas called macro blocks, which each include one or more pixels. FIG. 1A is a 16-pixel-by-16-pixel macro block 10 having 256 pixels 12. In the MPEG standards, a macro block is always 16.times.16 pixels, although other compression standards may use macro blocks having other dimensions. In the original video frame, i.e., the frame before compression, each pixel 12 has a respective luminance value Y and a respective pair of color-, i.e., chroma-, difference values C.sub.B and C.sub.R.
Referring to FIGS. 1A-1D, before compression of the frame, the digital luminance (Y) and chroma-difference (C.sub.B and C.sub.R) values that will be used for compression, i.e., the pre-compression values, are generated from the original Y, C.sub.B, and C.sub.R values of the original frame. In the MPEG 4:2:0 format, the pre-compression Y values are the same as the original Y values. Thus, each pixel 12 merely retains its original luminance value Y. But to reduce the amount of data to be compressed, the MPEG 4:2:0 format allows only one pre-compression C.sub.B value and one pre-compression C.sub.R value for each group 14 of four pixels 12. Each of these pre-compression C.sub.B and C.sub.R values is respectively derived from the original C.sub.B and C.sub.R values of the four pixels 12 in the respective group 14. Thus, referring to FIGS. 1B-1D, the pre-compression Y, C.sub.B, and C.sub.R values generated for the macro block 10 are arranged as one 16.times.16 matrix 16 of pre-compression Y values (equal to the original Y value for each pixel 12), one 8.times.8 matrix 18 of pre-compression C.sub.B values (equal to one derived C.sub.B value for each group 14 of four pixels 12), and one 8.times.8 matrix 20 of pre-compression C.sub.R values (equal to one derived C.sub.R value for each group 14 of four pixels 12). It is, however, common in the industry to call the matrices 16, 18, and 20 "blocks" of values. Furthermore, because it is convenient to perform the compression transforms on 8.times.8 blocks of pixel values instead of 16.times.16 blocks, the block 16 of pre-compression Y values is subdivided into four 8.times.8 blocks 22a-22d, which respectively correspond to the 8.times.8 blocks A-D of pixels in the macro block 10. Thus, still referring to FIGS. 1B-1D, six 8.times.8 blocks of pre-compression pixel data are generated for each macro block 10: four 8.times.8 blocks 22a-22d of pre-compression Y values, one 8.times.8 block 18 of pre-compression C.sub.B values, and one 8.times.8 block 20 of pre-compression C.sub.R values.
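The arrangement of one macro block into six 8.times.8 blocks can be illustrated with a minimal Python sketch. Two points are assumptions made only for illustration: the text says the chroma-difference values are "derived" from each group of four pixels without stating how, so a simple average is used here, and the blocks A-D are assumed to be ordered top-left, top-right, bottom-left, bottom-right. The function names are hypothetical.

```python
def subsample_420(plane):
    """Reduce a 16x16 chroma-difference plane to 8x8 by averaging each 2x2 group.

    Averaging is an assumption; the text only says the value is "derived".
    """
    n = len(plane)
    return [
        [
            (plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
            for c in range(0, n, 2)
        ]
        for r in range(0, n, 2)
    ]


def split_luma(y):
    """Split a 16x16 luminance matrix into four 8x8 blocks (assumed order A, B, C, D)."""
    def block(r0, c0):
        return [row[c0:c0 + 8] for row in y[r0:r0 + 8]]

    return [block(0, 0), block(0, 8), block(8, 0), block(8, 8)]


def macro_block_to_blocks(y, cb, cr):
    """Return the six 8x8 pre-compression blocks: four Y, one C_B, one C_R."""
    return split_luma(y) + [subsample_420(cb), subsample_420(cr)]


# Synthetic 16x16 planes standing in for the original pixel values.
y = [[r * 16 + c for c in range(16)] for r in range(16)]
cb = [[r + c for c in range(16)] for r in range(16)]
cr = [[100] * 16 for _ in range(16)]

blocks = macro_block_to_blocks(y, cb, cr)
print(len(blocks), len(blocks[0]), len(blocks[0][0]))  # 6 8 8
```

The 256 luminance values are kept in full, while each chroma-difference plane shrinks from 256 values to 64, which is the data reduction the 4:2:0 format provides before any transform coding takes place.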
FIG. 2 is a general block diagram of an MPEG compressor 30, which is more commonly called an encoder 30. Generally, the encoder 30 converts the pre-compression data for a frame or sequence of frames into encoded data that represent the same frame or frames with significantly fewer data bits than the pre-compression data. To perform this conversion, the encoder 30 reduces or eliminates redundancies in the pre-compression data and reformats the remaining data using efficient transform and coding techniques.
More specifically, the encoder 30 includes a frame-reorder buffer 32, which receives the pre-compression data for a sequence of one or more frames and reorders the frames in an appropriate sequence for encoding. Thus, the reordered sequence is often different than the sequence in which the frames are generated. The encoder 30 assigns each of the stored frames to a respective group, called a Group Of Pictures (GOP), and labels each frame as either an intra (I) frame or a non-intra (non-I) frame. The encoder 30 always encodes an I-frame without reference to another frame, but can and often does encode a non-I frame with reference to one or more of the other frames in the GOP. The encoder 30 does not, however, encode a non-I frame with reference to a frame in a different GOP.
During the encoding of an I frame, the 8.times.8 blocks (FIGS. 1B-1D) of the pre-compression Y, C.sub.B, and C.sub.R values that represent the I frame pass through a summer 34 to a Discrete Cosine Transform (DCT) circuit 36, which transforms these blocks of values into respective 8.times.8 blocks of one DC coefficient and sixty-three AC coefficients. That is, the summer 34 is not needed when the encoder 30 encodes an I frame, and thus the pre-compression values pass through the summer 34 without being summed with any other values. As discussed below, however, the summer 34 is often needed when the encoder 30 encodes a non-I frame. A quantizer 38 limits each of the coefficients to a respective maximum value, and provides the quantized AC (nonzero frequency) and DC (zero frequency) coefficients on respective paths 40 and 42. A predictive encoder 44 predictively encodes the DC coefficients, and a variable-length coder 46 converts the quantized AC coefficients and the quantized and predictively encoded DC coefficients into variable-length codes, such as Huffman codes. These codes form the encoded data that represent the pixel values of the encoded I frame. A transmit buffer 48 then temporarily stores these codes to allow synchronized transmission of the encoded data to a decoder (discussed below in conjunction with FIG. 3). Alternatively, if the encoded data is to be stored instead of transmitted, the coder 46 may provide the variable-length codes directly to a storage medium such as a CD-ROM.
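The transform, quantization, and DC-prediction steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the circuits 36, 38, and 44 themselves: the DCT uses the common orthonormal DCT-II definition, the quantizer is a plain uniform divide-and-round with a single hypothetical step size (the MPEG standards use per-coefficient tables and scale factors), and the DC prediction is a simple difference from the previous block's DC value.

```python
import math

N = 8


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an 8x8 block; out[0][0] is the DC coefficient."""
    def alpha(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    )
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantizer (an assumption): divide by the step size and round."""
    return [[round(c / step) for c in row] for row in coeffs]


def dc_differences(dc_values):
    """Predictive coding of DC values: each one is sent as a difference from the previous."""
    previous = 0
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


flat = [[100] * N for _ in range(N)]   # a block with no spatial detail
coeffs = dct_2d(flat)
quantized = quantize(coeffs, step=16)

print(quantized[0][0])                 # 50: all of the energy is in the DC coefficient
print(dc_differences([50, 52, 51]))    # [50, 2, -1]
```

A flat block yields one nonzero coefficient and sixty-three zeros, and neighboring blocks with similar DC values yield small differences, which is what makes the subsequent variable-length coding effective.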
If the I frame will be used as a reference (as it often will be) for one or more non-I frames in the GOP, then, for the following reasons, the encoder 30 generates a corresponding reference frame by decoding the encoded I frame with a decoding technique that is similar or identical to the decoding technique used by the decoder (FIG. 3). When decoding non-I frames that are referenced to the I frame, the decoder has no option but to use the decoded I frame as a reference frame. Because MPEG encoding and decoding are lossy, the pixel values of the decoded I frame will often be different than the pre-compression pixel values of the I frame. Therefore, using the pre-compression I frame as a reference frame during encoding may cause additional differences in the decoded non-I frame because the reference frame used for decoding (decoded I frame) would be different than the reference frame used for encoding (pre-compression I frame).
Therefore, to generate a reference frame for encoding that will be similar to or the same as the reference frame used for decoding, the encoder 30 includes a dequantizer 50 and an inverse DCT circuit 52, which are designed to mimic the dequantizer and inverse DCT circuit of the decoder (FIG. 3). The dequantizer 50 dequantizes the quantized DCT coefficients from the quantizer 38, and the circuit 52 transforms the dequantized DCT coefficients back into corresponding 8.times.8 blocks of Y, C.sub.B, and C.sub.R pixel values. Because of the losses incurred during quantization and dequantization, however, some or all of these decoded pixel values may be respectively different than the corresponding pre-compression pixel values. These decoded pixel values then pass through a summer 54 (used when generating a reference frame from a non-I frame as discussed below) to a reference-frame buffer 56, which stores the reference frame.
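The reason for building the encoder's reference frame from decoded values can be shown with a small numeric sketch. The `lossy` function below is only a stand-in for the quantize, dequantize, and inverse-transform path, and the residuals are passed through without loss so that the effect of the reference mismatch is isolated; both simplifications, and all names and pixel values, are assumptions made for illustration.

```python
def lossy(values, step):
    """Stand-in for the lossy encode/decode path: snap each value to a multiple of step."""
    return [step * round(v / step) for v in values]


original_i = [10, 23, 37, 41]   # pre-compression pixel values of the I frame
next_frame = [12, 25, 39, 43]   # pre-compression pixel values of a non-I frame
decoded_i = lossy(original_i, step=8)   # the only version of the I frame the decoder has


def decode_error(encoder_reference):
    """Error in the decoded non-I frame when the encoder used the given reference."""
    residual = [n - r for n, r in zip(next_frame, encoder_reference)]
    decoded = [r + e for r, e in zip(decoded_i, residual)]  # decoder always adds to decoded_i
    return [d - n for d, n in zip(decoded, next_frame)]


print(decode_error(decoded_i))    # [0, 0, 0, 0]
print(decode_error(original_i))   # [-2, 1, 3, -1]
```

When the encoder and decoder use the same (decoded) reference, the residuals cancel the reference error exactly; when the encoder uses the pre-compression frame, the reference error carries straight into the decoded non-I frame.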
During the encoding of a non-I frame, the encoder 30 initially encodes each macro block of the non-I frame in at least two ways: in the manner discussed above for I frames, and using motion prediction, which is discussed below. The encoder 30 then saves and transmits the resulting code having the fewest bits. This technique ensures that the macro blocks of the non-I frames are always encoded using the fewest bits.
With respect to motion prediction, an object in a frame exhibits motion if its relative position changes in the succeeding frames. For example, a horse exhibits relative motion if it gallops across the screen. Or, if the camera follows the horse, then the background exhibits relative motion. Generally, each of the succeeding frames in which the object appears contains at least some of the same macro blocks of pixels as the preceding frames. But such matching macro blocks in the succeeding frame often occupy respective frame locations that are different than the respective frame locations they occupy in the preceding frames. Alternatively, a macro block that includes a portion of a stationary object (e.g., tree) or background scene (e.g., sky) may occupy the same frame location in a succession of frames. In either case, instead of encoding each frame independently, it takes fewer data bits to say "locations X and Z of frame #1 (non-I frame) contain the same macro blocks that are in locations S and T, respectively, of frame #0 (I frame)." This "statement" is encoded as a motion vector. For a stationary or relatively slow-moving object or background scene, the motion vector is merely set near or equal to zero.
More specifically and still referring to FIG. 2, during the encoding of a non-I frame, a motion predictor 58 compares the pre-compression Y values (the C.sub.B and C.sub.R values are not used during motion prediction) of macro blocks in the non-I frame with the decoded Y values of macro blocks in the reference frame to identify matching macro blocks. For each macro block in the non-I frame for which a match is found in the reference frame, the motion predictor 58 generates a motion vector that specifies the location of the matching macro block in the reference frame. Thus, as discussed below in conjunction with FIG. 3, during decoding of these macro blocks of the non-I frame, the decoder uses the motion vectors to obtain the pixel values for these macro blocks from the matching macro blocks in the reference frame. The predictive encoder 44 predictively encodes the motion vectors, and the coder 46 generates codes for the predictively encoded motion vectors and provides them to the transmit buffer 48.
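One common way to identify a matching macro block is an exhaustive search that minimizes the sum of absolute differences (SAD) between luminance values. The text does not say which matching criterion or search strategy the motion predictor 58 uses, so the sketch below is an assumed, simplified example; the function names, the 4.times.4 block size (chosen to keep the example small), and the search range are all hypothetical.

```python
def sad(cur, ref, cur_row, cur_col, ref_row, ref_col, size):
    """Sum of absolute differences between two size x size blocks of Y values."""
    return sum(
        abs(cur[cur_row + i][cur_col + j] - ref[ref_row + i][ref_col + j])
        for i in range(size)
        for j in range(size)
    )


def find_motion_vector(cur, ref, row, col, size, search):
    """Full search within +/- search pixels; returns (best_sad, d_row, d_col).

    The vector is the offset from the block's position in the current frame
    to the matching block's position in the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for d_row in range(-search, search + 1):
        for d_col in range(-search, search + 1):
            r, c = row + d_row, col + d_col
            if 0 <= r <= height - size and 0 <= c <= width - size:
                cost = sad(cur, ref, row, col, r, c, size)
                if best is None or cost < best[0]:
                    best = (cost, d_row, d_col)
    return best


# Reference frame with a unique value at every position.
ref = [[r * 8 + c for c in range(8)] for r in range(8)]
# Current frame: the same content moved down by 1 row and right by 2 columns.
cur = [
    [ref[r - 1][c - 2] if r >= 1 and c >= 2 else 0 for c in range(8)]
    for r in range(8)
]

print(find_motion_vector(cur, ref, row=2, col=4, size=4, search=2))  # (0, -1, -2)
```

A SAD of zero means the block was found unchanged in the reference frame; in practice the best match usually has a small nonzero SAD, and the remaining differences are what the residuals carry.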
Furthermore, because a macro block in the non-I frame and a matching macro block in the reference frame are often similar but not identical, the encoder 30 encodes these differences along with the motion vector so the decoder can account for them. More specifically, the motion predictor 58 provides the decoded Y values of the matching macro block of the reference frame to the summer 34, which effectively subtracts, on a pixel-by-pixel basis, these Y values from the pre-compression Y values of the matching macro block of the non-I frame. These differences, which are called residuals, are arranged in 8.times.8 blocks and are processed by the DCT circuit 36, the quantizer 38, the coder 46, and the buffer 48 in a manner similar to that discussed above, except that the quantized DC coefficients of the residual blocks are not predictively encoded by the predictive encoder 44.
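The pixel-by-pixel subtraction performed by the summer 34 amounts to the following, shown on a 2.times.2 patch with made-up values purely for brevity (the function name is hypothetical):

```python
def residuals(current_block, matched_block):
    """Current (pre-compression) Y values minus the matched block's decoded Y values."""
    return [
        [c - m for c, m in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(current_block, matched_block)
    ]


current = [[120, 122], [119, 121]]   # pre-compression Y values, non-I frame
matched = [[118, 122], [120, 120]]   # decoded Y values, reference frame

print(residuals(current, matched))   # [[2, 0], [-1, 1]]
```

When the match is good, the residuals are small numbers clustered around zero, which generally transform and quantize to far fewer nonzero coefficients than the original pixel values would.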
Additionally, it is possible to use a non-I frame as a reference frame. When the non-I frame will be used as a reference frame, the quantized residuals from the quantizer 38 are respectively dequantized and inverse transformed by the dequantizer 50 and the inverse DCT circuit 52 so that this non-I reference frame will be the same as the one used by the decoder for the reasons discussed above. The motion predictor 58 provides the decoded Y values of the reference I frame from which the residuals were generated to the summer 54, which adds the respective residuals from the circuit 52 to these decoded Y values of the reference I frame to generate the respective Y values of the reference non-I frame. The reference-frame buffer 56 then stores the reference non-I frame along with the reference I frame for use in encoding subsequent non-I frames.
Still referring to FIG. 2, the encoder 30 also includes a rate controller 60 to ensure that the transmit buffer 48, which typically transmits the encoded frame data at a fixed rate, never overflows or empties, i.e., underflows. If either of these conditions occurs, errors may be introduced into the encoded data. For example, if the buffer 48 overflows, data from the coder 46 is lost. Thus, the rate controller 60 uses feedback to adjust the quantization scaling factors used by the quantizer 38 based on the degree of fullness of the transmit buffer 48. The more full the buffer 48, the larger the controller 60 makes the scale factors, and the fewer data bits the quantizer 38 generates. Conversely, the more empty the buffer 48, the smaller the controller 60 makes the scale factors, and the more data bits the quantizer 38 generates. This continuous adjustment ensures that the buffer 48 neither overflows nor underflows.
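The feedback loop can be sketched as a toy simulation. Everything numeric here is an assumption chosen for illustration: the linear mapping from fullness to scale factor, the 1-to-31 scale range, the buffer capacity, the fixed drain per frame, and the model in which the bits produced for a frame equal a "complexity" figure divided by the scale factor. None of these is taken from the rate controller 60 itself.

```python
def quantizer_scale(fullness, min_scale=1, max_scale=31):
    """Map buffer fullness (0.0 = empty, 1.0 = full) to a scale factor.

    A fuller buffer gives a larger scale factor, hence coarser quantization.
    """
    fullness = min(max(fullness, 0.0), 1.0)
    return round(min_scale + fullness * (max_scale - min_scale))


def simulate(complexities, capacity=1000.0, drain=100.0):
    """Track the buffer level frame by frame; returns a list of (scale, level)."""
    level = capacity / 2
    history = []
    for complexity in complexities:
        scale = quantizer_scale(level / capacity)
        bits = complexity / scale          # toy model: coarser scale, fewer bits
        level = level + bits - drain       # coder fills, channel drains at a fixed rate
        history.append((scale, level))
    return history


history = simulate([2400.0] * 200)
print(history[0][0], history[-1][0])       # scale factor rises from 16 and settles at 24
```

In this toy model the buffer level climbs until the scale factor reaches the value at which the bits produced per frame equal the bits drained per frame, and then holds steady; a sustained change in picture complexity would move the loop to a new equilibrium in the same way.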
FIG. 3 is a block diagram of a conventional MPEG decompressor 60, which is more commonly called a decoder 60 and which can decode frames that are encoded by the encoder 30 of FIG. 2.
For I frames and macro blocks of non-I frames that are not motion predicted, a variable-length decoder 62 decodes the variable-length codes received from the encoder 30. A prediction decoder 64 decodes the predictively encoded DC coefficients, and a dequantizer 65, which is similar or identical to the dequantizer 50 of FIG. 2, dequantizes the decoded AC and DC coefficients. An inverse DCT circuit 66, which is similar or identical to the inverse DCT circuit 52 of FIG. 2, transforms the dequantized coefficients into pixel values. The decoded pixel values pass through a summer 68 (which is used during the decoding of motion-predicted macro blocks of non-I frames as discussed below) into a frame-reorder buffer 70, which stores the decoded frames and arranges them in a proper order for display on a video display unit 72. If the I frame is used as a reference frame, it is also stored in a reference-frame buffer 74.
For motion-predicted macro blocks of non-I frames, the decoder 62, dequantizer 65, and inverse DCT 66 process the residuals as discussed above. The prediction decoder 64 decodes the motion vectors, and a motion interpolator 76 provides to the summer 68 the pixel values from the macro blocks in the reference frame that the motion vectors point to. The summer 68 adds these reference pixel values to the residuals to generate the pixel values of the decoded macro blocks, and provides these decoded pixel values to the frame-reorder buffer 70. If the non-I frame is used as a reference frame, it is stored in the reference-frame buffer 74.
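On the decoder side, reconstructing a motion-predicted block reduces to fetching the block that the motion vector points to and adding the decoded residuals, as in this minimal sketch. It assumes whole-pixel motion vectors expressed as (row offset, column offset) from the block's own position, uses a 2.times.2 block for brevity, and omits any interpolation the motion interpolator 76 may perform; the function names and values are hypothetical.

```python
def fetch_block(reference, row, col, vector, size):
    """Return the size x size block of the reference frame that the vector points to."""
    d_row, d_col = vector
    r0, c0 = row + d_row, col + d_col
    return [reference[r][c0:c0 + size] for r in range(r0, r0 + size)]


def reconstruct_block(reference, row, col, vector, residual):
    """Add decoded residuals to the motion-compensated reference pixel values."""
    size = len(residual)
    predicted = fetch_block(reference, row, col, vector, size)
    return [
        [p + e for p, e in zip(pred_row, res_row)]
        for pred_row, res_row in zip(predicted, residual)
    ]


reference = [[10 * r + c for c in range(6)] for r in range(6)]   # decoded reference frame
residual = [[1, 0], [0, -1]]                                      # decoded residuals

decoded = reconstruct_block(reference, row=2, col=3, vector=(-1, -2), residual=residual)
print(decoded)   # [[12, 12], [21, 21]]
```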
A more detailed discussion of the MPEG encoder 30 and decoder 60 of FIGS. 2 and 3, respectively, is available in many publications including "Video Compression" by Peter D. Symes, McGraw-Hill, 1998. Furthermore, there are other well-known block-based compression techniques for encoding and decoding images.
Referring to FIG. 1A, a problem with block-based compression techniques such as the MPEG standard is that the loss of visual information during compression may cause some or all of the respective boundaries between the 8.times.8 pixel blocks A-D and between contiguous macro blocks 10 to be noticeable to a viewer. More specifically, the compression losses may cause an abrupt change in the pixel values across a boundary, thus making the boundary visible. Such a visible boundary is often described as "blocky" or as exhibiting a "blocky" artifact, and the process of reducing the severity of blocky artifacts, i.e., making blocky boundaries invisible to a viewer, is often called deblocking.
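To make the notion of an "abrupt change" across a boundary concrete, the following sketch compares the step in pixel values across a block boundary with the average pixel-to-pixel change just inside the two adjoining blocks. The measure, the margin, and the function names are hypothetical and are offered only to illustrate the idea; they are not a detection method described in the references discussed here.

```python
def boundary_step(row, boundary):
    """Absolute difference between the two pixels that straddle the boundary."""
    return abs(row[boundary] - row[boundary - 1])


def interior_activity(row, boundary, span=3):
    """Mean absolute pixel-to-pixel change just inside each block, near the boundary."""
    left = [abs(row[i] - row[i - 1]) for i in range(boundary - span, boundary)]
    right = [abs(row[i + 1] - row[i]) for i in range(boundary, boundary + span)]
    return sum(left + right) / (2 * span)


def looks_blocky(row, boundary, margin=2):
    """Flag the boundary when the step clearly exceeds the nearby interior activity."""
    return boundary_step(row, boundary) > interior_activity(row, boundary) + margin


blocky_row = [50] * 8 + [58] * 8      # two flat 8-pixel blocks with a jump between them
smooth_row = list(range(50, 66))      # a gentle ramp that continues across the boundary

print(looks_blocky(blocky_row, 8))    # True
print(looks_blocky(smooth_row, 8))    # False
```

A simple measure of this kind cannot by itself distinguish a compression artifact from a genuine object edge that happens to coincide with a block boundary.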
Some references, including C. Reeve and J. S. Lim, "Reduction of Blocking Effects in Image Coding," Optical Engineering, Vol. 23, No. 1, January/February 1984, pp. 34-37, and N. Ngan, D. W. Lin, and M. L. Liou, "Enhancement of Image Quality for Low Bit Rate Video Coding," IEEE Transactions on Circuits and Systems, Vol. 38, No. 10, October 1991, pp. 1221-1225, disclose deblocking techniques that are implemented during image encoding. But most images and video sources are encoded according to internationally agreed-upon compression standards such as MPEG, so altering the encoding algorithms is impractical if not impossible if one wishes to design an encoding system that complies with one or more of these standards.
Other references, including T. O'Rourke, R. Stevenson, "Improved Image Decompression for Reduced Transform Coding Artifacts," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 5, No. 6, December 1995, and Y. Yang et al., "Projection-Based Spatially Adaptive Reconstruction of Block-Transform Compressed Images," IEEE Transactions on Image Processing, Vol. 4, No. 7, July 1995, disclose deblocking techniques that are implemented during image decoding. For example, O'Rourke et al. describe a statistical discontinuity-preserved image model and a statistical image compression model, and a technique for generating maximum a posteriori (MAP) estimations of boundary pixels based on these two models. O'Rourke then estimates the values of the boundary pixels by iteratively solving a convex constrained optimization problem. Similarly, the Yang reference assumes that changes in neighboring pixel values, i.e., the values of pixels on either side of a boundary, should be at a minimum, and then, like O'Rourke, proceeds to estimate the values of the boundary pixels by iteratively solving the convex constrained optimization problem. But such techniques often require too much computation time for implementation in a real-time system. Additionally, such techniques often operate on boundaries that are not blocky. Unfortunately, when such techniques are applied to boundaries that are not blocky, the quality of the image may be degraded because generally, the assumption made by such techniques is that the difference between a pixel and its neighboring pixels should be small. Although such an assumption is correct some of the time, it is frequently incorrect, particularly in areas of an image including object edges.
Still other references describe deblocking techniques that employ low-pass filters along the block boundaries. Unfortunately, such low-pass filtering may lead to blurring at the block boundaries. Some of these techniques, such as that described in Ramamurthi and A. Gersho, "Nonlinear Space-Variant Post-processing of Block Coded Images," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-34, No. 5, October 1986, pp. 1258-1268, attempt to avoid blurring the boundaries by estimating the values of the boundary pixels in the original image and then adaptively choosing different types of filters to preserve the sharpness of the boundaries in the original image. Unfortunately, accurately estimating original boundary values from a highly compressed image may be very difficult because the quality of the decoded image is often inadequate for accurate boundary-value estimation. Furthermore, like some of the techniques described above, these techniques often operate on all of the boundaries in an image whether they are blocky or not, and thus may unnecessarily degrade the quality of the image or may be too computationally intensive for many applications.
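The trade-off of boundary low-pass filtering can be seen in a one-dimensional sketch. The 3-tap [1, 2, 1]/4 kernel below is a generic smoothing filter chosen for illustration, not a filter taken from any of the cited references, and the function name and pixel values are hypothetical.

```python
def smooth_boundary(row, boundary):
    """Apply a [1, 2, 1]/4 low-pass filter to the two pixels adjacent to the boundary."""
    out = list(row)
    for i in (boundary - 1, boundary):
        out[i] = (row[i - 1] + 2 * row[i] + row[i + 1]) / 4
    return out


blocky_row = [50] * 8 + [58] * 8     # small artificial step caused by compression
edge_row = [20] * 8 + [200] * 8      # genuine object edge aligned with the boundary

print(smooth_boundary(blocky_row, 8)[6:10])   # [50, 52.0, 56.0, 58]
print(smooth_boundary(edge_row, 8)[6:10])     # [20, 65.0, 155.0, 200]
```

The same filter that halves the artificial step from 8 to 4 also spreads the genuine edge over several pixels, which is the blurring that results when every boundary is filtered regardless of whether it is blocky.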