1. Field of the Invention
The method and apparatus of the present invention relate to the storage, display and playback of digitized visual images and, more particularly, to an apparatus and method for compressing and decompressing digitized visual images.
2. Art Background
Many digital devices, such as computers and video cassette recorders (VCRs), are used to store and display visual images, for example, in the form of movies. Digitized visual images require large storage capacities. One frame of a movie requires 153,600 (320 × 240 × 2) bytes if represented in uncompressed form for RGB 16 display modes. At 15 frames per second, one second of a movie requires 2,304,000 bytes of storage. Since the amount of storage in digital devices is limited, and since digital devices have limited internal bandwidth, there is a need to represent digitized images in compressed form. The proliferation of digital video in multimedia applications has intensified this need.
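By way of illustration only, the storage figures above follow from simple arithmetic, which the short sketch below reproduces. The function name and its parameters are hypothetical and are not part of the original disclosure.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Bytes needed to store one uncompressed frame."""
    return width * height * bytes_per_pixel


# RGB 16 uses 2 bytes per pixel; 320 x 240 is the frame size given in the text.
per_frame = frame_bytes(320, 240, 2)
per_second = per_frame * 15  # 15 frames per second

print(per_frame)   # 153600
print(per_second)  # 2304000
```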
One prior art technique to compress digitized monochrome visual images is known as Block Truncation Coding (BTC), first described by Delp and Mitchell. See, E. J. Delp and O. R. Mitchell, "Image Compression Using Block Truncation Coding," IEEE Transactions on Communications, Vol. COM-27, No. 9, pp. 1335-1342, September 1979 (hereinafter the "Delp Method"). This method takes advantage of the human eye's tendency to perceive the average value of fine detail within a small area when viewing that small area from a distance. If the fine detail is represented with information that preserves the average and standard deviation of the original information, the human eye will not perceive any information loss.
BTC has been extended to color images. The general color BTC method includes the following steps:
1. Decompose the image into non-overlapping n × m blocks.
2. For each block, find the mean luminance value, Y_avg, and perform steps 3 to 6.
3. For all pixels in a block whose luminance values are less than or equal to Y_avg, find the average color C_low.
4. For all pixels in a block whose luminance values are greater than Y_avg, find the average color C_high.
5. Construct a binary pattern for each block by representing pixels associated with C_low as a "0" in the binary pattern, and pixels associated with C_high as a "1".
6. Code the block with its two representative color values and the binary pattern, a "0" or a "1" for each pixel.
See, G. Campbell, et al., "Two Bit/Pixel Full Color Encoding," in ACM Computer Graphics, Vol. 20, No. 4, pp. 215-223, Dallas, August 1986. Thus, each pixel within a block is represented as one of two colors. Space is saved because the block is encoded with only two colors (four bytes for two RGB 16 colors), which give the meaning of the two values of the binary pattern, plus one bit per pixel for the binary pattern itself. For example, a block of 4 × 4 pixels is encoded with four bytes to represent the two colors plus 16 bits, one for each pixel in the block. Thus, the block is encoded with a total of 48 bits or 6 bytes. Without BTC, each pixel is encoded with the 16 bits required to represent its RGB 16 color value, a total of 256 bits for a block of 4 × 4 pixels. Thus, BTC results in a compression ratio of 256/48 or 16:3 for a block of 4 × 4 pixels.
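By way of illustration only, steps 2 through 6 of the general color BTC method may be sketched for a single block as follows. This is a minimal sketch under stated assumptions and is not the implementation of any cited reference. The luminance weights (Rec. 601), the use of 8-bit (r, g, b) tuples in place of packed RGB 16 values, the handling of a uniform block, and all function names are assumptions.

```python
from fractions import Fraction


def luminance(rgb):
    """Integer luma scaled by 1000 (Rec. 601 weights, an assumption)."""
    r, g, b = rgb
    return 299 * r + 587 * g + 114 * b


def average_color(pixels):
    """Component-wise integer mean of a non-empty list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) // n for i in range(3))


def encode_block(block):
    """Encode a flat list of pixels as (c_low, c_high, binary pattern)."""
    lum = [luminance(p) for p in block]
    n, total = len(lum), sum(lum)
    # y * n <= total is an exact integer test for "y <= Y_avg".
    bits = [0 if y * n <= total else 1 for y in lum]
    low = [p for p, b in zip(block, bits) if b == 0]
    high = [p for p, b in zip(block, bits) if b == 1]
    c_low = average_color(low)
    # A uniform block has no pixel above the mean; reuse c_low (assumption).
    c_high = average_color(high) if high else c_low
    return c_low, c_high, bits


def decode_block(c_low, c_high, bits):
    """Rebuild the block: each pixel becomes one of the two colors."""
    return [c_high if b else c_low for b in bits]


def encoded_bits(num_pixels, bits_per_color=16):
    """Two representative colors plus one pattern bit per pixel."""
    return 2 * bits_per_color + num_pixels


# A 4 x 4 block with eight dark and eight bright pixels.
block = [(10, 10, 10)] * 8 + [(200, 200, 200)] * 8
c_low, c_high, bits = encode_block(block)
ratio = Fraction(16 * 16, encoded_bits(16))  # 256 bits raw vs. 48 bits coded
print(c_low, c_high, ratio)  # (10, 10, 10) (200, 200, 200) 16/3
```

Because this example block contains only two distinct colors, decoding reproduces it exactly. A block with more than two colors would be approximated by its two averages.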
The methods that decompose an image into blocks, encode each block as two representative values and encode the pixels in the block as a binary pattern are generally referred to as Binary Pattern Image Coding (BPIC). The block size that can be employed using BPIC depends upon the viewing distance to the display monitor, the pixel resolution and the physical dimensions of the display monitor. For instance, see, D. Chen and A. C. Bovik, "Visual Pattern Image Coding," IEEE Transactions on Communications, Vol. COM-38, No. 12, pp. 2136-2146, December 1990. Since neither the color distribution of the images to be compressed nor the extent of information variation within each block is usually known before compression occurs, the block size required to maintain an acceptable level of information is difficult to estimate.
An example of a variation of BPIC encoding is disclosed in U.S. patent application Ser. No. 07/965,580 by Arturo Rodriguez, Mark Pietras and Steven Hancock, assigned to IBM, filed October, 1992 and titled "HYBRID VIDEO COMPRESSION SYSTEM AND METHOD CAPABLE OF SOFTWARE-ONLY DECOMPRESSION IN SELECTED MULTIMEDIA SYSTEMS." The method of the 07/965,580 application divides an image into regions and examines each region with a homogeneity test. If the region is homogeneous, it is encoded as one color. If the region is not homogeneous, the region is either encoded with BPIC or is divided into quadrants and the quadrants are encoded as homogeneous regions or with BPIC.
Since the method disclosed in the Ser. No. 07/965,580 patent application does not decompose a block beyond quadrants, it suffers from the same limitations as other non-recursive BPIC methods. Specifically, since neither the color distribution of the images to be compressed nor the extent of information variation within each block is usually known before compression occurs, the block or quadrant size that is required to maintain an acceptable level of information is difficult to estimate. Ideally, the block size would be selected adaptively according to the information content of the local image region.
Roy and Nasrabadi suggested a method that begins with large blocks and applies the BTC technique recursively to smaller and smaller blocks, decomposing each block into quadrants until a resolution with sufficiently low information variation is found. See, J. U. Roy and N. M. Nasrabadi, "Hierarchical Block Truncation Coding," Optical Engineering, Vol. 30, No. 5, pp. 551-556.
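By way of illustration only, recursive quadrant decomposition of this general kind may be sketched as follows. This is not the published algorithm of the cited reference. The stopping criterion (the range of luminance values within a block), the minimum block size, and all names are assumptions chosen to show the recursion.

```python
def split_quadrants(image, x, y, size, max_range, min_size=2):
    """Return leaf blocks (x, y, size) of a quadtree over a square luminance image.

    A block becomes a leaf when its luminance range is at most max_range
    or when it has reached min_size; otherwise it is split into quadrants.
    """
    values = [image[y + j][x + i] for j in range(size) for i in range(size)]
    if size <= min_size or max(values) - min(values) <= max_range:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += split_quadrants(image, x + dx, y + dy, half,
                                      max_range, min_size)
    return leaves


# An 8 x 8 image that is uniform except for one bright pixel at the origin.
image = [[0] * 8 for _ in range(8)]
image[0][0] = 100
leaves = split_quadrants(image, 0, 0, 8, max_range=10)
print(len(leaves))  # 7: four 2 x 2 blocks near the detail, three 4 x 4 blocks
```

The three uniform quadrants remain large blocks, while only the quadrant containing detail is subdivided further.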
Kamel uses a similar method, using an interval, [Y_avg - t, Y_avg + t], around the average luminance value of the block to find the best threshold value that minimizes the color mean-square error. See, M. Kamel, C. T. Sun, and L. Guan, "Image Compression By Variable Block Truncation Coding With Optimal Threshold," IEEE Trans. on Signal Processing, Vol. 39, No. 1, pp. 208-212, January 1991.
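By way of illustration only, a threshold search over such an interval may be sketched as follows. This is a simplified sketch and not the method of the cited reference. It operates on scalar luminance values rather than colors, tries only integer thresholds around the truncated mean, and minimizes the total squared error, all of which are assumptions. The function name is hypothetical.

```python
def best_threshold(values, t):
    """Search integer thresholds in [mean - t, mean + t] for the lowest squared error.

    Each candidate threshold splits the values into a low and a high group,
    and the error is measured against the mean of each group.
    """
    mean = sum(values) / len(values)
    best = None
    for th in range(int(mean) - t, int(mean) + t + 1):
        low = [v for v in values if v <= th]
        high = [v for v in values if v > th]
        err = 0.0
        for group in (low, high):
            if group:
                m = sum(group) / len(group)
                err += sum((v - m) ** 2 for v in group)
        if best is None or err < best[1]:
            best = (th, err)
    return best


# The mean (16.25) would group 30 with 100; a threshold of 30 groups it with the zeros.
values = [0, 0, 0, 0, 0, 0, 30, 100]
threshold, error = best_threshold(values, t=20)
print(threshold)  # 30
```

In this example, thresholding at the mean gives a squared error of 2450, whereas the searched threshold gives roughly 771.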
Recursive block decomposition yields better compression than non-recursive methods. Because the minimum number of colors required to represent a block is selected adaptively, the image may initially be divided into arbitrarily large regions. Encoding a large homogeneous region as one color results in greater compression than unnecessarily encoding the same large region as a set of smaller regions with the same color, since each region must be encoded with that color, which occupies a fixed number of bytes. For example, if a large region is represented as one color, the region is encoded with a two byte color value. If the same region is encoded as 4 regions, each region must be encoded with the two byte color, for a total of 8 bytes. Thus, in this example, representing the large region as one color saves 6 bytes.
Non-recursive methods such as that described in the Ser. No. 07/965,580 application cannot encode arbitrarily large regions since, as previously described, the regions into which the image is to be divided must be chosen to be sufficiently small to avoid unacceptable information loss. For an arbitrary image, this choice may not prove optimal since the image may contain single color regions larger than the chosen block size. By dividing these single color regions into smaller regions due to the initial choice of region size, non-recursive methods result in sub-optimal compression.
Although the techniques that recursively divide the image into smaller and smaller blocks result in greater compression than non-recursive methods, the prior art recursive methods have side effects that tend to reduce the quality of the picture. Dividing the picture into smaller and smaller blocks imposes an artificial block structure on the image, and this block structure tends to be visible in the decompressed image.
As will be described, the method and apparatus of the present invention do not impose an artificial structure on an image. The method allows arbitrarily shaped objects within a block to retain their original boundaries instead of superimposing the quadrant decomposition structure of the previously suggested approaches. The method of the present invention thus tends to retain high image quality because the appropriate use of a single color is validated.
To validate the appropriateness of encoding a distribution as a single color, the method of the present invention employs a homogeneity test such as that disclosed in the Ser. No. 07/965,580 patent application. As previously described, the method disclosed in the Ser. No. 07/965,580 patent application divides an image into regions and examines each region with a homogeneity test. If the region is homogeneous, it is encoded as one color. If the region is not homogeneous, the region is either encoded with BPIC or is divided into quadrants and the quadrants are encoded as homogeneous regions or with BPIC.
Unlike the method disclosed in the Ser. No. 07/965,580 patent application, the method and apparatus of the present invention recursively employ a homogeneity test to resolve a block into a series of homogeneous distributions. As will be disclosed, the method and apparatus of the present invention decompose frames by splitting pixel distributions by luminance characteristics, unlike prior art techniques which split images recursively into smaller and smaller blocks. Thus, as previously stated, the method and apparatus of the present invention do not impose an artificial structure on the image to be compressed, and thus significantly enhance the quality of the image when decompressed.
Apart from BPIC techniques, another suggested compression technique is based on representing groups of pixels by separate luminance and chrominance values for the group. The International Radio Consultative Committee (CCIR) has prescribed a particular coding methodology for this method. Successive two by two pixel regions of a digitized frame are encoded using an individual one byte luminance value for each pixel, and two bytes of representative color difference (chrominance) values for the four pixel region. Thus, instead of representing the four pixels as 12 bytes, 3 bytes per pixel for RGB24 display modes, the four pixels are represented as 6 bytes for a compression ratio of 50%. For an effective system to convert this compressed signal back to original RGB24 display data, see U.S. Pat. No. 5,262,847, issued Nov. 16, 1993 to Arturo Rodriguez et al.
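By way of illustration only, the packing of a two by two pixel region into four luminance bytes and two color difference bytes may be sketched as follows. This is a minimal sketch and not the coding prescribed by the CCIR. The conversion coefficients (a full-range BT.601-style approximation), the averaging of the color difference values over the four pixels, the byte order, and all names are assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range BT.601-style conversion (coefficients are an assumption)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 + 0.564 * (b - y)
    cr = 128 + 0.713 * (r - y)
    return y, cb, cr


def clamp_byte(x):
    """Round to the nearest integer and clamp to the range of one byte."""
    return max(0, min(255, int(round(x))))


def encode_2x2(pixels):
    """Pack four (r, g, b) pixels into 6 bytes: Y0 Y1 Y2 Y3 Cb Cr."""
    converted = [rgb_to_ycbcr(*p) for p in pixels]
    ys = [clamp_byte(c[0]) for c in converted]
    # One representative value per color difference: the mean over the region.
    cb = clamp_byte(sum(c[1] for c in converted) / 4)
    cr = clamp_byte(sum(c[2] for c in converted) / 4)
    return bytes(ys + [cb, cr])


# Four RGB24 pixels occupy 12 bytes; the packed region occupies 6.
region = [(128, 128, 128)] * 4
packed = encode_2x2(region)
print(len(packed), len(packed) / 12)  # 6 0.5
```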
Thus, there is a need for a compression method that achieves high compression ratios while simultaneously reducing the introduction of distortion in the compressed image. As will be described more fully below, the method and apparatus of the present invention will typically result in a much higher compression ratio than the CCIR encoding methodology.
As will be described, the method and apparatus of the present invention retain the benefits of recursive BPIC techniques without imposing an artificial structure on images. Thus, the method and apparatus of the present invention result in a combination of better image quality, better compression and low complexity, while retaining ease of decompression.