The present invention relates generally to image coding and more particularly to compression and decompression of scalable and content-based, randomly accessible digital still images.
The fast growth of the Internet and of digital multimedia applications has created a consistent and growing demand for new image coding tools that reduce the usually large and cumbersome raw image data files to a compressed form. Compactness of the resulting bit-stream, however, is no longer the only requirement asked of developers when devising new coding tools. End users and their applications are increasingly demanding features like scalability, error robustness and content-based accessibility.
Photographs and motion picture film are two-dimensional representations of three-dimensional objects viewed by the human eye. These methods of recording two-dimensional versions are "continuous" or "analog" reproductions. Digital images are discontinuous approximations of these analog images, made up of a series of adjacent dots or picture elements (pixels) of varying color or intensity. On a computer or television monitor, the digital image is presented by pixels projected onto a glass screen and viewed by the operator. The number of pixels dedicated to the portrayal of a particular image is called its resolution; that is, the more pixels used to portray a given object, the higher its resolution.
A monotone image (black and white images are called "grayscale") of moderate resolution might consist of 640 pixels per horizontal line. A typical image would include 480 horizontal rows or lines, each containing 640 pixels. Therefore, a total of 307,200 pixels are displayed in a single 640×480 pixel image. If each pixel of the monotone image requires one bit of data to describe it (i.e. either black or white), a total of 307,200 bits are required to describe just one black and white image. Modern gray scale images use different levels of intensity to portray darkness and thus use eight bits, or 256 levels of gray, per pixel. The resulting image files are therefore correspondingly larger, at 307,200 bytes for the same image.
For color images, the color of each pixel in an image is typically determined by three variables: red (R), green (G), and blue (B). By mixing these three variables in different proportions, a computer can display different colors of the spectrum. The more variety available to represent each of the three colors, the more colors can be displayed. In order to represent, for example, 256 shades of red, an 8-bit number is needed. The range of the values of such a color is thus 0-255. The total number of bits needed to represent a pixel is therefore 24 bits (8 bits each for red, green, and blue), commonly known as RGB888 format. Thus, a given RGB picture has three planes, the red, the green, and the blue, and each pixel in the picture can take any of approximately 16.78 million colors, or R×G×B=256×256×256. A standard color image of 640×480 pixels therefore requires approximately 7.4 megabits of data to be stored or represented in a computer system. This number is arrived at by multiplying the horizontal and vertical resolution by the number of bits required to represent the full color range: 640×480×24=7,372,800 bits.
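The arithmetic above can be checked with a short sketch. The function name `raw_image_bits` is hypothetical and used only for illustration; the figures are those given in the text.

```python
def raw_image_bits(width, height, bits_per_pixel):
    """Uncompressed size in bits: every pixel stores bits_per_pixel bits."""
    return width * height * bits_per_pixel


pixel_count = 640 * 480                      # pixels in one 640x480 image
gray_bits = raw_image_bits(640, 480, 8)      # 8-bit gray scale
rgb888_bits = raw_image_bits(640, 480, 24)   # 8 bits each for R, G and B
color_count = 256 ** 3                       # distinct RGB888 colors

print(pixel_count)    # 307200
print(rgb888_bits)    # 7372800
print(color_count)    # 16777216
```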
Standard, commonly available hardware, while increasingly fast and affordable, still finds files of this size slow and unwieldy. This is especially true in the case of interactive applications and Internet use. Interactive applications demand very fast multi-directional processing of multimedia data. Given their persistently large size, image files have been a rate-limiting factor in the development of realistic, interactive computer applications. In the case of the Internet, end-users and applications are further limited by the slow pace of modems and other transmission media. For example, the amount of information currently capable of being transmitted over a telephone line is restricted to 33,600 bits per second due to the actual wires and switching functions used by the typical telephone company. Therefore, a single, full color RGB888 640×480 pixel page, with its 7,372,800 bits of data, would take just over three and one half minutes to transfer at this rate.
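As a rough check of the transfer time quoted above, the following sketch divides the image size by the line rate. It assumes an ideal link with no protocol overhead, which is a simplification; `transfer_seconds` is a hypothetical helper name.

```python
def transfer_seconds(total_bits, bits_per_second):
    """Time to send total_bits over a link, ignoring protocol overhead."""
    return total_bits / bits_per_second


seconds = transfer_seconds(7_372_800, 33_600)
minutes = seconds / 60

print(f"{seconds:.1f} s, about {minutes:.2f} minutes")
```

Under these assumptions the result is a little under 220 seconds, consistent with the figure of roughly three and one half minutes.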
Many methods of compressing image data exist and are well known to those skilled in the art. Some of these methods are known as "lossless" compression; that is, upon decoding and decompressing they restore the original data without any loss or elimination of data. Because their relative reduction ratios are small, however, these lossless techniques cannot satisfy all the current demands for image compression technologies. Other compression methods exist that are nonreversible and are known as "lossy". These nonreversible methods can offer considerable compression, but do result in a loss of data. In image files, the high compression rates are actually achieved by eliminating certain aspects of the image, usually those to which the human eye has limited or no sensitivity. After coding, an inverse process is performed on the reduced data set to decompress and restore a reasonable facsimile of the original image. Lossy compression techniques may also be combined with lossless methods for a variable mix of data compression and image fidelity.
Compactness of a compressed bit-stream is usually measured by the size of the stream in comparison to the size of the corresponding uncompressed image data. A quantitative measure of the compactness is the compression ratio, or alternatively, the bit-rate where:
compression ratio=(total bytes of the original raw image data)/(total bytes required for compressed image)
and
bit-rate=(total bits required for the compressed image)/(number of pixels in the original image)
In general, the higher the compression ratio (or the lower the bit-rate), the higher the compactness of a compressed bit-stream. Compactness has always been a primary concern for all data compression techniques.
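The two measures can be illustrated with a minimal sketch. The function names are hypothetical, the bit-rate is expressed here in bits per pixel (an assumption about the intended unit), and the compressed size of 92,160 bytes is an invented example value rather than a measured result.

```python
def compression_ratio(raw_bytes, compressed_bytes):
    """Raw size divided by compressed size; higher means more compact."""
    return raw_bytes / compressed_bytes


def bit_rate(compressed_bytes, pixel_count):
    """Compressed bits spent per original pixel; lower means more compact."""
    return compressed_bytes * 8 / pixel_count


raw_bytes = 640 * 480 * 3      # RGB888 image, 3 bytes per pixel
compressed_bytes = 92_160      # hypothetical compressed size
ratio = compression_ratio(raw_bytes, compressed_bytes)
bpp = bit_rate(compressed_bytes, 640 * 480)

print(ratio)   # 10.0
print(bpp)     # 2.4
```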
One of the most popular formats for compressed image files is the GIF format. GIF stands for "Graphics Interchange Format", and was developed by CompuServe to provide a means of passing an image from one dial-up customer to another, even across different computer hardware platforms. It is a relatively old format, and was designed to handle a palette of 256 colors (8 bit as opposed to 24 bit color). When developed, this was near state of the art for most personal computers.
The "GIF" format uses an 8 bit Color Look Up Table (sometimes called a CLUT) to identify color values. If the original image is an 8 bit, gray-scale photo, then the "GIF" format produces a compressed lossless image file. A gray scale image typically has only 256 levels of gray. The operative compression is accomplished by the Lempel-Ziv-Welch (LZW) algorithm, a lossless dictionary coder that, much like "Run Length Encoding" (RLE), exploits repeated pixel patterns when a GIF file is saved. If the original file were a 24 bit color graphic image, then it would first be mapped to an 8 bit CLUT, and then compressed losslessly. The loss would be in the remapping of the original 24 bit (16.7 million) colors to the limited 8 bit (256 colors) CLUT. Decoding would reproduce an uncompressed image that was identical to the remapped 8 bit image, but not the same as the original 24 bit image. Coding of this kind is not an efficient way of compressing an image when there are many changes in the coloration across a line of pixels. It is very efficient when there are rows of pixels with the same color or when a very limited number of colors is used.
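The run-length idea mentioned above can be sketched on a single row of palette indices. This is a generic illustration of run-length coding under simple assumptions, not the coder used inside any particular file format; the function names and sample rows are hypothetical.

```python
def rle_encode(row):
    """Collapse a row of palette indices into [value, run_length] pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([value, 1])   # start a new run
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original row."""
    row = []
    for value, count in runs:
        row.extend([value] * count)
    return row


flat_row = [7] * 12 + [3] * 4          # long runs of one color
busy_row = [1, 2, 3, 4, 5, 6, 7, 8]    # color changes at every pixel

print(rle_encode(flat_row))            # [[7, 12], [3, 4]]
print(len(rle_encode(busy_row)))       # 8
```

The flat row shrinks from 16 values to 2 pairs, while the busy row needs one pair per pixel and therefore grows, which matches the efficiency remarks in the text. Decoding is exact in both cases.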
The other de facto standard of still image formats is the JPEG format. JPEG stands for Joint Photographic Experts Group. JPEG uses a lossy compression method to create the final file. JPEG files can be compressed further than their GIF counterparts, and they can maintain more color depth than the 8 bit table used in the GIF format. Most JPEG compression software lets the user trade image quality against the amount of compression. At compression ratios of 10:1 most images look very much like the original, and maintain excellent full color rendition. If pressed to 100:1 the images tend to contain blocky image artifacts that substantially reduce quality. Unlike GIF, JPEG does not rely on a single lossless coding step to compress the image; it applies a sequence of tools to achieve the final file.
JPEG first changes the image from its original color space to a normalized color space (a lossy process) based on the luminance and chrominance of the image. Luminance corresponds to the brightness information while chrominance corresponds to hue information. Testing has indicated that the human eye is more sensitive to changes in brightness than to changes in color or hue. The data is then transformed in 8×8 pixel blocks using the Discrete Cosine Transform (DCT), and this too produces some image loss. It effectively re-samples the image in these discrete areas, and then uses a more standard RLE encoding (as well as other encoding schemes) to produce the final file. The higher the compression ratio, the greater the image loss and the more noticeable the 8×8 pixel artifacts become.
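The block transform step can be illustrated with a direct, unoptimized 8×8 DCT. The sketch below is a simplification: the single quantization step size of 16 is a hypothetical stand-in for the per-coefficient tables a real encoder would use, and the function names are illustrative only.

```python
import math

N = 8  # block size


def dct_8x8(block):
    """Direct 2-D DCT-II of one 8x8 block (the 0.25 factor is 2/N for N=8)."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantization: the rounding here is where information is lost."""
    return [[round(value / step) for value in row] for row in coeffs]


flat = [[100] * N for _ in range(N)]     # a block of constant brightness
coeffs = dct_8x8(flat)
quantized = quantize(coeffs, step=16)    # step size is an assumed value
nonzero = sum(1 for row in quantized for q in row if q != 0)

print(round(coeffs[0][0]))   # 800
print(nonzero)               # 1
```

For a constant block all of the energy lands in the single top-left coefficient, leaving 63 zeros that a run-length stage can then code very cheaply.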
One of the requirements of evolving technologies is that they possess the attribute of scalability. Scalability measures the extent to which a compressed bit-stream is capable of being partially decoded and utilized at the terminal end of the transmission. In meeting this need for progressive processing, scalability has become a standard requirement for the new generation of digital image coding technology. Typically, scalability in terms of pixel precision and scalability in terms of spatial resolution are, among others, two basic requirements for still image compression.
To achieve scalability while ensuring image fidelity, recent developments in image compression technology have incorporated multi-resolution decompositions based upon "wavelets". Wavelets are mathematical functions, first widely considered in academic applications only after the Second World War. The name wavelet is derived from the fact that the basis function, or "mother wavelet", generally integrates to zero, thus "waving" about the x-axis. Other characteristics, like the fact that wavelets are orthonormal or symmetric, ensure quick and easy calculation of the direct and inverse wavelet transform, which is especially useful in decoding.
Another important advantage of wavelet based transforms is the fact that many classes of signals or images can be represented by wavelets in a more compact way. For example, images with discontinuities and images with sharp spikes usually take substantially fewer wavelet basis functions than sine or cosine based functions to achieve the same precision. This implies that a wavelet-based method has the potential to achieve higher image compression ratios. For the same precision, the images that are reconstructed from wavelet coefficients look better than the images obtained using a Fourier (sine or cosine) transform. This appears to indicate that the wavelet scheme produces images more closely sympathetic to the human visual system.
A wavelet transform decomposes the image into a coarse, low resolution version of the original and a series of enhancements that add finer and finer detail to the image. This multi-resolution property is well suited for networked applications where scalability and graceful degradation are required. For example, a heterogeneous network may include very high bandwidth parts as well as 28.8 kbit/s modem connections and everything in between. It is desirable to send the same video signal to all parts of the network, dropping finer details and sending a low resolution image to the parts of the network with low bandwidth. Wavelets are well suited to this application: the coarse, low resolution image is wrapped in the highest priority packets, which reach the entire network. The enhancements belong in lower priority packets that may be dropped in lower bandwidth parts of the network.
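The split into a coarse version plus successive enhancements can be sketched on one row of pixels. The sketch assumes the simple averaging form of the Haar wavelet, chosen only because it is the easiest to follow; the text does not prescribe a particular wavelet, and all names and sample values are hypothetical.

```python
def haar_step(signal):
    """One level: pairwise averages (coarse) and half-differences (detail)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    coarse = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return coarse, detail


def haar_decompose(signal, levels):
    """Return the coarsest signal and the detail bands, finest band first."""
    coarse, details = list(signal), []
    for _ in range(levels):
        coarse, detail = haar_step(coarse)
        details.append(detail)
    return coarse, details


def haar_reconstruct(coarse, details):
    """Invert haar_decompose, adding detail bands from coarsest to finest."""
    signal = list(coarse)
    for detail in reversed(details):
        expanded = []
        for avg, diff in zip(signal, detail):
            expanded.extend([avg + diff, avg - diff])
        signal = expanded
    return signal


row = [10, 12, 14, 16, 80, 82, 84, 86]   # length must be divisible by 2**levels
coarse, details = haar_decompose(row, levels=2)

# Full reconstruction uses every band; a low bandwidth receiver might
# replace the finest band with zeros and still get a usable preview.
exact = haar_reconstruct(coarse, details)
preview = haar_reconstruct(coarse, [[0.0] * 4, details[1]])

print(coarse)    # [13.0, 83.0]
print(preview)   # [11.0, 11.0, 15.0, 15.0, 81.0, 81.0, 85.0, 85.0]
```

The coarse band alone already captures the step between the dark and bright halves of the row, which is why it is the natural payload for the highest priority packets.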
This multi-resolution property of the coded image also supports graceful degradation in a noisy communications channel such as a wireless network or an otherwise unreliable network. The high priority packets containing the low resolution base image would be retransmitted if errors occur, while the enhancements would simply be discarded.
Content-based coding and accessibility is a further, new dimension within the realm of image compression. The ability to specify and manipulate specific regions of an image is not supported by previously disclosed coding techniques such as JPEG. Nor is content-based random accessibility a claimed functionality within any of the new wavelet based technologies. End user applications that require this feature include multimedia database query, Internet server-client interaction, image content production and editing, remote medical diagnostics, and interactive entertainment, to name a few.
Content-based query of multimedia databases requires the support of a mechanism that locates those imagery materials in which an object of interest is present. Content-based hyperlinking to Internet or local disk sites makes desired objects within an image serve as entry points for information navigation. Content-based editing enables a content producer to manipulate the attributes of the image materials in an object-oriented or region-based manner. Content-based interaction allows a digital content subscriber or a remote researcher to selectively control the image information transmission based on their regions of interest. In short, this content-based accessibility allows semantically meaningful visual objects to be used as the basis for image data representation, explanation, manipulation, and retrieval.
It is an object of the present invention to provide region-based coding in image compression. In accordance with an aspect of the instant invention there is provided a region-based method for encoding and decoding digital still images to produce a scalable, content accessible compressed bit stream comprising the steps: decomposing and ordering the raw image data into a hierarchy of multi-resolution sub-images; determining regions of interest; defining a region mask to identify regions of interest; encoding region masks for regions of interest; determining region masks for subsequent levels of resolution; and scanning and progressively sorting the region data on the basis of the magnitude of the multi-resolution coefficients.
In accordance with a further aspect of the instant invention there is provided an apparatus for the region-based encoding and decoding of digital still images that produces a scalable, content accessible compressed bit stream comprising: a means of decomposing and ordering the raw image data into a hierarchy of multi-resolution sub-images; means of determining regions of interest; means of defining a region mask to identify regions of interest; means of encoding region masks for regions of interest; means of determining region masks for subsequent levels of resolution; and a means for scanning and progressively sorting the region data on the basis of the magnitude of the multi-resolution coefficients.
In accordance with yet a further aspect of the instant invention there is provided a region-based system for encoding and decoding digital still images that produces a scalable, content accessible compressed bit stream and comprises the steps: decomposing and ordering the raw image data into a hierarchy of multi-resolution sub-images; determining regions of interest; defining a region mask to identify regions of interest; encoding region masks for regions of interest; determining region masks for subsequent levels of resolution; and scanning and progressively sorting the region data on the basis of the magnitude of the multi-resolution coefficients.
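Two of the recited steps, determining region masks for subsequent levels of resolution and sorting region data by coefficient magnitude, can be illustrated with a minimal sketch. The rule used here to derive a coarser mask (a cell belongs to the region if any of the four finer cells it covers does) is an assumption made for illustration and is not taken from the text; the function names, the mask and the coefficient values are likewise hypothetical.

```python
def downsample_mask(mask):
    """Half-resolution mask: set a cell if any of its 2x2 finer cells is set."""
    rows, cols = len(mask), len(mask[0])   # both assumed even
    return [[int(any(mask[r + dr][c + dc] for dr in (0, 1) for dc in (0, 1)))
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


def mask_pyramid(mask, levels):
    """Region masks for the full-resolution level and each coarser level."""
    pyramid = [mask]
    for _ in range(levels):
        pyramid.append(downsample_mask(pyramid[-1]))
    return pyramid


def sort_region_coefficients(coeffs, mask):
    """(row, col, value) for masked positions, largest magnitude first."""
    selected = [(r, c, coeffs[r][c])
                for r in range(len(mask))
                for c in range(len(mask[0]))
                if mask[r][c]]
    return sorted(selected, key=lambda item: abs(item[2]), reverse=True)


mask = [                 # 1 marks the region of interest
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
coeffs = [               # made-up multi-resolution coefficients
    [5, -1, 0, 2],
    [3, -9, 4, 1],
    [0, 6, -2, 7],
    [1, 0, 8, -3],
]

pyramid = mask_pyramid(mask, levels=2)
ordered = sort_region_coefficients(coeffs, mask)

print(pyramid[1])   # [[1, 1], [1, 1]]
print(ordered)      # [(1, 1, -9), (2, 1, 6), (1, 2, 4), (2, 2, -2)]
```

Emitting the region's coefficients in this order means that a truncated bit stream still carries the values that contribute most to the region, which is one way the progressive, scalable behavior described above could be obtained.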