1. Field of the Invention
The present invention relates generally to the field of data compression, and more particularly to a method and apparatus for compressing video image data. The method and apparatus of the present invention compress image data such that a high degree of compression is achieved, and when an image is recreated from the compressed data the resultant image is a high quality, visually appealing representation of the original image.
2. Description of the Prior Art
Data compression is of particular economic importance in the art of image storage and transmission, because images naturally require a large amount of data. A moderate-resolution monotone image, by today's standards, might consist of 640 picture elements (referred to as "PELs") per horizontal line and 480 horizontal lines, for a total of 307,200 PELs. If each PEL of the monotone picture requires one byte of data to describe it, a total of 307,200 bytes is required to describe just one image. To place this in perspective, a double-sided, double-density, five-and-one-quarter inch floppy disk could hold only a single such image. Further, at 9600 bits per second, the current practical maximum data rate for Public Switched Telephone Network ("PSTN", also "dial") lines, sending a single such image takes 256 seconds, without overhead or retransmissions for error correction. Without data compression, users must spend great amounts of time and money storing, sending, and receiving images.
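The storage and transmission figures in the preceding paragraph follow from simple arithmetic, which this Python sketch reproduces. The disk capacity constant is an assumption (the nominal 360 KB formatted capacity of a double-sided, double-density 5.25-inch PC disk) rather than a figure stated in the text, and the transmission time counts 8 bits per byte with no start, stop, or error-correction bits.

```python
# Cost of one uncompressed monochrome image, using the figures in the text:
# 640 x 480 PELs, one byte per PEL, and a 9600 bit/s dial line with no
# framing overhead or retransmissions.
PELS_PER_LINE = 640
LINES = 480
BYTES_PER_PEL = 1
LINE_RATE_BPS = 9600
# Assumption: nominal 360 KB (368,640-byte) formatted capacity of a
# double-sided, double-density 5.25-inch PC floppy disk.
FLOPPY_BYTES = 360 * 1024


def image_bytes(pels_per_line: int, lines: int, bytes_per_pel: int) -> int:
    """Uncompressed size of one image, in bytes."""
    return pels_per_line * lines * bytes_per_pel


def transmit_seconds(n_bytes: int, bits_per_second: int) -> float:
    """Raw transmission time in seconds, counting 8 bits per byte."""
    return n_bytes * 8 / bits_per_second


size = image_bytes(PELS_PER_LINE, LINES, BYTES_PER_PEL)
print(size)                                   # 307200
print(transmit_seconds(size, LINE_RATE_BPS))  # 256.0
print(FLOPPY_BYTES // size)                   # 1 (whole images per disk)
```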
Many methods of compressing image data exist and are well known to those skilled in the art. Some of these methods are reversible, also called lossless: upon decoding (decompressing) they exactly restore the original data without loss. Other methods are non-reversible. These non-reversible methods offer considerable compression, and, as will be discussed later, they can be combined with reversible methods for even more compression. However, non-reversible methods cause loss of data. In some cases, such as eliminating excess grey levels that are beyond the discrimination ability of the human visual system, or doing the same for excess color information, the picture degradation caused by the loss may be relevant only if electronic image processing is to be done. (Electronic image processing can digitally subtract one image from another or intensify differences within an image, and thus can make evident to the human visual system what was invisible without processing.) In general, however, the loss caused by these compression techniques is all too noticeable. There are two large groups of non-reversible compression techniques. One group applies transforms, such as a cosine transform, to local areas, truncates or eliminates some of the resulting coefficients to reduce the data, and uses this reduced data set to perform the inverse transform for decompression. These methods are good at eliminating changes with a high spatial "frequency", and such changes generate substantial amounts of the image data. The compression ratios are good, but the ability to convey fine detail in the image is severely impaired. These methods are also very computation-intensive, requiring heavy CPU power and/or much computation time. The other group applies variable coding techniques, whereby one or more codes apply to a collection of PELs.
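To make the transform-based group concrete, the following Python sketch applies a one-dimensional orthonormal discrete cosine transform (DCT-II) to a single row of PELs, discards the higher-frequency coefficients, and inverts the result. It illustrates the general prior-art idea only; the 8-PEL row, the helper names (dct, idct, truncate), and the number of coefficients kept are assumptions chosen for the example.

```python
import math


def _scale(k: int, n: int) -> float:
    """Orthonormal scale factor for coefficient k of an n-point DCT."""
    return math.sqrt((1 if k == 0 else 2) / n)


def dct(x):
    """Orthonormal 1-D DCT-II of a sequence of PEL values."""
    n = len(x)
    return [
        _scale(k, n) * sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(x))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct() (the orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_scale(k, n) * ck * math.cos(math.pi * (i + 0.5) * k / n) for k, ck in enumerate(c))
        for i in range(n)
    ]


def truncate(coeffs, keep):
    """Zero every coefficient above the lowest `keep` frequencies."""
    return [ck if k < keep else 0.0 for k, ck in enumerate(coeffs)]


row = [10, 10, 10, 10, 200, 200, 200, 200]   # one row containing a sharp edge
exact = idct(dct(row))                       # all 8 coefficients kept
coarse = idct(truncate(dct(row), 2))         # only the 2 lowest kept
print([round(v) for v in exact])             # [10, 10, 10, 10, 200, 200, 200, 200]
print([round(v) for v in coarse])            # edge spread across the whole row
```

Keeping all eight coefficients restores the row up to floating-point rounding, while keeping only two turns the 190-level step into a gradual ramp with undershoot and overshoot at the ends of the row, which is the loss of fine detail described above.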
One such prior art technique, which produces substantial compression with substantial image quality degradation, is block truncation coding. The image is divided into blocks, usually square (N by N PELs), and each block is analyzed to find a simple way to code it in less data than PEL-by-PEL coding requires. The data within each block are analyzed and compressed using predetermined compression criteria. Compressed data for each block are transmitted and subsequently processed at the receiving location.
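The text does not say which variant of block truncation coding it has in mind. As one concrete illustration, this Python sketch follows the classic moment-preserving formulation, in which each block is sent as a one-bit-per-PEL map plus two levels chosen so that the decoded block keeps the block's mean and variance. The function names and the sample block are hypothetical.

```python
import math


def btc_encode(block):
    """Moment-preserving block truncation coding of one block.

    `block` is a flat list of PEL values (e.g. a 4 x 4 block read row by row).
    Returns (low, high, bitmap); bitmap[i] is 1 where PEL i is at or above
    the block mean.
    """
    n = len(block)
    mean = sum(block) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in block) / n)
    bitmap = [1 if v >= mean else 0 for v in block]
    q = sum(bitmap)                       # number of PELs at or above the mean
    if q in (0, n):                       # flat block: one level is enough
        return mean, mean, bitmap
    low = mean - sigma * math.sqrt(q / (n - q))
    high = mean + sigma * math.sqrt((n - q) / q)
    return low, high, bitmap


def btc_decode(low, high, bitmap):
    """Rebuild the block: `high` where the bit is 1, `low` elsewhere."""
    return [high if b else low for b in bitmap]


block = [12, 12, 200, 200, 12, 40, 200, 180, 12, 12, 40, 200, 12, 12, 12, 40]
low, high, bits = btc_encode(block)
decoded = btc_decode(low, high, bits)
print(bits)
print(round(low, 1), round(high, 1))
```

The decoded block has the same mean and variance as the original, but intermediate values (here the 40s and the 180) are forced onto one of only two levels, which is the kind of degradation the text attributes to this technique.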
Various prior art image compression methods have attempted to maintain a high degree of image integrity by using various block shapes or smaller block sizes. Most block shapes, whether square, rectangular, or triangular, create a characteristic block border grid pattern throughout the reconstructed image. The block border grid is a false contouring phenomenon: truncating continuous grey-level images to discrete grey levels creates abrupt changes in contrast and/or color from block to block. The effect is accentuated because the human eye is proficient at pattern detection. The linear false contour grid pattern created by most data compression methods is easily detected by the eye and is very undesirable when viewing images.
Prior art systems of encoding data have typically encoded and transmitted each block of an image in one of three representations: 1. a solid tone representation of the block, 2. a bi-tonal representation of the block, or 3. all of the original image data of the block. Images are subdivided into blocks, and each block is analyzed to determine which one of the three representations above best describes it. Based on decision criteria, if a block is substantially solid, the block is encoded entirely as a solid, all PELs in the block being assigned a single value representing the PEL "average". If the block is substantially bi-tonal, the block is encoded in two tones (or values) and includes a block bit map. The block bit map consists of one bit per PEL of the block and tells the decoder which of the two values (or tones) should be assigned to each PEL. If the block fails to qualify as either solid or bi-tonal, then all of the data for the block is transmitted PEL-by-PEL, a process that is very data-intensive and therefore costly in storage space and transmission time.
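The three-way decision described above can be sketched in Python as follows. The text leaves the "decision criteria" unspecified, so the thresholds SOLID_RANGE and BITONE_RANGE, the split of a bi-tonal block at its mean, the one-byte value size, and the function names are all hypothetical choices for illustration.

```python
SOLID_RANGE = 16    # hypothetical: max grey-level spread for a "solid" block
BITONE_RANGE = 16   # hypothetical: max spread within each tone of a "bi-tonal" block
BITS_PER_VALUE = 8  # assumption: one byte per PEL value or tone value


def classify_block(block):
    """Code a block (flat list of integer PELs) in one of the three prior-art forms."""
    n, total = len(block), sum(block)
    if max(block) - min(block) <= SOLID_RANGE:
        return ("solid", round(total / n))            # single "average" value
    # Split at the block mean; integer arithmetic keeps the comparison exact,
    # and since the block is not flat both groups are guaranteed non-empty.
    bitmap = [1 if v * n >= total else 0 for v in block]
    dark = [v for v, b in zip(block, bitmap) if not b]
    light = [v for v, b in zip(block, bitmap) if b]
    if max(dark) - min(dark) <= BITONE_RANGE and max(light) - min(light) <= BITONE_RANGE:
        return ("bitonal", round(sum(dark) / len(dark)), round(sum(light) / len(light)), bitmap)
    return ("raw", list(block))                       # PEL-by-PEL fallback


def coded_bits(code):
    """Payload size of a coded block, ignoring the bits that signal its mode."""
    if code[0] == "solid":
        return BITS_PER_VALUE
    if code[0] == "bitonal":
        return 2 * BITS_PER_VALUE + len(code[3])      # two tones + 1 bit per PEL
    return BITS_PER_VALUE * len(code[1])


print(classify_block([100] * 16))                     # ('solid', 100)
bi = classify_block([20] * 8 + [220] * 8)
print(bi[:3], coded_bits(bi))                         # ('bitonal', 20, 220) 32
raw = classify_block(list(range(0, 160, 10)))
print(raw[0], coded_bits(raw))                        # raw 128
```

For a 16-PEL block under these assumptions, solid coding needs 8 bits, bi-tonal coding 32 bits (two tones plus a 16-bit map), and PEL-by-PEL coding the full 128 bits, which is why the third mode is so costly.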
Additionally, most block coding compression methods compress each block of data independently of the surrounding blocks, which further enhances the false contour grid pattern. For example, when an image block is compressed without regard for its nearest neighbors, it is common for a continuous gradient that spanned several blocks before compression to become a series of discrete levels with an abrupt step at each border. Truncating areas of continuously varying contrast thus reinforces the false contour grid pattern.
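The stair-step effect can be reproduced on a single row. In this Python sketch (the helper names are hypothetical), a smooth gradient is split into 4-PEL blocks and each block is solid-coded on its own, with no reference to its neighbors.

```python
def code_blocks_independently(row, block_len):
    """Solid-code each run of `block_len` PELs with its own rounded average,
    ignoring the neighbouring blocks entirely."""
    out = []
    for start in range(0, len(row), block_len):
        blk = row[start:start + block_len]
        out.extend([round(sum(blk) / len(blk))] * len(blk))
    return out


def largest_step(row):
    """Largest absolute difference between horizontally adjacent PELs."""
    return max(abs(b - a) for a, b in zip(row, row[1:]))


ramp = list(range(0, 64, 4))                 # smooth gradient: 0, 4, 8, ..., 60
coded = code_blocks_independently(ramp, 4)
print(coded)  # [6, 6, 6, 6, 22, 22, 22, 22, 38, 38, 38, 38, 54, 54, 54, 54]
print(largest_step(ramp), largest_step(coded))  # 4 16
```

Adjacent PELs in the original row differ by 4 grey levels; after independent coding, the differences inside each block vanish and are concentrated into steps of 16 at the block borders.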
Prior art image compression systems have tried to reduce the obviousness of the characteristic grid by applying "smoothing" processes. These processes adjust PEL values to reduce the sharp edge effects where the blocks meet. In an attempt to refine block smoothing, many complex smoothing methods have been developed. While many of these processes do a fair job of smoothing, they are burdened with time-intensive computations and often adjust so much that the resulting images appear fuzzy. Even though these techniques have helped to reduce the obviousness of the false contour grid pattern, in many images the pattern is still obvious. Further, the typical smoothing process degrades a compressed image by smoothing high-contrast image borders, so that they are no longer distinct and appear out of focus or fuzzy in the reconstructed image. Fine detail is lost, just as in the use of transforms.
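The trade-off can be seen with a deliberately simple smoothing rule. The Python sketch below is hypothetical and does not represent any particular prior-art system: it replaces the PEL on each side of every block border with the average of that PEL and its two horizontal neighbors.

```python
def smooth_block_borders(row, block_len):
    """Replace the PEL on each side of every block border with a 3-tap average.

    Requires len(row) to be a multiple of block_len and block_len >= 2, so
    every averaging window stays inside the row.
    """
    if block_len < 2 or len(row) % block_len:
        raise ValueError("row length must be a multiple of block_len >= 2")
    out = list(row)
    for border in range(block_len, len(row), block_len):
        for i in (border - 1, border):
            # Average over the ORIGINAL row, so updates do not feed each other.
            out[i] = round((row[i - 1] + row[i] + row[i + 1]) / 3)
    return out


false_step = [6, 6, 6, 6, 22, 22, 22, 22]      # artificial step from block coding
true_edge = [0, 0, 0, 0, 255, 255, 255, 255]   # genuine high-contrast edge
print(smooth_block_borders(false_step, 4))     # [6, 6, 6, 11, 17, 22, 22, 22]
print(smooth_block_borders(true_edge, 4))      # [0, 0, 0, 85, 170, 255, 255, 255]
```

The rule softens the artificial step produced by block coding, but it equally spreads a genuine high-contrast edge over three PEL transitions, which is the fuzziness described above.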
In contrast, the present invention overcomes the foregoing limitations by providing a block-style method that minimizes image degradation yet provides substantial compression.
It has been discovered that the characteristic block grid can be reduced by using sub-areas with certain specific properties. The sub-area must be small in order to minimize the block effects referred to earlier. This might appear to reduce compression since, for example, the compression gain for a solid block is directly equal to the number of PELs in the block. In fact, selecting a small block can increase compression: as blocks grow larger, more and more of them must be coded as bi-tonal, which reduces compression by a factor of two to three relative to solid coding, or, even worse, PEL-by-PEL, which gives compression worse than unity.
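The relation between block size and coding mode can be tabulated under simple assumptions that the text does not state: 8-bit PEL and tone values, a bi-tonal block sent as two tones plus a 1-bit-per-PEL map, and an optional per-block mode flag. The function name block_ratios is hypothetical.

```python
def block_ratios(n, bits_per_pel=8, flag_bits=0):
    """Compression ratio (original bits / coded bits) for one block of n PELs.

    Assumptions: each PEL or tone value is bits_per_pel bits, a bi-tonal
    block carries two tones plus a 1-bit-per-PEL map, and every block may
    carry flag_bits of mode-signalling overhead.
    """
    original = n * bits_per_pel
    return {
        "solid": original / (bits_per_pel + flag_bits),
        "bitonal": original / (2 * bits_per_pel + n + flag_bits),
        "pel_by_pel": original / (original + flag_bits),
    }


for n in (4, 8, 16, 64):
    r = block_ratios(n)
    # block size, solid ratio, bi-tonal ratio, solid-to-bi-tonal penalty
    print(n, r["solid"], round(r["bitonal"], 2), round(r["solid"] / r["bitonal"], 2))
# 4 4.0 1.6 2.5
# 8 8.0 2.67 3.0
# 16 16.0 4.0 4.0
# 64 64.0 6.4 10.0
```

Under these assumptions the solid-to-bi-tonal penalty is (16 + n) / 8 = 2 + n/8, which falls in the two-to-three range for blocks of up to 8 PELs and grows with block size. Any nonzero flag_bits makes PEL-by-PEL coding worse than unity.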
The sub-areas should be maximally compact. Where images are roughly statistically isotropic (have no preferred directions as far as image detail is concerned) and the PELs are square (have equal center-to-center spacing in the horizontal and vertical directions), the preferred shape is a circle. Since PEL-to-PEL correlation in typical images (those containing visual information from a human perspective) statistically increases as PEL-to-PEL distance decreases, a round sub-area offers the highest possible PEL-to-PEL correlations on a statistical basis, and thus the best chance of achieving maximum compression with minimum image degradation.
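The compactness argument can be quantified with a simple proxy. Assuming, as the text does, that PEL-to-PEL correlation falls as distance grows, a shape whose PELs are on average closer together should correlate better. This Python sketch compares the mean squared distance between PEL centers for three 21-PEL shapes on a square grid: a digital disc (the lattice approximation of a circle), a 3 x 7 rectangle, and a 1 x 21 line. Mean squared distance is an assumed stand-in for correlation, and the helper names are hypothetical.

```python
from itertools import combinations


def mean_sq_distance(pels):
    """Mean squared Euclidean distance over all unordered pairs of PEL centres."""
    pairs = list(combinations(pels, 2))
    return sum((x1 - x2) ** 2 + (y1 - y2) ** 2 for (x1, y1), (x2, y2) in pairs) / len(pairs)


def rectangle(width, height):
    """All PELs of a width x height rectangle."""
    return [(x, y) for y in range(height) for x in range(width)]


def digital_disc(radius_sq):
    """PELs whose centres lie within a circle of squared radius `radius_sq`."""
    r = int(radius_sq ** 0.5)
    return [(x, y) for y in range(-r, r + 1) for x in range(-r, r + 1)
            if x * x + y * y <= radius_sq]


disc = digital_disc(5)          # 21 PELs: a 5 x 5 square minus its four corners
for name, shape in (("disc", disc), ("3x7", rectangle(7, 3)), ("1x21", rectangle(21, 1))):
    print(name, len(shape), round(mean_sq_distance(shape), 2))
# disc 21 6.8
# 3x7 21 9.8
# 1x21 21 77.0
```

With the PEL count held fixed, the rounder the shape, the closer its PELs are on average, which is the property the text relies on when it prefers a circle.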
The sub-areas should completely tile the image area. Otherwise, some PELs will be left out of the collection of all sub-areas. Either the data of these PELs will be lost, leaving "holes" in the image, or they will have to be handled on a special basis. If there were any significant quantity of such PELs, the special handling would decrease the effective compression.
The sub-areas should tile in such a way that linear inter-sub-area borders of any significant length are minimized or eliminated. In other words, the sub-areas should interlock to create non-linear boundaries. The interlocking geometric areas minimize the linear grid pattern that prior art block compression methods leave in the reconstructed image, resulting in a more visually appealing image.
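The tiling and interlocking requirements can both be checked mechanically. The Python sketch below verifies that a set of sub-areas covers an image exactly once, with no holes and no overlaps, and measures the longest straight run of inter-sub-area border. The plus-shaped five-PEL tile used for comparison is a hypothetical stand-in chosen only because it is a well-known interlocking tiling of the square grid; it is not the sub-area of the present invention, whose shape the text does not give here. All function names are likewise hypothetical.

```python
def tile(width, height, owner):
    """Group the PELs of a width x height image into sub-areas by owner(x, y)."""
    areas = {}
    for y in range(height):
        for x in range(width):
            areas.setdefault(owner(x, y), set()).add((x, y))
    return list(areas.values())


def covers_exactly_once(width, height, sub_areas):
    """True if the sub-areas leave no holes and never claim a PEL twice."""
    seen = set()
    for area in sub_areas:
        if seen & area:
            return False                   # overlap: a PEL claimed twice
        seen |= area
    return seen == {(x, y) for y in range(height) for x in range(width)}


def longest_straight_border(width, height, owner):
    """Longest straight run, in PEL edges, of border between different sub-areas."""
    best = 0
    for x in range(1, width):              # vertical lines between columns x-1 and x
        run = 0
        for y in range(height):
            run = run + 1 if owner(x - 1, y) != owner(x, y) else 0
            best = max(best, run)
    for y in range(1, height):             # horizontal lines between rows y-1 and y
        run = 0
        for x in range(width):
            run = run + 1 if owner(x, y - 1) != owner(x, y) else 0
            best = max(best, run)
    return best


def square_owner(x, y):
    """4 x 4 square blocks, as in conventional block coding."""
    return (x // 4, y // 4)


def plus_owner(x, y):
    """Plus-shaped 5-PEL sub-areas; each PEL maps to the centre of its plus."""
    offsets = {0: (0, 0), 1: (-1, 0), 4: (1, 0), 2: (0, -1), 3: (0, 1)}
    dx, dy = offsets[(x + 2 * y) % 5]
    return (x + dx, y + dy)


for owner in (square_owner, plus_owner):
    areas = tile(8, 8, owner)
    print(owner.__name__, covers_exactly_once(8, 8, areas), longest_straight_border(8, 8, owner))
# square_owner True 8
# plus_owner True 2
```

On an 8 x 8 image, the square blocks produce straight borders running the full width and height of the image, while the interlocking tiles never produce a straight border longer than two PEL edges.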
It has further been discovered that at least one specific shape and size of sub-area, applied to a specific size and shape of image PEL array, is so effective that high quality images can be provided using only two of the three common modes of coding blocks: solid tone and bi-tone. This eliminates data-intensive PEL-by-PEL coding, further aiding compression.
It has also been discovered that this sub-area lends itself to a simple smoothing method that only minimally alters apparent edge and detail sharpness and that requires only minimal computation.
A much less computation-intensive method of deriving indices for solid-tone and bi-tone blocks has also been discovered.
Accordingly, it is an object of the present invention to provide a method and apparatus for compressing image data using interlocking geometric sub-areas of the original image, resulting in greater visual appeal of the reconstructed image.
A further object of the present invention is to divide the image area into sub-areas such that an image may be encoded using two block representations instead of the three commonly used in the prior art. The two representations are solid and bi-tonal, which eliminates the extremely time-intensive transmission of all of the data within a block.
Another object of the present invention is to provide a simple, non-calculation-intensive method of smoothing the boundaries where the interlocking geometric areas meet in a reconstructed image, thus further improving the image appearance.
Yet another object of the present invention is to provide a method of deriving the sub-area indices that is not calculation-intensive, yet yields a superior representation.