Images are conventionally represented by a two-dimensional array of values in which each value represents a property of the image at a corresponding point on the image. In the case of gray-scale images, a single number is stored representing the gradations of intensity from white to black, referred to as the gray scale. In the case of color images, each "value" is a vector whose components represent the gradations in intensity of the various primary colors, or some alternative color code, at the corresponding point in the image.
This representation of an image corresponds to the output of a typical image-sensing device such as a television camera. Such a representation is convenient in that it is easily regenerated on a display device such as a CRT. However, it has at least two shortcomings. First, the number of bits needed to represent the data is prohibitively large for many applications. Second, if the image is to be processed to extract features that are arranged in the same order of importance as that perceived by a person viewing the image, the amount of processing needed can be prohibitively large.
The number of bits needed to store a typical image is sufficiently large to limit the use of images in data processing and communication systems. A single 512×512 gray-scale image with 256 gray levels requires in excess of 256,000 bytes. A small-scale computer user is limited to disk storage systems having a capacity of typically 300 Mbytes. Hence, fewer than 1200 images can be stored without utilizing some form of image compression.
Similarly, the transmission of images over conventional telephone circuitry is limited by the large number of bits needed to represent the image. If an 8×11 inch image were digitized to 256 gray levels at 200 dots per inch (the resolution utilized in typical FAX transmissions), in excess of 28 million bits would be required. Normal consumer-quality analog telephone lines are limited to a digital communication rate of 9600 bits per second. Hence, the transmission of the image would require in excess of 45 minutes in the absence of some form of image compression.
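Both the storage figure of the preceding paragraph and the transmission figure above can be checked with a few lines of arithmetic; the script below is purely illustrative and simply restates the numbers quoted in the text.

```python
# Storage: a 512x512 gray-scale image at 8 bits (256 levels) per pixel.
image_bytes = 512 * 512          # one byte per pixel -> 262,144 bytes
disk_bytes = 300 * 10**6         # a typical 300-Mbyte disk system
images_per_disk = disk_bytes // image_bytes   # fewer than 1200 images

# Transmission: an 8x11-inch page scanned at 200 dots per inch,
# 256 gray levels (8 bits per dot), sent over a 9600 bit/s line.
bits = (8 * 200) * (11 * 200) * 8    # in excess of 28 million bits
minutes = bits / 9600 / 60           # ≈ 49 minutes, i.e. more than 45

print(image_bytes, images_per_disk, bits, round(minutes))
```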
The need to reduce the data required to represent an image has led to numerous image compression methods. These methods can be conveniently divided into two classes, invertible and non-invertible methods. The invertible methods reduce redundancy but do not destroy any of the information present in the image. These methods transform the two-dimensional array into a form requiring fewer bits to store. The original two-dimensional array is generated by the inverse transformation prior to display. The regenerated image is identical to the original image.
Consider a gray-scale image which has been digitized to two gray levels, black and white. Such an image is often referred to as a binary image, since the gray level of each pixel is either a one or a zero. Hence, the image consists of a two-dimensional array of bits. A one-dimensional list of bits can be generated from the two-dimensional array by copying, in order, each row of the two-dimensional array into the one-dimensional array. It has been observed that the one-dimensional array has long runs of ones or zeros. Consider a run of 100 ones. One hundred bits are required to represent the run in the one-dimensional array. However, the same 100 bits could be represented by a 7-bit counter value specifying the length of the run and the value "one" specifying the repeated gray level. Hence, the 100 bits can be reduced to 8 bits. This is the basis of a transformation of the one-dimensional array in which the transformed image consists of a sequence of paired values, each pair consisting of a count and a bit value.
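The run-length transformation just described can be sketched in a few lines; this is a minimal illustration with hypothetical function names, not the coding used by any particular facsimile or storage standard. Counts are capped at 127 so that each pair fits the 7-bit-count-plus-1-bit-value format described above.

```python
def run_length_encode(bits):
    """Encode a list of 0/1 values as (count, value) pairs.

    Counts are capped at 127 so each pair fits in 8 bits:
    a 7-bit count plus the 1-bit gray level."""
    pairs = []
    for b in bits:
        if pairs and pairs[-1][1] == b and pairs[-1][0] < 127:
            pairs[-1] = (pairs[-1][0] + 1, b)   # extend the current run
        else:
            pairs.append((1, b))                # start a new run
    return pairs

def run_length_decode(pairs):
    """Invert the encoding: expand each (count, value) pair."""
    return [b for count, b in pairs for _ in range(count)]

row = [1] * 100 + [0] * 20              # a run of 100 ones, then 20 zeros
encoded = run_length_encode(row)
assert run_length_decode(encoded) == row   # the transform is invertible
```

Here the 120 input bits become two 8-bit pairs, 16 bits in all, for a compression ratio of 7.5.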
One can define a compression ratio which is the ratio of the number of bits in the original two-dimensional array to the number of bits needed to store the transformed image. For typical binary images, compression ratios of the order of 5 to 10 can be obtained utilizing these methods. However, the gains obtained decrease rapidly as the number of gray levels is increased. For larger numbers of gray levels, the probability of finding repeated runs of the same gray level decreases. Each time the gray level changes, a new pair of values must be entered into the file. As a result, compression ratios exceeding 3 are seldom obtained for invertible compression of gray-level images.
Higher compression ratios can be obtained if non-invertible compression methods are utilized. In such methods, the image regenerated by the inverse transformation is not identical to the original image.
Conventional error measures are of only limited value in comparing two non-invertible compression methods. For example, consider two transformations. The first transformation results in random noise being introduced into the regenerated image. That is, each pixel in the regenerated image differs from that in the original image by an amount which is randomly distributed between two values. The second transformation adds a constant, equal to the difference between those two values, to each pixel in a narrow band across the image. The second transformation introduces a root-mean-squared error which is significantly less than that introduced by the first transformation. However, the regenerated image produced by the first transformation is far more agreeable or satisfactory to a human viewer than that produced by the second transformation.
The typical prior art non-invertible image compression methods can be divided into two steps. In the first step, two sets of numerical coefficients, p_i and q_j, are derived from the image by fitting the image to a linear expansion of the form

I(x,y) = Σ_i p_i F_i(x,y) + Σ_j q_j G_j(x,y)
As will be explained in more detail below, the basis functions F_i and G_j are chosen such that the most "important" information contained in the image is represented by the p's and the least important information is represented by the q's. The transformation in question is invertible in the sense that, given an N×N set of pixels I(x_i,y_j), one can determine a total of N² coefficients p_i and q_j that will exactly reproduce the N² values I(x_i,y_j). Since there are N² pixels and N² coefficients, the set of coefficients requires the same number of bits to store as the image if the transform coefficients are stored to the same precision as the image pixel intensities. Hence, the transformation alone does not produce any compression of the image.
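The exact invertibility described here can be illustrated with the smallest non-trivial case. The text does not fix a particular basis; as an assumed example we use the orthonormal 2×2 Haar transform, in which the low-pass average plays the role of a p coefficient and the three detail coefficients play the role of q's. Four pixels yield four coefficients, and the inverse recovers the pixels exactly.

```python
def haar_2x2_forward(p00, p01, p10, p11):
    """Orthonormal 2x2 Haar transform: 4 pixels -> 4 coefficients.
    The low-pass average is a "p"; the three details are "q"s."""
    a = (p00 + p01 + p10 + p11) / 2   # low-pass average
    h = (p00 - p01 + p10 - p11) / 2   # horizontal detail
    v = (p00 + p01 - p10 - p11) / 2   # vertical detail
    d = (p00 - p01 - p10 + p11) / 2   # diagonal detail
    return a, h, v, d

def haar_2x2_inverse(a, h, v, d):
    """Exact inverse: the N^2 coefficients reproduce the N^2 pixels.
    (The transform matrix is orthonormal and symmetric, so the
    inverse has the same form as the forward transform.)"""
    p00 = (a + h + v + d) / 2
    p01 = (a - h + v - d) / 2
    p10 = (a + h - v - d) / 2
    p11 = (a - h - v + d) / 2
    return p00, p01, p10, p11

pixels = (10.0, 12.0, 14.0, 20.0)
assert haar_2x2_inverse(*haar_2x2_forward(*pixels)) == pixels
```

Note that each detail row sums to zero, so a constant image is represented entirely by the low-pass coefficient.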
In the second step of the image compression method, the coefficients p_i and q_j are quantized. The number of bits used to represent each p_i is greater than that used to represent each q_j, since the p_i represent the most important information in the image. Thus the p_i's can be recovered more accurately than the q_j's. The reduced precision utilized in the representation of the q_j's and p_i's is the source of the non-invertibility of the transformation.
For the above-discussed technique to be useful, the image transformation must separate the information into coefficients having the property that the different sets of coefficients contain image information of different importance. It is known that the most subjectively important image information is contained in the low spatial frequency components of the image. Hence, the functions F_i(x,y) must be limited in their spatial frequency response to lower frequencies than the functions G_j(x,y). If this condition is satisfied, then the coefficients p_i will represent more subjectively important information than the coefficients q_j.
In addition to the low spatial frequency information, specific images may have information in high-frequency components that is also important. Edges typically contribute to the high spatial frequency data. An image with a large number of edges oriented in a specific direction will therefore have a significant high-frequency component if one or more of the G_j functions represents edges having the orientation in question. Hence, it would be advantageous to be able to divide the G_j functions into sub-classes such that one or more of the sub-classes can be quantized with increased accuracy when the coefficients associated with the sub-class indicate that a significant amount of information is contained in that sub-class.
In addition, systems of basis functions which reflect or approximate the structures found in typical images are more likely to require fewer terms to represent an image with a given degree of fidelity. It is known that images tend to include structures having limited spatial extent which vary in intensity smoothly over the structure. Hence, sets of basis functions in which the F_i can approximate compact objects having intensities proportional to low order polynomials, i.e., constant, linearly, and quadratically varying surfaces, would be advantageous. If the basis functions are orthonormal, this is equivalent to requiring that at least the low order moments of each of the basis functions G_j vanish, for then the low order polynomial information is represented by the basis functions F_i.
To better understand the cause of the errors and the manner in which the transformation in the first step of the compression affects the type of errors, the manner in which the quantization is performed will now be discussed in more detail. To simplify the discussion, it will be assumed that only the p_i coefficients are quantized to more than zero bits. That is, zero bits will be allocated for each of the q_j. It will also be assumed that each coefficient p_i will be allocated K bits. Let P_min and P_max be the minimum and maximum values, respectively, of the set of parameters {p_i}. In the simplest case, 2^K equally spaced levels, denoted by L_k, are defined between P_min and P_max. Each coefficient p_i is then replaced by an integer, k, where L_k ≤ p_i < L_{k+1}. These integers, or a suitably coded version thereof, are stored in place of the coefficients p_i.
An approximation to the image, I'(x,y), can be reconstructed from the compressed representation, where

I'(x,y) = Σ_i p'_i F_i(x,y)
where, for purposes of illustration, p'_i = (L_k + L_{k+1})/2, the average value of the two levels. Here, k is the integer stored in place of p_i.
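A minimal sketch of this quantization scheme, assuming the simple equally-spaced placement of levels between P_min and P_max described above (the function names are ours, for illustration only):

```python
def quantize(coeffs, K):
    """Uniform quantizer: map each coefficient to one of 2^K equally
    spaced bins between the minimum and maximum coefficient value.
    Assumes the coefficients are not all equal."""
    lo, hi = min(coeffs), max(coeffs)
    step = (hi - lo) / 2**K
    # Bin index k such that L_k <= p < L_{k+1}; clamp the maximum in.
    return [min(int((p - lo) / step), 2**K - 1) for p in coeffs], lo, step

def dequantize(indices, lo, step):
    """Reconstruct each coefficient as the midpoint of its bin,
    p' = (L_k + L_{k+1}) / 2."""
    return [lo + (k + 0.5) * step for k in indices]

coeffs = [0.1, 0.9, 0.35, 0.6, 0.0, 1.0]
indices, lo, step = quantize(coeffs, K=3)       # 8 levels over [0, 1]
recovered = dequantize(indices, lo, step)
# The error is never more than half the level spacing:
assert all(abs(p - q) <= step / 2 for p, q in zip(coeffs, recovered))
```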
From the above discussion, it will be apparent that an error of as much as half of the level spacing may be introduced by this quantization procedure. The above example set the levels with reference to P_min and P_max to simplify the discussion. Other methods for placing the 2^K levels so as to minimize the overall error resulting from the quantization process are known in the art. In general, these methods utilize the variance of the set of values {p_i} and the statistical distribution of the values to set the level spacing. In this case, the level spacing is proportional to the variance. Hence, the larger the variance, the larger the quantization error. In general, the variance is determined by the image being quantized and the invertible transformation used to calculate the coefficients.
Hence, it is advantageous to provide an invertible transformation which minimizes the variance of the coefficients to be quantized. For a given image, it can be shown that the variance of the sets of coefficients {p_i} and {q_j} will be reduced if the basis functions F_i(x,y) and G_j(x,y) form an orthonormal basis for the two-dimensional image space.
The above-described properties of the image transformation allow one to reduce the errors introduced at the quantization stage. However, there will always be errors. The manner in which these errors influence the reconstructed image will depend on the basis functions F_i(x,y) and G_j(x,y).
It is useful to distinguish the various classes of basis functions by the fraction of the image over which each of the basis functions is non-zero. This will be referred to as the support of the basis function.
If the basis functions have support which is of the same order of size as the image itself, then a quantization error in one coefficient will affect every point in the reconstructed image. This leads to aliasing errors in the reconstructed image. Such errors are subjectively very objectionable. For example, a quantization error in a single coefficient could lead to "stripes" extending across the entire reconstructed image.
A second problem occurs with basis functions having large support. As noted above, images tend to contain structures whose spatial extent is small compared to the size of the image. To represent such a structure with basis functions that have support which is much larger than the structure in question often requires the superposition of many such basis functions. Hence, the number of coefficients which contain useful information is likely to be larger if basis functions having support which is much larger than the objects found in the image are used.
If, on the other hand, the basis functions have support which is small, then a quantization error will only affect a small area of the reconstructed image. This leads to errors whose consequences are localized in the images and are more like random noise. As noted above, random noise errors can be incurred without producing a subjectively objectionable image.
The quantization errors are also related to the transform's ability to concentrate the essential image information into a set of coefficients which represents that information using the smallest number of bits. The number of bits needed to represent the image is the sum of two numbers. First, bits must be allocated to communicate the values of the coefficients produced by the transform which are saved for use in reconstructing the image. The second number is the number of bits needed to communicate which of the possible coefficients were saved after transformation. For any given compression ratio, the total number of available bits for representing the compressed image is fixed. Bits used for labels are unavailable for storing coefficient values. Hence, if a significant number of bits are needed for storing the label information, the number of bits available for quantization will be significantly reduced. This reduction in bits used for quantization results in larger quantization errors.
In general, the transformation of an N×N image can produce N² coefficients. The coefficients could be listed by giving, for each coefficient, a label specifying its identity and a number representing its value. If the transformation is effective, most of these coefficients will be zero or very small numbers. The minimum number of bits needed to represent the useful coefficients in this format would be the number needed to specify the label associated with each non-zero coefficient plus the number of bits needed to adequately represent the value of that coefficient. One problem with this type of representation is the number of bits needed to communicate the labels.
Consider a 1000×1000 pixel image in which each pixel is digitized to 8 bits, i.e., 256 gray levels. Suppose a transformation were available which produced only 1000 non-zero coefficients out of the possible 1,000,000. Assume further that these coefficients are to be quantized to 8 bits. Specifying the identity of each coefficient will require approximately 20 bits. Hence the total number of bits needed to specify the image in the compressed format would be 20,000+8,000=28,000 bits. As a result, a compression ratio of less than 300:1 is obtained. In principle, a compression ratio of 1000:1 would be obtainable if one did not have to allocate bits for storing the labels.
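The bit accounting in this example can be verified directly; the following lines are purely illustrative arithmetic restating the numbers above.

```python
import math

n_pixels = 1000 * 1000
# Bits needed to label one of 1,000,000 possible coefficient
# positions: ceil(log2(1,000,000)) = 20 bits.
label_bits = math.ceil(math.log2(n_pixels))
n_kept = 1000                                     # non-zero coefficients
value_bits = 8                                    # 8-bit quantization

original_bits = n_pixels * 8                      # 8,000,000
with_labels = n_kept * (label_bits + value_bits)  # 20,000 + 8,000 = 28,000
without_labels = n_kept * value_bits              # 8,000

print(original_bits / with_labels)     # ≈ 286, i.e. less than 300:1
print(original_bits / without_labels)  # 1000:1 if no labels were needed
```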
Hence, in addition to concentrating the image information into a small number of coefficients, the transformation coefficients in question should form a subset whose identity does not depend on the specific image being transformed. In this case, the entire subset will be quantized. Since the position of a coefficient in the subset is equivalent to a label, there is no need to allocate bits for a label specifying the identity of each coefficient.
Prior art methods make use of the observation that the most important information appears to be concentrated in the low spatial frequencies for images. In prior art compression methods, the image is typically transformed to produce one set of coefficients representing the low frequency information and a plurality of sets of coefficients representing different high spatial frequency information. The individual sets are then quantized with differing degrees of precision. The only labeling information is then that required to identify the individual subsets. In this manner, the information needed to specify the "label" for each coefficient is reduced.
This reduction in label storage bits, however, results in an increase in the number of bits needed to represent the coefficients. In this scheme, all of the coefficients in any given subset are treated in the same manner. Hence, all of the coefficients in a particular subset must be quantized if any of the coefficients contains significant information.
Thus, it is advantageous to have a transformation which provides as much flexibility as possible in grouping the coefficients into subsets prior to quantization. In the simple transformation described above, an N×N image is transformed to provide N_p low frequency coefficients p_i and N_q high frequency coefficients q_j. Ideally, one would like to find the smallest N_p for which all of the useful information is in the p_i's. In this case one would merely quantize the p_i and ignore the second subset, i.e., the q_j's.
As a result, it is advantageous to provide a transformation that allows one to choose the ratio N_p/N_q. In general, image transformations do not allow one to set this ratio with arbitrary precision. Instead, one is presented with a number of steps. For example, in the solution taught by Adelson, et al. discussed below, ratios of 4, 16, 64, 256, and so on are provided. Transformations in which the steps in question are smaller are more desirable than those in which the steps are larger.
Although a number of transformations have been utilized in the prior art, none have provided all of these advantageous features. For example, one class of transformations is based on the Fourier expansion for the image transformation. The Fourier series basis functions have the advantage of providing an orthonormal basis for the image space. However, these functions have a number of disadvantages. First, the support of every Fourier basis function is the entire image. That is, each basis function contributes to the entire image. As a result, quantization errors tend to produce aliasing errors which are subjectively unsatisfactory. Second, the computational work load to compute the Fourier transform is of order n log(n), where n is the number of pixels in the original image. To overcome this difficulty, the image is often divided into smaller sub-images which are pieced together after reconstruction. This procedure reduces the computational work load, but leads to other undesirable artifacts in the reconstructed image.
A second prior art solution to the image transformation problem is taught in U.S. Pat. No. 4,817,182 by Adelson, et al. In the method taught by Adelson, et al., the image is processed by a quadrature mirror filter bank (QMF) to produce four filtered images which are, in reality, four sets of coefficients. Three of the sets of coefficients are analogous to the q_j discussed above in that they represent high spatial frequency information. The fourth set of coefficients is analogous to the p_i in that these coefficients represent low-frequency information. The number of coefficients in each set is the same. Adelson, et al. teach treating the low-frequency components as an image of one quarter the size and then reprocessing the low-frequency coefficients using the same QMF. This iterative procedure leads to a pyramid of coefficient sets with a low-frequency set at the top and three sets of high-frequency coefficients at each level below the top. Each level represents information having higher frequency than the level above it.
It may be shown that this method is equivalent to the transformation discussed above with the q_j being further divided into different subsets. Although Adelson, et al. refer to their QMF method as being equivalent to expanding the image in an orthonormal basis set, this method does not provide the claimed orthonormal expansion. The basis functions corresponding to a QMF are, by definition, symmetric. Using this property of the QMF basis functions, it can be shown that the QMF basis functions can not be an orthonormal set. Hence, this method does not provide the advantages of an orthonormal transformation of the image.
Another problem with the method taught by Adelson, et al. is that no means for selecting the QMF filter in terms of the image properties is taught. As a result, the basis functions taught therein do not optimally represent smoothly varying image features. For example, these basis functions can not exactly represent quadratic surfaces.
A still further problem with the method taught by Adelson, et al. is the inability to finely tune the compression ratio. Each time the filtering operation is applied, the number of coefficients in the {p_i} set decreases by a factor of four. Hence, one only has the option of choosing compression ratios that are powers of four if only the {p_i} are to be quantized. As a result, one must accept a very low compression ratio or devise a strategy for determining which of the sets of {q_j} are to be quantized.
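The coarseness of the available ratios follows directly from the pyramid arithmetic. The sketch below (illustrative only; the function name is ours, not Adelson et al.'s) tabulates the coefficient-set sizes when each pass splits the current low-frequency image into four quarter-size bands, as described above.

```python
def qmf_pyramid_sizes(n, levels):
    """Coefficient-set sizes for an n x n image after `levels`
    passes of the four-band split described above."""
    sizes = []
    low = n * n
    for _ in range(levels):
        low //= 4                     # each band is one quarter the size
        sizes.append((low, 3 * low))  # (low-pass set, three high-pass sets)
    return sizes

# A 512x512 image, three levels deep: the low-pass set shrinks by a
# factor of four per level, so quantizing only that set restricts the
# compression ratio to 4, 16, 64, ...
for level, (low, high) in enumerate(qmf_pyramid_sizes(512, 3), 1):
    print(level, low, high)
```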
Broadly, it is an object of the present invention to provide an improved apparatus and method for coding an image such that the coded image can be represented in fewer bits than the original image.
It is a further object of the present invention to provide an apparatus and method which utilize an orthonormal transformation of the image.
It is yet another object of the present invention to provide an apparatus and method in which the transformation utilizes basis functions having support which is small compared to the size of the image.
It is a still further object of the present invention to provide an apparatus and method in which the compression ratio may be selected in increments other than powers of four.
It is yet another object of the present invention to provide an apparatus and method in which the transformation utilizes basis functions that can adequately represent smoothly varying image features, such as those described by low order polynomials.
It is a still further object of the present invention to provide an apparatus and method in which the transformation allows the high-frequency basis functions to be divided into a greater number of sub-classes than the prior art image compression transformations.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.