The present invention relates to methods and apparatuses for transforming an array of data and, more specifically, to a method and apparatus for reducing the amount of data needed to store an image.
Images are conventionally represented by a two dimensional array of values in which each value represents a property of the image at a corresponding point on the image. In the case of black and white images, a single number representing the gradations of intensity from white to black, referred to as the gray scale, is stored. In the case of color images, each "value" is a vector whose components represent the gradations in intensity of the various primary colors at the corresponding point in the image.
This representation of an image corresponds to the output of a typical image sensing device such as a television camera. Such a representation is convenient in that it is easily regenerated on a display device such as a CRT. However, it has at least two shortcomings. First, the number of bits needed to represent the data is prohibitively large for many applications. Second, if the image is to be processed to extract features which are ordered in the same order of importance as that perceived by a person viewing the image, the amount of processing needed can be prohibitively large.
The number of bits needed to store a typical image is sufficiently large to limit the use of images in data processing and communication systems. A single 512 × 512 gray scale image with 256 gray levels requires in excess of 256,000 bytes. A small scale computer user is limited to disk storage systems having a capacity of typically 300 Mbytes. Hence, fewer than 1200 images can be stored without utilizing some form of image compression.
Similarly, the transmission of images over conventional telephone circuitry is limited by the high number of bits needed to represent the image. An 8 × 11 inch image digitized to 256 gray levels at 200 dots per inch (the resolution utilized in typical FAX transmissions) requires in excess of 28 million bits. Normal consumer quality analog telephone lines are limited to a digital communication rate of 9600 bits per second. Hence, the transmission of the image requires in excess of 45 minutes in the absence of some form of image compression.
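By way of illustration only, the storage and transmission figures quoted above can be checked with a few lines of arithmetic. The sketch assumes one byte per pixel for 256 gray levels and takes 1 Mbyte to mean 10**6 bytes; the variable names are hypothetical.

```python
# Storage: a 512 x 512 image with 256 gray levels needs one byte per pixel.
bytes_per_image = 512 * 512 * 1
disk_bytes = 300 * 10**6                 # assumption: 1 Mbyte = 10**6 bytes
images_per_disk = disk_bytes // bytes_per_image

# Transmission: 8 x 11 inch page at 200 dots per inch, 8 bits per pixel.
page_bits = (8 * 200) * (11 * 200) * 8
minutes_at_9600 = page_bits / 9600 / 60  # 9600 bits per second line

print(bytes_per_image, images_per_disk, page_bits, round(minutes_at_9600, 1))
```

Under these assumptions the results agree with the bounds stated in the text: more than 256,000 bytes per image, fewer than 1200 images per disk, more than 28 million bits per page, and more than 45 minutes per page.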
The need to reduce the data required to represent an image has led to numerous image compression methods. These methods can be conveniently divided into two classes, invertible and non-invertible methods. The invertible methods reduce redundancy but do not destroy any of the information present in the image. These methods transform the two-dimensional array into a form requiring fewer bits to store. The original two-dimensional array is generated by the inverse transformation prior to display. The regenerated image is identical to the original image.
Consider a black and white image which has been digitized to two gray levels, black and white. The image consists of a two-dimensional array of bits. A one-dimensional list of bits can be generated from the two-dimensional array by copying, in order, each row of the two-dimensional array into the one-dimensional array. It has been observed that the one-dimensional array has long runs of ones or zeros. Consider a run of 100 ones. One hundred bits are required to represent the run in the one-dimensional array. However, the same 100 bits could be represented by a 7-bit counter value specifying the length of the run and the value "one" specifying the repeated gray level. Hence, the 100 bits can be reduced to 8 bits. This is the basis of a transformation of the one-dimensional array in which the transformed image consists of a sequence of paired values, each pair consisting of a count and a bit value.
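The run-length transformation described above may be sketched as follows. This is a minimal illustration, assuming a 7-bit count (runs of at most 127) and one bit for the value, so that each pair occupies 8 bits; the function names `rle_encode` and `rle_decode` and the sample row are hypothetical.

```python
def rle_encode(bits, max_run=127):
    """Encode a list of 0/1 values as (count, bit) pairs; count fits in 7 bits."""
    pairs = []
    i = 0
    while i < len(bits):
        run = 1
        # Extend the run while the value repeats and the counter has room.
        while i + run < len(bits) and bits[i + run] == bits[i] and run < max_run:
            run += 1
        pairs.append((run, bits[i]))
        i += run
    return pairs


def rle_decode(pairs):
    """Invert rle_encode: expand each (count, bit) pair back into a run."""
    out = []
    for count, bit in pairs:
        out.extend([bit] * count)
    return out


row = [1] * 100 + [0] * 20 + [1] * 8      # synthetic 128-bit row
pairs = rle_encode(row)
encoded_bits = 8 * len(pairs)             # 7-bit count + 1-bit value per pair
print(pairs, encoded_bits, rle_decode(pairs) == row)
```

For this synthetic row the 128 original bits become three pairs, that is, 24 bits, and decoding returns the original row exactly, which is what makes the method invertible.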
One can define a compression ratio which is the ratio of the number of bits in the original two-dimensional array to the number of bits needed to store the transformed image. For typical black/white images, compression ratios of the order of 5 to 10 can be obtained utilizing these methods. However, the gains obtained decrease rapidly as the number of gray levels is increased. At higher numbers of gray levels, the probability of finding repeated runs of the same gray level decreases. Each time the gray level changes, a new pair of values must be entered into the file. As a result, compression ratios exceeding 3 are seldom obtained for invertible compression of gray level images.
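The loss of compression at higher numbers of gray levels can be illustrated with a small sketch. It assumes each stored pair costs a 7-bit count plus one pixel value, and it uses two synthetic rows chosen only for illustration: a binary row with long runs and an 8-bit ramp in which every pixel differs from its neighbor. The function name `rle_compression_ratio` is hypothetical.

```python
from itertools import groupby


def rle_compression_ratio(pixels, bits_per_pixel, count_bits=7):
    """Ratio of original bits to bits needed for (count, value) pairs."""
    max_run = 2 ** count_bits - 1
    pairs = 0
    for _, group in groupby(pixels):
        run = len(list(group))
        pairs += -(-run // max_run)       # ceiling division: split long runs
    original_bits = len(pixels) * bits_per_pixel
    encoded_bits = pairs * (count_bits + bits_per_pixel)
    return original_bits / encoded_bits


binary_row = [1] * 100 + [0] * 20 + [1] * 8   # two gray levels, long runs
gray_ramp = list(range(128))                  # 8-bit values, no repeats

ratio_binary = rle_compression_ratio(binary_row, bits_per_pixel=1)
ratio_gray = rle_compression_ratio(gray_ramp, bits_per_pixel=8)
print(ratio_binary, ratio_gray)
```

In this extreme case the ramp yields a ratio below one, since a new pair must be entered for every pixel, whereas the binary row compresses by more than a factor of five.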
Higher compression ratios can be obtained if non-invertible compression methods are utilized. In such methods, the image regenerated by the inverse transformation is not identical to the original image.
Conventional error measures are of only limited value in comparing two non-invertible compression methods. For example, consider two transformations. The first transformation results in random noise being introduced into the regenerated image. That is, each pixel in the regenerated image differs from that in the original image by an amount which is randomly distributed between two values. The second transformation results in a constant having a value equal to the difference in these two values being added to each pixel in a narrow band across the image. The second transformation introduces an error having a root mean squared error which is significantly less than that introduced by the first transformation. However, the regenerated image produced by the first transformation is far more acceptable than that produced by the second transformation.
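The comparison above can be made concrete with a short numerical sketch. The image size, the noise range of -4 to +4, and the band height of two rows are assumptions chosen only for illustration; the random seed is fixed so that the run is repeatable.

```python
import math
import random


def rmse(errors):
    """Root mean squared value of a list of per-pixel errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


random.seed(0)                       # fixed seed, illustration only
width, height = 64, 64
lo, hi = -4.0, 4.0                   # assumed range of the random error

# First transformation: every pixel is off by a random amount in [lo, hi].
noise_errors = [random.uniform(lo, hi) for _ in range(width * height)]

# Second transformation: a constant (hi - lo) is added in a narrow band.
band_rows = 2
band_errors = [(hi - lo) if y < band_rows else 0.0
               for y in range(height) for _ in range(width)]

print(rmse(noise_errors), rmse(band_errors))
```

Under these assumptions the band error has the smaller root mean squared value, even though a visible stripe across the image is the more objectionable artifact.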
The typical prior art non-invertible image compression transformations can be divided into two steps. In the first step, two sets of coefficients, $p_i$ and $q_j$, are derived from the image by fitting the image to a linear expansion of the form
$$I(x,y) = \sum_i p_i f_i(x,y) + \sum_j q_j g_j(x,y)$$
As will be explained in more detail below, the basis functions $f_i$ and $g_j$ are chosen such that the most "important" information contained in the image is represented by the $p$'s and the least important information is represented by the $q$'s. The transformation in question is invertible in the sense that given an $N \times N$ set of pixels, $I(x_i,y_j)$, one can determine a total of $N^2$ coefficients $p_i$ and $q_j$ that will exactly reproduce the $N^2$ values $I(x_i,y_j)$. Since there are $N^2$ pixels and $N^2$ coefficients, the set of coefficients requires the same number of bits to store as the image if the transform coefficients are stored to the same precision as the image pixel intensities. Hence, the transformation alone does not produce any compression of the image.
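The invertibility of such an expansion, and the fact that it produces exactly as many coefficients as there are pixels, may be illustrated with a deliberately simple transform. The sketch below assumes a 2 × 2 averaging and differencing (Haar-type) split applied to non-overlapping blocks; it is chosen only for brevity and is not the transformation of any particular prior art method. The function names and the sample image are hypothetical.

```python
def forward(image):
    """Split an N x N image (N even) into low (p) and high (q) coefficients."""
    p, q = [], []
    n = len(image)
    for y in range(0, n, 2):
        for x in range(0, n, 2):
            a, b = image[y][x], image[y][x + 1]
            c, d = image[y + 1][x], image[y + 1][x + 1]
            p.append((a + b + c + d) / 2)            # smooth part of the block
            q.extend([(a - b + c - d) / 2,           # horizontal difference
                      (a + b - c - d) / 2,           # vertical difference
                      (a - b - c + d) / 2])          # diagonal difference
    return p, q


def inverse(p, q, n):
    """Rebuild the N x N image from the coefficients produced by forward()."""
    image = [[0.0] * n for _ in range(n)]
    blocks = [(y, x) for y in range(0, n, 2) for x in range(0, n, 2)]
    for i, (y, x) in enumerate(blocks):
        s, q1, q2, q3 = p[i], q[3 * i], q[3 * i + 1], q[3 * i + 2]
        image[y][x] = (s + q1 + q2 + q3) / 2
        image[y][x + 1] = (s - q1 + q2 - q3) / 2
        image[y + 1][x] = (s + q1 - q2 - q3) / 2
        image[y + 1][x + 1] = (s - q1 - q2 + q3) / 2
    return image


image = [[10, 12, 200, 202],
         [11, 13, 201, 203],
         [50, 50, 50, 50],
         [52, 52, 52, 52]]
p, q = forward(image)
restored = inverse(p, q, 4)
print(len(p) + len(q), restored == image)
```

The 16 pixels yield 4 low frequency and 12 high frequency coefficients, 16 in all, and the inverse reproduces the pixels exactly; no compression results until the coefficients are quantized.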
In the second step of the image compression transformation, the coefficients $p_i$ and $q_j$ are quantized. The number of bits used to represent each $p_i$ is greater than that used to represent each $q_j$, since the $p_i$ represent the most important information in the image. Thus one will be able to recover the $p_i$ more accurately than the $q_j$. If each $p_i$ is represented by the same number of bits as the image pixels, a net reduction in the number of bits needed to represent the image is achieved. However, the reduced precision utilized in the representation of the $q_j$ is the source of the non-invertibility of the transformation.
For the above discussed technique to be useful, the image transformation must separate the information into coefficients having the property that a sub-set of the coefficients contains the important image information. It is known that the most subjectively important image information is contained in the low spatial frequency components of the image. Hence, the functions $f_i(x,y)$ must be limited in their spatial frequency response to lower frequencies than the functions $g_j(x,y)$. If this condition is satisfied, then the coefficients $p_i$ will represent more important information than the coefficients $q_j$.
In addition to the low spatial frequency information, specific images may have information in high frequency components which is also important. Edges typically contribute to the high spatial frequency data. An image with a large number of edges oriented in a specific direction will therefore have a significant high frequency component if one or more of the $g_j$ functions represents edges having the orientation in question. Hence, it would be advantageous to be able to divide the $g_j$ into classes such that one or more of the classes can be quantized with increased accuracy when the coefficients associated with the class indicate that a significant amount of information is contained in the class.
In addition, basis functions which reflect the structures found in typical images are more likely to require fewer functions to represent an image with a given degree of fidelity. It is known that images tend to include structures which vary in intensity smoothly over the structure. Hence, sets of basis functions in which the $f_i$ can approximate low order polynomials would be advantageous. If the basis functions are orthonormal, this is equivalent to requiring that at least the first two moments of each of the basis functions $g_j$ vanish.
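The moment condition can be checked numerically for a concrete filter pair. The sketch below assumes, purely for illustration, the well-known four-tap Daubechies low-pass filter and the high-pass filter derived from it by reversing the taps and alternating their signs; nothing in the text above states that these particular coefficients are used by any prior art method. All names are hypothetical.

```python
import math

s3 = math.sqrt(3.0)
norm = 4.0 * math.sqrt(2.0)

# Assumed low-pass taps (four-tap Daubechies filter), illustration only.
low = [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]
# High-pass taps: reverse the low-pass taps and alternate the signs.
high = [low[3], -low[2], low[1], -low[0]]

moment0 = sum(high)                                  # zeroth moment
moment1 = sum(k * g for k, g in enumerate(high))     # first moment
energy = sum(h * h for h in low)                     # unit norm check
cross = sum(l * g for l, g in zip(low, high))        # orthogonality check

# A linear ramp is a smoothly varying structure; the high-pass output on it
# should vanish, leaving all of the information in the low-pass coefficients.
ramp = [2.0 + 3.0 * n for n in range(8)]
detail = [sum(high[k] * ramp[n + k] for k in range(4)) for n in range(5)]

print(moment0, moment1, energy, cross, detail)
```

Up to rounding error, both moments of the high-pass filter vanish, the filters are of unit norm and mutually orthogonal, and the high-pass response to the linear ramp is zero.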
To better understand the cause of the errors and the manner in which the transformation in the first step of the compression affects the type of errors, the manner in which the quantization is performed will now be discussed in more detail. To simplify the discussion, it will be assumed that only the $p_i$ coefficients are quantized to more than zero bits. That is, zero bits will be allocated for each of the $q_j$. It will be assumed that the image is to be compressed by a factor $R$. The number of bits available for representing each coefficient $p_i$ will be denoted by $K$. Let $P_{\min}$ and $P_{\max}$ be the minimum and maximum values, respectively, of the set of parameters $\{p_i\}$. In the simplest case, $2^K$ equally spaced levels, $L_j$, are defined between $P_{\min}$ and $P_{\max}$. Each coefficient $p_i$ is then replaced by an integer, $k$, where $L_k \leq p_i < L_{k+1}$. These integers, or a suitably coded version of them, are stored in place of the coefficients, $p_i$.
An approximation to the image, $I'(x,y)$, can be reconstructed from the compressed representation, where
$$I'(x,y) = \sum_i p'_i f_i(x,y)$$
where, for purposes of illustration, $p'_i = (L_k + L_{k+1})/2$. Here, $k$ is the integer stored in place of $p_i$.
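The quantization and reconstruction just described may be sketched as follows. The sketch assumes that the interval between the minimum and maximum coefficient is divided into 2**K equal bins whose lower edges serve as the levels, that a coefficient equal to the maximum is placed in the top bin, and that each coefficient is reconstructed at the midpoint of its bin. The function names and the sample coefficients are hypothetical.

```python
def quantize(coeffs, k_bits):
    """Replace each coefficient by the index k of the bin [L_k, L_k+1) it lies in."""
    p_min, p_max = min(coeffs), max(coeffs)
    levels = 2 ** k_bits
    step = (p_max - p_min) / levels          # spacing between adjacent levels
    if step == 0:                            # all coefficients are equal
        return [0] * len(coeffs), p_min, step
    # Clamp so that a coefficient equal to p_max falls in the top bin.
    indices = [min(int((p - p_min) / step), levels - 1) for p in coeffs]
    return indices, p_min, step


def dequantize(indices, p_min, step):
    """Reconstruct each coefficient as the midpoint (L_k + L_k+1) / 2 of its bin."""
    return [p_min + (k + 0.5) * step for k in indices]


coeffs = [-3.0, -1.2, 0.4, 2.5, 5.0]         # synthetic coefficients
indices, p_min, step = quantize(coeffs, k_bits=2)
approx = dequantize(indices, p_min, step)
max_error = max(abs(a - c) for a, c in zip(approx, coeffs))
print(indices, approx, max_error, step / 2)
```

With two bits per coefficient the five values are stored as the integers 0 through 3, and the largest reconstruction error does not exceed half of the level spacing.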
From the above discussion, it will be apparent that an error of as much as half of the level spacing may be introduced by this quantization procedure. The above example set the levels with reference to $P_{\min}$ and $P_{\max}$ to simplify the discussion. Other methods for placing the $2^K$ levels so as to minimize the overall error resulting from the quantization process are known in the art. In general, these methods utilize the variance of the set of values $\{p_i\}$ and the statistical distribution of the values to set the levels. In this case, the larger the variance, the larger the quantization error. In general, the variance is determined by the image being quantized and the invertible transformation used to calculate the parameters.
Hence, it is advantageous to provide an invertible transformation which minimizes the variance of the coefficients to be quantized. For a given image, it can be shown that the variance of the sets of coefficients $\{p_i\}$ and $\{q_j\}$ will be reduced if the basis functions $f_i(x,y)$ and $g_j(x,y)$ form an orthonormal basis for the two-dimensional image space.
The above described properties of the image transformation allow one to reduce the errors introduced at the quantization stage. However, there will always be errors. The manner in which these errors influence the reconstructed image will depend on the basis functions $f_i(x,y)$ and $g_j(x,y)$. It is useful to distinguish the various classes of basis functions by the fraction of the image over which each of the basis functions is non-zero. This will be referred to as the support of the basis function.
If the basis functions have support which is of the same order of size as the image itself, then a quantization error in one coefficient will affect the entire reconstructed image. This leads to aliasing errors in the reconstructed images. Such errors are subjectively very objectionable. For example, a quantization error in a single coefficient could lead to "stripes" extending across the entire reconstructed image.
If, on the other hand, the basis functions have support which is small, then such a quantization error will only affect a small area of the reconstructed image. This leads to errors which are more like random noise. As noted above, random noise errors can be incurred without producing a subjectively objectionable image.
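The effect of support on the spread of a quantization error can be illustrated in one dimension. Because the reconstruction is linear in the coefficients, an error in a single coefficient adds that error times the corresponding basis function to the reconstructed signal. The sketch below assumes a 16-sample signal, an unnormalized cosine as a basis function with full support, and a two-sample difference function as a basis function with small support; the error value and all names are hypothetical.

```python
import math

n = 16
delta = 0.5      # hypothetical quantization error in one coefficient

# Basis function whose support is the entire signal (cosine, frequency 3).
cosine_basis = [math.cos(math.pi * (x + 0.5) * 3 / n) for x in range(n)]

# Basis function with small support: non-zero on two samples only.
local_basis = [0.0] * n
local_basis[6], local_basis[7] = 1 / math.sqrt(2), -1 / math.sqrt(2)


def affected_samples(basis):
    """Count reconstructed samples changed by an error delta in one coefficient."""
    return sum(1 for b in basis if abs(delta * b) > 1e-12)


print(affected_samples(cosine_basis), affected_samples(local_basis))
```

In this sketch the error in the cosine coefficient changes every one of the 16 samples, producing a periodic pattern across the whole signal, while the same error in the local coefficient changes only two samples.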
Although a number of transformations have been utilized in the prior art, none have provided all of the above advantageous features. For example, one class of transformations is based on the Fourier expansion for the image transformation. The Fourier series basis functions have the advantage of providing an orthonormal basis for the image space. However, these functions have a number of disadvantages. First, the support of every Fourier basis function is the entire image; hence, quantization errors tend to produce aliasing errors which are subjectively unsatisfactory. Second, the computational work load to compute the Fourier transform is of order $n \log n$, where $n$ is the number of pixels in the original image. To overcome this difficulty, the image is often divided into smaller sub-images which are pieced together after reconstruction. This procedure reduces the computational work load, but leads to other artifacts in the reconstructed image.
A second prior art solution to the image transformation problem is taught in U.S. Pat. No. 4,817,182 by Adelson et al. In the method taught by Adelson et al., the image is processed by a quadrature mirror filter (QMF) to produce four filtered images which are, in reality, four sets of coefficients. Three of the sets of coefficients are analogous to the $q_j$ discussed above in that they represent high spatial frequency information. The fourth set of coefficients is analogous to the $p_i$ in that these coefficients represent low frequency information. The number of coefficients in each set is the same. Adelson et al. teach treating the low frequency components as an image of one quarter the size and then reprocessing the low frequency coefficients using the same QMF. This iterative procedure leads to a pyramid of coefficient sets with a low frequency set at the top and three sets of high frequency coefficients at each level below the top. Each level represents information having higher frequency than the level above it.
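The structure of such a pyramid can be sketched as follows. The sketch does not reproduce the filters of the cited patent; as a stand-in it assumes a simple 2 × 2 averaging and differencing split, which likewise yields one low frequency set and three high frequency sets of equal size at each pass. The function names and the synthetic 8 × 8 image are hypothetical.

```python
def split(image):
    """One pass: return (low, [high1, high2, high3]), each half the width."""
    half = len(image) // 2
    low = [[0.0] * half for _ in range(half)]
    highs = [[[0.0] * half for _ in range(half)] for _ in range(3)]
    for y in range(half):
        for x in range(half):
            a, b = image[2 * y][2 * x], image[2 * y][2 * x + 1]
            c, d = image[2 * y + 1][2 * x], image[2 * y + 1][2 * x + 1]
            low[y][x] = (a + b + c + d) / 2
            highs[0][y][x] = (a - b + c - d) / 2
            highs[1][y][x] = (a + b - c - d) / 2
            highs[2][y][x] = (a - b - c + d) / 2
    return low, highs


def pyramid(image):
    """Reprocess the low frequency set repeatedly until a single value remains."""
    levels = []
    low = image
    while len(low) > 1:
        low, highs = split(low)
        levels.append(highs)             # three high frequency sets per level
    return low, levels


image = [[(x * y) % 7 for x in range(8)] for y in range(8)]   # synthetic
top, levels = pyramid(image)
sizes = [len(h[0]) for h in levels]      # width of the sets at each level
total = len(top) ** 2 + sum(3 * s * s for s in sizes)
print(sizes, total)
```

For the 8 × 8 image the passes produce high frequency sets of width 4, 2 and 1, and the total number of coefficients in the pyramid equals the 64 pixels of the original image.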
It may be shown that this method is equivalent to the transformation discussed above with the $q_j$ being further divided into different sets. Although Adelson et al. refer to their QMF method as being equivalent to expanding the image in an orthonormal basis set, this method does not provide the claimed orthonormal expansion. The basis functions corresponding to a QMF are symmetric by definition. Using this property of the QMF basis functions, it can be shown that the QMF basis functions cannot be an orthonormal set. Hence, this method does not provide an optimal transformation of the image.
Another problem with the method taught by Adelson et al. is that no means for selecting the QMF filter in terms of the image properties is taught. As a result, the basis functions taught therein do not optimally represent smoothly varying image features.
Broadly, it is an object of the present invention to provide an improved apparatus and method for coding an image such that the coded image can be represented in fewer bits than the original image.
It is a further object of the present invention to provide an apparatus and method which utilize an orthonormal transformation of the image.
It is yet another object of the present invention to provide an apparatus and method in which the transformation utilizes basis functions having compact support.
It is yet another object of the present invention to provide an apparatus and method in which the transformation utilizes basis functions that can adequately represent smoothly varying images.
It is a still further object of the present invention to provide an apparatus and method in which the transformation allows the high frequency basis functions to be divided into a greater number of sub-classes than the prior art image compression transformations.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the invention and the accompanying drawings.