An image, such as one displayed on a computer monitor, may be represented as a two-dimensional matrix of digital data values. A single frame on a VGA computer monitor may, for example, be represented as three matrixes of pixel values. Each of the three matrixes has one data value for each pixel on the monitor.
The images on the monitor can be represented by a 640 by 480 matrix of data values representing the luminance (brightness) values Y of the pixels of the screen and two other 640 by 480 matrixes of data values representing the chrominance (color) values U and V of the pixels on the screen. Although the luminance and chrominance values are analog values, the one luminance value and the two chrominance values for a pixel may be digitized from analog form into discrete digital values. Each luminance and chrominance digital value may be represented by an 8-bit number. One frame of a computer monitor therefore typically requires about 7 megabits of memory to store in an uncompressed form.
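The storage figure above can be checked with a short calculation. The following Python sketch uses only the numbers given in the text (a 640 by 480 frame, three matrixes, 8 bits per value); the variable names are illustrative.

```python
# Uncompressed storage for one 640 by 480 frame with Y, U and V matrixes.
WIDTH, HEIGHT = 640, 480
MATRIXES = 3         # one luminance (Y) and two chrominance (U, V) matrixes
BITS_PER_VALUE = 8   # each digitized value is an 8-bit number

bits_per_frame = WIDTH * HEIGHT * MATRIXES * BITS_PER_VALUE
bytes_per_frame = bits_per_frame // 8

print(bits_per_frame)   # 7372800 bits, i.e. about 7 megabits
print(bytes_per_frame)  # 921600 bytes
```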
In view of the large amount of memory required to store or transmit a single image in uncompressed digital form, it would be desirable to compress the digital image data before storage or transmission in such a way that the compressed digital data could later be decompressed to recover the original image data for viewing. In this way, a smaller amount of compressed digital data could be stored or transmitted. Accordingly, numerous digital image compression and decompression methods have been developed.
According to one method, each individual digital value is converted into a corresponding digital code. Some of the codes have a small number of bits whereas others have a larger number of bits. In order to take advantage of the fact that some of the codes are shorter than others, the original digital data values of the original image are filtered using digital filters into a high frequency component and a low frequency component. The high frequency component represents ambiguities in the image and is therefore observed to have a comparatively large number of identical data values for real-world images. By encoding the commonly occurring digital data values in the high frequency component with the short digital codes, the total number of bits required to store the image data can be reduced from the number of bits that would otherwise be required if 8 bits were used to represent each of the data values. Because the total number of bits in the resulting encoded data is less than the total number of bits in the original sequence of data values, the original image is said to have been compressed.
To decompress the compressed encoded data to recover the original image data, the compressed encoded data is decoded using the same digital code. The resulting high and low frequency components are then recombined to form the two-dimensional matrix of original image data values.
Where the data being compressed is two-dimensional data such as image data, separation of the original data into high and low frequency components by the digital filters may be accomplished by filtering in two dimensions, such as the horizontal dimension of the image and the vertical dimension of the image. Similarly, decoded high and low frequency components can be recombined into the original image data values by recombining in two dimensions.
To achieve even greater compression, the low frequency component may itself be filtered into its high and low frequency components before encoding. Similarly, the low frequency component of the low frequency component may also be refiltered. This process of recursive filtering may be repeated a number of times. Whether or not recursive filtering is performed, the filtered image data is said to have been "transformed" into the high and low frequency components. This digital filtering is called a "transform". Similarly, the high and low pass components are said to be "inverse transformed" back into the original data values. This process is known as the "inverse transform".
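The recursive filtering and its inverse can be sketched in one dimension. The text does not specify filter coefficients at this point, so the sketch below assumes a hypothetical two-tap filter pair (sum for the low pass, difference for the high pass); the function names `forward` and `inverse` are likewise illustrative only. The input length is assumed to be divisible by two at every octave.

```python
def forward(values, octaves):
    """Recursively filter the low frequency component.

    Returns (low, highs), where highs[k] holds the high frequency
    data values of octave k. Assumed filters: low = a + b, high = a - b.
    """
    low, highs = list(values), []
    for _ in range(octaves):
        pairs = [(low[i], low[i + 1]) for i in range(0, len(low), 2)]
        highs.append([a - b for a, b in pairs])
        low = [a + b for a, b in pairs]
    return low, highs


def inverse(low, highs):
    """Recombine the components, last octave first, into the original values."""
    values = list(low)
    for high in reversed(highs):
        rebuilt = []
        for s, d in zip(values, high):
            # s = a + b and d = a - b, so both divisions are exact.
            rebuilt += [(s + d) // 2, (s - d) // 2]
        values = rebuilt
    return values


data = [3, 1, 4, 1, 5, 9, 2, 6]
low, highs = forward(data, 2)
print(low)    # [9, 22]
print(highs)  # [[2, 3, -4, -4], [-1, 6]]
print(inverse(low, highs) == data)  # True
```

With these assumed filters the inverse transform recovers the original data values exactly, and the total number of transformed values equals the number of original values.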
FIG. 1 is a diagram of a digital gray-scale image of a solid black square 1 on a white background 2 represented by a 640 by 480 matrix of 8-bit data luminance values.
FIG. 2 is a diagram illustrating a first intermediate step in the generation of the high and low frequency components of the original image. A high pass digital filter which outputs a single data value using multiple data values as inputs is first run across the original image values from left to right, row by row, to generate G subblock 3. The number of digital values in G subblock 3 is half of the number of data values in the original image of FIG. 1 because the digital filter is sequentially moved to the right by twos, processing two additional data values for each additional data value output into G subblock 3. Similarly, a low pass digital filter which outputs a single data value using multiple data values as inputs is run across the original image values from left to right, row by row, to generate H subblock 4. The number of digital values in H subblock 4 is half of the number of data values in the original image because the digital filter is moved to the right by twos, processing two additional data values for each additional data value output into H subblock 4. Each of the two vertical bars in high pass G subblock 3 appears where a change occurs spatially in the horizontal dimension in the original image of FIG. 1. Where the G digital filter encounters a change from white data values to black data values as it is run across the image of FIG. 1 in a horizontal direction, the G digital filter outputs a corresponding black data value into G subblock 3. Similarly, when the G digital filter encounters the next change, which is this time a change from black to white data values, the G digital filter again outputs a corresponding black data value into G subblock 3.
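The row filtering step can be illustrated on a single row that crosses the black square. This is a sketch only: it assumes white is 0 and black is 1 (as in Table 1) and uses a hypothetical two-tap filter pair rather than the filters of the figures; `filter_row` is an illustrative name.

```python
def filter_row(row):
    """Split one row into a low pass (H) half and a high pass (G) half."""
    if len(row) % 2:
        raise ValueError("row length must be even")
    low, high = [], []
    for i in range(0, len(row), 2):  # the filter moves to the right by twos
        a, b = row[i], row[i + 1]
        low.append(a + b)            # assumed two-tap low pass filter
        high.append(a - b)           # assumed two-tap high pass filter
    return low, high


# One row of the image: white (0) background, black (1) square in the middle.
row = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
low, high = filter_row(row)
print(low)   # [0, 1, 2, 1, 0]
print(high)  # [0, -1, 0, 1, 0]
```

Each output list has half as many values as the input row, and the high pass output is nonzero only at the white-to-black and black-to-white changes. With this assumed difference filter the two edges produce values of opposite sign, whereas the figure shows only that a non-white value is output at each edge.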
FIG. 3 is a diagram illustrating a second intermediate step in the generation of the high and low frequency components of the original image. The high pass digital filter is run down the various columns of the H and G subblocks of FIG. 2 to form the HG subblock 5 and GG subblock 6 shown in FIG. 3. Similarly, the low pass digital filter is run down the various columns of the G and H subblocks 3 and 4 of FIG. 2 to form the HH and GH subblocks 7 and 8 shown in FIG. 3. The result is the low pass component in subblock HH and the three high pass component subblocks GH, HG and GG. The total number of high and low pass component data values in FIG. 3 is equal to the number of data values in the original image of FIG. 1. The data values in the high pass component subblocks GH, HG and GG are referred to as the high frequency component data values of octave 0.
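A one-octave two-dimensional decomposition along these lines might look like the following sketch. It assumes a hypothetical two-tap filter pair (sum and difference) in place of the actual filters, and names each subblock by the horizontal filter followed by the vertical filter, which matches the way HG and GG are formed in the text. All function names are illustrative.

```python
def split_pairs(values):
    """Assumed two-tap filters applied by twos: low = a + b, high = a - b."""
    low = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    high = [values[i] - values[i + 1] for i in range(0, len(values), 2)]
    return low, high


def filter_rows(block):
    """Run both filters across every row of a block."""
    lows, highs = [], []
    for row in block:
        low, high = split_pairs(row)
        lows.append(low)
        highs.append(high)
    return lows, highs


def transpose(block):
    return [list(column) for column in zip(*block)]


def filter_columns(block):
    """Run both filters down every column of a block."""
    lows, highs = filter_rows(transpose(block))
    return transpose(lows), transpose(highs)


def one_octave(image):
    h_block, g_block = filter_rows(image)   # first step (FIG. 2)
    hh, hg = filter_columns(h_block)        # second step (FIG. 3)
    gh, gg = filter_columns(g_block)
    return {"HH": hh, "HG": hg, "GH": gh, "GG": gg}


# 4 by 4 image: white (0) background with a 2 by 2 black (1) square.
image = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
subblocks = one_octave(image)
for name, block in subblocks.items():
    print(name, block)
```

The four subblocks together hold 16 values, the same number as the original 4 by 4 image.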
The low pass subblock HH is then filtered horizontally and vertically in the same way into its low and high frequency components. FIG. 4 illustrates the resulting subblocks. The data values in HHHG subblock 9, HHGH subblock 10, and HHGG subblock 11 are referred to as the high frequency component data values of octave 1. Subblock HHHH 12 is the low frequency component. Although not illustrated, the low frequency HHHH subblock 12 can be refiltered using the same method. As can be seen from FIG. 4, the high frequency components of octaves 0 and 1 are predominantly white because black in these subblocks denotes changes from white to black or black to white in the data blocks from which the high frequency subblocks are generated. The changes, which are sometimes called edges, from white to black as well as black to white in FIG. 1 result in high frequency data values in the GH, HG and GG quadrants as illustrated in FIG. 3.
Once the image data has been filtered the desired number of times using the above method, the resulting transformed data values are encoded using a digital code such as the Huffman code in Table 1.
TABLE 1
______________________________________
Corresponding   Digital   Digital
Gray-Scale      Value     Code
______________________________________
                . . .
                5         1000001
                4         100001
                2         10001
black           1         101
white           0         0
                -1        111
                -2        1101
                -3        11001
                -4        110001
                -5        1100001
                . . .
______________________________________
Because the high frequency components of the original image of FIG. 1 are predominantly white, as is evident from FIGS. 3 and 4, the gray-scale white is assigned the single bit 0 in the above digital code. The next most common gray-scale color in the transformed image is black. Accordingly, gray-scale black is assigned the next shortest code of 101. The image of FIG. 1 consists only of black and white pixels. If the image were to involve other gray-scale shades, then other codes would be used to encode those gray-scale colors, the more predominant gray-scale shades being assigned the relatively shorter codes. The result of the Huffman encoding is that the digital values which predominate in the high frequency components are coded into codes having a small number of bits. Accordingly, the number of bits required to represent the original image data is reduced. The image is therefore said to have been compressed.
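The effect of the variable-length code can be shown with a few of the entries of Table 1. The sketch below copies the codes for the values 0, 1, -1, -2 and -3 from the table; the `encode` and `decode` helpers are illustrative and rely on the fact that no code in this subset is a prefix of another.

```python
# Subset of Table 1: digital value -> digital code.
CODES = {0: "0", 1: "101", -1: "111", -2: "1101", -3: "11001"}


def encode(values):
    """Concatenate the code of each data value into one bit string."""
    return "".join(CODES[v] for v in values)


def decode(bits):
    """Read bits until they match a code, emit its value, and start again."""
    lookup = {code: value for value, code in CODES.items()}
    values, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            values.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return values


# Mostly white (0) high frequency data with two edge values.
values = [0, 0, 0, 1, 0, 0, -1, 0]
bits = encode(values)
print(bits)                  # 000101001110
print(len(bits))             # 12 bits instead of 8 * 8 = 64 bits
print(decode(bits) == values)  # True
```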
Problems occur during compression, however, when the digital filters operate at the boundaries of the data values. For example, when the high pass digital filter generating the high pass component begins generating high pass data values of octave 0 at the left hand side of the original image data, some of the filter inputs required by the filter do not exist.
FIG. 5 illustrates the four data values required by a four coefficient high pass digital filter G in order to generate the first high pass data value G_0 of octave 0. As shown in FIG. 5, data values D_1, D_2, D_3 and D_4 are required to generate the second high pass data value of octave 0, data value G_1. In order to generate the first high pass component output data value G_0, on the other hand, data values D_-1, D_0, D_1, and D_2 are required. Data value D_-1 does not, however, exist in the original image data.
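The index pattern in this example can be made explicit. The sketch below assumes, by extrapolating from the two cases given in the text, that output number n of a four coefficient filter uses the inputs numbered 2n-1 through 2n+2; the function names and the extension of the pattern to the right-hand boundary are assumptions for illustration.

```python
def input_indices(n, taps=4):
    """Indices of the data values D assumed to feed high pass output n."""
    start = 2 * n - 1
    return list(range(start, start + taps))


def missing_inputs(n, length, taps=4):
    """Indices that fall outside a row holding `length` data values."""
    return [i for i in input_indices(n, taps) if i < 0 or i >= length]


ROW_LENGTH = 640  # one row of the 640 by 480 image

print(input_indices(0))                 # [-1, 0, 1, 2]
print(input_indices(1))                 # [1, 2, 3, 4]
print(missing_inputs(0, ROW_LENGTH))    # [-1]
print(missing_inputs(1, ROW_LENGTH))    # []
print(missing_inputs(319, ROW_LENGTH))  # [640]
```

Under this assumed pattern only the outputs at the two ends of a row depend on nonexistent data values.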
Several techniques have been developed in an attempt to solve the problem of the digital filter extending beyond the boundaries of the image data being transformed. In one technique, called zero padding, the nonexistent data values outside the image are simply assumed to be zeros. This may result in discontinuities at the boundary, however, where an object in the image would otherwise have extended beyond the image boundary but where the assumed zeros cause an abrupt truncation of the object at the boundary. In another technique, called circular convolution, the two dimensional multi-octave transform can be expressed in terms of one dimensional finite convolutions. Circular convolution joins the ends of the data together. This introduces a false discontinuity at the join but the problem of data values extending beyond the image boundaries no longer exists. In another technique, called symmetric circular convolution, the image data at each data boundary is mirrored. A signal such as a ramp, for example, will become a peak when it is mirrored. In another technique, called doubly symmetric circular convolution, the data is not only mirrored spatially but the values are also mirrored about the boundary value. This method attempts to maintain continuity of both the signal and its first derivative but requires more computation for the extra mirror because the mirrored values must be pre-calculated before convolution.
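The four boundary techniques can be compared by asking each one for the value it assumes at a nonexistent position. The sketch below is one possible reading of the descriptions above, not a definitive implementation: it assumes mirroring about the boundary sample itself (so that position -1 maps to position 1), and it only handles positions that overshoot the data by less than its length. All function names are illustrative.

```python
def zero_padding(data, i):
    """Nonexistent values are assumed to be zero."""
    return data[i] if 0 <= i < len(data) else 0


def circular(data, i):
    """The ends of the data are joined together."""
    return data[i % len(data)]


def symmetric(data, i):
    """The data is mirrored spatially about each boundary sample."""
    n = len(data)
    if i < 0:
        return data[-i]
    if i >= n:
        return data[2 * (n - 1) - i]
    return data[i]


def doubly_symmetric(data, i):
    """Mirrored spatially, and the values mirrored about the boundary value."""
    n = len(data)
    if i < 0:
        return 2 * data[0] - data[-i]
    if i >= n:
        return 2 * data[n - 1] - data[2 * (n - 1) - i]
    return data[i]


ramp = [10, 12, 14, 16]
for extend in (zero_padding, circular, symmetric, doubly_symmetric):
    print(extend.__name__, extend(ramp, -1), extend(ramp, 4))
# zero_padding 0 0
# circular 16 10
# symmetric 12 14
# doubly_symmetric 8 18
```

For the ramp, the symmetric extension turns the boundary into a peak or trough (12, 10, 12 on the left), whereas the doubly symmetric extension continues the slope (8, 10, 12), which is the continuity of the first derivative mentioned above.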
FIG. 6 illustrates yet another technique which has been developed to solve the boundary problem. According to this technique, the high and low pass digital filters are moved through the data values in a snake-like pattern in order to eliminate image boundaries in the image data. After the initial one dimensional convolution, the image contains alternating columns of low and high pass information. By snaking through the low pass sub-band before the high pass, only two discontinuities are introduced. This snaking technique, however, requires reversing the digital filter coefficients on alternate rows as the filter moves through the image data. This changing of filter coefficients as well as the requirement to change the direction of movement of the digital filters through various blocks of data values makes the snaking technique difficult to implement. Accordingly, an easily implemented method for solving the boundary problem is sought which can be used in data compression and decompression.
Not only does the transformation result in problems at the boundaries of the image data, but the transformation itself typically requires a large number of complex computations and/or data rearrangements. The time required to compress and decompress an image of data values can therefore be significant. Moreover, the cost of associated hardware required to perform the involved computations of the forward transform and the inverse transform may be so high that the transform method cannot be used in cost-sensitive applications. A compression and decompression method is therefore sought that not only successfully handles the boundary problems associated with the forward transform and inverse transform but also is efficiently and inexpensively implementable in hardware and/or software. The computational complexity of the method should therefore be low.
In addition to transformation and encoding, even further compression is possible. A method known as tree encoding may, for example, be employed. Moreover, a method called quantization can be employed to further compress the data. Tree encoding and quantization are described in various texts and articles including "Image Compression using the 2-D Wavelet Transform" by A. S. Lewis and G. Knowles, published in IEEE Transactions on Image Processing, April 1992. Furthermore, video data which comprises sequences of images can be compressed by taking advantage of the similarities between successive images. Where a portion of successive images does not change from one image to the next, the portion of the first image can be used for the next image, thereby reducing the number of bits necessary to represent the sequence of images.
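The reuse of unchanged portions between successive images can be sketched as follows. This is a simplified illustration, assuming fixed square blocks and exact equality as the test for "does not change"; the block size and the function names `changed_blocks` and `apply_updates` are hypothetical and are not taken from any of the cited methods.

```python
def changed_blocks(prev, curr, block=2):
    """Return {(row, col): pixels} for the blocks that differ between frames."""
    updates = {}
    for r in range(0, len(curr), block):
        for c in range(0, len(curr[0]), block):
            old = [row[c:c + block] for row in prev[r:r + block]]
            new = [row[c:c + block] for row in curr[r:r + block]]
            if new != old:
                updates[(r, c)] = new
    return updates


def apply_updates(prev, updates):
    """Rebuild the next frame from the previous frame plus the changed blocks."""
    frame = [row[:] for row in prev]
    for (r, c), pixels in updates.items():
        for dr, line in enumerate(pixels):
            frame[r + dr][c:c + len(line)] = line
    return frame


first = [[0] * 4 for _ in range(4)]
second = [row[:] for row in first]
second[3][2] = 1  # a single pixel changes between the two images

updates = changed_blocks(first, second)
print(updates)                                   # {(2, 2): [[0, 0], [1, 0]]}
print(apply_updates(first, updates) == second)   # True
```

Only one of the four blocks needs to be represented for the second image; the other three are taken from the first image.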
JPEG (Joint Photographic Experts Group) is an international standard for still images which typically achieves compression ratios of about 10:1 for monochrome images and 15:1 for color images. The JPEG standard employs a type of Fourier transform, known as the discrete-cosine transform, in combination with quantization and a Huffman-like code. MPEG1 (Moving Picture Experts Group) and MPEG2 are two international video compression standards. MPEG2 is a still-evolving standard targeted for broadcast television. MPEG2 allows the picture quality to be adjusted to allow more television information to be transmitted on a given line. H.261 is another video standard based on the discrete-cosine transform. H.261 also varies the amount of compression depending on the data rate required.
Compression standards such as JPEG, MPEG1, MPEG2 and H.261 are optimized for signal to noise ratio, that is, to minimize the error between the original and the reconstructed image. Due to this optimization, these methods are very complex. Chips implementing MPEG1, for example, may be costly and require as many as 1.5 million transistors. These methods only partially take advantage of the fact that the human visual system is quite insensitive to signal to noise ratio. Accordingly, some of the complexity inherent in these standards is wasted on the human eye. Moreover, because these standards encode by areas of the image, they are not particularly sensitive to edge-type information, which is of high importance to the human visual system. In view of these maladaptations of current compression standards to the characteristics of the human visual system, a new compression and decompression method is sought which handles the above-described boundary problem and which takes advantage of the fact that the human visual system is more sensitive to edge information than to signal to noise ratio, so that the complexity and cost of implementing the method can be reduced.
A system is desired for compressing and decompressing video using dedicated digital hardware to compress and using software to decompress. For example, in a video mail application one user uses a hardware compression expansion card for an IBM PC personal computer coupled to a video camera to record a video message in the form of a video message file. This compressed video message file is then transmitted via electronic mail over a network such as a hardwired network of an office building. A recipient user receives the compressed video message file as he/she would receive a normal mail file and then uses the software to decompress the compressed video message file to retrieve the video mail. The video mail may be displayed on the monitor of the recipient's personal computer. It is desirable to be able to decompress in software because decompressing in software frees multiple recipients from purchasing relatively expensive hardware. Software for performing the decompression may, for example, be distributed free of charge to reduce the cost of the composite system.
In one prior art system, the Intel Indeo video compression system, a hardware compression expansion card compresses video and a software package is usable to decompress the compressed video. This system, however, achieves only a small compression ratio. Accordingly, video picture quality cannot be improved as standard personal computers increase in computing power and/or video bandwidth.
U.S. patent application Ser. No. 08/040,301 entitled "Data Compression and Decompression" discloses a method and apparatus for compressing and decompressing video. The software decompression implementation written in the programming language C disclosed in U.S. patent application Ser. No. 08/040,301 only decompresses at a few frames per second on a standard personal computer at the present date. A method capable of implementation in software which realizes faster decompression is therefore desirable.