1. Field of the Invention
The present invention generally relates to image compression for diverse applications and, more particularly, in combination with a structure for storing Discrete Cosine Transform (DCT) blocks in a packed format, performing Huffman entropy encoding and decoding in accordance with the JPEG (Joint Photographic Experts Group) standard.
2. Description of the Prior Art
Pictorial and graphics images contain extremely large amounts of data and, if digitized to allow transmission or processing by digital data processors, often require many millions of bytes to represent respective pixels of the image or graphics with good fidelity. The purpose of image compression is to represent images with less data in order to save storage costs or transmission time and costs. The most effective compression is achieved by approximating the original image, rather than reproducing it exactly. The JPEG standard, discussed in detail in “JPEG Still Image Data Compression Standard” by Pennebaker and Mitchell, published by Van Nostrand Reinhold, 1993, which is hereby fully incorporated by reference, allows the interchange of images between diverse applications and opens up the capability to provide digital continuous-tone color images in multi-media applications.
JPEG is primarily concerned with images that have two spatial dimensions, contain gray scale or color information, and possess no temporal dependence, as distinguished from the MPEG (Moving Picture Experts Group) standard. JPEG compression can reduce the storage requirements by more than an order of magnitude and improve system response time in the process. A primary goal of the JPEG standard is to provide the maximum image fidelity for a given volume of data and/or available transmission or processing time, and any arbitrary degree of data compression is accommodated. It is often the case that data compression by a factor of twenty or more (and reduction of transmission or processing time by a comparable factor) will not produce artifacts which are noticeable to the average viewer.
One of the basic building blocks for JPEG is the Discrete Cosine Transform (DCT). An important aspect of this transform is that it produces uncorrelated coefficients. Decorrelation of the coefficients is very important for compression because each coefficient can be treated independently without loss of compression efficiency. Another important aspect of the DCT is the ability to quantize the DCT coefficients using visually-weighted quantization values. Since the human visual system response is very dependent on spatial frequency, by decomposing an image into a set of waveforms, each with a particular spatial frequency, it is possible to separate the image structure the eye can see from the image structure that is imperceptible. The DCT thus provides a good approximation to this decomposition to allow truncation or omission of data which does not contribute significantly to the viewer's perception of the fidelity of the image.
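The transform and visually-weighted quantization described above can be illustrated with a minimal sketch. The function names `dct_2d` and `quantize` are illustrative only, the quantization table in the usage note is a hypothetical one chosen for simplicity and is not a table from the JPEG standard, and the level shift normally applied to the samples before the transform is omitted.

```python
import math

N = 8  # JPEG operates on 8x8 blocks


def dct_2d(block):
    """Forward two-dimensional DCT of an 8x8 block given as a list of rows."""
    def c(k):
        # Scale factor for the zero-frequency (DC) basis function.
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for v in range(N):          # vertical spatial frequency
        for u in range(N):      # horizontal spatial frequency
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[v][u] = 0.25 * c(u) * c(v) * total
    return out


def quantize(coeffs, qtable):
    """Divide each coefficient by its table entry and round to an integer."""
    return [[round(coeffs[v][u] / qtable[v][u]) for u in range(N)]
            for v in range(N)]
```

As a usage example, a flat block in which every sample is 100 transforms to a single DC coefficient of 800 with all AC coefficients essentially zero. With a hypothetical table whose entry at row v, column u is 16 + 4(u + v), so that higher spatial frequencies are quantized more coarsely, the DC term quantizes to 50 and every other entry to zero.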
In accordance with the JPEG standard, the original monochrome image is first decomposed into blocks of sixty-four pixels in an 8×8 array at an arbitrary resolution which is presumably sufficiently high that visible aliasing is not produced. (Color images are compressed by first decomposing each component into 8×8 pixel blocks separately.) Techniques and hardware are known which can perform a DCT on this quantized image data very rapidly, yielding sixty-four DCT coefficients. Many of these DCT coefficients for many images will be zero (which do not contribute to the image in any case) or near-zero, which can be neglected or omitted when corresponding to spatial frequencies to which the eye is relatively insensitive. Since the human eye is less sensitive to very high and very low spatial frequencies, as part of the JPEG standard, providing DCT coefficients in a so-called zig-zag pattern which approximately corresponds to an increasing sum of spatial frequencies in the horizontal and vertical directions tends to group the DCT coefficients corresponding to less important spatial frequencies at the ends of the DCT coefficient data stream, allowing them to be compressed efficiently as a group in many instances.
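The zig-zag ordering mentioned above can be sketched as follows. The function names `zigzag_order` and `zigzag_scan` are illustrative assumptions. The sketch visits the anti-diagonals of the block in order of increasing row-plus-column sum and alternates the direction of travel on successive diagonals.

```python
def zigzag_order(n=8):
    """Return the (row, column) positions of an n x n block in zig-zag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]

    def key(cell):
        r, c = cell
        diagonal = r + c
        # Odd diagonals run downward (increasing row); even diagonals run
        # upward (increasing column).
        return (diagonal, r if diagonal % 2 else c)

    return sorted(cells, key=key)


def zigzag_scan(block):
    """Flatten a square block (list of rows) into a zig-zag ordered list."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

For an 8×8 block, the first positions visited are (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), and the last is (7,7). A block whose only non-zero coefficients lie in the upper-left corner therefore scans to a short run of non-zero values followed by a long run of zeros.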
While the above-described discrete cosine transformation and coding may provide significant data compression for a majority of images encountered in practice, actual reduction in data volume is not guaranteed and the degree of compression is not optimal, particularly since equal precision for representation of each DCT coefficient would require the same number of bits to be transmitted (although the JPEG standard allows for the DCT values to be quantized by ranges that are coded in a table). That is, the gain in compression developed by DCT coding derives largely from increased efficiency in handling zero and near-zero values of the DCT coefficients although some compression is also achieved through quantization that reduces precision. Accordingly, the JPEG standard provides a second stage of compression and coding which is known as entropy coding.
The concept of entropy coding generally parallels the concept of entropy in the more familiar context of thermodynamics where entropy quantifies the amount of “disorder” in a physical system. In the field of information theory, entropy is a measure of the predictability of the content of any given quantum of information (e.g. symbol) in the environment of a collection of data of arbitrary size and independent of the meaning of any given quantum of information or symbol. This concept provides an achievable lower bound for the amount of compression that can be achieved for a given alphabet of symbols and, more fundamentally, leads to an approach to compression on the premise that relatively more predictable data or symbols contain less information than less predictable data or symbols and the converse that relatively less predictable data or symbols contain more information than more predictable data or symbols. Thus, assuming a suitable code for the purpose, optimally efficient compression can be achieved by allocating fewer bits to more predictable symbols or values (that are more common in the body of data and include less information) while reserving longer codes for relatively rare symbols or values.
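As an illustration of the lower bound discussed above, the following minimal sketch estimates the entropy, in bits per symbol, of a sequence from its observed symbol frequencies. The function name `entropy_bits` is an illustrative assumption, and the sketch treats each symbol as independent of its neighbors.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Average information per symbol, in bits, from observed frequencies."""
    counts = Counter(symbols)
    total = len(symbols)
    # A symbol with probability p carries log2(1/p) bits of information.
    return sum((n / total) * math.log2(total / n) for n in counts.values())
```

For example, the sequence "aaaabbcd" has symbol probabilities of 1/2, 1/4, 1/8 and 1/8 and an entropy of 1.75 bits per symbol, whereas a sequence of four equally likely symbols requires 2 bits per symbol and a sequence consisting of a single repeated symbol has an entropy of zero.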
As a practical matter, Huffman coding and arithmetic coding are suitable for entropy encoding and both are accommodated by the JPEG standard. One operational difference for purposes of the JPEG standard is that, while tables of values corresponding to the codes are required for both coding techniques, default tables are provided for arithmetic coding but not for Huffman coding. However, some particular Huffman tables, although they can be freely specified under the JPEG standard to obtain maximal coding efficiency and image fidelity upon reconstruction, are often used indiscriminately, much in the nature of a default, if the image fidelity is not excessively compromised in order to avoid the computational overhead of computing custom Huffman tables.
It should be appreciated that while entropy coding, particularly using Huffman coding, guarantees a very substantial degree of data compression if the coding or conditioning tables are reasonably well-suited to the image, the encoding, itself, is very computationally intensive since it is statistically based and requires collection of statistical information regarding a large number of image values or values representing them, such as DCT coefficients. Conversely, the use of tables embodying probabilities which do not represent the image being encoded could lead to expansion rather than compression if the image being encoded requires coding of many values which are relatively rare in the image from which the tables were developed even though such a circumstance is seldom encountered.
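The possibility of expansion noted above can be illustrated with a small sketch. The code lengths used here are hypothetical and are chosen as if they had been derived from data in which the symbol "a" is by far the most common; the function name `coded_bits` is likewise an illustrative assumption.

```python
def coded_bits(data, code_lengths):
    """Total number of bits needed to code data with the given code lengths."""
    return sum(code_lengths[symbol] for symbol in data)


# Hypothetical variable-length code tuned for data dominated by "a".
LENGTHS = {"a": 1, "b": 2, "c": 3, "d": 3}

# A fixed-length code for four symbols needs 2 bits per symbol.
FIXED_BITS_PER_SYMBOL = 2
```

Under these assumptions, the eight-symbol sequence "aaaabbcd" codes in 14 bits, against 16 bits for the fixed-length code, while the sequence "ddddccba", whose statistics are the reverse of those assumed by the table, requires 21 bits and is therefore expanded.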
It is for this reason that some Huffman tables have effectively come into standard usage even though optimal compression and/or optimal fidelity for the degree of compression utilized will not be achieved. Conversely, compression efficiency of Huffman encoding can usually be significantly increased and greater image fidelity optimally maintained for a given number of bits of data by custom Huffman tables corresponding to the image of interest but may be achievable only with substantial computational burden for encoding.
Another inefficiency of Huffman coding characteristically is encountered when the rate of occurrence of any particular value to be encoded rises above 50% due to the hierarchical nature of the technique of assigning Huffman codes to respective coded values and the fact that at least one bit must be used to represent the most frequently occurring and most predictable value even though the information contained therein may, ideally, justify less. For example, if the rate of occurrence of a single value rises to 75%, the efficiency of Huffman coding drops to 86%. As a practical matter, the vast majority of DCT coefficients are (or are quantized to) zero and substantial inefficiencies are therefore frequently encountered in Huffman encoding of DCT coefficients or DCT coefficient differences. For the AC coefficients, the JPEG committee solved this problem by collecting runs of zero-valued coefficients together and not coding the individual zero-valued coefficients. Thus the likelihood of a Huffman symbol occurring more than half the time is greatly reduced.
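The loss of efficiency for a dominant symbol can be checked numerically with the following sketch, which builds Huffman code lengths for a given set of probabilities and compares the average code length with the entropy. The function names are illustrative assumptions, the example distributions are hypothetical, and the sketch ignores JPEG-specific constraints such as the sixteen-bit limit on code length. The exact efficiency depends on how the remaining probability is spread over the other symbols, so the sketch is not intended to reproduce any particular figure.

```python
import heapq
import math


def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a dict of probabilities."""
    if len(probs) == 1:
        # Even a lone symbol needs at least one bit.
        return {next(iter(probs)): 1}
    # Heap entries: (probability, tie-breaker, symbols in this subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    tie = len(heap)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        # Merging two subtrees adds one bit to every symbol beneath them.
        for s in group1 + group2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie, group1 + group2))
        tie += 1
    return lengths


def huffman_efficiency(probs):
    """Entropy divided by the average Huffman code length."""
    lengths = huffman_code_lengths(probs)
    entropy = sum(p * math.log2(1 / p) for p in probs.values() if p > 0)
    average = sum(p * lengths[s] for s, p in probs.items())
    return entropy / average
```

For a hypothetical distribution in which one symbol occurs 75% of the time and two others occur 12.5% of the time each, the dominant symbol receives a one-bit code although it carries only about 0.415 bits of information, and the efficiency is approximately 0.85. With only two symbols at 75% and 25% the efficiency is approximately 0.81, while four equally likely symbols are coded with an efficiency of 1.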
Huffman decoding also requires a substantial computational burden since the compression efficiency derives largely from a variable length code that requires additional processing to detect the ending point of respective values or symbols. Additionally, this processing is achieved through use of coding tables which must be transmitted with the entropy coded image data and may change with every image. Complexity of access to data in tables is aggravated by the fact that Huffman codes are of variable length in order to allocate numbers of bits in accordance with the predictability of the data or amount of data contained in a particular symbol or value. To increase response speed it has been the practice to compute and store look-up tables from the Huffman tables which can then be used for decoding. Therefore, at least two bytes of table data must often be accessed per code word.
Moreover, since sixteen-bit Huffman code lengths are allowed, a prior art method of decoding would access a Huffman table with 2^16 entries for the symbol values and 2^16 entries for the code lengths. These entries must be computed each time a Huffman table is specified or changed. Up to four DC and four AC tables could be needed to decode an interleaved baseline four-component image. For hardware with small on-chip caching or RAM, such large tables will degrade performance because extra processing cycles are needed with every cache miss or RAM access.
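The look-up table approach described above can be sketched as follows. The function names `build_lookup` and `decode`, the representation of the bit stream as a string of "0" and "1" characters, and the four-symbol code in the usage note are all illustrative assumptions and are not taken from any particular decoder. The sketch builds one table of symbol values and one table of code lengths, each indexed by the next `max_len` bits of the stream.

```python
def build_lookup(codes, max_len):
    """Build full look-up tables from {bit string: symbol}.

    Returns (symbols, lengths), each with 2**max_len entries.
    """
    size = 1 << max_len
    symbols = [None] * size
    lengths = [0] * size
    for bits, symbol in codes.items():
        pad = max_len - len(bits)
        first = int(bits, 2) << pad
        # Every index that begins with this code word maps to its symbol.
        for index in range(first, first + (1 << pad)):
            symbols[index] = symbol
            lengths[index] = len(bits)
    return symbols, lengths


def decode(bitstring, codes, max_len):
    """Decode a string of '0'/'1' characters by table look-up."""
    symbols, lengths = build_lookup(codes, max_len)
    out = []
    pos = 0
    while pos < len(bitstring):
        # Peek at the next max_len bits, zero-padded at the end of the stream.
        window = bitstring[pos:pos + max_len].ljust(max_len, "0")
        index = int(window, 2)
        if lengths[index] == 0 or pos + lengths[index] > len(bitstring):
            raise ValueError("invalid or truncated code word")
        out.append(symbols[index])
        pos += lengths[index]   # consume only the bits of the matched code
    return out
```

With the hypothetical code {"0": "A", "10": "B", "110": "C", "111": "D"} and a maximum length of three bits, each table has only eight entries and the bit string "0101100111" decodes to A, B, C, A, D. Setting the maximum length to sixteen bits yields two tables of 65,536 entries each, and eight such pairs would be needed if four DC and four AC tables were in use, which illustrates the storage burden noted above.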