This present invention relates to lossless digital data compression, and more particularly to a method and apparatus for re-indexing digital data to a new symbol mapping as an aid to data compression.
Color digital images can be represented in a number of digital formats. FIG. 1 depicts two common formats, a color plane format (planes 24, 26, 28) and a palette-indexed format (table 30 and index image 32). In either format, digital image 20 is represented by an array of pixel values. In a color plane format, each pixel is represented in multiple color planes, for instance, red, green, and blue color planes. In the example of FIG. 1, representative color plane values are shown for a subimage 22 of image 20. Red color plane 24, green color plane 26, and blue color plane 28 represent the intensity of their respective colors, at each pixel position, as a number between 0 and 255. This format allows over sixteen million unique colors to be represented accurately. The downside of this format is that it requires twenty-four bits to represent each pixel in the image.
In many images (particularly computer-generated images and icons) relatively few colors are used. In most other images, a number of intelligently-selected colors much smaller than sixteen million, e.g., 256 colors, may be entirely adequate to represent the image data to a viewer. Such images are ideally suited for a palette-indexed image.
A typical palette-indexed image has two elements: a palette table, which provides for translation between an index value and its associated red, green, and blue intensity values, and an index image, which contains an index value for each pixel in the image. In FIG. 1, palette table 30 contains eight indices, one for each unique combination of red, green, and blue pixel values that is found in subimage 22. For instance, index 0 represents a light gray, index 1 represents white, and index 2 represents a dark brown. Index image 32 contains one index value for each pixel.
The size of a palette-indexed image is the sum of the sizes of the index image and the palette table. For images of any significant size, the palette table typically requires a relatively negligible amount of space, such that the size of the image is dominated by the size of the index image.
A viewable image is created from a palette-indexed image by replacing each index with its palette table entry. Thus, the index xe2x80x9c0xe2x80x9d for the top left-hand pixel of index image 32 would be used to retrieve the red, green, blue values (192,192,192) from palette table 30 for use during display of that pixel.
Many applications for palette-indexed images can greatly benefit if the palette-indexed image data can be compressed, e.g., for efficient storage and/or transmittal. Accordingly, several lossless compression schemes are currently employed with palette-indexed images. It has been recognized that for some lossless compression schemes, compression efficiency can vary with palette selection. In other words, by merely xe2x80x9cshufflingxe2x80x9d the entries in palette table 30 (and re-indexing the index image to conform to the new palette table), different compression values may result for subsequent compression of the index image.
U.S. Pat. No. 5,471,207, issued to Zandi et al., is entitled xe2x80x9cCompression of Palettized Images and Binarization for Bitwise Coding of M-ary Alphabets Thereforexe2x80x9d. Zandi et al. describe a reindexing scheme that uses a context model of the input data to select a particular binarization of the input data to provide good compression with a binary entropy coder. This reindexing scheme accepts a dataset that uses a series of M symbols, S0, . . . , SMxe2x88x921, which can be arranged randomly, or in decreasing order of occurrence in the dataset. Symbol S0 is binarized to a first reindex value. Symbol S1 is then binarized to a second reindex value, selected from all unassigned reindex values, that minimizes the bitwise entropy for the reindexed S0 and S1. The process recurses through the remaining symbols Si, each time selecting a reindex value from the unassigned reindex values.
The Zandi et al. approach is limited to systems that use a binary entropy coder, and the performance of their approach is also limited by the particular order in which symbols are considered for re-indexing.
It is recognized herein that a general approach for reindexing symbols is needed. The embodiments presented herein illustrate such an approach, based on the general observation that many encoders perform better when the data presented to them contains small, rather than large, variations between adjacent data. Thus, for instance, a palettized image may be more easily compressed if symbols that frequently occur adjacent to each other in the index image are assigned symbol values that are close in symbol space.
The embodiments disclosed herein also overcome prior art limitations regarding symbol selection order. For instance, Zandi et al.""s reindexer considers symbols in a fixed order, essentially answering the question, for each symbol in order, xe2x80x9cwhere should this symbol go?xe2x80x9d In contrast, the disclosed embodiments consider a limited number of reassignment positions at each iteration, considering many (or all) unassigned symbols for these positions. In essence, these embodiments answer the question xe2x80x9cwhich symbol best belongs in this position?xe2x80x9d This approach avoids situations where a critical symbol (from a compressibility standpoint) receives a suboptimal reassignment, merely because more optimal positions were first filled by less critical symbols.
In accordance with one aspect of the invention, a method for reindexing a digital array of symbol values is disclosed. The symbols are drawn from an M-ary alphabet of symbols. An array of cross-counts is calculated, the array comprising individual cross-counts that each indicate the degree of occurrence, within the digital array, of two symbols drawn from the M-ary alphabet appearing in a predefined contextual relationship. A symbol reassignment pool is initialized, and a symbol from the M-ary alphabet is assigned to a seed position in the pool. Unassigned symbols are then considered for assignment to positions adjacent to the symbols already assigned to the pool. To select the appropriate symbol or symbols for assignment, potential functions are calculated for unassigned symbols. A potential function for a particular unassigned symbol and pool position is based on weighted cross-counts between that symbol and those symbols already in the pool. The unassigned symbol with the largest potential function is assigned to the symbol reassignment pool at the position for which that potential function was calculated.
Preferably, each time a symbol is assigned to a position in the symbol reassignment pool, potential functions are recalculated, and then another symbol is assigned to a position in the symbol reassignment pool, this process continuing until all symbols from the M-ary alphabet have been assigned to the symbol reassignment pool.