1. Technical Field
The present invention relates to data compression. More particularly, the present invention relates to compressing large sets of short blocks of data.
2. Description of the Prior Art
Read-only memory ("ROM") is used in many of today's electronic products to store the firmware and data that define the product functionality. In some instances, the amount of available physical ROM is dictated by hardware design or cost considerations, and the amount of data that can be stored in the ROM determines the level of functionality of the product. Conversely, in other instances, a minimal functionality is required, and the amount of physical ROM required to accommodate such functionality impacts the hardware requirements, and, ultimately, the cost of the product. In either case, the use of data compression to store more data in the same amount of physical ROM can lead to competitive advantages in functionality, cost, or both.
For example, built-in ROM is used to store large data sets in printers manufactured for the Asian market. To meet minimal functionality requirements, these printers must include a certain number of Kanji fonts in some industry standard format (e.g. TrueType® format). A typical Kanji font includes representations of about 10,000 Kanji glyphs. Each glyph varies in length from a few tens to several hundreds of bytes. The total size of the uncompressed font is typically between 2 and 3 Mbytes. When printing a page, the printer must be able to access each of the individual glyphs in a random fashion and in real time. Therefore, if compression is used, each glyph must be compressed independently, or in small groups, such that when a given glyph is accessed, only a small amount of data needs to be decompressed.
The requirement that small blocks of data be compressed independently to allow independent and random access to each compressed block makes it difficult to exploit redundancies and correlation between the different blocks as part of a compression scheme. Another critical parameter is decompression speed, dictated by the real-time access requirement. Clearly, decompression speed and block size are related. A faster decompression algorithm allows larger groups of glyphs to be compressed together, thereby providing better compression ratios while still meeting the real-time decompression requirement. In the printer application discussed above, a certain minimal number of fonts must be stored in ROM to meet minimal functionality requirements. Data compression can help reduce the physical hardware size (e.g. number or size of ROM integrated circuits) required to store this minimal font set, thus reducing the cost of the product.
Another example of a ROM-based data compression application is the use of a ROM disk in super-portable computers. These computers may include the operating system and several popular software applications in a ROM card that is built into the machine. The amount of software that can be built into the product, and, hence, its functionality, are limited by the physical size of the ROM card. For a given physical size, the increased logical capacity of the ROM card that results from data compression leads to increased product functionality, and, hence, increased product marketability.
The ROM card is logically organized, and is seen by the system as a regular disk containing a file system. The minimal unit of data that is accessed by the system is a disk cluster, i.e. a data block whose size can vary between about 512 and 8192 bytes. Generally, the disk cluster size is fixed for a given file system. A typical ROM disk has a total capacity of about 10-20 Mbytes (i.e. raw, uncompressed capacity). As in the case of the printer font application discussed above, a ROM disk application requires that thousands of small blocks of data, i.e. the disk clusters, must be randomly accessible in real time.
These are just two examples of a class of applications of data compression that share the following set of common requirements:
1. The data set consists of a multitude, e.g. thousands, of short blocks of data, e.g. data blocks having from a few tens to a few thousand bytes.
2. The individual data blocks must be randomly accessible. Therefore, it must be possible to decompress each block independently.
3. Access time to individual blocks is critical. Therefore, decompression time is critical. However, compression is not time critical because it only need be performed once.
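The three requirements above can be illustrated with a minimal sketch (hypothetical code, offered only as an illustration of the requirements, not as part of any particular scheme): each block is compressed independently with a general-purpose compressor, and an offset index allows any single block to be located and decompressed without touching its neighbors.

```python
import zlib

def compress_blocks(blocks):
    """Compress each block independently and build an offset index
    so that any compressed block can be located for random access."""
    store = bytearray()
    index = []  # (offset, length) of each compressed block
    for block in blocks:
        c = zlib.compress(block, 9)  # thorough: compression runs only once
        index.append((len(store), len(c)))
        store.extend(c)
    return bytes(store), index

def read_block(store, index, i):
    """Decompress block i alone -- no other block is examined."""
    offset, length = index[i]
    return zlib.decompress(store[offset:offset + length])

# Thousands of short blocks in practice; three suffice for illustration.
glyphs = [bytes([65 + i]) * (50 * (i + 1)) for i in range(3)]
store, index = compress_blocks(glyphs)
```

Compressing related blocks in small groups, as the printer example above suggests, trades some random-access granularity for a better compression ratio.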
The state of the art provides several symmetrical data compression/decompression schemes, i.e. schemes in which compression and decompression take comparable amounts of time. A thorough compression, i.e. one that achieves the densest possible compression, takes longer. Thus, there may not be sufficient time available in a symmetric, real-time system for a thorough compression cycle. In an asymmetric system, i.e. one that does not impose real-time restrictions on the compression phase, a more thorough compression can be obtained. Furthermore, during the compression cycle, data structures can be optimized to accelerate the decompression cycle.
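The asymmetry can be shown with a short hypothetical sketch using zlib (chosen purely as an illustration): a slow, thorough compression pass is acceptable because it runs only once, while the decompressor recovers the data regardless of the effort spent compressing.

```python
import zlib

data = b"The quick brown fox jumps over the lazy dog. " * 200

quick = zlib.compress(data, 1)     # fast, less thorough compression
thorough = zlib.compress(data, 9)  # slower, denser; fine for an offline pass

# Decompression is the same call either way and is time critical at run time.
restored = zlib.decompress(thorough)
```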
A scheme for compressing large data sets consisting of many small blocks of data would find ready application in the data processing industry and would allow manufacturers to increase product functionality and decrease product form factor.