A large table of entries, such as one 64 kilobytes in size and indexed by a 16-bit key, may contain a relatively smaller subset of non-contiguous entries that is supported by a particular application. For example, Unicode is a world-wide character encoding standard based on a 16-bit unit of encoding. Unicode-configured products that support only a specific language use only that language's code points, which form a smaller, non-contiguous subset of the full range. By way of example, with Japanese, in which products ordinarily use the Japanese Industrial Standard (JIS) code, only approximately 6,000 symbols, out of the 64K (65,536) possible symbols supported by Unicode, are used. These 6,000 or so symbols are widely scattered throughout the Unicode range, and the remaining code points are of no interest to that product.
Storing a smaller number of character attributes or the like in a full-size, 64-kilobyte table leaves many unused entry values. Although this storage method allows a simple indexing operation using the Unicode value as the key, and is thus the fastest method for retrieving stored information, the table is 64K times the size of a single attribute entry, with most of the space wasted. For many systems and applications, the fast retrieval speed is not worth the amount of memory required to implement this method, and thus other storage methods have been attempted.
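The full-size table approach can be illustrated with a minimal sketch. The three code points, their attribute values, and the names `supported`, `full_table` and `lookup_direct` are hypothetical and chosen only for illustration; a one-byte attribute per slot is likewise an assumption.

```python
TABLE_SIZE = 1 << 16  # one slot for every possible 16-bit key (65,536 slots)

# Hypothetical attribute values for a few widely scattered code points.
supported = {0x3042: 1, 0x30A2: 2, 0x4E00: 3}

# Full-size table with one byte per slot; 0 is assumed to mean "no attribute".
full_table = bytearray(TABLE_SIZE)
for code_point, attribute in supported.items():
    full_table[code_point] = attribute


def lookup_direct(code_point: int) -> int:
    """Retrieve an attribute with a single indexing operation."""
    return full_table[code_point]


# Every slot that does not hold a supported code point is wasted space.
unused_slots = TABLE_SIZE - len(supported)
```

In this sketch the look-up is a single index operation, but all slots other than the three supported ones are allocated and never used.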
An alternative to the full-size table of entries stores only the entries of interest in a sorted table and locates them with a binary search. While no slots are wasted in such a table, the key must be stored with each entry, increasing the amount of space used for storing the entries of interest. Moreover, a look-up takes O(log(N)) operations, where N is the number of entries in the table, and thus on average provides relatively slow retrieval compared to other storage methods.
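As a rough sketch of the binary search alternative, the following uses the standard `bisect` module on a sorted list of keys. The sample entries and the names `keys`, `attributes` and `lookup_binary` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical (key, attribute) pairs, sorted by key so they can be searched.
entries = sorted([(0x4E00, 3), (0x3042, 1), (0x30A2, 2)])

# The key has to be stored alongside each attribute, which costs extra space.
keys = [key for key, _ in entries]
attributes = [attribute for _, attribute in entries]


def lookup_binary(code_point: int):
    """Find an attribute in O(log N) comparisons; None if the key is absent."""
    i = bisect_left(keys, code_point)
    if i < len(keys) and keys[i] == code_point:
        return attributes[i]
    return None
```

No slot is wasted here, but each look-up needs several comparisons rather than one indexing operation, and the `keys` list is pure overhead relative to the attributes themselves.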
A hash table may be used to store the attributes, indexed by a hash value computed from the code point. One type of hash table is a hash table with collision resolution. This type of hash table must store information to handle collisions, usually accomplished by storing the key with each entry. Extra time is required to test for collisions, and even more time is required to resolve a collision when one is detected. Moreover, this type of hash table frequently winds up with a number of extra, unused entries, i.e., the table is not always densely packed.
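One possible form of a hash table with collision resolution is sketched below, assuming open addressing with linear probing (one of several common schemes, picked here only for illustration). The slot count, the modulo hash function, and the names `insert` and `lookup_hashed` are hypothetical.

```python
NUM_SLOTS = 8  # hypothetical size; must exceed the entry count so probing stops

# Each occupied slot stores (key, attribute); None marks an unused slot.
slots = [None] * NUM_SLOTS


def hash_slot(code_point: int) -> int:
    """Illustrative hash function: the key modulo the number of slots."""
    return code_point % NUM_SLOTS


def insert(code_point: int, attribute: int) -> None:
    i = hash_slot(code_point)
    # Collision: step to the next slot until a free or matching one is found.
    while slots[i] is not None and slots[i][0] != code_point:
        i = (i + 1) % NUM_SLOTS
    slots[i] = (code_point, attribute)


def lookup_hashed(code_point: int):
    i = hash_slot(code_point)
    while slots[i] is not None:
        if slots[i][0] == code_point:  # stored key is needed for this test
            return slots[i][1]
        i = (i + 1) % NUM_SLOTS  # collision detected; keep probing
    return None


for key, attribute in [(0x3042, 1), (0x30A2, 2), (0x4E00, 3)]:
    insert(key, attribute)
```

With these sample keys, 0x3042 and 0x30A2 hash to the same slot, so the second one is displaced to the next slot and costs an extra probe on every look-up; five of the eight slots remain unused.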
Lastly, perfect hash algorithms (with no collisions) provide desirable performance in many applications, as they can provide a good tradeoff between memory usage and speed. Because there are no collisions, the keys need not be stored with the entries, which saves space. However, extra, unused entries are typically present with perfect hashing, thus wasting space. In addition, the cost (e.g., the amount of processing time) required to calculate the hash value must be carefully considered, since that cost is incurred in every look-up operation. Furthermore, what works well for hashing one subset of values of a larger range does not necessarily work well for another subset. Consequently, a significant problem with perfect hash algorithms is that, in general, no good, rapid and consistent way exists to find, for a given subset of values, a perfect hash that is both very fast to compute and densely packs the hash values together.
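The difficulty of finding a perfect hash can be illustrated with a naive trial-and-error search. This is only a sketch under stated assumptions: the multiply-and-reduce hash family, the constant `PRIME`, the search limits, and the names `perfect_hash` and `find_perfect_hash` are all hypothetical, and the search is not presented as an efficient or recommended method.

```python
PRIME = 65537  # hypothetical modulus, just above the 16-bit key range


def perfect_hash(key: int, multiplier: int, num_slots: int) -> int:
    """Illustrative hash family: multiply, reduce by a prime, then by size."""
    return (key * multiplier) % PRIME % num_slots


def find_perfect_hash(keys, max_slots, max_multiplier=1000):
    """Try table sizes from densest upward; return (num_slots, multiplier)."""
    for num_slots in range(len(keys), max_slots + 1):
        for multiplier in range(1, max_multiplier + 1):
            used = {perfect_hash(key, multiplier, num_slots) for key in keys}
            if len(used) == len(keys):  # every key landed in its own slot
                return num_slots, multiplier
    return None  # no collision-free parameters found within the limits


sample_keys = [0x3042, 0x30A2, 0x4E00]
found = find_perfect_hash(sample_keys, max_slots=8)
```

Once parameters are found, a look-up needs no stored keys, but the search itself is brute force, the parameters found for one set of keys generally do not carry over to another, and the resulting table may have more slots than keys.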