1. Technical Field
The invention is related to perfect hashing for packing sparsely defined or sparsely distributed data into memory, and in particular, to perfect hashing of variable-rate data of one or more dimensions in an efficient randomly accessible format.
2. Related Art
In general, hashing is a well-known technique for mapping data elements into a hash table by using a hash function to compute, from the data, an address in the hash table. To cope with collisions (i.e., two or more data elements mapping to the same address in the hash table), hashing algorithms typically perform a sequence of probes into the hash table, where the number of probes varies per query. These probes provide collision detection by determining whether data has already been stored at a particular address in the hash table. Unfortunately, in time-critical applications, such as image rendering in a graphics processing unit (GPU), this type of probing strategy is inefficient because GPU SIMD parallelism makes all pixels wait for the worst-case number of probes. Some GPUs can address this issue using dynamic branching; however, this is only effective if all pixels in a region follow the same branching path, which is unlikely for hash tests.
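By way of illustration only, the variable probe count described above can be sketched with a small open-addressing table that uses linear probing. The function names `build_probed_table` and `probe_lookup`, the modulo hash function, and the sample keys are assumptions chosen for this sketch and do not come from any particular conventional implementation.

```python
def build_probed_table(keys, size):
    """Insert integer keys into an open-addressing table using linear probing."""
    if len(keys) > size:
        raise ValueError("table is too small for the given keys")
    table = [None] * size
    for key in keys:
        slot = key % size  # illustrative hash function (an assumption)
        while table[slot] is not None:  # collision: step to the next slot
            slot = (slot + 1) % size
        table[slot] = key
    return table


def probe_lookup(table, key):
    """Return (found, probes), where probes counts the slots examined."""
    size = len(table)
    for probes in range(1, size + 1):
        slot = (key + probes - 1) % size
        if table[slot] is None:  # an empty slot ends the probe sequence
            return False, probes
        if table[slot] == key:
            return True, probes
    return False, size


# Keys 0, 8 and 16 all hash to slot 0 of an 8-slot table, so they collide.
table = build_probed_table([0, 8, 16, 3], 8)
probe_counts = [probe_lookup(table, key)[1] for key in (0, 8, 16, 3)]
# A group of queries processed in lockstep would wait for the slowest query.
worst_case = max(probe_counts)
```

In this sketch the four queries need different numbers of probes, so a group of queries evaluated in lockstep is held to `worst_case`, which is the inefficiency noted above for SIMD execution.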
Avoiding excessive hash collisions and clustering generally requires a hash function that distributes data seemingly at random throughout the table. Consequently, hash tables often exhibit poor locality of reference, resulting in frequent cache misses and high-latency memory accesses. Perfect hashing addresses some of these problems by using a hash function that maps elements into a hash table without any collisions, such that all the elements map to distinct slots of the hash table.
In theory, perfect hash functions are rare in the space of all possible functions. In fact, the description of a minimal perfect hash function (wherein all slots in the hash table are filled) theoretically requires a number of bits proportional to the number of data entries. Consequently, it is not generally feasible to construct a perfect hash using an expression with a small number of machine-precision parameters. Instead, additional data is generally stored in auxiliary lookup tables for use in combination with the hash table.
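As a hedged illustration of combining a hash table with an auxiliary lookup table, the following minimal sketch searches for one offset per bucket so that every key in a fixed set lands in its own slot. It is a generic offset-based construction written for this example, not a description of any specific conventional scheme; the names `build_perfect_hash` and `perfect_lookup`, the modulo hash functions, and the table sizes are all assumptions.

```python
def build_perfect_hash(keys, table_size, offset_size):
    """Search for per-bucket offsets that place every key in a distinct slot.

    The slot for key k is (k + offsets[k % offset_size]) % table_size.
    Both modulo hashes are illustrative assumptions.
    """
    buckets = {}
    for key in keys:
        buckets.setdefault(key % offset_size, []).append(key)

    table = [None] * table_size
    offsets = [0] * offset_size
    # Place the most crowded buckets first, while the table is mostly empty.
    for index, bucket in sorted(buckets.items(), key=lambda item: -len(item[1])):
        for offset in range(table_size):
            slots = [(key + offset) % table_size for key in bucket]
            distinct = len(set(slots)) == len(slots)
            if distinct and all(table[slot] is None for slot in slots):
                for key, slot in zip(bucket, slots):
                    table[slot] = key
                offsets[index] = offset
                break
        else:
            raise ValueError("no collision-free offset found; try other sizes")
    return table, offsets


def perfect_lookup(table, offsets, key):
    """Test membership with one auxiliary-table read and one hash-table read."""
    slot = (key + offsets[key % len(offsets)]) % len(table)
    return table[slot] == key


keys = [3, 17, 42, 58, 91, 120]
table, offsets = build_perfect_hash(keys, table_size=7, offset_size=3)
all_found = all(perfect_lookup(table, offsets, key) for key in keys)
```

Every query in this sketch performs the same fixed number of memory reads regardless of the key, at the cost of storing the auxiliary `offsets` table alongside the hash table. The brute-force offset search is used only for brevity and may fail for unsuitable table sizes, in which case the sketch raises an error.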
Typical perfect hashing schemes have generally focused on external storage of data records indexed by character strings or sparse integers. Consequently, conventional perfect hashing schemes are generally not well adapted for use with spatially coherent multidimensional data. For example, in typical computer graphics applications, 2D and 3D texture data is often accessed coherently by the GPU (i.e., adjacent or nearby image patches or segments are accessed either sequentially or in parallel by the GPU). This texture data is then swizzled, tiled, cached, etc. by the GPU. Unfortunately, typical hash functions do not generally exploit the spatial coherence of such data when accessing it.
Further, many images, such as, for example, vector images or graphics, involve sparsely defined spatial data. In particular, with these types of images, image discontinuities such as sharp vector silhouettes are generally present at only a small fraction of pixels. Texture sprites often overlay high-resolution features at sparse locations. Image attributes such as alpha masks are mainly binary, requiring additional resolution at only a small subset of pixels. In addition, surface textures or geometries can also be represented as sparse 3D data. Further, such data can be encoded using variable rates, such as where various cells of a grid-based vector image have differing amounts of complexity. However, compressing this type of variable-rate sparse data while retaining efficient random access is a problem that has not been addressed by conventional perfect hashing schemes.
Spatial hashing is another conventional hashing technique that is commonly used for point and region queries in multidimensional databases. Spatial hashing is also used in graphics for efficient collision detection among moving or deforming objects. However, these techniques generally employ imperfect hashing (e.g., traditional multi-probe hash tables implemented on the CPU). Further, these techniques do not extend to multidimensional tables. Also, they strive to make intermediate hashes as random as possible. Consequently, conventional spatial hashes are not well adapted for use with sparsely defined spatial data such as vector graphics.
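For illustration only, a spatial hash of the general kind used for collision detection can be sketched as a uniform grid whose cell coordinates are scrambled into a one-dimensional table index. The function names, the cell size, the table size, and the prime multipliers below are assumptions made for this sketch rather than details of any particular conventional system.

```python
from collections import defaultdict


def cell_of(point, cell_size):
    """Map a 3D point to the integer coordinates of its grid cell."""
    return tuple(int(coord // cell_size) for coord in point)


def hash_cell(cell, table_size):
    """Scramble cell coordinates into a 1D index (multipliers are assumed)."""
    x, y, z = cell
    return ((x * 73856093) ^ (y * 19349663) ^ (z * 83492791)) % table_size


def build_spatial_hash(points, cell_size, table_size):
    """Store each point index in the bucket of the cell that contains it."""
    table = defaultdict(list)
    for index, point in enumerate(points):
        table[hash_cell(cell_of(point, cell_size), table_size)].append(index)
    return table


def candidates_near(table, point, cell_size, table_size):
    """Return indices of points whose cells share a bucket with the query."""
    return table.get(hash_cell(cell_of(point, cell_size), table_size), [])


points = [(0.1, 0.2, 0.3), (0.4, 0.4, 0.4), (5.0, 5.0, 5.0)]
grid = build_spatial_hash(points, cell_size=1.0, table_size=1024)
near_origin = candidates_near(grid, (0.5, 0.5, 0.5), 1.0, 1024)
```

Because the index is deliberately scrambled, unrelated cells can share a bucket, so each candidate still requires an exact test, and neighboring cells generally map to unrelated table locations. Both properties are consistent with the imperfect, randomized hashing described above.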