Many graphics applications involve sparsely defined spatial data. For example, image discontinuities such as sharp vector silhouettes are generally present at only a small fraction of pixels. Texture sprites often overlay high-resolution features at sparse locations. Image attributes such as alpha masks are mainly binary, requiring additional resolution at only a small subset of pixels. Surface texture or geometry can be represented as sparse 3D data.
Compressing such sparse data while retaining efficient random-access is a challenging problem. Current solutions include the following:
Data quantization is lossy and uses memory at all pixels even though the vast majority may not have defined data.
Block-based indirection tables typically have many unused entries in both the indirection table and the blocks.
Intra-block data compression (including vector quantization) uses fixed-length encodings for fast random access.
Quadtree/octree structures contain unused entries throughout their hierarchies, and moreover require a costly sequence of pointer indirections.
Such solutions, however, incur significant memory overhead.
Perfect Hashing
A perfect hash usually refers to a hash function that maps elements into a hash table without any collisions; that is, all the elements map to distinct slots of the hash table. The probability that randomly assigning n elements to a table of size m results in a perfect hash is
$$\Pr_{PH}(n,m) = (1)\cdot\left(1-\frac{1}{m}\right)\cdot\left(1-\frac{2}{m}\right)\cdots\left(1-\frac{n-1}{m}\right).$$
When the table is large (i.e., m >> n), one can use the approximation e^x ≅ 1 + x for small x to obtain
$$
\begin{aligned}
\Pr_{PH}(n,m) &\cong 1\cdot e^{-1/m}\cdot e^{-2/m}\cdots e^{-(n-1)/m} \\
&= e^{-(1+2+\dots+(n-1))/m} \\
&= e^{-n(n-1)/(2m)} \\
&\cong e^{-n^{2}/(2m)}.
\end{aligned}
$$
Thus, a hash collision is highly likely when the table size m is much less than n^2. This is an instance of the well-known "birthday paradox" (e.g., a group of only 23 people has more than a 50% chance of having at least one shared birthday).
The probability of finding a minimal perfect hash (i.e., where n = m) is
$$
\begin{aligned}
\Pr_{PH}(n) &= \left(\frac{n}{n}\right)\cdot\left(\frac{n-1}{n}\right)\cdot\left(\frac{n-2}{n}\right)\cdots\left(\frac{1}{n}\right) \\
&= \frac{n!}{n^{n}} \\
&= e^{(\log n! \,-\, n\log n)} \\
&\cong e^{((n\log n - n) \,-\, n\log n)} \\
&= e^{-n},
\end{aligned}
$$
which uses Stirling's approximation log n! ≅ n log n − n. Therefore, the expected number of bits needed to describe these rare minimal perfect hash functions is intuitively
$$\log_2 \frac{1}{\Pr_{PH}(n)} \cong \log_2 e^{n} = (\log_2 e)\,n \cong (1.443)\,n.$$
Several number-theoretical methods construct perfect hash functions by exploiting the Chinese remainder theorem. However, even for sets of a few dozen elements, these functions involve integer coefficients with hundreds of digits.
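The probability estimates derived above can be checked numerically. The following Python sketch evaluates the exact product for Pr_PH(n, m), the approximation e^(−n^2/(2m)), and the bit estimate log2(n^n / n!) for a minimal perfect hash. The function names are illustrative only and do not come from any of the schemes discussed here.

```python
import math


def pr_perfect_exact(n: int, m: int) -> float:
    """Exact probability that n keys placed uniformly at random
    into m slots all land in distinct slots."""
    p = 1.0
    for i in range(n):
        p *= 1.0 - i / m
    return p


def pr_perfect_approx(n: int, m: int) -> float:
    """Approximation exp(-n^2 / (2m)), intended for m >> n."""
    return math.exp(-n * n / (2.0 * m))


def bits_minimal_perfect(n: int) -> float:
    """log2(n^n / n!): bits suggested by the probability n!/n^n.
    lgamma(n + 1) equals log(n!) and avoids overflow for large n."""
    return (n * math.log(n) - math.lgamma(n + 1)) / math.log(2.0)


if __name__ == "__main__":
    # Birthday example: 23 people, 365 possible birthdays.
    print("collision probability:", 1.0 - pr_perfect_exact(23, 365))
    print("approximate value:    ", 1.0 - pr_perfect_approx(23, 365))
    # Bits per element for a large minimal perfect hash.
    print("bits per element:     ", bits_minimal_perfect(10**6) / 10**6)
```

For n = 23 and m = 365 the exact collision probability comes out just above 50%, consistent with the birthday example, and the bits-per-element figure approaches log2 e as n grows.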
A more computer-amenable approach is to define the hash using one or more auxiliary tables. One approach uses three such tables and two nested hash functions to hash a sparse set of n integers taken from the range {0, …, u−1}. Such a scheme takes constant lookup time and 3n log n bits of memory. The hash is constructed with a deterministic algorithm that takes O(nu) time. Another approach reduces space complexity to the theoretically optimal Θ(n) bits, but the constant is large and the algorithm is difficult to implement.
Some schemes treat perfect hashing as an instance of sparse matrix compression. They map a bounded range of integers to a 2D matrix and compact the defined entries into a 1D array by translating the matrix rows. Sparse matrix compression is known to be NP-complete.
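The row-translation idea can be sketched as follows. This is a minimal illustration, not the algorithm of any particular scheme: it assumes integer keys laid out row-major in a matrix of a chosen `width`, and it uses a greedy first-fit placement (densest rows first) as a heuristic, since finding an optimal packing is the hard part. The names `compress_rows` and `lookup` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def compress_rows(
    keys: Iterable[int], width: int
) -> Tuple[Dict[int, int], List[Optional[int]]]:
    """Pack the defined matrix entries into a 1D table by shifting rows.

    Key k sits at row k // width, column k % width. Each row gets an
    offset so that its defined columns fall on free table slots.
    """
    rows: Dict[int, List[int]] = {}
    for k in sorted(set(keys)):
        rows.setdefault(k // width, []).append(k % width)

    offsets: Dict[int, int] = {}
    table: List[Optional[int]] = []
    # Heuristic (assumption): place rows with the most entries first.
    for r in sorted(rows, key=lambda row: -len(rows[row])):
        cols = rows[r]
        d = 0
        # First fit: smallest shift with no clash against placed entries.
        while any(d + c < len(table) and table[d + c] is not None for c in cols):
            d += 1
        needed = d + max(cols) + 1
        if needed > len(table):
            table.extend([None] * (needed - len(table)))
        for c in cols:
            table[d + c] = r * width + c  # store the key to validate lookups
        offsets[r] = d
    return offsets, table


def lookup(
    offsets: Dict[int, int], table: List[Optional[int]], width: int, k: int
) -> Optional[int]:
    """Return the table slot of key k, or None if k is not defined."""
    d = offsets.get(k // width)
    if d is None:
        return None
    slot = d + k % width
    if slot < len(table) and table[slot] == k:
        return slot
    return None
```

Storing the key in each slot lets a lookup reject undefined entries; a scheme that only ever queries defined keys could store the payload alone.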
The most practical schemes achieve compact representations and scale to larger datasets by giving up guarantees of success. These probabilistic constructions may iterate over several random parameters until finding a solution. For example, one scheme defines a hash h(k) = (h0(k) + g1[h1(k)] + g2[h2(k)]) mod m, where functions h0, h1, h2 map string keys k to integers modulo m, r, and r, respectively, and g1, g2 are two tables of size r. However, this algorithm takes expected time O(r^4), and is practical only up to n = 512 elements.
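To make the form of this hash concrete, the sketch below evaluates h(k) = (h0(k) + g1[h1(k)] + g2[h2(k)]) mod m and searches for collision-free tables by plain random retries. The retry loop is a stand-in chosen for brevity, not the construction algorithm of the scheme described above, and it is only workable for very small key sets. The salted BLAKE2b hashes used for h0, h1, h2 and all function names are assumptions made for illustration.

```python
import hashlib
import random
from typing import List, Sequence, Tuple


def salted_hash(key: str, salt: int, modulus: int) -> int:
    """Illustrative stand-in for h0, h1, h2: salted BLAKE2b reduced mod modulus."""
    digest = hashlib.blake2b(
        key.encode("utf-8"), digest_size=8, salt=salt.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % modulus


def evaluate(key: str, m: int, r: int, salts: Sequence[int],
             g1: Sequence[int], g2: Sequence[int]) -> int:
    """h(k) = (h0(k) + g1[h1(k)] + g2[h2(k)]) mod m."""
    h0 = salted_hash(key, salts[0], m)
    h1 = salted_hash(key, salts[1], r)
    h2 = salted_hash(key, salts[2], r)
    return (h0 + g1[h1] + g2[h2]) % m


def find_tables(keys: Sequence[str], m: int, r: int, max_trials: int = 10000,
                seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Retry random salts and tables until no two keys share a slot."""
    rng = random.Random(seed)
    for _ in range(max_trials):
        salts = [rng.randrange(2**32) for _ in range(3)]
        g1 = [rng.randrange(m) for _ in range(r)]
        g2 = [rng.randrange(m) for _ in range(r)]
        slots = {evaluate(k, m, r, salts, g1, g2) for k in keys}
        if len(slots) == len(keys):
            return salts, g1, g2
    raise RuntimeError("no perfect hash found within max_trials")
```

Because each trial succeeds only with the perfect-hash probability derived earlier, the expected number of retries grows quickly with n, which is why practical constructions assign table entries in a deliberate order instead.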
Another approach yields the first scheme with good average-case performance (about 11n bits) on large datasets. Its insight is to assign the values of the auxiliary tables g1, g2 in decreasing order of the number of dependencies. The same approach also describes a second scheme that uses quadratic hashing and adds branching based on a table of binary values. This second scheme achieves about 4n bits for datasets of size n ≈ 10^6.
Spatial Hashing
Hashing is commonly used for point and region queries in multidimensional databases. Spatial hashing is also used in graphics for efficient collision detection among moving or deforming objects. However, these techniques employ imperfect hashing (e.g., traditional multi-probe hash tables implemented on the CPU).
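For contrast with a perfect hash, the following is a minimal sketch of the kind of imperfect spatial hash used for collision detection: points are binned into grid cells, cell coordinates are hashed into a fixed-size table, and clashes between different cells are resolved by linear probing on the CPU. The class name, the multiplier constants, and the linear-probing policy are assumptions chosen for illustration and are not taken from any specific technique referenced above.

```python
import math
from typing import List, Optional, Tuple

Cell = Tuple[int, int, int]


class SpatialHashGrid:
    """Open-addressing hash table keyed by 3D grid cell (imperfect hash)."""

    def __init__(self, cell_size: float, capacity: int) -> None:
        self.cell_size = cell_size
        self.capacity = capacity
        # Each slot is None or a (cell, [object ids]) pair.
        self.slots: List[Optional[Tuple[Cell, List[str]]]] = [None] * capacity

    def _cell(self, p: Tuple[float, float, float]) -> Cell:
        x, y, z = (math.floor(c / self.cell_size) for c in p)
        return (x, y, z)

    def _hash(self, cell: Cell) -> int:
        x, y, z = cell
        # Large illustrative multipliers; distinct cells may still collide.
        return ((x * 73856093) ^ (y * 19349663) ^ (z * 83492791)) % self.capacity

    def _probe(self, cell: Cell) -> int:
        """Linear probing: return the slot holding cell, or the first free slot."""
        i = self._hash(cell)
        for _ in range(self.capacity):
            slot = self.slots[i]
            if slot is None or slot[0] == cell:
                return i
            i = (i + 1) % self.capacity
        raise RuntimeError("hash table is full")

    def insert(self, obj_id: str, p: Tuple[float, float, float]) -> None:
        cell = self._cell(p)
        i = self._probe(cell)
        if self.slots[i] is None:
            self.slots[i] = (cell, [])
        self.slots[i][1].append(obj_id)

    def query(self, p: Tuple[float, float, float]) -> List[str]:
        """Collect candidate ids from the 27 cells surrounding point p."""
        cx, cy, cz = self._cell(p)
        found: List[str] = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    slot = self.slots[self._probe((cx + dx, cy + dy, cz + dz))]
                    if slot is not None:
                        found.extend(slot[1])
        return found
```

Each lookup may take several probes and touches table locations unrelated to spatial position, which illustrates the lack of spatial coherence in this style of hashing.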
These techniques do not transition to multidimensional tables. Also, they strive to make intermediate hashes as random as possible. As such, there exists a need for a perfect multidimensional hash function that preserves spatial coherence and thus improves runtime locality of reference.