This specification relates to generating hash functions for use in nearest neighbor search.
Nearest neighbor search identifies a group of items that are most similar to a given item in a feature space. Each of the items is represented by a feature vector in the feature space. One method for performing nearest neighbor search uses binary hashing to map the features of each item to a Hamming code, i.e., a sequence of ones and zeros in a Hamming space, and then compares the resulting Hamming codes. Good binary hashing functions map items that are similar in the original space to similar Hamming codes and map items that are dissimilar in the original space to dissimilar Hamming codes.
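The comparison step can be illustrated with a minimal sketch, assuming each Hamming code is stored as a Python integer whose bits are the code bits. Similarity between two codes is measured by the Hamming distance, the number of bit positions at which they differ, and the nearest neighbors are the stored codes with the smallest distance to the query code. The function names `hamming_distance` and `nearest_neighbors` are hypothetical and used only for illustration.

```python
from typing import List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which codes a and b differ."""
    # XOR leaves a 1 exactly where the two codes disagree.
    return bin(a ^ b).count("1")


def nearest_neighbors(query: int, codes: List[int], k: int) -> List[Tuple[int, int]]:
    """Return (index, distance) pairs for the k stored codes closest to query.

    Hypothetical helper: a brute-force linear scan over all stored codes.
    Ties are broken by original index because list.sort() is stable.
    """
    scored = [(i, hamming_distance(query, c)) for i, c in enumerate(codes)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    codes = [0b1010, 0b1011, 0b0101, 0b1110]
    print(nearest_neighbors(0b1010, codes, k=3))
```

With the sample codes above, the query `0b1010` matches index 0 exactly, differs in one bit from indices 1 and 3, and differs in all four bits from index 2. A linear scan is used only for clarity; because XOR and bit counting are cheap, comparing short codes is fast even without an index.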
A K-bit Hamming code is generated for an item using a sequence of K hash functions, each of which specifies the value of one bit in the Hamming code. Various methods, for example, Locality Sensitive Hashing, Shift-Invariant Kernel-based Hashing, and Spectral Hashing, have previously been used to determine appropriate hash functions.
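As an illustration of how K one-bit hash functions produce a K-bit code, the sketch below uses the random-hyperplane variant of Locality Sensitive Hashing, chosen here as an assumption because it is simple; it is not presented as the method of this specification. Each hash function draws a random Gaussian vector w and outputs 1 if the dot product of w with the feature vector is non-negative, else 0. The names `make_hyperplane_hashes` and `encode` are hypothetical.

```python
import random
from typing import Callable, List, Sequence

# A one-bit hash function maps a feature vector to 0 or 1.
HashFn = Callable[[Sequence[float]], int]


def make_hyperplane_hashes(dim: int, K: int, seed: int = 0) -> List[HashFn]:
    """Build K hypothetical one-bit hash functions from random hyperplanes.

    Assumption: random-hyperplane LSH, where bit k is the sign of w_k . x
    and each w_k has i.i.d. standard Gaussian entries.
    """
    rng = random.Random(seed)
    fns: List[HashFn] = []
    for _ in range(K):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]

        # Bind this hyperplane's w as a default argument so each closure
        # keeps its own vector instead of the loop's last one.
        def h(x: Sequence[float], w: List[float] = w) -> int:
            return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

        fns.append(h)
    return fns


def encode(x: Sequence[float], hash_fns: List[HashFn]) -> int:
    """Concatenate the K hash bits into one integer, first function = top bit."""
    code = 0
    for h in hash_fns:
        code = (code << 1) | h(x)
    return code


if __name__ == "__main__":
    fns = make_hyperplane_hashes(dim=4, K=8, seed=1)
    print(format(encode([1.0, 2.0, 3.0, 4.0], fns), "08b"))
```

Because each bit depends only on the sign of a dot product, vectors pointing in similar directions tend to agree on most bits, so their codes have a small Hamming distance. Increasing K adds more hyperplanes and therefore finer discrimination, at the cost of longer codes.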
However, these methods do not always generate good short Hamming codes: when K is small, the resulting Hamming codes often discriminate poorly between similar and dissimilar items. While longer codes discriminate better than short codes, longer codes also require more storage and more computation time than shorter codes.