Data may be provided by numerous types of data sources. Examples of data sources include cameras, microphones, cellular telephones, radios and electronic documents. Signals representing the data output from a data source or, in the case of an electronic document, the data in the document itself may be analyzed because the signals may contain a feature that is of interest to a user. For example, an image may contain an object that is to be recognized, or an audio signal or electronic document may contain a particular phrase or word.
One technique for analyzing data is vector quantization. Vector quantization divides a data set into segments that may be represented as vectors. The vectors may be grouped into a plurality of groups using a nearest neighbor search algorithm such as K-means, distance mapping or clustering. The output of the nearest neighbor search algorithm may be, for each of the groups of vectors, a vector that represents a center, or centroid, of that group. There may be hundreds to thousands of centroids that are representative of data in an input data set. By representing the data set with centroid vectors, a complex data set or a large volume of data may be represented by a smaller data set. The centroid vectors may be used as codewords in a codebook for the particular groups of vectors. There may be a number of iterations of the nearest neighbor search through the data vectors to determine an optimal centroid vector that will be used as the final codeword. This iterative process may be called training of the codebook. The codewords may be of different lengths. Vector quantization can be used, for example, in signal analysis, image analysis, data compression and other data processing operations.
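The codebook training described above may be illustrated by the following minimal sketch of a K-means style iteration. The function names (squared_distance, nearest_index, train_codebook), the fixed iteration count, the use of squared Euclidean distance, and the evenly spaced choice of initial centroids are assumptions made for illustration only and are not drawn from any particular implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vector, centroids):
    """Index of the centroid closest to `vector`."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(vector, centroids[i]))


def train_codebook(vectors, k, iterations=20):
    """Return k centroid vectors (codewords) for the given data vectors."""
    if not 0 < k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")

    # Assumed initialization: k evenly spaced data vectors.
    centroids = [list(vectors[i * len(vectors) // k]) for i in range(k)]

    for _ in range(iterations):
        # Assignment step: group each vector with its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest_index(v, centroids)].append(v)

        # Update step: move each centroid to the mean of its group.
        # An empty group keeps its previous centroid.
        for i, group in enumerate(groups):
            if group:
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]

    return centroids
```

For example, calling train_codebook on six two-dimensional vectors that form two well-separated groups, with k set to 2, may yield one codeword near the center of each group, so that six vectors are represented by two.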
Data analysis using vector quantization may be performed by comparing data of interest to each of the codewords in the codebook. This comparison can consume both time and resources. As a result, the comparisons may need the more powerful processors available only on servers or some desktop computers.
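The cost of comparing data of interest to each codeword may be seen in the following minimal sketch of an exhaustive search. The function name quantize and the use of squared Euclidean distance are assumptions made for illustration. Each query visits every codeword once, so the work per query grows with both the number of codewords and the vector length.

```python
def quantize(vector, codebook):
    """Return (index, squared distance) of the codeword closest to `vector`.

    Performs a linear scan: one distance computation per codeword.
    """
    best_index, best_dist = -1, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((x - c) ** 2 for x, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index, best_dist
```

With a codebook of several thousand codewords and many data vectors to analyze, the number of distance computations is the product of the two counts, which is consistent with the time and resource demands noted above.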
There are many computer operations, such as K-means, clustering, hash functions, and the like, that may be performed to narrow data sets. An exemplary hash function H is a transformation that takes a variable-size input m and returns a fixed-size string, which is called the hash value h (that is, h=H(m)). Hash values may be stored in a hash table that is indexed according to the respective hash values.
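The relationship h=H(m) and the hash table indexed by hash value may be illustrated by the following minimal sketch. The choice of SHA-256 from the Python standard library as the hash function H, the helper names insert and contains, and the use of a dictionary as the hash table are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict


def H(m: bytes) -> str:
    """Map a variable-size input m to a fixed-size hash value h."""
    # SHA-256 is an assumed choice; its hex digest is always 64 characters.
    return hashlib.sha256(m).hexdigest()


def insert(table, m: bytes) -> str:
    """Store m in the table under its hash value and return that value."""
    h = H(m)
    if m not in table[h]:
        table[h].append(m)
    return h


def contains(table, m: bytes) -> bool:
    """Check for m by inspecting only the entry indexed by H(m)."""
    return m in table.get(H(m), [])


# Hash table indexed by hash value; each entry lists the inputs with that value.
table = defaultdict(list)
insert(table, b"a short phrase")
insert(table, b"a much longer input " * 100)
```

In this sketch, inputs of different sizes produce hash values of the same length, and a lookup examines only the table entry indexed by the hash value of the query rather than every stored input.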