Feature quantities are conventionally used in many technological fields, such as image retrieval, speech recognition, text retrieval, and pattern recognition. A feature quantity is generated by converting information such as an image, speech, or text into a form that can easily be processed on a computer, and it is represented as a D-dimensional vector (feature vector).
Operations on feature vectors can, for example, determine the degree of similarity between contents. Suppose that the feature vector for image α is a short distance from the feature vector for image β. In this case, images α and β can be assumed to be similar to each other. Similarly, if the feature vector for speech waveform α is a short distance from the feature vector for speech waveform β, speech waveforms α and β can be assumed to be similar to each other. Information processing such as speech recognition, text retrieval, or pattern recognition thus converts information into feature vectors, compares the feature vectors, and uses the distance between them to determine the degree of similarity of the information.
The distance between feature vectors is measured using scales such as the L1 norm, the L2 norm, and the intervector angle. For feature vectors x, y ∈ ℝᴰ, these scales are calculated as follows.
L1 Norm
[Math. 1]

$$\|x - y\|_1 = \sum_i |x_i - y_i|$$
L2 Norm
[Math. 2]

$$\|x - y\|_2 = \sqrt{\sum_i (x_i - y_i)^2}$$
Intervector Angle
[Math. 3]

$$\theta = \cos^{-1}\left(\frac{x^{\mathsf T} y}{\|x\|_2 \, \|y\|_2}\right)$$
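The three scales can be illustrated with a minimal Python sketch that uses only the standard library. The function names (l1_distance, l2_distance, intervector_angle) are hypothetical, and plain lists stand in for feature vectors.

```python
import math

def l1_distance(x, y):
    """L1 norm of x - y: the sum of absolute coordinate differences."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def l2_distance(x, y):
    """L2 norm of x - y: the square root of the sum of squared differences."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def intervector_angle(x, y):
    """Angle theta = arccos(x^T y / (||x||_2 ||y||_2)), in radians."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_y = math.sqrt(sum(yi * yi for yi in y))
    # Clamp the cosine to [-1, 1] so rounding error cannot leave acos's domain.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)
```

For x = (1, 2, 3) and y = (4, 6, 3), the L1 distance is 7 and the L2 distance is 5; two orthogonal vectors have an intervector angle of π/2.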
The following problems arise when the feature vector is a real-valued vector. First, calculating the distance between two feature vectors x, y ∈ ℝᴰ is slow. For example, the square of the L2 norm may be used as the distance scale, as follows.
[Math. 4]

$$\|x - y\|_2^2 = \sum_{i=1}^{D} (x_i - y_i)^2$$
This calculation therefore requires D subtractions, D multiplications, and D−1 additions. The calculation load is very high when the feature vector is represented in floating-point format, and it grows further as the feature vector dimension increases.
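The operation count can be checked with a small sketch that evaluates Math. 4 while tallying each arithmetic operation. The function name squared_l2_with_counts and its returned dictionary are hypothetical illustrations, not part of the described method.

```python
def squared_l2_with_counts(x, y):
    """Return ||x - y||_2^2 together with the number of each arithmetic operation used."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal dimension D")
    ops = {"sub": 0, "mul": 0, "add": 0}
    total = 0.0
    for i, (xi, yi) in enumerate(zip(x, y)):
        diff = xi - yi          # one subtraction per dimension
        ops["sub"] += 1
        square = diff * diff    # one multiplication per dimension
        ops["mul"] += 1
        if i == 0:
            total = square      # the first term needs no addition
        else:
            total += square     # D - 1 additions in total
            ops["add"] += 1
    return total, ops
```

For D = 3, the function performs 3 subtractions, 3 multiplications, and 2 additions, all in floating point when the inputs are real-valued.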
Second, a large amount of memory is consumed. If each element of a feature vector is represented as a 4-byte single-precision real number, a D-dimensional feature vector consumes 4D bytes of memory. Memory consumption grows with the feature vector dimension, and it also grows in proportion to the number of feature vectors to be processed.
To solve these two problems, techniques have recently been proposed that convert a feature vector into a binary code, i.e., a sequence of 0s and 1s. Typical techniques include random projection (see non-patent literature 1), very sparse random projection (see non-patent literature 2), and Spectral Hashing (see non-patent literature 3).
These techniques convert a D-dimensional feature vector into a d-bit binary code so that distances in the original space strongly correlate with Hamming distances in the converted space (Lemma 3.2 on page 1121 of non-patent literature 1 explains why this correlation holds). The Hamming distance between binary codes can therefore replace the distance between feature vectors.
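As an illustration, the following is a minimal sketch of sign-based random projection (random hyperplanes), assuming Gaussian projection directions; the exact procedures in non-patent literature 1 to 3 may differ. The names make_projection and to_binary_code are hypothetical, and the d-bit code is stored as a Python int.

```python
import random

def make_projection(D, d, seed=0):
    """Draw d random Gaussian direction vectors in R^D (an assumed projection scheme)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(d)]

def to_binary_code(x, projection):
    """Map feature vector x to a d-bit code: bit j is 1 iff r_j^T x >= 0."""
    code = 0
    for j, r in enumerate(projection):
        if sum(ri * xi for ri, xi in zip(r, x)) >= 0.0:
            code |= 1 << j
    return code
```

Each bit records which side of a random hyperplane the vector lies on. For Gaussian directions, the probability that a given bit differs between the codes of x and y is θ/π, where θ is their intervector angle, so the Hamming distance tracks the angle. Scaling x by a positive factor leaves its code unchanged.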
The Hamming distance between two binary codes is the number of bit positions in which they differ. It is very fast to calculate: the two codes are XORed, and the bits set to 1 in the result are counted. In many cases, converting to binary codes speeds up the calculation by a factor of several tens to several hundreds. Because only the binary codes need to be stored instead of the feature vectors, the required memory also falls from 4D bytes to d/8 bytes per vector, a reduction to between one tenth and one hundredth of the original.
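The XOR-and-count procedure and the memory comparison can be sketched as follows, assuming each binary code is held in a Python int. The helper names hamming_distance and memory_bytes are hypothetical.

```python
def hamming_distance(a, b):
    """Count the bit positions in which binary codes a and b (stored as ints) differ."""
    return bin(a ^ b).count("1")  # XOR the codes, then count the bits set to 1

def memory_bytes(D, d):
    """Bytes for one D-dimensional single-precision feature vector vs. one d-bit code."""
    return 4 * D, d // 8
```

For example, codes 01101110 and 01100011 differ in 3 bits. With hypothetical sizes D = 128 and d = 128, memory_bytes returns (512, 16), a 32-fold reduction. On Python 3.10 and later, (a ^ b).bit_count() is an equivalent way to count the set bits.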
Extracted feature quantities can be converted into binary codes, and various algorithms can then be applied to the binary codes to retrieve or recognize contents. To retrieve similar contents, for example, the feature quantities of all contents registered in a database are converted into binary codes in advance. The feature quantity of the content supplied as an input query is also converted into a binary code. Contents similar to the input query can then be retrieved and output by calculating the Hamming distances between the binary code of the input query and all binary codes registered in the database.
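The retrieval flow can be sketched as an exhaustive Hamming-distance scan over the registered codes. This is a minimal illustration, assuming the database codes are already computed and stored as ints; the function retrieve and its top_k parameter are hypothetical.

```python
def hamming_distance(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")

def retrieve(query_code, database_codes, top_k=3):
    """Return (index, distance) pairs for the top_k codes closest to the query.

    Every registered code is compared with the query; ties are broken by index.
    """
    scored = [(hamming_distance(query_code, c), i) for i, c in enumerate(database_codes)]
    scored.sort()
    return [(i, dist) for dist, i in scored[:top_k]]
```

With database codes 1111, 0000, 0111, and 1000 and the query 0110, the nearest entry is 0111 (index 2) at distance 1, followed by indices 0 and 1 at distance 2.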
A binary code is a sequence of d bits, each 0 or 1. It can also be regarded as a d-dimensional vector whose elements each take only the two values −1 and 1. To avoid confusion in the following description, the terms “binary code” and “binary vector” are distinguished as follows. A “binary code” is data represented as a sequence of 0s and 1s. For example, storing a 128-bit binary code in memory in C requires only an array of 16 elements of an unsigned integer type such as unsigned char (8 bits × 16 = 128 bits).
By contrast, a “binary vector” is a vector whose elements each take only two values. For example, if the elements of a binary vector take only −1 and 1, the binary vector corresponding to binary code “01101110” is (−1,1,1,−1,1,1,1,−1)ᵀ. The elements of a binary vector may instead take only the values 0 and 1, or, more generally, any two values α and β (α≠β). The difference between a “binary code” and a “binary vector” lies only in how the information is expressed; there is no essential difference between the two.
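The correspondence between the two representations can be sketched as follows. The function names are hypothetical; pack_code mirrors the 16-element byte array from the C example by packing 8 bits per byte, most significant bit first (an assumed bit order).

```python
def code_to_vector(code, alpha=-1, beta=1):
    """Map a binary code string to a binary vector: '0' -> alpha, '1' -> beta."""
    return [beta if ch == "1" else alpha for ch in code]

def vector_to_code(vec, beta=1):
    """Inverse mapping: elements equal to beta become '1', all others '0'."""
    return "".join("1" if v == beta else "0" for v in vec)

def pack_code(code):
    """Pack a code string into bytes, 8 bits per byte, most significant bit first."""
    if len(code) % 8:
        raise ValueError("code length must be a multiple of 8")
    return int(code, 2).to_bytes(len(code) // 8, "big")
```

code_to_vector("01101110") yields [-1, 1, 1, -1, 1, 1, 1, -1], and a 128-bit code packs into 16 bytes.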
A feature vector can be converted into a d-dimensional binary vector whose elements each take only the values −1 and 1, and various processes, such as recognition using an SVM (support vector machine) and k-means clustering, can then be applied to it. In these cases, however, the high-speed distance calculation based on the Hamming distance may not be usable. In other words, some algorithms forfeit the speed advantage gained from binary code conversion.
The following describes classifier-based recognition and the k-means clustering algorithm as examples that cannot benefit from the high-speed distance calculation enabled by binary code conversion. As an example of classifier-based recognition, consider applying a linear SVM (linear support vector machine) to classify a binary vector x ∈ {−1,1}ᵈ into one of two classes. The linear SVM evaluates the following equation.

$$f(x) = w^{\mathsf T} x + b$$
If f(x) is positive, the process assumes that x belongs to class A. If f(x) is negative, the process assumes that x belongs to class B.
Here, w ∈ ℝᵈ denotes the weight parameter and b ∈ ℝ denotes the bias parameter. A learning process automatically determines w and b from feature quantities prepared in advance for learning.
Even if the feature quantities provided for learning are binary vectors, w ∈ ℝᵈ is a real-valued vector rather than a binary one. Calculating f(x) involves wᵀx, which requires floating-point calculation because x is binary but w is real-valued. Classifier-based recognition using an SVM therefore cannot benefit from the high-speed calculation enabled by converting a feature vector into a binary vector.
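The following minimal sketch evaluates f(x) = wᵀx + b for a binary vector x ∈ {−1,1}ᵈ. The weights in the usage example are hypothetical values rather than learned parameters. The sketch also assumes that f(x) = 0 is assigned to class B, because the text specifies only the positive and negative cases.

```python
def svm_decision(w, b, x):
    """Evaluate f(x) = w^T x + b for a binary vector x in {-1, 1}^d and real w, b."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimension d")
    # Each x_i is -1 or 1, but each w_i is real, so this is still a floating-point sum.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Class A if f(x) > 0, otherwise class B (f(x) == 0 is assumed to go to B)."""
    return "A" if svm_decision(w, b, x) > 0 else "B"
```

With w = (0.5, −1.25, 2.0) and b = 0.25, x = (1, −1, 1) gives f(x) = 4.0 (class A). Even though x is binary, the d real-valued products and sums cannot be replaced by an XOR and a bit count.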
The following describes applying the k-means clustering algorithm to binary vectors, that is, given N d-dimensional binary vectors, finding k clusters that each group binary vectors close to one another. The k-means clustering algorithm calculates the k clusters and their representative vectors by the following procedure.
Step 1: Randomly select k feature quantities from the N feature quantities and use them as the initial representative vectors of the k clusters.
Step 2: Find the nearest representative vector for each of the N feature quantities given as input.
Step 3: For each representative vector, calculate the average of the feature quantities assigned to it and use the result as the new representative vector.
Step 4: Repeat steps 2 and 3 until the results converge.
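Steps 1 to 4 can be sketched as a plain k-means loop over lists. The function names are hypothetical. The sketch treats "converged" as "the step 2 assignment no longer changes" and keeps the previous representative vector when a cluster becomes empty; both are assumptions, since the text does not specify these details.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def kmeans(vectors, k, max_iter=100, seed=0):
    """Plain k-means following steps 1-4; returns (representative vectors, assignment)."""
    rng = random.Random(seed)
    # Step 1: randomly pick k inputs as the initial representative vectors.
    centers = [list(v) for v in rng.sample(vectors, k)]
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each input to its nearest representative vector.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(v, centers[j])) for v in vectors
        ]
        # Step 4: stop once the assignment no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: replace each representative vector by the average of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:  # assumed: an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment
```

Running this on the binary vectors (1,1,1,1), (1,1,1,−1), (−1,−1,−1,−1), and (−1,−1,−1,1) with k = 2 produces representative vectors containing the value 0.0, which is neither −1 nor 1.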
The problem with the k-means clustering algorithm is that step 3 defines each new representative vector as an average of binary vectors. Averaging turns the representative vector into a real-valued vector even if the input data are binary vectors. Consequently, step 2 must calculate the distance between a binary vector and a real vector, which requires floating-point calculation. The k-means clustering algorithm therefore also cannot benefit from the high-speed calculation enabled by converting a feature vector into a binary vector.
As described above, neither classifier-based recognition nor the k-means clustering algorithm can benefit from the high-speed calculation enabled by converting a feature vector into a binary vector. This is because both require an inner product between a d-dimensional binary vector p ∈ {−1,1}ᵈ and a d-dimensional real vector q ∈ ℝᵈ. The k-means clustering algorithm requires a “distance” between p and q, and this ultimately reduces to the inner product pᵀq, because the square of the Euclidean distance between p and q is expressed as follows.

[Math. 5]

$$\|p - q\|_2^2 = \|p\|_2^2 - 2p^{\mathsf T}q + \|q\|_2^2$$

Because every element of p is −1 or 1, ∥p∥₂² = d is constant, and ∥q∥₂² depends only on the representative vector, so it can be computed once per representative vector. Only pᵀq must be calculated for each pair.
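The decomposition in Math. 5 can be checked numerically with a short sketch. The function names are hypothetical, and the code assumes, without checking, that p has elements in {−1, 1}.

```python
def inner(p, q):
    """Inner product p^T q."""
    return sum(pi * qi for pi, qi in zip(p, q))

def squared_distance_via_inner(p, q):
    """||p - q||^2 as ||p||^2 - 2 p^T q + ||q||^2, using ||p||^2 = d for p in {-1, 1}^d."""
    d = len(p)
    return d - 2.0 * inner(p, q) + inner(q, q)

def nearest_center(p, centers):
    """Index of the center q minimizing ||p - q||^2.

    Since ||p||^2 = d is the same for every center, it is dropped; the score
    depends on p only through the inner product p^T q.
    """
    scores = [inner(q, q) - 2.0 * inner(p, q) for q in centers]
    return min(range(len(centers)), key=scores.__getitem__)
```

For p = (1, −1, 1, 1) and q = (0.5, −0.25, 1.5, 0), both the direct formula and the decomposition give 2.0625. The per-pair work is the real-valued inner product pᵀq.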
To solve this problem, classifier-based recognition and the k-means clustering algorithm require a faster way to calculate the inner product between a d-dimensional binary vector and a d-dimensional real vector.