As a technique for searching very large data sets, such as the millions or billions of data items published on Web sites, the transformation of data features into short binary templates, generally termed “binary hashing”, has been developed actively. Data to be retrieved is transformed into fixed-length binary data (binary templates), and a bit logic operation, such as an exclusive OR (XOR), is used to compute a distance between two binary templates. Bit logic operations such as XOR are fast. Thus, if the length of the binary templates can be made sufficiently short, high-speed retrieval can be accomplished even in a large-scale database, with the data held in the physical memory of a computer. However, if the distance between two binary templates is computed by counting the number of flipped (inverted) bits in the result of an XOR operation, the computed distance may deviate significantly from the distance between the original data.
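As a non-limiting illustration, the distance computation described above, an XOR followed by a count of the set bits, might be sketched as follows. The function name and the two 8-bit templates are hypothetical and are chosen only for illustration.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two equal-length binary templates differ."""
    # XOR leaves a 1 exactly where the two templates disagree.
    return bin(a ^ b).count("1")


# Two hypothetical 8-bit binary templates.
t1 = 0b10110100
t2 = 0b10011100
distance = hamming_distance(t1, t2)  # the XOR is 0b00101000, so the distance is 2
```

Because the operation works on whole machine words, the cost per comparison stays small even when the database holds a very large number of templates.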
Binary hashing maps a data set composed of n items of data, represented by points in a D-dimensional space, where D is a predetermined positive integer,
$$X=\{\vec{x}_1,\ldots,\vec{x}_n\}\in R^{D\times n}\qquad(1)$$
to a Hamming space of binary codes in which near or nearest neighbors in the original space remain near or nearest neighbors. That is, the data set is transformed into n K-bit-long binary codes (binary data),
$$Y=\{\vec{y}_1,\ldots,\vec{y}_n\}\in B^{K\times n}\qquad(2)$$
K being a preset positive integer, such that the near or nearest neighbor relation under the Euclidean distance in the original data set space R^{D×n} is preserved. In the above expressions, an arrow over a symbol (written as a leading “→” in the running text) denotes a vector. →xi (i=1, . . . , n) denotes a D-dimensional vector and →yi (i=1, . . . , n) denotes a K-dimensional vector. Note that, in the equation (1), R denotes the set of all real numbers, and that, in the equation (2), B denotes the set of binary values.
To produce a K-bit binary code, K hash functions are used. Each hash function receives a D-dimensional vector and returns a binary value, for example −1 or 1.
There is a large variety of hash functions; hashing based on linear projection is assumed here. The k'th (k=1, . . . , K) hash function hk(→x) is defined by the following equation (3):
$$h_k(\vec{x})=\operatorname{sgn}\left(f\left(\vec{w}_k^{T}\vec{x}+b_k\right)\right)\qquad(3)$$
In the above equation, sgn( ) is a sign function that returns the sign of its argument: in the equation (3), it returns −1 if the value of f( ) is negative and +1 if it is positive. f( ) is a transform function, →wk is a projection vector, T denotes a transpose, →x is a data point and bk is a threshold value (offset).
Since hk(→x)∈{1,−1}, the k'th binary hash bit is given by the following expression (4):
$$\frac{1+h_k(\vec{x})}{2}\qquad(4)$$
That is, the k'th bit of the binary code (k=1, . . . , K) is 1 when the k'th hash function hk(→x) is +1, and 0 when it is −1.
As a binary hashing technique, there is a family of techniques termed Locality Sensitive Hashing (abbreviated as “LSH”); see Non-Patent Literature 1 and so forth.
In LSH, an identity function is used as the transform function f( ) of the above equation (3), →w is randomly selected from a p-stable distribution, and b is randomly selected from a uniform distribution. LSH does not rely on parameter selection or learning data, so that only a short time is needed for parameter determination.
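The data-independent parameter selection described above might be sketched as follows. This is an illustration under stated assumptions rather than the method of Non-Patent Literature 1: the Gaussian distribution is used as one example of a p-stable distribution (it is 2-stable), the range of the uniform distribution for the offset is a hypothetical parameter, and the function names are invented for this sketch.

```python
import random
from typing import List, Sequence, Tuple


def draw_lsh_parameters(D: int, K: int, seed: int = 0,
                        b_range: float = 1.0) -> Tuple[List[List[float]], List[float]]:
    """Draw K projection vectors and K offsets without looking at any data."""
    rng = random.Random(seed)
    # Gaussian entries: the normal distribution is 2-stable (assumed choice of p).
    W = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(K)]
    # Offsets drawn uniformly from [-b_range, b_range] (the range is an assumption).
    B = [rng.uniform(-b_range, b_range) for _ in range(K)]
    return W, B


def lsh_code(x: Sequence[float], W: Sequence[Sequence[float]],
             B: Sequence[float]) -> List[int]:
    """Bits obtained with the identity transform: 1 if w^T x + b >= 0, else 0."""
    return [1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
            for w, b in zip(W, B)]


W, B = draw_lsh_parameters(D=4, K=16, seed=42)
bits = lsh_code([0.3, -1.2, 0.8, 0.1], W, B)
```

No learning step appears in the sketch, which reflects why the parameter determination takes only a short time.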
It has been proved that, in LSH, the degree of approximation of the neighbor relation can be improved by increasing the projected bit length K to 128, 512 and so on. That is, the Hamming distance approximates the Euclidean distance more closely. It has been pointed out, however, that the approximation in LSH is not good when the bit length K is not large, so that sufficient accuracy may not be achieved.
Non-Patent Literature 2 discloses a technique in which, as in LSH, selection of the parameters of the hash function does not rely on learning data. In this technique, →w is selected in a similar manner as in LSH, and a trigonometric function is used as the transform function f( ). This is said to improve the accuracy of the approximation when the bit length K is not large.
Recently, techniques in which selection of the parameters of the hash function relies on learning data have been developed. The spectral hashing, disclosed in Non-Patent Literature 3, uses a trigonometric function as the transform function f( ). In the spectral hashing, the learning data {→xi} are first translated so that their centroid coincides with the origin; the offset b is then set to 0, and a principal axis obtained by principal component analysis (PCA) of the learning data is selected as the projection vector →wk. That is, the spectral hashing algorithm may be defined as follows:
After translating the data so that the average value thereof is equal to zero, principal components of the data are found using the PCA.
For each of the PCA directions, the k smallest one-dimensional analytical eigenfunctions of Lp (e.g., the one-dimensional Laplacian), that is, solutions of Lpφ=λφ, are calculated with the use of a rectangular approximation. The k smallest eigenvalues for each of the d directions form a list of d×k eigenvalues, from which the k smallest eigenvalues are selected.
A binary code is obtained by thresholding, at 0, the output of the analytical eigenfunctions for each input data item.
In the Non-Patent Literature 3, the eigenfunctions Φk and eigenvalue λk of the one-dimensional Laplacian are given as below:
$$\Phi_k=\sin\left(\frac{\pi}{2}+\frac{k\pi}{b-a}x\right)\quad\text{and}\quad\lambda_k=1-\exp\left(-\frac{\varepsilon^{2}}{2}\left|\frac{k\pi}{b-a}\right|^{2}\right)$$
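For illustration only, the eigenfunction and the eigenvalue given above might be evaluated as in the following sketch. It is assumed here that a and b denote the end points of the interval over which the data are distributed along one principal direction, and that ε is a preset positive parameter; the function names and sample values are hypothetical.

```python
import math


def eigenfunction(x: float, k: int, a: float, b: float) -> float:
    """Phi_k(x) = sin(pi/2 + k*pi/(b - a) * x) for the one-dimensional Laplacian."""
    return math.sin(math.pi / 2 + k * math.pi / (b - a) * x)


def eigenvalue(k: int, a: float, b: float, eps: float) -> float:
    """lambda_k = 1 - exp(-(eps^2 / 2) * |k*pi/(b - a)|^2)."""
    return 1.0 - math.exp(-(eps ** 2) / 2 * abs(k * math.pi / (b - a)) ** 2)


def spectral_bit(x: float, k: int, a: float, b: float) -> int:
    """Threshold the eigenfunction output at 0 (a zero output is mapped to 1 here)."""
    return 1 if eigenfunction(x, k, a, b) >= 0 else 0


# Hypothetical interval [0, 1] and eps = 0.5: eigenvalues grow with the mode number k.
values = [eigenvalue(k, 0.0, 1.0, 0.5) for k in range(4)]
```

Since the eigenvalue increases with k/(b−a), modes of low frequency along directions with a wide data range are the ones that appear among the smallest eigenvalues.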
Whereas in LSH the projection vector →w is randomly generated, in the spectral hashing it is found by principal component analysis (PCA) of the data. For this reason, the spectral hashing is said to be higher in accuracy than LSH. However, the spectral hashing requires principal component analysis. If singular value decomposition, which is numerically stable, is used, the computation amount of the spectral hashing is on the order of O(N^2) to O(N^3), where N is the number of dimensions of the matrix (number of dimensions of the features). Note that O(N^2) to O(N^3) indicates that the computation amount of the algorithm is proportional to between the square and the cube of the size N.
It is known in general that a pattern as a subject for recognition forms a relatively compact and complex manifold in a feature space. It has been pointed out that, in such a case, a pattern distribution tends to be concentrated in a subspace spanned by a smaller number of principal component vectors, so that sufficient accuracy may not be achieved.
In an algorithm (Unsupervised Sequential Projection Learning for Hashing, abbreviated as USPLH) disclosed in Non-Patent Literature 4 and intended to resolve this problem, f( ) is an identity function, and the learning data are translated so that their centroid coincides with the origin. An eigenvector is then found, and the data are projected onto it and thresholded at 0. A point r+ and a point r− which are close to 0 (see FIG. 1) are assigned different hash values, even though these data points are close to each other. Learning is performed so that a point r+ and a point R+, which are of the same sign and which are respectively closer to and farther from 0, are assigned the same hash value, and so that a point r− and a point R−, which are of the same sign and which are respectively closer to and farther from 0, are assigned the same hash value (see FIG. 1). In USPLH, the parameter →wk is learned in accordance with the following algorithm (see Algorithm 2 of the Non-Patent Literature 4).
1. Learning data X and a binary code length (hashing code length) K are entered.
2. Initialize so that $X_{MC}^{0}=\emptyset$, $S_{MC}^{0}=0$.
3. Repeat the following steps 4 to 7 from k=1 to k=K.
4. Compute a corrected covariance matrix:
$$M_k=\sum_{i=0}^{k-1}\lambda^{k-i}X_{MC}^{i}S_{MC}^{i}{X_{MC}^{i}}^{T}+\eta XX^{T}$$
5. Extract a first principal component vector (eigenvector) →e of Mk and set it to →wk:
$$\vec{w}_k=\vec{e}$$
6. Produce pseudo labels from the projection →wk: sample XkMC and construct SkMC.
7. Compute a residual:
$$X=X-\vec{w}_k\vec{w}_k^{T}X$$
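Steps 5 and 7 of the above algorithm might be sketched as follows. This is a deliberately reduced illustration and not the algorithm of Non-Patent Literature 4: the pseudo-label term of step 4 is omitted, so that Mk reduces to XX^T (with η assumed to be 1), and the eigenvector is approximated by power iteration, which is a technique substituted here for simplicity. All function names and the sample data are hypothetical.

```python
import math
from typing import List


def scatter_matrix(points: List[List[float]]) -> List[List[float]]:
    """Return X X^T for the D x n matrix X whose columns are the given points."""
    D = len(points[0])
    return [[sum(p[a] * p[b] for p in points) for b in range(D)] for a in range(D)]


def first_eigenvector(M: List[List[float]], iterations: int = 200) -> List[float]:
    """Approximate the dominant eigenvector of a symmetric matrix by power iteration."""
    D = len(M)
    v = [1.0 / math.sqrt(D)] * D  # fixed start vector (an assumption of this sketch)
    for _ in range(iterations):
        u = [sum(M[a][b] * v[b] for b in range(D)) for a in range(D)]
        norm = math.sqrt(sum(c * c for c in u))
        if norm == 0.0:
            break  # nothing left to project on; keep the current vector
        v = [c / norm for c in u]
    return v


def remove_projection(points: List[List[float]], w: List[float]) -> List[List[float]]:
    """Step 7: X = X - w w^T X, applied to each data point in turn."""
    out = []
    for p in points:
        t = sum(wi * pi for wi, pi in zip(w, p))
        out.append([pi - t * wi for wi, pi in zip(w, p)])
    return out


def sequential_projections(points: List[List[float]], K: int) -> List[List[float]]:
    """Repeat steps 5 and 7 K times (pseudo labels of steps 4 and 6 are omitted)."""
    directions = []
    for _ in range(K):
        w = first_eigenvector(scatter_matrix(points))
        directions.append(w)
        points = remove_projection(points, w)
    return directions


# Hypothetical zero-centered data with most of its spread along the first axis.
data = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
w1, w2 = sequential_projections(data, 2)
```

With the pseudo-label term removed, the loop simply returns successive principal directions; in the algorithm described above, that term is what shifts each new direction so as to correct the errors of the preceding hash functions.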
The following describes a case wherein data points are projected on a one-dimensional axis. It is assumed that, with respect to the boundary $\vec{w}_k^{T}\vec{x}=0$ (the boundary of division on the one-dimensional axis), a point lying on the left side of the boundary has hk(→x)=−1 and a point on the right side of the boundary has hk(→x)=+1. Two points (→xi, →xj), with →xi∈r− and →xj∈r+, lying in the left and right regions close to the boundary, with the boundary in-between, are assigned different hash bits even though their projections on the one-dimensional axis are extremely close to each other. That is, although the distance between the projections of →xi and →xj on the projection vector, $|\vec{w}_k^{T}(\vec{x}_i-\vec{x}_j)|$, is not greater than ε, which is a preset positive number, the hash value h(→xi)=−1 and the hash value h(→xj)=1. Note that FIG. 1 is equivalent to FIG. 2 of the Non-Patent Literature 4.
On the other hand, two points (→xi, →xj), with →xi∈r− and →xj∈R−, or with →xi∈r+ and →xj∈R+, where R− and R+ are the left and right regions far from the boundary, are assigned the same hash bit even though their projections are far from each other, that is, $|\vec{w}_k^{T}(\vec{x}_i-\vec{x}_j)|\geq\xi$, where ξ is a preset positive number. That is, the product of the hash values h(→xi) and h(→xj) is equal to 1.
To correct such boundary errors, USPLH introduces a neighbor pair set M and a non-neighbor pair set C. A data point pair (→xi, →xj) included in the set M consists of data points within r− and r+, which should be assigned the same hash bit. A data point pair (→xi, →xj) included in the set C consists of data points within R− and r−, or within R+ and r+, which should be assigned different hash bits. The neighbor pair set M and the non-neighbor pair set C are defined as follows:
$$M=\{(\vec{x}_i,\vec{x}_j)\}:\ h(\vec{x}_i)\cdot h(\vec{x}_j)=-1,\ |\vec{w}_k^{T}(\vec{x}_i-\vec{x}_j)|\leq\varepsilon$$
$$C=\{(\vec{x}_i,\vec{x}_j)\}:\ h(\vec{x}_i)\cdot h(\vec{x}_j)=1,\ |\vec{w}_k^{T}(\vec{x}_i-\vec{x}_j)|\geq\xi$$
where ε<ξ.
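As a hedged sketch, the two pair sets might be collected from the one-dimensional projections as follows. A neighbor pair is taken to be a pair on opposite sides of the boundary whose projections are at most ε apart, and a non-neighbor pair is taken to be a pair on the same side whose projections are at least ξ apart, consistent with the description of the regions r−, r+, R− and R+ above. The function name, the treatment of a zero projection as positive, and the sample values are assumptions.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def boundary_pairs(projections: Sequence[float], eps: float,
                   xi: float) -> Tuple[List[Pair], List[Pair]]:
    """Return (M, C) as lists of index pairs, given the projections w_k^T x_i."""
    M: List[Pair] = []
    C: List[Pair] = []
    for i, j in combinations(range(len(projections)), 2):
        pi, pj = projections[i], projections[j]
        same_side = (pi >= 0) == (pj >= 0)  # equal hash values, h_i * h_j = 1
        gap = abs(pi - pj)
        if not same_side and gap <= eps:
            M.append((i, j))  # split by the boundary although very close
        elif same_side and gap >= xi:
            C.append((i, j))  # same hash value although far apart
    return M, C


# Hypothetical projections: two points hugging the boundary, two far from it.
proj = [-0.05, 0.04, 2.0, -1.8]
M, C = boundary_pairs(proj, eps=0.1, xi=1.5)
# M == [(0, 1)] and C == [(0, 3), (1, 2)]
```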
A preset number of point pairs are sampled from each of the neighbor pair set M and the non-neighbor pair set C. XMC contains all points that belong to at least one sampled point pair. Using the labeled pairs and XMC (m sampled points), a pairwise label matrix SMC is found:
$$S_{MC}\in R^{m\times m},\qquad S_{i,j}=\begin{cases}1 & ((\vec{x}_i,\vec{x}_j)\in M)\\ -1 & ((\vec{x}_i,\vec{x}_j)\in C)\\ 0 & \text{otherwise}\end{cases}$$
That is, the corresponding element of SkMC is set to 1 for a point pair (→xi, →xj)∈M and to −1 for a point pair (→xi, →xj)∈C.
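The construction of the pairwise label matrix might be illustrated as follows. The sketch assumes that the pairs have already been sampled, that points are identified by integer indices, and that the matrix is filled symmetrically; these choices, together with the function name and the sample pairs, are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def label_matrix(M: Sequence[Pair], C: Sequence[Pair]) -> Tuple[List[int], List[List[int]]]:
    """Return the m points appearing in any pair and the m x m label matrix S."""
    points = sorted({i for pair in list(M) + list(C) for i in pair})
    row = {p: r for r, p in enumerate(points)}
    m = len(points)
    S = [[0] * m for _ in range(m)]  # 0 for every pair that carries no label
    for i, j in M:
        S[row[i]][row[j]] = S[row[j]][row[i]] = 1   # neighbor pair
    for i, j in C:
        S[row[i]][row[j]] = S[row[j]][row[i]] = -1  # non-neighbor pair
    return points, S


# Hypothetical sampled pairs: one neighbor pair and two non-neighbor pairs.
points, S = label_matrix(M=[(0, 1)], C=[(0, 3), (1, 2)])
```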
In the next iteration, the pseudo labels are made so that a point pair in the set M is assigned the same hash value and a point pair in the set C is assigned different hash values. By so doing, the error made by the previous hash function is corrected.
Each hash function hk( ) generates a pseudo label set XkMC and the corresponding label matrix SkMC. The new label information is used to adjust the data covariance matrix in each iteration of sequential learning. To learn a new projection vector →wk, all the pairwise label matrices since the beginning are used, but their contribution decreases exponentially by a factor λ at each iteration.
The principal component direction corrected by a residual error is found. However, since no pseudo labels exist at the beginning, the first vector →w1 is the principal direction of the data. Each hash function is learned iteratively so as to satisfy the pseudo labels by adjusting the data covariance matrix. It is seen that the above mentioned USPLH algorithm represents a technique that finds the principal component directions corrected by the residual error.
Patent Literature 1 discloses, for a near or nearest neighbor search method that uses a hash function, a technique which searches for the nearest pattern at a high speed with a low error ratio. In this technique, a set of learning patterns is assumed to follow a normal distribution (Gaussian distribution), and the cumulative probability distribution of the learning patterns on an arbitrary axis is approximated by a sigmoid function Psd=1/{1+exp(−(x−μ)/a)}, where μ is an average and a is a standard deviation, using, e.g., least squares approximation. A plurality of hash functions that partition the probability values at a constant interval, based on the cumulative probability distribution, are defined. A union of the subsets in the spatial regions (packets) obtained by partitioning by the hash functions is found from the output values of the hash functions to which an unknown pattern is input. The nearest pattern is searched for within the resulting union.
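The partitioning of probability values at a constant interval might be sketched for a single axis as follows. This is an illustration under assumptions and not the implementation of Patent Literature 1: the sigmoid is taken in the standard logistic form 1/(1+exp(−(x−μ)/a)), a single hash function with a hypothetical number of equal-probability regions is shown, and the function names are invented.

```python
import math


def sigmoid_cdf(x: float, mu: float, a: float) -> float:
    """Logistic approximation of the cumulative probability along one axis."""
    z = (x - mu) / a
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # equivalent form that avoids overflow for very negative z
    return e / (1.0 + e)


def region_index(x: float, mu: float, a: float, n_regions: int) -> int:
    """Hash value: index of the equal-probability interval that contains x."""
    p = sigmoid_cdf(x, mu, a)
    return min(int(p * n_regions), n_regions - 1)  # clamp the p == 1.0 edge case


# Hypothetical axis with mu = 0 and a = 1, split into 4 regions of probability 0.25 each.
indices = [region_index(x, 0.0, 1.0, 4) for x in (-10.0, 0.0, 10.0)]
# indices == [0, 2, 3]
```

Partitioning in probability rather than in x places roughly the same number of learning patterns in each region, which is one way such hash functions keep the candidate sets balanced.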
Non-Patent Literature 5 discloses a biometric authentication system in which a template for authentication, held in a database, is masked with a random BCH (Bose-Chaudhuri-Hocquenghem) code word C, by taking a bitwise exclusive OR (XOR), to protect the biometric information. Reference is made to an Example stated hereinbelow. The above mentioned binary hashing technique may be applied to this system because the template for authentication needs to be fixed-length binary data.
[Patent Literature 1] JP2009-20769A
[Non-Patent Literature 1] Mayur Datar, Nicole Immorlica, Piotr Indyk and Vahab S. Mirrokni, “Locality-Sensitive Hashing Scheme Based on p-Stable Distributions”, Proc. Symposium on Computational Geometry, pp. 253-262, 2004
[Non-Patent Literature 2] Maxim Raginsky and Svetlana Lazebnik, “Locality-Sensitive Binary Codes from Shift-Invariant Kernels”, NIPS Vol. 22, 2010
[Non-Patent Literature 3] Yair Weiss, Antonio Torralba and Rob Fergus, “Spectral Hashing”, NIPS 2008
[Non-Patent Literature 4] Jun Wang, Sanjiv Kumar and Shih-Fu Chang, “Sequential Projection Learning for Hashing with Compact Codes”, Proc. of the 27th ICML 2010
[Non-Patent Literature 5] Pim Tuyls, Anton H. M. Akkermans, Tom A. M. Kevenaar, Geert-Jan Schrijen, Asker M. Bazen and Raymond N. J. Veldhuis, “Practical Biometric Authentication with Template Protection”, Proceedings of AVBPA 2005, Lecture Notes in Computer Science, Vol. 3546, Springer Verlag, pp. 436-446, 2005