In recent years, as computer and network technologies have advanced, data on personal attributes and behaviors (personal data) and confidential data of organizations, such as companies, have become increasingly important. Computing on or analyzing the personal data and the confidential data makes it possible to obtain new knowledge or realize new functions, but it puts the privacy of individuals and the confidential information of organizations at risk. Therefore, confidentiality-preserving techniques that enable the personal data and the confidential data to be used in a protected (encrypted) state are attracting attention. One known technique that enables data to be used in an encrypted state is a homomorphic encryption scheme. The homomorphic encryption scheme is a type of public-key encryption scheme, which uses a pair of different keys for encryption and decryption, and additionally allows operations to be performed on data while the data remains encrypted.
The homomorphic encryption scheme allows operations corresponding to additions and multiplications to be performed on two or more ciphertexts, so that ciphertexts corresponding to the results of additions and multiplications on the original plain texts can be obtained without decrypting the ciphertexts. Furthermore, the homomorphic encryption scheme includes a fully homomorphic encryption scheme that allows additions and multiplications to be performed any number of times. The fully homomorphic encryption scheme allows operations such as exclusive OR, logical AND, and negation, thereby enabling calculations by various logical circuits. However, the fully homomorphic encryption scheme requires a very long time for processing, such as encryption, decryption, and secure computations, and produces very large ciphertexts, and may therefore be impractical in terms of performance. Therefore, a more practical somewhat homomorphic encryption scheme has been proposed.
The somewhat homomorphic encryption scheme enables secure inner-product calculations and secure distance calculations on vector data at a higher speed and a smaller size than those of schemes before improvement. Therefore, the somewhat homomorphic encryption scheme can be utilized for biometric authentication to compare pieces of data derived from biometric objects or for a tag search system to search for objects commonly present in tags from among a large number of tags.
[Patent Literature 1] Japanese Laid-open Patent Publication No. 2014-126865
However, in the homomorphic encryption scheme that enables a calculation of a Hamming distance in an encrypted state, there is a problem in that it is difficult to detect a spoofing attack.
Specifically, the somewhat homomorphic encryption scheme will be described as an example. First, three key generation parameters $(n, q, t)$ are prepared to generate an encryption key. $n$ is a power of 2 and is called a lattice dimension, $q$ is a prime number, and $t$ is an integer less than the prime number $q$. In the flow to generate the encryption key, first, an $n$-dimensional polynomial $sk$ (that is, a polynomial with $n$ coefficients, of degree less than $n$) with small coefficients is randomly generated as a secret key (the size of each of the coefficients is bounded by a certain parameter $\sigma$). Subsequently, an $n$-dimensional polynomial $a_1$ with coefficients each being less than $q$ and an $n$-dimensional polynomial $e$ with extremely small coefficients are randomly generated.
Then, $a_0 = -(a_1 * sk + t * e)$ is calculated, and the pair $(a_0, a_1)$ is defined as a public key $pk$. In the calculation of the polynomial $a_0$, any term of degree $n$ or greater is reduced by applying $x^n = -1$, $x^{n+1} = -x$, . . . , so that the result is always a polynomial of degree less than $n$. Furthermore, each coefficient of the polynomial is replaced by the remainder obtained when the coefficient is divided by the prime number $q$. In general, the space in which the above-described calculation is performed is mathematically represented by $R_q := F_q[x]/(x^n + 1)$.
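The reduction rule described above can be sketched as follows. This is a minimal illustration rather than part of the described scheme: the function name `ring_mul` and the representation of a polynomial as a list of coefficients (lowest degree first) are assumptions made for the example.

```python
def ring_mul(f, g, n, q):
    """Multiply f and g in R_q = F_q[x]/(x^n + 1).

    f and g are coefficient lists of length n, lowest degree first
    (a representation assumed for this sketch).
    """
    out = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            k = i + j
            if k < n:
                out[k] += fi * gj
            else:
                # x^n = -1, so x^k = -x^(k-n) for n <= k < 2n
                out[k - n] -= fi * gj
    # Replace each coefficient by its remainder modulo q
    return [c % q for c in out]


# With n = 4 and q = 1033: x^3 * x = x^4 = -1, i.e. the constant 1032.
print(ring_mul([0, 0, 0, 1], [0, 1, 0, 0], 4, 1033))
```

The quadratic double loop is chosen for readability; it is adequate for small $n$ only.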
Subsequently, with respect to plain text data $m$ represented by a polynomial of degree less than $n$ with coefficients each being less than $t$ and the public key $pk = (a_0, a_1)$, three polynomials $u$, $f$, and $g$ of degree less than $n$ with extremely small coefficients are randomly generated, and encrypted data $\mathrm{Enc}(m, pk) = (c_0, c_1)$ of the plain text data $m$ is defined by $c_0 = a_0 * u + t * g + m$ and $c_1 = a_1 * u + t * f$. These calculations are performed by using operations in the space $R_q$.
Thereafter, with respect to two ciphertexts $\mathrm{Enc}(m_1, pk) = (c_0, c_1)$ and $\mathrm{Enc}(m_2, pk) = (d_0, d_1)$, a ciphertext addition $\mathrm{Enc}(m_1, pk) + \mathrm{Enc}(m_2, pk)$ is calculated as $(c_0 + d_0, c_1 + d_1)$ and a ciphertext multiplication $\mathrm{Enc}(m_1, pk) * \mathrm{Enc}(m_2, pk)$ is calculated as $(c_0 * d_0, c_0 * d_1 + c_1 * d_0, c_1 * d_1)$. It is noted that, when the ciphertexts are multiplied as described above, the ciphertext grows from a two-component vector to a three-component vector.
Furthermore, in a decryption process, a ciphertext $c = (c_0, c_1, c_2, \ldots)$ is decrypted by calculating $\mathrm{Dec}(c, sk) = [c_0 + c_1 * sk + c_2 * sk^2 + \cdots]_q \bmod t$ by using the secret key $sk$ (here, it is assumed that the number of components of the ciphertext data has increased due to operations on ciphertexts, such as a plurality of ciphertext multiplications). Incidentally, $[f(x)]_q \bmod t$ with respect to a polynomial $f(x)$ means the polynomial in which each coefficient $z_i$ of $f(x)$ is replaced with $[z_i]_q \bmod t$. Furthermore, the value of $[z]_q$ with respect to an integer $z$ is set to $w$ when $w < q/2$ and set to $w - q$ when $w \ge q/2$, where $w$ is the remainder when the integer $z$ is divided by $q$. Namely, the value of $[z]_q$ is in the range $[-q/2, q/2)$. Furthermore, “$a \bmod t$” means the remainder when an integer $a$ is divided by $t$.
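The key generation, encryption, ciphertext addition and multiplication, and decryption described above can be put together in a short sketch. Everything below is illustrative and makes assumptions not found in the text: the parameters $(n, q, t) = (8, 2^{31}-1, 64)$, the choice of small coefficients from $\{-1, 0, 1\}$, and all function names are hypothetical, and the code is in no way a secure implementation. Decryption works because $c_0 + c_1 * sk = m + t\,(g + f * sk - e * u)$ in $R_q$, so the result reduces to $m$ modulo $t$ as long as the noise term stays below $q/2$.

```python
import random

N, Q, T = 8, 2**31 - 1, 64   # illustrative (n, q, t); assumed for this sketch


def ring_mul(f, g, mod=None):
    """Product in Z[x]/(x^N + 1); coefficients reduced modulo `mod` if given."""
    out = [0] * N
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < N:
                out[i + j] += fi * gj
            else:
                out[i + j - N] -= fi * gj      # x^N = -1
    return [c % mod for c in out] if mod else out


def add(f, g):
    return [(a + b) % Q for a, b in zip(f, g)]


def small(rng):
    """Polynomial with small coefficients (assumed here to lie in {-1, 0, 1})."""
    return [rng.choice((-1, 0, 1)) for _ in range(N)]


def keygen(rng):
    sk = small(rng)
    a1 = [rng.randrange(Q) for _ in range(N)]
    e = small(rng)
    # a0 = -(a1*sk + t*e)
    a0 = [(-(x + T * y)) % Q for x, y in zip(ring_mul(a1, sk, Q), e)]
    return sk, (a0, a1)


def encrypt(m, pk, rng):
    a0, a1 = pk
    u, f, g = small(rng), small(rng), small(rng)
    c0 = [(x + T * y + z) % Q for x, y, z in zip(ring_mul(a0, u, Q), g, m)]
    c1 = [(x + T * y) % Q for x, y in zip(ring_mul(a1, u, Q), f)]
    return [c0, c1]


def ct_add(c, d):
    """(c0 + d0, c1 + d1) for two two-component ciphertexts."""
    return [add(x, y) for x, y in zip(c, d)]


def ct_mul(c, d):
    """(c0*d0, c0*d1 + c1*d0, c1*d1): the result has three components."""
    (c0, c1), (d0, d1) = c, d
    return [ring_mul(c0, d0, Q),
            add(ring_mul(c0, d1, Q), ring_mul(c1, d0, Q)),
            ring_mul(c1, d1, Q)]


def decrypt(c, sk):
    """[c0 + c1*sk + c2*sk^2 + ...]_q mod t."""
    acc, power = [0] * N, [1] + [0] * (N - 1)
    for comp in c:
        acc = add(acc, ring_mul(comp, power, Q))
        power = ring_mul(power, sk, Q)
    centered = [w - Q if 2 * w >= Q else w for w in acc]   # [z]_q
    return [z % T for z in centered]


rng = random.Random(0)   # fixed seed only to make the demonstration repeatable
sk, pk = keygen(rng)
m1 = [rng.randrange(T) for _ in range(N)]
m2 = [rng.randrange(T) for _ in range(N)]
ct1, ct2 = encrypt(m1, pk, rng), encrypt(m2, pk, rng)
print(decrypt(ct1, sk) == m1)
print(decrypt(ct_add(ct1, ct2), sk) == [(a + b) % T for a, b in zip(m1, m2)])
print(decrypt(ct_mul(ct1, ct2), sk) == ring_mul(m1, m2, T))
```

With these assumed parameters the noise after one multiplication remains far below $q/2$, so all three checks are expected to hold; with a smaller $q$ the multiplication check could fail.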
A numerical example will be given below for ease of understanding.

Secret key:
sk = Mod(Mod(4,1033)*x^3 + Mod(4,1033)*x^2 + Mod(1,1033)*x, x^4+1)

Public key pk = (a0, a1):
a0 = Mod(Mod(885,1033)*x^3 + Mod(519,1033)*x^2 + Mod(621,1033)*x + Mod(327,1033), x^4+1)
a1 = Mod(Mod(661,1033)*x^3 + Mod(625,1033)*x^2 + Mod(861,1033)*x + Mod(311,1033), x^4+1)

Ciphertext Enc(m, pk) = (c0, c1) with respect to plain text data m = 3 + 2x + 2x^2 + 2x^3:
c0 = Mod(Mod(822,1033)*x^3 + Mod(1016,1033)*x^2 + Mod(292,1033)*x + Mod(243,1033), x^4+1)
c1 = Mod(Mod(840,1033)*x^3 + Mod(275,1033)*x^2 + Mod(628,1033)*x + Mod(911,1033), x^4+1)
Incidentally, in the above-described values, the key generation parameters are set such that $(n, q, t) = (4, 1033, 20)$. Furthermore, Mod(a, q) means the remainder when the integer $a$ is divided by the prime number $q$, and Mod(f(x), x^4+1) means the remainder polynomial when the polynomial $f(x)$ is divided by the polynomial $x^4 + 1$. Moreover, Mod(f(x), x^4+1) takes the same value for $f(x) = x^4$ and $f(x) = -1$, for $f(x) = x^5$ and $f(x) = -x$, and so on.
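The numerical example can be checked by decrypting $(c_0, c_1)$ with $sk$. The sketch below uses only the values quoted above; the function names and the list representation of polynomials (lowest degree first) are assumptions made for the illustration.

```python
N, Q, T = 4, 1033, 20   # (n, q, t) from the numerical example


def ring_mul(f, g):
    """Product in F_q[x]/(x^4 + 1), coefficient lists lowest degree first."""
    out = [0] * N
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            if i + j < N:
                out[i + j] += fi * gj
            else:
                out[i + j - N] -= fi * gj   # x^4 = -1
    return [c % Q for c in out]


def centered(f):
    """[z]_q for each coefficient: a representative in [-q/2, q/2)."""
    return [w - Q if 2 * w >= Q else w for w in f]


def decrypt(c0, c1, sk):
    w = [(a + b) % Q for a, b in zip(c0, ring_mul(c1, sk))]
    return [z % T for z in centered(w)]


# Coefficients listed from the constant term up to x^3
sk = [0, 1, 4, 4]
a0 = [327, 621, 519, 885]
a1 = [311, 861, 625, 661]
c0 = [243, 292, 1016, 822]
c1 = [911, 628, 275, 840]

print(decrypt(c0, c1, sk))   # [3, 2, 2, 2], i.e. m = 3 + 2x + 2x^2 + 2x^3

# a0 + a1*sk should equal -t*e for a small e, so every centered
# coefficient should be a multiple of t = 20.
key_check = centered([(a + b) % Q for a, b in zip(a0, ring_mul(a1, sk))])
print(all(z % T == 0 for z in key_check))
```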
The above-described somewhat homomorphic encryption scheme is superior to the fully homomorphic encryption scheme in terms of the above-described performance, but when vector data (a plurality of data sequences) needs to be processed at once, a huge amount of processing time and a huge size of ciphertexts are still needed and it may be difficult to realize practically adequate performance in some cases.
To deal with a case in which the vector data needs to be processed at once, a somewhat homomorphic encryption scheme with improved performance has been proposed, in which vector data is represented by a single value by polynomial transformation and homomorphic encryption is performed on the single value to greatly improve the performance, such as a processing time and a size of ciphertexts.
In this somewhat homomorphic encryption scheme, two pieces of $d$-dimensional vector data $A = (a_1, a_2, \ldots, a_d)$ and $B = (b_1, b_2, \ldots, b_d)$ are used as input data. Furthermore, to calculate a distance between the two pieces of the vector data in an encrypted state at a high speed, two types of polynomial transformation, namely ascending-order transformation and descending-order transformation, are used.
Specifically, the ascending-order transformation and the descending-order transformation are represented as follows.

[Ascending-order transformation]
$$A = (a_1, a_2, \ldots, a_d) \Rightarrow pm_1(A) = a_1 + a_2 x + a_3 x^2 + \cdots + a_d x^{d-1}$$

[Descending-order transformation]
$$B = (b_1, b_2, \ldots, b_d) \Rightarrow pm_2(B) = b_1 x^d + b_2 x^{d-1} + \cdots + b_d x$$
When an encryption process using homomorphic encryption $E$ is performed on the above-described $A$ and $B$, results are represented as follows.

$$E_1(A) = E(pm_1(A))$$
$$E_2(B) = E(pm_2(B))$$
By using the property of the homomorphic encryption for the two ciphertexts, it is possible to calculate the inner product of the vectors $A$ and $B$ at a high speed through a homomorphic encryption multiplication process of $E(D) = E(pm_1(A)) \times E(pm_2(B))$. Specifically, when a decryption operator provided with the secret key decrypts the secure calculation result $E(D)$, $D$ obtained by the decryption is equal to the polynomial obtained by $pm_1(A) \times pm_2(B)$; therefore, the inner product $a_1 b_1 + a_2 b_2 + \cdots + a_d b_d$ of $A$ and $B$ can be obtained as the coefficient of the term $x^d$.
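The way the inner product appears in the coefficient of $x^d$ can be seen on unencrypted polynomials. The following is a minimal sketch under stated assumptions: it works on plain integer polynomials without encryption and without reduction modulo $x^n + 1$ (it therefore assumes the lattice dimension is large enough that the coefficient of $x^d$ is unaffected by that reduction), and the helper names are chosen for the example.

```python
def pm1(a):
    """Ascending-order transformation: a1 + a2*x + ... + ad*x^(d-1)."""
    return list(a)


def pm2(b):
    """Descending-order transformation: b1*x^d + b2*x^(d-1) + ... + bd*x."""
    d = len(b)
    coeffs = [0] * (d + 1)          # index = degree, constant term is 0
    for j, bj in enumerate(b, start=1):
        coeffs[d - j + 1] = bj
    return coeffs


def poly_mul(f, g):
    """Ordinary polynomial product, coefficient lists lowest degree first."""
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


A = [1, 0, 1, 1]
B = [1, 1, 0, 1]
D = poly_mul(pm1(A), pm2(B))
d = len(A)
print(D[d])                                   # coefficient of x^d: 2
print(sum(a * b for a, b in zip(A, B)))       # direct inner product: 2
```

Each product $a_i x^{i-1} \cdot b_j x^{d-j+1}$ has degree $d + i - j$, so exactly the terms with $i = j$ land on $x^d$.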
For example, if $A$ and $B$ are binary vectors (that is, vectors in which each of the elements $a_i$ and $b_j$ is either zero or one), a Hamming distance $D_H$ between $A$ and $B$ can be calculated by using the above-described property while the data remains homomorphically encrypted, as described below.

$$E(D_H) = E_1(A) \times E_2(C) + E_1(C) \times E_2(B) - 2 * E_1(A) \times E_2(B)$$

where $C$ is the vector in which all of the elements are one, that is, $C = (1, 1, \ldots, 1)$.
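The Hamming distance formula can likewise be checked on unencrypted polynomials. This sketch evaluates the same three products in the plain domain and reads off the coefficient of $x^d$; as before, it is an illustration that assumes no reduction modulo $x^n + 1$ affects that coefficient, and the helper names are hypothetical.

```python
def pm1(a):
    """Ascending-order transformation: a1 + a2*x + ... + ad*x^(d-1)."""
    return list(a)


def pm2(b):
    """Descending-order transformation: b1*x^d + ... + bd*x."""
    d = len(b)
    coeffs = [0] * (d + 1)
    for j, bj in enumerate(b, start=1):
        coeffs[d - j + 1] = bj
    return coeffs


def poly_mul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out


def hamming_via_polynomials(A, B):
    """Coefficient of x^d in pm1(A)*pm2(C) + pm1(C)*pm2(B) - 2*pm1(A)*pm2(B)."""
    d = len(A)
    C = [1] * d
    ac = poly_mul(pm1(A), pm2(C))   # coefficient of x^d is sum(a_i)
    cb = poly_mul(pm1(C), pm2(B))   # coefficient of x^d is sum(b_i)
    ab = poly_mul(pm1(A), pm2(B))   # coefficient of x^d is sum(a_i * b_i)
    return ac[d] + cb[d] - 2 * ab[d]


A = [1, 0, 1, 1]
B = [1, 1, 0, 1]
print(hamming_via_polynomials(A, B))          # 2
print(sum(a != b for a, b in zip(A, B)))      # direct Hamming distance: 2
```

For binary elements, $a_i + b_i - 2 a_i b_i$ equals one exactly when $a_i \neq b_i$, which is why the sum gives the Hamming distance.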
Incidentally, the homomorphic calculation may be performed by using either of two types of homomorphic encryption schemes, namely a scheme based on ideal lattices and a scheme based on ring-LWE. Furthermore, as a security requirement, if it were acceptable for the decryption operator provided with the secret key to know all of the values of $A$ and $B$, the inner product and the distance could be calculated by directly using the values of $A$ and $B$. Therefore, the secure calculation addresses a requirement, needed in some cases, in which the decryption operator is allowed to know the results of calculations using $A$ and $B$ but is not allowed to know the values of $A$ and $B$ themselves.
In the above-described homomorphic encryption scheme, a spoofing attack is possible by using data generally called a wolf, which is data that enables spoofing of an arbitrary user with high probability. In biometric authentication, biometric information includes errors caused by differences in the environment or position in which the biometric information is read. Therefore, even for the identical person, the biometric information used for registration (template) and the biometric information used for authentication do not completely match each other. For this reason, a system is configured to allow an error to some extent at the time of authentication.
In the system that allows an error as described above, when biometric authentication is performed and the error is in the acceptable range, the authentication may succeed even with biometric information of another person, that is, a person who is not the identical person (false acceptance). The wolf is data that intentionally causes the false acceptance phenomenon with high probability.
For example, in a biometric authentication system, if a binary vector is used as biometric information, an attacker who intends to perform spoofing uses an unexpected vector (for example, a vector other than the binary vector) as a vector for matching. By using the unexpected vector for matching, the attacker can successfully perform spoofing with the probability of ½ in the first attempt and successfully perform spoofing in at most two attempts.
A method of attack by the attacker will be described in detail below. First, a vector for registration is assumed as $A = (a_1, a_2, \ldots, a_d)$. Here, each $a_i$ is either zero or one. The attacker generates a vector for attack such that $B = (b_1, b_2, \ldots, b_d) = (\alpha, 0, 0, \ldots, 0)$.
Incidentally, $\alpha$ is not limited to the first element but may be stored in any one of the elements of $B$, with the other elements being zero. In the following, an example will be described in which $\alpha$ is stored in the first element $b_1$.
The attacker sets $\alpha = d/2 - \theta + 1$ in the first attempt. Here, $\theta$ is a threshold. The attacker generates $E_2(B)$ by using a public key of a user, and transmits $E_2(B)$ to a calculator who performs matching with a template. Incidentally, although a binary vector is used as the biometric information in the biometric authentication system, the encryption system accepts any of the values $0, 1, \ldots, t-1$ as the value of each vector element; therefore, the attacker can perform the encryption process on $B$.
The calculator calculates a secure distance $E(D_H)$ by using $E_2(B)$ and $E_1(A)$ that is registered in advance. In this example, for ease of understanding of the method of attack, attention is focused on the coefficient of $x^d$ in $D_H$ (the Hamming distance between $A$ and $B$), which is the coefficient used for the final determination. Assuming that the coefficient is denoted by $D$, the calculation is performed by the expression below.

$$D = \sum a_i + \sum b_i - 2 \sum a_i \times b_i$$
Furthermore, by substituting $B = (\alpha, 0, 0, \ldots, 0)$ with $\alpha = d/2 - \theta + 1$, the following is obtained.

$$D = \sum a_i + (d/2 - \theta + 1)(1 - 2 a_1)$$
In general, a Hamming weight (the number of elements with values of 1) of the biometric information represented by the binary vector follows a binomial distribution, and the Hamming weight becomes about $d/2$ with extremely high probability. Therefore, assuming that $\sum a_i = d/2$, $D$ is calculated as follows.

$$D = \theta - 1 \quad \text{when } a_1 = 1$$
$$D = d - \theta + 1 \quad \text{when } a_1 = 0$$
From the above expressions, when $a_1 = 1$, $D$ is always smaller than the threshold $\theta$, so that the attacker can successfully perform spoofing. In contrast, when $a_1 = 0$, $D = d - \theta + 1$ is smaller than the threshold only if $\theta > (d+1)/2$ is satisfied.
As described above, in the biometric authentication, the system is configured to allow an error to some extent between the biometric information (template) registered in advance and the biometric information (matching data) to be read at the time of matching. However, if the threshold used as a reference to allow the error is too small, a phenomenon occurs in which authentication fails even for the identical person (false rejection), and, if the threshold is too large, false acceptance occurs. Therefore, as an appropriate threshold, a threshold with which these phenomena occur with the same probability is used. For example, as for IrisCode as biometric information on an iris, it has been reported that the threshold becomes $0.3d$ to $0.4d$. Based on this fact, the condition of $\theta > (d+1)/2$ is not satisfied, so that spoofing fails in the first attempt when $a_1 = 0$.
When the attacker fails to perform spoofing in the first attempt, the attacker performs the same process as in the first attempt by setting $\alpha = -(d/2 - \theta + 1)$ as a next (second) attempt. In the second attempt, with $a_1 = 0$, $D = d/2 - (d/2 - \theta + 1) = \theta - 1$, which is smaller than the threshold, so that the spoofing is successfully performed. As described above, the attacker can successfully perform spoofing in at most two attempts by using the unexpected vector.
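The two-attempt attack can be simulated on unencrypted values. The sketch below is illustrative only: it evaluates the coefficient $D$ directly rather than through homomorphic encryption, it assumes that $d$ is even, that the template has Hamming weight exactly $d/2$, and that authentication succeeds when $D$ is smaller than $\theta$; the function names and the example values $d = 8$, $\theta = 3$ are hypothetical.

```python
def distance_coefficient(A, B):
    """D = sum(a_i) + sum(b_i) - 2 * sum(a_i * b_i), the coefficient of x^d."""
    return sum(A) + sum(B) - 2 * sum(a * b for a, b in zip(A, B))


def wolf_attack(A, theta):
    """Return the attempt number (1 or 2) at which spoofing succeeds, else None.

    Models what the matching side would observe after decryption; the
    attacker never looks at the template A.
    """
    d = len(A)
    alpha = d // 2 - theta + 1
    for attempt, first in enumerate((alpha, -alpha), start=1):
        B = [first] + [0] * (d - 1)        # B = (alpha, 0, ..., 0), not binary
        if distance_coefficient(A, B) < theta:
            return attempt
    return None


theta = 3                                   # 0.375 * d for d = 8 (assumed value)
template_a1_is_1 = [1, 1, 0, 0, 1, 0, 1, 0]  # Hamming weight d/2 = 4
template_a1_is_0 = [0, 1, 1, 0, 1, 0, 1, 0]  # Hamming weight d/2 = 4

print(wolf_attack(template_a1_is_1, theta))  # 1: D = theta - 1 = 2
print(wolf_attack(template_a1_is_0, theta))  # 2: first D = d - theta + 1 = 6
```

A genuine binary matching vector would give $D$ equal to an actual Hamming distance; the attack relies on the first element of $B$ lying outside $\{0, 1\}$.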
At this time, on the calculator side, because the vector for matching transmitted by the attacker as the unexpected vector is subjected to the homomorphic encryption, it is not easy to detect that the vector is false data.