If an index, which is data generated for a data set, is used, data including desired partial information can be extracted from a source data set. By using a document as the data and a word as the partial information, keyword search on a database can be performed.
Even when the document is encrypted in order to prevent information leakage, if the index is not encrypted, the keyword search function is not affected. However, the index includes information about the corresponding document. Thus, unless an index obtained after countermeasures against information leakage are taken is used, information leakage cannot be prevented.
Non-Patent Literatures (NPLs) 1 and 2 disclose methods for generating indexes having resistance to information leakage. The method disclosed in NPL 1 is more efficient than that disclosed in NPL 2 in calculation amount and memory capacity. Thus, hereinafter, the method disclosed in NPL 1 will be described. The methods disclosed in these literatures use a technique using Bloom filters disclosed in NPL 3. Since Bloom filters can also be used in the present invention, first, Bloom filters will be described.
“Bloom filters”
A Bloom filter is a bit string that is generated from an input value set. A Bloom filter can be used for efficiently determining whether an element is included in a set. Herein, processing relating to Bloom filters will be described by using two functions, a function Gen and a function Check. The function Gen receives a value set {w_1, . . . , w_n} and outputs a bit string. The function Check receives a value w_i and a bit string and determines whether the value w_i is included in the set corresponding to the bit string.
The function Gen for the set {w_1, . . . , w_n} uses filter functions F, each of which receives an element w_i in the set and outputs a bit string. In addition, the function Gen outputs a logical OR of bit strings obtained by inputting the values w_1 to w_n to the respective filter functions F.
FIG. 1 illustrates a Bloom filter generation method. Processing of the function Gen performed when a set {w_1,w_2, w_3} is input will be described as an example with reference to FIG. 1. As illustrated in FIG. 1, when F(w_1) is 01001001, F(w_2) is 00010010, and F(w_3) is 10000101, a Bloom filter 11011111 corresponding to the set {w_1,w_2, w_3} can be obtained by calculating a logical OR of corresponding bits of the three bit strings.
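The FIG. 1 example can be reproduced with a minimal sketch such as the following. The lookup table standing in for the filter function F and the names `F_TABLE`, `F`, and `gen` are assumptions made for illustration only; an actual filter function would compute its output from the element itself.

```python
from functools import reduce

# Filter values copied from the FIG. 1 example. This table is only a
# stand-in for a real filter function F (an assumption for illustration).
F_TABLE = {"w_1": 0b01001001, "w_2": 0b00010010, "w_3": 0b10000101}


def F(w):
    """Illustrative filter function: look up the bit string for element w."""
    return F_TABLE[w]


def gen(values):
    """Function Gen: logical OR of F(w) over every element w of the set."""
    return reduce(lambda acc, w: acc | F(w), values, 0)


bloom = gen(["w_1", "w_2", "w_3"])
print(format(bloom, "08b"))  # 11011111
```

Representing each bit string as an integer lets the logical OR of corresponding bits be computed with a single `|` operation.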
The function Check for a filter f and a set element w uses the filter function F and determines whether the filter f represents 1 at every position at which F(w) represents 1. If all the corresponding positions represent 1, the function Check outputs 1. Otherwise, the function Check outputs 0. When the filter f has been generated for the words of a document d, an output of 1 indicates that the element w is determined to be included in the document d, and an output of 0 indicates that the element w is not included in the document d.
FIG. 2 illustrates a Bloom filter determination method. Two examples will be described with reference to FIG. 2: one in which the Bloom filter 11011111 generated for the set {w_1, w_2, w_3} in FIG. 1 and the element w_2 are input to the function Check, and one in which a Bloom filter 11001101 generated for a set {w_1, w_3} and the element w_2 are input to the function Check.
As illustrated in (a) of FIG. 2, if the Bloom filter is 11011111 and F(w_2) is 00010010, both the 4th bit and the 7th bit represent 1 (matched). Thus, the function Check outputs 1.
In contrast, as illustrated in (b) of FIG. 2, if the Bloom filter is 11001101 and F(w_2) is 00010010, neither the 4th bit nor the 7th bit is 1 (not matched). Thus, the function Check outputs 0.
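The two determinations in FIG. 2 can be sketched as follows. The name `check` and the integer representation of bit strings are assumptions made for illustration; the bit values are those given in the examples above.

```python
def check(f, fw):
    """Function Check: 1 if every bit set in fw is also set in f, else 0."""
    return 1 if (f & fw) == fw else 0


F_W2 = 0b00010010  # F(w_2) from the FIG. 1 example

# (a) Filter for {w_1, w_2, w_3}: the 4th and 7th bits are both 1.
print(check(0b11011111, F_W2))  # 1

# (b) Filter for {w_1, w_3}: the 4th and 7th bits are both 0.
print(check(0b11001101, F_W2))  # 0
```

The comparison `(f & fw) == fw` holds exactly when the filter represents 1 at every position at which F(w) represents 1.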
With the function Check, it is only necessary to calculate filter values for a single word and compare the values with an input Bloom filter. Thus, this processing is more efficient than processing in which each element in a set is examined to determine whether the element is w_i.
It is known that Bloom filters have the following property.
“Property 1”
The function Check may output 1 for a Bloom filter calculated for a set that does not include the input element w_i (a false positive). However, the function Check always outputs 1 when the element w_i is included in the set.
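A small sketch may make this property concrete. The element w_4 and its filter value below are hypothetical and do not appear in the figures; they are chosen only so that the bits of F(w_4) happen to be covered by the filter for the set {w_1, w_3}.

```python
def check(f, fw):
    """Function Check: 1 if every bit set in fw is also set in f, else 0."""
    return 1 if (f & fw) == fw else 0


F_W1 = 0b01001001  # F(w_1) from the FIG. 1 example
F_W3 = 0b10000101  # F(w_3) from the FIG. 1 example
F_W4 = 0b01000100  # hypothetical value for an element w_4 (assumption)

f = F_W1 | F_W3  # 11001101, the filter for the set {w_1, w_3}

print(check(f, F_W1))  # 1: w_1 is in the set, so Check always outputs 1
print(check(f, F_W4))  # 1: w_4 is not in the set, yet Check outputs 1
```

The second bit of F(w_4) is set by F(w_1) and the sixth bit is set by F(w_3), which is why an element outside the set can still pass the determination.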
NPL 3 discloses a method for selecting a good filter function F with which the function Check is less likely to output 1 by mistake.
Using “property 1”, a Bloom filter can be generated for a document by regarding the document as a set of words and inputting that word set to the function Gen. By storing each document in association with its Bloom filter, keyword search on documents can be performed more efficiently.
By causing the function Check to determine whether a Bloom filter corresponding to each document includes a keyword, a document corresponding to a Bloom filter for which the function Check outputs 1 is extracted. In this way, there is no need to directly determine whether each document includes a keyword.
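The search procedure described above can be sketched as follows. The hash-based filter function, the filter length `M`, the number of bits `K` set per word, and the sample documents are all assumptions made for illustration; the sketch does not reproduce the filter function selected in NPL 3 or the leakage-resistant index of NPL 1.

```python
import hashlib

M = 256  # filter length in bits (assumed)
K = 3    # number of bit positions derived per word (assumed)


def F(word):
    """Hypothetical filter function: set up to K hash-derived bits out of M."""
    bits = 0
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{word}".encode()).digest()
        bits |= 1 << (int.from_bytes(digest, "big") % M)
    return bits


def gen(words):
    """Function Gen: logical OR of F(w) over all words."""
    f = 0
    for w in words:
        f |= F(w)
    return f


def check(f, word):
    """Function Check: 1 if every bit of F(word) is set in f, else 0."""
    fw = F(word)
    return 1 if (f & fw) == fw else 0


# Sample documents (assumed), each regarded as a set of words.
documents = {
    "doc1": "bloom filters support keyword search",
    "doc2": "the index is stored with each document",
    "doc3": "encrypted search uses a secure index",
}
index = {name: gen(text.split()) for name, text in documents.items()}


def search(keyword):
    """Return the documents whose Bloom filter passes Check for the keyword."""
    return [name for name, f in index.items() if check(f, keyword)]


print(search("search"))  # contains doc1 and doc3; false positives are possible
```

Only the stored filters are consulted during the search. Because of “property 1”, the result always contains every document that includes the keyword, but it may also contain a document that does not.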
In addition, Bloom filters have the following property.
“Property 2”
By obtaining a logical OR of a Bloom filter f_{S_1} generated for a set S_1 and a Bloom filter f_{S_2} generated for a set S_2, a Bloom filter for the sum set (union) of the set S_1 and the set S_2 can be obtained. Thus, by using documents as sets and words as elements, the logical OR of the Bloom filters for two documents is a Bloom filter for the sum set of the words included in either of the documents.
“Property 2” is attributable to use of the same filter function F for different documents. Hereinafter, a logical OR of Bloom filters for a document D_1 and a document D_2 will be described as a Bloom filter for the documents D_1 and D_2. If subscripts for documents D_1 to D_4 are consecutive, a logical OR of Bloom filters for these documents will be described as a Bloom filter for documents D_1, . . . D_4.
Next, a logical OR operation will be described based on a simple example. A logical OR operation is performed as follows.
    Bloom filter A: 010001
    Bloom filter B: 010100
    A∘B: 010101
In the following, an operation of obtaining a logical OR of the Bloom filter f_{S_1} and the Bloom filter f_{S_2} will be represented as f_{S_1}∘f_{S_2}.
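“Property 2” can be checked on the FIG. 1 values with a short sketch. The lookup table standing in for the filter function F and the names `F_TABLE` and `gen` are assumptions made for illustration.

```python
# Filter values copied from the FIG. 1 example; the table is only a
# stand-in for a real filter function F (an assumption for illustration).
F_TABLE = {"w_1": 0b01001001, "w_2": 0b00010010, "w_3": 0b10000101}


def gen(values):
    """Function Gen: logical OR of the filter values of all elements."""
    f = 0
    for w in values:
        f |= F_TABLE[w]
    return f


f_s1 = gen({"w_1", "w_2"})  # filter for the set S_1
f_s2 = gen({"w_3"})         # filter for the set S_2

combined = f_s1 | f_s2      # corresponds to f_{S_1}∘f_{S_2}
print(format(combined, "08b"))                    # 11011111
print(combined == gen({"w_1", "w_2", "w_3"}))     # True
```

The equality holds only because the same filter function is used for both sets, which matches the reason given above for “property 2”.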
NPL 1:
    Eu-Jin Goh, “Secure Indexes,” May 5, 2004 [online], [searched on Jun. 21, 2011], Internet <URL:http://crypto.stanford.edu/~eujin/papers/secureindex/secureindex.pdf>
NPL 2:
    Y.-C. Chang and M. Mitzenmacher, “Privacy Preserving Keyword Searches on Remote Encrypted Data,” Cryptology ePrint Archive, Report 2004/051, February 2004. [online], [searched on Jun. 21, 2011], Internet <URL:http://eprint.iacr.org/2004/051.pdf>
NPL 3:
    B. Bloom, “Space/Time Trade-offs in Hash Coding with Allowable Errors,” Communications of the ACM, vol. 13, no. 7, pp. 422-426, July 1970.