Consider the following scenario with two fictitious parties, Alice and Bob. Bob owns a database D consisting of m data elements. A user, Alice, wishes to access this database and establishes an agreement with Bob whereby she can achieve such access. However, for privacy reasons, Alice does not want Bob to know which items she is querying in the database. Naturally, one can imagine a number of scenarios in which database privacy is desired.
The problem area that addresses the above concerns is known in the art, and is referred to herein, as Private Information Retrieval (PIR). When database privacy must also be maintained (for example, by preventing the user from learning any more information than it should), the problem area is sometimes referred to in the art and herein as either Symmetric Private Information Retrieval (SPIR) or Oblivious Transfer.
One trivial scheme for achieving the goal of privacy is for the database owner (in this case Bob) to send the entire database to Alice. If the database contains m bits, then the total communication complexity is O(m), where this notation for purposes herein means a·m+b, where a and b are numbers. Alice can then look up any item herself, and Bob trivially has no information about Alice's query. Of course, this solution is completely impractical for even a moderately sized database. Additionally, this type of scheme does not satisfy the need for maintaining database privacy.
Some schemes require total communication that is superlogarithmic in the size of the database, in other words $O((\log_2 m)^d)$, where m is the number of items in the database and d is greater than 1. The best known theoretical lower bound for total communication in such a scenario is $O(\log_2 m)$.
Chor, Kushilevitz, Goldreich, and Sudan, in "Private Information Retrieval," Journal of the ACM, 45, 1998 (earlier version in FOCS 95), considered the information-theoretic case, wherein the security analysis requires no computational assumptions. For this case, they show that if only a single database is used, then m bits must be communicated. On the other hand, if several replicas of identical databases are used (subject to the restriction that these databases do not communicate with each other), then one can achieve a scheme that does not require transmitting m bits. They determined that there is a 2-database private information retrieval scheme with communication complexity $O(m^{1/3})$, and that for any constant k≥2 there is a k-database private information retrieval scheme with communication complexity $O(m^{1/k})$, in both cases subject to the restriction that the databases do not communicate with one another.
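To illustrate why replication helps in the information-theoretic setting, the following is a minimal sketch of the simplest two-database idea: the user sends each database a random-looking subset of positions, and the two subsets differ only at the desired index. This is not the $O(m^{1/3})$ construction cited above; it still sends m bits to each database and is shown only to make the privacy argument concrete. The function names (`make_queries`, `answer`, `retrieve`) are hypothetical.

```python
import secrets


def make_queries(m, i):
    """Return two selection vectors of length m that differ only at index i.

    Each vector on its own is uniformly random, so a single database
    that does not talk to the other learns nothing about i.
    """
    q1 = [secrets.randbelow(2) for _ in range(m)]
    q2 = list(q1)
    q2[i] ^= 1  # flip the selection bit at the wanted position
    return q1, q2


def answer(db, query):
    """Run by each database: XOR together the bits selected by the query."""
    acc = 0
    for bit, selected in zip(db, query):
        acc ^= bit & selected
    return acc


def retrieve(db, i):
    """User side: combine the two one-bit answers to recover db[i].

    Both replicas hold the same db; here they are simulated locally.
    """
    q1, q2 = make_queries(len(db), i)
    return answer(db, q1) ^ answer(db, q2)
```

Every position other than i is selected either in both queries or in neither, so its contribution cancels in the final XOR and only the bit at position i survives.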
Ambainis, in "Upper Bound on the communication complexity of private information retrieval," in Proc. of the 24th ICALP, 1997, showed that for any constant k≥2, there is a k-database private information retrieval scheme with communication complexity $O(m^{1/(2k-1)})$, subject to the restriction that the databases do not communicate with one another, and that for k=Θ(log m), there is a Θ(log m)-database private information retrieval scheme with communication complexity $O(\log^2 m \cdot \log\log m)$, again subject to the restriction that the databases do not communicate with one another.
Chor and Gilboa, in "Computationally Private Information Retrieval," Proceedings of 29th STOC, pp. 304-313, 1997, show that for every ε>0, there is a 2-database private information retrieval scheme with communication complexity $O(m^{\varepsilon})$. Their scheme requires the existence of pseudo-random generators. It is well known in the art that such generators can be constructed if one-way functions exist.
E. Kushilevitz and R. Ostrovsky, in "Replication is not needed: single database, computationally private information retrieval," in Proceedings of FOCS '97, pp. 364-373, used a computational intractability assumption to achieve a single-database (i.e., k=1) private information retrieval scheme whose communication complexity is less than m. Under the well-known Quadratic Residuosity assumption, they demonstrated that for any ε>0, there is a single-database computational private information retrieval scheme whose communication complexity is $O(m^{\varepsilon})$. To construct such a scheme, they first demonstrated a basic scheme with communication complexity $O((2\sqrt{m}+1)\cdot k)$, where k is a security parameter. Under the assumption that $k=m^{c}$ for some constant c, the resulting scheme achieves communication complexity $O(m^{1/2+c})$. Next, Kushilevitz and Ostrovsky demonstrated that if one of the steps in this scheme could itself be replaced by a single-database computational private information retrieval protocol, then the resulting communication complexity would be lower. Using this idea, they proposed a recursive scheme whose communication complexity is $O\left(m^{1/(L+1)}\cdot\left(m^{L}+L\cdot k\right)\right)$, where L is the number of levels of recursion. By making an assumption that the security parameter is $k=m^{c}$ for some constant c, and setting $L=O(\sqrt{\log m/\log k})$, the communication complexity is $m^{O(\sqrt{c})}$.
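The basic (non-recursive) idea can be sketched as follows, assuming the database is arranged as a matrix of bits with roughly √m rows and √m columns. The user sends one number per column, where only the number for the wanted column is a quadratic non-residue; the database returns one number per row, and the user tests the residuosity of the number for the wanted row. This is a toy sketch in the spirit of the scheme described above, not a faithful reproduction of it: the primes are two small Mersenne primes chosen only so the example runs quickly, and all function names are hypothetical.

```python
import math
import secrets

# Toy parameters (assumption): two Mersenne primes, far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q


def random_unit():
    """Random element of Z_N that is coprime to N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return r


def is_qr(x):
    """Residuosity test; needs the factors P and Q, so only the user can run it."""
    return pow(x, (P - 1) // 2, P) == 1 and pow(x, (Q - 1) // 2, Q) == 1


def random_qr():
    return pow(random_unit(), 2, N)


def random_qnr():
    """Non-residue modulo both P and Q, so its Jacobi symbol modulo N is +1."""
    while True:
        x = random_unit()
        if pow(x, (P - 1) // 2, P) == P - 1 and pow(x, (Q - 1) // 2, Q) == Q - 1:
            return x


def user_query(num_cols, col):
    """One number per column; only the wanted column gets a non-residue."""
    return [random_qnr() if j == col else random_qr() for j in range(num_cols)]


def server_answer(matrix, query):
    """Database side: per row, multiply y_j (bit 1) or y_j^2 (bit 0) modulo N."""
    answers = []
    for row in matrix:
        z = 1
        for bit, y in zip(row, query):
            z = z * (y if bit else y * y) % N
        answers.append(z)
    return answers


def retrieve_bit(matrix, row, col):
    """User side: the row's answer is a residue exactly when the wanted bit is 0."""
    answers = server_answer(matrix, user_query(len(matrix[0]), col))
    return 0 if is_qr(answers[row]) else 1
```

With about √m numbers sent in each direction, each k bits long, the communication in this sketch is on the order of √m·k, which is the shape of the bound quoted for the basic scheme.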
Subsequently, Cachin, Micali, and Stadler, in "Computational Private Information Retrieval with Polylogarithmic Communication," in Proc. of Eurocrypt 1999, LNCS, pages 402-414, Springer-Verlag, 1999, showed how to construct a single-database computational private information retrieval scheme for which the communication complexity is polylogarithmic in the size of the database; i.e., $O(\log^{d} m)$, where d is a constant greater than 1. For the recommended parameters in their scheme, d=6, which makes the actual total communication complexity $O(\log^{6} m)$. The Cachin-Micali-Stadler scheme is based on two computational intractability assumptions. The first assumption is the Φ-hiding assumption, which states, roughly, that given a composite integer n and a small prime p, it is hard to determine whether p divides Φ(n) with probability non-negligibly better than ½. The second assumption is the Φ-sampling assumption, which states, roughly, that one can efficiently find a random composite n such that p divides Φ(n).
In order for the user to obtain the ith bit of an m-bit database, the user must at least send some encoding of i. Thus, in any scheme, O(log m) bits have to be communicated. However, there is still a gap between the Cachin-Micali-Stadler scheme (which has complexity $O(\log^{6} m)$) and the theoretical lower bound of O(log m).
Chang, in "Single-Database Private Information Retrieval with Logarithmic Communication," in Proc. of 9th Australasian Conference on Information Security and Privacy (ACISP 2004), Sydney, Australia, Lecture Notes in Computer Science, Springer Verlag, demonstrated the first single-database computational private information retrieval scheme for which the server-side communication complexity is O(log m). The scheme utilizes Paillier's cryptosystem as a building block and thus is secure as long as that cryptosystem is. The Paillier cryptosystem, in turn, can be shown to be secure under the composite residuosity assumption, which is an extension of the Quadratic Residuosity assumption (the same assumption used in the Kushilevitz-Ostrovsky scheme described above). Roughly speaking, the composite residuosity assumption states that it is computationally intractable to decide whether a random element in $(\mathbb{Z}/n^{2}\mathbb{Z})^{*}$ has an nth root modulo $n^{2}$. Chang's scheme is a special case of a scheme due to Julien Stern, who demonstrated how to construct single-database private information retrieval schemes from almost any semantically secure additively homomorphic encryption scheme, of which the Paillier cryptosystem is one example. However, the user-side communication complexity of Chang's scheme is $O(m^{\varepsilon}\cdot\log m)$, which means that the total communication complexity is $O(m^{\varepsilon}\cdot\log m)$.
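The way an additively homomorphic cryptosystem yields private retrieval can be sketched with a toy Paillier implementation. In this minimal sketch the user sends one ciphertext per database item, encrypting 1 at the wanted index and 0 elsewhere, and the database combines them homomorphically into a single ciphertext of the wanted item. This is the simplest one-dimensional variant, with user-side communication linear in m; it does not reproduce the arrangement that gives the $O(m^{\varepsilon}\cdot\log m)$ bound above. The primes are small Mersenne primes chosen purely for illustration, and all function names are hypothetical.

```python
import math
import secrets

# Toy parameters (assumption): real deployments need far larger random primes.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse; requires Python 3.8+


def encrypt(msg):
    """Paillier encryption with generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return pow(N + 1, msg, N2) * pow(r, N, N2) % N2


def decrypt(c):
    """Paillier decryption; needs the secret LAM, so only the user can run it."""
    u = pow(c, LAM, N2)
    return (u - 1) // N * MU % N


def user_query(m, i):
    """Encrypted unit vector: Enc(1) at position i, Enc(0) everywhere else."""
    return [encrypt(1 if j == i else 0) for j in range(m)]


def server_answer(db, query):
    """Database side: the product of c_j^{x_j} encrypts sum(e_j * x_j) = x_i."""
    acc = 1
    for x, c in zip(db, query):
        acc = acc * pow(c, x, N2) % N2
    return acc


def retrieve(db, i):
    """User side: decrypt the single ciphertext returned by the database."""
    return decrypt(server_answer(db, user_query(len(db), i)))
```

Because the encryption is semantically secure, the database cannot tell which ciphertext encrypts 1, while the reply is a single ciphertext regardless of m.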
Thus, from the perspective of overall communication complexity, the Cachin-Micali-Stadler scheme is better. Nonetheless, there was still a significant gap between the $O(\log^{6} m)$ complexity of this scheme and the theoretical lower bound of O(log m).
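The size of this gap can be made concrete with a little arithmetic. The sketch below simply evaluates m, log m, and (log m)^6 for databases of m = 2^bits bits, using base-2 logarithms. It ignores the hidden constants and security-parameter factors in the asymptotic bounds, so it indicates growth rates only and says nothing about the actual cost of any particular scheme; the helper name `growth` is hypothetical.

```python
def growth(bits):
    """Return (m, log2 m, (log2 m)**6) for a database of m = 2**bits bits."""
    m = 2 ** bits
    log_m = bits  # log2 of 2**bits
    return m, log_m, log_m ** 6


for bits in (20, 30, 40):
    m, log_m, polylog = growth(bits)
    print(f"m = 2^{bits}: m = {m}, log m = {log_m}, (log m)^6 = {polylog}")
```

Under these simplifications, (log m)^6 only drops below m itself once m is around 2^30, whereas log m stays in the tens throughout.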