Many documents and databases are nowadays stored on outsourced data storage servers. This trend is reinforced by the cloudification of many services and of companies' information systems. While such outsourced databases are cheap to maintain and make it efficient to retrieve up-to-date information, they can pose significant risks from a security or privacy perspective. Some solutions have been proposed to secure outsourced data storage on multiple clouds, but they do not tackle the issue of privacy. Indeed, whenever a database is accessed, the entity storing the database learns the queries of the users, which might be a problem if the users' intentions are to be kept secret. For example, investors querying a stock-market database for the current market value of certain stocks might prefer not to reveal their interest in those stocks, because it could inadvertently influence their price. Similarly, companies might want to search for certain patents without revealing the patents' identities.
Private information retrieval (PIR) schemes are cryptographic protocols that allow clients to retrieve records from outsourced databases or clouds while completely hiding the identity of the retrieved records from database owners. There are two flavors of PIR: information-theoretic PIR and computational PIR. The latter provides a weaker security guarantee, and we thus focus on information-theoretic PIR, which provides an absolute guarantee that the servers get no information about what the user wants. Information-theoretic PIR is only possible in a multi-server setting with some replication across these servers. There is an extensive body of work that provides solutions to this problem based on coding theory, and more particularly on locally decodable codes (LDCs). However, current solutions require a lot of replication. Basically, if ℓ is the locality of the LDC, then the whole database needs to be encoded first, which increases the size of the database by a first factor, and then the whole encoded database needs to be replicated ℓ times on ℓ different servers.
Suppose we have a database composed of n elements, for example E1, ..., En. Current solutions encode the database into a codeword of N symbols S1, ..., SN, with N > n, and then store ℓ copies of the encoded database at ℓ different servers, where ℓ is the locality of the code. To retrieve a symbol Si without revealing it, the user has to retrieve one symbol from each server, none of them being Si. The user thus gets ℓ symbols and can compute Si by local decoding. The symbols retrieved are not arbitrary: they are chosen according to a decoding algorithm, which indicates which symbols Sj1, ..., Sjℓ have to be retrieved in order to compute Si. To make sure that the required symbols can always be retrieved, the existing protocols store all symbols on all servers, which means that the whole codeword of length N is replicated ℓ times.
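The retrieval procedure described above can be illustrated with a minimal sketch. It assumes the Hadamard code, a textbook locally decodable code of locality 2, which is chosen here only for illustration and is not taken from the cited solutions. The names `hadamard_encode`, `Server` and `retrieve` are hypothetical. The database holds n bits, the codeword holds N = 2**n bits, and the whole codeword is stored on each of the two servers.

```python
import secrets


def hadamard_encode(bits):
    """Encode n bits as a codeword of N = 2**n bits.

    Position a of the codeword holds the inner product <bits, a> mod 2,
    where a is read as an n-bit vector.
    """
    n = len(bits)
    x = sum(b << k for k, b in enumerate(bits))  # pack the database into an int
    return [bin(x & a).count("1") % 2 for a in range(2 ** n)]


class Server:
    """One storage server holding a full copy of the encoded database."""

    def __init__(self, codeword):
        self.codeword = codeword
        self.seen = []  # positions this server was asked for (its whole view)

    def answer(self, position):
        self.seen.append(position)
        return self.codeword[position]


def retrieve(i, n, servers):
    """Recover database bit i by querying one codeword symbol per server."""
    r = secrets.randbelow(2 ** n)   # uniformly random position
    q1 = r                          # query sent to the first server
    q2 = r ^ (1 << i)               # query sent to the second server
    # Local decoding: <x, r> + <x, r + e_i> = <x, e_i> = x_i (mod 2).
    return servers[0].answer(q1) ^ servers[1].answer(q2)


database = [1, 0, 1, 1]                       # n = 4 elements
codeword = hadamard_encode(database)          # N = 16 symbols
servers = [Server(codeword) for _ in range(2)]  # locality 2, so 2 full copies

recovered = [retrieve(i, len(database), servers) for i in range(len(database))]
stored_symbols = sum(len(s.codeword) for s in servers)
print(recovered, stored_symbols)
```

Each query taken on its own is a uniformly distributed position, so a single server gains no information about i from the position it is asked for. The sketch also makes the storage cost visible: 4 database bits lead to 2 x 16 = 32 stored symbols, which is the overhead the present problem statement aims to reduce.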
The technical problem to solve is to allow storage of a collection of documents or of databases in a plurality of outsourced storage entities or clouds while offering the possibility for users to query a piece of information from the database without revealing their query and in particular without revealing the piece of information to the servers or clouds, and while reducing the storage overhead of current solutions.
This goal is achieved thanks to the invention by means of the subject matter of claim 1.