3.1 Field of the Invention
The present patent document relates generally to verification of data returned in a search query and more particularly to a method and system for verifying the results of a search query performed on a finite set of data stored on an untrusted server.
3.2 Background of the Related Art
Providing integrity guarantees in third-party data management settings is an active area of research, especially in view of the growth in usage of cloud computing. In such settings, verifying the correctness of outsourced computations performed over remotely stored data becomes a crucial property for the trustworthiness of cloud services. Such a verification process should incur minimal overhead for clients; otherwise the benefits of outsourcing the computation are lost. Ideally, computations should be verified without having to rerun them locally or to use much extra cloud storage.
In this paper, we study the verification of outsourced operations on general sets and consider the following problem. Assuming that a dynamic collection of m sets S1, S2, . . . , Sm is remotely stored at an untrusted server, we wish to publicly verify basic operations on these sets, such as intersection, union and set difference. For example, for an intersection query of t sets specified by indices 1≦i1, i2, . . . , it≦m, we aim at designing techniques that allow any client to cryptographically check the correctness of the returned answer I=Si1∩Si2∩ . . . ∩Sit. Moreover, we wish the verification of any set operation to be operation-sensitive, meaning that the resulting complexity depends only on the (description and outcome of the) operation, and not on the sizes of the involved sets. That is, if δ=|I| is the answer size then we would like the verification cost to be proportional to t+δ, and independent of m or Σi|Si|; note that work at least proportional to t+δ is needed to verify any such query's answer. Applications of interest include keyword search and database queries, which boil down to set operations.
Relation to verifiable computing. Recent works on verifiable computing [1, 12, 16] achieve operation-sensitive verification of general functionalities, thus covering set operations as a special case. Although such approaches clearly meet our goal with respect to optimal verifiability, they are inherently inadequate to meet our other goals with respect to public verifiability and dynamic updates, both important properties in the context of outsourced data querying. Indeed, to outsource the computation as an encrypted circuit, the works in [1, 12, 16] make use of some secret information which is also used by the verification algorithm, thus effectively supporting only one verifier; instead, we seek schemes that allow any client (knowing only a public key) to query the set collection and verify the returned results. Also, the description of the circuit in these works is fixed at the initialization of the scheme, thus effectively supporting no updates in the outsourced data; instead, we seek schemes that are dynamic. In other scenarios, but still in the secret-key setting, protocols for general functionalities and polynomial evaluation have recently been proposed in [1] and [6] respectively.
Aiming at both publicly verifiable and dynamic solutions, we study set-operation verification in the model of authenticated data structures (ADSs). A typical setting in this model, usually referred to as the three-party model [36], involves protocols executed by three participating entities. A trusted party, called source, owns a data structure (here, a collection of sets) that is replicated along with some cryptographic information to one or more untrusted parties, called servers. Accordingly, clients issue data-structure queries to the servers and are able to verify the correctness of the returned answers, based only on knowledge of public information which includes a public key and a digest produced by the source (e.g., the root hash of a Merkle tree, see FIG. 10).¹ Updates on the data structure are performed by the source and appropriately propagated by the servers. Variations of this model include: (i) a two-party variant (e.g., [30]), where the source keeps only a small state (i.e., only a digest) and performs both the updates/queries and the verifications; this model is directly comparable to the model of verifiable computing; (ii) the memory checking model [7], where read/write operations on an array of memory cells are verified. However, the absence of the notion of proof computation in memory checking (the server is just a storage device), as well as the feature of public verifiability in authenticated data structures, make the two models fundamentally different.²
¹ Conveying the trust clients have in the source, the authentic digest is assumed to be publicly available; in practice, a time-stamped and digitally signed digest is outsourced to the server.
² Indeed, memory checking might require secret memory, e.g., as in the PRF construction in [7].
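The role of the digest in the three-party model can be illustrated with a generic Merkle tree. The following is a minimal sketch only, not the construction of FIG. 10 or of the present scheme: the function names (build_levels, prove, verify), the use of SHA-256, the domain-separation prefix bytes, and the rule of duplicating the last node on odd-sized levels are all assumptions made for illustration. The source computes the root hash as the digest, the server produces a sibling path as the proof, and the client verifies a returned element using only the digest.

```python
import hashlib

def h(data: bytes) -> bytes:
    """Collision-resistant hash (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Source/server side: return all tree levels, from hashed leaves to the root."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [h(b"\x00" + leaf) for leaf in leaves]  # 0x00 prefix marks leaf hashes
    levels = []
    while True:
        if len(level) > 1 and len(level) % 2 == 1:
            level = level + [level[-1]]  # pad odd levels by duplicating the last node
        levels.append(level)
        if len(level) == 1:
            return levels
        # 0x01 prefix marks internal-node hashes
        level = [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]

def prove(levels, index):
    """Server side: sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling is on the left)
        index //= 2
    return proof

def verify(digest, leaf, proof):
    """Client side: recompute the root from the leaf and proof; compare to the digest."""
    node = h(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        if sibling_is_left:
            node = h(b"\x01" + sibling + node)
        else:
            node = h(b"\x01" + node + sibling)
    return node == digest

# Example: the source publishes `digest`; the server answers a query with a proof.
elements = [b"a", b"b", b"c"]
levels = build_levels(elements)
digest = levels[-1][0]
proof_for_c = prove(levels, 2)
accepted = verify(digest, b"c", proof_for_c)
rejected = verify(digest, b"x", proof_for_c)
```

In this sketch the proof length and the verification cost are logarithmic in the number of stored elements, and the client needs no secret information, which is the public-verifiability property described above.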
Achieving operation-sensitive verification. In this work, we design authenticated data structures for the verification of set operations in an operation-sensitive manner, where the proof and verification complexity depends only on the description and outcome of the operation and not on the size of the involved sets. Conceptually, this property is similar to the property of super-efficient verification that has been studied in certifying algorithms [21] and certification data structures [19, 37], and that is also achieved in the context of verifiable computing [1, 12, 16], where an answer can be verified with complexity asymptotically less than the complexity required to produce it. Whether the above optimality property is achievable for set operations (while keeping storage linear) was posed as an open problem in [23]. We close this problem in the affirmative.
All existing schemes for set-operation verification fall into one of the following two rather straightforward and highly inefficient solutions. In the first, short proofs for the answer of every possible set-operation query are precomputed, allowing for optimal verification at the client at the cost of exponential storage and update overheads at the source and the server; this is an undesirable trade-off, as it is against storage outsourcing. In the second, integrity proofs for all the elements of the sets involved in the query are given to the client, who locally verifies the query result; in this case the verification complexity can be linear in the problem size, which is an undesirable feature, as it is against computation outsourcing.
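The cost of the second solution can be made concrete with a small sketch. It is illustrative only and rests on stated assumptions: set_digest is a toy stand-in for per-set integrity proofs (a hash over the sorted elements), and naive_verify_intersection and its work counter are hypothetical names introduced here. The client receives every queried set in full, checks each against a trusted digest, and recomputes the intersection itself, so the number of elements it touches grows with Σi|Si| rather than with t+δ.

```python
import hashlib

def set_digest(s):
    """Toy integrity tag for a whole set: a hash over its sorted elements."""
    hasher = hashlib.sha256()
    for x in sorted(s):
        hasher.update(repr(x).encode() + b"\n")
    return hasher.hexdigest()

def naive_verify_intersection(trusted_digests, indices, returned_sets, claimed_answer):
    """Client check that reads every element of every queried set.

    Returns (accepted, work), where `work` counts the elements touched.
    """
    work = 0
    for i, s in zip(indices, returned_sets):
        work += len(s)  # hashing the set reads all of its elements
        if set_digest(s) != trusted_digests[i]:
            return False, work
    # The client reruns the computation locally, which outsourcing was meant to avoid.
    expected = set.intersection(*map(set, returned_sets))
    return expected == set(claimed_answer), work

# Example with t = 3 queried sets and an answer of size 1.
sets = {1: {1, 2, 3, 4}, 2: {3, 4, 5}, 3: {4, 6}}
digests = {i: set_digest(s) for i, s in sets.items()}
accepted, work = naive_verify_intersection(
    digests, [1, 2, 3], [sets[1], sets[2], sets[3]], {4}
)
# Here work equals 4 + 3 + 2 = 9 elements, whereas t + δ = 3 + 1 = 4.
```

The gap between the two quantities widens as the sets grow while the answer stays small, which is the inefficiency that operation-sensitive verification is meant to remove.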