1. Field of the Invention
Embodiments herein provide a method, system, etc. for a sovereign information sharing service.
2. Description of the Related Art
Within this application several publications are referenced by Arabic numerals within brackets. Full citations for these, and other, publications may be found at the end of the specification immediately preceding the claims. The disclosures of all these publications in their entireties are hereby expressly incorporated by reference into the present application for the purposes of indicating the background of the present invention and illustrating the state of the art.
Conventional information integration approaches, as exemplified by centralized data warehouses and mediator-based data federations, assume that the data in each database can be revealed completely to the other databases. Consequently, information sharing across autonomous entities is inhibited due to confidentiality and privacy concerns. The goal of sovereign information sharing [2, 3, 8] is to enable such sharing by allowing queries to be computed across sovereign databases such that nothing apart from the result is revealed. Computing a join across sovereign databases in this manner is referred to as a sovereign join. Two motivating applications of sovereign joins are cited below [3].
First, for national security, it might be necessary to check if any of the airline passengers is on the watch list of a federal agency [21]. Sovereign join may be used to find only those passengers who are on the list, without obtaining information about all the passengers from the airline or revealing the watch list.
Second, in epidemiological research, it might be of interest to ascertain whether there is a correlation between a reaction to a drug and some DNA sequence, which may require joining DNA information from a gene bank with patient records from various hospitals. However, a hospital disclosing patient information could be in violation of privacy protection laws, and it may be desirable to access only the matching sequences from the gene bank.
A system offering sovereign join services has the following desirable attributes. First, the system should be able to handle general joins involving arbitrary predicates. The national security application cited above requires a fuzzy match on profiles. Similarly, the patient records spread across hospitals may require complex matching in the healthcare application.
Second, the system should be able to handle multi-party joins. The recipient of the join result can be a party different from one of the data providers.
Third, the recipient should be able to learn only the result of the join computation. No other party should be able to learn the result values or the data values in someone else's input. Lastly, the system should be provably secure. The trusted component should be small, simple, and isolated [4].
A secure network service is provided for sovereign information sharing whose only trusted component is a secure coprocessor [15, 26, 32]. The technical challenge in implementing such a service arises from the following. First, secure coprocessors have limited capabilities. They rely on the server to which they are attached for disk storage or communication with other machines. They also have small memory. Second, while the internal state of a computation within the secure coprocessor cannot be seen from outside, the interactions between the server and the secure coprocessor can be observed.
Simply encrypting communication between the data providers and the secure coprocessor is, therefore, insufficient. The join computation needs to be carefully orchestrated such that the read and write accesses made by the secure coprocessor cannot be exploited to make unwanted inferences.
Careful orchestration of join computation in the face of limited memory has been a staple of database research for a long time. The goal in the past, however, has been the minimization of input/output (I/O) to maximize performance. While the I/O minimization is still important, avoiding leakage through patterns in I/O accesses now becomes paramount.
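The access-pattern concern can be illustrated with a toy simulation. The sketch below is not the join algorithm of the embodiments; it only contrasts a nested-loop join that writes output on a match with one that writes a fixed-size slot for every pair. The function names, the `trace` list (standing in for what an observer at the server sees), and the use of `None` as a dummy slot are all assumptions made for illustration. A real system would also encrypt every slot so that dummies and matches are indistinguishable.

```python
from typing import Callable, List, Optional, Tuple

Pred = Callable[[int, int], bool]


def naive_join(a: List[int], b: List[int], pred: Pred):
    """Nested-loop join that writes only when a pair matches.

    The trace records the events an outside observer could see.
    """
    out: List[Tuple[int, int]] = []
    trace: List[str] = []
    for i, x in enumerate(a):
        trace.append(f"read A[{i}]")
        for j, y in enumerate(b):
            trace.append(f"read B[{j}]")
            if pred(x, y):
                out.append((x, y))
                trace.append("write")  # a write appears only on a match
    return out, trace


def padded_join(a: List[int], b: List[int], pred: Pred):
    """Nested-loop join that writes one slot per pair, match or not.

    None marks a dummy slot (hypothetical convention for this sketch).
    """
    slots: List[Optional[Tuple[int, int]]] = []
    trace: List[str] = []
    for i, x in enumerate(a):
        trace.append(f"read A[{i}]")
        for j, y in enumerate(b):
            trace.append(f"read B[{j}]")
            slots.append((x, y) if pred(x, y) else None)
            trace.append("write")  # a write appears for every pair
    return slots, trace
```

With inputs of the same sizes, the trace of `naive_join` changes with the data, because the position of each write reveals which pairs matched. The trace of `padded_join` depends only on the input sizes. The padded variant pays for this with output proportional to the product of the input sizes, which is why I/O cost remains a concern alongside leakage.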
In principle, sovereign information sharing can be implemented by using techniques for secure function evaluation (SFE) [13, 31]. Given two parties with inputs x and y respectively, SFE computes a function ƒ(x, y) such that the parties learn only the result. SFE techniques are considered to have mostly theoretical significance and have rarely been applied in practice, although some effort is afoot to change the situation [22].
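The SFE goal can be pictured as an ideal model in which an imaginary trusted party receives both inputs privately and returns only ƒ(x, y). The sketch below shows that ideal model only, not an SFE protocol; an actual protocol would aim to achieve the same outcome without any trusted party. The names `ideal_evaluation` and `intersection_size` are hypothetical.

```python
from typing import Callable, Set, TypeVar

X = TypeVar("X")
Y = TypeVar("Y")
R = TypeVar("R")


def ideal_evaluation(f: Callable[[X, Y], R], x: X, y: Y) -> R:
    """Stand-in for an imaginary trusted party.

    It sees both inputs and releases nothing except f(x, y).
    """
    return f(x, y)


def intersection_size(x: Set[str], y: Set[str]) -> int:
    """Example function f: the number of values common to both inputs."""
    return len(x & y)


# Each party would learn the count of common values, not the values.
result = ideal_evaluation(intersection_size, {"ann", "bob"}, {"bob", "cy"})
```

Even in this ideal model the result itself may carry information about the inputs, so the choice of ƒ matters as much as the security of the evaluation.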
To avoid the high cost of SFE, one approach taken in [3] was to develop specialized protocols for intersection, intersection size, equijoin, and equijoin size. Similar protocols for intersection have been proposed in [8, 16]. A new intersection protocol has been recently proposed in [10]. However, the protocols provided in [3] have the following shortcomings. First, it is not clear how to extend them to operations involving general predicates as they are hash-based. Second, they leak information. For example, the equijoin size protocol leaks the distribution of duplicates; if no two values have the same number of duplicates, it can also leak the intersection.
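The duplicate-distribution leak can be made concrete with a small worked example. The sketch assumes a simplified view, not taken from [3]: a party sees, for each opaque matching token, how many times that token occurs in its own data. Under that assumption, and when every value in the party's data has a distinct duplicate count, the counts alone identify the matching values. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Optional, Set


def equijoin_size(a: Iterable[Hashable], b: Iterable[Hashable]) -> int:
    """Size of the equijoin: for each common value, the product of its counts."""
    ca, cb = Counter(a), Counter(b)
    return sum(ca[v] * cb[v] for v in ca.keys() & cb.keys())


def infer_intersection(
    own_values: Iterable[Hashable], matched_dup_counts: Iterable[int]
) -> Optional[Set[Hashable]]:
    """Map observed duplicate counts of matching tokens back to own values.

    Returns None when two own values share a duplicate count, since the
    mapping from count to value is then ambiguous.
    """
    by_count: Dict[int, List[Hashable]] = {}
    for value, count in Counter(own_values).items():
        by_count.setdefault(count, []).append(value)
    if any(len(values) > 1 for values in by_count.values()):
        return None
    return {by_count[c][0] for c in matched_dup_counts}


a = ["x", "y", "y", "z", "z", "z"]  # counts: x=1, y=2, z=3 (all distinct)
b = ["y", "z", "z", "w"]
size = equijoin_size(a, b)              # 2*1 + 3*2
leaked = infer_intersection(a, [2, 3])  # counts of the tokens that matched
```

Here the party holding `a` was meant to learn only the join size, yet the counts 2 and 3 single out "y" and "z" as the common values.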
Secure coprocessors have previously been used in a variety of applications, including secure e-commerce [33], auditable digital time stamping [30], secure fine-grained access control [12], secure data mining [1], and private information retrieval [5, 28]. A taxonomy of secure coprocessing applications has been provided in [27]. The techniques developed therein, though, are quite different. Note that the capabilities provided in architectures such as the Trusted Computing Group's trusted platform module [29], while complementary, do not solve the problem.