1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to query optimization in encrypted database systems.
2. Description of Related Art
(Note: This application references a number of different publications, as indicated throughout the specification by one or more reference numbers. A list of these different publications ordered according to these reference numbers can be found below in the section entitled “References.” Each of these publications is incorporated by reference herein.)
The widespread deployment and adoption of broadband communications, the resulting glut in bandwidth, and advances in networking have caused a generational shift in computing. The emerging grid infrastructure harnesses available computing and storage at disparate heterogeneous machines into a shared network resource. There is an on-going consolidation that results in the application service provider (ASP) model. Organizations outsource some of their core information technology (IT) operations (e.g., data centers) to specialized service providers over the Internet [7, 6]. Many organizations and users will be storing their data and processing their applications at remote, potentially untrusted, computers. One of the primary concerns is that of data privacy—protecting data from those who do not need to know.
There are two kinds of threats to privacy: outsider threats from hackers and insider threats from, perhaps, disgruntled employees. Encrypting stored data [15] is one way to address outsider threats. Data is decrypted on the server only before computation is applied and is re-encrypted thereafter. Encryption and decryption performance is a problem that can be addressed by hardware and by applying techniques to minimize decryption.
Insider threats are more difficult to protect against. Recent studies indicate that a significant fraction of data theft is perpetrated by insiders [5]. For example, how would one protect the privacy of data from the database system administrator, who probably has superuser access privileges?
If the end user (end user and client are used interchangeably herein) is in a secure environment, then one way to solve the insider threat problem is to store all data encrypted on the server and make it impossible to decrypt on the server (for example, only the end user is made aware of decryption keys). In this model, we assume computation against data stored at the server is initiated by the end user. Moreover, assume that it is possible to transform and split the computation into two parts: a server part of the computation is sent to the server to execute directly against encrypted data, giving encrypted results, which are shipped to the client, which decrypts them and performs a client part of the computation. This scheme, under appropriate conditions, addresses the problem of insider threats. The difficulty is that there is no known way to split general computations as required. However, an interesting subset of SQL techniques necessary for such computational transformations has been found [14]. An algebraic framework has also been shown in which these techniques may be applied. However, the problem of how to put these techniques together in an optimum manner has not been addressed.
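The split into a server part and a client part can be illustrated with a minimal sketch. It is not the method of this application or of [14]; it assumes, purely for illustration, that each encrypted row is stored next to a coarse bucket identifier that the server may filter on. The key, the bucket width, the toy keystream cipher (which is not secure), and all function names are hypothetical.

```python
import hashlib
import hmac
import json

KEY = b"client-only-secret"  # hypothetical key, known only to the client
BUCKET_WIDTH = 100           # hypothetical width of each salary bucket


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from HMAC-SHA256 in counter mode (illustration only, not secure)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_row(row: dict, row_id: int) -> bytes:
    """Client side: serialize a row and XOR it with a per-row keystream."""
    plain = json.dumps(row, sort_keys=True).encode()
    nonce = row_id.to_bytes(8, "big")
    stream = _keystream(KEY, nonce, len(plain))
    return nonce + bytes(a ^ b for a, b in zip(plain, stream))


def decrypt_row(blob: bytes) -> dict:
    """Client side: recover the row from its nonce-prefixed ciphertext."""
    nonce, body = blob[:8], blob[8:]
    stream = _keystream(KEY, nonce, len(body))
    return json.loads(bytes(a ^ b for a, b in zip(body, stream)))


def bucket_of(salary: int) -> int:
    """Map a salary to its coarse bucket identifier."""
    return salary // BUCKET_WIDTH


def client_store(rows):
    """Client side: produce the (ciphertext, bucket id) pairs shipped to the server."""
    return [(encrypt_row(r, i), bucket_of(r["salary"])) for i, r in enumerate(rows)]


def server_part(table, lo_bucket, hi_bucket):
    """Server side: sees only ciphertext and bucket ids; returns a superset of the answer."""
    return [blob for blob, bucket in table if lo_bucket <= bucket <= hi_bucket]


def client_part(blobs, lo, hi):
    """Client side: decrypt the candidates and apply the exact predicate."""
    rows = [decrypt_row(b) for b in blobs]
    return [r for r in rows if lo <= r["salary"] <= hi]


def query_salary_between(table, lo, hi):
    """Run the server part, then the client part, of a salary range query."""
    candidates = server_part(table, bucket_of(lo), bucket_of(hi))
    return client_part(candidates, lo, hi)
```

In this sketch the server never holds the key, so it can only narrow the candidate set. The client pays for decrypting any false positives that share a bucket with a true match, which is the kind of cost an optimizer would have to weigh.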
There are six concepts needed to address the query optimization problem, as described in this application: 1) data level partitioning to improve the query partitioning schemes presented by previous work, 2) a novel operator that sends data in a round trip from the server to the client and back for evaluating logical comparisons, as in sorting, 3) operator level partitioning to distribute the query processing tasks between the client and the server, 4) transformation rules that are required to generate alternate query execution plans in the optimizer, 5) query plan enumeration to choose the best query execution plan, and 6) an enhanced storage model that is flexible enough to satisfy different performance and privacy requirements for different systems and applications. Each is explained and described in this application. By means of an example, it is shown that significant performance improvements are possible from application of the techniques in this application.