1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to querying encrypted data in a relational database system.
2. Description of Related Art
(Note: This application references a number of different publications, as indicated throughout the specification by one or more reference numbers. A list of these different publications ordered according to these reference numbers can be found below in the section entitled “References.” Each of these publications is incorporated by reference herein.)
The Internet has made it possible for all computers to be connected to one another. The influence of transaction-processing systems and the Internet ushered in the era of e-business. The Internet has also had a profound impact on the software industry. It has facilitated an opportunity to provide software usage over the Internet, and has led to a new category of businesses called “application service providers” or ASPs.
ASPs provide worldwide customers the privilege to use software over the Internet. ASPs are staffed by experts in the art of putting together software solutions, using a variety of software products, for familiar business services such as payroll, enterprise resource planning, and customer-relationship marketing. ASPs offer their services over the Internet to small and large worldwide organizations. Since fixed costs are amortized over a large number of users, there is the potential to reduce the service cost even after possibly increased telecommunications overhead.
It is possible to provide storage and file access as services. The natural question is the feasibility of providing the next value-add layer in data management. From the business perspective, database as a service inherits all the advantages of the ASP model, indeed even more, given that a large number of organizations have their own database management systems (DBMSs). The model allows organizations to leverage hardware and software solutions provided by the service providers, without having to develop them on their own. Perhaps more importantly, it provides a way for organizations to share the expertise of database professionals, thereby cutting the people cost of managing a complex information infrastructure, which is important both for industrial and academic organizations [15].
From the technological angle, the model poses many significant challenges foremost of which is the issue of data privacy and security. In the database-service-provider model, user data resides on the premises of the database-service provider. Most corporations view their data as a very valuable asset. The service provider would need to provide sufficient security measures to guard data privacy.
At least two data-privacy challenges arise. The first challenge is: how do service providers protect themselves from theft of customer data from hackers that break into their site and scan disks? Encryption of stored data is the straightforward solution, but not without challenges. Trade-offs need to be made regarding encryption techniques and the data granularity for encryption.
This first challenge was examined by Hacigümüs et al. [6]. It was found that hardware encryption is superior to software encryption. Encrypting data in bulk reduced the per-byte encryption cost significantly, exposing to startup overheads. Encrypting by row was found preferable to encrypting by field for queries from the TPC-H benchmark [14].
The second challenge is that of “total” data privacy, which is more complex since it includes protection from the database provider. The requirement is that encrypted data may not be decrypted at the provider site. A straightforward approach is to transmit the requisite encrypted tables from the server (at the provider site) to the client, decrypt the tables, and execute the query at the client. But, this approach mitigates almost every advantage of the service-provider model, since now primary data processing has to occur on client machines. It will become clear later, for a large number of queries such as selections, joins, and unions, much of the data processing can be done at the server, and the answers can be computed with little effort by the client.
What is needed in the art are servers, hosted by the service provider, that store encrypted databases, and clients that can access and decrypt the encrypted databases. Further, there is need for a certain amount of query processing to occur at the server without jeopardizing data privacy. While data privacy is paramount, there is also a need for adequate performance of any queries performed by the servers. The present invention satisfies these needs.
There is previous work in different research areas, some of which are related to the present invention. Search on encrypted data [2], where only keyword search is supported, and doing arithmetic over encrypted data [10] have been studied in the literature. However, functionalities provided by those are very limited and insufficient in executing complex SQL queries over encrypted data.