Encryption is a well-established technique for protecting sensitive data, such as confidential and personal financial or medical information, that may be stored in database systems. The data is often encrypted to prevent access by unauthorized persons or an untrusted system administrator, or to increase the security of client/server systems. See, for example, U.S. Pat. No. 6,148,342 and U.S. Patent Application Publications 2002/0104002A1 and 2002/0129260A1. However, once encrypted, the data can no longer be easily queried (aside from exact matches).
In their classic paper [24], Rivest, Adleman, and Dertouzos point out that the limit on manipulating encrypted data arises from the choice of encryption functions used, and that there exist encryption functions that permit encrypted data to be operated on directly for many sets of interesting operations. They call these special encryption functions “privacy homomorphisms”. The focus of [24] and the subsequent follow-up work [2, 5, 9, 10] has been on designing privacy homomorphisms to enable arithmetic operations on encrypted data. Comparison operations were excluded from this line of research, however: it was observed in [24] that there is no secure privacy homomorphism if both comparison operations and arithmetic operations are included.
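The idea of operating directly on encrypted data can be illustrated with a minimal sketch. It assumes unpadded ("textbook") RSA with toy parameters, which is multiplicatively homomorphic; the constants and the function names `encrypt` and `decrypt` are chosen here for illustration and are not taken from [24]. Such a scheme is not secure in practice.

```python
# Toy textbook RSA: E(a) * E(b) mod N decrypts to (a * b) mod N.
# All parameters below are illustrative assumptions, far too small to be secure.
P, Q = 61, 53
N = P * Q            # public modulus, 3233
E = 17               # public exponent
D = 2753             # private exponent: (E * D) % ((P - 1) * (Q - 1)) == 1


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N."""
    return pow(m, E, N)


def decrypt(c: int) -> int:
    """Decrypt a ciphertext produced by encrypt()."""
    return pow(c, D, N)


# The multiplication is carried out on ciphertexts only.
c_product = (encrypt(7) * encrypt(6)) % N
product = decrypt(c_product)   # equals (7 * 6) % N
```

The party that multiplies the two ciphertexts never sees 7, 6, or their product; only the holder of the private exponent can recover the result.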
Note that cryptography purists may object to this use of the term “encrypted”; they may define the term to mean that absolutely no information about the original data can be derived without decryption. In this application, the term “encrypted” generally refers to the results of mathematical efforts to balance the confidentiality of data against the ability to perform some computations on that data without first requiring decryption (which is typically a computationally expensive alternative). The data is perhaps “cloaked” or “disguised” more than “encrypted” would imply in a strict cryptographic sense.
Hacigumus et al. proposed a clever idea in [14] to index encrypted data in the context of a service-provider model for managing data. Tuples are stored encrypted on the server, which is assumed to be untrusted. For every attribute of a tuple, a bucket id is also stored that represents the partition to which the unencrypted value belongs. This bucket id is used for indexing. Before issuing a selection query to the server, the client transforms the query, using bucket ids in place of query constants. The result of the query is generally a superset of the answer, which is filtered by the client after decrypting the tuples returned by the server. Projection requires fetching complete tuples and then selecting the columns of interest in the client. Aggregation also requires decrypting the values in the client before applying the aggregation operation.
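The bucket-id approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [14]: the partition boundaries in `BOUNDS`, the `salary` attribute, and all function names are hypothetical, and `toy_encrypt` is an XOR placeholder standing in for a real cipher.

```python
import bisect
import json

BOUNDS = [20000, 40000, 60000, 80000]   # hypothetical partition boundaries
KEY = b"demo-key"                       # placeholder client-side secret


def bucket_id(value):
    """Map a plaintext value to the id of the partition that contains it."""
    return bisect.bisect_right(BOUNDS, value)


def _xor(data):
    # Placeholder transformation only; NOT a secure cipher.
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(data))


def toy_encrypt(record):
    return _xor(json.dumps(record).encode())


def toy_decrypt(blob):
    return json.loads(_xor(blob))


def store(records):
    """Client side: build the (encrypted tuple, bucket id) rows sent to the server."""
    return [(toy_encrypt(r), bucket_id(r["salary"])) for r in records]


def server_select(table, bucket_ids):
    """Server side: select by bucket id only; plaintext is never visible here."""
    return [blob for blob, bid in table if bid in bucket_ids]


def client_range_query(table, low, high):
    """Rewrite the range as bucket ids, then decrypt and filter the superset."""
    wanted = set(range(bucket_id(low), bucket_id(high) + 1))
    candidates = [toy_decrypt(blob) for blob in server_select(table, wanted)]
    return [r for r in candidates if low <= r["salary"] <= high]
```

With salaries 25000, 35000, 45000, and 65000, a query for the range 30000 to 44000 maps to buckets 1 and 2; the server returns three tuples, and the client discards the two false positives after decryption.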
Feigenbaum et al. propose a simple but effective scheme in [11] to encrypt a look-up directory consisting of (key, value) pairs. The goal is to allow the corresponding value to be retrieved if and only if a valid key is provided. The essential idea is to encrypt the tuples as in [14], but associate with every tuple the one-way hash value of its key. Thus, no tuple will be retrieved if an invalid key is presented. Answering range queries was not a goal of this system.
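The hashed-key directory described above might be sketched as follows. The sketch assumes SHA-256 as the one-way hash and, purely for illustration, encrypts each value with a keystream derived from its own look-up key; neither choice is taken from [11], and all function names are hypothetical. The XOR keystream is a placeholder, not a vetted cipher.

```python
import hashlib


def key_hash(key):
    """One-way hash stored alongside the encrypted tuple and used for look-up."""
    return hashlib.sha256(b"index:" + key.encode()).hexdigest()


def _keystream(key, length):
    # Illustrative keystream derived from the look-up key (an assumption).
    out = b""
    counter = 0
    while len(out) < length:
        block = b"stream:" + key.encode() + counter.to_bytes(4, "big")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key, value):
    data = value.encode()
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def toy_decrypt(key, blob):
    plain = bytes(a ^ b for a, b in zip(blob, _keystream(key, len(blob))))
    return plain.decode()


def build_directory(pairs):
    """Store only hash(key) -> encrypted value; plaintext keys are not kept."""
    return {key_hash(k): toy_encrypt(k, v) for k, v in pairs.items()}


def lookup(directory, key):
    """Return the value for a valid key, or None if the key is not present."""
    blob = directory.get(key_hash(key))
    return None if blob is None else toy_decrypt(key, blob)
```

Because the directory is indexed only by hash values, which carry no usable ordering of the original keys, this structure supports exact look-ups but not range queries.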
In [27], Song et al. propose novel schemes to support keyword searches over an encrypted text repository. The driving application for this work is the efficient retrieval of encrypted email messages. They do not discuss relational queries, and it is not clear how their techniques can be adapted for relational databases.
In [4], Bouganim et al. use a smart card with encryption and query processing capabilities to ensure the authorized and secure retrieval of encrypted data stored on untrusted servers. Encryption keys are maintained on the smart card. The smart card can translate exact match queries into equivalent queries over encrypted data. However, range queries require creating a disjunction for every possible value in the range, which is infeasible for data types such as strings and reals. The smart card implementation could benefit from an encryption scheme wherein range queries could be translated into equivalent queries over encrypted data.
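The cost of translating a range query into a disjunction of exact matches can be seen in a short sketch. It assumes, for illustration only, that exact-match translation uses a deterministic keyed token (HMAC-SHA-256 here); the key, the `_enc` column suffix, and the function names are hypothetical and are not taken from [4].

```python
import hashlib
import hmac

CARD_KEY = b"hypothetical-card-key"   # stands in for a key held on the smart card


def token(value):
    """Deterministic token: equal plaintexts always yield equal tokens."""
    return hmac.new(CARD_KEY, str(value).encode(), hashlib.sha256).hexdigest()


def translate_equals(column, value):
    """Rewrite column = value as a predicate over the encrypted column."""
    return f"{column}_enc = '{token(value)}'"


def translate_int_range(column, low, high):
    """Rewrite low <= column <= high as one equality term per integer in the range."""
    terms = (translate_equals(column, v) for v in range(low, high + 1))
    return " OR ".join(terms)
```

A range covering four integers already needs four terms, and the number of terms grows with the width of the range; for strings or reals the set of values in a range cannot be enumerated at all.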
In [29], Vingralek explores the security and tamper resistance of a database stored on a smart card. The author considers snooping attacks for secrecy, and spoofing, splicing, and replay attacks for tamper resistance. Retrieval performance is not the focus of this work, and it is not clear how many of the techniques apply to general-purpose databases not stored in specialized devices.
Among commercial database products, Oracle 8i allows values in any of the columns of a table to be encrypted [21]. However, the encrypted column can no longer participate in indexing as the encryption is not order-preserving.
Related work also includes research on order-preserving hashing [6, 12]. However, protecting the hash values from cryptanalysis is not the concern of this body of work. Similarly, the reconstruction of original values from the hash values is not required. One-way functions [30, 31] ensure that the original values cannot be recovered from the hash values.
A scheme for performing comparison operations directly on encrypted data without first performing a decryption of the data is therefore needed, and is provided by the invention described in the related application. That invention partitions plaintext data (e.g. column values) into a number of segments, then encrypts each plaintext value into a ciphertext in an order-preserving segmented manner. Comparison queries are then performed on the numerical values of the ciphertexts, and the query results are decrypted.
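The general idea of order-preserving segmented encryption can be illustrated with a deliberately simplified sketch. It is not the construction of the related application: it assumes non-negative integer plaintexts and uses a secret increasing linear map per segment, with hypothetical segment boundaries, scales, and offsets chosen so that ciphertext ranges of successive segments do not overlap. A linear map of this kind is far too weak for real use.

```python
import bisect

# Hypothetical secret key material (illustrative values only).
SEG_STARTS = [0, 100, 1000]    # plaintext lower bound of each segment
SEG_SCALE = [7, 3, 11]         # positive slope used inside each segment
SEG_OFFSET = [5, 900, 4000]    # ciphertext lower bound of each segment


def encrypt(p):
    """Map a non-negative integer to a ciphertext that preserves order."""
    i = bisect.bisect_right(SEG_STARTS, p) - 1
    return SEG_OFFSET[i] + SEG_SCALE[i] * (p - SEG_STARTS[i])


def decrypt(c):
    """Invert encrypt() for a ciphertext it produced."""
    i = bisect.bisect_right(SEG_OFFSET, c) - 1
    return SEG_STARTS[i] + (c - SEG_OFFSET[i]) // SEG_SCALE[i]


def server_range(ciphertexts, c_low, c_high):
    """Server side: compare numerical ciphertext values only."""
    return [c for c in ciphertexts if c_low <= c <= c_high]


def client_range_query(ciphertexts, low, high):
    """Client side: encrypt the query constants, then decrypt the exact result."""
    hits = server_range(ciphertexts, encrypt(low), encrypt(high))
    return [decrypt(c) for c in hits]
```

Because `encrypt` is strictly increasing, a comparison on ciphertexts gives the same outcome as the comparison on plaintexts, so the server returns exactly the matching rows rather than a superset.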
The present invention eliminates the distribution information available for encrypted data, thus strengthening the data protection.