A database is an integral part of almost every information system. The key features that databases offer are shared access, minimal redundancy, data consistency, data integrity and controlled access. Databases commonly hold critical and sensitive information, so an adequate level of protection for database content has to be provided.
Database security methods can be divided into four layers: physical security; operating system security; DBMS (Database Management System) security; and data encryption.
The first three layers alone are not sufficient to guarantee the security of the database since the database data is kept in a readable form. Anyone having access to the database including the DBA (Database Administrator) is able to read the data. In addition, the data is frequently backed up so access to the backed up data also needs to be controlled. Moreover, a distributed database system makes it harder to control disclosure of the data.
The secure transmission of data and user authentication have been well studied and incorporated into today's e-business market. Almost all Web browsers and servers support SSL (Secure Sockets Layer) or TLS (Transport Layer Security) so that, for example, a credit card number is protected on its way to the Web server. Vendors such as VeriSign® supply third-party authentication services. Before creating a secured channel, for example an SSL channel, Web browsers authenticate the destination address by verifying the authenticity of the Web server's certificate. However, once the data arrives securely at the certified server, support for storing and processing the data in a secure way is inadequate.
Security and privacy aspects of private data stored on a data storage server have recently become an interesting and challenging field of research. Encryption is a well established technology for protecting sensitive data. Anyone having access to the encrypted data cannot learn anything about the sensitive data without the encryption key. Furthermore, encryption can be used to maintain data integrity so that any unauthorized changes of the data can easily be detected.
There are three general approaches to integrating cryptography into databases:
The first approach is called “loose coupling”. In this approach, the server implements pre-defined cryptographic services installed on the database server. One example is an encryption package that is stored on the database server and encrypts newly inserted database content using the user-supplied encryption key.
The second approach is called “tight coupling”. In this approach, a new set of cryptographic services is added to the DB as new SQL statements, together with the necessary control and execution context that ensures that the new SQL queries are executed securely. This approach is harder to implement than the previous one, since changes have to be made in the core database software.
The third approach is a mixture of both approaches, where some changes are implemented as new SQL statements while most of the changes are integrated into the database as stored procedures built over the new set of SQL statements.
The three approaches described above consider encryption to be performed in the database server. Thus, the database server is assumed to be trusted.
Database Encryption Methods
Database encryption can be implemented at different levels: tables, columns, rows and cells. Encrypting the whole table, column or row entails the decryption of the whole table, column or row respectively when a query is executed. Therefore, an implementation which decrypts only the data of interest is preferred.
Several database encryption methods have been proposed. For example, a database encryption method presented in U.S. Pat. No. 4,375,579 (the basis of the article “A Database Encryption System with Subkeys” by Davida, G. I., Wells, D. L., and Kam, J. B.) is based on the Chinese Remainder Theorem, where each row is encrypted using different sub-keys for different cells. This method enables encryption at the level of rows and decryption at the level of cells. However, U.S. Pat. No. 4,375,579 has a number of significant disadvantages:
a. It relies on a specific encryption function and cannot be used with an arbitrary symmetric or asymmetric encryption function.
b. Each encrypted record is a single function of all of its field values, and each field is encrypted with a separate encryption key. In order to perform an update operation, all field values must be known. This means that a record can be changed only when all of the encryption keys are available. Updates can be performed only at secure periods when all of the encryption keys are accessible to the DBMS.
c. In order to perform management operations, such as adding or deleting a column, all of the encryption keys for that column have to be accessed and the values have to be decrypted (deleting or adding a column has an immediate effect on all of the fields in all of the records in the table).
d. It needs a special mechanism for updates, which can be performed only during secure periods. After each update, a row cannot be accessed until it is re-encrypted, since the selected values are not the updated values. In order to select specific fields, the entire record has to be retrieved in order to decrypt those fields.
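The row-level encryption and cell-level decryption described above can be illustrated with a toy sketch of the underlying Chinese Remainder Theorem arithmetic. This is a simplified illustration and not the patented scheme itself: the moduli in `SUBKEYS`, the function names and the sample row are assumptions, and the random padding that a practical scheme would need is omitted.

```python
from math import prod

# Hypothetical per-field sub-keys: distinct primes (hence pairwise coprime),
# each larger than any value stored in the corresponding field.
SUBKEYS = [10007, 10009, 10037]


def encrypt_row(fields, subkeys=SUBKEYS):
    """Combine all field values into one integer C with C mod d_i == f_i."""
    if len(fields) != len(subkeys):
        raise ValueError("one sub-key per field is required")
    modulus = prod(subkeys)
    total = 0
    for value, d in zip(fields, subkeys):
        if not 0 <= value < d:
            raise ValueError("field value must be smaller than its sub-key")
        rest = modulus // d
        # rest * inverse(rest mod d) is 1 modulo d and 0 modulo the other sub-keys.
        total += value * rest * pow(rest, -1, d)
    return total % modulus


def decrypt_field(ciphertext, index, subkeys=SUBKEYS):
    """Recover a single field using only that field's sub-key."""
    return ciphertext % subkeys[index]


row = [1234, 42, 9999]  # hypothetical numeric field values
c = encrypt_row(row)
recovered = [decrypt_field(c, i) for i in range(len(row))]
```

In this sketch, reading one field needs only that field's sub-key, whereas building the row ciphertext in `encrypt_row` involves every sub-key of the row.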
Another database encryption method presented in “Multilevel Secure Database Encryption with Subkeys” by Min-Shiang, H., and Wei-Pang, Y. extends the encryption method presented in U.S. Pat. No. 4,375,579 by supporting multilayer access control. It classifies subjects and objects into distinct security classes which are ordered in a hierarchy such that an object with a particular security class can be accessed only by subjects in the same or a higher security class. In this method, each row is encrypted with sub-keys according to the security class of its cells. Still another database encryption method presented in “A Cryptographic Mechanism for Sharing Databases” by Buehrer, D., and Chang, C. proposes an encryption method for a database based on Newton's interpolating polynomials. One disadvantage of all the above methods is that the basic element in the database is a row and not a cell, thus the structure of the database is modified. In addition, all of those methods require re-encrypting the entire row when a cell value is modified.
A further database encryption method presented in “A Database Record Encryption Scheme Using RSA Public Key Cryptosystem and Its Master Keys” by Chang, C. C., and Chan, C. W. is based on the RSA public-key method and suggests two database encryption methods: one field oriented and the other record oriented. Both of the suggested methods support a distinction between write and read access rights. The disadvantage of the field oriented encryption method is that it is not resistant to substitution attacks, in which two encrypted cells are swapped. The disadvantage of the record oriented method is similar to that of the record oriented encryption methods discussed above. A still further encryption method provided in “Practical Techniques for Searches on Encrypted Data” by Song, D. X., Wagner, D., and Perrig, A. suggests computing the bitwise exclusive or (XOR) of the plaintext values with a sequence of pseudo-random bits generated by the client according to the plaintext value and a secure encryption key. This method supports searches over the encrypted data without revealing anything about the plaintext values except the locations of the searched plaintext. However, the proposed method does not protect against attacks that substitute two encrypted values in the database, and it requires query translation since the pseudo-random bits for a searched value need to be computed by the client.
Still a further encryption method presented in “GBDE-GEOM Based Disk Encryption Source” by Kamp, P. H. suggests encrypting the entire physical disk allowing the database to be protected. One of the disadvantages of that method is that the DBA can perform no administrative tasks on the database, since the entire content of the database is encrypted.
Therefore, it is an object of the present invention, to provide a simple and efficient method and system for database encryption, overcoming the shortcomings of the prior art database encryption methods.
It is another object of the present invention, to suggest how to encrypt the entire content of the database without changing its structure.
It is still another object of the present invention, to allow the DBA to continue managing the database without being able to view or manipulate the database content.
It is still another object of the present invention, to provide a method and system for database encryption, wherein anyone gaining access to the database cannot learn anything about its content, or tamper with the data unnoticed, without the encryption key.
It is a further object of the present invention to provide a method and system decrypting only the data of interest.
It is still a further object of the present invention to provide a method and system for database encryption, wherein the structure of the database tables and indexes remains as before encryption.
It is still a further object of the present invention to provide a method and system for database encryption, wherein queries are not changed because of the encryption.
It is still a further object of the present invention to provide a method and system for database encryption, ensuring that existing applications can use the encrypted database without the need for any changes in the application software.
It is still a further object of the present invention to provide a method and system for secure database indexing, protecting against information leakage and unauthorized modifications.
It is still a further object of the present invention to provide a method and system for secure database indexing supporting discretionary access control in a multi-user environment.
Other objects and advantages of the invention will become apparent as the description proceeds.
Indexing Encrypted Databases
The conventional way to provide an efficient execution of database queries is using indexes. Indexes in an encrypted database raise the question of how to construct the index so that no information about the database content is revealed.
Increasingly, organizations and users prefer to outsource their data center operations to external application providers. As a consequence of this trend toward outsourcing, highly sensitive data is now stored on systems that are not under the data owner's control. While data owners may not entirely trust providers' discretion, preventing a provider from inspecting data stored on its own machines is difficult. For this kind of service to work successfully, it is of primary importance to provide means of protecting the secrecy of the remotely stored information while guaranteeing its availability to legitimate clients.
Communication between the client and the database service provider can be secured through standard means of encryption protocols such as SSL (Secure Socket Layer). With regard to the stored data security, access control has proved to be useful, provided that data is accessed using the intended system interfaces. However, access control is useless if the attacker simply gains access to the raw database data, thus bypassing the traditional mechanisms. This kind of access can easily be gained by insiders, such as the system administrator and the database administrator (DBA).
Database encryption introduces an additional layer to conventional network and application security solutions, and prevents exposure of sensitive information even if the raw data is compromised. Database encryption prevents unauthorized users from viewing sensitive data in the database, and it allows database administrators to perform their tasks without having access to sensitive information. Furthermore, it protects data integrity, as unauthorized modifications can easily be detected.
A common technique to speed up query execution in databases is to use a pre-computed index, as described in “Database Management Systems” by Ramakrishnan, R. and Gehrke, J. However, once the data is encrypted, the use of standard indexes is not trivial and depends on the encryption function used. Most encryption functions preserve equality; thus hash indexes can be used, but information such as the frequencies of the indexed values is revealed. Most encryption functions do not preserve order; thus B-Tree indexes can no longer be used once the data is encrypted.
Furthermore, if several users with different access rights use the same index, each one of them needs access to the entire index, possibly including indexed elements that are beyond his access rights. Google™ Desktop, as an example of this problem, allows indexing and searching the data of a personal computer. Using this tool, a legitimate user is able to bypass user names and passwords and view personal data of other users who use the same computer, since it is stored in the same index.
Indexes are mostly structured as trees, which can reveal the order of the indexed nodes (by browsing the ordered leaves). This information can be exploited to estimate the value of a particular encrypted node, since the relative position of the encrypted node within the ordered set of nodes can imply the plaintext value of this node. In addition, the references to the positions of a particular indexed value may allow various statistical attacks on the indexed values. Even if the references to the indexed values are secured, a change to the index after an insert to the database provides the potential attacker with valuable information (an attacker could correlate the new value inserted to the index with the new value inserted to the database and thus reveal the reference for that value).
Several methods for encrypted indexing have been proposed in the past. For example, an indexing method provided in “Executing SQL Over Encrypted Data in the Database-Service-Provider Model” by Hacigumus, H., Iyer, B., Li, C., and Mehrotra, S. is based on encrypting the whole database row and assigning a set identifier to each value in this row. When searching a specific value, its set identifier is calculated and then passed to the server who in turn returns to the client a collection of all rows with values assigned to the same set. Finally, the client searches the specific value in the returned collection and retrieves the desired rows. In this method, equal values are always assigned to the same set, thus some information is revealed when applying statistical attacks. Using this approach requires more computation by the client since the result of the queries is not accurate. Furthermore, the sizes of the buckets assigned to the same set are also a matter to be considered.
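The set-identifier approach described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the keyed-hash bucketing in `bucket_id`, the toy stream cipher, `NUM_BUCKETS`, and all sample data are hypothetical, and the toy cipher is a stand-in that must not be used to protect real data.

```python
import hashlib
import hmac
import json
import os

NUM_BUCKETS = 4  # hypothetical; fewer sets leak less but return more false hits


def bucket_id(key, value):
    """Map a plaintext value to a set identifier (keyed and deterministic)."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


def _keystream(key, nonce, length):
    out, counter = b"", 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key, plaintext):
    """Toy randomized cipher (stand-in only): random nonce + XOR keystream."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(a ^ b for a, b in zip(plaintext, stream))


def toy_decrypt(key, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def client_store(key, rows, column):
    """Client side: encrypt each whole row and tag it with a set identifier."""
    return [(bucket_id(key, row[column]),
             toy_encrypt(key, json.dumps(row).encode("utf-8")))
            for row in rows]


def server_select(table, wanted):
    """Server side: sees only set identifiers and opaque encrypted rows."""
    return [blob for ident, blob in table if ident == wanted]


def client_query(key, table, column, value):
    """Client side: fetch the whole set, decrypt it, drop the false hits."""
    candidates = server_select(table, bucket_id(key, value))
    rows = [json.loads(toy_decrypt(key, blob)) for blob in candidates]
    return [row for row in rows if row[column] == value]


key = b"demo key (hypothetical)"
rows = [{"name": n, "dept": d} for n, d in
        [("ann", "hr"), ("bob", "it"), ("cy", "it"), ("dee", "ops"), ("eli", "law")]]
table = client_store(key, rows, "dept")
result = client_query(key, table, "dept", "it")
```

The extra client-side filtering in `client_query` corresponds to the inaccurate query results mentioned above, and because equal values always receive the same identifier, the server can still count how many rows fall into each set.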
Another indexing method provided in “A Framework for Efficient Storage Security in RDBMS” by Iyer, B., Mehrotra, S., Mykletun, E., Tsudic, G., and Wu, Y. is based on constructing the index on the plaintext values and encrypting each page separately. Whenever a specific page of the index is needed for processing a query, it is loaded into memory and decrypted.
Since the uniform encryption of all pages is likely to provide many cipher breaking clues, still another indexing method provided in “Chip-secured data access: Confidential Data on Untrusted Servers” by Bouganim, L., and Pucheral, P. suggests encrypting each index page using a different key depending on the page number.
However, the above methods described in “A Framework for Efficient Storage Security in RDBMS” by Iyer, B., Mehrotra, S., Mykletun, E., Tsudic, G., and Wu, Y., and “Chip-secured data access: Confidential Data on Untrusted Servers” by Bouganim, L., and Pucheral, P. implemented at the level of the operating system are not satisfactory since in most cases it is not possible to modify the operating system implementation. Furthermore, in these methods, it is not possible to encrypt different portions of the database using different keys.
A further indexing method suggested by Boneh, D., Crescenzo, G. D., Ostrovsky, R., and Persiano, G. in “Public Key Encryption with Keyword Search” constructs a mechanism that enables the server to search for pre-defined keywords within a document using a special “trapdoor” supplied by the user for that keyword. Apart from the keyword, the method reveals nothing about the document. However, the above method does not support range queries, and query translation has to be performed since the client has to compute the “trapdoor” for each keyword searched.
The major drawback of the last two methods is that there is no support for indexes structured as trees, since the server can only perform exact matches to the user's query and thus lacks the ability to evaluate the relation between two tree nodes in the index.
Assuming the index is implemented as a B+-Tree, encrypting each of its fields separately would reveal the ordering relationship between the encrypted values.
Still a further indexing method suggested in “Order Preserving Encryption for Numeric Data” by Agrawal, R., Kiernan, J., Srikant, R., and Xu, Y. builds the index over the data encrypted using an encryption method called OPES (Order Preserving Encryption Scheme). OPES allows comparison operations to be applied directly to the encrypted data. However, revealing the order of the encrypted values is not acceptable for any application.
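The consequence of preserving order can be seen with a toy order-preserving encoding. The sketch below is not OPES; it is a deliberately simple, hypothetical construction (a running sum of keyed pseudo-random positive gaps) that is only practical for small non-negative integers, and the key and sample values are assumptions.

```python
import hashlib
import hmac


def _gap(key, i):
    """Keyed pseudo-random positive gap for position i."""
    digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return 1 + int.from_bytes(digest[:2], "big")


def toy_ope_encrypt(key, value):
    """Strictly increasing in value: a sum of positive gaps up to value."""
    if value < 0:
        raise ValueError("only non-negative integers are supported")
    return sum(_gap(key, i) for i in range(value + 1))


key = b"demo key (hypothetical)"
salaries = [52, 17, 88, 40]  # hypothetical plaintext column
ciphertexts = [toy_ope_encrypt(key, v) for v in salaries]

# An observer without the key can still rank the rows by ciphertext ...
rank_by_cipher = sorted(range(len(salaries)), key=lambda i: ciphertexts[i])
# ... and the ranking is identical to the ranking of the plaintext values.
rank_by_plain = sorted(range(len(salaries)), key=lambda i: salaries[i])
```

Comparison operations work directly on `ciphertexts`, which is what makes range queries possible, but the same property hands the relative order of all values to anyone who can read the index.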
Still a further indexing method provided in “Balancing Confidentiality and Efficiency in Untrusted Relational DBMSs” by Damiani, E., De Captiani Divimercati, S., Jajodia, S., Paraboschi, S., and Samarati, P. suggests encrypting each node of the B+-Tree as a whole. However, since references between the B+-Tree nodes are encrypted together with the index values, the index structure is concealed, and therefore the DBA finds the index unmanageable.
The Attacker Model
Attackers can be categorized into three classes:
Intruder: a person who gains access to a computer system and tries to extract valuable information.
Insider: a person who belongs to the group of trusted users and tries to get information beyond his own access rights.
Administrator: a person who has privileges to administer a computer system, but uses his administration rights in order to extract valuable information.
All of the above attackers can use different attack strategies:
Direct storage attacks: attacks against storage may be performed by accessing database files following a path other than through the database software, by physical removal of the storage media or by access to the database backup disks.
Indirect storage attacks: an adversary can access schema information, such as table and column names, metadata, such as column statistics, and values written to recovery logs in order to guess data distributions.
Memory attacks: an adversary can access the memory of the database software directly (memory is usually protected at the hardware/operating system level).
When selecting the right approach for indexing encrypted databases, the following aspects should be considered:
a. Information Leakage: a secure index in an encrypted database should not reveal any information on the database plaintext values. The possible information leaks are:
Static leakage: gaining information on the database plaintext values by observing a snapshot of the database at a certain time. For example, if the index is encrypted in a way that equal plaintext values are encrypted to equal ciphertext values, statistics about the plaintext values, such as their frequencies, can easily be learned.
Linkage leakage: gaining information on the database plaintext values by linking a database value to its position in the index. For example, if the database value and the index value are encrypted in the same way (both ciphertext values are equal), an observer can search for the database ciphertext value in the index, determine its position and estimate its plaintext value.
Dynamic leakage: gaining information about the database plaintext values by observing and analyzing the changes performed in the database over a period of time. For example, if a user monitors the index for a period of time, and if in this period of time only one value is inserted (no values are updated or deleted), the observer can estimate its plaintext value based on its position in the index.
b. Unauthorized Modification: in addition to the passive attacks that monitor the index, active attacks that modify the index should also be considered. Active attacks are more problematic, in the sense that they may mislead the user. For example, modifying index references to the database rows may result in queries returning an erroneous set of rows, possibly benefiting the adversary. Unauthorized modifications can be made in several ways:
Spoofing: replacing a ciphertext value with a generated value;
Splicing: replacing a ciphertext value with a different ciphertext value;
Replay: replacing a ciphertext value with an old version previously updated or deleted.
c. Structure Preservation: when applying encryption to an existing database, it would be desirable that the structure of the database tables and indexes is not modified during the encryption. This ensures that the database tables and indexes can be managed in their encrypted form by a database administrator as usual, while keeping the database contents hidden. For example, if a hash index is used and the values therein are not distributed equally, performance might be undermined, and the DBA might wish to replace the hash function. In such a case, the DBA needs to know structure information, such as the number of values in each list, but does not need to know the values themselves.
d. Performance: indexes are used in order to speed up query execution. However, in most cases, using encrypted indexes causes performance degradation due to the overhead of decryption. Indexes in an encrypted database raise the question of how to construct the index so that no information about the database content is revealed, while performance in terms of time and storage is not significantly affected.
Discretionary Access Control (DAC)
In a multi-user (discretionary) database environment each user only needs access to the database objects (e.g., group of cells, rows and columns) needed to perform his job. Encrypting the whole database using the same key, even if access control mechanisms are used, is not enough. For example, an insider who has the encryption key and bypasses the access control mechanism can access data that are beyond his security group. Encrypting objects from different security groups using different keys ensures that a user who owns a specific key can decrypt only those objects within his security group. Following this approach, different portions of the same database column might be encrypted using different keys. However, a fundamental problem arises when an index is used for that column. In this case each one of the users, who belong to different security groups using different keys, needs access to the entire index, possibly to indexed elements, which are beyond their access rights. The same problem arises when the index is updated.
Key Management in Database Encryption Methods
Databases contain information of different levels of sensitivity that has to be selectively shared between large numbers of users. Encrypting each column with a different key results in a large number of keys for each legitimate user. However, the approach proposed in “Secure and Selective Dissemination of XML Documents” by Bertino, E., and Ferrari, E. can reduce the number of keys. It shows how to find the smallest elements that can be encrypted using the same key according to the access control policy. Thus, the keys are generated according to the access control policy in order to keep their number minimal. This approach can be incorporated in the proposed method to encrypt sets of columns with the same key in accordance with the database access control policy. The dynamic nature of encrypted databases adds complexity and special requirements to the key management process. However, “Secure and Selective Dissemination of XML Documents” by Bertino, E., and Ferrari, E. does not deal with the problems of database encryption.
Key management in encrypted databases can be performed at five different levels:
a. keys can be created on a database level; this implies that the whole database is encrypted using the same key, thus, users gaining access to the encryption key can access the whole database;
b. keys can be created on a table level; each table will be encrypted using (possibly) a different key, and a user gaining access to one of the encryption keys can access all tables encrypted using that key;
c. keys can be created in vertical-partitions-levels; in this case, each row can be encrypted using a different key;
d. keys can be created on a column level; this enables each column to be encrypted using a different key; and
e. keys can be created on a cell level; this enables maximal freedom when enforcing the access control policy by encryption but introduces difficulties when managing key updates, data manipulations and changes to the access control policy.
There are three different approaches to the storage of encryption keys:
a. Storing the encryption keys at the server side: the server has full access to the encryption keys. All computation is performed at the server side.
b. Storing the encryption keys at the client side: the client never transfers the keys to the server and is responsible for performing all encryption and decryption operations. Where the database server has no access to the encryption keys, no computations can be performed at the server side since they entail revealing the database values.
c. Keys per session: the database server has full access to the encryption keys during the session but does not store them on disk. This ensures that the user transaction can be performed entirely at the server side, during the session. However, since the keys are never kept in the database server after a session terminates, an attacker cannot learn anything about the database values as he has no access to the encryption keys.
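The trade-off between the levels listed above can be made concrete by counting the largest number of distinct keys each level may require. The sketch below is plain arithmetic under assumptions: the level names, the `max_keys` helper and the sample schema are hypothetical and only illustrate why finer levels complicate key management.

```python
def max_keys(level, tables):
    """Upper bound on distinct keys; tables maps name -> (rows, columns)."""
    if level == "database":
        return 1
    if level == "table":
        return len(tables)
    if level == "row":
        return sum(rows for rows, _ in tables.values())
    if level == "column":
        return sum(cols for _, cols in tables.values())
    if level == "cell":
        return sum(rows * cols for rows, cols in tables.values())
    raise ValueError(f"unknown level: {level}")


# Hypothetical schema: table name -> (row count, column count).
schema = {"customers": (1000, 5), "orders": (20000, 8)}
counts = {level: max_keys(level, schema)
          for level in ("database", "table", "row", "column", "cell")}
```

For this sample schema the bound grows from a single key at the database level to 165,000 keys at the cell level, which is where the difficulties with key updates and policy changes come from.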
If the database server (e.g., database service provider) is not trusted, it is preferred that the database server would not be able to learn anything about the stored data, and thus the keys are kept only at the client side. In cases when the database server is fully trusted, except for its physical storage (e.g., external storage provider, backup tapes stored in an untrusted location), the keys can be stored at the server side in some protected region.
The Desired Properties of a Database Encryption Method
According to “A Database Encryption System with Subkeys” by Davida, G. I., Wells, D. L., and Kam, J. B. a database encryption method should meet the following requirements:
security: it is mandatory that the encryption method should be either theoretically or computationally secure (require a high work factor to break it) as it is the only guarantee for data security especially in cases where the database is stored in an untrusted site;
performance: encryption and decryption should be fast enough so as not to degrade system performance (not affect the complexity of the database operations);
data volume: the encrypted data should not have a significantly greater volume than the unencrypted data; the space complexity of the database storage before and after applying the encryption method should remain the same;
decryption granularity: in order to support efficient random access, the encryption method should support the decryption of single database records without the need to access other records; moreover, database records should be independent of other records since the DBMS may rearrange records at any given time (e.g., sort table files for matters of performance, solve fragmentation problems);
encrypting different columns under different keys: this should be supported; different users have different access rights and the encryption method should support the enforcement of access rights using encryption;
pattern matching and substitution attacks: the encryption method should protect against attacks that use pattern matching and substitution of encrypted values; any unauthorized substitution should be detected at decryption time;
unauthorized access detection: data modified by an unauthorized user should be noticed at decryption time; and
maintain database structure: the security mechanism should be flexible and not entail any change in the structure of the database. The structure of the database refers to two main aspects: (a) the internal database files and algorithms representing the implementation of the DBMS, (b) the SQL queries together with all the interface commands used in order to manipulate and retrieve data. Preferably, applying the new encryption method should not entail any changes to the internal representation or implementation of the database or change the way the user interacts with the DBMS.
A naive approach for database encryption is to encrypt each cell separately. This approach has several drawbacks.
First, two equal plaintext values are encrypted to equal ciphertext values. Therefore, it is possible, for example, to collect statistical information as to how many different values a specified column currently has. The same holds for the ability to execute a join operation between two tables and collect information from the results.
Second, it is possible to switch two ciphertext values without being noticed. Different ciphertext values for equal plaintext values can be achieved using a polyalphabetic cipher, for example the Vernam cipher. However, in this solution the decryption of a record depends on other records, and thus the decryption granularity requirement described above is violated.
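Both drawbacks of the naive approach can be reproduced in a few lines. The sketch below uses a toy deterministic cipher (XOR with a keystream derived from the key alone) purely as a stand-in for "each cell encrypted separately with the same key"; the cipher, the key and the sample column are assumptions, and the cipher is insecure by construction.

```python
import hashlib
import hmac
from collections import Counter


def _keystream(key, length):
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_cell_encrypt(key, plaintext):
    """Toy deterministic cell cipher: the same input always gives the same output."""
    data = plaintext.encode("utf-8")
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def toy_cell_decrypt(key, ciphertext):
    stream = _keystream(key, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream)).decode("utf-8")


key = b"demo key (hypothetical)"
column = ["diabetes", "flu", "flu", "asthma", "flu"]  # hypothetical cell values
encrypted = [toy_cell_encrypt(key, v) for v in column]

# Drawback 1: without the key, an observer can count distinct values and
# their frequencies directly from the ciphertexts.
frequencies = Counter(encrypted)
distinct_values = len(frequencies)
highest_frequency = frequencies.most_common(1)[0][1]

# Drawback 2: swapping two ciphertexts goes unnoticed; decryption succeeds
# and simply returns the values in the wrong cells.
tampered = list(encrypted)
tampered[0], tampered[3] = tampered[3], tampered[0]
decrypted_after_swap = [toy_cell_decrypt(key, c) for c in tampered]
```

Nothing in `toy_cell_decrypt` ties a ciphertext to the cell it was written to, which is why the swap cannot be detected at decryption time.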
Encryption Granularity
Table/index encryption can be performed at various levels of granularity: single values, records/nodes, pages or whole table/index. When choosing the level of granularity, the following should be considered:
a. Information Leakage: the higher the level of encryption granularity, the less information is revealed. Single values level encryption of the table/index reveals sensitive information, such as frequencies of the table/index values. Whole table/index level encryption ensures that information about the data cannot be leaked, since it is encrypted as one unit.
b. Unauthorized Modifications: encryption at higher levels of granularity makes it harder for the attacker to tamper with the data. Single values level encryption of the table/index allows an attacker to switch two ciphertext values without being noticed. Whole table/index level encryption implies that a minor modification to the encrypted table/index has a major effect on the plaintext table/index and can easily be detected.
c. Structure Preservation: higher levels of encryption granularity conceal the table/index structure. Whole table/index level encryption changes the structure of the index, since the basic element of reference is changed from a single value to the entire table/index. Single values level encryption of the table/index preserves its structure.
d. Performance: finer encryption granularity affords more flexibility in allowing the server to choose what data to encrypt or decrypt. Whole table/index level encryption requires the whole table/index to be decrypted, even if only a small number of table/index nodes are involved in the query. Single values level encryption of the table/index enables decryption of values of interest only.
Better performance and preservation of the database structure cannot be achieved using pages or whole table/index encryption granularity. However, special techniques can be used in order to cope with unauthorized modifications and information leakage when encryption at the single values or records/nodes level of granularity is used.
Hereinafter, it is assumed that the encryption keys are kept per session and that the table and index are encrypted at the single values level of granularity.