A database is an integral part of almost every information system. The key features that databases offer are shared access, minimal redundancy, data consistency, data integrity and controlled access. Databases commonly hold critical and sensitive information, so an adequate level of protection must be provided for database content.
Businesses and organizations must continually evolve and learn to manage information correctly, in order to realize their objectives and survive in the digital era. Indeed, the very survival of these organizations may depend on securing this information.
To illustrate the scope of the problem, consider a client-server scenario in which the client has a combination of sensitive and non-sensitive data stored in a database on the server side. In such a scenario there are three major vulnerabilities with respect to client data: (1) Data-in-motion: assuming that the client and server are not co-located, it is vital to secure the communications between them, which can be accomplished via a standard SSL or VPN connection; (2) Data-in-use: an adversary can directly access the memory of the database server and extract sensitive information; and (3) Data-at-rest: all data stored in the database server, excluding data that is traversing a network or temporarily residing in the server's memory.
Typically, database management systems (DBMSs) protect stored data through access control mechanisms. However, an access control mechanism by itself is insufficient to protect stored data since it can be bypassed in a number of ways: by accessing database files following a path other than through the DBMS (e.g., an intruder who infiltrates the information system and tries to mine the database footprint on disk); by physical removal of the storage media; or by accessing the database backup files.
Another source of threats is that many databases today are outsourced to Database Service Providers (DSPs). In such instances, data owners have no choice but to trust DSPs who claim that their systems are fully secured and their employees are beyond suspicion, assertions frequently contradicted by facts. Finally, a database administrator (DBA) may have sufficient privileges to tamper with the access control definition.
An old and important principle called defense in depth involves multiple layers of security control such that attackers must get through layer after layer of defense. In this context, encryption, which can complement and reinforce access control, has recently received much attention from the database community. The purpose of database encryption is to ensure database opacity by keeping the information hidden to any unauthorized person (e.g., intruder).
Even if attackers get through the firewall and bypass access control mechanisms, they still need the encryption keys to decrypt data. Encryption can provide strong security for data at rest, but developing a database encryption solution must take many factors into consideration.
A database encryption scheme should fulfill several requirements: protecting data confidentiality, detecting unauthorized modifications and maintaining high performance.
The Attacker Model
Attackers can be categorized into three classes: intruders, insiders and administrators. An intruder is a person who gains access to a computer system and tries to extract valuable information. An insider is a person who belongs to the group of trusted users and tries to get information that he is not authorized to access. An administrator is a person who has privileges for administering a computer system, but abuses his rights in order to extract valuable information. In many cases, a DBA has access to the whole database, and protecting it against him while still enabling him to perform his tasks is a tremendous challenge.
All of the above attackers may use different attack strategies. Direct Storage Attacks are attacks against storage, carried out by accessing database files by means other than the database software, for example by physically removing the storage media or by accessing the database backup files.
In Indirect Storage Attacks, the adversary can access schema information and metadata, such as table and column names, column statistics and values written to recovery logs, in order to estimate data distributions. In Memory Attacks, the adversary may access the memory of the database software directly. In many cases, the memory contains the database cache which holds large amounts of the database for optimization reasons.
A secure database should not reveal any information about the database plaintext values to unauthorized users. This requirement can be extended and the different types of passive attacks can be categorized as follows: (a) Static Leakage: gaining information about the database plaintext values by observing a snapshot of the database at a certain time. For example, if the values in a table are encrypted in such a way that equal plaintext values are encrypted to equal ciphertext values, statistics about the plaintext values, such as their frequencies, can easily be collected from the encrypted values. (b) Linkage Leakage: gaining information about the database plaintext values by linking a database value to its position in the index. For example, if the database value and the index value are encrypted the same way (both ciphertext values are equal), an observer can search for the database ciphertext value in the index, determine its position and estimate its plaintext value; and (c) Dynamic Leakage: gaining information about the database plaintext values by observing and analyzing access patterns and changes in the database over a period of time. For example, if an observer monitors the index for a period of time during which only one value is inserted (and no values are updated or deleted), the observer can estimate the plaintext value from its new position in the index.
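The static leakage case in (a) can be sketched in a few lines. The example below is an illustration only: it uses a keyed hash (HMAC-SHA256) as a stand-in for any deterministic encryption scheme, and the key, the function names and the sample salary column are assumptions made for the demonstration, not part of any scheme described here.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo-key-not-for-production"  # hypothetical key, for illustration only


def deterministic_token(plaintext: str) -> str:
    """Stand-in for deterministic encryption: equal inputs give equal outputs."""
    return hmac.new(KEY, plaintext.encode(), hashlib.sha256).hexdigest()[:16]


def ciphertext_frequencies(column: list[str]) -> list[int]:
    """Frequencies an observer can count from one snapshot, highest first."""
    counts = Counter(deterministic_token(value) for value in column)
    return sorted(counts.values(), reverse=True)


# Hypothetical salary column; the observer never sees these plaintexts.
salaries = ["50000", "50000", "50000", "72000", "72000", "120000"]
print(ciphertext_frequencies(salaries))  # [3, 2, 1]
```

The observer learns the frequency histogram of the column without any key, which is exactly the statistic the static leakage definition refers to.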
In addition to the passive attacks, in which data is compromised as a result of observations, there are also different types of active attacks, which modify the database, as follows: (a) Spoofing: replacing a ciphertext value with a generated value. Assuming that the encryption keys were not compromised, this attack is rarely practical. (b) Splicing: replacing a ciphertext value with a different ciphertext value. In this type of attack, the encrypted content from a different location is copied to a new location. For example, if the encrypted maximal salary value is revealed through a leakage attack, swapping it with the attacker's encrypted salary will produce a valid value as his new salary; and (c) Replay: replacing a ciphertext value with an old version that was previously updated or deleted.
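The splicing attack in (b) can be illustrated with a small sketch. It compares an integrity tag computed over the ciphertext alone with a tag that also covers the cell's address (table, row, column). Binding the tag to the address is one common countermeasure and is shown here only as an assumption for illustration; the key, the addresses and the opaque ciphertext bytes are all hypothetical.

```python
import hashlib
import hmac
from typing import Optional

MAC_KEY = b"demo-mac-key-not-for-production"  # hypothetical key

Address = tuple[str, int, str]  # (table, row id, column)


def make_tag(ciphertext: bytes, address: Optional[Address]) -> bytes:
    """Integrity tag over the ciphertext, optionally bound to its location."""
    bound = repr(address).encode() if address is not None else b""
    return hmac.new(MAC_KEY, bound + b"|" + ciphertext, hashlib.sha256).digest()


def splice_is_detected(bind_to_address: bool) -> bool:
    """Copy the highest-paid employee's cell over the attacker's cell and verify it."""
    top_earner: Address = ("employees", 1, "salary")
    attacker: Address = ("employees", 7, "salary")
    # Opaque ciphertexts: the attacker never needs to decrypt them.
    cells = {top_earner: b"\x8f\x11\xc4\x02", attacker: b"\x03\x9a\x55\xe0"}

    def location(address: Address) -> Optional[Address]:
        return address if bind_to_address else None

    store = {a: (ct, make_tag(ct, location(a))) for a, ct in cells.items()}
    store[attacker] = store[top_earner]  # the splice: ciphertext and tag are copied

    ciphertext, tag = store[attacker]
    valid = hmac.compare_digest(tag, make_tag(ciphertext, location(attacker)))
    return not valid


print(splice_is_detected(bind_to_address=False))  # False: the splice goes unnoticed
print(splice_is_detected(bind_to_address=True))   # True: the tag no longer verifies
```

A replay attack would still pass this check, because an old value from the same cell carries a tag for the same address; detecting replay requires some notion of version or freshness.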
An important aspect of data security relates to the support of multi-user access control in an encrypted database environment, where each user can access (decrypt) only the database objects (e.g., groups of cells, rows and columns) to which access was granted. Hereinafter this property is referred to as Cryptographic Access Control. Encrypting the entire database using the same key, even if traditional access control mechanisms are used, will not provide a sufficient level of security. Encrypting objects from different security groups using different keys ensures that a user who owns a specific key can decrypt only those objects within his security group. Another important issue relating to the encryption keys is their management: where and how the encryption keys should be stored; how they are distributed to the users; and how to recover the encryption keys in case they are lost.
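The idea of one key per security group can be sketched as follows. This is a toy illustration under stated assumptions: the per-group keys are derived from a hypothetical master key with HMAC, and the cipher is a minimal hash-based stream construction written only so that the example runs with the standard library. It is not a vetted encryption scheme, and a real system would use an established authenticated cipher and a proper key manager.

```python
import hashlib
import hmac
import os

MASTER_KEY = b"demo-master-key"  # hypothetical; real systems use a key manager


def group_key(group: str) -> bytes:
    """Derive a distinct key for each security group from the master key."""
    return hmac.new(MASTER_KEY, group.encode(), hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a block counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Return nonce + masked body + tag (toy construction, illustration only)."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Recover the plaintext, or refuse if the key belongs to another group."""
    nonce, body, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise PermissionError("key does not match this object's security group")
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


hr_cell = encrypt(group_key("hr"), b"salary=120000")
print(decrypt(group_key("hr"), hr_cell))  # b'salary=120000'
try:
    decrypt(group_key("sales"), hr_cell)
except PermissionError as exc:
    print("denied:", exc)
```

Deriving group keys from a single master key is only one possible answer to the key management questions raised above; it simplifies storage and recovery but concentrates risk in the master key.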
Performance
Security mechanisms typically introduce significant computational overhead. This overhead may constitute a fundamental problem for the DBMS, since the performance of the DBMS has a direct influence on the performance of the whole information system. When trying to minimize the performance overhead that results from encrypting the database, the following issues should be considered: (a) Selective encryption: it would be desirable to encrypt only sensitive data while keeping non-sensitive data unencrypted. Furthermore, only relevant data should be encrypted/decrypted when executing a query. For example, if only one attribute participates in a query, it would be unnecessary to encrypt/decrypt the entire record.
(b) Indexes and other DBMS optimization mechanisms: encrypting the database content may prevent some crucial DBMS optimization mechanisms from functioning properly. For example, some DBMS vendors do not permit building indexes on encrypted columns, while others allow it based on the column's encrypted values, provided they are not salted (in cryptography, salting means that a random value is concatenated to the plaintext value before encryption). The latter approach loses one of the most useful capabilities of an index, the range search, since a typical encryption algorithm is not order-preserving.
(c) Encryption overhead: it is desirable that the time spent on encrypting/decrypting data is minimized. For example, encrypting a given amount of data using a single encryption operation is more efficient than encrypting it in parts using several encryption operations, as described in B. Iyer, S. Mehrotra, E. Mykletun, G. Tsudik, Y. Wu, A framework for efficient storage security in rdbms, Advances in Database Technology - EDBT 2004 (2004) 627-628.
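The index behavior described in item (b) can be made concrete with a short sketch. As an assumption for illustration, a keyed hash (HMAC-SHA256) stands in for the column's encryption function, and the key and function names are hypothetical. The sketch shows that unsalted values still support equality lookups, that salted values do not, and that sorting by the encrypted values does not reproduce the plaintext order needed for range searches.

```python
import hashlib
import hmac
import os

KEY = b"demo-key-not-for-production"  # hypothetical key


def unsalted_token(value: int) -> bytes:
    """Deterministic stand-in for encrypting a column value without a salt."""
    return hmac.new(KEY, str(value).encode(), hashlib.sha256).digest()


def salted_token(value: int) -> bytes:
    """Same stand-in, with a fresh random salt mixed in on every call."""
    salt = os.urandom(16)
    return salt + hmac.new(KEY, salt + str(value).encode(), hashlib.sha256).digest()


def order_preserved(values: list[int]) -> bool:
    """Does sorting by encrypted value give the same order as sorting by plaintext?"""
    return sorted(values) == sorted(values, key=unsalted_token)


values = list(range(1, 21))
index = {unsalted_token(v): v for v in values}  # index built on encrypted values

print(index.get(unsalted_token(7)))        # 7: equality lookup still works
print(salted_token(7) == salted_token(7))  # False: salting breaks equality lookup
print(order_preserved(values))             # range scans cannot rely on this order
```

With twenty distinct values, the chance that a pseudorandom ordering happens to match the plaintext ordering is negligible, so the last line is expected to print False.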
An encryption solution should be easy to integrate with an existing DBMS; namely, it is desirable to minimize the following:
(a) The influence on the application layer: some encryption solutions require modifying the implementation of the application layer, for example, by changing the SQL queries to include encryption operations. Such modifications may constitute a fundamental problem for legacy applications, where, in most cases, making changes to the implementation is extremely costly and, in some cases, might not be possible at all. Therefore, a practical database encryption solution should not require a major modification to the implementation of the application layer.
(b) The influence on the DBMS architecture: it is desirable to avoid fundamental changes to the DBMS implementation. Database technology has been around for more than 30 years, and redesigning the relational model to support a new encryption model is unacceptable. For a DBMS encryption solution to be practical, it must be built on top of an existing DBMS implementation and preserve all of its functionality, such as indexing, foreign key mechanisms and locking schemes.
(c) The influence on the DBA: it is desirable to allow the DBA to perform his administrative tasks directly over the encrypted data, without the need to decrypt it first (and, as a consequence, to prevent sensitive data from being disclosed to the DBA).
(d) The storage overhead: although storage nowadays is relatively cheap, it is preferable that the encrypted database not require much more storage than the original non-encrypted database.
The encryption operation according to prior art can take place at different layers, as illustrated in FIG. 1a. Several architectures for the encryption of databases are known in the art. In the Operating System layer 102, pages are encrypted/decrypted by the operating system when they are written to/read from disk 101. This layer has the advantage of being totally transparent, thus avoiding any changes to the DBMS and to existing applications. Furthermore, encryption in this layer is relatively resistant to information leakage and unauthorized modifications, since a large number of database objects are encrypted in one chunk. However, it suffers from several fundamental problems: (1) Since the operating system has no knowledge of database objects and their internal structure, it is impossible to encrypt different parts of a page using different encryption keys (e.g., when those parts belong to users with different authorizations), and thus cryptographic access control cannot be supported. (2) It is not possible to encrypt specific portions of the database and leave others in plaintext form. Furthermore, more than the relevant data is decrypted during query execution, since each access requires the decryption of an entire page. Therefore, selective encryption is very limited. (3) The DBA cannot perform any administrative task (e.g., dropping a column) without possessing the encryption keys. (4) The database cache, which usually contains a large number of disk page copies for improving performance, is kept in plaintext form, and is thus vulnerable to data-in-use attacks.
The next possible encryption layer is the Storage Engine 103. As in the operating system layer 102, pages in this layer are encrypted/decrypted when they are written to/read from disk 101. However, as opposed to the operating system layer 102, encryption/decryption operations are performed by the DBMS 110, at cell-level granularity. In other words, each time a page is loaded from disk, all encrypted values in that page are decrypted (each one separately), and each time a page is stored to disk, all sensitive values in that page are encrypted (again, each one separately). Although cell-level encryption granularity allows different values within a page to be encrypted using different keys, when a page is read from the disk into the database cache, the whole page must be decrypted, even if the initiating user does not have authorization to access all values in that page. Moreover, because multiple encryption/decryption operations are performed each time a page is written to/read from disk 101, performance may degrade substantially compared to the single encryption/decryption operation per page in the operating system layer. Note that encryption in this layer 103 is located beneath the query execution engine 105 and is therefore transparent to the query execution engine 105 and all layers above it (including the application).
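The cost difference between the two layers can be made concrete by counting cryptographic operations per page transfer. The sketch below is a simplified model, not a measurement: the `Cell` type, the function name and the sample page are hypothetical, and the model ignores the fact that one large operation processes more bytes than one small operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    sensitive: bool


def operations_per_page_io(page: list[Cell], granularity: str) -> int:
    """Count encryption (or decryption) operations for one page write (or read)."""
    if granularity == "page":
        # Operating system layer: the whole page is processed as one chunk.
        return 1
    if granularity == "cell":
        # Storage engine layer: each sensitive value is processed separately.
        return sum(1 for cell in page if cell.sensitive)
    raise ValueError(f"unknown granularity: {granularity!r}")


# Hypothetical page holding 100 rows of (name, salary); only salary is sensitive.
page = [Cell("name", False), Cell("salary", True)] * 100

print(operations_per_page_io(page, "page"))  # 1
print(operations_per_page_io(page, "cell"))  # 100
```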
The SQL interface layer 106 is a layer where data is encrypted using predefined stored procedures, views and triggers. While encryption in this layer 106 is easy to implement and does not usually require significant changes to the application layer, it has the following limitations: (1) encryption takes place above the query execution engine, and thus some database mechanisms (e.g., indexes and foreign keys) may not function properly; (2) the use of stored procedures entails a context switch from SQL to the stored procedure language, which usually has a high negative impact on performance; and (3) those mechanisms (namely triggers, views and stored procedures) can be disabled by a malicious DBA.
The next layer is the application layer 107. In this layer, sensitive data is encrypted by the application before it is sent to the database and decrypted before use. This layer supports the highest degree of freedom in terms of enforcing cryptographic access control. However, it suffers from the following disadvantages: (1) modifying legacy applications may require a lot of resources, i.e., time and money; (2) as encryption takes place above the query execution engine, various database mechanisms cannot function properly and need to be re-implemented by the application; and (3) it re-invents the wheel for each new application that is developed.
The next layer is the client layer 108, which may promise the highest degree of security, since the only party able to access the sensitive data is the legitimate client. However, it limits the ability of the database server to process the encrypted data and, in extreme cases, reduces the database server to storage only.
When implementing a database encryption solution, one needs to decide on the combination of: (1) trust in the database server; (2) encryption granularity; and (3) layer of encryption.
However, choosing the layer of encryption dictates the trust in the database server and the encryption granularity. The client and application layers assume full mistrust of the database server, while the SQL interface, storage engine and operating system layers are implemented on the server side, and thus consider the database server to be partially trusted. The operating system layer 102 is the only layer that is unaware of the database objects' internal structure and thus requires page-level encryption granularity, while all other layers dictate cell-level encryption granularity.
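The dependency described above can be written down as a small lookup table. The sketch below simply encodes the statements of the preceding paragraph; the type name, the field names and the labels "partial" and "none" are illustrative choices, not terminology used elsewhere in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayerProfile:
    layer: str
    server_trust: str  # "partial" or "none"
    granularity: str   # "page" or "cell"


# One entry per encryption layer, in bottom-up order.
PROFILES = (
    LayerProfile("operating system", "partial", "page"),
    LayerProfile("storage engine", "partial", "cell"),
    LayerProfile("SQL interface", "partial", "cell"),
    LayerProfile("application", "none", "cell"),
    LayerProfile("client", "none", "cell"),
)


def layers_matching(server_trust: str, granularity: str) -> list[str]:
    """Layers compatible with a desired trust level and encryption granularity."""
    return [
        p.layer
        for p in PROFILES
        if p.server_trust == server_trust and p.granularity == granularity
    ]


print(layers_matching("partial", "cell"))  # ['storage engine', 'SQL interface']
print(layers_matching("none", "page"))     # []
```

The empty result for an untrusted server with page-level granularity reflects the point made above: the three design choices are not independent, so some combinations have no corresponding layer.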
Existing commercial products for database encryption commonly implement encryption at the SQL interface layer (e.g., SafeNet ProtectDB), the storage engine layer (e.g., Oracle TDE), or the operating system layer (e.g., SQL Server TDE).
The client layer 108 and the application layer 107 architectures, while providing the highest level of security, are impractical in most cases due to their high impact on performance and to the changes that they impose on the application layer.
FIG. 5 is a schematic table, which summarizes the properties of the different architectures.
It is therefore an object of the present invention to provide a system and method for database encryption, which should not require a major modification to the implementation of the application layer.
It is another object of the present invention to provide a system and method for database encryption, which is built on top of an existing DBMS implementation, including all of its functionality, such as indexing, foreign key mechanisms and locking schemes.
It is yet another object of the present invention to provide a system and method for database encryption, which allows the DBA to perform his administrative tasks directly over the encrypted data, without the need to decrypt it first.
It is yet another object of the present invention to provide a system and method for database encryption, which should not require much more storage than the original non-encrypted database.
Further purposes and advantages of this invention will appear as the description proceeds.