1. Field of the Invention
The present invention relates to cryptographic techniques for protecting stored data that is maintained in and transferred between various media.
2. Description of the Related Art
Cryptographic techniques often are employed to achieve data security objectives such as confidentiality, integrity, origin authentication, and version verification. Examples of cryptographic operations used to realize these security objectives include encryption, one-way hash functions, pseudorandom number generators, and digital signatures.
Data to be protected may be in transit over an interconnection network or at rest on a storage device. A logical data object consists of an arbitrary quantity of data as well as identifying metadata such as the object name, object size, etc. In storage or in transit, the data object may be logically encoded using N bits. A data object that has been subjected to operations to ensure one or more security goals (such as confidentiality and integrity) is henceforth referred to as a protected data object.
Symmetric-key encryption algorithms such as the Data Encryption Standard (DES) or the Advanced Encryption Standard (AES) often are used to provide confidentiality for data objects. Encryption algorithms provide confidentiality by disguising and encoding sensitive data such that an unauthorized entity cannot recover the original data from its encrypted form given a reasonable amount of time and computation resources. FIG. 1 illustrates the operation of a symmetric-key encryption algorithm. A publicly known symmetric-key encryption algorithm accepts as inputs a secret key, K, and the sensitive data to be encrypted, P. The input data P, which is also known as plaintext, may consist of multiple data objects, a single data object, or a subset of bits from a data object. The secret key is a quantity of information (often random and ranging from 1 bit to thousands of bits in size) that is known only to authorized parties. The output of the encryption algorithm is the ciphertext, C, which can be stored on publicly accessible data storage media without significant risk of exposing P so long as K remains secret. Upon retrieving C, only authorized entities can compute P using a publicly known symmetric-key decryption algorithm that corresponds to the chosen encryption algorithm. FIG. 2 illustrates a symmetric-key decryption algorithm. As shown in FIG. 2, given the ciphertext C and the secret key K, the decryption algorithm outputs the plaintext P. In many systems, this same secret key K is used to protect multiple plaintext inputs.
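The relationship among K, P, and C shown in FIGS. 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a toy keystream cipher built from SHA-256 in counter mode as a stand-in for DES or AES (the Python standard library provides neither), and the names `encrypt`, `decrypt`, and `_keystream` are hypothetical. It is not suitable for protecting real data.

```python
import hashlib
import secrets

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudorandom bytes from (key, nonce) using SHA-256 in counter mode."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """C = E(K, P). A fresh random 16-byte nonce is prepended to the ciphertext."""
    nonce = secrets.token_bytes(16)
    ks = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """P = D(K, C). Only a holder of the same secret key recovers the plaintext."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    ks = _keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))

K = secrets.token_bytes(32)               # secret key known only to authorized parties
P = b"sensitive data object contents"     # plaintext
C = encrypt(K, P)                         # ciphertext, safe to store publicly

assert decrypt(K, C) == P                          # authorized entity recovers P
assert decrypt(secrets.token_bytes(32), C) != P    # a different key does not
```

The same key K can be passed to `encrypt` for many plaintext inputs, which mirrors the key reuse described above.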
Cryptographically strong one-way hash functions (examples of which include SHA-256 and SHA-512) often are used to provide data integrity for data objects. These hash functions are used to generate keyed message authentication codes (e.g., HMACs) that serve as “fingerprints” for data objects. Each fingerprint is appended to its associated data object, and it can be inspected to ensure that the data has not been modified by an unauthorized party. FIG. 3 illustrates the generation of a Hash-based Message Authentication Code (“HMAC”) by an authorized entity using a one-way hash function. The HMAC generation function F accepts a secret key K and an arbitrarily sized input Z. Only authorized entities possess knowledge of the secret key K. Z consists of the data to be protected as well as other data such as the logical or absolute location of the data within the data object. The function F employs multiple iterations of the hash function over encodings of the two inputs to produce a hash fingerprint (i.e., the HMAC) G. The hash fingerprint G is a fixed-size value (often between 128 and 512 bits in size) that corresponds to the input values, and G is stored or transmitted along with the data Z.
The verification of an HMAC by an authorized entity is illustrated in FIG. 4. An authorized entity applies the secret key K and the data input Z (retrieved from a data object or other sources) as inputs to the function F. The output of F is the HMAC G′. Let G be the value of the HMAC retrieved from the associated data object. The authorized entity then compares the value of G to the value of G′. If they are equal, then the integrity of Z is verified; if not, corruption or forgery, i.e., unauthorized writing of new data, has occurred.
By the properties of strong hash functions, it is highly unlikely that two different sets of inputs will yield the same fingerprint, and it is highly unlikely that the value of K can be calculated given a hash fingerprint G and the data input Z. These properties ensure that if an unauthorized entity attempts to write data to a data object without knowledge of K, then it is highly unlikely that the entity would be able to compute a new valid HMAC. Thus, if an unauthorized write occurs, this can be detected by an authorized entity during a subsequent reading of a data object through the process illustrated in FIG. 4.
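The generation and verification steps of FIGS. 3 and 4 can be sketched with Python's standard `hmac` module. The sketch assumes HMAC-SHA256 and assumes that Z is formed by prefixing the data with an 8-byte encoding of its offset; that encoding, and the function names `F` and `verify`, are illustrative choices rather than anything prescribed above.

```python
import hashlib
import hmac
import secrets

def F(key: bytes, data: bytes, offset: int) -> bytes:
    """Compute G = HMAC-SHA256(K, Z), where Z encodes the data location and the data."""
    z = offset.to_bytes(8, "big") + data   # assumed encoding of Z
    return hmac.new(key, z, hashlib.sha256).digest()

def verify(key: bytes, data: bytes, offset: int, stored_g: bytes) -> bool:
    """Recompute G' from (K, Z) and compare it to the stored fingerprint G."""
    g_prime = F(key, data, offset)
    return hmac.compare_digest(stored_g, g_prime)   # constant-time comparison

K = secrets.token_bytes(32)
data = b"block contents"
G = F(K, data, offset=4096)     # fingerprint stored alongside the data

assert verify(K, data, 4096, G)                           # unmodified data verifies
assert not verify(K, b"block c0ntents", 4096, G)          # modified data is detected
assert not verify(K, data, 8192, G)                       # relocated data is detected
assert not verify(secrets.token_bytes(32), data, 4096, G) # wrong key cannot validate
```

Including the offset in Z means that valid data copied to a different location within the object fails verification, not only data whose bytes were altered.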
Symmetric-key encryption algorithms and one-way hash functions can provide a high degree of protection for data, but performance and access control issues do arise in many data security systems when applying these cryptographic techniques.
First, in order to read or write an arbitrary number of bytes at an arbitrary offset within a data object, the security system may require that all or a significant portion of the data object be decrypted/encrypted and hashed to complete the read or write operation. For example, if the encryption algorithm is implemented in the Cipher Block Chaining (CBC) mode of operation, each ciphertext block depends on the preceding ciphertext block, so certain write requests will require the entire data object to be re-encrypted and rewritten to generate the desired ciphertext. This requirement can be mitigated to some degree by employing alternative modes of operation for encryption, but performance issues remain with respect to hashing for data integrity and other security goals.
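The CBC write amplification can be made concrete with a small experiment. The sketch below assumes a toy 4-round Feistel permutation built from SHA-256 in place of a real block cipher such as AES (the standard library has none), and it reuses a fixed all-zero IV only so that the two encryptions are directly comparable. All function names are hypothetical.

```python
import hashlib

BLOCK = 16  # bytes per cipher block

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation on 16-byte blocks (stand-in for AES, not secure)."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:8]
        left, right = right, _xor(left, f)
    return left + right

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> list:
    """CBC chaining: C_i = E(K, P_i XOR C_{i-1}), with the IV acting as C_{-1}."""
    assert len(plaintext) % BLOCK == 0
    prev, blocks = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_block_encrypt(key, _xor(plaintext[i:i + BLOCK], prev))
        blocks.append(prev)
    return blocks

key = b"k" * 16
iv = bytes(BLOCK)                        # fixed IV, for comparison purposes only
original = bytearray(b"A" * BLOCK * 8)   # an 8-block data object
modified = bytearray(original)
modified[2 * BLOCK] ^= 0x01              # write a single byte inside block 2

c_old = cbc_encrypt(key, iv, bytes(original))
c_new = cbc_encrypt(key, iv, bytes(modified))
changed = [i for i, (a, b) in enumerate(zip(c_old, c_new)) if a != b]
print(changed)   # [2, 3, 4, 5, 6, 7]
```

A one-byte write in block 2 changes every ciphertext block from block 2 through the end of the object, so a write near the start of a large object forces nearly the whole object to be re-encrypted and rewritten.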
If one or a small number of HMACs are created and stored for a given data object, the entire data object often may need to be processed in order to verify the HMAC(s) or to generate a new HMAC. That is, many systems require the entire encrypted or plaintext data object to be hashed even when verifying the integrity of a small number of bytes. In systems that employ relatively large data objects, the cost of hashing the entire object on each read or write can be prohibitive.
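The cost of a single whole-object HMAC can be illustrated as follows. The sketch assumes one HMAC-SHA256 tag computed over the entire object; `verified_read` is a hypothetical helper that returns the number of bytes hashed alongside the requested data so the overhead is visible.

```python
import hashlib
import hmac

def verified_read(key: bytes, obj: bytes, tag: bytes, offset: int, length: int):
    """Read `length` bytes at `offset` after verifying a single whole-object HMAC.

    Returns (data, bytes_hashed) to expose the cost of the integrity check.
    """
    expected = hmac.new(key, obj, hashlib.sha256).digest()   # hashes every byte
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return obj[offset:offset + length], len(obj)

key = b"\x01" * 32
obj = bytes(range(256)) * 4096            # a 1 MiB data object
tag = hmac.new(key, obj, hashlib.sha256).digest()

data, hashed = verified_read(key, obj, tag, offset=1000, length=16)
print(len(data), hashed)                  # 16 1048576
```

Reading 16 bytes requires hashing all 1,048,576 bytes of the object, and the ratio grows linearly with object size.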
Second, as described above, secret keys must either be provided to or maintained by authorized entities in order for an authorized entity to perform read and write operations on protected data objects. At a given time, there exists a set of entities that are authorized to access a data object, and those access rights are enabled by the ability to retrieve the relevant data object keys. These keys are used as inputs to encryption and one-way hashing algorithms to ensure data confidentiality, data integrity, and other goals. Any entity that has knowledge of such keys and the ability to read/write a data object may possess the ability to interpret the plaintext contents of the data object as well as write arbitrary data to the data object. Thus, if access to a data object is revoked for a particular entity, it is essential that the entity be prevented from accessing the keys corresponding to that data object.
In systems where data object keys are made available directly or indirectly to authorized entities in order to perform data object input/output operations, it is possible for the entity to retain direct or indirect knowledge of the data object keys for use at a future time. Thus, in order to guarantee that unauthorized entities cannot access a data object to which they formerly enjoyed access, the data object must be “re-keyed”. That is, the data object must be decrypted and hashed with the current data object keys and then re-encrypted and re-hashed with new data object keys. This process ensures that future data written to a data object is not readable by revoked or unauthorized entities, and it ensures that revoked or unauthorized entities cannot write new data to the data object without being detected. If data objects were not re-keyed following access revocation, a revoked entity could employ previously acquired knowledge of the cryptographic keys in conjunction with the most current version of the data object to read or write arbitrary bytes stored within the data object. When access modifications are frequent, this requirement leads to frequent re-keying, which is a significant performance problem in systems with many data objects or with large data objects.
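The re-keying procedure described above can be sketched as follows. The example assumes a separate encryption key and HMAC key per data object, a toy SHA-256 counter-mode keystream in place of a real cipher such as AES, and a simple layout of nonce, ciphertext, and 32-byte HMAC-SHA256 tag. The names `protect`, `unprotect`, and `rekey` are hypothetical.

```python
import hashlib
import hmac
import secrets

def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy SHA-256 counter-mode keystream XOR (stand-in for AES; encrypts and decrypts)."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + nonce + (i // 32).to_bytes(8, "big")).digest()
        out += bytes(d ^ p for d, p in zip(data[i:i + 32], pad))
    return bytes(out)

def protect(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Encrypt, then append an HMAC over the nonce and ciphertext."""
    nonce = secrets.token_bytes(16)
    body = nonce + _xor_stream(enc_key, nonce, plaintext)
    return body + hmac.new(mac_key, body, hashlib.sha256).digest()

def unprotect(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Verify the HMAC, then decrypt; raises ValueError on an integrity failure."""
    body, tag = blob[:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, body, hashlib.sha256).digest()):
        raise ValueError("integrity check failed")
    return _xor_stream(enc_key, body[:16], body[16:])

def rekey(old_keys, new_keys, blob: bytes) -> bytes:
    """Verify and decrypt with the old keys, then re-encrypt and re-hash with new keys."""
    return protect(*new_keys, unprotect(*old_keys, blob))

old_keys = (secrets.token_bytes(32), secrets.token_bytes(32))   # (enc_key, mac_key)
new_keys = (secrets.token_bytes(32), secrets.token_bytes(32))
blob = protect(*old_keys, b"data object contents")
blob = rekey(old_keys, new_keys, blob)

assert unprotect(*new_keys, blob) == b"data object contents"
try:
    unprotect(*old_keys, blob)        # a revoked entity still holding the old keys
    revoked_can_read = True
except ValueError:
    revoked_can_read = False
assert not revoked_can_read
```

Note that `rekey` processes every byte of the object twice, once under the old keys and once under the new ones, which is the source of the performance cost when revocations are frequent or objects are large.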