The fundamental purpose of record keeping is to establish irrefutable proof and accurate details of events that have occurred. However, critical records, such as business communications, financial statements, and medical images, are increasingly stored in electronic form, which makes them relatively easy to clandestinely destroy or modify. The threat of intentional and insider attacks is very real, given the extremely high stakes that may be involved in tampering with the records. With recent corporate misconduct and the ensuing attempts to change history, a growing fraction of records is now subject to regulations (e.g., Sarbanes-Oxley Act, SEC Rule 17a-3/4, HIPAA, DOD 5015.2) regarding records maintenance.
To protect data records from tampering, current industry practice and regulatory requirements (e.g., SEC Rule 17a-4) rely on storing records in write-once read-many (WORM) storage for preservation. Conventional solutions have focused on protecting the integrity of individual data records. However, given the large amount of data stored in current electronic record repositories, it is not practical to scan through all data when a portion of the data records requires retrieval. Instead, data are often accessed through some type of metadata structure. Examples of such metadata structures include directories in a file system or search indexes created by a search engine.
Protecting such metadata structures is at least as important as protecting the data records. Even if all data records are stored on WORM storage and protected from malicious altering, a tampered index can still hide an existing data record. Hiding an existing data record effectively “erases” the data record by rendering it inaccessible in an efficient manner or within a reasonable amount of time. Similarly, a tampered index may point to a different data record than the correct one, effectively “replacing” the original data record.
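The two index attacks described above can be illustrated with a minimal sketch. The `WormStore` class, the `lookup` helper, the record key, and the memo contents are all hypothetical names chosen for illustration; the sketch assumes a toy in-memory store that refuses overwrites and a conventional rewritable index mapping keys to block identifiers.

```python
class WormStore:
    """Toy write-once store: a block can be written once and never changed."""

    def __init__(self):
        self._blocks = {}

    def write(self, block_id, data):
        if block_id in self._blocks:
            raise PermissionError("WORM: block already written")
        self._blocks[block_id] = data

    def read(self, block_id):
        return self._blocks[block_id]


def lookup(index, store, key):
    """Resolve a key through the index; return None if the index has no entry."""
    block_id = index.get(key)
    return None if block_id is None else store.read(block_id)


store = WormStore()
store.write(0, b"memo: approve transfer")
store.write(1, b"memo: decline transfer")

# Hypothetical rewritable index kept outside WORM protection.
index = {"memo-17": 0}
found_before = lookup(index, store, "memo-17")

# Attack 1: drop the index entry. The record is intact but no longer findable.
del index["memo-17"]
hidden = lookup(index, store, "memo-17")

# Attack 2: repoint the entry. The query now returns a different record.
index["memo-17"] = 1
replaced = lookup(index, store, "memo-17")
```

In both cases block 0 is never overwritten, so a check that only inspects the stored records would report nothing wrong.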
While data records covered by regulations are generally fixed-content data, the metadata structures for indexing the data records are dynamic data structures that are updated as new data records are added into the system or expired data records are purged. Conventional approaches for generating such dynamic data structures typically require rewritable storage, which leads to greater risk of tampering. Recently there has been research on index structures that do not require rewritable storage. Such an index grows in an append-only fashion without overwriting any previously written data and therefore can also be stored on WORM storage.
Although this technology has proven to be useful, it would be desirable to present additional improvements. Conventional WORM solutions fall short when data needs to be extracted from the trusted WORM device. The WORM storage prevents overwrites to data only as long as the data is stored inside the WORM system. However, unless the user requesting the data has direct access to the WORM system, the retrieved data could be tampered with during data transfer. This could happen, for example, when a query result is transferred from the data repository that received the query to the user who initiated the query, or when data records and metadata structures are migrated from a source system to a target system. Even if the target system is also a WORM system, data is still vulnerable during the migration process, such as when data is being transferred through a network.
Standard approaches for protecting data over untrusted communication channels, such as encryption, are inadequate given the high likelihood of insider attacks. Since the owners of the data records and the system are often the same group of people who may benefit from tampering with the data, an insider adversary in this case often has the highest (executive) level of support and insider access, privilege, and knowledge. The adversary cannot destroy records in a blatant fashion (for example, by physically destroying the storage devices), as such destruction is easy to detect and may lead to severe penalties and a presumption of guilt. However, the adversary may initiate a spurious migration of data records and attempt to modify selected records during the migration process.
Some existing WORM solutions produce a one-way hash for each data record based on the content of the data record and use the one-way hash as the record identifier. Such a hash value, sometimes called the content address of the data record, can be used to verify whether the content of the data record has been modified.
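A content address of this kind can be sketched as follows. The choice of SHA-256, the function names `content_address` and `verify`, and the sample record are assumptions made for illustration; the text does not specify which one-way hash existing solutions use.

```python
import hashlib


def content_address(record: bytes) -> str:
    """Derive the record identifier from the record content (SHA-256 assumed)."""
    return hashlib.sha256(record).hexdigest()


def verify(record: bytes, address: str) -> bool:
    """Check that the record content still matches its content address."""
    return content_address(record) == address


original = b"Q3 revenue: 4.2M"
address = content_address(original)

# Any change to the content yields a different hash, so verification fails.
tampered = b"Q3 revenue: 9.2M"
original_ok = verify(original, address)
tampered_ok = verify(tampered, address)
```

Note that `verify` needs the entire record content as input, and that it only compares a given record against a given address.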
Although this approach has proven to be useful, it would be desirable to present additional improvements. A content address merely provides a way to verify whether the content of the data record matches the content address. However, the content address tells nothing about whether such a data record actually exists within a particular system.
Furthermore, the content addresses are themselves part of the metadata that need to be protected from tampering. Content addresses stored on WORM storage are exposed to the same level of risk as the rest of the data during the migration process. Content addresses stored outside the WORM storage require additional mechanisms to protect the content addresses.
Computing a content address to verify the content of a corresponding data record requires accessing all the content of the data record. This is generally not an issue for data records since each data record is typically accessed as a whole. However, metadata structures such as indexes are constructed in such ways that a query execution only needs to access a small fraction of all the data in an index. Accessing all the data in the index to compute a content address to verify the integrity of the index results in unacceptable performance and defeats the purpose of having an index.
Unlike data records comprising content that is fixed after creation, metadata structures such as indexes are updated frequently. Consequently, either the content address of the index is computed immediately before a data migration or the content address is updated as the index is being built. Computing the content address of the index immediately before a data migration exposes the index to tampering while the index is being built.
What is therefore needed is a system, a service, a computer program product, and an associated method for verifying the integrity and completeness of records that preserves the trustworthiness of both data records and metadata structures, in particular, across data migrations. Such a system and method should also provide efficient verification of the correctness of query results. The need for such a solution has heretofore remained unsatisfied.