Many corporate and government entities collect data, and are governed by regulations dictating how the data is to be stored and retained. Different types of data are subject to different types of regulations. Data must often be secured against manipulation, so that it is difficult or impossible for changes to be made to the data without the creation of an audit trail.
Numerous financial reporting regulations require that certain types of data be maintained for a fixed time period for examination by regulatory bodies. Other data, such as customer financial data or medical records, must be secured against accidental release, and may be maintained only for a defined time period. This can create difficulties for entities that must maintain one set of data for a first time period, and another set of data that cannot be stored for longer than a shorter time period.
In most corporate environments, data is stored on a centralized file system. Safeguards, such as access rights, can be implemented to allow segregated or tiered access to the various types of data on the server. For data security, the central file repository is typically backed up to provide recovery ability in the event of catastrophic data loss. Backing up the data typically results in all data being stored on a single backup media element such as a tape. This backup must then be stored for two competing storage times: some of the data must be preserved, while other data should not be.
Furthermore, if court proceedings or an audit are ongoing, destruction of the backup to allow the required deletion is not permitted. This may result in a requirement for indefinite retention of documents associated with a particular case. It is exceedingly difficult to search through every storage device and piece of backup media to find the data of interest, and of course, for the duration of the order all such media must be preserved. Failure to comply completely has resulted in the most extreme sanctions, and in some cases may lead to criminal prosecution. On the other hand, any given piece of backup media may have information on it relating to thousands or millions of cases unrelated to the court order, the indefinite preservation of which leads to said unrelated data not being destroyed when it is prudent or legally necessary to do so.
The problem is compounded by the fact that it is usually necessary to “restore” a backup tape (i.e., copy it back to hard disk) to be able to search through its content for information of interest. In addition to being labor-intensive and time-consuming, a restore typically requires a duplicate set of hardware upon which to perform the operation, as the system that created the data is likely to be fully utilized in the day-to-day running of the business. Deadlines for producing documents are often on the order of 48 hours, which is typically insufficient to load and search every backup tape in a typical enterprise.
The conventional data center paradigm consists of servers, external primary storage (typically connected via a Storage Area Network), and backup tape drives (usually in the form of a “library” which is a robotic assembly holding a few tape drives and dozens or hundreds of tape media cassettes). This is inadequate for compliance with many regulations for a number of reasons.
The system administrator of a storage network has sufficient access rights that he may covertly add, delete, or modify any business record in such a way that forensic examination is unlikely to reveal this activity. In a large corporation, there may be many individuals with administrator rights, so even if it were known that tampering had taken place, it would be impossible to determine who was responsible (or indeed, whether it was a deliberate act at all and not an accident or software malfunction). Furthermore, for the reasons mentioned above, it is not practical to accurately enforce document retention periods, as there is no way to “surgically” delete a given record from a piece of backup media.
Conventional data centers encrypt the data on neither the primary storage devices nor the backup media, leaving the data vulnerable to hackers and to the loss or physical theft of backup media while it is in transit to the storage facility.
Attempts have been made to address these shortcomings in the conventional data center. One commonly used approach is to store business records on so-called “WORM” (Write Once Read Many) media, which is perceived to be more secure than ordinary computer media. However, the WORM approach has several serious weaknesses. First, WORM media tends to be slow and unreliable. Second, in order to enforce a given document retention period, it is necessary to group documents with similar expiry dates together on a given piece of WORM media so that it can be destroyed as a unit on the appropriate date (e.g. by shredding or burning). This segmentation of data prior to backup is difficult to achieve in practice. Moreover, if a court or regulatory order is found to apply to a single file on the WORM media (which may be many gigabytes in size and hold millions of files), the entire WORM media must be preserved even if it is desirable or necessary to destroy the remaining files. Furthermore, the perceived tamper-resistance of WORM media is largely an illusion, as it is a simple technical exercise to copy the contents of a WORM media to the perpetrator's computer, modify anything desired on the copy, write the adulterated data to a fresh piece of WORM media, and substitute this new media for the old media. Lastly, since WORM media is typically stored off-line (e.g. in a box in a closet), there is no automated way to audit the data for completeness and stability. Only when the time comes to present the data to a court or regulator may it be discovered to be unreadable or incomplete.
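By way of contrast with off-line WORM media, data held on-line can be audited automatically. The following is a minimal sketch in Python, assuming a hypothetical manifest of SHA-256 digests recorded when the files were first stored; the function names and the manifest layout are illustrative assumptions and are not taken from any product discussed here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every regular file under root (hypothetical ingest step)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files now under root with the manifest and report discrepancies."""
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "altered": sorted(
            name
            for name in manifest.keys() & current.keys()
            if manifest[name] != current[name]
        ),
    }
```

Such a check reveals substitution or decay only if the manifest itself is kept where the person altering the files cannot also alter it, which is precisely what a box of off-line media does not provide.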
To address the limitations of WORM media, a new type of storage equipment was developed, specifically designed for the needs of fixed content data. Some variants were subsequently developed which added anti-tamper technologies, said variants commonly referred to as “compliant storage” devices.
A typical “compliant storage” device is the Centera™, manufactured by EMC Corporation. Although it addresses some limitations of conventional storage devices, such as providing assurance that data was not inadvertently modified or deliberately tampered with, it does not address all the issues. Data is not encrypted while inside the unit, thus it would be insecure to allow the data to be backed up to tape or optical media. Furthermore, the architecture requires integration with the proprietary Centera Application Programming Interface (API), which does not include an industry-standard access mechanism for reading or writing data. Lastly, it does not provide any mechanism by which a neutral third party can attest to the completeness of the records under management or to the times and dates said records were created.
Another limitation of prior art “compliant storage” devices is that they lack any features that allow the automated gathering of assets from mobile computing devices (e.g. laptop computers) or remote branch offices. A further limitation of these devices is that they provide no mechanism for deletion of files on offline media such as optical platters or tape.
It is, therefore, desirable to provide a file storage solution that provides encrypted storage with the ability to erase expired information but without providing an opportunity to modify data or the contents of the system without leaving a secure audit trail.
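One generally known way to make an audit trail tamper-evident is to chain its entries together by hash, so that altering or removing an earlier entry invalidates every later one. The sketch below, in Python, illustrates that general idea only; it is not presented as the mechanism of the solution described here, and the names `append_event`, `verify`, and `GENESIS` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_event(log: list[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append(
        {"event": event, "prev": prev_hash, "hash": _entry_hash(prev_hash, event)}
    )


def verify(log: list[dict]) -> bool:
    """Recompute the chain from the start; any edited or removed entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

A chain of this kind does not by itself detect removal of the most recent entries, nor a wholesale rebuild of the log by someone with full access. Detecting those would require the latest hash to be recorded somewhere outside that person's control, for example with a neutral third party.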