The invention generally relates to the field of redundant data storage systems, and relates in particular to data storage systems and architectures that provide data retrieval to earlier points in time.
Data retrieval to prior points in time is required in myriad contexts including, for example, data recovery due to information system failures, as well as data retrieval for record keeping purposes, such as in distributed electronic healthcare information systems, electronic news information record keeping systems, and Internet information record keeping systems.
Data may be stored in a variety of data storage mediums, including for example, disk array architecture systems. A conventional disk array system architecture is known as the RAID system architecture that includes a redundant array of independent/inexpensive disks (RAID). The RAID system architecture provides a large amount of data storage in a reasonably reliable manner. U.S. Pat. No. 5,526,482 briefly discusses several RAID architectures.
In general, each of the RAID system architectures includes a plurality of disks that are controlled by a disk controller. When a central processing unit (CPU) sends information to the disk controller for storage on disk, the controller directs how the information shall be stored on the plurality of disks to ensure that a failure of any one of the disks will not cause the system to lose information. RAID-1 provides 2N data redundancy to protect data, while RAID-3 through RAID-5 store data in parity stripes across multiple disks to improve space efficiency and performance over RAID-1. The parity of a stripe is the Exclusive-OR (XOR) of all data elements in the stripe. If data from a disk at time t0 is needed, and the system requests such data at time t1, the data in the disk can be retrieved by doing the XOR among the good disks, which may finish at t2. The recovered data is exactly the same image of the data as it was at time t0. Further conventional RAID architecture systems seek to retrieve data from more than one disk.
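The parity relationship described above can be illustrated with a minimal sketch. The helper name `xor_blocks`, the block contents, and the choice of which block is unavailable are assumptions made purely for illustration; they do not describe any particular RAID controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks forming one stripe (illustrative values).
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# The parity of the stripe is the XOR of all data elements in the stripe.
parity = xor_blocks(stripe)

# Assume the block at index 1 is unavailable; rebuild it from the
# remaining good blocks together with the parity block.
lost_index = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
recovered = xor_blocks(survivors + [parity])
```

Because XOR is its own inverse, combining the surviving blocks with the parity yields the missing block exactly as it was when the parity was computed.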
Traditional techniques for protecting data from the above failures are mainly periodical (daily or weekly) backups and snapshots. These techniques usually require a significant amount of time to retrieve data. In addition, data between backups is vulnerable to data loss.
Retrieval of data is generally measured by two key parameters: recovery point objective (RPO) and recovery time objective (RTO). RPO measures the maximum acceptable age of data at the time of outage. For example, if an outage occurs at time t0, and the system detects the outage at time t1, the ideal case is to recover data as it was right before t0, or as close to t0 as possible. A daily backup would represent an RPO of approximately 24 hours because the worst-case scenario would be an outage during the backup, i.e., t0 is the time point when a backup is just started. RTO is the maximum acceptable length of time to resume normal data processing operations after an outage. RTO represents how long it takes to recover data. For the above example, if data is successfully recovered at time t2 after starting the recovery process at t1, then the RTO is t2−t1. Depending on the different values of RPO and RTO, there exist different storage architectures capable of recovering data upon an outage.
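The timing relationships among t0, t1, and t2 can be worked through with a short sketch. The function names `data_loss_window` and `recovery_duration`, the backup schedule, and the specific timestamps are hypothetical values chosen only to make the arithmetic concrete.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, t0):
    """Age of the newest backup taken at or before the outage time t0."""
    usable = [t for t in backup_times if t <= t0]
    if not usable:
        raise ValueError("no backup precedes the outage")
    return t0 - max(usable)


def recovery_duration(t1, t2):
    """Elapsed time from the start of recovery (t1) to its completion (t2)."""
    return t2 - t1


# Hypothetical daily backups taken at midnight on three consecutive days.
backups = [datetime(2024, 1, 1) + timedelta(days=d) for d in range(3)]

t0 = datetime(2024, 1, 2, 18, 0)   # outage occurs
t1 = datetime(2024, 1, 2, 18, 20)  # outage detected, recovery starts
t2 = datetime(2024, 1, 2, 20, 20)  # recovery completes

lost = data_loss_window(backups, t0)   # data written since the last backup
took = recovery_duration(t1, t2)       # t2 - t1
```

In this illustration the newest usable backup is 18 hours old at the time of the outage, and recovery takes two hours; with daily backups the first figure can approach 24 hours.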
Data protection and retrieval have traditionally been done using periodic backups and snapshots. Backups are typically done nightly when data storage is not being used since the process is time consuming and degrades application performance. During the backup process, user data is transferred to a tape, a virtual tape, or a disk for disk-to-disk backup. Full backups may be performed weekly or monthly with daily incremental backups occurring between the full backups.
Data compression is often used to reduce backup storage space. A snapshot is a point-in-time image of a collection of data allowing on-line backup. A full-copy snapshot creates a copy of the entire data as a read only snapshot storage clone. To save space, copy-on-write snapshot copies a data block from the primary storage to the snapshot storage upon the first write to the block after the snapshot was created. A snapshot may also redirect all writes to the snapshot storage after the snapshot was created. Such data back-up systems, however, remain costly and highly intrusive batch operations that are prone to error and consume an exorbitant amount of time and resources.
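The copy-on-write snapshot behavior described above can be sketched as follows. The class name `CowVolume`, its methods, and the in-memory list and dictionary standing in for primary and snapshot storage are assumptions for illustration only, not a description of any specific product.

```python
class CowVolume:
    """Toy block volume with a single copy-on-write snapshot."""

    def __init__(self, blocks):
        self.primary = list(blocks)  # primary storage, indexed by block number
        self.snapshot = None         # block number -> preserved block

    def take_snapshot(self):
        # Creating the snapshot copies nothing; blocks are preserved lazily.
        self.snapshot = {}

    def write(self, lba, data):
        # On the first write to a block after the snapshot was created,
        # copy the old block from primary storage to snapshot storage.
        if self.snapshot is not None and lba not in self.snapshot:
            self.snapshot[lba] = self.primary[lba]
        self.primary[lba] = data

    def read_snapshot(self, lba):
        # Unmodified blocks are still read from primary storage.
        if self.snapshot is None:
            raise RuntimeError("no snapshot has been taken")
        return self.snapshot.get(lba, self.primary[lba])


vol = CowVolume([b"A", b"B", b"C"])
vol.take_snapshot()
vol.write(1, b"B1")  # first write after the snapshot: old block is preserved
vol.write(1, b"B2")  # later writes to the same block copy nothing further
```

Only the first overwrite of each block consumes snapshot space, which is why this approach saves space relative to a full-copy snapshot but can recover only to the instant the snapshot was taken.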
Besides periodic data backups, data may also be protected at file system level using file versioning that records a history of changes to files. Typically, users need to create versions manually in these systems. There are also copy-on-write versioning systems that have automatic versions for some file operations. File versioning provides a time-shifting file system that allows a system to recover to a previous version of files. These versioning file systems have controllable RTO and RPO, but they are generally file system dependent and may not be directly applicable to enterprise data centers that use different file systems and databases. File versioning differs from periodic backups and snapshots in that file versioning works mainly at file system level not at block device level. Block level storages usually provide high performance and efficiency especially for applications such as databases that access raw devices.
To provide timely retrieval to any point-in-time at block device level, a log of changed data for each data block may be maintained in a time sequence. In the storage industry, this type of storage is usually referred to as CDP (Continuous Data Protection) storage. In such systems, a write operation will move the old data in the same logical block address (LBA) to another disk storage instead of overwriting it. As a result, successive writes to the same LBA will generate a sequence of different versions of the block with associated timestamps indicating the time of the corresponding write operations. These replaced data blocks are stored in a log structure, maintaining a history of the data blocks that have been modified. Since every change on a block is kept, it is possible to view a storage volume as it existed at any point in time, dramatically reducing RPO. The RTO depends on the size of the storage for the logs, indexing structure, and consistency checks. The data image at the time of an outage is considered to be crash consistent at block level because the order of all write operations is strictly preserved. A significant drawback of CDP storage, however, is the large amount of storage space required, which has thus far prevented it from being widely adopted.
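A minimal sketch of such a timestamped log of replaced blocks is given below. The class name `CdpVolume`, its methods, and the use of integer timestamps are hypothetical; the sketch also assumes that writes arrive in non-decreasing timestamp order, and it ignores indexing and consistency checks.

```python
class CdpVolume:
    """Toy CDP store: every write logs the data it replaces."""

    def __init__(self):
        self.current = {}  # LBA -> latest data
        self.log = []      # (timestamp, LBA, replaced data) in write order

    def write(self, timestamp, lba, data):
        # Move the old data for this LBA into the log instead of discarding it.
        self.log.append((timestamp, lba, self.current.get(lba)))
        self.current[lba] = data

    def view_at(self, timestamp):
        """Return the volume image as it existed just after `timestamp`."""
        image = dict(self.current)
        # Undo, newest first, every write made after the requested time.
        for ts, lba, old in reversed(self.log):
            if ts <= timestamp:
                break
            if old is None:
                image.pop(lba, None)  # block did not exist before this write
            else:
                image[lba] = old
        return image


vol = CdpVolume()
vol.write(1, 7, b"v1")
vol.write(2, 7, b"v2")
vol.write(3, 9, b"x")
image_at_1 = vol.view_at(1)
```

Every write adds a log entry, so the log grows with the total number of writes rather than with the size of the live data, which illustrates the storage cost noted above.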
Systems for providing data retrieval for record keeping purposes typically require large volumes of storage to enable retrieval of data to a prior point in time. In electronic healthcare information systems, for example, patient data files are stored in a distributed environment, allowing healthcare providers at different locations to share and easily access a variety of electronic health records (EHR). As the use of such systems increases, a reliable, secure, and efficient data storage infrastructure is critical to future healthcare systems. There exist several technical challenges, however, to increasing the size and usability of such systems, including reliability, security, and adequate online performance, that make the design and implementation of such distributed data storage systems difficult.
The first technical challenge regarding data reliability has to do with the importance of having EHR data available to authorized healthcare providers once they have been created and recorded for a patient. Such EHR should not only be playable (viewable) in real time but also should be re-playable (reviewable) as it was at any point-in-time in the past. Replay/review of the history of patient data is necessary because of requirements of medical audit, lawsuits, quality control and self-assessment. The requirement of being able to replay EHR data makes the design of the electronic Healthcare (eHealthcare) information system challenging because of the fundamental differences between paper records and electronic records. With paper records, one can easily review the history by following the paper trails of the records. With electronic records, on the other hand, existing data storage designs do not have the paper trails; any change to a piece of data in the data storage is destructive if a data write operation overwrites/destroys previous data in the same file or record. For example, any time one saves or writes a changed Word document or a spreadsheet file, the previous version of the file is overwritten and replaced. Similarly, a database transaction will also overwrite the previous record of the same table. Even the metadata that records the time of last change or last access is changed in a destructive way.
Realizing the importance of replaying history data, there has been extensive research in data storage and database systems in terms of data protection and recovery, file versioning, and database testing. Data protection and recovery technologies periodically make backups or snapshots of data so that data can be recovered to a point-in-time in the past in case of failures or disastrous events. The granularity of backups/snapshots varies depending on the reliability requirement and cost.
As mentioned above, continuous data protection (CDP) makes a copy of old data upon each write operation. CDP provides the finest granularity for data recovery at the cost of a huge amount of data storage that is several orders of magnitude larger than the amount of normal real time data. File versioning systems keep different versions of files when file changes occur. The number of versions and the frequency of making file versions can be specified by users. In addition to the negative performance impacts, file versioning is file system dependent and requires users to be familiar with the file system. Database replays were originally designed for the purpose of database testing of production database systems that need upgrades or changes. By storing real transactions happening in production systems in a separate storage system, database replay makes testing of a new database installation more realistic. Again, such database replays require users to explicitly define when and for how long to capture transactions in the production system. The major issue is that it is practically infeasible to enable SQL tracing on the entire database system because of high overheads.
The second technical challenge is the data security and privacy of the EHR system. Because data in an EHR are stored and transmitted in a distributed environment over a network, data encryption and access authentication are very important to protect the privacy of patient data. As is well known, data encryption and decryption are very time consuming processes, especially for large amounts of patient data. Supporting replay of EHR data aggravates this problem even further because the amount of data transmitted and stored to enable data replay is several orders of magnitude larger than production data due to repetitive overwrites. As a result, the online performance of such an EHR system will be dragged down dramatically by storage systems supporting data replay and data security.
There is a need, therefore, for an improved redundant data storage system, and in particular, for a system architecture for retrieving data at time t2 to the data image of t0 after it is determined at time t1 that data needs to be retrieved to time t0.