The value of data that is stored in a company's storage servers is often far greater than the value of the storage servers themselves. As a result, the loss of data in a disaster may be far more catastrophic to the company than the loss of the server hardware that stores the data. In fact, in some industries the loss of data on a large scale may signal the end of a company.
In an attempt to protect against a large-scale data loss disaster, companies often invest in storage technologies that geographically disperse data, such as by backing data up from one server computer to another remotely located server computer. In this way, a disaster at one data site will not destroy all of a company's data. Instead, business continuity can be restored from a geographically remote server computer where the data has been backed up.
The process of backing up data on-line from one server to another is called replication, or remote mirroring. The servers involved are traditionally called the “primary” and “secondary” servers, or the “primary” and “replica” servers. Replication differs from traditional off-line backups, such as tape backup, by virtue of being performed on-line, i.e., while a storage volume continues to field input/output operations (“I/Os”) from clients as usual. As data availability becomes more and more important, off-line backups are becoming increasingly archaic.
When a primary server fails disastrously, a system administrator has to restart the company's business from the secondary server. This is done manually, by remounting various volumes from the secondary server instead of the primary server, and restarting affected applications. This operation is called a “fail-over.” When a fail-over occurs, the secondary server acts as the recipient of I/Os from clients. The cost of keeping a system running during fail-over may be higher due to the greater cost of the network connection to the secondary server. In many deployments, however, this cost is far lower than the cost of not doing business for the duration of disaster recovery.
When the primary server is available again, whether through repair, recovery, or replacement with a new server computer, the system administrator will most likely want to move control back to the primary server. The process of returning responsibility for fielding client I/Os to the primary server is called “fail-back.” Fail-back is also done in a disconnected fashion, with volumes being reconnected and applications being restarted before I/Os are shipped back to the primary server.
In the replication context, two important measures have been defined to gauge the effectiveness of a replication deployment. The first measure is defined as the duration of time that elapses between the failure of a primary server and the act of a secondary server taking over control by a fail-over. This is called the recovery time objective (“RTO”). The second measure is defined as the amount of data loss that is permissible during fail-over. In some situations, such as source code control, data loss of a few minutes is acceptable and can be recovered from without severe consequences. In other data storage scenarios, such as banking or airline reservations systems, a single second of data loss can cause irreparable damage. The amount of data loss that can be tolerated, measured in units of time preceding disaster, is called the recovery point objective (“RPO”).
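The two measures can be illustrated with simple timestamp arithmetic. The sketch below is only an illustration: the timestamps, the one-minute RPO, and the one-hour RTO are hypothetical values chosen for the example and are not taken from any particular deployment.

```python
from datetime import datetime, timedelta

# Hypothetical event times for a single disaster (assumed for illustration).
last_replicated = datetime(2024, 1, 1, 12, 0, 0)   # last write known to be on the secondary
failure = datetime(2024, 1, 1, 12, 0, 5)           # primary server fails
failover_done = datetime(2024, 1, 1, 14, 30, 5)    # secondary begins fielding client I/Os

# Hypothetical objectives set by the business.
rpo = timedelta(minutes=1)
rto = timedelta(hours=1)

# Data written in this window exists only on the failed primary.
data_loss_window = failure - last_replicated
# Time during which clients could not be served.
downtime = failover_done - failure

meets_rpo = data_loss_window <= rpo
meets_rto = downtime <= rto
```

In this example the five-second loss window is within the assumed RPO, while the manual fail-over of two and a half hours exceeds the assumed RTO.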
Different solutions have been built and deployed that attempt to provide an appropriate RPO and RTO for the particular data storage scenario. As an example, the costliest form of replication, which has both an RPO and an RTO of zero, is called active-active clustering/mirroring. In this form of replication, both the primary and the secondary servers are active and functioning at the same time; clients connect to both of the servers, and the servers maintain consistency with each other at all times. When a primary server fails, the secondary server seamlessly takes over the entire functionality of the system without necessitating a manual fail-over.
Another type of replication is referred to as synchronous replication. In a synchronous replication installation, only the primary server fields I/Os from clients. Every write operation that arrives at the primary server is also mirrored to the secondary server. The write is signaled to the client as being completed only when it has completed on both the primary and secondary servers. In this manner, applications are always guaranteed to have their writes written to both servers. If either the primary or secondary server fails, the non-failing server is guaranteed to contain all of the data stored on the failing server. The RPO of synchronous replication is, therefore, zero. It may, however, be necessary to manually fail over a synchronous replication installation and, as a result, the RTO may be on the order of a few hours.
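The synchronous write path described above can be sketched as follows. This is a minimal illustration, not an actual implementation: the `Volume` class and the `synchronous_write` function are hypothetical names, and in-memory dictionaries stand in for the storage volumes and for the network link between the servers.

```python
class Volume:
    """Hypothetical in-memory stand-in for a storage volume on one server."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba, data):
        # Store the data at the given logical block address and report success.
        self.blocks[lba] = data
        return True


def synchronous_write(primary, secondary, lba, data):
    """Signal completion only after the write has completed on both servers."""
    ok_primary = primary.write(lba, data)
    ok_secondary = secondary.write(lba, data)  # mirrored before acknowledging
    return ok_primary and ok_secondary


primary, secondary = Volume(), Volume()
acked = synchronous_write(primary, secondary, 0, b"payload")
```

Because the acknowledgment waits on the secondary server, every acknowledged write is present on both volumes, which is why the RPO is zero. The same wait is also what makes write latency depend on the link between the two servers.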
Synchronous replication deployments are expensive primarily because a high-speed connection is required between the primary and secondary servers in order to avoid adversely affecting performance. Where the primary and secondary servers are a great distance apart, on separate coasts of the United States for instance, the cost of a suitable high-speed connection may be prohibitive. In such applications, it may be desirable to utilize an asynchronous replication system instead. In an asynchronous replication system, I/Os are not sent from the primary to the secondary server inline with their arrival from clients. Rather, I/O operations are immediately written to the primary server and completed to clients, but are buffered at the primary server for a few seconds before they are transmitted to the secondary server.
Because data buffering improves bandwidth utilization, and because the buffered data may be compressed or otherwise optimized for size, the data communications link needed for asynchronous replication may be significantly slower and therefore less expensive than a link in a synchronous replication setup. The trade-off, however, is in the RPO. In previous asynchronous replication systems, any open buffers on the primary server at the time of a disaster will be lost. The secondary server, by virtue of being slightly behind the primary server, will exhibit this data loss to the clients. This is unacceptable in many types of storage installations.
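The buffering behavior and the resulting data loss can be sketched as follows. This is a simplified illustration under stated assumptions: the `AsyncReplicator` class and its methods are hypothetical, in-memory dictionaries stand in for the two servers, and an explicit `flush` call stands in for the periodic transmission of a buffer to the secondary server.

```python
class AsyncReplicator:
    """Hypothetical model of an asynchronous replication pair."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.buffer = []  # writes completed to clients but not yet transmitted

    def write(self, lba, data):
        # The write is applied to the primary and completed to the client
        # immediately; it is only queued for the secondary.
        self.primary[lba] = data
        self.buffer.append((lba, data))
        return True

    def flush(self):
        # Transmit the buffered writes to the secondary, then empty the buffer.
        for lba, data in self.buffer:
            self.secondary[lba] = data
        self.buffer.clear()


repl = AsyncReplicator()
repl.write(0, b"a")
repl.write(1, b"b")
repl.flush()
repl.write(2, b"c")  # still in the open buffer

# If the primary is lost now, these blocks are missing or stale on the secondary.
lost = [lba for lba, data in sorted(repl.primary.items())
        if repl.secondary.get(lba) != data]
```

Here the write to block 2 was completed to the client but never reached the secondary server, so a fail-over at this point would expose its loss to clients.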
Previous asynchronous replication systems also have difficulty maintaining write order fidelity. Write order fidelity refers to the requirement by some types of applications that writes be completed in the order in which they are made to the primary server. In asynchronous replication systems, initiators are not directly in control of the order in which I/Os are sent to the secondary. Therefore, they cannot ensure that dependent writes are flushed in the correct order to the secondary server. Nonetheless, it is the responsibility of the asynchronous replication storage system to ensure that applications are able to recover smoothly from the secondary server if disaster strikes the primary server.
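One way to state the write order requirement is that, at any moment, the writes applied at the secondary server form a prefix of the writes made to the primary server. The sketch below checks that property. The prefix formulation, the function name, and the example of a data block followed by a dependent journal commit are assumptions made for illustration.

```python
def preserves_write_order(primary_log, secondary_log):
    """Return True if the secondary applied the primary's writes in order, with no gaps."""
    return secondary_log == primary_log[:len(secondary_log)]


# Hypothetical dependent writes: the commit record is only valid once
# the data block it refers to has been written.
primary_log = ["data block", "journal commit"]

in_order = preserves_write_order(primary_log, ["data block"])
reordered = preserves_write_order(primary_log, ["journal commit"])
```

In the second case the secondary server holds a commit record without the data it depends on, so an application recovering from the secondary server could see an inconsistent state.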
In most contemporary implementations of asynchronous replication, write order fidelity is maintained by collecting I/Os that are arriving at the primary server and sending the I/Os to the secondary server in exactly the same order, without any kind of framing or buffering. This process guarantees that write order fidelity is maintained. It also, however, eliminates the performance and bandwidth gains that asynchronous replication provides.
It is with respect to these considerations and others that the disclosure made herein is provided.