1. Field of the Invention
This invention relates to data storage management, and more particularly to apparatus and methods for maintaining data integrity.
2. Background of the Invention
Data is increasingly one of an organization's most valuable assets. Accordingly, it is paramount that an organization protect its data. One method for protecting data is to store multiple copies of the data so that if one copy is corrupted, a valid copy will still remain. In a mirror and copy (e.g., PPRC, Metro Mirror, Global Mirror) environment, two or more storage devices may be located some distance from one another, each storing a copy of the same data.
Typical mirror and copy systems, for example, may include a host device, a primary storage device, and a secondary storage device. The host device may write data to the primary storage device, which in turn may copy the data to the secondary storage device. The I/O is considered complete only when the writes to both the primary and secondary storage devices are complete.
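The write flow described above can be sketched as follows. This is a minimal in-memory illustration only, not an actual mirror and copy implementation; the class and function names (StorageDevice, PrimaryDevice, host_write) are hypothetical and introduced solely for this sketch, and the network between the devices is reduced to a direct method call.

```python
class StorageDevice:
    """Hypothetical in-memory stand-in for a storage device."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # address -> data

    def write(self, address, data):
        self.blocks[address] = data
        return True  # acknowledge the write


class PrimaryDevice(StorageDevice):
    """Hypothetical primary that copies every write to a secondary."""

    def __init__(self, name, secondary):
        super().__init__(name)
        self.secondary = secondary

    def write(self, address, data):
        local_ok = super().write(address, data)
        # Assumption: the copy is a direct call; a real system would
        # send a write command over a network path and await a reply.
        remote_ok = self.secondary.write(address, data)
        # The write is acknowledged only if both copies succeeded.
        return local_ok and remote_ok


def host_write(primary, address, data):
    """The host sees the I/O as complete only when this returns True."""
    return primary.write(address, data)


secondary = StorageDevice("secondary")
primary = PrimaryDevice("primary", secondary)
completed = host_write(primary, 7, b"payload")
```

In this sketch the host's I/O completes only after both devices hold the data, which is why any delay on the path to the secondary storage device is felt directly by the host.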
In some mirror and copy systems, a delay of arbitrary duration may occur when messages (e.g., write commands) are transmitted from the primary storage device to the secondary storage device. That is, a period may exist between the time the primary storage device transmits a write command to the secondary storage device and the time the data associated with the write command is actually written to the secondary storage device. These delays may result from network latency, queuing on storage device network adapters, or the secondary storage device being temporarily unable to process a command because it is handling a fault or error. Messages sent by the primary storage device may also be lost and never arrive at the secondary storage device. Such delays and losses may compromise data integrity because data may be lost or newer data may be undesirably overwritten with older (stale) data.
In some mirror and copy systems, multiple communication paths may exist between a primary and secondary storage device. A write command that was unsuccessfully transmitted over one path may be re-transmitted over another path. For various reasons (e.g., network delays, hardware problems, etc.), data may travel more quickly down one path than another. As a result, commands sent from the primary storage device to the secondary storage device over different paths may not arrive in the order in which they were sent. In addition, some implementations could require multiple concurrent updates to the same areas of a secondary volume.
In the event that a write command needs to be re-driven, the first write command is typically aborted prior to sending the second write command. There may exist, however, a period of time in which the first write command cannot be reliably aborted. Even if the primary storage device aborts the first write command, the secondary storage device may receive and process the first write command prior to receiving the abort command. It is also possible for the second write command to arrive and be processed at the secondary storage device before the first write command is processed. In that case, the first write command, when it is eventually processed, overwrites newer data and compromises data integrity.
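The race described above can be illustrated with a short simulation. This sketch models only the problem, not any solution; it assumes a naive secondary storage device that applies write commands strictly in arrival order and honors an abort only if the abort arrives before the write it targets. The names Command, Secondary, and replay are hypothetical and introduced solely for this example.

```python
from dataclasses import dataclass


@dataclass
class Command:
    kind: str          # "write" or "abort"
    seq: int           # order in which the primary issued the write
    data: bytes = b""  # payload (unused for aborts)


class Secondary:
    """Hypothetical naive secondary: applies writes in arrival order."""

    def __init__(self):
        self.block = None
        self.aborted = set()

    def receive(self, cmd):
        if cmd.kind == "abort":
            # An abort only helps if the targeted write has not yet run.
            self.aborted.add(cmd.seq)
        elif cmd.seq not in self.aborted:
            self.block = cmd.data


def replay(arrival_order):
    """Deliver commands in the given order; return the final block."""
    secondary = Secondary()
    for cmd in arrival_order:
        secondary.receive(cmd)
    return secondary.block


first = Command("write", 1, b"old")   # delayed on a slow path
abort_first = Command("abort", 1)
second = Command("write", 2, b"new")  # re-driven on a faster path

# Abort arrives before the delayed first write: newer data survives.
in_order = replay([abort_first, first, second])

# Second write overtakes the first, and the abort arrives too late:
# the stale first write lands last and overwrites the newer data.
raced = replay([second, first, abort_first])
```

With the first arrival order the block ends up holding the newer data; with the second, the stale data from the first write command is what remains on the secondary storage device.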
Current methods do not provide a way to reliably solve the above-mentioned problems without potentially incurring a significant performance penalty. For example, the primary storage device may read back data from the secondary storage device to determine what was actually written, or perform lengthy operations to flush commands prior to sending new commands. Some protocol standards may require the network to deliver or discard traffic within a certain period of time. Thus, in order to guarantee that subsequent commands do not arrive before the original command, which could compromise data integrity, primary storage device recovery may also include waiting out long timeout periods before accessing the same secondary storage device. A significant drawback of these and other methods is that they do not complete quickly enough to meet stringent performance requirements.
In view of the foregoing, what is needed is an apparatus and method to preserve data integrity between primary and secondary storage devices in a mirror and copy environment. Ideally, such an apparatus and method would ensure that data associated with commands sent by the primary storage device to the secondary storage device and later aborted will not overwrite newer data on the secondary storage device. Further needed are apparatus and methods to ensure that commands that do not complete quickly can be reliably aborted and re-driven down another path to meet system performance requirements.