1. Field of the Invention
The present invention relates to data storage systems that utilize redundant data backup subsystems. More particularly, the invention concerns a method and apparatus employing a fairness approach to selectively allow or reject updates to a data storage subsystem, in order to avoid overrunning the data storage subsystem's update buffer.
2. Description of the Related Art
In this information age, there is more data than ever to transmit, receive, process, and store. And, as people's reliance upon machine readable data increases, they are more vulnerable to damage caused by data loss. Consequently, data backup systems have never been more important.
Generally, data backup systems copy a designated group of source data, such as a file, volume, storage device, partition, etc. If the source data is lost, applications can use the backup copy instead of the original, source data. The similarity between the backup copy and the source data may vary, depending upon how often the backup copy is updated to match the source data. If the backup copy is updated in step with the source data, the copy is said to be a "mirror" of the source data, and is always "consistent" with the source data.
Some competing concerns in data backup systems are cost, speed, and data consistency. Systems that guarantee data consistency often cost more, and operate more slowly. On the other hand, many faster backup systems typically cost less while sacrificing absolute consistency.
One example of a data backup system is the Extended Remote Copy ("XRC") system, sold by International Business Machines Corp. ("IBM"). In addition to the usual primary and secondary storage devices, the XRC system uses a "data mover" machine coupled between primary and secondary devices. The data mover performs backup operations by copying data from the primary devices to the secondary devices. Storage operations in the XRC system are "asynchronous," since primary storage operations are committed to primary storage without regard for whether the corresponding data has been stored in secondary storage.
The secondary devices are guaranteed to be consistent with the state of the primary devices at some specific time in the past. This is because the XRC system time stamps data updates stored in the primary devices, enabling the secondary devices to implement the updates in the same order. Time stamping in the XRC system is done with a timer that is shared among all hosts coupled to primary storage. As an example, the common timer may comprise an IBM Sysplex Timer, P/N 9037-002. Since the secondary devices are always consistent with a past state of the primary devices, a limited amount of data is lost if the primary devices fail.
A different data backup system is IBM's Peer-to-Peer Remote Copy ("PPRC") system. The PPRC approach does not use a data mover machine. Instead, storage controllers of primary storage devices are coupled to controllers of counterpart secondary devices by suitable communications links, such as fiber optic cables. The primary storage devices send updates to their corresponding secondary controllers. With PPRC, a data storage operation does not succeed until updates to both primary and secondary devices complete. In contrast to the asynchronous XRC system, PPRC performs "synchronous" backups.
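The contrast between asynchronous and synchronous backup described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class names `AsyncPair` and `SyncPair`, the dictionary-based storage, and the integer counter standing in for a shared timer are all hypothetical and do not represent the actual XRC or PPRC implementations.

```python
import itertools


class AsyncPair:
    """Toy asynchronous pair: a write commits at the primary immediately,
    and a separate data-mover step copies updates to the secondary later."""

    def __init__(self):
        self.primary, self.secondary = {}, {}
        self.pending = []                  # time-stamped updates not yet copied
        self._clock = itertools.count(1)   # hypothetical stand-in for a shared timer

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((next(self._clock), key, value))
        return True                        # succeeds without waiting for the secondary

    def run_data_mover(self):
        # Apply pending updates to the secondary in time-stamp order.
        for _, key, value in sorted(self.pending, key=lambda u: u[0]):
            self.secondary[key] = value
        self.pending.clear()


class SyncPair:
    """Toy synchronous pair: a write succeeds only if both copies complete.
    Simplification: an unavailable secondary fails the write up front."""

    def __init__(self, secondary_available=True):
        self.primary, self.secondary = {}, {}
        self.secondary_available = secondary_available

    def write(self, key, value):
        if not self.secondary_available:
            return False
        self.primary[key] = value
        self.secondary[key] = value
        return True
```

In the asynchronous model the secondary lags the primary until the data mover runs, which is why it reflects a past state; in the synchronous model the two copies never diverge, at the cost of every write waiting on the secondary.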
In many backup systems, recovery involves a common sequence of operations. First, backup data is used to restore user data to a known state, as of a known date and time. Next, "updates" to the primary subsystem that have not been transferred to the secondary subsystem are copied from the "log" where they are stored at the primary subsystem, and applied to the restored data. The logged updates represent data received after the last backup was made to the secondary subsystem, and are usually stored in the same chronological order according to when they were received by the primary subsystem. After applying the logged updates, the data is considered to be restored, and the user's application program is permitted to access the restored data.
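The recovery sequence above (restore from backup, then apply logged updates in chronological order) can be sketched as follows. The function name `recover`, the dictionary representation of stored data, and the `(timestamp, key, value)` tuple layout are assumptions made purely for illustration.

```python
def recover(backup, logged_updates):
    """Restore data from a backup, then replay logged updates in time order.

    backup: dict mapping keys to values as of the last backup (hypothetical layout).
    logged_updates: iterable of (timestamp, key, value) tuples from the primary's log.
    """
    restored = dict(backup)  # step 1: restore to the known backup state
    # Step 2: apply logged updates oldest first. sorted() is stable, so updates
    # with equal time stamps keep their original log order.
    for _, key, value in sorted(logged_updates, key=lambda u: u[0]):
        restored[key] = value
    return restored
```

Replaying in chronological order matters because a later update to the same key must overwrite an earlier one, not the other way around.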
Although many of the foregoing technologies constitute significant advances, and may even enjoy significant commercial success today, IBM engineers are continually seeking to improve the performance and efficiency of today's data backup systems. One area of possible focus concerns the management of updates received at the primary storage devices. Namely, in some cases, an excessive number of updates are received before there is an opportunity to transfer updates to the secondary subsystem, and clear the log. In this case, these updates can overrun the update log, possibly causing the backup session to fail.
One possible solution to this problem is to limit the number of updates placed into the update log by uniformly blocking all updates intended for certain storage devices. For instance, the primary subsystem may block updates intended for all devices in a particular channel group. However, as recognized by the present inventors, this approach may result in blocking a relatively small number of updates for some devices that unfortunately happen to reside in the blocked channel group. Thus, the effect of this solution is unfairly applied to the devices in that channel group. These devices are "starved" from receiving updates, which may cause delays in the application programs trying to store data on those devices. Although the effect of these delays varies by the nature of the application program, they range from user frustration to possible program crashes. Consequently, known update management approaches may not be completely adequate for some applications due to certain unsolved problems.
When a primary data storage subsystem receives updates for local storage and backup at a counterpart secondary storage subsystem, the primary subsystem institutes device-specific, fairness-driven update blocking to avoid overrunning the primary subsystem's update buffer with updates destined for any one physical or logical device. Broadly, the primary subsystem initially receives update requests, logs the updates in an update buffer, stores the logged updates in primary storage, and finally copies the updates to the secondary storage subsystem. Each update request includes update data and also identifies a corresponding logical device, physical device, or other targeted subpart of primary storage. The primary subsystem maintains a counter or other "update activity indicator" that represents update activity for each storage subpart. The update activity may comprise, for example, the number or size of updates contained in the buffer for that subpart. For each update request, the primary subsystem consults the update activity indicator to determine whether the identified subpart's update activity exceeds a prescribed level. If not, the update data is stored in primary storage. Otherwise, if the update activity is excessive, the primary subsystem rejects the update. Optionally, the primary subsystem may selectively override certain rejections to prevent starving updates for that subpart.
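The per-subpart accept-or-reject decision described above might be sketched as follows. This is a minimal illustration, not the claimed implementation: the class name `UpdateBuffer`, the use of a simple count of buffered updates as the update activity indicator, the `>=` comparison against the threshold, and the override rule (admit one update after a fixed number of consecutive rejections) are all assumptions chosen for the example.

```python
from collections import defaultdict, deque


class UpdateBuffer:
    """Hypothetical per-device update accounting with fairness-driven blocking."""

    def __init__(self, threshold, max_consecutive_rejects=None):
        self.threshold = threshold                  # assumed per-device limit
        self.max_consecutive_rejects = max_consecutive_rejects
        self.log = deque()                          # buffered (device_id, data) pairs
        self.activity = defaultdict(int)            # buffered update count per device
        self.rejects = defaultdict(int)             # consecutive rejections per device

    def submit(self, device_id, data):
        """Return True if the update is logged, False if it is rejected."""
        if self.activity[device_id] >= self.threshold:
            # Assumed override policy: after enough consecutive rejections,
            # admit one update so the device is not starved indefinitely.
            override = (self.max_consecutive_rejects is not None
                        and self.rejects[device_id] >= self.max_consecutive_rejects)
            if not override:
                self.rejects[device_id] += 1
                return False
        self.rejects[device_id] = 0
        self.log.append((device_id, data))
        self.activity[device_id] += 1
        return True

    def drain(self, n):
        """Simulate copying up to n of the oldest updates to secondary storage."""
        drained = []
        for _ in range(min(n, len(self.log))):
            device_id, data = self.log.popleft()
            self.activity[device_id] -= 1           # buffer space freed for that device
            drained.append((device_id, data))
        return drained
```

Because the indicator is kept per device, a burst of updates to one device is rejected once that device reaches its limit, while updates to other devices continue to be accepted; draining the buffer lowers the count and lets the busy device resume.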
The foregoing features may be implemented in a number of different forms. For example, the invention may be implemented to provide a method to apply fairness-driven update blocking to avoid overrunning the primary subsystem's update buffer with updates destined for any one physical device, logical device or other storage subpart. In another embodiment, the invention may be implemented to provide an apparatus such as a data storage subsystem, configured to apply fairness-driven update blocking as explained herein. In still another embodiment, the invention may be implemented to provide a signal-bearing medium tangibly embodying a program of machine-readable instructions executable by a digital data processing apparatus to perform operations as discussed above. Another embodiment concerns logic circuitry having multiple interconnected electrically conductive elements configured to perform operations as discussed above.
The invention affords its users a number of distinct advantages. Chiefly, the invention implements fairness-driven update blocking to regulate additions to the update buffer without starving devices that are not receiving updates at an excessive rate. With this technique, only the devices that have exceeded prescribed thresholds are blocked, allowing other applications to run properly. Advantageously, the invention also provides for reviewing and then dynamically tuning the blocking methodology. The invention also provides a number of other advantages and benefits, which should be apparent from the following description of the invention.