1. Field of the Invention
The present invention is directed to throttling data transfer between two or more systems.
2. Description of the Related Art
Disaster recovery systems typically address two types of failures: a sudden catastrophic failure at a single point in time, or data loss over a period of time. In the second, gradual type of disaster, updates to volumes on data storage may be lost. To assist in recovery of data updates, a copy of data may be provided at a remote location. Such dual or shadow copies are typically made as the application system is writing new data to a primary storage device at a primary storage subsystem. The copies are stored in a secondary storage device at a secondary storage subsystem.
In some configurations, multiple primary storage subsystems send data to a single secondary storage subsystem. The secondary storage subsystem may become overloaded by the amount of data being sent by the primary storage subsystems.
In some systems, the secondary storage subsystem issues an error condition to the primary storage subsystems when a threshold on the secondary storage subsystem is exceeded. The threshold relates to an amount of resource usage. This error condition instructs the primary storage subsystems to halt all Input/Output (I/O) to the secondary storage subsystem for a fixed time-out period. In addition, when the threshold is reached, all data currently being transferred is discarded by the secondary storage subsystem and needs to be resent by the primary storage subsystems. Resending the data is wasteful. Waiting out the time-out period is also more wasteful in a high-bandwidth environment (e.g., a Fibre Channel environment) than in a low-bandwidth environment.
The primary storage subsystems receive the error condition and wait for the fixed time-out period before resuming data transfer. Even if the secondary storage subsystem is ready to process additional data before the fixed time-out period ends, the primary storage subsystems wait the entire fixed period of time, which results in the secondary storage subsystem idling when data transfer and processing could be taking place.
FIG. 1 illustrates, with a graph 100, effects of a prior art solution. The graph 100 depicts time along the horizontal axis and secondary resource usage percentage along the vertical axis. At point 110, a maximum resource usage level is reached at the secondary storage subsystem. The secondary storage subsystem sends a message to all primary storage subsystems to wait the fixed time-out period. At point 120, the secondary storage subsystem has enough resources available to resume processing data, but it is not until point 130 that the primary storage subsystems resume transferring data. Therefore, the time between points 120 and 130 is not being used efficiently.
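The inefficiency shown in FIG. 1 can be expressed as simple arithmetic. The following is a minimal sketch, not part of any described system: the function name, its parameters, and the sample numbers are hypothetical, and it assumes the fixed time-out starts at point 110 with no message latency.

```python
def idle_time(threshold_reached_at, ready_again_at, fixed_timeout):
    """Return how long the secondary sits idle under a fixed time-out.

    All arguments are hypothetical times in the same unit (e.g., ms).
    Assumption: the time-out starts when the threshold is reached (point 110).
    """
    # Point 130: primaries resume only after the full time-out has elapsed.
    resume_at = threshold_reached_at + fixed_timeout
    # The idle window is the gap between point 120 (ready) and point 130 (resume).
    return max(0, resume_at - ready_again_at)


# Illustrative numbers only: threshold hit at t=100, secondary ready at t=140,
# fixed time-out of 100 units, so the primaries resume at t=200.
print(idle_time(100, 140, 100))  # 60
```

In this sketch, the longer the fixed time-out is relative to the time the secondary actually needs to recover, the larger the unused interval becomes.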
Additionally, in certain situations, such a solution is not “fair” to all primary storage subsystems connected to the secondary storage subsystem. For example, assume that primary storage subsystem A and primary storage subsystem B are connected to one secondary storage subsystem. Primary storage subsystem A is driving resource usage (by transferring data) on the secondary storage subsystem near, but just below, the maximum resource usage level. Primary storage subsystem B begins to drive resource usage on the secondary storage subsystem and pushes the resource usage over the maximum resource usage level. Then, the secondary storage subsystem sends an error condition to both primary storage subsystems A and B. Both primary storage subsystems A and B wait the fixed time-out period. Then, both primary storage subsystems again start driving resource usage. Primary storage subsystem B is given an error and needs to wait for the fixed time-out period. Primary storage subsystem A is transferring data at a faster rate and so again drives resource usage at the secondary storage subsystem near, but just below, the maximum resource usage level. Again, primary storage subsystem B, which is transferring data at a slower rate than primary storage subsystem A, pushes the resource usage over the maximum resource usage level. This cycle continues, and primary storage subsystem B is unfairly being allowed to send less data than primary storage subsystem A for a given period of time.
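The cycle described above can be illustrated with a toy simulation. This is a rough sketch under stated assumptions, not a model of any actual storage subsystem: the function `simulate`, the tick-based timing, the rates, and the assumption that usage drains to zero during the time-out are all hypothetical.

```python
def simulate(rate_a, rate_b, max_usage, timeout_ticks, total_ticks):
    """Toy model of primaries A and B sharing one secondary (hypothetical).

    Each tick, both primaries add their rate to the secondary's usage.
    If usage would exceed max_usage, that tick's data is discarded, usage
    is assumed to drain to zero, and BOTH primaries wait timeout_ticks.
    """
    delivered = {"A": 0, "B": 0}
    usage = 0
    wait = 0
    for _ in range(total_ticks):
        if wait > 0:
            # Fixed time-out in progress: neither primary may transfer.
            wait -= 1
            continue
        if usage + rate_a + rate_b > max_usage:
            # Error condition: in-flight data is discarded and both wait.
            usage = 0
            wait = timeout_ticks
            continue
        usage += rate_a + rate_b
        delivered["A"] += rate_a
        delivered["B"] += rate_b
    return delivered


# Illustrative numbers only: A sends 8 units per tick, B sends 2.
print(simulate(rate_a=8, rate_b=2, max_usage=100, timeout_ticks=5,
               total_ticks=100))  # {'A': 512, 'B': 128}
```

Under these assumptions, both primaries incur the same number of time-outs, yet the slower primary B delivers a quarter of what A delivers over the same interval, since the penalty is applied without regard to how much each primary contributed to the overload.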
Thus, there is a need in the art for throttling data transfer between systems.