This invention relates to storage systems, and in particular to storage area networks that provide local and remote copying of data supplied by a host. Modern storage systems provide users with the capability of storing and backing up enormous amounts of data quickly over networks to various local and remote locations. In such systems, at the time an initial copy of information is stored on hard disk drives at a primary storage site, a remote copy is made to corresponding hard disk drives at a secondary storage site. The storage system is typically configured to automatically copy the entire disk, and to configure the disks at the primary and remote storage sites as “remote copy pairs.” In performing these operations, data is provided to the hard disks at the primary site under control of a host system. The host typically sends data and write requests to the primary storage system, which acknowledges receipt of those write requests. As the data arrives at the primary storage system from the host, it is usually stored in a cache memory before being written to hard disk drives in the storage system. Either synchronously with the writing of the data to the hard disk drives, or asynchronously, the data is also written to storage media in a secondary storage system, typically located remotely from the primary storage system. In this manner, highly reliable access to the data is provided, making the system less susceptible to natural disasters or other events which may damage or destroy one of the two storage systems.
One problem which occurs in storage systems is commonly known as cache puncture or cache overflow. Each time data is to be written to a selected portion of the storage system, for example, a particular hard disk drive or a group of hard disk drives, the data is first transferred to a cache memory. This allows the high speed host system and its affiliated bus network to operate at speeds much higher than those employed in the electromechanical writing of data onto hard disk drives. In this manner, the computer system will continue to operate and perform other calculations or provide other services to the user, while the data is being written to the disks in the much slower electromechanical operations common to hard disk drives or other storage systems.
Normally, the random nature of reading and writing data in a large storage system will not overwhelm any particular component because the components have been appropriately sized for operation in these circumstances. In some circumstances, however, access from the host to the primary storage system will be sequential, that is, with many consecutive sectors targeted at only one hard disk drive or a small group of hard disk drives. Examples include batch or backup processes, which create large loads on small portions of the storage system. Cache puncture is more likely in these circumstances.
FIG. 11 is a diagram which illustrates a typical circumstance of cache puncture. As shown in the second row of the diagram, host operations up through a given time t will be normal and have minimal impact on the cache memory. In the example depicted, however, beginning at time t the host accesses become heavy, to the point of exceeding the maximum capability of the remote copy (RC) or even the primary storage system. In these circumstances, as shown by the cross-hatched portion of the curve in FIG. 11, an overloaded condition occurs in which more data is being transmitted to the storage system than the storage system is capable of storing immediately. The storage system attempts to hold this excess data in the cache memory, which is essentially functioning as a buffer between the high speed host and the slower speed disk drives. If the large demand for storage continues as shown by the upper curve in FIG. 11, eventually the capability of the cache will be exceeded, as shown by the location designated “X” in FIG. 11. At this point the host will need to intervene to reschedule writing the data until a later time or take some other action. The lower two rows in FIG. 11 illustrate a normal operation in which the host access never reaches the maximum performance line (shown dashed). Thus, there will be no cache puncture, and the overall operation will be carried out in the normal manner.
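The overload condition described for FIG. 11 can be illustrated with a minimal sketch. It is not taken from any actual storage controller: the function name `simulate_cache`, the fixed per-step drain rate, and all of the numbers are assumptions chosen only to show how sustained host writes above the drain rate eventually exceed the cache capacity.

```python
def simulate_cache(host_rates, drain_rate, cache_capacity):
    """Hypothetical model of a write cache acting as a buffer.

    host_rates     -- data arriving from the host in each time step (arbitrary units)
    drain_rate     -- data the disks / remote copy can absorb per step (assumed constant)
    cache_capacity -- maximum data the cache can hold

    Returns (overflow_step, history); overflow_step is None if the cache never overflows.
    """
    occupancy = 0
    history = []
    for step, incoming in enumerate(host_rates):
        # The cache fills with host data and empties at the bounded drain rate.
        occupancy = max(0, occupancy + incoming - drain_rate)
        history.append(occupancy)
        if occupancy > cache_capacity:
            # Corresponds to the point marked "X" in FIG. 11: cache puncture.
            return step, history
    return None, history


# Normal operation: host load stays below the drain rate, so the cache stays empty.
normal_step, normal_history = simulate_cache([5] * 10, drain_rate=10, cache_capacity=100)

# Heavy operation: after three light steps the host writes faster than the drain rate.
heavy_step, heavy_history = simulate_cache([5] * 3 + [30] * 10, drain_rate=10, cache_capacity=100)
```

In this sketch the normal run never accumulates data in the cache, while the heavy run gains 20 units per step once the load rises and overflows a few steps later.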
In prior art systems, the solution to cache puncture was to send an error message and stop the writing operation until it could be rescheduled for a time when demands were lower. This slowed overall system operation. In addition, in some prior art systems, a wait condition was introduced to place host operations on hold for enough time to allow the cache memory to become available. This wait condition was introduced, however, only at the time of an error message, and not as a preventive measure to avoid the error message in the first place. For example, IBM in its document entitled “Implementing ESS Copy Services on S/390” describes two techniques: (1) if data in cache exceeds a predefined limit, I/O will be blocked to the specific volume with the highest write activity, and (2) a temporary write peak will cause a session to be suspended. Accordingly, what is needed is an improved technique for controlling the operations of the host, a primary system and a secondary subsystem in a manner which precludes cache puncture, yet still provides high speed performance.
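For illustration only, the first prior art technique quoted from the IBM document (blocking I/O to the volume with the highest write activity once cached data exceeds a limit) might be sketched as follows. This is not IBM's implementation: the function name `volume_to_block`, the use of cached data per volume as a proxy for write activity, and the sample figures are all assumptions.

```python
def volume_to_block(cached_by_volume, cache_limit):
    """Hypothetical reactive policy: pick a volume to block once the limit is exceeded.

    cached_by_volume -- mapping of volume name to data currently held in cache
                        (assumed here to stand in for "write activity")
    cache_limit      -- predefined limit on total cached data

    Returns the name of the volume to block, or None if the limit is not exceeded.
    """
    total_cached = sum(cached_by_volume.values())
    if total_cached <= cache_limit:
        return None  # No action is taken until the limit has already been passed.
    # Block the volume contributing the most cached write data.
    return max(cached_by_volume, key=cached_by_volume.get)


sample = {"vol0": 10, "vol1": 70, "vol2": 30}
blocked = volume_to_block(sample, cache_limit=100)      # total 110 exceeds the limit
not_blocked = volume_to_block(sample, cache_limit=200)  # total 110 is within the limit
```

The sketch makes the limitation visible: the policy acts only after the limit has been exceeded, which is the reactive behavior the paragraph above contrasts with a preventive measure.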