1. Field of the Invention
This invention relates generally to enhancement of performance for hierarchical caching of data and particularly to a system and method employing techniques which reduce host computer channel and control unit wait time while employing a RAID parity data recovery scheme.
2. Description of the Related Art
Modern high-performance data processors use a private high-speed hardware-managed buffer memory in front of the main data store to reduce average memory access delay at the Central Processing Unit (CPU). This high-speed buffer is denominated a "cache" because it is usually transparent to the applications programmer. Because hardware speed is generally directly proportional to hardware cost, the cached memory features can be cost-effectively improved by adding another, faster cache in front of the first cache, provided the added cache is smaller. Such multilevel cache "hierarchies" are known in the art to give rise to a requirement for "coherence management" in shared memory multiprocessing configurations because each CPU is directly coupled only to its private cache. That is, the temporary contents of many separate private cache buffers must be somehow coordinated to ensure that only the most recent record copies are "committed" to the underlying main data store. The term "committed" typically means data is written and a commit message indicating the write operation is sent to a controller.
Another problem related to overall system performance arises in systems that employ multilevel data storage subsystems. For instance, a modern shared-storage multiprocessing system may include a plurality of host processors coupled through several cache buffer levels to a hierarchical data store that includes a random access memory level followed by one or more larger, slower storage levels such as Direct Access Storage Device (DASD) and tape library subsystems. Transfer of data up and down such a multilevel shared-storage hierarchy requires data transfer controllers at each level to optimize overall transfer efficiency.
Until electronically stored data is committed to a permanent form of storage, the data is volatile, i.e., subject to being lost if power is interrupted. For this reason, elaborate schemes have been employed in the art to protect data without the inherent time lag required for permanent DASD storage, which usually employs head disk assemblies (HDAs) requiring mechanical movements to read and write data. One such scheme involves some type of nonvolatile buffer that stores back-up copies of modified data stored in electronic high speed cache memory until the data is committed to an HDA. Additionally, the modified data is retained in cache memory until the data is committed. Typically the cache and nonvolatile buffer are part of or attached to a control unit that manages both of them. In keeping with data integrity requirements, this ensures that at least two coherent copies of the data are available, with at least one being stored on a nonvolatile medium.
The IBM 3990 storage controller is an example of a storage controller used to control data transfer between DASD-based storage libraries and host computer processors. This storage controller includes a local cache memory for buffering data transfers to and from the underlying DASD storage subsystem. Additionally, this controller is equipped with NVS for storing back-up copies of modified data in cache. The IBM 3990 storage control subsystem is fully described in "IBM 3990 Storage Control Planning, Installation and Storage Administration Guide" (IBM document GA32-0100-04, International Business Machines Corporation, copyright 1991) and in "IBM 3990 Storage Control Introduction" (IBM document GA32-0098-0, International Business Machines Corporation, copyright 1987). Each of these documents may be ordered directly from IBM.
One model of a typical NVS equipped storage controller, IBM 3990 Model 3, handles up to 16 channels from host computers and up to 64 logical DASDs. Another model, the IBM 3990-6, in a configuration supporting an architecture known as the Enterprise System Connection (ESCON) environment, can support up to 128 logical channels over eight physical channels. Within the storage controller are two multipath storage directors and four storage paths, two of which are associated with each multipath storage director. Each multipath storage director may be connected to up to eight incoming channels from host computers, for a total of 16 channels. Thus, each multipath storage director functions as an eight-by-two switch.
Another example storage controller is the IBM 9340 and its descendants such as the IBM 9343. Like the IBM 3990 storage controller, the IBM 9343 is used to control data transfer between DASD-based storage libraries and host computer processors. The IBM 9343 storage controller includes a local cache memory, but is not equipped with nonvolatile storage (NVS). The IBM 9340 storage control subsystem is fully described in "IBM 9340 Direct Access Storage Subsystems Reference" (IBM document GC26-4647-01). This document may be ordered directly from IBM.
A typical non-NVS equipped storage controller (IBM 9343 and 9345) handles up to eight channels from host computers and up to 64 logical DASDs. There are two storage clusters, each having four system adapters, allowing communication with four host channels per cluster. A device adapter attaches to a DASD port. There are two device adapters per cluster, or four per device. Thus, up to four writes may be processed simultaneously, and by accessing data stored in cache up to eight reads may be processed simultaneously through the system adapters.
As is known in the art, channels are physical links between a host computer processor and an external device, such as a DASD data storage subsystem. Usually, a host computer has a small number of channels, each physically connected to channel control multiplexers such as the IBM 3990 or 9343 storage controller. For instance, several host computer processors may be connected to one IBM 3990-3 or 3990-6 storage controller, which in turn is connected to sixty-four DASD volumes. When transferring data, the storage controller can secure any one of the plurality of channels and storage paths back to the host computer and forward to the DASD to establish a temporary input/output transaction data path. It is a feature of the IBM 3990 storage controller that such a data path between a host computer and a DASD subsystem may be severed into two separate connection intervals, each of which may be handled over a different physical channel and storage path. That is, a DASD access request need not be answered over the same channel on which it is received. This feature increases storage controller efficiency because the storage controller is free to handle other tasks during the disconnect interval between request and response.
Recent advances in DASD storage library art include exploitation of the Redundant Arrays of Inexpensive Disks (RAID) technology now well-known in the art. RAID theory is described by Patterson et al. ("A Case for Redundant Arrays of Inexpensive Disks", Proc. ACM SIGMOD Conf., Chicago, Ill., Jun. 1988). RAID DASD technology has led to development of a DASD storage system rack incorporating a plurality of cached DASD modules each organized to emulate logical DASD storage volumes. Each module includes a high-speed cache buffer memory for facilitating data transfers between a specific plurality of DASDs and a channel to the adjacent storage controller. Such a module is herein denominated a Cached Storage Drawer (CSD) subsystem.
The independent development of a new CSD RAID type of DASD subsystem and a distributed host processor storage controller has given rise to a new variation of the cache hierarchy architecture known in the art. The IBM 3990 and the IBM 9340 types of storage controller both provide a cache buffer memory to support data transfer between host computer and DASD-based storage subsystem. The CSD subsystem provides internal cache buffer memory to support data transfers in and out of the RAID plurality of DASDs. Thus, connecting the IBM 3990 or the IBM 9340 type of storage controller to a CSD storage system creates an unplanned dual-cache hierarchy comprising either storage controller cache and the CSD cache. Each of these two attached cache memories is independently managed for different purposes, including the aging and demotion of cache entries according to a Least Recently Used (LRU) priority scheme and the like. This unplanned duplication presents novel problems and opportunities heretofore unknown in the hierarchical cache art.
A significant problem involved with including the CSD subsystem for implementation of RAID technology is the associated phenomenon known as the "RAID write penalty". Typically in a RAID architecture, parity data is created by some parity calculation, such as an exclusive OR operation, and the parity is used to reconstruct user data in the event of some type of failure. In recent versions of the RAID architecture, for example RAID 5, this parity is spread across an array of drives to avoid bottlenecks, i.e., data traffic related to writing parity being concentrated at one drive. Unfortunately, this means that a write operation must occur at each drive on which parity is kept, resulting in a significant lag time. Generally, the RAID write penalty refers to the extra overhead of reading the old data from a drive, reading the old parity, generating new parity, and writing both the data and the parity.
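By way of illustration only, the exclusive OR parity calculation and the overhead of a single-block update may be sketched as follows. The block contents, the three-data-block stripe, and the names xor_blocks and small_write are hypothetical; the read-modify-write update shown is a commonly used formulation and is assumed here rather than taken from any particular RAID implementation.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise exclusive OR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical stripe: three data blocks and one parity block.
data = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]
parity = xor_blocks(*data)

# A lost block is rebuilt by XOR-ing the surviving blocks with the parity.
rebuilt = xor_blocks(data[0], data[2], parity)


def small_write(old_data: bytes, old_parity: bytes, new_data: bytes):
    """Update one block; return the new parity and the drive I/O count.

    Assumed read-modify-write sequence: read old data, read old parity,
    generate new parity, write new data, write new parity.
    """
    new_parity = xor_blocks(old_parity, old_data, new_data)
    io_ops = {"reads": 2, "writes": 2}
    return new_parity, io_ops


new_block = b"\xaa\x55"
new_parity, io_ops = small_write(data[1], parity, new_block)
```

In this sketch one logical write costs four drive operations, which is the extra overhead referred to above as the RAID write penalty.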
In a RAID 5 architecture, a write initiated over a host channel follows this general pattern. The storage controller selects a logical device in the CSD subsystem on which the host requested track image resides, and then orients to the correct host requested record. The logical device refers to emulated track images which appear to the control unit as a physical track on a COUNT, KEY, DATA (CKD) formatted disk drive. The track images are actually stored in fixed block architecture (FBA) format on HDAs using a small computer system interface (SCSI). The CKD to FBA format mapping is transparent to the storage controller, which interacts as if the data is contained on a physical drive in CKD format. The CKD and FBA formats and the related mapping techniques are incidental to the present invention and are well known in the art. The record is modified in the CSD cache, and a commit request is issued from the storage controller to the CSD subsystem. The control unit disconnects from the CSD subsystem, but the logical device in the CSD subsystem remains busy and is not available to service controller requests (such as cache read misses, discussed in detail below). Additionally the storage controller cannot destage additional track images to the CSD cache until a commit reply is returned from the CSD subsystem. The NVS storage space dedicated to storing back-up copies of the modified records cannot be freed until a commit signal is returned from the CSD subsystem. Otherwise, without the NVS requirement, data integrity might be compromised because there would be no nonvolatile form of the data until it was written to HDA. In a system employing a storage controller without NVS storage, the RAID write penalty has the same effect, but the only way to ensure data integrity is to issue a commit to force the data to be written to HDA. Finally, upon finishing the RAID parity algorithm, the CSD subsystem issues a commit complete to the storage controller.
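The resource tie-up described above may be illustrated with a minimal model of the controller-side state under a synchronous commit. This is a sketch only: the class SyncCommitModel, its methods, and the notion of counted NVS "slots" are hypothetical simplifications and do not represent the internal design of the IBM 3990 or of any CSD subsystem.

```python
class SyncCommitModel:
    """Toy model of a storage controller waiting on synchronous commits."""

    def __init__(self, nvs_slots: int) -> None:
        self.nvs_free = nvs_slots  # NVS space for back-up copies (assumed unit)
        self.pending = {}          # logical device -> track awaiting commit reply

    def device_available(self, device: str) -> bool:
        # A logical device stays busy until its commit reply is returned.
        return device not in self.pending

    def host_write(self, device: str, track: int) -> None:
        if not self.device_available(device):
            raise RuntimeError("logical device busy: commit outstanding")
        if self.nvs_free == 0:
            raise RuntimeError("NVS full: waiting for a commit reply")
        self.nvs_free -= 1            # back-up copy held in NVS
        self.pending[device] = track  # record modified in CSD cache, commit issued

    def commit_reply(self, device: str) -> int:
        # Only the commit reply releases the device and the NVS space.
        track = self.pending.pop(device)
        self.nvs_free += 1
        return track


model = SyncCommitModel(nvs_slots=1)
model.host_write("dev0", track=7)
busy_during_commit = not model.device_available("dev0")
```

In the model, neither the logical device nor the NVS space becomes usable again until commit_reply is called, which corresponds to the completion of the RAID parity algorithm in the CSD subsystem.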
It should be apparent to one skilled in the art that waiting for a synchronous commit on data to be written directly to HDA in a system using a RAID architecture is inherently disadvantageous because of the mechanical lag time associated with each HDA. On the other hand, the commit and the writing of the data to HDA serve a useful purpose in ensuring data integrity and coherency, such that the tension between performance on the one hand and integrity and coherency on the other is significant in a system employing a RAID scheme.
In U.S. Pat. No. 4,875,155, Iskiyan et al. disclose a method for managing data in a peripheral subsystem in such a way that performance is balanced against the interest of ensuring data integrity. The '155 patent is herein incorporated by reference in its entirety for its disclosure related to general cache management in a mainframe environment. This patent is assigned to the assignee of the present invention. The '155 patent discloses an asynchronous destage of data in cache in which the order or priority of events is determined by optimizing how often a least recently used destage function is called. Essentially, as cache entries reach the bottom of LRU lists, the data needs to be destaged to DASD to make room for replacement data. An LRU scan function searches the LRU list to add modified entries to an LRU destage queue. According to the number of modified entries found, priority is determined for dispatching asynchronous destages. This method optimizes cache management, but does not address the problem of controller tie-up waiting for a commit or freeing nonvolatile storage before a commit is received. Nor does the '155 patent disclose a technique for avoiding extended control unit delays due to the RAID write penalty in a system employing a controller either with or without NVS.
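The general idea of an LRU scan that feeds a destage queue may be sketched as follows. This is not the method of the '155 patent: the function names, the scan depth, the dictionary layout of a cache entry, and the priority thresholds are all hypothetical and are chosen only to illustrate how a count of modified entries could drive a destage priority.

```python
from collections import OrderedDict, deque


def lru_scan(cache: OrderedDict, scan_depth: int) -> deque:
    """Queue the modified entries among the scan_depth least recently used.

    The OrderedDict is assumed to be ordered from least to most recently used.
    """
    destage_queue = deque()
    for key, entry in list(cache.items())[:scan_depth]:
        if entry["modified"]:
            destage_queue.append(key)
    return destage_queue


def destage_priority(num_modified: int, scan_depth: int) -> str:
    """Map the count of modified entries to a priority (assumed thresholds)."""
    if scan_depth <= 0:
        raise ValueError("scan_depth must be positive")
    ratio = num_modified / scan_depth
    if ratio >= 0.5:
        return "high"
    if ratio >= 0.25:
        return "medium"
    return "low"


# Hypothetical cache contents, least recently used first.
cache = OrderedDict(
    [
        ("track_a", {"modified": True}),
        ("track_b", {"modified": False}),
        ("track_c", {"modified": True}),
        ("track_d", {"modified": False}),
    ]
)
queue = lru_scan(cache, scan_depth=3)
priority = destage_priority(len(queue), scan_depth=3)
```

Here two of the three scanned entries are modified, so under the assumed thresholds the asynchronous destage would be dispatched at the highest priority.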
An article in International Business Machines Technical Disclosure Bulletin, by Beardsley et al., April 1990 at pages 70-75, describes cache management which employs a scheme to control cache by enabling commands under user control. The scheme involves a host operating system used to override the user to ensure that cache resources are not overallocated. The status of the cache is available to the storage controller on an asynchronous basis. Special interception commands make cache and NVS devices available and unavailable for use. However, the management of the actual storage space is not disclosed. Nor does this disclosure deal with the above-mentioned problems of controller tie-up and NVS space being unavailable while a commit reply is sought at the controller. The article also does not deal with the problem of minimizing a RAID write penalty in a cache hierarchy.
In U.S. Pat. No. 4,916,605 to Beardsley et al., and assigned to the assignee of the present invention, a technique is described for performing a fast write operation. The '605 patent is herein incorporated by reference in its entirety. The fast write technique is incorporated in the IBM 3990 in the form of a DASD fast write and a cache fast write. Both fast write capabilities permit write operations from a host computer channel to be implemented at cache speed. A DASD fast write maintains a copy of the data in NVS until it is destaged (i.e., written from cache to the DASD). Cache fast write is typically used with special kinds of data, such as temporary data created by a work file. Cache fast write does not use NVS. The present invention, while useful with any type of host initiated write operation, is especially useful with DASD fast writes. In the 3990-3, a DASD fast write requires that a commit request be sent from the control unit to the CSD after data has been written to cache memory in the CSD. Once the data is successfully destaged from the cache in the CSD to the RAID plurality of DASD HDAs, a commit reply is sent to the control unit indicating the data is written to HDA. For a host channel write operation, the control unit waits for the commit reply before presenting end status, thus completing the operation and freeing the control unit for handling other host initiated tasks. Additionally, the NVS space dedicated to back-up copies of the modified data is also unavailable until the commit reply is received.
It is clear from these references that the prior art is unconcerned with the new phenomenon of the RAID write penalty presented in a cache hierarchy employing a CSD having CSD cache and RAID DASD. Nor is the prior art concerned with the problems presented by employing a synchronous commit when data is destaged from controller cache to CSD cache. Finally, the prior art does not teach or suggest techniques to recover nonvolatile storage space without impacting data integrity, and while also overcoming the performance disadvantages of using a synchronous commit while employing a RAID scheme. When a CSD data storage library subsystem is coupled with a plurality of distributed host processors through one or more cache storage controllers, there is a clearly-felt need in the art for enhanced performance without degrading data integrity. The related unresolved deficiencies are solved by the present invention in the manner described below.