1. Field of the Invention
The invention relates generally to storage systems and more specifically relates to methods and structures for utilizing write-back caching features of disk drives in a storage subsystem so as to improve system performance while maintaining reliability.
2. Discussion of Related Art
Storage systems typically incorporate local storage controller features within the storage system and a plurality of storage devices such as disk drives for storing significant volumes of user data. User data is communicated from attached host systems through read and write I/O requests processed by the storage controller within the storage subsystem. The requests record (write) or retrieve (read) previously recorded data on the storage devices of the storage subsystem. Frequently, the storage devices in such storage subsystems are magnetic disk drives. A local disk controller is typically incorporated within each such disk drive and is adapted to control low level operations of the disk drive itself—operations such as controllably rotating the magnetic storage medium, controllably actuating control mechanisms to position a read/write head assembly and read/write channel electronics to record information on the disk drive storage medium or to retrieve information from the magnetic storage medium.
As is generally known in the art, it is common to utilize cache memory in such storage subsystems to improve performance of the storage subsystem as perceived by the attached host systems. Typically, a storage controller of such a storage subsystem utilizes cache memory to process requests from host systems. Using such a storage controller cache memory, write I/O requests are processed by recording the user supplied data in a write portion of the cache memory of the storage controller. A return status indicating successful completion of the host system write request is then returned to the host system to complete the request. In like manner, a read request may be processed by first looking for the requested data in the storage controller's cache memory. If found, the requested data is returned to the host system from the copy in cache memory. Since the recording and retrieving of such user data in a semiconductor cache memory is much faster than the time required for recording the data on the magnetic disk media of the associated disk drives, the host systems perceive much faster response from the storage subsystem and hence higher overall performance.
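The read and write handling described above may be sketched as follows. This is a minimal illustration with hypothetical names, not a description of any particular controller implementation: writes land in the controller's cache, and reads are served from the cache on a hit, falling through to the much slower disk media only on a miss.

```python
# Hypothetical sketch of a storage controller's cache handling of host I/O.
class ControllerCache:
    def __init__(self):
        self.blocks = {}       # block address -> cached data
        self.disk_reads = 0    # counts slow accesses to the disk media

    def write(self, addr, data):
        # Write request: record the supplied data in cache memory.
        self.blocks[addr] = data

    def read(self, addr, disk):
        # Read request: return the cached copy when present (a cache hit);
        # otherwise fall through to the disk media (a cache miss).
        if addr in self.blocks:
            return self.blocks[addr]
        self.disk_reads += 1
        return disk[addr]

cache = ControllerCache()
disk = {7: b"old", 8: b"other"}
cache.write(7, b"new")
assert cache.read(7, disk) == b"new"   # hit: served from cache
assert cache.disk_reads == 0
assert cache.read(8, disk) == b"other" # miss: served from disk media
assert cache.disk_reads == 1
```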
In many storage applications, reliability of the stored data is critical. Once the data is communicated from a host system, the host system and applications may rely on the storage subsystem to properly and persistently record the supplied data on the magnetic storage media of the disk drive. Numerous techniques are known in the art to assure such reliability. For example, the cache memory of the storage controller is typically a nonvolatile semiconductor memory (i.e., battery backed up or protected by other mechanisms that assure integrity of the data stored in the controller's cache memory despite potential loss of power to the storage subsystem). Further, in case of catastrophic failure of a single storage controller in the storage subsystem, an alternate or redundant storage controller is often provided. The primary and alternate storage controllers may be cooperative such that one controller is active at any given time and the other is passive in a backup mode waiting to take over in case of failure of the primary controller. Further, the redundant storage controllers may be configured such that both controllers are simultaneously and independently active processing I/O requests while also coordinating to permit takeover by one controller of the other controller's responsibility.
It is also generally known that present-day disk drives have local disk controller logic that includes substantial local cache memory for use in control of the disk drive itself. Local cache memory in a disk drive is used in a manner similar to that of the storage controller. However, the cache memory associated with the local disk controller is most typically implemented using volatile memory components to maintain a lower cost for the disk drive components. Since the local cache memory of the disk controller is typically a volatile memory structure, loss of power or other failures of the disk drive could result in loss of the data stored in the disk controller's local cache memory.
Generally, cache memory is used in either of two modes: write-through mode and write-back mode. For example, when using the cache memory of a storage controller in the write-through method, a host sends a write request to the storage system; the storage controller of the storage system receives the data from the host and saves it in its cache memory. The storage controller then writes the received data from its cache memory to one or more disk drives, and returns a status back to the host. Write-through is also used in writing data from a storage controller to the individual disk drives. The storage controller sends a block oriented write request to each affected disk drive. The disk drive controller receives the data from the storage controller into its local cache memory (typically lower cost volatile RAM as compared to the non-volatile memory used for the storage controller's cache memory). The disk controller immediately writes the data to magnetic/optical persistent storage media, and returns a status back to the storage controller, which then returns an appropriate status to the host system.
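The write-through sequence above may be summarized in a short sketch. The function and variable names are hypothetical illustrations of the ordering constraint, not any particular implementation: successful status is returned only after the data has been committed to persistent media.

```python
# Hypothetical sketch of write-through mode: the status returned to the host
# is not issued until the data is on persistent storage media.
def write_through(host_data, addr, controller_cache, disk_media):
    controller_cache[addr] = host_data   # 1. data lands in the controller cache
    disk_media[addr] = host_data         # 2. immediately committed to media
    return "SUCCESS"                     # 3. only now is status returned

cache, media = {}, {}
assert write_through(b"payload", 42, cache, media) == "SUCCESS"
assert media[42] == b"payload"           # data is already persistent on return
```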
In the write-back cache management method as between a host device and a storage system controller, a host sends a write request to the storage system and the storage controller receives the data from the host and saves it in its cache memory. The storage controller then immediately returns a successful status to the host. The supplied data is then securely stored in the non-volatile cache memory of the storage controller. The host may then proceed with other requests or processing. Some time later, the storage controller writes (often referred to as “flushing” or “committing”) the data from its cache memory to the affected disk/disks. The non-volatile cache memory of the storage controller retains the data until proper commitment to the disk drives is verified. The write-back method allows data from multiple host requests to remain in cache until it is convenient or necessary to write the cached data from the storage controller's cache memory to the affected disk drives.
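By contrast, the write-back sequence described above may be sketched as follows (again with hypothetical names): status is returned as soon as the data resides in the controller's non-volatile cache memory, and the data is flushed, or committed, to the disk drives at some later time, being retained in cache until commitment is verified.

```python
# Hypothetical sketch of write-back mode: status returns once the data is in
# (non-volatile) controller cache; flushing to the disks happens later.
class WriteBackController:
    def __init__(self):
        self.nv_cache = {}   # stands in for the battery-backed cache memory
        self.media = {}      # stands in for the disk drives

    def host_write(self, addr, data):
        self.nv_cache[addr] = data
        return "SUCCESS"     # host may proceed immediately

    def flush(self):
        # Later, commit each dirty block; retain it in non-volatile cache
        # until proper commitment to the media is verified.
        for addr, data in list(self.nv_cache.items()):
            self.media[addr] = data
            if self.media.get(addr) == data:   # commitment verified
                del self.nv_cache[addr]

c = WriteBackController()
assert c.host_write(1, b"a") == "SUCCESS"
assert 1 not in c.media                 # not yet on disk at completion time
c.flush()
assert c.media[1] == b"a" and not c.nv_cache
```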
Regardless of the mode of cache usage of the storage controller's cache memory with respect to host requests, the write-through mode is presently relied on for lower level operations between the storage controller and the local cache memory of the disk drive controller. Since the local cache memory of the disk drive controller uses volatile memory for its local cache memory features, the storage controller must rely only on the write-through mode to assure that the data is successfully flushed or committed before removing or altering the data stored in its non-volatile cache memory.
In general, write-back mode usually performs better than write-through mode because the device receiving the data into its cache memory may perform local optimizations based on the volume of data stored in its cache. In general, more cached data allows the device caching the data to make more local decisions to optimize its performance.
System performance issues relating to cached data are even more critical in the context of RAID storage systems (Redundant Array of Independent Disks). In RAID systems, stored data is combined with redundancy information to permit continued operation of the system despite failure of any single disk drive. In RAID “level 5” storage management techniques, data may be striped (distributed) over a plurality of disk drives operating in parallel. Parity data (redundancy data) is also striped over the drives in such a manner as to allow continued operation despite failure of a single disk drive. When writing data to such a RAID storage system, the storage controller frequently must read previously stored data, update the parity values and then write the new data along with the updated parity information back to the affected disk drive or disk drives. Hence, the data to be written along with previous data and the updated parity data may reside in the storage controller's cache memory until it is successfully flushed or committed to the persistent storage media of the affected disk drives.
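The read-modify-write sequence described above can be made concrete with a small sketch of a RAID level 5 partial-stripe update. This is an illustrative simplification (the drive layout and names are hypothetical): the controller reads the old data and old parity, derives the new parity by XOR, and writes both the new data and the updated parity back.

```python
# Hypothetical sketch of a RAID level 5 read-modify-write parity update.
def xor_blocks(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_update(drives, data_idx, parity_idx, new_data):
    old_data = drives[data_idx]       # read previously stored data
    old_parity = drives[parity_idx]   # read previously stored parity
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    drives[data_idx] = new_data       # write the new data
    drives[parity_idx] = new_parity   # write the updated parity

# Three data strips plus one parity strip; parity is the XOR of the data.
d = [b"\x01", b"\x02", b"\x04", b"\x07"]   # d[3] is parity: 1 ^ 2 ^ 4 = 7
raid5_update(d, data_idx=0, parity_idx=3, new_data=b"\x09")
# After the update, parity still equals the XOR of all data strips, so the
# stripe can be reconstructed despite failure of any single drive.
assert d[3] == xor_blocks(xor_blocks(d[0], d[1]), d[2])
```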
One important measure of improved performance is the write response time. Response time as used herein is the elapsed time from when a host issues a write request to the storage system until successful status has been returned. In both write-back and write-through caching methods, this response time includes the time to transfer the data from the host to the storage controller's cache memory. Write-through cache mode response time, unlike write-back cache mode response time, also includes the time to perform any necessary RAID related reads, generate updated RAID parity, and write the data and updated parity to disk. With a shorter response time, the application running on the host can continue processing sooner, thereby allowing it to start the next I/O sooner. In write-back mode, the RAID storage system still incurs the overhead of RAID reads, RAID parity generation, and the time required to write the data to disk, but it can optimize those activities. In write-back mode, cached data from multiple writes may be concatenated or grouped to make RAID write methods more efficient—thereby improving efficiency by writing more data in a single operation. With write-through cache operations, this capability is limited to the current number of queued write requests in the storage controller's cache memory. Similarly, so called “elevator-sorting” methods used to reduce disk seek times work better for write-back cached data, in large part because there is a larger selection of data blocks to choose from.
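The elevator-sorting idea mentioned above can be sketched briefly. The function below is a hypothetical illustration, not a drive's actual scheduler: given a set of pending block addresses and the current head position, it services blocks at or beyond the head in ascending order, then sweeps back for the rest, reducing total seek distance. With write-back caching there are more dirty blocks pending, so the sweep covers more requests per pass.

```python
# Hypothetical sketch of elevator ("SCAN") ordering of pending block writes.
def elevator_order(pending_addrs, head_pos):
    # Sweep forward from the head position (ascending addresses), then
    # reverse direction for the blocks behind the head.
    ahead = sorted(a for a in pending_addrs if a >= head_pos)
    behind = sorted((a for a in pending_addrs if a < head_pos), reverse=True)
    return ahead + behind

order = elevator_order([95, 10, 180, 40, 120], head_pos=50)
assert order == [95, 120, 180, 40, 10]   # one forward sweep, then back
```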
Disk drive controllers in conventional disk drives (i.e. those with rotating media) are capable of both write-through and write-back cache management methods but, as noted above, their local cache memory is volatile instead of non-volatile to reduce cost of a disk drive unit. It is also cost prohibitive to protect all of the drives in an entire RAID storage system with an uninterruptible power supply (UPS). The power requirements to maintain operation of all disk drives in a large RAID storage subsystem would be too large for practical UPS solutions. As a result, write-back caching on a disk drive, though frequently available, is not used because of the potential for data loss in the event of a power failure or reset.
Some RAID storage system vendors allow a system administrator to enable write-back caching on the drives to improve performance, with the understanding that data loss will occur if a drive is reset or loses power. With this understanding, the system administrator configures the associated disk drives for non-critical and/or temporary data storage—typically data that would not require a significant amount of time to recreate.
It is evident from the above discussion that a need exists for an improved cache management method and structure to allow better utilization of write-back cache operations in disk controllers of disk drives coupled to storage system controllers and in particular RAID storage controllers.