1. Field of the Invention
This invention relates to managing write cache in a storage controller and more particularly relates to preventing write starvation in a partitioned write cache of a storage controller.
2. Description of the Related Art
The present invention is an improvement over the prior art, which includes the patent to Kevin J. Ash, U.S. Pat. No. 6,775,738, issued Aug. 10, 2004, which is hereby incorporated by reference. Storage controllers such as the Enterprise Storage Server® from International Business Machines manage storage and retrieval requests from host computers on a network to one or more storage devices. Storage devices may include hard disk drives in various forms such as a Direct Access Storage Device (“DASD”), a Redundant Array of Inexpensive/Independent Disks (“RAID”), and Just a Bunch of Disks (“JBOD”). A storage controller may also access other storage devices such as tape drives, optical drives, and the like.
Storage controllers typically include general cache memory (cache), which is volatile memory where the contents are lost if power is lost to the storage controller, upon reboot, etc. In addition, many storage controllers include a write cache in the form of non-volatile storage (“NVS”) that includes some form of backup power, such as a battery, to prevent loss of the contents upon loss of power, reboot, etc.
Typically, when a storage controller receives a request to store a file update or a complete file onto a storage device accessible to the storage controller, the storage controller uses a fast write operation to store the data. In a fast write operation, the storage controller writes one copy of the file update or complete file to the cache and one copy to the write cache and then notifies the host that the write process is complete. (For simplicity, hereafter the term update includes a file update, a complete file, or any other data requested to be stored on a storage device.) The storage controller then uses a destage process to copy the update from the cache to the target storage device. A fast write process is more efficient than maintaining a connection to a host while the update is written to the target storage device.
A copy of the update is stored in the write cache to ensure that the update is not lost if a power failure, system reboot, or other problem causes the contents of the cache to be lost prior to destaging the update to the target storage device. After the update is destaged to the storage device, the location of the update in the cache and the write cache may be allocated for another use. Typically the write cache has a substantially smaller storage space than the cache available for write operations.
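The fast write and destage sequence described above may be sketched as follows. This is a minimal illustration only, not the implementation of any particular storage controller; the class name FastWriteController, its methods, and the dictionary-based stores are hypothetical and are assumed purely for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FastWriteController:
    """Toy model of a fast write followed by a destage (all names hypothetical)."""

    cache: dict = field(default_factory=dict)        # volatile cache
    write_cache: dict = field(default_factory=dict)  # non-volatile copy (NVS)
    devices: dict = field(default_factory=dict)      # device id -> {track: data}

    def fast_write(self, device_id, track, data):
        """Store one copy in each cache, then report completion to the host."""
        key = (device_id, track)
        self.cache[key] = data
        self.write_cache[key] = data
        # Complete status is returned only after both copies exist.
        return "complete"

    def destage(self, device_id, track):
        """Copy the update from the cache to the target device, then free both slots."""
        key = (device_id, track)
        self.devices.setdefault(device_id, {})[track] = self.cache[key]
        # After destaging, both locations may be allocated for another use.
        del self.cache[key]
        del self.write_cache[key]
```

In this sketch the host sees "complete" before the target device is touched, which is the reason a fast write avoids holding the host connection open during the slower device write.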
Without any limitation on how much write cache a single storage device can use, a storage device may dominate usage of the write cache to the detriment of hosts requesting storage of updates on other storage devices. For example, where there are multiple storage devices connected to the storage controller, such as a RAID array, the cache and write cache may store updates intended for the multiple storage devices. If the write cache is substantially filled with updates for one target storage device and that target storage device fails, complete status cannot be returned for writes directed toward the surviving storage devices, because complete status is not returned unless the update is copied to both the cache and the write cache.
In another example, a storage device may dominate the write cache if the storage device processes storage requests at a slow rate. Other processes submitting storage requests may be delayed to the extent that updates are destaged to the slower, dominating storage device. Destage operations to a dominating storage device may be running at a slow rate if one or more disk drives in the storage device are being rebuilt as a result of a failed drive or if the updates to the dominating storage device in the write cache comprise mostly random (non-sequential) updates. Random updates may take longer to destage because they have longer disk access times and, in a RAID environment, require constant parity recalculations for each random update. One or more storage devices dominating the write cache and causing a delay in processing other storage requests may be termed write starvation.
An improvement was presented in the referenced patent to Ash (hereinafter “Ash”). In Ash, each storage device accessible to a storage controller is allotted a maximum percentage of the write cache that the storage device may use. First, the number of storage devices accessible to a storage controller is determined. A storage device write cache limit (NVS threshold) is then assigned to each rank. A rank may comprise a single storage device or a group of storage devices, for example in a RAID array or JBOD. Each rank may then be assigned a uniform storage device write cache limit or each rank may be assigned a different storage device write cache limit.
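The per-rank limit described above may be sketched as follows. This is an illustrative model under stated assumptions and is not the method claimed in Ash; the class PartitionedWriteCache, its methods, the use of segments as the unit of write cache, and the limit values in the usage lines are all hypothetical.

```python
class PartitionedWriteCache:
    """Toy write cache with a per-rank usage limit (all names hypothetical)."""

    def __init__(self, capacity, rank_limits):
        self.capacity = capacity              # total write cache segments
        self.rank_limits = dict(rank_limits)  # rank -> fraction of capacity
        self.used = {rank: 0 for rank in self.rank_limits}

    def try_reserve(self, rank, segments):
        """Reserve segments for a rank unless a limit would be exceeded."""
        rank_limit = self.rank_limits[rank] * self.capacity
        if self.used[rank] + segments > rank_limit:
            return False  # this rank has reached its write cache limit
        if sum(self.used.values()) + segments > self.capacity:
            return False  # the write cache as a whole is full
        self.used[rank] += segments
        return True

    def release(self, rank, segments):
        """Return segments to the pool after the updates are destaged."""
        self.used[rank] = max(0, self.used[rank] - segments)


# Illustrative values only: two ranks, each limited to one quarter of 100 segments.
wc = PartitionedWriteCache(100, {"rank0": 0.25, "rank1": 0.25})
wc.try_reserve("rank0", 25)  # accepted: rank0 is now at its limit
wc.try_reserve("rank0", 1)   # rejected: rank0 must destage before writing more
```

A rejected reservation in this sketch corresponds to a write that cannot yet be given complete status because no write cache copy can be made for that rank.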
For efficiency, the storage device write cache limits assigned to the ranks may together total more than 100% of the available write cache. Typically, if there are four or more ranks, the storage device write cache limit for each rank is 25%. Limiting the availability of write cache for a rank helps to solve the problem of a storage device or rank dominating the write cache and causing write starvation. However, the introduction of high capacity, low cost nearline storage devices presents an additional challenge in preventing write starvation. Nearline storage devices are a compromise between online storage devices and offline storage devices. Online storage devices may be characterized as having constant, very rapid access to data. Offline storage devices are characterized by infrequent access for backup purposes or long term storage.
Nearline storage devices, such as fiber channel ATA (Advanced Technology Attachment) drives or serial ATA drives, are attractive due to their low cost per byte. However, nearline storage devices have a different reliability characteristic than online, server-class storage devices, which exposes the nearline storage devices to failures when a server-class storage device workload is applied. Nearline storage devices compensate for their limitations by adjusting their operating behavior based on workload. To limit stress on mechanical parts and prevent subsequent failures of nearline storage devices, vendors have implemented methods to throttle the device activity. Throttling of device activity limits mechanical stress, but degrades response time characteristics and performance.
The introduction of nearline storage devices into the partitioned cache system described above creates a situation where write starvation may occur. If multiple nearline storage devices are accessible to a storage controller, each may be limited by an allotted storage device write cache limit, yet in combination the nearline storage devices may each use their full allotment and cause write lockup. For example, four or more nearline storage devices may be accessible to a storage controller and have a storage device write cache limit of 25%. As the workload increases on the nearline devices, the nearline storage devices may start to throttle back. As the nearline storage devices throttle back, destaging to the nearline storage devices takes more time than under optimum operation and each nearline storage device under load may then take up more write cache. Each of four nearline storage devices may take up to 25% of the write cache, so that together they dominate the entire write cache.
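The arithmetic behind the example above may be sketched as follows. The function free_write_cache and the capacity of 1000 segments are hypothetical and are chosen only to illustrate how four throttled ranks, each at a 25% limit, can leave no write cache for any other rank.

```python
def free_write_cache(capacity, limit_percent, stalled_ranks):
    """Return the segments left once each stalled rank fills its allotment.

    capacity       total write cache segments (hypothetical unit)
    limit_percent  per-rank write cache limit, as a whole percentage
    stalled_ranks  number of ranks that have filled their allotment
    """
    per_rank = capacity * limit_percent // 100
    occupied = min(capacity, per_rank * stalled_ranks)
    return capacity - occupied


# With a 25% limit, each additional stalled rank removes a quarter of the
# write cache; the fourth stalled rank leaves nothing for the other ranks.
remaining = [free_write_cache(1000, 25, n) for n in range(1, 5)]
```

Under these assumed values the list remaining holds 750, 500, 250, and 0 segments, so a write to a fifth, otherwise idle rank would have to wait for a destage to one of the throttled ranks.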
From the foregoing discussion, it should be apparent that a need exists for an apparatus, system, and method that prevent write starvation for a storage controller with access to nearline, low performance storage devices. Beneficially, such an apparatus, system, and method would limit the amount of write cache available to nearline, low performance storage devices accessible to a storage controller.