1. Field of the Invention
This invention relates in general to storage systems, and more particularly to a method and apparatus for disk caching for an intermediary controller.
2. Description of Related Art
Computing systems frequently are provided with storage subsystems having multiple storage devices connected to the computing system central processor through a device controller. For example, some computing systems include a plurality of disks arranged into a disk array with parity and sparing. Parity refers to organizing data into parity groups such that each modification of disk data involving a relatively small write operation requires a read-old-data, read-old-parity, write-new-data, write-new-parity sequence of operations, often referred to as a read-modify-write sequence. Sparing refers to providing spare data blocks to be used in the event of a disk failure.
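The small-write parity update described above can be sketched using XOR parity. This is a simplified illustration only; the function and variable names are assumptions, not details of any particular disk array.

```python
def read_modify_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Compute the new parity block for a small write in an XOR parity group.

    The four steps of the sequence are: (1) read old data, (2) read old
    parity, (3) write new data, (4) write new parity, where
    new_parity = old_parity XOR old_data XOR new_data.
    """
    return bytes(p ^ od ^ nd for p, od, nd in zip(old_parity, old_data, new_data))

# Hypothetical one-byte blocks for illustration
old_data = bytes([0b1010])
old_parity = bytes([0b0110])
new_data = bytes([0b1100])
new_parity = read_modify_write(old_data, old_parity, new_data)
```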
A disk array controller is provided between the disk array and the computing system central processor unit (CPU) and includes a nonvolatile cache memory. A cache memory provides a fast, limited-size temporary storage for data and can reduce the number of times a disk must be accessed to retrieve a requested data block. As applications running in the CPU request blocks of data from the disk array, the disk array controller checks a cache directory to determine whether a copy of the requested data block is in the cache memory of the controller. If the disk array controller determines that the cache memory contains the most recent copy of the data block, referred to as a cache hit, then the controller provides the data block to the requesting application from the cache memory rather than from the particular disk where the data block is located.
If the most recent copy of the data block is not in the cache memory, referred to as a cache miss, then the disk array controller consults the cache memory directory to find a cache memory location containing a block that can be replaced, or overwritten, because the data in that location also resides on a disk. The controller reads a copy of the requested data block from the disk and puts it in the cache memory location to be overwritten. Lastly, the controller updates the cache directory to indicate that the old data block is no longer in the cache memory location and that the new data block has taken its place. Once the new data block is in the cache memory, it can be modified and updated.
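The hit and miss handling described above can be sketched as follows. This is an illustrative model only; the class name, the least-recently-used replacement choice, and the `disk` dictionary standing in for the disk array are assumptions, not details of any particular controller.

```python
from collections import OrderedDict

class ControllerCache:
    """Illustrative sketch of a controller cache directory with
    hit/miss handling and replacement of blocks that also reside on disk."""

    def __init__(self, capacity: int, disk: dict):
        self.capacity = capacity
        self.directory = OrderedDict()  # block id -> data, most recent last
        self.disk = disk                # stand-in for the disk array

    def read_block(self, block_id):
        if block_id in self.directory:              # cache hit
            self.directory.move_to_end(block_id)
            return self.directory[block_id]
        # Cache miss: find a location whose block can be overwritten
        # because the data also resides on disk (least recently used here).
        if len(self.directory) >= self.capacity:
            self.directory.popitem(last=False)      # old block no longer cached
        data = self.disk[block_id]                  # read copy from disk
        self.directory[block_id] = data             # update cache directory
        return data
```

Once the new block is in the directory, subsequent reads of it are serviced from the cache until it is chosen for replacement.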
Disk arrays with cache memory are desirable because they increase the efficiency of expensive computing systems by reducing the number of times data blocks must be accessed from a disk. Accesses of data from a disk are typically slower than accesses of data from a cache memory. Therefore, getting data from a cache memory permits a computing system to carry out processing faster than is possible when getting the data from disk. This increased efficiency reduces the cost of operations.
However, new disk controllers are being developed wherein the new controller is inserted between a host system and a legacy disk controller to allow new host types to use legacy disk controllers and storage devices. Thus, these new controllers act as an intermediary between hosts and controller units. These intermediary controllers also provide additional caching.
Nevertheless, intermediary controllers present an interesting multi-level cache scenario. Data may be cached in the intermediary controller, in a hard disk of the intermediary controller, or in a disk controller cache. Access to the intermediary controller hard disk takes longer than a hit in the disk controller cache, yet access to the intermediary controller hard disk is still much faster than a read from a disk of the disk controller. The intermediary controller hard disk is also much larger than the disk controller cache.
Accordingly, access to the intermediary controller hard disk is more costly than access to the level below it in the hierarchy, the disk controller cache. Thus, the performance appears as if two levels have been swapped in a multi-level cache, i.e., the slower but larger level is higher in the hierarchy.
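The inverted cost relationship can be illustrated with hypothetical latency figures. The numbers below are assumptions chosen only to show the ordering, not measurements of any actual system.

```python
# Hypothetical access latencies (milliseconds) for each level.
LATENCY_MS = {
    "intermediary_cache": 0.1,      # intermediary controller cache memory
    "intermediary_hard_disk": 5.0,  # higher in the hierarchy, but slower...
    "disk_controller_cache": 1.0,   # ...than this level below it
    "legacy_disk": 20.0,            # slowest: a read from a legacy disk
}

def access_cost(level: str) -> float:
    """Return the assumed access cost for a cache level."""
    return LATENCY_MS[level]

# The "swap": the intermediary hard disk costs more than the disk
# controller cache beneath it, yet far less than a legacy disk read.
assert access_cost("intermediary_hard_disk") > access_cost("disk_controller_cache")
assert access_cost("intermediary_hard_disk") < access_cost("legacy_disk")
```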
As a result, the best use of the intermediary controller cache must be addressed. One approach is to always use the intermediary controller hard disk and process hits from there instead of going to the disk controller. This means some hits would be slower than if they had been serviced from the disk controller cache; however, it also avoids the slower legacy disk accesses. Hence, it would tend to smooth overall performance, with a possible penalty in some scenarios. The opposite approach is to not process hits via the intermediary controller hard disk at all and to rely entirely on the disk controller cache. Scenarios can clearly be found for which one of the above policies is best, but predicting workloads and configurations is difficult.
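A rough expected-access-time model illustrates why neither fixed policy dominates for all workloads. The latencies, hit rates, and policy names below are hypothetical assumptions for illustration, not part of the disclosure.

```python
def expected_ms(policy: str, p_int: float, p_ctrl: float,
                t_int: float = 5.0, t_ctrl: float = 1.0,
                t_disk: float = 20.0) -> float:
    """Expected service time per request under a fixed caching policy.

    p_int:  hit rate in the intermediary controller hard disk
    p_ctrl: hit rate in the disk controller cache for requests
            that are not serviced by the intermediary hard disk
    """
    if policy == "use_intermediary":
        # Service hits from the intermediary hard disk; misses fall
        # through to the disk controller (cache hit or legacy disk read).
        return p_int * t_int + (1 - p_int) * (p_ctrl * t_ctrl + (1 - p_ctrl) * t_disk)
    # "bypass_intermediary": rely entirely on the disk controller cache.
    return p_ctrl * t_ctrl + (1 - p_ctrl) * t_disk
```

With a high intermediary hit rate and a weak controller cache (for example, `p_int = 0.9`, `p_ctrl = 0.5`) the first policy wins; with the rates reversed the second policy wins, which is why predicting the best policy in advance is difficult.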
It can be seen then that there is a need for a method and apparatus for disk caching in an intermediary controller that will utilize the hardware best for varying scenarios.
It can also be seen that there is a need for a method and apparatus for disk caching in an intermediary controller that is based on system performance.