Mass storage technologies, including RAID (redundant arrays of independent disks), are commonly used in computer systems, especially for high-volume transaction processing. Such mass storage systems typically employ a large number of disk drives that are accessible by multiple host processors, either locally or over networks (often across great geographical distances).
Although disks provide large storage capacity at low cost, reading from and writing to them is comparatively slow, which limits system performance. The read/write head of a disk is mechanically positioned, and it must be moved to the correct cylinder on the disk surface before the appropriate data can be accessed. Other factors, including head settling time and latency due to multiple accesses, further increase the access time. Because reading from and writing to disks remains a relatively slow operation, most disk data storage systems use a cache memory to speed up the transfer of information to and from the disks. The cache memory is a high-speed memory that can store, at any given time, a portion of the data stored in the main memory of a host system for transfer to and from the data storage systems. The host processors must therefore interact efficiently with this high-speed cache on all transfers to and from storage.
Along with the complicated process of selecting information from the main memory to put in the cache, the system must maintain a mapping of what data is in the cache, and how this data is addressed. Many mass storage systems provide complete cache support, including the mapping of cache to main memory. This allows the host processors to simply request and receive data from the mass storage systems, without concern for cache maintenance and updating. For a read operation by a host processor, the host processor simply requests the data by address. If the requested data is in the cache, the data is provided directly from cache to the host processor without the extra time needed to obtain the data from the slower disks. If the data is not in the cache, then the mass storage system will have to obtain the data from disk and supply the data to cache in due time.
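The read path just described can be sketched as a read-through cache that keeps a mapping from track addresses to cached data. This is a minimal sketch under stated assumptions: the `(volume, cylinder, head)` address tuple, the `ReadCache` class and the `read_from_disk` callback are hypothetical names for illustration, and eviction is omitted entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical track address: (volume, cylinder, head).
TrackAddr = Tuple[int, int, int]


@dataclass
class ReadCache:
    """Read-through cache that keeps its own mapping of cached tracks."""
    read_from_disk: Callable[[TrackAddr], bytes]  # slow path (assumed callback)
    slots: Dict[TrackAddr, bytes] = field(default_factory=dict)  # cache mapping
    hits: int = 0
    misses: int = 0

    def read(self, addr: TrackAddr) -> bytes:
        """The host asks only by address; cache maintenance is hidden here."""
        if addr in self.slots:
            # Hit: serve directly from cache, no disk access needed.
            self.hits += 1
            return self.slots[addr]
        # Miss: obtain the data from disk, place it in the cache, then serve it.
        self.misses += 1
        data = self.read_from_disk(addr)
        self.slots[addr] = data
        return data
```

Reading the same address twice costs one disk access: the first call misses and stages the track, the second is served from `slots`.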
If data is written by a host processor back into memory, the changed data is written to the cache and later written back (often called "destaging") to the disks. The operation of writing changed data back to a disk is referred to as a write-back. To maintain data integrity, all write-backs must be performed expeditiously. If multiple hosts access common data, properly handling write-backs becomes even more important, and the mapping and record keeping required for write-backs become more complex.
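The write-back behaviour can be illustrated with a cache that accepts host writes immediately and destages changed tracks later. This is a minimal sketch, assuming a hypothetical `WriteBackCache` class, a `(volume, cylinder)` key and a dictionary that stands in for the physical disk; the ascending destage order is an arbitrary illustrative choice.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical track key: (volume, cylinder).
TrackAddr = Tuple[int, int]


class WriteBackCache:
    """Host writes land in cache; changed tracks are destaged to disk later."""

    def __init__(self) -> None:
        self.slots: Dict[TrackAddr, bytes] = {}
        self.dirty: Set[TrackAddr] = set()      # changed, write-back still owed
        self.disk: Dict[TrackAddr, bytes] = {}  # stand-in for the physical disk

    def host_write(self, addr: TrackAddr, data: bytes) -> None:
        """Fast path: only cache memory is touched; the disk is not."""
        self.slots[addr] = data
        self.dirty.add(addr)  # record that this track must be written back

    def destage(self, limit: int) -> List[TrackAddr]:
        """Write back up to `limit` dirty tracks; return the ones written."""
        done = sorted(self.dirty)[:limit]
        for addr in done:
            self.disk[addr] = self.slots[addr]
            self.dirty.discard(addr)  # disk now matches cache for this track
        return done
```

Separating `host_write` from `destage` is what lets the host continue without waiting for the disk; the cost is the bookkeeping in `dirty`, which grows more complex once several hosts share the data.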
This problem is prevalent in all systems using independent access arrays and becomes even more complicated when multiple redundant disks maintain the same data, i.e. RAID systems using "shadowing", "mirroring", parity-based, or other check-data-based redundancy techniques. A system using RAID provides a high level of protection from data loss because the data is stored redundantly on more than one disk, so a single disk failure will not cause a permanent loss of data. Further, RAID data systems can provide extra speed by allowing the multiple disks (mirrors) to operate independently of each other. For example, in RAID implementations with two disks maintaining identical copies of data, a random-access read can be performed on whichever disk volume (mirror) can access the requested data more quickly because its read/write head is closer to the proper track. With write-backs, the data must be written to both disk volumes, although not necessarily at the same time: data can be written back to one disk volume while the other is busy, and the other disk volume is written in due time. This mirrored data storage provides both the safety of redundant data and faster access from the increased availability of multiple sources for the requested data.
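The two mirror behaviours described above, reading from whichever mirror's head is nearest and deferring writes on a busy mirror, can be sketched as follows. The `Mirror` class, its `head_cylinder` and `busy` fields, and the assumption that head positions are known to the selector are all hypothetical simplifications for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Mirror:
    name: str
    head_cylinder: int  # current head position (assumed to be known)
    busy: bool = False
    pending: Set[int] = field(default_factory=set)  # cylinders owed a write-back


def choose_mirror_for_read(mirrors: List[Mirror], cylinder: int) -> Mirror:
    """Pick the idle mirror whose head is closest to the target cylinder."""
    idle = [m for m in mirrors if not m.busy]
    if not idle:
        raise RuntimeError("no mirror available")
    return min(idle, key=lambda m: abs(m.head_cylinder - cylinder))


def write_back(mirrors: List[Mirror], cylinder: int) -> None:
    """Every mirror must get the write, but not necessarily at the same time."""
    for m in mirrors:
        if m.busy:
            m.pending.add(cylinder)  # deferred: written "in due time"
        else:
            m.head_cylinder = cylinder  # written now; head moves to that cylinder
```

Note that a deferred write leaves `pending` non-empty on the busy mirror, which is exactly the per-mirror record keeping that the data table must track.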
Such high-availability systems, which are often accessed by many hosts and which maintain redundant copies of a volume on several mirrors, place a very high priority on fast data retrieval and writing. Any system or method that increases the speed of access to data elements is extremely valuable. The cache is accessed by many different processors and components, and undue delay in accessing the cache memory must be avoided. In typical systems, the number of parties trying to access the cache memory is very high. As previously stated, the host processors access the cache memory to read data and to write back changed data. In addition, the disks and mirrors also contend for the cache memory, both to stage data from the disks into the cache and to perform write-backs to the disks. In effect, access to the cache memory becomes the new bottleneck, replacing the bottleneck of slow host accesses directly to the disks.
Another problem with cache memory is the need to maintain the integrity of data elements in the cache memory that have changed and must be written back to the disks. A data table or structure must be maintained that indicates which data elements in the cache memory have changed and therefore need to be written back to one or more disk volumes. As host processors write changed data into the cache, the data table must be marked to indicate which data elements must be written back to the disk volumes. Later, when the disk volumes are available to destage the write-backs, the data table must be cleared to indicate that the data element on one or more disk volumes now corresponds to the data element in the cache memory. The data table or structure is therefore often kept in the same cache memory, in a special location that allows multiple entities to access, mark, and change it. Data tables may be created for several different types of data elements, including conventional I/O data elements and various forms of metadata or formatting information, for example information on the state of the track, the state of the logical volume, and the state of the data itself. A write-back of metadata may be referred to as a "format-back".
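One common way to represent such a data table is a per-track bitmask with one "write pending" bit per disk volume: a host write sets all bits, and each volume clears only its own bit when it destages. This is a sketch of that idea under assumptions; the `WritePendingTable` class, the `(cylinder, head)` key and the bitmask layout are illustrative and are not taken from the text above.

```python
from typing import Dict, List, Tuple

# Hypothetical logical-track key: (cylinder, head).
TrackAddr = Tuple[int, int]


class WritePendingTable:
    """Per-track bitmask: bit i set means mirror i still needs a write-back."""

    def __init__(self, num_mirrors: int) -> None:
        self.all_mirrors = (1 << num_mirrors) - 1
        self.bits: Dict[TrackAddr, int] = {}

    def mark(self, track: TrackAddr) -> None:
        """Host wrote the track into cache: every mirror now owes a write-back."""
        self.bits[track] = self.all_mirrors

    def clear(self, track: TrackAddr, mirror: int) -> None:
        """Mirror `mirror` destaged the track: its copy now matches the cache."""
        remaining = self.bits.get(track, 0) & ~(1 << mirror)
        if remaining:
            self.bits[track] = remaining
        else:
            self.bits.pop(track, None)  # written back on all mirrors

    def pending_for(self, mirror: int) -> List[TrackAddr]:
        """Tracks this mirror still has to write back, in ascending order."""
        return sorted(t for t, b in self.bits.items() if b & (1 << mirror))
```

Because each mirror clears only its own bit, one volume can destage while another is busy, and the entry disappears only once every copy matches the cache.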
To speed up reading from and writing to disks, many mass storage systems rely on proximity data so that reads and writes are performed at approximately similar locations on a disk. As previously described, disks employ a read/write head that moves back and forth across the disk surface from cylinder to cylinder. If data reads and writes are performed in an order that minimizes movement of the read/write head, the system avoids the long delays introduced by head seeks.
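The benefit of ordering by proximity can be made concrete by counting the cylinders the head travels. The sketch below compares arrival (FIFO) order with an elevator-style ordering (serve everything at or above the head going up, then sweep back down). The elevator scheme is a well-known example chosen here for illustration only; the text above does not say which ordering any particular system uses, and the function names are hypothetical.

```python
from typing import List


def total_seek(start: int, order: List[int]) -> int:
    """Sum of cylinder distances the head travels to service `order`."""
    pos, travelled = start, 0
    for cyl in order:
        travelled += abs(cyl - pos)
        pos = cyl
    return travelled


def elevator_order(start: int, cylinders: List[int]) -> List[int]:
    """Serve cylinders at or above the head going up, then the rest going down."""
    up = sorted(c for c in cylinders if c >= start)
    down = sorted((c for c in cylinders if c < start), reverse=True)
    return up + down
```

With the head at cylinder 50 and requests for cylinders 90, 10, 60, 20 and 95, arrival order moves the head 285 cylinders, while the elevator order 60, 90, 95, 20, 10 moves it 130.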
Therefore, many systems attempt to select and sort disk writes so as to minimize movement of the read/write head. The data table indicating write-backs is often organized so that a disk (disk volume) looking to perform a write-back can search for other nearby write-backs, thereby increasing speed. One such organization is known as a "write tree", in which the data is arranged in a tree-like structure sorted by a predetermined index, for example by cylinder number. The cylinder number may, for example, correspond to a real or virtual cylinder number on one or more disk volumes. The disk volumes can search the write tree by cylinder number for proximate write-backs and thereby minimize the head movement needed to write data to the disk volumes.
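The proximity search a disk volume performs on a write tree can be sketched with any ordered structure keyed by cylinder number. In this minimal sketch, a sorted Python list with binary search (`bisect`) stands in for the tree, since both give ordered, logarithmic-time lookup; the `WriteTree` class and its method names are hypothetical, and a real write tree would reside in cache memory and be shared among volumes.

```python
import bisect
from typing import List, Optional


class WriteTree:
    """Pending write-backs kept sorted by cylinder number.

    A sorted list searched with bisect stands in for the tree here.
    """

    def __init__(self) -> None:
        self.cylinders: List[int] = []

    def add(self, cylinder: int) -> None:
        i = bisect.bisect_left(self.cylinders, cylinder)
        if i == len(self.cylinders) or self.cylinders[i] != cylinder:
            self.cylinders.insert(i, cylinder)  # keep one entry per cylinder

    def nearest(self, head: int) -> Optional[int]:
        """Pending cylinder closest to the head position, or None if empty."""
        if not self.cylinders:
            return None
        i = bisect.bisect_left(self.cylinders, head)
        # Only the neighbours on either side of the head can be closest.
        candidates = self.cylinders[max(i - 1, 0):i + 1]
        return min(candidates, key=lambda c: abs(c - head))

    def remove(self, cylinder: int) -> None:
        """Drop an entry once its write-back has been performed."""
        i = bisect.bisect_left(self.cylinders, cylinder)
        if i < len(self.cylinders) and self.cylinders[i] == cylinder:
            del self.cylinders[i]
```

Each lookup touches several entries of the shared structure, which hints at why repeated searches by many volumes put load on the cache.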
However, searching a write tree in cache memory is a slow operation that requires multiple accesses to find the appropriate data. All these accesses reduce the performance of the cache memory while they are being performed. Using redundant logical volumes makes this problem worse, since each disk volume separately and independently searches the same write tree. Further, cache systems have many states in which particular write-backs cannot be performed at specific times. Examples of these states include sequential access operations on the disk volumes, or locks on particular sections of the disk volumes that purposely delay writes until a later time. It often happens that the write tree is searched for a write-back, but that particular write-back is then refused by the disk volume or by some other portion of the mass storage system. Many disk volume accesses to the write tree are therefore wasted: the detected write-back cannot be performed at that time, so the disk volume or disk controller discards that write-back attempt and performs other operations. All of this further degrades system performance.