1. Field of the Invention
This invention relates in general to data storage systems, and more particularly to an apparatus for reducing the overhead of cache coherency processing on each primary controller and increasing the overall throughput of the system.
2. Description of Related Art
Disk drive systems have grown enormously in both size and sophistication in recent years. These systems can typically include many large disk drive units controlled by a complex multi-tasking disk drive controller. A large-scale disk drive system can typically receive commands from a number of host computers and can control a large number of disk drive mass storage units, each mass storage unit capable of storing in excess of several gigabytes of data. There is every reason to expect that both the sophistication and size of the disk drive systems will increase.
As these systems grow in complexity, so does the user's reliance on them for fast and reliable storage and retrieval of data. Thus, it is more than a mere inconvenience to the user should the disk drive system go “down” or off-line; even if only one disk drive goes off-line, substantial interruption to the operation of the entire system can occur. For example, a disk drive storage unit may be part of a RAID array or of a mirrored system.
As computer systems have become larger, faster, and more reliable, there has been a corresponding increase in the need for storage capacity, speed, and reliability in the storage devices. Simply adding storage units to increase storage capacity causes a corresponding increase in the probability that any one unit will fail. On the other hand, increasing the size of existing units, absent any other improvements, tends to reduce speed and does nothing to improve reliability.
Recently there has been considerable interest in arrays of direct access storage devices configured to provide some level of data redundancy. Such arrays are commonly known as “RAIDs” (Redundant Array of Inexpensive Disks). RAID storage systems are commonly used in high-profile industries, such as the banking and airline industries, where the inability to access certain data for even a moment, let alone its loss, can spell disaster. RAID storage systems are often referred to as “fault-tolerant” due to their ability to access data even when one or more storage devices fail. RAID storage systems accomplish this by distributing redundant data, either duplicate copies or parity information, across multiple storage devices. RAID technology is independent of the type of storage device used, and thus may be applied to systems that use magnetic, optical, or semiconductor disk drives, large capacity tape drives, or a mix of different types of storage devices.
Several RAID architectures exist for providing redundant access to data. The particular RAID architecture used dictates both the format of the data across the multiple storage devices and the way in which the redundant data is accessed. RAID architectures are categorized into levels 1 through 5 according to the storage format.
In a level 1 RAID storage system, a duplicate set of data is stored on pairs of “mirrored” storage devices, so that identical copies of the data are stored on each storage device in each mirrored pair. The RAID level 1 storage system provides full redundancy and therefore high reliability, but it requires twice the storage space, which makes it costly and space-consuming.
In a level 2 RAID storage system, each bit of each data word, plus Error Detection and Correction (EDC) bits for each word, is stored on a separate storage device. Thus, in a 32-bit word architecture having 7 EDC bits, 39 separate storage devices are required to provide the redundancy. In this example, if one of the storage devices fails, the remaining 38 bits of each stored 39-bit word can be used to reconstruct each 32-bit word on a word-by-word basis as each data word is read from the storage devices, thereby obtaining fault tolerance. Although the redundancy is achieved not by duplicating the data but by reconstructing the inaccessible data from the accessible data, so that less actual storage space is required to achieve redundancy, the level 2 RAID storage system has the disadvantage that it requires one storage device for each bit of data and EDC, which can amount to a very large and costly system.
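The text does not say how the figure of 7 EDC bits is obtained. One common scheme that yields exactly that number for a 32-bit word is a single-error-correcting, double-error-detecting (SEC-DED) Hamming code; the sketch below assumes that scheme and simply counts check bits and devices. The function names are illustrative only.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction).

    Assumption: a classic Hamming code, where r check bits must be able to
    name every bit position in the codeword plus the "no error" case.
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r


def secded_edc_bits(data_bits: int) -> int:
    """Hamming check bits plus one overall parity bit for double-error detection."""
    return hamming_check_bits(data_bits) + 1


def raid2_device_count(word_bits: int) -> int:
    """Level 2 places each data bit and each EDC bit on its own storage device."""
    return word_bits + secded_edc_bits(word_bits)


if __name__ == "__main__":
    print(secded_edc_bits(32))     # 7
    print(raid2_device_count(32))  # 39
```

Under this assumption, 6 Hamming bits suffice for 32 data bits (2^6 = 64 ≥ 32 + 6 + 1), and the extra overall parity bit brings the total to 7, matching the 39-device example.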
In a level 3 RAID storage system, each storage device itself includes error detection means. This is often achieved using a custom-designed Application Specific Integrated Circuit (ASIC) within the storage device that provides built-in hardware error detection and correction capabilities. Level 3 RAID systems accordingly do not need the multiple EDC bits of level 2, which allows a simpler exclusive-or (XOR) parity scheme in which a single parity bit is generated for each set of data bits. Level 3 RAID storage systems thus require only one storage device to store parity information, which, in combination with the accessible data bits on the remaining storage devices, can be used to reconstruct the inaccessible data.
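The single-bit XOR parity described above can be sketched as follows. This is a minimal illustration, assuming each list element holds the bit (or, equally, a byte) stored on one data device; the helper names are hypothetical. Because XOR is its own inverse, XOR-ing the surviving values with the parity value yields the one missing value.

```python
from functools import reduce
from operator import xor
from typing import List, Optional


def parity(values: List[int]) -> int:
    """XOR of all values; for single bits this is the even-parity bit."""
    return reduce(xor, values, 0)


def reconstruct(values: List[Optional[int]], parity_value: int) -> int:
    """Rebuild the one missing value (marked None) from survivors and parity."""
    missing = [i for i, v in enumerate(values) if v is None]
    if len(missing) != 1:
        # A single parity device can recover from exactly one failed device.
        raise ValueError("XOR parity can rebuild exactly one missing value")
    survivors = [v for v in values if v is not None]
    return parity(survivors + [parity_value])


if __name__ == "__main__":
    data = [1, 0, 1, 1]             # bits on four data devices
    p = parity(data)                # bit written to the parity device
    damaged = [1, None, 1, 1]       # device 1 has failed
    print(reconstruct(damaged, p))  # 0, the lost bit
```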
In level 2 and level 3 RAID storage systems, each bit of the data and parity is transferred to and from its respective storage device in unison. In effect, this arrangement provides only a single read/write head actuator for the entire array. For large files, this arrangement has a high data transfer bandwidth, since each individual storage device actuator transfers part of a block of data, which allows an entire block to be accessed much faster than if a single storage device actuator were accessing the block. However, when the data files to be accessed are small, the random access performance of the drive array suffers, since only one data file at a time can be accessed by the “single” actuator.
A level 4 RAID storage system employs the same parity error correction scheme as the level 3 RAID architecture, but essentially decouples the individual storage device actuators to improve small-file access performance by reading and writing a larger minimum amount of data, such as a disk sector rather than a single bit, to each disk. This is also known as block striping. In the level 4 RAID architecture, however, writing a data block on any of the independently operating storage devices also requires writing a new parity block on the parity unit. The old parity information stored on the parity unit must be read and XOR'd with the old data (to “remove” the information content of the old data), and the result must then be XOR'd with the new data (to “add” the information content of the new data). Both the data and the parity records must then be rewritten to the disk drives. This process is commonly referred to as a “Read-Modify-Write” (RMW) operation. Thus, a READ and a WRITE on the single parity storage device occur each time a record is changed on any of the storage devices covered by a parity record on the parity storage device. The parity storage device becomes a bottleneck to data writing operations, since the number of changes to records that can be made per unit of time is limited by the access rate of the parity storage device rather than by the faster aggregate access rate provided by parallel operation of the multiple storage devices.
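The Read-Modify-Write parity update can be sketched at block level as below. This is a minimal illustration, assuming equally sized blocks held in memory; `xor_blocks` and `rmw_update` are hypothetical names, and real controllers perform the same arithmetic against disk sectors. The key point is that only the old data block and the old parity block are read, so the other data blocks in the stripe are never touched.

```python
from typing import List


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must be the same non-empty set of sizes")
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def rmw_update(data: List[bytes], parity: bytes, index: int, new_block: bytes) -> bytes:
    """Replace data[index] and return the new parity block.

    I/O pattern modelled here: READ old data, READ old parity,
    WRITE new data, WRITE new parity.
    """
    old_block = data[index]                                # READ old data
    new_parity = xor_blocks(parity, old_block, new_block)  # old parity ^ old data ^ new data
    data[index] = new_block                                # WRITE new data
    return new_parity                                      # caller WRITEs new parity


if __name__ == "__main__":
    stripe = [b"\x0f\x00", b"\xf0\x01", b"\x33\x02"]
    parity = xor_blocks(*stripe)
    parity = rmw_update(stripe, parity, 1, b"\xaa\x55")
    # The incrementally updated parity equals parity recomputed from scratch.
    print(parity == xor_blocks(*stripe))  # True
```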
A level 5 RAID storage system is similar to the level 4 RAID architecture in its parity error correction scheme and in its decoupling of the individual storage device actuators, but improves upon the performance of WRITE accesses by distributing the data and parity information over all of the available storage devices in a circular fashion. Accordingly, the number of WRITE operations that can be made per unit of time is no longer a function of the access rate of a single parity storage device, because the parity information is distributed across all the storage devices. Typically, “N+1” storage devices in a set, or “redundancy group”, are divided into a plurality of equally sized address areas referred to as blocks. Each storage device generally contains the same number of blocks. Blocks from each storage device in a redundancy group having the same unit address ranges are referred to as “stripes”. Each stripe has N blocks of data, plus one parity block on one storage device containing parity for the N data blocks of the stripe. Each further stripe likewise has a parity block, the parity blocks being distributed across different storage devices. Parity updating activity associated with every modification of data in a redundancy group is therefore distributed over the different storage devices. No single storage device is burdened with all of the parity update activity, and thus the parity storage device access bottleneck is diffused. For example, in a level 5 RAID system comprising five storage devices, the parity information for the first stripe of blocks may be written to the fifth drive; the parity information for the second stripe may be written to the fourth drive; the parity information for the third stripe may be written to the third drive; and so on. The parity block for succeeding stripes typically circles around the storage devices in a helical pattern.
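The five-drive example above can be sketched as a mapping from stripe number to parity drive. This is a minimal illustration using 0-based drive indices (so the "fifth drive" is index 4); the order in which data blocks fill the remaining drives is an assumption, since the text specifies only where parity lands, and real controllers use several different rotation layouts.

```python
from typing import List


def parity_drive(stripe: int, num_drives: int) -> int:
    """0-based index of the drive holding parity for a stripe.

    Stripe 0 -> last drive, stripe 1 -> next-to-last, and so on,
    wrapping around after every num_drives stripes.
    """
    return (num_drives - 1) - (stripe % num_drives)


def stripe_layout(stripe: int, num_drives: int) -> List[str]:
    """Label each drive's block in a stripe as parity ("P") or a data block.

    Assumption: data blocks are numbered consecutively across the array and
    fill the non-parity drives from left to right.
    """
    p = parity_drive(stripe, num_drives)
    first = stripe * (num_drives - 1)  # first data block number in this stripe
    layout, d = [], 0
    for drive in range(num_drives):
        if drive == p:
            layout.append("P")
        else:
            layout.append(f"D{first + d}")
            d += 1
    return layout


if __name__ == "__main__":
    for s in range(5):
        print(f"stripe {s}:", " ".join(stripe_layout(s, 5)))
```

Running the loop prints one row per stripe, with the `P` column stepping one drive to the left on each row, which is the circular placement the paragraph describes.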
The RAID storage systems described above all handle the problem of providing access to redundant data if one or more storage devices fail. However, early RAID storage systems provided only one storage device array controller. In such a system, if the controller fails, data is inaccessible regardless of the RAID architecture level, so storage of redundant data is rendered moot.
Increasingly, there is a need to provide access to stored information or data on hard disk drives (or other storage devices) from a plurality of host servers, and also to permit the data stored on any particular storage device to be accessed through alternative device controllers. Providing access to the data from multiple hosts would eliminate the need to store the data at more than one location (though the data may still be redundantly stored using known mirroring or Redundant Array of Independent Disks (RAID) techniques) and, in theory, ensures that identical data can be accessed by all interested parties. Providing access to a storage device through multiple controllers would provide redundant access to the device from an alternate (or second) controller, so that the data remains accessible in the event that the first controller fails.
A storage controller is a device capable of directing data traffic from the host system to one or more non-volatile storage devices. It may or may not have an intermediary cache to stage data between the non-volatile storage device and the host system. A caching controller (or caching storage controller) is a storage controller that uses an intermediary data storage device (the cache memory) to stage data between the non-volatile storage devices and the host system. In general, the intermediary storage device is built out of RAM to allow quicker access to the data. It also provides a buffer in which exclusive-or (XOR) operations can be completed for RAID 5 operations. Multiple active controllers are defined as a collection of storage controllers or caching storage controllers that work cooperatively with one another. They provide the ability to recover from a controller failure by allowing multiple paths to a storage volume.
A storage volume is a contiguous range of randomly accessible sectors of data. For practical purposes, the sector numbering starts at 0 and goes to N, where N+1 is the total number of sectors available to the host system. A data extent is a range of data within a storage volume delineated by a starting sector and an ending sector. The storage volume is broken up into a number of data extents, which need not be of equal size but may not overlap. These concepts are used in both the discussion of the background and the detailed description of embodiments of the invention.
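The volume and extent definitions above can be captured in a small data structure. This is a minimal sketch, assuming inclusive start and end sectors and a volume of `num_sectors` = N+1 sectors numbered 0 to N; `Extent`, `validate_extents`, and `find_extent` are hypothetical names chosen for illustration. The validator enforces the two stated rules (extents lie within the volume and do not overlap) while allowing extents of unequal size.

```python
import bisect
from typing import List, NamedTuple, Optional


class Extent(NamedTuple):
    start: int  # first sector of the extent, inclusive
    end: int    # last sector of the extent, inclusive


def validate_extents(extents: List[Extent], num_sectors: int) -> None:
    """Raise ValueError unless every extent lies in 0..num_sectors-1
    and no two extents overlap. Sizes may differ."""
    prev_end = -1
    for ext in sorted(extents):  # NamedTuples sort by (start, end)
        if ext.start > ext.end:
            raise ValueError(f"reversed extent {ext}")
        if ext.start < 0 or ext.end >= num_sectors:
            raise ValueError(f"extent {ext} lies outside the volume")
        if ext.start <= prev_end:
            raise ValueError(f"extent {ext} overlaps a previous extent")
        prev_end = ext.end


def find_extent(extents: List[Extent], sector: int) -> Optional[Extent]:
    """Return the extent containing a sector, or None if no extent covers it."""
    ordered = sorted(extents)
    starts = [e.start for e in ordered]
    i = bisect.bisect_right(starts, sector) - 1  # last extent starting at or before sector
    if i >= 0 and ordered[i].end >= sector:
        return ordered[i]
    return None


if __name__ == "__main__":
    volume = [Extent(0, 9), Extent(10, 49), Extent(50, 99)]  # N = 99
    validate_extents(volume, num_sectors=100)
    print(find_extent(volume, 10))  # Extent(start=10, end=49)
```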
Caching storage controllers that work independently of one another to store information or data to a secondary storage unit, such as a hard disk drive or tape unit, are conventionally available. There are also caching storage controllers that work with one or more other controllers to provide multiple-controller access to a secondary storage unit and provide a fault-tolerant environment. If two controllers are simultaneously providing access to a common set of storage devices and each is able to take over the other's functionality in the event of a failure, then those controllers are referred to as active-active or dual-active controllers.
Traditionally, RAID storage subsystems employ either internal or external controllers. Typical designs of external dual-active RAID controllers allow the controllers to share one or more buses on the back end, or disk side, of the controller. However, in such designs the overhead of cache coherency processing on each primary controller is too high, and the overall throughput of the system is too low.
It can be seen then that there is a need for a controller and controller system for reducing the overhead of cache coherency processing on each primary controller and increasing the overall throughput of the system.