Redundant Array of Inexpensive Disk (RAID) systems have become the predominant form of mass storage systems in most computer systems today that are used in applications that require high performance, large amounts of storage, and/or high data availability, such as transaction processing, banking, medical applications, database servers, internet servers, mail servers, scientific computing, and a host of other applications. A RAID controller controls a group of multiple physical disk drives in such a manner as to present a single logical disk drive (or multiple logical disk drives) to a computer operating system. RAID controllers employ the techniques of data striping and data redundancy to increase performance and data availability.
One technique for providing high data availability in RAID systems is to include redundant fault-tolerant RAID controllers in the system. Providing redundant fault-tolerant RAID controllers means providing two or more controllers such that if one of the controllers fails, one of the other redundant controllers continues to perform the function of the failed controller. For example, some RAID controllers include redundant hot-pluggable field replaceable units (FRUs) such that when a controller fails, an FRU can be quickly replaced in many cases to restore the system to its original data availability level.
An important characteristic of RAID controllers, particularly in certain applications such as transaction processing or real-time data capture of large data streams, is to provide fast write performance. In particular, the overall performance of the computer system may be greatly improved if the write latency of the RAID controller is relatively small. The write latency is the time the RAID controller takes to complete a write request from the computer system.
Many RAID controllers include a relatively large cache memory for caching user data from the disk drives. Caching the data enables the RAID controller to quickly return data to the computer system if the requested data is in the cache memory since the RAID controller does not have to perform the lengthy operation of reading the data from the disk drives. The cache memory may also be employed to reduce write request latency by enabling what is commonly referred to as posted-write operations, or write-caching operations. In a posted-write operation, the RAID controller receives the data specified by the computer system from the computer system into the RAID controller's cache memory and then immediately notifies the computer system that the write request is complete, even though the RAID controller has not yet written the data to the disk drives. Posted-writes are particularly useful in RAID controllers, since in some redundant RAID levels a read-modify-write operation to the disk drives must be performed in order to accomplish the system write request. That is, not only must the specified system data be written to the disk drives, but some of the disk drives may also have to be read before the user data and redundant data can be written to the disks, which, without the benefit of posted-writes, may make the write latency of a RAID controller even longer than a non-RAID controller.
However, posted-write operations make the system vulnerable to data loss in the event of a failure of the RAID controller before the user data has been written to the disk drives. To reduce the likelihood of data loss in the event of a write-caching RAID controller failure in a redundant RAID controller system, the user data is written to both of the RAID controllers so that if one controller fails, the other controller can flush the posted-write data to the disks. Writing the user data to the write cache of both RAID controllers is commonly referred to as a mirrored write operation. If write-posting is enabled, then the operation is a mirrored posted-write operation.
Mirrored posted-write operations require communication between the two controllers to provide synchronization between the write caches of the two controllers to insure the correct user data is written to the disk drives. This cache synchronization communication may be inefficient. In particular, the communication may introduce additional latencies into the mirrored posted-write operation and may consume precious processing bandwidth of the CPUs on the RAID controllers. Therefore what is needed is a more efficient means for performing mirrored posted-write operations in redundant RAID controller systems.