Redundant Array of Inexpensive Disks (RAID) systems have become the predominant form of mass storage in computer systems used in applications that require high performance, large amounts of storage, and/or high data availability, such as transaction processing, banking, medical applications, database servers, internet servers, mail servers, scientific computing, and a host of other applications. A RAID controller controls a group of physical disk drives so as to present a single logical disk drive (or multiple logical disk drives) to a computer operating system. RAID controllers employ data striping and data redundancy to increase performance and data availability.
An important characteristic of RAID controllers, particularly in applications such as transaction processing or real-time capture of large data streams, is fast write performance. In particular, the overall performance of the computer system may be greatly improved if the write latency of the RAID controller is relatively small. Write latency is the time the RAID controller takes to complete a write request from the computer system.
Many RAID controllers include a relatively large cache memory for caching user data from the disk drives. Caching enables the RAID controller to return requested data to the computer system quickly when that data is in the cache memory, because the controller does not have to perform the lengthy operation of reading it from the disk drives. The cache memory may also be employed to reduce write latency through what are commonly referred to as posted-write operations. In a posted-write operation, the RAID controller transfers the data specified by the computer system into the RAID controller's cache memory and then immediately notifies the computer system that the write request is complete, even though the RAID controller has not yet written the data to the disk drives. Posted-writes are particularly useful in RAID controllers because some redundant RAID levels require a read-modify-write operation on the disk drives to accomplish a system write request: not only must the specified user data be written to the disk drives, but some of the disk drives may first have to be read before the user data and redundant data can be written. This can make the write latency of a RAID controller even longer than that of a non-RAID controller.
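To make the read-modify-write step concrete, the sketch below assumes a RAID 5 style layout in which the redundant data is the byte-wise XOR parity of the data blocks in a stripe; the text above does not name a specific RAID level, and the function names are hypothetical. It shows why a single small host write costs two disk reads plus two disk writes.

```python
from functools import reduce
from typing import List, Tuple


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (the parity function assumed here)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_modify_write(stripe: List[bytes], parity: bytes,
                      index: int, new_data: bytes) -> Tuple[List[bytes], bytes]:
    """Update one data block of a parity-protected stripe.

    One host write turns into two disk reads (old data, old parity)
    followed by two disk writes (new data, new parity).
    """
    old_data = stripe[index]       # disk read 1: old user data
    old_parity = parity            # disk read 2: old parity
    # XOR the old data out of the parity and the new data in.
    new_parity = xor_blocks(old_parity, old_data, new_data)
    new_stripe = list(stripe)
    new_stripe[index] = new_data   # disk write 1: new user data
    return new_stripe, new_parity  # disk write 2: new parity


stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(*stripe)
stripe, parity = read_modify_write(stripe, parity, 1, b"\xff\x00")
print(parity == xor_blocks(*stripe))  # True
```

The incremental update avoids reading every block in the stripe, but the two reads still sit on the critical path of the write, which is the latency a posted-write hides from the host.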
However, posted-write operations make the system vulnerable to data loss in the event of a power failure, because the cache memory is volatile: any user data in it that has not yet been written to the disk drives is lost when power is lost.
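The window of vulnerability can be illustrated with a toy model of a posted-write cache. This is a minimal sketch, assuming the disk drives and the cache can each be represented as a dictionary from block number to data; the class and method names are hypothetical and do not describe any particular controller.

```python
from typing import Dict, Set


class PostedWriteCache:
    """Hypothetical toy model of a controller with a volatile write-back cache."""

    def __init__(self) -> None:
        self.disk: Dict[int, bytes] = {}   # non-volatile: survives power loss
        self.cache: Dict[int, bytes] = {}  # volatile: lost on power loss
        self.dirty: Set[int] = set()       # acknowledged but not yet on disk

    def write(self, block: int, data: bytes) -> str:
        # Posted write: copy into the cache and acknowledge immediately,
        # before anything reaches the disk drives.
        self.cache[block] = data
        self.dirty.add(block)
        return "complete"

    def flush(self) -> None:
        # Later, in the background, write the dirty blocks to disk.
        for block in sorted(self.dirty):
            self.disk[block] = self.cache[block]
        self.dirty.clear()

    def power_loss(self) -> None:
        # With no battery or other backup, the volatile cache contents vanish.
        self.cache.clear()
        self.dirty.clear()


ctl = PostedWriteCache()
print(ctl.write(7, b"ledger"))  # complete (the host sees the write as done)
ctl.power_loss()                # main power fails before flush() runs
print(7 in ctl.disk)            # False: the acknowledged data is gone
```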
To solve this problem, some RAID controllers include a battery that continues to provide power to the cache memory in the event of a loss of main power. Typically, the system automatically notifies a system administrator, who attempts to restore power to the system. Although the battery greatly reduces the likelihood that user data will be lost, its charge is finite, so the possibility remains that battery power will run out before main power can be restored, in which case the user data will be lost. To avoid this possibility, other RAID controllers include some form of non-volatile memory, such as a FLASH memory or a small disk drive. When main power is lost, the RAID controller, running on battery power, copies the cache memory contents to the FLASH memory and then disables battery power. When main power is restored, the RAID controller restores the pre-outage contents of the cache memory from the FLASH memory so that the posted-writes can be completed and the user data can be made available again.
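The backup-and-restore sequence described above can be sketched as two event handlers. This is a simplified model under stated assumptions: the whole cache image is copied in one step, the battery is assumed to last long enough to finish the copy, and the class and method names are hypothetical.

```python
from typing import Dict, Optional


class FlashBackedController:
    """Hypothetical model of a volatile cache backed up to FLASH on power loss."""

    def __init__(self) -> None:
        self.cache: Dict[int, bytes] = {}              # volatile cache memory
        self.flash: Optional[Dict[int, bytes]] = None  # non-volatile backup image

    def main_power_lost(self) -> None:
        # Running on battery: copy the whole cache image to FLASH ...
        self.flash = dict(self.cache)
        # ... then battery power is disabled, so the volatile cache is lost.
        self.cache = {}

    def main_power_restored(self) -> None:
        # During boot, copy the pre-outage image back from FLASH so the
        # posted writes can be completed. User data stays unavailable
        # until this copy finishes.
        if self.flash is not None:
            self.cache = dict(self.flash)


ctl = FlashBackedController()
ctl.cache[3] = b"posted write"
ctl.main_power_lost()
ctl.main_power_restored()
print(ctl.cache)  # {3: b'posted write'}
```

The restore in `main_power_restored` is a full copy of the backup image, so its duration grows with the cache size, which is the cost examined next.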
However, the time required to restore the cache memory contents from the FLASH memory may be relatively long, potentially on the order of minutes. Assume, for example, a RAID controller that has 512 MB of cache memory and a current FLASH memory that provides a sustained read rate of approximately 9 MB/second. In this example, restoring the cache memory from the FLASH memory takes approximately one minute (512 MB at 9 MB/second is about 57 seconds); that is, booting the RAID controller after main power is restored takes one minute longer. That is one more minute during which the user data is unavailable to the host computer system, which in some user applications may translate into thousands of dollars of lost income. Furthermore, the additional time spent restoring the cache memory from FLASH may cause the predetermined timeout values of some server applications to be exceeded, causing those applications to fail. Finally, the restore time, and therefore the user data unavailability time, is even greater for RAID controllers with cache memories larger than in the example, and the problem will worsen as RAID controller cache memory sizes continue to increase, which appears to be a definite trend.
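The restore-time arithmetic can be checked directly. The sketch below assumes the full cache image is read back at the sustained FLASH read rate and ignores any other boot overhead; the 1024 MB and 2048 MB sizes are hypothetical, chosen only to show the linear scaling with cache size.

```python
def restore_seconds(cache_mb: float, flash_read_mb_per_s: float) -> float:
    """Time to copy a full cache image back from FLASH at a sustained read rate."""
    return cache_mb / flash_read_mb_per_s


# 512 MB is the example from the text; the larger sizes are hypothetical.
for size in (512, 1024, 2048):
    print(f"{size} MB cache: {restore_seconds(size, 9.0):.0f} s")
# 512 MB cache: 57 s
# 1024 MB cache: 114 s
# 2048 MB cache: 228 s
```

Because the restore time is simply cache size divided by read rate, doubling the cache doubles the unavailability window unless the FLASH read rate improves in step.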
Therefore, what is needed is an apparatus and method for reducing the data unavailability time after a loss of main power in a cached RAID controller with a non-volatile device for backing up the volatile cache memory.