Redundant Array of Inexpensive Disks (RAID) systems have become the predominant form of mass storage in computer systems used in applications that require high performance, large amounts of storage, and/or high data availability, such as transaction processing, banking, medical applications, database servers, internet servers, mail servers, scientific computing, and a host of other applications. A RAID controller controls a group of multiple physical disk drives in such a manner as to present a single logical disk drive (or multiple logical disk drives) to a computer operating system. RAID controllers employ the techniques of data striping and data redundancy to increase performance and data availability.
An important characteristic of RAID controllers, particularly in certain applications such as transaction processing or real-time data capture of large data streams, is fast write performance. In particular, the overall performance of the computer system may be greatly improved if the write latency of the RAID controller is relatively small. The write latency is the time the RAID controller takes to complete a write request from the computer system.
Many RAID controllers include a relatively large cache memory for caching user data from the disk drives. Caching the data enables the RAID controller to quickly return data to the computer system if the requested data is in the cache memory, since the RAID controller does not have to perform the lengthy operation of reading the data from the disk drives. The cache memory may also be employed to reduce write request latency by enabling what is commonly referred to as posted-write operations. In a posted-write operation, the RAID controller transfers the data specified by the write request from the computer system into the RAID controller's cache memory and then immediately notifies the computer system that the write request is complete, even though the RAID controller has not yet written the data to the disk drives. Posted-writes are particularly useful in RAID controllers, since in some redundant RAID levels a read-modify-write operation to the disk drives must be performed in order to accomplish the system write request. That is, not only must the specified system data be written to the disk drives, but some of the disk drives may also have to be read before the user data and redundant data can be written to the disks. Without the benefit of posted-writes, this may make the write latency of a RAID controller even longer than that of a non-RAID controller.
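The read-modify-write operation described above can be illustrated with a minimal sketch. It assumes a single-parity scheme of the kind used in RAID level 5, in which the redundant data is the bytewise XOR of the data blocks in a stripe; the function names and block contents are hypothetical and chosen only for illustration. The sketch shows why the old data and old parity must be read before the new data and new parity can be written.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks (full parity computation)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rmw_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Parity update for a single-block write.

    The old data and old parity are the two disk reads that precede the
    two disk writes (new data, new parity) in a read-modify-write.
    """
    return xor_blocks(old_parity, old_data, new_data)


# Hypothetical stripe of three data blocks plus one parity block.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
parity = xor_blocks(d0, d1, d2)

# Overwrite d1 only: read old d1 and old parity, then compute the new parity.
new_d1 = b"\xaa\x55"
new_parity = rmw_parity(d1, parity, new_d1)

# The shortcut agrees with recomputing parity over the whole stripe.
assert new_parity == xor_blocks(d0, new_d1, d2)
```

Under these assumptions a single small write costs two reads and two writes to the disk drives, which is the extra latency that a posted-write hides from the computer system.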
However, posted-write operations make the system vulnerable to data loss in the event of a power failure. This is because the cache memory is a volatile memory: if power is lost before the data has been written to the disk drives, the user data held in the cache memory is lost.
To solve this problem, some RAID controllers include a battery to continue to provide power to the cache memory in the event of a loss of main power. Although the battery greatly reduces the likelihood that user data will be lost, because the energy stored in the battery is finite, the possibility still exists that the battery energy will run out before main power can be restored, in which case the user data will be lost. The minimum length of time the battery must supply power to the cache memory varies among users of RAID systems; however, many consumers require at least 72 hours in the event a power failure occurs on a weekend.
However, there are some well-known limitations associated with the use of batteries in this application. First, batteries are a relatively expensive component of the RAID controller. Second, for many of the relevant battery technologies the ability of the battery to hold a charge begins to degrade within two or three years, which is typically less than the expected lifetime of the RAID controller. Consequently, the RAID controller must be designed with the battery as a field-replaceable unit, and in many cases, as a hot-pluggable field-replaceable unit. This adds further cost to the RAID controller. Third, batteries have a relatively narrow operating temperature range, outside of which their lifetime and performance degrade significantly. Fourth, after the battery has been drained due to a main power outage, the RAID controller must operate in lower-performance write-through cache mode until the battery is recharged, and the recharge time of batteries is relatively long. Fifth, as the size of cache memories increases, so does the amount of energy the battery must provide during the main power outage; given contemporary battery energy densities, the size of the battery required to provide the required amount of energy may exceed the available space within the RAID controller.
To address these limitations, U.S. patent application Ser. No. 11/226,825, filed Sep. 14, 2005, describes a storage controller that includes a capacitor pack or battery, and a non-volatile memory, such as a FLASH memory. When main power is lost, the capacitor pack or battery supplies power from its stored energy for the controller to back up, or flush, the write cache data to the non-volatile memory. Thus, advantageously, even if the capacitor pack or battery is drained and no longer able to supply power before main power is restored, the write cache data is retained in the non-volatile memory so that when main power is restored and the controller is rebooted, the write cache data is restored to the write cache and subsequently flushed to the disk drives.
Whether using a battery or a capacitor pack as the rechargeable energy source to supply backup power, it is important to monitor the energy source to ensure that it continues to have the capacity to store enough energy to perform the backup operation; otherwise, write cache data may be lost. When the energy source no longer has the capacity to store enough energy to perform its intended function, such as to supply power to perform a backup operation, it is said to have reached its end of life, or its lifetime has expired. If the energy source is a battery, monitoring the lifetime of the battery is relatively simple, since the typical lifetime of a battery is relatively constant for a given battery technology. For example, the lifetime of a Lithium-ion battery commonly used for applications such as a write-caching storage controller is approximately 3 years. Consequently, the remaining lifetime of a battery can be monitored simply by tracking the actual real time, or calendar time, that the battery has been in existence, such as via a real-time clock circuit.
In contrast to a battery, the lifetime of a capacitor is largely a non-linear function of factors such as its temperature, operating voltage, polarity changes, and excessive current draw, and the lifetime may vary widely based on these factors. For example, in a given application at a given operating voltage, a capacitor may have a lifetime as long as one million hours at an operating temperature of 10 degrees Celsius, whereas the same capacitor may have a lifetime as short as one thousand hours at an operating temperature of 80 degrees Celsius. Similarly, at a given temperature, a capacitor may have a lifetime at an operating voltage of 1.8 Volts that is almost three times its lifetime at an operating voltage of 2.5 Volts. Therefore, the simple real-time clock technique used to monitor battery lifetime is inadequate for capacitors in many applications because the variability in capacitor lifetime may pose an unacceptable risk of data loss for write-caching storage controllers.
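To give a feel for the spread between the two temperature figures quoted above, the following sketch interpolates between them. The only data used are the two example points (one million hours at 10 degrees Celsius and one thousand hours at 80 degrees Celsius); the assumption that the logarithm of lifetime varies linearly with temperature between those points is purely illustrative and is not a manufacturer's model, and the function name is hypothetical.

```python
import math

# The two example points quoted in the text (same capacitor, same voltage).
T_COOL_C, LIFE_COOL_H = 10.0, 1_000_000.0
T_HOT_C, LIFE_HOT_H = 80.0, 1_000.0


def interpolated_lifetime_hours(temp_c: float) -> float:
    """Illustrative lifetime estimate between the two quoted points.

    ASSUMPTION: log10(lifetime) is linear in temperature. This is a
    sketch for intuition only, not a real capacitor lifetime model.
    """
    if not T_COOL_C <= temp_c <= T_HOT_C:
        raise ValueError("temperature outside the quoted range")
    frac = (temp_c - T_COOL_C) / (T_HOT_C - T_COOL_C)
    log_cool = math.log10(LIFE_COOL_H)
    log_hot = math.log10(LIFE_HOT_H)
    return 10.0 ** (log_cool + frac * (log_hot - log_cool))


# Three calendar years, the approximate battery lifetime cited earlier.
THREE_YEARS_H = 3 * 365 * 24  # 26,280 hours

for t in (10.0, 45.0, 80.0):
    life = interpolated_lifetime_hours(t)
    print(f"{t:5.1f} C: {life:12.0f} h, "
          f"outlasts a 3-year timer: {life > THREE_YEARS_H}")
```

Under this illustrative curve, a fixed three-year calendar timer would be far too pessimistic at the cool end of the range and far too optimistic at the hot end, where the quoted lifetime of one thousand hours is only about six weeks of continuous operation.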
The method recommended by capacitor manufacturers for measuring the capacitance of a capacitor (which is effectively a measure of its lifetime because its capacitance determines the amount of energy it can store) is to discharge and then recharge the capacitor, measure the current draw and time required to recharge, and calculate the capacitance from the measured values. However, this method is undesirable for write-caching storage controller applications, since it would require the write cache to be placed into write-through mode during the discharging/recharging process in order to avoid the potential loss of write cache data due to the inability to perform the backup operation in the event of a main power loss.
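The calculation in the discharge-and-recharge method can be sketched as follows. The sketch assumes an idealized recharge at constant current, for which the capacitance is C = I * t / (V_end - V_start), and uses the standard relation E = C * V^2 / 2 for the energy stored in a capacitor. The function names and the sample measurement values are hypothetical.

```python
def capacitance_from_recharge(current_a: float, seconds: float,
                              v_start: float, v_end: float) -> float:
    """Capacitance in farads from a constant-current recharge.

    ASSUMPTION: the charging current is constant, so the charge delivered
    is current * time and C = Q / delta_V.
    """
    delta_v = v_end - v_start
    if delta_v <= 0:
        raise ValueError("v_end must be greater than v_start")
    return current_a * seconds / delta_v


def stored_energy_joules(capacitance_f: float, voltage_v: float) -> float:
    """Energy stored in a capacitor: E = C * V^2 / 2."""
    return 0.5 * capacitance_f * voltage_v ** 2


# Hypothetical measurement: 2.0 A for 12.5 s raises the voltage from 0 V to 2.5 V.
c = capacitance_from_recharge(2.0, 12.5, 0.0, 2.5)
e = stored_energy_joules(c, 2.5)
print(f"capacitance = {c:.2f} F, stored energy at 2.5 V = {e:.2f} J")
```

Because the capacitor is discharged at the start of the measurement, it cannot supply backup power until the recharge completes, which is why the write cache would have to run in write-through mode for the duration.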
Therefore, what is needed is an alternative method for determining the lifetime of a capacitor pack, other than monitoring the capacitor pack's real-time existence or measuring its capacitance by discharging and recharging it.
Furthermore, unlike a battery, the capacitor pack may not be field-replaceable, and the storage controller manufacturer may warrant a lifetime of the capacitor pack to the consumer, or user, of the storage controller. Therefore, given the large variability of a capacitor pack lifetime, what is needed is a way to increase the likelihood that the capacitor pack reaches the lifetime that the storage controller manufacturer warranted to the user.