The ever-increasing use of modern computer systems has created a growing demand to store and utilize large masses of data. Mass data storage is economically achieved by using an array of relatively inexpensive or independent disk drives. Such “RAID” (redundant array of independent disks) systems also offer considerable benefits in data reliability because of their well-known ability to reconstruct lost or corrupted data. The typical RAID mass storage system requires a computer to control the array of disk drives as well as to control the transfer of data when performing read and write operations on the disk drives.
To store data reliably, the data should be saved completely on the RAID disk drives before a write transaction is acknowledged. The data is lost if, before it is saved completely to the disk drives, the computer ceases functioning (“crashes”) as a result of a software hang-up, a hardware failure, or a power loss or reduction, or if the bus which transfers the data ceases to function because of a bus malfunction. In general, the redundancy and reliability of data storage on a RAID storage system are available only after the data has been completely stored on the disk drives. Thereafter, failures or malfunctions of the disk drives will usually allow the data to be reconstructed.
Disk drives require a relatively long amount of time (“high latency”) to perform a read or write transaction, because mechanical components must be moved to perform the operation. Because of the relatively high latency, the speed and performance of the computer system will be adversely affected if the storage transactions are performed directly on the disk drives. The higher latency of the disk drives limits mass storage performance because considerable time is unused while waiting for the disk drives to perform the data transactions.
To avoid the latency problem of disk drives, it is typical to use an intermediate, high performance (“low latency”) solid-state memory upon which to perform the read and write data transactions. The data is transferred rapidly to the low latency intermediate memory, and then, in a separate transaction which does not adversely affect the processing performance, data is transferred from the intermediate memory to the disk drives. In this manner, the normal processing performance of the computer system is not diminished by mass storage transactions. However, should a crash, a power loss or a bus malfunction occur while the data is present in the intermediate memory and before the data has been completely transferred to the disk drives, that data will be lost and reconstruction of the data becomes impossible.
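The exposure described above can be illustrated with a minimal sketch. It is a toy model written for illustration only, not an implementation from any actual storage controller; the class `WriteBackBuffer` and its methods are hypothetical names. A write is acknowledged as soon as it reaches the intermediate memory, a later flush moves it to the disk drives, and a crash before the flush discards whatever is still pending.

```python
class WriteBackBuffer:
    """Toy model of a volatile intermediate memory in front of slow disks.

    All names here are hypothetical and chosen only for illustration.
    """

    def __init__(self):
        self.pending = {}  # blocks acknowledged but not yet on the disk drives
        self.disk = {}     # blocks completely stored on the disk drives

    def write(self, block_id, data):
        # Fast path: store in the intermediate memory and acknowledge at once.
        self.pending[block_id] = data
        return "ack"

    def flush(self):
        # Separate, slower transaction: move pending blocks to the disk drives.
        self.disk.update(self.pending)
        self.pending.clear()

    def crash(self):
        # A volatile memory loses its contents; report which blocks were lost.
        lost = sorted(self.pending)
        self.pending.clear()
        return lost


buf = WriteBackBuffer()
buf.write(1, b"alpha")
buf.flush()                 # block 1 is now on the disk drives
buf.write(2, b"beta")       # block 2 is acknowledged but only in memory
lost_blocks = buf.crash()   # crash occurs before the next flush
print(lost_blocks)          # [2]
print(sorted(buf.disk))     # [1]
```

Block 2 was acknowledged to the writer yet never reached the disk drives, which is the window of vulnerability that a nonvolatile intermediate memory is intended to close.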
Intermediate solid-state memories have been made nonvolatile to maintain data in the event of power loss. Usually battery backup power is supplied to achieve such nonvolatility. Although certain types of solid-state memories have inherent nonvolatile characteristics, the semiconductor materials used in those types of memories to obtain inherent nonvolatile characteristics require greater time to write the data. Consequently, inherently nonvolatile memories generally have a relatively high, and therefore unacceptable, latency when performing read and write transactions. It is for this reason and others that inherent nonvolatile solid-state memories are not usually considered acceptable as intermediate nonvolatile memories in a mass storage computer system. Instead, low latency dynamic random access memories (DRAMs) with a battery backup are typically used as nonvolatile intermediate memories in a mass storage system.
In addition to intermediate memories in RAID mass storage systems, volatile DRAM is also widely used for storage during computations performed by a central processing unit or processor. Very low latency, high-performance memory is particularly important in computational situations because of the very high speed of modern processors. If memory read and write transactions are not performed as quickly as the processor executes instructions, the computational power of the computer system is diminished.
The typical DRAM requires refresh signals to be applied to it periodically in order to maintain the data written to it. The refresh signals are typically generated by circuitry external to the DRAM. So long as adequate power is available, the external circuitry will continue to refresh the DRAM and therefore maintain the data within it. However, should a power interruption or reduction occur, it is necessary to place the DRAM into a self-refresh state and apply the battery power to the DRAM. In the self-refresh state, the DRAM automatically generates its own refresh signals, and the power to do so comes from the battery. It is essential that the DRAM be placed into the self-refresh state and the battery power be applied to it in order to assure nonvolatility of the data.
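The dependence of data retention on the self-refresh state can be sketched as a small state model. This is a simplified illustration under stated assumptions: the names `RefreshMode` and `BatteryBackedDram` are hypothetical, and a real DRAM enters self-refresh through device-specific command sequences that are not modeled here.

```python
from enum import Enum


class RefreshMode(Enum):
    EXTERNAL = "external refresh"  # refresh signals come from outside circuitry
    SELF = "self-refresh"          # the DRAM generates its own refresh signals


class BatteryBackedDram:
    """Hypothetical model of a DRAM with battery backup."""

    def __init__(self):
        self.mode = RefreshMode.EXTERNAL
        self.on_battery = False
        self.data_valid = True

    def enter_self_refresh(self):
        # Both steps are needed: self-refresh mode and battery power applied.
        self.mode = RefreshMode.SELF
        self.on_battery = True

    def main_power_lost(self):
        # Data survives only if the DRAM is already refreshing itself
        # from the battery when the main power disappears.
        if not (self.mode is RefreshMode.SELF and self.on_battery):
            self.data_valid = False


protected = BatteryBackedDram()
protected.enter_self_refresh()
protected.main_power_lost()

unprotected = BatteryBackedDram()
unprotected.main_power_lost()

print(protected.data_valid)    # True
print(unprotected.data_valid)  # False
```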
A similar situation exists with respect to a hardware or software crash or malfunction. In these cases, the usual technique for recovering from such a malfunction condition is to power cycle the computer system by manually powering down or terminating the application of power and then reapplying the power. If the memory is not placed into the self-refresh state and the battery power applied to it, powering down the computer system when executing a power cycle will result in the loss of data within the volatile memory.
A hardware or software crash or malfunction can occur in a computer system from a variety of different causes. One of the more common malfunction conditions is the unintentional loss of the main power to the computer system. Unintentional main power loss may occur while the computer system is operating normally and the applied AC mains power is interrupted or diminished because of, for example, a power distribution problem, a tripped circuit breaker, or an unintentionally disconnected power cord. Another malfunction condition may be caused by a sag in an internal voltage level within the computer system itself, such as the voltage which powers the logic circuits of the computer. A sag in the internal voltage may occur as a result of an AC power interruption or a malfunction of an internal power supply within the computer. Failing to adequately power the internal circuits within a computer can also result in a malfunction leading to a data loss. The loss of a bus clock signal on an internal peripheral expansion bus of the computer system is another malfunction condition. At the beginning of a normal power-down sequence, the typical computer system will maintain a bus clock signal on its peripheral expansion bus for a sufficient amount of time to allow the components connected to the expansion bus to complete operations. However, in some circumstances, the bus clock signal may cease before the memories are placed into the safe state. Powering down or resetting the computer system is typically signaled by the assertion of a bus reset signal on the peripheral expansion bus. In response to the bus reset signal, the memory should place itself into the safe state. A hang-up in the execution of the instructions by the processor of the computer system is also a malfunction condition.
The typical way to clear a software hang-up is to reset or power-cycle the computer system, although some processors include watchdog circuits which will signal the occurrence of a software hang-up and attempt to clear the hang-up condition before resetting the entire system. Lastly, some test and engineering equipment that may be connected to the computer system for diagnostic purposes may cause a reset at any time without warning. Malfunction conditions can result from other causes as well, but the vast majority of malfunction conditions are characterized by or result from the situations described above.
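The malfunction conditions enumerated above can be summarized in a short sketch that maps observed signals to a set of condition flags. It is illustrative only: the names `Malfunction`, `detect` and `must_protect_memory`, the signal parameters, and the voltage values are all assumptions introduced here, and the sketch presumes that any single condition is sufficient reason to place the memory into the safe state.

```python
from enum import Flag, auto


class Malfunction(Flag):
    """Hypothetical flags for the malfunction conditions described above."""
    NONE = 0
    MAIN_POWER_LOSS = auto()       # AC mains interrupted or diminished
    INTERNAL_VOLTAGE_SAG = auto()  # logic supply below its minimum level
    BUS_CLOCK_LOSS = auto()        # expansion bus clock has stopped
    BUS_RESET = auto()             # bus reset asserted (power-down or reset)
    PROCESSOR_HANG = auto()        # e.g. signaled by a watchdog circuit
    EXTERNAL_RESET = auto()        # reset forced by test equipment


def detect(main_power_ok, logic_volts, min_logic_volts, bus_clock_running,
           bus_reset_asserted, watchdog_expired, external_reset):
    """Combine the observed signals into a set of malfunction flags."""
    found = Malfunction.NONE
    if not main_power_ok:
        found |= Malfunction.MAIN_POWER_LOSS
    if logic_volts < min_logic_volts:
        found |= Malfunction.INTERNAL_VOLTAGE_SAG
    if not bus_clock_running:
        found |= Malfunction.BUS_CLOCK_LOSS
    if bus_reset_asserted:
        found |= Malfunction.BUS_RESET
    if watchdog_expired:
        found |= Malfunction.PROCESSOR_HANG
    if external_reset:
        found |= Malfunction.EXTERNAL_RESET
    return found


def must_protect_memory(found):
    # Assumption: any one condition warrants entering the safe state.
    return bool(found)


# The voltage figures below are arbitrary example values.
healthy = detect(True, 3.3, 3.0, True, False, False, False)
sagging = detect(True, 2.8, 3.0, True, False, False, False)
powering_down = detect(False, 3.3, 3.0, False, True, False, False)

print(must_protect_memory(healthy))        # False
print(must_protect_memory(sagging))        # True
print(must_protect_memory(powering_down))  # True
```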