Today's computer networks include vast amounts of storage, require high data throughput, and demand high data availability. Many networks support hundreds or even thousands of connected users. Many also store extremely valuable data, such as bank account information, personal medical information, databases whose unavailability equates to huge sums of lost revenue due to an inability to sell a product or provide a service, and scientific data gathered over long periods of time and at great expense.
A typical computer network includes one or more computers connected to one or more storage devices, such as disk drives or tape drives, by one or more storage controllers. One technique for providing higher data availability in computer networks is to include redundant components in the network. Providing redundant components means providing two or more of the components such that if one of the components fails, one of the other redundant components continues to perform the function of the failed component. In many cases, the failed component can be quickly replaced to restore the system to its original data availability level.
A popular example of providing redundant components in a system is the notion of a redundant array of inexpensive disks (RAID). With a RAID, data is written to a plurality of disk drives in such a manner that if one of the disk drives fails, the data may be recovered from the remaining disk drives. In the simplest RAID configuration, commonly referred to as RAID level 1, all data is written to two disk drives which are maintained as a mirrored pair. If one of the mirrored drives fails, the desired data may be read from the remaining disk drive in the mirrored pair.
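The mirrored-pair behavior described above can be illustrated with a minimal sketch. The class name `MirroredPair`, its methods, and the use of dictionaries to stand in for disk drives are all assumptions made for illustration. They do not correspond to any particular RAID implementation.

```python
class MirroredPair:
    """Toy model of a RAID level 1 mirrored pair (hypothetical names)."""

    def __init__(self):
        # Each drive is modeled as a dict mapping block number -> bytes.
        self.drives = [{}, {}]
        self.failed = [False, False]

    def write(self, block, data):
        # All data is written to every drive that is still operational.
        for index, drive in enumerate(self.drives):
            if not self.failed[index]:
                drive[block] = data

    def fail(self, index):
        # Simulate the loss of one drive together with its contents.
        self.failed[index] = True
        self.drives[index].clear()

    def read(self, block):
        # Read from the first surviving drive in the pair.
        for index, drive in enumerate(self.drives):
            if not self.failed[index]:
                return drive[block]
        raise IOError("both drives in the mirrored pair have failed")


pair = MirroredPair()
pair.write(0, b"account record")
pair.fail(0)                # the first drive fails
recovered = pair.read(0)    # the data is read from the remaining drive
```

In this sketch the read succeeds after the first drive fails because the write placed an identical copy of the block on the second drive.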
Another example of providing redundancy is within a storage controller in a computer network. High performance storage controllers typically include relatively large memories for buffering data transferred between the host computers and the storage devices. In particular, when a host computer writes data to a storage device via the storage controller, the storage controller receives the data from the host computer, writes the data into the storage controller memory, and informs the host computer that the data has been successfully transferred. Subsequently, the storage controller writes the data from its memory into the storage device. Buffering the data in this manner provides at least two advantages. First, the buffering serves to alleviate bottlenecks that might arise from transfer speed mismatches between the host/storage controller interface and the storage controller/storage device interface. Second, the buffered data may be cached, such that when a host subsequently reads the data, the storage controller can simply provide the data from its cache memory rather than having to first read the data from the storage device.
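The buffered write and cached read sequence described above may be sketched as follows. The names `StorageDevice`, `BufferingController`, `host_write`, `flush`, and `host_read` are hypothetical, and the storage device is reduced to a dictionary of blocks. The sketch assumes a single controller memory with no redundancy.

```python
class StorageDevice:
    """Stand-in for a disk or tape drive (hypothetical model)."""

    def __init__(self):
        self.blocks = {}
        self.reads = 0   # counts reads that actually reach the device

    def write(self, block, data):
        self.blocks[block] = data

    def read(self, block):
        self.reads += 1
        return self.blocks[block]


class BufferingController:
    """Storage controller that buffers host writes in its own memory."""

    def __init__(self, device):
        self.device = device
        self.memory = {}     # controller memory holding buffered data
        self.dirty = set()   # blocks not yet written to the device

    def host_write(self, block, data):
        # Receive the data, place it in controller memory, and inform
        # the host of success before the data reaches the device.
        self.memory[block] = data
        self.dirty.add(block)
        return "success"

    def flush(self):
        # Subsequently write the buffered data to the storage device.
        for block in sorted(self.dirty):
            self.device.write(block, self.memory[block])
        self.dirty.clear()

    def host_read(self, block):
        # Serve the read from controller memory when possible.
        if block in self.memory:
            return self.memory[block]
        data = self.device.read(block)
        self.memory[block] = data
        return data


device = StorageDevice()
controller = BufferingController(device)
status = controller.host_write(7, b"payload")
on_device_before_flush = 7 in device.blocks
controller.flush()
cached = controller.host_read(7)
```

Here the host is informed of success while the data exists only in controller memory, and the later read is satisfied without touching the storage device.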
A potential problem with the buffered data approach described above is that if the storage controller memory fails before the buffered data has been written to the storage device, the data is lost forever, even though the host computer has already been informed that the data was successfully transferred. To alleviate this problem, a conventional approach is to provide two or more redundant memory subsystems. In the conventional redundant storage controller memory approach, the data is received from the host and written to the first redundant memory subsystem, and subsequently copied by the first memory subsystem to the second memory subsystem. By this approach, if the first memory subsystem fails, the second memory subsystem continues operation and writes the data to the storage device.
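The conventional redundant write and the subsequent failover may be sketched as follows. The names `MemorySubsystem`, `conventional_redundant_write`, and `flush_to_device` are hypothetical, and the storage device is modeled as a plain dictionary. The sketch assumes exactly two memory subsystems.

```python
class MemorySubsystem:
    """One of two redundant controller memories (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.failed = False

    def write(self, block, data):
        if self.failed:
            raise IOError(self.name + " has failed")
        self.data[block] = data

    def read(self, block):
        if self.failed:
            raise IOError(self.name + " has failed")
        return self.data[block]


def conventional_redundant_write(first, second, block, data):
    # Step 1: the data received from the host is written to the first memory.
    first.write(block, data)
    # Step 2: the data is read back out of the first memory and copied
    # to the second memory. The two steps are serialized.
    second.write(block, first.read(block))


def flush_to_device(first, second, device, block):
    # If the first memory has failed, the second continues operation.
    source = second if first.failed else first
    device[block] = source.read(block)


first = MemorySubsystem("memory A")
second = MemorySubsystem("memory B")
device = {}

conventional_redundant_write(first, second, 3, b"host data")
first.failed = True                        # the first memory subsystem fails
flush_to_device(first, second, device, 3)  # the second supplies the data
```

Because the copy completed before the failure, the data still reaches the storage device from the second memory subsystem.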
Unfortunately, mass storage design appears to treat performance and data availability as opposing goals: redundancy seems to imply lower performance. In the redundant storage controller example above, the redundant write performs worse than a non-redundant write in at least two ways. First, the initial write of the data and the copy of the data are serialized, which means the redundant write takes approximately twice as long to perform as a non-redundant write. Second, the redundant write consumes considerably more resource bandwidth than a non-redundant write. In particular, because a redundant write both writes and reads the first memory subsystem, it consumes twice as much of that memory's bandwidth as a non-redundant write. Additionally, the copy of the data from the first to the second memory subsystem consumes bandwidth on the bus connecting the two memory subsystems.
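The two penalties can be made concrete with a simple cost model. The function name `write_costs`, the dictionary keys, and the example figures are assumptions chosen for illustration. The model further assumes that the copy takes as long as the initial write, which is what makes the redundant write approximately twice as long.

```python
def write_costs(size_bytes, transfer_time):
    """Compare a non-redundant write with a conventional redundant write.

    transfer_time is the time for one transfer of size_bytes. The model
    assumes the copy takes as long as the initial write (an assumption).
    """
    non_redundant = {
        "latency": transfer_time,
        "first_memory_traffic": size_bytes,   # one write into the memory
        "inter_memory_bus_traffic": 0,
    }
    redundant = {
        # The initial write and the copy are serialized.
        "latency": 2 * transfer_time,
        # The first memory is written once and then read once for the copy.
        "first_memory_traffic": 2 * size_bytes,
        # The copy crosses the bus connecting the two memory subsystems.
        "inter_memory_bus_traffic": size_bytes,
    }
    return non_redundant, redundant


# Hypothetical example: a 4096-byte write taking 10 time units per transfer.
plain, mirrored = write_costs(4096, 10)
latency_ratio = mirrored["latency"] / plain["latency"]
memory_ratio = mirrored["first_memory_traffic"] / plain["first_memory_traffic"]
```

Under these assumptions both ratios equal two, and the redundant write additionally places the full size of the data on the bus between the memory subsystems.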
Therefore, what is needed is an apparatus and method for providing higher-performance redundant writes to redundant memory subsystems in storage controllers.