1. Field of the Invention
The present invention relates in general to computers, and more particularly to method, system, and computer program product embodiments for improving reliability in a computer storage environment.
2. Description of the Related Art
Storage area networks, or SANs, consist of multiple storage devices connected by one or more fabrics. Storage devices can be of two types: host systems that access data and storage subsystems that are providers of data. In a large distributed computer system, a plurality of host systems are typically connected to a number of direct access storage devices (DASDs) making up the storage subsystems. A storage controller controls read and write operations between host computers of the host systems and the DASDs. The DASDs comprise hard disk drives (HDDs) and may be organized in a redundant array of independent disks, i.e., a RAID array. A RAID array comprises multiple, independent disks organized into a large, high-performance logical disk. A controller stripes data across the multiple disks in the array and accesses the disks in parallel to achieve higher data transfer rates.
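By way of illustration only, the striping described above can be sketched as follows. The function names `stripe` and `unstripe`, the round-robin placement, and the fixed stripe-unit size are assumptions made for this sketch and are not taken from any particular RAID implementation; each disk is modeled as a simple in-memory list.

```python
def stripe(data: bytes, num_disks: int, stripe_unit: int) -> list[list[bytes]]:
    """Split data into fixed-size stripe units and place them on disks round-robin.

    Hypothetical helper: each 'disk' is modeled as a list of stripe units.
    """
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), stripe_unit):
        unit_index = offset // stripe_unit
        disks[unit_index % num_disks].append(data[offset:offset + stripe_unit])
    return disks


def unstripe(disks: list[list[bytes]]) -> bytes:
    """Reassemble the logical disk by reading one unit per disk, row by row."""
    units = []
    rows = max(len(disk) for disk in disks)
    for row in range(rows):
        for disk in disks:
            if row < len(disk):
                units.append(disk[row])
    return b"".join(units)
```

Because consecutive stripe units land on different disks, a controller may issue the reads or writes for one row of units to all disks at once, which is the source of the higher transfer rate noted above.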
To reduce the risk of system failure due to failure of a hard disk drive in a DASD system such as a RAID array, redundancy in the form of error-correcting codes is typically employed to tolerate disk failures. Further, to reduce the risk of failure at a point within the storage controller, the storage controller is typically designed to handle hardware failure. For example, the storage controller can have two storage clusters, each of which provides for selective connection between a host computer and a DASD. Each cluster has a cache and a non-volatile storage unit (NVS). The cache buffers frequently used data. When a request is made to write data to a DASD attached to the storage controller, the storage controller may cache the data and delay writing the data to a DASD. Caching data can save time because write operations involve time-consuming mechanical operations. The cache and NVS in each cluster can intercommunicate, allowing for recovery and reconfiguration of the storage controller in the event that one of the memory elements is rendered unavailable. For instance, if one cluster and its cache fail, the NVS in the other cluster maintains a back-up of the cache in the failed cluster.
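The two-cluster arrangement described above can be sketched, under simplifying assumptions, as follows. The class `Cluster`, its method names, and the use of dictionaries to stand in for the cache, the NVS, and the DASD are all hypothetical and serve only to illustrate how a delayed write held in one cluster's cache may be recovered from the other cluster's NVS.

```python
class Cluster:
    """Hypothetical model of one storage cluster with a cache and an NVS."""

    def __init__(self, name: str):
        self.name = name
        self.cache: dict[int, bytes] = {}  # volatile; holds writes not yet on the DASD
        self.nvs: dict[int, bytes] = {}    # non-volatile; backs up the partner's cache
        self.partner: "Cluster | None" = None

    def write(self, track: int, data: bytes) -> None:
        """Cache the write and copy it to the partner's NVS; the DASD write is delayed."""
        self.cache[track] = data
        self.partner.nvs[track] = data

    def destage(self, dasd: dict[int, bytes]) -> None:
        """Write cached data to the DASD and release the partner's back-up copies."""
        for track, data in list(self.cache.items()):
            dasd[track] = data
            del self.cache[track]
            self.partner.nvs.pop(track, None)

    def fail(self) -> None:
        """Simulate loss of this cluster: the contents of its volatile cache are gone."""
        self.cache.clear()

    def recover_partner_writes(self, dasd: dict[int, bytes]) -> None:
        """After the partner fails, destage its backed-up writes from this cluster's NVS."""
        for track, data in list(self.nvs.items()):
            dasd[track] = data
            del self.nvs[track]


def link(a: Cluster, b: Cluster) -> None:
    """Pair two clusters so that each backs up the other's cache."""
    a.partner, b.partner = b, a
```

In this sketch a write is held in two places before it reaches the DASD: the owning cluster's cache and the partner's NVS. Losing either cluster alone therefore does not lose the data.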
Other storage controllers include multiple storage clusters or have an “n-way” architecture. In such configurations, if one cluster and its cache fail, the NVS units in the other clusters maintain a back-up of the cache in the failed cluster.