1. Field of the Invention
The present invention relates, in general, to data storage networking technology, and more particularly, to a system and method for controlling failover and failback in a host-transparent fashion within a data storage system by utilizing rule-based firmware in host bus adapters linked to redundant storage controllers.
2. Relevant Background
Storage area networks (SANs) and other data storage systems are revolutionizing how businesses configure and manage data storage. A typical SAN combines server-to-storage architecture and software that allows storage media, such as disk arrays, to be shared by many servers, i.e., host devices, running various operating systems (OS) and software applications. Many of these businesses demand very high data storage availability, which has led to the use of redundant storage media or devices, such as redundant arrays of inexpensive disks (RAID), and redundant paths from the host server to the data storage. In such data storage systems, the data storage is logically consolidated and provides high uptime while also reducing infrastructure costs.
A SAN often includes multiple servers or hosts, each with a central processing unit (CPU) running one of various operating systems and including multiple OS device drivers. A host bus adapter (HBA) is provided as an interface between the host CPU and a storage or disk controller. Generally, the host bus adapter is a printed circuit board with firmware adapted to relieve the host of data storage and retrieval tasks and to provide a communication link between the host communication bus and the storage network communication fabric (such as a Fibre Channel loop or connections). The storage controller performs lower level input/output (I/O) operations for the host and acts as an interface between the host and the physical storage media or devices, which includes providing active and standby paths to the storage devices that are typically identified individually by logical unit numbers (LUNs). To provide ongoing I/O availability and storage redundancy, the storage system must be adapted to address storage controller malfunctions, errors in storage devices, and interruptions to or failures in data paths.
The data storage industry has utilized failover and failback processes to address controller, storage device, and path failures but has only achieved partial success. Failover is a process in which a first storage controller coupled to a second storage controller assumes the responsibilities of the second controller when the second controller fails (and/or a device or path to the controller fails). Failback is the reverse operation in which the second controller recovers control over its attached storage devices after being repaired or replaced. To achieve failover and failback, control software for storage controllers was typically implemented independently in the different host operating system device drivers, i.e., in host-assisted failover mechanisms.
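By way of illustration only, the failover and failback operations defined above can be sketched in Python. The sketch assumes a simplified model in which each controller owns a set of LUNs; the names Controller, failover, and failback are hypothetical and are not drawn from any particular controller, driver, or standard.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical storage controller that owns a set of LUNs."""
    name: str
    healthy: bool = True
    luns: set = field(default_factory=set)


def failover(survivor: Controller, failed: Controller) -> set:
    """Survivor assumes the failed partner's LUNs; returns the LUNs moved."""
    failed.healthy = False
    moved = set(failed.luns)
    survivor.luns |= moved
    failed.luns.clear()
    return moved


def failback(survivor: Controller, restored: Controller, moved: set) -> None:
    """Restored controller recovers the LUNs it owned before the failure."""
    restored.healthy = True
    survivor.luns -= moved
    restored.luns |= moved


# Illustrative pair of redundant controllers (assumed LUN assignments).
a = Controller("A", luns={0, 1})
b = Controller("B", luns={2, 3})

moved = failover(a, b)   # B fails; A now serves B's LUNs as well as its own
after_failover = (set(a.luns), set(b.luns))

failback(a, b, moved)    # B is repaired or replaced and recovers its LUNs
after_failback = (set(a.luns), set(b.luns))
```

In this simplified model, the surviving controller records which LUNs it assumed so that failback returns exactly those LUNs and no others. Detection of the failure and redirection of host I/O are outside the scope of the sketch.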
Implementing failover control independently in each host operating system device driver has led to a number of problems in common multi-host and multi-operating system environments. Interoperability that enables hosts, storage controllers, and storage devices to be added to a storage system has been difficult to achieve because there have not been any standards or failover and failback rules to ensure compatibility of systems. For example, one proposal calls for asymmetric commands in the current generation of the small computer system interface architecture (SCSI-3). The proposal provides a model for providing a redundant controller target but does not define host usage. The proposal has not been widely implemented in targets and is not presently implemented in OS drivers, which leaves hosts implementing this proposal with the problem that controllers not implementing asymmetric commands still need to be failed over.
Additionally, operating systems often provide hooks for controlling redundancy at improper levels, which results in poor error handling and long latencies. Hooks and/or handshaking protocols that are used to allow the host and storage controller to act cooperatively in failover operations are lacking in industry-standard interconnects (such as SCSI-based interconnects, switches, and hubs) and have instead been built into host firmware via the host device drivers, which has led to further problems because each host and each host OS may implement different hooks and protocols. Many OS models dictate that redundancy control come from components that may introduce undesirable delays and interdependencies.
Hence, there remains a need for an improved system and method for controlling failover and failback processes in a data storage system utilizing redundant storage controllers. Preferably, such a method and system would address the need for compatibility among hosts, interconnects, and storage controllers by allowing each host operating system to work without requiring special control software for the redundant storage controllers to be implemented in the host device drivers.