The present disclosure relates in general to the field of computers and computer networks and, more particularly, to a method for failover in a storage area network (SAN) and a system for using such a method.
The demand for data storage capacity in computer networking environments increases dramatically each year. One of the reasons driving such demand is an increase in the number of data-intensive applications conducted over network environments. Examples of such applications include internet applications, multimedia applications, data warehousing, online transaction processing, and medical imaging. Along with this need for increased storage capacity, users demand faster access to the data and greater reliability. In addition to these demands, network operators often desire methods and systems that continue to operate effectively after a network component has malfunctioned or failed.
A storage area network (SAN) is a network developed to address many of the above concerns. A conventional SAN includes a collection of data storage devices, sometimes referred to as a "storage pool," communicatively coupled to one or more hosts such as a workstation or server. In the present disclosure, the terms "host" and "server" are used interchangeably, with the understanding that a "server" is one type of "host."
The hosts can access the storage pool using the Fibre Channel protocol, whose functionality is generally well known. Signals between a host and a storage pool are often directed through a switching element within a switch fabric and through controllers that direct signals to and from data storage devices within the storage pool.
Often, high-availability storage area networks are configured with redundant components such that if one component fails, the duties of that component may be performed by its redundant counterpart. In this manner, a SAN may continue to operate despite the failure of one component. For example, if a switch component malfunctions, signals between the host and the storage pool may be rerouted through the redundant component. These redundant components act as backups. The rerouting of the signal through the redundant component is often referred to as "failover."
One problem with conventional SANs is establishing which component is responsible for detecting a malfunction and initiating a failover operation. This problem is heightened by the fact that network components are often made by different manufacturers. Accordingly, components may have compatibility problems that require failover solutions custom-tailored to the specific components in the system.
One conventional failover method uses a filter failover device driver. This method typically requires the use of vendor-specific software code which is stored on the host. The software code is specific to the storage pool and must be supported on each host operating system in the SAN. One disadvantage of this solution is that supporting the software within a network having multiple hosts and multiple operating systems requires costly development and testing. Also, the supporting software must be loaded and maintained on each host. Because hosts are often connected to multiple storage pools, each host must then maintain multiple filter failover device drivers. As the number of hosts and storage pools rises, maintenance of the filter failover device drivers becomes increasingly burdensome.
Another failover method is sometimes referred to as Auto LUN transfer. This method requires the host to reroute the input and output (I/O) requests to the storage unit on an alternate path when access to the storage unit fails on the primary path. Simple rerouting does not provide a standard method of determining the state of the storage unit and may require the host to monitor I/O coming from the storage unit controller to determine appropriate failover actions. The I/O signals used for monitoring the storage unit controller are often vendor-specific. Thus, the host must monitor vendor-specific status in the server operating system, which is a disadvantage of this method.
Another failover method is to utilize Host Independent Controller Failover. Host Independent Controller Failover typically involves monitoring the path from within the storage pool. While this method works well when a controller within the storage unit fails, a failure in the path, such as a switch failure, may not be detected.
In accordance with teachings of the present disclosure, a system and method for failover in a storage area network are provided with significant advantages over previously developed methods and systems. The present disclosure presents a method and system to identify paths and controllers to enable more efficient, scalable failover within a SAN.
According to one aspect of the present disclosure, a computer network includes a host and a storage system associated with the host. The storage system has a first controller with a first port and a second controller with a second port. The first controller is assigned a common node name and the first port has a first port name. The second controller is assigned the same common node name and the second port has a second port name. More specifically, the common node name assigned to the first controller and the second controller may be a common world wide node name. Also, the first port name may be a world wide port name and the second port name may be a world wide port name. It is a technical advantage of the present invention to assign the same common node name to the first controller and the second controller. In this manner, the host sees the first controller and the second controller as a single node, which allows for simplified failover logic within the host.
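The effect of the shared node name on the host can be illustrated with a minimal sketch. It assumes the host has already discovered a list of ports, each reporting a world wide node name and a world wide port name. The `Port` class, the `group_ports_by_node` helper, and the name values are hypothetical and chosen only for illustration; they are not part of the disclosure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """A discovered port (hypothetical host-side representation)."""
    wwnn: str  # world wide node name of the owning controller
    wwpn: str  # world wide port name, unique to this port


def group_ports_by_node(ports):
    """Map each node name to the list of port names that report it."""
    nodes = defaultdict(list)
    for port in ports:
        nodes[port.wwnn].append(port.wwpn)
    return dict(nodes)


# Illustrative values only: two controllers share one node name,
# while each port keeps its own port name.
COMMON_WWNN = "20:00:00:00:00:00:00:01"
discovered = [
    Port(wwnn=COMMON_WWNN, wwpn="21:00:00:00:00:00:00:01"),  # first controller
    Port(wwnn=COMMON_WWNN, wwpn="21:00:00:00:00:00:00:02"),  # second controller
]

nodes = group_ports_by_node(discovered)
for wwnn, wwpns in nodes.items():
    print(wwnn, "->", wwpns)
```

Under these assumptions the host ends up with a single node entry holding two port names, so the alternate path to the same storage is found by a simple lookup rather than by vendor-specific pairing logic.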
According to another aspect of the present invention, a computer system for retrieving and storing information is disclosed, including a storage device, a first controller associated with the storage device, and a second controller associated with the storage device. The first controller preferably has a common node name and has a first port that is assigned a first port name. The second controller preferably has the same common node name and has a second port that is assigned a second port name. More specifically, the first controller and the second controller are both associated with a common memory.
According to yet another aspect of the disclosure, a method for transmitting data in a computer network includes assigning a common world wide node name to a first controller in a network. The first controller may be associated with a common storage within the computer network for use in storing and retrieving information from the common storage. The method also includes assigning the same common world wide node name to a second controller associated with the common storage for use in storing and retrieving information from the common storage. Also, a first world wide port name may be assigned to a first port associated with the first controller, and a second world wide port name may be assigned to a second port associated with the second controller. A data request may then be sent to the common storage, routed through the first port and the first controller. After determining that the data request was unsuccessful, the data request may be rerouted through the second port and the second controller.
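The sequence of steps in this method can be sketched as follows. This is a simplified simulation under stated assumptions, not the disclosed implementation: the `Controller` class, its `healthy` flag, the `handle` method, and the `send_with_failover` function are hypothetical stand-ins for real controllers and real request handling, and the name values are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Controller:
    """Simulated controller with one port (hypothetical model)."""
    wwnn: str             # world wide node name, shared by paired controllers
    wwpn: str             # world wide port name, unique to this controller's port
    healthy: bool = True  # simulated path state

    def handle(self, request: str) -> Optional[str]:
        """Return a response, or None if the request failed on this path."""
        if not self.healthy:
            return None
        return f"{request} served via {self.wwpn}"


def send_with_failover(request: str, first: Controller, second: Controller) -> str:
    """Try the first port and controller, then reroute through the second."""
    # Controllers are treated as a pair only if they share the node name.
    if first.wwnn != second.wwnn:
        raise ValueError("controllers do not share a common node name")
    for controller in (first, second):
        response = controller.handle(request)
        if response is not None:
            return response
    raise OSError("data request failed on both paths")


# Illustrative values only.
COMMON_WWNN = "20:00:00:00:00:00:00:01"
first = Controller(wwnn=COMMON_WWNN, wwpn="21:00:00:00:00:00:00:01")
second = Controller(wwnn=COMMON_WWNN, wwpn="21:00:00:00:00:00:00:02")

first.healthy = False  # simulate a failure on the primary path
result = send_with_failover("read block 7", first, second)
print(result)
```

The sketch checks only the shared node name before rerouting, which reflects the idea that pairing is expressed through the names themselves rather than through vendor-specific status monitoring.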
It is a technical advantage of the present invention to associate or pair controllers associated with common storage by assigning both controllers the same world wide node name. This simplifies the logic necessary to allow the host to reroute a data request after a data request has failed.
It is a further technical advantage of the present invention to provide a standardized host failover method that is not specific to the vendor of the controllers or the host.