Distributed computing systems are an increasingly important part of research, governmental, and enterprise computing systems. Among the advantages of such computing systems are their ability to handle a variety of different computing scenarios including large computational problems, high volume data processing situations, and high availability (HA) situations. Such distributed computing systems typically utilize one or more storage devices in support of the computing system operations performed by one or more processing host computers. These storage devices may be quite numerous and/or heterogeneous. In an effort to aggregate such storage devices and to make such storage devices more manageable and flexible, storage virtualization techniques are often used. Storage virtualization techniques establish relationships between physical storage devices, e.g., disk drives, tape drives, optical drives, etc., and virtual or logical storage devices such as volumes, virtual disks, and logical units (sometimes referred to as LUNs). In so doing, virtualization techniques provide system-wide features, e.g., naming, sizing, and management, better suited to the entire computing system than those features dictated by the physical characteristics of storage devices. Additionally, virtualization techniques enable and/or enhance certain computing system operations such as clustering and data backup and restoration.
FIG. 1 is a simplified block diagram of a computing system 100. The members of the computing system 100 include hosts 130, 140, and 150. The hosts 130, 140, and 150 may typically be computer systems that include software and hardware components well known to those having skill in the art. In various settings, the hosts may also be referred to as nodes, reflecting their participation in a networked system. The hosts 130, 140, and 150 may operate as a cluster in which these hosts are interconnected and may share the computing load involved in various computing tasks. In support of various applications and operations, the hosts may exchange data over, for example, a network 120 such as an enterprise-wide intranet or other local area network (LAN), or over a wide area network (WAN) such as the Internet. Additionally, the network 120 may allow various client computer systems 110 to communicate with the hosts 130, 140, and 150. In addition to using the network 120, the hosts 130, 140, and 150 may communicate with each other and with other computing hosts over a private network 121 that more directly connects the hosts.
Other elements of computing system 100 may include a storage area network (SAN) 125 and storage devices such as a tape library 160 (typically including one or more tape drives), a group of disk drives 170 (i.e., “just a bunch of disks” or “JBOD”), and a storage array 180 such as an intelligent disk array. As shown in FIG. 1, the hosts 130, 140, and 150 may be coupled to the SAN 125. The SAN 125 is conventionally a high-speed network that allows the establishment of direct connections between the storage devices 160, 170, and 180 and the hosts 130, 140, and 150. The SAN 125 may also include one or more SAN-specific devices such as SAN switches, SAN routers, SAN hubs, or some type of storage appliance. The SAN 125 may also be coupled to additional hosts. Thus, the SAN 125 may be shared between the hosts and may allow for the sharing of storage devices between the hosts to provide greater availability and reliability of storage. Although the hosts 130, 140, and 150 are shown connected to the storage devices 160, 170, and 180 through the SAN 125, this need not be the case. Shared resources may be directly connected to some or all of the hosts in the computing system, and the computing system 100 need not include a SAN. Alternatively, the hosts 130, 140, and 150 may be connected to multiple SANs.
FIG. 2 is a simplified block diagram illustrating in greater detail several components of the computing system 100. For example, the storage array 180 is illustrated with two input/output (I/O) ports 181 and 186. Associated with each I/O port is a respective storage controller 182 and 187. In this illustration, the controller 182 is also referred to as alpha and the controller 187 is also referred to as beta. Each storage controller generally manages I/O operations to and from the storage array through the associated I/O port. In this example, the controller 182 includes a processor 183, a memory cache 184 and a regular memory 185. The processor 183 is coupled to the cache 184 and to the memory 185. Similarly, the controller 187 may include a processor 188, a memory cache 189 and a regular memory 190. The processor 188 is coupled to the cache 189 and to the memory 190.
Although one or more of each of these components is typical in storage arrays, other variations and combinations are well known in the art. The storage array may also include some number of disk drives accessible by both storage controllers. As illustrated, each disk drive is shown as a logical unit (LUN), which is generally an indivisible unit presented by a storage device to a host or hosts. In the illustrated example, the storage array 180 holds five LUNs 191-195, which are also referred to as LUNs A-E, respectively. Logical unit numbers, also sometimes referred to as LUNs, are typically assigned to logical units in a storage array so a host may address and access the data on those devices. In some implementations, a LUN may include multiple physical devices, e.g., several disk drives, that are logically presented as a single device. Similarly, in various implementations a LUN may consist of a portion of a physical device, such as a logical section of a single disk drive.
FIG. 2 also illustrates some of the software and hardware components present in the hosts 130, 140, and 150. The host 130 may execute one or more application programs 131. Such applications may include, but are not limited to, database management systems (DBMS), file servers, application servers, web servers, backup and restore software, customer relationship management software, and the like. The applications and other software not shown, e.g., operating systems, file systems, and applications executing on client computer systems 110, may initiate or request I/O operations against storage devices such as the storage array 180. The host 130 may also execute a volume manager 133 that enables physical resources configured in the computing system to be managed as logical devices. An example of software that performs some or all of the functions of a volume manager 133 is the VERITAS Volume Manager™ product provided by VERITAS Software Corporation. The host 130 may take advantage of the fact that the storage array 180 has more than one I/O port by using a dynamic multipathing (DMP) driver 135 as well as multiple host bus adaptors (HBAs) 137 and 139. The HBAs may provide a hardware interface between the host bus and the storage network, typically implemented as a Fibre Channel network. The host 130 may have multiple HBAs to provide redundancy and/or to take better advantage of storage devices having multiple ports. Other hosts may also execute software, such as programs 141, a volume manager 143, and a DMP driver 145 on the host 140; and programs 151, a volume manager 153, and a DMP driver 155 on the host 150. The other hosts may also use HBAs, such as HBAs 147 and 149 on the host 140, and HBAs 157 and 159 on the host 150.
The DMP functionality may enable greater availability and performance by using path fail-over and load balancing. In general, the multipathing policy used by the DMP drivers 135, 145, and 155 depends on the characteristics of the storage array in use.
Active/active storage arrays (A/A arrays), for example, are one type of storage array. A/A arrays permit several paths to be used concurrently for I/O operations. Other types of storage arrays, such as active/passive arrays, may generally designate one path for accessing particular resources on the array, while other paths are reserved as redundant backups.
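The difference between these two multipathing policies can be illustrated with a minimal sketch. The class names and the pairing of each HBA with a particular I/O port are hypothetical and are chosen only for illustration; the sketch does not describe how any particular DMP driver is implemented.

```python
from itertools import cycle


class RoundRobinSelector:
    """Hypothetical A/A policy: rotate I/O across every available path."""

    def __init__(self, paths):
        if not paths:
            raise ValueError("at least one path is required")
        self._paths = cycle(paths)

    def next_path(self):
        return next(self._paths)


class PrimaryOnlySelector:
    """Hypothetical active/passive policy: always use the designated path."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary  # held in reserve as a redundant backup

    def next_path(self):
        return self.primary


# Path labels are assumptions; FIG. 2 does not fix which HBA reaches which port.
PATH_A = "HBA 137 -> port 181"
PATH_B = "HBA 139 -> port 186"

aa = RoundRobinSelector([PATH_A, PATH_B])
ap = PrimaryOnlySelector(PATH_A, PATH_B)

aa_choices = [aa.next_path() for _ in range(4)]  # alternates between paths
ap_choices = [ap.next_path() for _ in range(4)]  # stays on the primary path
```

In this sketch the active/active selector spreads successive I/O operations over both paths, while the active/passive selector leaves the second path idle until some other mechanism decides to switch.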
Active/passive arrays with so-called auto-trespass mode (A/P arrays) allow I/O operations on one or more primary paths while one or more secondary paths are available in case the primary path fails. For example, if the storage array 180 is implemented as an A/P array, then the storage array 180 may designate a primary path and a secondary path for each of the LUNs in the storage array.
For example, the storage array 180 may designate the controller 182 as the primary controller for the LUNs 191, 192, and 193. Accordingly, the primary paths for these LUNs would include the controller 182, the I/O port 181, relevant portions of the SAN 125, and one or both of the HBAs in each of the hosts 130, 140, and 150. The storage array may also designate secondary paths as redundant backup paths for access to LUNs 191, 192, and 193. The secondary paths would include a different controller than the primary controller 182. In the illustrated example, the secondary paths would include the controller 187, the I/O port 186, relevant portions of the SAN 125, and one or both of the HBAs in each of the hosts 130, 140, and 150.
While the controller 182 and the associated elements may be designated as the primary path for some of the LUNs, the controller 187 and the associated elements may be designated as the primary path for other LUNs. For example, the LUNs 191, 192, and 193 may have a primary path that includes the controller 182 and a secondary path that includes the controller 187. At the same time, the LUNs 194 and 195 may have a primary path that includes the controller 187 and a secondary path that includes the controller 182.
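The example assignment described above can be written down as a small lookup table. The following is a minimal sketch; the names `PathAssignment`, `ASSIGNMENTS`, and `luns_with_primary` are hypothetical, and only the reference numerals are taken from FIG. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathAssignment:
    """Controllers on the primary and secondary paths of one LUN."""
    primary_controller: int
    secondary_controller: int


# Keys are LUN reference numerals; values use controller numerals 182 and 187.
ASSIGNMENTS = {
    191: PathAssignment(primary_controller=182, secondary_controller=187),
    192: PathAssignment(primary_controller=182, secondary_controller=187),
    193: PathAssignment(primary_controller=182, secondary_controller=187),
    194: PathAssignment(primary_controller=187, secondary_controller=182),
    195: PathAssignment(primary_controller=187, secondary_controller=182),
}


def luns_with_primary(controller):
    """Return the LUNs whose primary path runs through the given controller."""
    return sorted(
        lun
        for lun, assignment in ASSIGNMENTS.items()
        if assignment.primary_controller == controller
    )
```

With this table, `luns_with_primary(182)` lists the LUNs 191, 192, and 193, and `luns_with_primary(187)` lists the LUNs 194 and 195, so each controller is primary for one set of LUNs and secondary for the other.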
To communicate with a LUN on a storage array, a host may normally use only one of the available paths. This path may be called the active path; the remaining path may be called the passive path. This arrangement allows the controllers 182 and 187 to manage data traffic and caching for their respective LUNs more readily. When a host communicates with a LUN over a path that is not the path designated for current use with that LUN, the communication is considered a trespass on that path.
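The definition of a trespass given above reduces to a comparison between the path used for an I/O operation and the path currently designated for that LUN. A minimal sketch follows, assuming that a path can be identified by its controller alone; the class `TrespassMonitor` and its methods are hypothetical.

```python
class TrespassMonitor:
    """Hypothetical bookkeeping of which controller is current for each LUN."""

    def __init__(self, current_controller_by_lun):
        self.current = dict(current_controller_by_lun)
        self.trespasses = []  # (host, lun, controller) for each trespass seen

    def record_io(self, host, lun, controller):
        """Return True if this I/O uses a path other than the current one."""
        is_trespass = self.current[lun] != controller
        if is_trespass:
            self.trespasses.append((host, lun, controller))
        return is_trespass


monitor = TrespassMonitor({191: 182, 194: 187})
first = monitor.record_io(host=130, lun=191, controller=182)   # current path
second = monitor.record_io(host=140, lun=191, controller=187)  # other path
```

Here the first operation uses the path designated for LUN 191 and is not a trespass, while the second reaches the same LUN through the controller 187 and is recorded as one.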
In the event that the primary path for a LUN fails, a host will need to turn to that LUN's secondary path until external measures have corrected the problem with the primary path. This process of the host and the storage array switching paths in response to failure of the primary path may be known as a fail-over. Similarly, the process of the host and the storage array switching back to the primary path after the restoration of the primary path may be known as a fail-back.
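The fail-over and fail-back transitions described above can be sketched as a two-state model for a single LUN. This is an illustration only: the class `LunPaths` and its event methods are hypothetical, and the sketch ignores how the failure or the repair is detected.

```python
class LunPaths:
    """Hypothetical host-side record of the path in use for one LUN."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.current = primary
        self.history = []  # names of the transitions that have occurred

    def on_primary_failed(self):
        """Fail-over: switch to the secondary path if still on the primary."""
        if self.current == self.primary:
            self.current = self.secondary
            self.history.append("fail-over")

    def on_primary_restored(self):
        """Fail-back: return to the primary path once it works again."""
        if self.current == self.secondary:
            self.current = self.primary
            self.history.append("fail-back")


lun_191 = LunPaths(primary=182, secondary=187)
lun_191.on_primary_failed()
after_failure = lun_191.current
lun_191.on_primary_restored()
after_repair = lun_191.current
```

In this sketch the LUN is reached through the controller 187 after the failure and through the controller 182 again after the repair.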
In active/passive arrays with auto-trespass features, a trespass may be interpreted as a situation that requires a fail-over or a fail-back. Active/passive arrays may alternatively be configured without an automated response to trespasses. For example, active/passive arrays in explicit fail-over mode (A/PF arrays) may require a special command, such as a SCSI command or a Fibre Channel command, to be issued to the storage array for fail-over or fail-back to occur. Yet another type of storage array is the active/passive array with LUN group fail-over (A/PG array). A/PG arrays treat a group of LUNs that are connected through a controller as a single fail-over entity. The primary and secondary controllers are each connected to a separate group of LUNs. If a single LUN in the primary controller's LUN group fails, all LUNs in that group fail over to the secondary controller's LUN group.
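The three behaviors described above can be contrasted in a toy model. The sketch below rests on several assumptions that are not taken from any real array: a path is identified by its controller, an array in a mode other than auto-trespass simply refuses I/O on a non-current path, and the method `switch_command` stands in for whatever command or event moves a LUN or LUN group. All names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    AUTO_TRESPASS = "A/P"
    EXPLICIT = "A/PF"
    LUN_GROUP = "A/PG"


class ToyArray:
    """Toy model of an active/passive array; not a real array interface."""

    def __init__(self, mode, groups):
        # groups maps a controller to the LUNs it initially serves.
        self.mode = mode
        self.owner = {}
        self.group_of = {}
        for controller, luns in groups.items():
            for lun in luns:
                self.owner[lun] = controller
                self.group_of[lun] = tuple(luns)

    def io(self, lun, controller):
        """Return True if the I/O is serviced in this toy model."""
        if self.owner[lun] == controller:
            return True
        if self.mode is Mode.AUTO_TRESPASS:
            # The trespass itself is treated as a request to switch paths.
            self.owner[lun] = controller
            return True
        return False  # assumption: other modes refuse the non-current path

    def switch_command(self, lun, controller):
        """Explicitly move one LUN, or its whole group in LUN_GROUP mode."""
        members = self.group_of[lun] if self.mode is Mode.LUN_GROUP else (lun,)
        for member in members:
            self.owner[member] = controller


GROUPS = {182: [191, 192, 193], 187: [194, 195]}
auto = ToyArray(Mode.AUTO_TRESPASS, GROUPS)
explicit = ToyArray(Mode.EXPLICIT, GROUPS)
grouped = ToyArray(Mode.LUN_GROUP, GROUPS)

auto_ok = auto.io(191, 187)                  # trespass switches LUN 191 only
explicit_ok = explicit.io(191, 187)          # refused until a command is sent
explicit.switch_command(191, 187)
explicit_ok_after = explicit.io(191, 187)    # serviced after the command
grouped.switch_command(191, 187)             # LUNs 191-193 move together
```

The point of the sketch is the difference in granularity and trigger: a single trespass moves one LUN in the auto-trespass model, nothing moves without a command in the explicit model, and one switch moves every LUN of the group in the group model.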
In situations where only one host communicates with a storage array, various techniques may be used to ensure that the host and the storage array stay synchronized regarding which path is currently being used to access a resource on the storage array. However, if multiple hosts share the resources of a storage array (or multiple storage arrays), the task of synchronization may become more intricate. A host needs to coordinate not only with the storage array, but also with the other hosts in the system regarding which paths are to be used for various resources on the storage array.
Under certain circumstances, it is possible for two or more hosts to lose synchronization with each other regarding which path—primary or secondary—is the path currently designated for use with a resource. Such inconsistencies between the hosts may have undesirable consequences. For example, one host may trigger an undesired fail-over or a fail-back by unintentionally communicating with a LUN on a path that is not considered the current path by other hosts.
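A loss of synchronization of this kind can be made concrete by comparing the view that each host holds of the current path for each LUN. The following is a minimal sketch, assuming each host's view is available as a mapping from LUN to controller; the function `find_path_mismatches` and the sample data are hypothetical.

```python
def find_path_mismatches(host_views):
    """Return, per LUN, the differing current-path choices held by hosts.

    host_views maps a host to a dict of {lun: controller on current path}.
    Only LUNs on which at least two hosts disagree appear in the result.
    """
    luns = set()
    for view in host_views.values():
        luns.update(view)

    mismatches = {}
    for lun in sorted(luns):
        choices = {
            host: view[lun] for host, view in host_views.items() if lun in view
        }
        if len(set(choices.values())) > 1:
            mismatches[lun] = choices
    return mismatches


# Sample state: host 130 has returned to controller 182 for LUN 191 while
# hosts 140 and 150 still treat controller 187 as the current path.
views = {
    130: {191: 182, 194: 187},
    140: {191: 187, 194: 187},
    150: {191: 187, 194: 187},
}
mismatches = find_path_mismatches(views)
```

In this sample state the hosts agree on LUN 194 but disagree on LUN 191, so an I/O operation from the host 130 to LUN 191 would be a trespass from the point of view of the other two hosts.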
Accordingly, it may be helpful to have automated tools for reducing or preventing mismatches in the designation of data paths used by processing hosts for communicating with storage arrays. Further, it may be helpful to employ techniques for reducing or preventing such mismatches during fail-back activities in clustered computing environments.