Distributed computing systems are an increasingly important part of research, governmental, and enterprise computing systems. Among the advantages of such computing systems is their ability to handle a variety of different computing scenarios, including large computational problems, high-volume data processing situations, and high-availability situations. Such distributed computing systems typically utilize one or more storage devices in support of the computing system's operations. These storage devices can be quite numerous and/or heterogeneous. In an effort to aggregate such storage devices and to make them more manageable and flexible, storage virtualization techniques are often used. Storage virtualization techniques establish relationships between physical storage devices, e.g., disk drives, tape drives, optical drives, etc., and virtual or logical storage devices such as volumes, virtual disks, and virtual logical units (sometimes referred to as virtual LUNs). In so doing, virtualization techniques provide system-wide features, e.g., naming, sizing, and management, better suited to the entire computing system than those features dictated by the physical characteristics of storage devices. Additionally, virtualization techniques enable and/or enhance certain computing system operations such as clustering and data backup and restore.
FIG. 1 illustrates a simplified example of a computing system 100. The members of the computing system 100 include host 130 and host 140. As members of computing system 100, hosts 130 and 140, each typically some type of application, data, or file server, are often referred to as “nodes.” Hosts 130 and 140 can be designed to operate completely independently of each other, or may interoperate to form some manner of cluster. Thus, hosts 130 and 140 are typically individual computer systems having some or all of the software and hardware components well known to those having skill in the art. FIG. 8 (described below) illustrates some of the features common to such computer systems. In support of various applications and operations, hosts 130 and 140 can exchange data over, for example, network 120, typically a local area network (LAN), e.g., an enterprise-wide intranet, or a wide area network (WAN) such as the Internet. Additionally, network 120 provides a communication path for various client computer systems 110 to communicate with hosts 130 and 140. In addition to network 120, hosts 130 and 140 can communicate with each other over a private network (not shown).
Other elements of computing system 100 include storage area network (SAN) 150 and storage devices such as tape library 160 (typically including one or more tape drives), a group of disk drives 170 (i.e., “just a bunch of disks” or “JBOD”), and intelligent storage array 180. As shown in FIG. 1, both hosts 130 and 140 are coupled to SAN 150. SAN 150 is conventionally a high-speed network that allows the establishment of direct connections between storage devices 160, 170, and 180 and hosts 130 and 140. SAN 150 can also include one or more SAN specific devices such as SAN switches, SAN routers, SAN hubs, or some type of storage appliance. Thus, SAN 150 is shared between the hosts and allows for the sharing of storage devices between the hosts to provide greater availability and reliability of storage. Although hosts 130 and 140 are shown connected to storage devices 160, 170, and 180 through SAN 150, this need not be the case. Shared resources can be directly connected to some or all of the hosts in the computing system, and computing system 100 need not include a SAN. Alternatively, hosts 130 and 140 can be connected to multiple SANs.
FIG. 2 illustrates in greater detail several components of computing system 100. For example, disk array 180 is shown to include two input/output (I/O) ports 181 and 186. Associated with each I/O port is a respective storage controller (182 and 187), and each storage controller generally manages I/O operations to and from the storage array through the associated I/O port. In this example, each storage controller includes a processor (183 and 188), a cache memory (184 and 189) and a regular memory (185 and 190). Although one or more of each of these components is typical in disk arrays, other variations and combinations are well known in the art. The disk array also includes some number of disk drives (logical units (LUNs) 191-195) accessible by both storage controllers. As illustrated, each disk drive is shown as an LUN which is generally an indivisible unit presented by a storage device to its host(s). Logical unit numbers, also sometimes referred to as LUNs, are typically assigned to each disk drive in an array so the host can address and access the data on those devices. In some implementations, an LUN can include multiple devices, e.g., several disk drives, that are logically presented as a single device.
FIG. 2 also illustrates some of the software and hardware components present in hosts 130 and 140. Hosts 130 and 140 execute one or more application programs (131 and 141, respectively). Such applications can include, but are not limited to, database management systems (DBMS), file servers, application servers, web servers, backup and restore software, customer relationship management software, and the like. The applications and other software not shown, e.g., operating systems, file systems, and applications executing on client computer systems 110, can initiate or request I/O operations against storage devices such as disk array 180. Hosts 130 and 140 also execute volume managers (133 and 143), which enable physical resources configured in the computing system to be managed as logical devices. An example of software that performs some or all of the functions of volume manager 330 is the VERITAS Volume Manager™ product provided by VERITAS Software Corporation. Hosts 130 and 140 take advantage of the fact that disk array 180 has more than one I/O port by using dynamic multipathing (DMP) drivers (135 and 145) as well as multiple host bus adaptors (HBAs) 137, 139, 147, and 149. The HBAs provide a hardware interface between the host bus and the storage network, typically implemented as a Fibre Channel network. Hosts 130 and 140 each have multiple HBAs to provide redundancy and/or to take better advantage of storage devices having multiple ports.
The DMP functionality enables greater reliability and performance by using path failover and load balancing. In general, the multipathing policy used by DMP drivers 135 and 145 depends on the characteristics of the disk array in use. Active/active disk arrays (A/A arrays) permit several paths to be used concurrently for I/O operations. Such arrays enable DMP to provide greater I/O throughput by balancing the I/O load uniformly across the multiple paths to the disk devices. In the event of a loss of one connection to an array, the DMP driver automatically routes I/O operations over the other available connections to the array. Active/passive arrays in so-called auto-trespass mode (A/P arrays) allow I/O operations on a primary (active) path, while a secondary (passive) path is used if the primary path fails. Failover occurs when I/O is received or sent on the secondary path. Active/passive arrays in explicit failover mode (A/PF arrays) typically require a special command to be issued to the array for failover to occur. Active/passive arrays with LUN group failover (A/PG arrays) treat a group of LUNs that are connected through a controller as a single failover entity. Failover occurs at the controller level, and not at the LUN level (as would typically be the case for an A/P array in auto-trespass mode). The primary and secondary controllers are each connected to a separate group of LUNs. If a single LUN in the primary controller's LUN group fails, all LUNs in that group fail over to the secondary controller's passive LUN group.
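By way of illustration only, the difference between the A/A and A/P multipathing policies described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the behavior of any actual DMP driver: the names `ArrayMode`, `Path`, and `PathSelector` are hypothetical, round-robin is assumed as one possible way of balancing load uniformly, and the A/PF and A/PG variants (explicit failover command, LUN group failover) are not modeled.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count


class ArrayMode(Enum):
    ACTIVE_ACTIVE = "A/A"
    ACTIVE_PASSIVE = "A/P"


@dataclass
class Path:
    """One host-to-array path (hypothetical model, e.g. an HBA/port pair)."""
    name: str
    primary: bool
    healthy: bool = True


@dataclass
class PathSelector:
    """Chooses a path for each I/O request according to the array mode."""
    mode: ArrayMode
    paths: list
    _counter: count = field(default_factory=count, repr=False)

    def select(self) -> Path:
        healthy = [p for p in self.paths if p.healthy]
        if not healthy:
            raise RuntimeError("no healthy path to the array")
        if self.mode is ArrayMode.ACTIVE_ACTIVE:
            # A/A: all healthy paths may carry I/O concurrently.
            # Round-robin is an assumed balancing policy.
            return healthy[next(self._counter) % len(healthy)]
        # A/P: use the primary path while it is healthy;
        # otherwise fail over to a secondary path.
        for p in healthy:
            if p.primary:
                return p
        return healthy[0]
```

In this sketch, marking the primary path unhealthy causes an A/P selector to return the secondary path on the next request, while an A/A selector simply continues over the remaining healthy paths.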
Because of their relative simplicity and lower costs, A/P disk arrays are commonly found in SAN environments. However, in A/P disk arrays with two I/O ports, both of the access ports generally cannot be used concurrently without causing substantial I/O performance degradation. The active port, i.e., the port used for I/O to the disk array, is usually the primary port of the array, but it could be the secondary port if the primary port is not available due to, for example, failure. Hosts discover the primary and/or secondary port, but without some communication among the sharing hosts, it is not known which port should be selected as the active port. Thus, among the problems associated with sharing the disks of an active/passive (e.g., an A/P, A/PG, or A/PF) array from multiple hosts are: (1) arriving at a consensus among hosts about the appropriate access port to use; (2) arriving at a consensus among hosts about the appropriate access port for failover; and (3) performing actual failover.
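The coordination problem described above can be illustrated with a short sketch. It assumes a hypothetical scenario in which host 140 has lost its path to the primary port while host 130 has not; the function name `choose_active_port` and the string labels are illustrative only. Each host applies the same local rule, yet the two hosts end up driving different ports of the same A/P array.

```python
def choose_active_port(visible_ports):
    """Local rule (assumed): prefer the primary port if visible, else the secondary."""
    for role in ("primary", "secondary"):
        if role in visible_ports:
            return role
    raise RuntimeError("no port visible")


# Host 130 sees both ports; host 140 (hypothetically) sees only the secondary port.
host_130 = choose_active_port({"primary", "secondary"})
host_140 = choose_active_port({"secondary"})

# Without communication, the hosts disagree on the active port, so both
# ports of the array would be used concurrently.
conflict = len({host_130, host_140}) > 1
```

Here `conflict` evaluates to true, which corresponds to the situation in which both access ports of an A/P array are used at once and I/O performance degrades.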
Accordingly, it is desirable to have efficient and convenient mechanisms for storage device and particularly disk array I/O path coordination among storage device clients such as hosts in SAN environments.