Distributed computing systems are an increasingly important part of research, governmental, and enterprise computing systems. Among the advantages of such computing systems are their ability to handle a variety of different computing scenarios including large computational problems, high volume data processing situations, and high availability situations. Such distributed computing systems typically utilize one or more storage devices in support of the computing system's operations. These storage devices can be quite numerous and/or heterogeneous. In an effort to aggregate such storage devices and to make such storage devices more manageable and flexible, storage virtualization techniques are often used. Storage virtualization techniques establish relationships between physical storage devices, e.g., disk drives, tape drives, optical drives, etc., and virtual or logical storage devices such as volumes, virtual disks, and virtual logical units (sometimes referred to as virtual LUNs). In so doing, virtualization techniques provide system-wide features, e.g., naming, sizing, and management, better suited to the entire computing system than those features dictated by the physical characteristics of storage devices. Additionally, virtualization techniques enable and/or enhance certain computing system operations such as clustering and data backup and restore.
FIG. 1 illustrates a simplified example of a computing system 100. The members of the computing system 100 include host 130 and host 140. As members of computing system 100, hosts 130 and 140, typically some type of application, data, or file server, are often referred to as “nodes.” Hosts 130 and 140 can be designed to operate completely independently of each other, or may interoperate to form some manner of cluster. Thus, hosts 130 and 140 are typically individual computer systems having some or all of the software and hardware components well known to those having skill in the art. FIG. 5 (described below) illustrates some of the features common to such computer systems. In support of various applications and operations, hosts 130 and 140 can exchange data over, for example, network 120, typically a local area network (LAN), e.g., an enterprise-wide intranet, or a wide area network (WAN) such as the Internet. Additionally, network 120 provides a communication path for various client computer systems 110 to communicate with hosts 130 and 140. In addition to network 120, hosts 130 and 140 can communicate with each other over a private network (not shown).
Other elements of computing system 100 include storage area network (SAN) 150, SAN switch 160, and storage devices such as tape library 170 (typically including one or more tape drives), a group of disk drives 180 (i.e., “just a bunch of disks” or “JBOD”), and intelligent storage array 190. As shown in FIG. 1, both hosts 130 and 140 are coupled to SAN 150. SAN 150 is conventionally a high-speed network that allows the establishment of direct connections between storage devices 170, 180, and 190 and hosts 130 and 140. Thus, SAN 150 is shared between the hosts and allows for the sharing of storage devices between the hosts to provide greater availability and reliability of storage.
Although hosts 130 and 140 are shown connected to storage devices 170, 180, and 190 through SAN switch 160 and SAN 150, this need not be the case. Shared resources can be directly connected to some or all of the hosts in the computing system, and computing system 100 need not include a SAN. Alternatively, hosts 130 and 140 can be connected to multiple SANs. Additionally, SAN switch 160 can be replaced with a SAN router, a SAN hub, or some type of storage appliance.
Storage virtualization can generally be implemented at the host level, e.g., hosts 130 and 140, at the storage device level, e.g., intelligent disk array 190, and/or at the appliance level, e.g., SAN switch 160. Host-based storage virtualization is perhaps the most common virtualization solution and is typically either packaged with the host's operating system or made available as an add-on product. Host-based virtualization allows administrators to access advanced storage management functions such as mirroring, RAID sets, redundant pathing, and hot backups (by using mirror splits or snap-shots). However, it typically adds some additional overhead to the host system and the management of the virtualization is typically performed on a host-by-host basis, making global storage management difficult.
An alternative to host-based virtualization is storage-based virtualization. Storage-based virtualization solutions typically implement intelligent storage devices such as intelligent disk arrays that implement virtualization functions. For example, such devices can allow for movement between different RAID groups without data loss, as well as automatic migration of data from one RAID group to another based upon the frequency of data access. In addition, these products typically permit the creation of multiple data mirrors, which provide additional availability when one of the mirrors is split for hot backups. Storage-based virtualization can also be advantageous in providing the flexibility to modify LUN size, the ability to have multiple hosts see the same LUNs (which is particularly critical with high availability clustering), and remote replication. However, the more heterogeneous the storage devices, the more likely it is that there are multiple virtualization schemes with which a host-level or client-computer-system level application or user will have to contend.
Still another alternative is appliance-based virtualization. Appliance-based virtualization provides users with virtualization between the hosts and the storage, allowing for the same level of control and centralization across the storage architecture. There are, in general, two kinds of appliance-based virtualization products: in-band and out-of-band. An in-band virtualization appliance is physically located between the host and the storage. The appliance takes the disk requests from the host and fulfills the host's request from the storage attached to the other side of the appliance. This functionality is essentially transparent to the host because the appliance presents itself as a disk. The physical location of the appliance is the primary difference between out-of-band and in-band appliances. Out-of-band appliances logically present themselves as if they are located between the host and storage, but they actually reside to the side. This is accomplished with the installation of a driver under the host's disk driver. The appliance driver then receives logical to physical block mappings from the appliance.
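By way of illustration only, the logical to physical block mapping mentioned above can be sketched as a table of extents that a host-side driver consults before issuing I/O directly to a physical device. The names Extent, MappingTable, resolve, disk_a, and disk_b are hypothetical and chosen for this sketch; they do not describe any particular appliance or driver interface.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Extent:
    """Maps a contiguous run of logical blocks onto one physical device."""
    logical_start: int
    length: int
    device: str          # hypothetical physical device identifier
    physical_start: int


class MappingTable:
    """Hypothetical table a driver might receive from an out-of-band appliance."""

    def __init__(self, extents: Iterable[Extent]) -> None:
        self.extents = sorted(extents, key=lambda e: e.logical_start)

    def resolve(self, logical_block: int) -> Tuple[str, int]:
        """Translate a logical block number into (device, physical block)."""
        for e in self.extents:
            if e.logical_start <= logical_block < e.logical_start + e.length:
                offset = logical_block - e.logical_start
                return e.device, e.physical_start + offset
        raise ValueError(f"logical block {logical_block} is not mapped")


# Example mapping: one virtual disk spread across two physical disks.
table = MappingTable([
    Extent(logical_start=0, length=100, device="disk_a", physical_start=500),
    Extent(logical_start=100, length=50, device="disk_b", physical_start=0),
])
```

In this sketch, an in-band appliance would perform the same translation itself on every request, whereas an out-of-band arrangement would hand the table to the host-side driver and stay out of the data path.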
In providing a common virtualization scheme for all storage devices, appliance-based virtualization does simplify the presentation and use of virtual storage devices by client computer systems. However, certain information about the virtual devices may only be available at the appliance-level and thus inaccessible by client computer systems. FIGS. 2A-2C illustrate one problem associated with such a system. FIGS. 2A-2C schematically illustrate the creation of a point-in-time copy or snapshot of an existing virtual storage device. Point-in-time copies are typically used for performing system backup, upgrade and other maintenance tasks while providing continuous availability of the original volume data. Moreover, processing of point-in-time copies can be offloaded onto another host to avoid contention for system resources on a production server.
As shown in FIG. 2A, volume 210 corresponds to one or more portions of one or more physical disks 200. In order to simplify the volume's exposure to storage clients such as hosts 130 and 140 and client computer systems 110, one or more virtual disks 220 are established corresponding to volume 210. Virtual disk 220 appears to a storage client much in the same way a physical disk would. For example, a storage client can send and receive I/O commands to/from virtual disk 220 as if it is a physical disk.
In FIG. 2B, the first step in creating a snapshot volume is illustrated. Snapshot mirror 230 is a copy of the data in volume 210. When initially created, snapshot mirror 230 is typically stored on the same physical disk or disks containing volume 210. FIG. 2C illustrates the final step in which snapshot mirror 230 is organized into a separate snapshot volume 250. As part of this process, the data contained in snapshot mirror 230 is typically moved to one or more physical disks 240 that are different from those corresponding to volume 210. Additionally, one or more virtual disks 260 corresponding to snapshot volume 250 are established.
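The two steps of FIGS. 2B and 2C can be sketched, under simplifying assumptions, as follows. The Volume and Virtualizer classes, the snapshot_of field, and identifiers such as volume_210 and vdisk_260 are hypothetical names that merely echo the reference numerals above; an actual implementation would copy data between physical disks rather than between in-memory dictionaries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Volume:
    name: str
    disks: List[str]                       # physical disks backing the volume
    data: Dict[int, bytes] = field(default_factory=dict)
    snapshot_of: Optional[str] = None      # relationship kept by the virtualizer


class Virtualizer:
    """Hypothetical stand-in for the entity that creates volumes and virtual disks."""

    def __init__(self) -> None:
        self.volumes: Dict[str, Volume] = {}
        self.virtual_disks: Dict[str, str] = {}   # virtual disk -> volume name

    def add_volume(self, volume: Volume, vdisk: str) -> None:
        self.volumes[volume.name] = volume
        self.virtual_disks[vdisk] = volume.name

    def snapshot(self, source: str, snap_name: str,
                 target_disks: List[str], vdisk: str) -> Volume:
        src = self.volumes[source]
        # Step 1 (FIG. 2B): the snapshot mirror is a copy of the volume's data,
        # initially residing on the same physical disks as the source volume.
        mirror = Volume(snap_name, list(src.disks), dict(src.data), source)
        # Step 2 (FIG. 2C): the mirror becomes a separate snapshot volume on
        # different physical disks, exposed through its own virtual disk.
        mirror.disks = list(target_disks)
        self.add_volume(mirror, vdisk)
        return mirror

    def client_view(self) -> List[str]:
        """What a storage client sees: virtual disk names only."""
        return sorted(self.virtual_disks)


v = Virtualizer()
v.add_volume(Volume("volume_210", ["disk_200"], {0: b"abc"}), "vdisk_220")
snap = v.snapshot("volume_210", "snapshot_volume_250", ["disk_240"], "vdisk_260")
v.volumes["volume_210"].data[0] = b"xyz"   # later writes leave the snapshot intact
```

In this sketch the snapshot_of relationship is recorded only inside the Virtualizer object, while client_view exposes nothing but the two virtual disk names.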
Since a storage client generally treats virtual disks 220 and 260 in a similar manner, the storage client typically does not have any information about relationships among the virtual disks. Such information, e.g., that virtual disk 260 represents a snapshot of volume 210, is only available to the software and/or hardware entity, e.g., the storage appliance, that created the relationship. Accordingly, it is desirable to have an efficient and convenient mechanism for storage clients to obtain storage virtualization information.