1. Field of the Invention
The present invention is generally related to the management and control of distributed virtual storage systems and, in particular, to the management and re-signaturing of data storage units within distributed virtual storage systems.
2. Description of the Related Art
As computer systems scale to enterprise levels, particularly in the context of supporting large-scale data centers, the underlying data storage systems frequently adopt the use of storage area networks (SANs). As is conventionally well appreciated, SANs provide a number of technical capabilities and operational benefits, fundamentally including virtualization of data storage devices, redundancy of physical devices with transparent fault-tolerant fail-over and fail-safe controls, geographically distributed and replicated storage, and centralized oversight and storage configuration management decoupled from client-centric computer systems management.
Architecturally, a SAN storage subsystem is characteristically implemented as a large array of Small Computer System Interface (SCSI) protocol-based storage devices. One or more physical SCSI controllers operate as the externally accessible targets for data storage commands and data transfer operations. The target controllers internally support bus connections to the data storage devices, identified as logical units (LUNs). The storage array is collectively managed internally by a storage system manager to virtualize the physical data storage devices. That is, the SCSI storage devices are internally routed and respond to the virtual storage system manager as functionally the sole host initiator accessing the SCSI device array. The virtual storage system manager is thus able to aggregate the physical devices present in the storage array into one or more logical storage containers. Virtualized segments of these containers can then be allocated by the virtual storage system as externally visible and accessible LUNs with uniquely identifiable target identifiers. A SAN storage subsystem thus presents the appearance of simply constituting a set of SCSI targets hosting respective sets of LUNs. While specific storage system manager implementation details differ as between different SAN storage device manufacturers, the desired consistent result is that the externally visible SAN targets and LUNs fully implement the expected SCSI semantics necessary to respond to and complete initiated transactions against the managed container.
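The aggregation behavior described above can be sketched in simplified form. In this illustrative model (all class and method names, such as `StorageContainer` and `allocate_lun`, are hypothetical and do not correspond to any vendor API), a storage system manager pools physical devices into a logical container and carves out externally visible LUNs with unique identifiers:

```python
# Hypothetical sketch of SAN-internal aggregation: physical SCSI devices
# are pooled into a logical storage container, from which virtualized
# LUNs with unique identifiers are allocated for external presentation.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PhysicalDevice:
    serial: str
    capacity_gb: int

@dataclass
class StorageContainer:
    devices: List[PhysicalDevice] = field(default_factory=list)
    luns: Dict[int, int] = field(default_factory=dict)  # LUN id -> size (GB)
    _next_lun: int = 0

    def add_device(self, dev: PhysicalDevice) -> None:
        self.devices.append(dev)

    @property
    def total_capacity_gb(self) -> int:
        return sum(d.capacity_gb for d in self.devices)

    @property
    def allocated_gb(self) -> int:
        return sum(self.luns.values())

    def allocate_lun(self, size_gb: int) -> int:
        """Carve a virtual LUN out of the pooled container capacity."""
        if self.allocated_gb + size_gb > self.total_capacity_gb:
            raise ValueError("container capacity exhausted")
        lun_id = self._next_lun
        self._next_lun += 1
        self.luns[lun_id] = size_gb
        return lun_id
```

The point of the sketch is the decoupling: external initiators see only the allocated LUN identifiers, never the underlying physical devices.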
A SAN storage subsystem is typically accessed by a server computer system implementing a physical host bus adapter (HBA) that connects to the SAN through network connections. Within the server, above the host bus adapter, storage access abstractions are characteristically implemented through a series of software layers, beginning with a low-level SCSI driver layer and ending in an operating system-specific filesystem layer. The driver layer, which enables basic access to the target ports and LUNs, is typically vendor specific to the implementation of the SAN storage subsystem. A data access layer may be implemented above the device driver to support multipath consolidation of the LUNs visible through the host bus adapter and other data access control and management functions. A logical volume manager (LVM), typically implemented intermediate between the driver and conventional operating system filesystem layers, supports volume-oriented virtualization and management of the LUNs accessible through the host bus adapter. Multiple LUNs can be gathered and managed together as a volume under the control of the logical volume manager for presentation to and use by the filesystem layer as an integral LUN.
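The volume-aggregation role of the logical volume manager can be illustrated with a minimal sketch (the `LogicalVolume` class and its methods are hypothetical, not any actual LVM interface): several LUNs are concatenated into one volume, and volume-relative offsets are transparently mapped back to the constituent LUNs, so the filesystem layer sees a single integral device.

```python
# Illustrative sketch of LVM-style concatenation: multiple LUNs are
# gathered into one logical volume, and a volume-relative offset is
# resolved to the LUN that actually backs it.
class LogicalVolume:
    def __init__(self, name):
        self.name = name
        self.extents = []  # list of (lun_id, size_gb) in concatenation order

    def add_lun(self, lun_id, size_gb):
        self.extents.append((lun_id, size_gb))

    @property
    def size_gb(self):
        # The filesystem layer sees only this aggregate capacity.
        return sum(size for _, size in self.extents)

    def locate(self, offset_gb):
        """Map a volume-relative offset to (lun_id, lun-relative offset)."""
        for lun_id, size in self.extents:
            if offset_gb < size:
                return lun_id, offset_gb
            offset_gb -= size
        raise ValueError("offset beyond end of volume")
```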
In typical implementation, SAN systems connect with upper-tiers of client and server computer systems through a communications matrix frequently implemented using a Fibre Channel (FC) based communications network. Logically, a Fibre Channel network is a bidirectional, full-duplex, point-to-point, serial data channel structured specifically for high performance data communication. Physically, the Fibre Channel is an interconnection of multiple communication ports, called N_Ports, implemented by the host bus adapters and target controllers. These communication ports are interconnected by a switching network deployed as an n-way fabric, a set of point-to-point links, or as an arbitrated loop.
Strictly defined, Fibre Channel is a generalized transport mechanism that has no high-level data flow protocol of its own or native input/output command set. While a wide variety of existing Upper Level Protocols (ULPs) can be implemented on Fibre Channel, the most frequently implemented is the SCSI protocol. The SCSI Fibre Channel Protocol (FCP) standard defines a Fibre Channel mapping layer that enables transmission of SCSI command, data, and status information between a source host bus adapter, acting as a SCSI initiator, and a destination SCSI target controller, over any Fibre Channel connection path as specified by a Fibre Channel path identifier. As defined relative to a target, an FC path identifier is a reference to the destination port and logical unit of the SAN storage system. The port is uniquely specified by a World Wide Port Name (WWPN). The LUN identifier is a unique, hardware independent SCSI protocol compliant identifier value retrievable in response to a standard SCSI Inquiry command.
A common alternative transport mechanism to Fibre Channel is defined by the Internet Small Computer System Interface (iSCSI) standard. Instead of relying on a new FC media infrastructure, the iSCSI standard is designed to leverage existing TCP/IP networks, including specifically the existing mixed-media infrastructure of typical intranet and internet networks, and to use the internet protocol (IP) layer for upper-level command and data transport. Unlike Fibre Channel, the SCSI protocol is the exclusive upper-level protocol supported by iSCSI. That is, the iSCSI protocol semantics (IETF Internet Draft draft-ietf-ips-iSCSI-08.txt; www.ietf.org) specifically require the transmission of SCSI command, data, and status information between SCSI initiators and SCSI targets over an IP network. Similar to the FC path, an iSCSI path, as specified by a SCSI initiator, is a combination of a target IP address and LUN identifier.
SAN virtualization of client LUNs enables a number of specific capabilities, including a more efficient use of the storage space within a particular container, dynamic extensibility and reconfiguration of the container storage space by adding and replacing physical devices and shifting unused storage space between localized containers, and comprehensive management of the virtual LUNs. In addition, modern SAN systems enable multiple network path (multipath) access between the SAN connected computer systems and multiple, different physical SAN storage systems. Multipath routing functionally enables configuration of redundant network connections and channel bonding to achieve fundamental increases in the total available bandwidth between clients and their data stores.
A particular benefit of conventional SAN systems is the ability to implement consistent, system-oriented data integrity protection policies. Given the scope of data stored by individual SANs, overall storage system reliability and ongoing data integrity are baseline requirements. To provide the various real-time, hot-backup, and similar capabilities of conventional SAN systems, these systems will typically implement a periodic or administratively driven data replication-based data integrity protection scheme. Persistently scheduled checkpoint events are typically used to initiate image replication, also referred to as snap-shot copy, of established, externally visible LUNs. Subject to the details of the various sparse and progressive data copy techniques that may be used by any particular proprietary SAN implementation, each checkpoint event drives the creation of point-in-time copies of the event-specified externally visible LUNs.
Administratively, checkpoints will be set to encompass full client computer system volumes in order to preserve potential internal data dependencies between the LUNs that make up individual volumes. LUN replication services, as implemented by SAN systems, conventionally execute independently of volume identification; snap-shot copies are made of individual, externally visible LUNs without regard to volume participation. To prevent logical identification collisions between the source LUNs and replicated LUN copies, the checkpoint LUN copies are marked inactive integral to the copy process. Thus, beyond possibly initial administrative identification of LUNs for replication, there is little required administrator intervention and essentially no user visible burden arising from the direct execution of LUN replication operations.
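The checkpoint behavior described above can be reduced to a minimal sketch (the data structures and the `checkpoint` function are hypothetical illustrations, not any SAN vendor's interface): at each checkpoint event, every specified LUN receives a point-in-time copy tagged with a generation number, and each copy is marked inactive so it cannot collide with its source LUN's identity.

```python
# Minimal sketch of checkpoint-driven LUN replication: each event creates
# point-in-time copies of the specified LUNs, and each copy is marked
# inactive integral to the copy so it does not collide with its source.
import copy
import itertools

_generation = itertools.count(1)  # monotonically increasing checkpoint id

def checkpoint(luns):
    """Return inactive, generation-tagged snapshot copies of active LUNs."""
    generation = next(_generation)
    snapshots = []
    for lun in luns:
        snap = copy.deepcopy(lun)       # preserve internal data/metadata
        snap["active"] = False          # prevent identity collision
        snap["generation"] = generation # point-in-time tag
        snapshots.append(snap)
    return snapshots
```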
Although a generally infrequent requirement, various circumstances may require a rollback of a client computer system volume to a prior point-in-time. In other circumstances, a user may wish to have concurrent access to both a prior checkpointed instance and the current instance of a volume. Unfortunately, restoring a checkpointed volume or simply making a checkpointed volume currently accessible is typically burdensome. Conventionally, a manual selection operation is required to parse through all of the prior replicated LUNs to select just the single consistent set of LUNs that represent a specific, desired point-in-time replicated volume. Given the multiplicity of checkpoint events, the number of LUNs under management by a SAN system, and the number of SAN systems within the use scope of a given server computer system, the process of identifying, validating, restoring, and mounting a prior replicated volume set of LUNs is complex and time consuming. Moreover, manually managed volume restoration presents a significant risk that the data integrity of the restored volume will not be maintained. If the integrity of a restored volume is lost, then the integrity of one, if not multiple, other replicated generations of volumes will be corrupted.
Even where a proper set of replicated LUNs are identified for restoration, restoring a prior copy of a current volume is complicated by the inherent duplicate nature of the replicated LUNs. As expected, the internal data structures of the replicated LUNs, including the LUN internal metadata, are preserved by the replication process. Remounting a prior point-in-time copy of a volume can create an identity ambiguity since the replicated LUNs will report their original LUN identifiers. Furthermore, since there is no default defined manner of handling the LUNs of replicated volumes in a SAN environment, different client computer systems may operate inconsistently in recognizing the identity of the LUNs that constitute a currently active volume. Inconsistent recognition of LUNs is of particular concern given that current and replicated LUNs are not guaranteed visible to all client computer systems at all times in a SAN environment. Even where a volume consists of just a single LUN, different client computer systems could fail to distinguish which is the currently active volume, leading to inconsistent use.
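The ambiguity described above follows directly from metadata preservation: a replica reports the same identifier as its source, so a client scanning its visible LUNs cannot tell which one belongs to the currently active volume. A short illustrative sketch (the dictionary layout and function name are hypothetical) shows how such collisions would surface:

```python
# Illustrative detection of the identity ambiguity created by replicated
# LUNs: a replica preserves its source's identifier, so a client scanning
# visible LUNs may see the same identifier reported more than once.
from collections import Counter

def find_ambiguous_ids(visible_luns):
    """Return LUN identifiers reported by more than one visible LUN."""
    counts = Counter(lun["id"] for lun in visible_luns)
    return sorted(i for i, n in counts.items() if n > 1)
```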
Client-based software and protocols to resolve these ambiguities create a further layer of complexity, depend on mutual communication between the client computer systems, and require reliable connectivity to ensure continuous communication. Therefore, before a replicated LUN set can be mounted by any client computer system as an active volume, the replicated LUNs must be suitably re-signatured to ensure both LUN and volume uniqueness within the scope of any accessing client or server computer system. Conventional administrative tools do support the rewriting of selected LUN identifiers and volume signatures as part of the manual process of selecting and remounting a prior point-in-time volume. Still, this re-signaturing process is an additional and required step in the already complicated manual process of enabling access to a prior point-in-time copy of a volume. Additional complexities arise in the context of rollbacks where there is a need to repeatedly select between and activate multiple replicated volumes, while ensuring that the original state of the client computer systems volumes can be maintained and reliably restored.
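The re-signaturing step just described can be sketched as follows. This is a hedged illustration of the concept only: the function name, field layout, and the use of `uuid4` as a stand-in for whatever unique-identifier scheme a real implementation would employ are all assumptions, not any tool's actual behavior. Each replicated LUN receives a fresh identifier, the volume signature is rewritten consistently across the set, and the original identifier is retained for rollback lineage.

```python
# Hypothetical sketch of LUN re-signaturing: replicated LUNs receive new
# unique identifiers and a rewritten, shared volume signature so the
# replica volume can be mounted alongside the original without collision.
import uuid

def resignature(replicated_luns, volume_label):
    """Rewrite LUN identifiers and volume signature on replica copies."""
    new_signature = str(uuid.uuid4())       # one signature for the set
    for lun in replicated_luns:
        lun["original_id"] = lun["id"]      # retain lineage for rollback
        lun["id"] = str(uuid.uuid4())       # fresh unique LUN identifier
        lun["volume_signature"] = new_signature
        lun["volume_label"] = volume_label
    return new_signature
```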
Therefore, particularly as the use of SAN systems and reliance on internal replication services grows, there is a present need for users to be able to easily, reliably, and preferably transparently manage and maintain LUN volume sets to enable rollback and remounting of checkpointed data storage volumes.