1. Field of the Invention
This invention is related to the field of computer systems and, more particularly, to network data storage systems.
2. Description of the Related Art
In the past, large organizations relied heavily on parallel SCSI technology to provide the performance required for their enterprise data storage needs. More recently, organizations are recognizing that the restrictions imposed by SCSI architecture may be too costly for SCSI to continue as a viable solution. Such restrictions include the following:
- SCSI disk arrays must be located no more than 25 meters from the host server;
- The parallel SCSI bus is susceptible to data errors resulting from slight timing discrepancies or improper port termination; and
- SCSI array servicing frequently requires downtime for every disk in the array.
One solution has been to create technology that enables storage arrays to reside directly on the network, where disk accesses may be made directly rather than through the server's SCSI connection. This network-attached storage (NAS) model eliminates SCSI's restrictive cable distance, signal timing, and termination requirements. However, it adds a significant load to the network, which frequently is already starved for bandwidth. Gigabit Ethernet technology only alleviates this bottleneck for the short term, so a more elegant solution is desirable.
The storage area network (SAN) model places storage on its own dedicated network, removing data storage from both the server-to-disk SCSI bus and the main user network. This dedicated network most commonly uses Fibre Channel technology, a versatile, high-speed transport. The SAN includes one or more hosts that provide a point of interface with LAN users, as well as (in the case of large SANs) one or more fabric switches, SAN hubs and other devices to accommodate a large number of storage devices. The hardware (e.g. fabric switches, hubs, bridges, routers, cables, etc.) that connects workstations and servers to storage devices in a SAN is referred to as a “fabric.” The SAN fabric may enable server-to-storage device connectivity through Fibre Channel switching technology to a wide range of servers and storage devices.
SAN management conventionally includes provisioning of and control over host access to individual LUNs (Logical Unit Numbers) within an array or other collection of potentially heterogeneous storage devices. SANs may include storage devices and systems from various storage array providers, for example Hitachi Data Systems, Hewlett Packard and EMC. Ensuring that SAN applications have the required storage resources may include providing secure storage from storage devices (e.g. disk arrays, tape backup devices, etc.) to hosts within the SAN.
A LUN is the SCSI (Small Computer System Interface) identifier of a logical unit within a target, the system component that receives a SCSI I/O command. A logical unit is an entity within a SCSI target that executes I/O commands. SCSI I/O commands are sent to a target and executed by a logical unit within that target. A SCSI physical disk may have a single logical unit, or alternatively may have more than one logical unit. Tape drives and array controllers may incorporate multiple logical units to which I/O commands can be addressed. Each logical unit exported by an array controller corresponds to a virtual disk.
LUN binding refers to the creation of access paths between an addressable unit (which may also be referred to as an AddrUnit, an AU, a unit, a volume, a logical unit, a logical disk, or a logical device) within a disk array and a port on the array. Masking, or LUN masking, may be used to refer to the process of enabling access to a particular Addressable Unit (AU) of storage for a host on the SAN.
FIG. 1 illustrates LUN binding. In the LUN binding process, an AU 108 is bound to a specified array port 106 (e.g. array port 106A or 106B) in a specified storage device 104 (e.g. a storage system/disk array). This results in the creation of a LUN 102. AUs 108A, 108B, 108C, and 108D are storage volumes built out of one or more physical disks within the storage device 104. Array ports 106A and 106B may be connected to a SAN fabric, and function as SCSI targets behind which the AUs 108 bound to those ports 106 are visible.
“LUN” is the term for the access path itself between an AU and an array port, so LUN binding is actually the process of creating LUNs. However, a LUN is also frequently identified with the AU behind it and treated as though it had the properties of that AU. For the sake of convenience, a LUN may be thought of as the equivalent of the AU it represents. Note, however, that two different LUNs may represent two different paths to a single volume. A LUN may be bound to one or more array ports; binding a LUN to multiple array ports enables failover, switching from one array port to another if a problem occurs.
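The relationship described above, in which binding an AU to an array port creates a LUN, and binding the same AU to multiple ports creates multiple access paths to one volume, can be sketched as follows. This is an illustrative model only, not part of the disclosure; all class and field names are hypothetical.

```python
# Hypothetical sketch of LUN binding: a LUN is the access path created
# by binding an addressable unit (AU) to an array port.

class LUN:
    """An access path between an AU and an array port."""
    def __init__(self, au, array_port):
        self.au = au
        self.array_port = array_port

class StorageDevice:
    def __init__(self, array_ports, aus):
        self.array_ports = set(array_ports)
        self.aus = set(aus)
        self.luns = []

    def bind(self, au, array_port):
        """Bind an AU to a specified array port, creating a LUN."""
        assert au in self.aus and array_port in self.array_ports
        lun = LUN(au, array_port)
        self.luns.append(lun)
        return lun

# Binding one AU to two array ports yields two LUNs: two distinct
# access paths to the same volume (e.g., for failover).
array = StorageDevice(array_ports=["106A", "106B"], aus=["108A", "108B"])
path1 = array.bind("108A", "106A")
path2 = array.bind("108A", "106B")
assert path1.au == path2.au and path1.array_port != path2.array_port
```

The final assertion captures the point that two different LUNs may represent two different paths to a single volume.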
FIG. 2 illustrates LUN masking in a SAN. LUN masking is a security operation that indicates that a particular host 120 (e.g. host 120A or 120B), HBA (Host Bus Adapter) 122 (e.g. HBA 122A or 122B), or HBA port 124 (e.g. HBA port 124A or 124B) is able to communicate with a particular LUN 102. In the LUN masking process, a bound AU 108 (e.g. AU 108A, 108B, 108C or 108D) may be masked to a specified HBA port 124, HBA 122, or host 120 (e.g. all HBAs on the host) through a specified array port 106 in a specified storage device 104. When an array LUN 102 is masked, an entry is added to the Access Control List (ACL) 110 (e.g. ACL 110A, 110B, 110C, 110D, or 110E) for that LUN 102. Each ACL 110 includes the World Wide Name of each HBA port 124 that has permission to use that access path—that is, to access that AU 108 through the particular array port 106 represented by the LUN 102.
LUN masking may be thought of as the removal of a mask between an AU and a host to allow the host to communicate with the LUN. The default behavior of the storage device may be to prohibit all access to LUNs unless a host has explicit permission to view the LUNs. The default behavior may depend on the array model and, in some cases, the software used to create the AU.
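The masking behavior described above, a per-LUN Access Control List of HBA port World Wide Names, with access denied by default, can be sketched as follows. This is a hypothetical illustration, not part of the disclosure; the class name and the WWN values are invented for the example.

```python
# Hypothetical sketch of LUN masking: each LUN carries an ACL of the
# World Wide Names (WWNs) of HBA ports permitted to use its access path.
# Default behavior modeled here: all access is prohibited unless a host
# has been given explicit permission.

class MaskedLUN:
    def __init__(self, lun_id):
        self.lun_id = lun_id
        self.acl = set()   # WWNs of permitted HBA ports

    def mask(self, hba_port_wwn):
        """Add an entry to the ACL, granting the HBA port access."""
        self.acl.add(hba_port_wwn)

    def may_access(self, hba_port_wwn):
        return hba_port_wwn in self.acl

lun = MaskedLUN("LUN 102")
hba_port = "50:06:01:60:3B:20:19:12"   # hypothetical WWN
assert not lun.may_access(hba_port)    # denied by default
lun.mask(hba_port)                     # "removes the mask"
assert lun.may_access(hba_port)        # host may now communicate
```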
The storage management process of allocating units of storage (e.g., AUs on storage devices) to storage consumers (e.g., host systems, or hosts) in a storage network may be referred to as storage provisioning. In a storage network, one or more hosts are each provisioned (granted access to) portions of the storage to meet their current storage requirements. However, storage requirements may, and in many storage networks tend to, increase over time. Rather than purchasing and installing storage on an as-needed basis, a conventional practice in storage management is to pre-provision storage; that is, to purchase and install extra storage in advance to meet anticipated increases in the storage requirements of a particular host for at least the near future.
There are conventional mechanisms for provisioning units of extra storage to hosts in a storage network as the storage requirements of the hosts increase. One conventional provisioning mechanism is to purchase and install one or more storage devices (e.g., disk arrays) that provide the required storage as well as extra storage, create a set of LUNs for accessing AUs of the storage to meet the current storage requirements, and allocate the LUNs to the host system. When a host needs additional storage, one or more of the allocated LUNs may then be used for accessing one or more spare AUs on the storage allocated to the host. Note that, in order for the host to gain access to the newly-allocated storage, the SAN may have to be reconfigured, and the host itself may have to be reconfigured and rebooted. Alternatively, one or more spare LUNs may be preconfigured for accessing the AUs of the extra storage. When a host needs more storage, the storage network and/or the host itself may need to be reconfigured to give the host access to one or more of the spare LUNs.
Storage networks may be fragile, and such a dynamic reconfiguration of the SAN may cause problems elsewhere. Reconfiguring the SAN to give a host access to a LUN may involve configuration of ports on the storage device, switches in between the host and the storage device, and the host itself. The host system may need to be reconfigured and/or rebooted to see the newly-allocated LUN(s). Because of these and other issues, many data centers do not allow this mechanism of dynamic reconfiguration of storage networks. In some data centers, requests for additional storage are required to go through a change control process that may take days or even weeks. Reconfiguration of the storage network to allocate units of spare storage to hosts may be limited by the change control process to periodic maintenance or change windows. This makes it difficult and time-consuming to obtain additional storage when needed, and has led to other provisioning solutions such as thin provisioning.
Another conventional provisioning mechanism is referred to as thin provisioning. Thin provisioning is a provisioning mechanism in which virtual LUNs are over-provisioned to the hosts. In thin provisioning, more storage is provisioned through the virtual LUNs than is actually available. The actual extra storage that is available to be allocated to the virtual LUNs may be referred to as the backing store. For example, in thin provisioning, five hosts may each be assigned virtual LUNs for accessing a terabyte of storage. Thus, five terabytes of storage have been assigned, but the backing store may actually only have one terabyte of actual storage space available. As storage is consumed by the hosts, the array allocates the backing store to the virtual LUNs when needed. Thin provisioning is an “allocate on demand” mechanism. Using thin provisioning, the SAN typically does not have to be reconfigured for the hosts to access additional storage. Thin provisioning thus may help avoid at least some of the configuration problems encountered with other provisioning mechanisms as described above.
However, in a storage network using thin provisioning, if the array ever runs out of real storage (the backing store), the storage network may start failing I/O write operations, and hosts and applications on hosts using that array are typically not prepared to handle these types of I/O errors gracefully. As storage is allocated from the backing store, new storage may have to be installed in the storage network. If the backing store is exhausted, the hosts believe they have storage space available that does not actually exist. I/O writes may be blocked or I/O errors may be generated when the hosts attempt to write to the storage. Many systems and applications do not expect to get I/O errors on writes to storage that they think they actually have, and there are generally no mechanisms for gracefully recovering from these types of errors. A host may have the storage mirrored, and so may retry the write on the other mirror; because that mirror also lacks backing store, the retried I/O attempt fails as well.
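The allocate-on-demand behavior and the exhaustion failure mode described above can be sketched as follows. This is a hypothetical model, not part of the disclosure; sizes are in arbitrary blocks, and all names are invented for the example.

```python
# Hypothetical sketch of thin provisioning: virtual LUNs are
# over-provisioned, and real backing store is allocated only on write.

class BackingStoreExhausted(IOError):
    """Models the I/O write error seen when the backing store runs out."""

class ThinArray:
    def __init__(self, backing_blocks):
        self.backing_free = backing_blocks   # real blocks available
        self.virtual = {}                    # virtual LUN -> provisioned size
        self.allocated = {}                  # virtual LUN -> blocks backed

    def provision(self, lun_id, virtual_blocks):
        # No real storage is consumed at provisioning time.
        self.virtual[lun_id] = virtual_blocks
        self.allocated[lun_id] = 0

    def write(self, lun_id, blocks):
        """Allocate backing store on demand; fail the I/O if exhausted."""
        if blocks > self.backing_free:
            raise BackingStoreExhausted(f"write to {lun_id} failed")
        self.backing_free -= blocks
        self.allocated[lun_id] += blocks

# Five hosts are each provisioned a 1000-block virtual LUN (5000 blocks
# assigned in total) against only 1000 real blocks of backing store.
array = ThinArray(backing_blocks=1000)
for host in range(5):
    array.provision(f"vlun-{host}", virtual_blocks=1000)

array.write("vlun-0", 800)       # succeeds; backed on demand
try:
    array.write("vlun-1", 400)   # only 200 real blocks remain
except BackingStoreExhausted:
    pass  # hosts and applications rarely handle this error gracefully
```

The failed write at the end illustrates the exhaustion scenario: the host holding vlun-1 was provisioned 1000 blocks and has consumed none, yet its write fails because the shared backing store is nearly gone.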
As an example, a typical file system may attempt to write some data, and get an I/O error on the write if the backing store has been exhausted. For example, the data may be a critical piece of file system metadata. After receiving the I/O error on the write, the file system may not know what is actually on the disk; the file system data may all be corrupted.
As another example, if a database writes into its log and attempts to write back into its table spaces, and then receives I/O errors because backing store has been exhausted, then the database may crash and corrupt itself. Databases are typically not prepared to deal with I/O errors on writes to storage. The database application may record the I/O errors, eventually filling up its error log, and then just halt. This is a difficult type of failure from which to recover.