1. Technical Field
The present invention relates to network technology. More particularly, the present invention relates to methods and apparatus for implementing MUD logging in a system implementing virtualization of storage within a storage area network.
2. Description of the Related Art
In recent years, the capacity of storage devices has not increased as fast as the demand for storage. Therefore, a given server or other host must access multiple, physically distinct storage nodes (typically disks). To address these storage limitations, the storage area network (SAN) was developed. Generally, a storage area network is a high-speed special-purpose network that interconnects different data storage devices and associated data hosts on behalf of a larger network of users. However, although a SAN enables a storage device to be configured for use by various network devices and/or entities within a network, data storage needs are often dynamic rather than static.
The concept of virtual memory has traditionally been used to enable physical memory to be virtualized through the translation between physical addresses in physical memory and virtual addresses in virtual memory. Recently, the concept of “virtualization” has been implemented in storage area networks through various mechanisms. Virtualization interconverts physical storage and virtual storage on a storage network. The hosts (initiators) see virtual disks as targets. The virtual disks represent available physical storage in a defined but somewhat flexible manner. Virtualization provides hosts with a representation of available physical storage that is not constrained by certain physical arrangements/allocation of the storage.
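By way of illustration only, the translation between virtual and physical storage described above may be sketched as follows. The `Extent` and `VirtualDisk` names, the block-based addressing, and the disk identifiers are hypothetical and are not taken from any particular system; the sketch merely shows a single virtual disk whose blocks are backed by regions of two distinct physical disks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A contiguous run of blocks on one physical disk (hypothetical layout)."""
    virt_start: int  # first virtual block covered by this extent
    length: int      # number of blocks in the extent
    disk: str        # identifier of the backing physical disk
    phys_start: int  # first physical block on that disk


class VirtualDisk:
    """A virtual disk presented to hosts as a single target."""

    def __init__(self, extents):
        self.extents = sorted(extents, key=lambda e: e.virt_start)

    def translate(self, vblock):
        """Map a virtual block number to a (disk, physical block) pair."""
        for e in self.extents:
            if e.virt_start <= vblock < e.virt_start + e.length:
                return e.disk, e.phys_start + (vblock - e.virt_start)
        raise ValueError(f"virtual block {vblock} is not mapped")


# One virtual disk of 150 blocks backed by two physical disks.
vdisk = VirtualDisk([
    Extent(virt_start=0, length=100, disk="diskA", phys_start=500),
    Extent(virt_start=100, length=50, disk="diskB", phys_start=0),
])
```

The host addresses only virtual block numbers; which physical disk actually holds a given block is determined entirely by the mapping and may be changed without the host's involvement.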
Virtualization in the storage array is one of the most common storage virtualization solutions in use today. Through this approach, virtual volumes are created over the storage space of a specific storage subsystem (e.g., disk array). Creating virtual volumes at the storage subsystem level provides host independence, since virtualization of the storage pool is invisible to the hosts. In addition, virtualization at the storage system level enables optimization of memory access and therefore high performance. However, such a virtualization scheme typically will allow a uniform management structure only for a homogeneous storage environment and even then only with limited flexibility. Further, since virtualization is performed at the storage subsystem level, the physical-virtual limitations set at the storage subsystem level are imposed on all hosts in the storage area network. Moreover, each storage subsystem (or disk array) is managed independently. Virtualization at the storage level therefore rarely allows a virtual volume to span multiple storage subsystems (e.g., disk arrays), thus limiting the scalability of the storage-based approach.
When virtualization is implemented on each host, it is possible to span multiple storage subsystems (e.g., disk arrays). A host-based approach has an additional advantage, in that a limitation on one host does not impact the operation of other hosts in a storage area network. However, virtualization at the host-level requires the existence of a software layer running on each host (e.g., server) that implements the virtualization function. Running this software therefore degrades the performance of those hosts. Another key difficulty with this method is that it assumes a prior partitioning of the available storage to the various hosts. Since such partitioning is supported at the host-level and the virtualization function of each host is performed independently of the other hosts in the storage area network, it is difficult to coordinate storage access across the hosts. The host-based approach therefore fails to provide an adequate level of security. Due to this security limitation, it is difficult to implement a variety of redundancy schemes such as RAID which require the “locking” of memory during read and write operations. In addition, when mirroring is performed, the host must replicate the data multiple times, increasing its input-output and CPU load, and increasing the traffic over the SAN.
Virtualization in a storage area network appliance placed between the hosts and the storage solves some of the difficulties of the host-based and storage-based approaches. The storage appliance globally manages the mapping and allocation of physical storage to virtual volumes. Typically, the storage appliance manages a central table that provides the current mapping of physical to virtual. Thus, the storage appliance-based approach enables the virtual volumes to be implemented independently from both the hosts and the storage subsystems on the storage area network, thereby providing a higher level of security. Moreover, this approach supports virtualization across multiple storage subsystems. The key drawback of many implementations of this architecture is that every input/output (I/O) of every host must be sent through the storage area network appliance, causing significant performance degradation and a storage area network bottleneck. This is particularly disadvantageous in systems supporting a redundancy scheme such as RAID, since data must be mirrored across multiple disks. In another storage appliance-based approach, the appliance makes sure that all hosts receive the current version of the table. Thus, in order to enable the hosts to receive the table from the appliance, a software shim from the appliance to the hosts is required, adding to the complexity of the system. Moreover, since the software layer is implemented on the host, many of the disadvantages of the host-based approach are also present.
Patent application Ser. No. 10/056,238, entitled “Methods and Apparatus for Implementing Virtualization of Storage in a Storage Area Network,” by Edsall et al., filed on Jan. 23, 2002, discloses a system in which network-based virtualization is supported. In other words, virtualization is supported in the network, rather than at the hosts or storage devices. In this system, virtualization is supported by one or more network devices placed in a data path between the hosts and the storage devices. More particularly, virtualization may be implemented on a per-port basis via “intelligent ports.”
In a system implementing storage virtualization, virtual volumes are typically created over the storage space of a specific storage subsystem (e.g., disk array). More particularly, data is often mirrored across multiple storage devices (e.g., disks) such that the same data is stored across each of the storage devices. Storage devices storing the same data are typically referred to as mirrors. Through the use of mirroring, redundancy may be accomplished. As a result, the data that is stored in each of the mirrors will remain accessible to hosts in the event of a problem with one of the mirrors.
In the event that one of the storage devices goes offline, it is desirable to bring the storage device up to date when the storage device is brought back online. This process typically involves copying all of the data from one of the mirrors to the temporarily detached mirror. Unfortunately, this process could take hours. As a result, the host will typically detect a disruption to data access.
In order to alleviate the need to copy all of the data from a mirror during the recovery process, a Modified User Data (MUD) log is often used. A MUD log is typically maintained on a per-mirror basis. In other words, a separate log is maintained for each storage device. While this MUD logging process is effective for systems implementing disk-based virtualization, this type of process is ineffective in a system implementing network-based virtualization.
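By way of illustration only, a per-mirror MUD log may be sketched as follows. The sketch assumes, as the name suggests but as is not specified above, that the log records which fixed-size regions of a volume were modified while a mirror was detached, so that only those regions need be copied during recovery. The `MudLog` class, the `resync` function, and the region size are hypothetical.

```python
class MudLog:
    """Hypothetical per-mirror log of regions modified while the mirror is offline."""

    def __init__(self, region_size):
        self.region_size = region_size
        self.dirty = set()  # indices of regions written since the mirror detached

    def record_write(self, offset, length):
        """Mark every region touched by a write of `length` bytes at `offset`."""
        if length <= 0:
            return
        first = offset // self.region_size
        last = (offset + length - 1) // self.region_size
        self.dirty.update(range(first, last + 1))

    def regions_to_copy(self):
        return sorted(self.dirty)


def resync(source, stale, log):
    """Copy only the logged regions from an up-to-date mirror to the stale one."""
    for region in log.regions_to_copy():
        start = region * log.region_size
        end = start + log.region_size
        stale[start:end] = source[start:end]
    copied = len(log.dirty)
    log.dirty.clear()
    return copied


# Two 16-byte mirrors; `stale` goes offline, then one write reaches `source`.
source = bytearray(b"a" * 16)
stale = bytearray(source)
log = MudLog(region_size=4)

source[5:9] = b"XXXX"                 # write applied to the online mirror only
log.record_write(offset=5, length=4)  # touches regions 1 and 2

copied = resync(source, stale, log)   # copies 2 of the 4 regions
```

In this sketch only half of the volume is copied during recovery, rather than the whole of it; the saving grows with the size of the volume relative to the amount of data modified while the mirror was offline.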
In a system in which a volume is exported by multiple network devices or multiple ports that may be implemented on different network devices, the standard MUD logging process is ineffective. More particularly, write commands may be sent via different intelligent ports, as well as different network devices. Although the data in a volume could be modified through several intelligent ports or network devices, the intelligent ports or network devices cannot coordinate amongst themselves to maintain a consistent MUD log for the volume. As a result, managing and maintaining MUD logs becomes a difficult process.
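By way of illustration only, the difficulty described above may be sketched as follows. The sketch assumes that each port independently records only the regions modified by writes passing through that port; the port names, the region size, and the helper functions are hypothetical. Recovery driven by the log of any single port then leaves the detached mirror out of date.

```python
REGION = 4  # hypothetical region size in bytes


def regions(offset, length):
    """Indices of the regions touched by a write of `length` bytes at `offset`."""
    return set(range(offset // REGION, (offset + length - 1) // REGION + 1))


# Each port keeps its own log and sees only its own writes.
port_logs = {"port1": set(), "port2": set()}

healthy = bytearray(b"." * 16)   # mirror that stays online
detached = bytearray(healthy)    # mirror that is temporarily offline


def write(port, offset, data):
    """Apply a write arriving via `port` to the online mirror and log it there."""
    healthy[offset:offset + len(data)] = data
    port_logs[port] |= regions(offset, len(data))


def resync_from(log):
    """Copy the regions named in one log from the healthy mirror."""
    for r in sorted(log):
        start = r * REGION
        detached[start:start + REGION] = healthy[start:start + REGION]


write("port1", 0, b"AA")   # logged only at port1 (region 0)
write("port2", 9, b"BB")   # logged only at port2 (region 2)

# Recovery using port1's log alone misses the write made through port2.
resync_from(port_logs["port1"])
still_stale = detached != healthy
```

After the recovery step, `still_stale` is true: the write made through the second port was never recorded in the log consulted, which is the inconsistency that arises when the ports cannot coordinate amongst themselves.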