As computer systems scale to enterprise levels, particularly in the context of supporting large-scale data centers, the underlying data storage systems frequently employ a storage area network (SAN) or network attached storage (NAS). As is conventionally well appreciated, SAN or NAS provides a number of technical capabilities and operational benefits, fundamentally including virtualization of data storage devices, redundancy of physical devices with transparent fault-tolerant fail-over and fail-safe controls, geographically distributed and replicated storage, and centralized oversight and storage configuration management decoupled from client-centric computer systems management.
Architecturally, the storage devices in a SAN storage system (e.g., disk arrays, etc.) are typically connected to network switches (e.g., Fibre Channel switches, etc.) which are then connected to servers or “hosts” that require access to the data in the storage devices. The servers, switches and storage devices in a SAN typically communicate using the Small Computer System Interface (SCSI) protocol which transfers data across the network at the level of disk data blocks. In contrast, a NAS device is typically a device that internally contains one or more storage drives and that is connected to the hosts (or intermediating switches) through a network protocol such as Ethernet. In addition to containing storage devices, the NAS device has also pre-formatted its storage devices in accordance with a network-based file system, such as Network File System (NFS) or Common Internet File System (CIFS). As such, as opposed to a SAN which exposes disks (referred to as LUNs and further detailed below) to the hosts, which then need to be formatted and then mounted according to a file system utilized by the hosts, the NAS device's network-based file system (which needs to be supported by the operating system of the hosts) causes the NAS device to appear as a file server to the operating systems of hosts, which can then mount or map the NAS device, for example, as a network drive accessible by the operating system. It should be recognized that with the continuing innovation and release of new products by storage system vendors, clear distinctions between SAN and NAS storage systems continue to fade, with actual storage system implementations often exhibiting characteristics of both, offering both file-level protocols (NAS) and block-level protocols (SAN) in the same system. For example, in an alternative NAS architecture, a NAS “head” or “gateway” device is networked to the host rather than a traditional NAS device. 
Such a NAS gateway device does not itself contain storage drives, but enables external storage devices to be connected to the NAS gateway device (e.g., via a Fibre Channel interface, etc.). Such a NAS gateway device, which is perceived by the hosts in a similar fashion as a traditional NAS device, provides a capability to significantly increase the capacity of a NAS based storage architecture (e.g., at storage capacity levels more traditionally supported by SANs) while retaining the simplicity of file-level storage access.
SCSI and other block protocol-based storage devices, such as a storage system 30 shown in FIG. 1A, utilize a storage system manager 31, which represents one or more programmed storage processors, to aggregate the storage units or drives in the storage device and present them as one or more LUNs (Logical Unit Numbers) 34 each with a uniquely identifiable number. LUNs 34 are accessed by one or more computer systems 10 through a physical host bus adapter (HBA) 11 over a network 20 (e.g., Fibre Channel, etc.). Within computer system 10 and above HBA 11, storage access abstractions are characteristically implemented through a series of software layers, beginning with a low-level device driver layer 12 and ending in operating system specific file system layers 15. Device driver layer 12, which enables basic access to LUNs 34, is typically specific to the communication protocol used by the storage system (e.g., SCSI, etc.). A data access layer 13 may be implemented above device driver layer 12 to support multipath consolidation of LUNs 34 visible through HBA 11 and other data access control and management functions. A logical volume manager 14, typically implemented between data access layer 13 and conventional operating system file system layers 15, supports volume-oriented virtualization and management of LUNs 34 that are accessible through HBA 11. Multiple LUNs 34 can be gathered and managed together as a volume under the control of logical volume manager 14 for presentation to and use by file system layers 15 as a logical device.
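By way of illustration only, the gathering of multiple LUNs into a single logical device may be sketched as a simple end-to-end concatenation. The class name `LinearVolume`, the LUN identifiers, and the block counts below are hypothetical and are not taken from any particular logical volume manager; actual volume managers may stripe, mirror, or otherwise arrange LUNs differently.

```python
from bisect import bisect_right
from itertools import accumulate


class LinearVolume:
    """Hypothetical volume that concatenates LUNs end to end."""

    def __init__(self, luns):
        # luns: list of (lun_id, size_in_blocks) pairs, in volume order
        self.luns = list(luns)
        # Running totals mark the volume block at which each LUN ends.
        self.ends = list(accumulate(size for _, size in self.luns))

    @property
    def size(self):
        return self.ends[-1] if self.ends else 0

    def locate(self, block):
        """Translate a volume block number to (lun_id, block within LUN)."""
        if not 0 <= block < self.size:
            raise IndexError("block outside volume")
        i = bisect_right(self.ends, block)
        start = self.ends[i - 1] if i else 0
        return self.luns[i][0], block - start


# Three LUNs of assumed sizes presented as one 3,500-block logical device.
vol = LinearVolume([(0, 1000), (1, 500), (2, 2000)])
print(vol.locate(1200))  # (1, 200)
```

In this sketch the file system layers would address only volume block numbers, while the translation to a particular LUN remains internal to the volume manager.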
Storage system manager 31 implements a virtualization of physical, typically disk drive-based storage units, referred to in FIG. 1A as spindles 32, that reside in storage system 30. From a logical perspective, each of these spindles 32 can be thought of as a sequential array of fixed sized extents 33. Storage system manager 31 abstracts away complexities of targeting read and write operations to addresses of the actual spindles and extents of the disk drives by exposing to connected computer systems, such as computer systems 10, a contiguous logical storage space divided into a set of virtual SCSI devices, known as LUNs 34. Each LUN represents some capacity that is assigned for use by computer system 10 by virtue of existence of such LUN, and presentation of such LUN to computer systems 10. Storage system manager 31 maintains metadata that includes a mapping for each such LUN to an ordered list of extents, wherein each such extent can be identified as a spindle-extent pair <spindle #, extent #> and may therefore be located in any of the various spindles 32.
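The mapping metadata described above may be pictured, purely as a minimal sketch, as an ordered list of <spindle #, extent #> pairs per LUN, with a logical byte offset resolved by integer division. The extent size, the class names `Extent` and `LunMap`, and the sample values are assumptions made for illustration and do not describe any vendor's actual metadata format.

```python
from dataclasses import dataclass
from typing import List, Tuple

EXTENT_SIZE = 1024 * 1024  # assumed fixed extent size in bytes (hypothetical)


@dataclass(frozen=True)
class Extent:
    spindle: int  # spindle number
    extent: int   # extent number within that spindle


class LunMap:
    """Hypothetical per-LUN metadata: an ordered list of spindle-extent pairs."""

    def __init__(self, extents: List[Extent]):
        self.extents = list(extents)

    @property
    def capacity(self) -> int:
        return len(self.extents) * EXTENT_SIZE

    def resolve(self, offset: int) -> Tuple[int, int, int]:
        """Map a logical byte offset to (spindle, extent, offset in extent)."""
        if not 0 <= offset < self.capacity:
            raise ValueError("offset outside LUN capacity")
        index, within = divmod(offset, EXTENT_SIZE)
        e = self.extents[index]
        return (e.spindle, e.extent, within)


# A LUN whose contiguous logical space is backed by extents on three spindles.
lun = LunMap([Extent(0, 7), Extent(2, 3), Extent(1, 0)])
print(lun.resolve(EXTENT_SIZE + 512))  # (2, 3, 512)
```

The connected computer system sees only the contiguous logical offsets; the scattering of extents across spindles is visible only in the mapping.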
FIG. 1B is a block diagram of a conventional NAS or file-level based storage system 40 that is connected to one or more computer systems 10 via network interface cards (NIC) 11′ over a network 21 (e.g., Ethernet). Storage system 40 includes a storage system manager 41, which represents one or more programmed storage processors. Storage system manager 41 implements a file system 45 on top of physical, typically disk drive-based storage units, referred to in FIG. 1B as spindles 42, that reside in storage system 40. From a logical perspective, each of these spindles can be thought of as a sequential array of fixed sized extents 43. File system 45 abstracts away complexities of targeting read and write operations to addresses of the actual spindles and extents of the disk drives by exposing to connected computer systems, such as computer systems 10, a namespace comprising directories and files that may be organized into file system level volumes 44 (hereinafter referred to as “FS volumes”) that are accessed through their respective mount points.
Even with the advancements in storage systems described above, it has been widely recognized that they are not sufficiently scalable to meet the particular needs of virtualized computer systems. For example, a cluster of server machines may service as many as 10,000 virtual machines (VMs), each VM using multiple “virtual disks” and multiple “snapshots,” each of which may be stored, for example, as a file on a particular LUN or FS volume. Even at a scaled down estimation of 2 virtual disks per VM and 2 snapshots per virtual disk (i.e., 6 storage objects per VM), this amounts to 60,000 distinct disks for the storage system to support if VMs were directly connected to physical disks (i.e., 1 virtual disk or snapshot per physical disk). In addition, storage device and topology management at this scale are known to be difficult. As a result, the concept of datastores in which VMs are multiplexed onto a smaller set of physical storage entities (e.g., LUN-based VMFS clustered file systems or FS volumes), such as described in U.S. Pat. No. 7,849,098, entitled “Providing Multiple Concurrent Access to a File System,” incorporated by reference herein, was developed.
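The 60,000 figure can be reproduced with a short calculation, offered here only as a sketch. It assumes that each of the 2 virtual disks carries 2 snapshots of its own, which is the reading under which the stated total follows; the function name `objects_to_support` is hypothetical.

```python
def objects_to_support(vms: int, disks_per_vm: int, snapshots_per_disk: int) -> int:
    """Count distinct disks needed at 1 virtual disk or snapshot per physical disk."""
    # Each virtual disk counts once, plus once for each of its snapshots.
    per_vm = disks_per_vm * (1 + snapshots_per_disk)
    return vms * per_vm


# 10,000 VMs x 2 virtual disks x (1 disk + 2 snapshots) = 60,000
print(objects_to_support(10_000, 2, 2))  # 60000

# Counting only 2 snapshots per VM in total would instead give
# 10,000 x (2 + 2) = 40,000, so the per-disk reading is assumed here.
print(10_000 * (2 + 2))  # 40000
```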
In conventional storage systems employing LUNs or FS volumes, workloads from multiple VMs are typically serviced by a single LUN or a single FS volume. As a result, resource demands from one VM workload will affect the service levels provided to another VM workload on the same LUN or FS volume. Efficiency measures for storage such as latency and input/output operations (IO) per second, or IOPS, thus vary depending on the number of workloads in a given LUN or FS volume and cannot be guaranteed. Consequently, storage policies for storage systems employing LUNs or FS volumes cannot be executed on a per-VM basis and service level agreement (SLA) guarantees cannot be given on a per-VM basis. In addition, data services provided by storage system vendors, such as snapshot, replication, encryption, and deduplication, are provided at a granularity of the LUNs or FS volumes, not at the granularity of a VM's virtual disk. As a result, snapshots can be created for the entire LUN or the entire FS volume using the data services provided by storage system vendors, but a snapshot for a single virtual disk of a VM cannot be created separately from the LUN or the file system in which the virtual disk is stored.
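The granularity limitation described above may be illustrated with a toy model, given only as a sketch. The class `Lun`, its single `snapshot` operation, and the file names are hypothetical; real storage systems implement snapshots quite differently (e.g., without copying data), and the point shown is only that the operation covers the whole LUN.

```python
import copy


class Lun:
    """Toy LUN holding several VMs' virtual disks as files (hypothetical)."""

    def __init__(self):
        self.files = {}  # virtual disk file name -> contents

    def snapshot(self):
        """The only data service offered: a snapshot of the entire LUN."""
        return copy.deepcopy(self.files)


lun = Lun()
lun.files["vm1-disk0"] = bytearray(b"aaaa")
lun.files["vm2-disk0"] = bytearray(b"bbbb")

# Preserving vm1's disk necessarily captures vm2's disk as well.
snap = lun.snapshot()
lun.files["vm1-disk0"][0:1] = b"z"

print(sorted(snap))              # ['vm1-disk0', 'vm2-disk0']
print(bytes(snap["vm1-disk0"]))  # b'aaaa'
```

In this model there is no operation that captures "vm1-disk0" alone, which mirrors the absence of per-virtual-disk data services in LUN-based and FS volume-based systems.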