As computer systems scale to enterprise levels, particularly in the context of supporting large-scale data centers, the underlying data storage systems frequently employ a storage area network (SAN) or network attached storage (NAS). As is conventionally well appreciated, SAN or NAS provides a number of technical capabilities and operational benefits, fundamentally including virtualization of data storage devices, redundancy of physical devices with transparent fault-tolerant fail-over and fail-safe controls, geographically distributed and replicated storage, and centralized oversight and storage configuration management decoupled from client-centric computer systems management.
SCSI and other block protocol-based storage devices, such as a storage system 30 shown in FIG. 1A, utilize a storage system manager 31, which represents one or more programmed storage processors, to aggregate the storage units or drives in the storage device and present them as one or more LUNs (Logical Unit Numbers) 34 each with a uniquely identifiable number. LUNs 34 are accessed by one or more computer systems 10 through a physical host bus adapter (HBA) 11 over a network 20 (e.g., Fibre Channel). Within computer system 10 and above HBA 11, storage access abstractions are characteristically implemented through a series of software layers, beginning with a low-level device driver layer 12 and ending in operating system specific file system layers 15. Device driver layer 12, which enables basic access to LUNs 34, is typically specific to the communication protocol used by the storage system (e.g., SCSI). A data access layer 13 may be implemented above device driver layer 12 to support multipath consolidation of LUNs 34 visible through HBA 11 and other data access control and management functions. A logical volume manager 14, typically implemented between data access layer 13 and conventional operating system file system layers 15, supports volume-oriented virtualization and management of LUNs 34 that are accessible through HBA 11. Multiple LUNs 34 can be gathered and managed together as a volume under the control of logical volume manager 14 for presentation to and use by file system layers 15 as a logical device.
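As an illustration of how multiple LUNs may be gathered into a single logical device, the following is a minimal sketch and not the implementation of logical volume manager 14. It assumes simple concatenation of LUNs; actual volume managers may instead stripe or mirror. The names `Lun`, `ConcatenatedVolume`, and `resolve` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Lun:
    """A LUN as seen by the host: a unique number and a capacity in blocks."""
    lun_id: int
    num_blocks: int


class ConcatenatedVolume:
    """Hypothetical volume that presents several LUNs as one logical device.

    Assumption: LUNs are laid end to end (concatenation).
    """

    def __init__(self, luns: List[Lun]) -> None:
        if not luns:
            raise ValueError("a volume needs at least one LUN")
        self.luns = list(luns)
        self.num_blocks = sum(lun.num_blocks for lun in self.luns)

    def resolve(self, logical_block: int) -> Tuple[int, int]:
        """Map a volume-relative block to (lun_id, block within that LUN)."""
        if not 0 <= logical_block < self.num_blocks:
            raise IndexError("logical block outside the volume")
        remaining = logical_block
        for lun in self.luns:
            if remaining < lun.num_blocks:
                return lun.lun_id, remaining
            remaining -= lun.num_blocks
        raise AssertionError("unreachable: bounds were checked above")


# Two LUNs of 100 and 50 blocks appear to the file system as 150 blocks.
volume = ConcatenatedVolume([Lun(lun_id=7, num_blocks=100),
                             Lun(lun_id=9, num_blocks=50)])
```

In this sketch, the file system layers address only volume-relative blocks, and the translation to a particular LUN stays hidden beneath the volume.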
FIG. 1B is a block diagram of a conventional NAS or file-level based storage system 40 that is connected to one or more computer systems 10 via network interface cards (NIC) 11′ over a network 21 (e.g., Ethernet). Storage system 40 includes a storage system manager 41, which represents one or more programmed storage processors. Storage system manager 41 implements a file system 45 on top of physical, typically disk drive-based storage units, referred to in FIG. 1B as spindles 42, that reside in storage system 40. From a logical perspective, each of these spindles can be thought of as a sequential array of fixed-size extents 43. File system 45 abstracts away the complexities of targeting read and write operations to addresses of the actual spindles and extents of the disk drives by exposing to connected computer systems, such as computer systems 10, a namespace comprising directories and files that may be organized into file system level volumes 44 (hereinafter referred to as “FS volumes”) that are accessed through their respective mount points.
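The abstraction described above, in which a file name stands in for spindle and extent addresses, can be sketched as follows. This is a toy model under stated assumptions, not the design of file system 45: the 4-byte extent size, the first-fit allocator, and the names `Spindle` and `TinyFileSystem` are all hypothetical.

```python
from typing import Dict, List, Tuple

EXTENT_SIZE = 4  # bytes; deliberately tiny for illustration (assumed value)


class Spindle:
    """A spindle modeled as a sequential array of fixed-size extents."""

    def __init__(self, num_extents: int) -> None:
        self.extents = [bytearray(EXTENT_SIZE) for _ in range(num_extents)]


class TinyFileSystem:
    """Hypothetical file system exposing paths instead of extent addresses."""

    def __init__(self, spindles: List[Spindle]) -> None:
        self.spindles = spindles
        # path -> ordered list of (spindle index, extent index)
        self.files: Dict[str, List[Tuple[int, int]]] = {}
        self.sizes: Dict[str, int] = {}
        self._free = [(s, e)
                      for s, spindle in enumerate(spindles)
                      for e in range(len(spindle.extents))]

    def write_file(self, path: str, data: bytes) -> None:
        """Store data under a new path (overwriting a path is not modeled)."""
        needed = -(-len(data) // EXTENT_SIZE)  # ceiling division
        if needed > len(self._free):
            raise OSError("not enough free extents")
        locations = [self._free.pop(0) for _ in range(needed)]
        for i, (s, e) in enumerate(locations):
            chunk = data[i * EXTENT_SIZE:(i + 1) * EXTENT_SIZE]
            self.spindles[s].extents[e][:len(chunk)] = chunk
        self.files[path] = locations
        self.sizes[path] = len(data)

    def read_file(self, path: str) -> bytes:
        """Reassemble a file; the caller never sees spindles or extents."""
        raw = b"".join(bytes(self.spindles[s].extents[e])
                       for s, e in self.files[path])
        return raw[:self.sizes[path]]


fs = TinyFileSystem([Spindle(num_extents=2), Spindle(num_extents=2)])
fs.write_file("/vol1/a.txt", b"hello world")  # 11 bytes span three extents
```

A client reads `/vol1/a.txt` by name only. That the data happens to straddle two spindles is visible solely in the file system's internal mapping.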
It has been recognized that the storage systems described above are not sufficiently scalable to meet the particular needs of virtualized computer systems. For example, a cluster of server machines may service as many as 10,000 virtual machines (VMs), each VM using multiple “virtual disks” and multiple “snapshots,” each of which may be stored, for example, as a file on a particular LUN or FS volume. Even at a scaled down estimation of 2 virtual disks per VM and 2 snapshots per virtual disk, this amounts to 60,000 distinct disks for the storage system to support if VMs were directly connected to physical disks (i.e., 1 virtual disk or snapshot per physical disk). In addition, storage device and topology management at this scale are known to be difficult. As a result, the concept of datastores in which VMs are multiplexed onto a smaller set of physical storage entities (e.g., LUN-based VMFS clustered file systems or FS volumes), such as described in U.S. Pat. No. 7,849,098, entitled “Providing Multiple Concurrent Access to a File System,” incorporated by reference herein, was developed.
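The 60,000 figure can be checked with a short calculation. The sketch below assumes the reading in which each of the 2 virtual disks carries 2 snapshots, so that each VM accounts for 6 distinct disks; counting only 2 snapshots in total per VM would instead give 4 per VM, or 40,000. The function name `distinct_disks` is hypothetical.

```python
def distinct_disks(num_vms: int, disks_per_vm: int, snapshots_per_disk: int) -> int:
    """Count virtual disks plus their snapshots across all VMs.

    Assumption: every virtual disk has the same number of snapshots.
    """
    per_vm = disks_per_vm * (1 + snapshots_per_disk)
    return num_vms * per_vm


total = distinct_disks(num_vms=10_000, disks_per_vm=2, snapshots_per_disk=2)
print(total)  # 60000
```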
In conventional storage systems employing LUNs or FS volumes, workloads from multiple VMs are typically serviced by a single LUN or a single FS volume. As a result, resource demands from one VM workload will affect the service levels provided to another VM workload on the same LUN or FS volume. Efficiency measures for storage such as latency and input/output operations per second, or IOPS, thus vary depending on the number of workloads in a given LUN or FS volume and cannot be guaranteed. Consequently, storage policies for storage systems employing LUNs or FS volumes cannot be executed on a per-VM basis and service level agreement (SLA) guarantees cannot be given on a per-VM basis. In addition, data services provided by storage system vendors, such as snapshot, replication, encryption, and deduplication, are provided at a granularity of the LUNs or FS volumes, not at the granularity of a VM's virtual disk. As a result, snapshots can be created for the entire LUN or the entire FS volume using the data services provided by storage system vendors, but a snapshot for a single virtual disk of a VM cannot be created separately from the LUN or the file system in which the virtual disk is stored.
An object-based storage system disclosed in U.S. patent application Ser. No. 13/219,358, filed Aug. 26, 2011, incorporated by reference herein, provides a solution by exporting logical storage volumes that are provisioned as storage objects, referred to herein as “virtual volumes.” These storage objects are accessed on demand by connected computer systems using standard protocols, such as SCSI and NFS, through logical endpoints for the protocol traffic that are configured in the storage system. Logical storage volumes are created from one or more logical storage containers having an address space that maps to storage locations of the physical data storage units. The reliance on logical storage containers provides users of the object-based storage system with flexibility in designing their storage solutions, because a single logical storage container may span more than one physical storage system and logical storage containers of different customers can be provisioned from the same physical storage system with appropriate security settings. In addition, storage operations such as snapshots, cloning, etc. of the virtual disks may be offloaded to the storage system using the logical storage volumes.
Storage systems typically employ a copy-on-write (COW) approach to take disk snapshots. Under such an approach, IOs are performed to a base disk, while snapshots are maintained as mainly read-only point-in-time copies of the disk, to which contents of the base disk may be copied on writes to preserve the state of the disk at previous points in time. In contrast, the connected computer system may employ redo-based disk snapshotting. When a redo-based snapshot is taken, a redo log is created that points to an immediately prior redo log, which itself points to another redo log, and so on, until a base disk is reached. The chain of redo logs tracks differences between the base disk state and later disk states. As COW- and redo-based approaches are semantic opposites, redo-based snapshots are typically not supported by COW-based storage systems, i.e., redo-based snapshots cannot be readily offloaded to such systems. Further, the storage pools for snapshots in the COW-based storage system may be optimized for read-only snapshots and therefore not optimal for writable redo logs.
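The contrast between the two approaches can be made concrete with a minimal in-memory sketch. It is an illustration under simplifying assumptions (block-granular dictionaries, no persistence, no snapshot deletion) and does not represent any particular storage system or hypervisor. The class names `CowDisk` and `RedoDisk` are hypothetical.

```python
from typing import Dict, List, Optional


class CowDisk:
    """Copy-on-write: IO goes to the base disk; snapshots hold old contents."""

    def __init__(self) -> None:
        self.base: Dict[int, bytes] = {}
        self.snapshots: List[Dict[int, Optional[bytes]]] = []

    def take_snapshot(self) -> int:
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block: int, data: bytes) -> None:
        # Preserve the old contents in the newest snapshot before the first
        # overwrite of this block since that snapshot was taken.
        if self.snapshots and block not in self.snapshots[-1]:
            self.snapshots[-1][block] = self.base.get(block)
        self.base[block] = data

    def read(self, block: int) -> Optional[bytes]:
        return self.base.get(block)

    def read_snapshot(self, snap_id: int, block: int) -> Optional[bytes]:
        # The earliest preserved copy at or after snap_id is the block's
        # state at snapshot time; otherwise the block is unchanged in base.
        for snap in self.snapshots[snap_id:]:
            if block in snap:
                return snap[block]
        return self.base.get(block)


class RedoDisk:
    """Redo-based: the base is frozen; new writes go to the newest redo log."""

    def __init__(self) -> None:
        self.chain: List[Dict[int, bytes]] = [{}]  # chain[0] is the base disk

    def take_snapshot(self) -> int:
        frozen = len(self.chain) - 1
        self.chain.append({})  # new redo log, logically pointing at `frozen`
        return frozen

    def write(self, block: int, data: bytes) -> None:
        self.chain[-1][block] = data

    def read(self, block: int, upto: Optional[int] = None) -> Optional[bytes]:
        # Walk from the chosen log back toward the base disk.
        top = len(self.chain) - 1 if upto is None else upto
        for layer in reversed(self.chain[:top + 1]):
            if block in layer:
                return layer[block]
        return None


cow, redo = CowDisk(), RedoDisk()
for disk in (cow, redo):
    disk.write(0, b"v1")
cow_snap, redo_snap = cow.take_snapshot(), redo.take_snapshot()
for disk in (cow, redo):
    disk.write(0, b"v2")
```

After the second write, both models return `b"v2"` for the live disk and `b"v1"` for the snapshot, but they store the data in opposite places. In the COW model the base holds the new data and the snapshot holds the old; in the redo model the base holds the old data and the writable redo log holds the new. This is the sense in which the two approaches are semantic opposites.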