The need for efficient storage systems and methods for massive amounts of data continues to grow. Currently, large data centers commonly employ blade servers that access a traditional storage system that includes scalable arrangements of physical shelves of memory devices (disks and/or flash) and storage controllers. Typically, the servers access the storage controllers over a network (Internet, local area network (LAN), storage area network (SAN), etc.), while the storage controllers communicate among themselves via a private backplane and communicate with shelves via Fibre Channel or Serial Attached SCSI (SAS). The servers generally host applications or virtual machines (VMs); VMs allow for dynamic allocation of hardware resources and have become a characteristic of modern data centers.
These traditional storage systems have recently been taking on an increasing variety of compute-intensive storage processing functions that now include: SHA-1 fingerprinting to support deduplication, compressing data to save storage capacity, encrypting data for security, replicating data for disaster recovery, computing erasure codes for RAID, computing checksums to ensure data integrity, garbage collection in log-structured file systems, managing multiple tiers of memory devices (e.g., RAM, flash, and disk) including migrating data among the tiers to maximize performance, and maintaining complex data structures to support snapshots, rapid cloning, and thin-provisioning. To support these increasing loads, storage controllers have become ever more powerful, with more numerous and more powerful CPUs and larger amounts of RAM, to the point that the storage controllers are sometimes more powerful, and more expensive, than the compute servers themselves.
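To illustrate the first of these functions, fingerprint-based deduplication can be sketched as a content-addressed block store: each fixed-size block is hashed with SHA-1, and a block whose fingerprint is already present is not stored again. This is a minimal illustrative sketch only; the class name `DedupStore`, the fixed 4096-byte block size, and the in-memory dictionaries are hypothetical assumptions, not a description of any particular storage controller.

```python
import hashlib


class DedupStore:
    """Hypothetical content-addressed block store (illustrative assumption).

    Each unique block is kept once, keyed by its SHA-1 fingerprint; each
    file is recorded as an ordered list of fingerprints.
    """

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}     # fingerprint -> block contents
        self.files: dict[str, list[str]] = {}  # file name -> ordered fingerprints

    def write(self, name: str, data: bytes) -> int:
        """Store `data` under `name`; return how many new (non-duplicate) blocks were kept."""
        new_blocks = 0
        recipe = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            fp = hashlib.sha1(block).hexdigest()  # the block's fingerprint
            if fp not in self.blocks:             # only unseen content consumes capacity
                self.blocks[fp] = block
                new_blocks += 1
            recipe.append(fp)
        self.files[name] = recipe
        return new_blocks

    def read(self, name: str) -> bytes:
        """Reassemble a file from its fingerprint list."""
        return b"".join(self.blocks[fp] for fp in self.files[name])
```

With `block_size=4`, writing `b"AAAABBBBAAAA"` stores only two blocks, and a later write of `b"BBBBCCCC"` adds just one more. The hashing on every write is exactly the kind of per-block computation that, as described above, accumulates on the storage controllers.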
A common characteristic of traditional storage systems is that they include a non-volatile memory in some form to address the problem of the high write latency of many persistent storage devices such as disk drives. The compute-intensive storage processing mentioned above can further increase write latency if it must be performed before the write data are written in processed form to the persistent storage devices. This non-volatile memory allows the storage system to acknowledge a new write as “safe” even before all the processing has occurred or before the data or the processed data have been written to the persistent storage devices. In traditional storage systems, the non-volatile memory is transparent to the servers writing the data. The compute servers write data to the storage system; the storage system buffers the data in non-volatile memory and acknowledges the write to the server. In the background, the storage system may perform storage processing on the data and write the data to a persistent storage device without the compute server performing any additional operation on the data. Typically, compute servers do not include such non-volatile memory.
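The buffer-then-acknowledge sequence described above can be modeled in a few lines. In this minimal sketch, a `deque` merely stands in for the non-volatile memory and a dictionary stands in for the persistent storage device; neither is actually durable, and the names `WriteBackBuffer` and `flush`, as well as the use of compression as the background "storage processing", are hypothetical assumptions chosen for illustration.

```python
import zlib
from collections import deque


class WriteBackBuffer:
    """Hypothetical write-back model (illustrative assumption).

    Writes are staged in a simulated non-volatile buffer and acknowledged
    immediately; processing and persistence happen later in flush().
    """

    def __init__(self):
        self.nvram = deque()   # stand-in for non-volatile memory (NOT actually durable)
        self.persistent = {}   # stand-in for the persistent device: key -> processed bytes

    def write(self, key: str, data: bytes) -> str:
        self.nvram.append((key, data))  # buffer the write
        return "ack"                    # acknowledge before any processing or persistence

    def flush(self) -> int:
        """Background step: process (here, compress) buffered data and persist it."""
        flushed = 0
        while self.nvram:
            key, data = self.nvram.popleft()
            self.persistent[key] = zlib.compress(data)
            flushed += 1
        return flushed

    def read(self, key: str) -> bytes:
        # Serve the newest buffered copy first, then fall back to persistent storage.
        for k, data in reversed(self.nvram):
            if k == key:
                return data
        return zlib.decompress(self.persistent[key])
```

The caller receives `"ack"` as soon as the data are staged, so the latency it observes excludes both the compression and the write to the slower device; only `flush()` pays those costs.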
The compute servers typically share the resources of these traditional storage systems, including the capacity and performance of the memory devices as well as the storage processing capabilities of the storage controllers. One disadvantage of this configuration is the slowdown caused by contention for the shared resources. For example, assume that data is to be stored compressed, deduplicated with an associated, computed fingerprint, and encrypted. If the required computations are to be done within a storage controller, then performing the necessary computations for one server may cause an unacceptable, or at least undesirable, delay in servicing the requests from other servers. Such contention is not easy to detect or manage, which makes it difficult, if not impossible, to guarantee performance for any particular workload. One obvious way to reduce this risk of overloading the computational resources of the storage controllers is to increase their computational power. Especially given that there may be many storage controllers, this approach is not only expensive but also typically amounts to wasteful over-provisioning for most normal storage operations. Further, if the storage controllers are found to be underpowered, replacing them with a more powerful model can be expensive and require downtime during the replacement, or even require the migration of all data to a completely new storage system, causing significant disruption and typically taking weeks or months to complete.
Another characteristic of modern data centers is the increased use of solid state drive (SSD) devices (e.g., flash devices) for caching data at various points in the storage architecture to increase I/O operations per second (IOPS). While current, traditional storage architectures for VMs improve upon older designs, they retain some legacy characteristics that prevent these architectures from being optimally efficient, both in terms of cost and in terms of ease of use. For example, current storage systems must define an elaborate data storage structure (LUNs, volumes, etc.). Some current systems also require a layer of software to translate multiple transfer protocols into one proprietary protocol (see, for example, SpinNP from NetApp).
In environments that include virtualized storage, one trend found today is away from traditional storage architectures and structures and towards what are known as “hyper-converged” architectures, in which physical memory devices and server computational resources are all included in a single physical unit. Some of the claimed advantages of this architecture are that it avoids the expense of dedicated storage controllers and that the processing power available for storage processing grows as more such combined units are added to the system. Another claimed advantage is greater control over provisioning in an environment with virtual servers, as well as enabling a management console that integrates information from multiple components into a unified display (sometimes referred to as “single pane of glass” management). Examples of hyper-converged storage products include “Virtual SAN” by VMware, Inc., “OmniCube” by SimpliVity Corp., “Atomic Unit” by Nimboxx, Inc., and the “Virtual Compute Platform” by Nutanix, Inc.
Hyper-convergence has disadvantages as well, however. First, consider the case in which different host platforms are expected to access a common storage resource. If one host depends on another host to access its data, then performance will often depend on how busy the VMs on the other host are. Because of this “noisy neighbor problem”, one particularly busy VM can degrade the performance of other VMs on the same host, and, if storage is pooled, the noisiness may extend beyond the boundaries of that host to the whole pool. If the data needed by a given VM reside on another host, that VM may have to wait because it is being slowed down by a different VM on the hyper-converged host that includes the needed storage device. In short, the performance of a given VM may be constrained by other VMs on other hosts.
Another disadvantage of hyper-converged systems is that it is difficult to scale storage and compute resources independently. If the environment needs more storage capacity, it may be necessary to add a whole host with its included memory devices to the system even though the host's compute resources are not needed. Conversely, if more computing resources are needed, they will come with additional storage capacity whether or not it is needed.
What is needed, therefore, is a distributed storage system that provides both flexibility and scalability, that leverages the computational power of each server for storage processing and the high performance of local flash memory devices, and that also provides data sharing among all the servers in a group from an independently scalable pool of storage, while minimizing inter-server coordination and communication so as to mitigate the noisy-neighbor problem.