Multi-device storage systems utilize multiple discrete storage devices, generally disk drives (solid-state drives, hard disk drives, hybrid drives, tape drives, etc.) for storing large quantities of data. These multi-device storage systems are generally arranged in an array of drives interconnected by a common communication fabric and, in many cases, controlled by a storage controller, redundant array of independent disks (RAID) controller, or general controller, for coordinating storage and system activities across the array of drives. The data stored in the array may be stored according to a defined RAID level, a combination of RAID schemas, or other configurations for providing desired data redundancy, performance, and capacity utilization. In general, these data storage configurations may involve some combination of redundant copies (mirroring), data striping, and/or parity (calculation and storage), and may incorporate other data management, error correction, and data recovery processes, sometimes specific to the type of disk drives being used (e.g., solid-state drives versus hard disk drives).
There is an emerging trend in the storage industry to deploy disaggregated storage. Disaggregated storage brings significant cost savings by decoupling compute and storage node life cycles and allowing different nodes or subsystems to have different compute-to-storage ratios. In addition, disaggregated storage allows significant flexibility in migrating compute jobs from one physical server to another for availability and load balancing purposes.
Disaggregated storage has been implemented using a number of system architectures, including the passive Just-a-Bunch-of-Disks (JBOD) architecture, the traditional All-Flash Architecture (AFA), and Ethernet Attached Bunch of Flash (EBOF) disaggregated storage, which typically uses specialized chips from Mellanox or Kazan to translate commands from the external NVMe-oF™ (Non-Volatile Memory Express™ over Fabrics) protocol to the internal NVMe (NVM Express™) protocol. These architectures may not make the best use of the I/O bandwidth, processing, and buffer memory of the individual storage devices, such as solid-state drives (SSDs), in such systems. In addition, some of these architectures place significant compute resources in a centralized storage controller, which may lead to challenges in scaling solutions as the number and size of SSDs increase.
Therefore, there still exists a need for disaggregated storage architectures that distribute memory and compute resources across storage devices, such as SSDs, and enable reliable data management services in the face of drive failures and/or system power interruptions.