Each rack in a datacenter contains a combination of sleds and/or trays for compute and storage devices. A hyperscale datacenter may include several performance-optimized datacenters (PODs), and each POD can include multiple racks and hundreds or thousands of compute and/or storage devices.
Non-volatile memory express (NVMe) defines a register-level interface for host software to communicate with a non-volatile memory subsystem (e.g., a solid-state drive (SSD)) over a peripheral component interconnect express (PCIe) bus. NVMe over fabrics (NVMeoF, or NVMf for short) defines a common architecture that supports the NVMe block storage protocol over a wide range of storage networking fabrics such as Ethernet, Fibre Channel, InfiniBand, a transmission control protocol (TCP) network, and other network fabrics.
In an NVMeoF-based storage system, each storage device, also referred to as an NVMeoF-compatible SSD or Ethernet SSD, is identified by its subsystem NVMe Qualified Name (NQN), Internet Protocol (IP) address, port ID, and/or controller ID. Each POD manager needs to know the status and availability of the storage devices on the fabrics (e.g., Ethernet) to allocate and schedule workloads. Some of the storage devices may be down for maintenance, and workloads cannot be assigned to those storage devices. Further, collecting, maintaining, and updating the discovery information from hundreds or thousands of storage devices distributed over the fabrics in a hyperscale datacenter environment is not a trivial task.
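As a minimal sketch of the bookkeeping described above, the following Python example models one discovery record per storage device and filters out devices that are down for maintenance before workloads are assigned. The names `DiscoveryRecord`, `DeviceStatus`, and `schedulable`, as well as the sample NQNs and addresses, are hypothetical and chosen only for illustration; they are not taken from the NVMeoF specification or from any particular POD manager.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceStatus(Enum):
    """Hypothetical availability states tracked by a POD manager."""
    AVAILABLE = "available"
    MAINTENANCE = "maintenance"


@dataclass(frozen=True)
class DiscoveryRecord:
    """Illustrative record holding the identifiers named in the text."""
    nqn: str            # subsystem NVMe Qualified Name
    ip_address: str     # IP address of the device on the fabric
    port_id: int        # port ID
    controller_id: int  # controller ID
    status: DeviceStatus


def schedulable(records):
    """Return only the devices to which workloads may be assigned."""
    return [r for r in records if r.status is DeviceStatus.AVAILABLE]


# Sample data (assumed values, using documentation-reserved IP addresses).
records = [
    DiscoveryRecord("nqn.2014-08.com.example:essd-01", "192.0.2.11", 1, 1,
                    DeviceStatus.AVAILABLE),
    DiscoveryRecord("nqn.2014-08.com.example:essd-02", "192.0.2.12", 1, 1,
                    DeviceStatus.MAINTENANCE),
    DiscoveryRecord("nqn.2014-08.com.example:essd-03", "192.0.2.13", 1, 1,
                    DeviceStatus.AVAILABLE),
]

available_nqns = [r.nqn for r in schedulable(records)]
```

Even in this toy form, the difficulty noted above is visible: the list is only useful if every record's status is kept current, which becomes harder as the number of devices grows into the hundreds or thousands.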