Storage devices, such as solid state drives (SSDs) and hard disk drives (HDDs), have generally managed the logical-to-physical mapping for data written to the media devices within the drive, such as dies of flash memory or spinning magnetic disks. This internal mapping enables each storage device to manage defects, wear levelling, garbage collection, allocation of spare storage medium capacity, and other functions needed to meet its capacity and performance targets.
Data storage system configurations are being developed that expose the physical structure of the storage medium, enabling host systems to manage input/output (I/O) to physical locations. For example, Open-Channel SSD defines a class of SSDs that expose the internal parallelism of the SSD to the host and allow the host to manage I/O through physical page addressing (PPA). This open architecture may enable the host to divide the capacity of the SSD into logical units that map to the physical units of the media devices. Control of I/O at the PPA level may also allow the host to aggressively manage latency by controlling when and where reads and writes are scheduled and placed within the SSD. Workload optimizations may be implemented within a custom flash translation layer (FTL), within a file system, or within host applications.
Some target storage devices, such as Open-Channel SSDs, may not implement recovery mechanisms that prevent data loss during I/O operations, such as redundant array of independent disks (RAID)-like configurations across media devices, write caching to non-volatile memory, and similar approaches. Uncorrectable errors, that is, errors that cannot be recovered through error correction codes (UECCs), may be relatively common with some storage media.
Host systems and applications may compensate for this risk of data loss by implementing RAID configurations across dies. These RAID group writes may be managed at the host level in applications capable of physically addressing I/O to the media devices of a storage device, such as Open-Channel SSD, storage network interface cards (NICs), and RAID host bus adapters (HBAs).
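As a minimal sketch of host-managed RAID across dies, assuming a RAID-5-like scheme with a single XOR parity page per stripe (the text does not specify a RAID level), the host writes each data page to a different die plus one parity page to another die, and can rebuild any single page lost to an uncorrectable error from the surviving pages. The function names `xor_pages`, `build_stripe`, and `rebuild_page` are hypothetical and illustrative only.

```python
from functools import reduce


def xor_pages(pages: list[bytes]) -> bytes:
    """XOR equal-length pages together, byte by byte."""
    if not pages:
        raise ValueError("need at least one page")
    length = len(pages[0])
    if any(len(p) != length for p in pages):
        raise ValueError("all pages must be the same length")
    # zip(*pages) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def build_stripe(data_pages: list[bytes]) -> list[bytes]:
    """Return the data pages followed by one parity page; one entry per die."""
    return list(data_pages) + [xor_pages(list(data_pages))]


def rebuild_page(stripe: list[bytes], lost_index: int) -> bytes:
    """Recover the page at lost_index by XOR-ing every other page in the stripe."""
    survivors = [p for i, p in enumerate(stripe) if i != lost_index]
    return xor_pages(survivors)


if __name__ == "__main__":
    stripe = build_stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
    # Simulate an uncorrectable error on die 1 and rebuild its page.
    print(rebuild_page(stripe, 1) == b"\x10\x20")
```

Single XOR parity tolerates the loss of one page per stripe; schemes with more parity pages tolerate more failures at the cost of extra writes.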
In some configurations, maintaining atomicity across the write group may be a problem, for example, when the storage device supports cached writes but does not support power fail protection. For performance reasons, the storage device may acknowledge writes once they reach a volatile cache, before they are flushed to the non-volatile media devices, such as flash. A power failure or similar event after writes have been acknowledged, but before the data is written to storage locations in the flash memory, introduces potential write holes in a write group, such as a RAID stripe, that the host assumes is consistent. Absent a consistency check or similar data scan, it may be difficult for the host to determine which write groups were or were not completed before the power failure. Such consistency checks may be prohibitively long and costly in terms of the availability of the storage device and its data for I/O operations.
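The write-hole scenario can be illustrated with a minimal simulation, assuming a stripe of three data pages and one XOR parity page spread over four dies, each modeled as a volatile write cache in front of non-volatile media with no power fail protection. `CachedDie`, `simulate`, and the zero-filled reads for never-written locations are hypothetical modeling choices, not a real device interface.

```python
from dataclasses import dataclass, field


@dataclass
class CachedDie:
    """Hypothetical model of one media device behind a volatile write cache."""
    media: dict = field(default_factory=dict)  # address -> bytes (non-volatile)
    cache: dict = field(default_factory=dict)  # address -> bytes (volatile)

    def write(self, addr: int, data: bytes) -> bool:
        # The write is acknowledged as soon as it lands in the volatile cache.
        self.cache[addr] = data
        return True

    def flush(self) -> None:
        # Move cached writes to non-volatile media.
        self.media.update(self.cache)
        self.cache.clear()

    def power_fail(self) -> None:
        # No power fail protection: unflushed (but acknowledged) writes are lost.
        self.cache.clear()

    def read(self, addr: int, size: int) -> bytes:
        # Assumption: never-written locations read back as zeroes.
        return self.cache.get(addr, self.media.get(addr, bytes(size)))


def xor(pages: list[bytes]) -> bytes:
    out = bytearray(len(pages[0]))
    for page in pages:
        for i, b in enumerate(page):
            out[i] ^= b
    return bytes(out)


def stripe_consistent(dies: list[CachedDie], addr: int, size: int) -> bool:
    # A single-parity stripe is consistent when data XOR parity is all zeroes.
    return xor([d.read(addr, size) for d in dies]) == bytes(size)


def simulate(flushed_dies: set[int]) -> tuple[bool, bool]:
    """Write one stripe, flush only some dies, then lose power."""
    dies = [CachedDie() for _ in range(4)]
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    acks = [die.write(0, page) for die, page in zip(dies, data + [xor(data)])]
    for i in flushed_dies:
        dies[i].flush()
    for die in dies:
        die.power_fail()
    return all(acks), stripe_consistent(dies, 0, 2)


if __name__ == "__main__":
    print(simulate({0, 1, 2, 3}))  # every die flushed before the power failure
    print(simulate({0, 3}))        # dies 1 and 2 lose their acknowledged writes
```

In the second case every write was acknowledged, yet the stripe's parity no longer matches its data. Detecting this from the host side requires reading every page of every stripe, which is why a full consistency scan after a power failure can be so costly.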
Therefore, a need remains for storage devices that enable group writes to physical storage locations while protecting against data corruption from lost writes to the write group.