The present invention relates generally to the field of logical block addressing (LBA), and more particularly to recovery from LBA errors.
Logical block addressing (LBA) is used to define respective locations of blocks of data stored on computer storage devices (for example, secondary storage systems such as hard disk drives or solid-state drives (SSDs)). LBA functionality is typically implemented as a simple linear addressing scheme. In LBA, blocks of data are located by an index value in the form of an integer, with the first block being designated as LBA 0, the second block being designated as LBA 1, and so on. Various LBA standards are characterized by various numbers of bits, such as 22-bit LBA, 28-bit LBA, 32-bit LBA, 48-bit LBA and 64-bit LBA. The number of bits refers to the maximum size of entries of the data structures holding the logical block addresses. The larger the number of bits, the larger the number of logical blocks that can be given unique addresses by the LBA system. Data storage devices (that is, non-volatile storage devices) typically use data protection layers (DPL) to allow for error detection and/or recovery of corrupted data stored in the storage device. To do so, DPLs add redundant data. As the term is used herein, “redundant data” refers to any type of additional data stored for the purposes of error detection and/or recovery. Depending on the DPL, the actual number of errors may differ from the number of errors that can be detected or corrected, and some DPLs may only detect errors but not correct them. Some currently conventional types of DPLs include: parity data, error correction codes, cyclic redundancy check (CRC) data, erasure coding, replication and the like.
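By way of illustration only, the two ideas above (the number of LBA bits bounding the number of uniquely addressable blocks, and redundant data that detects but does not correct an error) can be sketched as follows. The function names, the 512-byte block size, and the choice of CRC-32 from Python's standard `zlib` module are assumptions made for this sketch and are not drawn from any particular LBA standard or DPL.

```python
import zlib

def addressable_blocks(lba_bits):
    """Number of distinct block indexes an n-bit LBA can express (LBA 0 .. 2**n - 1)."""
    return 1 << lba_bits

def max_capacity_bytes(lba_bits, block_size=512):
    """Upper bound on addressable capacity; block_size=512 is an assumed value."""
    return addressable_blocks(lba_bits) * block_size

def protect(block):
    """Attach redundant data (here a CRC-32 checksum) to a block of bytes."""
    return block, zlib.crc32(block)

def is_intact(block, stored_crc):
    """Detection only: reports whether the block still matches its checksum."""
    return zlib.crc32(block) == stored_crc

# Hypothetical usage: a single flipped bit is detected, but the checksum
# alone carries no information that would allow the block to be repaired.
data, crc = protect(b"logical block 0")
corrupted = bytes([data[0] ^ 0x01]) + data[1:]
print(addressable_blocks(28), max_capacity_bytes(28))
print(is_intact(data, crc), is_intact(corrupted, crc))
```

In this sketch the checksum plays the role of a detect-only DPL; a DPL that also corrects errors needs additional redundancy, such as parity across several blocks or a replica.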
Modern all-flash arrays typically use forms of LBA that are implemented with several data protection layers (DPL). Each DPL typically performs error detection and correction to protect against a large variety of media-related failures and system-level failures. The error detection and/or correction techniques at a given DPL may involve the use of redundant data (for example, parity data, mirrored data) and/or striping. Typically, on the lowest DPL of an all-flash array, error correction codes (ECC) are used to detect and correct flash media errors within each codeword in a physical flash page. The next higher DPL uses parity information inside a flashcard (or SSD) to protect against chip, channel, and plane failures. Parity information is also typically used on top of flashcards (or SSDs) to protect against the failure of one or more flashcards in the array. The parity schemes used at the flashcard and array level are typically RAID-like parity schemes, for example, RAID-5/6 or similar (note: RAID is short for Redundant Array of Independent Disks). Storage systems may further implement DPLs that use other data protection algorithms such as replication or erasure coding or combinations thereof. Different DPLs, with their respective types of redundant data schemes, typically operate independently of each other. Herein, the error correction performed by the DPL is called DPL-CLR (short for data protection layer corrupt LBA recovery).
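As a non-limiting illustration of the RAID-like parity schemes mentioned above, the following minimal sketch shows single-parity (RAID-5 style) protection, in which a parity block is the bytewise XOR of the data blocks in a stripe and any one lost member can be rebuilt from the survivors. The function names and the in-memory stripe layout are hypothetical, and the sketch does not model parity rotation, RAID-6 dual parity, or the ECC layer beneath it.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks in a stripe must have the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def build_stripe(data_blocks):
    """Return the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover_member(stripe, lost_index):
    """Rebuild the member at lost_index from all other members of the stripe."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

# Hypothetical usage: three data blocks, one parity block, one member lost.
stripe = build_stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
print(stripe[3].hex())                    # parity block
print(recover_member(stripe, 1).hex())    # rebuilt second data block
```

A single parity block tolerates the loss of exactly one stripe member; when two or more members are lost, this layer cannot recover the data on its own and a different DPL would have to supply it.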
U.S. Pat. No. 9,569,306 states as follows: “A data storage system includes a controller and a non-volatile memory array having a plurality of blocks each including a plurality of physical pages. The controller maintains a logical-to-physical translation (LPT) data structure that maps logical addresses to physical addresses and implements a first data protection scheme that stripes write data over the plurality of physical blocks. In response to a read request requesting data from a target page stripe, the controller detecting errors in multiple physical pages of the target page stripe. In responsive to detecting errors in multiple physical pages of the target page stripe, the controller scans the LPT data structure to identify a set of logical addresses mapped to the target page stripe and triggers recovery of the target page stripe by a higher level controller that implements a second data protection scheme, wherein triggering recovery includes transmitting the set of logical addresses to the higher level controller.”
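The scan of the LPT data structure described in the quoted passage can be pictured, purely as a sketch, with the hypothetical model below. It assumes that the LPT is a dictionary from logical address to physical page number and that a page stripe is a run of consecutive physical pages; neither assumption, nor any name used here, is taken from the cited patent.

```python
PAGES_PER_STRIPE = 4  # assumed stripe width, for illustration only

def stripe_of(physical_page):
    """Hypothetical mapping from a physical page number to its page stripe."""
    return physical_page // PAGES_PER_STRIPE

def logical_addresses_in_stripe(lpt, target_stripe):
    """Scan the LPT and collect every logical address mapped into target_stripe."""
    return sorted(
        lba for lba, physical_page in lpt.items()
        if stripe_of(physical_page) == target_stripe
    )

# Hypothetical usage: the resulting set is what would be handed to a
# higher-level controller so that it can recover those logical addresses.
lpt = {100: 0, 101: 5, 102: 6, 103: 9, 104: 4}
print(logical_addresses_in_stripe(lpt, 1))
```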