The present invention relates to data storage systems, and more specifically, this invention relates to leveraging distributed metadata to achieve file-specific data scrubbing.
The continued increase of resilience for filesystems is an issue which is addressed on an ongoing basis. Most filesystems initially start as a superblock which is located at an arbitrary address in a logical unit number (LUN), volume, logical disk, etc. From the superblock, a structure is built which ultimately becomes the filesystem. Files (e.g., unique groupings of data) stored on the filesystem are typically separated into a number of blocks which are spread across the persistent storage space of memory in order to achieve even distribution, e.g., for performance reasons. Thus, in order to describe each of these blocks and keep track of where they are located in the persistent storage space, a structure called a central inode list is used. This inode list identifies where the logical block addresses (LBAs) that constitute the various portions (e.g., blocks) of a given file are located in the memory space, and how these correlate to the corresponding locations in the persistent storage space. Accordingly, each file has an inode which is stored in one or more arrays depending on the architecture.
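By way of illustration only, the relationship between a file, its inode, and the LBAs of its blocks may be sketched as follows. The sketch is a simplified assumption and does not represent the layout of any particular filesystem; the names `Inode`, `inode_list`, `lba_for_offset`, the block size, and the example LBA values are all hypothetical.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # hypothetical block size in bytes


@dataclass
class Inode:
    """Hypothetical inode: one file and the ordered LBAs holding its blocks."""
    file_name: str
    lbas: list[int] = field(default_factory=list)


def lba_for_offset(inode: Inode, byte_offset: int) -> int:
    """Translate a byte offset within a file to the LBA storing that byte."""
    block_index = byte_offset // BLOCK_SIZE
    return inode.lbas[block_index]


# Hypothetical central inode list keyed by inode number. The blocks of each
# file are deliberately non-contiguous, reflecting even distribution.
inode_list = {
    1: Inode("report.txt", [120, 87, 342]),
    2: Inode("image.bin", [15, 601]),
}
```

In this sketch the lookup runs from file to LBA: given an inode and a displacement within the file, the corresponding LBA is found by indexing into the inode's block list.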
However, conventional products experience issues which revolve around the decay of data storage stability over time. For instance, a majority of the data written in large conventional storage systems can experience long periods of idleness, broken up by occasional data accesses. In such conventional systems, memory components (e.g., data storage disks) may fail, and as a result, the data stored therein can be lost and/or corrupted without any forewarning. These conventional systems can lose data for a variety of reasons, including failures at the device level and/or the block level.
In an attempt to overcome these issues, efforts have been made to detect precursors of these issues in the early stages of development such that built-in redundancy can be used to protect the data stored in memory. One technique which has been used is called “disk scrubbing” in which system hard disks are periodically accessed to detect drive failure. By scrubbing the data stored on the hard disks, block failures can be detected and compensated for by rebuilding the affected blocks.
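By way of illustration only, a block-level scrubbing pass of the kind described above may be sketched as follows. The sketch assumes that a checksum is stored for each block and that a mirrored copy serves as the built-in redundancy; both are assumptions made for the example, and the names `scrub`, `primary`, `mirror`, and `checksums` are hypothetical.

```python
import zlib


def scrub(primary: dict, mirror: dict, checksums: dict) -> list:
    """Verify every block in `primary` against its stored checksum.

    Blocks that fail verification are rebuilt from `mirror` when the
    mirrored copy itself verifies. Returns the LBAs that failed.
    """
    bad_lbas = []
    for lba, data in primary.items():
        if zlib.crc32(data) != checksums[lba]:
            bad_lbas.append(lba)
            replica = mirror.get(lba)
            # Rebuild only from a replica that matches the stored checksum.
            if replica is not None and zlib.crc32(replica) == checksums[lba]:
                primary[lba] = replica
    return bad_lbas


# Hypothetical example: two blocks, one of which is silently corrupted.
primary = {0: b"aaa", 1: b"bbb"}
mirror = dict(primary)
checksums = {lba: zlib.crc32(data) for lba, data in primary.items()}
primary[1] = b"bxb"  # simulated block-level corruption
failed = scrub(primary, mirror, checksums)
```

It should be noted that the result of such a pass is expressed purely in terms of LBAs; nothing in the sketch identifies which file the affected block belongs to.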
However, because conventional data scrubbing operations are performed at the block level on the block-based memory of storage systems, conventional storage devices can at most report the LBA of any data identified as having storage issues, which is also referred to herein as “bad data”. Accordingly, converting the reported LBAs to actual displacements of the files which correspond to this identified bad data has not been possible in conventional products, and certainly not in an efficient manner. This is particularly undesirable, as being unable to make this translation increases processing delay and decreases system efficiency.