The present application relates generally to an improved data processing apparatus and method and, more specifically, to enhanced fsck mechanisms for improved consistency in an erasure coded object storage architecture built using a clustered file system.
The system utility fsck (for “file system consistency check”) is a tool for checking the consistency of a file system in UNIX™ and Unix-like operating systems, such as LINUX™ and OS X®. A similar command, CHKDSK, exists in Microsoft® Windows®. Generally, fsck is run either automatically at boot time or manually by the system administrator. The command works directly on data structures stored on disk, which are internal and specific to the particular file system in use, so an fsck command tailored to that file system is generally required. The exact behaviors of various fsck implementations vary, but they typically follow a common order of internal operations and provide a common command-line interface to the user.
Most fsck utilities provide options for either interactively repairing damaged file systems (the user must decide how to fix specific problems), automatically deciding how to fix specific problems (so the user does not have to answer any questions), or reviewing the problems that need to be resolved on a file system without actually fixing them. Partially recovered files where the original file name cannot be reconstructed are typically recovered to a “lost+found” directory that is stored at the root of the file system. The file system is normally checked while unmounted, mounted read-only, or with the system in a special maintenance mode.
Erasure coding (EC) is a method of data protection in which data is broken into fragments, expanded and encoded with redundant data pieces and stored across a set of different locations or storage media. The goal of erasure coding is to enable data that becomes corrupted at some point in the disk storage process to be reconstructed by using information about the data that's stored elsewhere in the array. Erasure codes are often used instead of traditional redundant array of independent disks (RAID) because of their ability to reduce the time and overhead required to reconstruct data. The drawback of erasure coding is that it can be more processor-intensive, and that can translate into increased latency. Erasure coding can be useful with large quantities of data and any applications or systems that need to tolerate failures, such as disk array systems, data grids, distributed storage applications, object stores and archival storage. One common current use case for erasure coding is object-based cloud storage.
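The fragment, encode, and reconstruct cycle described above can be illustrated with the simplest possible erasure code, a single XOR parity fragment. This is a minimal sketch for illustration only, not the scheme of any particular storage system: the function names, the zero-padding of the last fragment, and the use of None to mark a lost fragment are all assumptions made here.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k equal-length fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\x00")  # zero-pad the tail (assumption)
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover_missing(stored: list) -> list:
    """Rebuild at most one lost fragment (marked None) by XOR-ing the survivors."""
    missing = [i for i, frag in enumerate(stored) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    repaired = list(stored)
    if missing:
        survivors = [frag for frag in stored if frag is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


# Hypothetical payload spread over 4 data fragments plus 1 parity fragment.
original = b"object payload stored across nodes"
stored = encode(original, k=4)

# Simulate the loss of one storage location, then rebuild it.
damaged = list(stored)
damaged[2] = None
repaired = recover_missing(damaged)
restored = b"".join(repaired[:4])[:len(original)]
```

A single parity fragment tolerates the loss of only one location; codes that tolerate several simultaneous losses need more redundant fragments and more arithmetic per fragment, which is where the processor cost mentioned above comes from.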
Erasure coding creates a mathematical function to describe a set of numbers so they can be checked for accuracy and recovered if one is lost. Referred to as polynomial interpolation or oversampling, this is the key concept behind erasure codes. In mathematical terms, the protection offered by erasure coding can be represented in simple form by the following equation: n=k+m. The variable “k” is the original amount of data or symbols. The variable “m” stands for the extra or redundant symbols that are added to provide protection from failures. The variable “n” is the total number of symbols created after the erasure coding process. For instance, in a 10 of 16 configuration, or EC 10/16, six extra symbols (m) would be added to the 10 base symbols (k). The 16 fragments (n) would be spread across 16 drives, nodes or geographic locations. The original file could be reconstructed from any 10 verified fragments, so the loss of up to six fragments can be tolerated.
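The relation n=k+m and the polynomial interpolation idea can be made concrete with a small sketch. It assumes symbols are integers modulo the prime 257 and that the k data symbols are the values of a polynomial at positions 0 to k-1, with the m redundant symbols being its values at positions k to n-1. The modulus and the names interpolate_at, encode and reconstruct are illustrative choices made here; practical erasure codes typically use a different finite field and far more efficient arithmetic.

```python
P = 257  # prime modulus (assumption); every symbol is an integer in [0, P)


def interpolate_at(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (position, symbol) points, modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Return n symbols: the k data symbols followed by m = n - k redundant ones."""
    k = len(data)
    points = list(enumerate(data))
    return list(data) + [interpolate_at(points, x) for x in range(k, n)]


def reconstruct(survivors, k):
    """Recover the k data symbols from any k surviving (position -> symbol) entries."""
    if len(survivors) < k:
        raise ValueError("need at least k surviving symbols")
    points = list(survivors.items())[:k]
    return [interpolate_at(points, x) for x in range(k)]


# EC 10/16 as in the text: k = 10 data symbols, m = 6 redundant, n = 16 total.
k, n = 10, 16
data = list(b"erasure-10")  # ten byte values used as symbols
fragments = encode(data, n)

# Lose any six of the sixteen fragments, then rebuild from the ten that remain.
lost = {0, 3, 4, 7, 9, 12}
survivors = {pos: sym for pos, sym in enumerate(fragments) if pos not in lost}
recovered = reconstruct(survivors, k)

storage_overhead = n / k  # 16 symbols stored per 10 symbols of data
```

Because a polynomial of degree less than k is fixed by any k of its values, which ten fragments survive does not matter. One limitation of this toy field is that a redundant symbol may take the value 256, which does not fit in a single byte.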