In a deduplicated file system, such as the Data Domain™ file system from EMC® Corporation, two components are responsible for managing the files in the system. The first is the directory manager (DM), which is a hierarchical mapping from a path to the inode representing a file. The second is the content store (CS), which manages the content of the file. Each file has a content handle (CH) that is stored in its inode; the CS creates a new CH every time the file content changes. Each CH represents a file that is abstracted as a Merkle tree of segments. A file tree can have multiple levels, for example up to seven levels: L0, . . . , L6. The L0 segments represent user data and are the leaves of the tree. The L6 segment is the root of the segment tree. Segments from L1 to L6 are referred to as metadata segments or Lp segments. They represent the metadata of the file associated with the file tree. An L1 segment is an array of L0 references. Similarly, an L2 segment is an array of L1 references, and so on.
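By way of illustration only, the segment tree described above can be sketched as follows. The sketch assumes SHA-256 fingerprints, a small fixed fan-out, and a tree whose height grows with the file size rather than a fixed seven levels. The names `fingerprint` and `build_segment_tree` are hypothetical and are not taken from any actual file system implementation.

```python
import hashlib


def fingerprint(payload: bytes) -> str:
    """Fingerprint of a segment (assumption: SHA-256, hex encoded)."""
    return hashlib.sha256(payload).hexdigest()


def build_segment_tree(chunks, fanout=2):
    """Build a Merkle tree of segments over the given L0 chunks.

    Returns (content_handle, levels). levels[0] maps fingerprint -> L0 data;
    levels[i] for i >= 1 maps fingerprint -> tuple of references into
    level i - 1. Identical segments collapse into one entry, which is the
    deduplication effect.
    """
    if not chunks:
        raise ValueError("a file needs at least one L0 segment")
    refs = [fingerprint(c) for c in chunks]
    levels = [dict(zip(refs, chunks))]
    while True:
        level, parents = {}, []
        for i in range(0, len(refs), fanout):
            segment = tuple(refs[i:i + fanout])  # array of child references
            fp = fingerprint("".join(segment).encode())
            level[fp] = segment
            parents.append(fp)
        levels.append(level)
        refs = parents
        if len(refs) == 1:
            # A single segment remains: it is the root, and its
            # fingerprint plays the role of the content handle here.
            return refs[0], levels
```

Because every metadata segment is identified by a hash over its child references, changing any L0 chunk changes every fingerprint on the path to the root, and therefore the content handle.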
A segment is considered live if it can be referenced by any live content in the file system. The file system packs the segments into containers, which are written to disk in a log-structured manner. Each container is structured into sections. The first section is the metadata section, and the following sections are referred to as compression regions (CRs). A CR is a set of compressed segments. The metadata section holds all the references, or fingerprints, that identify the segments in the container. A field called content type is also stored therein, which describes the content of the container. For instance, it describes which compression algorithm has been used, which types of segments the container holds (L0, . . . , L6), etc. A container manager is responsible for maintaining the log-structured container set and for providing a mapping from container identifiers (CIDs) to block offsets on disk. This mapping is stored entirely in memory. It also contains additional information, e.g., the content type of each container. Hence, it is easy to traverse the container manager metadata and, based on content type, select which containers to load from disk. For instance, processing logic can traverse the entire container set and read only the containers that have L6 segments in them.
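The content-type filtering described above can be sketched as follows. This is a minimal illustration under stated assumptions: the class and field names (`ContainerManager`, `ContainerEntry`, `register`, `containers_with_level`) and the compression label are hypothetical, and the content type is reduced to a set of segment levels plus a compression name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerEntry:
    """In-memory metadata kept per container (hypothetical layout)."""
    block_offset: int      # where the container starts on disk
    levels: frozenset      # segment levels present, e.g. frozenset({6})
    compression: str       # compression algorithm named by the content type


class ContainerManager:
    """Maps container identifiers (CIDs) to their in-memory metadata."""

    def __init__(self):
        self._index = {}

    def register(self, cid, block_offset, levels, compression="lz"):
        self._index[cid] = ContainerEntry(
            block_offset, frozenset(levels), compression
        )

    def containers_with_level(self, level):
        """Yield (cid, block_offset) in CID order for matching containers.

        Only the in-memory index is consulted, so containers that do not
        hold segments of the requested level are never read from disk.
        """
        for cid in sorted(self._index):
            entry = self._index[cid]
            if level in entry.levels:
                yield cid, entry.block_offset
```

A caller that needs only L6 segments would iterate over `containers_with_level(6)` and issue reads for the returned block offsets alone.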
A cleaning process (also referred to as a garbage collection process) of the file system is responsible for enumerating all live segments in the live content handles of the file system. In a conventional logical enumeration algorithm, which is a depth-first traversal of all the file trees, each file tree is traversed in its entirety within a single context. Therefore, it is possible to roll a checksum from the L0 segments toward the root of the tree and to validate the checksum every time a file tree is traversed. However, with physical garbage collection, the enumeration algorithm has been changed to carry out a breadth-first traversal of all the files in the file system. Hence, the notion of a file tree no longer exists during enumeration, since the algorithm performs a level-by-level scan of all the trees simultaneously. Therefore, the best one can do to harden the traversal algorithm against bugs is to roll a per-level checksum and match the checksums at the end.
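One possible reading of the per-level checksum is sketched below; the exact scheme is not specified above, so every detail here is an assumption. For each level, the sketch rolls one checksum over the references that point into that level and another over the live segments actually read during the scan of that level, then compares the two at the end. The XOR checksum is chosen only because it is order-independent, which suits a scan in on-disk order; it is a weak checksum. The names `enumerate_live` and `xor_checksum`, and the layout of `store`, are hypothetical.

```python
def xor_checksum(fingerprints):
    """Order-independent checksum over hex fingerprints (assumption: XOR)."""
    total = 0
    for fp in fingerprints:
        total ^= int(fp, 16)
    return total


def enumerate_live(store, roots, top_level):
    """Breadth-first, level-by-level enumeration of live segments.

    store[level] maps fingerprint -> tuple of child references for every
    metadata level (1..top_level). Returns (live, mismatched), where live
    maps level -> set of live fingerprints and mismatched lists the levels
    whose expected and observed checksums differ.
    """
    live = {top_level: set(roots)}
    expected = {top_level: xor_checksum(live[top_level])}
    observed = {}
    for level in range(top_level, 0, -1):
        observed[level] = 0
        children = set()
        # Physical scan: visit every stored segment of this level and keep
        # the ones that were marked live by the level above.
        for fp, refs in store.get(level, {}).items():
            if fp in live[level]:
                observed[level] ^= int(fp, 16)
                children.update(refs)
        live[level - 1] = children
        expected[level - 1] = xor_checksum(children)
    mismatched = [lvl for lvl in observed if observed[lvl] != expected[lvl]]
    return live, mismatched
```

A non-empty `mismatched` list indicates that some referenced metadata segment was not encountered during the scan of that level, although, unlike a per-file-tree checksum, it does not identify which file is affected.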
A physical garbage collector does not understand the concept of file trees. It traverses all the files simultaneously using a breadth-first approach. Hence, it cannot roll a per-file-tree checksum that would allow the garbage collector to identify whether any metadata segment has been missed, as one would do with the old algorithm based on a depth-first traversal of each individual tree. This is a critical problem because the cleaning process implemented through the physical garbage collector could aggravate a corruption state that the file system is already in. Hence, it is strategically important to harden the physical garbage collector so that it is resilient to undetected hardware or software bugs that may lead to corruption. Prior to performing a physical garbage collection, the data integrity of the segments must be verified to avoid any data corruption. There has been a lack of an efficient mechanism for verifying data integrity in such a scenario.