As is known in the art, multi-version concurrency control (MVCC) is a technique used by databases and storage systems to provide concurrent access to data. With MVCC, each user (e.g., system processes and processes that handle user traffic) sees a snapshot of the data at a particular instant in time. Any changes made by a user will not be seen by other users until the changes are committed. Among other advantages, MVCC provides non-blocking access to a shared resource (e.g., data).
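By way of illustration only, the snapshot behavior described above may be sketched as follows. This is a minimal, single-process sketch and not the mechanism of any particular storage system. The names `MVCCStore` and `Transaction` are hypothetical, and the sketch assumes that each commit publishes a complete new snapshot and omits conflict detection.

```python
class MVCCStore:
    """Toy multi-version store (hypothetical): every commit adds a new snapshot."""

    def __init__(self):
        # List of committed snapshots; the list index serves as the version number.
        self._versions = [{}]

    def begin(self):
        """Start a transaction pinned to the latest committed snapshot."""
        return Transaction(self, len(self._versions) - 1)


class Transaction:
    def __init__(self, store, version):
        self._store = store
        self._version = version   # snapshot this user sees for its whole lifetime
        self._writes = {}         # uncommitted changes, invisible to other users

    def get(self, key):
        # A user sees its own pending writes first, then its pinned snapshot.
        if key in self._writes:
            return self._writes[key]
        return self._store._versions[self._version].get(key)

    def put(self, key, value):
        self._writes[key] = value

    def commit(self):
        # Publish a new snapshot; earlier snapshots are never modified.
        # Assumption: no write-write conflict detection in this sketch.
        new_snapshot = dict(self._store._versions[-1])
        new_snapshot.update(self._writes)
        self._store._versions.append(new_snapshot)
        return len(self._store._versions) - 1


store = MVCCStore()
writer = store.begin()
reader = store.begin()
writer.put("k", 1)
before_commit = reader.get("k")     # the writer's change is not yet committed
writer.commit()
after_commit = reader.get("k")      # the reader still sees its original snapshot
late_reader = store.begin().get("k")
```

In this sketch the reader is never blocked by the writer, and only a transaction begun after the commit observes the new value.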
Many storage systems use search trees (e.g., B+ trees) to provide efficient access to stored data. Distributed storage systems (or “clusters”) may manage thousands of search trees, each having a very large number (e.g., millions or even billions) of elements. Large search trees are typically stored to disk or another type of non-volatile memory.
To provide MVCC with search trees, a storage system may treat elements of a search tree as immutable. Under MVCC, a search tree may be updated by storing the new/updated data to unused portions of disk, and scheduling a tree update. During a tree update, at least one tree element is updated. In the case of a B+ tree, which includes a root node, internal nodes, and leaves, a tree update requires generating a new leaf to store the data, a new root node, and possibly new internal nodes. These new tree elements may be linked with existing tree elements to form a new search tree. Tree updates result in unused tree elements left on disk and, thus, storage systems typically include a process for detecting and reclaiming unused tree elements (referred to as “garbage collection”).
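By way of illustration only, a tree update over immutable elements may be sketched as follows. For brevity, the sketch substitutes a plain binary search tree for a B+ tree, so it is not the B+ tree update itself. The names `Node`, `insert`, and `lookup` are hypothetical. The point illustrated is that an update creates new elements only along the path from the root to the changed position and links them to the existing, unchanged elements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree element (hypothetical); never modified after creation."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key, value):
    """Return the root of a NEW tree; the tree rooted at `node` is left intact."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        # Copy this element; reuse the untouched right subtree as-is.
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        # Copy this element; reuse the untouched left subtree as-is.
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    # Same key: new element carrying the updated value, same children.
    return Node(key, value, node.left, node.right)


def lookup(node, key):
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


# Version 1 of the tree: root 5 with children 2 and 8.
v1 = insert(insert(insert(None, 5, "a"), 2, "b"), 8, "c")
# Version 2 adds key 9: a new root and a new copy of element 8 are created.
v2 = insert(v1, 9, "d")
```

After the update, `v1` remains readable as the old version, `v2.left` is the same object as `v1.left`, and the old root and the old element with key 8 are no longer referenced by the new tree. Such elements correspond to the unused tree elements that garbage collection would later reclaim.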
In some existing storage systems, storage space may be partitioned into a set of fixed-size blocks (referred to as “storage chunks”), which may store search tree elements. Under MVCC, storage chunks may be appended to, but are otherwise immutable. As a result, garbage collection can only be implemented at the chunk level, and only after it is confirmed that a storage chunk does not contain any referenced (or “live”) tree elements. A storage system may include a massive number (e.g., billions) of storage chunks. Determining which chunks should be considered during garbage collection (referred to herein as “GC scope”) is a complex task, particularly in the context of distributed storage systems.
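By way of illustration only, the chunk-level constraint described above may be sketched as follows. The names `Chunk` and `reclaimable` are hypothetical, and the sketch assumes that the set of live tree elements is already known. It does not address how GC scope is determined, and it is not the garbage collection process of any particular storage system.

```python
class Chunk:
    """Fixed-capacity, append-only storage chunk (hypothetical)."""

    def __init__(self, chunk_id, capacity):
        self.chunk_id = chunk_id
        self.capacity = capacity
        self.elements = []  # identifiers of the tree elements stored in this chunk

    def append(self, element_id):
        """Append an element; return False if the chunk is already full."""
        if len(self.elements) >= self.capacity:
            return False
        self.elements.append(element_id)
        return True


def reclaimable(chunks, live):
    """Return ids of chunks that contain no live element.

    A chunk holding even one live element cannot be reclaimed, because
    reclamation happens per chunk, not per element. Note that an empty
    chunk is also reported as reclaimable in this sketch.
    """
    return [c.chunk_id for c in chunks if not any(e in live for e in c.elements)]


c0 = Chunk(0, capacity=2)
c0.append("root_v1")
c0.append("leaf_a")
c1 = Chunk(1, capacity=2)
c1.append("root_v2")
c1.append("leaf_b")

# leaf_a is still referenced by the new tree, so chunk 0 must be kept
# even though root_v1 is garbage.
kept = reclaimable([c0, c1], live={"root_v2", "leaf_a", "leaf_b"})
# Once leaf_a is no longer referenced, chunk 0 holds no live element.
freed = reclaimable([c0, c1], live={"root_v2", "leaf_b"})
```

The first call shows why a single live element pins an entire chunk; the second shows a chunk becoming reclaimable only after all of its elements are unreferenced.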