Generally, multi-version concurrency control (MVCC) is a technique used by databases and storage systems to provide concurrent access to data. With MVCC, a user of a storage system sees a snapshot of the database at a particular instant in time. One user's changes to the database are not seen by other users until the changes have been completed.
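By way of illustration only, the snapshot behavior described above may be sketched as follows. The `VersionedStore` class and its methods are hypothetical names introduced for this example; the sketch assumes a single writer and omits conflict detection, persistence, and reclamation of old versions.

```python
class VersionedStore:
    """Toy multi-version store: every commit appends a new version."""

    def __init__(self):
        # Committed versions, oldest first. A committed version is never mutated.
        self._versions = [{}]

    def snapshot(self):
        """Return the latest committed version as seen at this instant."""
        return self._versions[-1]

    def begin(self):
        """Start a change on a private working copy of the latest version."""
        return dict(self._versions[-1])

    def commit(self, working):
        """Publish a completed change as a new version."""
        self._versions.append(dict(working))


store = VersionedStore()

setup = store.begin()
setup["balance"] = 100
store.commit(setup)

reader_view = store.snapshot()      # a reader pins the current version

writer = store.begin()
writer["balance"] = 250             # in-progress change, private to the writer
seen_during_write = store.snapshot().get("balance")

store.commit(writer)                # the change is now complete
seen_after_commit = store.snapshot().get("balance")
seen_by_old_reader = reader_view.get("balance")
```

In this sketch, a reader that took its snapshot before the commit continues to see the earlier value, while a snapshot taken after the commit sees the new one.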
Some data storage systems use search trees (e.g., B+ trees) to provide efficient access to stored data. Distributed storage systems (or “clusters”) may manage a large number of search trees, each often having a very large number of elements. The trees are therefore often large, and a substantial portion of each tree is typically stored on hard disk drives or other suitable non-volatile memory. The trees are shared by cluster nodes using MVCC.
To provide MVCC with search trees, a storage system may treat elements of a search tree as immutable. Under MVCC, a search tree may be updated by storing the new or updated data to unused portions of disk and scheduling a tree update. During a tree update, at least one tree element is replaced by a new version rather than modified in place. In the case of a B+ tree, which includes a root node, internal nodes, and leaves, a tree update requires generating a new leaf to store the data, a new root node, and possibly new internal nodes. These new tree elements may be linked with existing tree elements to form a new search tree. Tree updates leave unused tree elements on disk and, thus, storage systems typically include a process for detecting and reclaiming unused tree elements (referred to as “garbage collection”). When data updates are massive, such trees cause severe fragmentation of hard drive space. To address this issue, some modern computer systems use a copying garbage collector. However, current copying garbage collectors are resource-demanding processes.
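By way of illustration only, the tree update described above may be sketched as follows. For brevity, the sketch assumes a simple immutable binary search tree held in memory rather than a B+ tree stored on disk; the `Node` class and the `insert` and `lookup` functions are hypothetical names introduced for this example. An update creates new elements along the path from the root to the changed element and links them with the existing, unchanged elements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree element; fields cannot be reassigned after creation."""
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node, key, value):
    """Return the root of a new tree version without modifying any existing node."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        # New copy of this node; the right subtree is reused as-is.
        return Node(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        # New copy of this node; the left subtree is reused as-is.
        return Node(node.key, node.value, node.left, insert(node.right, key, value))
    # Same key: new node carrying the updated value, children reused.
    return Node(key, value, node.left, node.right)


def lookup(node, key):
    """Find the value stored under key in the tree version rooted at node."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


# Build version 1, then update key 40 to produce version 2.
v1 = None
for k in (50, 30, 70, 20, 40):
    v1 = insert(v1, k, f"v1-{k}")
v2 = insert(v1, 40, "v2-40")
```

In this sketch, version 2 has a new root, a new node for key 30, and a new node for key 40, while the nodes for keys 20 and 70 are shared with version 1. Once no reader refers to version 1, its root and its nodes for keys 30 and 40 are the unused elements that garbage collection would reclaim.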