Entities such as companies gather, store, and analyze an increasing amount of data. Clusters of computing devices are used to facilitate efficient, cost-effective storage of large amounts of data. For example, a cluster network environment of computing devices (nodes) may be implemented as a data storage system to facilitate the creation, storage, retrieval, and/or processing of digital data. Such a data storage system may be implemented using various storage architectures, such as a network-attached storage (NAS) environment, a storage area network (SAN), a direct-attached storage environment, and combinations thereof. The data storage systems may comprise one or more data storage devices configured to store digital data within data volumes.
The data can be organized as large data objects. Because of their size, large data objects are sometimes divided into multiple data segments that are stored on separate data storage nodes. Each data segment is further divided into multiple data fragments, which are stored in data storage devices of a data storage node. As a result, a single data storage node can store millions, or even billions, of data fragments belonging to different data objects.
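The object-to-segment-to-fragment hierarchy described above can be sketched as follows. This is a minimal illustration under stated assumptions: the segment and fragment sizes, the round-robin placement of segments on nodes and of fragments on devices, and every name (`place_object`, `Fragment`, `SEGMENT_SIZE`, `FRAGMENT_SIZE`) are hypothetical and are not taken from the text.

```python
from dataclasses import dataclass

# Hypothetical sizes, chosen only to keep the example small; the text specifies none.
SEGMENT_SIZE = 8   # bytes per data segment (assumption)
FRAGMENT_SIZE = 3  # bytes per data fragment (assumption)


@dataclass
class Fragment:
    object_id: str
    segment_index: int
    fragment_index: int
    device: int      # data storage device within the node (assumed round-robin placement)
    payload: bytes


def split(data: bytes, size: int) -> list[bytes]:
    """Cut data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def place_object(object_id: str, data: bytes, num_nodes: int,
                 devices_per_node: int) -> dict[int, list[Fragment]]:
    """Split an object into segments, assign each segment to a node, and split
    each segment into fragments spread over that node's devices.
    Round-robin assignment is an illustrative assumption, not the source's scheme."""
    nodes: dict[int, list[Fragment]] = {n: [] for n in range(num_nodes)}
    for seg_idx, segment in enumerate(split(data, SEGMENT_SIZE)):
        node = seg_idx % num_nodes  # each segment lands on one node
        for frag_idx, chunk in enumerate(split(segment, FRAGMENT_SIZE)):
            device = frag_idx % devices_per_node  # each fragment lands on one device
            nodes[node].append(Fragment(object_id, seg_idx, frag_idx, device, chunk))
    return nodes


nodes = place_object("obj-1", b"abcdefghijklmnopqrstuvwxyz", num_nodes=2, devices_per_node=2)
print({n: len(frags) for n, frags in nodes.items()})  # {0: 6, 1: 4}
```

Even this 26-byte object yields ten fragments across two nodes. With realistic object counts, each node ends up tracking a very large number of fragment records, which is the situation the next paragraph addresses.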
Typically, the data storage node maintains a database that organizes and stores the metadata of its data fragments. When the data storage node receives a request to access the data fragments of a particular data object, it scans the database to identify the data fragments of that object and to retrieve their file system locations. The data storage node then reads the contents of the identified data fragments from those file system locations. However, efficiently identifying the relevant data fragments is often a challenge when the database stores metadata for millions of data fragments.
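The scan-then-read access path described above can be sketched as follows. This is a minimal sketch, assuming an in-memory list of records stands in for the node's metadata database. The names `FragmentMetadata`, `find_fragments`, and `read_object` are hypothetical. The sketch counts how many records it examines to show that a lookup for one object touches every record in the database.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FragmentMetadata:
    object_id: str
    fragment_index: int
    path: str  # file system location of the fragment's contents


def find_fragments(db: list[FragmentMetadata],
                   object_id: str) -> tuple[list[FragmentMetadata], int]:
    """Full scan: examine every metadata record to find one object's fragments.
    Returns the matches (in fragment order) and the number of records examined."""
    matches: list[FragmentMetadata] = []
    examined = 0
    for record in db:  # cost grows linearly with the total number of stored fragments
        examined += 1
        if record.object_id == object_id:
            matches.append(record)
    matches.sort(key=lambda r: r.fragment_index)
    return matches, examined


def read_object(db: list[FragmentMetadata], object_id: str) -> bytes:
    """Identify the object's fragments, then read each one from its file system location."""
    matches, _ = find_fragments(db, object_id)
    parts = []
    for record in matches:
        with open(record.path, "rb") as f:
            parts.append(f.read())
    return b"".join(parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        db: list[FragmentMetadata] = []
        # Fragments of two objects, interleaved as they might be in a real database.
        for object_id, index, chunk in [("obj-a", 0, b"he"), ("obj-b", 0, b"wor"),
                                        ("obj-a", 1, b"llo"), ("obj-b", 1, b"ld")]:
            path = os.path.join(root, f"{object_id}-{index}")
            with open(path, "wb") as f:
                f.write(chunk)
            db.append(FragmentMetadata(object_id, index, path))
        print(read_object(db, "obj-a"))  # b'hello'
```

Here, `find_fragments` examines all four records to return the two that belong to `obj-a`. With millions of records per node, that per-request full scan is the cost the paragraph identifies.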