The use of distributed computing systems, e.g., “cloud computing,” has become increasingly common for consumers and enterprises, especially for data storage. This so-called “cloud data storage” employs large numbers of networked storage servers that are organized as a unified repository for data, and are configured as banks or arrays of magnetic hard disk drives (HDDs) and/or solid-state drives (SSDs). Typically, these servers are arranged in high-density configurations to facilitate such large-scale operation. For example, a single cloud data storage system may include thousands or tens of thousands of storage servers installed in stacked or rack-mounted arrays.
The majority of storage in cloud data storage systems is provided by HDDs, due to the low cost-to-capacity ratio associated with such drives. Object-oriented database management systems using “key-value pairs” have a number of advantages over relational database systems, including a flexible data model that imposes no fixed structure on the data; scalability; simple access application program interfaces (APIs); and the ability for clients to define and change the structure of data at any time without impacting the database. A key-value pair is a set of two linked data items: a key, which is a unique identifier for some set of data, and a value, which is the set of data associated with the key. Distributed computing systems using key-value pairs provide a high-performance alternative to relational database systems, since an obsolete value is not overwritten when a new version of the value is stored. Instead, newly received key-value pairs can be written in a continuous sequential writing process, thereby eliminating the latency associated with seeking to a different location in an HDD for each newly received key-value pair.
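The sequential writing process described above can be illustrated with a minimal sketch. The class name AppendOnlyStore, the record layout (two 4-byte length fields followed by the key and the value), and the in-memory buffer standing in for an HDD are all assumptions made for illustration and are not taken from any particular system.

```python
import io
import struct


class AppendOnlyStore:
    """Toy append-only key-value log (hypothetical, for illustration only)."""

    HEADER = struct.Struct(">II")  # assumed layout: key length, value length

    def __init__(self):
        self.log = io.BytesIO()  # stands in for sequential space on an HDD
        self.index = {}          # key -> (value offset, value length)

    def put(self, key: bytes, value: bytes) -> int:
        """Append a record at the end of the log and return its offset."""
        self.log.seek(0, io.SEEK_END)  # always write at the tail, never in place
        offset = self.log.tell()
        header = self.HEADER.pack(len(key), len(value))
        self.log.write(header + key + value)
        # Point the key at the newest version; older versions stay in the log.
        self.index[key] = (offset + len(header) + len(key), len(value))
        return offset

    def get(self, key: bytes) -> bytes:
        """Return the most recently written value for the key."""
        value_offset, length = self.index[key]
        self.log.seek(value_offset)
        return self.log.read(length)


store = AppendOnlyStore()
first = store.put(b"user:1", b"v1")
second = store.put(b"user:1", b"v2")  # new version is appended, not overwritten
latest = store.get(b"user:1")
```

In this sketch the second write lands after the first rather than replacing it, so the obsolete value remains in the log while the index refers only to the newest version.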
One drawback with storing object-based data on HDDs is that disk errors that render a part of the drive inaccessible or corrupted can render the entire HDD unusable, even though the vast majority of the storage space in the HDD can still reliably store data. This is because important metadata associated with each value is included in the associated key that identifies that particular value. Thus, if a corrupted or otherwise unreadable portion of an HDD includes the metadata of one or more values, the identity of the one or more values is lost. In such a scenario, determining which values may be affected by the corrupted portion of the HDD can be problematic. Furthermore, many storage systems store a mapping index of the locations of objects on the same HDD as the actual data included in these objects. Thus, if a part of the HDD is corrupted, the mapping index can be damaged, and potentially a large portion of the data on the HDD can become unreachable.