The rise in electronic and digital device technology has rapidly changed the way society communicates, interacts, and consumes goods and services. Modern computing devices give organizations and users access to a variety of useful applications from many locations. Using such applications generates a large amount of data, and storing and retrieving this data is a significant challenge in providing such applications and devices.
The data generated by online services and other applications can be stored at data storage facilities. As the amount of data grows, a plurality of users sending and requesting data can cause complications that reduce efficiency and speed. Quick and reliable access to stored data is therefore important for good performance.
Distributed encoded storage systems typically divide each data object to be stored into a plurality of data pieces, each of which is encoded into a plurality of encoded data fragments. The encoded data fragments are spread across multiple backend storage elements, thereby providing a given level of redundancy. A distributed encoded storage system maintains metadata that identifies each stored data object and specifies where and how that data object is stored in the system, including, among other things, where its encoded data fragments have been distributed (and hence from where they can subsequently be retrieved) and what type of encoding has been used. For each encoded fragment of the data object, an identifier, location information and encoding information are maintained. Storage of a single data object therefore generates a large amount of associated metadata.
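To illustrate why the metadata grows so quickly, the following Python sketch builds one metadata record per encoded fragment of a data object. It is a minimal illustration under stated assumptions, not an actual metadata format of any system: the names `FragmentMetadata` and `build_object_metadata`, the identifier format, the parameters `k` (fragments needed for reconstruction) and `n` (fragments produced per piece), and the rotating placement policy are all hypothetical, and no actual encoding of data is performed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FragmentMetadata:
    """Hypothetical per-fragment metadata record."""
    fragment_id: str      # assumed identifier format: "<object>/<piece>/<fragment>"
    storage_element: int  # index of the backend storage element holding the fragment
    encoding: str         # label describing the encoding used (assumed format)


def build_object_metadata(object_id: str, object_size: int, piece_size: int,
                          k: int, n: int,
                          num_storage_elements: int) -> List[FragmentMetadata]:
    """Return one metadata record for every encoded fragment of a data object.

    The object is split into pieces of piece_size bytes (the last piece may be
    shorter), and each piece is assumed to be encoded into n fragments, any k
    of which would suffice to rebuild it. k is only recorded in the label here.
    """
    if not (0 < k <= n <= num_storage_elements):
        raise ValueError("require 0 < k <= n <= num_storage_elements")
    if object_size <= 0 or piece_size <= 0:
        raise ValueError("object_size and piece_size must be positive")

    num_pieces = -(-object_size // piece_size)  # ceiling division
    records: List[FragmentMetadata] = []
    for piece in range(num_pieces):
        for frag in range(n):
            # Hypothetical placement: the n fragments of a piece land on n
            # distinct storage elements, with the start rotated per piece.
            element = (piece + frag) % num_storage_elements
            records.append(FragmentMetadata(
                fragment_id=f"{object_id}/{piece}/{frag}",
                storage_element=element,
                encoding=f"erasure(k={k},n={n})",
            ))
    return records
```

Under these assumptions, a 1 GiB object split into 64 MiB pieces and encoded into 16 fragments per piece yields 16 × 16 = 256 fragment records, each with its own identifier, location and encoding information, for a single stored object.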
As noted above, a distributed encoded storage system stores the encoded data fragments on storage elements in the backend. However, the corresponding metadata is accessed frequently and needs to be provided with a high level of responsiveness, so storing it on the backend storage elements would lead to unacceptable delays; it is therefore typically stored on separate storage elements. The backend storage elements on which data objects are stored are usually hard disks, while the metadata is stored on expensive, fast, low-latency storage elements such as solid state disks (“SSDs”). Storing the metadata separately on SSDs results in higher cost and, typically, a reduced level of durability.
Additionally, the metadata storage needs to be provided with a suitable level of redundancy. To achieve this, the SSDs are often replicated inside each datacenter, for example in a triple modular redundancy configuration in which majority vote logic protects against the failure of an individual SSD. To provide a sufficient level of responsiveness, the metadata storage may also be replicated across several geographically dispersed datacenters of the distributed encoded storage system. Further, the stored metadata is typically made accessible over high-bandwidth connections and backed by high levels of processing power to guarantee the desired responsiveness when processing client requests. This results in the use of a large number of expensive, highly responsive storage elements such as SSDs, as well as expensive high-bandwidth connections and expensive processing power.
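The majority vote logic of a triple modular redundancy configuration can be sketched in a few lines of Python. This is a minimal sketch, assuming each replica returns its copy of a metadata record as `bytes`, or `None` if that SSD has failed; the function name `majority_vote` is hypothetical. A value is accepted only when a strict majority of replicas agree, so with three replicas a single failed or corrupted SSD is masked.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(replicas: Sequence[Optional[bytes]]) -> bytes:
    """Return the value held by a strict majority of the replicas.

    A replica that failed to respond is passed as None and casts no vote.
    Raises RuntimeError if no value is held by more than half the replicas.
    """
    votes = Counter(r for r in replicas if r is not None)
    if votes:
        value, count = votes.most_common(1)[0]
        # Strict majority of all replicas, including failed ones (2 of 3 for TMR).
        if count > len(replicas) // 2:
            return value
    raise RuntimeError("no majority: too many replicas failed or disagree")
```

For example, `majority_vote([b"rec", b"rec", b"bad"])` returns `b"rec"`, whereas two simultaneous failures leave no majority and the read cannot be resolved, which is why each extra level of protection requires yet more replicated SSDs.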
It would be desirable to address at least these issues.