Distributed storage systems play an important role in the management of big data, particularly data generated at tremendous speed. A distributed storage system may require many hardware devices, which often results in component failures that require recovery operations. Moreover, components in a distributed storage system may become unavailable without completely failing, for example due to poor network connectivity or performance. Because any individual storage node may become unreliable, redundancy measures are often introduced to protect data against storage node failures, outages, and other impediments. Such measures can include distributing data with redundancy over a set of independent storage nodes.
One relatively simple redundancy measure is replication. Replication, particularly triple replication, is often used in distributed storage systems to provide fast access to data. Triple replication, however, can suffer from very low storage efficiency which, as used herein, generally refers to a ratio of an amount of original data to an amount of actually stored data, i.e., data with redundancy. Error-correcting coding, and more particularly erasure coding, provides an opportunity to store data with a relatively high storage efficiency while maintaining an acceptable level of tolerance against storage node failure. For example, a relatively high storage efficiency can be achieved by maximum distance separable (MDS) codes, such as, but not limited to, Reed-Solomon codes. Long MDS codes, however, can incur prohibitively high repair costs. When locally decodable codes (LDC) are employed, by contrast, any single storage node failure can be recovered by accessing a pre-defined number of storage nodes and performing corresponding computations. Locally decodable codes are designed to minimize I/O overhead. In the case of cloud storage systems, minimization of I/O overhead is especially desirable because data transmission can consume many resources, while computational complexity is less significant. In spite of promising theoretical results, the number of practical LDC constructions is low. It is recognized by the inventors that some generalized concatenated codes (GCC) demonstrate a property of locality. Yet another important consideration is bandwidth optimization, which leads to reduced latency. Regenerating codes can be used to reduce the amount of data transmitted from each storage node during repair. One drawback, however, is that the advantages provided by regenerating codes are limited to partial read operations within a storage system.
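The storage efficiency ratio defined above can be made concrete with a short calculation. The following Python sketch compares triple replication with an (n, k) MDS code; the function names and the (14, 10) parameters are illustrative assumptions and are not taken from this disclosure. It relies only on standard properties of MDS codes: k original symbols are stored as n symbols, any n - k erasures are tolerated, and a conventional repair of a single lost symbol reads k surviving symbols.

```python
from fractions import Fraction


def replication_efficiency(copies: int) -> Fraction:
    """Ratio of original data to stored data when each block is kept `copies` times."""
    return Fraction(1, copies)


def mds_efficiency(n: int, k: int) -> Fraction:
    """Ratio of original data to stored data for an (n, k) MDS code."""
    return Fraction(k, n)


def mds_tolerated_failures(n: int, k: int) -> int:
    """An (n, k) MDS code recovers from any n - k lost symbols."""
    return n - k


def mds_single_repair_reads(k: int) -> int:
    """Conventional repair of one lost symbol reads k surviving symbols."""
    return k


if __name__ == "__main__":
    # Illustrative parameters only (assumption): triple replication vs. a (14, 10) code.
    print("triple replication efficiency:", replication_efficiency(3))
    print("(14, 10) MDS efficiency:", mds_efficiency(14, 10))
    print("(14, 10) tolerated failures:", mds_tolerated_failures(14, 10))
    print("(14, 10) reads per single repair:", mds_single_repair_reads(10))
```

Under these assumed parameters the MDS code stores far less redundant data than triple replication, but repairing one lost symbol requires reading ten symbols instead of copying one replica, which is the repair cost that codes with locality aim to reduce.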
It is observed that the requirements for error-correcting codes in redundant arrays of independent disks (RAID) can differ from those of distributed storage systems, such as with respect to computational complexity and storage efficiency. Moreover, the number of disks within a RAID is usually limited to a relatively low number, resulting in codes of relatively small length being employed. Accordingly, array codes, such as RDP and EVENODD, are not optimal for cloud storage systems or for distributed storage systems in general.
Yet another consideration of cloud storage systems is security and, more particularly, data encryption. The computational complexity of data encryption is high, unfortunately, and key management continues to be an operational issue. Alternative approaches can include mixing original data such that any amount of original data can be reconstructed only by accessing no fewer than a pre-defined number of storage nodes. This pre-defined number is chosen such that the probability that a malicious adversary is able to access all of these nodes is negligible.
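The idea of mixing data so that reconstruction requires a minimum number of storage nodes can be illustrated with a deliberately simple scheme. The Python sketch below uses XOR-based splitting, in which every one of the nodes is required; this is a stand-in technique chosen for brevity and is not asserted to be the mixing method contemplated herein. The names `split` and `combine` are hypothetical. Each share is as large as the original data, so the sketch illustrates only the access threshold and not storage efficiency.

```python
import secrets
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split(data: bytes, nodes: int) -> List[bytes]:
    """Split `data` into `nodes` shares; all shares are needed to rebuild it.

    Hypothetical helper: nodes - 1 shares are uniformly random, and the
    last share is the data XORed with all of the random shares.
    """
    if nodes < 2:
        raise ValueError("at least two nodes are required")
    shares = [secrets.token_bytes(len(data)) for _ in range(nodes - 1)]
    last = data
    for share in shares:
        last = xor_bytes(last, share)
    return shares + [last]


def combine(shares: List[bytes]) -> bytes:
    """Rebuild the original data by XORing every share together."""
    result = bytes(len(shares[0]))  # all-zero buffer of the share length
    for share in shares:
        result = xor_bytes(result, share)
    return result


if __name__ == "__main__":
    original = b"example block"
    pieces = split(original, 4)  # one share per storage node (assumed: 4 nodes)
    assert combine(pieces) == original
```

With this construction, any subset of fewer than all shares is statistically independent of the original data, so an adversary must reach every node holding a share to learn anything about it.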