Distributed data stores are deployed to store very large volumes of data. Because individual components of such large-scale systems may fail frequently, these systems generally require redundancy at several levels to achieve fault tolerance. At the data layer, redundancy may be achieved either by replication or by employing error-correcting or erasure-correcting codes. As the volume of data grows, the cost of the storage overhead needed to realize redundancy becomes more significant. Therefore, one of the design objectives for a data storage system, or for its corresponding method of encoding data, is to reduce storage overhead.
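By way of illustration only, the storage overhead of the two redundancy approaches mentioned above may be compared with a short calculation. The sketch below assumes a maximum distance separable (MDS) erasure code with parameters (n, k), which is an assumption made for this example and is not stated in the text above; the function names are hypothetical.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Stored bytes per byte of user data when keeping `copies` full replicas."""
    return Fraction(copies, 1)


def erasure_code_overhead(n: int, k: int) -> Fraction:
    """Stored bytes per byte of user data for an (n, k) code:
    k data pieces are expanded into n encoded pieces of the same size."""
    return Fraction(n, k)


def tolerated_failures_replication(copies: int) -> int:
    """Replication survives the loss of all but one replica."""
    return copies - 1


def tolerated_failures_mds(n: int, k: int) -> int:
    """Assumption: the code is MDS, so any k of the n pieces suffice to decode."""
    return n - k


# Both configurations below tolerate two node failures,
# but the erasure code stores half as many bytes.
three_way = replication_overhead(3)      # 3 stored bytes per user byte
coded = erasure_code_overhead(6, 4)      # 3/2 stored bytes per user byte
```

In this example, three-way replication and a (6, 4) MDS code both tolerate two failures, while the coded scheme needs 1.5 times the original data size instead of 3 times.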
A widely studied problem is that of repairing erasure coded data. When a storage node storing an encoded piece fails permanently, it is desirable to recreate the corresponding information at a live node, so that the system remains resilient over time. A naive strategy to replenish redundancy is to decode the original data and re-encode it, but this is expensive, particularly in terms of network resource usage. Regenerating codes, which optimize the bandwidth used for repairs, may address this issue. However, regenerating codes require contacting many live nodes, which conflicts with another design objective, namely reducing the number of live nodes that must be contacted to carry out a repair. Reducing this number may lead to lower repair bandwidth usage, better degraded reads, faster repairs, fewer input/output (I/O) operations, the ability to repair multiple failures simultaneously, and so on.
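The effect of the number of contacted nodes on a repair may be illustrated with a minimal sketch. The example below is a toy construction based on bytewise XOR parity, chosen only for illustration; it is not a regenerating code and is not the encoding method addressed by this document. The block contents, the group size of two, and all function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four data blocks, each imagined to reside on its own storage node.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]

# Scheme 1: a single parity block computed over all four data blocks.
global_parity = xor_blocks(data)

# Scheme 2: one parity block per group of two data blocks.
local_parities = [xor_blocks(data[0:2]), xor_blocks(data[2:4])]


def repair_with_global_parity(lost_index):
    """Rebuild one lost data block from every other data block plus the parity.
    Returns the rebuilt block and the number of nodes contacted."""
    helpers = [b for i, b in enumerate(data) if i != lost_index]
    helpers.append(global_parity)
    return xor_blocks(helpers), len(helpers)


def repair_with_local_parity(lost_index):
    """Rebuild one lost data block from its group partner plus the group parity.
    Returns the rebuilt block and the number of nodes contacted."""
    group = lost_index // 2
    partner = data[group * 2 + (1 - lost_index % 2)]
    helpers = [partner, local_parities[group]]
    return xor_blocks(helpers), len(helpers)
```

Under these assumptions, rebuilding the third block contacts four nodes with the single parity and two nodes with the grouped parities, at the price of storing one additional parity block.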
Therefore, there is a need for a method of encoding data that is able to achieve, using a single code, both local repairability (in other words, fewer surviving nodes are required to restore a lost data block) and fast creation of erasure coded data.