The present disclosure relates to data recovery and/or regeneration.
Data can often become corrupted as it is transmitted, processed, and/or stored in memory. Parity bits are often used to verify the integrity of the data and guard against corruption. Since the amount of data that is being processed and stored has increased dramatically over the past several years and is only accelerating, particularly in large storage applications, such as enterprise storage and cloud applications, the amount of parity-checking that is required is rising at a corresponding rate.
Conventional parity-checking algorithms, however, often impose increasingly significant computational and storage requirements and are unable to scale at an acceptable rate. This problem is further exacerbated when reconstructing lost data from previously computed parity information, because accurately reconstructing the data using conventional parity reconstruction techniques is cumbersome and computationally expensive, particularly in the event of a storage device failure.
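As a minimal illustration of reconstructing lost data from previously computed parity, the following sketch assumes the simplest conventional scheme, a single XOR parity block computed over equal-length data blocks. The helper name xor_blocks and the sample blocks are hypothetical and chosen only for illustration; the sketch is not the technique of the present disclosure.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length byte blocks (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three illustrative data blocks and their single XOR parity block.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Simulate losing the second block: XOR of all survivors plus parity rebuilds it.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Note that every surviving block must be read in full to rebuild the single lost block, which is the cost that grows with the amount of stored data.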
As a further example, in the past decade or so, the term “regenerating code” has at times been used in the coding theory community to describe code constructions that focus on the amount of repair traffic (data) that is required to repair one or more failures (1 failure, 2 failures, etc., depending on how many failures are tolerated).
For example, with 10 user nodes of 1 terabyte (TB) each and 2 parity nodes of 1 TB each, there is a total of 12 TB. Prior solutions would often use a traditional Reed-Solomon code for error correction and would tolerate up to 2 node failures. In the case of 1 node failure, the repair traffic would be 10 TB (9 surviving nodes with user data and 1 parity node), thus providing no traffic savings relative to the size of the original 10 user nodes. In the case of 2 failures, the traffic would be the same 10 TB.
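The arithmetic in this example can be sketched as follows, assuming the conventional repair model in which any tolerated failure is repaired by reading as many whole surviving nodes as there are user nodes. The function and parameter names are hypothetical and serve only to restate the figures above.

```python
def total_storage_tb(user_nodes, parity_nodes, node_tb):
    """Total capacity across user and parity nodes, in TB."""
    return (user_nodes + parity_nodes) * node_tb

def conventional_repair_traffic_tb(user_nodes, parity_nodes, node_tb, failures):
    """Repair traffic under the assumed conventional model.

    Assumption: repair reads `user_nodes` whole surviving nodes regardless of
    how many nodes failed, as long as the failure count is tolerated.
    """
    if not 1 <= failures <= parity_nodes:
        raise ValueError("failure count must be between 1 and the number of parity nodes")
    return user_nodes * node_tb

total = total_storage_tb(10, 2, 1)                             # 12 TB stored in total
one_failure = conventional_repair_traffic_tb(10, 2, 1, 1)      # 10 TB read
two_failures = conventional_repair_traffic_tb(10, 2, 1, 2)     # 10 TB read
```

Under this model the repair traffic equals the full size of the user data in both cases, which is the lack of savings that the example describes.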