Network systems and storage devices need to reliably handle and store data and, thus, typically implement some type of scheme for recovering data that has been lost, degraded, or otherwise compromised. At the most basic level, one recovery scheme could simply involve creating one or more complete copies or mirrors of the data being transferred or stored. Although such a recovery scheme may be relatively fault tolerant, it is not very efficient, since each copy requires as much storage space as the original data. Other recovery schemes involve performing a parity check. For instance, in a storage system having stored data distributed across multiple disks, one disk may be used solely for storing parity bits. While this type of recovery scheme requires less storage space than a mirroring scheme, it is not as fault tolerant: the data on a single failed device can be recovered, but any two concurrent device failures would result in an inability to recover the compromised data.
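By way of illustration only, the dedicated-parity arrangement described above can be sketched as follows. The three data disks, their four-byte contents, and the helper name `xor_blocks` are hypothetical and chosen solely for this example; the sketch assumes bytewise XOR parity and is not drawn from any particular system.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the bytewise XOR of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data disks plus one disk used solely for parity.
data_disks = [b"ABCD", b"EFGH", b"IJKL"]
parity_disk = xor_blocks(data_disks)

# Single failure: disk 1 is lost and is rebuilt from the surviving
# data disks together with the parity disk.
rebuilt = xor_blocks([data_disks[0], data_disks[2], parity_disk])
assert rebuilt == data_disks[1]

# Double failure: disks 0 and 1 are both lost. The survivors yield only
# the XOR of the two missing blocks, which cannot be separated into the
# individual blocks, so the compromised data is unrecoverable.
combined = xor_blocks([data_disks[2], parity_disk])
assert combined == xor_blocks([data_disks[0], data_disks[1]])
```

In this sketch the redundancy costs one extra disk regardless of the number of data disks, whereas a mirror would require one extra disk per data disk.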
Accordingly, various recovery schemes have been developed with the goal of increasing efficiency (in terms of the amount of extra data generated) and fault tolerance (i.e., the extent to which the scheme can recover compromised data). These recovery schemes generally involve the creation of erasure codes that are adapted to generate and embed data redundancies within original data packets, thereby encoding the data packets in a prescribed manner. If such data packets become compromised, as may result from a disk or sector failure, for instance, such redundancies could enable recovery of the compromised data, or at least portions thereof. Various types of erasure codes are known, such as Reed-Solomon codes, RAID variants, array codes (e.g., EVENODD and RDP), and XOR-based erasure codes. However, the encoding and decoding operations of such erasure codes are often computationally demanding, typically rendering their implementation cumbersome in network systems, storage devices, and the like.
In addition, determining the fault tolerance of a particular erasure code, and thus the best manner in which to implement a selected code, can be challenging. For instance, fault tolerance determinations often do not factor in the fault tolerance of the devices themselves, leading to imprecision in assessing the actual fault tolerance of the recovery scheme. As a result, efforts to select an optimal erasure code implementation for a particular system could be impeded. Further, uncertainty regarding the fault tolerance of a particular code can impact the manner in which data is allocated among various storage devices and/or communication channels. Such uncertainty could hamper a user's ability to optimally store and/or allocate data across storage devices. Similarly, such uncertainty could hamper efforts to allocate and route data across communication network channels, inasmuch as those systems might not function as desired. Moreover, the irregular fault tolerance of such erasure codes also makes evaluation of the reliability of a storage system challenging because of the need to accurately determine which sets of disk and sector failures lead to data loss.
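To illustrate what is meant by irregular fault tolerance, the following minimal sketch enumerates, by brute force, the sets of device failures that lead to data loss for a small XOR-based code. The code layout (four data symbols `d0` to `d3` and two parity symbols `p0` and `p1`), the assumption of one symbol per device, and the function names are all hypothetical and serve only as an example; sector failures and device reliability are not modeled.

```python
from itertools import combinations

K = 4  # number of data symbols

# Each symbol is a bitmask over the data symbols it is the XOR of.
# Hypothetical layout: one symbol per device.
SYMBOLS = {
    "d0": 0b0001, "d1": 0b0010, "d2": 0b0100, "d3": 0b1000,
    "p0": 0b0111,  # d0 ^ d1 ^ d2
    "p1": 0b1110,  # d1 ^ d2 ^ d3
}

def gf2_rank(vectors):
    """Rank over GF(2) of vectors encoded as integer bitmasks."""
    basis = {}  # leading bit position -> basis vector
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]  # eliminate the leading bit and retry
    return len(basis)

def loses_data(failed):
    """True if the surviving symbols cannot reconstruct all K data symbols."""
    survivors = [vec for name, vec in SYMBOLS.items() if name not in failed]
    return gf2_rank(survivors) < K

def lossy_failure_sets(size):
    """All failure sets of the given size that lead to data loss."""
    return [c for c in combinations(SYMBOLS, size) if loses_data(c)]

print(lossy_failure_sets(1))  # []
print(lossy_failure_sets(2))  # [('d0', 'p0'), ('d1', 'd2'), ('d3', 'p1')]
```

In this example every single failure is tolerated, but only twelve of the fifteen possible double failures are, so the code cannot be characterized by a single number of tolerable failures. Exhaustive enumeration of this kind grows combinatorially with the number of devices, which is one reason such evaluations can be challenging.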