At present, with the rapid development of the information technology industry, more and more manufacturers choose to deploy distributed systems in their products for reasons such as cost and reliability. As a result, distributed systems have developed rapidly.
In the architecture of an existing distributed system, a file may be divided into a plurality of data blocks for storage. To ensure the robustness and fault recovery capability of the system, a data block generally has a plurality of copies that are stored in different physical locations. However, this multi-copy fault tolerance method requires more storage devices to be configured, which increases the cost of storage. Taking three copies as an example, the multi-copy fault tolerance method increases storage redundancy by 200% and storage cost by 200%.
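The 200% figure follows from simple arithmetic: every copy beyond the first is pure redundancy. The following minimal sketch makes that explicit; the function name `replication_overhead` is hypothetical and introduced only for illustration.

```python
def replication_overhead(copies: int) -> float:
    """Extra storage, as a fraction of the original data size,
    when the data is kept as `copies` full copies in total."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    # The first copy is the data itself; every further copy is redundancy.
    return float(copies - 1)


# Three copies mean two redundant copies, i.e. 200% redundancy.
overhead = replication_overhead(3)
print(f"{overhead:.0%}")
```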
Compared with the multi-copy fault tolerance method, a Reed-Solomon (RS) method may generate a corresponding check block based on a designated data block and, when a data block becomes invalid, recover the invalid data block based on a valid data block and the check block. Thus, higher data reliability can be obtained with less data redundancy. For example, when the sizes of a designated data block and its corresponding check block are 100 MB and 30 MB respectively, the RS method may achieve the storage reliability of three copies with only 30% redundancy.
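The idea of generating a check block and later rebuilding an invalid data block from the surviving blocks can be illustrated with a deliberately simplified sketch. Note that the code below uses a single XOR parity block, not Reed-Solomon: an actual RS code works over a finite field and can tolerate the loss of several blocks, whereas XOR parity tolerates only one. The function names and sample data are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("at least one block is required")
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) walks the blocks column by column (one byte from each).
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def make_check_block(data_blocks: Sequence[bytes]) -> bytes:
    """Generate one check block from the data blocks (XOR parity, not RS)."""
    return xor_blocks(data_blocks)


def recover_block(valid_blocks: Sequence[bytes], check_block: bytes) -> bytes:
    """Rebuild the single missing data block from the valid ones plus the check block."""
    return xor_blocks([*valid_blocks, check_block])


data_blocks = [b"abcd", b"efgh", b"ijkl"]
check_block = make_check_block(data_blocks)

# Pretend the second data block has become invalid and rebuild it.
recovered = recover_block([data_blocks[0], data_blocks[2]], check_block)
```

Even in this simplified form, recovering one block requires reading every other block in the group, which is the property that makes recovery I/O expensive for check-block schemes.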
However, during recovery of an invalid data block, the RS method usually needs to read all valid data blocks and check blocks; that is, the RS method cannot use Input/Output (I/O) efficiently during data recovery. Typically, an RS method with 30% redundancy needs to read 100 MB of data during data recovery, which results in a tenfold I/O consumption.
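One way to arrive at the tenfold figure is sketched below. The block layout is an assumption chosen to match the numbers in the text and is not stated in the original: the 100 MB of data is taken to be split into ten 10 MB data blocks protected by three 10 MB check blocks, and recovery is taken to read as many surviving blocks as there are data blocks, which is the minimum an RS decoder needs. The function name `recovery_read_amplification` is hypothetical.

```python
from typing import Tuple


def recovery_read_amplification(
    data_blocks: int, block_size_mb: float, lost_blocks: int = 1
) -> Tuple[float, float]:
    """Return (MB read, read amplification) for rebuilding `lost_blocks` blocks.

    Assumes the decoder reads as many surviving blocks as there are
    data blocks in the group, each of size `block_size_mb`.
    """
    if data_blocks < 1 or lost_blocks < 1:
        raise ValueError("data_blocks and lost_blocks must be positive")
    mb_read = data_blocks * block_size_mb
    mb_recovered = lost_blocks * block_size_mb
    return mb_read, mb_read / mb_recovered


# Assumed layout: 10 data blocks of 10 MB each (100 MB of data in total).
mb_read, amplification = recovery_read_amplification(data_blocks=10, block_size_mb=10.0)
print(mb_read, amplification)
```

Under these assumptions, rebuilding a single 10 MB block reads 100 MB, ten times the amount of data actually recovered. With the multi-copy method, by contrast, rebuilding a block only requires reading one surviving copy of that block.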