In a storage system, an erasure code is configured mainly by two parameters, k and m. A (k, m) erasure code divides the original data into k data strips (e.g., disk blocks) and encodes them into m parity strips, so that the original data can be recovered from any k of the k+m data strips and parity strips. Because erasure coding has relatively high storage efficiency, more and more storage systems adopt it to ensure data reliability. A set of k+m data strips and parity strips that are related to one another by the erasure code constitutes a “stripe”. Generally, an erasure-coded storage system can be viewed logically as a combination of multiple stripes. A popular family of erasure codes is the XOR-based erasure codes, which perform encoding and decoding by using XOR operations only. In an XOR-coded storage system, a strip is further partitioned into many equal-size elements, where an element is the basic unit of operation (e.g., a byte or a sector).
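The relationship between data strips, a parity strip, and a stripe can be illustrated with a minimal sketch. It assumes the simplest XOR-based case, a single parity strip (m = 1) computed as the bytewise XOR of the k data strips, with one byte per element. The function names `xor_strips`, `encode_stripe`, and `recover_strip` are hypothetical and chosen for illustration only; practical XOR-based codes with m > 1 use more elaborate parity layouts.

```python
from functools import reduce


def xor_strips(strips):
    """Return the bytewise XOR of equal-length strips (one byte = one element)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def encode_stripe(data_strips):
    """Build a stripe: the k data strips followed by one XOR parity strip.

    Assumption for illustration: m = 1, i.e. a single parity strip.
    """
    return list(data_strips) + [xor_strips(data_strips)]


def recover_strip(stripe, lost_index):
    """Rebuild the strip at lost_index by XOR-ing the k surviving strips."""
    survivors = [s for i, s in enumerate(stripe) if i != lost_index]
    return xor_strips(survivors)


# k = 3 data strips of two elements each, plus one parity strip.
data = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
stripe = encode_stripe(data)

# Simulate losing the disk that holds strip 1 and rebuild it from the rest.
rebuilt = recover_strip(stripe, 1)
```

In this sketch any single lost strip, data or parity, is rebuilt from the remaining k strips of the same stripe, which is the recovery property described above for the case m = 1.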
In an erasure-coded storage system, a main objective of optimizing single-disk failure recovery is to reduce the amount of data read (that is, the disk I/O) and thereby achieve fast recovery.
However, to read data from a disk, a magnetic head must first be positioned where the data resides, and only then can the data be read. This phase of head positioning and preparation before the needed data is read is generally called a “seek operation”. Owing to the characteristics of mechanical hard disks, the seek operation causes considerable latency, which is a bottleneck limiting the random access performance of current disks. Conventional optimization of single-disk failure recovery focuses on reducing the data that needs to be read during recovery, but neglects that the optimization may increase the number of seeks. An increased number of seeks prolongs the recovery time. Therefore, it is important to consider both the amount of data that needs to be read and the number of seek operations when recovering from a single-disk failure.
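The trade-off between the amount of data read and the number of seeks can be made concrete with a rough cost model. The sketch below is an assumption for illustration, not a method taken from the related technologies: it counts one seek for each maximal run of consecutive element offsets read from a disk, charges a fixed latency per seek and per element, and takes the slowest disk as the recovery time on the assumption that the surviving disks are read in parallel. The names `count_seeks` and `recovery_time_ms` and the default values of `seek_ms` and `transfer_ms` are hypothetical.

```python
def count_seeks(offsets):
    """Count maximal runs of consecutive element offsets on one disk.

    Each run is assumed to cost exactly one seek.
    """
    seeks = 0
    previous = None
    for offset in sorted(set(offsets)):
        if previous is None or offset != previous + 1:
            seeks += 1
        previous = offset
    return seeks


def recovery_time_ms(reads_per_disk, seek_ms=8.0, transfer_ms=0.1):
    """Estimate recovery time for a read plan {disk_id: [element offsets]}.

    Assumptions (illustrative only): disks are read in parallel, so the
    slowest disk bounds the time; seek_ms and transfer_ms are made-up values.
    """
    return max(
        (
            count_seeks(offsets) * seek_ms + len(set(offsets)) * transfer_ms
            for offsets in reads_per_disk.values()
        ),
        default=0.0,
    )


# Plan A reads fewer elements (8 in total) but scatters them across each disk.
plan_a = {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}
# Plan B reads more elements (10 in total) but in one contiguous run per disk.
plan_b = {0: [0, 1, 2, 3, 4], 1: [0, 1, 2, 3, 4]}

time_a = recovery_time_ms(plan_a)
time_b = recovery_time_ms(plan_b)
```

Under these assumed latencies, plan B is estimated to finish sooner than plan A even though it reads more data, because plan A incurs four seeks per disk and plan B only one.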
Methods for optimizing recovery of a single-disk failure in the related technologies can greatly reduce the amount of data read from surviving disks during recovery, but still have several disadvantages. First, some methods are applicable only to a specific erasure code, for example, the RDP code or the X-code, both of which are typical RAID6 codes that tolerate at most two concurrent disk failures, and therefore lack universality. Second, some methods have good universality, that is, they are applicable to any XOR-based erasure code, but finding an optimal recovery solution is an NP-hard problem; although an optimal solution for a homogeneous environment can be calculated and stored in advance, so that a single-disk failure can be recovered directly by following the stored solution and reading the smallest amount of data, finding an optimal recovery solution for a heterogeneous scenario with frequently changing system configurations takes a long calculation time, and therefore a single-disk failure cannot be handled efficiently in real time. Third, some methods can in theory find a near-optimal recovery solution within polynomial time, but they consider only the scenario of a single stripe, whereas an actual erasure-coded storage system is a logical combination of multiple independent stripes, and therefore these methods have limitations. In addition, and most importantly, most methods for optimizing recovery of a single-disk failure in the related technologies do not consider optimization of the seek operation and need further improvement.