An error-tolerant computer generally provides an ECC (Error Correction Code) function for checking and correcting errors, and memory backup is an important reliability feature of such a computer. Memory backup works as follows: when the number of correctable errors in a memory area (generally in units of a rank or a DIMM) exceeds a certain threshold, the memory area is considered unreliable and likely to become corrupted, which would cause the data stored in it to be lost. The memory controller then reports the erroneous memory area to the operating system; the operating system searches for a suitable backup memory area according to the size of the erroneous memory area and instructs the memory controller to start a memory backup operation; and the memory controller moves the data stored in the erroneous memory area to the backup memory area. In this way, the risk posed by the unreliable memory is avoided.
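For illustration only, the threshold check and the selection of a backup area described above might be sketched as follows. The names `CorrectableErrorMonitor`, `record_error`, and `pick_backup_area`, the threshold value, and the "smallest area that fits" policy are all assumptions made for this sketch; the text only states that a threshold exists and that the operating system chooses a backup area according to size.

```python
from collections import Counter

CE_THRESHOLD = 3  # hypothetical value; the text only says "a certain threshold"


class CorrectableErrorMonitor:
    """Counts correctable errors per memory area (for example, per rank or DIMM)."""

    def __init__(self, threshold=CE_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()

    def record_error(self, area_id):
        """Record one correctable error; return True once the count exceeds the threshold."""
        self.counts[area_id] += 1
        return self.counts[area_id] > self.threshold


def pick_backup_area(free_areas, needed_size):
    """Return the id of the smallest free area that can hold needed_size, or None.

    free_areas maps area id -> size. Choosing the smallest fit is an assumed policy.
    """
    candidates = [(size, area_id) for area_id, size in free_areas.items()
                  if size >= needed_size]
    return min(candidates)[1] if candidates else None
```

In this sketch, the area is reported as erroneous on the first error that pushes its count above the threshold, after which a backup area of sufficient size is looked up.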
The memory backup operation is implemented by a backup engine module in the memory controller. A specific process is as follows:
When the number of correctable errors in a memory area exceeds a specific threshold, the memory controller starts a backup operation. The backup engine module first sets its read address to the start address of the erroneous memory area and issues a read command to read data from that area. The backup engine module usually issues a plurality of read commands in succession, so that data is read from the erroneous memory area continuously. After receiving the data read from the erroneous memory area, the memory controller performs error correction on the data by using the ECC and issues a write command to write the corrected data to the backup memory area. If the data in the erroneous memory area has not been completely moved, the read address is incremented to read the subsequent data, and the foregoing operations are repeated until the memory backup operation is complete.
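The read, correct, and write loop described above can be sketched roughly as follows. This is a simplified software model, not the hardware design: `run_backup`, `ecc_correct`, and the `burst` parameter are hypothetical names, and a three-copy majority vote stands in for a real ECC, which would normally use a code such as SEC-DED.

```python
from collections import Counter


def ecc_correct(codeword):
    """Toy stand-in for ECC: majority vote over three stored copies of a word."""
    return Counter(codeword).most_common(1)[0][0]


def run_backup(faulty_area, backup_area, burst=4):
    """Copy faulty_area into backup_area, correcting each word on the way.

    faulty_area is a list of 3-tuples (the toy codewords); backup_area is a
    list of the same length that receives the corrected plain values.
    """
    read_addr = 0  # start at the initial address of the erroneous area
    while read_addr < len(faulty_area):
        # Issue several read commands back to back, as the text describes.
        end = min(read_addr + burst, len(faulty_area))
        in_flight = [(addr, faulty_area[addr]) for addr in range(read_addr, end)]
        # Correct each returned word, then write it to the backup area.
        for addr, codeword in in_flight:
            backup_area[addr] = ecc_correct(codeword)
        # Advance the read address and repeat until everything is moved.
        read_addr = end
    return backup_area
```

Between the burst of reads and the matching writes, some addresses have been read but not yet written; this is the window in which system commands can conflict with the backup.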
During the memory backup operation, the operating system may issue a read or write command to the erroneous memory area. If the sequence number of the address in the command is smaller than the largest address sequence number of the data already read by the backup engine module and greater than the largest address sequence number of the data already written by the backup engine module, that is, the command targets data that has been read out but not yet written to the backup memory area, the command is blocked directly.
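The blocking condition of this prior-art scheme reduces to a single range comparison, sketched below. The function name `is_blocked` and its parameters are hypothetical, and the strict inequalities follow the wording of the text literally; whether the boundary addresses themselves are blocked is not specified there.

```python
def is_blocked(cmd_addr, max_read_addr, max_written_addr):
    """Return True if a system command conflicts with the backup in progress.

    cmd_addr         -- address sequence number carried by the system command
    max_read_addr    -- largest address sequence number already read by the backup engine
    max_written_addr -- largest address sequence number already written by the backup engine
    """
    return max_written_addr < cmd_addr < max_read_addr
```

For example, with data read up to address 8 and written up to address 3, a command to address 5 is blocked, while commands to address 2 (already moved) or address 9 (not yet read) are not.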
During the implementation of the present invention, the inventor finds that the prior art has at least the following problem: in the current technical solution, a conflicting system command is blocked directly, which degrades the read and write performance of the system.