Computer devices are typically provided with extensive memory systems, which increases the probability of memory errors. Memory errors may occur in the computer memory during a store and/or retrieve operation, causing incorrect data, or even no data at all, to be returned. These memory errors may be “soft” or “hard” errors. Soft memory errors occur when data randomly changes state. Hard memory errors occur when one or more of the computer memory chips fail. Both types of memory errors are undesirable, and may even be unacceptable, particularly in high-end computer systems such as network servers.
Error Correction Code (ECC) operations may be implemented to handle soft memory errors. ECC algorithms typically store check bits along with the data bits in memory. When the data is retrieved from memory, the ECC algorithm evaluates the check bits and data bits to automatically detect and correct errors in the data. Although the ECC algorithm may detect multiple-bit errors, it can correct only single-bit errors.
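The check-bit scheme described above can be illustrated with a minimal sketch. The text does not name a particular code, so the example below assumes an extended Hamming (8,4) code, one common single-error-correcting, double-error-detecting arrangement; real memory ECC protects much wider words. The function names `encode` and `decode` are hypothetical and chosen only for illustration.

```python
def encode(nibble: int) -> int:
    """Encode 4 data bits into an 8-bit codeword (assumed extended Hamming code)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    bits = [0] * 8  # index 0: overall parity; indices 1..7: Hamming positions
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # check bit over positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # check bit over positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # check bit over positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) % 2            # makes the total number of 1s even
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int):
    """Return (data, status); data is None when the error cannot be corrected."""
    bits = [(word >> i) & 1 for i in range(8)]
    # The syndrome is the XOR of the positions of all set bits in 1..7.
    # It is zero for a valid codeword and equals the position of a single flip.
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos
    overall = sum(bits) % 2  # 1 means an odd number of bits flipped
    if overall == 1:
        # Single-bit error: flip the indicated bit back
        # (a syndrome of 0 points at the overall parity bit itself).
        bits[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even number of flips with a nonzero syndrome: detected, not correctable.
        return None, "uncorrectable"
    else:
        status = "ok"
    data = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return data, status
```

In this sketch, flipping any one bit of `encode(n)` still decodes to `n` with status `"corrected"`, while flipping any two bits is reported as `"uncorrectable"`, which mirrors the distinction between detecting multiple-bit errors and correcting single-bit errors.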
Advanced ECC operations (also referred to as “chip correct”) are available to detect and correct multi-bit memory errors. In addition, advanced ECC algorithms report errors (e.g., to the system BIOS or other firmware). The error reports may be used to identify hard memory errors so that one or more failed memory modules can be replaced.
However, conventional ECC algorithms implement a word size that must be less than or equal to the smallest critical word length. Accordingly, these ECC algorithms do not provide the desired level of data correction and/or fault isolation.