Conventional computer products may include various reliability, availability, and serviceability (RAS) features intended to limit the system impact of, for example, soft and hard errors in a memory subsystem. For example, a memory controller may implement an “Error Correcting Code” (ECC) algorithm, in which additional check bits are stored along with each cache-line fragment so that single-bit errors, and with some codes certain combinations of bit errors, may be corrected in hardware. In addition, the memory controller may use multiple channels to enable memory mirroring. Memory mirroring maintains two or more copies of data in the main memory store. For example, the controller's first channel may be coupled to a memory unit that stores primary data, and the controller's second channel may be coupled to another memory unit that stores redundant data, i.e., a copy of the primary data. Thus, the second memory unit “mirrors” the primary data included in the first memory unit. Nevertheless, even memory systems that employ techniques such as memory mirroring have shortcomings.
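The ECC principle mentioned above, in which stored check bits allow a bit error to be located and corrected, may be illustrated with a minimal sketch. The sketch assumes a textbook Hamming(7,4) code, chosen only for brevity; it is not the code used by any particular memory controller, and the function names `hamming74_encode` and `hamming74_decode` are hypothetical.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits as a 7-bit codeword (list index 0 = position 1)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    # Each check bit covers the positions whose 1-based index has that bit set.
    p1 = d[0] ^ d[1] ^ d[3]  # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]  # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]  # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(codeword):
    """Return (nibble, error_position); error_position is 0 if no error was seen."""
    b = list(codeword)
    s1 = b[0] ^ b[2] ^ b[4] ^ b[6]
    s2 = b[1] ^ b[2] ^ b[5] ^ b[6]
    s4 = b[3] ^ b[4] ^ b[5] ^ b[6]
    # The syndrome is the 1-based position of a single flipped bit.
    position = s1 | (s2 << 1) | (s4 << 2)
    if position:
        b[position - 1] ^= 1  # correct the flipped bit
    nibble = b[2] | (b[4] << 1) | (b[5] << 2) | (b[6] << 3)
    return nibble, position


def flip_bit(codeword, position):
    """Simulate a single-bit memory error at a 1-based position."""
    corrupted = list(codeword)
    corrupted[position - 1] ^= 1
    return corrupted
```

For example, `hamming74_decode(flip_bit(hamming74_encode(0b1011), 5))` recovers the original data and reports position 5 as the corrected bit. A code this small corrects only one flipped bit per codeword; two flipped bits would be miscorrected, which is one reason uncorrectable errors remain possible in practice.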
For example, with a three-channel memory controller configured for memory mirroring, the first two channels may be used for memory mirroring (i.e., a primary channel for primary data and a mirror channel for redundant data) while the third channel is not utilized. As a result, data redundancy may still be lost even though the memory is mirrored across the first two channels. For instance, persistent uncorrectable errors may exist in one of the mirrored memory units or modules. When this “redundancy loss” occurs, the memory may no longer have mirroring protection. Consequently, the system may be shut down if another uncorrectable error occurs. Furthermore, even if the system can re-enable memory mirroring using a memory coupled to the first or second channel, the system may suffer another redundancy loss due to the same failing memory unit, which could lead to a system shutdown.
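The redundancy-loss scenario described above may be sketched as a small state model. This is a simplified illustration under stated assumptions rather than a description of any actual controller: the class `ThreeChannelMirror`, the channel names `ch0`, `ch1`, and `ch2`, and the three states are all hypothetical, and every uncorrectable error is assumed to take the affected mirrored unit out of service.

```python
from enum import Enum


class MirrorState(Enum):
    REDUNDANT = "redundant"
    REDUNDANCY_LOST = "redundancy lost"
    SHUTDOWN = "shutdown"


class ThreeChannelMirror:
    """Toy model: ch0 holds primary data, ch1 mirrors it, ch2 is left unused."""

    MIRRORED = ("ch0", "ch1")
    UNUSED = "ch2"  # present in the system but never used by this scheme

    def __init__(self):
        self.state = MirrorState.REDUNDANT
        self.failed = set()

    def uncorrectable_error(self, channel):
        """Record an uncorrectable error on one of the mirrored channels."""
        if channel not in self.MIRRORED:
            raise ValueError(f"{channel} is not part of the mirror")
        if self.state is MirrorState.SHUTDOWN:
            return self.state
        self.failed.add(channel)
        if len(self.failed) < len(self.MIRRORED):
            # One good copy remains, but mirroring protection is gone.
            self.state = MirrorState.REDUNDANCY_LOST
        else:
            # No good copy remains on either mirrored channel.
            self.state = MirrorState.SHUTDOWN
        return self.state

    def reenable_mirroring(self):
        """Re-enable mirroring on the same two channels.

        The failing unit is not replaced, so it may fail again.
        """
        if self.state is MirrorState.REDUNDANCY_LOST:
            self.failed.clear()
            self.state = MirrorState.REDUNDANT
        return self.state


def demo():
    """Walk through the scenario and return the sequence of states."""
    mirror = ThreeChannelMirror()
    return [
        mirror.uncorrectable_error("ch1"),  # first redundancy loss
        mirror.reenable_mirroring(),        # mirroring restored on ch0/ch1
        mirror.uncorrectable_error("ch1"),  # same unit fails again
        mirror.uncorrectable_error("ch0"),  # surviving copy also fails
    ]
```

In this model the third channel never participates, so re-enabling mirroring can only reuse the unit that already failed, and a subsequent error on the surviving channel ends in the shutdown state.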