Certain types of memory resources have high failure rates compared to most other platform components. For example, DDR (double data rate) memory devices experience higher rates of failure than most other components (such as processors, storage, interface components, and/or others) that are part of a computing platform or server environment. Long-term storage components also experience significant rates of failure. Given that failures of the memory devices cause downtime and require servicing of the system, higher platform RAS (reliability, availability, and serviceability) is preferred.
Traditionally, multiple different sparing techniques are employed to survive hard DRAM (dynamic random access memory) failures or hard errors, which can push out service requirements. A hard error refers to an error with a physical device that prevents it from reading and/or writing correctly, and is distinguished from transient errors, which are intermittent failures. Known techniques for addressing hard failures include SDDC (single device data correction) and DDDC (double device data correction). However, despite techniques for pushing out servicing of a memory subsystem, failure rates remain higher than desired, especially for larger memory configurations.
Descriptions of certain details and implementations follow, including a description of the figures, which may depict some or all of the embodiments described below, as well as discussing other potential embodiments or implementations of the inventive concepts presented herein.