Driven by the inevitable trend towards the cloud, more and more real-time in-memory computing applications are being served by large-scale parallel processing platforms (e.g., Spark and Hadoop). To facilitate the development of large-scale in-memory computing applications, most in-memory computing frameworks (e.g., Spark) rely on immutable objects, such as Resilient Distributed Datasets (RDDs) and DataFrames in Spark. This obviates a large number of potential problems caused by simultaneous updates from multiple threads. Because immutable objects (i.e., data that does not change once written) can be shared safely across processes, they also make fault tolerance and correctness fundamentally easier to achieve.
In current practice, memory controllers (which are typically integrated in the CPU chip) are solely responsible for memory fault tolerance and typically use fine-grained memory error correction. In current mainstream computing systems, memory controllers employ SEC-DED (single error correction, double error detection) coding to protect each 8 bytes of user data with 1 byte of coding redundancy (a 12.5% overhead), primarily to handle DRAM soft errors caused by radiation. As a result, DRAM modules are typically implemented as 72-bit DIMMs to accommodate such ECC configurations. For sub-20 nm DRAM and emerging memory technologies (e.g., 3D XPoint), such a weak ECC scheme could be inadequate, and memory fault tolerance may have to be strengthened at the cost of higher redundancy.
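As a rough illustration of the 8-byte-plus-1-byte layout described above, the following Python sketch implements an extended Hamming (72,64) code, one textbook way to obtain SEC-DED behavior. The bit layout, the function names `encode` and `decode`, and the status strings are assumptions made for illustration only; real memory controllers implement ECC in hardware and may use a different code construction (e.g., Hsiao codes).

```python
# Illustrative extended Hamming (72,64) SEC-DED code.
# Assumed layout (not taken from any specific controller):
#   position 0        -> overall parity bit
#   positions 1,2,4.. -> 7 Hamming parity bits (powers of two)
#   remaining 64      -> data bits

CODE_LEN = 72
PARITY_POSITIONS = [1 << i for i in range(7)]          # 1, 2, 4, ..., 64
DATA_POSITIONS = [p for p in range(1, CODE_LEN)
                  if p not in PARITY_POSITIONS]        # exactly 64 positions


def encode(data):
    """Encode a 64-bit integer into a list of 72 bits."""
    bits = [0] * CODE_LEN
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (data >> i) & 1
    # Each Hamming parity bit covers every position whose index has that bit set.
    for p in PARITY_POSITIONS:
        parity = 0
        for pos in range(1, CODE_LEN):
            if pos & p:
                parity ^= bits[pos]
        bits[p] = parity
    # Overall parity makes the total number of ones even.
    bits[0] = sum(bits[1:]) & 1
    return bits


def decode(bits):
    """Return (data, status); data is None if the word is uncorrectable."""
    bits = list(bits)
    # The syndrome is the XOR of the indices of all set bits.
    syndrome = 0
    for pos in range(1, CODE_LEN):
        if bits[pos]:
            syndrome ^= pos
    overall = sum(bits) & 1

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flipped bits: assume a single error at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        if syndrome >= CODE_LEN:
            return None, "uncorrectable"
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flipped bits with a nonzero syndrome: detect only.
        return None, "double_error"

    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= bits[pos] << i
    return data, status
```

In this sketch, flipping any single bit of an encoded word yields the status `"corrected"` with the original data recovered, while flipping any two bits yields `"double_error"`. The 8 check bits per 64 data bits correspond to the 72-bit DIMM width mentioned above.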