Driven by the inevitable trend towards the cloud, more and more real-time in-memory computing applications are being served by large-scale parallel processing platforms (e.g., Hadoop). As a result, large-scale parallel processing platforms must employ distributed in-memory data storage systems to realize data sharing and exchange among different in-memory computation frameworks and jobs. Distributed in-memory data storage systems form a large-scale distributed cache layer sitting between in-memory computation frameworks/jobs and persistent storage systems (e.g., Amazon S3 and HDFS).
However, in-memory storage is fundamentally subject to two cost issues, and how well one can tackle them largely determines the overall system performance of future large-scale parallel processing platforms.
(1) Memory resource cost: In-memory data storage occupies a large amount of memory capacity. This will become increasingly significant as more and more memory-centric data processing tasks are migrated onto a single large-scale parallel processing platform, and it directly results in contention for memory resources between the application layer and the underlying in-memory data storage layer. In spite of the continued scaling of DRAM beyond the 20 nm node and the maturation of new low-cost memory technologies (e.g., 3D XPoint), the ever-increasing demand for memory capacity will keep memory one of the most expensive resources. Hence, it is highly desirable to minimize the memory capacity (and hence cost) overhead induced by in-memory data storage systems.
(2) Computation cost: Unlike a traditional buffer pool mechanism, in-memory data storage systems hold data in a storage-oriented format (e.g., JSON, Parquet, and ORC) rather than as in-memory objects. Therefore, whenever data moves between the application layer and the in-memory storage layer, data format conversion is required and can incur significant computation cost. In addition, data compression, the most obvious option for reducing the memory capacity overhead of in-memory data storage, inevitably adds further computation cost. This directly results in contention for computation resources between the application layer and the underlying in-memory data storage layer.
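The trade-off between the two costs can be illustrated with a minimal sketch. It assumes JSON as the storage-oriented format and zlib as the compressor; the record layout and the helper names `store` and `load` are hypothetical and are not taken from any particular in-memory storage system.

```python
import json
import zlib


def store(records, compress=True):
    """Convert in-memory objects to a storage-oriented format (JSON here).

    Serialization is the format-conversion cost; compression, if enabled,
    is additional computation spent to reduce memory capacity.
    """
    raw = json.dumps(records).encode("utf-8")
    return zlib.compress(raw) if compress else raw


def load(blob, compressed=True):
    """Reverse path: decompress (if needed) and parse back into objects."""
    raw = zlib.decompress(blob) if compressed else blob
    return json.loads(raw.decode("utf-8"))


# Hypothetical data set used only for illustration.
records = [{"id": i, "name": f"user{i}", "score": i % 10} for i in range(1000)]

plain = store(records, compress=False)
packed = store(records, compress=True)

print(f"uncompressed: {len(plain)} bytes, compressed: {len(packed)} bytes")
```

Every read through `load` pays for decompression and parsing before the application can use the data, which is the computation cost described in (2); the smaller `packed` buffer is the memory saving described in (1).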
In current practice, the memory controller is completely unaware of in-memory data storage and has to apply the same fine-grained memory fault tolerance mechanism, in particular ECC (error correction code), to the entire memory. In current mainstream computing systems, memory controllers employ a SEC-DED (single error correction, double error detection) code that protects each 8 bytes of user data with 1 byte of coding redundancy, primarily to handle DRAM soft errors caused by radiation. As a result, DRAM modules are typically 72-bit DIMMs to accommodate this ECC configuration. For sub-20 nm DRAM and emerging memory technologies (such as 3D XPoint), such a weak ECC could be inadequate, and one may have to increase the memory fault tolerance strength at the cost of higher redundancy.
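The 8-byte/1-byte ratio can be checked with a short calculation. The sketch below assumes the SEC-DED code is built as an extended Hamming code (the text does not name a specific construction), and the function name `secded_check_bits` is hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Minimum number of check bits for an extended Hamming (SEC-DED) code.

    Single-error correction needs r check bits with 2**r >= data_bits + r + 1,
    so that every single-bit error position (plus "no error") has a distinct
    syndrome. One extra overall parity bit adds double-error detection.
    """
    r = 0
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


for k in (8, 16, 32, 64, 128):
    c = secded_check_bits(k)
    print(f"{k:4d} data bits + {c} check bits = {k + c:4d} total, "
          f"overhead {100.0 * c / k:.1f}%")
```

For 64 data bits the calculation gives 8 check bits, that is, a 72-bit codeword with 12.5% redundancy, which matches the 72-bit DIMM width. The overhead per data bit shrinks as the protected word grows, which is why such fine-grained protection is relatively expensive.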