With the development of multi-core and multi-threading technologies, Dynamic Random Access Memory (DRAM) can no longer meet the growing memory demand of applications because of limitations in power consumption and process technology. Emerging Non-Volatile Memories (NVMs), such as Phase Change Memory (PCM), Spin-Transfer Torque Magnetoresistive Random Access Memory (STT-MRAM), and Magnetic Random Access Memory (MRAM), offer byte addressability, read speed comparable to that of DRAM, near-zero standby power consumption, high density (more data stored per chip), and good scalability, and may therefore replace DRAM as the storage medium of main memory. However, compared with DRAM, these new non-volatile memories still have several disadvantages: (1) relatively high read/write latency, with reads roughly two times slower and writes almost five times slower than those of DRAM; (2) high write power consumption; and (3) limited write endurance. It is therefore infeasible to use these emerging non-volatile memories directly as the main memory of a computer. The mainstream approach at present is to combine a large-capacity NVM with a small-capacity DRAM into a heterogeneous memory system. Such a system exploits the large capacity of NVM together with the low access latency, low write power consumption, and high endurance of DRAM, thereby improving the performance, power efficiency, and endurance of the memory system as a whole.
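The latency argument above can be illustrated with a back-of-the-envelope model. The sketch below is not part of the described system and rests on stated assumptions: latencies are normalized so that a DRAM access costs 1.0, the NVM ratios are the rough figures quoted above (reads about 2x, writes about 5x slower), DRAM reads and writes are treated as equally fast, and migration or caching overheads are ignored. The function name `avg_access_latency` is hypothetical.

```python
# Relative latencies, normalized so that a DRAM access costs 1.0.
# NVM ratios follow the rough figures in the text (reads ~2x, writes ~5x slower);
# equal DRAM read/write cost is a simplifying assumption.
DRAM_READ, DRAM_WRITE = 1.0, 1.0
NVM_READ, NVM_WRITE = 2.0, 5.0


def avg_access_latency(dram_fraction: float, write_fraction: float) -> float:
    """Average normalized latency of a hybrid DRAM + NVM memory (toy model).

    dram_fraction: share of accesses served by DRAM (0.0 means NVM only).
    write_fraction: share of accesses that are writes.
    """
    if not (0.0 <= dram_fraction <= 1.0 and 0.0 <= write_fraction <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    dram = (1 - write_fraction) * DRAM_READ + write_fraction * DRAM_WRITE
    nvm = (1 - write_fraction) * NVM_READ + write_fraction * NVM_WRITE
    return dram_fraction * dram + (1 - dram_fraction) * nvm


for h in (0.0, 0.5, 0.9):
    print(f"DRAM serves {h:.0%}: {avg_access_latency(h, write_fraction=0.3):.2f}x")
```

With 30% writes, serving half of the accesses from DRAM lowers the modeled average latency from 2.90x to 1.95x, which is the intuition behind pairing a small DRAM with a large NVM.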
At present, there are two main types of heterogeneous memory architecture: flat and hierarchical.
In a heterogeneous memory system with a flat architecture, NVM and DRAM share a unified physical address space, and both serve as main memory. To improve the power efficiency and performance of the system, this architecture commonly adopts hot page migration as an optimization policy: frequently accessed NVM page frames are migrated to DRAM. A migration operation is generally divided into two sequential steps: (1) copying the contents of the source and target page frames into a buffer; and (2) writing the buffered data to the target addresses. Because both page frames are read out and then written back, one page migration may involve four page copies, and because the reading phase and the writing phase are performed sequentially, the time cost of migration is relatively large. Moreover, if the memory system supports 2 MB or 4 MB superpages to reduce the TLB miss rate, each migration moves an entire superpage rather than a 4 KB page, so the hot page migration mechanism can incur tremendous time and space overhead.
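The two-phase migration described above can be sketched as a page swap through a buffer. This is a toy model under stated assumptions, not the policy of any particular system: memory is a dictionary of byte arrays keyed by hypothetical `(device, frame)` pairs, the hot NVM page is exchanged with a victim DRAM page, and `Memory`, `swap_migrate`, and `bytes_copied` are illustrative names. Counting copied bytes shows the four page copies per migration and how the cost grows from a 4 KB page to a 2 MB superpage.

```python
from dataclasses import dataclass, field

BASE_PAGE = 4 * 1024            # 4 KB base page (assumed)
SUPERPAGE_2M = 2 * 1024 * 1024  # 2 MB superpage


@dataclass
class Memory:
    """Toy flat memory: frames are byte arrays keyed by (device, frame id)."""
    frames: dict = field(default_factory=dict)
    bytes_copied: int = 0

    def read(self, addr):
        data = bytes(self.frames[addr])  # one page copy out of memory
        self.bytes_copied += len(data)
        return data

    def write(self, addr, data):
        self.frames[addr] = bytearray(data)  # one page copy into memory
        self.bytes_copied += len(data)


def swap_migrate(mem, nvm_addr, dram_addr):
    """Swap a hot NVM page with a DRAM page via an intermediate buffer.

    Phase 1 reads both frames into the buffer; phase 2 then writes them to
    the exchanged addresses. The phases run one after the other, giving
    four page copies in total.
    """
    buffer = {nvm_addr: mem.read(nvm_addr), dram_addr: mem.read(dram_addr)}
    mem.write(dram_addr, buffer[nvm_addr])  # hot page now lives in DRAM
    mem.write(nvm_addr, buffer[dram_addr])  # victim page moves to NVM


for size in (BASE_PAGE, SUPERPAGE_2M):
    mem = Memory()
    mem.write(("nvm", 7), b"\x01" * size)
    mem.write(("dram", 0), b"\x02" * size)
    mem.bytes_copied = 0  # count only the migration itself
    swap_migrate(mem, ("nvm", 7), ("dram", 0))
    print(f"{size // 1024} KB page: {mem.bytes_copied // size} page copies, "
          f"{mem.bytes_copied // 1024} KB moved")
```

The number of page copies stays at four, but a 2 MB superpage moves 8192 KB per migration instead of 16 KB, that is, 512 times more data.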
In a heterogeneous memory system with a hierarchical architecture, a high-performance memory such as DRAM is used as a cache for the non-volatile memory. Because accessing the DRAM cache is faster than accessing NVM, a hierarchical heterogeneous memory system can achieve better application performance than a flat one. In a conventional hierarchical heterogeneous memory system, the DRAM cache is managed by hardware and is transparent to the operating system, and it is organized similarly to a conventional on-chip cache. When a last-level cache (LLC) miss occurs, the hardware circuit in the DRAM memory controller performs a tag lookup on the physical page address to determine whether the access hits in the DRAM cache, and only then performs the actual data access. Because this tag lookup precedes the NVM access, a hierarchical heterogeneous memory system has a relatively long access latency on a DRAM cache miss. In addition, a hardware-managed DRAM cache generally uses demand-based data fetching: on a DRAM cache miss, the NVM data block containing the requested data must first be fetched into the DRAM cache and then loaded into the on-chip LLC. In big data environments, many applications have poor temporal and spatial locality, and such a fetching mechanism aggravates cache pollution, because blocks that are fetched but never reused displace useful data from the DRAM cache.
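The serialized tag check and demand fetching described above can be sketched with a toy direct-mapped DRAM cache. The latencies are assumed, illustrative values rather than measurements, the DRAM fill is assumed to lie on the critical path before data reaches the LLC, and `DirectMappedDramCache` and its parameters are hypothetical names. The usage part streams low-locality blocks through the cache and shows them evicting a reused block, which is the cache pollution effect.

```python
# Assumed, illustrative latencies in nanoseconds (not measured values).
TAG_LOOKUP_NS = 10
DRAM_ACCESS_NS = 50
NVM_READ_NS = 100


class DirectMappedDramCache:
    """Toy hardware-managed, direct-mapped DRAM cache in front of NVM.

    Every LLC miss first performs a tag lookup; only then is data accessed.
    On a DRAM-cache miss the NVM block is always fetched into the cache
    (demand fetching) before being forwarded to the LLC.
    """

    def __init__(self, num_sets: int, block_size: int = 4096):
        self.num_sets = num_sets
        self.block_size = block_size
        self.tags = [None] * num_sets
        self.hits = self.misses = 0

    def access(self, phys_addr: int) -> int:
        """Return the modeled latency (ns) of one LLC-miss request."""
        block = phys_addr // self.block_size
        index, tag = block % self.num_sets, block // self.num_sets
        latency = TAG_LOOKUP_NS  # tag check is serialized before data access
        if self.tags[index] == tag:
            self.hits += 1
            return latency + DRAM_ACCESS_NS
        self.misses += 1
        # Demand fetch: read the NVM block, fill the DRAM cache (evicting the
        # previous block in this set), then forward the data to the LLC.
        self.tags[index] = tag
        return latency + NVM_READ_NS + DRAM_ACCESS_NS


cache = DirectMappedDramCache(num_sets=4)
hot_addr = 0
cache.access(hot_addr)           # cold miss brings the hot block in
for i in range(1, 9):            # low-locality scan: each block used once
    cache.access(i * 4096)
print(cache.access(hot_addr))    # the scan evicted the hot block: a miss
print(cache.hits, cache.misses)
```

Under these assumed numbers a hit costs 60 ns, while a miss costs 160 ns, longer than the 100 ns of reading NVM directly, because the tag lookup and the DRAM fill are added to the NVM access.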