The memory system of a computer is one of the key elements affecting system structure and software performance. Over the past decades, as the gap between processor performance and memory performance has widened, the memory system has become the bottleneck that restricts system performance. In recent years, as processors have evolved toward multi-core and many-core structures, the access bandwidth problem of memory systems has become more severe, which restricts further increases in the scale of multi-core processors.
In past years, the main approaches for improving effective access bandwidth have been to raise the bus frequency and to increase the number of data channels, that is, to increase the physical bandwidth. However, the synchronous-bus-based memory access structure has not changed substantially over these years. SDRAM (Synchronous Dynamic Random Access Memory) appeared in the mid-1990s and was subsequently developed into DDR (Double Data Rate) SDRAM, DDR2, and DDR3, and DDR4 is about to be released. SDRAM uses a synchronous interface, and every request must wait a fixed number of clock cycles to obtain a response. Since SDRAM was introduced, the memory bus structure has hardly undergone any essential change. Basically, SDRAM has developed by improving bandwidth through continual increases in interface frequency.
At present, attempts to change the memory structure have been made internationally. For example, the RDRAM and XDR (Extreme Data Rate) technologies of Rambus use a packet-based request/response protocol and transmit data packets over a serial memory bus that is relatively narrow but has a high data rate. In Intel's FB-DIMM (Fully Buffered DIMM), an Advanced Memory Buffer (AMB) is added to the Dual Inline Memory Module (DIMM) so that the FB-DIMM can be connected to the memory controller or to AMBs on neighboring DIMMs through a high-speed serial channel. A similar full-data buffer is used in LRDIMM (Load-Reduced DIMM), DDR4, and other technologies to improve the quality of high-frequency signals. However, these attempts change the memory structure only partially. Specifically, data transmission is converted from the parallel bus format to the packet format, but a synchronous access protocol is still required in terms of timing.
On the one hand, existing synchronous memory systems are designed mainly to ensure that the delay of a single memory access is fixed and low. However, in a multi-core structure, the memory access delay actually consists of two parts: the waiting time in the memory access queue of the processor and the delay on the memory access channel. Clearly, a low delay on the memory access channel alone cannot ensure good overall memory access performance.
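The two-part delay described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that all requests are already pending, that a single channel serves them one at a time in first-in-first-out order, and that each request occupies the channel for the same fixed time. The function name `mean_access_delay_ns` and the 50 ns figure are hypothetical and are not taken from any particular memory system.

```python
def mean_access_delay_ns(pending_requests: int, channel_delay_ns: float) -> float:
    """Mean total delay when requests are served one at a time in FIFO order.

    Assumption (illustrative only): all requests are already queued, and each
    one occupies the channel for exactly `channel_delay_ns`.
    """
    if pending_requests < 1:
        raise ValueError("pending_requests must be at least 1")
    total = 0.0
    for position in range(pending_requests):
        # Part 1: time spent waiting behind earlier requests in the queue.
        queue_wait_ns = position * channel_delay_ns
        # Part 2: the fixed delay on the memory access channel itself.
        total += queue_wait_ns + channel_delay_ns
    return total / pending_requests


if __name__ == "__main__":
    # Hypothetical 50 ns channel delay; the request counts stand in for core counts.
    for cores in (1, 4, 16):
        print(cores, mean_access_delay_ns(cores, 50.0))
```

Under these assumptions the channel delay stays constant while the mean total delay grows with the number of queued requests, which is why a low channel delay by itself may not translate into good overall access performance.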
On the other hand, the data granularity of conventional memory access is fixed and tends to increase. This ensures that more data is transmitted in one transmission period, and the data read each time is basically the length of a CPU cache line. However, in an actual program, the granularity of each data access varies. For applications whose data accesses are irregular and fine-grained, the fixed large granularity of each access inevitably causes waste; for applications that need to read and write large amounts of data, each data access must be divided into a plurality of memory transactions, which increases protocol overhead. Both cases waste memory access bandwidth.
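The two kinds of waste can be made concrete with a rough calculation. The sketch below assumes a 64-byte fixed access granularity (a common cache line length, used here only as an example) and models protocol overhead as a hypothetical fixed number of bytes per transaction, `overhead_bytes`; neither value comes from the description above.

```python
def transactions_needed(requested_bytes: int, line_bytes: int = 64) -> int:
    """Number of fixed-size transactions needed to cover the requested data."""
    if requested_bytes < 1 or line_bytes < 1:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: a partial line still costs a whole transaction.
    return (requested_bytes + line_bytes - 1) // line_bytes


def bandwidth_efficiency(requested_bytes: int, line_bytes: int = 64,
                         overhead_bytes: int = 0) -> float:
    """Fraction of transferred bytes that the program actually asked for.

    `overhead_bytes` is a hypothetical per-transaction protocol cost.
    """
    moved = transactions_needed(requested_bytes, line_bytes) * (line_bytes + overhead_bytes)
    return requested_bytes / moved


if __name__ == "__main__":
    # Fine-grained access: 8 useful bytes still move a whole 64-byte line.
    print(bandwidth_efficiency(8))
    # Bulk access: 4096 bytes split into many transactions, each paying overhead.
    print(transactions_needed(4096), bandwidth_efficiency(4096, overhead_bytes=8))
```

With these illustrative numbers, an 8-byte access uses one eighth of the bytes moved, and a 4096-byte access is split into 64 transactions that each pay the assumed overhead.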