Many computing systems (e.g., servers, workstations, desktops, laptops, etc.) use memory or memory modules that can be inserted or plugged into a socket on a printed circuit board (e.g., motherboard or blade) of the computing system. The memory stores data accessible by a host controller and other components in the computing system, and a processor performs operations on the stored data. A common form of pluggable memory is the Dual In-line Memory Module (DIMM). DIMMs can contain multiple memory chips (e.g., Dynamic Random Access Memory or DRAM chips), each of which has a particular data bus width (e.g., 4 bits or 8 bits). For example, a DIMM may have eight 8-bit DRAM chips, or sixteen 4-bit DRAM chips, arranged in parallel to provide a total 64-bit-wide data bus. Each arrangement of 64 data bits from parallel DRAM chips is called a rank. The memory arrays are further subdivided into addressable groups, rows, and columns. A data bus (e.g., DQ bus) is connected to the host controller to allow writing and reading of datasets to and from the DIMM. A command/address bus (e.g., CA bus) also runs between the host controller and each DIMM. Together, the CA bus and the DQ bus form a system bus. The system bus and the communication protocol between the host controller and the DRAM chips are often standardized (e.g., following the JEDEC DDR4 SDRAM standard) to ensure product interoperability and reduce costs for all stakeholders in the computing business ecosystem.
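The rank arithmetic described above can be sketched as follows. This is an illustrative example only (the function name and parameters are hypothetical, not part of the specification): chips wired in parallel contribute their individual widths to the rank's total data bus width.

```python
# Illustrative sketch (hypothetical helper, not from the specification):
# the data bus width contributed by one rank of parallel DRAM chips.
def rank_width_bits(num_chips: int, chip_width_bits: int) -> int:
    """Chips in a rank are wired in parallel, so their widths sum."""
    return num_chips * chip_width_bits

# Both example arrangements above yield the same 64-bit data bus:
assert rank_width_bits(8, 8) == 64    # eight 8-bit (x8) DRAM chips
assert rank_width_bits(16, 4) == 64   # sixteen 4-bit (x4) DRAM chips
```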
As computing systems increase in performance, the amount of memory and the complexity of the host boards (e.g., motherboards) also increase. This introduces several issues for legacy memory systems and architectures. For example, with an un-buffered DIMM (e.g., UDIMM), the CA bus is connected directly to every DRAM chip on the DIMM. As a result, there is a high electrical load (e.g., capacitive load) on the CA bus that is proportional to the product of the number of DRAM chips per rank and the number of ranks. For the DQ bus, the electrical load is proportional to the number of ranks. Some legacy memory systems have employed buffering devices on or near the DRAM chips to boost the drive of the DQ and CA signals in the presence of high electrical loads. Even with such load-reduced DIMMs (e.g., LRDIMMs), however, legacy memory systems have further limitations in capability and performance.
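The load-scaling relationships described above can be sketched in relative (unit-load) terms. This is a hedged illustration with hypothetical function names; it models only the proportionalities stated in the text, not actual capacitance values.

```python
# Illustrative sketch (hypothetical unit-load model, not from the
# specification): relative electrical load on an un-buffered DIMM's buses.

def ca_bus_load(chips_per_rank: int, ranks: int) -> int:
    # The CA bus fans out to every DRAM chip on the DIMM, so its load
    # scales with (chips per rank) x (ranks).
    return chips_per_rank * ranks

def dq_bus_load(ranks: int) -> int:
    # Each DQ line connects to one chip in each rank, so its load
    # scales with the number of ranks alone.
    return ranks

# A dual-rank UDIMM with eight chips per rank:
assert ca_bus_load(8, 2) == 16
assert dq_bus_load(2) == 2
```

Doubling the rank count doubles both loads, while adding chips per rank burdens only the CA bus, which is why CA signal integrity degrades first as module capacity grows.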
One such limitation is related to the temporal and spatial locality of dataset accesses. Temporal locality is a measure of how soon after a word of data (or other portion of a dataset) is accessed that the same word is accessed or expected to be accessed again. Spatial locality is a measure of how soon after a dataset at a given address location is accessed that another dataset at a nearby address location is expected to be accessed.
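The two locality measures can be made concrete with simple metrics over an address trace. The following sketch uses hypothetical metric names (reuse gap and stride distance) that are not part of the specification; it is one common way to quantify the definitions above, under the assumption that access order stands in for access time.

```python
# Illustrative sketch (hypothetical metrics, not from the specification):
# quantifying temporal and spatial locality over an address access trace.

def reuse_gaps(trace):
    """Temporal locality: number of accesses between successive reuses of
    the same address (smaller gaps imply higher temporal locality)."""
    last_seen, gaps = {}, []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            gaps.append(i - last_seen[addr])
        last_seen[addr] = i
    return gaps

def stride_distances(trace):
    """Spatial locality: address distance between consecutive accesses
    (smaller distances imply higher spatial locality)."""
    return [abs(b - a) for a, b in zip(trace, trace[1:])]

# A trace mixing nearby and far-apart addresses, with two reuses:
trace = [0x100, 0x104, 0x100, 0x9000, 0x104]
assert reuse_gaps(trace) == [2, 3]
assert stride_distances(trace) == [4, 4, 0x8F00, 0x8EFC]
```

A workload such as random key-value lookup produces long reuse gaps and large stride distances, i.e., low locality on both measures.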
In computing applications with low spatial and temporal locality (e.g., random querying or key-value indexing of large datasets), ongoing dataset accesses primarily move data into and out of the DIMMs (e.g., to facilitate data transformation operations performed in a processing element). In such applications, legacy memory systems expend more power moving data than performing the data transformation operations themselves. Moreover, legacy memory systems place high demands on the host controller during these data movement operations.
Techniques are needed to implement a memory system that exhibits reduced power consumption as well as increased performance in such computing applications, without requiring a redesign of the memory system or memory architecture. None of the aforementioned legacy approaches achieves the capabilities of the herein-disclosed techniques. Therefore, there is a need for improvements.