I. Field of the Disclosure
The technology of the disclosure relates generally to computer memory systems, and, in particular, to memory controllers in computer memory systems for providing central processing units (CPUs) with a memory access interface to memory.
II. Background
Microprocessors perform computational tasks for a wide variety of applications. A typical microprocessor application includes one or more central processing units (CPUs) that execute software instructions. The software instructions may instruct a CPU to fetch data from a location in memory, perform one or more CPU operations using the fetched data, and generate a result. The result may then be stored in memory. As non-limiting examples, this memory can be a cache local to the CPU, a shared local cache among CPUs in a CPU block, a shared cache among multiple CPU blocks, or main memory of the microprocessor.
In this regard, FIG. 1 is a schematic diagram of an exemplary system-on-a-chip (SoC) 100 that includes a CPU-based system 102. The CPU-based system 102 includes a plurality of CPU blocks 104(0)-104(N) in this example, wherein ‘N’ may be any number of CPU blocks 104(0)-104(N) desired. In the example of FIG. 1, each of the CPU blocks 104(0)-104(N) contains two (2) CPUs 106(0), 106(1). The CPU blocks 104(0)-104(N) further contain shared Level 2 (L2) caches 108(0)-108(N), respectively. A system cache 110 (e.g., a Level 3 (L3) cache) is also provided for storing cached data that is used by any of, or shared among, the CPU blocks 104(0)-104(N). An internal system bus 112 is provided to enable each of the CPU blocks 104(0)-104(N) to access the system cache 110 as well as other shared resources. Other shared resources accessed by the CPU blocks 104(0)-104(N) through the internal system bus 112 may include a memory controller 114 for accessing a main, external memory (e.g., double data rate (DDR) dynamic random access memory (DRAM), as a non-limiting example), peripherals 116, other storage 118, a peripheral component interconnect express (PCI-e) interface 120, a direct memory access (DMA) controller 122, and/or an integrated memory controller (IMC) 124.
As CPU-based applications executing in the CPU-based system 102 in FIG. 1 increase in complexity and performance, limitations on memory bandwidth may impose a constraint on the CPU-based system 102. If accesses to external memory reach memory bandwidth limits, the memory controller 114 of the CPU-based system 102 may be forced to queue memory access requests. Such queuing of memory access requests may increase the latency of memory accesses, which in turn may decrease the performance of the CPU-based system 102.
Memory bandwidth savings may be realized by employing memory bandwidth compression schemes to potentially reduce the bandwidth consumed by a given memory access. However, the memory architecture underlying a system memory of the CPU-based system 102 may limit reads and writes to the system memory to memory granules having a specified minimum size (referred to herein as “memory read/write granularity”). As a result, conventional memory bandwidth compression schemes limit the size of “bins,” or compressed blocks, to the same size as the memory read/write granularity of the system memory using the memory bandwidth compression scheme. Thus, for example, a memory system having a memory read/write granularity of 64 bytes and a memory line size of 128 bytes may only provide two (2) compressed memory sizes for each memory line: 64 bytes (i.e., 1 bin) or 128 bytes (i.e., 2 bins).
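As a non-limiting illustration, the bin arithmetic described above may be sketched as follows. The function names `available_bin_sizes` and `stored_size` are hypothetical and are used here only for explanation. The sketch assumes that the memory line size is a whole multiple of the memory read/write granularity and that a compressed block is rounded up to the next whole memory granule.

```python
from typing import List


def available_bin_sizes(line_size: int, granularity: int) -> List[int]:
    """Return the compressed sizes (in bytes) a conventional scheme can offer.

    Assumption: each bin is exactly one memory granule, so the only
    possible sizes are whole multiples of the granularity up to the
    memory line size.
    """
    if granularity <= 0 or line_size <= 0 or line_size % granularity != 0:
        raise ValueError("line_size must be a positive multiple of granularity")
    return list(range(granularity, line_size + 1, granularity))


def stored_size(compressed_size: int, line_size: int, granularity: int) -> int:
    """Round a compressed block up to the smallest bin size that holds it."""
    if not 0 < compressed_size <= line_size:
        raise ValueError("compressed_size must be in the range 1..line_size")
    if granularity <= 0 or line_size % granularity != 0:
        raise ValueError("line_size must be a positive multiple of granularity")
    bins_needed = -(-compressed_size // granularity)  # ceiling division
    return bins_needed * granularity


if __name__ == "__main__":
    # 128-byte memory line with a 64-byte memory read/write granularity.
    print(available_bin_sizes(128, 64))
    # A block that compresses to 70 bytes still occupies both bins.
    print(stored_size(70, 128, 64))
```

Under these assumptions, a memory line that compresses to 70 bytes consumes the same two (2) bins as an uncompressed memory line, such that no memory bandwidth is saved for that memory line. Only memory lines that compress to 64 bytes or fewer benefit.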
It is therefore desirable to provide a memory bandwidth compression mechanism that may effectively provide a larger number of bins.