1. Field of the Invention
The invention is related to computing systems and more particularly to spatial locality of memory requests in computing systems.
2. Description of the Related Art
In a typical computing system, a memory system is designed with a goal of low latency experienced by a processor when accessing arbitrary units of data. In general, the memory system design leverages properties known as temporal locality and spatial locality. Temporal locality refers to multiple accesses of specific memory locations within a relatively small time period. Spatial locality refers to accesses of relatively close memory locations within a relatively small time period.
Typically, temporal locality is evaluated in terms of a granularity smaller than that of a next level in a memory hierarchy. For example, a cache captures repeated accesses to blocks (e.g., 64 Bytes (B)), which are smaller than the storage granularity of main memory (e.g., 4 Kilobyte (KB) pages). Spatial locality is typically captured by storing quantities of data slightly larger than a requested quantity in order to reduce memory access latency in the event of sequential access. For example, a cache is designed to store 64B blocks, although a processor requests one to eight Bytes at a time. Meanwhile, the cache requests 64B at a time from a memory, which stores data in contiguous 4 KB pages.
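The granularities in the example above can be illustrated with a short sketch. It is not taken from any particular memory system design: the function names are hypothetical, and the cache is idealized as one that never evicts a block, so the sketch shows only how sequential byte-granularity requests map onto 64B blocks and 4 KB pages.

```python
BLOCK_SIZE = 64    # bytes per cache block (example value from the text)
PAGE_SIZE = 4096   # bytes per memory page (example value from the text)


def block_number(address: int) -> int:
    """Return the index of the 64B block that contains the address."""
    return address // BLOCK_SIZE


def page_number(address: int) -> int:
    """Return the index of the 4 KB page that contains the address."""
    return address // PAGE_SIZE


def count_block_fetches(addresses) -> int:
    """Count block fetches for a request stream.

    Assumption: an idealized cache that never evicts, so a block is
    fetched only the first time any byte in it is requested.
    """
    fetched = set()
    for address in addresses:
        fetched.add(block_number(address))
    return len(fetched)


# Sixteen sequential 8-byte requests covering addresses 0..127.
sequential_requests = list(range(0, 128, 8))
fetches = count_block_fetches(sequential_requests)
blocks_per_page = PAGE_SIZE // BLOCK_SIZE
```

Under these assumptions the sixteen requests require only two block fetches, because each fetched block already holds the next seven 8-byte quantities, and each 4 KB page spans 64 such blocks.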
In general, typical memory system designs capture whatever temporal and spatial locality information can be culled from the memory streams they are servicing, in a strictly ordered and independent manner. For example, a level-two (L2) cache of a memory system having three cache levels only receives memory accesses missed in a level-one (L1) cache. A level-three (L3) cache only receives memory accesses that have already been filtered through both the L1 and the L2 caches. Similarly, a dynamic random access memory (DRAM) only receives memory accesses that have been filtered through the entire cache hierarchy. Accordingly, each level of the memory hierarchy has visibility to only the temporal and spatial locality of memory accesses that have been passed from the previous level(s) of the hierarchy (e.g., cache misses) and only at the granularity of that particular level.
Of particular interest is the filtering of memory accesses to memory by a last-level cache (i.e., a cache level that is closest to the main memory), typically an L3 cache. In a typical memory system, the L3 cache and main memory form a shared memory portion (i.e., shared by all executing threads) and capture global access patterns. However, the memory system typically does not have a mechanism for providing information regarding thread characteristics with respect to page granularity because the L3 cache operates on blocks and filters the memory accesses that reach the DRAM. Meanwhile, the DRAM operates on larger portions of memory, but receives only the accesses filtered by the L3 cache. Information regarding memory usage patterns of memory requests that enter the shared portion of the memory system (e.g., the L3 cache, after L1 and L2 cache filtering) may be used to make macro-level policy adjustments in various applications. Accordingly, techniques that provide information regarding the memory access patterns of an application or thread may be useful to improve performance of memory systems.
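The filtering effect described above can be made concrete with a minimal simulation sketch. Everything in it is an illustrative assumption rather than a description of any actual design: the `LRUCache` class, the `run_hierarchy` function, the tiny capacities, the least-recently-used replacement policy, and the choice to fill every level on a miss are all hypothetical.

```python
from collections import OrderedDict

BLOCK_SIZE = 64  # bytes per cache block (example value from the text)


class LRUCache:
    """Hypothetical fully associative cache with LRU replacement."""

    def __init__(self, num_blocks: int):
        self.num_blocks = num_blocks
        self.blocks = OrderedDict()
        self.seen = []  # block numbers of every request reaching this level

    def access(self, block: int) -> bool:
        """Record the request; return True on a hit, False on a miss."""
        self.seen.append(block)
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        self.blocks[block] = True           # fill on miss
        if len(self.blocks) > self.num_blocks:
            self.blocks.popitem(last=False)  # evict least recently used
        return False


def run_hierarchy(addresses, capacities=(2, 4, 8)):
    """Pass each request down L1, L2, L3 and finally to DRAM.

    A request stops at the first level that hits, so lower levels
    never observe it. Returns (per-level request lists, DRAM list).
    """
    levels = [LRUCache(n) for n in capacities]
    dram_seen = []
    for address in addresses:
        block = address // BLOCK_SIZE
        for level in levels:
            if level.access(block):
                break
        else:
            dram_seen.append(block)  # missed in every cache level
    return [level.seen for level in levels], dram_seen


# Block 0 is reused repeatedly; blocks 1 and 2 are touched once each.
level_views, dram_view = run_hierarchy([0, 8, 64, 0, 128, 0])
```

In this sketch the L1 cache observes all six requests, including the repeated accesses to block 0, whereas the L2 cache, the L3 cache, and the DRAM each observe only the three initial misses. The reuse of block 0 is therefore invisible below the L1 cache, which is the loss of locality information that the passage describes.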