FIG. 1 shows a portion of an architecture for a basic computing system that includes: 1) a processor 101; 2) a cache 102; 3) a memory controller 103; and, 4) a system memory 104. The processor 101 implements software routines by executing instructions that perform various operations on elements of data. The instructions and data elements are stored in the cache 102 and/or system memory 104. When the processor 101 needs a specific instruction or data element it looks to the cache 102 for the desired instruction or data element before requesting it from system memory 104.
Generally, cache 102 is deemed to be “faster” than the system memory 104. Better said, the processor 101 spends less time waiting for an instruction or data element that resides in the cache 102 than for one that resides in the system memory 104. This disparity in waiting time between the cache 102 and system memory 104 typically arises because the cache 102 is implemented with inherently faster memory cells (e.g., SRAM cells) than those with which the system memory 104 is implemented (e.g., DRAM cells).
Per bit of storage space an SRAM type cache 102 is more expensive than a DRAM type system memory 104. The computing system architecture of FIG. 1 therefore attempts to optimize both cost and performance by being designed to store more frequently used instructions and data elements in the cache 102 and less frequently used instructions and data elements in the system memory 104. By storing the more frequently used instructions and data elements in the cache, the processor should endure acceptable “timing penalty hits” in the form of wasted time waiting for instructions/data to be fetched from system memory 104 because a significant percentage of the instructions/data needed by the processor will be found in the cache 102.
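The cost/performance tradeoff described above can be illustrated with a short calculation. The following is only a sketch under stated assumptions: the function name and the waiting times (1 ns for the cache, 100 ns for system memory) are hypothetical and chosen for illustration, and a miss is assumed to cost only the system memory waiting time.

```python
def average_wait(hit_rate, cache_wait, memory_wait):
    """Expected wait per access for a given fraction of cache hits.

    Simplifying assumption: a miss costs memory_wait only (the time spent
    checking the cache first is ignored).
    """
    return hit_rate * cache_wait + (1.0 - hit_rate) * memory_wait


# Hypothetical waiting times in nanoseconds, for illustration only.
CACHE_WAIT_NS = 1.0
MEMORY_WAIT_NS = 100.0

for rate in (0.50, 0.90, 0.99):
    wait = average_wait(rate, CACHE_WAIT_NS, MEMORY_WAIT_NS)
    print(f"hit rate {rate:.2f}: average wait {wait:.2f} ns")
```

Under these assumed numbers the average wait falls steeply as the fraction of cache hits rises, which is why a comparatively small cache can recover most of the performance of a much more expensive all-SRAM memory.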
In order to enhance the percentage of “cache hits” (i.e., the instances where a needed instruction or data element is found in the cache 102), notions of “temporal locality” and “spatial locality” come into play. Temporal locality is the notion that a single instruction or data element is apt to be used soon after it has already been used. Spatial locality is the notion that instructions and data elements that are located near each other in memory (i.e., have similar addresses) tend to be used at about the same time. Temporal locality is accounted for by keeping instructions and data elements in cache 102 for at least some period of time after they are first transferred from system memory 104 into cache 102.
Spatial locality is accounted for by designing the cache 102 to be loaded with a block of data from system memory 104 (i.e., multiple instructions or data elements) whose content is proximate to (e.g., “surrounds”) any single instruction or data element that needs to be fetched from system memory 104. For example, if an instruction at address X is needed from system memory 104, instead of transferring only the needed instruction from system memory 104 to cache 102, a block of content corresponding to a plurality of addresses that are related to address X is transferred from system memory 104 to cache 102.
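The effect of block loading and of retaining recently used content can be sketched with a toy model. Everything in the sketch is an assumption made for illustration: the class name, the 64-byte block size, the four-line capacity, the choice of an aligned block containing address X, and the least-recently-used replacement rule are hypothetical and are not taken from the architecture described above.

```python
from collections import OrderedDict


class ToyCache:
    """Toy model: each miss loads the whole aligned block containing the address."""

    def __init__(self, num_lines=4, block_size=64):
        self.num_lines = num_lines      # assumed number of cache lines
        self.block_size = block_size    # assumed bytes per block
        self.lines = OrderedDict()      # block base address -> True, oldest first
        self.hits = 0
        self.misses = 0

    def block_base(self, address):
        # First address of the aligned block that contains `address`.
        return address - (address % self.block_size)

    def access(self, address):
        """Return True on a cache hit, False on a miss."""
        base = self.block_base(address)
        if base in self.lines:
            self.hits += 1
            self.lines.move_to_end(base)        # mark as most recently used
            return True
        self.misses += 1
        if len(self.lines) >= self.num_lines:
            self.lines.popitem(last=False)      # evict the least recently used block
        self.lines[base] = True                 # load the whole block
        return False


cache = ToyCache()
for address in range(0, 256, 4):    # 64 sequential 4-byte accesses
    cache.access(address)
print(cache.hits, cache.misses)     # prints "60 4"
```

In this model only the first access to each block misses; the remaining accesses to neighboring addresses hit because the surrounding block was loaded along with the requested element. A later access to address 0 also hits, because the block is kept in the cache after its first use.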
FIG. 2 attempts to depict such a situation by showing that a first contiguous “block” of content 105 (which is referenced through multiple system memory addresses) is loaded into a single cache line 107; and, that a second contiguous “block” of content 106 (which is referenced through a different set of multiple system memory addresses) is loaded into another single cache line 108. For simplicity, FIG. 2 shows the cache 204 as a single structure. Various computing systems are designed with different levels of cache, however. For example, many types of computing systems have two levels of caches (a level one (L1) cache and a level two (L2) cache) where the first level cache (L1) corresponds to less processor waiting time than the second level cache (L2). The L1 cache is supposed to store the most frequently used data elements and instructions while the L2 cache is supposed to store data elements and instructions that are used less frequently than those in L1 cache but more frequently than those in system memory.
Traditionally, both cache levels are implemented with a faster memory type as compared to system memory (e.g., both L1 and L2 cache are implemented with SRAM memory cells); however, the L1 cache is integrated onto the same semiconductor die as the processor while the L2 cache is implemented on a different semiconductor die from the processor. As “on chip” cache accesses are faster than “off chip” cache accesses, accesses to the L1 cache correspond to less waiting time for the processor than accesses to the L2 cache.
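The order in which a two-level hierarchy is searched can be sketched as follows. This is a minimal illustration, not a description of any particular design: the waiting times in cycles are hypothetical, each cache is modeled as a simple set of addresses with unlimited capacity, and a fetched element is assumed to be copied into every level that missed.

```python
# Hypothetical waiting times in processor cycles, for illustration only.
L1_WAIT = 1
L2_WAIT = 10
MEMORY_WAIT = 100


def lookup(address, l1, l2):
    """Return (where the element was found, total cycles waited).

    l1 and l2 are sets of addresses currently held by each cache level.
    """
    if address in l1:
        return "L1", L1_WAIT
    if address in l2:
        l1.add(address)                     # assumed: copy into L1 on an L2 hit
        return "L2", L1_WAIT + L2_WAIT
    l2.add(address)                         # assumed: fill both levels on a miss
    l1.add(address)
    return "memory", L1_WAIT + L2_WAIT + MEMORY_WAIT


l1_cache, l2_cache = set(), {0x40}
print(lookup(0x40, l1_cache, l2_cache))     # found in L2
print(lookup(0x40, l1_cache, l2_cache))     # now found in L1
print(lookup(0x80, l1_cache, l2_cache))     # fetched from system memory
```

Each level is consulted only after the faster level above it has missed, so the waiting time grows with every level that has to be searched.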
The memory controller 103 is responsible for taking requests from the processor 101 for data that are not satisfied by the cache 102 and for managing the process of servicing those requests in system memory 104. There may be many different kinds of requests, such as load requests for data that is not present in the cache, and evictions of data from the cache that need to be stored back into memory. Typically, the memory controller is able to pipeline requests, so that many requests may be outstanding at once and can be serviced in parallel, which yields a much shorter average latency per request than servicing them one at a time. The memory controller is responsible for interfacing with the details of a particular memory technology, and isolates the system memory from the processor in a modular fashion. The memory controller may either be integrated with the processor (e.g., on the same die) or may be separate from it (e.g., in a chipset).
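The benefit of pipelining requests can be illustrated with a simple timing model. The model and its numbers are assumptions made for illustration and do not describe the timing of any actual memory controller: each request is assumed to take a fixed latency, and a pipelined controller is assumed to accept a new request at a fixed interval that is shorter than that latency.

```python
def serial_total(num_requests, latency):
    """Total time when each request must finish before the next one starts."""
    return num_requests * latency


def pipelined_total(num_requests, latency, issue_interval):
    """Total time when a new request is issued every issue_interval time units."""
    if num_requests == 0:
        return 0
    # The last request is issued after (num_requests - 1) intervals and then
    # takes one full latency to complete.
    return latency + (num_requests - 1) * issue_interval


# Hypothetical values: 8 requests, 100-cycle latency, one issue every 10 cycles.
n, latency, interval = 8, 100, 10
print(serial_total(n, latency) / n)                 # average cycles per request, serial
print(pipelined_total(n, latency, interval) / n)    # average cycles per request, pipelined
```

Under these assumed values the time for any single request is unchanged, but because the requests overlap, the average time per request drops from 100 cycles to 21.25 cycles.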
The system memory 104 is typically implemented with a specific type of memory technology (e.g., EDO RAM, SDRAM, or DDR).