The field of the invention relates to computing system architecture and, more specifically, to reducing the latency experienced by a processor that seeks information located within system memory.
Processors are used in computing systems and are implemented within a semiconductor chip. Processors execute instructions that typically operate upon data elements in order to implement a software program. The instructions and data elements used by a processor to implement a software program are stored in a memory structure (e.g., an L1 cache, L2 cache and/or system memory) and fetched by the processor prior to their being used. Each instruction and data element has a corresponding address so that it may be obtained from a particular memory structure location. The L1 and L2 caches are typically partitioned so that instructions are within one partition while data elements are in another partition.
FIG. 1 shows a portion 100 of a typical computing system. The system portion 100 of FIG. 1 includes a system bus 106 coupled to a memory interface unit 101 and a bus interface/L2 lookup unit 102. The memory interface unit is coupled to system memory 107. The bus interface/L2 lookup unit 102 is coupled to an L2 cache 104, a pair of instruction fetch queues 104a,b and a pair of data element fetch queues 105a,b. 
When a processor needs an instruction or data element, the L1 cache (not shown in FIG. 1) is checked first. If the desired instruction or data element is not present in the L1 cache, a request is placed in the appropriate outbound queue 104a, 105a (i.e., an instruction fetch request is placed in the outbound instruction fetch queue 104a, or a data element fetch request is placed in the outbound data element fetch queue 105a).
The L2 cache 104 is next checked. That is, the request in the appropriate queue 104a, 105a is effectively forwarded to the bus interface/L2 lookup unit 102. The bus interface/L2 lookup unit 102 searches the L2 cache 104 for the requested information. If the desired instruction or data element is not present in the L2 cache 104, the request is effectively forwarded to the memory interface unit 101 via the system bus 106. This action is commonly referred to as a memory read.
The memory interface unit 101 (e.g., a memory controller) then retrieves (i.e., reads) the desired information from system memory 107. The retrieved information is then sent from the memory interface unit 101 over system bus 106 to the bus interface/L2 lookup unit 102. The bus interface/L2 lookup unit 102 then forwards the retrieved information into the appropriate queue 104b, 105b (i.e., an instruction is placed in the inbound instruction fetch queue 104b or a data element is placed in the inbound data element queue 105b). The processor then uses the retrieved instruction or data element to continue execution of the software program.
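The lookup sequence described above can be illustrated with a minimal software model. This is a sketch only, not the hardware implementation: the names `Request`, `MemoryPath` and `fetch` are hypothetical, each memory level is modeled as a Python dictionary, and the sketch assumes that an L2 hit is also returned through the inbound queue and that no cache fills take place after a miss.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    address: int
    is_instruction: bool  # True: instruction fetch; False: data element fetch


@dataclass
class MemoryPath:
    # Hypothetical stand-ins: dicts mapping address -> stored value.
    l1: dict
    l2: dict             # plays the role of the L2 cache
    system_memory: dict  # plays the role of system memory 107
    # Outbound/inbound queues modeled after 104a,b (instructions) and 105a,b (data).
    instr_out: deque = field(default_factory=deque)
    instr_in: deque = field(default_factory=deque)
    data_out: deque = field(default_factory=deque)
    data_in: deque = field(default_factory=deque)
    memory_reads: int = 0  # counts requests that had to go over the system bus

    def fetch(self, req: Request) -> tuple:
        """Return (value, level that supplied it) for one request."""
        # 1. The L1 cache is checked first.
        if req.address in self.l1:
            return self.l1[req.address], "L1"

        # 2. L1 miss: place the request in the matching outbound queue.
        if req.is_instruction:
            out_q, in_q = self.instr_out, self.instr_in
        else:
            out_q, in_q = self.data_out, self.data_in
        out_q.append(req)

        # 3. The bus interface/L2 lookup unit takes the request and searches L2.
        pending = out_q.popleft()
        if pending.address in self.l2:
            value, source = self.l2[pending.address], "L2"
        else:
            # 4. L2 miss: a memory read through the memory interface unit.
            self.memory_reads += 1
            value, source = self.system_memory[pending.address], "memory"

        # 5. The result is placed in the matching inbound queue, then consumed.
        in_q.append(value)
        return in_q.popleft(), source
```

For example, with `l1={0x10: "add"}`, `l2={0x20: 7}` and a system memory that also holds `0x30: "mul"`, fetching instruction `0x10` is satisfied by L1, data element `0x20` by L2, and instruction `0x30` only after one memory read.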
The various levels of memory structure (e.g., the L1 cache, the L2 cache 104 and main memory 107) reflect a cost-performance trade-off. The L1 and L2 caches are typically implemented with static random access memory (SRAM) cells, while main memory 107 is implemented with dynamic random access memory (DRAM) cells.
DRAM cells are typically slower and cheaper than SRAM cells, which results in greater latency (and reduced system performance) whenever information is read from or written to system memory 107. Also, the memory space of main memory 107 is usually larger than the combined memory space of the L1 and L2 caches.
With this approach, most of the information stored within the computing system is inexpensively stored in main memory 107. The slower speed of main memory 107 (and the corresponding reduction in system performance) is offset by frequent use of the L1 and L2 caches. Because the L1 and L2 caches are typically formed with SRAM cells, they are faster than main memory but more expensive to implement per unit of memory space.
To minimize their implementation cost, the L1 and L2 caches have less combined memory space than main memory 107, as mentioned above. However, to take advantage of their faster speed and thereby maximize their contribution to system performance, they are configured to be accessed more frequently than main memory 107.
Thus, a computing system is designed (e.g., via prediction) with the intention that the instructions or data elements a processor needs at any given instant are more likely to be found in the L1 and L2 caches than in main memory 107.
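The cost-performance balance described above is commonly quantified as an average memory access time. The function below is a minimal sketch under stated assumptions: the name `average_access_time` and all latency and hit-rate figures are hypothetical, latencies may be in any consistent unit (e.g., cycles), and each miss is assumed to add the full latency of the next level, as in the serial L1, L2, main memory lookup described for FIG. 1.

```python
def average_access_time(t_l1, t_l2, t_mem, l1_hit, l2_hit):
    """Average access latency of an L1/L2 cache pair in front of main memory.

    t_l1, t_l2, t_mem: access latency of each level (same unit for all three).
    l1_hit: fraction of all requests satisfied by the L1 cache.
    l2_hit: fraction of L1 misses satisfied by the L2 cache.
    Assumption: lookups are serial, so an L1 miss adds t_l2 and an
    L2 miss additionally adds t_mem.
    """
    l1_miss = 1.0 - l1_hit
    l2_miss = 1.0 - l2_hit
    return t_l1 + l1_miss * (t_l2 + l2_miss * t_mem)
```

With hypothetical latencies of 1, 10 and 100 units, an L1 hit rate of 90% and an L2 hit rate of 80% (of L1 misses), the average comes to about 4 units, far closer to the SRAM latency than to the 100 units of main memory. This is why the smaller caches are configured to be used more often than main memory even though most information resides in main memory.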