1. Technical Field
The present invention relates generally to data processing systems and specifically to load requests of a processor core. Still more particularly, the present invention relates to an improved system and method of handling core load requests in a cache hierarchy.
2. Description of the Related Art
Increasing the efficiency of data operations at the processor-cache level is an important aspect of processor chip development. Modern microprocessors typically include entire storage hierarchies (caches) integrated into a single integrated circuit. For example, one or more processor cores containing L1 instruction and/or data caches are often combined with a shared on-chip L2 cache. A conventional symmetric multiprocessor (SMP) computer system, such as a server computer system, includes multiple processing units all coupled to a system interconnect, which typically comprises one or more address, data and control buses. Coupled to the system interconnect is a system memory, which represents the lowest level of volatile memory in the multiprocessor computer system and which generally is accessible for read and write access by all processing units. In order to reduce access latency to instructions and data residing in the system memory, each processing unit is typically further supported by a respective multi-level cache hierarchy, the lower level(s) of which may be shared by one or more processor cores.
Cache memories are commonly utilized to temporarily buffer memory blocks that might be accessed by a processor, in order to speed up processing by reducing the access latency introduced by having to load needed data and instructions from memory. In some multiprocessor (MP) systems, the cache hierarchy includes at least two levels. The level one (L1), or upper-level, cache is usually a private cache associated with a particular processor core and cannot be accessed by other cores in an MP system. Typically, in response to a memory access instruction such as a load or store instruction, the processor core first accesses the directory of the upper-level cache. If the requested memory block is not found in the upper-level cache, the processor core then accesses lower-level caches (e.g., level two (L2) or level three (L3) caches) for the requested memory block. The lowest level cache (e.g., L3) is often shared among several processor cores.
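The lookup order described above can be illustrated with a minimal sketch. It is not taken from any particular processor design: the `CacheLevel` class, the `load` function, and the dictionary-based storage are hypothetical names and simplifications chosen only to show that upper levels are searched before lower levels and that system memory is consulted last.

```python
from dataclasses import dataclass, field


@dataclass
class CacheLevel:
    """Hypothetical cache level: maps a block address to its data."""
    name: str
    blocks: dict = field(default_factory=dict)

    def lookup(self, addr):
        # Returns the cached data, or None on a miss.
        return self.blocks.get(addr)


def load(addr, hierarchy, memory):
    """Search the levels from upper to lower, then fall back to memory.

    Returns a (source_name, data) pair so the caller can see which
    level satisfied the request.
    """
    for level in hierarchy:
        data = level.lookup(addr)
        if data is not None:
            return level.name, data
    return "memory", memory[addr]


# Usage: a private L1, a shared L2, and backing system memory.
l1 = CacheLevel("L1", {0x40: "a"})
l2 = CacheLevel("L2", {0x80: "b"})
memory = {0x40: "a", 0x80: "b", 0xC0: "c"}
```

In this sketch a request for address `0x80` misses in L1 and is satisfied by L2, while a request for `0xC0` misses in both levels and is satisfied by system memory.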
For a typical processor core that has an associated store-in L2 cache, a sensitive performance balance exists between the scheduling of core load requests and core store requests. For optimal performance, access latency should remain at a minimum for core loads. However, many attempts to dispatch a load request may fail because of resource conflicts in the L2 cache. These resource conflicts may include address collisions, load-hit-store queue collisions, and machine-full collisions.
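The three conflict types named above can be sketched as a single dispatch check. The text does not define them, so the interpretations here are assumptions: an address collision is taken to mean the load targets a block that an in-flight operation is already working on, a load-hit-store queue collision means the load targets a block with a pending store in the store queue, and a machine-full collision means no state machine is free to service the load. The names `Conflict` and `check_dispatch` are hypothetical.

```python
from enum import Enum


class Conflict(Enum):
    NONE = "none"
    ADDRESS = "address collision"
    LOAD_HIT_STORE = "load-hit-store queue collision"
    MACHINE_FULL = "machine-full collision"


def check_dispatch(load_addr, in_flight_addrs, store_queue_addrs, free_machines):
    """Return the first conflict that would block dispatch of a load.

    Assumed semantics (not defined in the text):
      in_flight_addrs   - block addresses of operations already being serviced
      store_queue_addrs - block addresses of stores waiting in the store queue
      free_machines     - number of idle machines able to service a load
    """
    if load_addr in in_flight_addrs:
        return Conflict.ADDRESS
    if load_addr in store_queue_addrs:
        return Conflict.LOAD_HIT_STORE
    if free_machines == 0:
        return Conflict.MACHINE_FULL
    return Conflict.NONE
```

A load that returns anything other than `Conflict.NONE` corresponds to a failed dispatch attempt in the sense used above; the order in which the conflicts are tested is arbitrary in this sketch.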
Load requests have higher priority than store dispatch requests because the requested data of a load request is critical for processing in the core. Store request operations are less critical because store requests only update the memory hierarchy with computational results. In conventional systems, however, lookup bandwidth and internal datapaths may be substantially consumed by issued requests that cannot yet be dispatched, leading to decreased processing efficiency.
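The bandwidth cost of issuing requests that cannot yet be dispatched can be illustrated with a toy model. This is a deliberately simplified sketch, not a description of any real arbiter: it assumes one lookup slot per cycle, a single pending load that always wins arbitration over stores, and a load that stays blocked until a given cycle. The function name `arbitrate` and its parameters are hypothetical.

```python
def arbitrate(cycles, load_blocked_until, pending_stores):
    """Toy load-over-store arbiter with one lookup slot per cycle.

    The pending load is issued every cycle until it dispatches. While
    cycle < load_blocked_until the attempt fails and the slot is wasted.
    Stores only receive a slot once the load has dispatched.

    Returns (wasted_slots, load_dispatch_cycle, stores_dispatched).
    """
    wasted = 0
    stores_done = 0
    load_done_at = None
    for cycle in range(cycles):
        if load_done_at is None:
            if cycle < load_blocked_until:
                wasted += 1            # failed attempt consumes the slot
            else:
                load_done_at = cycle   # load finally dispatches
        elif stores_done < pending_stores:
            stores_done += 1           # slot goes to a waiting store
    return wasted, load_done_at, stores_done
```

With six cycles, a load blocked until cycle 3, and four waiting stores, `arbitrate(6, 3, 4)` returns `(3, 3, 2)`: three slots are spent on failed load attempts and only two stores dispatch. With an unblocked load, `arbitrate(6, 0, 4)` returns `(0, 0, 4)` and all four stores complete in the same window.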