Generally, a microprocessor operates much faster than main memory can supply data to it. Therefore, many computer systems temporarily store recently and frequently used data in smaller, but much faster, cache memory. Cache memory may reside directly on the microprocessor chip (Level 1 cache) or may be external to the microprocessor (Level 2 cache). In the past, on-chip cache memory was relatively small, 8 or 16 kilobytes (KB); however, more recent microprocessor designs have on-chip cache memories of 256 and even 512 KB.
Referring to FIG. 1, a typical computer system includes a microprocessor (10) having, among other things, a CPU (12), a load/store unit (14), and an on-board cache memory (16). The microprocessor (10) is connected to an external cache memory (17) and a main memory (18), both of which hold data and program instructions to be executed by the microprocessor (10). Internally, the execution of program instructions is carried out by the CPU (12). Data needed by the CPU (12) to carry out an instruction are fetched by the load/store unit (14) and loaded into internal registers (15) of the CPU (12). A memory queue (not shown) maintains a list of outstanding memory requests. The load/store unit adds requests to the memory queue and also loads registers with values from the memory queue. The list of outstanding memory requests held in the memory queue is referred to as a memory transaction. The memory transaction is released, i.e., guaranteed to be completed, along with other instructions. Associating the starting and releasing of memory transactions with instructions helps a compiler manage the memory queue.
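The interaction between the load/store unit and the memory queue described above can be sketched as follows. This is a minimal illustrative model, not part of the described system; the class and method names, and the address-to-value memory mapping, are assumptions made for the example.

```python
from collections import deque

class LoadStoreUnit:
    """Toy model of a load/store unit with a memory queue.

    Outstanding requests accumulate in the queue (together forming a
    memory transaction); releasing the transaction drains the queue
    and loads the requested values into the CPU's registers.
    """

    def __init__(self, memory):
        self.memory = memory          # address -> value
        self.queue = deque()          # outstanding memory requests
        self.registers = {}           # register name -> value

    def request(self, reg, addr):
        # Add an outstanding memory request to the memory queue.
        self.queue.append((reg, addr))

    def release(self):
        # Guarantee completion of the transaction: satisfy every
        # outstanding request and load the target registers.
        while self.queue:
            reg, addr = self.queue.popleft()
            self.registers[reg] = self.memory[addr]

lsu = LoadStoreUnit({0x10: 42, 0x18: 7})
lsu.request("r1", 0x10)
lsu.request("r2", 0x18)
lsu.release()
```

After the release, both requested values reside in the registers, mirroring how a compiler pairs the start and release of a transaction with instructions.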
Upon command from the CPU (12), the load/store unit (14) searches for the data first in the fast on-board cache memory (16), then in the external cache memory (17), and finally in the slow main memory (18). Finding the data in the cache memory is referred to as a “hit.” Not finding the data in the cache memory is referred to as a “miss.”
The time between when a CPU requests data and when the data is retrieved and available for use by the CPU is termed the “latency” of the system. If requested data is found in cache memory, i.e., a data hit occurs, the requested data can be accessed at the speed of the cache and the latency of the system is reduced. If, on the other hand, the data is not found in cache, i.e., a data miss occurs, the data must be retrieved from main memory for access and the latency of the system is increased.
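The hierarchical search and its effect on latency can be sketched as a toy lookup that tries each level in order. The cycle counts below are arbitrary example values chosen only to show that a hit in a faster level yields a lower latency; they are not figures from the described system.

```python
# Arbitrary example latencies, in cycles, for each level of the hierarchy.
LATENCY = {"L1": 2, "L2": 10, "main": 100}

def load(addr, l1, l2, main):
    """Search the on-board cache, then external cache, then main memory.

    Returns the value and the latency of the access. Finding the data
    in a cache level is a hit; otherwise the search falls through to
    the next, slower level.
    """
    for level, store in (("L1", l1), ("L2", l2), ("main", main)):
        if addr in store:
            return store[addr], LATENCY[level]
    raise KeyError(addr)

# A miss in L1 that hits in L2 pays the L2 latency rather than the
# much larger main-memory latency.
value, cycles = load(0x20, l1={}, l2={0x20: 5}, main={0x20: 5})
```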
In pursuit of increased efficiency through reduced latency and an increased hit-to-miss ratio for the cache memory, prefetch operations have been implemented in many computer systems. Prefetch operations retrieve the data associated with a memory operation before the memory operation occurs, so that when the memory operation does occur, the data is already present in the cache memory. It is important to schedule prefetch operations at optimal points in an instruction line and to prefetch only data that is likely to be referenced.
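The benefit of prefetching can be illustrated with a small simulation. This is a sketch under assumed conditions (a fully associative cache with no capacity limit, and a hypothetical `prefetch_distance` parameter controlling how far ahead of the access stream data is fetched); it is not the scheduling method of the invention itself.

```python
def run(accesses, cache, main, prefetch_distance=0):
    """Count hits and misses over an access stream; optionally
    prefetch the address that will be needed `prefetch_distance`
    accesses ahead, so the data is already cached when the memory
    operation occurs."""
    hits = misses = 0
    for i, addr in enumerate(accesses):
        if addr in cache:
            hits += 1
        else:
            misses += 1
            cache[addr] = main[addr]
        # Schedule a prefetch ahead of the actual memory operation.
        if prefetch_distance and i + prefetch_distance < len(accesses):
            future = accesses[i + prefetch_distance]
            cache.setdefault(future, main[future])
    return hits, misses

main_mem = {a: a for a in range(8)}
cold = run(list(range(8)), {}, main_mem)                       # no prefetch
warm = run(list(range(8)), {}, main_mem, prefetch_distance=1)  # prefetch one ahead
```

Without prefetching every access to a cold cache misses; with a one-ahead prefetch only the first access misses, the situation the text describes as reducing latency by raising the hit-to-miss ratio.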
In general, in accordance with an embodiment of the present invention, a method for cache line optimization of programs with irregular access patterns comprises selecting references for optimization, identifying cache lines, mapping the selected references to the identified cache lines, determining dependencies within the cache lines, and scheduling the cache lines based on the determined dependencies, with the goal of increasing the number of outstanding cache line misses at all times.
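The mapping and scheduling steps above can be sketched as follows. The 64-byte line size, the reference names, and the use of a simple topological order over line-level dependencies are assumptions for illustration; the embodiment itself does not prescribe these particulars.

```python
from graphlib import TopologicalSorter

LINE_SIZE = 64  # assumed cache line size in bytes

def cache_line(addr):
    # Map a reference's address to the cache line that holds it.
    return addr // LINE_SIZE

def schedule(references, deps):
    """references: mapping of reference name -> address.
    deps: mapping of reference name -> set of references it depends on.

    Maps the selected references to cache lines, lifts the
    reference-level dependencies to line-level dependencies, and
    returns the lines in an order that respects those dependencies,
    so that misses on independent lines can be outstanding at the
    same time.
    """
    line_of = {ref: cache_line(addr) for ref, addr in references.items()}
    line_deps = {}
    for ref, ref_deps in deps.items():
        line = line_of[ref]
        line_deps.setdefault(line, set())
        for dep in ref_deps:
            if line_of[dep] != line:
                line_deps[line].add(line_of[dep])
    return list(TopologicalSorter(line_deps).static_order())

refs = {"a": 0x000, "b": 0x040, "c": 0x080}
order = schedule(refs, {"a": set(), "b": {"a"}, "c": {"a"}})
```

Here references "b" and "c" both depend on "a", so the line holding "a" is scheduled first; the lines for "b" and "c" are mutually independent and their misses can overlap.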
In general, in accordance with an embodiment of the present invention, a method of cache line optimization comprises a cache line scheduling step, and an instruction line scheduling step based on the cache line scheduling step.
In general, in accordance with an embodiment of the present invention, a software tool for cache line optimization comprises a program stored on computer-readable media for selecting references for optimization, identifying cache lines, mapping the selected references to the identified cache lines, determining dependencies within the cache lines, and scheduling the cache lines based on the determined dependencies.
In general, in accordance with an embodiment of the present invention, a software tool for cache line optimization comprises a program stored on computer-readable media for scheduling a cache line, and scheduling an instruction line based on the cache line scheduling.
In general, in accordance with an embodiment of the present invention, an apparatus for cache line optimization comprises cache line scheduling means, and instruction line scheduling means, wherein the instruction line scheduling means schedules instructions based on the cache line scheduling means.
Other advantages and features will become apparent from the following description, including the figures and the claims.