1. Field of the Invention
The present invention relates to software control of memory access by a processing unit and, more particularly, to software-controlled prefetching that effectively hides inherent memory access latency.
2. Background Art
Computer systems typically access data and/or program information from memory in a manner that exploits the principles of temporal and spatial locality. Spatial locality, or locality in space, relates to the likelihood that, once a given entry is referenced, nearby entries will tend to be referenced in the near future. Temporal locality, or locality in time, relates to the likelihood that, once an entry is referenced, it will tend to be referenced again in the near future. To take advantage of these principles of locality, computer systems typically employ a hierarchical memory structure. This structure includes, in addition to the larger but slower main memory, cache memory that is relatively small, fast, and local to the processor. Some systems may include two or more levels of cache memory. The L1 cache, or first level of cache memory, is usually integrated within the central processing unit (CPU) chip itself. The L2 cache, or second level of cache memory, may be located on the CPU chip or on a separate integrated circuit chip, for example. Thus, in order to take advantage of the principles of locality, it is desirable to have the sought data in the cache, preferably the L1 on-chip cache, by the time the CPU makes its request for the entry.
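By way of illustration only, the effect of spatial locality may be sketched in C as follows. The sketch assumes a hypothetical 64-by-64 integer grid; the names fill_grid, sum_row_major, and sum_col_major are illustrative and do not appear in the foregoing description. Both traversals compute the same sum, but the row-major traversal references adjacent addresses on consecutive accesses, whereas the column-major traversal references addresses that are a full row apart and may therefore make poorer use of each cache line that is fetched.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 64
#define COLS 64

/* Fill the grid with a known pattern: grid[r][c] = r + c. */
static void fill_grid(int grid[ROWS][COLS]) {
    assert(grid != NULL);
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            grid[r][c] = (int)(r + c);
}

/* Row-major walk: consecutive accesses touch adjacent addresses,
 * so neighbouring entries share a cache line (spatial locality). */
static long sum_row_major(int grid[ROWS][COLS]) {
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-major walk: consecutive accesses are COLS * sizeof(int)
 * bytes apart, so each access may land on a different cache line. */
static long sum_col_major(int grid[ROWS][COLS]) {
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

Both functions return 258048 for the pattern written by fill_grid. Any difference in running time between them depends on the particular cache organization and is not asserted here.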
When a memory access is requested, the system first checks the L1 on-chip cache, then the L2 cache (if present), and then the main memory. While the cache levels are typically implemented in static random access memory (SRAM), the main memory is typically implemented in dynamic random access memory (DRAM). The DRAM cost per byte is substantially lower than the SRAM cost per byte and, as such, DRAM is the preferred choice for larger main memory systems. However, the DRAM access time is much longer than the cache memory access time. This results from the physical nature of the basic storage element, which is a capacitor, as well as from the memory chip density and the overall main memory density. Given these constraints, a system that arranges for the sought data to be located in the local cache memory by the time the CPU requires it is capable of higher performance than a system that performs no such explicit manipulation.
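The lookup order described above may be summarized, as a rough sketch only, by the conventional average access time model for a two-level cache hierarchy. The function name average_access_time and its parameter names are hypothetical, and the latencies and miss rates used below are assumed illustrative values, not measurements taken from any particular system.

```c
#include <assert.h>

/* Average access time for an L1 -> L2 -> main memory lookup order.
 * All times are in the same unit (for example, CPU cycles).
 * l1_miss_rate: fraction of accesses that miss in the L1 cache.
 * l2_miss_rate: fraction of L1 misses that also miss in the L2 cache.
 * Names and model are illustrative assumptions. */
static double average_access_time(double l1_time, double l2_time,
                                  double mem_time,
                                  double l1_miss_rate,
                                  double l2_miss_rate) {
    assert(l1_miss_rate >= 0.0 && l1_miss_rate <= 1.0);
    assert(l2_miss_rate >= 0.0 && l2_miss_rate <= 1.0);
    /* Every access pays the L1 time; only misses pay for lower levels. */
    return l1_time + l1_miss_rate * (l2_time + l2_miss_rate * mem_time);
}
```

With assumed latencies of 1, 10, and 100 units and assumed miss rates of 0.125 and 0.5, the model yields 8.5 units per access. If the sought data is always present in the L1 cache (an L1 miss rate of 0), the result falls to 1.0 unit, which is the L1 access time alone.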
A method and apparatus alter code to effectively hide main memory latency by using software prefetching with non-faulting loads. Data is prefetched from main memory into local cache memory at some point prior to the time when the data is requested by the CPU during code execution. The CPU then retrieves its requested data from the local cache instead of directly experiencing the main memory latency. The non-faulting loads provide safety and greater flexibility in executing the prefetch operation earlier because they alleviate the concern of incurring a segmentation fault, particularly when dealing with linked data structures, where a pointer followed ahead of time may be null or otherwise invalid. Accordingly, the memory access latency that the CPU sees is essentially the cache memory access latency. Since this latency is much less than the memory latency resulting from a cache miss, the overall system performance is improved.
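A minimal sketch of software prefetching over a linked data structure is given below, under stated assumptions. It does not reproduce the non-faulting load of the described method. Instead, it substitutes the GCC/Clang prefetch hint __builtin_prefetch, which likewise does not generate a fault when the prefetched address is invalid. The names struct node, link_nodes, sum_list_prefetch, and PREFETCH_READ are hypothetical. On compilers without the builtin, the macro reduces to a no-op.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: GCC or Clang. Elsewhere the hint is dropped entirely. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH_READ(addr) ((void)(addr))
#endif

/* Hypothetical singly linked list node. */
struct node {
    long value;
    struct node *next;
};

/* Chain an array of nodes into a list holding the values 1..count. */
static void link_nodes(struct node *nodes, size_t count) {
    assert(nodes != NULL && count > 0);
    for (size_t i = 0; i < count; i++) {
        nodes[i].value = (long)(i + 1);
        nodes[i].next = (i + 1 < count) ? &nodes[i + 1] : NULL;
    }
}

/* Sum the list while requesting the next node ahead of its use. */
static long sum_list_prefetch(const struct node *head) {
    long total = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        /* n->next may be NULL at the tail; the hint does not fault on
         * an invalid address, so no guard is needed before issuing it. */
        PREFETCH_READ(n->next);
        total += n->value;
    }
    return total;
}
```

For a four-node list built by link_nodes, sum_list_prefetch returns 10. The prefetch changes only when the next node is brought toward the cache, not the computed result. Whether it hides a useful amount of latency depends on how much work is performed per node, which this sketch keeps deliberately small.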