One of the key goals in the design of computers is to reduce the latency involved in memory access. Generally, central processing unit (CPU) speeds have increased faster than memory access times have decreased, thereby exacerbating the problem. Thus a memory access operation may require multiple CPU cycles, and the processor may be stalled while waiting for the data needed to execute the next instruction. Unless steps are taken to reduce memory latency and its effects, the benefits of high-speed processors are not achieved.
In addition to design improvements that reduce memory latency per se, computer architectures typically have features that limit the effects of memory latency. One common approach to reducing the effects of memory latency is to utilize a cache memory. The cache is a relatively small, low-latency memory that contains data required by instructions currently being executed. When a load instruction is executed and the required data is not already in the cache, main memory is accessed and a block of data containing the required data word is placed in the cache. Typically that block of data remains in the cache until it is replaced by another block that needs the space. On subsequent accesses to the same data block, the data is read from the cache with low latency. The success of the cache depends on the fact that computer programs typically require multiple accesses to the same data block within a short time and on the fact that the cache has substantially lower latency than the main memory. The performance of caches may be optimized with respect to capacity, replacement algorithms and the like. Both data caches and instruction caches have been utilized.
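The cache behavior described above can be illustrated with a minimal sketch. It assumes a fully associative cache with least-recently-used replacement; the block size, capacity, and cycle counts are hypothetical values chosen for illustration, not figures from any particular machine, and the names `LRUCache` and `run` are likewise illustrative.

```python
from collections import OrderedDict

# All constants below are assumptions chosen for illustration only.
BLOCK_SIZE = 64      # bytes per cache block
CACHE_BLOCKS = 4     # cache capacity, in blocks
HIT_CYCLES = 1       # latency of a cache hit
MISS_CYCLES = 100    # latency of a main-memory access


class LRUCache:
    """Hypothetical fully associative cache with LRU replacement."""

    def __init__(self, capacity=CACHE_BLOCKS):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> present, oldest first
        self.hits = 0
        self.misses = 0

    def load(self, address):
        """Return the latency, in cycles, of loading the word at `address`."""
        block = address // BLOCK_SIZE
        if block in self.blocks:
            # Hit: mark the block as most recently used.
            self.blocks.move_to_end(block)
            self.hits += 1
            return HIT_CYCLES
        # Miss: fetch the whole block, evicting the least recently used
        # block if the cache is full.
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)
        self.blocks[block] = True
        return MISS_CYCLES


def run(addresses, cache=None):
    """Return (total cycles, hits, misses) for a sequence of load addresses."""
    if cache is None:
        cache = LRUCache()
    cycles = sum(cache.load(a) for a in addresses)
    return cycles, cache.hits, cache.misses


# Four words in the same 64-byte block: one miss, then three hits.
print(run([0, 8, 16, 24]))  # (103, 3, 1)
```

Under these assumed latencies, the first access pays the full main-memory cost and the three subsequent accesses to the same block are served from the cache, which is the locality effect on which the cache relies.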
Another way to reduce the effects of memory latency is to execute load instructions out of order in the instruction sequence. More particularly, the load instruction is moved earlier in the instruction sequence, so that the accessed data will be available to the execution unit by the time it is needed. As a result, the delay caused by memory latency is reduced or avoided.
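The benefit of moving a load earlier can be sketched with a simple timing model. The model assumes a hypothetical in-order, single-issue machine in which a load takes a fixed number of cycles to return data and independent instructions take one cycle each; the latency value and the function name `total_cycles` are illustrative assumptions.

```python
LOAD_LATENCY = 4  # assumed cycles between issuing a load and data arriving


def total_cycles(schedule, load_latency=LOAD_LATENCY):
    """Return the cycles needed to run `schedule` on the assumed machine.

    Each entry is "load" (issue the load), "use" (consume the loaded
    value), or "op" (independent work). A "use" stalls until the data
    from the most recent load has arrived.
    """
    cycle = 0
    data_ready = None
    for instr in schedule:
        if instr == "load":
            data_ready = cycle + load_latency
        elif instr == "use":
            if data_ready is None:
                raise ValueError("use before load")
            # Stall until the loaded data is available.
            cycle = max(cycle, data_ready)
        cycle += 1  # every instruction occupies one issue cycle
    return cycle


original = ["op", "op", "op", "load", "use"]  # load right before its use
hoisted = ["load", "op", "op", "op", "use"]   # load moved earlier

print(total_cycles(original))  # 8: the use stalls waiting for the load
print(total_cycles(hoisted))   # 5: the latency overlaps independent work
```

In this sketch the hoisted schedule hides the load latency behind the three independent instructions, so the consuming instruction does not stall.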
However, when the load instruction is moved earlier in the instruction sequence, it is likely to be executed speculatively, because the compiler does not know whether one or more subsequent branch instructions will take a path away from the load instruction. Unfortunately, data blocks accessed by a speculative load instruction whose data is never needed will displace data in the cache that may be needed later. A delay may be incurred in reloading the displaced data when it is needed. Thus, loading data into the cache in response to a speculative load instruction may have an adverse effect on performance, even though the speculative data is never used.
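The displacement effect described above can be sketched as follows. The example assumes a tiny two-block cache with least-recently-used replacement and counts only misses on demand (non-speculative) accesses; the capacity, the block names, and the function name `demand_misses` are hypothetical.

```python
from collections import OrderedDict


def demand_misses(accesses, capacity=2):
    """Count demand-access misses in an assumed LRU cache of `capacity` blocks.

    `accesses` is a list of (block, speculative) pairs. Speculative loads
    fill the cache like any other load, but their own misses are not
    counted, since the program never waits on their data.
    """
    cache = OrderedDict()  # block -> present, least recently used first
    misses = 0
    for block, speculative in accesses:
        if block in cache:
            cache.move_to_end(block)
            continue
        if not speculative:
            misses += 1
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the least recently used block
        cache[block] = True
    return misses


# The program repeatedly uses blocks A and B, which fit in the cache.
without_spec = [("A", False), ("B", False), ("A", False), ("B", False)]

# Same program, but a speculative load of block C (never used) runs
# between the two passes and displaces a needed block.
with_spec = [("A", False), ("B", False), ("C", True),
             ("A", False), ("B", False)]

print(demand_misses(without_spec))  # 2: only the initial misses
print(demand_misses(with_spec))     # 4: A and B must both be reloaded
```

In this sketch a single unused speculative load doubles the number of demand misses, because evicting one needed block starts a chain of further evictions in the small cache.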
Accordingly, there is a need for improved computer apparatus and methods of operation wherein the adverse effects of memory latency are reduced.