1. Field of the Invention
The present invention relates to the field of computers. More specifically, the present invention relates to value prediction.
2. Description of the Related Art
With advances in microprocessor technology, processor clock speeds have become significantly faster than memory system speeds, which makes memory accesses more costly in terms of processor cycles. Efficient caching schemes can help reduce memory accesses, but typical cache miss penalties are on the order of hundreds of cycles. When a load misses the cache, the processor waits idly for the missing load to return. Speculative execution aims to use these idle processor cycles to do useful work, such as prefetching for memory accesses that are also expected to miss in the near future, thereby reducing the overall number of cache misses.
Run-ahead scouting is a speculative execution scheme in which the processor executes speculative code while waiting for a cache miss to complete. In the typical scheme, the processor runs ahead and executes code past the missing load, and in doing so encounters further missing loads whose memory accesses can be overlapped with the outstanding miss. This improves memory-level parallelism (MLP), and hence the name run-ahead scouting. Run-ahead scouting can be performed with or without hardware support, and can execute code from the main thread or code from a compiler-generated scout thread.
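By way of illustration only, the run-ahead behavior described above can be sketched as a toy software model. The names `Load`, `run_ahead`, `regs`, `cache`, and `memory` are hypothetical and chosen for this sketch; the model assumes that each load reads its address from a single base register and that a skipped load leaves its destination register unknown. It is not a description of any particular processor.

```python
from dataclasses import dataclass


@dataclass
class Load:
    dest: str  # register written by the load
    base: str  # register holding the load address


def run_ahead(loads, regs, cache, memory):
    """Scout past missing loads and return the addresses prefetched."""
    regs = dict(regs)   # the scout works on a private copy of the registers
    unknown = set()     # registers that depend on a load that has not returned
    prefetched = []
    for ld in loads:
        if ld.base in unknown:
            # The address cannot be computed, so the load is skipped
            # and its destination becomes unknown as well.
            unknown.add(ld.dest)
            continue
        addr = regs[ld.base]
        if addr in cache:
            regs[ld.dest] = memory[addr]   # hit: the value is available
            unknown.discard(ld.dest)
        else:
            prefetched.append(addr)        # miss: overlap it with earlier misses
            unknown.add(ld.dest)           # the value is not available yet
    return prefetched


memory = {100: 200, 200: 300, 400: 7, 500: 9}
regs = {"r1": 100, "r4": 400, "r5": 500}
loads = [Load("r2", "r1"), Load("r3", "r2"), Load("r6", "r4"), Load("r7", "r5")]
print(run_ahead(loads, regs, set(), memory))  # [100, 400, 500]
```

In this sketch the three independent misses are issued back to back, whereas the load through `r2` is skipped because its address depends on a load that has not yet returned.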
Some recent studies have indicated that 50% of missing loads in commercial applications, such as database applications, are last-value predictable 90% of the time. That is, such a load usually returns the same value that it returned the last time it executed. This property can be exploited, for speculative execution in general and for run-ahead scouting in particular, to speculatively break memory dependencies on the values of missing loads and to execute beyond these missing loads. In conventional run-ahead scouting without value prediction, missing loads are skipped, and consequently prefetch addresses cannot be generated for subsequent missing loads that depend on the skipped loads. With value prediction, the predicted values of missing loads are propagated to subsequent instructions and are often used to generate addresses for subsequent missing loads. Thus, fewer loads are skipped, the average number of missing loads prefetched during speculative execution is increased, and overall performance is significantly improved. However, value prediction in hardware is usually very expensive.
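By way of illustration only, the effect of last-value prediction on a chain of dependent missing loads can be sketched as follows. The class `LastValuePredictor`, the function `scout_chain`, and the table indexed by load program counter are hypothetical names and assumptions introduced for this sketch; the sketch models a chain in which each load's address is the value returned by the previous load, and it does not represent any particular hardware or software predictor.

```python
class LastValuePredictor:
    """Predict that a load returns the value it returned last time."""

    def __init__(self):
        self.table = {}  # load PC -> last value observed for that load

    def predict(self, pc):
        return self.table.get(pc)  # None when the load has no history

    def update(self, pc, value):
        self.table[pc] = value


def scout_chain(chain, start_addr, cache, predictor=None):
    """Run ahead over dependent loads; return the addresses prefetched.

    chain: load PCs; each load's address is the previous load's value.
    cache: dict mapping cached addresses to their values.
    """
    prefetched = []
    addr = start_addr
    for pc in chain:
        if addr is None:
            break                     # address unknown: remaining loads are skipped
        if addr in cache:
            addr = cache[addr]        # hit: the real value feeds the next load
            continue
        prefetched.append(addr)       # miss: issue a prefetch for this address
        # Without a predictor the missing value is unknown; with one,
        # the predicted value is propagated to the dependent load.
        addr = predictor.predict(pc) if predictor else None
    return prefetched


chain = [0x10, 0x14, 0x18]
predictor = LastValuePredictor()
for pc, value in zip(chain, [200, 300, 400]):  # values seen on an earlier pass
    predictor.update(pc, value)

print(scout_chain(chain, 100, {}))             # [100]
print(scout_chain(chain, 100, {}, predictor))  # [100, 200, 300]
```

Without prediction only the first miss is prefetched, because every later address depends on a value that has not returned. With prediction, all three addresses are generated, provided the predicted values are correct; a misprediction would merely produce a useless prefetch in this model.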