The performance of current microprocessors is severely limited by finite cache effects for a large fraction of important workloads. Finite cache effects include all contributors to performance degradation that would be eliminated if the first-level cache of a microprocessor were made infinitely large. In many cases, the time a microprocessor spends stalled while waiting for operand data from off-chip storage equals the time it spends executing instructions. This is especially true of workloads that involve database and transaction processing.
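The definition above can be made concrete with a simple cycles-per-instruction (CPI) decomposition: total CPI is the CPI the processor would achieve with an infinitely large first-level cache, plus a finite cache penalty. The following is a minimal sketch under a simplified model in which every miss stalls the processor for the full miss latency with no overlap. The function names and all numeric values are hypothetical and chosen only for illustration; they are not measurements of any workload.

```python
def finite_cache_cpi(misses_per_kilo_instruction, miss_penalty_cycles):
    """Cycles per instruction lost to first-level cache misses.

    Assumes each miss stalls the processor for the full penalty
    (no overlap between misses or with useful execution).
    """
    return misses_per_kilo_instruction * miss_penalty_cycles / 1000


def total_cpi(infinite_cache_cpi, misses_per_kilo_instruction, miss_penalty_cycles):
    """Total CPI = infinite-cache CPI + finite cache penalty."""
    return infinite_cache_cpi + finite_cache_cpi(
        misses_per_kilo_instruction, miss_penalty_cycles
    )


def stall_fraction(infinite_cache_cpi, misses_per_kilo_instruction, miss_penalty_cycles):
    """Fraction of all cycles spent stalled on cache misses."""
    penalty = finite_cache_cpi(misses_per_kilo_instruction, miss_penalty_cycles)
    return penalty / (infinite_cache_cpi + penalty)


if __name__ == "__main__":
    # Hypothetical inputs: infinite-cache CPI of 1.0, 20 misses per
    # 1000 instructions, 50 cycles per miss.
    print(total_cpi(1.0, 20, 50))       # 2.0
    print(stall_fraction(1.0, 20, 50))  # 0.5
```

With these assumed inputs the penalty equals the infinite-cache CPI, so half of all cycles are stall cycles, which corresponds to the case described above where stall time equals execution time.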
Many current microprocessor designs work toward reducing the finite cache penalty. Large caches, multiple levels of caching, high-speed multichip modules, out-of-order execution, and instruction prefetching have been widely used and are thought to be the most successful techniques. Operand prefetching has also been used successfully for certain workloads, with and without conventional out-of-order processing. However, operand prefetching is not particularly effective for database and transaction workloads. Large caches reduce finite cache effects, but further improvements in this area are limited by the cost-performance implications of increased die size or chip count. Current out-of-order execution techniques provide large reductions in finite cache effects, but they come with a penalty in the form of reduced processor clock frequency and increased design complexity. Thus there is a need for an improvement in microprocessor design that can substantially reduce the cost of implementing the out-of-order execution designs that were previously thought preferable.