Microprocessors are used extensively in computers and other electronic devices to process instructions and data values. A basic operation of a microprocessor is reading data values from memory for use during processing. Such reads are known as load operations and are performed in response to received load instructions. A microprocessor reads instructions and data values from memory; for a load operation, the memory location to be read is specified as an address by the load instruction, and the loaded data value can be used in the execution of subsequent instructions. Parallel processing techniques, such as instruction pipelining, increase microprocessor performance by allowing more than one instruction to be in execution at a time. For example, the initial steps of fetching and decoding a subsequent instruction can be started before the prior instruction finishes executing.
One limit on the performance of central processing units (CPUs) in microprocessors is the stall that occurs when data retrieved by a load operation is needed by one or more subsequent pipelined instructions before the load operation has completed. The dependent subsequent instructions must wait for the load operation to complete, which introduces a time delay known as the load-to-use penalty. Even when fast local caches are used in the microprocessor, the load-to-use penalty will typically be non-zero. In a superscalar design, the cost of this penalty is even higher if multiple instructions are stalled waiting for the single result of a load instruction.
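The effect of the load-to-use penalty can be illustrated with a minimal sketch of an in-order, single-issue pipeline. The model below is an assumption made purely for illustration: the names `Instr` and `simulate`, the register names, and the 3-cycle load latency are hypothetical and do not describe any particular microprocessor.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    dest: Optional[str]          # register written, if any
    srcs: Tuple[str, ...] = ()   # registers read
    is_load: bool = False


def simulate(program, load_latency=3):
    """Return (total_cycles, stall_cycles) for an in-order, single-issue model.

    Assumption: a result produced by an instruction issued at cycle t can be
    consumed by an instruction issued at cycle t + latency, where latency is
    1 for ordinary operations and `load_latency` for loads.
    """
    ready = {}    # register -> earliest cycle at which its value is usable
    cycle = 0     # next free issue slot
    stalls = 0
    for ins in program:
        issue = cycle
        for src in ins.srcs:
            # Wait until every source operand has been produced.
            issue = max(issue, ready.get(src, 0))
        stalls += issue - cycle
        latency = load_latency if ins.is_load else 1
        if ins.dest is not None:
            ready[ins.dest] = issue + latency
        cycle = issue + 1
    return cycle, stalls


program = [
    Instr(dest="r1", srcs=("r0",), is_load=True),  # r1 = load [r0]
    Instr(dest="r2", srcs=("r1", "r3")),           # r2 = r1 + r3 (needs the load)
    Instr(dest="r4", srcs=("r2", "r1")),           # r4 = r2 + r1
]

print(simulate(program, load_latency=3))  # (5, 2): two stall cycles
print(simulate(program, load_latency=1))  # (3, 0): no penalty
```

Under these assumptions, the dependent instruction waits two cycles for the load result, and every instruction queued behind it in an in-order pipeline is delayed by the same amount.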
Several approaches have been tried to reduce this penalty in processing time due to pending load instructions. For example, one approach attempts to predict the memory address from which a future decoded load operation will read a value, and executes the load operation speculatively at the predicted address before the actual load address is known. This pre-loading of data values can potentially save time when the values are actually needed by dependent instructions. However, such address prediction still requires the load operation to be completely executed before the speculated data is obtained and made available to subsequent operations, which can introduce a significant delay in the processing of subsequent instructions waiting on the load operation result. In other approaches, the load value itself is predicted based on stored load values retrieved during previous load operations. However, these techniques can often provide inaccurate predicted data values and consume significant system resources, greatly reducing the benefits of the prediction system.
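The two prior approaches described above can be sketched as follows. The text does not specify how the prior-art predictors work internally, so this sketch assumes two simple, commonly described schemes: a stride-based address predictor and a last-value predictor, each indexed by the address of the load instruction (`pc`). All class and function names are hypothetical.

```python
class StrideAddressPredictor:
    """Assumed scheme: predict next load address = last address + last stride."""

    def __init__(self):
        self.table = {}  # pc -> (last_address, stride)

    def predict(self, pc):
        entry = self.table.get(pc)
        if entry is None:
            return None  # no history yet, so no prediction
        last_address, stride = entry
        return last_address + stride

    def update(self, pc, address):
        entry = self.table.get(pc)
        stride = address - entry[0] if entry is not None else 0
        self.table[pc] = (address, stride)


class LastValuePredictor:
    """Assumed scheme: predict that a load returns the value it returned last time."""

    def __init__(self):
        self.table = {}  # pc -> last loaded value

    def predict(self, pc):
        return self.table.get(pc)

    def update(self, pc, value):
        self.table[pc] = value


def accuracy(predictor, pc, observed):
    """Fraction of observations that matched the prediction made beforehand."""
    hits = 0
    for actual in observed:
        if predictor.predict(pc) == actual:
            hits += 1
        predictor.update(pc, actual)  # train on the real outcome
    return hits / len(observed)


# A load that walks an array with a 4-byte stride.
print(accuracy(StrideAddressPredictor(), pc=0x400, observed=[100, 104, 108, 112]))  # 0.5
# A load whose value is stable and then changes.
print(accuracy(LastValuePredictor(), pc=0x400, observed=[7, 7, 7, 9]))  # 0.5
```

Even a correct address prediction only yields an address: the speculative load must still access memory before its data can be used. A value predictor skips that access, but is wrong whenever the value changes, and its table occupies storage for every tracked load.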
Accordingly, what is needed is a method and system that makes the data values of load operations available to dependent operations before the load operations are executed, and that provides more accurate predicted load values using fewer system resources.