1. Field of the Invention
This invention relates to computer processing systems, and particularly to recalling calculated values produced under the technique known as runahead execution for later reuse.
2. Description of Background
A microprocessor having a basic pipeline microarchitecture processes one instruction at a time. The basic dataflow for an instruction follows the steps of: decode, address generation, cache access, register read/cache output, and execute/write back. The stages within a pipeline occur in order, and hence a given stage cannot progress unless the stage ahead of it is progressing. To achieve the highest performance, one instruction will enter and exit the pipeline every cycle. Whenever the pipeline has to be delayed or flushed, latency is added, which in turn negatively impacts the performance with which the microprocessor carries out a task. While there are many complexities that can be added on for performance, the above sets the groundwork for data prediction.
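By way of illustration only, the cost of a delay in such a pipeline can be sketched as follows. The function name total_cycles and its parameters are hypothetical, and the sketch assumes an idealized pipeline in which every stage takes exactly one cycle and one instruction enters per cycle when no stall occurs.

```python
# Stage names taken from the basic dataflow described above.
STAGES = [
    "decode",
    "address generation",
    "cache access",
    "register read/cache output",
    "execute/write back",
]


def total_cycles(num_instructions, stall_cycles=0, depth=len(STAGES)):
    """Cycles needed to run num_instructions through a depth-stage pipeline.

    Assumption: one cycle per stage. The first instruction needs `depth`
    cycles to exit; each later instruction exits one cycle after the
    previous one; every stall cycle delays all instructions behind it.
    """
    if num_instructions == 0:
        return 0
    return depth + (num_instructions - 1) + stall_cycles


print(total_cycles(100))                   # 104: no stalls
print(total_cycles(100, stall_cycles=10))  # 114: each stall cycle adds latency
```

In this idealized model, every stall cycle is added directly to the total execution time, which is why delays and flushes are costly.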
A current trend in microprocessor design has been to increase the number of pipeline stages in a processor. By increasing the number of stages within a pipeline, the amount of logic performed in each stage of the pipeline is reduced. This facilitates higher clock frequencies and most often allows the processor's throughput to increase over a given time frame. Even with increasing pipeline depth, however, bottlenecks remain that inhibit translating higher clock frequencies into higher performance. One such bottleneck is address generation interlock (AGI). AGI occurs when an instruction produces a result at one stage within the pipeline and that result is consumed to compute an address at an earlier stage within the pipeline for a following instruction. This requires the address generation of the consuming instruction to stall until the producing instruction completes storing its value in one of the processor's registers. Traditional approaches to solving this problem have included providing bypass paths in the pipeline to allow use of produced data as early as possible. This approach has its limits, and deepening pipelines increases the number of cycles until the earliest time the data is available in the pipeline. The problem remains of how to remove the remaining stalls created in the pipeline that adversely affect performance.
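The effect of pipeline depth on AGI can be illustrated with a minimal sketch. The function agi_stall_cycles and its parameters are hypothetical names introduced for this example. The sketch assumes one cycle per stage and a full bypass path, so that a produced value is usable in the cycle after the producing stage completes.

```python
def agi_stall_cycles(produce_stage, consume_stage, distance):
    """Cycles a consumer waits before its address generation can start.

    produce_stage: 0-based stage in which the producer computes its result
    consume_stage: 0-based stage in which the consumer needs that result
    distance:      how many instructions after the producer the consumer
                   enters the pipeline (must be at least 1)
    """
    if distance < 1:
        raise ValueError("the consumer must follow the producer")
    # The producer enters the pipeline at cycle 0 and occupies stage s in
    # cycle s, so its result is usable from cycle produce_stage + 1
    # (assumed full bypass).
    earliest_use = produce_stage + 1
    # Without a stall, the consumer would reach consume_stage in this cycle.
    unstalled_arrival = distance + consume_stage
    return max(0, earliest_use - unstalled_arrival)


# Five-stage pipeline: result produced in stage 4, address generated in stage 1.
print(agi_stall_cycles(4, 1, distance=1))  # 3
# A deeper pipeline producing the result in stage 9 stalls longer.
print(agi_stall_cycles(9, 1, distance=1))  # 8
```

Under these assumptions, the stall grows with the distance between the consuming stage and the producing stage, which is consistent with the observation that bypass paths alone cannot remove the stall as pipelines deepen.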
For an in-order pipeline, whenever the pipeline stalls because of a dependency, earlier parts of the pipeline are prevented from progressing past the stalled instruction, because if that were permitted the instructions would become out-of-order. Unless the machine is specifically designed to handle out-of-order execution, data integrity problems will be introduced into the machine. Code is limited in the amount of parallelism, that is, the number of non-dependent items, that can be processed in a given time frame. This lack of full parallelism in turn creates stalls within the pipeline. One such example is instructions which acquire operand data from memory. In the ideal case, the data content will be in a close local memory, ideally the first level of data cache. In cases where the data is not in the local data cache, the data request must go out to the memory hierarchy to acquire the required content. Going to larger and more distant levels of memory requires more time, and this additional time from the dependency creates a long stall in the pipeline of the in-order microprocessor. Under such scenarios, the processor can go into a mode known as runahead. In runahead, the processor is allowed to continue progressing forward even though there is a dependency in the pipeline. Because this forward movement causes the pipeline to get out-of-order, which in turn can create data integrity problems, the architected state of the machine cannot be updated. Allowing the pipeline of the machine to move forward without updating the architected state of the machine has several implications. First, the operations performed under the stall must be repeated when the dependency has been resolved. Second, because the architected state of the machine is not being updated, data integrity issues are prevented from arising.
Third, by allowing the pipeline to run ahead, additional cache misses can be found and the search through the memory hierarchy for the data content of concern can be set in motion earlier. This concept of allowing the pipeline to temporarily move forward under a data cache miss is known as runahead.
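The benefit of discovering additional cache misses under a stall can be sketched with a toy model. The function cycles_for_loads and its parameters are hypothetical and do not describe any particular implementation. The sketch assumes that every instruction is a load taking one cycle on a cache hit, that no load address depends on data still being fetched, and that a fetch started during runahead has completed by the time the load is repeated.

```python
def cycles_for_loads(addresses, miss_latency, runahead):
    """Cycles for an in-order pipeline to execute a sequence of loads.

    addresses:    cache-line addresses touched by successive loads
    miss_latency: stall cycles charged for a data cache miss
    runahead:     if True, the pipeline keeps moving under a miss and
                  starts fetches for later misses that it discovers
    """
    cache = set()  # lines present in (or already on their way to) the cache
    cycles = 0
    for i, addr in enumerate(addresses):
        cycles += 1  # one cycle for the load itself
        if addr in cache:
            continue
        cycles += miss_latency  # the pipeline stalls on the miss
        cache.add(addr)
        if runahead:
            # Under the stall, pre-execute up to miss_latency later
            # instructions. Their results are discarded (the architected
            # state is not updated), but their fetches are set in motion.
            for future in addresses[i + 1 : i + 1 + miss_latency]:
                cache.add(future)
    # With runahead, the loop above repeats the pre-executed loads once
    # the stall ends, and they now hit in the cache.
    return cycles


lines = [0, 64, 128, 192]  # four loads to four distinct cache lines
print(cycles_for_loads(lines, miss_latency=100, runahead=False))  # 404
print(cycles_for_loads(lines, miss_latency=100, runahead=True))   # 104
```

In this model the pre-executed loads are still repeated after the stall, so the saving comes entirely from overlapping the later misses with the first one.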