This invention relates generally to data processing and in particular to techniques for controlling memory access requests within a data processor.
A typical pipelined data processor includes a series of stages beginning with an instruction retrieval (or fetch) stage that retrieves instructions from a memory, and provides them, in the form of an instruction stream, to a subsequent stage within the series of stages for further processing. Typically, the instruction retrieval stage attempts to retrieve and provide as many instructions as possible to maximize processor utilization. That is, the instruction retrieval stage tries to provide the subsequent stage with as many instructions as it can consume.
A typical instruction retrieval stage includes an instruction retrieval circuit and a bus interface circuit. Generally, when the subsequent stage demands a set of instructions from the instruction retrieval stage, the instruction retrieval circuit sends memory access requests to the bus interface. In particular, the instruction retrieval circuit sends an instruction fetch request followed by one or more instruction prefetch requests to the bus interface.
An instruction fetch request instructs the bus interface to read (or fetch) a set of instructions (one or more instructions) in response to a demand for that set from the subsequent stage. An instruction prefetch request instructs the bus interface to speculatively read (or prefetch) a set of instructions from the memory in response to a predicted need for that set by the subsequent stage. Such prefetching attempts to reduce instruction retrieval latency (i.e., the amount of time the subsequent stage must wait for the set of instructions to be retrieved from memory), thus reducing idle time of the subsequent stage and increasing processor utilization.
The bus interface schedules the memory access requests such that they are fulfilled one at a time beginning with the instruction fetch request. Accordingly, for each set demanded by the subsequent stage, the instruction retrieval stage provides a fetched set of instructions and one or more prefetched sets of instructions to the subsequent stage. As a result, the one or more prefetched sets of instructions are made available to the subsequent stage as quickly as possible should the subsequent stage be able to use them.
It is possible that some instructions retrieved by the instruction retrieval stage will not be needed by the subsequent stage. For example, while the processor executes instructions retrieved from memory, the processor may mispredict the direction of instruction execution, and speculatively execute down an incorrect branch of instructions. In such a situation, the subsequent stage of the processor then demands instructions of the correct branch from the instruction retrieval stage. In general, the processor recovers to an earlier state of execution that existed prior to execution down the incorrect branch. Additionally, the processor typically kills any instructions of the incorrect branch that remain in the pipeline past the instruction retrieval stage to ensure correct program behavior and to avoid wasting processor resources.
When the instruction retrieval stage receives the demand for instructions of the correct branch from the subsequent stage, the instruction retrieval stage operates to satisfy the demand. In particular, the instruction retrieval circuit typically sends new memory access requests to the bus interface for processing. The bus interface then attends to the new memory access requests after satisfying any earlier received memory access requests.
Unfortunately, the earlier received memory access requests may include requests for instructions of the incorrect branch. The processing of such requests by the bus interface causes excessive prefetching of instructions. In particular, it is unlikely that the instructions of the incorrect branch will be executed after being prefetched since the processor has killed other instructions of the incorrect branch and has proceeded to execute down the correct branch.
Furthermore, such excessive prefetching typically wastes processor resources by tying up the bus interface and other circuitry. For example, the bus interface may defer handling the memory access requests for the correct instruction branch. That is, the bus interface may first satisfy the unnecessary instruction prefetch requests for the incorrect instruction branch before satisfying the new memory access requests for the correct instruction branch. In such a situation, the subsequent stage typically waits for the unnecessary instruction prefetch requests to be satisfied before receiving instructions of the correct branch. Additionally, the processor must then expend resources to kill the instructions prefetched by these unnecessary instruction prefetch requests.
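By way of illustration only, the deferral described above can be modeled in software as a plain first-in, first-out queue. The sketch below is not taken from any actual bus interface; the function name conventional_service_order and the request labels are hypothetical, chosen only to show how many stale prefetch requests are served before the fetch request for the correct branch.

```python
from collections import deque


def conventional_service_order(pending, new_requests):
    """Serve every earlier request before any newer one (plain FIFO)."""
    queue = deque(pending)
    queue.extend(new_requests)
    order = []
    while queue:
        order.append(queue.popleft())
    return order


# Hypothetical labels: (request kind, branch the instructions belong to).
stale = [("prefetch", "incorrect"), ("prefetch", "incorrect")]
new = [("fetch", "correct"), ("prefetch", "correct")]

order = conventional_service_order(stale, new)

# Number of bus operations spent on the incorrect branch before the
# demanded fetch for the correct branch is served.
wasted_slots = order.index(("fetch", "correct"))
```

In this model, both stale prefetch requests occupy the bus interface before the demanded fetch is served, which corresponds to the waiting described above.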
In contrast to the conventional approaches, an embodiment of the invention is directed to a technique for controlling memory access requests to minimize excessive instruction retrieval and thus reduce wasting of processor resources. The technique involves obtaining a prefetch request for performing a prefetch operation that prefetches a first set of instructions from a memory, and subsequently obtaining a fetch request for performing a fetch operation that fetches a second set of instructions from the memory to satisfy a cache miss. The technique further involves canceling the obtained prefetch request when the fetch request is obtained before the prefetch operation initiates in response to the obtained prefetch request, and performing the prefetch operation to completion when the fetch request is obtained after the prefetch operation initiates in response to the obtained prefetch request. Cancellation of the prefetch request frees processor resources, thus enabling the processor to perform other operations.
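The cancellation rule of this embodiment reduces to a single decision made when the fetch request is obtained. The following minimal sketch expresses that decision under the assumption that a prefetch request is in one of two states; the names PrefetchState and on_fetch_obtained are hypothetical and are used for illustration only.

```python
from enum import Enum


class PrefetchState(Enum):
    PENDING = "pending"   # request obtained, prefetch operation not yet initiated
    STARTED = "started"   # prefetch operation already initiated


def on_fetch_obtained(prefetch_state):
    """Decide what happens to an earlier prefetch when a fetch request arrives."""
    if prefetch_state is PrefetchState.PENDING:
        return "cancel"    # operation never began, so the request is dropped
    return "complete"      # operation is in flight, so it runs to completion
```

Under these assumptions, only a prefetch operation that has not yet initiated is canceled; one already under way is left to complete.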
Preferably, obtaining the prefetch request and obtaining the fetch request involve adding, in a request queue, a first entry identifying the prefetch operation in response to the prefetch request, and adding, in the request queue, a second entry identifying the fetch operation in response to the fetch request. Additionally, canceling the obtained prefetch request preferably involves invalidating the first entry in the request queue when the fetch request is obtained before the prefetch operation initiates, and maintaining the first entry in the request queue in a valid form when the fetch request is obtained after the prefetch operation initiates. Such adding, validating, and invalidating of entries are preferably performed using circuitry that controls values within fields of entries of the request queue.
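A software model of such a request queue may help fix ideas. The sketch below assumes that each entry carries a valid field and a started field; these field names, the Entry and RequestQueue classes, and the example addresses are hypothetical and do not describe the actual circuitry.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    kind: str              # "fetch" or "prefetch"
    address: int
    valid: bool = True     # cleared to cancel the request
    started: bool = False  # set once the memory operation initiates


class RequestQueue:
    def __init__(self) -> None:
        self.entries: List[Entry] = []

    def add(self, kind: str, address: int) -> Entry:
        if kind == "fetch":
            # Invalidate earlier prefetch entries whose operation has not
            # initiated; entries already started stay valid.
            for e in self.entries:
                if e.kind == "prefetch" and e.valid and not e.started:
                    e.valid = False
        entry = Entry(kind, address)
        self.entries.append(entry)
        return entry

    def start_next(self) -> Optional[Entry]:
        """Initiate the oldest valid entry that has not started, if any."""
        for e in self.entries:
            if e.valid and not e.started:
                e.started = True
                return e
        return None


q = RequestQueue()
p1 = q.add("prefetch", 0x100)
p2 = q.add("prefetch", 0x140)
q.start_next()                 # p1 initiates before the fetch request arrives
f = q.add("fetch", 0x800)      # p2 has not initiated, so it is invalidated
next_op = q.start_next()       # the fetch is the next operation to initiate
```

In this example the first prefetch entry remains valid because its operation had initiated, the second is invalidated, and the fetch is served next.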
Preferably, the technique further involves attempting to retrieve the second set of instructions from a cache to create the cache miss, and generating, in response to the cache miss, a series of requests beginning with the fetch request followed by at least one prefetch request. As such, the series of requests may include the fetch request followed by multiple prefetch requests. Subsequent requests within the series of requests may identify memory access operations that access contiguous or discontiguous areas within the memory.
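As an illustrative sketch only, the generation of a series of requests in response to a cache miss can be modeled as follows. The function name generate_request_series, the 64-byte line size, the prefetch count of two, and the use of contiguous addresses are all assumptions made for this example; as noted above, the requests could equally identify discontiguous areas of the memory.

```python
def generate_request_series(cache, address, line_size=64, num_prefetches=2):
    """Return the requests produced by a lookup of `address` in `cache`.

    `cache` is modeled as a set of addresses currently held. A hit produces
    no requests; a miss produces one fetch followed by prefetch requests.
    """
    if address in cache:
        return []                                  # cache hit: nothing to request
    series = [("fetch", address)]                  # demanded set of instructions
    for i in range(1, num_prefetches + 1):
        # Assumed contiguous prefetching of the following lines.
        series.append(("prefetch", address + i * line_size))
    return series


hit_series = generate_request_series({0x40}, 0x40)
miss_series = generate_request_series(set(), 0x100)
```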
It should be understood that the obtained prefetch request belongs to a previous series of requests beginning with a previous fetch request followed by at least one previous prefetch request.
The technique preferably further involves generating a prevent signal that prevents cancellation of new prefetch requests that are obtained after the fetch request.
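One possible reading of the prevent signal is sketched below. The text does not specify how the signal is generated or when it is deasserted, so the model is purely hypothetical: it assumes that each pending prefetch entry records whether it arrived after the fetch request, and that a cancellation pass skips such entries while the prevent signal is asserted. The names PrefetchEntry, arrived_after_fetch, and cancel_pending are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class PrefetchEntry:
    address: int
    valid: bool = True
    started: bool = False
    arrived_after_fetch: bool = False  # hypothetical bookkeeping field


def cancel_pending(entries, prevent):
    """Invalidate pending prefetch entries, honoring the prevent signal."""
    for e in entries:
        if e.started or not e.valid:
            continue                   # in flight or already canceled
        if prevent and e.arrived_after_fetch:
            continue                   # shielded: belongs to the new series
        e.valid = False
    return entries


old = PrefetchEntry(0x100)                            # from the previous series
new = PrefetchEntry(0x840, arrived_after_fetch=True)  # follows the fetch request
cancel_pending([old, new], prevent=True)
```

With the prevent signal asserted, only the prefetch entry from the previous series is invalidated; the prefetch entry obtained after the fetch request survives.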