1. Technical Field
This disclosure relates to computer processors, and more specifically to reducing a restart latency in a processor.
2. Description of the Related Art
In executing a computer program, program order is generally followed to ensure correct results. Thus, when a first instruction is followed by a second instruction that depends on the first instruction's result, execution of the second instruction does not complete until the first instruction's result becomes available. Sometimes a result is available almost immediately. Other times, a result may take hundreds of processor cycles to become available, for example when a memory load misses in a data cache (e.g., an L1 cache) and the desired data must be retrieved from elsewhere in the memory hierarchy (e.g., an L2 cache, main memory, etc.). One option in response to such a lengthy delay (e.g., a cache miss) is to stall. Other options may include executing instructions speculatively or performing “scouting.”
To perform scouting, a processor executes one or more scouting threads that prefetch data for a main thread. A scouting thread may differ from the main thread in that it may include only the instructions that are relevant to calculating memory addresses and issuing cache requests. The results of a scouting thread are not committed, however. When a scouting thread encounters a cache miss, it may continue to execute as though the miss had not occurred rather than stalling. As a result, the scouting thread causes multiple cache requests to be issued and serviced in parallel instead of one after another, so the latencies of those requests overlap and their cost is amortized across them.
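The amortization effect can be illustrated with a toy timing model. The sketch below is not taken from this disclosure; every name and parameter in it (`Model`, `miss_latency`, `issue_cycles`, `max_outstanding`) is hypothetical. It compares a thread that stalls on each miss with a scouting thread that keeps issuing loads, assuming the missing addresses are independent of one another, a fixed miss latency, and a hypothetical cap on the number of misses that can be outstanding at once. It ignores memory bandwidth and everything else a real processor would model.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Model:
    miss_latency: int = 200   # assumed cycles to service one cache miss
    issue_cycles: int = 1     # assumed cycles to issue one load
    max_outstanding: int = 8  # hypothetical cap on misses in flight at once


def stall_on_miss(n_misses: int, m: Model) -> int:
    """Cycles until all misses are serviced when the thread stalls on each one.

    Each request is issued only after the previous one has been serviced,
    so the miss latencies add up serially.
    """
    return n_misses * (m.issue_cycles + m.miss_latency)


def scouting(n_misses: int, m: Model) -> int:
    """Cycles until all misses are serviced when a scouting thread keeps
    issuing loads instead of stalling (assumes independent addresses).

    Requests overlap in time, limited only by max_outstanding.
    """
    in_flight: list[int] = []  # min-heap of completion times of outstanding misses
    t = 0                      # current cycle of the scouting thread
    for _ in range(n_misses):
        if len(in_flight) == m.max_outstanding:
            # No free slot: wait until the earliest outstanding miss completes.
            t = max(t, heapq.heappop(in_flight))
        t += m.issue_cycles
        heapq.heappush(in_flight, t + m.miss_latency)
    # All requests are serviced when the last outstanding miss completes.
    return max(in_flight, default=t)


if __name__ == "__main__":
    m = Model()
    for n in (4, 16):
        print(n, stall_on_miss(n, m), scouting(n, m))
```

Under these assumed parameters, four independent misses take 804 cycles when serviced serially but 204 cycles when a scouting thread issues them back to back, because the four 200-cycle latencies overlap almost entirely. Once the number of misses exceeds the assumed in-flight cap, the benefit shrinks but remains substantial.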