1. Field of the Invention
This invention is related to processors and, more particularly, to cache miss handling in multithreaded processors.
2. Description of the Related Art
Presently, typical processors are single threaded. That is, the instructions that are being executed concurrently in the processor all belong to the same thread. Instruction fetching in such processors generally involves fetching instructions from the single thread. In various implementations, branch prediction schemes may be used to control fetching, or sequential fetching may be implemented. In either case, fetching may be redirected: on a branch misprediction in the branch prediction implementation, on a taken branch in the sequential fetch implementation, or on an exception, trap, etc. in either implementation.
Most present processors implement an instruction cache to store instructions for rapid fetching by the processor. While instruction cache access latency is shorter than memory access latency (or access latency to lower level caches, if a cache hierarchy is implemented), the instruction cache has a limited capacity and thus is subject to cache misses. A cache miss occurs in a cache if an access to a given address is performed and the corresponding instructions/data are not stored in the cache. In contrast, a cache hit occurs if the access is performed and the corresponding instructions/data are stored in the cache (and are provided by the cache in response to the access). Typically, a cache allocates and deallocates storage in contiguous blocks referred to as cache lines. That is, a cache line is the minimum unit of allocation/deallocation of storage space in the cache.
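By way of illustration only, the hit/miss behavior and line-granular allocation described above may be sketched as a small direct-mapped cache model. The line size, the number of lines, the direct-mapped organization, and the names `DirectMappedCache`, `access`, and `fill` are assumptions chosen for the example and are not taken from any particular processor.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for illustration)
NUM_LINES = 4   # lines in the cache (assumed; kept tiny to force conflicts)


class DirectMappedCache:
    """Hypothetical direct-mapped cache that tracks only tags, not data."""

    def __init__(self):
        # One tag per line; None means nothing is allocated there yet.
        self.tags = [None] * NUM_LINES

    def _split(self, address):
        # All addresses within the same LINE_SIZE block share one cache line.
        line_number = address // LINE_SIZE
        return line_number % NUM_LINES, line_number // NUM_LINES

    def access(self, address):
        """Return True on a cache hit, False on a cache miss."""
        index, tag = self._split(address)
        return self.tags[index] == tag

    def fill(self, address):
        """Allocate the whole line containing address, evicting any prior line."""
        index, tag = self._split(address)
        self.tags[index] = tag


cache = DirectMappedCache()
results = []
for addr in (0x000, 0x004, 0x100, 0x000):
    hit = cache.access(addr)
    if not hit:
        cache.fill(addr)
    results.append(hit)

print(results)  # [False, True, False, False]
```

In this sketch, address 0x004 hits because it falls in the line allocated for 0x000, while the final access to 0x000 misses because the line for 0x100 maps to the same entry and displaced it, illustrating the limited capacity noted above.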
When a cache miss occurs for a given cache line, the processor initiates a cache fill for that cache line. The cache fill generally includes retrieving the cache line from memory or a lower level cache and storing the cache line in the cache. While the cache fill is occurring for an instruction cache miss, instruction fetching is generally stalled in the single threaded processor. Because instruction execution cannot progress beyond the instruction cache miss, fetching instructions beyond the cache miss is not helpful. Furthermore, because the processor is waiting on the instructions in the cache line returned for the cache fill, many single threaded processors attempt to bypass the instructions from the cache line into the processor's pipeline as the fill data arrives to be written into the instruction cache.
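As a rough sketch of why bypassing is attractive in a single threaded processor, the following model counts fetch cycles with and without bypass. All latencies are assumed values chosen for the example, the cache is treated as having unlimited capacity for simplicity, and the function name `fetch_cycles` is hypothetical.

```python
HIT_LATENCY = 1     # cycles to fetch from the instruction cache on a hit (assumed)
FILL_LATENCY = 10   # cycles until fill data arrives after a miss (assumed)
WRITE_LATENCY = 1   # cycles to write the arrived line into the cache (assumed)


def fetch_cycles(line_sequence, bypass):
    """Total cycles to fetch one instruction from each cache line in order.

    Fetching stalls for the whole fill on a miss. With bypass, the
    instruction is forwarded to the pipeline when the fill data arrives;
    without bypass, the line is first written and then read from the cache.
    """
    cached = set()  # simplification: lines are never evicted
    cycles = 0
    for line in line_sequence:
        if line in cached:
            cycles += HIT_LATENCY
            continue
        cycles += FILL_LATENCY  # fetch is stalled during the cache fill
        if not bypass:
            cycles += WRITE_LATENCY + HIT_LATENCY
        cached.add(line)
    return cycles


sequence = [0, 0, 1, 0]
print(fetch_cycles(sequence, bypass=True))   # 22
print(fetch_cycles(sequence, bypass=False))  # 26
```

Under these assumed latencies, each miss costs two fewer cycles when the instructions are bypassed, since the pipeline need not wait for the line to be written and re-read.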
More recently, multithreaded processors have been proposed. Particularly, in fine grain multithreading, the processor may have two or more threads concurrently in process. Instructions may be issued from any of the threads for execution. Thus, in some cases, instructions from different threads may be in adjacent pipeline stages in the processor. Since multiple threads are being fetched, instruction fetching mechanisms may be more complex. Additionally, utilizing fetch bandwidth efficiently becomes even more important when multiple threads are being fetched.
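The interleaving of threads in fine grain multithreading may be pictured with the following sketch, which issues one instruction per cycle in round-robin order. The round-robin policy, the function name `interleave`, and the thread and instruction labels are assumptions for illustration; an actual processor may select among threads in other ways.

```python
def interleave(threads):
    """Issue one instruction per cycle, rotating among threads with work left.

    threads maps a thread name to its instructions in program order.
    Returns the issue order as (thread, instruction) pairs.
    """
    queues = {name: list(instrs) for name, instrs in threads.items()}
    order = []
    while any(queues.values()):
        for name, queue in queues.items():
            if queue:
                order.append((name, queue.pop(0)))
    return order


issued = interleave({"T0": ["a0", "a1", "a2"], "T1": ["b0", "b1"]})
print(issued)
# [('T0', 'a0'), ('T1', 'b0'), ('T0', 'a1'), ('T1', 'b1'), ('T0', 'a2')]
```

Consecutive entries in the issue order occupy adjacent pipeline stages, so in this sketch instructions from T0 and T1 sit next to each other in the pipeline for most cycles.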