This invention relates generally to data processing and, in particular, to techniques for processing load instructions in the presence of resource conflicts within a data processor.
A typical load instruction instructs a data processor to retrieve data from memory. In general, such a load instruction identifies a memory location that stores the data. When the processor processes the load instruction, the processor typically checks tag information corresponding to the identified memory location to determine whether the data resides in an internal data cache. If the tag information indicates that the data resides in the data cache (a cache hit), the processor uses the data from the data cache. On the other hand, if the tag information indicates that the data is not in the data cache (a cache miss), the processor retrieves the data from an external or off-chip memory (e.g., a secondary cache, main memory or disk memory). In general, data retrieval is faster from the data cache than from the external memory.
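The tag check described above can be illustrated with a minimal sketch. It assumes a direct-mapped cache in which the low-order part of the address selects a cache line and the remainder serves as the tag. The class name `DirectMappedCache`, the `backing` dictionary standing in for external memory, and the one-word line size are illustrative assumptions, not details taken from this description.

```python
class DirectMappedCache:
    """Toy direct-mapped cache with one word per line (illustrative only)."""

    def __init__(self, num_lines, backing):
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # tag information, one entry per line
        self.data = [None] * num_lines   # cached data, one word per line
        self.backing = backing           # stands in for external memory

    def load(self, address):
        """Return (value, hit) for a load from the given address."""
        index = address % self.num_lines
        tag = address // self.num_lines
        if self.tags[index] == tag:
            # Cache hit: the tag matches, so the cached word is valid.
            return self.data[index], True
        # Cache miss: retrieve from external memory and refill the line.
        value = self.backing[address]
        self.tags[index] = tag
        self.data[index] = value
        return value, False
```

In this sketch, a first load from an address misses and refills the line, a repeated load from the same address hits, and a later load from a different address that maps to the same line evicts the earlier entry.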
Some processor designers have attempted to minimize the amount of time needed to retrieve data from the data cache in order to make the data available to the processor for further processing as quickly as possible. To this end, designers have designed some processors with dedicated memory circuits called tag stores for storing tag information. In general, such tag stores have access times similar to those of data caches.
Typically, when a processor using a tag store encounters a load instruction within an instruction stream, the processor simultaneously (i) checks tag information from the tag store, and (ii) reads data from the data cache through a primary data bus. If the tag information indicates that the retrieved data is valid (a cache hit), the data is available to the processor immediately for further processing.
Conversely, if the tag information indicates that the retrieved data is invalid (a cache miss), the processor ignores the data from the data cache, and performs additional retrieval steps to obtain the data from another memory (e.g., off-chip memory). In particular, the processor sends out a request to the other memory for the data. In response, the other memory provides the requested data to the data cache through the primary data bus, updates the tag information in the tag store and notifies the processor that the data is now available. The processor then obtains and uses the data.
In general, when a processor processes multiple load instructions, some load instructions will result in cache hits and some will result in cache misses. When data arrives from another memory through the primary data bus in response to a cache miss, the primary data bus and the data cache become temporarily unavailable. This unavailability temporarily prevents the processor from processing any further load instructions in the instruction stream (or pipeline). That is, the processor delays processing further load instructions (i.e., simultaneously checking the tag information in the tag store and reading data from the data cache) until the cache miss is satisfied (i.e., until the primary data bus and the data cache are again available).
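The delay described above can be made concrete with a small timing sketch. It assumes that at most one load is processed per cycle and that no load can be processed during any cycle in which a fill occupies the data cache and the primary data bus. The function name and its parameters are hypothetical and serve only to illustrate the conventional behavior.

```python
def conventional_issue_cycles(load_ready_cycles, fill_start, fill_length):
    """Return the cycle in which each load is processed when loads must
    wait for a fill that occupies the data cache and primary data bus.

    Assumptions (illustrative): one load per cycle, loads processed in
    order, and a single fill spanning [fill_start, fill_start + fill_length).
    """
    fill_end = fill_start + fill_length  # first cycle after the fill
    issued = []
    next_free = 0
    for ready in load_ready_cycles:
        cycle = max(ready, next_free)
        if fill_start <= cycle < fill_end:
            # The data cache and bus are busy, so the load is delayed.
            cycle = fill_end
        issued.append(cycle)
        next_free = cycle + 1
    return issued
```

For example, with loads ready in cycles 0, 1, 2 and 3 and a four-cycle fill starting in cycle 1, the function returns `[0, 5, 6, 7]`: every load after the first waits until the fill completes.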
It is expensive to delay load instructions within an instruction stream of a processor since such delays cause processor resources (e.g., fetch and execution circuitry) to go underutilized. Moreover, such delays effectively delay other non-load instructions within the instruction stream which depend on data to be retrieved by the delayed load instructions.
Additionally, when a cache miss occurs, the retrieved data is typically more than just the data identified by the load instruction. Rather, a block of data is generally provided during multiple processor cycles to fulfill any subsequent load instructions for data adjacent to the retrieved data. Such activity extends the amount of time that the data cache and the primary data bus are unavailable, and the amount of time the subsequent load instructions must be delayed.
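As a rough illustration of why a block fill lengthens the unavailable period, the sketch below computes which addresses arrive with a missed address and how many cycles the transfer takes. The block size of four words, the aligned-block layout, and the bus width parameter are assumptions chosen for the example.

```python
import math


def block_fill_addresses(miss_address, block_words=4):
    """Return the addresses of the aligned block containing miss_address."""
    base = miss_address - (miss_address % block_words)
    return [base + offset for offset in range(block_words)]


def fill_cycles(block_words=4, words_per_cycle=1):
    """Return the number of cycles the primary data bus is occupied."""
    return math.ceil(block_words / words_per_cycle)
```

Under these assumptions, a miss on address 6 brings in addresses 4 through 7, and the bus and data cache stay occupied for four cycles when one word is transferred per cycle.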
Furthermore, there is a tendency for cache misses to occur in bursts (i.e., when one cache miss occurs, other cache misses are likely). Accordingly, when a first cache miss occurs in response to an initial load instruction, there is a strong probability that arrival of data in the data cache through the primary data bus in response to the initial load instruction will delay one or more other load instructions ready for processing by the processor.
In contrast, an embodiment of the invention is directed to a technique for handling load instructions within a data processor that includes a cache circuit having a data cache and a tag memory indicating valid entries within the data cache. The technique involves writing data to the data cache in response to a first load instruction. The technique further involves reading tag information from the tag memory in response to a second load instruction while data is written to the data cache. Accordingly, the processor is able to process the second load instruction regardless of data cache and primary data bus availability.
If the tag information indicates that the data identified by the second load instruction is in the data cache (a cache hit), the data cache provides the identified data to the processor in response to the second load instruction after data is written to the data cache in response to the first load instruction. On the other hand, if the tag information indicates that the data identified by the second load instruction is not in the data cache (a cache miss), the processor requests the data from another memory. In either situation, it is of no consequence that a processor resource such as the data cache or the primary data bus is unavailable when processing the second load instruction.
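The two outcomes described above can be sketched as a single decision made while a fill is still in progress. The sketch assumes a direct-mapped tag memory held in a Python list, and that `fill_end_cycle` is the first cycle in which the data cache is free again. The function name, the returned dictionary, and its keys are hypothetical.

```python
def process_load_during_fill(tag_memory, address, current_cycle, fill_end_cycle):
    """Check the tag memory for a second load while a fill is under way.

    Returns a dictionary describing what happens next (illustrative model):
      - on a hit, the data is read once the fill has completed;
      - on a miss, a request goes to another memory without waiting.
    """
    num_lines = len(tag_memory)
    index = address % num_lines
    tag = address // num_lines
    hit = tag_memory[index] == tag
    if hit:
        return {
            "hit": True,
            "tag_read_cycle": current_cycle,
            # The data cache is busy until the fill ends.
            "data_cycle": max(current_cycle, fill_end_cycle),
            "miss_request_cycle": None,
        }
    return {
        "hit": False,
        "tag_read_cycle": current_cycle,
        "data_cycle": None,
        # The miss is discovered, and the request sent, during the fill.
        "miss_request_cycle": current_cycle,
    }
```

In either branch the tag read happens in `current_cycle`, which is the point of the technique: the hit or miss outcome is known before the data cache and primary data bus become available.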
Preferably, writing data to the data cache occurs over multiple processor cycles. In this situation, the technique involves updating the tag memory during a particular one of the multiple processor cycles in response to the first load instruction. Furthermore, reading the tag information in response to the second load instruction occurs during another one of the multiple processor cycles, different from the particular one. For example, the multiple processor cycles may form a series of four processor cycles. Updating the tag information may occur during the first processor cycle in the series, and reading the tag information may occur during one of the subsequent processor cycles in the series.
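The four-cycle example can be laid out as a per-cycle schedule. The sketch below assumes the tag memory is occupied only in the cycle in which it is updated and is free for reads in the remaining fill cycles. The function names and the string labels are illustrative.

```python
def schedule_fill(fill_start, fill_cycles=4, tag_update_offset=0):
    """Map each fill cycle to the activity of the data cache and tag memory."""
    schedule = {}
    for offset in range(fill_cycles):
        cycle = fill_start + offset
        schedule[cycle] = {
            "data_cache": "write",
            # The tag memory is updated in exactly one cycle of the series.
            "tag_memory": "update" if offset == tag_update_offset else "free",
        }
    return schedule


def tag_read_cycles(schedule):
    """Return the fill cycles in which another load may read the tag memory."""
    return [c for c, s in sorted(schedule.items()) if s["tag_memory"] == "free"]
```

With a fill starting in cycle 10, the tag memory is updated in cycle 10 and `tag_read_cycles(schedule_fill(10))` returns `[11, 12, 13]`, the cycles in which a second load could read its tag information.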
Preferably, when data is written to the data cache in response to the first load instruction, the processor continuously provides a particular address of the data cache to an address input of the cache circuit until writing data to the data cache completes. Accordingly, the data is written to the data cache based on the particular address without interference from other addresses that the processor may provide when processing other load instructions such as the second load instruction.
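One way to picture the held address is as a latch in front of the data cache address input. The sketch below is a behavioral model only: the class `CacheAddressInput`, its methods, and the cycle-counting approach are assumptions made for illustration, and the address used for the tag lookup of another load is assumed to travel on a separate path that is not modeled here.

```python
class CacheAddressInput:
    """Behavioral model of an address input that holds a fill address."""

    def __init__(self):
        self.held_address = None
        self.remaining = 0  # fill cycles still to go

    def begin_fill(self, address, cycles):
        """Start holding the fill address for the given number of cycles."""
        self.held_address = address
        self.remaining = cycles

    def drive(self, requested_address):
        """Return the address presented to the data cache in this cycle."""
        if self.remaining > 0:
            # A fill is in progress: ignore addresses from other loads.
            self.remaining -= 1
            return self.held_address
        return requested_address
```

For instance, after `begin_fill(0x40, 4)`, four consecutive calls to `drive` return `0x40` regardless of the requested address, and the fifth call passes the requested address through.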
Preferably, the processor processes a load instruction by accessing the tag memory and the data cache simultaneously when both are available at the same time. For example, to process a third load instruction, the processor reads tag information from the tag memory and simultaneously reads data from the data cache.