1. Field of the Invention
This invention is related to the field of processors and, more particularly, to forwarding of the critical word from a cache block fill in processors and related circuitry.
2. Description of the Related Art
Processors typically implement load and store operations to access data in memory. The loads specify reads of memory locations to provide data to the processor, and the stores specify writes to memory locations using data provided from the processor. Depending on the instruction set architecture implemented by the processor, loads and stores may be explicit instructions specified in the instruction set architecture, implicit operations in an instruction that specifies a memory operation, or a combination thereof.
To reduce memory latency for loads and stores, processors typically implement one or more caches that the processors access prior to accessing the main memory system. The caches store recently accessed data in units of cache blocks. Cache blocks may be of varying sizes in various processors, such as 32 bytes, 64 bytes, 128 bytes, etc. The blocks are typically aligned in memory to the natural boundary of their size.
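Natural alignment means that, for a power-of-two block size, the block containing a given address can be found by clearing the low-order address bits. The following is a minimal sketch, assuming power-of-two block sizes such as the 32, 64, and 128 bytes mentioned above. The function names are hypothetical and serve only as an illustration.

```python
def block_base(addr: int, block_size: int) -> int:
    """Return the base address of the naturally aligned cache block holding addr.

    Assumes block_size is a power of two, so natural alignment means the
    low log2(block_size) bits of the block's base address are zero.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")
    # Clear the offset bits to get the aligned block address.
    return addr & ~(block_size - 1)


def block_offset(addr: int, block_size: int) -> int:
    """Return the byte offset of addr within its cache block."""
    return addr - block_base(addr, block_size)


if __name__ == "__main__":
    # Address 0x1234 lies in the 64-byte block starting at 0x1200, at offset 0x34.
    print(hex(block_base(0x1234, 64)), hex(block_offset(0x1234, 64)))
```

Because alignment matches the block size, the same address maps to a differently sized block base depending on the cache block size (0x1220 for 32 bytes, 0x1200 for 64 or 128 bytes).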
If a load misses in a cache, the cache block containing the load data is read from memory and transferred into the cache. While storing the cache block in the cache reduces latency for subsequent accesses that hit in the cache, the processor's performance often depends heavily on how long it must wait for the load data itself. Typically, a cache block is transferred from the memory to the processor using multiple data transmissions on the interconnect. To reduce the latency for the load data, the load data (the critical word) is provided in the first transmission of the cache block, and the remaining data is transferred afterward. The processor and caches can be designed to forward the load data to the target of the load while the remainder of the cache block is still being transferred.
In some cases, a memory controller can be designed to provide a response indicating that the data is about to be transferred (e.g. some number of clock cycles prior to the data being transferred) so that the cache/processor can schedule the forwarding of the data. Such memory controllers provide the response a fixed number of clock cycles prior to transferring the data and guarantee that the data will be transferred on the identified clock cycle. Thus, the forwarding can be precisely scheduled.
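The critical-word-first transfer and the fixed-lead response can be modeled roughly as follows. This is a sketch under stated assumptions, not a description of any particular memory controller. It assumes a block moved as equal-sized beats (for example, a 64-byte block in four 16-byte transmissions) and assumes the non-critical beats follow in wrap-around order, which is one common choice that the text above does not specify. The names `critical_word_first_order` and `forward_cycle` and the lead value are hypothetical.

```python
from typing import List


def critical_word_first_order(addr: int, block_size: int, beat_bytes: int) -> List[int]:
    """Return the order in which the beats of a cache block are transferred.

    Beat i covers bytes [i * beat_bytes, (i + 1) * beat_bytes) of the block.
    The beat holding the load data (the critical word) goes first. The
    remaining beats follow in wrap-around order (an assumed ordering).
    """
    if beat_bytes <= 0 or block_size % beat_bytes:
        raise ValueError("block_size must be a positive multiple of beat_bytes")
    beats = block_size // beat_bytes
    # Offset within the naturally aligned block selects the critical beat.
    critical = (addr % block_size) // beat_bytes
    return [(critical + i) % beats for i in range(beats)]


def forward_cycle(response_cycle: int, lead_cycles: int) -> int:
    """Return the cycle on which the critical beat is guaranteed to arrive.

    Models a controller that signals a fixed (hypothetical) number of
    cycles before the data and guarantees the data on exactly that cycle,
    so the cache can reserve its forwarding slot in advance.
    """
    return response_cycle + lead_cycles


if __name__ == "__main__":
    # A load to 0x1234 in a 64-byte block with 16-byte beats hits beat 3.
    print(critical_word_first_order(0x1234, 64, 16))
    # Response seen on cycle 100 with an assumed 4-cycle lead.
    print(forward_cycle(100, 4))
```

The fixed lead is what makes the scheduling exact. Because the arrival cycle is a simple function of the response cycle, the cache can commit its forwarding path ahead of time rather than detecting the data when it appears.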