1. Field of the Invention
This invention is related to processors and, more particularly, to detecting load/store dependencies in processors.
2. Description of the Related Art
Processors generally include support for memory operations to facilitate transfer of data between the processors and memory to which the processors may be coupled. As used herein, a memory operation is an operation specifying a transfer of data between a processor and a main memory (although the transfer may be completed in cache). Load memory operations specify a transfer of data from memory to the processor, and store memory operations specify a transfer of data from the processor to memory. A memory operation may be an implicit part of an instruction that also specifies another operation, or may be an explicit load/store instruction. Load memory operations may be more succinctly referred to herein as “loads”. Similarly, store memory operations may be more succinctly referred to as “stores”.
A given memory operation can specify the transfer of multiple bytes beginning at a memory address that is calculated during execution of the memory operation. For example, 16 bit (2 byte), 32 bit (4 byte), and 64 bit (8 byte) transfers are common in addition to an 8 bit (1 byte) transfer. In some cases, even 128 bit (16 byte) transfers are supported. The address is typically calculated by adding one or more address operands specified by the memory operation to generate an effective address or virtual address, which can optionally be translated through an address translation mechanism to a physical address of a memory location within the memory. Typically, the address can identify any byte as the first byte to be transferred, and the additional bytes of the multiple byte transfer are contiguous in memory to the first byte and stored at increasing (numerical) memory addresses.
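By way of illustration only, the address generation described above can be sketched as follows. The function names, the 64 bit address width, and the operand values are assumptions made for this example and do not correspond to any particular processor; address translation is omitted.

```python
def effective_address(operands, address_bits=64):
    """Add the address operands, wrapping at the assumed address width."""
    return sum(operands) & ((1 << address_bits) - 1)


def accessed_addresses(address, size):
    """Return the byte addresses of a `size`-byte transfer.

    The first byte is at `address`; the remaining bytes are contiguous
    at increasing numerical addresses.
    """
    return [address + i for i in range(size)]


# Hypothetical 4 byte access whose address is formed from three operands.
addr = effective_address([0x1000, 0x20, 0x3])
print(hex(addr), [hex(a) for a in accessed_addresses(addr, 4)])
```

Because the operands may sum to any value, the first byte in this sketch can fall at any byte address, which is what allows the access to be misaligned.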
Since any byte can be identified as the first byte, a given memory operation can be misaligned. Various processors define misalignment in different ways. In the strictest sense, a memory operation is misaligned if it is not aligned to a boundary that matches its data size (e.g. an 8 byte memory operation is misaligned if not aligned to an 8 byte boundary in memory, a 4 byte memory operation is misaligned if not aligned to a 4 byte boundary, etc.). Misaligned memory operations can, in some cases, require additional execution resources (as compared to an aligned memory operation) to complete the access, and misalignment can be more loosely defined to be those cases in which additional resources are needed. For example, a processor often implements a cache having cache blocks (also referred to as cache lines). If one or more of the bytes operated upon by the memory operation are in one cache line and the remaining bytes are in another cache line, two cache lines are accessed to complete the memory operation, as opposed to one cache line if all of the accessed bytes are included within one cache line.
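The two definitions of misalignment discussed above can be contrasted with a short sketch. The 64 byte cache line size and the function names are assumptions chosen for the example; actual processors may use other sizes and other definitions.

```python
CACHE_LINE_SIZE = 64  # assumed line size in bytes, for illustration only


def is_misaligned_strict(address, size):
    """Strict definition: the address is not a multiple of the data size."""
    return address % size != 0


def crosses_cache_line(address, size, line_size=CACHE_LINE_SIZE):
    """Looser definition: the first and last bytes fall in different lines."""
    first_line = address // line_size
    last_line = (address + size - 1) // line_size
    return first_line != last_line


# An 8 byte access at 0x1004 is misaligned in the strict sense but stays
# within one line; the same access at 0x103C spans two lines.
for addr in (0x1004, 0x103C):
    print(hex(addr), is_misaligned_strict(addr, 8), crosses_cache_line(addr, 8))
```

Under these assumptions, only the second access would need two cache line accesses, even though both accesses are misaligned in the strict sense.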
Since memory operations can have arbitrary alignment and arbitrary size, dependency checking between loads and previous stores that have not yet completed is complicated. Often, a full cache block-sized mask is maintained for each incomplete store, identifying bytes within the cache block that are written by the store. A similar cache block-sized mask is generated for each load, and compared to the store masks. A dependency can thus be detected by comparing the cache block address of the store to the cache block address of the load (i.e. the address less the least significant bits that form an offset into the cache block) for equality, and detecting that at least one mask bit corresponding to the same byte is set in both the store mask and the load mask. However, storage for the masks is expensive, since one mask bit per byte of the cache block is kept for each incomplete store.
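The mask-based check described above can be sketched as follows. This is a simplified software model rather than a hardware implementation: the 64 byte cache block size, the function names, and the restriction to accesses that fit within a single cache block are all assumptions made for the example.

```python
CACHE_BLOCK_SIZE = 64  # assumed block size in bytes
OFFSET_BITS = 6        # log2(CACHE_BLOCK_SIZE)


def block_address(address):
    """Address less the least significant bits that form the block offset."""
    return address >> OFFSET_BITS


def byte_mask(address, size):
    """Cache block-sized mask with one bit set per byte accessed."""
    offset = address & (CACHE_BLOCK_SIZE - 1)
    if offset + size > CACHE_BLOCK_SIZE:
        # Assumption: block-crossing accesses are outside this sketch.
        raise ValueError("access crosses a cache block boundary")
    return ((1 << size) - 1) << offset


def load_depends_on_store(load_addr, load_size, store_addr, store_size):
    """True if the block addresses match and at least one byte overlaps."""
    same_block = block_address(load_addr) == block_address(store_addr)
    overlap = byte_mask(load_addr, load_size) & byte_mask(store_addr, store_size)
    return same_block and overlap != 0


# A 4 byte load at 0x1004 overlaps a 2 byte store at 0x1006, but not a
# 4 byte store at 0x1008 or a store at the same offset in another block.
print(load_depends_on_store(0x1004, 4, 0x1006, 2))
print(load_depends_on_store(0x1004, 4, 0x1008, 4))
print(load_depends_on_store(0x1004, 4, 0x2004, 4))
```

In this model each mask occupies 64 bits, so a queue of incomplete stores would carry 64 mask bits per entry in addition to the block address, which illustrates the storage cost noted above.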