1. Field of the Invention
This invention is related to the field of processors and, more particularly, to handling load/store operations in processors.
2. Description of the Related Art
Processors generally include support for loads and stores to facilitate transfer of data between the processors and memory to which the processors may be coupled. As used herein, a load is an operation specifying a transfer of data from memory to the processor (although the transfer may be completed in a cache). A store is an operation specifying a transfer of data from the processor to memory. Loads and stores may be an implicit part of an instruction which includes a memory operation, or may be explicit instructions.
A given load/store may specify the transfer of multiple bytes beginning at a memory address calculated during execution of the load/store. For example, 16 bit (2 byte), 32 bit (4 byte), and 64 bit (8 byte) transfers are common in addition to an 8 bit (1 byte) transfer. The number of bytes transferred for a given load/store is generally referred to as the size of the transfer. The address is typically calculated by adding one or more address operands specified by the load/store to generate an effective address or virtual address, which may optionally be translated through an address translation mechanism to a physical address of a memory location within the memory. Typically, the address may identify any byte as the first byte to be transferred, and the additional bytes of the multiple byte transfer are contiguous in memory to the first byte and stored at increasing (numerical) memory addresses.
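As an illustration only, the address calculation and the contiguous byte range described above might be sketched as follows. The function names, the 64 bit address width, and the wrap-around on overflow are assumptions made for the sketch and are not taken from any particular processor.

```python
def effective_address(operands, address_bits=64):
    """Add the address operands; wrap to the assumed address width."""
    return sum(operands) & ((1 << address_bits) - 1)


def bytes_accessed(address, size):
    """Return the byte addresses of a transfer of `size` bytes.

    The first byte is at `address`; the remaining bytes are contiguous
    and lie at increasing numerical addresses.
    """
    return list(range(address, address + size))


# Hypothetical 32 bit (4 byte) load with a base operand and an offset operand.
addr = effective_address([0x1000, 0x6])
touched = bytes_accessed(addr, 4)  # 0x1006 through 0x1009
```

Note that the starting address in this example (0x1006) is not a multiple of the transfer size, consistent with the statement that the address may identify any byte as the first byte to be transferred.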
Many processors execute loads/stores speculatively (that is, before the results can be committed to architected state or memory). For stores, the updated bytes are often stored in a queue until the stores can be committed to a data cache (or to memory). Thus, a load may be executed, and one or more bytes updated responsive to a previous uncommitted store in the queue may be accessed responsive to the load. However, since there are various sizes of loads and stores and also since loads and stores of the same size may partially (but not fully) overlap, it is possible that one or more additional bytes that are not updated responsive to the previous uncommitted store may be accessed responsive to the load. For brevity herein, accessing bytes responsive to a load may be referred to as the load accessing bytes. Similarly, updating bytes responsive to a store may be referred to as the store updating bytes.
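The distinction between a load whose bytes are all updated by a previous uncommitted store and a load that only partially overlaps the store can be made concrete with a small sketch. The function name `classify_overlap` and the labels it returns are hypothetical and are used here only for illustration.

```python
def classify_overlap(load_addr, load_size, store_addr, store_size):
    """Classify how a load's bytes relate to one uncommitted store's bytes.

    Returns "none" if no byte is shared, "full" if every byte accessed by
    the load is updated by the store, and "partial" if the load accesses
    at least one updated byte and at least one byte the store does not update.
    """
    load_bytes = set(range(load_addr, load_addr + load_size))
    store_bytes = set(range(store_addr, store_addr + store_size))
    if not (load_bytes & store_bytes):
        return "none"
    if load_bytes <= store_bytes:
        return "full"
    return "partial"


# Same size, same address: every load byte comes from the store.
same = classify_overlap(0x100, 4, 0x100, 4)        # "full"
# Same size, shifted by two bytes: only two bytes are shared.
shifted = classify_overlap(0x102, 4, 0x100, 4)     # "partial"
# Larger load than store: the upper four bytes are not updated by the store.
larger = classify_overlap(0x100, 8, 0x100, 4)      # "partial"
```

The second and third cases correspond to the two sources of partial overlap noted above: loads and stores of the same size at different addresses, and loads and stores of different sizes.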
If a load accesses one or more bytes updated by a previous uncommitted store and also accesses one or more additional bytes not updated by a previous uncommitted store, hardware may be implemented to select the bytes updated by the store from the queue and the additional bytes from another source (such as a data cache) to obtain the bytes accessed by the load. However, such hardware may be complex and expensive to implement. Alternatively, the load may be cancelled and attempted again at a later time, after the previous store is committed. Such a design, however, may lose performance both because the load is delayed and because resources are consumed unnecessarily to execute the load, only to cancel it and wait for subsequent re-execution.
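A software model of the first approach, in which each byte of the load is selected either from the queued store or from another source, might look like the sketch below. It is a simplified illustration under stated assumptions: a single uncommitted store is considered, the data cache is modeled as a dictionary from byte address to byte value, and the name `merge_load_bytes` is hypothetical.

```python
def merge_load_bytes(load_addr, load_size, store_addr, store_data, cache):
    """Assemble the bytes accessed by a load, one byte at a time.

    Bytes updated by the uncommitted store are taken from the store's
    queued data; all other bytes are taken from the modeled cache.
    """
    result = bytearray()
    for byte_addr in range(load_addr, load_addr + load_size):
        offset = byte_addr - store_addr
        if 0 <= offset < len(store_data):
            result.append(store_data[offset])   # from the store queue
        else:
            result.append(cache[byte_addr])     # from the modeled cache
    return bytes(result)


# Modeled cache holding 0xAA in every byte of 0x100..0x107.
cache = {0x100 + i: 0xAA for i in range(8)}
# Uncommitted 4 byte store to 0x102, followed by a 4 byte load from 0x100.
loaded = merge_load_bytes(0x100, 4, 0x102, bytes([1, 2, 3, 4]), cache)
```

In hardware, the per-byte selection in the loop corresponds to a multiplexer for each byte of the load plus the comparison logic that controls it, which gives some sense of why such a design may be complex and expensive to implement.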