Modern processors reorder memory operations to increase performance. Such reordering is useful because the operands of different memory operations (addresses for loads and stores, and store data for stores) become available at different times. Executing memory instructions strictly in program order would reduce performance and utilization, because the operands of a younger operation may be available earlier than those of an older operation.
Such reordering, however, is not always safe. For example, if the machine reorders a younger load to location X ahead of an older store to location X, the load will produce the wrong value, and the machine will operate incorrectly unless it detects such a condition and initiates recovery actions which ultimately result in the younger load effectively executing after the older store.
Some machines reorder operations only after their addresses are known, thereby preventing unsafe reorderings, but reducing throughput and performance if the address operands arrive late. For example, a single store with an unresolved address can prevent many younger loads and stores from executing early, even though the likelihood of an actual address conflict is often very low.
The memory reordering unit (MRU) detects conflicts between loads and stores that have been reordered by the machine and initiates recovery action when an unsafe reordering is detected. The MRU is informed of the program order (or ordering constraints) of the loads and stores and, on a violation of order, initiates recovery. The MRU is either a part of or coupled to a load-store unit, but is not itself involved in the data portion of the execution of loads and stores. It uses only addresses and sizes, to detect overlap conditions that may constitute a violation of order.
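The address-and-size overlap check described above can be illustrated with a minimal software sketch. It is not the hardware design: the names MemOp, overlaps, and ReorderTracker are hypothetical, sequence numbers stand in for program order, and recovery is reduced to returning the list of younger loads that would have to be re-executed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    seq: int        # program-order sequence number (smaller means older); assumed encoding
    is_store: bool
    addr: int       # byte address of the access
    size: int       # access size in bytes


def overlaps(a: MemOp, b: MemOp) -> bool:
    """True if the byte ranges [addr, addr + size) of a and b intersect."""
    return a.addr < b.addr + b.size and b.addr < a.addr + a.size


class ReorderTracker:
    """Hypothetical model: remembers executed loads and checks each store against them."""

    def __init__(self) -> None:
        self.executed_loads: list[MemOp] = []

    def execute(self, op: MemOp) -> list[MemOp]:
        """Record op as executed; return the younger loads it invalidates, if any."""
        if not op.is_store:
            self.executed_loads.append(op)
            return []
        # A store conflicts with any already-executed load that is younger in
        # program order and touches at least one of the same bytes.
        return [ld for ld in self.executed_loads
                if ld.seq > op.seq and overlaps(ld, op)]


mru = ReorderTracker()
mru.execute(MemOp(seq=2, is_store=False, addr=0x100, size=4))  # younger load runs early
mru.execute(MemOp(seq=3, is_store=False, addr=0x200, size=4))  # unrelated younger load
violations = mru.execute(MemOp(seq=1, is_store=True, addr=0x102, size=4))
print([ld.seq for ld in violations])
```

In this sketch only the load with sequence number 2 is flagged, since it is younger than the store and shares bytes 0x102 and 0x103 with it. Note that no data values are consulted, only addresses and sizes.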
In addition to such local violations of order (where a single-threaded program would operate incorrectly), depending on the architecture being implemented, there can also be violations of global order. These arise where the consistency model for the architecture requires that certain operations not be reordered by CPU cores in a way that other agents (e.g., a device performing Direct Memory Access (DMA), or Central Processing Unit (CPU) cores running other threads of a multi-threaded program) can observe.
The MRU can also detect such violations (potential or actual) of global order if snoops are sent to the MRU as well as to the caches. The MRU can effectively implement Fray's algorithm or variants as required by the consistency model for the architecture.
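One common way a snoop can expose a potential global-order violation is sketched below. This is an illustration under stated assumptions, not a rendering of the algorithm named above: the names LoadEntry and loads_to_replay are hypothetical, the 64-byte line size is assumed, loads are assumed not to cross a cache line, and the rule modeled is simply that a load which executed ahead of an older, still-pending load must be replayed if its line is snooped.

```python
from dataclasses import dataclass

LINE_SIZE = 64  # assumed cache-line size in bytes


@dataclass
class LoadEntry:
    seq: int        # program-order sequence number (smaller means older)
    addr: int       # byte address of the load
    executed: bool  # True once the load has obtained its data


def loads_to_replay(loads: list[LoadEntry], snoop_addr: int) -> list[int]:
    """Return sequence numbers of loads that a snoop makes potentially mis-ordered."""
    pending = [ld.seq for ld in loads if not ld.executed]
    if not pending:
        return []  # nothing executed ahead of an older load, so no reordering to observe
    oldest_pending = min(pending)
    snoop_line = snoop_addr // LINE_SIZE
    # An executed load is at risk only if it ran ahead of an older pending load
    # and the snooped line is the one it read from.
    return [ld.seq for ld in loads
            if ld.executed
            and ld.seq > oldest_pending
            and ld.addr // LINE_SIZE == snoop_line]


window = [
    LoadEntry(seq=1, addr=0x1000, executed=False),  # older load still waiting
    LoadEntry(seq=2, addr=0x2000, executed=True),   # executed early
    LoadEntry(seq=3, addr=0x2040, executed=True),   # executed early, different line
]
print(loads_to_replay(window, snoop_addr=0x2008))
```

The check is deliberately conservative: it flags a potential violation without knowing whether the other agent actually changed the value that the early load observed.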
Typically such a reordering unit is based on physical addresses, because two different memory operations can use different virtual addresses that resolve to the same physical address. If only virtual addresses are compared, two memory instructions (e.g., a load and a store) may appear not to overlap because their virtual addresses differ, even though both actually access the same location because their physical addresses are the same. Physical addresses, however, are known later than virtual addresses, making some overlap comparisons more difficult or introducing additional recovery situations.
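The aliasing case can be made concrete with a small sketch. The page size, the dictionary-based page table, and the helper names translate and ranges_overlap are assumptions chosen for illustration only. Two virtual pages are mapped to the same physical page, so a comparison on virtual addresses misses an overlap that a comparison on physical addresses finds.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(page_table: dict[int, int], vaddr: int) -> int:
    """Map a virtual address to a physical address via a toy page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[vpn] * PAGE_SIZE + offset


def ranges_overlap(a: int, a_size: int, b: int, b_size: int) -> bool:
    """True if [a, a + a_size) and [b, b + b_size) share at least one byte."""
    return a < b + b_size and b < a + a_size


# Virtual pages 0x1 and 0x5 both map to physical page 0x9 (a virtual alias).
page_table = {0x1: 0x9, 0x5: 0x9}

store_va, load_va, size = 0x1010, 0x5010, 8

virtual_hit = ranges_overlap(store_va, size, load_va, size)
physical_hit = ranges_overlap(translate(page_table, store_va), size,
                              translate(page_table, load_va), size)

print(virtual_hit, physical_hit)
```

Here the virtual comparison reports no overlap while the physical comparison reports one, which is exactly the case a unit based on virtual addresses would need to catch by some other means.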
It would be advantageous to use virtual addresses in the memory reordering unit and somehow handle the virtual address alias problem (two virtual addresses mapping to the same physical address) and the global ordering problem (if it matters for the architecture) by some other means.
Virtual aliases are very rarely used in close proximity, so the likelihood that a comparison based on virtual addresses fails to detect an unsafe reordering is low. Correctness nevertheless demands that every such case be detected.