Multi-core processors are found in most computing segments today, including servers, desktops and systems on a chip (SoCs). The move to multi-core processor systems necessitates the development of parallel programs that can take advantage of the available performance. Programming a multi-core processor system, however, is a complex task because software execution on these systems is non-deterministic. This non-determinism has many causes, including the multitude of ways in which the different threads of execution can interleave in shared memory, which makes a program execution difficult to reproduce and to understand.
Computer systems execute instructions from a variety of code. Often, the code is not designed for a particular processor, and the code's performance on a given platform can suffer. Effective optimizations can improve performance and reduce power consumption. A great deal of work has gone into developing optimization techniques. Unfortunately, these techniques have been applied only with a limited optimization scope. In addition, the complicated memory models of modern processors hinder memory operations for multi-threaded programs.
The problem of memory disambiguation arises when two memory operations are reordered with respect to each other without knowing whether their addresses alias, and at least one of the operations is a write. Such a reordering can introduce hazards that were absent from the original program. Any optimization that allows such a reordering therefore has to detect aliases at runtime and recover accordingly. Any detection mechanism in turn needs two pieces of information: (1) whether a memory operation has been reordered with respect to the other memory operations in some original order, and (2) whether the dynamic addresses of the memory operations are the same. Various implementations and proposals exist, depending on what kind of reordering is done (loads only, or loads and stores), where the reordering occurs (in hardware or in software), and whether instructions in the machine are dynamically or statically scheduled.
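The two pieces of information named above can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `MemOp` and `find_hazards` are hypothetical, every access is treated as touching a single address, and no particular hardware or binary-translation scheme is being reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemOp:
    """Hypothetical record of one memory operation (illustration only)."""
    kind: str  # "load" or "store"
    addr: int  # dynamic address observed at runtime
    seq: int   # position of the operation in the original program order


def find_hazards(executed):
    """Return pairs of operations that form a reordering hazard.

    `executed` lists the operations in the order they actually ran.
    A pair is a hazard when it was reordered relative to the original
    order, both operations hit the same address, and at least one of
    them is a write.
    """
    hazards = []
    for i, first in enumerate(executed):
        for second in executed[i + 1:]:
            reordered = first.seq > second.seq        # information (1)
            same_addr = first.addr == second.addr     # information (2)
            has_write = "store" in (first.kind, second.kind)
            if reordered and same_addr and has_write:
                hazards.append((first, second))
    return hazards


if __name__ == "__main__":
    store = MemOp("store", 0x100, seq=0)
    load = MemOp("load", 0x100, seq=1)
    # The load was hoisted above the store it originally followed.
    print(find_hazards([load, store]))
```

In this model, a load hoisted above a store to a different address, or two reordered loads to the same address, produce no hazard; only the combination of reordering, aliasing, and a write does.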
Statically scheduled code (VLIW) uses checked loads. A load is moved above one or more stores, and a check-load instruction is inserted at the original location of the load to check whether the load has any aliases. Some binary translation techniques use instruction prefixes to specify whether a given memory instruction needs to set its address in an alias disambiguation buffer or needs to check its address against other addresses. Another approach is to use regular ISA instructions to perform the check, which essentially verifies that the two address ranges do not overlap. This check is done for the logical address only, so some other mechanism needs to check for physical page-level aliasing. All of these schemes communicate the two pieces of information in different ways, with varying costs, overheads and capabilities.
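The checked-load idea and the address-range test described above can be sketched in software as follows. The class `CheckedLoadMachine`, its methods, and the function `ranges_overlap` are hypothetical names introduced for illustration; the model compares logical addresses only, as the text notes, and does not attempt to model physical page-level aliasing or any specific instruction set.

```python
def ranges_overlap(addr_a, size_a, addr_b, size_b):
    """True if [addr_a, addr_a + size_a) and [addr_b, addr_b + size_b) share a byte."""
    return addr_a < addr_b + size_b and addr_b < addr_a + size_a


class CheckedLoadMachine:
    """Toy model of a load hoisted above stores plus a later check (assumption)."""

    def __init__(self, memory):
        # Simplification: one value per base address; sizes are used
        # only for alias tracking, not for the stored data.
        self.memory = dict(memory)
        self.tracked = {}  # register name -> (addr, size) of a hoisted load

    def advanced_load(self, reg, addr, size=1):
        """Perform the hoisted load and remember its address range."""
        self.tracked[reg] = (addr, size)
        return self.memory.get(addr, 0)

    def store(self, addr, value, size=1):
        """Write memory and drop every tracked load that the store overlaps."""
        self.memory[addr] = value
        self.tracked = {
            reg: (a, s)
            for reg, (a, s) in self.tracked.items()
            if not ranges_overlap(a, s, addr, size)
        }

    def check_load(self, reg, addr, speculative_value):
        """Run at the load's original location.

        Returns (value, recovered). If the entry survived, no store
        aliased the load and the speculative value is kept; otherwise
        the load is redone as the recovery action.
        """
        if reg in self.tracked:
            del self.tracked[reg]
            return speculative_value, False
        return self.memory.get(addr, 0), True


if __name__ == "__main__":
    m = CheckedLoadMachine({0x100: 1, 0x200: 2})
    v = m.advanced_load("r1", 0x100)
    m.store(0x100, 7)                      # aliases the hoisted load
    print(m.check_load("r1", 0x100, v))    # recovery reloads the value
```

The design choice here is that stores do the comparison and the check only looks for a surviving entry; a scheme built from regular instructions would instead evaluate something like `ranges_overlap` explicitly at the check point.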