In multiprocessor systems, processors often have one or more layers of memory cache, which improve performance both by speeding access to data and by reducing traffic on the shared memory bus. However, while memory caches can greatly improve performance, they also present new challenges. For example, two processors that read the same memory location may obtain different values, because one processor may use a stale cached copy while the other fetches an updated value from main memory.

Furthermore, many compilers and computer architectures rewrite or reorder code to optimize execution. For example, a processor may reorder instructions to take advantage of data currently held in its cache. However, many of these optimizations preserve program semantics only when a single processor or thread is executing the program. As a result, in a multiprocessor or multithreaded environment, the reordering can produce unexpected behavior and inconsistent program states. For example, the architecture might perform a load or store earlier than its position in program order, whenever doing so is most convenient, provided that nothing in the same thread relies on the affected variable before the point at which the instruction originally appeared. However, when multiple threads or processors are involved, performing early an operation whose result another thread relies upon can produce a state that would be impossible if every thread executed its instructions in program order.
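The effect of performing an operation early can be illustrated with the well-known "store buffering" litmus test: thread 0 executes `x = 1; r0 = y` and thread 1 executes `y = 1; r1 = x`, with `x` and `y` initially 0. The Python sketch below is an illustrative model only, not a description of any particular processor. It assumes a single shared memory with no caches and models reordering simply as swapping a thread's two operations when they touch different variables (a swap that leaves each thread's own result unchanged). All names in it (`THREAD_0`, `interleavings`, `run`, `orders`, `outcomes`) are hypothetical. It enumerates every interleaving and collects the possible `(r0, r1)` results.

```python
from itertools import combinations

# Hypothetical litmus-test threads. Each operation is either
# ("store", variable, value) or ("load", variable, register_name).
THREAD_0 = [("store", "x", 1), ("load", "y", "r0")]
THREAD_1 = [("store", "y", 1), ("load", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's internal order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        merged, ia, ib = [], 0, 0
        for i in range(n):
            if i in chosen:
                merged.append(a[ia])
                ia += 1
            else:
                merged.append(b[ib])
                ib += 1
        yield merged


def run(schedule):
    """Execute one schedule against a single shared memory (no caches modeled)."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for kind, var, arg in schedule:
        if kind == "store":
            memory[var] = arg
        else:
            registers[arg] = memory[var]
    return registers["r0"], registers["r1"]


def orders(thread):
    """Program order, plus the swapped order when the two operations touch
    different variables (the swap does not change the thread's own result)."""
    yield thread
    first, second = thread
    if first[1] != second[1]:
        yield [second, first]


def outcomes(allow_reordering):
    """Collect all (r0, r1) results, optionally letting each thread reorder."""
    t0_orders = list(orders(THREAD_0)) if allow_reordering else [THREAD_0]
    t1_orders = list(orders(THREAD_1)) if allow_reordering else [THREAD_1]
    results = set()
    for t0 in t0_orders:
        for t1 in t1_orders:
            for schedule in interleavings(t0, t1):
                results.add(run(schedule))
    return results


if __name__ == "__main__":
    print(sorted(outcomes(allow_reordering=False)))  # [(0, 1), (1, 0), (1, 1)]
    print(sorted(outcomes(allow_reordering=True)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

When each thread executes in program order, at least one store precedes both loads in every interleaving, so `(0, 0)` never appears. Once each load may be performed before its thread's store, both loads can read the initial values. This yields `(0, 0)`, a state that no in-order interleaving can produce.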