In binary translation, several barriers hinder aggressive instruction scheduling and software pipelining. One such barrier is the memory ordering requirement of the guest architecture, which a source-to-binary compiler does not face. When binary translation is performed on code written for an instruction set architecture (ISA) with a strong memory order, the translated code cannot change the order of memory operations such as store to store, load to load, and load to store, even if there is no dependence between these memory operations. This prevents other processors from seeing memory change in a different order than in the original binary code. In the x86 ISA, for example, the memory ordering rules do not restrict moving a non-dependent load instruction ahead of a store operation, but they do restrict moving a non-dependent store instruction ahead of another store instruction.
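The reordering rule described above can be sketched as a small predicate. This is only an illustrative model, not an actual translator check: the `(kind, address)` tuple representation and the function name `may_hoist` are assumptions, and any pair of operations on the same address is conservatively treated as dependent.

```python
# Hypothetical model: a memory operation is a (kind, address) tuple,
# where kind is "load" or "store". Names here are illustrative only.

def may_hoist(earlier, later):
    """Return True if `later` may be moved ahead of `earlier`
    under the simplified strong-ordering rules described in the text."""
    kind_earlier, addr_earlier = earlier
    kind_later, addr_later = later

    # Assumption: operations on the same address are treated as dependent.
    if addr_earlier == addr_later:
        return False

    # The only relaxation: a non-dependent load may move ahead of an
    # earlier store. Store/store, load/load and load/store pairs keep
    # their original order.
    return kind_earlier == "store" and kind_later == "load"


if __name__ == "__main__":
    pairs = [
        (("store", "A"), ("load", "B")),   # load ahead of store
        (("store", "A"), ("store", "B")),  # store ahead of store
        (("load", "A"), ("load", "B")),    # load ahead of load
        (("load", "A"), ("store", "B")),   # store ahead of load
    ]
    for earlier, later in pairs:
        print(earlier, later, may_hoist(earlier, later))
```

Only the first pair is reported as reorderable; the other three keep their original order even though they touch different addresses.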
An atomic region formed between commit instructions or commit points allows load instructions and store instructions to be reordered inside the region. Some processors support transactional execution, in which a region of instructions is executed atomically and all memory state changed in the region becomes visible at once when a commit instruction is executed. Such transactional execution support allows memory instructions to be reordered inside the atomic region without other processors seeing memory change in a different order.
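The effect of committing a region atomically can be illustrated with a toy model. This is a sketch under simplifying assumptions: memory is a dictionary, an outside observer is modeled as the list of memory snapshots it could see, and the function names `run_in_atomic_region` and `run_without_region` are hypothetical. It does not model any particular processor's transactional hardware.

```python
def run_in_atomic_region(memory, stores):
    """Apply `stores` inside one atomic region and return the snapshots
    an outside observer could see (before the region, after the commit)."""
    visible = dict(memory)
    snapshots = [dict(visible)]      # state before the region starts

    write_buffer = {}
    for address, value in stores:
        write_buffer[address] = value  # buffered, not yet visible outside

    visible.update(write_buffer)     # commit: all changes appear at once
    snapshots.append(dict(visible))
    return snapshots


def run_without_region(memory, stores):
    """Apply `stores` one by one; every intermediate state is observable."""
    visible = dict(memory)
    snapshots = [dict(visible)]
    for address, value in stores:
        visible[address] = value
        snapshots.append(dict(visible))
    return snapshots


if __name__ == "__main__":
    initial = {"x": 0, "y": 0}
    original = [("x", 1), ("y", 2)]
    reordered = [("y", 2), ("x", 1)]

    # Inside an atomic region the reordering is invisible to an observer.
    print(run_in_atomic_region(initial, original)
          == run_in_atomic_region(initial, reordered))

    # Without the region, the observer can tell the two orders apart.
    print(run_without_region(initial, original)
          == run_without_region(initial, reordered))
```

In this model the two store orders produce identical observable snapshots when run inside the region, while without the region the intermediate state (`x` updated before `y`, or the reverse) exposes the reordering.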
Software pipelining, which is a very effective code optimization in a compiler, brings less benefit in the binary translation domain because of the strong memory order requirements. Software pipelining aggressively reorders instructions across different iterations. For example, a store instruction in a later iteration may need to be moved ahead of a store instruction or a load instruction in a previous iteration to obtain a highly compact kernel loop. However, this breaks the store-to-store (or load-to-store) ordering and cannot be done in a binary translator. A binary translator that follows these memory ordering constraints therefore obtains less benefit than the software pipelining algorithm of a compiler.
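The conflict between software pipelining and strong memory ordering can be made concrete with a small sketch. Everything here is an assumption for illustration: the loop body (a store to `b[i]`, a load from `a[i]`, a store to `c[i]`), the issue offsets, and the initiation interval are hypothetical, and the overlap is modeled simply by issuing operation `k` of iteration `i` at time `i * ii + offset_k`.

```python
def program_order(body, iterations):
    """Original order: all operations of iteration 0, then iteration 1, ..."""
    return [(kind, f"{name}[{i}]")
            for i in range(iterations)
            for kind, name, _offset in body]


def pipelined_order(body, ii, iterations):
    """Issue order when iterations overlap with initiation interval `ii`."""
    issued = []
    for i in range(iterations):
        for kind, name, offset in body:
            issued.append((i * ii + offset, kind, f"{name}[{i}]"))
    issued.sort(key=lambda op: op[0])  # order by issue time
    return [(kind, label) for _time, kind, label in issued]


def ordering_violations(original, scheduled):
    """List (hoisted_op, passed_op) pairs that break the strong ordering.
    Only a load moving ahead of an earlier store is tolerated."""
    position = {op: index for index, op in enumerate(scheduled)}
    violations = []
    for e, earlier in enumerate(original):
        for later in original[e + 1:]:
            hoisted = position[later] < position[earlier]
            allowed = earlier[0] == "store" and later[0] == "load"
            if hoisted and not allowed:
                violations.append((later[1], earlier[1]))
    return violations


if __name__ == "__main__":
    # (kind, array name, hypothetical issue offset within one iteration)
    body = [("store", "b", 0), ("load", "a", 1), ("store", "c", 5)]
    original = program_order(body, iterations=3)
    scheduled = pipelined_order(body, ii=3, iterations=3)
    print(scheduled)
    print(ordering_violations(original, scheduled))
```

With these assumed offsets, the store to `b[1]` is issued before the store to `c[0]`, and the store to `b[2]` before the store to `c[1]`. Both are store-to-store reorderings, so a binary translator bound by the guest ordering rules could not use this schedule, whereas the loads that move ahead of earlier stores are tolerated.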