The present invention relates to dynamic code translation, and more specifically, to translating code in the presence of intermixed code and data.
In computing, binary translation is the emulation of one instruction set by another through translation of code. Sequences of instructions are translated from the source instruction set to the target instruction set. A related and more familiar form of code translation is compilation, in which a program written in a high-level programming language is translated, for example by a compiler, into machine code for execution by a particular machine.
Static binary translation is a type of translation in which an entire executable file is translated into an executable for the target architecture. This is very difficult to do correctly because not all of the code can be discovered by the translator. For example, some parts of the executable may be reachable only through indirect branches whose targets are known only at run-time.
Alternatively, dynamic translation looks at a short sequence of code, typically on the order of a single basic block, translates it, and caches the resulting sequence. Code is translated only as it is discovered and, when possible, branch instructions are made to point to previously translated code.
Dynamic binary translation differs from simple emulation in that it eliminates the emulator's main read-decode-execute loop (a major performance bottleneck). Eliminating this loop comes at a cost, however, because the translation itself may incur a large overhead. This overhead is amortized when translated code sequences are executed multiple times.
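The amortization argument may be made concrete with a simple cost model. The following is a sketch under stated assumptions: it assumes a fixed one-time translation cost and fixed per-execution costs for emulated and translated code, and the function name and the example cost figures are hypothetical rather than measured.

```python
def break_even_executions(translate_cost: int,
                          emulate_cost: int,
                          translated_cost: int) -> int:
    """Smallest n for which translate_cost + n * translated_cost
    is no greater than n * emulate_cost (all costs in the same
    arbitrary unit, e.g. cycles)."""
    saving = emulate_cost - translated_cost
    if saving <= 0:
        raise ValueError("translated code must be cheaper per execution")
    # Integer ceiling of translate_cost / saving.
    return -(-translate_cost // saving)


# Assumed, illustrative figures: translation costs 1000 units, emulation
# costs 50 units per execution, translated code costs 5 units per execution.
n = break_even_executions(1000, 50, 5)
```

Under this model, a sequence that is executed fewer than n times never recovers the cost of its translation.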
In binary or dynamic translation of software code (particularly, machine code), situations may arise where the original code modifies itself. In such situations, any translations that have been made must be invalidated before the code is modified, and regenerated afterward if necessary. To detect such self-modification, it is necessary to trap all writes that access code locations. This can be done fairly easily using existing page table structures if code regions are completely separated from data regions, as is required in some of the newer coding conventions.
However, in older code, written when memory was scarce, program code (or instructions) may have been intermixed with the data on which it operates, within the same page or even within the same cache line. These cases result in false traps (i.e., traps that need not occur because the stores target the data area rather than the adjacent code areas that have been translated). Unless all locations that contain code are listed in some data structure at a fine granularity, it is possible that innocuous stores (i.e., stores that do not touch code locations) will still cause a trap and a needless flush of unaffected translations. Yet a listing at a byte or halfword granularity is impractical because of the potential size of the table involved, and hence the granularity of the listing is made coarser, often as coarse as a page (4096 bytes). The problem with this coarser granularity is a loss of accuracy in identifying whether the location being written into is indeed a code location.
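The trade-off between table size and accuracy may be illustrated numerically. The sketch below assumes a 4 GiB address space and one tracking bit per granule; both figures, and the function names table_bytes and store_traps, are assumptions made only for this illustration.

```python
from typing import List, Tuple


def table_bytes(address_space_bytes: int, granule_bytes: int) -> int:
    """Size in bytes of a bitmap holding one bit per granule."""
    entries = address_space_bytes // granule_bytes
    return (entries + 7) // 8


def store_traps(store_addr: int,
                code_ranges: List[Tuple[int, int]],
                granule_bytes: int) -> bool:
    """True if a store to `store_addr` falls in a granule marked as code.
    `code_ranges` holds half-open (start, end) byte ranges of translated code."""
    granule = store_addr // granule_bytes
    for start, end in code_ranges:
        if start // granule_bytes <= granule <= (end - 1) // granule_bytes:
            return True
    return False


ADDRESS_SPACE = 2 ** 32                 # assumed 4 GiB address space
code = [(0x1000, 0x1010)]               # 16 bytes of code at the start of a page
data_store = 0x1800                     # data located in the same 4096-byte page

byte_table = table_bytes(ADDRESS_SPACE, 1)       # 512 MiB
half_table = table_bytes(ADDRESS_SPACE, 2)       # 256 MiB
page_table = table_bytes(ADDRESS_SPACE, 4096)    # 128 KiB

false_trap = store_traps(data_store, code, 4096)  # traps although only data is written
exact = store_traps(data_store, code, 1)          # does not trap at byte granularity
```

Under these assumptions, the page-granularity table is 4096 times smaller than the byte-granularity table, but it reports the store to the data location as a write to code.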
Past solutions to this problem involve trapping whenever a region of memory that may contain code is written into and either invalidating all translations that involve code in that region, or adding prolog code to those translations to check whether the code has indeed been modified. The principal drawback to these solutions is the overhead involved and consequently the loss of efficiency in the performance of emulation or dynamic optimization. In the first case, trapping on a write and the process of invalidating and regenerating translations causes overhead which becomes prohibitive when there are a large number of writes to such regions, while in the second case the overhead of performing the comparison on every execution of the translated segment becomes high even when the writes are relatively infrequent.
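The two prior approaches described above may be contrasted in a minimal sketch. The Translation class, the functions invalidate_on_write and prolog_check, the page size constant, and the example addresses are hypothetical and serve only to show where each approach incurs its overhead.

```python
from typing import Dict

PAGE = 4096  # trap granularity assumed for this sketch


class Translation:
    """Records the source bytes from which a translation was generated."""

    def __init__(self, memory: bytearray, start: int, length: int) -> None:
        self.start = start
        self.length = length
        self.snapshot = bytes(memory[start:start + length])


def invalidate_on_write(cache: Dict[int, Translation], store_addr: int) -> int:
    """First approach: on a trapped store, discard every translation whose
    source code overlaps the written page. Returns the number discarded."""
    page = store_addr // PAGE
    victims = [key for key, t in cache.items()
               if t.start // PAGE <= page <= (t.start + t.length - 1) // PAGE]
    for key in victims:
        del cache[key]
    return len(victims)


def prolog_check(memory: bytearray, t: Translation) -> bool:
    """Second approach: prolog executed on every entry to the translation.
    Returns True if the source code is unchanged since translation."""
    return bytes(memory[t.start:t.start + t.length]) == t.snapshot


memory = bytearray(2 * PAGE)
memory[0x100:0x110] = bytes(range(16))        # 16 bytes of "code" in page 0
t = Translation(memory, 0x100, 16)
cache = {0x100: t}

memory[0x800] = 0xFF                          # innocuous data store, same page
still_valid = prolog_check(memory, t)         # code is in fact unchanged
flushed = invalidate_on_write(cache, 0x800)   # yet the translation is discarded

memory[0x104] = 0x99                          # a genuine self-modifying store
valid_after_modification = prolog_check(memory, t)
```

In the first approach the cost is incurred once per trapped write, including innocuous writes; in the second it is incurred once per execution of the translated segment, whether or not any write has occurred.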