The present disclosure pertains to the field of information processing, and more particularly, to the field of binary translation in an information processing system.
Binary translation emulates one instruction set on another by translating binary code: sequences of instructions are translated from a source instruction set to a target instruction set. Such translation may be implemented in hardware (e.g., by circuits in a processor) or in software (e.g., run-time engines, static recompilers, and emulators). Various software methods have been implemented to perform binary translation of return operations, which cause instruction execution to leave the current subroutine and resume at the point in the code immediately after the point where the subroutine was called (i.e., the return address). These methods include a fast lookup hash table; a return cache; a shadow stack; and inlining. However, none of the existing methods satisfies both the performance and the low memory footprint requirements when modern multi-threaded applications are targeted.
The fast lookup hash table and return cache methods use a simple hashing function to index a table, which minimizes the overhead of the prediction operation. As a result, both methods require the binary translation system to allocate at least a moderate amount of memory (e.g., 256 KB) per thread instance to yield high hit rates and a performance gain. This per-thread allocation significantly increases the total memory consumption of the binary translation system when running modern multi-threaded applications. For example, some web browsers create more than 80 threads. Thus, allocating a 256 KB return cache per thread consumes more than 20 MB of memory solely to improve the performance of return operations.
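The lookup and the memory arithmetic described above can be illustrated with a minimal sketch. It is not taken from any particular binary translation system: the class name `ReturnCache`, the 16-byte entry size, the hash function, and the helper `total_footprint_bytes` are all assumptions chosen only for illustration.

```python
KB = 1024
MB = 1024 * KB


class ReturnCache:
    """Illustrative direct-mapped, per-thread return cache (hypothetical design)."""

    def __init__(self, size_bytes=256 * KB, entry_bytes=16):
        # entry_bytes is an assumed size for one (source, target) address pair.
        self.size_bytes = size_bytes
        self.num_entries = size_bytes // entry_bytes
        self.table = [None] * self.num_entries

    def _index(self, source_addr):
        # Simple hash (assumed): fold the high bits in, then reduce to table size.
        return (source_addr ^ (source_addr >> 16)) % self.num_entries

    def record_call(self, source_ret_addr, target_ret_addr):
        # At a translated call: remember where the matching return should land.
        self.table[self._index(source_ret_addr)] = (source_ret_addr, target_ret_addr)

    def predict_return(self, source_ret_addr):
        # At a translated return: a hit requires the stored tag to match.
        entry = self.table[self._index(source_ret_addr)]
        if entry is not None and entry[0] == source_ret_addr:
            return entry[1]
        return None  # miss: the caller falls back to a slower lookup


def total_footprint_bytes(num_threads, per_thread_bytes=256 * KB):
    # One private cache per thread, so the cost scales linearly with thread count.
    return num_threads * per_thread_bytes
```

With these assumed sizes, `total_footprint_bytes(80)` evaluates to exactly 20 MB, so any application that creates more than 80 threads exceeds that figure.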
This higher memory consumption limits the applicability of the fast lookup hash table and return cache methods to modern, highly multi-threaded applications. Specifically, a larger buffer footprint degrades performance by lowering data cache hit rates when the buffer is accessed. Sharing a single return cache among multiple threads is not desirable either, because it introduces cache pollution and thread-synchronization issues, both of which negatively impact performance.
A shadow stack approach, which allocates a dedicated hidden stack for the binary translation system to track and predict the return address of the translated target, may provide both high predictability and a small memory footprint. However, on a host instruction set architecture (ISA) with a small register set (e.g., x86), the shadow stack approach suffers from higher performance overhead due to the high runtime maintenance cost of the shadow stack operations, including the extra register spill and fill operations needed to manage the shadow stack pointer and to implement the “push” and “pop” operations. Thus, the shadow stack approach provides little benefit for improving the performance of the binary translation system.
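The “push” and “pop” operations referred to above can be sketched as follows. This is a simplified model under stated assumptions: the class `ShadowStack`, its fixed `capacity`, and the policy of discarding the oldest entry on overflow are hypothetical. A high-level sketch also cannot show the register spill and fill cost, which arises only when the stack pointer must be kept in a scarce host register.

```python
class ShadowStack:
    """Illustrative hidden stack of (source, target) return-address pairs."""

    def __init__(self, capacity=1024):
        # capacity is an assumed bound; a real system would size this differently.
        self.capacity = capacity
        self.entries = []

    def push(self, source_ret_addr, target_ret_addr):
        # At a translated call: record the expected return pair.
        if len(self.entries) == self.capacity:
            self.entries.pop(0)  # assumed overflow policy: drop the oldest entry
        self.entries.append((source_ret_addr, target_ret_addr))

    def pop_and_predict(self, source_ret_addr):
        # At a translated return: pop the top entry and check that it matches
        # the return address actually taken by the source program.
        if not self.entries:
            return None
        expected_source, target = self.entries.pop()
        if expected_source == source_ret_addr:
            return target
        return None  # misprediction: fall back to a slower lookup
```

Because calls and returns are normally nested, the top entry usually matches, which is why the approach predicts well while storing only one pair per active call.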
Inlining generally provides high prediction accuracy with the lowest overhead, but may suffer from significant code bloat. As a result, this method is used only for hot code paths that are known to be frequently executed, and only when the callee function is relatively small. Otherwise, code expansion negatively impacts performance by lowering instruction cache (I-cache) hit rates and increasing the memory footprint.