Interpreters are a special class of programs that interpret instructions (e.g., opcodes, bytecodes, operators) that differ from the native instruction set of the machine on which the interpreter is executing. An interpreter generally comprises a program or other logic configured to receive the code to be executed (e.g., a code file) and translate the non-native computer instructions, typically written in a high-level programming language, into native computer instructions.
Many interpreters are configured with an “inner loop” that is typically performed on each instruction or operator in the code to be executed via the interpreter. The inner loop consists of a fetch cycle, a decode cycle, and an execution cycle. The fetch cycle involves fetching the next instruction or operator in the code. In the decode cycle, the fetched interpreted operator is mapped to a series of native instructions that implement it. These native instructions are typically organized as a collection of operator functions or subroutines, with a one-to-one correspondence between subroutine and interpreted operator. The decode cycle determines, based on the interpreted opcode, which of the subroutines is to be executed in the execution cycle. The execution cycle involves executing the selected subroutine as native machine code. At the end of each iteration, a branch operator is executed to transfer control back to the fetch cycle. Because this loop is executed once per operator or instruction, its overhead has a large effect on the overall performance of the interpreter.
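By way of illustration only, the inner loop described above may be sketched in C as follows. The three-operator bytecode (OP_PUSH, OP_ADD, OP_HALT), the VM structure, and the function names are hypothetical and were invented for this sketch; they are not taken from any particular interpreter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-operator bytecode, invented for illustration. */
enum { OP_PUSH, OP_ADD, OP_HALT, OP_COUNT };

typedef struct {
    const int *code;   /* interpreted program */
    size_t pc;         /* index of the next operator to fetch */
    int stack[16];     /* small fixed-size operand stack */
    size_t sp;         /* number of values currently on the stack */
    int running;       /* cleared by OP_HALT */
} VM;

/* One native subroutine per interpreted operator. */
static void op_push(VM *vm) {
    assert(vm->sp < 16);
    vm->stack[vm->sp++] = vm->code[vm->pc++];  /* operand follows the opcode */
}

static void op_add(VM *vm) {
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
}

static void op_halt(VM *vm) {
    vm->running = 0;
}

typedef void (*OpFn)(VM *);

/* One-to-one mapping from opcode to operator subroutine. */
static const OpFn op_table[OP_COUNT] = { op_push, op_add, op_halt };

/* Runs the program until OP_HALT and returns the top of the stack. */
int run(const int *code) {
    VM vm = { .code = code, .pc = 0, .sp = 0, .running = 1 };
    while (vm.running) {                  /* branch back to the fetch cycle */
        int opcode = vm.code[vm.pc++];    /* fetch cycle */
        assert(opcode >= 0 && opcode < OP_COUNT);
        OpFn fn = op_table[opcode];       /* decode cycle: select the subroutine */
        fn(&vm);                          /* execution cycle: indirect call */
    }
    assert(vm.sp > 0);
    return vm.stack[vm.sp - 1];
}
```

In this sketch, calling run on the program { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_HALT } performs four iterations of the loop and returns 5. Each iteration performs one table lookup and one indirect call regardless of how little work the selected subroutine does.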
In some interpreter implementations, a significant loss of performance may result from a pipeline stall that occurs at the point where the inner loop enters the execution cycle. For example, the actual call or switch to the machine code that implements the interpreted operator may cause a branch prediction failure because the native processor is unable to predict the destination address of the machine code implementing the interpreted operator. The branch prediction failure may require the instruction pipeline to be flushed and reloaded, which consumes additional processor clock cycles. The additional clock cycles may significantly reduce the overall performance of the interpreter. Furthermore, if the operator being called is relatively short (in terms of clock cycles), the additional clock cycles may be a significant part of the entire operator execution time.
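The relationship between operator length and stall overhead may be made concrete with a short calculation, sketched here in C. The function name stall_fraction and its parameters are hypothetical, and any cycle counts supplied to it are assumed values for illustration rather than measurements of a particular processor.

```c
#include <assert.h>

/* Fraction of per-operator time lost to the dispatch stall.
 *   op_cycles      cycles of useful work in the operator subroutine
 *   penalty_cycles cycles lost when a dispatch is mispredicted
 *   miss_rate      fraction of dispatches mispredicted, in [0, 1]
 * All inputs are hypothetical values chosen by the caller. */
double stall_fraction(double op_cycles, double penalty_cycles, double miss_rate) {
    assert(op_cycles > 0.0 && penalty_cycles >= 0.0);
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    double expected_stall = penalty_cycles * miss_rate;   /* average stall per dispatch */
    return expected_stall / (op_cycles + expected_stall);
}
```

Assuming, for example, a 15-cycle penalty on every dispatch, a 5-cycle operator would spend 15 / (5 + 15) = 75% of its time in the stall, whereas a 500-cycle operator would spend under 3%.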
Despite the many advantages and the commercial success of interpreters, there remains a need in the art for ways to reduce the overhead of the inner loop and make available more processing cycles that may be applied to the operator execution for a corresponding increase in performance.