1. Field
The following description relates to a bytecode interpreter in a computing system, and more particularly, to a branch processor and method capable of increasing the performance of a bytecode interpreter by reducing the branch mispredictions and pipeline stall penalties that may occur in a bytecode interpreter that drives a virtual machine.
2. Description of the Related Art
Numerous studies have sought to improve the performance of virtual machines that process JAVA® bytecode. One way to enhance virtual machine performance is just-in-time compilation (JITC).
However, it is difficult for an embedded system to make extensive use of JITC because of its limited resources and because compilation latency is noticeable to the user of the device. As a result, embedded systems typically use JITC only to a limited extent. In addition, because not all of an application program's code is compiled by the JITC, the performance of the interpreter remains critically important.
The direct-thread method is an alternative way to improve interpreter performance. In this method, the next virtual instruction is fetched at the end of each bytecode handler, and control jumps from there to the handler for that instruction. This method has recently been employed in the ANDROID® DALVIK® virtual machine. Another alternative is ARM JAZELLE® DBX (Direct Bytecode eXecution), which processes bytecode directly in hardware.
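To make this dispatch pattern concrete, the following is a minimal sketch in C of an interpreter in which every handler ends by fetching the next virtual instruction and jumping to its handler. The opcodes, the toy stack machine, and the function name `run_threaded` are hypothetical and do not correspond to DALVIK® or JAVA® bytecode encodings. The sketch relies on the GCC/Clang "labels as values" extension (`&&label`, `goto *p`). Because it still looks up each handler address in a table, some literature calls this variant token threading rather than direct threading in the strictest sense.

```c
#include <stdint.h>

/* Hypothetical opcodes for illustration only (not Dalvik or JVM encodings). */
enum { OP_PUSH = 0, OP_ADD = 1, OP_MUL = 2, OP_HALT = 3 };

/* Runs a tiny stack program. Requires the GCC/Clang labels-as-values
 * extension. No bounds checking: the program is assumed to be well formed. */
int64_t run_threaded(const uint8_t *code)
{
    /* Handler addresses, indexed by opcode value. */
    static void *const handlers[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    int64_t stack[64];
    int sp = 0;                 /* number of values currently on the stack */
    const uint8_t *pc = code;   /* virtual program counter */

/* Fetch the next opcode and jump straight to its handler. Every handler
 * ends with its own copy of this indirect jump. */
#define DISPATCH() goto *handlers[*pc++]

    DISPATCH();

do_push:
    stack[sp++] = *pc++;        /* one-byte immediate operand */
    DISPATCH();
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    DISPATCH();
do_mul:
    sp--;
    stack[sp - 1] *= stack[sp];
    DISPATCH();
do_halt:
    return stack[sp - 1];
#undef DISPATCH
}
```

For example, the program `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}` computes (2 + 3) * 4, and `run_threaded` returns 20. Each handler ends in its own `goto *`, so the physical processor sees one indirect branch per handler, and each of those branches may jump to any handler depending on the next virtual instruction.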
A disadvantage of the direct-thread method is that its indirect branch instructions may confuse the branch predictor of the physical processor (e.g., x86 or ARM), increasing the number of branch mispredictions and degrading performance.
In a processor with a typical pipeline structure, when a branch is mispredicted, all speculatively executed instructions are discarded and the processor returns to the state at which the branch started. The more speculative work a processor has in flight, the more is lost on each misprediction. For example, if a high-end superscalar processor such as the ARM CORTEX® A9 is used in an embedded device, the performance loss due to branch misprediction may increase.
An indirect branch instruction used in the direct-thread method sits at a single program counter (PC) yet jumps to the addresses of different handlers depending on the next virtual instruction; consequently, a conventional branch predictor that predicts targets based on the PC may not work properly. Various methods, including selective inlining and context threading, have been introduced to address this drawback, but they increase code size and incur call/return overhead.
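The effect on a PC-based predictor can be illustrated with a toy model. The sketch below assumes a simplified predictor that remembers only the last target seen at each branch PC (a stand-in for a real branch target buffer, which has finite size, tags, and often more elaborate history) and counts mispredictions over a sequence of executed opcodes. Each dispatch branch is identified by the opcode whose handler it ends, standing in for that branch's PC. The names `count_mispredictions` and `NUM_OPS` are hypothetical.

```c
#include <stddef.h>

#define NUM_OPS 4  /* hypothetical number of opcodes; trace values must be in [0, NUM_OPS) */

/* Counts dispatch mispredictions for a trace of executed opcodes, assuming a
 * predictor that stores one predicted target per branch site (per PC). */
size_t count_mispredictions(const int *trace, size_t n)
{
    int btb[NUM_OPS];   /* predicted target for each branch site; -1 means empty */
    size_t misses = 0;

    for (int i = 0; i < NUM_OPS; i++)
        btb[i] = -1;

    /* The dispatch branch at the end of handler trace[i - 1] jumps to trace[i]. */
    for (size_t i = 1; i < n; i++) {
        int site = trace[i - 1];   /* stands in for the branch instruction's PC */
        int target = trace[i];     /* handler actually jumped to */
        if (btb[site] != target)
            misses++;              /* predicted target was wrong or absent */
        btb[site] = target;        /* remember the most recent target */
    }
    return misses;
}
```

For the trace `{0, 1, 0, 2, 0, 1, 0, 2}`, the branch at the end of handler 0 alternates between handlers 1 and 2, so it mispredicts on every execution, and the function reports 6 mispredictions out of 7 dispatches. For `{0, 1, 0, 1, 0, 1}`, each branch site always jumps to the same handler, and only the 2 initial (cold) dispatches mispredict.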
In addition, a hardware implementation such as JAZELLE® DBX offers high performance but requires substantial hardware resources and cannot handle new types of bytecode, such as DALVIK® bytecode.