Computer programming languages are used to create applications consisting of human-readable source code that represents instructions for a computer to perform. Before a computer can follow the instructions, however, the source code must be translated into computer-readable binary machine code.
A programming language such as C, C++, or COBOL typically uses a compiler to generate assembly language from the source code and then to translate the assembly language into machine language, which is converted to machine code. Thus, the final translation of the source code occurs before runtime. Different computers require different machine languages, so a program written in C++, for example, can run only on the specific hardware platform for which the program was compiled.
Interpreted programming languages are designed to create applications with source code that will run on multiple hardware platforms. Java™ is an interpreted programming language that accomplishes platform independence by generating source code that is converted before runtime to an intermediate language known as “bytecode” or “virtual machine language.” At runtime, the bytecode is translated into platform-appropriate machine code via interpreter software, as disclosed in U.S. Pat. No. 4,443,865. To interpret each bytecode, interpreter software performs a “fetch, decode, and dispatch” (FDD) series of operations. For each bytecode instruction, the interpreter software contains a corresponding execution program expressed in native central processing unit (CPU) instructions. The interpreter software causes the CPU to fetch (read) a virtual machine instruction from memory, to decode the CPU address of the execution program for that bytecode instruction, and to dispatch by transferring control of the CPU to that execution program. Because every bytecode incurs all three steps, the interpretation process can be time-consuming.
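The fetch, decode, and dispatch series described above can be illustrated with a minimal sketch. The three-opcode stack machine below is hypothetical: the opcode numbers, the handler names, and the dictionary-based machine state are assumptions made for illustration and are not taken from any of the cited disclosures.

```python
# Hypothetical three-instruction stack machine; opcodes and names are illustrative only.
PUSH, ADD, HALT = 0, 1, 2

def op_push(vm):
    # The operand follows the opcode in the code stream.
    vm["stack"].append(vm["code"][vm["pc"]])
    vm["pc"] += 1

def op_add(vm):
    b = vm["stack"].pop()
    a = vm["stack"].pop()
    vm["stack"].append(a + b)

def op_halt(vm):
    vm["running"] = False

# Decode table: maps each opcode to its execution routine.
HANDLERS = {PUSH: op_push, ADD: op_add, HALT: op_halt}

def run(code):
    vm = {"code": code, "pc": 0, "stack": [], "running": True}
    while vm["running"]:
        opcode = vm["code"][vm["pc"]]   # fetch the next virtual machine instruction
        vm["pc"] += 1
        handler = HANDLERS[opcode]      # decode: locate the execution routine
        handler(vm)                     # dispatch: transfer control to that routine
    return vm["stack"]
```

Calling `run([PUSH, 2, PUSH, 3, ADD, HALT])` goes through the loop four times, once per instruction, which shows why the per-instruction overhead of the three steps adds up.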
As disclosed in PCT Patent Application No. WO9918484, adding a preprocessor (a virtual machine interpreter (VMI)) between a memory and a CPU improves the processing of virtual machine instructions. In essence, the virtual machine is not a physical structure, but rather is a self-contained operating environment that interprets bytecode for the hardware platform by selecting the corresponding native machine language instructions that are stored within the VM or in the CPU. The native instructions are then supplied to and consecutively executed in the CPU of the hardware platform. A typical virtual machine requires 20–60 cycles of processing time per bytecode (depending on the quality and complexity of the bytecode) to perform an FDD series of operations. A VMI performs the corresponding steps as follows. First, the VMI reads (fetches) a bytecode from memory. Next, the VMI looks up a number of properties of (decodes) the fetched bytecode. The properties accessed by the VMI determine how the bytecode will be processed into native instructions for execution in the CPU. While the CPU is executing an instruction, the VMI fetches and processes the next bytecode into CPU instructions. The VMI can process simple bytecodes in 1–4 cycles.
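The lookup step, in which the properties of a bytecode determine the native instructions generated for it, might be pictured as a table-driven expansion. The sketch below is an illustration only: `iconst_1` and `iadd` are real Java™ bytecode mnemonics, but the native instruction strings on the right are invented placeholders, and the table and function names are assumptions rather than the structure of any actual VMI.

```python
# Hypothetical translation table: bytecode mnemonic -> native instruction templates.
# The strings on the right are invented placeholders, not a real instruction set.
TRANSLATION = {
    "iconst_1": ["push #1"],
    "iadd": ["pop r1", "pop r0", "add r0, r0, r1", "push r0"],
}

def translate(bytecodes):
    """Expand each bytecode into its native sequence, in program order."""
    native = []
    for bytecode in bytecodes:
        native.extend(TRANSLATION[bytecode])  # decode by table lookup
    return native
```

In this sketch, `translate(["iconst_1", "iconst_1", "iadd"])` yields six placeholder native instructions that a CPU would then execute consecutively.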
While interpreting a sequence of bytecodes, a virtual machine may encounter a bytecode that represents a conditional branch instruction, hereinafter referred to as a CBI. When a CBI is encountered, the VMI generates a sequence of native instructions that causes the CPU to determine whether the condition is fulfilled. The decision to execute the branch therefore depends on earlier computations, which in the VMI concept were executed in the CPU with the results remaining in CPU registers. For example, the Java™ bytecode “ifeq n” offsets the bytecode counter by “n”, but only if the top of the stack is zero (i.e., the previous computation left the value 0 on the stack). The value of the branch condition (here, the top of the stack) must be retrieved and written to the control register of the VMI (which is reserved specifically for branch conditions). If the condition has been fulfilled, the CBI causes an update to the VMI bytecode counter (a jump), which alters the sequence of bytecodes to be executed. Typically, when one instruction is being processed in the VMI, the next instructions to be processed are already in the VMI pipeline, so if an instruction results in a branch, the bytecodes already in the VMI pipeline must be flushed. Additionally, the “pipelined” structure of processor hardware creates an inherent delay between the instant that instructions and/or data are dispatched to the processor and the instant when the processor effectively executes the instructions and/or processes the data. Specifically, because the typical CPU has a multistage pipeline (typically, 3 to 8 stages), the write to the VMI control register will not be executed immediately after the instruction is issued. In the case of a CBI, additional delay occurs while the CPU determines whether the condition is fulfilled and transfers the result of this determination to the VMI.
If the value of the branch condition (the control value) indicates that the branch condition is fulfilled, several instructions (the number depending on the size of the CPU pipeline) will already have entered the CPU pipeline. To keep the CPU and instruction cache busy, a series of “no operation” (NOP) commands can be generated while waiting for the control value that indicates whether the condition is fulfilled. The control value is received while the CPU executes the next-to-last NOP and the VMI generates the last NOP. After the determination is made, the VMI requires several cycles to retrieve the bytecode representing the next instruction from its cache.
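The effect of “ifeq n” on the bytecode counter can be sketched as follows. This is a minimal illustration, not the VMI mechanism itself: the function name and the list-based stack are assumptions, and the default instruction length of 3 reflects the Java™ encoding of `ifeq` as one opcode byte followed by a two-byte offset.

```python
def ifeq(stack, pc, offset, length=3):
    """Return the next bytecode counter after an `ifeq offset` located at pc."""
    value = stack.pop()          # the branch condition is consumed from the top of the stack
    if value == 0:
        return pc + offset       # condition fulfilled: jump relative to the ifeq itself
    return pc + length           # condition not fulfilled: fall through to the next bytecode
```

With the counter at 10 and an offset of 7, a zero on top of the stack gives a next counter of 17, while any nonzero value gives 13. The next counter cannot be known until the value on the stack is known, which is the dependency that delays the VMI.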
Other approaches speculatively execute potential branch instructions by predicting whether an instruction will result in a branch to another location. An example of this approach is directed to RISC (Reduced Instruction Set Computing) microprocessors, and provides a branch instruction bit to determine which conditional branches are “easy” to predict, and for those branches, uses software branch prediction to determine whether to execute the jump. Software branch prediction predicts branches using a software-controlled prediction bit. If the branch is determined to be “hard” to predict, the branch is predicted using hardware branch prediction (such as a branch prediction array). This approach uses a branch prediction scheme that predicts that a branch will be taken if the offset is less than zero (a backward branch) and that a branch will not be taken if the offset is greater than zero (a forward branch). A disadvantage of this approach is the consumption of processor resources for making and updating the ease-of-prediction determination, which is based upon whether the history of the branch is important in determining whether the branch will be taken.
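The offset-based scheme described above (backward branches predicted taken, forward branches predicted not taken) reduces to a sign test. The sketch below is illustrative only: the function names are hypothetical, and treating a zero offset as “not taken” is an assumption, since the description covers only offsets less than or greater than zero.

```python
def predict_taken(offset):
    """Static prediction: a backward branch (negative offset) is predicted taken."""
    # Assumption: a zero offset, which the scheme does not address, is predicted not taken.
    return offset < 0

def accuracy(offset, outcomes):
    """Fraction of actual outcomes (True = taken) that match the static prediction."""
    prediction = predict_taken(offset)
    hits = sum(1 for taken in outcomes if taken == prediction)
    return hits / len(outcomes)
```

For a loop-closing backward branch that is taken nine times and then falls through once, `accuracy(-12, [True] * 9 + [False])` evaluates to 0.9, which suggests why the heuristic suits loops even though it keeps no history.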
In another branch prediction approach, bits from the address of the potential branch instruction are compared to bits concatenated from a local branch history table and a global history register. The result of the comparison is used to read a branch prediction table. A disadvantage of this approach is the consumption of resources required to perform the concatenation and comparison operations and to store and access the branch prediction table. Furthermore, the approach does not disclose a means of correcting mispredictions. A similar methodology is disclosed in U.S. Pat. No. 5,136,696, wherein a branch prediction is made by a branch cache based on the address of a potential branch instruction. According to that disclosure, where the prediction is wrong, the corresponding instruction is invalidated but is executed anyway, so that the branch cache can be updated with the correct prediction in case the same instruction is encountered again. The CPU pipeline is flushed during the same cycle as the branch cache update by invalidating all of the instructions in the first seven stages of the pipeline and loading the contents of a program counter register.
Conditional branches occur frequently (approximately 10% of all virtual machine instructions) and are process-intensive under existing approaches, which achieve high accuracy but consume processor resources. There is therefore a need for a system of interpreting programming languages that accurately and efficiently executes the instructions intended by conditional branch instruction bytecodes while increasing the processing speed.