1. Field of the Invention
Systems and methods consistent with the present invention relate to reduction of the execution time of bytecode in a Java virtual machine (JVM). More particularly, the present invention relates to a system and a method for reducing the bytecode execution time in the JVM, whereby the entire bytecode execution time can be reduced by keeping the number of accesses of a stack memory to a minimum when carrying out machine code operations.
2. Description of the Related Art
Java is an object-oriented programming language which has become the de facto standard in network programming. These days, Java is used in embedded systems and in systems including a microprocessor or a microcontroller. Characteristics of Java environments include object-orientation, automatic garbage collection and runtime security, and a part thereof can be successfully used in embedded applications.
However, the performance of the runtime environment has been poor because of the resource overhead required to execute Java code in a virtual machine by means of an interpreter or a just-in-time (JIT) compiler.
Java binary code, called “bytecode”, is distributed in one or more class files. Bytecode consists of instructions for a hypothetical computer that was specially designed for the execution of Java programs.
Since a conventional CPU cannot execute bytecode directly, the bytecode is executed in a software layer called the Java virtual machine (JVM). The JVM is an abstract machine specification that was published by Sun Microsystems, Inc.
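As an illustration of what bytecode looks like, the following minimal sketch shows a small Java method together with the stack-oriented bytecode that javac typically emits for it. The class name AddExample is a hypothetical name chosen for this illustration and does not come from the original text; the bytecode is shown only in comments.

```java
// Hypothetical example class; the name AddExample is an assumption for illustration.
class AddExample {
    // For this static method, javac typically emits the following bytecode
    // (viewable with `javap -c`); local slots 0 and 1 hold a and b:
    //   iload_0   // push a onto the operand stack
    //   iload_1   // push b onto the operand stack
    //   iadd      // pop both values, push a + b
    //   istore_2  // pop the sum into local slot 2 (c)
    //   iload_2   // push c
    //   ireturn   // pop c and return it
    static int add(int a, int b) {
        int c = a + b;
        return c;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```

Every operand passes through the operand stack, which is why stack accesses dominate the cost of executing bytecode of this kind.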
Execution of a Java program will now be described. A Java program, which has the “.java” extension, is compiled by a compiler and converted into a Java executable file having the “.class” extension. The class file is then interpreted and executed by an interpreter. Interpreting is done in three stages: class loading, which loads all the classes necessary for execution of the program; verification, which checks the formats of the class files, access permissions, and data type conversions; and actual execution of the program.
The hierarchical structure of Java comprises a Java program written in the Java language, the Java platform including the JVM and the Java application programming interface (API), and a hardware-dependent platform. Under this structure, a Java executable file is composed of bytecode, which is platform independent, and thus, it can run on any platform that has the Java runtime environment (JRE), independent of the hardware of the system.
Java technology has a number of advantages including platform independence, that is, secure Write Once Run Anywhere (WORA) capability and dynamic extensibility, and therefore, it is used in a variety of fields. Most web servers are based on Java technology since it has been actively used as a server technology for web services. Java technology is also being employed in embedded devices as an environment to provide user services or execute control applications.
FIG. 1 illustrates instruction processing to conduct operations of a conventional machine code.
As illustrated, the machine code at op(A0), the start address of a basic block, executes an instruction to load the data stored at offset +4 from the current local pointer (lp) into register 1 (r1), executes an instruction to store the data in register 1 (r1) at offset +4 from the current stack pointer (sp), and then advances the stack pointer by +4.
The machine code at basic block address op(A1) executes an instruction to load the data stored at offset +8 from the current local pointer into register 1, executes an instruction to store the data in register 1 at offset +4 from the stack pointer, and then advances the stack pointer by +4.
The machine code at basic block address op(A2) executes an instruction to load the data stored at offset −4 from the current stack pointer into register 2 (r2), and executes an instruction to load the data stored at offset −4 from the current stack pointer into register 1 (A). In addition, it executes an instruction to add the values in register 1 and register 2 and store the result in register 1 (B), executes an instruction to store the data in register 1 at offset +4 from the current stack pointer, and then advances the stack pointer by +4 (C).
Three types of machine instructions are present in the machine code: a pop (A), a push (C) and a core instruction (B). The pop instruction transfers data from the top of the stack to a register, the push instruction transfers data stored in a register to the stack, and the core instruction refers to all the other instructions. In the basic block, the instructions appear in the sequence of pop, core and push, as in op(A2). The pop instructions use successive registers in descending order, i.e., Reg_k, Reg_(k−1), . . . , Reg_1. The push instructions use successive registers in ascending order, i.e., Reg_1, . . . , Reg_p. The core instructions use the same registers that the pop and push instructions use.
The machine code at basic block address op(A3) executes an instruction to load the data stored at offset −4 from the current stack pointer into register 1, and executes an instruction to store the data in register 1 at offset +12 from the current stack pointer.
Accordingly, in order to execute blocks A0 to A3, ten instructions to load/store from/to the stack six times, and to load/store from/to the local pointer three times, must be executed.
To reduce memory accesses in the Java environment, bytecode folding, database (DB) cache and stack registers have been used.
Bytecode folding refers to a method in which patterns of three or four successive instructions that are frequently used during bytecode execution are defined in advance, and when one of these patterns is detected, optimized machine code corresponding to the whole pattern is executed in lieu of executing every instruction. With bytecode folding, an operation result belonging to a pattern is stored in a register and can be used directly when another instruction belonging to the same pattern needs it, which reduces the number of stack accesses.
Typical operations of bytecode folding will now be described. Among the bytecode to be executed, three or four successive instructions are examined to check whether they match one of a limited number of predetermined patterns. When a matching pattern is found, the machine code provided for that pattern is executed in lieu of executing the individual instructions, and the positions of the instructions are changed. However, if the instructions to be executed do not match any of the predetermined patterns, each instruction is executed individually.
Bytecode folding may improve performance because fewer instructions have to be executed and the number of memory accesses is reduced. However, only predetermined patterns can be applied, and thus the application scope is limited. Also, it is not very effective in an application having bytecode use patterns that cannot be processed in a folding unit.
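The folding behavior described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the folding unit of any actual processor: the class FoldingSketch, the three-opcode instruction set, the single load-load-add-store pattern and the stackAccesses counter are all hypothetical names and simplifications introduced here.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a tiny stack interpreter with one predetermined folding pattern.
class FoldingSketch {
    // Assumed miniature instruction set (not real JVM opcode values).
    static final int ILOAD = 0, IADD = 1, ISTORE = 2;

    int stackAccesses = 0;                       // counts every push and pop
    final int[] locals;                          // local variable slots
    private final Deque<Integer> stack = new ArrayDeque<>();

    FoldingSketch(int[] locals) { this.locals = locals.clone(); }

    private void push(int v) { stackAccesses++; stack.push(v); }
    private int pop() { stackAccesses++; return stack.pop(); }

    // code is an array of {opcode, operand} pairs.
    void run(int[][] code, boolean fold) {
        int pc = 0;
        while (pc < code.length) {
            if (fold && matchesLoadLoadAddStore(code, pc)) {
                // Folded form: one register-style operation, no stack traffic.
                locals[code[pc + 3][1]] = locals[code[pc][1]] + locals[code[pc + 1][1]];
                pc += 4;
                continue;
            }
            int[] ins = code[pc++];
            switch (ins[0]) {
                case ILOAD: push(locals[ins[1]]); break;
                case IADD: { int b = pop(); int a = pop(); push(a + b); break; }
                case ISTORE: locals[ins[1]] = pop(); break;
                default: throw new IllegalArgumentException("unknown opcode " + ins[0]);
            }
        }
    }

    // The only predetermined pattern in this sketch: ILOAD, ILOAD, IADD, ISTORE.
    private static boolean matchesLoadLoadAddStore(int[][] code, int pc) {
        return pc + 3 < code.length
            && code[pc][0] == ILOAD && code[pc + 1][0] == ILOAD
            && code[pc + 2][0] == IADD && code[pc + 3][0] == ISTORE;
    }

    public static void main(String[] args) {
        int[][] code = { {ILOAD, 0}, {ILOAD, 1}, {IADD, 0}, {ISTORE, 2} };
        FoldingSketch plain = new FoldingSketch(new int[] {2, 3, 0});
        plain.run(code, false);
        FoldingSketch folded = new FoldingSketch(new int[] {2, 3, 0});
        folded.run(code, true);
        System.out.println(plain.locals[2] + " via " + plain.stackAccesses + " stack accesses");   // 5 via 6 stack accesses
        System.out.println(folded.locals[2] + " via " + folded.stackAccesses + " stack accesses"); // 5 via 0 stack accesses
    }
}
```

The sketch also shows the limitation noted above: any instruction sequence that does not match the single predetermined pattern falls back to one-by-one execution with full stack traffic.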
A DB cache is embodied within the Pico Java processor. A DB cache is a kind of instruction cache and is a region that stores the bytecode to be executed in the pipeline stage and the machine code generated as a result of bytecode folding, both of which are input by the folding unit. The DB cache can reduce memory accesses because it acts as an instruction cache, and in particular, when already folded bytecode is stored, folding need not be executed again, which can further improve performance.
Operations of the DB cache will now be described. In a fetch stage, it is checked whether a corresponding instruction address is present in the DB cache. If no corresponding address is present, the process proceeds to the general pipeline execution, and at the same time the bytecode is sent to an instruction folding unit.
Then the instruction folding unit executes bytecode folding and sends the folding result or the instruction to the DB cache so it can be stored therein. If a corresponding instruction address is present in the fetch stage, an instruction selected from the DB cache is executed.
When the bytecode is fetched from the DB cache in this way, memory accesses are reduced. When the folded machine code is fetched, memory accesses are reduced and the folding time is shortened.
The DB cache stores the bytecode and the machine code produced by folding in an additional cache in order to reduce memory accesses and make repeated folding operations unnecessary. However, the process still retains the disadvantages of bytecode folding described above.
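The fetch-stage lookup described above can be sketched as a map keyed by instruction address. This is a simplified software analogy under stated assumptions: the class DbCacheSketch, the foldingUnit callback and the hit and miss counters are hypothetical names introduced here, folded code is represented as a plain string, the miss path is shown sequentially rather than in parallel with the pipeline, and capacity and eviction are not modeled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch of a DB-cache-style lookup keyed by instruction address.
class DbCacheSketch {
    private final Map<Integer, String> cache = new HashMap<>();
    private final IntFunction<String> foldingUnit; // assumed stand-in for the folding unit
    int hits = 0;
    int misses = 0;

    DbCacheSketch(IntFunction<String> foldingUnit) {
        this.foldingUnit = foldingUnit;
    }

    // Fetch stage: use the cached entry if the address is present;
    // otherwise run folding once and store the result for later fetches.
    String fetch(int address) {
        String cached = cache.get(address);
        if (cached != null) {
            hits++;
            return cached;
        }
        misses++;
        String result = foldingUnit.apply(address);
        cache.put(address, result);
        return result;
    }

    public static void main(String[] args) {
        DbCacheSketch db = new DbCacheSketch(addr -> "folded@" + addr);
        db.fetch(16); // miss: folding is performed and the result is stored
        db.fetch(16); // hit: folding is skipped
        System.out.println("hits=" + db.hits + " misses=" + db.misses); // hits=1 misses=1
    }
}
```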
A stack register realizes a stack for a stack machine such as Java by combining a limited number of registers arranged in a circular fashion with a memory. So that the execution unit can access registers whenever it needs to use the stack, the top region of the stack is held in the registers. When the registers become full, the bottom portion of the stack, which is seldom used, is dumped into the memory.
Typical operations of the stack register will now be described. The execution unit requests pop and push operations on the stack.
In the case of a push operation, a value is added to the registers. In the case of a pop operation, a value is removed from the registers, thereby creating an empty space in the registers.
When all the registers holding the top of the stack are occupied by successive pushes, the bottom portion of the stack is dumped into the memory, whereas the content dumped to the memory is transferred back to the registers when the registers holding the top of the stack are emptied by successive pops.
The stack register thus composes a stack from a limited number of registers and a memory, and ensures that the execution unit can always work with the registers rather than the memory, thereby improving push and pop performance. However, it cannot remove push and pop instructions for inefficient stack accesses that are already present in the code.
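The push, pop, dump and refill behavior of a stack register described above can be sketched as a circular register file backed by memory. This is a minimal illustration under stated assumptions: the class StackRegisterSketch, the one-value-at-a-time dump and refill policy and the memoryAccesses counter are hypothetical choices made for this sketch, not details taken from any actual hardware.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: the top of the stack lives in a circular register file,
// and the bottom portion is dumped to memory when the registers are full.
class StackRegisterSketch {
    private final int[] regs;
    private int top = 0;    // index of the next free register (wraps around)
    private int count = 0;  // number of live values held in registers
    private final Deque<Integer> memory = new ArrayDeque<>(); // dumped bottom portion
    int memoryAccesses = 0; // counts dumps and refills

    StackRegisterSketch(int numRegs) {
        regs = new int[numRegs];
    }

    void push(int v) {
        if (count == regs.length) {
            // Registers full: dump the oldest (bottom-most) register value to memory.
            int oldest = (top - count + regs.length) % regs.length;
            memory.push(regs[oldest]);
            memoryAccesses++;
            count--;
        }
        regs[top] = v;
        top = (top + 1) % regs.length;
        count++;
    }

    int pop() {
        if (count == 0) {
            if (memory.isEmpty()) throw new IllegalStateException("stack underflow");
            // Registers empty: bring the most recently dumped value back.
            regs[top] = memory.pop();
            memoryAccesses++;
            top = (top + 1) % regs.length;
            count++;
        }
        top = (top - 1 + regs.length) % regs.length;
        count--;
        return regs[top];
    }

    public static void main(String[] args) {
        StackRegisterSketch s = new StackRegisterSketch(2);
        s.push(1);
        s.push(2);
        s.push(3); // registers full, so the value 1 is dumped to memory
        System.out.println(s.pop() + " " + s.pop() + " " + s.pop()); // 3 2 1
        System.out.println("memory accesses: " + s.memoryAccesses);  // memory accesses: 2
    }
}
```

Note that every push and pop requested by the code is still executed in this sketch; the registers only make each one cheaper, which is the limitation stated above.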