(1) Field of the Invention
The present invention relates to virtual machines and to virtual machine compilers. In particular, the invention relates to a technique for increasing the execution speed of virtual machines.
(2) Description of the Prior Art
Standard Virtual Machine
Virtual machines are used to allow the same program to be executed by computers, such as personal computers and workstations, that include different types of CPU. Virtual machines are useful in the field of communications, especially on a network to which different types of computers are connected, since they can overcome the differences in CPU architecture between computers and so allow the efficient and high-speed use of shared resources. Note that in this specification, CPUs are called "real machines".
A virtual machine is a virtual processor, which is to say, a processor achieved by executing software. A virtual machine decodes and executes executable programs (hereinafter referred to as "virtual machine programs" or "virtual machine instruction sequences") that are sequences of instructions (hereinafter, "virtual machine instructions") specific to the virtual machine. Virtual machines are normally realized by programs (hereinafter, "real machine programs" or "real machine instruction sequences") composed of instructions (hereinafter, "real machine instructions") specific to a target real machine on which the virtual machine program is to be run. Maintaining a high execution speed is a central issue for virtual machines, which is why many virtual machines have a stack architecture.
One example of a conventional virtual machine is the JAVA (trademark) virtual machine developed by SUN MICROSYSTEMS, INC.
FIG. 1 is a block diagram showing the construction of a conventional virtual machine 4400 with a stack architecture, such as a JAVA virtual machine. The virtual machine 4400 comprises the instruction storing unit 4401, the decoding unit 4402, the executing unit 4410, and the stack 4420. The instruction storing unit 4401 stores a virtual machine program to be executed. The decoding unit 4402 reads and decodes a virtual machine instruction. The executing unit 4410 executes operations according to the decoded data produced by the decoding unit 4402. The stack 4420, which is a LIFO (last-in first-out) memory area, temporarily stores data used in the processing of the executing unit 4410. In FIG. 1, solid lines show the data flows, while dotted lines show the control flows.
The decoding unit 4402 includes the decode table 4406, the program counter (PC) 4404, the instruction reading unit 4403, and the search unit 4405. The decode table 4406 stores data, such as jump addresses of microprograms (stored in the executing unit 4410) that correspond to all of the virtual machine instructions that can be executed by the virtual machine 4400 with a stack architecture. The program counter (PC) 4404 holds the address of the next instruction to be read from the instruction storing unit 4401. The instruction reading unit 4403 reads this next instruction. The search unit 4405 refers to the decode table 4406 to find a jump address corresponding to the read instruction and outputs the jump address to the execution unit 4410. In this specification, a microprogram is a real machine program that corresponds to a virtual machine instruction.
The executing unit 4410 includes a microprogram storing unit 4411 and a stack pointer (SP) 4412. The microprogram storing unit 4411 stores microprograms, which are real machine programs corresponding to virtual machine instructions, in advance at locations indicated by jump addresses. The stack pointer (SP) 4412 indicates the address at the top of the stack 4420.
FIG. 2 is a table for describing the instruction set of the virtual machine 4400. In FIG. 2, all of the virtual machine instructions that the virtual machine 4400 can decode and execute are shown in mnemonic form, along with the operation content of each instruction, changes in the content of the stack 4420 caused by each instruction, and the value of the SP 4412 after execution. In FIG. 2, the legend "s0" indicates the value at the top of the stack 4420, while "s1" indicates the second highest value. As one example, the notation "sp←s0+s1" for the virtual machine instruction "Add" denotes that the value at the top of the stack is set equal to a sum of the top and second highest values of the stack before execution. The notation "sp←sp-1" denotes that the height of the stack decreases by one due to the execution of the "Add" instruction.
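As an illustration only (not part of the disclosed invention), the stack effects described above can be sketched in Python, modeling the stack 4420 as a list whose last element plays the role of "s0". The function names are illustrative stand-ins for the mnemonics in FIG. 2.

```python
# Sketch of the stack effects described for FIG. 2.
# The end of the list is the top of the stack (s0).

def push(stack, operand):
    stack.append(operand)              # sp <- sp+1

def add(stack):
    s0, s1 = stack.pop(), stack.pop()
    stack.append(s0 + s1)              # new top is s0+s1; sp <- sp-1

def mult(stack):
    s0, s1 = stack.pop(), stack.pop()
    stack.append(s0 * s1)              # new top is s0*s1; sp <- sp-1

stack = []
push(stack, 3)
push(stack, 4)
add(stack)                             # stack now holds the single value 7
```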
FIG. 3 shows the stored contents of the decode table 4406 shown in FIG. 1. This decode table 4406 includes opcodes 4406a that indicate the operation types of virtual machine instructions, jump addresses 4406b which are the addresses of microprograms in the microprogram storing unit 4411 that correspond to these virtual machine instructions, and numbers of operands 4406c that show the number of operands in each virtual machine instruction. Here, each opcode is set as 1-byte long, and operands are counted in one-byte units. Virtual machine instructions, which may include only an opcode or only an operand, that are represented by a physical bit pattern are hereinafter referred to as "virtual machine code".
FIGS. 4A-4D show examples of the microprograms stored in the microprogram storing unit 4411 in FIG. 1. The microprograms in FIGS. 4A-4C respectively correspond to the virtual machine instructions "Push", "Add", and "Mult", while the microprogram in FIG. 4D shows a microprogram that forms the common latter part of each of the microprograms in FIGS. 4A-4C. This microprogram in FIG. 4D is a real machine program for jumping to the next virtual machine instruction. The operation contents of the real machine instructions in these microprograms are shown in FIG. 5. The virtual machine 4400 itself is realized by a real machine that can decode and execute the real machine instructions shown in FIG. 5. Note that the PC 4404 is physically realized by register #2 (r2) of the real machine, and the SP 4412 by register #3 (r3).
FIG. 6 is a flowchart showing the processing of the decoding unit 4402 shown in FIG. 1. The instruction reading unit 4403 is instructed by the executing unit 4410 via a signal line R to read the next instruction (steps 4502-4503) and so reads the virtual machine instruction with the address stored in the PC 4404 from the instruction storing unit 4401 (steps 4504-4505). Following this, the search unit 4405 refers to the decode table 4406 to find a jump address and operands corresponding to the read virtual machine instruction, outputs the jump address and operands (if any) to the executing unit 4410 as decoded data (step 4506), and gives the executing unit 4410 a "read end" notification via the signal line R (step 4507). This "read end" notification marks the completion of decoding for one virtual machine instruction.
FIG. 7 is a flowchart showing the processing in step 4506 in detail. The search unit 4405 compares the 1-byte virtual machine code (the opcode) read by the instruction reading unit 4403 with one opcode 4406a in the decode table 4406 at a time until a match is found (steps 4802-4807). The search unit 4405 then reads the jump address 4406b and the number of operands 4406c corresponding to the matching opcode 4406a from the decode table 4406. The search unit 4405 outputs the read jump address 4406b to the executing unit 4410 (step 4808), has the instruction reading unit 4403 read as many operands as are indicated by the number of operands 4406c from the instruction storing unit 4401, and outputs the operands to the executing unit 4410 (steps 4809-4813).
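The linear scan through the decode table can be sketched as follows. This is an illustrative model only: the opcode bytes, jump addresses, and operand counts below are invented stand-ins, not the actual contents of the decode table 4406.

```python
# Sketch of the search unit's table scan (steps 4802-4813).
# Table entries are (opcode byte, jump address, number of operand bytes);
# all values here are illustrative assumptions.

DECODE_TABLE = [
    (0x10, 0x100, 1),   # Push n
    (0x20, 0x200, 0),   # Add
    (0x30, 0x300, 0),   # Mult
]

def decode(code, pc):
    opcode = code[pc]
    for entry_op, jump_addr, n_operands in DECODE_TABLE:
        if entry_op == opcode:                             # steps 4802-4807
            operands = code[pc + 1 : pc + 1 + n_operands]  # steps 4809-4813
            return jump_addr, operands, pc + 1 + n_operands
    raise ValueError("unknown opcode: %#x" % opcode)

# "Push 2; Push 3; Add" as raw virtual machine code
code = bytes([0x10, 2, 0x10, 3, 0x20])
jump, ops, next_pc = decode(code, 0)
```

Because the scan compares entries one at a time, decoding cost grows with the position of the opcode in the table, which contributes to the decoding overhead discussed later.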
The flowcharts of FIGS. 6 and 7 show the processing when decoded data sent from the decoding unit 4402 is directly transferred to the executing unit 4410. The flowchart in FIG. 8 shows the case when the decoded data is transferred to the executing unit 4410 via a buffer that is capable of storing sets of decoded data. In this latter case, the reading of virtual machine instructions from the instruction storing unit 4401 and the subsequent decoding may be performed independently of the execution by the executing unit 4410 and repeated as long as there is space in the buffer (steps 4605-4613).
FIG. 9 shows the processing of the executing unit 4410 in FIG. 1. The executing unit 4410 initializes the SP 4412 and the PC 4404 (step 4702) and repeats the processing described below for each virtual machine instruction (steps 4703-4707). That is, the executing unit 4410 instructs the instruction reading unit 4403 via the signal line R to read the next virtual machine instruction (step 4703). The executing unit 4410 then reads the decoded data transmitted from the search unit 4405, jumps to the jump address included in the decoded data, which specifies the microprogram in the microprogram storing unit 4411 that corresponds to the read virtual machine instruction, and executes that microprogram until the executing unit 4410 receives a "read end" notification via the signal line R (steps 4704-4707).
FIG. 10A shows a sample program for describing a specific example of the processing of the virtual machine 4400. In this example, the instruction storing unit 4401 stores a virtual machine program for calculating the arithmetic expression "2*(3+4)" shown in FIG. 10B.
FIG. 10C shows the decoded data that is sequentially outputted from the decoding unit 4402 when the virtual machine program shown in FIG. 10A is decoded and executed by the conventional virtual machine 4400. The decoding unit 4402 successively outputs jump addresses and the necessary operands corresponding to the decoded virtual machine instructions as decoded data to the executing unit 4410.
FIGS. 11A and 11B show the states of the PC 4404, the SP 4412, and the stack 4420 before and after the execution of each virtual machine instruction when the executing unit 4410 executes the virtual machine program shown in FIG. 10A in accordance with the decoded data sequences shown in FIG. 10C. These figures show the processing of the virtual machine program split into a former and a latter part. Here, the PC 4404 indicates the address of the next virtual machine instruction to be executed in the virtual machine program. The addresses of virtual machine instructions are the numbers shown to the left of the virtual machine instructions in FIG. 10A. The initial value of the PC 4404 is "1". The SP 4412 indicates the top of the stack 4420, and so marks the position at which an item was most recently stored or read. The initial value of the SP 4412 is "-1" and indicates that the stack 4420 is empty. As can be understood from FIGS. 11A and 11B, the calculation of the arithmetic expression "2*(3+4)" is completed when the PC 4404 indicates "9".
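The overall interpretation of the FIG. 10A program can be reproduced with a minimal sketch in which each handler plays the role of a microprogram and the main loop plays the roles of the decoding and executing units. The tuple encoding of instructions is a simplification; the real virtual machine code is the byte form described for FIG. 3.

```python
# Minimal interpreter sketch for the "2*(3+4)" example of FIG. 10A.
# SP is implicit in len(stack); PC is the index into the program list.

def run(program):
    stack, pc = [], 0
    while True:
        instr = program[pc]
        if instr[0] == "Push":
            stack.append(instr[1])
            pc += 1
        elif instr[0] == "Add":
            stack.append(stack.pop() + stack.pop())
            pc += 1
        elif instr[0] == "Mult":
            stack.append(stack.pop() * stack.pop())
            pc += 1
        elif instr[0] == "Stop":
            return stack[-1]

program = [("Push", 2), ("Push", 3), ("Push", 4),
           ("Add",), ("Mult",), ("Stop",)]
result = run(program)      # 2*(3+4) = 14
```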
The major problem for conventional virtual machines like the virtual machine 4400 is how to increase execution speed. Processes such as the decoding of virtual machine instructions generate overheads, so that virtual machines end up operating at a much slower speed than when an equivalent real machine program is directly executed by a real machine. To improve the execution speed of virtual machines, the following methods have been proposed.
First Conventional Technique
In this first conventional technique, the storage area at the top of the stack (TOS) is assigned not to memory but to a specified register of a real machine. Hereinafter, such a storage area is called the TOS variable (see "PLDI" (1995), ACM, pp. 315-327).
FIGS. 12A-12D show microprograms corresponding to the principal virtual machine instructions that are stored in a microprogram storing unit of a virtual machine based on this first conventional technique. These figures correspond to FIGS. 4A-4D that were used to describe the virtual machine 4400. This example uses the following physical mapping. The TOS variable is assigned to register #0 (r0) of the real machine and, as in FIGS. 4A-4D, the PC 4404 to register #2 (r2), and the SP 4412 to register #3 (r3).
FIGS. 13A and 13B show the changes in the states of the PC 4404, the SP 4412, the TOS variable 4421, and the memory stack 4422 (the part of the stack 4420 that is allocated to memory) as a virtual machine provided with the microprograms shown in FIGS. 12A-12D executes the virtual machine program shown in FIG. 10A. These figures show the processing split into a former and a latter part and correspond to FIGS. 11A and 11B that were used to describe the operation of the virtual machine 4400. As before, the calculation of the arithmetic expression "2*(3+4)" is completed in FIGS. 13A and 13B when the PC 4404 indicates "9".
As can be seen by comparing FIGS. 12A-12D with FIGS. 4A-4D, the first conventional technique makes fewer accesses to the memory. When the virtual machine 4400 executes a virtual machine instruction such as an addition "Add" or a multiplication "Mult", two reads and one write are performed for the stack 4420, making a total of three memory accesses for one virtual machine instruction. With the first conventional technique, the assigning of the TOS variable to a register enables the same instruction to be executed with only one access to the memory stack 4422. This results in the execution speed being increased in proportion to the reduction in the number of memory accesses.
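The saving can be illustrated by counting accesses to a modeled memory stack, with an ordinary Python variable standing in for register r0. This is a counting sketch only; the class and function names are invented for illustration.

```python
# Counting sketch of the memory-access saving of the TOS technique.
# MemoryStack counts every read or write of the in-memory stack.

class MemoryStack:
    def __init__(self):
        self.cells, self.accesses = [], 0
    def push(self, v):
        self.accesses += 1
        self.cells.append(v)
    def pop(self):
        self.accesses += 1
        return self.cells.pop()

def add_plain(mem):
    # virtual machine 4400: two reads and one write, 3 memory accesses
    mem.push(mem.pop() + mem.pop())

def add_tos(mem, tos):
    # first conventional technique: one read; the result stays in "r0"
    return tos + mem.pop()

mem = MemoryStack(); mem.push(3); mem.push(4)
base = mem.accesses
add_plain(mem)
plain_cost = mem.accesses - base       # 3 accesses

mem2 = MemoryStack(); mem2.push(3)     # 4 is already in the TOS register
base2 = mem2.accesses
tos = add_tos(mem2, 4)
tos_cost = mem2.accesses - base2       # 1 access
```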
Second Conventional Technique
A second conventional technique uses a "native coding" method, in which a predetermined part of a virtual machine program is written in real machine instructions and is directly executed by a real machine. Identifiers are used to indicate that such a predetermined part is written using real machine instructions.
As one example, a JAVA virtual machine can store the constant name "ACC_NATIVE" (256) into an access flag (such as the 16-bit flag "access_flags" that forms part of the "method_info" structure) of a class file that includes a virtual machine program to show that part of the program is written in real machine instructions (see the Java Bytecodes and the JAVA Virtual Machine Specification, 1995 editions, produced by SUN MICROSYSTEMS, INC.).
In this way, this second conventional technique improves execution speed by having the real machine directly execute a predetermined part of a program.
Third Conventional Technique
A third conventional technique uses a "just-in-time" (JIT) compiler that compiles parts of a virtual machine program as required during execution. Here, compiling refers to the replacement of virtual machine instructions with real machine instructions (see Laura Lemay et al., Java Gengo Nyumon (An Introduction to JAVA), Prentice Hall, 1996, and Laura Lemay and Charles L. Perkins, Teach Yourself JAVA in 21 Days). Virtual machines that use a JIT compiler have the real machine directly execute compiled parts of a virtual machine program, and so increase the overall execution speed of virtual machine programs.
Fourth Conventional Technique
A fourth conventional technique is used when computers on a network execute virtual machine programs that they download from a server computer. In this technique, the code in a virtual machine program is compressed beforehand using LZ (Lempel-Ziv) methods or Huffman coding to reduce the time taken by file transfer (see Japanese Laid-Open Patent Application H07-121352 or H08-263263).
With this technique, an increase in execution speed can be obtained if the time taken to transfer the virtual machine program forms a large part of the overall processing time required to execute the virtual machine program.
The first to fourth conventional techniques described above have the following problems.
Problems with the First Conventional Technique
The first conventional technique, where the TOS variable is allocated to a register of a real machine, has a drawback in that it is not suited to real machines with superscalar architecture that have become increasingly inexpensive in recent years. This means that the improvements in the execution speed for a superscalar real machine (hereinafter, "superscalar machine") are relatively small when compared with the improvement for a standard real machine (hereinafter called a "standard machine") that is incapable of parallel processing. This is described in more detail below.
The following describes the standard operation and notation of a pipeline used by a register machine, such as a superscalar machine or a standard machine, with reference to FIGS. 14-22.
FIG. 14 shows the mnemonics used to indicate each stage included in the pipeline. The superscalar machine and a standard machine described below are assumed to each have a pipeline containing the five stages shown in this figure.
FIG. 15 shows the ideal pipeline flow for a standard machine. In this example, four real machine instructions are sequentially processed with each pipeline stage taking exactly one clock cycle. Each pipeline stage is performed in parallel for a different real machine instruction so that as the long-term average, one instruction is executed in one clock cycle.
FIG. 16 shows an ideal pipeline flow for a superscalar machine. This superscalar machine has two separate pipelines. In FIG. 16, two real machine instructions are executed in one clock cycle as the long-term average, giving the superscalar machine a throughput twice that of the standard machine.
FIG. 17 shows a pipeline flow for a standard machine when pipeline hazards occur. Here, instruction B uses the execution result of instruction A, which is to say, instruction B has a true dependency (also called a data dependency) on the preceding instruction A. Since the execution result of instruction A cannot be obtained until the memory access stage MEM is completed, the execution of instruction B is delayed, which causes the hazard as shown by "-" in the figure.
When the processing of an instruction is delayed in a real machine with a pipeline structure, the processing of the following instructions is also delayed. This is shown in FIG. 17, where the processing of instruction C, which follows instruction B, is also delayed.
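The delay propagation described above can be modeled with a toy in-order scalar pipeline, in which a loaded value only becomes usable a configurable number of cycles after the load's own execution. This is an illustrative scheduling model, not a description of any real CPU; instruction names and latencies are assumptions.

```python
# Toy model of the load-use hazard of FIG. 17 for an in-order scalar
# pipeline. load_use_delay is the number of extra cycles before a loaded
# value may enter the EX stage of a dependent instruction.

def ex_cycles(instrs, load_use_delay=1):
    ready = {}        # register -> first cycle its value is usable in EX
    next_ex = 0       # earliest EX cycle for the next instruction
    timeline = {}
    for name, dest, srcs, is_load in instrs:
        start = max([next_ex] + [ready.get(r, 0) for r in srcs])
        timeline[name] = start
        next_ex = start + 1                  # scalar: one EX per cycle
        extra = load_use_delay if is_load else 0
        ready[dest] = start + 1 + extra      # loads deliver late
    return timeline

program = [
    ("A", "r1", [], True),        # Load: r1 <- memory
    ("B", "r2", ["r1"], False),   # true dependency on A -> bubble
    ("C", "r3", [], False),       # independent, but delayed behind B
]
t1 = ex_cycles(program, load_use_delay=1)
t2 = ex_cycles(program, load_use_delay=2)
```

With a one-cycle delay, one bubble appears between A and B; with a two-cycle delay (as for the primary-cache case of FIGS. 19 and 20), the bubble grows, and instruction C is pushed back equally in both cases because the pipeline is in-order.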
FIG. 18 shows a pipeline flow for a superscalar machine when pipeline hazards occur. Here, instruction B1 has a true dependency on the preceding instructions A1 and A2. The reason that a pipeline hazard occurs in the fifth clock cycle for the instruction C2 is that the two processing units (arithmetic logic units or "ALUs") provided in the processor are busy with the execution of the preceding instructions B1 and C1. This means that instruction C2 cannot be executed in that cycle.
FIGS. 19 and 20 correspond to FIGS. 17 and 18, and show pipeline flows when two clock cycles need to pass before values obtained through memory access (MEM) can be used. In reality, in most real machines, obtaining a value from the primary cache takes two clock cycles. Note that obtaining a value from the secondary cache takes more clock cycles.
FIGS. 21 and 22 respectively show pipeline flows for a standard machine and a superscalar machine when instructions A1 and A2 are instructions that indicate a jump destination using a register. The jump destinations of these instructions are not known until the register reference stage (RF) is completed, so that the succeeding instructions B, B1, and B2 that are fetched as per normal during the register reference operation are canceled (as shown by the "x" in FIGS. 21 and 22) in the third clock cycle following the RF stages.
The following describes the specific problems of a superscalar machine and a real machine of the first conventional technique, with reference to FIGS. 23-26.
FIGS. 23-26 show pipeline flows when the virtual machine of the first conventional technique is realized by a real machine executing the virtual machine program shown in FIG. 10A. In detail, these figures show the pipeline flow for the latter part (the jump processing shown in FIG. 12D) of the microprogram (of FIG. 12A) with the address 7 that corresponds to the virtual machine instruction "Add" and the pipeline flow for the former part (the multiplication processing) of the microprogram (of FIG. 12C) with the address 8 that corresponds to the virtual machine instruction "Mult". FIGS. 23 and 24 respectively show the pipeline flows for a standard machine and a superscalar machine where one clock cycle needs to pass before a value read during a memory access can be used, while FIGS. 25 and 26 respectively show the pipeline flows for a standard machine and a superscalar machine where two clock cycles need to pass before a value read during a memory access can be used.
This series of microprograms shown in FIGS. 12D and 12A contains two significant true dependencies. The first is in the microprogram for jump processing shown in FIG. 12D corresponding to the virtual machine instruction "Add", and exists between the instruction "Load" for reading a jump address and the instruction "Jump" for jumping to the address. The second is in the microprogram shown in FIG. 12C corresponding to the virtual machine instruction "Mult" for multiplication processing and exists between the instruction "Load" for reading a variable from the memory stack and the instruction "Mult" for multiplication processing.
In the pipeline shown in FIG. 23, the first data dependency is absorbed by the real machine instruction "Inc" that is inserted between the instructions "Load" and "Jump". The second data dependency is absorbed by the real machine instruction "Dec" that is inserted between the instructions "Load" and "Mult". The processing in this pipeline is only disturbed by the cancellation of one instruction that is necessitated by the execution of the real machine instruction "Jmp". As a result, the entire procedure is completed in 11 clock cycles.
In the pipeline shown in FIG. 24, the first and second data dependencies are not absorbed. As a result, the processing in these pipelines is disturbed at three points. The first disturbance is the hazard in the fourth clock cycle caused by the first data dependency, the second is the cancellation of five instructions necessitated by the execution of the real machine instruction "Jmp", and the third is the hazard in the eighth clock cycle caused by the second data dependency. As was the case with FIG. 23, the entire procedure in FIG. 24 is completed in 11 clock cycles.
As in FIG. 24, the above first and second data dependencies are not absorbed in the pipeline shown in FIG. 25, so that the processing in this pipeline is disturbed at three points. The first disturbance is the hazard in the fifth clock cycle caused by the first data dependency, the second is the cancellation of one instruction necessitated by the execution of the real machine instruction "Jmp", and the third is the hazard in the tenth clock cycle caused by the second data dependency. The entire procedure is completed in 13 clock cycles.
As in FIG. 24, the above first and second data dependencies are not absorbed in the pipeline shown in FIG. 26, so that the processing is disturbed at three points. The first disturbance is the hazards caused in the fourth and fifth clock cycles by the first data dependency, the second is the cancellation of seven instructions necessitated by the execution of the real machine instruction "Jmp", and the third is the hazards caused in the eighth and tenth clock cycles by the second data dependency. As in FIG. 25, the entire procedure is completed in 13 clock cycles.
Considering that the processing shown in either of FIGS. 23 and 24 requires 11 clock cycles and that the processing shown in either of FIGS. 25 and 26 requires 13 clock cycles, it is clear that there is no difference in execution time between a standard machine and a superscalar machine for this first conventional technique. This means that no advantage is gained from using a superscalar machine capable of parallel processing.
In this way, this first conventional technique causes a large drop in the processing efficiency of a superscalar machine. Another drawback is the lack of provisions for exception handling, such as for errors, or interrupt handling, which is required for debugging.
As a result, a virtual machine that uses this first conventional technique needs to detect an interrupt state and to perform interrupt handling every time the machine executes a virtual machine instruction. This means that another memory access (i.e., data transfer of a variable in the memory that indicates an interrupt state into a register) is required every time a virtual machine instruction is executed. This cancels out the advantage of this first conventional technique, wherein assigning the TOS variable to a register reduces the number of memory accesses, so that the overall execution speed is not improved.
Problems with the Second Conventional Technique
The second conventional technique, which is to say the use of native coding, has a problem in that it is difficult to provide common virtual machine programs to real machines with different architectures. This is because part of the virtual machine program is written in real machine instructions for a specific type of real machine. As a result, when a virtual machine program is to be provided on a network for common use by five types of computers with different real-machine architectures, it becomes necessary to provide real machine programs for all five types of real machine.
Since there are also differences in system configuration between computers, there is no guarantee that real machine instructions will have a faster execution speed than virtual machine instructions, even for real machines with the same architecture. As one example, if programs are written for RISC (Reduced Instruction Set Computer) type real machines, where code size is generally large, insufficient memory will lead to frequent page swapping between main memory and virtual memory when virtual machine instructions are replaced with real machine instructions. This reduces the overall execution speed.
Problems with the Third Conventional Technique
The third conventional technique, which uses a JIT compiler, has a problem in that the compiling of the virtual machine program can take a long time. The reasons for this are explained below.
A first reason is that the processing must satisfy the specific restrictions of the target real machine concerning jump destinations. As one example, when the target machine has a restriction that the address of a jump destination must be within word (basic word length) boundaries in the main memory, simple conversion of the virtual machine instructions to corresponding real machine instructions will result in a violation of this restriction.
FIG. 27 is a program list for a sample virtual machine program for explaining this first reason. FIG. 28 is a flowchart for this sample virtual machine program.
The present virtual machine program calculates the total of ten integers from zero to nine. It is composed of a setting of initial values (step 7002, Addresses 0-6), judgment of the end of calculation (step 7003, Addresses 8-13), addition and setting of the next value to be added (step 7004, Addresses 15-29), and end processing (step 7005, Address 31).
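For reference, the control structure of this sample program can be re-expressed in Python, with one line per step of the FIG. 28 flowchart. The variable names are illustrative; the actual program is expressed in virtual machine instructions as listed in FIG. 27.

```python
# Python re-expression of the control structure of the FIG. 27 program.

total, i = 0, 0          # setting of initial values (step 7002)
while i < 10:            # judgment of the end of calculation (step 7003)
    total += i           # addition and setting of the next value (step 7004)
    i += 1
# end processing (step 7005): total now holds 0+1+...+9
```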
FIG. 29 is a conversion table that is used when compiling this virtual machine program according to this third conventional technique. This conversion table is a correspondence table that associates virtual machine instructions with the real machine programs into which they are to be converted. Note that for reference purposes, the conversion table in FIG. 29 also shows the code size of each real machine program.
FIG. 30 shows the code arrangement of the real machine program that is obtained when the sample virtual machine program shown in FIG. 27 is compiled using the conversion table shown in FIG. 29. In FIG. 30, relative addresses in the original virtual machine program are given for each real machine program to show the correspondence between the real machine program and the virtual machine program.
If the target real machine has a restriction whereby only jump destinations complying with a two-word alignment can be indicated, it can be seen from FIG. 30 that the virtual machine instruction "Stop" with address 31, which is the jump destination indicated by the virtual machine instruction "Brz" at address 13, is arranged at an odd-numbered address in the real machine program. Since this address does not correspond to the two-word alignment, this branch instruction violates the restriction concerning jump destinations. As a result, processing that rectifies this violation needs to be performed.
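One way such a violation can be rectified is by padding with no-operation words until each branch target is aligned. The sketch below illustrates this under invented assumptions: word-granularity code, a two-word alignment requirement, and a "Nop" padding word; the block contents and sizes do not correspond to FIG. 30.

```python
# Sketch of alignment padding for branch targets. blocks is a list of
# (is_branch_target, [words]) in program order; targets that fall on an
# odd word address are pushed to the next even address with Nop words.

WORD_NOP = "Nop"

def layout(blocks, align=2):
    out, targets = [], {}
    for i, (is_target, words) in enumerate(blocks):
        if is_target:
            while len(out) % align != 0:   # pad until the target is aligned
                out.append(WORD_NOP)
            targets[i] = len(out)          # aligned address of this target
        out.extend(words)
    return out, targets

blocks = [
    (False, ["Push", "2", "Push", "3", "Add"]),  # 5 words of straight code
    (True,  ["Stop"]),                           # a branch target
]
code, targets = layout(blocks)
```

The cost of this rectification (scanning the layout and inserting padding) is part of why JIT compilation can take a long time.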
A second reason for the above problem is that special processing that accompanies branches can be necessary for the target real machine. Some CPUs with RISC architecture, such as CPUs with SPARC (Registered Trademark) architecture produced by SPARC INTERNATIONAL, INC. and CPUs produced by MIPS TECHNOLOGIES, INC., have special rules that are used when executing a number of instructions located after a branch instruction. Specific examples of these rules are the execution of a specific succeeding instruction regardless of whether a branch is performed (called a "delayed branch") or the execution of a specific succeeding instruction only when a branch is performed (called a "canceling branch").
When the target real machine is of this type, special processing needs to be performed, such as scheduling that analyzes the instructions and changes their order or the insertion of no operation instructions (such as NOP codes) directly after branch instructions.
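The simplest form of this special processing, inserting a no-operation instruction into every delay slot rather than scheduling a useful instruction there, can be sketched as follows. The mnemonics and the set of branch opcodes are illustrative assumptions.

```python
# Sketch of delay-slot filling with NOPs after branch instructions.
# A real JIT compiler would instead try to schedule a useful
# instruction into the slot; this sketch always inserts "Nop".

BRANCHES = {"Br", "Brz", "Jmp"}

def fill_delay_slots(instrs):
    out = []
    for op in instrs:
        out.append(op)
        if op.split()[0] in BRANCHES:
            out.append("Nop")      # the delay slot executes regardless
    return out

scheduled = fill_delay_slots(["Push 1", "Brz 13", "Add", "Br 8"])
```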
Problems with the Fourth Conventional Technique
The fourth conventional technique, which is to say the compression of virtual machine programs in advance, has a problem in that there is no means for resolving the problems that occur due to the execution of branch instructions in the compressed virtual machine program.
FIG. 31A shows a compression table for explaining this problem. This compression table associates variable-length codes 9300a with virtual machine instructions 9300b. FIG. 31B shows example code that is obtained by encoding the virtual machine instruction sequence A using the compression table shown in FIG. 31A.
If the example code shown in FIG. 31B is decoded starting from the first bit, the original virtual machine instruction sequence A ("babc") will be obtained. However, when the execution flow moves to point B in FIG. 31B due to a branch instruction, decoding the code sequence "0010110" that starts at point B using the compression table in FIG. 31A gives the mistaken virtual machine instruction sequence "aabc".
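This failure mode can be reproduced with a prefix-code table that is consistent with the example just described (the actual table in FIG. 31A may differ): decoding the whole stream recovers "babc", but decoding from a branch target that falls inside a codeword yields "aabc".

```python
# Prefix-code sketch of the mid-stream decoding problem. The code
# assignments below are assumptions chosen to reproduce the "babc" /
# "0010110" / "aabc" example in the text.

CODES = {"a": "0", "b": "10", "c": "110"}
DECODE = {bits: sym for sym, bits in CODES.items()}

def encode(seq):
    return "".join(CODES[s] for s in seq)

def decode(bits):
    out, cur = [], ""
    for bit in bits:
        cur += bit
        if cur in DECODE:          # prefix property: first match is final
            out.append(DECODE[cur])
            cur = ""
    return "".join(out)

stream = encode("babc")            # "10010110"
from_start = decode(stream)        # correct: "babc"
from_point_b = decode(stream[1:])  # "0010110" decodes as "aabc" -- wrong
```

Because a branch can land at any bit position, the decoder has no way of knowing where the nearest codeword boundary lies, which is exactly the unresolved problem described above.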
Problems Common to the First to Fourth Conventional Techniques
The first to fourth conventional techniques described above have a common problem in that none of them is able to raise the efficiency of cache processing. As a result, the market is still waiting for the realization of a high-speed virtual machine that makes full use of the processing power of real machines and computers that are equipped with a cache memory.
FIG. 32 is a block diagram showing the program counter 6901 and the instruction cache 6902 of a virtual machine. This drawing will be used to explain the problems that can occur for a virtual machine that is equipped with a cache memory.
The instruction cache 6902 is equipped with a cache table 6904 that stores addresses for specifying each cache block in the cache memory, where a cache block is an instruction sequence 6903 composed of the data in ten consecutive addresses. FIG. 33 shows the case where the sample virtual machine program shown in FIG. 27 is stored in the cache memory, with the boundary lines A, B, and C marking the boundaries between the cache blocks. These boundary lines simply divide the virtual machine program into cache blocks of equal size, as can be seen from the boundary line C that splits the virtual machine instruction "Br 8" into the opcode "Br" and the operand "8". Accordingly, when dividing a virtual machine program into cache blocks, it is necessary to judge whether any of the virtual machine instructions that change the value of the program counter 6901 will end up spanning a boundary between cache blocks. This increases the complexity of the processing and results in a net decrease in the overall execution speed of the virtual machine when a cache is provided.
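The boundary judgment described above can be sketched as follows. The instruction sizes are hypothetical; the ten-byte block size corresponds to the cache blocks of FIG. 33:

```python
# Detect virtual machine instructions that span a fixed-size cache
# block boundary. Instruction sizes (in bytes) are hypothetical.
def spanning_instructions(sizes, block_size):
    spans, addr = [], 0
    for index, size in enumerate(sizes):
        end = addr + size
        # An instruction spans a boundary if its first and last bytes
        # fall in different cache blocks.
        if addr // block_size != (end - 1) // block_size:
            spans.append(index)
        addr = end
    return spans

# A 2-byte instruction placed at address 9 has its opcode and operand
# in different 10-byte cache blocks (cf. boundary line C in FIG. 33).
print(spanning_instructions([3, 3, 3, 2], 10))  # -> [3]
```

Every such spanning instruction forces the cache logic to fetch two blocks before the instruction can be decoded, which is the source of the extra complexity noted above.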
It would conceivably be possible to devise a method for storing an entire virtual machine program in cache memory, or a method for arranging the virtual machine program in the cache based on analysis of the virtual machine program by a JIT compiler. However, the former of these methods uses cache memory inefficiently and has a further problem in that the time required for file transfer in a network environment is greatly increased. The latter method, meanwhile, has a problem in that writing the virtual machine program into cache memory is very time-consuming. Accordingly, both of these methods result in a marked decrease in the overall execution efficiency of a virtual machine.
In view of the above problems, the present invention has an overall aim of providing a virtual machine that executes a virtual machine program at a higher execution speed than a conventional virtual machine, a virtual machine compiler that generates a program for this virtual machine (hereafter, a virtual machine and a virtual machine compiler are together called a virtual machine system), and a JIT compiler. Here, a virtual machine compiler refers to a program that translates a source program written in a high-level language such as C into a virtual machine program.
To achieve the above aim, the invention has the following six specific objects.
The first object is to provide a virtual machine system that can diminish disadvantages caused by true data dependencies so that high execution speed is maintained.
The second object is to provide a high-speed virtual machine system by minimizing the decreases in execution efficiency caused by interrupt handling.
The third object is to provide a virtual machine system with which "native coding" for different real machines can be performed without decreasing overall execution speed, even when the virtual machine is used by real machines with different architectures. Such a virtual machine is highly independent of real machine architectures without decreasing execution speed.
The fourth object is to provide a high-speed virtual machine system that can be used by a real machine with a cache system without the decreases in execution efficiency which may result from a virtual machine program being divided into cache blocks or from complicated address resolution being performed when using a JIT compiler.
The fifth object is to provide a high-speed virtual machine system that can decompress a compressed virtual machine program correctly even when the compressed program contains branch instructions.
The sixth object is to provide a high-speed JIT compiler that does not need to perform complex address resolution.
The first object can be achieved by a virtual machine of claim 1.
The virtual machine executes a virtual machine instruction sequence under control of a real machine, the virtual machine comprising: a stack unit for temporarily storing data in a last-in first-out format; an instruction storing unit for storing the virtual machine instruction sequence and a plurality of sets of succeeding instruction information, wherein each virtual machine instruction in the virtual machine instruction sequence is associated with a set of succeeding instruction information that indicates a change in a storage state of the data in the stack unit due to execution of a virtual machine instruction executed after the associated virtual machine instruction; a read unit for reading a virtual machine instruction and an associated set of succeeding instruction information from the instruction storing unit; and a decoding-executing unit for specifying and executing operations corresponding to a combination of the read virtual machine instruction and the read set of succeeding instruction information.
With the above construction, the instruction storing unit stores succeeding instruction information in addition to virtual machine instructions, and the decoding-executing unit performs not only the operations for the decoded virtual machine instruction but also stack handling in advance for the virtual machine instruction executed immediately after the decoded virtual machine instruction. Performing appropriate stack handling in advance in machine cycles where pipeline hazards (which occur especially frequently in superscalar machines) would otherwise occur enables the detrimental effects of true data dependencies to be absorbed, and so increases the execution speed of the virtual machine.
Here, the decoding-executing unit may include: a real machine instruction sequence storing unit for storing a plurality of real machine instruction sequences that correspond to all combinations of virtual machine instructions and sets of succeeding instruction information; a specifying unit for specifying a real machine instruction sequence in the real machine instruction sequence storing unit, the real machine instruction sequence corresponding to a combination of the virtual machine instruction and the set of succeeding instruction information read by the read unit; and an executing unit for executing the specified real machine instruction sequence.
In this way, advance stack handling for absorbing data dependencies can be included in the real machine instruction sequence corresponding to a virtual machine instruction.
Here, each set of succeeding instruction information may indicate a change in a number of sets of data in the stack unit due to execution of a virtual machine instruction executed after a virtual machine instruction associated with the set of succeeding instruction information, and at least one real machine instruction sequence stored in the real machine instruction sequence storing unit may contain real machine instructions that perform a stack handling in the stack unit in advance for a virtual machine instruction that is to be executed based on a set of succeeding instruction information associated with a currently executed virtual machine instruction.
With this construction, when a change in the number of stack levels due to execution of a given instruction is canceled out by execution of the instruction executed immediately after the given instruction, needless stack handling can be avoided, which improves the execution speed of the virtual machine.
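The canceling effect can be illustrated with a simple counting model. The instruction set and its stack-level deltas are hypothetical; the point is only that a delta canceled by the succeeding instruction's delta requires no stack-pointer update:

```python
# Hypothetical stack-level deltas for a few virtual machine instructions.
DELTA = {"Push": +1, "Add": -1, "Mult": -1, "Print": -1}

def pointer_updates(program, use_succ_info):
    """Count stack-pointer adjustments with and without the
    succeeding-instruction information."""
    updates, i = 0, 0
    while i < len(program):
        if (use_succ_info and i + 1 < len(program)
                and DELTA[program[i]] + DELTA[program[i + 1]] == 0):
            i += 2      # the two deltas cancel: no pointer update needed
        else:
            updates += 1
            i += 1
    return updates

prog = ["Push", "Push", "Add", "Print"]
print(pointer_updates(prog, False))  # -> 4
print(pointer_updates(prog, True))   # -> 2
```

In the example, the value pushed by the second "Push" is consumed immediately by "Add", so under the succeeding-instruction scheme that pair causes no net stack-pointer traffic.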
Here, the real machine instruction sequences stored in the real machine instruction sequence storing unit may be composed with a premise that regions of the stack unit used to store two sets of data to be read first and second are mapped to two registers in the real machine.
The above construction replaces the load and store stack operations that are frequently performed by stack-type virtual machines with read and write operations on the internal registers of the real machine. Such operations are suited to rearrangement as the advance stack handling performed in machine cycles where pipeline hazards would otherwise occur. In this way, the execution efficiency of the virtual machine is raised.
Here, the instruction storing unit may include a first storage area for storing the virtual machine instruction sequence and a second storage area for storing the sets of succeeding instruction information, wherein each location that stores a virtual machine instruction in the first storage area may be associated with a location that stores an associated set of succeeding instruction information in the second storage area, and the read unit may read the virtual machine instruction from a location in the first storage area and the associated set of succeeding instruction information from a location in the second storage area, the location in the first storage area being associated with the location in the second storage area.
In this way, the virtual machine instruction sequence and the succeeding instruction information are stored separately, which means that a virtual machine instruction sequence of the present virtual machine has the same data format as a conventional virtual machine instruction sequence. Compatibility of the instruction data format with a conventional virtual machine is therefore maintained.
Here, the virtual machine instruction sequence stored in the instruction storing unit may be an extended virtual machine instruction sequence that includes extended virtual machine instructions, the extended virtual machine instructions being combinations of virtual machine instructions and associated sets of succeeding instruction information, wherein the read unit may read an extended virtual machine instruction from the instruction storing unit, and wherein the decoding-executing unit may specify and execute operations corresponding to the extended virtual machine instruction.
In this way, since an extended virtual machine instruction is a combination of a virtual machine instruction and succeeding instruction information, the succeeding instruction information need not be processed or stored separately. This means that a virtual machine with a similar architecture to a conventional computer can be provided.
The first object can be also achieved by a virtual machine compiler. The compiler generates programs for a virtual machine with a stack architecture that includes a stack, the compiler including: an instruction sequence converting unit for converting a source program into a virtual machine instruction sequence executable by the virtual machine; a succeeding instruction information generating unit for generating sets of succeeding instruction information corresponding to virtual machine instructions in the virtual machine instruction sequence, each set of succeeding instruction information indicating a change in a storage state of data in the stack due to execution of a virtual machine instruction executed immediately after a virtual machine instruction corresponding to the set of succeeding instruction information; and an associating unit for associating each set of generated succeeding instruction information with a corresponding virtual machine instruction and outputting the set of succeeding instruction information and the virtual machine instruction.
In this way, the above virtual machine compiler generates not only virtual machine instructions but also succeeding instruction information which can be used by a virtual machine to absorb true data dependencies. Thus, the present virtual machine compiler can generate programs for a virtual machine whose execution speed is improved by having data dependencies absorbed.
The second object can be achieved by a virtual machine. The virtual machine executes a virtual machine instruction sequence under control of a real machine, the virtual machine including: an instruction storing unit for storing the virtual machine instruction sequence; a read unit for reading a virtual machine instruction in the virtual machine instruction sequence from the instruction storing unit; and a decoding-executing unit for specifying and executing operations corresponding to the virtual machine instruction, wherein the decoding-executing unit includes a branch instruction judging unit for judging if the virtual machine instruction is a branch instruction and an interrupt handling unit for detecting, if the virtual machine instruction is judged to be a branch instruction, whether there is an interrupt request, and, if so, performing a corresponding interrupt handling in addition to executing the branch instruction.
In this way, interrupt handling is only performed when a branch instruction is executed, which is sufficient for most virtual machine programs. This suppresses the decreases in execution speed that are caused by performing interrupt handling more frequently.
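A minimal dispatch-loop sketch of this scheme follows. The instruction set, the request queue, and the handler interface are all hypothetical; the essential point is that interrupt requests are polled only when a branch is decoded:

```python
# Dispatch loop that polls for interrupt requests only at branch
# instructions. Instructions are (opcode, argument) pairs; "Br" takes
# an absolute target index. All names here are hypothetical.
def run(program, interrupt_requests, handle_interrupt):
    pc, polls = 0, 0
    while pc < len(program):
        op, arg = program[pc]
        if op == "Br":                    # poll only when branching
            polls += 1
            if interrupt_requests:
                handle_interrupt(interrupt_requests.pop(0))
            pc = arg                      # then take the branch
        else:
            pc += 1                       # ordinary instructions skip the poll
    return polls

handled = []
prog = [("Nop", 0), ("Nop", 0), ("Br", 4), ("Nop", 0), ("Nop", 0)]
print(run(prog, [7], handled.append))     # -> 1 (one poll, at the branch)
print(handled)                            # -> [7]
```

Because loops must contain a branch, any long-running program still reaches a poll point regularly, while straight-line code pays no per-instruction polling cost.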
Here, the decoding-executing unit may further include a real machine instruction sequence storing unit for storing real machine instruction sequences corresponding to every virtual machine instruction and real machine instruction sequences for having interrupt handling performed corresponding to each interrupt request and an executing unit for executing a real machine instruction sequence corresponding to the virtual machine instruction read by the read unit, wherein if the virtual machine instruction is judged to be the branch instruction and an interrupt request is detected, the interrupt handling unit has the executing unit execute a real machine instruction sequence for having the corresponding interrupt handling performed and then the real machine instruction sequence corresponding to the branch instruction.
With this construction, the interrupt handling to be additionally performed can be specified by a real machine instruction sequence. This realizes a virtual machine capable of performing interrupt handling with a simpler architecture.
The second object can be also achieved by a virtual machine. The virtual machine executes a virtual machine instruction sequence under control of a real machine, the virtual machine including: an instruction storing unit for storing the virtual machine instruction sequence; a read unit for reading a virtual machine instruction in the virtual machine instruction sequence from the instruction storing unit; and a decoding-executing unit for specifying and executing operations corresponding to the read virtual machine instruction, wherein the decoding-executing unit includes a block judging unit for judging if the read virtual machine instruction is a virtual machine instruction representative of a block, a block being a predetermined number of virtual machine instructions and an interrupt handling unit for detecting, if the read virtual machine instruction is judged to be the representative virtual machine instruction, whether there is an interrupt request to the virtual machine, and if so, performing a corresponding interrupt handling in addition to executing the representative virtual machine instruction.
In this way, interrupt handling is performed every time a predetermined number of virtual machine instructions is executed, and the frequency of interrupt handling can be controlled by changing this number in advance. This avoids the decreases in execution speed that are caused by performing interrupt handling more frequently.
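The block-based variant can be sketched in the same style. The block size and instruction names are hypothetical; the representative instruction is simply every N-th executed instruction:

```python
# Poll for interrupt requests once per block of `block` executed
# virtual machine instructions. Names are hypothetical.
def run(program, block, interrupt_requests, handle_interrupt):
    executed, polls = 0, 0
    for op in program:
        executed += 1
        if executed % block == 0:     # representative instruction of a block
            polls += 1
            while interrupt_requests:
                handle_interrupt(interrupt_requests.pop(0))
        # ... execute op here ...
    return polls

# Ten instructions with a block size of four: polls after the 4th
# and 8th instructions.
print(run(["Nop"] * 10, 4, [], lambda r: None))  # -> 2
```

Choosing a larger block size lowers the polling overhead at the cost of longer worst-case interrupt latency, which is the trade-off the predetermined number controls.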
Here, the decoding-executing unit may include a real machine instruction sequence storing unit for storing a plurality of real machine instruction sequences corresponding to every virtual machine instruction and at least one real machine instruction sequence for having interrupt handling performed in response to an interrupt request and an executing unit for executing a real machine instruction sequence corresponding to the read virtual machine instruction, wherein the block judging unit may judge that the read virtual machine instruction is a virtual machine instruction representative of the block when a number of virtual machine instructions that have been read is equal to a multiple of the predetermined number and wherein if the read virtual machine instruction is judged to be a representative virtual machine instruction and an interrupt request has been detected, the interrupt handling unit may have the executing unit execute a real machine instruction sequence for having the interrupt handling performed and then the real machine instruction sequence corresponding to the representative virtual machine instruction.
With this construction, an interrupt handling to be additionally performed can be specified by a real machine instruction sequence. As a result, a virtual machine that is capable of performing an interrupt handling with a simpler architecture can be achieved.
The third object may be achieved by a virtual machine. The virtual machine executes a virtual machine instruction sequence under control of a real machine, the virtual machine including: a real machine program storing unit for storing a plurality of subprograms composed of real machine instructions; an instruction storing unit that includes a first area for storing the virtual machine instruction sequence and a second area for storing a plurality of pointers to the subprograms in the real machine program storing unit; a read unit for reading a virtual machine instruction in the virtual machine instruction sequence from the first area in the instruction storing unit; and a decoding-executing unit for specifying and executing operations corresponding to the read virtual machine instruction, wherein the decoding-executing unit includes an area judging unit for judging whether the virtual machine instruction is an instruction that transfers control flow to a location in the second area and an address converting-executing unit for executing, if the virtual machine instruction is judged to be an instruction that transfers control flow to a location in the second area, a subprogram indicated by a pointer stored in the location.
With this construction, whether a virtual machine function or a real machine function is executed is determined solely by the corresponding location in an area of the memory map of the virtual machine, so the setting of whether a virtual machine function or a real machine function is executed for a given function can be easily changed. This makes it possible to use "native coding" in virtual machine programs for real machines with different architectures.
Here, the first area and the second area in the instruction storing unit may be two adjacent storage areas whose boundary is marked by an address, and the area judging unit may judge, when the read virtual machine instruction is a call instruction for a subprogram, whether the virtual machine instruction is an instruction that transfers control flow, by comparing a call address of the call instruction with the address.
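The address comparison described above reduces to a single bounds check at call time. The boundary address, the function tables, and the return values below are all hypothetical:

```python
# Sketch of the area judgment: call addresses below BOUNDARY refer to
# virtual machine code, while addresses at or above it hold pointers to
# real machine subprograms. All addresses and tables are hypothetical.
BOUNDARY = 0x8000

def call(address, vm_functions, native_subprograms):
    if address < BOUNDARY:
        return vm_functions[address]()        # interpret virtual machine code
    return native_subprograms[address]()      # run the native subprogram

vm = {0x0100: lambda: "interpreted"}
native = {0x8010: lambda: "native"}
print(call(0x0100, vm, native))   # -> interpreted
print(call(0x8010, vm, native))   # -> native
```

Shifting the boundary (or moving a function from one table to the other) changes whether that function is interpreted or run natively, without touching the calling virtual machine program.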
With this construction, control over switches between executing a virtual machine function and a real machine function can be easily achieved by shifting the boundary line between areas in the memory map of the virtual machine. As a result, virtual machines that have improved execution speed and are suited to different real machine environments can be realized.
The fourth object can be achieved by a virtual machine. The virtual machine executes a virtual machine instruction sequence under control of a real machine, the virtual machine including: an instruction storing unit for storing the virtual machine instruction sequence; a read unit for reading a virtual machine instruction in the virtual machine instruction sequence from the instruction storing unit; and a decoding-executing unit for specifying and executing operations corresponding to the read virtual machine instruction, wherein the instruction storing unit is a plurality of instruction blocks that constitute the virtual machine instruction sequence, the instruction blocks corresponding to basic blocks, wherein the instruction blocks each include: an identifier area for storing an identifier that specifies a start position of the instruction block in the instruction storing unit; a non-branch instruction area for storing non-branch instructions belonging to a corresponding basic block; and a branch instruction area for storing at least one branch instruction belonging to the corresponding basic block, wherein each branch instruction stored in the branch instruction area designates a branch destination using an identifier stored in one of the identifier areas, and wherein if the read virtual machine instruction is a branch instruction, the decoding-executing unit has control flow branch to a start position of a non-branch instruction area in an instruction block having an identifier designated by the branch instruction as a branch destination.
With this construction, there is always only one entry point for each instruction block, which is the start of the instruction block. As a result, the address analysis for branch destinations of branch instructions is simplified, and the time taken by compiling is reduced. Also, by caching instructions in instruction block units, the judgment processing regarding cache boundaries is simplified, and the decreases in execution efficiency that occur when a cache is provided for the virtual machine can be made smaller than in conventional techniques.
Here, the decoding-executing unit may include a program counter composed of (a) an identifier register for storing an identifier of an instruction block to which a virtual machine instruction to be read belongs and (b) an offset counter for storing an offset that indicates a relative storage position of the virtual machine instruction in the instruction block, wherein the read unit may read the virtual machine instruction based on the identifier and the offset in the program counter, wherein the decoding-executing unit may update, if the read virtual machine instruction is the branch instruction, the program counter by writing the identifier designated as the branch destination by the branch instruction into the identifier register and by setting an initial value in the offset counter, and if the read virtual machine instruction is a non-branch instruction, update the program counter by incrementing the offset counter, and the read unit may read a virtual machine instruction to be executed next based on the program counter updated by the decoding-executing unit.
Accordingly, each instruction block is specified only by the value of the identifier register, and the relative storage position of each virtual machine instruction only by the value of the offset counter. As a result, an address converting technique according to a conventional "segment method" can be used.
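The (identifier, offset) program counter can be sketched as follows. The instruction blocks, mnemonics, and branch syntax are hypothetical; a branch writes a new identifier and resets the offset, while a non-branch instruction merely increments the offset:

```python
# Segment-style program counter made of a block identifier and an
# offset. Blocks, mnemonics, and the "Br <ident>" syntax are hypothetical.
blocks = {
    "B0": ["Push 1", "Push 2", "Add", "Br B1"],
    "B1": ["Print", "Halt"],
}

def run(blocks, start):
    ident, offset, trace = start, 0, []
    while True:
        instr = blocks[ident][offset]
        trace.append(instr)
        if instr == "Halt":
            return trace
        if instr.startswith("Br "):
            ident, offset = instr.split()[1], 0   # branch: new block, offset reset
        else:
            offset += 1                           # non-branch: bump the offset

print(run(blocks, "B0"))
```

Since a branch destination is always (identifier, 0), every entry into a block is at its start, which is exactly the single-entry-point property described above.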
Here, the decoding-executing unit may include a real machine instruction sequence storing unit that stores a plurality of real machine instruction sequences that each correspond to a different virtual machine instruction, the instruction blocks in the instruction storing unit each may include a decoded data sequence area for storing a decoded data sequence that specifies real machine instruction sequences in the real machine instruction sequence storing unit, the real machine instruction sequences corresponding to virtual machine instructions stored in the non-branch instruction area and the branch instruction area of the instruction block, wherein if a decoded data sequence is stored in an instruction block where reading is to be performed, the read unit may read a set of decoded data in the decoded data sequence instead of a virtual machine instruction, and if not, the read unit may read the virtual machine instruction and then generate a set of decoded data to specify a real machine instruction sequence in the real machine instruction sequence storing unit that corresponds to the virtual machine instruction, and wherein the decoding-executing unit may read from the real machine instruction sequence storing unit the real machine instruction sequence specified by the set of decoded data that has been either read or generated by the read unit, and executes the real machine instruction sequence.
With this construction of the virtual machine, in addition to the effects achieved by the virtual machine that manages a virtual machine program in units of instruction blocks, the time taken to decode a virtual machine instruction is shortened for instruction blocks that already have a decoded data sequence. This is because the decoded data sequence is executed directly instead of the virtual machine instructions. As a result, the execution speed of the virtual machine is improved.
Here, the decoded data sequence area in the instruction storing unit may include a flag area for storing a flag that indicates whether the decoded data sequence is stored in the decoded data sequence area, wherein the decoding-executing unit may include a current flag storing unit for storing a flag that is read from a flag area in a branch destination instruction block by the decoding-executing unit when executing a branch instruction, and wherein the read unit may read a set of decoded data or a virtual machine instruction depending on the flag in the current flag storing unit.
With this construction, a flag indicating whether a decoded data sequence exists is provided in each instruction block and is read from the instruction block and held by the virtual machine. As a result, when executing virtual machine instructions in an instruction block that has a decoded data sequence, the virtual machine need not refer to the flag every time it executes a virtual machine instruction.
Here, each instruction block in the instruction storing unit may further include a flag area for storing a flag that indicates whether a decoded data sequence is stored in the decoded data sequence area of the instruction block, and the decoding-executing unit may include a decoded data sequence writing unit for judging, after a branch instruction has been executed, whether the instruction block designated as the branch destination by the branch instruction stores a decoded data sequence by referring to a flag stored in a flag area of the instruction block, and if no decoded data sequence is stored, having a virtual machine instruction sequence in the instruction block read, decoding the read virtual machine instruction sequence to produce a decoded data sequence, and writing the decoded data sequence into a decoded data sequence area in the instruction block.
With this construction, a decoded data sequence is generated when an instruction block is executed for the first time. As a result, when the same instruction block needs to be repeatedly executed, as in loop processing, the time required for executing the instructions corresponding to the block is reduced from the second execution of the block onwards.
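The decode-once behavior amounts to memoizing the decoded data sequence in the block itself. The block layout and handler names below are hypothetical:

```python
# Sketch: each instruction block caches its decoded data sequence after
# the first execution. Block layout and handler names are hypothetical.
class Block:
    def __init__(self, instructions):
        self.instructions = instructions
        self.decoded = None          # None plays the role of the flag

decode_count = 0

def decoded_sequence(block):
    global decode_count
    if block.decoded is None:        # first execution: decode and cache
        decode_count += 1
        block.decoded = ["handler_" + op for op in block.instructions]
    return block.decoded

b = Block(["Push", "Add"])
for _ in range(3):                   # e.g. three iterations of a loop
    decoded_sequence(b)
print(decode_count)                  # -> 1 (decoded only on the first pass)
```

From the second iteration onward the virtual machine dispatches directly on the cached sequence, which is the source of the speed-up for loop bodies.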
The fifth object can be achieved by a virtual machine. The virtual machine executes a virtual machine instruction sequence under control of a real machine, the virtual machine including: an instruction storing unit for storing a compressed virtual machine instruction sequence to be executed; a read unit for reading a compressed virtual machine instruction in the compressed virtual machine instruction sequence from the instruction storing unit and decompressing the compressed virtual machine instruction to generate a decompressed virtual machine instruction; and a decoding-executing unit for specifying and executing operations corresponding to the decompressed virtual machine instruction, wherein the instruction storing unit is a plurality of instruction blocks containing compressed virtual machine instructions constituting the compressed virtual machine instruction sequence, the instruction blocks corresponding to basic blocks, wherein the instruction blocks each include: an identifier area for storing an identifier that specifies a start position of the instruction block in the instruction storing unit; a non-branch instruction area for storing compressed non-branch instructions belonging to a corresponding basic block; and a branch instruction area for storing at least one compressed branch instruction belonging to the corresponding basic block, wherein each compressed branch instruction stored in a branch instruction area designates a branch destination using an identifier stored in one of the identifier areas, and wherein if the decompressed virtual machine instruction is a branch instruction, the decoding-executing unit has control flow branch to a start position of a non-branch instruction area in an instruction block having an identifier designated by the branch instruction as a branch destination.
With this construction, the compressed virtual machine program is stored in units of instruction blocks based on basic blocks and is decompressed by the decoding-executing unit. As a result, the malfunctions caused when compressed bit sequences are mistakenly decoded starting midway through do not occur in this virtual machine.
Here, each instruction block may include a decompression table area for storing a decompression table for use during decompression of compressed virtual machine instructions in the instruction block, the decompression table containing at least one combination of a compressed virtual machine instruction stored in the instruction block and a corresponding decompressed virtual machine instruction and wherein the read unit may read the compressed virtual machine instruction from the instruction storing unit and decompresses the compressed virtual machine instruction by referring to a decompression table in an instruction block to which the compressed virtual machine instruction belongs to generate the decompressed virtual machine instruction.
With this virtual machine, each instruction block stores a decompression table, and a different decompression table is referred to during execution of the instructions belonging to each instruction block. Accordingly, the present virtual machine ensures that decompression can be performed correctly even when each instruction block is compressed in a different format.
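The per-block tables can be sketched as follows. The block layout, the codes, and the instruction names are hypothetical; note that the same code means different instructions in different blocks:

```python
# Sketch: each instruction block carries its own decompression table,
# so different blocks may use different (even conflicting) codes.
# Block contents and tables are hypothetical.
block_a = {"table": {"0": "Push", "1": "Add"}, "codes": ["0", "0", "1"]}
block_b = {"table": {"0": "Add", "1": "Print"}, "codes": ["1"]}

def decompress(block):
    # Look each code up in the table of the block it belongs to.
    return [block["table"][c] for c in block["codes"]]

print(decompress(block_a))  # -> ['Push', 'Push', 'Add']
print(decompress(block_b))  # -> ['Print']  ("1" means Print in this block)
```

Because decompression always begins at the start of a block and uses that block's own table, a branch into a block can never apply the wrong code assignments.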
The sixth object can be achieved by JIT compilers. The JIT compiler is for use with a virtual machine that executes a virtual machine instruction sequence under control of a real machine, the JIT compiler converting parts of the virtual machine instruction sequence into real machine instruction sequences before execution, the JIT compiler including: a block start information receiving unit for receiving an input of block start information for each virtual machine instruction that composes the virtual machine instruction sequence, the block start information showing whether a corresponding virtual machine instruction would correspond to a start of a basic block if the virtual machine instruction sequence were divided into basic blocks; a converting unit for converting virtual machine instructions in the virtual machine instruction sequence into real machine instruction sequences; and an outputting unit for rearranging the real machine instruction sequences produced by the converting unit into basic block units in accordance with the block start information received by the block start information receiving unit. Here, this JIT compiler may further include a branch violation judging unit for judging, when a real machine instruction at a start of a produced real machine instruction sequence corresponds to a virtual machine instruction whose block start information indicates that the virtual machine instruction would be a start of a basic block, whether the real machine instruction is going to be arranged in an address that violates an address alignment restriction of the real machine, wherein if the real machine instruction is going to be arranged in an address that violates the address alignment restriction, the outputting unit may rearrange the real machine instruction sequence so that the real machine instruction is not arranged in the address.
Accordingly, without performing the complicated processing for analyzing the branch destinations of branch instructions, the present JIT compiler can produce, at a higher speed, a real machine program in which branch destinations are arranged at addresses complying with a two-word alignment.
Here, the outputting unit may insert a certain number of no-operation instructions at a start of each basic block, the number being a number of real machine instructions processed during a delay of a delayed branch.
As a result, the above JIT compiler is capable of dealing with delayed branches by inserting no-operation instructions at the start of each basic block, without performing complicated delayed branch analysis.
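The NOP-padding step can be sketched in a few lines. The basic-block representation and instruction names are hypothetical; the delay-slot count would be determined by the target real machine:

```python
# Sketch: prepend delay-slot no-operation instructions to every basic
# block, so a delayed branch landing at a block start executes NOPs
# harmlessly. Block contents are hypothetical.
def pad_blocks(basic_blocks, delay_slots):
    return [["NOP"] * delay_slots + block for block in basic_blocks]

blocks = [["load", "add"], ["store"]]
print(pad_blocks(blocks, 1))  # -> [['NOP', 'load', 'add'], ['NOP', 'store']]
```

Since every branch destination is a block start, padding every block start is sufficient, and no per-branch analysis of which instruction falls in the delay slot is needed.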
As has been described, the present invention improves the execution speed of virtual machines and is especially valuable as a technique for promoting the efficient and high-speed use of shared resources by different types of computers connected in a network environment.