Virtual machines are abstract computers that allow for portability of software applications, typically between different underlying computer architectures. A virtual machine (VM) is generally a complex software product that is implemented upon a particular computer hardware platform and/or operating system. The VM then provides a uniform layer of abstraction between the hardware platform and any compiled software applications that will run thereon. Virtual machines are essential for the portability of certain technologies, including Java. The Java Virtual Machine (JVM) allows compiled Java programs to be run on the JVM, independently of whatever hardware or operating system may be underneath. The JVM is described in further detail in the book “The Java™ Virtual Machine Specification (2nd Edition)” by Tim Lindholm, published by Sun Microsystems, and incorporated herein by reference. Examples of commercially available JVMs include the Sun Java Virtual Machine from Sun Microsystems, Inc., and the JRockit Virtual Machine from BEA Systems, Inc.
A real CPU understands and executes instructions native to that CPU (commonly called native code). In comparison, a virtual machine understands and executes virtual machine instructions (commonly called byte code). A virtual machine almost always runs on a real CPU executing native code. The core of a virtual machine is normally implemented in a programming language such as C, which is always compiled to native code using an OS/CPU compatible compiler.
A virtual machine can implement different strategies of how to execute the byte codes. If the virtual machine analyzes each byte code separately and does this every time the same byte code is executed, then the virtual machine is said to be an “interpreter”. If the virtual machine translates the byte code into native code once and then the native code is used every time the same byte code is executed, then the virtual machine is said to be a “just in time compiler” (commonly called a JIT).
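The difference between the two strategies can be illustrated with a minimal C sketch. The three-instruction byte code, the Machine type and all function names are invented for this illustration and do not correspond to Java byte code or to any particular virtual machine. Function pointers stand in for generated native code, since a real JIT would emit machine instructions instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction byte code, invented for this sketch. */
enum { OP_PUSH1 = 0, OP_ADD = 1, OP_HALT = 2 };

typedef struct { int stack[16]; int sp; } Machine;

static void do_push1(Machine *m) { assert(m->sp < 16); m->stack[m->sp++] = 1; }
static void do_add(Machine *m)
{
    assert(m->sp >= 2);
    m->sp--;
    m->stack[m->sp - 1] += m->stack[m->sp];
}

/* Interpreter: analyzes every byte code each time it is executed. */
int interpret(const unsigned char *code)
{
    Machine m = { {0}, 0 };
    for (size_t pc = 0; code[pc] != OP_HALT; pc++) {
        switch (code[pc]) {
        case OP_PUSH1: do_push1(&m); break;
        case OP_ADD:   do_add(&m);   break;
        default:       assert(0 && "unknown byte code");
        }
    }
    assert(m.sp > 0);
    return m.stack[m.sp - 1];
}

typedef void (*Handler)(Machine *);

/* Translate-once step: the byte code is decoded a single time into a
 * handler table (a stand-in for native code in this sketch). */
size_t translate(const unsigned char *code, Handler *out, size_t max)
{
    size_t n = 0;
    for (size_t pc = 0; code[pc] != OP_HALT; pc++) {
        assert(n < max);
        switch (code[pc]) {
        case OP_PUSH1: out[n++] = do_push1; break;
        case OP_ADD:   out[n++] = do_add;   break;
        default:       assert(0 && "unknown byte code");
        }
    }
    return n;
}

/* Runs previously translated code without decoding the byte code again. */
int run_translated(const Handler *compiled, size_t n)
{
    Machine m = { {0}, 0 };
    for (size_t i = 0; i < n; i++)
        compiled[i](&m);
    assert(m.sp > 0);
    return m.stack[m.sp - 1];
}
```

In this sketch the decoding cost in translate is paid once, while interpret pays it on every execution of the same byte code.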
Some virtual machines contain both an interpreter and a JIT. In the case of Java Virtual Machines, the Sun Java Virtual Machine will initially use the interpreter when executing Java byte code. When the Sun JVM detects byte code that is executed often (commonly called a hot spot in the program), then it will compile that part of the byte code into native code. By contrast, the JRockit Virtual Machine from BEA will never interpret the Java byte code. It will always compile it to native code before executing it. If JRockit detects a hot spot in the program it will recompile that part of the byte code again, but with more code optimizations.
A Java Virtual Machine always needs to call native code to access operating system resources. Since the core of the Java Virtual Machine is written in a language such as C that can be compiled and linked to the operating system libraries, accessing operating system resources is simply a native function call following the platform calling conventions.
A JIT tries to optimize function calls between Java functions to use the most efficient way of calling on the particular CPU. One way to do this is to use registers as much as possible for arguments to functions. For several reasons this is usually not the same as the platform calling convention. For example, a JIT for a language with garbage collection (like Java) needs to track which registers contain pointers to live objects. The Java-to-Java calling convention can therefore declare that certain registers always contain object pointers and not temporary results from arithmetic calculations. Such care is not needed for the native calling convention. On a register-starved architecture like the Intel x86 processors, the JIT can also use fewer callee save registers than the platform calling convention and instead use the remaining registers for function arguments.
However, often byte code needs to make use of native libraries. These libraries can be used for a variety of purposes, including: low level graphics (like the implementation of the Standard Widget Toolkit (SWT) from the Eclipse project); database access (native drivers are sometimes required to speed up access to databases); or large amounts of legacy code that cannot be ported.
In the Java language this is solved using the standardized Java Native Interface (JNI). The JNI specifies that the native code should be called using the platform calling convention, and also specifies how the Java arguments are translated to a format that native code can use. JNI is described in further detail in the book “Java™ Native Interface: Programmer's Guide and Specification” by Sheng Liang, which is incorporated herein by reference.
Traditionally, a call from compiled Java code to native code is redirected through a short piece of native code (a stub), and the stubs are generated by a stub generator. The stub performs the translation of the arguments, sets up the call to conform to the platform calling convention and finally calls the native code. Depending on the JIT strategy, some arguments, such as numbers, need not be translated. In those cases the overhead introduced by the extra function calls is more noticeable than when several necessary argument translations are part of the call. If the native function that is called is very short, then the overhead of the native call setup can be significant.
TABLE 1
Platform/Environment | A Java call | A native call
x86_64/Windows | Ptr args: rsi, rdi. Integer args: rax, rdx. Stack used for more args. Callee save: rbx, rbp. | Args: rcx, rdx, r8, r9. Stack used for more args.
x86_64/Linux | Same as above. | Args: rdi, rsi, rdx, rcx, r8, r9. Stack used for more args.
x86/Windows & Linux | Ptr args: esi, edi. Integer args: eax, edx. Stack used for more args. Callee save: ebx, ebp. | Stack used for all args.
ia64/Windows & Linux | Same as right. | Variable sized stack frame on register stack. Args: r32, r33, . . .
It is also a significant amount of work to write the stub generators given the number of common operating systems, for example AIX, Linux, Solaris and Windows NT/2k/XP, and their respective calling conventions. Table 1 shows that Windows NT/2k/XP and Linux use the same calling convention on Intel x86 and ia64 processors. However, Windows and Linux use different calling conventions on Intel EM64T compatible processors (x86_64). Other calling conventions include AIX on IBM PowerPC processors, and Solaris on SPARC processors.
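As an illustration of how the two x86_64 conventions in Table 1 differ, the following C sketch assigns argument registers under each convention. The enum and function names are hypothetical. The native register sequence follows the System V AMD64 order (rdi, rsi, rdx, rcx, r8, r9), which is consistent with the third argument arriving in rdx in Listing 6. Floating point arguments are left out of the sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_PTR, ARG_INT } ArgKind;
typedef enum { REG_STACK, REG_RAX, REG_RCX, REG_RDX, REG_RSI, REG_RDI,
               REG_R8, REG_R9 } Reg;

/* Java-to-Java convention (x86_64 row of Table 1): pointers and
 * integers are counted separately. REG_STACK means "passed on the stack". */
Reg java_arg_reg(const ArgKind *kinds, size_t index)
{
    static const Reg ptr_regs[] = { REG_RSI, REG_RDI };
    static const Reg int_regs[] = { REG_RAX, REG_RDX };
    size_t same_kind_before = 0;

    for (size_t i = 0; i < index; i++)
        if (kinds[i] == kinds[index])
            same_kind_before++;
    if (same_kind_before >= 2)
        return REG_STACK;
    return kinds[index] == ARG_PTR ? ptr_regs[same_kind_before]
                                   : int_regs[same_kind_before];
}

/* Native convention on x86_64/Linux: one positional sequence shared by
 * pointer and integer arguments. */
Reg linux_native_arg_reg(size_t index)
{
    static const Reg regs[] = { REG_RDI, REG_RSI, REG_RDX,
                                REG_RCX, REG_R8, REG_R9 };
    return index < 6 ? regs[index] : REG_STACK;
}
```

For a Java method taking the object pointer and one integer, the sketch yields rsi and rax on the Java side. On the native side, with the JNIEnv pointer added as the first argument, it yields rdi, rsi and rdx, so a stub has to move every argument except the object pointer.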
Listing 1
public class HelloWorld
{
  native void printHelloWorld(int a);
  static { System.loadLibrary("nativelib"); }
  public static void main(String[] args) {
    HelloWorld h = new HelloWorld();
    h.printHelloWorld(0x1111);
    h.printHelloWorld(0x2222);
    h.printHelloWorld(0x3333);
    h.printHelloWorld(0x4444);
    h.printHelloWorld(0x5555);
    h.printHelloWorld(0x6666);
  }
}
Listing 1 demonstrates an example of a Java program that calls a native library. A class file containing byte code, such as that shown in Listing 2, is generated when the program is compiled with javac.
Listing 2
0: new #2; //class HelloWorld
3: dup
4: invokespecial #3; //Method "<init>":()V
7: astore_1
8: aload_1
9: sipush 4369
12: invokevirtual #4; //Method printHelloWorld:(I)V
15: aload_1
16: sipush 8738
19: invokevirtual #4; //Method printHelloWorld:(I)V
22: aload_1
23: sipush 13107
26: invokevirtual #4; //Method printHelloWorld:(I)V
29: aload_1
30: sipush 17476
33: invokevirtual #4; //Method printHelloWorld:(I)V
36: aload_1
37: sipush 21845
40: invokevirtual #4; //Method printHelloWorld:(I)V
43: aload_1
44: sipush 26214
47: invokevirtual #4; //Method printHelloWorld:(I)V
50: return
Some virtual machines, including versions of the JRockit Virtual Machine from BEA, will compile this bytecode using standard compilation techniques. Some of these techniques are described in the books “Advanced Compiler Design and Implementation” by Steven S. Muchnick; “Crafting a Compiler with C” by Charles N. Fischer and Richard J. LeBlanc, Jr.; and “Compilers” by Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman, all of which are incorporated herein by reference. The process is typically to translate the bytecode first into a high level intermediate representation (HIR); then to a medium level intermediate representation (MIR); and then to a low level intermediate representation (LIR).
Listing 3
0x100080bc0: push %rbx
0x100080bc1: mov $0x140c2c0,%rax
0x100080bcb: callq 0x100005410
0x100080bd0: mov %rsi,%rbx
0x100080bd3: callq 0x100080720
0x100080bd8: mov $0x1111,%eax
0x100080bdd: mov %rbx,%rsi
0x100080be0: mov (%rbx),%ecx
0x100080be3: nop
0x100080be4: callq 0x10008075e
0x100080be9: mov $0x2222,%eax
0x100080bee: mov %rbx,%rsi
0x100080bf1: mov (%rbx),%ecx
0x100080bf4: nop
0x100080bf5: callq 0x10008075e
0x100080bfa: mov $0x3333,%eax
0x100080bff: mov %rbx,%rsi
0x100080c02: mov (%rbx),%ecx
0x100080c05: nop
0x100080c06: callq 0x10008075e
0x100080c0b: mov $0x4444,%eax
0x100080c10: mov %rbx,%rsi
0x100080c13: mov (%rbx),%ecx
0x100080c16: nop
0x100080c17: callq 0x10008075e
0x100080c1c: mov $0x5555,%eax
0x100080c21: mov %rbx,%rsi
0x100080c24: mov (%rbx),%ecx
0x100080c27: nop
0x100080c28: callq 0x10008075e
0x100080c2d: mov %rbx,%rsi
0x100080c30: mov $0x6666,%eax
0x100080c35: mov (%rbx),%ecx
0x100080c38: nop
0x100080c39: callq 0x10008075e
0x100080c3e: pop %rbx
0x100080c3f: retq
Listing 3 shows the native machine code generated by versions of JRockit from the example code for the x86_64/Linux platform. The address 0x10008075e is the address of the stub that will call the native C function. The assembler code in Listing 3 follows the specification in Table 1 for the x86_64/Linux platform: the calling convention for normal Java to Java calls puts the object pointer inside rsi, and the first integer into eax. If a second pointer had been used, it would have been put into rdi, and a second integer would be put into edx. Also according to the Java calling convention, rbx is a callee save register, which is why it is restored from the stack before the return.
Listing 4
JNIEXPORT void JNICALL
Java_HelloWorld_printHelloWorld(JNIEnv *env, jobject obj, jint x)
{
  printf("Hello World %d\n", x);
}
Listing 4 shows an example of a function that follows the JNI-specification (previously referenced above) and which can therefore be called using the above native call.
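The function name in Listing 4 follows the naming scheme that the JNI specification uses to locate native methods: the prefix Java_, the escaped class name, an underscore, and the escaped method name. The following C sketch builds such a name for the simple case. The function names are hypothetical, and only the escapes for package separators and underscores are handled. Other characters that require escaping, and overloaded methods, are rejected or ignored here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Appends `piece` to the NUL-terminated string in `dst` (capacity `cap`).
 * Returns 0 on success and -1 if the result would not fit. */
static int append_raw(char *dst, size_t cap, const char *piece)
{
    size_t len = strlen(dst);
    size_t plen = strlen(piece);
    if (len + plen >= cap)
        return -1;
    memcpy(dst + len, piece, plen + 1);
    return 0;
}

/* Appends the escaped form of `src`: '.' and '/' become "_", '_' becomes
 * "_1", ASCII letters and digits are copied. Anything else is rejected
 * in this simplified sketch. */
static int append_escaped(char *dst, size_t cap, const char *src)
{
    for (; *src != '\0'; src++) {
        char single[2] = { *src, '\0' };
        const char *piece;
        if (*src == '.' || *src == '/')
            piece = "_";
        else if (*src == '_')
            piece = "_1";
        else if (isalnum((unsigned char)*src))
            piece = single;
        else
            return -1;
        if (append_raw(dst, cap, piece) != 0)
            return -1;
    }
    return 0;
}

/* Builds "Java_<class>_<method>" into dst. Returns 0 on success. */
int jni_symbol_name(char *dst, size_t cap,
                    const char *class_name, const char *method_name)
{
    if (cap == 0)
        return -1;
    dst[0] = '\0';
    if (append_raw(dst, cap, "Java_") != 0) return -1;
    if (append_escaped(dst, cap, class_name) != 0) return -1;
    if (append_raw(dst, cap, "_") != 0) return -1;
    return append_escaped(dst, cap, method_name);
}
```

For the class HelloWorld and the method printHelloWorld of Listing 1, the sketch produces the name used in Listing 4.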
A Traditional Native Call Stub
To interface the calls in the compiled Java code in Listing 3 to the C function in Listing 4 on the x86_64/Linux platform, the stub needs to put a pointer to the JNIEnv in edi, leave the object pointer in esi, and move the integer argument from eax to edx (the third argument register, as Listings 5 and 6 show), before calling Java_HelloWorld_printHelloWorld. The traditional stub generator is tailored for each CPU/OS platform and generates the lowest level of IR (LIR) that is essentially a one-to-one mapping to machine code.
Listing 5
Variables, constants and labels:
v1 (reg, rsi)
v2 (reg, rax)
v3 (reg, rsp)
c4 (i64, 0x30)
v5 [rsp+0x28]
v6 (reg, rbx)
v7 [rsp+0x20]
v8 (reg, rbp)
v9 [rsp]
v10 (reg, r10)
c11 (i64, 0x6ab140)
v12 [rsp+0xfffffffffffff000]
v13 [rbp+0x8]
v14 [rbp+0x1f0]
v15 [rsp+0x8]
v16 (reg, r12)
c17 (i64, 0xf)
v18 [rbx]
c19 (i64, 0x8)
v20 [rbp+0x1e8]
v21 (reg, rdi)
v22 (reg, rdx)
L23 (0x2aaaae1746d8) <-- Address of Java_HelloWorld_printHelloWorld
v24 [rbp+0x10]
L25 (0x100000580)
c26 (i64, 0x0)
c27 (i64, 0xf4)
Parameters: rsi (this) rax
block0:
  0 x86_sub rsp 0x30 -> rsp (i64)
  1 x86_mov rbx -> [rsp+0x28] (i64)
  2 x86_mov rbp -> [rsp+0x20] (i64)
  3 x86_mov 0x6ab140 -> r10 (i64)
  4 x86_mov r10 -> [rsp] (i64)
  5 x86_mov rax -> [rsp+0xfffffffffffff000] (i64)
  6 lir_thread_vm -> rbp (i64)
  7 x86_mov rsp -> [rbp+0x8] (i64)
  8 x86_mov [rbp+0x1f0] -> rbx (i64)
  9 x86_mov rbx -> [rsp+0x8] (i64)
  10 x86_mov rsp -> r12 (i64)
  11 x86_mov r12 -> r10 (i64)
  12 x86_and r10 0xf -> r10 (i64)
  13 x86_sub rsp r10 -> rsp (i64)
  14 x86_push rax
  15 x86_test rsi rsi (i64)
  == (then block3, else block4)
block1:
  34 x86_cmp [rbp+0x8] 0x0 (i64)
  == (then block7, else block8)
block2:
  33 x86_call L25
  goto (block1)
block3:
  16 x86_push rsi
  goto (block5)
block4:
  17 x86_mov rsi -> [rbx] (i64)
  18 x86_push rbx
  19 x86_add rbx 0x8 -> rbx (i64)
block5:
  20 x86_lea [rbp+0x1e8] -> r10 (i64)
  21 x86_push r10
  22 x86_pop -> rdi (i64)
  23 x86_pop -> rsi (i64)
  24 x86_pop -> rdx (i64)
  25 x86_mov rbx -> [rbp+0x1f0] (i64)
  26 lir_clrreg -> rbp (i64)
  27 x86_call L23
  28 lir_thread_vm -> rbp (i64)
  29 x86_mov r12 -> rsp (i64)
  30 x86_mov [rbp+0x10] -> r10 (i64)
  31 x86_mov r10 -> [rbp+0x8] (i64)
  32 x86_test r10 r10 (i64)
  != (then block2, else block1)
block6:
  37 x86_mov [rsp+0x8] -> r10 (i64)
  38 x86_mov r10 -> [rbp+0x1f0] (i64)
  39 x86_mov [rsp+0x28] -> rbx (i64)
  40 x86_mov [rsp+0x20] -> rbp (i64)
  41 x86_add rsp 0x30 -> rsp (i64)
  42 x86_ret
block7:
  35 x86_cmp [rbp+0x10] 0x0 (i64)
  == (then block6, else block8)
block8:
  36 lir_code 0xf4
  goto (block6)
Listing 5 shows the low-level intermediate representation of the generated stub. As can be seen, the traditional stub generator has to produce the correct native instructions for the current CPU/OS platform. Only the branch commands are automatically added by the translation from LIR to native machine code.
Pseudo Code for a Traditional Native Call Stub Generator
The following pseudo code describes a traditional stub generator that is tailored for the x86_64/Linux platform. It makes use of the following types:
pd_addr is a C storage type for platform dependent addresses. It is 64 bits long on the EM64T/AMD64 platform.
NativeFrame is a structure which is placed on the stack as part of the native call. It contains:
oldHandles: is a pointer to the old JNI handles.
preserved[PLATFORM_NOOF_PRESERVED_STORAGES]: space for all registers that need to be preserved over the native call.
retAddr: is a pointer to the return address in the compiled Java code.
debugInfo: is a pointer to the debugInfo.
The native frame also contains copies of the parameter registers and the return value register.
Variables
jci: is the JNI call info structure used for bookkeeping during stub generation.
preserved: is a list of all preserved storages for the java calling convention.
np: is the length of the preserved list.
ir: is the tree of the intermediate representation of the stub.
current_block: contains code to setup call.
transitblock: is code necessary for the last transit to native code.
end_block: is code to handle the return from native code.
Subroutines
SetupParametersForCall: Transforms the java parameters to native parameters.
GetFrom: Acquire source var for LIR operation.
PushReference: Generate code that pushes a reference storage on the stack.
PushNativeParameter: Generate code that pushes a primitive type on the stack.
Step (1) The stub generator first sets up the bookkeeping data structures:
// Setup return type of native function call.
jci->ret_type
// Setup how many storages are needed for the return type, int=1 or long=2.
jci->ret_type_storages
// Setup ret_frame to reference storage location(s) for the java return type.
jci->ret_frame[ ]
// Setup ret_callee to reference storage location(s) for the native return type.
jci->ret_callee[ ]
// Later align is setup to contain reference to the IR op that performs stack alignment.
jci->align
jci->types = CG_MALLOC(env, (2*mpiGetNoofArgs(jci->mpi)+2)*sizeof(JlcType));
jci->pushed = 0;
jci->handles = 0;
jci->storage = CG_MALLOC(env, (2*mpiGetNoofArgs(jci->mpi)+2)*sizeof(Storage));
jci->jniEnv = env->jniEnv;
Step (2) Initialize four code blocks:
block = CreateIRBlock(ir);
end_block = CreateIRBlock(ir);
transit_block = CreateIRBlock(ir);
block_done = CreateIRBlock(ir);
Step (3) Allocate native frame. The initial part of the stack frame for native function calls is always the same size:
AppendIRToBlock(block, IR_X86_SUB(ir,
    IR_X86_SP,
    IR_CONSTANT(NATIVEFRAME_SIZE*sizeof(pd_addr)),
    IR_X86_SP))
Step (4) Store each preserved register on the stack so they can be restored after the JNI call:
for (i=0; i<np; ++i)
{
  AppendIRToBlock(block, IR_X86_MOV(ir,
      irGetStorageVar(ir, preserved[i]),
      IR_RELATIVE_ID(X86_SP,
          sizeof(void*)*(np-i-1)+offsetof(struct NativeFrame, preserved))))
}
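The offset arithmetic in Step (4) can be checked with a small C sketch. The field order and the size of the space array below are assumptions, chosen so that the offsets agree with the stores to [rsp], [rsp+0x8], [rsp+0x20] and [rsp+0x28] in Listing 5. The actual NativeFrame declaration is not given in this description. The sketch uses sizeof(pd_addr), which equals sizeof(void*) on the 64 bit platform discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pd_addr;      /* 64 bit platform dependent address */

#define NOOF_PRESERVED 2       /* rbx and rbp in Listing 5 */

/* Hypothetical layout, inferred from the offsets used in Listing 5. */
struct NativeFrame {
    pd_addr debugInfo;                  /* [rsp]                         */
    pd_addr oldHandles;                 /* [rsp+0x8]                     */
    pd_addr space[2];                   /* assumed scratch space         */
    pd_addr preserved[NOOF_PRESERVED];  /* [rsp+0x20] and [rsp+0x28]     */
    pd_addr retAddr;                    /* pushed by the call to the stub */
};

/* Offset from the stack pointer at which preserved register i is saved,
 * using the same expression as the pseudo code in Step (4). */
size_t preserved_slot_offset(size_t i, size_t np)
{
    assert(np <= NOOF_PRESERVED && i < np);
    return offsetof(struct NativeFrame, preserved)
         + sizeof(pd_addr) * (np - i - 1);
}
```

With the preserved list ordered as rbx, rbp, the sketch places rbx at offset 0x28 and rbp at offset 0x20, and the return address directly above the 0x30 bytes allocated in Step (3).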
Step (5) To make stack walking easier store a pointer to the native function in the stack frame:
AppendIRToBlock(block, IR_X86_MOV(ir,
    IR_CONSTANT(irGetDebufInfo(ir)),
    JAVA_TRANSIT_SCRATCH_REG))
AppendIRToBlock(block, IR_X86_MOV(ir,
    JAVA_TRANSIT_SCRATCH_REG,
    IR_RELATIVE_ID(X86_SP, offsetof(struct NativeFrame, debugInfo))))
Step (6) Add an unconditional write (stack bang) that verifies that there is at least one page of stack left for the native function:
AppendIRToBlock(block, IR_X86_MOV_T(ir, IR_PD_ADDR,
    IR_X86_A,
    IR_RELATIVE_ID(X86_SP, -(int)(size_t)sysPageSize)));
Step (7) Load the thread local data pointer and store the current stack pointer (the last Java stack frame) in the thread local data:
AppendIRToBlock(block, IR_LIR_THREAD_VM(ir, JAVA_TRANSIT_THREAD_REG));
AppendIRToBlock(block, IR_X86_MOV_T(ir, IR_PD_ADDR,
    IR_X86_SP,
    JAVA_TRANSIT_THREAD_REG_ID(VMT_LAST_JAVA_FRAME_OFFSET)));
Step (8) Save the current JNI handles to the thread local data:
AppendIRToBlock(block, IR_X86_MOV_T(ir, IR_PD_ADDR,
    JAVA_TRANSIT_THREAD_REG_ID(VMT_HANDLES_OFFSET),
    JAVA2C_HANDLE_REG));
AppendIRToBlock(block, IR_X86_MOV_T(ir, IR_PD_ADDR,
    JAVA2C_HANDLE_REG,
    IR_RELATIVE_ID(X86_SP, offsetof(struct NativeFrame, oldHandles))))
Step (9) Store the value of rsp before the parameters and alignment are pushed:
AppendIRToBlock(block, IR_X86_MOV_T(ir, IR_PD_ADDR,
    IR_X86_SP,
    JAVA2C_SCRATCH_PRESERVED_REG));
AppendIRToBlock(block, IR_X86_MOV_T(ir, IR_PD_ADDR,
    JAVA2C_SCRATCH_PRESERVED_REG,
    JAVA_TRANSIT_SCRATCH_REG));
Step (10) Align the stack, since the native calling convention on x86_64/Linux requires 16 byte aligned stack pointers when entering a native function:
AppendIRToBlock(block, IR_X86_AND(ir,
    JAVA_TRANSIT_SCRATCH_REG,
    IR_CONSTANT(STACK_ALIGNMENT-1),
    JAVA_TRANSIT_SCRATCH_REG))
AppendIRToBlock(block, jci->align = IR_X86_SUB(ir,
    IR_X86_SP,
    JAVA_TRANSIT_SCRATCH_REG,
    IR_X86_SP))
Step (11) Place parameters in the correct places, basically push all on stack, and then pop some/all into the correct regs:
SetupParametersForCall(block);
Step (12) Store JNI handle in thread local data if the native function takes such arguments that need handles (usually pointers to objects):
if (jci.handles != 0)
{
  AppendIRToBlock(block, IR_X86_MOV_T(ir, IR_PD_ADDR,
      JAVA2C_HANDLE_REG,
      JAVA_TRANSIT_THREAD_REG_ID(VMT_HANDLES_OFFSET)))
}
Step (13) Patch the alignment of the stack if needed:
if (jci.pushed & 1)
{
  InsertIROpAfter(IR_X86_SUB(ir,
      IR_X86_SP,
      IR_CONSTANT(sizeof(pd_addr)),
      IR_X86_SP),
    jci->align);
}
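The combined effect of the alignment in Step (10) and the patch in Step (13) can be expressed as plain arithmetic. The following C sketch is illustrative only: the function names are hypothetical, and stack_words stands for the number of 8 byte arguments that remain on the stack at the time of the call.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_ALIGNMENT 16u
#define WORD_SIZE 8u

/* Step (10): subtract (sp & 15) so that sp is rounded down to a
 * 16 byte boundary. */
uint64_t align_stack(uint64_t sp)
{
    return sp - (sp & (STACK_ALIGNMENT - 1));
}

/* Stack pointer at the call instruction when `stack_words` 8 byte
 * arguments remain on the stack. An odd count would leave the pointer
 * 8 bytes off, so one extra word is reserved first (Step (13)). */
uint64_t sp_at_call(uint64_t sp, unsigned stack_words)
{
    uint64_t aligned = align_stack(sp);
    if (stack_words & 1u)
        aligned -= WORD_SIZE;          /* the patched-in extra SUB */
    return aligned - (uint64_t)stack_words * WORD_SIZE;
}
```

The extra subtraction is inserted directly after the aligning instruction rather than after the pushes, so the stack arguments still end up at the top of the stack where the native function expects them.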
Step (14) The stack is set up, so the call can be made to the native function:
AppendIRToBlock(block, IR_X86_CALL(ir, IR_LABEL(ir, code)));
Step (15) Restore the stack pointer, and at the same time skip the stack alignment:
AppendIRToBlock(block, IR_X86_MOV_T(ir, IR_PD_ADDR,
    JAVA2C_SCRATCH_PRESERVED_REG,
    IR_X86_SP));
Step (16) The thread local variable oldframe is a copy of the last Java frame if the native code has thrown exceptions. Otherwise oldframe will be NULL:
AppendIRToBlock(block, IR_X86_MOV_T(ir, IR_PD_ADDR,
    JAVA_TRANSIT_THREAD_REG_ID(VMT_CHECKATTRANSITFRAME_OFFSET),
    JAVA_TRANSIT_SCRATCH_REG));
Step (17) Restore the last Java frame:
AppendIRToBlock(block, IR_X86_MOV_T(ir, IR_PD_ADDR,
    JAVA_TRANSIT_SCRATCH_REG,
    JAVA_TRANSIT_THREAD_REG_ID(VMT_LAST_JAVA_FRAME_OFFSET)));
Step (18) Test if oldframe was zero or not. If oldframe is non-zero, an exception has happened and TransitToJava must be executed, otherwise proceed to the end_block:
AppendIRToBlock(block, IR_X86_TEST(ir,
    JAVA_TRANSIT_SCRATCH_REG,
    JAVA_TRANSIT_SCRATCH_REG));
AppendIRToBlock(block, IRBB_JCC, IR_NE);
ConnectIRBlock(ir, block, transit_block);
ConnectIRBlock(ir, block, end_block);
Step (19) Create the transit block that executes the TransitToJava function that takes care of thrown JNI exceptions. First store the native return value in the frame so it will survive TransitToJava:
for (i=0; i<jci->ret_type_storages; i++)
{
  from = irGetStorageVar(ir, data.ret_callee[i]);
  to = IR_RELATIVE_ID(X86_SP, offsetof(struct NativeFrame, space[i]));
  switch (CG_GET_STORAGE_TYPE(data.ret_callee[i]))
  {
    case STORAGE_TYPE_NORMAL:
      AppendIRToBlock(transit_block, IR_X86_MOV_T(ir, IR_PD_ADDR, from, to));
      break;
    case STORAGE_TYPE_FLOAT:
      AppendIRToBlock(transit_block, IR_X86_FSTP_T(ir,
          jci.ret_type == JLC_FLOAT ? IR_F32 : IR_F64,
          from, to));
      break;
    case STORAGE_TYPE_XMM:
      AppendIRToBlock(transit_block, jci.ret_type == JLC_FLOAT ?
          IR_X86_MOVSS(ir, from, to) :
          IR_X86_MOVSD(ir, from, to));
      break;
  }
}
AppendIRToBlock(block, IR_X86_CALL(ir, IR_LABEL(ir,
    CI_GET_CODE(cgGetCodeMethodCI(transitToJava_V)))));
for (i=0; i<data.ret_type_storages; i++)
{
  from = IR_RELATIVE_ID(X86_SP, offsetof(struct NativeFrame, space[i]));
  to = irGetStorageVar(ir, data.ret_callee[i]);
  switch (CG_GET_STORAGE_TYPE(data.ret_callee[i]))
  {
    case STORAGE_TYPE_NORMAL:
      AppendIRToBlock(block, IR_X86_MOV_T(ir, IR_PD_ADDR, from, to));
      break;
    case STORAGE_TYPE_FLOAT:
      AppendIRToBlock(block, IR_X86_FLD_T(ir,
          data.ret_type == JLC_FLOAT ? IR_F32 : IR_F64,
          from, to));
      break;
    case STORAGE_TYPE_XMM:
      AppendIRToBlock(block, data.ret_type == JLC_FLOAT ?
          IR_X86_MOVSS(ir, from, to) :
          IR_X86_MOVSD(ir, from, to));
      break;
  }
}
ConnectIRBlock(ir, transit_block, end_block);
Step (20) To translate the return value from the native code to Java code, assume that the variable is in the correct native return position and move it to the correct Java return position with proper sign extension or zero extension. If jci->ret_type == JLC_VOID, this step can be skipped:
from = irGetStorageVar(ir, jci->ret_callee[0]);
to = irGetStorageVar(ir, jci->ret_frame[0]);
switch (jci->ret_type)
{
  case JLC_BOOLEAN:
  case JLC_BYTE:
    AppendIRToBlock(end_block, IR_X86_MOVSX_T(ir, IR_I8, from, to));
    break;
  case JLC_SHORT:
    AppendIRToBlock(end_block, IR_X86_MOVSX_T(ir, IR_I16, from, to));
    break;
  case JLC_CHAR:
    AppendIRToBlock(end_block, IR_X86_MOVZX_T(ir, IR_UI16, from, to));
    break;
  case JLC_OBJECT:
  case JLC_INTERFACE:
  case JLC_ARRAY:
    AppendIRToBlock(block, IR_X86_MOV_T(ir, IR_REF, from, to));
    append_nullcheck(ir, to, end_block, &block_null, &block_non_null);
    AppendIRToBlock(block_non_null, IR_X86_MOV_T(ir, IR_REF,
        IR_MEM(ir, to, 0, IR_NONE, 0),
        to));
    ConnectIRBlock(ir, block_non_null, block_null);
    *end_block = block_null;
    break;
  case JLC_FLOAT:
  case JLC_DOUBLE:
    if (jci->ret_callee[0] == jci->ret_frame[0])
    {
      // Do nothing since native frame and java frame share
      // the same return convention.
    }
    else
    {
      IRType type;
      // this must be x87->xmm or xmm->x87
      tmp = IR_RELATIVE_ID(X86_SP,
          offsetof(struct NativeFrame, space[0]));
      type = jci->ret_type == JLC_FLOAT ? IR_F32 : IR_F64;
      switch (CG_GET_STORAGE_TYPE(jci->ret_callee[0]))
      {
        case STORAGE_TYPE_FLOAT:
          AppendIRToBlock(*block,
              IR_X86_FSTP_T(ir, type, from, tmp));
          AppendIRToBlock(*block, type == IR_F32 ?
              IR_X86_MOVSS(ir, tmp, to) :
              IR_X86_MOVSD(ir, tmp, to));
          break;
        case STORAGE_TYPE_XMM:
          AppendIRToBlock(*block, type == IR_F32 ?
              IR_X86_MOVSS(ir, from, tmp) :
              IR_X86_MOVSD(ir, from, tmp));
          AppendIRToBlock(*block, IR_X86_FLD_T(ir, type,
              tmp, to));
          break;
      }
    }
    break;
}
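The sign extension and zero extension cases of Step (20) can be illustrated in C as follows. This is a sketch of the effect of the generated MOVSX and MOVZX instructions, not of the generator itself. The enum is a hypothetical stand-in that reuses the JLC_ names from the pseudo code, and the narrowing casts assume the two's complement behavior found on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the JlcType values used in the pseudo code. */
typedef enum { JLC_BOOLEAN, JLC_BYTE, JLC_SHORT, JLC_CHAR, JLC_INT } JlcType;

/* Converts the raw contents of the native return register into the
 * 32 bit value expected by compiled Java code. Upper bits of the raw
 * register are ignored, as they may contain garbage. */
int32_t widen_native_return(uint64_t raw, JlcType type)
{
    switch (type) {
    case JLC_BOOLEAN:
    case JLC_BYTE:                       /* MOVSX from 8 bits  */
        return (int32_t)(int8_t)(uint8_t)raw;
    case JLC_SHORT:                      /* MOVSX from 16 bits */
        return (int32_t)(int16_t)(uint16_t)raw;
    case JLC_CHAR:                       /* MOVZX from 16 bits */
        return (int32_t)(uint16_t)raw;
    case JLC_INT:                        /* no extension needed */
        return (int32_t)(uint32_t)raw;
    }
    assert(0 && "unhandled type");
    return 0;
}
```

The char case is the only one that is zero extended, since char is the only unsigned integral type in Java.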
Step (21) Restore the old handle pointer from the native frame on the stack to the thread local data:
AppendIRToBlock(end_block, IR_X86_MOV_T(ir, IR_PD_ADDR,
    IR_RELATIVE_ID(X86_SP,
        offsetof(struct NativeFrame, oldHandles)),
    JAVA_TRANSIT_SCRATCH_REG))
AppendIRToBlock(end_block, IR_X86_MOV_T(ir, IR_PD_ADDR,
    JAVA_TRANSIT_SCRATCH_REG,
    JAVA_TRANSIT_THREAD_REG_ID(VMT_HANDLES_OFFSET)))
Step (22) Restore the preserved registers:
for (i=0; i<np; i++)
{
  AppendIRToBlock(end_block, IR_X86_MOV_T(ir, IR_PD_ADDR,
      IR_RELATIVE_ID(X86_SP,
          offsetof(struct NativeFrame, preserved) + sizeof(void*)*(np-i-1)),
      irGetStorageVar(ir, preserved[i])));
}
Step (23) Deallocate the native frame from the stack:
AppendIRToBlock(end_block, IR_X86_ADD(ir,
    IR_X86_SP,
    IR_CONSTANT(NATIVEFRAME_SIZE*sizeof(pd_addr)),
    IR_X86_SP));
Step (24) Add the return instruction back to compiled Java code:
AppendIRToBlock(end_block, IR_X86_RET(ir));
The function shown below describes with pseudo code how the parameters are set up to follow the calling convention of the native function. The strategy is to walk through all parameters and push all parameters on the stack that are supposed to be on the stack following the calling convention. Then the parameters that should be in registers are also pushed on the stack but immediately afterwards popped into the correct registers.
SetupParametersForCall(IRBlockP *block)
{
  Storage used[PLATFORM_NOOF_STORAGE_TYPES];
  MPIIterS iter;
  int i, n, pos=0, pos0;

  n = mpiGetNoofArgs(jci->mpi);
  memset(used, 0, PLATFORM_NOOF_STORAGE_TYPES*sizeof(int));
  pos0 = 0;
  // Fetch the JNIEnvironment
  jci->storage[pos0] = platformGetNativeParamStorage(STORAGE_TYPE_NORMAL,
      pos0, used);
  pos0++;
  for (mpiGetIterator(&iter, jci->mpi, MPIITER_STORAGES, FALSE);
       mpiIteratorHasMore(&iter);
       mpiIteratorNext(&iter))
  {
    jci->storage[pos0] = platformGetNativeParamStorage(
        STORAGETYPE_FOR(iter.jlcType), pos0, used);
    pos0++;
  }
  pos = pos0;
  for (mpiGetIterator(&iter, jci->mpi, MPIITER_STORAGES, TRUE);
       mpiIteratorHasMore(&iter);
       mpiIteratorNext(&iter))
  {
    if (CG_IS_STACK_STORAGE(jci->storage[--pos]))
    {
      PushNativeParameter(block, iter.jlcType, iter.storage);
    }
  }
  pos = pos0;
  for (mpiGetIterator(&iter, jci->mpi, MPIITER_STORAGES, TRUE);
       mpiIteratorHasMore(&iter);
       mpiIteratorNext(&iter))
  {
    if (!CG_IS_STACK_STORAGE(jci->storage[--pos]))
    {
      PushNativeParameter(block, iter.jlcType, iter.storage);
    }
  }
  // Load the address of the JNIEnvironment
  AppendIRToBlock(*block, IR_X86_LEA_T(ir, IR_PD_ADDR,
      JAVA_TRANSIT_THREAD_REG_ID(VMT_JNI_INTERFACE_OFFSET),
      JAVA_TRANSIT_SCRATCH_REG));
  AppendIRToBlock(block, IR_X86_PUSH(ir, get_from(data, ir,
      irVarGetStorage(ir, JAVA_TRANSIT_SCRATCH_REG))));
  jci->types[jci->pushed++] = JLC_INT;
  memset(used, 0, PLATFORM_NOOF_STORAGE_TYPES*sizeof(int));
  for (i=jci->pushed-1; i>=0; i--)
  {
    JlcType jlctype = jci->types[i];
    Storage storage = platformGetNativeParamStorage(
        STORAGETYPE_FOR(jlctype), i, used);
    StorageType t = CG_GET_STORAGE_TYPE(storage);
    IRVAR var = IR_NONE;
    // All stack storages are ensured to be at the end
    if (t == STORAGE_TYPE_STACK)
    {
      break;
    }
    else
    {
      if (t == STORAGE_TYPE_XMM)
      {
        if (jlctype == JLC_FLOAT)
        {
          // Defaults to 32 bit
          AppendIRToBlock(*block, IR_X86_MOVSS(ir,
              IR_RELATIVE_ID(X86_SP, 0),
              irGetOutParamVar(ir, storage)));
        }
        else if (jlctype == JLC_DOUBLE)
        {
          // Defaults to 64 bit
          AppendIRToBlock(*block, IR_X86_MOVSD(ir,
              IR_RELATIVE_ID(X86_SP, 0),
              irGetOutParamVar(ir, storage)));
        }
        var = JAVA_TRANSIT_SCRATCH_REG;
      }
      else if (t == STORAGE_TYPE_NORMAL)
      {
        var = irGetOutParamVar(ir, storage);
      }
      AppendIRToBlock(*block, IR_X86_POP(ir, var));
      jci->pushed--;
    }
  }
}

IRVAR GetFrom(Storage storage)
{
  if (CG_IS_STACK_STORAGE(storage))
  {
    return IR_RELATIVE_ID(JAVA2C_STACKPARAM_STORAGE,
        sizeof(pd_addr) * (STACK_EXTRA(data) + CG_GET_STORAGE_INDEX(storage)));
  }
  else
  {
    return irGetStorageVar(ir, storage);
  }
}

PushReference(IRBlockP *block, Storage storage)
{
  IRBlockP block_null, block_not_null, block_done;
  IRVAR from;

  if (CG_IS_STACK_STORAGE(storage))
  {
    AppendIRToBlock(*block, IR_X86_MOV_T(ir, IR_REF,
        GetFrom(data, ir, storage),
        JAVA_TRANSIT_SCRATCH_REG));
    from = JAVA_TRANSIT_SCRATCH_REG;
  }
  else
  {
    from = get_from(data, ir, storage);
  }
  append_nullcheck(ir, from, *block, &block_null, &block_not_null);
  block_done = CreateIRBlock(ir);
  irBBSetMustNotHaveSafepoint(block_done);
  AppendIRToBlock(block_null, IR_X86_PUSH(ir, from));
  irBBConnect(ir, block_null, block_done);
  PushReference(data, ir, &block_not_null, storage, FALSE);
  irBBConnect(ir, block_not_null, block_done);
  *block = block_done;
}

PushNativeParameter(IRBlockP *block, JlcType type, Storage storage)
{
  if (JLCTYPE_IS_PRIMITIVE(type))
  {
    AppendIRToBlock(*block, IR_X86_PUSH(ir, GetFrom(data, ir, storage)));
  }
  else
  {
    PushReference(block, storage);
  }
  jci->types[jci->pushed++] = type;
}
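The push-then-pop strategy of SetupParametersForCall can be simulated with a small C model. The model is a simplified sketch under several assumptions: all names are hypothetical, only integer and pointer arguments are considered, six argument registers are assumed as on x86_64/Linux, and the JNIEnv pointer is treated as an ordinary first argument instead of being pushed separately as in the pseudo code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ARG_REGS 6      /* rdi, rsi, rdx, rcx, r8, r9 */
#define MAX_ARGS 16

typedef struct {
    uint64_t regs[NUM_ARG_REGS];
    uint64_t stack[MAX_ARGS];   /* stack[top-1] is the top of the stack */
    size_t top;
} CallState;

static void push(CallState *s, uint64_t v)
{
    assert(s->top < MAX_ARGS);
    s->stack[s->top++] = v;
}

static uint64_t pop(CallState *s)
{
    assert(s->top > 0);
    return s->stack[--s->top];
}

/* args[0] is the JNIEnv pointer; args[1..n-1] are the translated
 * Java arguments in native argument order. */
void setup_parameters(CallState *s, const uint64_t *args, size_t n)
{
    size_t in_regs = n < NUM_ARG_REGS ? n : NUM_ARG_REGS;

    assert(n <= MAX_ARGS);
    s->top = 0;
    /* 1. Push the arguments that stay on the stack, last one first. */
    for (size_t i = n; i > in_regs; i--)
        push(s, args[i - 1]);
    /* 2. Push the register arguments on top, last one first. */
    for (size_t i = in_regs; i > 0; i--)
        push(s, args[i - 1]);
    /* 3. Pop them into the argument registers in order. */
    for (size_t r = 0; r < in_regs; r++)
        s->regs[r] = pop(s);
}
```

For the three arguments of the example (JNIEnv pointer, object handle, integer), the model ends with an empty stack and the values in the first three registers, which corresponds to the push and pop sequence visible in Listing 6. Going through the stack avoids having to order register-to-register moves so that no source register is overwritten before it is read.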
Listing 6 shows the final native machine code for the stub. Each machine code instruction maps to one LIR operation in Listing 5, except for the branch instructions, the frame setup and exit that have been added by the compiler.
Listing 6
-- Entering code with object pointer in esi and integer argument in eax.
0x80c70: sub $0x30,%rsp
0x80c74: mov %rbx,0x28(%rsp)
0x80c79: mov %rbp,0x20(%rsp)
0x80c7e: mov $0x6ab140,%r10
0x80c88: mov %r10,(%rsp)
0x80c8c: mov %rax,0xfffffffffffff000(%rsp)
0x80c94: mov %fs:0xd8,%rbp
0x80c9d: mov %rsp,0x8(%rbp)
0x80ca1: mov 0x1f0(%rbp),%rbx
0x80ca8: mov %rbx,0x8(%rsp)
0x80cad: mov %rsp,%r12
0x80cb0: mov %r12,%r10
0x80cb3: and $0xf,%r10
0x80cb7: sub %r10,%rsp
-- Push the integer argument, to be transferred into rdx.
0x80cba: push %rax
0x80cbb: test %rsi,%rsi
0x80cbe: jne 0x80cc3
-- Push the object pointer, to be popped into the same register.
0x80cc0: push %rsi
0x80cc1: jmp 0x80ccc
0x80cc3: mov %rsi,(%rbx)
0x80cc7: push %rbx
0x80cc8: add $0x8,%rbx
0x80ccc: lea 0x1e8(%rbp),%r10
-- Push the calculated address to the JNIEnvironment.
0x80cd3: push %r10
-- Pop the JNIEnvironment into rdi
0x80cd5: pop %rdi
-- Pop the rsi into rsi.
0x80cd6: pop %rsi
-- Pop the integer argument into rdx.
0x80cd7: pop %rdx
0x80cd8: mov %rbx,0x1f0(%rbp)
0x80cdf: xor %rbp,%rbp
-- Call the native function using ip relative addressing.
0x80ce2: callq *74(%rip)    # 0x100080d32
0x80ce8: mov %fs:0xd8,%rbp
0x80cf1: mov %r12,%rsp
0x80cf4: mov 0x10(%rbp),%r10
0x80cf8: mov %r10,0x8(%rbp)
0x80cfc: test %r10,%r10
0x80cff: je 0x80d06
0x80d01: callq 0x00580
0x80d06: cmpq $0x0,0x8(%rbp)
0x80d0b: jne 0x80d2f
0x80d0d: cmpq $0x0,0x10(%rbp)
0x80d12: jne 0x80d2f
0x80d14: mov 0x8(%rsp),%r10
0x80d19: mov %r10,0x1f0(%rbp)
0x80d20: mov 0x28(%rsp),%rbx
0x80d25: mov 0x20(%rsp),%rbp
0x80d2a: add $0x30,%rsp
0x80d2e: retq
0x80d2f: hlt
0x80d30: jmp 0x80d14
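The instructions at 0x80cbb through 0x80cc8 in Listing 6 handle the object pointer: a null reference is pushed unchanged, while a non-null reference is stored in the next free handle slot and the address of that slot is pushed instead. A minimal C sketch of that logic follows. The HandleArea type, its fixed size and the function name are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HANDLE_SLOTS 8

/* Hypothetical handle area; `next` plays the role of the handle
 * register (rbx in Listing 6), expressed as an index. */
typedef struct {
    uintptr_t slots[HANDLE_SLOTS];
    size_t next;
} HandleArea;

/* Returns the value passed to native code for a reference argument:
 * NULL for a null reference, otherwise the address of the handle slot
 * that now holds the object pointer. */
uintptr_t *make_handle(HandleArea *area, uintptr_t object)
{
    if (object == 0)
        return NULL;                        /* push %rsi (null)          */
    assert(area->next < HANDLE_SLOTS);
    area->slots[area->next] = object;       /* mov %rsi,(%rbx)           */
    return &area->slots[area->next++];      /* push %rbx; add $0x8,%rbx  */
}
```

Passing the slot address rather than the object pointer gives native code an indirection, so the object pointer held in the slot can be found, and presumably updated, by the virtual machine while the native function runs.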
As can be seen from the pseudo code above, the traditional stub generator uses low-level knowledge of the current CPU/OS platform to generate a proper stub for calling native code from compiled byte code. The programmer encodes this knowledge manually. Since the traditional stub generator generates low-level native machine code, no compiler optimization techniques can be applied to the generated stub. Therefore the current techniques used for translating calling conventions in VMs are unsatisfactory due to non-optimal performance for short native functions, and because of the large amount of manual work needed to both add new calling conventions and to maintain the existing set of calling conventions of a large range of CPU/OS combinations.