1. Field of the Invention
Embodiments of the invention relate generally to information processing systems. More specifically, embodiments of the invention provide a system and a method for improving the performance of compiled Java code.
2. Description of the Related Art
Java is an object-oriented programming language and environment that has gained wide acceptance in recent years. One aspect of Java is its portability, which has contributed to its popularity with developers of software applications. Java's approach to portability is to compile Java language code into Java bytecode, which is analogous to machine code but is interpreted by a Java virtual machine (JVM) written specifically for the host computing platform. As a result, software applications written in Java can be written once, compiled once, and then run on any combination of hardware and operating system that supports a JVM. However, interpreted programs typically run slower than programs that are compiled into native executables due to the processing overhead associated with interpreting bytecode. One approach to this issue is the implementation of a just-in-time (JIT) compiler that translates Java bytecode into native code the first time the code is executed and then caches the native code in memory. This results in a program that executes faster than purely interpreted code, at the cost of introducing compilation overhead during its initial execution. In addition, JIT compilers are often able to reorder bytecode and recompile for improved performance.
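By way of illustration only, the translate-once-and-cache behavior described above can be sketched in Java as follows. This is a simplified model and not an actual JVM implementation: the class name JitCacheSketch, the method names, and the use of lambdas as stand-ins for generated native code are all assumptions introduced for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Hypothetical sketch: models "translate on first execution, then cache".
// Lambdas stand in for native code; a real JIT compiler emits machine code.
final class JitCacheSketch {
    // Stand-in for the in-memory cache of native code, keyed by method name.
    private final Map<String, IntUnaryOperator> codeCache = new HashMap<>();
    private int compileCount = 0;

    // Stand-in for bytecode-to-native translation (the costly step).
    private IntUnaryOperator compile(String methodName) {
        compileCount++;
        switch (methodName) {
            case "square": return x -> x * x;
            case "negate": return x -> -x;
            default: throw new IllegalArgumentException("unknown method: " + methodName);
        }
    }

    // Compiles the method only if no cached translation exists, then runs it.
    int invoke(String methodName, int arg) {
        IntUnaryOperator compiled = codeCache.get(methodName);
        if (compiled == null) {
            compiled = compile(methodName);
            codeCache.put(methodName, compiled);
        }
        return compiled.applyAsInt(arg);
    }

    int compileCount() {
        return compileCount;
    }

    public static void main(String[] args) {
        JitCacheSketch vm = new JitCacheSketch();
        System.out.println(vm.invoke("square", 7)); // first call pays the compilation cost
        System.out.println(vm.invoke("square", 9)); // second call reuses the cached code
        System.out.println(vm.compileCount());      // prints 1
    }
}
```

In this sketch the compilation overhead is incurred only on the first invocation of a given method; later invocations go straight to the cached translation.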
Some JIT compilers are able to optimize the resulting native code for the targeted central processing unit (CPU) and the underlying operating system on which the Java application runs. As an example, a JIT compiler may select Streaming SIMD (Single Instruction, Multiple Data) Extensions 2 (SSE2) CPU instructions when it detects that they are supported by the CPU. By contrast, a statically compiled program would need to include two versions of the native code, possibly using in-line assembly, to obtain the same result. In addition, JIT compilers typically collect performance statistics and are able to rearrange the bytecode for recompilation to improve performance.
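The run-time selection described above can be sketched, again purely as an illustration, in Java. Java code cannot query SSE2 support directly, so the detected feature set is passed in as a hypothetical input; the class name CodePathSelectionSketch, the CpuFeature enum, and the four-lane loop standing in for an SSE2 code path are all assumptions made for this example.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.function.ToLongFunction;

// Hypothetical sketch: chooses a code path once, based on detected CPU features.
final class CodePathSelectionSketch {
    enum CpuFeature { SSE2 }

    // Baseline path: one element per loop iteration.
    static long sumScalar(int[] data) {
        long total = 0;
        for (int value : data) {
            total += value;
        }
        return total;
    }

    // Stand-in for an SSE2 path: four independent lanes per iteration,
    // followed by a scalar loop for any leftover elements.
    static long sumFourLanes(int[] data) {
        long a = 0, b = 0, c = 0, d = 0;
        int i = 0;
        for (; i + 3 < data.length; i += 4) {
            a += data[i];
            b += data[i + 1];
            c += data[i + 2];
            d += data[i + 3];
        }
        long total = a + b + c + d;
        for (; i < data.length; i++) {
            total += data[i];
        }
        return total;
    }

    // Picks the implementation that matches the (assumed) detected features.
    static ToLongFunction<int[]> select(Set<CpuFeature> detected) {
        if (detected.contains(CpuFeature.SSE2)) {
            return CodePathSelectionSketch::sumFourLanes;
        }
        return CodePathSelectionSketch::sumScalar;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7};
        ToLongFunction<int[]> withSse2 = select(EnumSet.of(CpuFeature.SSE2));
        ToLongFunction<int[]> withoutSse2 = select(EnumSet.noneOf(CpuFeature.class));
        System.out.println(withSse2.applyAsLong(data));    // prints 28
        System.out.println(withoutSse2.applyAsLong(data)); // prints 28
    }
}
```

Both paths produce the same result; only the instruction sequence differs. A statically compiled program would have to ship both paths and its own dispatch logic, whereas a JIT compiler can emit only the path suited to the CPU it detects.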
These approaches are facilitated by Instruction Set Architectures (ISAs) that abstract underlying physical processor architectures into a common instruction set. For example, the AMD Athlon and Intel Pentium implement nearly identical versions of the x86 instruction set, yet their internal designs are significantly different. As a result, while the native code generated for a given ISA may execute properly, it may not be fully optimized for the target processor. Other performance considerations include the JIT compiler's approach to generating native code for an ISA. These may include the implementation of Instruction-Based Sampling (IBS), vectorization, and Lightweight Profiling (LWP). Each of these may have attendant effects, negative or positive, on performance. Furthermore, it is now common to use multiple processors in a system, yet the native code generated for the ISA may not fully utilize their respective capabilities or even use them at all. As an example, a system may comprise a multi-processor CPU, dedicated processors for processing graphics or video streams, or even a dedicated Java code processor. In view of the foregoing, there is a need for a holistic approach to determining the best performing native code for a given system and not simply for its associated ISA.