Software developers typically create conventional software applications by writing software “source code” in a high-level programming language such as C, C++, Java or the like. The software developer can then operate a program called a compiler that converts the high-level programming language source code into a machine understandable or machine-readable form called “object code” that the compiler creates for a specific target processor architecture. A processor within a computerized device that conforms to the target architecture can “execute” the object code in order to operate the program. As an example, a software developer who creates a software application in the C programming language can use a C compiler designed for a specific processor architecture to convert the C programming language statements (i.e., source code instructions) within the application into machine language instructions that can natively execute as a program on that processor within a computerized device.
Some programming languages are designed to allow a software developer to write application code once and then operate this code on any computerized device that supports that programming language, regardless of the processor or architecture of the computerized device. As an example, a program written in the Java programming language (Java is a registered trademark of Sun Microsystems, Inc. of Palo Alto, Calif., U.S.A.) can operate on any computerized device platform that has or that implements a Java run-time environment known as a Java Virtual Machine (JVM). To run a Java program, a developer first compiles the Java program using a Java compiler (e.g., javac) that produces intermediate instructions called “bytecode”. A user who desires to operate the Java program can transfer the bytecode instructions for that program to any computerized device that runs under the control of any operating system, as long as a JVM exists that can operate in conjunction with that operating system or computerized device to interpret the Java bytecodes. In other words, to accommodate a diversity of operating environments, a Java compiler does not generate “machine code” in the sense of native hardware instructions that execute directly in a microprocessor. Rather, the Java compiler generates bytecodes, which are a high-level, machine-independent code for a hypothetical machine that is implemented by the Java interpreter and run-time system known as a Java Virtual Machine. The primary benefit of the interpreted bytecode approach is that compiled Java language programs are portable to any system on which a Java Virtual Machine exists.
There has been an ongoing trend in the information technology industry to execute software programs as rapidly as possible, and various conventional advancements provide for increased execution speed of software programs. One technique for increasing execution speed of a program is called parallelism. Parallelism is the practice of executing or performing multiple operations at once. Parallelism is possible at multiple levels, from executing multiple instructions at once, to executing multiple threads at once, to executing multiple programs at once. Instruction Level Parallelism or ILP is parallelism at the lowest level and involves executing multiple instructions at once. Processors that exploit ILP are typically called multiple-issue processors, meaning they can issue multiple instructions in a single clock cycle to the various functional units on the processor chip.
There are different types of conventional multiple-issue processors. One multiple-issue processor is a superscalar processor in which a sequential list of program instructions is dynamically scheduled: the processor decides which instructions can be executed on the same clock cycle and sends them out to their respective functional units to be executed. This type of multiple-issue processor is called an in-order-issue processor since issuance of instructions is performed in the same sequential order as the program sequence, but issued instructions may complete at different times (e.g., short instructions requiring fewer cycles may complete before longer ones requiring more cycles). Another type of multiple-issue processor is called a VLIW (Very Long Instruction Word) processor. A VLIW processor depends on a compiler to do all the work of instruction reordering, and the processor executes the instructions that the compiler provides as fast as possible according to the compiler-determined order. Other types of multiple-issue processors issue instructions out of order, meaning the instruction issue order is not the same as the order in which the instructions appear in the program.
Conventional techniques for executing instructions using ILP often utilize look-ahead techniques to find a larger number of instructions that can execute in parallel within an instruction window. Looking ahead often involves determining which instructions might depend upon others during execution for such things as shared variables, shared memory, interference conditions, and the like. When a scheduling portion of the processor detects a group of instructions that do not interfere with or depend on each other, the processor can issue these instructions in parallel, thus conserving processor cycles and resulting in faster execution of the program.
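As a purely illustrative sketch of the dependency check described above, the following Java code groups a window of instructions into sets that could issue together. The `Instr` model (one destination register and a set of source registers), the class name `IssueGroupSketch`, and the greedy in-order grouping policy are assumptions made for this example only; they do not describe any particular processor's scheduler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class IssueGroupSketch {
    /** Hypothetical instruction model: one register written, some registers read. */
    static final class Instr {
        final String name;
        final String dest;
        final Set<String> sources;

        Instr(String name, String dest, String... sources) {
            this.name = name;
            this.dest = dest;
            this.sources = new HashSet<>(Arrays.asList(sources));
        }
    }

    /** True if 'later' interferes with 'earlier' through any shared register. */
    static boolean conflicts(Instr earlier, Instr later) {
        return later.sources.contains(earlier.dest)   // later reads what earlier writes
            || earlier.sources.contains(later.dest)   // later overwrites what earlier reads
            || earlier.dest.equals(later.dest);       // both write the same register
    }

    /** Greedy, in-order grouping: start a new group at the first conflicting instruction. */
    static List<List<Instr>> group(List<Instr> window) {
        List<List<Instr>> groups = new ArrayList<>();
        List<Instr> current = new ArrayList<>();
        for (Instr next : window) {
            boolean blocked = false;
            for (Instr issued : current) {
                if (conflicts(issued, next)) {
                    blocked = true;
                    break;
                }
            }
            if (blocked) {
                groups.add(current);
                current = new ArrayList<>();
            }
            current.add(next);
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }
}
```

In this sketch, a window such as r1 = r2 + r3; r4 = r5 + r6; r7 = r1 + r4 would place the first two instructions in one group, because they share no registers, and defer the third, because it reads the results of both.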
Conventional computer systems that execute programs written in a programming language such as Java operate a Java Virtual Machine during run-time to interpret or otherwise convert the Java bytecode instructions into native machine language instructions. As an example, to execute a series of Java bytecode instructions, a Java virtual machine can operate a program called a Just-In-Time (JIT) compiler. A JIT compiler is a software layer that compiles or interprets bytecode instructions just before they are executed thus converting the Java bytecode into native machine language code for the processor to natively execute at that moment. Generally then, general purpose computerized devices use either interpretation or Just-In-Time (JIT) compilation to convert the Java bytecodes to native instructions that are then run on conventional processors.
Java developers have also created conventional processors that can execute Java bytecode directly. Such Java bytecode processors or “Java processors” are becoming popular as software application developers create an increasingly large number of complex server and other software applications in Java. Due to the nature of many of these applications, it is important to achieve very high performance during their execution. The designs of such bytecode processors are mainly based on stack architectures.
One conventional technique that has been used to enhance some JVM implementations in hardware is called “instruction folding”, in which a processor “folds” a set of bytecodes into one instruction. Instruction folding increases the performance of bytecode execution by coalescing bytecodes rather than executing each bytecode instruction separately. For example, a bytecode that only spends processor cycle time moving data from a stack to the operational units can be coalesced into another bytecode instruction that performs the actual operation on the moved data.
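The folding idea can be sketched in software as follows. This is a minimal illustration only: the textual bytecode notation (e.g., "iload 1"), the single four-bytecode pattern that is recognized, the class name `FoldingSketch`, and the folded three-operand form "add localC, localA, localB" are all hypothetical choices for this example, and an actual hardware folding unit would operate on decoded bytecodes rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

final class FoldingSketch {
    private static final String LOAD = "iload ";
    private static final String STORE = "istore ";

    /**
     * Folds each run "iload a, iload b, iadd, istore c" into one hypothetical
     * three-operand operation; all other bytecodes pass through unchanged.
     */
    static List<String> fold(List<String> bytecodes) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < bytecodes.size()) {
            boolean foldable = i + 3 < bytecodes.size()
                && bytecodes.get(i).startsWith(LOAD)
                && bytecodes.get(i + 1).startsWith(LOAD)
                && bytecodes.get(i + 2).equals("iadd")
                && bytecodes.get(i + 3).startsWith(STORE);
            if (foldable) {
                // The two loads and the store only move data, so they are
                // absorbed into the add as its operands and destination.
                String a = bytecodes.get(i).substring(LOAD.length());
                String b = bytecodes.get(i + 1).substring(LOAD.length());
                String c = bytecodes.get(i + 3).substring(STORE.length());
                out.add("add local" + c + ", local" + a + ", local" + b);
                i += 4;
            } else {
                out.add(bytecodes.get(i));
                i++;
            }
        }
        return out;
    }
}
```

Under these assumptions, the four bytecodes "iload 1", "iload 2", "iadd", "istore 3" are replaced by the single folded operation "add local3, local1, local2", so the data-moving bytecodes no longer consume separate execution steps.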
All conventional Java virtual machines and Java processors utilize a stack-based architecture for execution of Java bytecode. That is, a conventional Java virtual machine and/or Java processor does not use registers to hold intermediate data values, but rather uses the Java operand stack for storage of all intermediate data values. This approach was taken by Java's designers to keep the Java virtual machine's instruction set compact and to facilitate implementation on architectures with few or irregular general-purpose registers.
During execution of a program containing Java bytecode instructions, the Java virtual machine can recognize different execution threads or paths through the program. During execution of a Java thread, the Java virtual machine provides a Java stack to store the state of execution of bytecode instructions that are interpreted or JIT compiled in that thread. The state of execution can include local variables, bytecode parameters called “operands”, and the results of individual bytecode instructions, whose “opcodes” each correspond to the different processing functions of each bytecode instruction in the Java bytecode instruction set. There is no way for a thread to access or alter the Java stack of another thread. During the execution of each Java bytecode instruction, the Java virtual machine may push values onto and/or pop values off of the stack, thus using the stack as a workspace. Many instructions pop values from the operand stack, operate on them, and push the resultant contents back onto the stack. For example, an “iadd” bytecode instruction adds two integers by popping two integer values off the top of the operand stack, adding them together and pushing the integer result back onto the stack associated with that thread.
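The behavior of the “iadd” instruction described above can be modeled with a short Java sketch. The class `OperandStackSketch` and its method names are hypothetical and are introduced only for illustration; one instance stands in for the operand stack of a single thread, and no claim is made that any Java virtual machine is implemented this way.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class OperandStackSketch {
    // One instance models the operand stack belonging to a single thread.
    private final Deque<Integer> operandStack = new ArrayDeque<>();

    /** Pushes an integer value onto the top of the operand stack. */
    void push(int value) {
        operandStack.push(value);
    }

    /** Pops and returns the integer value on top of the operand stack. */
    int pop() {
        return operandStack.pop();
    }

    /** Models iadd: pop two integers, add them, push the integer result. */
    void iadd() {
        int value2 = pop();
        int value1 = pop();
        push(value1 + value2);
    }

    /** Number of values currently held on the operand stack. */
    int depth() {
        return operandStack.size();
    }
}
```

For instance, pushing 2 and then 3 and invoking `iadd()` leaves the single value 5 on the modeled stack, which mirrors the pop, operate, and push sequence described for the operand stack.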