While Java has grown in popularity in recent years, many of its critics stand by their claim that Java is prohibitively slow to execute in typical software-based Java Virtual Machines (JVMs), despite exhaustive efforts to optimize them. Although Just-in-Time (JIT) compilation provides some benefit, its so-called code-bloat side effect rules out the technology in the embedded systems space.
The most promising approach to increasing the performance of Java execution has been the use of Java native processors, which adopt the JVM instruction set as their processor instruction set. While several Java native processor designs have been disclosed thus far, all have been locked into traditional processor design paradigms and fail to address the specific nature of the JVM.
One of the most commonly executed operations in any JVM implementation is the series of pointer resolutions required to execute opcodes that interact with the underlying JVM data structures. In general, this type of opcode requires some pointer arithmetic to extract from the JVM data structures all the arguments it needs to complete execution. This pointer arithmetic is important for two reasons: such opcodes occur with very high frequency in a typical Java trace, and the same pointer arithmetic is invoked several times within a single opcode execution. In particular, the so-called invoke instructions, which are used to invoke methods in Java, involve several such calculations. Invoke instructions are known to consume 20-40% of execution time, so improving their execution would yield a substantial overall speed improvement.
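To make the repeated pointer arithmetic concrete, the following is a minimal Python sketch of how resolving the operands of an invoke-like opcode chains several base-plus-index address calculations. Everything here is a hypothetical illustration: the `Memory` class, the 4-byte word size, the `CODE_PTR`/`ARG_COUNT`/`MAX_LOCALS` method-block layout, and `resolve_invoke` are assumptions, not the layout of any real JVM (the JVM specification does not prescribe a runtime memory layout).

```python
from __future__ import annotations

WORD_SIZE = 4  # assumed word size in bytes

# Hypothetical field offsets (in words) within a method block.
CODE_PTR, ARG_COUNT, MAX_LOCALS = 0, 1, 2


class Memory:
    """Hypothetical word-addressed memory that counts address calculations."""

    def __init__(self) -> None:
        self.cells: dict[int, int] = {}
        self.address_calcs = 0

    def address(self, base: int, index: int) -> int:
        # The recurring pointer-arithmetic step: base + index * word size.
        self.address_calcs += 1
        return base + index * WORD_SIZE

    def load(self, base: int, index: int) -> int:
        return self.cells[self.address(base, index)]

    def store(self, base: int, index: int, value: int) -> None:
        self.cells[self.address(base, index)] = value


def resolve_invoke(mem: Memory, cp_base: int, cp_index: int) -> tuple[int, int, int]:
    # Step 1: the opcode operand indexes the constant pool; the entry is
    # assumed to already hold a pointer to the method block.
    method_block = mem.load(cp_base, cp_index)
    # Step 2: every argument the invoke needs is another base + offset
    # calculation on the same method-block pointer.
    code_ptr = mem.load(method_block, CODE_PTR)
    arg_count = mem.load(method_block, ARG_COUNT)
    max_locals = mem.load(method_block, MAX_LOCALS)
    return code_ptr, arg_count, max_locals


mem = Memory()
mem.store(0x1000, 7, 0x2000)          # constant-pool entry 7 -> method block
mem.store(0x2000, CODE_PTR, 0x3000)
mem.store(0x2000, ARG_COUNT, 2)
mem.store(0x2000, MAX_LOCALS, 5)
mem.address_calcs = 0                 # count only the resolution itself
print(resolve_invoke(mem, 0x1000, 7), mem.address_calcs)  # (12288, 2, 5) 4
```

Even in this simplified model, a single resolution performs four separate address calculations, and a real invoke would typically add more (for example, locating the caller's frame and operand stack).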
In view of the above, to improve the performance of a hardware implementation of a JVM, or of any other object-oriented processing system, it is desirable to accelerate the execution of this pointer arithmetic and to increase its concurrency with other JVM operations.
Previous attempts at implementing pointer arithmetic for address calculation missed the importance of these operations and therefore adopted under-performing approaches. In particular, pointer arithmetic has previously been broken up into atomic operations that a standard ALU can perform. Drawbacks of this approach include:
- each atomic operation requires one clock cycle to complete, in addition to the normal instruction cycle;
- intermediate results must be saved temporarily in a register; and
- the pointer arithmetic ties up the ALU until the final address is calculated.
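The drawbacks listed above can be illustrated with a toy cost model. The Python sketch below decomposes a single address calculation, `base + (index << scale_shift) + offset`, into a chain of atomic ALU operations and tallies the cycles spent and the intermediate results that must be held in registers. The names `AluTrace` and `address_via_alu`, the three-operation split, the power-of-two scale (so the multiply becomes a shift), and the one-cycle-per-operation cost are all illustrative assumptions, not a description of any particular processor.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AluTrace:
    """Toy tally of the costs of computing one address on a standard ALU."""

    cycles: int = 0  # assumed: one cycle per atomic op; the ALU is busy throughout
    temporaries: list[int] = field(default_factory=list)  # intermediates kept in registers

    def op(self, result: int, is_final: bool = False) -> int:
        self.cycles += 1
        if not is_final:
            # Not yet the address: must be saved for the next atomic op.
            self.temporaries.append(result)
        return result


def address_via_alu(base: int, index: int, scale_shift: int, offset: int) -> tuple[int, AluTrace]:
    """Compute base + (index << scale_shift) + offset as a chain of atomic ALU ops."""
    t = AluTrace()
    scaled = t.op(index << scale_shift)             # op 1: scale the index
    partial = t.op(base + scaled)                   # op 2: add the base pointer
    final = t.op(partial + offset, is_final=True)   # op 3: add the field offset
    return final, t


addr, trace = address_via_alu(base=0x2000, index=3, scale_shift=2, offset=8)
print(hex(addr), trace.cycles, len(trace.temporaries))  # 0x2014 3 2
```

In this model, one address costs three extra cycles and two temporary registers, and no other ALU work can proceed during those cycles; an opcode that needs several such addresses multiplies all three costs.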