1. Field of the Invention
The invention relates to data processing and, in particular, to the processing of platform-independent instructions.
2. Description of the Prior Art
In the field of virtual machines and the interpretation of platform-independent languages such as Java bytecode, each bytecode exists in isolation. Thus, when being processed, the bytecodes are taken one after another by an interpreter and each is translated such that it can be processed by a host processor.
This means that optimisation of the code to improve performance, such as would occur in a compiler, does not occur when Java bytecode is processed in this way. Each bytecode is simply translated and sent to the host processor with its associated data, and any resultant data is received back at the virtual machine before the next bytecode is translated. Thus, where a number of operations are to be performed sequentially by, for example, a coprocessor, with the resultant data of one operation being used in the next, the data must be separately loaded into the coprocessor for each bytecode instruction and processed, and any results must be sent back to the virtual machine each time.
For example, in order for an interpreter-based Java virtual machine to perform a sequence of floating point operations, it must process each Java bytecode in turn. For instance, the bytecode DADD pops two stack arguments, adds them together and pushes the answer back onto the stack. Thus, in the sequence DADD, DMUL, DADD causes the two stack arguments stored on the stack associated with the virtual machine to be sent to the floating point unit, where they are loaded into floating point registers D0 and D1. The FP instruction is then executed in this coprocessor, i.e. FADDD D0, D0, D1. This causes D0 and D1 to be added and the result to be stored in D0. D0 is then sent back to the virtual machine and pushed back onto the stack. DMUL then causes two stack arguments to be popped from the stack and loaded into floating point registers D0 and D1, and the FP instruction is run, i.e. FMULD D0, D0, D1, which causes the answer to be stored in D0. D0 is then pushed back onto the stack.
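The per-bytecode round trip described above can be illustrated with a minimal sketch. It is not taken from any particular virtual machine: the `Coprocessor` class, its `loads` and `stores` counters and the `interpret` function are hypothetical names introduced only to make the data movement visible, and the coprocessor is modelled as an ordinary Python object.

```python
from dataclasses import dataclass


@dataclass
class Coprocessor:
    """Hypothetical floating point unit with two registers, D0 and D1."""
    d0: float = 0.0
    d1: float = 0.0
    loads: int = 0    # operands transferred in from the virtual machine
    stores: int = 0   # results transferred back to the virtual machine

    def load(self, a, b):
        # Load the two stack arguments into D0 and D1.
        self.d0, self.d1 = a, b
        self.loads += 2

    def faddd(self):
        # Models FADDD D0, D0, D1
        self.d0 = self.d0 + self.d1

    def fmuld(self):
        # Models FMULD D0, D0, D1
        self.d0 = self.d0 * self.d1

    def store(self):
        # Send D0 back to the virtual machine.
        self.stores += 1
        return self.d0


def interpret(bytecodes, stack, fpu):
    """Process each bytecode in isolation, one after another."""
    ops = {"DADD": fpu.faddd, "DMUL": fpu.fmuld}
    for bc in bytecodes:
        b = stack.pop()            # top of stack
        a = stack.pop()            # next entry down
        fpu.load(a, b)             # operands go to D0 and D1
        ops[bc]()                  # run the corresponding FP instruction
        stack.append(fpu.store())  # result in D0 is pushed back onto the stack
    return stack


fpu = Coprocessor()
result = interpret(["DADD", "DMUL"], [4.0, 2.0, 3.0], fpu)
print(result, fpu.loads, fpu.stores)
```

With the stack holding 4.0, 2.0 and 3.0, the sequence DADD, DMUL leaves 20.0 on the stack after four operand loads and two result stores. The intermediate value 5.0 leaves the coprocessor and is immediately loaded back in, which is the redundant traffic at issue.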
It would be desirable to improve the efficiency of such processing.
In a slightly different field from virtual machines, a program containing instructions that require a coprocessor which is not present in the apparatus on which the program is being executed will often be provided with an emulation routine for emulating the missing coprocessor, such that the program can still run, even if not as efficiently. In the Linux kernel's FASTFPE, for example, there is an emulation of a floating point unit. When a floating point instruction is detected, it generates an interrupt and execution switches to the emulation routine. This originally occurred for each floating point instruction and, as handling interrupts is lengthy, this had a big impact on performance. This problem was addressed by looking ahead in the program: if there were several FPE instructions in a row, an interrupt was not generated for each one. Instead, the data was left in the emulator registers, the sequence of FPE instructions was processed together, and all the data generated was then loaded back to the CPU.
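The effect of this look-ahead can be sketched as follows. This is an illustration only and is not the FASTFPE code: the program is modelled as a list of mnemonics, `is_fp` is a hypothetical test that treats any mnemonic beginning with "F" as a floating point instruction, and the two functions simply count how many interrupts each scheme would take.

```python
def is_fp(instr):
    """Hypothetical test: mnemonics starting with 'F' are floating point."""
    return instr.startswith("F")


def count_traps_per_instruction(program):
    """Original scheme: every floating point instruction raises an interrupt."""
    return sum(1 for instr in program if is_fp(instr))


def count_traps_with_lookahead(program):
    """Look-ahead scheme: one interrupt per run of consecutive FP instructions."""
    traps = 0
    i = 0
    while i < len(program):
        if is_fp(program[i]):
            traps += 1  # a single interrupt enters the emulator
            # Stay in the emulator while the following instructions are also
            # floating point, leaving the data in the emulator registers.
            while i < len(program) and is_fp(program[i]):
                i += 1
        else:
            i += 1
    return traps


program = ["MOV", "FADDD", "FMULD", "FADDD", "ADD", "FMULD"]
print(count_traps_per_instruction(program), count_traps_with_lookahead(program))
```

For the sample program, the per-instruction scheme takes four interrupts, whereas the look-ahead scheme takes two: one for the run of three floating point instructions and one for the final isolated instruction.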