The invention relates to a method of executing a threaded interpreter for interpreting a program comprising a series of program instructions, the method comprising for the execution of each program instruction: a plurality of preparatory steps making the program instruction available in the threaded interpreter, and an execution step emulating the program instruction.
The invention also relates to a system for executing a threaded interpreter interpreting a program comprising a series of program instructions, the system comprising: a memory for storing the series of program instructions, and the threaded interpreter comprising a preparatory unit for executing a plurality of preparatory steps making a particular program instruction available in the threaded interpreter, and an execution unit for emulating the particular program instruction.
The invention also relates to a data carrier comprising a threaded interpreter for interpreting a program comprising a series of program instructions, the threaded interpreter comprising: a preparatory unit for executing a plurality of preparatory steps making a particular program instruction available in the threaded interpreter, and an execution unit for emulating the particular program instruction.
The invention also relates to a system for generating an executable interpreter for interpreting a program comprising a series of program instructions, the system comprising a compiler for translating the interpreter from a source code into machine instructions, the interpreter in the source code comprising: a preparatory unit for executing at least one preparatory step making one of the program instructions available in the interpreter, and an execution unit with an emulation code for emulating one of the program instructions.
The invention also relates to a data carrier comprising a compiler for generating an executable interpreter for interpreting a program comprising a series of program instructions, the compiler being arranged to translate the interpreter from a source code into executable machine instructions, the interpreter in the source code comprising: a preparatory unit for executing at least one preparatory step making one of the program instructions available in the interpreter, and an execution unit with emulation code for emulating one of the program instructions.
It is known to execute a program by means of an interpreter. Interpretation is a program execution technique where, as opposed to the execution techniques using a compiler, the program is not translated in advance into a form suitable for direct execution by a specific processor. The program to be executed is described in a standard form which is not dedicated to a specific processor. An interpreter, being a program specific to the processor at hand, reads a program instruction of the program to be executed and analyses this program instruction. Subsequently, the interpreter determines what actions must be taken and has these actions executed by the processor. Reading a program instruction and executing the corresponding machine instructions are carried out in an alternating fashion, without storing the translated program instructions in an intermediate format. A program instruction has an operation code that indicates the type of operation to be carried out, e.g. an add operation. Furthermore, a program instruction may have one or more immediate arguments following the operation code; they are operands for the operation. Suitable examples of a standard form in which the program to be interpreted can be described are the Java byte code and the P-code into which a Pascal program is translated.
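The alternation of reading, analysing and executing described above can be illustrated by a minimal sketch in C. The three-instruction byte code (`OP_PUSH`, `OP_ADD`, `OP_HALT`), the stack machine and the function name `interpret` are assumptions made for this illustration only; they are not taken from the Java byte code or from P-code. The sketch performs no bounds checking on its stack.

```c
#include <stddef.h>

/* Hypothetical byte code invented for this sketch. */
enum { OP_PUSH, OP_ADD, OP_HALT };

/* Interprets `code` and returns the value on top of the stack at OP_HALT. */
int interpret(const unsigned char *code)
{
    int stack[16];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next program instruction   */

    for (;;) {
        unsigned char opcode = code[pc++];  /* read the program instruction */
        switch (opcode) {                   /* analyse its operation code   */
        case OP_PUSH:                       /* one immediate argument follows */
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:                        /* add the two topmost values */
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : 0;
        default:
            return -1;                      /* unknown operation code */
        }
    }
}
```

In this form every program instruction passes through the same loop head and the same `switch`, so the machine instructions for reading and analysing exist only once in the interpreter.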
Program execution on the basis of interpretation of the program to be executed is slower than on the basis of a compiled program. In the latter case, the program is translated in advance and stored in the form of machine instructions directly executable by the processor. In case of interpretation, at least the final phase of the translation is done at runtime by the interpreter running on the processor and using resources and time of the processor. This makes the execution of a program on the basis of an interpreter slower. The article ‘Interpretation Techniques’, Paul Klint, Software: Practice and Experience, Vol. 11, pages 963-973, September 1981, describes a so-called threaded interpreter, which is a relatively fast interpreter that does not require techniques which are costly in respect of memory. A threaded interpreter contains a block of machine instructions for each of the program instructions to be interpreted and executed. Such a block contains the following elements:
emulation code for the program instruction, i.e. one or more machine instructions to be executed by the processor for realizing the purpose of the program instruction;
a fetch instruction for fetching the next program instruction to be executed;
a decode instruction for decoding that program instruction after it has been fetched;
a jump to the block of that program instruction.
The threaded interpreter can be seen as several of these blocks in parallel. The threaded interpreter has a block for each kind of program instruction that has to be interpreted, e.g. 256 blocks when 256 different program instructions are supported. After the execution of a certain block, a jump is made to the block implementing the next program instruction to be executed. Then this block is executed and again a jump is made to the block of the then next program instruction and so on.
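The block structure described above can be sketched as follows. The sketch assumes the "labels as values" extension of GCC and Clang (`&&label` and `goto *`), which is not part of standard C; the byte code (`T_PUSH`, `T_ADD`, `T_HALT`) and the function name `run_threaded` are likewise hypothetical. It is one possible rendering of a threaded interpreter, not the code of the cited article.

```c
#include <stddef.h>

/* Hypothetical byte code invented for this sketch. */
enum { T_PUSH, T_ADD, T_HALT };

int run_threaded(const unsigned char *code)
{
    /* Decode table: the address of one block per operation code
       (GCC/Clang "labels as values" extension). */
    static void *block[] = { &&do_push, &&do_add, &&do_halt };
    int stack[16];
    size_t sp = 0;
    size_t pc = 0;

    goto *block[code[pc++]];        /* enter the block of the first instruction */

do_push:
    stack[sp++] = code[pc++];       /* emulation code for T_PUSH */
    goto *block[code[pc++]];        /* fetch, decode, jump to the next block */

do_add:
    sp--;                           /* emulation code for T_ADD */
    stack[sp - 1] += stack[sp];
    goto *block[code[pc++]];        /* fetch, decode, jump to the next block */

do_halt:
    return sp > 0 ? stack[sp - 1] : 0;
}
```

Each block ends with its own copy of the fetch, decode and jump, so control passes directly from one block to the next without returning to a shared loop head. The sketch does not validate operation codes or stack depth.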
It is an object of the invention to provide a method of the kind set forth which is comparatively faster than the known method. This object is achieved according to the invention in a method which is characterized in that during the execution of the interpreter on an instruction-level parallel processor machine instructions implementing a first one of the preparatory steps are executed in parallel with machine instructions implementing a second one of the preparatory steps for respective ones of the series of program instructions. Executing the machine instructions for two of the preparatory steps in parallel, each step being executed for its own program instruction, means that at least two different program instructions are being processed simultaneously. This significantly improves the speed of program execution, because it is no longer necessary to execute all required machine instructions in a single and hence longer sequence.
Parallel processing of instructions is known per se. It is described, for example, in the article ‘Instruction-Level Parallel Processing: History, Overview, and Perspective’, B. Ramakrishna Rau and Joseph A. Fisher, The Journal of Supercomputing, 7, pages 9-50, May 1993. In particular page 19 of that article describes instruction-level parallel processing on a VLIW (Very Long Instruction Word) processor. Such a processor has a number of slots and an instruction may be placed in each slot. The instructions together form the so-called very long instruction word, which is executed by the processor as a single (very long) instruction. This results in the parallel processing of the individual instructions placed in the respective slots. It is the task of the compiler to identify which of the instructions are independent of each other and may be carried out in parallel. These instructions are thus candidates to be placed together in respective slots. An important aspect of this task of the compiler is to identify loops in the execution of the program instructions and to move program instructions inside the loop. The purpose is to identify which of the instructions are independent of the others and are, therefore, candidates to be executed in parallel with the others.
The textbook ‘Compilers: Principles, Techniques, and Tools’, Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman, Addison-Wesley Series in Computer Science, Addison-Wesley Publishing Company, Reading, Mass., 1985, describes on pages 602 to 608 how loops in a program code are to be treated for program code optimization by the compiler. To enable optimization by the compiler, there should be no jump into the middle of a loop from the outside. The only entry into a loop is then via its header. According to the textbook, the control flow edges of a loop can be partitioned into back edges and forward edges. A back edge has the property of pointing to an entry block of the loop and the forward edges are the remaining edges. A loop can be optimized if its forward edges form an acyclic graph, i.e. a graph with no further loops. The structure of a threaded interpreter can thus be seen as a control flow graph comprising a complex arrangement of loops. Through each block, a loop may pass and after that block the loop may continue at each of the blocks, after which it may continue again at each of the blocks and so on. All control flow edges are forward edges and do not form an acyclic graph. Therefore, this control flow graph of the interpreter cannot be optimized by the known software pipeline algorithms disclosed in the textbook. Despite this teaching, the inventors have found that some of the preparatory steps of a threaded interpreter can be executed in parallel as described above.
An embodiment of the method according to the invention is defined in claim 1. In this embodiment, the machine instructions implementing the steps for interpreting the series of program instructions are executed in a three-stage pipeline. This means that three program instructions are interpreted in parallel; this significantly reduces the time needed to interpret and execute the program.
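A three-stage pipeline of this kind can be illustrated by the following sketch in C, which is an assumption-laden illustration rather than the claimed implementation. It assumes that the stages are fetch, decode and execute, that instructions have a fixed width of one operation code plus one operand, and that the instruction set contains no jumps, which would invalidate the prefetched successors. The names `Instr`, `Machine`, `fetch`, `decode` and `run_pipelined` are hypothetical. The C statements themselves run sequentially; the point is that the three statements at the top of the loop body work on different program instructions and have no data dependence on one another, so a compiler for an instruction-level parallel processor is free to schedule them in parallel.

```c
#include <stddef.h>

/* Hypothetical fixed-width byte code invented for this sketch. */
enum { P_PUSH, P_ADD, P_HALT };

typedef struct { unsigned char opcode, operand; } Instr;
typedef struct { int stack[16]; size_t sp; int halted; } Machine;
typedef void (*Handler)(Machine *, unsigned char);

static void p_push(Machine *m, unsigned char v) { m->stack[m->sp++] = v; }
static void p_add(Machine *m, unsigned char v)
{
    (void)v;
    m->sp--;
    m->stack[m->sp - 1] += m->stack[m->sp];
}
static void p_halt(Machine *m, unsigned char v) { (void)v; m->halted = 1; }

/* Stage 1: fetch. Reading past the end yields P_HALT, so prefetching is safe. */
static Instr fetch(const Instr *code, size_t len, size_t pc)
{
    Instr halt = { P_HALT, 0 };
    return pc < len ? code[pc] : halt;
}

/* Stage 2: decode the operation code into the routine that emulates it. */
static Handler decode(Instr in)
{
    static const Handler table[] = { p_push, p_add, p_halt };
    return in.opcode <= P_HALT ? table[in.opcode] : p_halt;
}

int run_pipelined(const Instr *code, size_t len)
{
    Machine m = { {0}, 0, 0 };
    /* Prologue: fill the pipeline with instructions 0 and 1. */
    Instr decoded_in = fetch(code, len, 0);
    Handler decoded = decode(decoded_in);
    Instr fetched = fetch(code, len, 1);
    size_t pc = 2;

    while (!m.halted) {
        Instr next = fetch(code, len, pc++);     /* fetch   instruction i + 2 */
        Handler next_handler = decode(fetched);  /* decode  instruction i + 1 */
        decoded(&m, decoded_in.operand);         /* execute instruction i (stage 3) */

        /* Advance the pipeline by one instruction. */
        decoded_in = fetched;
        decoded = next_handler;
        fetched = next;
    }
    return m.sp > 0 ? m.stack[m.sp - 1] : 0;
}
```

In steady state each loop iteration completes one program instruction while two successors are already in flight. The sketch does not check stack depth.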
An embodiment of the method according to the invention is defined in claim 1. A byte code format is very suitable for describing and storing the program to be interpreted. The byte code format allows for easy retrieval and analysis of the program instruction, resulting in a simpler interpreter.
It is a further object of the invention to provide a system for executing an interpreter of the kind set forth which allows faster execution than the known system. This object is achieved according to the invention by a system for executing a program that is characterized in that the threaded interpreter is arranged to have machine instructions implementing a first one of the preparatory steps executed on an instruction-level parallel processor in parallel with machine instructions implementing a second one of the preparatory steps for respective ones of the series of program instructions. Since the machine instructions implementing two steps in the interpretation of the series of program instructions are carried out in parallel on the instruction-level parallel processor, the execution of the interpreter is faster.
The data carrier comprising the threaded interpreter according to the invention is characterized in that the threaded interpreter is arranged to have machine instructions implementing a first one of the preparatory steps executed on an instruction-level parallel processor in parallel with machine instructions implementing a second one of the preparatory steps for respective ones of the series of program instructions.
It is a further object of the invention to provide a system for generating an interpreter of the kind set forth, which interpreter is suitable for faster execution of a program than the known interpreter. This object is achieved according to the invention by a system for generating an interpreter that is characterized in that the compiler is arranged to generate, for a particular program instruction by means of code duplication in the executable interpreter, a block comprising a translation into machine instructions of the execution unit for this particular program instruction, followed by a translation into machine instructions of the preparatory unit for a successor program instruction immediately succeeding the particular program instruction so as to obtain the executable interpreter in a threaded form. The system generates the executable threaded interpreter from a source code that does not comprise this threaded structure. This allows the source code to be written in the standard programming language ANSI C.
A version of the method according to the invention is defined in claim 3. Since the generated interpreter is arranged to carry out the machine instructions implementing two of the preparatory steps in parallel on an instruction-level parallel processor, two different program instructions are executed simultaneously during the execution of a program by this interpreter. This significantly reduces the time needed to execute the interpreter interpreting the program.
The data carrier comprising the compiler according to the invention is characterized in that the compiler is arranged to generate, for a particular program instruction by means of code duplication in the executable interpreter, a block comprising a translation into machine instructions of the execution unit for this particular program instruction, followed by a translation into machine instructions of the preparatory unit for a successor program instruction immediately succeeding the particular program instruction so as to obtain the executable interpreter in a threaded form.
Further advantageous embodiments of the invention are recited in the dependent claims.