The present invention relates to a digital computer and a digital processor, and in particular to a pipelining unit or program code translation unit used therein. It also relates to a method performed by a digital processor, in particular the procedure for decoding instructions to be executed by the processor, and to a method and device for loading instructions to the processor.
When designing a modern, fast, central processor unit (CPU), one important technique used is pipelining allowing a fast execution of instructions by the processor unit. In pipelining the execution of an instruction can overlap the execution of instructions following after the considered instruction. Such a processor has a pipelining unit or execution pipeline in which an instruction is completed in several substeps. Each substep is connected to the next substep, thereby forming a “pipe” in which instructions enter in one end, are processed in the substeps or stages, and exit at the other end. The implementation of an execution pipeline therefore makes it possible to execute portions of several instructions at the same time, in different substeps of the pipeline.
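As a rough illustration of the overlap described above, the following sketch compares cycle counts for an idealised pipeline. It assumes one substep per clock cycle and no stalls; the function names and these assumptions are illustrative and are not taken from this description.

```python
def cycles_without_pipeline(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every substep before the next one starts."""
    return n_instructions * n_stages


def cycles_with_pipeline(n_instructions: int, n_stages: int) -> int:
    """Idealised pipeline: a new instruction enters every cycle, no stalls."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later instruction
    # completes one cycle after its predecessor.
    return n_stages + n_instructions - 1
```

With five instructions and four substeps, the idealised pipeline needs 8 cycles instead of 20.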
Furthermore, modern processors tend to be optimized for execution of simple instructions having fixed lengths and two or three operands, so-called Reduced Instruction Set Computer (RISC)-type instructions. As is conventional, an instruction always contains an operation code or op-code, and the operands are fixed data, memory addresses for e.g. jumps, other parameters for accessing a memory, etc.
When running an instruction designed for a conventional complex instruction set computer (CISC) in such a modern processor, which is optimized for RISC-style instructions, i.e. for instructions which all can be executed in a time period having a fixed length, the execution in the pipeline must be provided with a number of additional sub-steps in order to adapt the CISC-type instruction for execution in the processor designed for RISC-style instructions.
Conventionally, digital processors execute the instructions in definite clock cycles, as defined by clocking signals or clock pulses and having a suitable length, the clock cycles being provided by an oscillator. Usually, during one clock cycle, there is time for one memory access or for one arithmetic/logic operation. The processing made in the pipelining units of the processor is also determined by the clock cycles (or clock pulses). In order to avoid some of the clock cycles associated with the additional substeps, which are required for adapting CISC-type instructions to be executed in a processor designed for RISC-style instructions, it would be necessary to have, at an early stage of decoding an instruction, knowledge of the length of the instruction and also of whether or not the instruction involves a jump parameter or memory access parameters, that is, before the operation code of the instruction is decoded and the actual or detailed operation to be executed is determined.
In order to run instructions at a very high speed in an execution pipeline, the common solution today is to execute the decoding of instructions in several steps in the execution pipeline. As an optimisation, some of the decoding of instructions can be carried out in advance and a decoded instruction is then stored in an instruction cache memory. Such an approach is usually called pre-decoding. The result of the pre-decoding comprises some additional information stored in an instruction cache memory, so-called pre-decode bits. One major drawback associated with the use of pre-decode bits is that, since more bits are used, the size of the instruction cache memory must be increased. Also, it is common to perform, during the same clock cycle, i.e. nearly simultaneously, the decoding of more than one instruction in an execution pipeline.
In a computer designed for special tasks, such as a computer controlling or actually being the main part of a large telephone switch, a very large quantity of program code may be used, and the various program modules used by the computer may have been developed over a rather extended time period. Each program module can be designed for a special purpose or for performing specialized tasks and can have a high degree of complexity. The modules may have been written in different versions of assembler language and/or processed by different versions of assemblers/compilers, generating program code which may differ slightly from one program module to another, the generated code e.g. being adapted to be executed at high speed by the processor used at the time when the original program module was developed. There is then naturally a desire to reuse older versions without having to develop new program code, where the reused code can still be processed at high speed by newer processors. A special requirement may then be that each modified instruction should have the same length as the original one, since the older program code has been designed to fit into the memory and into the addressing system used in the computer, and that the addressing systems should be the same, even if fields located inside an instruction and containing operation codes, memory references, addresses and similar items can be relocated inside the instruction. Thus, the addresses used in memory references in the instructions should not be changed and the address of each instruction should not be changed.
In the published International patent application WO 97/24659 a method for fast decoding of special instructions is disclosed. The length of program instructions is found in a complex manner requiring that a sufficient number of bytes are first received, then that the bytes which contribute no length information are detected and removed, and finally that the remaining instruction bytes are decoded. In David R. Ditzel and Alan D. Berenbaum, “The Hardware Architecture of the Crisp Microprocessor”, 14th Annual International Symposium on Computer Architecture, Jun. 2-5, 1987, Pittsburgh, pp. 309-319, an instruction format allowing a fast decoding of instruction length is disclosed. In the first two bits of each instruction the length of the instruction is encoded, the length varying between three different values.
In the published European patent application 0 475 919 a set of instructions for a digital computer is disclosed. In the instructions a flag in the eighth bit indicates that the contents of a particular register should be entered in the fetch queue. In U.S. Pat. No. 4,791,559 an instruction flow control system is disclosed in which a remapping of instructions is made.
It is an object of the present invention to overcome the problems as outlined above and in particular to reduce the number of clock cycles used for instruction decoding during program execution in a digital processor without the need for an increased instruction cache memory or the need for making the decoding in several steps in a pipelining unit for the processor.
One problem which the invention tries to solve is how to perform the decoding of instructions to be executed by a processor and how to configure the instructions so that the various steps of decoding the instructions are made as fast and as simple as possible.
The object as outlined above and others are obtained by a method and a device for improving the execution of instructions in processors designed for RISC-style instructions, without a need for a pre-decoding step and an increased size of the instruction cache memory. The method and the device also improve run time performance compared to previously known solutions.
Thus, when loading the instructions into the program memory from an external program memory, such as a tape, the instructions are modified. This is performed by means of a recoding procedure, according to which the instructions are changed or remapped.
The object of the remapping is to add, without increasing the number of bits in the instruction, i.e. the instruction length, information not present in the operation code field of the instruction, but which is useful when running the instructions in a processor designed for fixed-length instructions. The reason for this is that fewer sub-steps need to be executed in the pipeline, since fewer additional steps are required for the instruction decoding in the execution pipeline, or equivalently that simpler and faster circuits can be used in the execution pipeline.
The code remapping of the instruction comprises two different parts:
remapping the operation code in order to make the instruction carry more information, in particular more direct information that can be more easily decoded. Such information can comprise: instruction length, information on whether or not the instruction involves a jump or memory access, i.e. whether the instruction contains memory references or addresses to a memory, the existence of certain operands or parameters in the instruction, the length of such operands or parameters, etc.
relocation of parameters or operands, so that a parameter or operand, which is directly referred to by a bit in the remapped operation code, is located in a standard or predetermined position in the instruction, this position possibly being dependent on the length of the instruction.
Hence, if possible, the original operation code field, for example consisting of at least four and up to twelve bits in typical processors as considered herein, in which field the instruction is specified, is remapped or transformed in order to make the instruction code field carry information which can be directly decoded.
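The two parts of the remapping can be pictured with a small sketch. The layout used here is hypothetical: 16-bit instruction words, a flag at bit 7 of the first word meaning that the instruction carries a jump address, and the jump address relocated to the last word of the instruction, so that its position follows from the instruction length alone. None of these positions is specified in this description.

```python
from typing import Optional, Sequence

HAS_JUMP_FLAG = 1 << 7  # hypothetical flag position in the remapped op-code


def jump_address(instruction: Sequence[int]) -> Optional[int]:
    """Return the jump address of a remapped instruction, or None if it has none.

    `instruction` is the sequence of 16-bit words of a single instruction.
    """
    first_word = instruction[0]
    if not first_word & HAS_JUMP_FLAG:
        return None
    # Assumed predetermined position: the last word of the instruction,
    # which depends only on the instruction length.
    return instruction[-1]
```

In this sketch a single bit test decides whether a jump address is present, and no further decoding of the op-code is needed to locate it.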
The possibility to read the length of an instruction directly from the operation code makes it possible to translate two instructions during the same clock cycle.
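A minimal sketch of why a directly readable length helps: once the length of the first instruction is known without full decoding, the start of the second instruction is known too, and both can be handed to the decoder together. The sketch assumes, for illustration only, 16-bit words and a length given by the position of the lowest set bit of the first word; the names `length_from_first_word` and `split_two` are hypothetical.

```python
from typing import List, Sequence, Tuple


def length_from_first_word(word: int) -> int:
    """Length in words, read from the position of the lowest set bit (assumed scheme)."""
    if word == 0:
        raise ValueError("no length bit set")
    # word & -word isolates the lowest set bit; bit_length gives its 1-based position.
    return (word & -word).bit_length()


def split_two(window: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Separate the first two instructions found in a fetch window of 16-bit words."""
    first_len = length_from_first_word(window[0])
    second_len = length_from_first_word(window[first_len])
    first = list(window[:first_len])
    second = list(window[first_len:first_len + second_len])
    return first, second
```

For a window `[0x0002, 0xABCD, 0x0001, 0x0004]`, the first instruction is the two words `[0x0002, 0xABCD]` and the second is the single word `[0x0001]`.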
Efficient execution of program code in a processor is thus achieved, also for program code of previous or older types, intended for former versions of processors. The program code for former versions is converted, either in a separate code processing step or by the processor itself, when the old program is loaded into a primary memory used by the processor. In the conversion each instruction is modified so that it will be better adapted to the operation of the processor. Operation codes, parameters and constants can be displaced or relocated inside each instruction and the operation codes can be changed so that they will also contain some additional information or generally, information that allows a more rapid decoding, the modification always being made preserving the length of the instruction.
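The load-time conversion can be sketched as a single pass over the old program. Everything concrete here is an assumption made for illustration: 16-bit words, an op-code in the low byte of the first word, and the two invented tables `OLD_LENGTH` and `NEW_OPCODE`. The sketch only remaps op-codes and omits the relocation of parameters.

```python
from typing import Dict, List, Sequence

# Hypothetical tables, invented for this sketch only.
OLD_LENGTH: Dict[int, int] = {0x3A: 1, 0x7C: 2}        # old op-code -> length in words
NEW_OPCODE: Dict[int, int] = {0x3A: 0x01, 0x7C: 0x02}  # old op-code -> remapped op-code


def remap_program(words: Sequence[int]) -> List[int]:
    """Return a remapped copy of `words`, preserving every instruction's length."""
    out: List[int] = []
    pc = 0
    while pc < len(words):
        old_op = words[pc] & 0x00FF
        length = OLD_LENGTH[old_op]
        # Replace only the op-code field; the rest of the first word is kept.
        out.append((words[pc] & 0xFF00) | NEW_OPCODE[old_op])
        # Copy the remaining words of the instruction unchanged.
        out.extend(words[pc + 1:pc + length])
        pc += length
    return out
```

Because every instruction keeps its length, the remapped program occupies the same addresses as the original, so addresses used in memory references need not be changed.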
The operation codes can then be designed to include a simple, direct indication of the length of an instruction and also direct indications whether the instruction contains particular parameters. Such indications then always have definite places in the operation code so that they can be easily accessed by the processor in an early stage of the execution of an instruction. The definite places are preferably counted from that position of the instruction which is first read from the memory, when executing the instructions. That position can be called the start position or start bit of the instruction. When illustrating instructions in drawings, that position is generally identical to the position of the least significant bit of the instruction.
The direct indications in the instructions are made by setting or resetting predetermined bits in the instructions. In particular, the length indication can be made by setting a bit in the instruction, the place of the bit indicating the length of the instruction, the place being counted from a predetermined position in the instruction such as the start of the instruction. Then all the bits between the bit indicating the length and the predetermined position are reset. Put equivalently, all bits counted from the predetermined position in a predetermined direction up to the set bit indicating the length are reset.
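The length indication described above can be sketched as follows, assuming that the predetermined position is the least significant bit and that a set bit at position p (counted from zero) means a length of p + 1 units. The unit of length, the limit `max_length` and the function names are assumptions of this sketch.

```python
def set_length_bit(length: int) -> int:
    """Return a bit pattern with (length - 1) reset bits followed by one set bit."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return 1 << (length - 1)


def read_length(bits: int, max_length: int = 8) -> int:
    """Scan from the start position until the first set bit is found."""
    for position in range(max_length):
        if (bits >> position) & 1:
            return position + 1
    raise ValueError("no length bit found within max_length positions")
```

For example, `read_length(0b10110100)` returns 3, since the two lowest bits are reset and the third is set. Bits above the length bit are ignored by `read_length` in this sketch and could therefore carry other information.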
The part of the processor performing the decoding or translation of instructions will then have to be modified accordingly, to be adapted to the new instruction format. The modified instructions will then allow a very rapid and easy decoding, using a minimum number of gates in the decoding part.