1. Field of the Invention
The present invention relates to the techniques for translating instructions that are to operate on different processors. The invention has been developed with particular reference to the possible application to the translation of instructions that can be executed on a processor of the ARM type into instructions that can be executed on a processor of the LX type, such as, for example, the microprocessor ST200-LX produced by STMicroelectronics, Srl, which is the assignee of the present application.
2. Description of the Related Art
An ARM microprocessor is typically a 32-bit pipelined scalar microprocessor, i.e., a microprocessor the internal architecture of which is constituted by different logic stages, each of which contains an instruction in a very specific state. Said state may be one of the following: loading of the instruction itself from the memory; decoding; addressing of a file of registers; execution; or writing/reading data from the memory. The number of bits refers to the width of the data and of the instructions on which the microprocessor operates. The instructions are generated in a specific order by compiling and executed in the same order. An LX microprocessor is typically a microprocessor of the type defined as very-long-instruction-word (VLIW) microprocessor, namely, a 128-bit pipelined VLIW microprocessor. A pipelined superscalar microprocessor possesses an internal architecture made up of different logic stages, some of which are able to execute instructions in parallel, for example in the execution step. Typically, the parallelism is of four instructions of 32 bits each (equal to 128 bits), whilst the data are expressed in 32 bits.
The processor is referred to as superscalar if the instructions are re-ordered dynamically in the execution step so as to supply the execution stages that may potentially work in parallel and if the instructions are not mutually dependent, thus altering the order generated statically by the compiling of the source code.
The processor is referred to as VLIW if, instead, the instructions are re-ordered statically during compiling and executed in the same fixed order, which cannot be modified in the execution step.
For more detailed information regarding the architecture of the microprocessors, reference may be made to the description given in the text: Computer Organization & Design: The hardware/software interface, D. A. Patterson & J. L. Hennessy, Morgan Kaufmann.
The ARM processor is a single-issue RISC machine that nevertheless provides an extensive set of addressing modes (the data-processing instructions alone support as many as eleven different modes) and affords the possibility of conditional execution of almost all its instructions on the basis of the flags contained in the status register referred to as CPSR.
The LX processor is a four-issue VLIW processor, which in the remainder of the present description will always be illustrated in the single-cluster version. The LX processor, unlike the ARM processor, has only two addressing modes (from immediate and from register) and does not enable conditioned execution; however, given the presence of four lanes operating in parallel, it allows execution in parallel of a number of alternatives (with a maximum of four instructions) and then selection of the appropriate result once the condition on the execution has been evaluated.
The ARM microprocessor in the version 5, to which reference will be made hereinafter, possesses a 32-bit internal architecture that guarantees a 4-Gbyte address space and has 31 general-purpose registers, of which, however, only 16, designated by the references from R0 to R15, are accessible simultaneously.
There exist, in fact, seven different modes of operation necessary for handling the various types of exceptions to which the processor must respond:
    USER          normal execution mode
    FIQ           fast interrupt control
    IRQ           generic interrupt control
    SUPERVISOR    privileged mode for the operating system
    ABORT         protection of access to memory and/or virtual memory
    UNDEFINED     operating code not defined, for emulation of coprocessor
    SYSTEM        privileged mode for particular operations of the operating system
Two of the 16 accessible registers have a particular role:
    the register R15 is used as program counter (PC), i.e., it contains the address of the instruction to be executed;
    the register R14 is used as link register (LR), i.e., it contains the address of the instruction to be executed following upon return from execution of a subroutine.
Furthermore, normally the register R13 is used by the software as stack pointer.
Two or more of the general-purpose registers are replicated for the various modes of operation in order to speed up handling of exceptions.
In the IRQ, Abort, Undefined and Supervisor modes, as compared to the User mode, only the registers R13 and R14 (i.e., stack pointer and link register) are replicated.
In the FIQ mode, to make the handling of the exception even faster, also the registers from R8 to R12 have been replicated.
The System mode, whilst presenting all the benefits of a privileged mode, sees all the same registers as the User mode.
Obviously, the program counter is not replicated in any of the modes.
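By way of non-limiting illustration, the register replication described above can be sketched as a small lookup. The names `BANKED` and `physical_register`, and the `R13_irq`-style labels, are hypothetical and serve only to check that the replication scheme accounts for the 31 general-purpose registers mentioned previously.

```python
# Registers replicated per mode, as described in the text:
# R13 and R14 in IRQ, Supervisor, Abort and Undefined; R8 to R14 in FIQ.
BANKED = {
    "FIQ": range(8, 15),
    "IRQ": range(13, 15),
    "SUPERVISOR": range(13, 15),
    "ABORT": range(13, 15),
    "UNDEFINED": range(13, 15),
}

def physical_register(mode, n):
    """Return a label for the physical register seen as Rn in the given mode."""
    if n in BANKED.get(mode, ()):
        return f"R{n}_{mode.lower()}"
    return f"R{n}"  # shared with User mode (System mode sees only these)

MODES = ["USER", "SYSTEM", *BANKED]
ALL_REGISTERS = {physical_register(m, n) for m in MODES for n in range(16)}
```

Counting the distinct labels gives 16 shared registers, 7 for FIQ and 2 for each of the other four exception modes.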
In addition to the general-purpose registers, there is available a status register CPSR (the content of which is illustrated in Table 1) containing information on the result of the execution and on the mode of operation.
TABLE 1
    Bit:    31  30  29  28  27  26-8   7  6  5  4-0
    Field:  N   Z   C   V   Q   (RAZ)  I  F  T  MODE
where
    N flag (negative flag): N=1 if the result of an operation is negative;
    C flag (carry flag): C=1 if the result of an add operation generates carry or else if during the step of generation of the operands for a logic operation particular conditions have arisen; C=0 if the result of an operation of subtraction generates borrow;
    V flag (overflow flag): V=1 if an arithmetic operation has generated overflow;
    Z flag (zero flag): Z=1 if the result of an operation is zero;
    Q flag: in the Extended versions Q=1 if the result of one of the operations of the group Enhanced DSP generates overflow or saturation.
The bits from 26 to 8 must not be modified and are read as zero.
    I bit: if I=1, it disables the interrupt IRQ;
    F bit: if F=1, it disables the interrupt FIQ;
    T bit: if T=0 the processor is operating in the normal ARM mode; if T=1 the Thumb execution mode is active. In this mode, ARM interprets a reduced set of instructions, with operation codes, or opcodes, that occupy only 16 bits but with 32-bit register arithmetic, and sees simultaneously only 8 general-purpose registers.
The 5 least significant bits of the status register describe the mode of operation of the ARM processor, as may be seen from the following Table 2:
TABLE 2
    CPSR (4:0)    MODE
    0b10000       USER
    0b10001       FIQ
    0b10010       IRQ
    0b10011       SUPERVISOR
    0b10111       ABORT
    0b11011       UNDEFINED
    0b11111       SYSTEM
All the privileged modes, in addition to the register CPSR, then present a register SPSR, replicated for each mode. The register SPSR associated to a given mode is used for saving the status word contained in the register CPSR when the exception corresponding to that mode is raised; at the end of handling of the exception, the register CPSR will be restored with the value of the register SPSR.
The instructions of the ARM processor may be classified in six groups:
    data processing (addressing mode 1);
    load&store word (32 bits) or unsigned byte (addressing mode 2);
    load&store halfword (16 bits) or signed byte (addressing mode 3);
    multiple load&store (addressing mode 4);
    instructions for the coprocessors (addressing mode 5);
    jumps.
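Purely as an illustrative sketch, the layout of the register CPSR given in Tables 1 and 2 can be decoded as follows. The helper name `decode_cpsr` and the dictionary layout are assumptions introduced for this example only.

```python
# Mode encodings from Table 2 (five least significant bits of CPSR).
MODES = {
    0b10000: "USER",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "SUPERVISOR",
    0b10111: "ABORT",
    0b11011: "UNDEFINED",
    0b11111: "SYSTEM",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into the fields of Table 1."""
    def bit(n):
        return (cpsr >> n) & 1
    return {
        "N": bit(31), "Z": bit(30), "C": bit(29), "V": bit(28), "Q": bit(27),
        "I": bit(7), "F": bit(6), "T": bit(5),
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }
```

For instance, the value 0x600000D3 decodes to Z=1, C=1, both interrupts disabled, ARM (non-Thumb) state, Supervisor mode.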
The ARM processor enables the conditioned execution of almost all its instructions on the basis of the flags N, C, V, Z contained in the status register CPSR.
The condition is described in the four most significant bits of the opcode of the ARM processor.
Exceptions to the above are the instruction BLX (branch, link and exchange to Thumb state) and the instructions that refer to the coprocessors, which are not conditional.
The various combinations of the flags generate sixteen types of conditioned execution:
    AL (always): the instruction is always executed;
    NV (never): the instruction is never executed, is not defined, or else forms part of the non-conditional instructions referred to previously;
    EQ (equal): Z=1;
    NE (not equal): Z=0;
    CS/HS (carry set, unsigned higher or same): C=1;
    CC/LO (carry clear, unsigned lower): C=0;
    MI (minus, negative): N=1;
    PL (plus, positive or zero): N=0;
    VS (overflow): V=1;
    VC (no overflow): V=0;
    HI (unsigned higher): C=1 and Z=0;
    LS (unsigned lower or same): C=0 or Z=1;
    GE (signed greater than or equal): N=V;
    LT (signed less than): N!=V;
    GT (signed greater than): Z=0 and N=V;
    LE (signed less than or equal): Z=1 or N!=V.
There are eleven addressing modes of the ARM processor for the data-processing instructions:
    immediate;
    direct from register;
    logic shift to the left from register (the amount of the shift is contained in a register);
    logic shift to the left from immediate (the amount of the shift is expressed by a 5-bit immediate contained in the opcode);
    logic shift to the right from register;
    logic shift to the right from immediate;
    arithmetic shift to the right from register;
    arithmetic shift to the right from immediate;
    rotation to the right from register;
    rotation to the right from immediate;
    rotation through the carry flag.
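The sixteen conditions listed above can be expressed, by way of example only, as a table of predicates on the flags N, Z, C and V. The function name `condition_passed` and the use of mnemonic strings rather than the 4-bit opcode field are simplifying assumptions.

```python
def condition_passed(cond, n, z, c, v):
    """Return True if the condition named by `cond` holds for the given flags."""
    checks = {
        "EQ": z == 1,
        "NE": z == 0,
        "CS": c == 1,
        "CC": c == 0,
        "MI": n == 1,
        "PL": n == 0,
        "VS": v == 1,
        "VC": v == 0,
        "HI": c == 1 and z == 0,
        "LS": c == 0 or z == 1,
        "GE": n == v,
        "LT": n != v,
        "GT": z == 0 and n == v,
        "LE": z == 1 or n != v,
        "AL": True,
        "NV": False,  # treated here simply as "never executed"
    }
    return checks[cond]
```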
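As a minimal sketch, the shift and rotation operations underlying the addressing modes listed above may be modelled on 32-bit values as follows. The function name `shift_operand` is hypothetical, and the sketch deliberately covers only shift amounts from 1 to 31; the behaviour of the ARM processor for amounts of 0 or of 32 and above is not modelled.

```python
MASK32 = 0xFFFFFFFF

def shift_operand(value, kind, amount=0, carry_in=0):
    """Apply one of the shifter operations to a 32-bit value."""
    value &= MASK32
    if kind == "RRX":  # rotation through the carry flag, by one position
        return ((carry_in & 1) << 31) | (value >> 1)
    if not 0 < amount < 32:
        raise ValueError("this sketch covers shift amounts 1..31 only")
    if kind == "LSL":  # logic shift to the left
        return (value << amount) & MASK32
    if kind == "LSR":  # logic shift to the right
        return value >> amount
    if kind == "ASR":  # arithmetic shift to the right (sign is replicated)
        sign_fill = (MASK32 << (32 - amount)) & MASK32 if value >> 31 else 0
        return (value >> amount) | sign_fill
    if kind == "ROR":  # rotation to the right
        return ((value >> amount) | (value << (32 - amount))) & MASK32
    raise ValueError(f"unknown shift kind: {kind}")
```

The amount would come either from a register or from the 5-bit immediate of the opcode; the sketch does not distinguish the two cases.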
The data-processing instructions are operations of a logic or arithmetic type that are executed by the 32-bit arithmetic logic unit (ALU) of the ARM processor.
The above operations can modify the value of the flags of the register CPSR on the basis of their result when the bit 20 (S bit) of the opcode is at a high level. The execution step of these operations always lasts just one clock cycle.
The ARM processor is then able to perform multiplications and multiplications-with-accumulation of numbers up to 32 bits, generating a 64-bit result that is split into two destination registers.
All the multiplication operations support only direct-from-register addressing, and their execution step lasts just one clock cycle, irrespective of the need or otherwise for performing the operation of accumulation at the end of the multiplication itself.
The operations of load&store in memory of Mode 2 act on words and unsigned bytes and support nine addressing modes, which in any case make use of a base register and a displacement:
    base register +/− 12-bit immediate;
    base register +/− offset register;
    base register +/− scaled offset register (the offset register is shifted with modes similar to the data-processing instructions; the amount of the shift is described by an immediate);
    base register +/− pre-indexed immediate (the base register is updated before accessing memory);
    base register +/− pre-indexed offset register;
    base register +/− pre-indexed scaled register;
    base register +/− post-indexed immediate (the base register is updated after accessing memory);
    base register +/− post-indexed offset register;
    base register +/− post-indexed scaled register.
The operation of reading a 32-bit word from the memory does not require the address to be word-aligned; the reading is made in any case, after which the word is rotated by 8, 16 or 24 bits if the address was not word-aligned but ended in 0b01, 0b10 or 0b11, respectively.
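The difference between plain-offset, pre-indexed and post-indexed addressing can be illustrated with the following sketch. The function name `mode2_access` is hypothetical, and the displacement is assumed to have been computed already (from the immediate, the offset register or the scaled offset register).

```python
MASK32 = 0xFFFFFFFF

def mode2_access(base, offset, indexing, add=True):
    """Return (address used for the access, value of the base register afterwards)."""
    target = (base + offset if add else base - offset) & MASK32
    if indexing == "offset":  # base register is left unchanged
        return target, base
    if indexing == "pre":     # base register is updated before accessing memory
        return target, target
    if indexing == "post":    # memory is accessed at the old base, then base is updated
        return base, target
    raise ValueError(f"unknown indexing: {indexing}")
```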
The operation of writing a word, instead, is self-aligned by ignoring completely the two least significant bits of the address; hence, it is not exactly the dual of the reading operation.
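The asymmetry between word reads and word writes can be sketched as follows. The sketch assumes a little-endian byte-addressed memory held in a `bytearray` and assumes that the rotation applied on reading is a rotation to the right; both are assumptions of this example, and the names `load_word` and `store_word` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def load_word(memory, address):
    """Read the aligned word, then rotate it according to the two low address bits."""
    aligned = address & ~0b11
    word = int.from_bytes(memory[aligned:aligned + 4], "little")
    rot = 8 * (address & 0b11)  # 0, 8, 16 or 24 bits
    return ((word >> rot) | (word << (32 - rot))) & MASK32

def store_word(memory, address, value):
    """Write a word, ignoring the two least significant bits of the address."""
    aligned = address & ~0b11
    memory[aligned:aligned + 4] = (value & MASK32).to_bytes(4, "little")
```

A store to address 0x5 therefore lands at 0x4, whereas a subsequent load from 0x5 returns the same word rotated by 8 bits.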
The operations of load&store in memory of Mode 3 act on halfwords and signed bytes and support only six of the nine addressing modes associated to Mode 2:
    base register +/− 8-bit immediate;
    base register +/− offset register;
    base register +/− pre-indexed immediate;
    base register +/− pre-indexed offset register;
    base register +/− post-indexed immediate;
    base register +/− post-indexed offset register.
Unlike what occurs in the case of the instructions of Mode 2, the reading and writing operations on halfwords (16 bits) entail the need for halfword-aligned addresses to be executed correctly.
The operations of multiple load&store of mode 4 contain within their opcode a 16-bit field that marks with a high-level bit the registers involved in the transfer.
The above operations present four addressing modes:
    increment after: the list of registers is loaded into memory (for the store operations) or from the memory (for the load operations) starting from the address pointed to by a base register. The subsequent registers will be loaded into addresses obtained by incrementing by four (given that access is by words) the address of the previous access;
    increment before: the base address is first incremented by four and then used for the first access. The subsequent registers will be loaded into addresses obtained from the previous one by increment;
    decrement after: as for increment after, but the next address is obtained by decrement;
    decrement before: as for increment before, but the addresses are obtained by decrement.
The base register may optionally be updated at the end of the operation with the value of the next location pointed to if the bit 21 (W bit) of the opcode is at a high level.
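The four addressing modes of the multiple load&store operations, together with the optional update of the base register, may be sketched as follows. The function name `multiple_transfer` and the two-letter mode names are assumptions of this example. The sketch lists the addresses in the order in which the text describes them and does not model which register of the list is associated to which address.

```python
MASK32 = 0xFFFFFFFF

def multiple_transfer(base, register_list, mode, writeback=False):
    """Return (list of word addresses accessed, base register value afterwards).

    register_list is the 16-bit field of the opcode; each high-level bit
    marks one register involved in the transfer.
    """
    if mode not in ("IA", "IB", "DA", "DB"):
        raise ValueError(f"unknown mode: {mode}")
    count = bin(register_list & 0xFFFF).count("1")
    step = 4 if mode in ("IA", "IB") else -4          # increment or decrement
    first = base + step if mode in ("IB", "DB") else base  # "before" variants
    addresses = [(first + i * step) & MASK32 for i in range(count)]
    new_base = (base + count * step) & MASK32 if writeback else base
    return addresses, new_base
```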
There moreover exist instructions of multiple load&store which can be executed only in privileged operating mode and which enable loading of the program counter from the memory or accessing of the general-purpose registers of the User mode.
The ARM processor then envisages a further two instructions that access the memory:
    SWP: swap word;
    SWPB: swap byte.
These instructions each access the memory twice, by loading into a first register the contents of a memory location pointed to by a base register and by writing in the same memory location the contents of a second register. If the first and the second register coincide, the contents of the register and of the memory location have been swapped.
The operations on the coprocessors of mode 5 comprise:
    load from memory to coprocessor;
    store from coprocessor to memory;
    move from general-purpose register to coprocessor's register;
    move from coprocessor's register to general-purpose register;
    execute coprocessor's data-processing operation.
The instructions for the coprocessors are not described here.
The ARM processor then envisages three jump instructions:
    PC-relative conditioned jump (with and without storage of the return address): the 24-bit offset is contained in the opcode of the jump. To calculate the destination address, the offset is multiplied by four (in that each opcode of the ARM microprocessor occupies 32 bits) and extended with sign, and is then added to the current value of the program counter. It should be pointed out that, as a result of the architecture of the pipeline of the ARM processor, at the moment of updating, which takes place in the execution step, the program counter contains the address of the jump instruction incremented by eight;
    unconditioned jump with change of mode: the processor performs a jump with 24-bit offset, stores the return address in the link register and enters Thumb mode, modifying the T bit of the status word;
    conditioned jump with change of mode (with or without storage of the return address): the processor performs a jump to the address contained in an index register. The value of the index register is aligned by neglecting its least significant bit, which is used for deciding the mode of operation (if it is at a high level Thumb mode; otherwise ARM mode).
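The calculation of the destination address of the jumps described above can be sketched as follows. The function names `branch_target` and `exchange_target` are hypothetical; the first takes the address of the jump instruction itself and adds the pipeline displacement of eight explicitly.

```python
MASK32 = 0xFFFFFFFF

def branch_target(instr_address, offset24):
    """Destination of a PC-relative jump located at instr_address."""
    offset24 &= 0xFFFFFF
    if offset24 & 0x800000:          # sign extension of the 24-bit field
        offset24 -= 1 << 24
    # Multiply by four, then add to the program counter (instruction address + 8).
    return (instr_address + 8 + (offset24 << 2)) & MASK32

def exchange_target(register_value):
    """Destination and mode for a jump to the address held in a register."""
    thumb = bool(register_value & 1)          # least significant bit selects the mode
    return register_value & ~1 & MASK32, thumb
```

An offset of −2 (0xFFFFFE) therefore makes an instruction jump to itself, since the displacement of −8 cancels the pipeline increment.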
It is to be emphasized that, unlike the case of the LX processor, for the ARM processor the program counter forms part of the general-purpose registers; hence, any operation of data processing or of load from memory that will have R15 as destination register may generate a jump.
The commitment step of the operations that have the program counter as destination is therefore different from the normal load or data-processing instructions and must envisage restoring of the register CPSR with the value contained in the register SPSR associated to the current mode.
Two special instructions concern handling of the status registers:
    MSR: moves an immediate or a general-purpose register of the current mode into one of the status registers of the current mode (CPSR or SPSR);
    MRS: moves a status register of the current mode into a general-purpose register of the current mode.
The above instructions can be executed correctly only in a privileged execution mode and must not be used for modifying the T bit of the register CPSR, which would cause a transition from ARM mode to Thumb mode, or vice versa.
Accessing the register SPSR in the System mode, which does not see this register, has an unpredictable effect on the execution.
There now follows a description of the architecture of the LX microprocessor.
The LX processor is a core with the possibility of assuming different configurations according to the use; in what follows, reference will be made to the 4-issue single-cluster version.
The entire architecture is 32-bit and has 64 general-purpose registers plus a program counter not accessible directly by the user.
Two of the general-purpose registers have, however, particular functions:
    the register R63 is used as link register;
    the register R0 always contains the value zero and is used for comparisons and assignments that cannot explicitly use a further immediate field, as will be clarified in what follows.
There then exists a series of special registers (always 32-bit ones) mapped in a reserved area that occupies the last 4 Kbytes of the address space of the LX processor, which is of 4 Gbytes.
These registers, among other things, comprise:
    a status register PSW, which contains the mode of operation (either User or Supervisor) and information on the devices for the protection and management of the memory;
    a stack register for the status register, used in the presence of exceptions;
    a HANDLER_PC register, used in the presence of exceptions for containing the address of the exception handler;
    other registers that contain information required for recognition and management of the exceptions;
    registers for control of the protection unit for the program memory (IPU) and data memory (DPU).
In each cluster of the LX processor there are four lanes, to each of which there is associated an ALU capable of executing the normal 32-bit logic-arithmetic operations. There are also two units capable of multiplying a 16-bit number by a 32-bit number, with the result truncated to 32 bits. These units are associated to lanes 1 and 3 of the cluster.
The LX processor enables just one access to memory for each cluster; hence, there exists a single Load&Store unit, which is able to execute operations on words, halfwords, or bytes and which may be associated to any one of the lanes of the cluster.
A unit referred to as Instruction Issue Unit allocates the operations contained in one and the same bundle or set of instructions on the lanes in such a way that the two least significant bits of the word address of each instruction determine the lane on which the instruction itself is run.
A direct consequence of this is that a multiplication instruction, which must be executed on an odd lane, must occupy an odd word address in the program memory. It is therefore necessary to make the alignment by inserting into the code, if necessary, NOP (no operation) instructions.
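The alignment constraint on multiplication instructions can be illustrated with a minimal placement sketch. The function names `lane_of` and `place`, and the lower-case operation names, are hypothetical; bundle boundaries and all other issue constraints are ignored.

```python
def lane_of(word_address):
    """Lane selected by the two least significant bits of the word address."""
    return word_address & 0b11

def place(ops, start=0):
    """Assign consecutive word addresses, padding with NOPs before multiplies.

    A "mul" must fall on an odd lane (1 or 3), i.e. on an odd word address.
    """
    layout = []
    address = start
    for op in ops:
        if op == "mul" and lane_of(address) % 2 == 0:
            layout.append((address, "nop"))
            address += 1
        layout.append((address, op))
        address += 1
    return layout
```

Placing the sequence add, mul, mul from word address 0 yields add at 0, mul at 1, a NOP at 2 and the second mul at 3.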
In each cluster there is then present a unit referred to as branch unit, which executes the jump operations. The LX processor performs the conditioned-jump operations on the basis of one of the branch-bit registers, a group of eight registers of one bit each, which contain the result of logic operations or comparison operations.
The value of a branch-bit register must be assigned at least two bundles before the corresponding conditioned jump occurs.
All the jump operations must occupy the first instruction of the bundle, and there cannot be two jump instructions within the same bundle, even if the two constructs are alternative.
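The constraints on the position of jump operations within a bundle lend themselves to a simple check, sketched below. The function name `bundle_errors` is hypothetical, a bundle is modelled as a list of mnemonics, and the set of jump mnemonics is limited to those named in the present description (BR, BRF, CALL, GOTO, RTI). The timing constraint on the branch-bit register is not checked.

```python
BRANCH_OPS = {"BR", "BRF", "CALL", "GOTO", "RTI"}

def bundle_errors(bundle):
    """Return a list of violated constraints for one bundle of up to four operations."""
    errors = []
    if len(bundle) > 4:
        errors.append("more than four operations")
    branch_slots = [i for i, op in enumerate(bundle) if op in BRANCH_OPS]
    if len(branch_slots) > 1:
        errors.append("more than one jump")
    if branch_slots and branch_slots[0] != 0:
        errors.append("jump is not the first operation")
    return errors
```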
The LX processor has just two addressing modes for the data-processing instructions:
    from register;
    from immediate.
The immediates may, however, be of two types: short and long.
The short immediates are 9-bit signed numbers, which are able to represent a number from −256 to +255 and are incorporated into the 32 bits of the opcode.
The long immediates are 32-bit signed numbers and occupy with the 9 least significant bits part of the 32 bits of the opcode. The remaining 23 bits are contained in one of the words adjacent to the opcode, with the constraint of being associated to lane 0 or lane 2 of the cluster, and hence occupying an even word address.
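The choice between a short and a long immediate can be sketched as follows. A 9-bit two's-complement field spans −256 to +255; any other 32-bit value requires the extension word. The function name `encode_immediate` and the returned tuple layout are assumptions of this example, and the exact placement of the 23 remaining bits inside the adjacent word is not modelled.

```python
MASK32 = 0xFFFFFFFF

def encode_immediate(value):
    """Return (kind, 9-bit field in the opcode, 23-bit extension or None)."""
    if -256 <= value <= 255:
        return ("short", value & 0x1FF, None)
    if not -(1 << 31) <= value < (1 << 31):
        raise ValueError("immediate does not fit in 32 bits")
    raw = value & MASK32
    # The 9 least significant bits stay in the opcode; the other 23 go in the adjacent word.
    return ("long", raw & 0x1FF, raw >> 9)
```

The original value is recovered by concatenating the extension with the 9-bit field.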
The operations of access to the memory enable only addressing by means of the base register plus 9-bit offset and, unlike what occurs in the case of the ARM processor, they involve alignment.
Accesses to words on addresses that are not word-aligned, as well as accesses to non-halfword-aligned halfwords, generate exceptions.
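In contrast with the ARM behaviour, a misaligned access on the LX processor is an error. A minimal sketch is given below; the exception class `MisalignedAccess` and the function `lx_effective_address` are hypothetical names, and the 9-bit offset is assumed here to be signed, like the short immediates.

```python
MASK32 = 0xFFFFFFFF

class MisalignedAccess(Exception):
    """Stands in for the exception generated by the processor (hypothetical name)."""

def lx_effective_address(base, offset9, size):
    """Compute base + 9-bit offset and check alignment for a 1-, 2- or 4-byte access."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    if not -256 <= offset9 <= 255:
        raise ValueError("offset does not fit in 9 bits (assumed signed)")
    address = (base + offset9) & MASK32
    if address % size:
        raise MisalignedAccess(hex(address))
    return address
```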
As regards the jump instructions, mention of which has already been made previously, there are conditioned-jump operations (BR, BRF), which make offset jumps (23-bit) and unconditioned-jump instructions (CALL, GOTO, RTI), which can make offset jumps (23-bit) or else jumps to the address pointed to by the link register, with the constraint that the link register must be modified at least three bundles before the corresponding jump.
There are then two instructions (SLCT, SLCTF), which enable a conditional MOV operation to be performed on the basis of the evaluation of a branch bit: if this has a high level, the first source register is brought into the destination register; otherwise, the second source register or an immediate is loaded according to the addressing mode.
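The selection mechanism lends itself to emulating a conditionally executed operation: the result is computed unconditionally and then either committed or discarded on the basis of the branch bit. The following sketch is illustrative only; `slct` models the behaviour described above, while `conditional_add` is a hypothetical example of its use and does not represent the translation process itself.

```python
MASK32 = 0xFFFFFFFF

def slct(branch_bit, src1, src2):
    """Conditional move: src1 if the branch bit is at a high level, otherwise src2."""
    return src1 if branch_bit else src2

def conditional_add(cond_bit, rd_old, rn, rm):
    """Emulate 'add rd, rn, rm' executed only when cond_bit is set."""
    candidate = (rn + rm) & MASK32          # computed regardless of the condition
    return slct(cond_bit, candidate, rd_old)  # keep the old value if the condition fails
```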
Finally, it should be emphasized that the LX processor, unlike the ARM processor, does not contain a register of the flags, and that hence it is not able to point out automatically whether the arithmetic operations generate carry or overflow.
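In the absence of a flags register, the information carried by the flags N, Z, C and V has to be derived explicitly from the operands and the result. One well-known way of doing so for a 32-bit addition is sketched below; the function name `add_with_flags` is hypothetical and the sketch is not intended to represent the specific sequence of LX instructions used.

```python
MASK32 = 0xFFFFFFFF

def add_with_flags(a, b):
    """Return (result, N, Z, C, V) for a 32-bit addition."""
    a &= MASK32
    b &= MASK32
    result = (a + b) & MASK32
    n = result >> 31
    z = 1 if result == 0 else 0
    c = 1 if result < a else 0                      # unsigned wrap-around means carry out
    v = ((a ^ result) & (b ^ result)) >> 31         # both operands differ in sign from the result
    return result, n, z, c, v
```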
Already known to the art are various solutions that aim at enabling a given microprocessor to execute instructions of a set originally designed for a different processor.
For example, the European patent application EP-A-0 747 808 describes a dual-instruction-set processor that is able to interpret both the native code of an IBM PowerPC computer and the code for the Intel x86 family of processors.
The above-mentioned document describes the management of the system of virtual memory necessary for enabling multitasking of two applications developed for different instruction sets, but does not describe a translation process.
To carry out an efficient translation of the x86 instructions, the original structure of the PowerPC is extended with instructions and registers dedicated to the execution in x86 mode.
The issue logic of the core is moreover modified by the addition of units for decoding and translating x86 opcodes. These units work in parallel with the native decoding unit of the PowerPC, and on the basis of the current operating mode the choice is made as to which of the two decodings is to be applied. To enable determination of the operating mode of the processor, there is added a Control Unit Mode, which is responsible for handling switching between the x86 mode and the PowerPC mode.
The above unit is able to interact with the Memory Management Unit to enable a proper management of the system of virtual memory.