In each computing system, there has to exist a precise definition of the instructions the processor can handle, of the format of these instructions, and of the arguments and operands required by these instructions. This definition of a processor's instruction set is usually referred to as the architected code.
There exist a multitude of different codes, and each code can only be handled by the corresponding processor type. Compilers, which translate programs written in a higher-level language into a sequence of basic processor instructions, are only capable of producing code corresponding to one specific computer architecture. It would be desirable to be able to process programs written in different codes on one processor type. Programs written for different computer architectures could then be processed, which implies that the range of programs available for each processor type would increase. Recompiling source code in order to produce suitable code for the different processor types would no longer be necessary.
There often exist different versions of instruction sets for one processor type. Being able to convert instructions from one instruction set to another would thus also allow updated code to be handled quickly.
For a long time, there has been a debate about whether CISC (Complex Instruction Set Computing) or RISC (Reduced Instruction Set Computing) is better suited for high-performance computing. When CISC instructions are processed, many tasks are performed in parallel. CISC instructions are usually long and complex. Because of the inherent parallelism, long processor cycle times are required. The typical pipeline for CISC computing comprises a rather small number of pipeline stages. CISC instructions are hard to bring to a common format.
In contrast, RISC instructions are short and simple and do not start many parallel tasks when they are executed. Each fulfills only one well-defined task, and they can easily be brought to one common format. Execution units for RISC processing comprise pipelines with a large number of pipeline stages. The processor cycle for RISC processing can be very short, which means that instructions can be quickly clocked through the different pipeline stages.
Modern superscalar processing concepts suggest processing the basic instructions out of their sequential order, which means that any parallelism hidden in a sequential program is exploited. A performance gain is achieved by dispatching a multitude of completely independent instructions from different points of the instruction stream to various execution units in the same clock cycle. For a number of reasons, RISC instructions are better suited for out-of-order processing than CISC instructions. Because a typical RISC instruction only defines one task, the instruction itself, together with its source and target operands, can be brought to a simple format. Furthermore, short cycle times are possible. This is especially important for out-of-order processing, since the resolution of data dependencies, including register renaming, is one of the major challenges.
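The role of register renaming in resolving data dependencies can be illustrated with a minimal sketch. The three-operand instruction format, the function name rename_registers, and the physical register names p0, p1, ... are assumptions made for illustration only; they are not taken from any of the cited documents. Every write is given a fresh physical register, so that only true read-after-write dependencies remain between instructions.

```python
from itertools import count


def rename_registers(instructions):
    """Rename registers of (dest, src1, src2) instructions.

    Hypothetical simple format: each instruction names one target
    register and two source registers (architected names).
    """
    fresh = count()   # supplies physical register numbers 0, 1, 2, ...
    mapping = {}      # architected register -> current physical register
    renamed = []
    for dest, src1, src2 in instructions:
        phys_srcs = []
        for src in (src1, src2):
            # A source reads the most recent physical copy of its register;
            # a register never written so far gets a copy on first use.
            if src not in mapping:
                mapping[src] = f"p{next(fresh)}"
            phys_srcs.append(mapping[src])
        # Every write is directed to a fresh physical register.
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((mapping[dest], phys_srcs[0], phys_srcs[1]))
    return renamed


program = [
    ("r1", "r2", "r3"),  # r1 <- r2 op r3
    ("r4", "r1", "r2"),  # reads r1: true dependency on the first instruction
    ("r1", "r5", "r6"),  # writes r1 again, but needs no earlier result
]
for instruction in rename_registers(program):
    print(instruction)
```

In this sketch the third instruction is assigned a target register different from that of the first, so it no longer conflicts with the earlier write to r1 or with the read in the second instruction and could be dispatched independently of both.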
Considering all these arguments, one might conclude that a superscalar processor having a RISC architecture is favorable. On the other hand, CISC is a widespread standard in many fields, and there exists a lot of code for CISC architectures. For this reason, it makes sense to break up external CISC instructions into a number of internal RISC instructions which can then be processed out-of-order by a superscalar RISC processor. Such a processing concept requires a powerful CISC-to-RISC converter.
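As an illustration of breaking up one external CISC instruction into several internal RISC instructions, consider a hypothetical CISC instruction that adds a register to a memory operand. The opcode names LOAD, ADD and STORE, the temporary register t0, and the function crack_add_to_memory are assumptions made for this sketch and do not correspond to any particular instruction set.

```python
from typing import NamedTuple, Optional, Tuple


class RiscOp(NamedTuple):
    opcode: str
    target: Optional[str]      # register written, or None (e.g. for a store)
    sources: Tuple[str, ...]   # registers read


def crack_add_to_memory(addr_reg: str, value_reg: str, temp: str = "t0"):
    """Break up a hypothetical CISC 'ADD [addr_reg], value_reg'.

    The single CISC instruction reads memory, adds, and writes memory back.
    Each resulting RISC operation fulfills exactly one of these tasks.
    """
    return [
        RiscOp("LOAD", temp, (addr_reg,)),          # temp <- memory[addr_reg]
        RiscOp("ADD", temp, (temp, value_reg)),     # temp <- temp + value_reg
        RiscOp("STORE", None, (temp, addr_reg)),    # memory[addr_reg] <- temp
    ]


for op in crack_add_to_memory("r3", "r5"):
    print(op)
```

Each of the three resulting operations has the same simple format of one opcode, at most one target and a short list of sources, which is what makes them suitable for dispatch to separate execution units.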
Several solutions have been proposed for code conversion at run time. In IBM TDB Publication "System/370 Emulator Assist Processor For A Reduced Instruction Set Computer", Vol. 30, No. 10, March 1988, to J. Garcia, E. S. Hannon, R. Kalla, J. A. Mitchell and D. M. Zareski, microcode-controlled conversion of the external S/370 code, which is a CISC code, to the internal RISC instructions is described. First, the "Emulation Assist Processor" loads an external S/370 instruction which is to be translated. Next, microcode from a specialized local microcode control store is executed in order to generate multiple host instructions for said single S/370 instruction.
Compared to a real hardware-controlled translation, any translation that is done by executing microcode routines is slow. For code conversion at run time, the translation from external CISC instructions to internal RISC instructions may take only a few cycles. Microcode-controlled conversion takes many cycles, because the whole microcode routine has to be executed.
Furthermore, the emulation assist processor can only translate one instruction at a time. Parallel conversion of several instructions dispatched at the same time is therefore impossible.
A more refined conversion scheme is described in European Patent Application 651 320 A1, "Superscalar Instruction Decoder", to D. B. Witt and M. D. Goddard. This document describes a decoder which translates CISC instructions to a number of RISC instructions at run time. The RISC instructions emerging from the decode process are forwarded to a RISC superscalar processor. Thus, use of a CISC instruction set can be combined with the advantages of superscalar RISC processing. Depending on the number of RISC-like operations to be generated from one CISC instruction, two translation paths are described: in case the number of RISC operations to be generated exceeds three, code translation is performed by executing a microcode routine. For less complex CISC operations, which can be expressed by fewer than four RISC-like operations, a fast conversion path is implemented. Register identifiers of the CISC instruction are routed by means of programmable array logic (PAL) or combinatorial logic to the corresponding RISC-like operations.
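The choice between the two translation paths can be sketched in software as follows. This is only a simplified model of the selection rule stated above; the template table, the mnemonics, the placeholder names R1, R2 and T0, and the function decode are hypothetical and are not taken from the cited application, which implements the fast path in hardware logic.

```python
FAST_PATH_LIMIT = 3  # more than three RISC-like operations -> microcode path

# Hypothetical templates: CISC mnemonic -> RISC-like operations.
# "R1" and "R2" are placeholders for register identifiers of the CISC
# instruction; "T0" is an assumed internal temporary register.
TEMPLATES = {
    "ADD_RR": [("ADD", "R1", "R1", "R2")],
    "ADD_RM": [("LOAD", "T0", "R2"), ("ADD", "R1", "R1", "T0")],
    "MOVE_BLOCK": [
        ("LOAD", "T0", "R2"),
        ("STORE", "T0", "R1"),
        ("INCR", "R1", "R1"),
        ("INCR", "R2", "R2"),
    ],
}


def decode(mnemonic, regs):
    """Select a translation path for one CISC instruction.

    regs maps the placeholders "R1"/"R2" to the register identifiers
    found in the CISC instruction.
    """
    template = TEMPLATES[mnemonic]
    if len(template) > FAST_PATH_LIMIT:
        # Too complex for the fast path: hand over to a microcode routine.
        return ("microcode", mnemonic)
    # Fast path: route the register identifiers into each operation,
    # leaving the opcode (first field) untouched.
    ops = [(op[0],) + tuple(regs.get(f, f) for f in op[1:]) for op in template]
    return ("fast", ops)


print(decode("ADD_RM", {"R1": "r5", "R2": "r7"}))
print(decode("MOVE_BLOCK", {"R1": "r1", "R2": "r2"}))
```

In this model the length of the template stands in for the number of RISC-like operations to be generated, and the dictionary lookup stands in for the routing of register identifiers that the cited decoder performs in PAL or combinatorial logic.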
In case any architectural changes are made to either the external or the internal code, the decode logic would have to be changed. It would also have to be changed if errors in the conversion path arise. In the case of combinatorial logic, the whole logic will have to be remapped. In the case of programmable array logic, reprogramming the logic is possible in order to implement changes, but it is difficult to implement selective changes. Reprogramming a certain conversion function of the PAL requires reprogramming the whole logic. It would therefore be desirable to be able to change the conversion path in a more selective way.