Instruction storage is an important issue in the design of microprocessors, DSPs and other types of processors. In general, instructions may be stored in internal memory, i.e., on-chip memory, or external memory, i.e., off-chip memory. Since instructions stored on-chip are typically accessed more efficiently than those stored off-chip, it is important to store as large a percentage of the instructions on-chip as possible. This issue is particularly important for embedded processors, in which the on-chip memory space is rather limited. For example, the Lucent Technologies Inc. 1600 family of processors, which are often utilized in embedded applications, include about 65 kilobytes of internal memory. Although more recently developed embedded processors can include more than 1 megabyte of on-chip memory, modern wide-issue processors can require as much as ten times this amount of on-chip memory. It is therefore becoming increasingly important to reduce the instruction storage space requirements for processors, particularly for those processors used in embedded applications.
FIG. 1 shows an example of a conventional digital signal processor (DSP) architecture. A processor 10 includes control logic 12, a global memory 14, internal storage 16 and a datapath 18. The control logic 12 is the "glue" of the processor architecture: it coordinates the operation of the other elements by issuing control signals that regulate their interaction. The global memory 14 is used to store data and programs. The internal storage 16, which has a substantially faster access time than the global memory 14, is used to store data that is to be processed in accordance with a program currently being run by the processor 10. The datapath 18 manipulates this data, processes the results of arithmetic and logical operations, and generally includes well-known elements such as fetch, decode and execution units.
DSP applications executed in the processor 10 will typically include at least two types of instructions: control instructions and DSP inner-loop computational instructions. Control instructions generally issue a single operation and can be encoded in instructions of relatively short word length. DSP computational instructions, in contrast, can typically issue multiple operations from a single instruction or provide parallel issue of multiple instructions. A typical dynamic DSP application includes about 70% DSP computational code and about 30% control code, and dynamic video processing applications can include up to 90% DSP computational code and only 10% control code. Traditionally, processor designers have reduced the amount of instruction storage required in a DSP architecture by providing either variable-length instructions or fixed-width instructions with reduced register access, which partition the register files into multiple shared files. See, for example, K. Kissell, "MIPS16: High-Density MIPS for the Embedded Market," in Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers and Tools for Real-Time Systems, Las Vegas, Nev., Jun. 15, 1997.
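To make the effect of the instruction mix concrete, the following sketch estimates program storage for the 70%/30% and 90%/10% mixes quoted above under a fixed-width encoding and a variable-length encoding. The program size, the 32-bit and 16-bit instruction widths, and the `storage_bytes` helper are hypothetical values and names chosen for illustration. The register-file partitioning approach is not modeled.

```python
# Rough instruction-storage estimate for a given computational/control mix.
# The 70%/30% and 90%/10% mixes come from the text; the program size and the
# 32-bit / 16-bit instruction widths are hypothetical illustration values.

def storage_bytes(n_instructions: int, dsp_percent: int,
                  dsp_bits: int, control_bits: int) -> int:
    """Bytes needed to store a program with the given instruction mix."""
    n_dsp = n_instructions * dsp_percent // 100   # computational instructions
    n_control = n_instructions - n_dsp            # control instructions
    return (n_dsp * dsp_bits + n_control * control_bits) // 8


N = 10_000  # hypothetical program size, in instructions

for label, dsp_percent in [("typical DSP", 70), ("video", 90)]:
    # Fixed width: every instruction uses the wide (assumed 32-bit) format.
    fixed = storage_bytes(N, dsp_percent, dsp_bits=32, control_bits=32)
    # Variable length: control instructions shrink to an assumed 16 bits.
    variable = storage_bytes(N, dsp_percent, dsp_bits=32, control_bits=16)
    saving = 100 * (fixed - variable) // fixed
    print(f"{label}: fixed={fixed} B, variable={variable} B, saving={saving}%")
```

Under these assumed widths, shortening only the control instructions saves about 15% for the typical mix but only about 5% for the video mix, because most of that code consists of wide computational instructions.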
Processors of the type illustrated in FIG. 1 may be, for example, stack-based or register-based. A stack is an internal storage space that is typically partitioned into words of equal size and operates in last-in-first-out (LIFO) order. The first entry placed on the stack is said to be at the bottom of the stack, and each subsequent entry is placed at the top of the stack. In other words, the stack grows from its first entry toward its most recent entry. Entries may be accessed only from the top of the stack. Push and pop operations are generally required to add words to (load) and remove words from (store) the top of the stack. More complex stacks can include special instructions to access words not currently at the top of the stack. It is important to note that in this type of architecture, the operations do not specify the addresses of the operands on which they operate. Thus, a stack-based architecture must rely on an implicit ordering of operands. FIG. 2 illustrates the manner in which an addition operation may be carried out in a stack-based processor, using an implicit ordering of operands. In this example, operands A and B are loaded from memory, and the result of their addition is stored in memory at location C. The advantage of a stack-based architecture is that the stack is a simple, easily implemented structure that does not require an explicit operand address in the instruction format.
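The FIG. 2 sequence can be mimicked with a small interpreter, sketched below under the assumption of a toy instruction set. The `PUSH`, `POP` and `ADD` mnemonics, the dictionary used as memory, and the `run_stack` function are hypothetical. The point the sketch illustrates is that `ADD` carries no operand fields at all: it always consumes the top two stack entries.

```python
# Minimal stack-machine sketch of C = A + B (cf. FIG. 2).
# The PUSH/POP/ADD mnemonics and the dict-based memory are hypothetical.

def run_stack(program, memory):
    """Execute (opcode, *operands) tuples; only PUSH/POP name a memory address."""
    stack = []  # LIFO: the end of the list is the top of the stack
    for op, *args in program:
        if op == "PUSH":        # load a word from memory onto the top
            stack.append(memory[args[0]])
        elif op == "POP":       # store the top word back to memory
            memory[args[0]] = stack.pop()
        elif op == "ADD":       # implicit operands: the top two entries
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory


memory = {"A": 2, "B": 3, "C": 0}
program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
run_stack(program, memory)
print(memory["C"])  # 5
```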
In a register-based architecture, operands are loaded from memory into a register file. Unlike stack-based instructions, such as those illustrated in the example of FIG. 2, register-based instructions must specify an explicit address (a register number) for each operand. In general, operands held in registers are more accessible than those held in a stack, since any register can be addressed directly, whereas stack entries are normally accessed only from the top. FIG. 3 illustrates the manner in which an addition operation may be carried out in a register-based processor. In this example, as in the previous one, operands A and B are loaded from memory, and the result of their addition is stored in memory at location C.
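For comparison, a similarly hypothetical register-machine sketch of the FIG. 3 addition follows. The `LOAD`/`STORE`/`ADD` mnemonics, the four-entry register file and the `run_register` function are assumptions made for illustration. Here `ADD` must name a destination register and two source registers explicitly, which is the encoding cost that the stack machine avoids.

```python
# Minimal register-machine sketch of C = A + B (cf. FIG. 3).
# The LOAD/STORE/ADD mnemonics, the 4-entry register file and the
# dict-based memory are hypothetical; note that every operand is named.

def run_register(program, memory, num_registers=4):
    """Execute (opcode, *operands) tuples; each operand is an explicit register or address."""
    regs = [0] * num_registers
    for op, *args in program:
        if op == "LOAD":            # LOAD rd, addr : memory -> register
            rd, addr = args
            regs[rd] = memory[addr]
        elif op == "STORE":         # STORE rs, addr : register -> memory
            rs, addr = args
            memory[addr] = regs[rs]
        elif op == "ADD":           # ADD rd, rs1, rs2 : three explicit register fields
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory


memory = {"A": 2, "B": 3, "C": 0}
program = [("LOAD", 1, "A"), ("LOAD", 2, "B"), ("ADD", 3, 1, 2), ("STORE", 3, "C")]
run_register(program, memory)
print(memory["C"])  # 5
```

With four registers, each register field needs two bits, so the three-operand `ADD` spends six bits on operand addressing alone in this toy encoding.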
As noted previously, the stack-based architecture may be advantageous for certain applications because its instructions need not encode operand addresses, which reduces storage requirements. On the other hand, registers can hold variables that may need to be accessed multiple times in a concurrent, nonsequential manner. Register-based instructions may therefore decrease the number of external memory accesses and thus the execution time of a program that contains many data accesses. For example, suppose that a compiler for a stack-based processor and a compiler for a register-based processor are each to compile the expression C·(A·B)+C. The compiler for the register-based processor can schedule the individual arithmetic operations in any order, evaluating the efficiency of each ordering with respect to data hazards, operand location, etc. For this particular expression, a stack-based architecture may require two memory accesses of the variable C, while a register-based architecture may hold C in a register and thus potentially avoid multiple memory accesses.
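The memory-access argument for C·(A·B)+C can be checked with two deliberately naive code generators, sketched below under stated assumptions. The stack version pushes every variable occurrence from memory, while the register version, assuming an unlimited supply of registers, keeps each variable in a register after its first load. The names `stack_code`, `register_code` and `loads_of` are hypothetical, and operation reordering by the compiler is not modeled.

```python
# Count memory loads of each variable when compiling C*(A*B) + C two ways.
# Both code generators are hypothetical, deliberately naive simplifications.
import itertools

# Expression tree for C*(A*B) + C; inner tuples are (operator, left, right).
EXPR = ("+", ("*", "C", ("*", "A", "B")), "C")


def stack_code(expr):
    """Post-order code: push operands, then apply the operator to the top two."""
    code = []

    def walk(node):
        if isinstance(node, str):
            code.append(("PUSH", node))      # every occurrence is a memory load
            return
        op, left, right = node
        walk(left)
        walk(right)
        code.append((op,))                   # operands are implicit

    walk(expr)
    return code


def register_code(expr):
    """Three-address code that keeps each variable in a register after its first load."""
    code = []
    home = {}                   # variable -> register already holding it
    fresh = itertools.count(1)  # assumes an unlimited supply of registers

    def walk(node):
        if isinstance(node, str):
            if node not in home:             # load only on first use
                home[node] = f"r{next(fresh)}"
                code.append(("LOAD", home[node], node))
            return home[node]
        op, left, right = node
        rl, rr = walk(left), walk(right)
        rd = f"r{next(fresh)}"
        code.append((op, rd, rl, rr))
        return rd

    walk(expr)
    return code


def loads_of(code, var):
    """Number of instructions that read `var` from memory."""
    return sum(1 for ins in code if ins[0] in ("PUSH", "LOAD") and ins[-1] == var)


stack = stack_code(EXPR)
regs = register_code(EXPR)
print(loads_of(stack, "C"), loads_of(regs, "C"))  # 2 1
```

A stack machine with an instruction that duplicates the top entry could also avoid the second load of C, which is why the comparison is phrased in terms of what each architecture may require.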
A number of techniques have been developed that allow a given processing system to support multiple architectural spaces. One such technique uses a branch-exchange instruction to pass control from one processor to another within the system. The branch-exchange instruction invokes an interrupt on the requesting processor to pass control to the other processor, and control is returned to the requesting processor by a similar mechanism. However, this technique generally does not allow any sharing of dataflow execution units. A related technique that does allow some sharing of execution units has been used in the Delft-Java processor to branch between a Java Virtual Machine view and a RISC-based machine view, as described in greater detail in C. J. Glossner and S. Vassiliadis, "The Delft-Java Engine: An Introduction," Lecture Notes in Computer Science, Springer-Verlag, Third International Euro-Par Conference (Euro-Par '97 Parallel Processing), pp. 766-770, Passau, Germany, Aug. 26-29, 1997, which is incorporated by reference herein. In the Delft-Java processor, a reserved opcode is used as a branch-exchange instruction to pass control back and forth between the two views. Another dual-machine-view technique is implemented in the ARM Thumb processor, as described in ARM 7TDMI Datasheet, Advanced RISC Machines, Ltd., UK, Document No. ARM DDI 0029E, August 1995. In this technique, however, the "Thumb" architecture is merely a subset of the full ARM32 architecture. A similar approach is described in the above-cited MIPS16 reference.
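Switching machine views with a reserved opcode while sharing execution units can be illustrated with a toy interpreter. Everything in it is hypothetical: the `BX` opcode value 0xFF, the two decoder tables, and the accumulator model. It is not the actual Delft-Java or ARM encoding. It is only a sketch in which the same opcode byte selects a different shared execution unit depending on which view is active.

```python
# Toy model of a dual-view processor switched by a reserved opcode.
# All encodings here are hypothetical; both views share one set of units.

BX = 0xFF  # hypothetical reserved opcode acting as branch-exchange

# Shared execution units: both views dispatch to the same ALU functions.
ALU = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}

# Each view maps its own opcode numbers onto the shared units.
DECODERS = {
    "view0": {0x01: "add", 0x02: "sub"},
    "view1": {0x01: "mul", 0x02: "add"},
}


def run(program, acc=0, view="view0"):
    """Execute (opcode, operand) pairs on an accumulator; return (acc, view, trace)."""
    trace = []
    for opcode, operand in program:
        if opcode == BX:  # switch views; decoding continues in the other view
            view = "view1" if view == "view0" else "view0"
            continue
        unit = DECODERS[view][opcode]     # same byte, view-dependent meaning
        acc = ALU[unit](acc, operand)     # shared execution unit
        trace.append((view, unit))
    return acc, view, trace


program = [(0x01, 5), (BX, 0), (0x01, 3), (0x02, 1), (BX, 0), (0x02, 2)]
acc, view, trace = run(program)
print(acc, view)  # 14 view0
```

Opcode 0x01 performs an addition in the first view and a multiplication in the second, yet both views use the single `ALU` table. This is the kind of sharing that the interrupt-based branch-exchange scheme described above does not provide.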
Although the techniques described above can permit a processor to execute multiple architectures, further improvements in code compression and processing efficiency are needed, particularly for embedded processor applications with limited on-chip storage space.