The present application relates to instruction processing in microprocessor architectures, and more particularly to multimedia processing using parallel-processing architectures.
Any microprocessor (or analogous programmable logic) has to translate a stream of instructions into electrical operations in hardware: at the lowest level, the logical bits of the instruction must be translated into appropriate electrical signals sent to physical devices (e.g. transistors, gates, latches, or registers). One common way to implement this is with microcoded instructions, where a large number of bits specify signals to be applied to various lines, within a known hardware structure. Such instructions are necessarily bulky, because nearly all possible outputs are specified in each instruction. Moreover, such instructions become even more cumbersome in multiprocessor implementations.
Various attempts have been made to reduce the bulk of microcoded programs. One suggested approach is known as “vertical” microcode. This approach uses a decoding table to reduce the storage requirements. With this decoding table defined, each microcode instruction can be replaced by a short reference to the instruction itself. (For example, if there are not more than 256 instructions, each can be referred to by an 8-bit name, even if the separate instructions are hundreds of bits in length.) The short “names” of instructions are referred to as vertical microcode, and the actual executable microcode instructions are referred to as “horizontal” microcode. In a variant of this approach, the lookup table is used to encode instruction fields rather than complete instructions. This reduces the memory space needed for the lookup table, but additional logic is then needed to combine the fields appropriately. The vertical microcode approach has been generally abandoned, because it is too slow.
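The decoding-table idea can be illustrated with a minimal sketch. The table contents, the 16-bit control-word width, and the names `HORIZONTAL_TABLE`, `expand_program`, and `storage_bits` are all hypothetical and chosen only for illustration; they do not describe any particular hardware.

```python
# Hypothetical decoding table: index = short "vertical" name,
# value = full "horizontal" control word (16 bits here for brevity;
# real control words may be hundreds of bits wide).
HORIZONTAL_TABLE = [
    0b0000_0000_0000_0001,
    0b0000_0000_0001_0010,
    0b1000_0000_0000_0100,
]


def expand_program(vertical_program):
    """Translate a sequence of short names into full control words."""
    return [HORIZONTAL_TABLE[name] for name in vertical_program]


def storage_bits(program_length, distinct, horizontal_width):
    """Return (vertical_bits, horizontal_bits) needed to store a program.

    The vertical figure includes the decoding table itself.
    """
    # Bits needed to name `distinct` different instructions.
    name_width = max(1, (distinct - 1).bit_length())
    vertical = program_length * name_width + distinct * horizontal_width
    horizontal = program_length * horizontal_width
    return vertical, horizontal


if __name__ == "__main__":
    print(expand_program([0, 2, 1, 0]))
    print(storage_bits(10_000, 256, 200))
```

Under these assumed figures (10,000 instructions drawn from 256 distinct 200-bit control words), the vertical form needs far less storage, at the price of an extra table lookup for every instruction executed.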
A processor will usually have only a limited amount of writable control storage (“WCS”) available. When there are too many routines to fit in WCS at once, some form of overlaying is necessary. A serial loop can be used to load microcode at startup, but using a serial loop to load overlays is not practical, since the host can load instructions only slowly (e.g. 100 microseconds to 3 ms per instruction, depending on disk accesses). Some array processors provide microcode overlaying facilities, but these are normally host driven (using polled I/O or DMA), and are implemented via the normal microcode load mechanism.
One way to cope with parallel-processing hardware is to use instruction-level parallelism. A notable example of this is Very Long Instruction Word (“VLIW”) architectures. In such architectures a single instruction can contain separate fields for separate paralleled portions of hardware, e.g. for separate paralleled ALUs, or even for alternative logical conditions.
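As a rough illustration of instruction-level parallelism, the sketch below models one long instruction word as a list of slots, one per parallel ALU, all issued against the same register state. The slot layout `(op, dst, src_a, src_b)`, the register names, and the function `issue` are assumptions made for this example and are not taken from any specific VLIW machine.

```python
import operator

# Hypothetical operations available to each ALU slot.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def issue(word, regs):
    """Execute every slot of one long instruction word.

    All slots read the register values as they were before the word
    issued, which models the slots executing in parallel.
    """
    snapshot = dict(regs)
    for op, dst, src_a, src_b in word:
        regs[dst] = OPS[op](snapshot[src_a], snapshot[src_b])
    return regs


if __name__ == "__main__":
    registers = {"r0": 2, "r1": 3, "r2": 0, "r3": 0}
    # One instruction word carrying separate fields for two ALUs.
    long_word = [("add", "r2", "r0", "r1"), ("mul", "r3", "r0", "r1")]
    print(issue(long_word, registers))
```

Every slot must be present in every word, even when a unit has nothing useful to do, which is one reason such instruction sets tend to be bulky.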
Processing Architectures With Typed Instruction Sets
The present application describes a new architecture, for microprocessors and the like, in which a new layer of indirection is added: the instruction sequence includes type identifiers which define how the individual instructions are to be translated. (Preferably but not necessarily, the type identifier points into a set of interpretation registers, and the selected register includes insertions which are combined with the opcode of the original instruction to produce an expanded executable instruction.)
This architecture overcomes many of the disadvantages of traditional Very Long Instruction Word (VLIW) architectures and, in various embodiments, provides one or more of at least the following advantages:
The instruction set can be expanded while maintaining backward compatibility with existing programs;
Program code density is much higher than with traditional VLIW instruction sets; and
Algorithms can be coded independently of the type of data to be processed.
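The type-identifier indirection described above can be sketched in a few lines. This is only an illustrative model under stated assumptions: the 8-bit opcode width, the register contents, the choice to place the inserted bits above the opcode, and the name `expand_instruction` are all hypothetical, since the text does not fix any of them.

```python
OPCODE_BITS = 8  # assumed opcode width, for illustration only


def expand_instruction(type_id, opcode, interpretation_registers):
    """Combine a compact instruction with its selected interpretation register.

    The type identifier selects a register; the bits held there are
    inserted above the opcode to form the expanded executable instruction.
    """
    insertion = interpretation_registers[type_id]
    return (insertion << OPCODE_BITS) | opcode


if __name__ == "__main__":
    # Hypothetical register contents, e.g. one entry per data type.
    registers = [0x01, 0x02]
    print(hex(expand_instruction(0, 0x3A, registers)))
    print(hex(expand_instruction(1, 0x3A, registers)))

    # Reloading a register changes how the same compact code expands.
    registers[0] = 0x07
    print(hex(expand_instruction(0, 0x3A, registers)))
```

In this model the stored program holds only the short type identifier and opcode, and the same opcode sequence yields different expanded instructions when the interpretation registers are reloaded, which is one way to picture coding an algorithm independently of the data type.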