1. Field of the Invention
The present invention relates to a processor architecture, and more particularly to a processor architecture for handling variable bit width data.
2. Description of the Related Art
Processors generally process a single instruction of an instruction set in several steps. Early technology processors performed these steps serially. Advances in technology have led to pipelined-architecture processors, also called scalar processors, which perform different steps of many instructions concurrently. A "superscalar" processor is implemented using a pipelined structure, but improves performance by concurrently executing scalar instructions.
In a superscalar processor, instruction conflicts and dependency conditions arise in which an issued instruction cannot be executed because data or resources are not available. For example, an issued instruction cannot execute when its operands are dependent upon data calculated by other nonexecuted instructions.
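The dependency condition described above can be illustrated with a minimal sketch. The `Instr` record, the `can_execute` helper, and the register names `r1` through `r5` are hypothetical and chosen only for illustration; they do not correspond to any particular processor.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination, several sources."""
    name: str
    dest: str
    srcs: Tuple[str, ...]


def can_execute(instr, pending_dests):
    """Return True when no source operand is still awaiting a result
    from an issued but not yet executed instruction."""
    return not any(src in pending_dests for src in instr.srcs)


# Hypothetical two-instruction sequence: i2 reads the register i1 writes.
i1 = Instr("add", "r1", ("r2", "r3"))
i2 = Instr("sub", "r4", ("r1", "r5"))

pending = {i1.dest}                      # i1 has issued but not executed
blocked = not can_execute(i2, pending)   # i2 must wait for r1

pending.discard(i1.dest)                 # i1 completes and produces r1
ready = can_execute(i2, pending)         # i2 may now execute
```

In this sketch the second instruction is held back only while the result it reads is outstanding, which is the situation that forces a buffer between decode and execution.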
Superscalar processor performance is improved by the speculative execution of branching instructions and by continuing to decode instructions regardless of the ability to execute instructions immediately. Decoupling of instruction decoding and instruction execution requires a buffer between the processor's instruction decoder and functional units that execute instructions.
Performance of a superscalar processor is also improved when multiple concurrently-executing instructions are allowed to access a common register. However, this inherently creates a resource conflict. One technique for resolving register conflicts is called "register renaming". Multiple temporary renaming registers are dynamically allocated, one for each instruction that sets a value for a permanent register. In this manner, one permanent register may serve as a destination for receiving the results of multiple instructions. These results are temporarily held in the multiple allocated temporary renaming registers. The processor keeps track of the renaming registers so that an instruction that receives data from a renaming register accesses the appropriate register. This register renaming function may be implemented using a reorder buffer which contains temporary renaming registers.
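Register renaming as described above may be sketched as follows. This is a simplified illustration under stated assumptions, not a description of any actual reorder buffer: the `RenameTable` class, its method names, and the integer tags are hypothetical.

```python
class RenameTable:
    """Hypothetical rename table: maps each permanent register to the
    most recently allocated temporary renaming register."""

    def __init__(self):
        self.latest = {}     # permanent register -> newest tag
        self.values = {}     # tag -> result, or None while pending
        self.next_tag = 0

    def allocate(self, dest):
        """Allocate a fresh renaming register for an instruction that
        writes permanent register `dest`, and return its tag."""
        tag = self.next_tag
        self.next_tag += 1
        self.values[tag] = None
        self.latest[dest] = tag
        return tag

    def source(self, reg):
        """Return the tag a reader of `reg` should use, or None when no
        renaming register is outstanding for `reg`."""
        return self.latest.get(reg)

    def write_result(self, tag, value):
        """Record the result produced for renaming register `tag`."""
        self.values[tag] = value


table = RenameTable()
t0 = table.allocate("r1")            # first writer of r1
src_for_reader = table.source("r1")  # reader decoded between the writers
t1 = table.allocate("r1")            # second writer gets a distinct register
table.write_result(t1, 20)           # the later writer may finish first
table.write_result(t0, 10)
```

Because each writer of `r1` receives its own temporary register, the reader decoded between the two writers still obtains the first writer's result even though the second writer completed earlier.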
Many existing processors run a large base of computer programs but are limited in performance. To improve instruction throughput in such processors, it may be desirable to incorporate superscalar capabilities therein. W. M. Johnson in Superscalar Processor Design, Englewood Cliffs, N.J., Prentice Hall, 1991, pp. 261-272, discusses such a superscalar implementation.
For example, a family of processors, called the x86 family, has been developed, including the 8086, 80286, 80386, 80486 and Pentium™ (Intel Corporation, Santa Clara, Calif.) processors. Advantageously, x86 processors are backward compatible. The newest processors run the same programs as older processors. x86 processors are considered to employ a complex-instruction-set-computer (CISC) architecture, in which many different densely-coded instructions are implemented.
A variety of techniques have been used in the x86 family to implement backward compatibility. These techniques have made the implementation of register renaming very difficult. For example, the x86 instruction set uses registers for which at least a subset of bits overlap the bits of another register, such as word registers that overlap doubleword registers and byte registers that overlap word and doubleword registers. As x86 processors evolved from 8 to 16-bit and then to 32-bit processors, the register architecture similarly evolved into a form in which 8-bit general registers AH and AL, respectively, comprise the high and low bytes of a 16-bit general register AX. AX, in turn, includes the low order 16 bits of a 32-bit extended general register EAX. B, C and D registers are similarly constrained. These registers are supplemented by additional register pairs: ESI:SI, EDI:DI, ESP:SP and EBP:BP, having low order bits of the 32-bit extended (E) doubleword registers overlapped by 16-bit word registers. In addition, x86 processors have an extensive and complicated instruction set that introduces additional complexity so that some instruction opcode fields that specify overlapping registers for some data widths also specify nonoverlapping registers of other data widths.
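The overlap among EAX, AX, AH and AL described above can be modeled with a short sketch. The `ARegister` class and its method names are hypothetical; only the bit positions (AL in bits 0-7, AH in bits 8-15, AX in bits 0-15 of EAX) are taken from the description.

```python
class ARegister:
    """Hypothetical model of EAX with overlapping views AX, AH and AL."""

    def __init__(self, value=0):
        self.eax = value & 0xFFFFFFFF

    @property
    def ax(self):
        return self.eax & 0xFFFF          # low order 16 bits of EAX

    @property
    def ah(self):
        return (self.eax >> 8) & 0xFF     # high byte of AX

    @property
    def al(self):
        return self.eax & 0xFF            # low byte of AX

    def write_ah(self, value):
        self.eax = (self.eax & 0xFFFF00FF) | ((value & 0xFF) << 8)

    def write_al(self, value):
        self.eax = (self.eax & 0xFFFFFF00) | (value & 0xFF)

    def write_ax(self, value):
        self.eax = (self.eax & 0xFFFF0000) | (value & 0xFFFF)


a = ARegister(0x11223344)
a.write_ah(0xAB)             # changes AX and EAX, leaves AL untouched
after_ah = (a.eax, a.ax, a.al)
a.write_al(0xCD)             # changes AX and EAX, leaves AH untouched
after_al = (a.eax, a.ax, a.ah)
a.write_ax(0x0000)           # clears AH and AL, preserves the upper 16 bits
after_ax = (a.eax, a.ah, a.al)
```

A write through any narrow view alters the value seen through every wider view, while bits outside the written field are preserved. This is why a write to one of these registers cannot simply be treated as a write to an independent register.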
If registers cannot be renamed, register access conflicts are resolved only by having one instruction cede control to another, delaying the dispatch of an instruction until the instruction is free of dependencies and causing stalling of the parallel dispatching of instructions in the processor pipeline. This causes serial operation of instructions that are intended to be executed in parallel.
Because the x86 architecture includes a small number of registers (eight), frequent register reuse is encouraged for a superscalar processor that is intended to execute instructions in parallel. It is thus desirable to allow register reuse, perhaps by employing register renaming. Unfortunately, the overlapping nature of x86 instructions limits the renaming of overlapped registers for resolving mutual data dependencies. Register renaming is impeded because, although the overlap relationship of registers is known and invariable and thus predictable, architectural and code-compatibility constraints require that the registers be considered independent entities. Thus, although register renaming could resolve register resource conflicts in an x86 processor, the x86 architecture substantially limits register renaming.
It is fundamental to achieving a performance improvement using parallel processing that instructions be dispatched regularly and rapidly. When dispatching of instructions is stalled awaiting execution of another instruction, the processor performs only as well as a serial processor.