1. Field of the Invention
This invention is related to the field of superscalar microprocessors and, more particularly, to the storing of control bit vectors representing instructions prior to the execution of these instructions.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by executing multiple instructions during a clock cycle and by specifying the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively.
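The edge-triggered behavior described above can be sketched behaviorally. This is a minimal model, assuming a storage device that captures on the rising edge only; the class name `EdgeTriggeredRegister` and its `tick` method are hypothetical and not part of the original description.

```python
class EdgeTriggeredRegister:
    """Hypothetical storage device that captures its input on a rising clock
    edge and holds that value until the next rising edge."""

    def __init__(self, initial=0):
        self.value = initial
        self._prev_clk = 0

    def tick(self, clk, d):
        # Capture d only on a 0 -> 1 transition of the clock signal.
        if self._prev_clk == 0 and clk == 1:
            self.value = d
        self._prev_clk = clk
        return self.value


reg = EdgeTriggeredRegister()
# Each pair is (clock level, data input) sampled at successive points in time.
samples = [(0, 5), (1, 5), (1, 9), (0, 9), (1, 7), (0, 3)]
outputs = [reg.tick(clk, d) for clk, d in samples]
print(outputs)  # [0, 5, 5, 5, 7, 7]
```

The data input changes to 9 while the clock is high and to 3 after the final edge, yet the stored value changes only at the two rising edges.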
Many superscalar microprocessor manufacturers design microprocessors in accordance with the x86 microprocessor architecture. Due to its wide acceptance in the computer industry, the x86 microprocessor architecture is employed by a large body of software. Microprocessors designed in accordance with the x86 microprocessor architecture advantageously enjoy compatibility with this large body of software. Computer systems manufacturers may be more inclined to choose an x86-compatible microprocessor than a non-x86-compatible microprocessor for computer systems. Information regarding the x86 microprocessor architecture may be found in the publication entitled: "PC Magazine Programmer's Technical Reference: The Processor and the CoProcessor" by Hummel, Ziff-Davis Press, Emeryville, Calif. 1992. This publication is incorporated herein by reference in its entirety.
The x86 microprocessor architecture is characterized by a plurality of complex variable byte length instructions. A particular instruction may comprise a single byte, or multiple bytes. Additionally, instructions may specify one or more instruction operations. As used herein, an "instruction operation" refers to an operation which may be executed by a functional unit to produce a result. Exemplary instruction operations may include an arithmetic operation, a logical operation, an address generation, etc. Instructions may have explicit instruction operations (i.e. operations defined by the opcode of the instruction) and implicit operations defined by the particular instruction coded by the programmer (e.g. an address generation and a load or store operation for an operand stored in memory).
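The split between explicit and implicit instruction operations can be illustrated with a toy decomposition. In this sketch, which is an assumption and not a description of any real x86 decoder, a register form such as `ADD EAX, EBX` yields only the explicit ALU operation, while a memory-destination form such as `ADD [EBX], EAX` adds an implicit address generation, load, and store. The names `InstructionOperation`, `decompose`, and the kind labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionOperation:
    kind: str    # hypothetical labels: "agen", "load", "alu", or "store"
    detail: str


def decompose(opcode, memory_destination):
    """Split a two-operand instruction into instruction operations.

    The explicit operation comes from the opcode; a memory destination adds
    the implicit address generation, load, and store operations.
    """
    ops = []
    if memory_destination:
        ops.append(InstructionOperation("agen", "compute the operand address"))
        ops.append(InstructionOperation("load", "read the memory operand"))
    ops.append(InstructionOperation("alu", opcode))  # explicit operation
    if memory_destination:
        ops.append(InstructionOperation("store", "write the result to memory"))
    return ops


print([op.kind for op in decompose("add", memory_destination=False)])
# ['alu']
print([op.kind for op in decompose("add", memory_destination=True)])
# ['agen', 'load', 'alu', 'store']
```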
Unfortunately, complex instructions employing more than one instruction operation are often difficult to execute in superscalar microprocessors. Superscalar microprocessors typically employ multiple functional units to perform concurrent execution of instructions. Therefore, it is desirable that the functional units be relatively simple, such that each functional unit occupies a small amount of silicon area. For example, a functional unit may be configured to execute one instruction operation during a clock cycle. Complex instructions may utilize multiple clock cycles within such functional units. The complex instruction must be interpreted differently by the functional unit during each clock cycle to perform the instruction operations specified by that instruction. Complex logic may be employed to correctly execute these complex instructions, deleteriously enlarging the size of the functional unit. A less costly solution (in terms of complexity and silicon area) to the execution of complex instructions employing multiple instruction operations is desired.
The multiple functional units employed by a superscalar microprocessor may be equipped with reservation stations to store instructions and operands prior to their execution by the respective functional unit. Reservation stations are useful in a superscalar microprocessor because instructions may be decoded and dispatched prior to the source operands of the instruction being available. An "operand" or "operand value" of an instruction is a value the instruction is intended to operate upon. Operands may be located by an "operand address" which may define a register or a memory location storing the operand. Operands may be register operands in which the value is stored in a register, memory operands in which the value is stored in a memory location, or immediate operands in which the value is stored within the instruction itself. A source operand value is a value upon which the instruction operates, and a destination operand is a location to store the result of executing the instruction. A result is a value generated by operating upon the source operands according to the instruction operation(s) defined by the instruction.
Generally speaking, a reservation station comprises one or more storage locations (referred to as "reservation station entries"). Each reservation station entry may store a decoded instruction and operands or operand values. Other useful information may also be stored in a reservation station entry.
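A reservation station entry that holds a decoded instruction together with operands that may not yet be available can be sketched as follows. The tag-matching scheme used here to fill in an outstanding operand is an assumption (a common technique, not one stated in this description), and all names (`Operand`, `ReservationStationEntry`, `capture`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Operand:
    value: Optional[int] = None  # operand value, once it is available
    tag: Optional[int] = None    # hypothetical tag naming the producing instruction

    @property
    def ready(self) -> bool:
        return self.value is not None


@dataclass
class ReservationStationEntry:
    decoded_instruction: str
    operands: List[Operand] = field(default_factory=list)

    @property
    def ready(self) -> bool:
        # The entry may be selected for execution once every operand has a value.
        return all(op.ready for op in self.operands)

    def capture(self, tag: int, value: int) -> None:
        # Record a result produced elsewhere if this entry was waiting on it.
        for op in self.operands:
            if not op.ready and op.tag == tag:
                op.value = value
                op.tag = None


entry = ReservationStationEntry("add", [Operand(value=3), Operand(tag=7)])
print(entry.ready)  # False: the second operand is still outstanding
entry.capture(7, 10)
print(entry.ready)  # True
```

Dispatching `entry` before its second operand exists is exactly what the reservation station permits; the decoded instruction simply waits in the entry until `capture` supplies the missing value.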
Typically, a decoded instruction is transferred to a storage device within a functional unit when the operand values have been provided. The decoded instruction is then decomposed into a plurality of control bits. The control bits are conveyed to the dataflow elements within the functional unit, and cause the dataflow elements to perform the instruction operation. A "dataflow element" is a device which performs a particular manipulation upon an input operand or operands according to a set of control bits conveyed thereto. For example, a multiplexor is a dataflow element which selects one of multiple input operands. The control bits provided to the multiplexor indicate which of the multiple input operands should be selected. As another example, an arithmetic unit is a dataflow element which may add or subtract input operands dependent upon the state of its input control bits.
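The two dataflow elements named above, a multiplexor and an add/subtract arithmetic unit, can be modeled as plain functions steered by control bits. This is a behavioral sketch only; the field names `sel_a`, `sel_b`, and `subtract` and the 32-bit width are assumptions chosen for illustration.

```python
def mux(inputs, select):
    """Multiplexor: the control bits `select` pick one of the input operands."""
    return inputs[select]


def add_sub(a, b, subtract, width=32):
    """Arithmetic unit: one control bit chooses add (0) or subtract (1).
    The result is truncated to `width` bits, as a fixed-width datapath would."""
    result = a - b if subtract else a + b
    return result & ((1 << width) - 1)


def datapath(operands, control):
    # Hypothetical control-bit fields: sel_a and sel_b steer the two
    # multiplexors, and subtract steers the arithmetic unit.
    a = mux(operands, control["sel_a"])
    b = mux(operands, control["sel_b"])
    return add_sub(a, b, control["subtract"])


operands = [10, 4, 7]
print(datapath(operands, {"sel_a": 0, "sel_b": 2, "subtract": 1}))  # 3
print(datapath(operands, {"sel_a": 1, "sel_b": 2, "subtract": 0}))  # 11
```

The same hardware performs different operations purely because the control bits differ; nothing in `datapath` knows which instruction produced them.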
Unfortunately, decomposing a decoded instruction into control bits and then performing the instruction operation defined by the instruction during a clock cycle may limit the frequency (i.e. the inverse of the clock cycle) of a superscalar microprocessor. It would be desirable to perform an equivalent function to the reservation station/functional unit pair wherein an instruction is selected from the reservation station during a clock cycle and the result is produced in a subsequent clock cycle without limiting the frequency of the superscalar microprocessor.
The problems outlined above are in large part solved by a control bit vector storage according to the present invention. The present control bit vector storage (preferably included within a functional unit) stores control bits indicative of a particular instruction. The control bits are divided into multiple control vectors, each vector indicative of one instruction operation. The control bits control dataflow elements within the functional unit to cause the instruction operation to be performed. Advantageously, logic for determining the control bits for the data flow of a functional unit is removed from the functional unit. The clock cycle time characterizing the functional unit may be advantageously reduced by the amount of time previously used by the logic to generate the control bits.
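The frequency argument can be written out as a worked relation. Assume, for illustration only, that the critical path of the cycle in which an instruction is selected consists of generating the control bits (delay $t_{\text{ctl}}$) followed by performing the instruction operation (delay $t_{\text{op}}$), and that no other path is longer; both symbols are introduced here and are not taken from the original text. Storing precomputed control bits removes $t_{\text{ctl}}$ from that path:

$$
f = \frac{1}{T_{\text{cycle}}}, \qquad
T_{\text{before}} = t_{\text{ctl}} + t_{\text{op}}, \qquad
T_{\text{after}} = t_{\text{op}}, \qquad
\frac{f_{\text{after}}}{f_{\text{before}}} = \frac{t_{\text{ctl}} + t_{\text{op}}}{t_{\text{op}}}
$$

With hypothetical delays $t_{\text{ctl}} = 0.5\,\text{ns}$ and $t_{\text{op}} = 1.5\,\text{ns}$, the frequency would rise from $1/(2.0\,\text{ns}) = 500\,\text{MHz}$ to $1/(1.5\,\text{ns}) \approx 667\,\text{MHz}$.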
Additionally, the present control bit vector storage allows complex instructions (or instructions which produce multiple results) to be divided into simpler operations. The hardware included within the functional unit may be reduced to that employed to perform the simpler operations. Advantageously, the amount of silicon area occupied by the functional unit may be reduced. Superscalar microprocessors which employ multiple functional units may particularly benefit from this utility.
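Dividing an instruction into control vectors, one per instruction operation, can be sketched with a hypothetical bit-packed vector format. The bit layout, the operation (A + B) - C, and the names `encode` and `execute` are all assumptions for illustration. The point the sketch shows is that the functional unit only extracts fixed bit fields and contains a single adder/subtractor, yet completes a two-operation instruction by applying one stored vector per clock cycle.

```python
# Hypothetical bit layout of one control vector:
#   bits 0-1: select for input A (0 = operand A, 1 = operand B,
#             2 = operand C, 3 = previous result)
#   bits 2-3: select for input B (same encoding)
#   bit  4  : 1 = subtract, 0 = add
SEL_A_SHIFT, SEL_B_SHIFT, SUB_SHIFT = 0, 2, 4


def encode(sel_a, sel_b, subtract):
    """Pack the fields of one control vector into an integer."""
    return (sel_a << SEL_A_SHIFT) | (sel_b << SEL_B_SHIFT) | (subtract << SUB_SHIFT)


def execute(vectors, operands, width=32):
    """Apply one control vector per clock cycle using one adder/subtractor.

    `operands` must hold exactly three values (A, B, C). The functional unit
    only extracts fixed bit fields; it never decodes the original instruction.
    """
    mask = (1 << width) - 1
    previous = 0
    for vector in vectors:
        sources = list(operands) + [previous]  # index 3 feeds back the last result
        a = sources[(vector >> SEL_A_SHIFT) & 0b11]
        b = sources[(vector >> SEL_B_SHIFT) & 0b11]
        subtract = (vector >> SUB_SHIFT) & 1
        previous = (a - b if subtract else a + b) & mask
    return previous


# A hypothetical complex operation, (A + B) - C, stored as two simple vectors.
vectors = [encode(0, 1, 0), encode(3, 2, 1)]
print(execute(vectors, [10, 4, 7]))  # 7
```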
In one embodiment, the control bit vector storage comprises a plurality of vector storages. Each vector storage comprises a pair of individual vector storages and a shared vector storage. The shared vector storage stores control bits common to both control vectors. Advantageously, control bits which are the same for both control vectors are not redundantly stored in each of the individual vector storages.
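The shared vector storage can be sketched as a record that keeps the common bits once and rebuilds either full control vector on demand. Placing the shared bits in the upper positions of the full vector is an assumption, as are the names `VectorStorage` and `storage_bits`; the 40-bit vector with 12 shared bits in the usage line is a hypothetical figure, not one from this description.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VectorStorage:
    shared: int                  # bits common to both control vectors, stored once
    individual: Tuple[int, int]  # bits that differ between the two control vectors
    individual_width: int        # number of bits in each individual field

    def control_vector(self, index: int) -> int:
        # Hypothetical layout: shared bits occupy the upper positions.
        return (self.shared << self.individual_width) | self.individual[index]


def storage_bits(vector_width: int, shared_width: int) -> Tuple[int, int]:
    """Bits for one pair of control vectors without and with a shared storage."""
    separate = 2 * vector_width
    with_sharing = 2 * (vector_width - shared_width) + shared_width
    return separate, with_sharing


pair = VectorStorage(shared=0b101, individual=(0b0011, 0b1100), individual_width=4)
print(bin(pair.control_vector(0)), bin(pair.control_vector(1)))  # 0b1010011 0b1011100
print(storage_bits(40, 12))  # (80, 68)
```

In general, sharing saves exactly the number of shared bits per vector storage, since those bits would otherwise appear in both individual vector storages.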
Broadly speaking, the present invention contemplates a control bit vector storage for a microprocessor, comprising a plurality of vector storage locations, a first selection device, and a control unit. Each of the plurality of vector storage locations is configured to store a plurality of control bits indicative of at least one instruction operation associated with an instruction. Coupled to the plurality of vector storage locations, the first selection device is configured to select the plurality of control bits stored in one of the plurality of vector storage locations. The first selection device is further configured to convey the plurality of control bits to a plurality of functional blocks configured to receive the plurality of control bits. Each of the plurality of control bits controls a particular portion of the plurality of functional blocks. Coupled to the first selection device, the control unit is configured to control the first selection device.
The present invention further contemplates a functional unit comprising a plurality of functional blocks, a plurality of vector storage locations, and a multiplexor. The plurality of functional blocks are configured to perform instruction operations. Each of the plurality of vector storage locations is configured to store a plurality of control bits indicative of at least one instruction operation. Coupled between the plurality of functional blocks and the plurality of vector storage locations, the multiplexor is configured to select an instruction operation from the plurality of vector storage locations for conveyance to the plurality of functional blocks.
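The arrangement of vector storage locations, a multiplexor, and a controlling unit can be modeled behaviorally. This sketch assumes that each location holds the ordered control vectors for one instruction, that a location is freed after its last instruction operation is conveyed, and that the control unit picks the lowest-numbered occupied location; that selection policy and the names `ControlBitVectorStorage`, `allocate`, `select`, and `control_unit` are hypothetical, since the description does not specify them.

```python
from typing import List, Optional


class ControlBitVectorStorage:
    """Vector storage locations plus the multiplexor that reads them."""

    def __init__(self, num_locations: int):
        self.locations: List[Optional[List[int]]] = [None] * num_locations
        self.next_op = [0] * num_locations

    def allocate(self, location: int, vectors: List[int]) -> None:
        # Store the control vectors for one instruction, one per instruction operation.
        self.locations[location] = list(vectors)
        self.next_op[location] = 0

    def select(self, location: int) -> int:
        # Multiplexor: convey the next control vector of the chosen location.
        vectors = self.locations[location]
        vector = vectors[self.next_op[location]]
        self.next_op[location] += 1
        if self.next_op[location] == len(vectors):
            self.locations[location] = None  # free the location after its last operation
        return vector


def control_unit(storage: ControlBitVectorStorage) -> Optional[int]:
    # Hypothetical policy: select the lowest-numbered occupied location.
    for index, vectors in enumerate(storage.locations):
        if vectors is not None:
            return index
    return None


storage = ControlBitVectorStorage(4)
storage.allocate(1, [0xA, 0xB])  # an instruction with two instruction operations
storage.allocate(3, [0xC])       # an instruction with one instruction operation
issued = []
while (location := control_unit(storage)) is not None:
    issued.append(storage.select(location))  # one selection per clock cycle
print(issued)  # [10, 11, 12]
```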