1. Technical Field
The present invention relates in general to data processing systems and in particular to a processor in a data processing system. More particularly, the present invention relates to scoreboarded special purpose registers on board the processor.
2. Description of the Related Art
Reduced instruction set computer ("RISC") processors are employed in many data processing systems and are generally characterized by high throughput of instructions. RISC processors usually operate at a high clock frequency and, because of the minimal instruction set, do so very efficiently. In addition to high clock speed, processor efficiency is improved even more by the inclusion of multiple execution units, allowing the execution of two, and sometimes more, instructions per clock cycle.
As used herein, "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the processor. Storage devices (e.g., registers and arrays) capture their values on a rising or falling edge of the clock signal that defines the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively.
Processors with the ability to execute multiple instructions per clock cycle are described as "superscalar." Superscalar processors, such as the PowerPC™ family of processors available from IBM Corporation of Armonk, N.Y., provide simultaneous dispatch of multiple instructions. Included in the processor are an Instruction Cache (IC), an Instruction Dispatch Unit (IDU), an Execution Unit (EU), an Instruction Sequencer Unit (ISU) and a Completion Unit (CU). Generally, a superscalar RISC processor is "pipelined," meaning that a second instruction is waiting to enter the execution unit as soon as the previous instruction is finished.
Generally, a pipeline comprises a plurality of pipeline stages. Each pipeline stage is configured to perform an operation assigned to that stage upon a value while other pipeline stages independently operate upon other values. When a value exits the pipeline, the function employed as the sum of the operations of each pipeline stage is complete. In a pipelined superscalar processor, instruction processing is usually accomplished in six stages: fetch, decode, dispatch, execute, writeback and completion.
The fetch stage is primarily responsible for fetching instructions from the instruction cache and determining the address of the next instruction to be fetched. The decode stage generally handles all time-critical instruction decoding for instructions in the instruction buffer. The dispatch stage is responsible for non-time-critical decoding of instructions supplied by the decode stage and for determining which of the instructions can be dispatched in the current cycle. A typical RISC instruction set (for PowerPC™) contains three broad categories of instructions: branch instructions (including specific branching instructions, system calls and Condition Register logical instructions); fixed point instructions; and floating point instructions. Each group is executed by an appropriate function unit.
The execute stage executes the instruction selected in the dispatch stage, which may come from the reservation stations or from instructions arriving from dispatch. The completion stage maintains the correct architectural machine state by considering instructions residing in the completion buffer and utilizes information about the status of instructions provided by the execute stage. The writeback stage is used to write back any information from the rename buffers that is not written back by the completion stage.
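By way of illustration only, the overlap of instructions across the six stages named above may be sketched as follows. The sketch assumes an idealized pipeline in which one instruction enters per clock cycle, every stage takes exactly one cycle, and no stall ever occurs; the function name `pipeline_trace` and the instruction labels are hypothetical and are not taken from any actual processor design.

```python
# Idealized six-stage pipeline occupancy (illustrative sketch only).
STAGES = ["fetch", "decode", "dispatch", "execute", "writeback", "completion"]


def pipeline_trace(instructions):
    """Return, for each clock cycle, a dict mapping stage name -> instruction.

    Assumptions: one instruction enters per cycle, each stage takes one
    cycle, and nothing stalls.
    """
    total_cycles = len(instructions) + len(STAGES) - 1
    trace = []
    for cycle in range(total_cycles):
        occupancy = {}
        for stage_index, stage in enumerate(STAGES):
            # The instruction in stage k at this cycle entered k cycles ago.
            instr_index = cycle - stage_index
            if 0 <= instr_index < len(instructions):
                occupancy[stage] = instructions[instr_index]
        trace.append(occupancy)
    return trace


if __name__ == "__main__":
    for cycle, occupancy in enumerate(pipeline_trace(["i0", "i1", "i2"])):
        print(cycle, occupancy)
```

In this idealized model, each stage operates on a different instruction during the same cycle, which is the overlap the description above refers to.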
All pipelined instructions pass through an issue stage sequentially but then enter different pipeline stages, so instructions may be stalled or may execute out of order. Utilizing scoreboard controls is a technique for resolving register access conflicts in a pipelined computer. Each potential dependency is recorded as a single bit: one bit is set when a register source operand is decoded, and another bit is set when a register destination operand is decoded. The use of a register for fetching an operand is stalled if that register is indicated as the destination for a decoded but not yet executed instruction.
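The stall rule described above may be sketched, under simplifying assumptions, as a table holding one pending-write bit per register. The sketch models only the destination bit (the source-operand bit mentioned above is omitted), and the class name `Scoreboard` and its method names are hypothetical.

```python
class Scoreboard:
    """One pending-write bit per register (illustrative sketch only)."""

    def __init__(self, register_names):
        # False means no decoded-but-unexecuted instruction targets the register.
        self.pending_write = {name: False for name in register_names}

    def mark_destination(self, reg):
        """Set the bit when an instruction writing `reg` is decoded."""
        self.pending_write[reg] = True

    def clear(self, reg):
        """Clear the bit once the writing instruction has executed."""
        self.pending_write[reg] = False

    def must_stall(self, source_regs):
        """A read stalls if any source register still has a pending write."""
        return any(self.pending_write[reg] for reg in source_regs)


if __name__ == "__main__":
    sb = Scoreboard(["XER", "SPR0"])
    sb.mark_destination("XER")
    print(sb.must_stall(["XER"]))   # a read of XER must wait
    sb.clear("XER")
    print(sb.must_stall(["XER"]))   # the read may now proceed
```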
Scoreboard controls are often implemented because there are registers which are not renamed that could potentially be written to out of order or read from before they had been properly updated by a write operation. Also, register renaming may not be appropriate because of the complexity of the renaming scheme and the physical cost in processor area and timing of the rename hardware. In a microcode expansion unit, which uses data from various scoreboarded registers (such as the Integer Exception Register (XER) or Special Purpose Registers (SPR)), utilizing scoreboard controls prior to or during action by the microcode expansion unit is undesirable because of the complexity and the potential timing impact on critical path circuitry.
X-form string instructions, which utilize the string count field of the XER to determine how many bytes are to be loaded or stored, require the XER value in order to determine the number of instructions to be generated from microcode (Ucode). The string count field of the XER is not renamed, and the Ucode unit generates the instruction sequence many pipe stages earlier. Because of this, the Ucode unit and the Instruction Sequencer Unit (ISU) must determine that no Internal Operation (IOP) that may trigger the ISU's XER scoreboard is in flight between the IDU and the ISU. Also, if the ISU's XER scoreboard is active, the IDU must be stalled. The Ucode generation for the string instruction must wait until the correct XER value is sent to the IDU, or the registers that have not been renamed could potentially be written to out of order. If scoreboard controls are used in a microcode expansion unit, the timing impact on critical path circuitry is significant.
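The two conditions described above, and the dependence of the generated sequence on the string count, may be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the set of XER-writing IOPs is a placeholder supplied by the caller, and the figure of four bytes moved per generated IOP is an assumption made for illustration, not a value given in this description.

```python
def ucode_may_expand(xer_scoreboard_active, iops_in_flight, xer_writers):
    """Return True only when the string count in the XER can be trusted.

    xer_scoreboard_active: whether the ISU's XER scoreboard is currently set.
    iops_in_flight: IOP names currently between the IDU and the ISU.
    xer_writers: IOP names that may trigger the XER scoreboard (assumed set).
    """
    if xer_scoreboard_active:
        return False  # the IDU must be stalled
    # No IOP that could still trigger the scoreboard may be in flight.
    return not any(op in xer_writers for op in iops_in_flight)


def string_iop_count(byte_count, bytes_per_iop=4):
    """Number of IOPs needed to move byte_count bytes (ceiling division).

    bytes_per_iop=4 is an assumption for illustration only.
    """
    return -(-byte_count // bytes_per_iop)


if __name__ == "__main__":
    writers = {"mtxer"}  # hypothetical set of XER-writing IOPs
    print(ucode_may_expand(False, ["add", "mtxer"], writers))
    print(string_iop_count(5))
```

The sketch makes the difficulty visible: the length of the generated sequence is a function of a register value that may still be changing when expansion is attempted.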
It would be desirable, therefore, to improve the performance of the microcode implementation of string instructions requiring count data in a superscalar processor without utilizing scoreboard controls prior to or during microcode expansion unit operation.
It is therefore one object of the present invention to provide a method and apparatus such that proper ordering of register reads and writes is enforced.
It is another object of the present invention to provide a method and system that will utilize an existing scoreboard function to stall the pipeline until an XER count is confirmed valid.
It is yet another object of the present invention to provide a method and apparatus that will test the existing scoreboard and maintain separation between testing and executing an instruction.
The foregoing objects are achieved as is now described. A dummy instruction, "mfXER" (move from integer exception register), is issued. An instruction sequencer unit (ISU) detects the mfXER instruction and stalls the pipeline until the scoreboard indicates the XER count is valid. No-Operation Internal Operations (NOP-IOPs) are inserted between write and read SPR IOPs to allow an ISU scoreboard mechanism to be activated before being tested by the read SPR IOP. A dummy read of the string count field, or of a predetermined scoreboarded SPR, is employed to read from a scoreboarded SPR. A predetermined number of dummy IOPs follow the initial dummy read to prevent the broadcast value of the string count field from being sampled. Further, a non-functional or "reserved from normal use" SPR, which may be written to and then read from, will implement the same function.
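The insertion of NOP IOPs between a write SPR IOP and a subsequent read SPR IOP may be sketched as follows. This is an illustrative model only: the tuple encoding of IOPs, the function name `pad_spr_sequence`, and the `activation_delay` parameter (the number of IOP slots assumed necessary for the scoreboard to become active after a write) are all hypothetical and are not specified in this description.

```python
def pad_spr_sequence(iops, activation_delay):
    """Insert NOP IOPs so each SPR read trails the most recent write to the
    same SPR by at least `activation_delay` intervening IOP slots.

    iops: list of (kind, spr) tuples; kind is "write_spr", "read_spr",
          or any other string for an unrelated IOP (spr may be None).
    """
    padded = []
    last_write_pos = {}  # SPR name -> index of its latest write in `padded`
    for kind, spr in iops:
        if kind == "read_spr" and spr in last_write_pos:
            # Count the IOPs already sitting between the write and this read.
            gap = len(padded) - last_write_pos[spr] - 1
            padded.extend([("nop", None)] * max(0, activation_delay - gap))
        if kind == "write_spr":
            last_write_pos[spr] = len(padded)
        padded.append((kind, spr))
    return padded


if __name__ == "__main__":
    sequence = [("write_spr", "XER"), ("read_spr", "XER")]
    for iop in pad_spr_sequence(sequence, activation_delay=2):
        print(iop)
```

In this model, IOPs that already separate the write from the read count toward the required gap, so padding is added only where the sequence would otherwise test the scoreboard too early.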
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.