1. Technical Field
The present invention generally relates to a superscalar processor in a data processing system and in particular to string operations within the processor. Still more particularly, the present invention relates to scoreboarding operations to a user-level register.
2. Description of the Related Art
Complex, high speed processors often utilize multiple reduced instruction set computer (RISC) processor cores which are generally characterized by high throughput of instructions. RISC processors have the ability to execute multiple instructions per clock cycle and are described as "superscalar." Superscalar processors, such as the PowerPC™ family of processors available from IBM Corporation of Armonk, N.Y., provide simultaneous dispatch of multiple instructions. Included in the processor are an Instruction Cache ("IC"), an Instruction Dispatch Unit ("IDU"), an Execution Unit ("EU") and a Completion Unit ("CU"). A typical RISC instruction set (PowerPC™) contains three broad categories of instructions: branch instructions (including specific branching instructions, system calls and Condition Register logical instructions); fixed point instructions; and floating point instructions. Each group is executed by an appropriate function unit. While all instructions pass through an issue stage in order, the instructions may enter the execution stage out of order. Scoreboarding is utilized to allow instructions to execute out of order and to maintain a preset instruction execution rate. The scoreboard also controls when an instruction can write its result to a destination register.
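The scoreboarding described above can be illustrated with a minimal sketch. It assumes one busy bit per register; the class name, method names, and register count are hypothetical and are not taken from any particular processor.

```python
class Scoreboard:
    """Toy register scoreboard: one busy bit per register (illustrative only)."""

    def __init__(self, num_registers):
        self.busy = [False] * num_registers

    def can_issue(self, sources, dest):
        # An instruction may proceed only if none of its sources is still
        # awaiting a write and its destination is not already claimed.
        return not any(self.busy[r] for r in sources) and not self.busy[dest]

    def issue(self, dest):
        # Claim the destination register until the result is written.
        self.busy[dest] = True

    def writeback(self, dest):
        # The result has been written; dependents may now proceed.
        self.busy[dest] = False


sb = Scoreboard(8)
sb.issue(dest=3)                              # a long-latency write to r3 is in flight
dependent_ok = sb.can_issue([3], 4)           # reads r3, so it must wait
independent_ok = sb.can_issue([1, 2], 5)      # unrelated, so it may run ahead
sb.writeback(3)
after_writeback_ok = sb.can_issue([3], 4)     # r3 is now valid
```

In this sketch the independent instruction is allowed to execute ahead of the dependent one, which is the out-of-order behavior that a scoreboard is intended to permit.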
Generally, a superscalar, RISC processor is "pipelined," meaning that a second instruction is waiting to enter the execution unit as soon as the previous instruction is finished. The processor includes a number of stages and an instruction is separated into components and operated on in each stage. In a typical first stage, instruction fetch, an instruction is fetched from memory. In a decode stage, the instruction is decoded into different control bits, which in general designate (1) a type of functional unit for performing the operation specified by the instruction, (2) source operands for the operation and (3) destinations for results of operations.
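The decode stage may be sketched as follows. The 16-bit instruction layout and the unit codes below are purely hypothetical and do not correspond to the PowerPC encoding; they serve only to show an instruction word being separated into the three kinds of control bits named above.

```python
from typing import NamedTuple

# Hypothetical unit codes, for illustration only.
UNITS = {0: "branch", 1: "fixed_point", 2: "floating_point"}


class Decoded(NamedTuple):
    unit: str        # (1) type of functional unit
    sources: tuple   # (2) source operand registers
    dest: int        # (3) destination register


def decode(word):
    # Assumed layout: [15:12] unit, [11:8] source A, [7:4] source B, [3:0] dest.
    unit = UNITS[(word >> 12) & 0xF]
    src_a = (word >> 8) & 0xF
    src_b = (word >> 4) & 0xF
    dest = word & 0xF
    return Decoded(unit, (src_a, src_b), dest)


decoded = decode(0x1235)  # fixed point operation, sources r2 and r3, destination r5
```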
In a dispatch stage, the decoded instruction is dispatched per control bits to a unit having an execution stage or to an intervening reservation station which in turn issues the instruction to an associated execution stage (execution unit). The execution stage processes the operation as specified by the instruction by accepting one or more operands and producing one or more results in the order of available operands.
A completion stage maintains the correct architectural machine state by considering instructions residing in a completion buffer and utilizing information about the status of instructions provided by the execute stage. The completion stage deals with program issues that occur because of concurrently executed instructions that allow multiple instruction results to be loaded to a single register.
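A minimal sketch of such a completion stage is given below, assuming a buffer that records instructions in program order and retires them only from the oldest entry forward. The class and tag names are hypothetical.

```python
from collections import OrderedDict


class CompletionBuffer:
    """Toy completion buffer: finish in any order, retire in program order."""

    def __init__(self):
        self.entries = OrderedDict()  # tag -> finished flag, in program order

    def dispatch(self, tag):
        self.entries[tag] = False

    def finish(self, tag):
        # Status reported by the execution stage.
        self.entries[tag] = True

    def retire(self):
        # Retire only from the oldest entry, stopping at the first
        # instruction that has not yet finished.
        retired = []
        while self.entries:
            tag, done = next(iter(self.entries.items()))
            if not done:
                break
            self.entries.popitem(last=False)
            retired.append(tag)
        return retired


cb = CompletionBuffer()
for tag in ("i0", "i1", "i2"):
    cb.dispatch(tag)
cb.finish("i2")
first = cb.retire()    # i2 finished early, but i0 is still pending
cb.finish("i0")
second = cb.retire()   # i0 retires; i1 blocks i2
cb.finish("i1")
third = cb.retire()    # i1 and i2 retire together, in order
```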
Some instructions, such as "move to" and "move from" instructions and condition register instructions, require serializing to execute properly. Also, serialization is required for all load/store multiple/string instructions. These string instructions are generally broken into a sequence of register-aligned operations and the first operation is usually dispatched with any preceding instructions in the dispatch buffer. Subsequent operations are dispatched at the rate of one word per cycle until finished.
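The breaking of a string instruction into register-aligned operations may be sketched as follows. The sketch assumes 4-byte words and 32 general purpose registers with wraparound from the last register to the first; the function name and the tuple layout are hypothetical.

```python
def expand_string_op(start_reg, byte_count, num_regs=32, word_bytes=4):
    """Split a string transfer into per-register operations.

    Returns a list of (register, byte_offset, bytes_moved) tuples, one per
    operation, in the order in which they would be dispatched.
    """
    ops = []
    offset = 0
    reg = start_reg
    while offset < byte_count:
        nbytes = min(word_bytes, byte_count - offset)  # last op may be partial
        ops.append((reg, offset, nbytes))
        offset += nbytes
        reg = (reg + 1) % num_regs                     # assumed wraparound
    return ops


ops = expand_string_op(start_reg=30, byte_count=10)
```

For a 10-byte string starting at register 30, the sketch yields three operations, the last of which moves only 2 bytes. Under the one-word-per-cycle dispatch rate described above, each tuple would occupy one dispatch cycle.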
A microcode unit generates sequences of Internal Operations (IOPs) that emulate X-form strings, that is, instructions that use the string count field of the Integer Exception Register (XER) to determine how many bytes are to be moved. The microcode unit requires that the XER be valid before it generates an appropriate sequence of IOPs. The XER is a 32-bit, user-level register that indicates overflow and carries for integer operations and also retains the string length for string operations.
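The role of the XER may be illustrated by a short sketch that extracts its fields. The bit positions below are an assumption based on the conventional 32-bit PowerPC layout (summary overflow, overflow and carry in the three most significant bits, string byte count in the seven least significant bits); the present description states only that the register holds overflow, carry and string length information.

```python
# Assumed field positions (conventional 32-bit PowerPC layout).
XER_SO = 1 << 31          # summary overflow
XER_OV = 1 << 30          # overflow
XER_CA = 1 << 29          # carry
XER_COUNT_MASK = 0x7F     # string byte count, low seven bits


def xer_fields(xer):
    """Split a 32-bit XER value into its flags and string count."""
    return {
        "so": bool(xer & XER_SO),
        "ov": bool(xer & XER_OV),
        "ca": bool(xer & XER_CA),
        "string_count": xer & XER_COUNT_MASK,
    }


fields = xer_fields(0x2000000A)  # carry set, string count of 10 bytes
```

Because the number of IOPs needed to emulate an X-form string depends on the string count, the microcode unit cannot generate the sequence until this field is known to be valid.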
There is no explicit scoreboard mechanism within the microcode unit and implementation of a true scoreboard would be costly in both timing of the rename hardware and physical space on the processor. A scoreboard's function is to maintain a preset instruction rate per clock cycle and generally every instruction goes through the scoreboard, corresponding to instruction issue and replacing part of the instruction decode in the pipeline. It is undesirable to utilize scoreboard controls during action by the microcode unit due to the complexity and potential timing impact on critical path circuitry. Additionally, X-form string instructions have a built-in delay for XER interlock and frequently there is no need for this delay because the XER string count is known.
It would be desirable, therefore, to provide a scoreboard function that would allow an existing scoreboard to be utilized for scoreboarding an XER.
It is therefore one object of the present invention to provide a scoreboard function for operations relative to an integer exception register.
It is another object of the present invention to provide a method and apparatus that will allow an existing scoreboard function to stall a pipeline that is using microcode operations.
The foregoing objects are achieved as is now described. An XER scoreboard function is provided by utilizing the Instruction Sequencer Unit scoreboard. A scoreboard bit is generated and set if the XER is being used. If it is not being used, another instruction is fetched. If the XER is being used, a dummy read (mfXER) is generated to test the bit to determine if the XER is busy. Padding IOPs (dummy operations, or NOPs) are then issued and, if the scoreboard bit is not set, the dummy XER read is executed and dispatch hold is not activated. After a padded X-form string has been executed, providing for a pipeline stall, the scoreboard bit is cleared.
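One possible reading of the foregoing sequence is sketched below. This is an illustrative model only, not the claimed hardware: the class, the function, the IOP labels and the number of padding IOPs are all hypothetical, and the sketch assumes that dispatch hold is activated exactly when the dummy read finds the scoreboard bit set.

```python
class XerScoreboard:
    """Single busy bit for the XER, kept in an existing scoreboard (assumed)."""

    def __init__(self):
        self.xer_busy = False

    def mark_in_use(self):
        self.xer_busy = True

    def clear(self):
        self.xer_busy = False


def expand_xform_string(is_xform_string, scoreboard, pad_count=2):
    """Return (iops, dispatch_hold) for one instruction.

    pad_count is a hypothetical number of padding NOP IOPs.
    """
    if not is_xform_string:
        # The XER is not needed: nothing to interlock, fetch the next instruction.
        return [], False
    # A dummy XER read tests the scoreboard bit; padding NOPs follow it.
    iops = ["mfXER_dummy"] + ["nop"] * pad_count
    dispatch_hold = scoreboard.xer_busy
    return iops, dispatch_hold


sb = XerScoreboard()
sb.mark_in_use()                                   # XER is in use
iops_busy, hold_busy = expand_xform_string(True, sb)
sb.clear()                                         # bit cleared
iops_free, hold_free = expand_xform_string(True, sb)
iops_other, hold_other = expand_xform_string(False, sb)
```

In this model the stall is produced by the ordinary scoreboard check on the dummy read, so the microcode unit itself needs no scoreboard logic of its own.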
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.