In powerful digital data processors, multiple, substantially autonomous data units are commonly employed to allow concurrent or "overlapped" execution of instructions, wherein a different instruction may be executing simultaneously in each of the data units. In simple data units, each instruction may take only a single machine cycle to execute, so that a new instruction may be "issued" to the data unit every machine cycle. In complex data units, several machine cycles may be required to execute an instruction, so that a subsequent instruction for that data unit cannot be issued until the data unit has completed the last instruction issued to it. To minimize the likelihood of such "stalls", the complex data unit may be constructed as a series of relatively independent "stages" which together comprise a "pipeline", such that a different instruction can be executing concurrently at each stage of the pipeline. Both techniques, overlapping and pipelining, allow greatly increased performance. However, once the price in hardware has been paid to enable the higher performance level, considerable additional investment must still be made in sophisticated optimizing compilers and scheduling hardware to fully realize the potential performance.
In general, even multi-stage pipelined data units have only a single configuration control register, which controls the operating characteristics of all of the stages of the pipeline. As a result, if a particular instruction requires, in one or more of the pipeline stages, a different configuration or operating characteristic than another instruction that is "close" to it in the instruction stream, then the later of the two instructions cannot be safely issued until execution of the earlier instruction has been completed. For example, in a floating point type of data unit, it is not at all unusual for consecutive instructions in an instruction stream to need different rounding modes, and the resultant stall reduces the throughput of the data unit.
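The cost of such stalls can be illustrated with a minimal sketch. The model below is an assumption made for illustration only and does not describe any particular processor: each stage is taken to require one machine cycle, and an instruction whose rounding mode differs from that of its predecessor is held until the predecessor has left the last stage. The names issue_cycles, total_cycles and shared_register are hypothetical.

```python
def issue_cycles(modes, depth, shared_register=True):
    """Return the cycle in which each instruction is issued.

    modes: rounding mode required by each instruction, in stream order.
    depth: number of pipeline stages (assumed: one machine cycle per stage).
    shared_register: if True, one control register serves every stage.
    """
    issue = []
    for i, mode in enumerate(modes):
        if i == 0:
            issue.append(0)
        elif shared_register and mode != modes[i - 1]:
            # The single register cannot be changed until the previous
            # instruction has left the last stage, so the issue is
            # delayed by a full pipeline length.
            issue.append(issue[-1] + depth)
        else:
            # Same mode (or no shared register): issue on the next cycle.
            issue.append(issue[-1] + 1)
    return issue


def total_cycles(modes, depth, shared_register=True):
    """Cycles until the last instruction has left the pipeline."""
    if not modes:
        return 0
    return issue_cycles(modes, depth, shared_register)[-1] + depth


stream = ["nearest", "nearest", "zero", "nearest"]
stalled = total_cycles(stream, depth=3)
unstalled = total_cycles(stream, depth=3, shared_register=False)
```

Under these assumptions, the four-instruction stream occupies a three-stage pipeline for 10 cycles when the register is shared, against 6 cycles in the idealized case in which a change of rounding mode imposes no restriction on issue.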
In some prior art processors, such as the Texas Instruments Advanced Scientific Computer (TI ASC), the configuration of the pipeline itself, that is, the number and ordering of the stages used to execute a particular instruction, could be dynamically reconfigured based upon the opcode of that instruction. However, such operating characteristics as rounding mode were either fixed for all instructions or controlled by a single control register.
It would be desirable to provide a mechanism whereby the control information would flow down the pipeline together with the instruction for which that control information is required, so that instructions executing consecutively in the data unit can each have unique control information.
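One way to picture such a mechanism is sketched below. This is an illustrative model under stated assumptions, not a description of any specific implementation: each pipeline slot is assumed to hold an (instruction, rounding mode) pair, and the pair advances one stage per machine cycle. The function name run_pipeline and the instruction names are hypothetical.

```python
from collections import deque


def run_pipeline(stream, depth):
    """Advance (instruction, mode) pairs through a pipeline of `depth` stages.

    stream: list of (name, rounding_mode) tuples in issue order.
    Returns one snapshot of the stage contents per active machine cycle,
    with stage 0 first. Empty stages are represented by None.
    """
    stages = [None] * depth
    pending = deque(stream)
    snapshots = []
    while pending or any(slot is not None for slot in stages):
        # Each cycle, every occupant moves one stage down, taking its own
        # control information with it; the occupant of the last stage
        # retires and a new instruction, if any, enters stage 0.
        incoming = pending.popleft() if pending else None
        stages = [incoming] + stages[:-1]
        if any(slot is not None for slot in stages):
            snapshots.append(list(stages))
    return snapshots


stream = [("fadd", "nearest"), ("fmul", "zero"), ("fsub", "up")]
snapshots = run_pipeline(stream, depth=3)
```

In the third snapshot all three stages are occupied at once, each by an instruction carrying a different rounding mode, and no instruction has been delayed: the three instructions leave the pipeline in 5 cycles.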