1. Technical Field
The present invention relates in general to improved data processing systems and in particular to improvements in instruction dispatch efficiency in a data processing system. Still more particularly, the present invention relates to a method and system for increased instruction dispatch efficiency in a superscalar processor system.
2. Description of the Related Art
Designers of modern state-of-the-art data processing systems are continually attempting to enhance the performance aspects of such systems. One technique for enhancing data processing system efficiency is the achievement of short cycle times and a low Cycles-Per-Instruction (CPI) ratio. An excellent example of the application of these techniques to an enhanced data processing system is the International Business Machines Corporation RISC System/6000 (RS/6000) computer. The RS/6000 system is designed to perform well in numerically intensive engineering and scientific applications as well as in multi-user, commercial environments. The RS/6000 processor employs a superscalar implementation, which means that multiple instructions are issued and executed simultaneously.
The simultaneous issuance and execution of multiple instructions requires independent functional units that can execute concurrently with a high instruction bandwidth. The RS/6000 system achieves this by utilizing separate branch, fixed point and floating point processing units which are pipelined in nature. In such systems a significant pipeline delay penalty may result from the execution of conditional branch instructions. Conditional branch instructions are instructions which dictate the taking of a specified conditional branch within an application in response to a selected outcome of the processing of one or more other instructions. Thus, by the time a conditional branch instruction propagates through a pipeline queue to an execution position within the queue, it will have been necessary to load instructions into the queue behind the conditional branch instruction prior to resolving the conditional branch in order to avoid run-time delays.
Another source of delays within superscalar processor systems is the fact that such systems typically execute multiple tasks simultaneously. Each of these multiple tasks typically has an effective or virtual address space which is utilized for execution of that task. Locations within such an effective or virtual address space include addresses which "map" to a real address within system memory. It is not uncommon for a single space within real memory to map to multiple effective or virtual memory addresses within a superscalar processor system. The utilization of effective or virtual addresses by each of the multiple tasks creates additional delays within a superscalar processor system due to the necessity of translating these addresses into real addresses within system memory, so that the appropriate instruction or data may be retrieved from memory and placed within an instruction queue for dispatching to one of the multiple independent functional units which make up the superscalar processor system.
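The mapping of effective addresses to real addresses described above may be illustrated by the following minimal sketch. It is offered for illustration only and does not describe the translation hardware of any particular system. The page size, the per-task page tables, and the names `PAGE_SIZE`, `translate`, `task_a` and `task_b` are all assumptions introduced here.

```python
PAGE_SIZE = 4096  # assumed page size; not specified in the text


def translate(page_table, effective_address):
    """Translate an effective address to a real address.

    page_table is a hypothetical per-task mapping from effective page
    number to real page number. A missing entry raises KeyError, which
    stands in for a translation miss.
    """
    page, offset = divmod(effective_address, PAGE_SIZE)
    real_page = page_table[page]
    return real_page * PAGE_SIZE + offset


# Two hypothetical tasks, each with its own effective address space.
# Effective page 0 of task A and effective page 5 of task B both map
# to real page 7, so one real location has two effective addresses.
task_a = {0: 7, 1: 3}
task_b = {5: 7}

real_a = translate(task_a, 0x0010)
real_b = translate(task_b, 5 * PAGE_SIZE + 0x0010)
```

In this sketch `real_a` and `real_b` are the same real address, and every fetch pays for one table lookup before memory can be accessed, which is the translation delay noted above.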
In modern superscalar processors, groups of instructions are often dispatched from the instruction buffer in a priority order as execution units are available to process those instructions. Often the instructions at the beginning of an instruction buffer are dispatched and the instructions within the remainder of that group remain in the buffer for several cycles waiting for execution units or other resources. Additionally, there may be available execution units of a type not required for the remaining instructions. It should thus be apparent that instruction dispatch efficiency may be increased if a method and system were available for shifting instructions within an instruction buffer in an application-specified sequential order, such that additional instructions may be placed within the buffer for dispatch to execution units.
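The dispatch problem described above may be illustrated by the following minimal sketch. It is a simplified software model under stated assumptions and is not the claimed method or any actual hardware implementation. The buffer size of four, the unit types "fixed" and "float", the function names, and the rule that an instruction dispatches whenever a unit of its type is free are all assumptions introduced here.

```python
from collections import deque


def dispatch_cycle(buffer, free_units):
    """Dispatch, in buffer order, each instruction whose unit type is free.

    buffer is a list of (name, unit_type) pairs. free_units maps a unit
    type to the number of units available this cycle. Returns the names
    dispatched and the instructions left behind, in their original order.
    """
    free = dict(free_units)
    dispatched, remaining = [], []
    for name, unit in buffer:
        if free.get(unit, 0) > 0:
            free[unit] -= 1
            dispatched.append(name)
        else:
            remaining.append((name, unit))
    return dispatched, remaining


def shift_and_refill(remaining, pending, size):
    """Shift undispatched instructions to the front, preserving their
    sequential order, then load new instructions into the freed slots."""
    buffer = list(remaining)
    while len(buffer) < size and pending:
        buffer.append(pending.popleft())
    return buffer


# Hypothetical program: four fixed point instructions, then two floating point.
pending = deque([("i0", "fixed"), ("i1", "fixed"), ("i2", "fixed"),
                 ("i3", "fixed"), ("i4", "float"), ("i5", "float")])
units = {"fixed": 1, "float": 1}

buffer1 = shift_and_refill([], pending, 4)
dispatched1, left1 = dispatch_cycle(buffer1, units)   # float unit sits idle
buffer2 = shift_and_refill(left1, pending, 4)         # i4 enters the buffer
dispatched2, left2 = dispatch_cycle(buffer2, units)   # both units now busy
```

In the first cycle only `i0` dispatches and the floating point unit is idle, because every instruction in the buffer needs the fixed point unit. After the remaining instructions are shifted forward and `i4` is loaded behind them, the second cycle dispatches both `i1` and `i4`.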