1. Field of the Invention
The present invention relates to pipelined data processing systems. In particular, the present invention provides an architecture for optimizing the flow of data and instructions in a pipelined microprocessor.
2. Description of Related Art
High-speed data processing systems are often implemented using a pipeline architecture in order to support high processing rates. Instruction processing is overlapped, with a different instruction occupying each of a plurality of pipeline stages. Each stage is adapted for completion within one instruction cycle, so that the result from execution of one instruction is posted every cycle when the pipeline is full. By maintaining a full pipeline, the execution rate of the processor approaches one instruction per cycle.
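By way of illustration only, the relationship between pipeline depth and execution rate described above can be sketched as follows. The sketch assumes an idealized pipeline with no stalls; the function names and the five-stage depth used in the note below are hypothetical and are not taken from any particular processor.

```python
def ideal_pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles to complete a run of instructions in a stall-free pipeline.

    Assumption: every stage takes exactly one cycle and no stalls occur.
    """
    if num_instructions <= 0:
        return 0
    # The first instruction needs num_stages cycles to pass through the
    # pipeline; each later instruction completes one cycle after the
    # instruction ahead of it.
    return num_stages + (num_instructions - 1)


def instructions_per_cycle(num_instructions: int, num_stages: int) -> float:
    """Average execution rate over the whole run, in instructions per cycle."""
    cycles = ideal_pipeline_cycles(num_instructions, num_stages)
    return num_instructions / cycles if cycles else 0.0
```

Under these assumptions, 1000 instructions in a five-stage pipeline complete in 1004 cycles, a rate of roughly 0.996 instructions per cycle. The fill time of the pipeline is amortized over the run, which is why the rate approaches, but does not reach, one instruction per cycle.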
Considerable development has been directed to reducing the cycle time for pipelined data processing systems. The performance of the pipelined processor, therefore, is becoming more and more dependent on the flow of instructions and data into the system to keep the pipeline full. An instruction and the operand data upon which it will execute must be provided to the processing unit in time with the pipeline in order to keep the pipeline full. Because of the short cycle time of advanced processors, the time it takes to fetch instructions or operand data from external storage media is often longer than one cycle. Therefore, the architecture of the microprocessor must be optimized to maintain this flow of instructions and data at a high rate, so as to minimize stalls in the pipeline due to fetches and stores.
The flow of instructions in prior art systems is maintained by providing storage devices that are able to generate a sequence of instructions automatically for supply to the processor. This relieves the processor of the burden of generating an address for each instruction to be processed, thereby allowing the supply of instructions to the pipeline to be carried out as a background task that does not burden the pipeline. However, when programs branch to other sequences of instructions, the processor must communicate with a sequential transfer device and provide a new starting address. Thus, branches in the instruction stream may cause instances in which the pipeline of the processor will be stalled, waiting until the sequential transfer device can start up a new flow of instructions.
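The cost of restarting the instruction flow on a branch can be estimated with a simple model, offered here only as a sketch. It assumes that every taken branch stalls the pipeline for a fixed number of cycles while the sequential transfer device starts a new flow; the parameter names and the fixed restart latency are hypothetical simplifications, not properties of any particular device.

```python
def cycles_with_branch_stalls(
    num_instructions: int,
    num_stages: int,
    taken_branches: int,
    restart_latency: int,
) -> int:
    """Total cycles when each taken branch stalls the pipeline.

    Assumption: each taken branch costs exactly restart_latency stall
    cycles, and there is no other source of stalls.
    """
    if num_instructions <= 0:
        return 0
    # Stall-free cost: fill the pipeline once, then one result per cycle.
    ideal_cycles = num_stages + (num_instructions - 1)
    # Each taken branch adds a fixed wait for the new instruction flow.
    stall_cycles = taken_branches * restart_latency
    return ideal_cycles + stall_cycles
```

Under these assumptions, 100 instructions in a five-stage pipeline with 10 taken branches and a three-cycle restart latency take 134 cycles instead of the stall-free 104, so the execution rate falls from about 0.96 to about 0.75 instructions per cycle.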
The flow of operand data into the processor is another source of pipeline stalls. When data required for execution of a given instruction is stored in a device external to the processor, the fetch operation can take several cycles. Prior art systems have provided a register file in the execution unit of a data processing system in which operand data can be stored. However, the contents of the register file must be changed from process to process. Therefore, the time involved in swapping the contents of the register file when control of the pipeline changes from one process to another can degrade performance because of the sequences of external fetches required.
In addition, the ability to keep a pipeline full in a given data processing system is influenced by contention for buses used to transfer instructions, data and addresses between the processor and external storage devices. Various bus architectures exist. For instance, a separate bus could be provided for each path. For single chip processors, however, the number of input/output pins required for separate buses is excessive. Further, the interfaces required for all of the buses would be wasteful of chip space. Accordingly, in microprocessors implemented on a single chip, a variety of bus-sharing architectures is used, all of which cause performance degradation in the processor because of competing uses.
An architecture that optimizes the flow of data and instructions into a pipelined processing unit, and minimizes the number of external fetches required to supply instructions and operand data to the processing unit, is desirable. However, a combination of processor features which optimizes the flow of instructions and data must be viewed as a total system. A particular feature which, considered alone, increases one aspect of processor performance may actually decrease the performance of the total system because of the burden which it places elsewhere in the system.