1. Technical Field
The present invention relates in general to data processing systems and in particular to a mechanism for fetching instructions within a data processing system. Still more particularly, the present invention relates to a method and apparatus for accelerating instruction fetching for a microprocessor in which both in-line and target instructions are provided in an efficient manner.
2. Description of the Related Art
A conventional processor may include an instruction fetch unit (IFU) for requesting instructions to be loaded, an instruction cache for storing instructions, an instruction buffer for temporarily storing instructions fetched from the instruction cache for execution, a number of execution units for executing sequential instructions, a branch processing unit (BPU) for executing branch instructions, a dispatch unit for dispatching sequential instructions from the instruction buffer to particular execution units, and a completion buffer for temporarily storing instructions that have finished execution, but have not been completed.
Conventional methods of fetching instructions using the above processor components are known in the art. These methods often result in cycle delays due to incorrectly fetched in-line instructions or incorrect predictions during branch processing. For example, during branch processing, if a branch that was predicted as taken is resolved as mis-predicted, a mis-predict penalty of several cycles is incurred by the processor due to the cycle time required to restore the sequential execution stream following the branch instruction (i.e., the processor has to abandon any result that the speculative instructions produced and begin executing the path that should have been taken). Another example occurs during in-line fetching, when an instruction is emulated (e.g., when a different set of code is utilized to perform the same function as the instruction) and/or an instruction fetch is temporarily halted. The IFU has to generate a return stack, and storing and reading values prior to and after emulation takes a relatively long time.
Conventional instruction caches are typically large static blocks of temporarily stored instructions. The cache typically has very little functional logic and is designed solely to issue instructions out of a cache line once a corresponding line address is provided. The cache plays no part in the actual selection of instructions, branch processing, fetching branch targets, etc.
The present invention recognizes that it would be desirable and beneficial to have a system or processor architecture for effectively handling the fetching of instructions for a high-frequency, short-pipeline processor. A system that also supported branch/target fetching without leading to process stalls or loss of cycles when the target direction is incorrectly predicted would be a welcome improvement. It would further be desirable to have such a system that also permitted fast restart of a process after instruction emulation. These and other benefits are presented in the invention described herein.
An instruction fetching system (and/or architecture) is disclosed, which may be utilized by a high-frequency short-pipeline microprocessor, for efficient fetching of both in-line and target instructions. The system contains an instruction fetching unit (IFU), having control logic and associated components for controlling a specially designed instruction cache (I-cache). The I-cache is a sum-address cache, i.e., it receives two address inputs, which are combined by a decoder to provide the address of the line of instructions to be fetched. The I-cache is designed with an array of cache lines that can each contain 32 instructions, and three buffers that each have a capacity of 32 instructions. The three buffers include a Predicted (PRED) buffer that holds the instructions which are currently being executed, a NEXT buffer that holds the instructions which are to be executed after the instructions in the PRED buffer, and an ALT buffer that holds the alternate set of instructions when a branch is predicted taken/not taken. The ALT buffer is utilized along with the PRED buffer to permit branch target retrieval within the I-cache prior to a prediction.
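The sum-address lookup and the three line buffers can be illustrated with a minimal behavioral sketch. It is not the disclosed circuit. The names `decode_sum_address` and `ICacheModel` are hypothetical, addresses are assumed to be counted in instructions, and the two inputs are simply added in software, whereas a hardware sum-address decoder would typically avoid a separate full addition step.

```python
LINE_SIZE = 32     # instructions per cache line (from the text)
OFFSET_BITS = 5    # log2(32): bits that select an instruction within a line


def decode_sum_address(a, b):
    """Combine two address inputs into (line address, offset within line).

    Assumption of this sketch: addresses are counted in instructions.
    """
    total = a + b
    return total >> OFFSET_BITS, total & (LINE_SIZE - 1)


class ICacheModel:
    """Toy model: an array of 32-instruction lines plus three line buffers."""

    def __init__(self, lines):
        # lines: dict mapping a line address to a list of 32 instructions
        self.lines = lines
        self.buffers = {"PRED": None, "NEXT": None, "ALT": None}

    def load(self, buffer_name, a, b):
        """Fetch the line addressed by a + b into the named buffer."""
        line_addr, offset = decode_sum_address(a, b)
        self.buffers[buffer_name] = self.lines[line_addr]
        return line_addr, offset
```

For example, with inputs 96 and 37 the sum is 133, which this model decodes as line address 4 and offset 5 within that line.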
When a branch is encountered, instruction lines of both paths (taken/not taken) are sent to the instruction buffers and no stall occurs on the pipeline if the prediction is correct.
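One plausible reading of this two-path arrangement is sketched below, purely as an illustration. The function `select_path` and its parameters are hypothetical. The sketch assumes that the PRED buffer holds the line for the predicted direction, that the ALT buffer holds the line for the other direction, and that resolution of the branch simply selects which already-fetched line continues to supply instructions.

```python
def select_path(pred_line, alt_line, predicted_taken, actual_taken):
    """Pick the line that supplies instructions once a branch is resolved.

    Assumption of this sketch: both lines were fetched before resolution,
    so selecting either one requires no new cache access.
    """
    if predicted_taken == actual_taken:
        return pred_line   # prediction held: continue from the PRED line
    return alt_line        # prediction failed: continue from the ALT line
```

Because both lines are already held in buffers in this model, the outcome of the branch only changes which buffer is read next.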
Address registers hold the line addresses of the instructions in each of the PRED, NEXT and ALT buffers and a 5-bit address corresponding to the specific instruction within the line to be sent to the instruction buffers. Up to four instructions are selected at a time from the instruction line in the PRED, ALT or NEXT buffer and sent to the instruction buffers. These instructions are typically selected sequentially (in-line). Use of the IFU and the I-cache provides a continuous supply of instructions to the processor pipeline without interruptions. Also, the invention supports efficient restart after a bad instruction is encountered or an instruction is emulated.
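The in-line selection of up to four instructions from a 32-instruction line can be sketched as follows. This is a simplified model under stated assumptions: the name `select_instructions` is hypothetical, a line is represented as a Python list, and a group is assumed not to cross a line boundary, so fewer than four instructions are returned near the end of a line.

```python
LINE_SIZE = 32    # instructions per buffered line (from the text)
FETCH_WIDTH = 4   # maximum instructions selected at a time (from the text)


def select_instructions(line, offset):
    """Return up to four sequential instructions starting at the 5-bit offset,
    together with the offset of the next instruction to select."""
    if not 0 <= offset < LINE_SIZE:
        raise ValueError("offset must fit in 5 bits (0..31)")
    group = line[offset:offset + FETCH_WIDTH]
    return group, offset + len(group)
```

In this model, a returned offset of 32 indicates that the line is exhausted; continuing from the line in the NEXT buffer at that point is an assumption of the sketch rather than a detail given above.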
The above as well as additional objects, features, and advantages of the present invention will become apparent in the following detailed written description.