1. Field of the Invention
The present invention relates to the field of electronics. More particularly, the invention relates to a circuit and method for handling a BTB hardware conflict within a deeply pipelined electronic system without inducing a stall.
2. Description of Related Art
Early microprocessors generally processed instructions in a serial manner. Each instruction was processed in four (4) sequential stages: instruction fetch, instruction decode, instruction execute, and result writeback (retire). Within these microprocessors, different dedicated logic blocks were implemented to support each processing stage. Thus, before beginning its operation, each logic block was required to wait until every preceding logic block had completed its operation.
To improve efficiency, more recent microprocessors (referred to as "pipelined microprocessors") have been designed to operate on several instructions simultaneously by overlapping the operations performed in the fetch, decode, execute, and retire stages. More specifically, during each clock cycle, the processing stages of a pipelined microprocessor operate concurrently on different instructions. At the beginning of each clock cycle, the result of each processing stage is passed to the next processing stage. One type of pipelined microprocessor, referred to as a "deeply pipelined" microprocessor, further divides selected processing stages into substages for additional performance improvement.
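The benefit of overlapping the four stages can be illustrated with a simple cycle count. The following is a minimal sketch, assuming that every stage takes exactly one clock cycle and that no stalls occur; the function names are hypothetical and are used for illustration only.

```python
# Illustrative sketch only: assumes one clock cycle per stage and no stalls.
STAGES = ("fetch", "decode", "execute", "retire")


def serial_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed when each instruction finishes before the next begins."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages=len(STAGES)):
    """Cycles needed when a new instruction enters the pipeline every cycle."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles; each later one adds a cycle.
    return n_stages + (n_instructions - 1)


def pipeline_schedule(n_instructions):
    """Map each cycle to the instruction index occupying each stage."""
    schedule = {}
    for instr in range(n_instructions):
        for offset, stage in enumerate(STAGES):
            schedule.setdefault(instr + offset, {})[stage] = instr
    return schedule
```

Under these assumptions, five instructions require twenty cycles when processed serially and eight cycles when pipelined, and in the third cycle of a three-instruction stream the fetch, decode, and execute stages each hold a different instruction.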
For a deeply pipelined microprocessor to operate efficiently, an instruction fetch unit (IFU) is situated at the front of an instruction pipeline to continually provide the pipeline with a stream of instructions, namely macro-instructions. However, a branch instruction within an instruction stream prevents the IFU from fetching subsequent instructions until the branch is fully resolved. A "branch instruction" is any instruction disrupting normal, sequential program flow such as, for example, a conditional JUMP, an unconditional JUMP, a CALL instruction or a RETURN instruction. In pipelined microprocessors, the branch cannot be fully resolved until the branch instruction reaches the execution stage near the end of the pipeline. As a consequence, the IFU usually stalls temporarily, fetching no further instructions, because the unresolved branch condition prevents the IFU from knowing which instruction(s) to fetch next.
To alleviate this problem, many pipelined microprocessors use branch prediction mechanisms to predict the existence and the outcome of branch instructions within an instruction stream. One type of branch prediction mechanism is a branch target buffer (BTB) circuit which receives an instruction pointer (IP) from the IFU every clock cycle and accesses information within a memory unit of the BTB circuit for predicting code flow. Typically, the memory unit (referred to as a "BTB cache") is a random access memory (RAM) having a single read/write port and several read-only and/or write-only ports. The BTB cache contains historical information regarding IPs that have already been identified as branch instructions and that are again being executed by a pipelined microprocessor. A single read/write port architecture is preferred for cost reasons and reduced die area requirements.
By accumulating historical information (e.g., branch type, IP, target IP, etc.) associated with previously executed branch instructions, the BTB circuit is able to better predict whether an incoming branch should be "not taken" (i.e., follow sequential address retrieval) or "taken" (i.e., follow the instruction fetch path created through prediction). The action of "taking" an incoming branch involves "resteering" the IFU to begin fetching instructions at the targeted IP. The benefit of the BTB circuit is to improve performance by providing advance information to the IFU, instead of the IFU waiting for an instruction decode unit (IDU) to decode the instruction.
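The lookup and resteer behavior described above can be sketched in software. This is a hedged illustration rather than the circuit itself: the class and field names, the fixed sequential fetch step, and the last-outcome prediction rule are all assumptions introduced for the example and are not taken from the description.

```python
from dataclasses import dataclass

# Assumed sequential fetch increment; the actual value depends on the IFU.
FETCH_STEP = 16


@dataclass
class BTBEntry:
    branch_type: str   # e.g. "jump", "call", "return"
    target_ip: int     # IP the branch last transferred control to
    last_taken: bool   # assumed one-bit history: last observed outcome


class BranchTargetBuffer:
    """Hypothetical software model of a BTB cache keyed by branch IP."""

    def __init__(self):
        self.entries = {}

    def predict(self, ip):
        """Return the next fetch IP predicted for the given IP."""
        entry = self.entries.get(ip)
        if entry is not None and entry.last_taken:
            return entry.target_ip      # "taken": resteer the IFU
        return ip + FETCH_STEP          # "not taken": sequential fetch

    def update(self, ip, branch_type, target_ip, taken):
        """Record the resolved outcome of a branch at the given IP."""
        self.entries[ip] = BTBEntry(branch_type, target_ip, taken)
```

In this sketch, an IP with no recorded history is predicted to fall through sequentially, and an IP whose branch was last taken causes the fetch path to be resteered to the recorded target IP.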
Due to the preferred BTB cache architecture, a BTB hardware conflict may occur when a cache read occurs concurrently with a particular type of cache write referred to as an allocation. An "allocation" is a condition where information associated with a new branch instruction is written into a newly created entry of the BTB cache. The cache read is used to obtain information as to whether the current instruction is a known branch instruction.
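The conflict condition can be expressed compactly. The following is a minimal sketch, assuming that the read and the allocation contend for the single read/write port whenever both are requested in the same clock cycle; the function names and the cycle-list representation are hypothetical.

```python
def btb_conflict(read_requested, allocation_requested):
    """True when a cache read and an allocation fall in the same cycle."""
    return read_requested and allocation_requested


def find_conflict_cycles(read_cycles, allocation_cycles):
    """Return the sorted clock cycles in which both requests are present."""
    return sorted(set(read_cycles) & set(allocation_cycles))
```

Because the IFU supplies an IP every clock cycle, a read is requested in nearly every cycle, so under this assumption any allocation is likely to coincide with a read.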
Two conventional solutions have usually been used to address BTB hardware conflicts. The first conventional solution involves implementing a dual read/write port RAM as the BTB cache. However, this is a costly solution because the BTB cache would occupy a greater percentage of die area. The second conventional solution involves inducing a stall when a conflict is detected. However, a stall requires a great amount of complexity to avoid pipeline slips (i.e., loss of synchronization between multiple pipelines) and to accurately stall logic units substantially downstream from the IFU. Both of these conventional solutions are inadequate with respect to cost and performance.