This invention relates to the field of microprocessors fabricated on an integrated circuit or chip. More specifically, the invention relates to methods and apparatus for improved instruction throughput in a high-performance processor.
Microprocessors are typically divided into functional blocks or stages through which instructions are propagated and processed. This allows for pipelining of instructions such that when one instruction has completed the first stage of processing and moves on to the second stage, a second instruction may begin the first stage. Thus, even where each instruction requires a number of clock cycles to complete all stages of processing, pipelining provides for the completion of instructions on every clock cycle. This single-cycle throughput of a pipelined processor greatly increases the overall performance of computer systems. Superscalar processors are capable of initiating more than one instruction at the initial stage of the pipeline per clock cycle. Frequently, more than one instruction completes on each given clock cycle of the machine.
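The throughput argument above can be illustrated with a short calculation. The following is a minimal sketch that assumes an idealized pipeline with no stalls, hazards, or branch mispredictions; the function names and the `issue_width` parameter are hypothetical and are introduced only for illustration.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes every stage before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int, issue_width: int = 1) -> int:
    """Cycles needed by an idealized pipeline (assumption: no stalls of any kind).

    issue_width is the number of instructions initiated per clock cycle:
    1 models a scalar pipeline, a larger value models a superscalar one.
    """
    if num_instructions == 0:
        return 0
    # Number of issue groups entering the first stage (ceiling division).
    groups = -(-num_instructions // issue_width)
    # The first group needs num_stages cycles to drain; each later group
    # completes one cycle after the group ahead of it.
    return num_stages + groups - 1


if __name__ == "__main__":
    print(unpipelined_cycles(100, 5))
    print(pipelined_cycles(100, 5))
    print(pipelined_cycles(100, 5, issue_width=2))
```

Under these assumptions, once the pipeline is full one issue group completes per clock cycle, so the cycle count approaches the number of issue groups as the instruction count grows.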
Many modern processors employ a separate instruction cache for storing instructions to be executed by the program or code sequence running on the computer system. Usually, a fast, local instruction cache memory (L0), which is incorporated on the same integrated circuit as the processor itself, is utilized for this purpose. In many cases, a processor includes an instruction fetch unit that is responsible for deciding which instruction cache entry ought to be accessed next to maximize program performance. To operate efficiently, the instruction fetch unit should provide a continual stream of instructions from the instruction cache memory to the pipeline, where they eventually get dispersed to the processor's execution core.
Difficulties arise in computer systems that attempt to take advantage of the parallelism present in a program by executing instructions based on data dependencies and resource availability. These types of machines are referred to as "out-of-order" computing machines. The term "out-of-order" means not necessarily executed in the same sequence implied by the source program. Moreover, there exists a further problem in keeping track of pending instruction fetch requests in the face of mispredicted branches. In some instances, instructions are fetched speculatively, based on a predicted program execution path. These machines place enormous performance demands on the fetch logic circuitry of the processor.
The present invention is useful in optimizing the speculative fetching engine of a high-performance processor and advantageously maximizes the supply of instructions to the processor's execution core. In one embodiment, the invention comprises an instruction cache that stores a cache line of instructions and an execution engine for executing the instructions. A buffer is provided to store a plurality of entries. A first logic circuit divides the cache line into instruction bundles, each of which gets written into an entry of the buffer. A second logic circuit reads out a number of consecutive instruction bundles from the buffer for dispersal to the execution engine.
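The data flow of this embodiment can be modeled in software as a rough sketch. The text above does not specify the bundle size, the number of buffer entries, or the ordering discipline, so the 16-byte bundle, the 8-entry capacity, the first-in first-out ordering, and all names below (`split_cache_line`, `BundleBuffer`, `write_line`, `read_bundles`) are assumptions made purely for illustration, not the claimed circuit.

```python
from collections import deque

BUNDLE_BYTES = 16    # assumed bundle size; not specified in the text
BUFFER_ENTRIES = 8   # assumed buffer capacity; not specified in the text


def split_cache_line(cache_line: bytes, bundle_bytes: int = BUNDLE_BYTES) -> list:
    """Model of the first logic circuit: divide a cache line into bundles."""
    if len(cache_line) % bundle_bytes != 0:
        raise ValueError("cache line length must be a multiple of the bundle size")
    return [cache_line[i:i + bundle_bytes]
            for i in range(0, len(cache_line), bundle_bytes)]


class BundleBuffer:
    """Buffer with a fixed number of entries, one instruction bundle per entry."""

    def __init__(self, entries: int = BUFFER_ENTRIES):
        self.entries = entries
        self._fifo = deque()  # assumed first-in, first-out ordering

    def free_entries(self) -> int:
        return self.entries - len(self._fifo)

    def write_line(self, cache_line: bytes) -> bool:
        """Write each bundle of a cache line into its own entry.

        Returns False, leaving the buffer unchanged, if the line does not fit.
        """
        bundles = split_cache_line(cache_line)
        if len(bundles) > self.free_entries():
            return False
        self._fifo.extend(bundles)
        return True

    def read_bundles(self, count: int) -> list:
        """Model of the second logic circuit: read up to `count` consecutive
        bundles, oldest first, for dispersal to the execution engine."""
        n = min(count, len(self._fifo))
        return [self._fifo.popleft() for _ in range(n)]


if __name__ == "__main__":
    buf = BundleBuffer(entries=4)
    line = bytes(range(32))       # one hypothetical 32-byte cache line
    buf.write_line(line)          # occupies two entries
    first, second = buf.read_bundles(2)
    print(first.hex(), second.hex())
```

The sketch separates the write side from the read side so that cache lines can be written while earlier bundles are still waiting to be read, which is one way a buffer of this kind can keep instructions flowing to the execution engine.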