A fundamental property of a superscalar pipeline is the execution of multiple instructions per clock cycle. In essence, a superscalar pipeline allows several, if not all, of the processing components in the microprocessor to be used during a single clock cycle. Superscalar techniques based on extracting instruction-level parallelism (ILP) have been a major contribution to high-performance microprocessor design throughout the last two decades. The number of instructions executed per cycle (IPC) has increased substantially through superscalar techniques such as speculative execution and dynamic scheduling.
To allow for the execution of multiple instructions, an out-of-order superscalar pipeline includes an instruction issue circuit. The instruction issue circuit includes an instruction queue that stores instructions awaiting execution. The maximum number of instructions that can be held in the instruction queue is generally referred to as the window size of the instruction queue. The number of instructions that the instruction issue circuit can issue for execution in a clock cycle is referred to as the issue width of the instruction issue circuit.
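The relationship between window size and issue width can be illustrated with a minimal software sketch. It is not a description of any particular circuit: the names `Instruction`, `InstructionQueue`, `insert`, and `issue` are hypothetical, and the choice to pick ready instructions in oldest-first order is an assumption made only to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    name: str
    ready: bool  # True once all source operands are available


class InstructionQueue:
    """Toy model of an instruction queue (hypothetical, for illustration only)."""

    def __init__(self, window_size, issue_width):
        self.window_size = window_size  # maximum number of queued instructions
        self.issue_width = issue_width  # maximum number issued per cycle
        self.entries = []               # kept in program order, oldest first

    def insert(self, instr):
        # The queue rejects new instructions once the window is full.
        if len(self.entries) >= self.window_size:
            return False
        self.entries.append(instr)
        return True

    def issue(self):
        # Models one clock cycle: issue up to issue_width ready instructions.
        # Assumption: ready instructions are picked oldest first.
        issued, remaining = [], []
        for instr in self.entries:
            if instr.ready and len(issued) < self.issue_width:
                issued.append(instr)
            else:
                remaining.append(instr)
        self.entries = remaining
        return issued


queue = InstructionQueue(window_size=4, issue_width=2)
accepted = [
    queue.insert(Instruction("add", True)),
    queue.insert(Instruction("load", False)),
    queue.insert(Instruction("mul", True)),
    queue.insert(Instruction("sub", True)),
    queue.insert(Instruction("xor", True)),  # window is full, so this is rejected
]
issued_names = [instr.name for instr in queue.issue()]
left_names = [instr.name for instr in queue.entries]
```

In this sketch the fifth insertion fails because the window holds only four instructions, and a single cycle issues only two of the three ready instructions because the issue width is two. The instruction that is not ready is bypassed, which is what makes the issue out of order.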
Unfortunately, increasing the window size and the issue width leads to a quadratic increase in the delay through the instruction issue circuit. To increase the window size and issue width, some tree-based schemes have attempted to distribute instructions into FIFO buffers so that only the instructions at the heads of the FIFO buffers are issued. Oldest-first selection gives an IPC benefit of up to 8% over a random position-based scheme and provides better instruction sequencing. However, the steering logic for these tree-based schemes is immensely complex. There have been attempts to reduce this complexity. For example, a dynamic request-grant arbitration scheme has been proposed that uses an instruction queue compaction scheme to preserve the temporal order of the instructions within the instruction queue. However, the dynamic request-grant arbitration scheme requires a multitude of serial operations, which results in increased delay. Dynamic logic is used to compensate for the higher delay, but this comes at the cost of higher power consumption.
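The idea of issuing only from the heads of FIFO buffers, with oldest-first selection among the ready heads, can be sketched as follows. This is a simplified software illustration under stated assumptions, not the steering logic of any actual scheme: the function names `steer_round_robin` and `issue_from_heads` are hypothetical, and round-robin steering is a deliberately naive placeholder for the far more complex steering logic mentioned above.

```python
from collections import deque


def steer_round_robin(instructions, num_fifos):
    """Distribute (name, ready) pairs into FIFOs.

    Assumption: round-robin steering is a placeholder; each entry is tagged
    with a sequence number that records its age in program order.
    """
    fifos = [deque() for _ in range(num_fifos)]
    for seq, (name, ready) in enumerate(instructions):
        fifos[seq % num_fifos].append((seq, name, ready))
    return fifos


def issue_from_heads(fifos, issue_width):
    """Issue up to issue_width ready instructions, considering FIFO heads only."""
    # Collect (sequence number, FIFO index) for every ready head entry.
    candidates = [
        (fifo[0][0], index)
        for index, fifo in enumerate(fifos)
        if fifo and fifo[0][2]
    ]
    candidates.sort()  # oldest first: lowest sequence number wins
    issued = []
    for _, index in candidates[:issue_width]:
        issued.append(fifos[index].popleft()[1])
    return issued


program = [("add", True), ("load", False), ("mul", True), ("sub", True)]
fifos = steer_round_robin(program, num_fifos=2)
first_cycle = issue_from_heads(fifos, issue_width=2)
second_cycle = issue_from_heads(fifos, issue_width=2)
```

Here `sub` is ready but sits behind the unready `load` in its FIFO, so it cannot issue in either cycle. Restricting selection to the heads keeps the selection step small, while the quality of the result depends heavily on how instructions were steered.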
Therefore, what is needed is an instruction issue circuit that reduces serial operations while having a simpler configuration.