A microprocessor (“processor”) is classified as superscalar if it is capable of completing multiple instructions per clock cycle. The architecture of a superscalar processor utilizes multiple parallel processing units within the processor to allow completion of multiple instructions per clock cycle. These processing units generally include multiple execution units operating in parallel, a dispatch unit for sending instructions and data to the execution units, and rename buffers (rename registers) for preloading instructions for the execution units. These processing units may further include a completion unit containing a table (“completion table”) for tracking and retiring the instructions. For example, the completion unit may keep track of when instructions have been “completed.” An instruction may be said to be “completed” when it has been executed and is at a stage where no exception will cause the instruction to be reissued.
In a typical superscalar processor, multiple instructions are retrieved from an instruction cache and placed in a queue, commonly referred to as an instruction queue. After entering the instruction queue, instructions are issued to various execution units by the dispatch unit. Upon executing a received instruction, an execution unit may transmit an indication to the completion unit that the instruction has been executed. This information may be stored in the completion table. The completion unit then completes, or retires, the instruction and sends a completion signal to the remaining execution units, allowing write-back of finished data into architected registers.
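The dispatch, execute, and retire flow described above can be sketched as a small software model. This is only an illustration, not a description of any particular processor: the names `CompletionTable`, `Entry`, `dispatch`, `mark_finished`, and `retire` are hypothetical, and the sketch assumes that instructions retire in program order, which the text above does not state.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int                # hypothetical instruction identifier
    finished: bool = False  # set when an execution unit reports back


class CompletionTable:
    """Toy completion table: one entry per outstanding instruction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = deque()  # oldest instruction sits at the left

    def dispatch(self, tag: int) -> bool:
        """Allocate an entry; return False if the table is full."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(Entry(tag))
        return True

    def mark_finished(self, tag: int) -> None:
        """Record an execution unit's indication that `tag` has executed."""
        for entry in self.entries:
            if entry.tag == tag:
                entry.finished = True
                return
        raise KeyError(tag)

    def retire(self) -> list:
        """Retire the oldest finished instructions (assumed in-order)."""
        retired = []
        while self.entries and self.entries[0].finished:
            retired.append(self.entries.popleft().tag)
        return retired


table = CompletionTable(capacity=4)
for tag in range(3):
    table.dispatch(tag)
table.mark_finished(1)   # instruction 1 finishes out of order
print(table.retire())    # [] because instruction 0 is still outstanding
table.mark_finished(0)
print(table.retire())    # [0, 1]
```

In this model an entry is held from dispatch until retirement, so a long-running instruction at the head of the table keeps every younger entry allocated even after those instructions have executed.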
The size of the completion table corresponds to the number of outstanding instructions to be tracked. That is, the greater the number of outstanding instructions to be tracked, the greater the size of the completion table. However, the greater the size of the completion table, the more power is consumed and the more silicon area is used. Conversely, while a smaller completion table reduces silicon area and power, fewer outstanding instructions can be tracked, which may reduce performance.
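The performance side of this tradeoff can be illustrated with a toy cycle count. The function `cycles_to_retire` and all of its modeling choices are assumptions made for illustration: in-order dispatch of up to `dispatch_width` instructions per cycle, unlimited execution units, in-order retirement, and one table entry per instruction from dispatch to retirement. It does not model power or silicon area.

```python
def cycles_to_retire(latencies, table_size, dispatch_width=2):
    """Count cycles needed to retire every instruction in `latencies`.

    Toy model (assumptions): each instruction holds one completion
    table entry from dispatch until it retires in program order.
    """
    if table_size < 1:
        raise ValueError("table_size must be at least 1")
    n = len(latencies)
    finish_cycle = [None] * n
    next_dispatch = 0  # index of the next instruction to dispatch
    next_retire = 0    # index of the oldest unretired instruction
    cycle = 0
    while next_retire < n:
        cycle += 1
        # Retire the oldest instructions whose execution has finished.
        while (next_retire < next_dispatch
               and finish_cycle[next_retire] <= cycle):
            next_retire += 1
        # Dispatch while the table has free entries.
        dispatched = 0
        while (next_dispatch < n
               and dispatched < dispatch_width
               and next_dispatch - next_retire < table_size):
            finish_cycle[next_dispatch] = cycle + latencies[next_dispatch]
            next_dispatch += 1
            dispatched += 1
    return cycle


# One slow instruction followed by seven fast ones (made-up latencies).
program = [10, 1, 1, 1, 1, 1, 1, 1]
print(cycles_to_retire(program, table_size=8))  # 11
print(cycles_to_retire(program, table_size=2))  # 14
```

With eight entries the fast instructions execute while the slow one is still in flight; with two entries dispatch stalls until the slow instruction retires, which is the kind of performance loss a smaller table may cause.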
Therefore, there is a need in the art for a completion table that can track a larger number of outstanding instructions without an increase in its size.