In high-performance superscalar microprocessors, a decoded instruction cache is used to improve performance. This type of instruction cache improves the bandwidth, throughput, and latency of the “fetch” and “decode” portions of the microprocessor by quickly sending packets of decoded macroinstructions (called micro-operations) into the core of the microprocessor. At the end of the pipeline that fetches and decodes macroinstructions, the micro-operations are typically assembled into packets and written into the decoded cache on their way into an allocation pipeline.
Because branch prediction is critical to microprocessor performance, the use of a decoded instruction cache typically requires a branch prediction mechanism capable of interfacing with the decoded nature of the cache. This is especially complex in x86 microprocessors, developed by Intel Corporation of Santa Clara, Calif., because the macroinstructions vary in length and because, owing to the complex instruction set, each macroinstruction is usually represented by a variable number of micro-operations.
Due to aggressive pipelining and the need to provide quick predictions, a branch predictor used in such a machine may be required to provide branch predictions, and to act upon them, without being able to verify that a given prediction is actually meant for the cache line being fetched. The prediction may instead have been meant for an older cache line that was mapped to the same position.
Typically, these problems arise when lines in the decoded instruction cache that had active branch predictions are replaced. The prediction entries become stale once the lines for which they were meant to predict are removed. Invalid control speculation of this type has serious performance implications, and a mechanism is required to prevent it from happening too often. Current mechanisms that remove stale prediction information do so at the end of the microprocessor pipeline, based on post-retirement information. Removing stale prediction information at this stage may be unreliable in some instances.
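By way of illustration only, the stale-prediction problem described above may be modeled with the following minimal Python sketch. It is not a description of any actual hardware. The direct-mapped organization, the four-entry size, the line addresses, and the names `DecodedCache`, `TaglessPredictor`, `fill`, `train`, and `predict` are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional

NUM_SETS = 4  # hypothetical size, kept tiny for illustration


@dataclass
class CacheLine:
    tag: int              # upper address bits identifying the line
    micro_ops: List[str]  # decoded micro-operations held in the line


@dataclass
class Prediction:
    taken: bool
    target: int           # predicted next fetch address


class DecodedCache:
    """Assumed direct-mapped decoded instruction cache (one line per position)."""

    def __init__(self) -> None:
        self.lines: List[Optional[CacheLine]] = [None] * NUM_SETS

    def fill(self, address: int, micro_ops: List[str]) -> None:
        index, tag = address % NUM_SETS, address // NUM_SETS
        # Writing a new line silently replaces whatever older line was here.
        self.lines[index] = CacheLine(tag, micro_ops)


class TaglessPredictor:
    """Assumed predictor indexed by cache position only; it stores no tag,
    so it cannot verify which line a prediction was trained for."""

    def __init__(self) -> None:
        self.entries: List[Optional[Prediction]] = [None] * NUM_SETS

    def train(self, address: int, taken: bool, target: int) -> None:
        self.entries[address % NUM_SETS] = Prediction(taken, target)

    def predict(self, address: int) -> Optional[Prediction]:
        return self.entries[address % NUM_SETS]


cache = DecodedCache()
predictor = TaglessPredictor()

# Two hypothetical line addresses that map to the same position (index 0).
old_line, new_line = 0x10, 0x24

# The old line ends in a branch, and the predictor learns a prediction for it.
cache.fill(old_line, ["load", "cmp", "jcc"])
predictor.train(old_line, taken=True, target=0x80)

# The old line is replaced by a line that contains no branch at all.
cache.fill(new_line, ["add", "store"])

# The predictor entry was never removed, so the new line inherits it.
stale = predictor.predict(new_line)
```

In this sketch, `stale` is the prediction that was trained for the replaced line. A fetch of the new line would therefore be redirected to `0x80` even though the new line contains no branch, which is the invalid control speculation discussed above.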