1. Field of the Invention
The present invention pertains to the field of integrated circuits. More particularly, this invention relates to accelerating instruction restart in a microprocessor.
2. Background of the Related Art
In a micro-coded microprocessor, a set of macro-instruction bytes is decomposed into a sequence of micro-operations by an instruction decoder. These micro-operations flow through the processor and are eventually executed and retired. As the micro-operations flow through the machine and are executed, a micro-operation may be found to require special handling, separate from the normal micro-code flow. This event is referred to as an exception. The exception causes the processor to stall the normal micro-operation flow and to execute a micro-coded exception handler, which attempts to cure the situation that resulted in the exception. If the micro-coded exception handler is able to correct the situation, the macro-instruction must be restarted from the micro-operation following the excepting micro-operation. This situation is called instruction restart. In order for the processor to perform this task, the original instruction bytes must again be presented to the instruction decoder. Thus, the microprocessor must maintain the instruction bytes for any macro-instruction alive in the machine.
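By way of illustration only, the restart sequence described above may be sketched in software as follows. The function names, the one-micro-operation-per-byte decoding, and the single optional exception are assumptions made for the sketch and do not describe any particular decoder or exception handler.

```python
def decode(instr_bytes):
    """Hypothetical decoder: one micro-operation per instruction byte."""
    return [f"uop{i}:{b:02x}" for i, b in enumerate(instr_bytes)]


def run_macro_instruction(instr_bytes, faulting_index=None):
    """Execute a macro-instruction, restarting after one optional exception.

    Returns the ordered trace of events for inspection.
    """
    trace = []
    start = 0
    pending_fault = faulting_index
    while True:
        # On every (re)start the decoder needs the original instruction bytes.
        uops = decode(instr_bytes)
        restarted = False
        for i in range(start, len(uops)):
            if i == pending_fault:
                trace.append(f"exception at {uops[i]}")
                trace.append("handler")
                pending_fault = None   # assume the handler cured the condition
                start = i + 1          # resume after the excepting micro-op
                restarted = True
                break
            trace.append(f"execute {uops[i]}")
        if not restarted:
            return trace
```

In this sketch the restart is possible only because instr_bytes is still available when the handler returns, which corresponds to the requirement that the instruction bytes be maintained for any macro-instruction alive in the machine.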
Pipelined microprocessors pose particular challenges in the area of instruction restart. In a pipelined microprocessor, instruction processing functions such as fetch, decode, and execute are performed simultaneously. Each instruction executed by the microprocessor flows through the pipeline sequentially, for example from the fetch stage to the decode stage, then to the execute stage. This allows the microprocessor to execute several instructions at a time, with each instruction at a different stage in the pipeline. Also, microprocessors may include more than a single pipeline to further improve throughput. Greater instruction throughput can be achieved by increasing the depth of the pipeline, which means increasing the number of stages an instruction must flow through before being retired. This allows more instructions to be processed at one time. As the number of stages increases, each stage becomes simpler. Since each stage is less complex, each stage requires less time to complete, and the microprocessor's clock speed can be increased. This increases instruction throughput.
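The relationship between pipeline depth and clock speed may be illustrated with a simple model. The even division of logic delay among the stages, the optional per-stage latch overhead, and the retirement of one instruction per cycle are simplifying assumptions introduced here and are not taken from any particular design.

```python
def cycle_time(total_logic_delay, stages, latch_overhead=0.0):
    """Clock period when the logic is split evenly across the stages."""
    return total_logic_delay / stages + latch_overhead


def peak_throughput(total_logic_delay, stages, latch_overhead=0.0):
    """Instructions completed per unit time, assuming one retirement per cycle."""
    return 1.0 / cycle_time(total_logic_delay, stages, latch_overhead)
```

Under these assumptions, doubling the number of stages for the same total logic delay halves the cycle time and doubles the peak throughput when the latch overhead is zero.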
The challenges presented by deeply pipelined microprocessors in performing instruction restart include guaranteeing the maintenance of the required instructions and minimizing performance penalties. In a pipelined microprocessor, whenever an exception occurs the pipeline must be flushed and the instructions needed for instruction restart must be re-fetched before the restart can begin. In deeply pipelined processors this results in a significant performance penalty.
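The growth of this penalty with pipeline depth may be illustrated with a rough estimate. The sketch assumes, purely for illustration, that each exception costs one cycle per pipeline stage to flush and refill and that the pipeline otherwise retires one instruction per cycle; an actual penalty depends on the design.

```python
def cycles_per_instruction(pipeline_depth, exceptions_per_instruction):
    """Average cycles per instruction when each exception costs a full refill."""
    flush_penalty = pipeline_depth  # assumed: one cycle per stage to refill
    return 1.0 + exceptions_per_instruction * flush_penalty
```

For the same exception rate, the estimate rises linearly with the number of stages, which is why the penalty is most significant in deeply pipelined processors.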
A deeply pipelined microprocessor must maintain many instructions for instruction restart purposes. Instruction caches generally do not guarantee that the instructions will be maintained, due to replacement of cache lines, possible invalidation, and self-modifying code. Self-modifying code may allow the modification of instructions contained in a cache. Instruction restart requires the original instructions to be maintained. Methods for maintaining the required instructions include the use of prefetch buffers and specialized caches. These buffers and caches are positioned in the pipeline before the fetch stage. The prefetch buffer is used in microprocessors with very short pipelines, in which very few instructions are alive in the processor at any given time. Thus, little or no special handling is required to ensure the maintenance of the required instructions. For microprocessors with moderate pipeline depth, specialized caches are used. Replacement of lines in these caches is managed by the microarchitecture so that any instructions alive in the machine are guaranteed to be in the cache.
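One way such a replacement guarantee might be modeled is sketched below. The class name, the per-line count of live instructions, and the oldest-first choice of victim are hypothetical and serve only to illustrate the idea that a line holding an instruction alive in the machine is never replaced.

```python
from collections import OrderedDict


class PinnedLineCache:
    """Toy instruction cache that never evicts lines with live instructions."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # line address -> bytes, oldest first
        self.in_flight = {}          # line address -> count of live instructions

    def fetch(self, addr, memory):
        """Return the line at addr and record one more live instruction.

        Returns None (a stall) when every cached line is pinned.
        """
        if addr not in self.lines:
            if len(self.lines) >= self.capacity and not self._evict_one():
                return None
            self.lines[addr] = memory[addr]
        self.in_flight[addr] = self.in_flight.get(addr, 0) + 1
        return self.lines[addr]

    def retire(self, addr):
        """An instruction from this line has retired; unpin when none remain."""
        self.in_flight[addr] -= 1
        if self.in_flight[addr] == 0:
            del self.in_flight[addr]

    def _evict_one(self):
        """Evict the oldest line that holds no live instructions, if any."""
        for addr in list(self.lines):
            if addr not in self.in_flight:
                del self.lines[addr]
                return True
        return False
```

In this sketch a fetch stalls rather than displacing a pinned line, so the bytes of every instruction that has been fetched but not retired remain available for restart.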
These methods, in addition to being increasingly difficult to implement as pipeline depths increase, also share the disadvantage of imposing undesirable performance penalties due to the need to re-fetch instructions from either the pre-fetch queue or the specialized cache.