Computer processors contain arithmetic, logic, and control circuitry that interpret and execute instructions from a computer program. In the pursuit of improving processor performance, designers have pursued two main goals: making operations faster and executing more operations in parallel. Making operations faster can be approached in several ways. For example, transistors can be made to switch faster, and thus propagate signals faster, by improving semiconductor processes; execution-unit latency can be reduced by increasing the number of transistors in the design; and the levels of logic required by the design to implement a given function can be minimized to increase speed. To execute more operations in parallel, designers mainly rely on pipelining, superscalar techniques, or a combination of the two. Pipelined processors overlap instructions in time on common execution resources. Superscalar processors overlap instructions in space on separate resources.
Pipeline stalls are a major inhibitor of parallel-processing performance. Stalls arise from data dependencies, changes in program flow, and hardware resource conflicts. At times, pipeline stalls can be avoided by rearranging the order of execution for a set of instructions. Compilers can be used to statically reschedule instructions; however, incomplete knowledge of run-time information reduces the effectiveness of static rescheduling. In-order processors, i.e., processors that issue, execute, complete, and retire instructions in strict program order, have to rely entirely on static rescheduling and thus are prone to pipeline stalls.
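The effect of rescheduling on data-dependency stalls can be illustrated with a minimal sketch. It assumes a simplified single-issue, in-order model in which an instruction stalls until all of its source registers are ready; the `Instr` type, the register names, and the latencies are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    name: str
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers
    latency: int            # cycles until the result is available (assumed)


def count_stalls(program):
    """Issue one instruction per cycle, in order; stall while a source is not ready."""
    ready_at = {}  # register -> cycle at which its value becomes available
    cycle = 0
    stalls = 0
    for ins in program:
        # The instruction cannot issue before every source operand is ready.
        earliest = max([ready_at.get(r, 0) for r in ins.srcs] + [cycle])
        stalls += earliest - cycle
        cycle = earliest
        ready_at[ins.dest] = cycle + ins.latency
        cycle += 1
    return stalls


load = Instr("load", "r1", ("r9",), 3)
add = Instr("add", "r2", ("r1", "r0"), 1)   # depends on the load result
mul = Instr("mul", "r3", ("r4", "r5"), 1)   # independent
sub = Instr("sub", "r6", ("r7", "r8"), 1)   # independent

original = [load, add, mul, sub]
reordered = [load, mul, sub, add]  # independent work fills the load latency

print(count_stalls(original), count_stalls(reordered))
```

Under these assumed latencies, the original order incurs two stall cycles and the reordered sequence incurs none, because the independent instructions occupy the cycles in which the dependent add would otherwise wait.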
As a result, designers use out-of-order processors and seek to implement dynamic instruction rescheduling. The simplest out-of-order processors issue instructions in order but allow them to execute and complete out of order. Even these simple out-of-order processors require complex hardware to reorder results before the corresponding instructions are retired. A strict result order is not required from a data-flow perspective; however, such ordering is necessary to maintain precise exceptions and to recover from mispredicted speculative execution.
A well-known method of reordering is through the use of a reorder buffer, i.e., a buffer that holds results until they are written to the register file in program order. Designers also use other types of reordering hardware, such as history buffers and future files. History buffers record source-operand history so that the processor can backtrack to a precise architectural state, whereas future files store the current state and the architectural state in separate register files, allowing the processor to be restored to a precise checkpoint state.
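The reorder buffer described above can be sketched as a queue in which results may arrive in any order but are written to the register file only from the oldest entry onward. The class and method names below are hypothetical, and the sketch omits capacity limits, exceptions, and speculation recovery.

```python
from collections import deque


class ReorderBuffer:
    """Minimal reorder buffer: complete out of order, retire in program order."""

    def __init__(self):
        self.entries = deque()    # oldest instruction at the left
        self.register_file = {}   # architectural state, updated only at retirement

    def issue(self, tag, dest):
        # Allocate an entry in program order.
        self.entries.append({"tag": tag, "dest": dest, "value": None, "done": False})

    def complete(self, tag, value):
        # Record a result; this may happen in any order.
        for entry in self.entries:
            if entry["tag"] == tag:
                entry["value"] = value
                entry["done"] = True
                return
        raise KeyError(tag)

    def retire(self):
        # Write back results only while the oldest entry has completed.
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.register_file[entry["dest"]] = entry["value"]
            retired.append(entry["tag"])
        return retired


rob = ReorderBuffer()
for tag, dest in [("i0", "r1"), ("i1", "r2"), ("i2", "r3")]:
    rob.issue(tag, dest)

rob.complete("i2", 30)   # youngest instruction finishes first
print(rob.retire())      # nothing retires: i0 is still outstanding
rob.complete("i0", 10)
print(rob.retire())
rob.complete("i1", 20)
print(rob.retire())
```

In this sketch, the result of `i2` stays in the buffer until `i0` and `i1` have retired, so the register file never reflects a younger instruction ahead of an older one.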
Branch prediction and speculative execution are additional techniques used to reduce pipeline stalls. In a pipelined processor, the outcomes of conditional branches are often determined after fetching subsequent instructions. Thus, if the correct direction of the unresolved branch can be predicted, the instruction queue can be kept full of instructions that have a high probability of being used. In some processors, instructions are actually executed speculatively beyond unresolved conditional branches. This technique completely avoids pipeline stalls when the branch proceeds in the predicted direction. On the other hand, if the branch direction is mispredicted, the pipeline must be flushed, instruction fetch redirected, and the pipeline refilled.
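The text above does not specify a particular prediction scheme. As one illustration only, the following sketch uses a two-bit saturating counter per branch, a common textbook scheme, and counts how often a flush and refill would be needed. The class name, the initial counter value, and the branch address are assumptions.

```python
class TwoBitPredictor:
    """Two-bit saturating counter per branch address (illustrative scheme)."""

    def __init__(self):
        self.counters = {}  # branch address -> counter in the range 0..3

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        # Unseen branches start at 1 (weakly not taken), an assumed default.
        return self.counters.get(pc, 1) >= 2

    def update(self, pc, taken):
        counter = self.counters.get(pc, 1)
        self.counters[pc] = min(counter + 1, 3) if taken else max(counter - 1, 0)


def count_mispredictions(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1  # a real pipeline would be flushed and refilled here
        predictor.update(pc, taken)
    return misses


# A loop branch that is taken nine times and then falls through.
loop_outcomes = [True] * 9 + [False]
predictor = TwoBitPredictor()
print(count_mispredictions(predictor, 0x400, loop_outcomes))
```

With the assumed starting state, the first pass over the loop mispredicts the first iteration and the loop exit; the remaining eight branches proceed in the predicted direction and would incur no stall.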
It is also important to effectively process and handle instruction exceptions, i.e., events that suspend normal processing for a given instruction. When an instruction exception is encountered, the flow of control is temporarily diverted through a trap handler. A trap handler is a routine that investigates the cause of the exception and completes any processes necessary to discharge the exception. Generally, processors store certain information required by trap handlers, such as the current state of running programs and identification of the source of the exception.
Instruction exceptions are fairly rare; however, when one is encountered, preserving the state of the processor precisely as it was before the instruction executed is extremely useful. Such precise exceptions allow for easier diagnosis of exceptions by trap handlers. However, achieving precise exceptions without slowing down the common case where no exceptions are encountered is a difficult task. This is particularly true in an out-of-order processor, where an instruction generating an exception may be issued and executed when the instruction is very young, i.e., fetched more recently than other given instructions.
Referring to FIG. 1, a typical computer system includes a microprocessor (10) having, among other things, a CPU (12), a load/store unit (14), and an on-board cache memory (16). The microprocessor (10) is connected to external cache memory (17) and a main memory (18) that both hold data and program instructions to be executed by the microprocessor (10). Internally, the execution of program instructions is carried out by the CPU (12). Data needed by the CPU (12) to carry out an instruction are fetched by the load/store unit (14) and loaded into internal registers (15) of the CPU (12). A memory queue (not shown) maintains a list of outstanding memory requests. The load/store unit (14) adds requests into the memory queue and also loads registers with values from the memory queue. Upon command from the CPU (12), the load/store unit (14) searches for the data first in the fast on-board cache memory (16), then in external cache memory (17), and finally in the slow main memory (18). Finding the data in the cache memory is referred to as a "hit." Not finding the data in the cache memory is referred to as a "miss."
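The search order followed by the load/store unit (14) can be sketched as follows. The sketch models each memory level as a dictionary keyed by address, ignores capacity and eviction, and assumes that a value found at a slower level is copied into the faster levels; that fill policy and the function name are assumptions, not details given above.

```python
def load(address, on_board_cache, external_cache, main_memory):
    """Search the fastest level first and return (value, outcome)."""
    if address in on_board_cache:
        return on_board_cache[address], "on-board cache hit"
    if address in external_cache:
        value = external_cache[address]
        on_board_cache[address] = value   # assumed fill policy
        return value, "external cache hit"
    # Missed in both caches: fall back to the slow main memory.
    value = main_memory[address]
    external_cache[address] = value       # assumed fill policy
    on_board_cache[address] = value
    return value, "miss"


on_board, external = {}, {}
memory = {0x10: 42}

print(load(0x10, on_board, external, memory))  # first access goes to main memory
print(load(0x10, on_board, external, memory))  # second access is served on board
```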
In one aspect, a method of handling an exception in a processor comprises setting a state upon detection of an exception, signaling a trap for the exception if the state is set, and based on a class of the exception, processing the exception differently before signaling the trap. The method may comprise replaying an instruction causing the exception before signaling the trap for the exception based on the class of the exception. The method may comprise replaying the instruction causing the exception after the instruction causing the exception becomes an oldest, unretired instruction. The method may comprise signaling the trap for the exception after an instruction causing the exception becomes an oldest, unretired instruction. The method may comprise marking an instruction causing the exception as complete without issuing the instruction causing the exception.
In one aspect, an apparatus for handling exceptions in a processor comprises an instruction scheduler for setting a state upon detection of an exception and signaling a trap for the exception if the state is set. The instruction scheduler, based on a class of the exception, processes the exception differently before signaling the trap. The instruction scheduler, based on the class of the exception, may replay an instruction causing the exception before signaling the trap for the exception. The instruction scheduler may replay an instruction causing the exception after the instruction causing the exception becomes an oldest, unretired instruction. The instruction scheduler may signal the trap for the exception after an instruction causing the exception becomes an oldest, unretired instruction. The instruction scheduler may mark an instruction causing the exception as complete without issuing the instruction causing the exception.
In one aspect, an apparatus for handling exceptions in a processor comprises means for setting a state upon detection of an exception, means for signaling a trap for the exception when the state is set, and means for processing the exception differently based on a class of the exception before the signaling of the trap.
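The aspects above can be illustrated with a minimal software sketch of a scheduler that sets a state when an exception is detected, waits until the offending instruction is the oldest unretired instruction, and then processes the exception according to its class before signaling the trap. The class names in `ExcClass`, the `Scheduler` type, and the log strings are hypothetical; the sketch is not a description of actual hardware.

```python
from collections import deque
from enum import Enum, auto


class ExcClass(Enum):
    # Hypothetical exception classes, named only for this illustration.
    REPLAY_FIRST = auto()       # replay the instruction, then signal the trap
    TRAP_WHEN_OLDEST = auto()   # signal the trap once the instruction is oldest
    COMPLETE_UNISSUED = auto()  # mark complete without issuing, trap when oldest


class Scheduler:
    def __init__(self, tags):
        self.window = deque(tags)  # unretired instructions, oldest first
        self.pending = {}          # tag -> ExcClass: the state set on detection
        self.log = []

    def detect(self, tag, exc_class):
        # Set the state; the trap itself is deferred.
        self.pending[tag] = exc_class
        if exc_class is ExcClass.COMPLETE_UNISSUED:
            self.log.append(f"mark {tag} complete without issuing")

    def step(self):
        # Examine the oldest unretired instruction.
        tag = self.window[0]
        exc_class = self.pending.get(tag)
        if exc_class is None:
            self.window.popleft()
            self.log.append(f"retire {tag}")
            return "retire"
        if exc_class is ExcClass.REPLAY_FIRST:
            self.log.append(f"replay {tag}")
        self.log.append(f"trap {tag}")
        return "trap"


sched = Scheduler(["i0", "i1", "i2"])
sched.detect("i1", ExcClass.REPLAY_FIRST)  # exception found while i1 is still young
sched.step()                               # i0 retires normally
sched.step()                               # i1 is now oldest: replay, then trap
print(sched.log)
```

Deferring the trap until the instruction is the oldest unretired instruction means that every older instruction has already retired when the trap handler runs, which is what makes the exception precise in this sketch.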
Other aspects and advantages of the invention will be apparent from the following description and the appended claims.