Processors used in digital computer systems process data in accordance with a series of instructions which comprise an instruction stream. Typically, each instruction will contain one or more elements, which can include, for example, an operation code that identifies the particular operation to be performed, one or more source operand specifier(s) which identify storage locations in the respective digital computer system which contain the data, or "operands," on which the operation is to be performed, and a destination operand specifier which identifies a storage location in the system in which the processed data is to be stored. Some types of instructions may only comprise operation codes, particularly if they are provided to control the sequence of operations in the instruction stream, while other types of instructions also include the source and destination operand specifiers.
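By way of illustration only, the elements of an instruction described above can be sketched as a simple record. The class name `Instruction`, the mnemonics `ADD` and `NOP`, and the register names are hypothetical and are not drawn from any particular architecture.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instruction:
    """Hypothetical instruction record; field names are illustrative only."""
    opcode: str                     # operation code: which operation to perform
    sources: Tuple[str, ...] = ()   # source operand specifiers (storage locations)
    dest: Optional[str] = None      # destination operand specifier, if any


def has_operands(instr: Instruction) -> bool:
    """True if the instruction carries any source or destination specifier."""
    return bool(instr.sources) or instr.dest is not None


# An instruction with two source operand specifiers and one destination.
add = Instruction("ADD", ("r1", "r2"), "r3")

# An instruction that comprises only an operation code.
nop = Instruction("NOP")
```

In this sketch, `has_operands(add)` is true and `has_operands(nop)` is false, which mirrors the distinction drawn above between instructions that include operand specifiers and those that comprise only an operation code.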
Typically a processor performs a series of general phases in connection with processing of an individual instruction including:
(i) decoding the instruction's operation code to determine the type of operation to be performed, and coincidentally to identify the number of operands, if any;
(ii) if the instruction requires operands, retrieving the operands from the storage locations identified by the instruction;
(iii) performing the operation required by the instruction; and
(iv) storing the result in the storage location identified by the instruction.

A processor could perform each of the above-identified phases in series for each successive instruction in the instruction stream in series, with very little if any overlap. In that case, for example, the processor would begin decoding the operation code of instruction (s+1) of the instruction stream (phase (i) above) after the result generated for instruction (s) of the instruction stream (phase (iv) above) has been stored.

Each of the phases (i) through (iv) can generally be performed by different circuit elements in a processor, and so so-called "pipelined" processors were developed in which each of the phases could be performed concurrently with the other phases, but for different instructions in the instruction stream. See, for example, Peter M. Kogge, The Architecture Of Pipelined Computers (McGraw-Hill Book Company, 1981) (hereinafter, "Kogge"). A pipelined processor may execute successive instructions in an instruction stream, in successive phases (i) through (iv), such that, while, for example, the processor is storing the result (phase (iv) above) of instruction (s), it will concurrently be
(a) performing the operation required by instruction (s+1) in connection with its operands (phase (iii) above),
(b) retrieving the operands required by instruction (s+2) (phase (ii) above), and
(c) decoding the operation code of instruction (s+3) (phase (i) above).

(i) assignment of instructions based on an explicit encoding of a trap barrier value which is contained in the respective instructions;
(ii) assignment of instructions based on selected resource(s) of the processor which is or are used in their execution, and
(iii) assignment of instructions based on where they are located in the instruction stream in relation to the series of partial trap barrier instructions in the instruction stream.
It will be appreciated that, if a sequence of four instructions (s) through (s+3) in an instruction stream can be processed concurrently in this manner, the instructions can be processed in seven time steps rather than the sixteen time steps that would be necessary in a non-pipelined processor. In many cases, sequences of instructions from an instruction stream can be executed in this manner, which can lead to substantially reduced processing time and increased throughput. As described in Kogge, processors having a variety of other, more complex pipelines have been developed.
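The seven-versus-sixteen comparison can be reproduced with a short sketch. It assumes, for illustration only, that each of the four phases occupies exactly one time step and that the pipeline never stalls; the function names and phase labels are hypothetical.

```python
# The four general phases, in the order an instruction passes through them.
PHASES = ("decode", "fetch operands", "execute", "store result")


def sequential_steps(n_instructions, n_phases=len(PHASES)):
    """Time steps when every phase of one instruction finishes before the next starts."""
    return n_instructions * n_phases


def pipelined_steps(n_instructions, n_phases=len(PHASES)):
    """Time steps when a new instruction enters the pipeline on every step."""
    if n_instructions == 0:
        return 0
    return n_phases + n_instructions - 1


def schedule(n_instructions):
    """For each time step, map instruction offset (0 for s, 1 for s+1, ...) to its phase."""
    steps = []
    for t in range(pipelined_steps(n_instructions)):
        active = {}
        for i in range(n_instructions):
            phase_index = t - i  # instruction i enters the pipeline at step i
            if 0 <= phase_index < len(PHASES):
                active[i] = PHASES[phase_index]
        steps.append(active)
    return steps


print(sequential_steps(4))  # 16
print(pipelined_steps(4))   # 7
print(schedule(4)[3])
# {0: 'store result', 1: 'execute', 2: 'fetch operands', 3: 'decode'}
```

Under these assumptions, the fourth time step is the one in which the result of instruction (s) is being stored while instructions (s+1) through (s+3) occupy the execute, operand retrieval and decode phases, respectively.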
Problems can arise, however, in connection with pipelined execution of instructions. During processing, unusual conditions variously known as "faults," "traps" or "exceptions" (generally "exceptions") may be encountered which need to be handled. A number of types of exceptions may arise during processing. The specific types of exceptions are generally determined by the particular architecture which defines the operation of a particular type of processor; the types of exceptions that are provided for one particular type of microprocessor, which is constructed in accordance with the SPARC Version 9 architecture, are described in SPARC International, Inc. [David L. Weaver and Tom Germond (eds)], The SPARC Architecture Manual Version 9 (Prentice-Hall, 1994) (hereinafter referred to as "the SPARC Architecture Manual, Version 9"), chapter 7.
When a processor detects an exception in connection with processing of an instruction, it calls an exception handler to process the exception, that is, to perform selected operations as required by the exception condition. Two general methodologies have been developed for handling exception conditions. In one methodology, which is representative of computers whose processors follow a "precise" exception handling model, if an exception condition is detected during processing of an instruction, the exception handler is invoked immediately following operations performed for the instruction. On the other hand, in a second methodology, which is representative of processors whose architectures specify a "delayed" exception handling model, if an exception is detected during processing of an instruction, the exception handler is not invoked until some point after the processor has sequenced to processing an instruction after the instruction for which the exception was indicated. Some architectures make use of both the precise and delayed exception handling models for different types of exceptions.
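The difference between the two models can be sketched as an ordering of events. The following is a simplified illustration only: the function `run`, the instruction names, and the fixed `delay` of two instructions are assumptions made for the example, since the actual point at which a delayed exception handler is invoked depends on the architecture.

```python
def run(program, faulting_index, model, delay=2):
    """Return the order of events when one instruction raises an exception.

    model is "precise" or "delayed". `delay` is a hypothetical number of
    later instructions processed before a delayed handler is invoked.
    """
    trace = []
    invoke_at = None  # index of the instruction after which the handler runs
    for i, name in enumerate(program):
        trace.append(f"execute {name}")
        if i == faulting_index:
            if model == "precise":
                invoke_at = i  # handler follows the faulting instruction directly
            else:
                invoke_at = min(i + delay, len(program) - 1)
        if invoke_at is not None and i == invoke_at:
            trace.append(f"handler for {program[faulting_index]}")
            invoke_at = None
    return trace


program = ["i0", "i1", "i2", "i3"]
print(run(program, 1, "precise"))
# ['execute i0', 'execute i1', 'handler for i1', 'execute i2', 'execute i3']
print(run(program, 1, "delayed"))
# ['execute i0', 'execute i1', 'execute i2', 'execute i3', 'handler for i1']
```

In the delayed case the handler runs only after later instructions have been processed, which is why the status information for the faulting instruction must survive that interval.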
In both methodologies, the processor will generally need to retain certain exception status information, perhaps for some time, after the instruction for which an exception condition is detected so that, when the exception handler is invoked, it has the information which it needs to process the exception. One benefit of handling exception conditions according to the precise exception handling model is that, since the exception handler is processed immediately after the processing of the instruction which gave rise to the exception condition, the exception status information needed by the exception handler will be directly available and need not be saved beyond processing for the program instruction which gave rise to the exception condition.
With exceptions handled according to the delayed exception handling model, however, the processor will generally need to ensure that certain exception status information is retained, perhaps for some time, after the floating point instruction for which an exception condition is detected so that, if the exception handler is eventually invoked, it has the information which it needs to process the exception. An instruction which can cause an exception condition that is handled according to the delayed exception handling model casts a "trap shadow" over subsequent instructions in the instruction stream, the trap shadow representing the time from the execution phase of the instruction to the point by which an exception condition, if one is to occur in connection with processing of the instruction, will actually have occurred. Otherwise stated, if an exception condition is detected in connection with processing of an instruction, the exception condition will have been detected by the end of the trap shadow. Different types of instructions can have trap shadows of different lengths, but generally the length of an instruction's trap shadow can be predicted a priori based on the type of processing operation to be performed while the instruction is being executed.
To accommodate pipelining in such an environment, the programmer (in the case of a program written in assembly language) or compiler (in the case of a program written in a high-level language) will need to ensure that instructions in the instruction stream which follow an instruction which may cause a trap will not use the same resources of the processor as those which are used to retain the exception status information. Since the actual number of instructions which may fall into the trap shadow can be difficult to predict, some architectures provide for a particular instruction, called a "trap barrier" instruction, which serves to cut off trap shadows. The assembly language programmer (if the program is being written in the assembly language of the particular processor on which the program is to be processed) or the compiler (if the program is being written in a high-level language) can insert the trap barrier instruction into the instruction stream at some point after respective instructions which can result in exception conditions. In particular, if execution of an instruction (s) can result in an exception condition, the trap barrier instruction can be inserted into the instruction stream at instruction (s+b) (b=1, 2, . . .). The trap barrier instruction effectively stalls processing of instructions in the pipeline after the trap barrier instruction until the processor has completed processing of the instructions preceding the trap barrier instruction, at least to a point at which it (that is, the processor) can determine whether an exception condition can be detected in connection with processing of instructions preceding the trap barrier instruction, that is, until the end of the trap shadows of all of the instructions preceding the trap barrier instruction in the pipeline. When a trap barrier instruction is provided in an instruction stream, it essentially cuts off trap shadows for all of the instructions preceding it in the instruction stream.
However, since each trap barrier instruction in the instruction stream also "stalls" the pipeline whether or not an exception condition is detected, such instructions can result in a significant decrease in the processing performance that may otherwise be provided by the pipelining of the processor.
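The cost of such a stall can be illustrated with a simplified timing model. The sketch assumes in-order issue of one instruction per cycle, a trap shadow measured in cycles after issue, and a barrier that holds back later instructions until every earlier trap shadow has ended. The mnemonics `TRAPB`, `FDIV` and `ADD`, and the shadow length of five cycles, are placeholders chosen for the example and do not describe any particular processor.

```python
BARRIER = "TRAPB"  # placeholder mnemonic for a trap barrier instruction


def issue_cycles(stream, shadow):
    """Return the cycle at which each instruction in `stream` issues.

    stream: list of opcode names, in program order.
    shadow: dict mapping opcode -> trap shadow length in cycles (default 0).
    """
    cycles = []
    next_cycle = 0
    shadow_free = 0  # first cycle by which all earlier trap shadows have ended
    for op in stream:
        cycles.append(next_cycle)
        if op == BARRIER:
            # Later instructions wait until every preceding shadow has ended,
            # whether or not any exception condition is actually detected.
            next_cycle = max(next_cycle + 1, shadow_free)
        else:
            shadow_free = max(shadow_free, next_cycle + shadow.get(op, 0) + 1)
            next_cycle += 1
    return cycles


shadows = {"FDIV": 5}  # hypothetical: a divide casts a five-cycle trap shadow

print(issue_cycles(["FDIV", "ADD", "ADD"], shadows))           # [0, 1, 2]
print(issue_cycles(["FDIV", "ADD", "TRAPB", "ADD"], shadows))  # [0, 1, 2, 6]
```

In this model the final `ADD` issues at cycle 6 rather than cycle 3 when the barrier is present (the barrier itself occupies cycle 2), so the barrier costs three stall cycles even though no exception occurs.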