1. Field of the Invention
This invention relates to computer processor operation, and more particularly to a method for optimizing the ability of a pipelined processor to respond to Direct Memory Access (DMA) interrupts. Described herein are means for reducing the time required for the processor to service a DMA request (or other exceptions or interrupts), without adversely impacting instruction flow in the processor's pipeline.
2. Description of the Related Art
Although nominally a computational device, the central processing unit (CPU) in a computing system is typically charged with a variety of other tasks. In addition to strictly computational functions, the CPU may be required to handle input/output from peripheral devices, manage memory, etc. Many of these activities are driven by external events, which may occur randomly with respect to the sequence of operations being carried out by the CPU. It is important that these event-driven functions be performed expediently by the CPU, and with minimal disruption of its computational activities. Polling external inputs to detect whether the event in question has occurred is an obvious, but very inefficient, way of doing this. Polling refers to the option of simply adding instructions to the main program sequence of the CPU to periodically test all of the event-driven inputs. However, since polling diverts the CPU from its main computational task, it presents a dilemma. If polling is done too infrequently, latency in responding to external events may become intolerable. On the other hand, polling too frequently, while improving the ability of the CPU to respond to external events, may add excessive overhead to the computational task.
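The polling dilemma can be put in rough numbers. The following Python sketch is illustrative only: the function name, the cycle counts, and the simplifying assumption that an event arriving just after a poll waits one full polling interval are assumptions of this example, not part of the disclosure.

```python
def polling_tradeoff(poll_cost_cycles, interval_cycles):
    """Return (worst_case_latency_cycles, overhead_fraction) for periodic polling.

    Assumed model: one poll of all event-driven inputs costs
    `poll_cost_cycles`, and polls start every `interval_cycles`.
    """
    if poll_cost_cycles <= 0 or interval_cycles < poll_cost_cycles:
        raise ValueError("interval must be at least as long as one poll")
    # An event that arrives just after a poll is not seen until the next one.
    worst_case_latency = interval_cycles
    # Fraction of all CPU cycles spent polling rather than computing.
    overhead = poll_cost_cycles / interval_cycles
    return worst_case_latency, overhead


# Infrequent polling: low overhead, long worst-case latency.
print(polling_tradeoff(10, 1000))
# Frequent polling: short worst-case latency, high overhead.
print(polling_tradeoff(10, 100))
```

Under this model, shortening the interval by a factor of ten cuts the worst-case latency by the same factor but multiplies the overhead by ten, which is the trade-off described above.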
Interrupts provide a way out of this dilemma. An interrupt is a special type of input to the CPU. When an interrupt occurs, the CPU temporarily suspends whatever it is doing and executes special interrupt-related instructions in response to the external event responsible for the interrupt. The interrupt-related instructions are typically referred to as an Interrupt Service Routine (ISR), and may perform some function requested by an external device. For example, an interrupt from a keyboard can momentarily divert the processor from executing main program instructions to accept a typed character. An ISR is typically executed as promptly as possible after the interrupt is received. Prior to entering the ISR, the CPU makes preparations so that, upon completion of the ISR, it can resume the process that was suspended when the interrupt occurred. This may involve saving the current context (i.e., program counter, status register, etc.). The advantage of using interrupts is that no time is wasted in polling the external inputs, since the CPU is never diverted from its computational activities until an interrupt occurs. Furthermore, the worst-case response time to an external event is no longer based on the polling interval. The interval between the occurrence of an interrupt and the completion of the ISR (known as the interrupt latency) is now dependent on shorter times, such as the time required for the CPU to save the context.
An architectural feature of many modern CPUs is the instruction pipeline. A pipeline consists of a sequence of stages through which instructions pass as they are executed, with partial processing of an instruction being performed in each stage. Each instruction typically comprises an operator and one or more operands. The operator represents a code designating the particular operation to be performed (e.g., MOVE, ADD, etc.), and the operand denotes an address or data upon which the operation is to be performed. Execution of the instruction requires several steps; e.g., the instruction must be decoded, the addresses of the operands computed, the operands fetched, and the operation executed. In a non-pipelined processor, only one instruction is processed at a time. Therefore, the instruction rate is based on the time required to perform all of these separate steps. However, in a pipelined processor, the steps are performed concurrently on multiple instructions, as they advance through the pipeline. An example of this is shown in FIG. 1, for a four-stage pipeline. The processing sequence for each instruction is from top to bottom. Each stage of processing is assumed to require one clock cycle, and the clock cycles are represented as time steps T1-T6. Instruction I1 enters the first stage of the pipeline at time T1, where it is decoded. One clock cycle later, at time T2, instruction I1 advances to the second stage of the pipeline, where the addresses of its operands are computed; simultaneously, a second instruction I2 enters the first stage of the pipeline to be decoded. This process continues to time T4, where instruction I1 is finally executed. By time T5, instruction I1 has fallen out of the pipeline and instruction I2 is executed. Note that once the pipeline is full, an instruction emerges from the pipeline for each clock cycle, four times faster than if each instruction had to be completed before processing the next one.
In effect, the pipeline allows multiple instructions to be processed concurrently, and greatly enhances the bandwidth (i.e., instructions per second) of the CPU.
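The timing described for the four-stage pipeline can be reproduced with a short Python sketch. The stage names follow the four steps listed above (decode, address computation, operand fetch, execute); the function names and the ideal one-cycle-per-stage model with no stalls are assumptions made for illustration.

```python
STAGES = ["decode", "address", "fetch", "execute"]  # four stages, as in the text


def pipeline_schedule(num_instructions, num_cycles):
    """Map each time step T to the instruction occupying each stage.

    Ideal model: instruction Ik enters the first stage at time Tk and
    advances one stage per clock cycle. Empty stages are None.
    """
    schedule = {}
    for t in range(1, num_cycles + 1):
        row = []
        for stage_index in range(len(STAGES)):
            k = t - stage_index  # number of the instruction in this stage
            row.append(f"I{k}" if 1 <= k <= num_instructions else None)
        schedule[t] = row
    return schedule


def total_cycles(num_instructions, pipelined):
    """Clock cycles needed to execute all instructions in this model."""
    depth = len(STAGES)
    if pipelined:
        return depth + num_instructions - 1  # fill once, then one per cycle
    return depth * num_instructions  # each instruction takes all four steps


for t, row in pipeline_schedule(3, 6).items():
    print(f"T{t}:", dict(zip(STAGES, row)))
```

In this model I1 occupies the execute stage at T4 and I2 at T5. For 100 instructions, `total_cycles` gives 103 cycles pipelined against 400 non-pipelined, which approaches the fourfold gain as the instruction count grows.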
To operate efficiently, a pipeline must remain full to the extent possible. Anything that disrupts the flow of instructions into and out of the pipeline negates its benefits and diminishes bandwidth. In particular, if it becomes necessary to empty and refill the pipeline very frequently, performance may begin to approach that of a non-pipelined processor. This can potentially occur with an interrupt. As stated above, it is usually desirable to allow an interrupt to preempt the processor. To promptly respond to an interrupt, a pipelined processor typically discards unexecuted instructions from its pipeline, and then refills the pipeline as quickly as possible with the instructions required to service the interrupt (i.e., the ISR). After servicing the interrupt, the pipeline has to be refilled with the main program instructions that were pending when the interrupt took place. Obviously, emptying and refilling the pipeline reduces processor bandwidth. Moreover, the time required to refill the pipeline prior to executing the ISR adds to the interrupt latency.
Direct Memory Access (DMA) transfers are a type of external event capable of interrupting a CPU. A DMA transfer is typically used to move a large amount of data into or out of memory (e.g., when an image file is read from a hard disk into memory). It may be inefficient for the CPU to directly transfer blocks of data, so a special DMA memory controller typically manages the transaction. To initiate a DMA transfer, the controller interrupts the CPU. In response, the CPU gives the controller a few key parameters, such as a target address, size of the data block, etc., and allows it to carry out the data transfer. Although the DMA controller relieves the processor of having to oversee the mass data transfer, the DMA interrupt still disrupts the instruction pipeline, as described in the preceding paragraph, resulting in a loss of efficiency. In systems in which there is a great deal of DMA activity, the impact on latency and bandwidth may be significant. Efficient handling of DMA interrupts may therefore be an important factor in overall system performance in applications such as graphics processing, for example.
For a high-performance pipelined CPU, it would be desirable to avoid the above-mentioned disadvantages associated with responding to a DMA interrupt. It would be beneficial, in particular, to minimize the loss in CPU bandwidth and the increased interrupt latency that result from having to empty and refill the pipeline to service the interrupt. It would be especially desirable if this could be accomplished in a straightforward manner, without extensively modifying the CPU.
The problems outlined above are in large part solved by a method for minimizing latency and loss of processor bandwidth in a pipelined processor when responding to an interrupt. The method advantageously avoids emptying and refilling the processor's instruction pipeline in order to service an interrupt request. Instead, a short sequence of instructions comprising the interrupt response is inserted into the pipeline. Normal pipeline operation stalls while the inserted instructions execute, but since flow is not disrupted the loss in bandwidth is not as great as if the pipeline were flushed. Furthermore, direct insertion of the instructions into the pipeline avoids the need for the processor to save its context and branch to an interrupt service routine in memory; this results in much faster response in servicing the interrupt, thereby reducing latency.
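The difference between the two approaches can be estimated with a simple cycle count. The Python sketch below rests on a model that is assumed here for illustration and is not taken from the disclosure: one cycle per stage, the interrupt taken in cycle 1, a flushed pipeline refilled with main program instructions only after the last ISR instruction has executed, and context save and restore time ignored (which favors the flush case). Function and parameter names are hypothetical.

```python
def flush_model(depth, n_isr):
    """Flush-and-refill: return (cycle of first ISR execute, main slots lost).

    Assumed: the first ISR instruction enters stage 0 in cycle 1, and the
    main program re-enters stage 0 the cycle after the ISR finishes.
    """
    first_isr_executes = depth
    last_isr_executes = first_isr_executes + n_isr - 1
    main_resumes = last_isr_executes + depth
    # Every cycle before main_resumes executes no main program instruction.
    return first_isr_executes, main_resumes - 1


def insert_model(depth, n_isr, insert_stage):
    """Direct insertion at 0-based `insert_stage` with upstream stages stalled.

    Returns (cycle of first inserted execute, main slots lost).
    """
    if not 1 <= insert_stage < depth:
        raise ValueError("insert_stage must follow at least one frozen stage")
    first_inserted_executes = depth - insert_stage
    # The main program loses one execute slot per inserted instruction.
    return first_inserted_executes, n_isr


print(flush_model(4, 2))      # (4, 8)
print(insert_model(4, 2, 2))  # (2, 2)
```

For a four-stage pipeline and a two-instruction interrupt response, this model gives a first response after 2 cycles rather than 4, and 2 lost main program slots rather than 8. The exact figures depend on the assumptions above; the point is that the insertion cost scales with the number of inserted instructions rather than with the pipeline depth.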
In a preferred embodiment, the method applies to a pipelined processor having a RISC architecture. A RISC (Reduced Instruction Set Computer) is a type of processor that is designed to perform a smaller number of types of computer instructions so that it can operate at a higher speed. In an embodiment, the processor receives interrupt requests from one or more DMA memory controllers, and the instructions inserted into the pipeline compute block address information for a DMA transfer.
A method is presented herein for servicing an interrupt in a pipelined processor, including generating one or more interrupt-related instructions within the processor in response to the interrupt and inserting the interrupt-related instructions into the pipeline of the processor for execution. These interrupt-related instructions generated within the processor may constitute the entire interrupt service routine, or alternatively, a portion of the interrupt service routine. Main program instructions which may be present in the instruction pipeline of the processor prior to receiving the interrupt are retained when the interrupt-related instructions are inserted. Normal operation of the pipeline may be resumed subsequent to execution of the interrupt-related instructions, beginning with execution of any main program instructions retained in the pipeline at the time of the interrupt.
In an embodiment, the interrupt-related instructions compute address information for a DMA request from a memory channel. In such embodiments, the interrupt-related instructions may send the contents of an address register to a data bus, compute a new address, and then store that address in the address register. A count register may also be decremented each time an interrupt is serviced, to avoid transferring more than a predetermined maximum number of data blocks. The DMA request in an embodiment is a block address request (BARq) from a memory channel, and is assigned the highest available interrupt priority. Alternatively, the request may be one of a plurality of BARq interrupts, each of which is assigned a different priority higher than that of other types of interrupt.
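The effect of the inserted instructions on the address and count registers might be modeled as follows. This Python sketch is a behavioral illustration only: the class and function names, the fixed `block_size` stride used to compute the new address, and the choice to refuse service once the count reaches zero are assumptions, since the text above does not specify them.

```python
from dataclasses import dataclass


@dataclass
class DmaChannel:
    address: int     # address register: next block address to send
    count: int       # count register: blocks still allowed to transfer
    block_size: int  # hypothetical stride between consecutive blocks


def service_barq(channel, data_bus):
    """Model one block address request (BARq) on a channel.

    Returns True if an address was sent, False if the count is exhausted.
    `data_bus` is a list standing in for the data bus.
    """
    if channel.count <= 0:
        return False  # maximum number of blocks already transferred
    data_bus.append(channel.address)        # send address register to the bus
    channel.address += channel.block_size   # compute and store the new address
    channel.count -= 1                      # decrement the count register
    return True


channel = DmaChannel(address=0x1000, count=2, block_size=0x40)
bus = []
results = [service_barq(channel, bus) for _ in range(3)]
print(results, [hex(a) for a in bus], hex(channel.address))
```

With a count of 2, the third request is refused, so no more than the predetermined number of blocks is addressed.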
In another embodiment of the method for servicing an interrupt in a pipelined processor, a stage of the pipeline is frozen in response to the interrupt, and one or more interrupt-related instructions are inserted into a stage of the pipeline succeeding the frozen stage. In this embodiment, main program instructions may pass through the pipeline to be executed, prior to the processor's receiving the interrupt. While the pipeline is frozen, each stage prior in the execution sequence to the stage into which the interrupt-related instructions are inserted may therefore retain a main program instruction present in the stage at the time the interrupt was detected. Subsequent to execution of the interrupt-related instructions, execution of the retained main program instructions may resume.
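The freeze-and-insert behavior can be illustrated with a small cycle-by-cycle simulation. The Python sketch below is a simplified model, not the disclosed circuitry: stages are list slots, every stage takes one cycle, and the names `simulate`, `insert_stage` (0-based), and the instruction labels are hypothetical.

```python
from collections import deque


def simulate(depth, main_program, interrupt_cycle, inserted, insert_stage):
    """Return [(cycle, instruction)] in execution order.

    While inserted instructions are pending, stages before `insert_stage`
    are frozen and keep their contents; `insert_stage` and later stages
    keep advancing. Pass interrupt_cycle=0 for a run with no interrupt.
    """
    if not 1 <= insert_stage < depth:
        raise ValueError("insert_stage must follow at least one frozen stage")
    main = deque(main_program)
    pending = deque()
    stages = [None] * depth  # index 0 is the first stage, depth-1 is execute
    executed = []
    cycle = 0
    while (main or pending or cycle < interrupt_cycle
           or any(slot is not None for slot in stages)):
        cycle += 1
        if cycle == interrupt_cycle:
            pending.extend(inserted)  # interrupt detected: begin stalling
        if pending:
            # Frozen stages hold; the inserted instruction enters insert_stage.
            stages = (stages[:insert_stage] + [pending.popleft()]
                      + stages[insert_stage:-1])
        else:
            fetched = main.popleft() if main else None
            stages = [fetched] + stages[:-1]  # normal one-stage advance
        if stages[-1] is not None:
            executed.append((cycle, stages[-1]))
    return executed


main = ["I1", "I2", "I3", "I4", "I5", "I6"]
trace = simulate(4, main, interrupt_cycle=5, inserted=["A1", "A2"], insert_stage=2)
print(trace)
# [(4, 'I1'), (5, 'I2'), (6, 'A1'), (7, 'A2'), (8, 'I3'), (9, 'I4'), (10, 'I5'), (11, 'I6')]
```

In this run the interrupt arrives in cycle 5 while I3 and I4 occupy the first two stages. Both are retained, the inserted instructions A1 and A2 execute in cycles 6 and 7, and I3 resumes in cycle 8. No main program instruction is discarded, and the main program finishes two cycles later than it would without the interrupt.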
A processor is also described herein, where the processor includes interrupt handling circuitry adapted to generate one or more interrupt-related instructions in response to an interrupt signal, and insert the interrupt-related instructions into a pipeline of the processor for execution. These interrupt-related instructions may include all instructions used to service the interrupt associated with the interrupt signal, or alternatively, may include some of the instructions used to service the interrupt. In an embodiment, the interrupt handling circuitry recognizes the interrupt signal (and distinguishes it from other types of interrupts), and transmits a stall signal to a stage of the pipeline preceding the stage into which the generated interrupt-related instruction is inserted, which freezes the pipeline above the insertion point. The interrupt handling circuitry may remove the stall signal after insertion of the last interrupt-related instruction. In an embodiment, the interrupt handling circuitry receives the interrupt signal from a memory controller, which generates the interrupt as a DMA request.
A processor-based system is also disclosed, consisting of a pipelined processor as described above, together with a memory controller that transmits the interrupt signal to the processor. The interrupt signal in a preferred embodiment of the system is a DMA request, and the interrupt-related instructions send an address to a memory system in response to the interrupt signal. The memory controller in this embodiment may issue multiple DMA requests for different memory channels. Each of the DMA requests may be assigned a different priority, and DMA requests preferably have a higher priority than other interrupts. The memory controller may receive an acknowledge signal from the interrupt handling circuitry for each DMA request.
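The priority scheme can be sketched as a simple arbiter. In the Python example below, the interrupt names (`BARQ0`, `BARQ1`, `TIMER`, `UART`), the numeric priority table, and the modeling of the acknowledge signal as removal from the pending set are all assumptions made for illustration; only the ordering rule (each DMA request has its own priority, all above other interrupts) comes from the text.

```python
# Lower number means higher priority. DMA block address requests rank
# above every other interrupt type, each with a distinct priority.
PRIORITY = {"BARQ0": 0, "BARQ1": 1, "TIMER": 2, "UART": 3}


def next_interrupt(pending):
    """Return the highest-priority pending interrupt, or None if idle."""
    if not pending:
        return None
    return min(pending, key=PRIORITY.__getitem__)


def service_all(pending):
    """Service pending interrupts one at a time; return the service order."""
    pending = set(pending)  # work on a copy
    order = []
    while pending:
        irq = next_interrupt(pending)
        pending.discard(irq)  # stands in for the acknowledge signal
        order.append(irq)
    return order


print(service_all({"UART", "BARQ1", "TIMER", "BARQ0"}))
# ['BARQ0', 'BARQ1', 'TIMER', 'UART']
```

Because the priorities are distinct, the service order is deterministic even when several memory channels raise requests in the same cycle.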