Processors (e.g., microprocessors) are well known and used in a wide variety of products and applications, from desktop computers to portable electronic devices, such as cellular phones and PDAs (personal digital assistants). As is known, some processors are extremely powerful (e.g., processors in high-end computer workstations), while other processors have a simpler design, for lower-end, less expensive applications and products.
As is known, many processors have pipelined architectures to increase instruction throughput. In theory, scalar pipelined processors can execute one instruction per machine cycle (and more in super-scalar architectures) when executing a well-ordered, sequential instruction stream. This is accomplished even though an individual instruction may require a number of separate micro-instructions to carry out. Pipelined processors operate by breaking up the execution of an instruction into several stages that each require one machine cycle to complete. For example, in a typical system, a single instruction could require many machine cycles from start to finish (fetch, decode, ALU operations, etc.), yet because successive instructions occupy different stages at the same time, one instruction can still complete in each machine cycle.
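By way of illustration only, the throughput argument above can be checked with a short calculation. The sketch below assumes an idealized scalar pipeline in which every stage takes exactly one machine cycle and no stalls, branches, or flushes occur; the function names are hypothetical and are not taken from any particular processor.

```python
def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed to complete a stall-free instruction stream in a pipeline.

    The first instruction needs num_stages cycles to pass through every
    stage; each later instruction completes one cycle after the one before.
    """
    if num_instructions <= 0:
        return 0
    return num_stages + (num_instructions - 1)


def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed if each instruction must finish before the next starts."""
    if num_instructions <= 0:
        return 0
    return num_instructions * num_stages
```

Under these assumptions, 100 instructions in a five-stage pipeline take 104 cycles instead of 500, so the rate approaches one instruction per machine cycle as the stream grows.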
Reference is made to FIG. 1, which is a block diagram illustrating certain stages within a pipelined processor, as is known. In the architecture of FIG. 1, illustrated are an instruction fetch unit 10, a decode unit 20, an execute unit 30, a memory access unit 40, and a register writeback unit 50. The operation of these units (or logic blocks) is known by persons skilled in the art. In this regard, an instruction fetch unit 10 performs instruction memory fetches. This unit is configured to determine the value or contents of a program counter (within the register file 60) for in-order instruction execution, as well as exception vectors, branches, and returns. The instruction fetch unit 10 is also configured to determine the return address for all exceptions and branch-link instructions, and write or store that return address into an appropriate register within the register file 60. Consistent with the invention, addressing of instruction fetches may be through physical addresses directly to memory, or through an instruction cache (not shown) using physical or virtual addresses. Although the internal architecture of the register file 60 is not shown, the register file 60 includes various registers utilized by the processor. As is known, such registers may include general-purpose registers or special-purpose registers (such as status registers, a program counter, etc.).
The decode unit 20 operates to decode instructions passed to it from the instruction fetch unit 10 and generate the necessary control signals for the execute unit 30 to carry out the execution of the particular instruction. The specific architecture of decode units (like decode unit 20) is processor dependent, but the operation and organization of such units will be understood by persons skilled in the art. Likewise, the structure and operation of the execute unit 30 are processor dependent, but are understood by persons skilled in the art. Generally, an execute unit includes circuitry to carry out the execution of instructions as determined by the control signals generated from the decode unit 20.
As illustrated in FIG. 1, the execute unit 30 of the illustrated embodiment includes logic 32 for generating one or more interrupt signals (or interrupt requests) 33. As the name implies, the interrupt signal 33 indicates that an interrupt condition (e.g., IRQ, FIQ, etc.) is pending or requested. The memory access unit 40 interfaces with external data memory for reading and writing data in response to the instruction being executed by the execute unit 30. Of course, not all instructions require memory accesses, but for those that do, the memory access unit 40 carries out the requisite access to external memory. Consistent with the invention, such memory access may be direct, or may be made through a data cache using either physical or virtual addressing.
Finally, the register writeback unit 50 is responsible for storing or writing contents (resulting from instruction execution), where appropriate, into registers within the register file 60. For example, consider the execution of an instruction that adds the contents of two general-purpose registers and stores the result of that addition into a third general-purpose register. After execution of such an instruction, the register writeback unit 50 causes the value obtained in the summation to be written into the third general-purpose register.
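For illustration, the passage of such an add instruction through the five units of FIG. 1 can be sketched in software. This is a toy model under stated assumptions, not a description of any actual circuit: the instruction encoding (a four-element tuple), the register names, and all function names are hypothetical, and the register file 60 is represented as a simple dictionary.

```python
# Toy model of the five units of FIG. 1 handling one ADD instruction.
# The tuple encoding and the register names are hypothetical.

def fetch(program, register_file):
    """Instruction fetch unit 10: read the instruction at the program counter."""
    instruction = program[register_file["pc"]]
    register_file["pc"] += 1  # in-order execution: advance to the next instruction
    return instruction


def decode(instruction):
    """Decode unit 20: turn the instruction into control signals."""
    opcode, dest, src_a, src_b = instruction
    return {"op": opcode, "dest": dest, "src_a": src_a, "src_b": src_b}


def execute(control, register_file):
    """Execute unit 30: perform the operation selected by the control signals."""
    if control["op"] != "add":
        raise ValueError("this sketch only models the add instruction")
    return register_file[control["src_a"]] + register_file[control["src_b"]]


def memory_access(control, value):
    """Memory access unit 40: an add needs no memory access, so pass the value on."""
    return value


def writeback(control, value, register_file):
    """Register writeback unit 50: store the result in the destination register."""
    register_file[control["dest"]] = value


def run_one(program, register_file):
    """Carry a single instruction through all five units in order."""
    control = decode(fetch(program, register_file))
    result = memory_access(control, execute(control, register_file))
    writeback(control, result, register_file)
```

With a register file of `{"pc": 0, "r1": 3, "r2": 4, "r3": 0}` and the program `[("add", "r3", "r1", "r2")]`, calling `run_one` leaves the sum of `r1` and `r2` in `r3`.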
With regard to interrupt handling, processors like that of FIG. 1 generally operate as follows. An external interrupt request is made of the processor, and this exception request is communicated to the execute unit 30. In response, the execute unit 30 examines the interrupt status of the processor and, when the particular interrupt is enabled, generates a recognized interrupt request 33 that is communicated to the instruction fetch unit 10. The fetch unit 10 then “vectors” to an address corresponding to the requested interrupt (e.g., an address dedicated via hardware to store instructions to handle an interrupt) and retrieves a first instruction associated with the interrupt service routine. Generally, this first instruction is a branch to a location where a user-defined interrupt service routine is stored. Then, the interrupt-related instruction progresses through the pipeline like any other instruction.
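The recognition and vectoring steps just described might be sketched as follows. The vector addresses in the table are illustrative assumptions only (actual values are architecture specific), and the function names and the dictionary used to represent the interrupt-enable status are hypothetical.

```python
# Hypothetical vector addresses; real values depend on the architecture.
VECTOR_TABLE = {"IRQ": 0x18, "FIQ": 0x1C}


def recognize_interrupt(request, enabled):
    """Execute unit 30: pass the request on only if that interrupt is enabled.

    Returns the recognized interrupt name, or None when the request is masked.
    """
    return request if enabled.get(request, False) else None


def next_fetch_address(program_counter, recognized):
    """Instruction fetch unit 10: vector to the handler when an interrupt is recognized.

    With no recognized interrupt, fetching continues at the program counter.
    """
    if recognized is None:
        return program_counter
    return VECTOR_TABLE[recognized]
```

In this sketch a masked request leaves the fetch address unchanged, while an enabled request redirects the next fetch to the dedicated vector address, where the first instruction of the interrupt handling sequence (typically a branch) would be found.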
In addition, when the execute unit 30 receives the interrupt request and generates the recognized interrupt request 33, it also generates a flush signal/command (not shown), which causes all preceding pipeline stages to flush their contents. This flush is generally performed as a routine, cautionary measure to ensure that no later intervening instruction is encountered that causes the execute unit to, for example, change modes (which may mask or otherwise adversely impact the execution of the interrupt). Thus, any pending instructions within the pipeline are flushed (e.g., replaced with NO-OPs), and the address of the first flushed instruction is set as a return address that is accessed upon completion of the interrupt service routine. As a result, the first instruction associated with the interrupt is the first instruction executed after receipt of the exception request.
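A minimal sketch of this flush behavior is given below, for illustration only. It assumes that the stages ahead of the execute unit are represented as a list of instruction addresses ordered from the fetch stage to the stage closest to execute, that `None` stands for a NO-OP, and that the "first flushed instruction" is the oldest pending one, i.e., the one closest to the execute unit. The function name and data layout are hypothetical.

```python
def flush_on_interrupt(stages_before_execute):
    """Flush every stage ahead of the execute unit.

    stages_before_execute lists the instruction address held by each stage,
    ordered from the fetch stage to the stage closest to execute; None marks
    a slot that already holds a NO-OP.

    Returns (flushed_stages, return_address).
    """
    pending = [addr for addr in stages_before_execute if addr is not None]
    # Assumption: the stage closest to execute holds the oldest instruction,
    # so the last pending address belongs to the first flushed instruction.
    return_address = pending[-1] if pending else None
    # Every preceding stage now holds a NO-OP.
    flushed = [None] * len(stages_before_execute)
    return flushed, return_address
```

For example, if the fetch stage holds the instruction at address 0x108 and the decode stage holds the instruction at 0x104, both slots become NO-OPs and 0x104 is kept as the return address, so execution can resume there once the interrupt service routine completes.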
Several clock cycles are lost, however, in connection with the flush, until the first instruction associated with the interrupt service routine can pass through the pipeline and reach the execute unit 30. Accordingly, it is desired to provide an improved architecture for handling interrupts to improve the processor efficiency in connection with this flush operation.
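The cost of the flush can be estimated with a small cycle-by-cycle model, offered here only as a sketch. It assumes that each stage takes one clock cycle, that the first instruction of the interrupt handling sequence is fetched in the cycle immediately following the flush, and that the only stages ahead of the execute unit are those counted by the hypothetical parameter `stages_before_execute` (two in FIG. 1: fetch and decode). Actual processors may lose more or fewer cycles.

```python
def bubbles_after_flush(stages_before_execute: int = 2) -> int:
    """Count cycles in which the execute stage receives a NO-OP after a flush."""
    if stages_before_execute < 1:
        raise ValueError("at least one stage must precede the execute stage")

    # Stages ahead of execute, ordered from fetch to the stage closest to
    # execute. Immediately after the flush they all hold NO-OPs.
    front = ["NOP"] * stages_before_execute
    handler_fetched = False
    bubbles = 0
    while True:
        # One clock: the entry closest to execute moves into the execute
        # stage, and the fetch stage takes in a new instruction.
        entering_execute = front.pop()
        front.insert(0, "NEXT" if handler_fetched else "HANDLER")
        handler_fetched = True
        if entering_execute == "HANDLER":
            return bubbles
        bubbles += 1  # the execute stage did no useful work this cycle
```

Under these assumptions, the arrangement of FIG. 1 loses two cycles per recognized interrupt, and a deeper front end loses proportionally more, which is the inefficiency the preceding paragraph identifies.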