The present invention is generally directed to data processors and, more specifically, to systems and methods for supporting precise exceptions in a data processor having a clustered architecture.
The demand for high performance computers requires that state-of-the-art microprocessors execute instructions in the minimum amount of time. A number of different approaches have been taken to decrease instruction execution time, thereby increasing processor throughput. One way to increase processor throughput is to use a pipeline architecture in which the processor is divided into separate processing stages that form the pipeline. Instructions are broken down into elemental steps that are executed in different stages in an assembly line fashion.
A pipelined processor is capable of executing several different machine instructions concurrently. This is accomplished by breaking down the processing steps for each instruction into several discrete processing phases, each of which is executed by a separate pipeline stage. Hence, each instruction must pass sequentially through each pipeline stage in order to complete its execution. In general, a given instruction is processed by only one pipeline stage at a time, with one clock cycle being required for each stage. Since instructions use the pipeline stages in the same order and typically only stay in each stage for a single clock cycle, an N stage pipeline is capable of simultaneously processing N instructions. When filled with instructions, a processor with N pipeline stages completes one instruction each clock cycle.
The execution rate of an N-stage pipeline processor is theoretically N times faster than an equivalent non-pipelined processor. A non-pipelined processor is a processor that completes execution of one instruction before proceeding to the next instruction. Typically, pipeline overheads and other factors somewhat reduce the execution rate advantage that a pipelined processor has over a non-pipelined processor.
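By way of illustration only, the theoretical speedup described above may be sketched in Python as follows. The sketch assumes one clock cycle per stage and no stalls or other pipeline overheads; the function names are hypothetical and do not appear in the source.

```python
def non_pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction finishes before the next begins."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed by an ideal pipeline (assumption: no stalls).

    The first instruction takes num_stages cycles to drain through the
    pipeline; each later instruction completes one cycle after its predecessor.
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


def speedup(num_instructions: int, num_stages: int) -> float:
    """Ratio of non-pipelined to pipelined cycle counts."""
    return (non_pipelined_cycles(num_instructions, num_stages)
            / pipelined_cycles(num_instructions, num_stages))
```

For a long instruction stream the ratio approaches, but never reaches, the stage count N, which is consistent with the statement that the N-fold figure is a theoretical limit.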
An exemplary seven stage processor pipeline may consist of an address generation stage, an instruction fetch stage, a decode stage, a read stage, a pair of execution (E1 and E2) stages, and a write (or write-back) stage. In addition, the processor may have an instruction cache that stores program instructions for execution, a data cache that temporarily stores data operands that otherwise are stored in processor memory, and a register file that also temporarily stores data operands.
The address generation stage generates the address of the next instruction to be fetched from the instruction cache. The instruction fetch stage fetches an instruction for execution from the instruction cache and stores the fetched instruction in an instruction buffer. The decode stage takes the instruction from the instruction buffer and decodes the instruction into a set of signals that can be used directly by subsequent pipeline stages. The read stage fetches required operands from the data cache or registers in the register file. The E1 and E2 stages perform the actual program operation (e.g., add, multiply, divide, and the like) on the operands fetched by the read stage and generate the result. The write stage then writes the result generated by the E1 and E2 stages back into the data cache or the register file.
Assuming that each pipeline stage completes its operation in one clock cycle, the exemplary seven stage processor pipeline takes seven clock cycles to process one instruction. As previously described, once the pipeline is full, an instruction can theoretically be completed every clock cycle.
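The stage occupancy of the exemplary seven stage pipeline may be sketched in Python as follows. This is a minimal illustration only, assuming one clock cycle per stage and no stalls; the stage abbreviations "A" and "F" and the function name are assumptions introduced here for illustration.

```python
# Abbreviations for the seven stages described above (A and F are assumed names).
STAGES = ["A", "F", "D", "R", "E1", "E2", "W"]


def occupancy(num_instructions: int) -> list:
    """Return, for each clock cycle, a dict mapping stage name to the index
    of the instruction occupying that stage (empty stages are omitted)."""
    if num_instructions == 0:
        return []
    total_cycles = len(STAGES) + num_instructions - 1
    table = []
    for cycle in range(total_cycles):
        row = {}
        for depth, name in enumerate(STAGES):
            instr = cycle - depth  # instruction i enters stage `depth` at cycle i + depth
            if 0 <= instr < num_instructions:
                row[name] = instr
        table.append(row)
    return table
```

With three instructions, the first instruction reaches the write stage in the seventh cycle (index 6), and one further instruction completes in each cycle thereafter.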
The throughput of a processor also is affected by the size of the instruction set executed by the processor and the resulting complexity of the instruction decoder. Large instruction sets require large, complex decoders in order to maintain a high processor throughput. However, large complex decoders tend to increase power dissipation, die size and the cost of the processor. The throughput of a processor also may be affected by other factors, such as exception handling, data and instruction cache sizes, multiple parallel instruction pipelines, and the like. All of these factors increase or at least maintain processor throughput by means of complex and/or redundant circuitry that simultaneously increases power dissipation, die size and cost.
In many processor applications, the increased cost, increased power dissipation, and increased die size are tolerable, such as in personal computers and network servers that use x86-based processors. These types of processors include, for example, Intel Pentium™ processors and AMD Athlon™ processors.
However, in many applications it is essential to minimize the size, cost, and power requirements of a data processor. This has led to the development of processors that are optimized to meet particular size, cost and/or power limits. For example, the recently developed Transmeta Crusoe™ processor reduces the amount of power consumed by the processor when executing most x86-based programs. This is particularly useful in laptop computer applications. Other types of data processors may be optimized for use in consumer appliances (e.g., televisions, video players, radios, digital music players, and the like) and office equipment (e.g., printers, copiers, fax machines, telephone systems, and other peripheral devices).
In general, an important design objective for data processors used in consumer appliances and office equipment is the minimization of cost and complexity of the data processor. One way to minimize cost and complexity is to exclude from the processor core functions that can be implemented with memory-mapped peripherals external to the core. For example, cache flushing may be performed using a small memory-mapped device controlled by a specialized software function. The cost and complexity of a data processor may be minimized by implementing extremely simple exception behavior in the processor core.
Exceptions are interrupts produced by the data processor itself. The cause of an exception is generally an internal processor error. Exceptions are commonly classified as faults (i.e., an exception issued prior to completing instruction execution), traps (i.e., an exception issued after completing instruction execution) and aborts (i.e., an exception that, unlike faults and traps, does not always indicate an address of the error, so that resuming instruction execution after an abort is not always possible).
A wide-issue processor is a pipelined data processor well-suited for use in consumer appliances and office equipment. A wide-issue processor operates to execute bundles of operations in multiple stages; that is, multiple concurrent operations are bundled into a single instruction and are issued and executed as a unit. In a wide-issue processor having a clustered architecture, data processor resources are further divided into clusters, wherein each cluster consists of one or more register files, each of which is associated with a subset of the execution units of the data processor.
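The bundling of operations and their association with clusters may be pictured with the following Python sketch. It is offered as an illustration only; the class names, field names, and the `dispatch` function are hypothetical, and the assignment of each operation to a cluster index is an assumption rather than an encoding disclosed in the source.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    opcode: str   # e.g. "add" or "mul"
    cluster: int  # index of the cluster whose execution units run this operation


@dataclass
class Bundle:
    issue_time: int    # the bundle issues and executes as a unit at this time
    operations: list   # the concurrent operations bundled into one instruction


def dispatch(bundle: Bundle, num_clusters: int) -> dict:
    """Group the opcodes of a bundle by the cluster that executes them."""
    per_cluster = {c: [] for c in range(num_clusters)}
    for op in bundle.operations:
        per_cluster[op.cluster].append(op.opcode)
    return per_cluster
```

For example, a bundle holding an add and a load for cluster 0 and a multiply for cluster 1 is split into two per-cluster groups that issue together.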
Conventionally, an exception will cause the wide-issue processor to enter immediately into an excepting state where it will wait until activity for a given set of instructions or operations has completed. Hardware for partial re-execution of the interrupted instruction bundles is often employed for "cleanup." A primary disadvantage is found in the time expended waiting for the processor to "cleanup" and to determine its state. This has a related disadvantage of requiring complex hardware logic to handle instruction re-execution. An alternate approach does not provide support for certain precise exception conditions, meaning that some combinations of operations are not allowed. A primary disadvantage is found in limiting legal code combinations.
Therefore, there is a need in the art for improved data processors in which the cost and complexity of the processor core is minimized while maintaining the processor throughput. In particular, there is a need for improved systems and methods for supporting precise exceptions in a wide-issue data processor. More particularly, there is a need for systems and methods capable of identifying a precise exception early in a pipeline and efficiently completing operations previously executing in the pipeline, thereby addressing wasted power/time resources associated with prior art implementations.
To address the above-discussed deficiencies of the prior art, it is a primary object of the present invention to provide a data processor having a clustered architecture that comprises an exception controller supporting precise exceptions therein. The principles hereof reduce the complexity of circuit logic previously necessary to take an exception safely in a data processor supporting a clustered architecture. The present invention is well suited for implementation in data processors having multiple functional units that allow multiple operations to be explicitly executed in a single cycle, such as wide-issue (or "VLIW") processors. An exemplary implementation introduced hereafter illustrates that the principles hereof are extensible to wider issue processors, introducing a high degree of scalability.
According to one advantageous embodiment, each cluster of the data processor comprises an instruction execution pipeline having N processing stages. Each of the N processing stages is capable of performing at least one of a plurality of execution steps associated with instructions being executed by the clusters. The interrupt and exception controller operates to (i) monitor each instruction execution pipeline to detect exception conditions associated with the executing instructions, (ii) detect an exception condition associated with one of the executing instructions, wherein this executing instruction issued at time t0, and (iii) generate an exception in response to the exception condition upon completed execution of earlier ones of the executing instructions, these earlier executing instructions having issued at times preceding t0.
An important aspect of this embodiment is that even if an exception is generated by some condition earlier in the instruction pipeline, the instructions issued prior to the excepting instruction are allowed to complete. According to one related embodiment, the exception condition is detected while an execution step associated with the excepting instruction is performed by a processing stage preceding an Nth processing stage (which in a preferred embodiment is the write ("W") processing stage). In this manner, the exception may be deemed to occur when it reaches the Nth (or "W") processing stage of the pipeline, at which point the remaining pipeline can be aborted immediately and all subsequent instructions discarded. This is reflected in a related embodiment of the present invention wherein the interrupt and exception controller further operates to abort later executing instructions that issued at times subsequent to t0.
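The treatment of in-flight instructions relative to the issue time t0 of the excepting instruction may be summarized by the following Python sketch. It is a simplified illustration, not a description of the disclosed circuitry; the function name and the returned dictionary layout are hypothetical, and listing the excepting instruction separately rather than as completed is an assumption.

```python
def resolve_exception(issue_times, t0):
    """Partition in-flight instructions, identified by issue time, around t0.

    Instructions issued before t0 are allowed to complete; instructions issued
    after t0 are aborted once the excepting instruction reaches the last stage.
    """
    if t0 not in issue_times:
        raise ValueError("t0 must be the issue time of an in-flight instruction")
    return {
        "completed": sorted(t for t in issue_times if t < t0),
        "excepting": t0,  # assumption: reported separately, not committed
        "aborted": sorted(t for t in issue_times if t > t0),
    }
```

Because issue order alone determines the outcome, no state needs to be reconstructed after the exception is taken.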
This mechanism allows exceptions to be serviced quickly and precisely, and is possible because it is inherently known whether a given instruction has architecturally executed at the point at which the exception occurs (Nth processing stage). As a result, no complex circuit logic is required, as is evident from a related embodiment of the present invention wherein the interrupt and exception controller further comprises exception generator circuitry and a plurality of latching circuits. The exception generator circuitry operates to generate the exception in response to the excepting instruction entering the "W" processing stage, and communicate the exception to fetch address generation circuitry, which operates, in response thereto, to fetch an instruction from an interrupt handler. The latching circuits control execution flow of the earlier executing instructions among associated processing stages, wherein ones of the latching circuits are associated with at least each of a "R" processing stage, an "E1" processing stage and an "E2" processing stage.
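A cycle-by-cycle software model of this behavior may be sketched in Python as follows. The sketch is a loose illustration under stated assumptions: each latch carries an instruction name together with a pending-exception flag, only the "R", "E1", "E2" and "W" stages are modeled, and `HANDLER_ADDRESS`, `run`, and the `fault_at` parameter are hypothetical names that do not appear in the source.

```python
LATCHED_STAGES = ["R", "E1", "E2", "W"]
HANDLER_ADDRESS = 0x100  # hypothetical interrupt handler entry point


def run(program, fault_at):
    """Simulate the R..W stages.

    program:  instruction names in issue order.
    fault_at: maps an instruction name to the stage (R, E1 or E2) in which
              its exception condition is detected.
    Returns (committed, aborted, redirect), where redirect is the handler
    address sent to fetch address generation, or None if no exception occurs.
    """
    latches = {stage: None for stage in LATCHED_STAGES}
    pending = list(program)
    committed, aborted, redirect = [], [], None
    while redirect is None and (pending or any(latches.values())):
        # Advance every latch by one stage, oldest stage first.
        for i in range(len(LATCHED_STAGES) - 1, 0, -1):
            latches[LATCHED_STAGES[i]] = latches[LATCHED_STAGES[i - 1]]
        latches["R"] = (pending.pop(0), False) if pending else None
        # Record a detected condition by setting the flag in the latch.
        for stage in LATCHED_STAGES[:-1]:
            entry = latches[stage]
            if entry and fault_at.get(entry[0]) == stage:
                latches[stage] = (entry[0], True)
        # The exception is generated only when the flagged entry enters W.
        entry = latches["W"]
        if entry is not None:
            name, flagged = entry
            if flagged:
                redirect = HANDLER_ADDRESS
                for stage in reversed(LATCHED_STAGES[:-1]):
                    if latches[stage]:
                        aborted.append(latches[stage][0])
            else:
                committed.append(name)
    return committed, aborted, redirect
```

In this model an instruction flagged in "E1" continues to advance while older instructions commit ahead of it; only on its arrival in "W" are the younger entries in "E2", "E1" and "R" discarded and the handler address produced.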
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the DETAILED DESCRIPTION OF THE INVENTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation; the term "or" is inclusive, meaning and/or; the phrases "associated with" and "associated therewith," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the terms "controller" and "circuitry" mean any device, system or part thereof that controls at least one operation; such a device, system or part thereof may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller or circuitry may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document; those of ordinary skill in the art should understand that in many, if not most, instances, such definitions apply to prior, as well as future, uses of such defined words and phrases.