The present invention relates to digital computers, and, more particularly, to a novel method and apparatus for increasing the speed with which a microprocessor handles excepted instructions during program execution.
A digital computer includes a central processing unit, such as a microprocessor, several types of memory, input-output devices and the like. The operation of a computer is controlled by computer programs, such as operating systems and application programs. Those programs supply instructions, encoded as binary information, to the functional units of the central processor responsible for handling and executing them. Different binary sequences represent different instructions for a particular machine and the instructions tend to be peculiar to a particular processor or processor family. Thus, typically, different families of processors have different instruction sets, unless they are specifically designed to utilize an instruction set of another processor family.
The instructions of one processor typically cannot be understood directly by any of the other types of processors. The difference in instruction sets is often due to the format chosen by the designer for presenting instructions to the processor. In general, a designer may choose to design a processor for a complex instruction set computer (CISC) or reduced instruction set computer (RISC) or for a newer very long instruction word (VLIW) computer.
CISC processors provide special hardware for executing an entire operation. For example, an ADD instruction may provide one operand to an integer unit register, fetch a second operand from memory and place it in a second register, and combine the two operands in a third register. Because a single instruction does so much, instruction formats are very complicated. Such a structure requires a large amount of hardware and processing effort to distinguish one instruction from another.
A RISC processor, on the other hand, is much simpler and treats each part of an ADD operation as a separate element. By providing consistently sized instructions, a RISC processor eliminates significant hardware and reduces the processing time needed to decode instructions.
A newer type of processor called a very long instruction word (VLIW) processor attempts to make use of the best attributes of both CISC and RISC. It uses consistently sized instructions (herein called “atoms”) as do RISC processors, but groups a number of those instructions together in a VLIW word (herein called a “molecule”) and provides processing units to execute the individual atoms in parallel.
The execution of most operations by a processor requires a number of steps. For example, an instruction must be fetched from memory, sometimes a second instruction must be fetched from memory, the instruction is decoded and finally it is executed. This takes a number of operational cycles of the processor. In order to produce results as fast as possible, computers are designed so that each sequential instruction is begun (as far as is possible) on the next operation cycle after the preceding instruction has begun the steps leading to its execution. This causes the steps leading to execution of successive instructions to overlap. In this manner, an instruction may often be executed each cycle.
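The benefit of overlapping these steps can be illustrated with a small calculation. The following is a minimal sketch, assuming an idealized pipeline with no stalls or exceptions; the function names and the five-stage figure used in the usage note are illustrative assumptions, not part of any described apparatus.

```python
def cycles_without_overlap(num_instructions: int, num_stages: int) -> int:
    """Each instruction finishes every stage before the next one starts."""
    return num_instructions * num_stages


def cycles_with_overlap(num_instructions: int, num_stages: int) -> int:
    """A new instruction starts each cycle (idealized: no stalls assumed).

    The first instruction needs num_stages cycles to reach execution;
    every later instruction completes one cycle after its predecessor.
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)
```

Under these assumptions, ten instructions passing through five stages need 50 cycles without overlap and 14 cycles with overlap, which approaches one instruction per cycle as the instruction count grows.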
The mechanism for starting and carrying out the steps leading to the execution of instructions, so that an instruction executes each operation cycle, is referred to by those skilled in the art as an instruction pipeline, which is included within processors. In operation, to process an application program, the instructions of the application are serially entered into the pipeline by a pipeline control unit. The pipeline control unit includes a program counter that loads instructions in serial order into the pipeline.
Instructions do not always execute in the ideal order provided by the pipeline. Often things happen which interfere with the process. For example, a memory page at which an instruction resides cannot be found in main memory, creating an exception. To resolve that exception, the memory page must be paged in before the instruction can execute. Exceptions, such as the foregoing, occur for a wide variety of reasons. When such an exception occurs, further internal processing activity of the application program is temporarily halted until that exception is resolved. The excepting functional unit issues a stop signal, herein called a “kill” signal, which pauses all other functional units in the microprocessor until the exception is resolved, empties the pipeline of any instructions and immediately prompts an exception handler into action.
The exception handler fixes the problem which has arisen. An exception handler is a software routine conventionally designed to deal with such exceptions; and different exception handlers (i.e., different routines) are prepared to handle each different kind of exception. Thus, every computer contains a library of such software handlers in its associated memory. The pipeline control unit calls up an appropriate exception handler and executes that routine.
As an example, if a memory exception is due to the failure to locate data referenced by a load instruction within main memory, the exception handler is one that pages in that data and then returns back to the original excepted instruction. Processing of the application recommences with re-execution of the same instruction. However, this time the instruction is executed without generating that memory exception. In rare instances an exception handler of a RISC or CISC processor may emulate the instruction that caused the exception and then execute the emulated instruction to achieve the result desired. In other instances, an exception handler may only note that an exception occurred and return control to the excepted instruction or may decide to skip the excepted instruction and have execution resume at the next instruction. The exception handler thereafter returns control to the instruction pipeline controller by issuing a “return-from-exception” (RFE) signal. The latter signals the pipeline counter in the pipeline control unit to reissue and execute the instruction that was subject to the exception or, alternatively as called for by the exception handler, signals to advance the next instruction into the pipeline, that is, insert the memory address of the succeeding instruction into the pipeline, thereby moving pipeline activity beyond the instruction that generated the exception. In either event, the return-from-exception procedure is a very simple step.
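The two return-from-exception outcomes described above, reissuing the excepted instruction or advancing past it, can be sketched as a choice of the next instruction address. This is a minimal illustration only; the names `Resume`, `return_from_exception` and `INSTRUCTION_SIZE`, and the assumption of fixed-size four-byte instructions, are hypothetical and not taken from any particular processor.

```python
from enum import Enum


class Resume(Enum):
    RETRY = "retry"  # reissue the instruction that was subject to the exception
    SKIP = "skip"    # advance to the succeeding instruction


# Assumption for illustration: fixed-size instructions of four bytes.
INSTRUCTION_SIZE = 4


def return_from_exception(excepted_address: int, action: Resume) -> int:
    """Return the address the pipeline control unit would issue next."""
    if action is Resume.RETRY:
        return excepted_address
    return excepted_address + INSTRUCTION_SIZE
```

For an instruction at address 0x100, a retry resumes at 0x100 and a skip resumes at 0x104 under the assumed instruction size.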
Such a simple step of skipping past an excepted instruction is not possible for VLIW processors. In VLIW processors it is frequently desired for exception handlers to emulate an excepting instruction. Although re-execution of the excepting atom instruction is not desired, other instructions in the VLIW instruction must be executed. For example, one VLIW computer system is described in U.S. Pat. No. 5,832,205 to Kelly et al, granted Nov. 3, 1998, entitled, Memory Controller For A Microprocessor For Detecting A Failure of Speculation On The Physical Nature of A Component Being Addressed (the “'205 Kelly patent”), assigned to Transmeta Corporation, assignee of the present invention, the content of which patent is incorporated by reference herein in its entirety. The present invention has particular application to VLIW computers, and, in particular, to VLIW computers described by the '205 Kelly patent, although it should be understood that the invention may be found to also be applicable to other types of computers.
The '205 Kelly patent discloses a novel microprocessor formed by a combination of hardware processing portion, much simpler in structure than competitive prior state of the art microprocessors, and an emulating software portion, referred to therein as “code morphing software”. Among other things, in the Kelly system the code morphing software carries out a significant portion of the functions of processors in software, thereby reducing the hardware required for processing and the greater electrical power that such hardware requires. For a better understanding of the foregoing and other functions accomplished by the code morphing software, the interested reader is invited to study the '205 Kelly patent.
A VLIW processor constructed in accordance with the '205 Kelly patent also contains an instruction pipeline. However, because a VLIW instruction (“molecule”) is packed with a number of individual instructions (“atoms”), which are to be executed in parallel, what is generally referred to as the instruction pipeline in a processor prescribed in the '205 Kelly patent is actually a composite of multiple parallel pipelines. The stages of the instruction pipeline in the latter processor number, typically, five. Those stages comprise, for example, first and second stages (cycles) to fetch an instruction from an instruction cache, two register operations and, ultimately, the execution stage, at which an instruction is executed (or is found subject to an exception).
The pipeline formatting of a molecule rarely contains a single atom prescribing an operation, but typically comprises two to as many as four separate atoms prescribing different operations. All those atoms pass along the instruction pipeline as a collective group constituting the molecule. Since individual atoms are intended to be executed by separate functional units and execute in parallel, i.e., simultaneously, the VLIW processor comprises multiple instruction pipelines, one for each functional unit in the computer that may be called upon during the processing steps to execute atom instructions. In making reference herein to the VLIW processor's pipeline, it should be understood that reference is being made collectively to the multiple pipelines, unless the context of the statement indicates that reference is made only to a specific individual pipeline, for example, to the memory pipeline, the ALU1 pipeline and so on.
At the respective pipeline's execution stage, the atomic instructions are executed by separate non-conflicting functional units in the computer, ideally concurrently for optimal speed of processing. If execution of one of those component instructions causes an exception, which, as a consequence, halts further application processing and forces clearance of the instruction pipeline, as many as three atomic instructions in the VLIW molecule are also halted and cleared from their respective instruction pipelines.
In VLIW computers an exception handler cannot simply advance the pipeline control unit's instruction counter another step, as occurs in the prior CISC computers as earlier described, since the remaining operation atoms in the same molecule also must be accounted for and require execution. Instead the VLIW computer's exception handlers are required to either emulate all the atoms in the molecule before returning control back to the pipeline control unit or otherwise store, retrieve and execute those remaining atoms.
The foregoing exception handling process is very expensive in terms of VLIW processor time (clock cycles) and is likely to dramatically slow execution of the program. It also requires the software comprising the exception handlers to be significantly more complex than that for the CISC type processors. Neither result is attractive. Both detract from the inherent advantage of the VLIW processor.
The present invention offers a better approach for handling those atoms in the event of an exception. As an advantage, VLIW computers, such as those constructed in accordance with the '205 Kelly patent, are no longer required to increase the complexity of exception handling software to account for and/or handle the other atoms in the VLIW molecule. As a further advantage, the invention permits continued reassertion of a VLIW instruction (molecule) that generated the exception within the execution pipeline for execution of remaining atomic instructions within the molecule by disabling the atom (or atoms) responsible for the exception (or exceptions), permitting the remaining atoms to execute. A VLIW molecule is repeatedly asserted into the pipeline until all the individual atoms within the molecule have been executed or excepted.
Accordingly, an object of the invention is to improve the internal operating efficiency of a microprocessor, more particularly, a VLIW microprocessor.
A further object of the invention is to provide a new more efficient process and apparatus internal to a VLIW microprocessor for handling atoms in a molecule (VLIW instruction), both those atoms responsible for generating an exception and the remaining atoms, that ensures that any atom requiring execution is executed.
A still further object of the invention is to minimize the time (system clock cycles) required to process a VLIW instruction in the event the instruction is responsible for an exception.
And an ancillary object of the invention is to reduce the need for complex exception handling software in those computers that execute complex instructions, defining multiple independent operations therein, and permit use of exception handling software of the complexity level inherent in those computers which execute ordinary instructions.
In accordance with the foregoing objects and advantages, the improved VLIW processor defined by the invention provides a separate control information pipeline to parallel the instruction pipeline. The control information pipeline is provided with control information about the VLIW instruction that was input into the instruction pipeline. That control information accompanies that VLIW instruction until all the component operations prescribed in the VLIW instruction have been executed, effectively linking the control information with the corresponding VLIW instruction. When the atoms within a VLIW instruction are presented for execution to the respective functional execution units of the processor, the respective functional units are able to access and use that control information.
In a more specific aspect of the invention, the respective functional unit determines whether the atom that is presented for execution is to be executed or masked.
In accordance with a more specific aspect of the invention, the control information, defining an instruction pipeline control packet for the VLIW processor, comprises extra bits, herein referred to as “enable bits”. Each enable bit is associated with a respective one of the plurality of different functional units of the processor responsible for executing an atom (operation), such as the memory unit, arithmetic and logic units, the floating point unit and the like. The enable bits permit software to specify which parts of a VLIW instruction are subsequently to be executed (enabled) and which parts are to be masked (disabled) when returning from an exception. The VLIW instruction may then be reasserted in the instruction pipeline for execution of the unmasked atoms.
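One way to picture the enable bits is as a small bit mask with one bit per functional unit. The following is a minimal sketch under assumed bit assignments; the constant names, the bit positions and the choice of four units are hypothetical and chosen only for illustration.

```python
# Hypothetical bit assignments: one enable bit per functional unit.
MEMORY = 1 << 0
ALU0 = 1 << 1
ALU1 = 1 << 2
FPU = 1 << 3

# Default state: every atom in the molecule is enabled.
ALL_ENABLED = MEMORY | ALU0 | ALU1 | FPU


def disable(enable_bits: int, unit: int) -> int:
    """Mask the atom destined for `unit`; other bits are left unchanged."""
    return enable_bits & ~unit


def should_execute(enable_bits: int, unit: int) -> bool:
    """Check a functional unit would make before executing its atom."""
    return bool(enable_bits & unit)


# Example: software masks the memory atom before the molecule is reasserted.
bits = disable(ALL_ENABLED, MEMORY)
```

In this sketch, after the memory atom is masked, `should_execute(bits, MEMORY)` is false while the checks for the remaining units stay true, so only the unmasked atoms would execute on reassertion.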
In accordance with the latter, the VLIW processor includes a pipeline control unit that includes a source to generate the enable bits and link those bits to a VLIW instruction; an instruction pipeline for the VLIW instruction; and an enable bit pipeline or “sideband” pipeline through which the enable bits are advanced in synchronism with the instruction's step-by-step advance through the instruction pipeline. The pipeline control unit further includes a register, herein referred to as the error register, for saving both the instruction address of a VLIW instruction that is subject to an exception and the companion enable bits linked thereby. The latter register serves within the control unit as an alternate supply of the enable bits that the control unit's source provides. Each functional unit's decode logic is expanded to first decode and interpret the respective enable bit and some of the processor's exception handlers are modified to carry out the additional task of modifying the respective enable bit, changing that bit from enabled to disabled when appropriate.
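The synchronized advance of the instruction pipeline and the sideband pipeline can be modeled as two shift registers that move in lockstep. This is a software sketch only, not a description of the hardware; the class name `LockstepPipelines`, its methods and the default of five stages are assumptions made for illustration.

```python
from collections import deque


class LockstepPipelines:
    """Two fixed-depth queues shifted together, one slot per pipeline stage."""

    def __init__(self, num_stages: int = 5):
        # Index 0 is the first stage; index -1 is the execution stage.
        self.instructions = deque([None] * num_stages, maxlen=num_stages)
        self.enable_bits = deque([None] * num_stages, maxlen=num_stages)

    def step(self, molecule=None, bits=None):
        """Advance both pipelines one cycle.

        Returns the (molecule, enable bits) pair now at the execution stage.
        Because each deque is full, appendleft drops the oldest entry.
        """
        self.instructions.appendleft(molecule)
        self.enable_bits.appendleft(bits)
        return self.instructions[-1], self.enable_bits[-1]


pipes = LockstepPipelines()
trace = [pipes.step("molecule_A", 0b1111)]
trace += [pipes.step() for _ in range(4)]
```

In this model the molecule entered on the first cycle reaches the execution stage on the fifth cycle, and its enable bits arrive at the same stage on the same cycle, which is the pairing the functional units rely on.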
The group of enable bits, one per computational pipeline, thereby tracks or follows each VLIW instruction as it is executed. By default, all the enable bits associated with each VLIW instruction are set, that is, enabled, resulting in the normal execution of all operations specified by the multiple atom instructions in the VLIW molecule. If an exception or interrupt occurs during the execution of an individual instruction (atom) contained in the VLIW molecule, the pipeline control unit saves both the instruction address of the molecule and the accompanying group of enable bits in the error register. The exception handler called up to handle the exception is free to modify any of those enable bits. The exception handler can change the state of each from enable to disable (or vice-versa, if desired).
Should the exception handler for any reason determine to disable an atom that was subject to an exception, the bit associated with that atom is disabled. Upon resolving the exception, the exception handler issues a return-from-exception (“RFE”) instruction, which returns control via the pipeline control unit to the original molecule. The pipeline control unit restarts the same VLIW molecule instruction along with the companion enable bits, as so possibly modified, retrieved from the error register, placing them into the respective pipelines.
The disabled atom is masked and will not be executed. Since the enable bits effectively track the VLIW molecule instruction as it executes, changes made to the enable bits by the exception handlers are cumulative, until the entire molecule instruction (all the atom instructions) completes successfully.
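The cycle of exception, enable bit modification and reassertion can be sketched in software as a loop. The following is a simplified model rather than the hardware itself. It assumes that an exception on any enabled atom prevents every atom in that pass from completing, and that the handler deals with the excepting atom (for instance by emulating it) and then clears that atom's enable bit. The names `ErrorRegister` and `run_molecule` and the unit numbering are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ErrorRegister:
    address: int = 0
    enable_bits: int = 0


def run_molecule(address, atoms, faulting_units, num_units=4):
    """Reassert one molecule until no enabled atom excepts.

    atoms: maps a functional unit index to the atom it would execute.
    faulting_units: unit indices whose atom excepts (assumed for the demo).
    Returns (atoms executed by hardware, atoms handled in software, passes).
    """
    bits = (1 << num_units) - 1  # by default every enable bit is set
    error_register = ErrorRegister()
    handled, passes = [], 0
    while True:
        passes += 1
        enabled = [u for u in sorted(atoms) if bits & (1 << u)]
        fault = next((u for u in enabled if u in faulting_units), None)
        if fault is None:
            # No exception: all still-enabled atoms execute.
            return [atoms[u] for u in enabled], handled, passes
        # Exception: save the molecule address and its current enable bits.
        error_register.address = address
        error_register.enable_bits = bits
        # Handler (assumed behavior): handle the atom, then clear its bit.
        handled.append(atoms[fault])
        error_register.enable_bits &= ~(1 << fault)
        # Return from exception: reassert with the modified, cumulative bits.
        bits = error_register.enable_bits


executed, handled, passes = run_molecule(
    0x1000, {0: "load", 1: "add", 2: "sub", 3: "fmul"}, faulting_units={0, 3}
)
```

In this example the load and fmul atoms except on successive passes, each is disabled in turn, and the add and sub atoms execute on the third pass. Because the bits saved in the error register already reflect earlier changes, the second exception does not re-enable the load atom.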