The present invention relates to digital computers, and, more particularly, to a novel method and apparatus for reducing complexity of the logic circuits required to handle unaligned memory operations.
The operation of a digital computer is controlled by computer programs, such as operating systems and application programs. Those programs provide instructions to those functional units of a central processor in the digital computer responsible for handling and execution as binary information. Different binary sequences represent different instructions for a particular machine and the instructions tend to be unique to a particular processor or processor family. Thus, typically, different families of processors have different instruction sets, unless they are specifically designed to utilize an instruction set of another processor family.
The instructions of one processor typically cannot be understood directly by any of the other types of processors. The difference in instruction sets is often due to the format chosen by the designer for presenting instructions to the processor. In general, a designer may choose to design a processor for a complex instruction set computer (CISC), a reduced instruction set computer (RISC), or a newer very long instruction word (VLIW) computer.
CISC processors provide special hardware for executing an entire operation. For example, an ADD instruction may provide one operand to an integer unit register, fetch a second operand from memory and place it in a second register, and combine the two operands in a third register. Because it does so, instruction formats are very complicated. Such structure takes a large amount of hardware and processing effort to tell one instruction from another. A RISC processor, on the other hand, is much simpler and treats each part of an ADD operation as a separate element. By providing consistently sized instructions, a RISC processor eliminates significant hardware and reduces the processing time needed to decode instructions.
A newer type of processor called a very long instruction word (VLIW) processor attempts to make use of the best attributes of both CISC and RISC. It uses consistently sized instructions (herein called "atoms") as do RISC processors, but groups a number of those instructions together in a VLIW word (herein called a "molecule") and provides processing units to execute the individual atoms in parallel.
The execution of most operations by a processor requires a number of steps. For example, instructions must be fetched from memory, sometimes a second instruction must be fetched from memory, the instruction is decoded and finally it is executed. This takes a number of operational cycles of the processor. In order to produce results as fast as possible, computers are designed so that each sequential instruction is begun (as far as is possible) on the next operation cycle after the preceding instruction has already begun the steps leading to its execution. This causes the steps leading to execution of subsequent instructions to overlap. In this manner, an instruction may often be executed each cycle.
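The benefit of overlapping the steps can be illustrated with a short calculation. The following is a minimal sketch, assuming an idealized pipeline with no stalls or exceptions; the function names and the stage count of five are illustrative and do not come from the text above.

```python
def cycles_unpipelined(n_instructions: int, n_stages: int) -> int:
    """Each instruction completes all of its stages before the next begins."""
    return n_instructions * n_stages


def cycles_pipelined(n_instructions: int, n_stages: int) -> int:
    """A new instruction enters the pipeline on every cycle (idealized)."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles to finish; every later
    # instruction finishes exactly one cycle after its predecessor.
    return n_stages + (n_instructions - 1)


def average_cycles_per_instruction(n_instructions: int, n_stages: int) -> float:
    """Average completion rate of the idealized pipeline."""
    return cycles_pipelined(n_instructions, n_stages) / n_instructions
```

Under these assumptions, one hundred instructions on a five-stage pipeline take 104 cycles rather than 500, so the average approaches one instruction per cycle as the instruction stream grows.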
The manner of starting and carrying out the steps leading to the execution of instructions, so that an instruction executes each operation cycle, is referred to by those skilled in the art as an instruction pipeline, which is included within processors. In processing an application program, the instructions of the application are serially entered into the pipeline by a pipeline control unit. The pipeline control unit includes a program counter that loads instructions in serial order into the pipeline. Via that pipeline, instructions are presented to the respective functional execution unit of the processor that is to execute the instruction.
Instructions do not always execute in the ideal order provided by the pipeline. Often things happen which interfere with the process. For example, a memory page at which an instruction resides may not be in main memory and must be paged in before the instruction can execute. Exceptions occur for a wide variety of reasons. When such an exception occurs, further internal processing activity of the application program is temporarily halted until that exception is resolved. The excepting functional unit issues a stop signal, herein called a "kill" signal, which pauses all other functional units in the microprocessor until the exception is resolved, empties the pipeline of any instructions, and immediately prompts an exception handler into action.
The exception handler fixes the problem which has arisen. An exception handler is a software routine conventionally designed to deal with such exceptions, and a different handler exists for each different kind of exception. Thus, every computer contains a library of such software handlers in its associated memory. The pipeline control unit calls up an appropriate exception handler and executes that routine.
As an example, if a memory exception is due to the failure to locate data referenced by a load instruction within main memory, the exception handler is one that pages in that data and then returns to the original excepted instruction. Processing of the application recommences with re-execution of the same instruction. However, this time the instruction is executed without generating that memory exception. In rare instances an exception handler of a RISC or CISC processor may emulate the instruction that caused the exception and then execute the emulated instruction to achieve the result desired. In other instances, an exception handler may only note that an exception occurred and return control to the excepted instruction or may decide to skip the excepted instruction and have execution resume at the next instruction. The exception handler thereafter returns control to the instruction pipeline controller by issuing a "return-from-exception" (RFE) signal. The latter signals the pipeline counter in the pipeline control unit to reissue and execute the instruction that was subject to the exception or, alternatively as called for by the exception handler, signals to advance the next instruction into the pipeline, that is, insert the memory address of the succeeding instruction into the pipeline, thereby moving pipeline activity beyond the instruction that generated the exception. In either event, the return-from-exception procedure is a very simple step.
Such a simple step of skipping past an excepted instruction is not possible for VLIW processors. In VLIW processors it is frequently desired for exception handlers to emulate an excepting atom instruction. Although re-execution of the excepting atom instruction is not desired, other atoms in the same molecule instruction must be executed. For example, one VLIW computer system is described in U.S. Pat. No. 5,832,205 to Kelly et al, granted Nov. 3, 1998, entitled, Memory Controller For A Microprocessor For Detecting A Failure of Speculation On The Physical Nature of A Component Being Addressed (the '205 Kelly patent), assigned to Transmeta Corporation, assignee of the present invention, the content of which is incorporated by reference herein in its entirety. The present invention has particular application to VLIW computers, and, in particular, to VLIW computers described by the '205 Kelly patent, although it should be understood that the invention may be found to also be applicable to other types of computers.
The '205 Kelly patent discloses a novel microprocessor formed by a combination of a hardware processing portion, much simpler in structure than competitive prior state of the art microprocessors, and an emulating software portion, referred to therein as "code morphing software". Among other things, in the Kelly system the code morphing software carries out a significant portion of the functions of processors in software, thereby reducing the hardware required for processing and the greater electrical power that such hardware requires. For a better understanding of the foregoing and other functions accomplished by the code morphing software, the interested reader is invited to study the '205 Kelly patent.
A VLIW processor constructed in accordance with the '205 Kelly patent also contains an instruction pipeline. However, because a VLIW instruction ("molecule") is packed with a number of individual instructions ("atoms"), which are to be executed in parallel, what is generally referred to as the instruction pipeline in a processor described in the '205 Kelly patent is actually a composite of multiple parallel pipelines. The stages of the instruction pipeline in the latter processor number, typically, five. Those stages comprise, as an example, first and second fetches (from memory), two register operations and, ultimately, the execution stage, at which an instruction is executed (or is found subject to an exception).
The pipeline formatting of a molecule rarely contains a single atom prescribing an operation, but, typically, comprises two and as many as four separate atoms prescribing different operations. Those atoms pass along the instruction pipeline as a collective group constituting the molecule. Since individual atoms are intended to be executed by separate functional execution units and such execution is intended to occur in parallel, i.e., simultaneously, the VLIW processor comprises multiple instruction pipelines, one for each functional unit in the computer that may be called upon during the processing steps to execute atom instructions. In making reference herein to the pipeline of a VLIW processor, it should be understood that reference is being made collectively to the multiple pipelines, unless the context of the statement indicates that reference is made only to a specific individual pipeline, as an example, to the memory pipeline, the ALU1 pipeline and so on.
At the respective pipeline execution stage, the atomic instructions are executed by separate non-conflicting functional units in the computer, ideally, concurrently for optimal speed of processing. If execution of one of those component instructions causes an exception, which, as a consequence, halts further processing and forces clearance of the instruction pipeline, as many as three atomic instructions in the VLIW molecule are also halted and cleared from their respective instruction pipelines.
In VLIW computers an exception handler cannot simply emulate the one atom responsible for the exception and advance the pipeline control unit instruction counter another step, as occurs in the prior CISC computers earlier described, since the remaining operation atoms in the same molecule also must be accounted for and require execution. Instead, the exception handlers of a VLIW computer are required either to emulate all the atoms in the molecule before returning control back to the pipeline control unit or otherwise to store, retrieve and execute those remaining atoms.
The foregoing exception handling process is very expensive in terms of VLIW processor time (clock cycles) and is likely to dramatically slow execution of the program. It also requires the software comprising the exception handlers to be significantly more complex than that for the CISC type processors. Neither result is attractive. Both detract from the inherent advantages of the VLIW processor.
In my copending application entitled PIPELINE ENABLE BITS, S.N. filed of even date herewith, the content of which is incorporated by reference in its entirety, I disclose an improvement, applicable to a VLIW computer and possibly to other computers as well, through which control information, a group of bits, therein referred to as enable bits, is linked to the molecule, and each of those bits pertains to a respective one of the individual atom instructions within a molecule. Those bits help processing by indicating whether the associated atom is to be executed or not when the instruction is present at the execution stage of the pipeline. The execution units interpret those bits and execute the instruction (or not) accordingly.
As the molecule progresses through the processor's instruction pipeline, stage by stage, that control information also progresses along what is therein referred to as a control information pipeline, also containing multiple stages, in synchronism with the progress of the molecule through the instruction pipeline. At the execution stage, both the control information and the individual atoms of the molecule are presented in parallel to respective execution units for those atoms. Each execution unit checks the information pipeline for information pertinent to the respective atom presented for execution, prior to any execution.
In a specific embodiment described in that application, the VLIW computer referred to therein contained four functional units responsible for execution of atoms. One enable bit is included for each of those four execution units and the four bits in parallel define the information packet.
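The synchronized movement of a molecule and its four-bit packet can be pictured as two shift registers clocked together. The following is a minimal software sketch of that idea, not a description of the actual hardware; the class name, method names and the five-stage depth are assumptions made for illustration.

```python
from collections import deque

N_STAGES = 5          # assumed pipeline depth, for illustration only
ALL_ENABLED = 0b1111  # one enable bit per execution unit (four units)


class LockstepPipelines:
    """Instruction pipeline and control information pipeline advancing together."""

    def __init__(self, n_stages: int = N_STAGES) -> None:
        # Index 0 is the first stage; index -1 is the execution stage.
        self.instr = deque([None] * n_stages, maxlen=n_stages)
        self.ctrl = deque([None] * n_stages, maxlen=n_stages)

    def clock(self, molecule=None, enable_bits: int = ALL_ENABLED) -> None:
        """Advance both pipelines by one stage, optionally issuing a molecule."""
        # appendleft on a full bounded deque discards the rightmost item,
        # so each call shifts both pipelines by exactly one stage.
        self.instr.appendleft(molecule)
        self.ctrl.appendleft(enable_bits if molecule is not None else None)

    def at_execute(self):
        """Return the (molecule, enable bits) pair now at the execution stage."""
        return self.instr[-1], self.ctrl[-1]
```

Because both queues shift on the same call, a packet issued with a molecule always arrives at the execution stage on the same cycle as that molecule.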
By default all enable bits are set ("1") when the molecule is first introduced to the instruction pipeline. When the atoms in the molecule are presented again for execution, the respective functional unit first checks to ensure the one (of the four) enable bits relevant to the functional unit (pertaining to the respective atom) is set. If disabled ("0"), the functional unit does not execute the respective atom. If set, the functional unit executes the atom. Alternatively, the execution unit determines that the atom is subject to an exception, in which case an exception is taken. That action is communicated to the pipeline control unit. With an exception, the functional unit issues a "global kill" signal to clear the instruction pipeline of all atoms awaiting execution, including the atom responsible for the exception, and pauses all other operations.
Responding to an exception, the pipeline control unit saves the VLIW instruction address (which, as later herein described, permits the instruction to later be reasserted in the pipeline) and also saves the accompanying packet of enable bits, placing those bits within a register, therein called the error register, and selects and calls up exception handler software.
The exception handler handles the exception and then issues a return-from-exception to the pipeline control unit. Prior to issuing the return-from-exception to the pipeline control unit, the handler (if required by its design) also writes to the aforementioned error register and disables the enable bit associated with the atom responsible for the exception.
Upon the return-from-exception, the pipeline control unit reasserts the same VLIW instruction (address) in the instruction pipeline together with the packet of enable bits. The pipeline control unit retrieves that packet from the error register, where temporarily stored, and transfers that data into the respective control information pipeline. Since the enable bit associated with the atom that was responsible for the exception is now disabled ("0"), that atom cannot be executed when the molecule again reaches the pipeline execution stage. The remaining atoms in the molecule for which the associated enable bits remain set ("1") are able to be executed (or, when checked, may also be found subject to an exception, in which case the procedure is repeated for such atom).
If for a particular type of exception, the exception handler resolves the exception without necessitating disablement of the atom that produced the exception, when the molecule is reasserted in the instruction pipeline, as above described, that atom now executes, since the exception handler already resolved the condition that initially caused the exception.
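The reassertion procedure described above can be sketched in software as follows. This is a simplified model under stated assumptions: atoms are represented as hypothetical (name, faults) pairs, the handler always chooses to emulate the faulting atom and disable its bit, and the global kill is modeled by letting no atom complete when any enabled atom faults. None of the names below come from the copending application.

```python
ALL_ENABLED = 0b1111  # four execution units, one enable bit each


class AtomException(Exception):
    """Raised when an enabled atom faults at the execution stage."""

    def __init__(self, slot: int) -> None:
        super().__init__(f"atom in slot {slot} raised an exception")
        self.slot = slot


def execute_molecule(atoms, enable_bits: int):
    """Execute every enabled atom, or raise if any enabled atom faults."""
    enabled = [(slot, atom) for slot, atom in enumerate(atoms)
               if (enable_bits >> slot) & 1]
    for slot, (_name, faults) in enabled:
        if faults:
            # Models the global kill: nothing in the molecule completes.
            raise AtomException(slot)
    return [name for _slot, (name, _faults) in enabled]


def run_molecule(atoms):
    """Reassert the molecule until it completes; return (executed, emulated)."""
    error_register = ALL_ENABLED  # saved packet of enable bits
    emulated = []
    while True:
        try:
            return execute_molecule(atoms, error_register), emulated
        except AtomException as exc:
            # Handler (assumed policy): emulate the atom in software, then
            # clear its enable bit before the return-from-exception.
            emulated.append(atoms[exc.slot][0])
            error_register &= ~(1 << exc.slot)
```

For a molecule such as `[("add", False), ("load", True), ("sub", False), ("br", False)]`, the first pass is killed by the load, the handler emulates it and clears bit 1, and the second pass executes only the three remaining atoms.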
The foregoing procedure avoids the slower processing that would result from more complex exception handler software, which would otherwise be required to handle or account for the remaining atoms in the molecule and permit them to be presented for execution to their respective functional units. Generally speaking, based principally on hardware, the control information pipeline provides a fast and efficient means to permit reassertion of a VLIW molecule in the instruction pipeline, while permitting software, the exception handler, to control whether individual atoms within a molecule are executed by the respective functional unit.
In addition to exceptions, other out-of-the-ordinary actions, similar in effect to exceptions, are found to occur from time to time in the execution of instructions that also may slow down the speed of processing. One of those actions is an unaligned memory operation. The present invention deals with those events in a new way that also makes use of a synchronized control information pipeline.
The memory unit controls access to the memory where various digital information, such as instructions and data, is stored for potential use during processing. Typically, any piece of digital information, such as data, is stored in a self-contained unit of memory, that is, a certain string or number of adjacent memory cells all in a "segment" or "row" of memory (a precisely defined multi-byte wide region aligned to a boundary, as an example, a four-byte aligned block of memory). When a load instruction requires the memory unit to retrieve the digital information at a certain address in memory, the memory unit retrieves that digital information in a single operation. Since the information is self-contained within a defined segment of memory, the information is referred to as "aligned". The memory unit need only access the specified row of memory, and read the specified number of bits containing the desired information.
In some instances, the digital information sought is not aligned, but bridges more than a single continuous segment of memory, a condition referred to as "unaligned data". Companion references to the latter condition sometimes refer to the condition as unaligned memory and to the instruction as an unaligned instruction, despite such references being literally incorrect. Retrieval of unaligned data from memory requires the memory unit to access memory twice, obtaining each part separately. Such an operation can be handled by adding complex hardware to the memory unit. As an example, in addition to the pipeline control unit maintaining state information on the pipeline, each execution unit would be required to also maintain that state. Not only does the foregoing increase complexity, but it increases the chance for errors to occur.
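The distinction between aligned and unaligned data, and the need for two accesses in the latter case, can be made concrete with a short sketch. It assumes the four-byte aligned block used as an example above and an access no larger than one block, so that at most two parts arise; the function names are hypothetical.

```python
BLOCK_BYTES = 4  # assumed alignment boundary (four-byte aligned blocks)


def is_unaligned(address: int, size: int, block: int = BLOCK_BYTES) -> bool:
    """True when the access bridges more than one aligned block."""
    return address // block != (address + size - 1) // block


def split_access(address: int, size: int, block: int = BLOCK_BYTES):
    """Return the (address, size) parts, each contained in a single block."""
    if not 1 <= size <= block:
        raise ValueError("sketch assumes 1 <= size <= block")
    if not is_unaligned(address, size, block):
        return [(address, size)]
    boundary = (address // block + 1) * block  # start of the next block
    first = boundary - address                 # bytes left in the first block
    return [(address, first), (boundary, size - first)]


def load(memory: bytes, address: int, size: int) -> bytes:
    """Read each part separately and join the results."""
    return b"".join(memory[a:a + n] for a, n in split_access(address, size))
```

With these definitions a four-byte load at address 4 is a single access, whereas a four-byte load at address 6 is split into two bytes from the block beginning at address 4 and two bytes from the block beginning at address 8.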
The present invention takes advantage of and adapts the foregoing control information pipeline hardware and software technique that links one or more bits (sometimes referred to as "help bits") to atom instructions to mark, annotate or, as otherwise termed, tag, as necessary, atom instructions with control information that allows more efficient handling of an unaligned memory operation, thereby further enhancing the efficiency of processing operations.
As an advantage, the invention permits VLIW computers, such as those constructed in accordance with the '205 Kelly patent, to gain in performance without increasing the complexity of the hardware or software. As a further advantage, the invention permits continued reassertion of a VLIW instruction (molecule) containing an unaligned atom without re-executing other atoms in the molecule.
Accordingly, an object of the invention is to improve the internal operating efficiency of a microprocessor, more particularly, a VLIW microprocessor.
A further object of the invention is to provide a new more efficient process and apparatus internal to a VLIW microprocessor for handling those atoms in a molecule (VLIW instruction) that require obtaining data from an unaligned memory location.
In accordance with the invention, a processor includes an instruction pipeline, a control information pipeline and a pipeline control unit that operates the instruction pipeline and the control information pipeline in synchronism. The pipeline control unit originates a packet of help bits, and contains the means for appropriately marking and introducing the help bits in the control information pipeline when an instruction is asserted, whereby both the instruction and help bits progress through the respective pipeline stages in synchronism. At the execution stage of the pipeline, respective execution units interpret those help bits, and function in accordance with that interpretation.
Further in accordance with the invention, the help bits signify an unaligned memory operation, specifically, the first or second part of the unaligned memory location.
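One way to picture help bits that mark the first or second part of an unaligned memory operation is sketched below. The text above does not specify the encoding of the help bits or the sequence in which the instruction is reasserted, so everything here is a hypothetical reading: the two-bit encoding, the `MemoryUnit` class, its holding buffer and the `issue_load` routine are all assumptions, as is the four-byte block and an access size no larger than one block.

```python
from enum import IntEnum

BLOCK_BYTES = 4  # assumed alignment boundary


class HelpBits(IntEnum):
    """Hypothetical two-bit encoding of the help bits."""
    NONE = 0b00
    FIRST_PART = 0b01
    SECOND_PART = 0b10


class Unaligned(Exception):
    """Signals that an untagged load crosses an alignment boundary."""


class MemoryUnit:
    def __init__(self, memory: bytes) -> None:
        self.memory = memory
        self.partial = b""  # bytes held between the two passes

    def execute_load(self, address: int, size: int, help_bits: HelpBits):
        boundary = (address // BLOCK_BYTES + 1) * BLOCK_BYTES
        end = address + size
        if help_bits == HelpBits.FIRST_PART:
            # One access: only the bytes below the boundary.
            self.partial = self.memory[address:boundary]
            return None
        if help_bits == HelpBits.SECOND_PART:
            # One access: the bytes from the boundary onward, then combine.
            return self.partial + self.memory[boundary:end]
        if end > boundary:
            raise Unaligned(address)
        return self.memory[address:end]


def issue_load(unit: MemoryUnit, address: int, size: int) -> bytes:
    """Model of the pipeline control unit reasserting a tagged load."""
    try:
        return unit.execute_load(address, size, HelpBits.NONE)
    except Unaligned:
        unit.execute_load(address, size, HelpBits.FIRST_PART)
        return unit.execute_load(address, size, HelpBits.SECOND_PART)
```

In this sketch the memory unit performs a single access per pass and takes its instruction from the help bits presented with the atom, so the sequencing is driven by the code that models the pipeline control unit.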
As an advantage, a processor may be modified to include multiple control information pipelines to serve separate and distinct control operations. As an example, the foregoing control information pipeline may be combined in a single processor with the "enable bit" control information pipeline described in my copending application.