1. Field of the Present Invention
The present invention generally relates to the field of microprocessor architectures and more particularly to a microprocessor utilizing an instruction group architecture, a corresponding cache facility, and useful extensions thereof.
2. History of Related Art
As microprocessor technology has enabled gigahertz performance, a major challenge for microprocessor designers is to take advantage of state-of-the-art technologies while maintaining compatibility with the enormous base of installed software designed for operation with a particular instruction set architecture (ISA). To address this problem, designers have implemented "layered architecture" microprocessors that are adapted to receive instructions formatted according to an existing ISA and to convert the instruction format of the received instructions to an internal ISA that is more suitable for operation in gigahertz execution pipelines. Turning to FIG. 4, selected portions of a layered architecture microprocessor 401 are presented. In this design, an instruction cache 410 of microprocessor 401 receives and stores instructions fetched from main memory by a fetch unit 402. The instructions stored in instruction cache 410 are formatted according to a first ISA (i.e., the ISA in which the programs being executed by processor 401 are written). Instructions are then retrieved from instruction cache 410 and converted to a second ISA by an ISA conversion unit 412. Because the conversion of instructions from the first ISA to the second ISA requires multiple cycles, the conversion process is typically pipelined and, accordingly, there may be multiple instructions being converted from the first ISA to the second ISA at any given time. The converted instructions are then forwarded for execution in the execution pipelines 422 of processor 401. The fetch unit 402 includes branch prediction logic 406 that attempts to determine the address of the instruction that will be executed following a branch instruction by predicting the outcome of the branch decision. Instructions are then speculatively issued and executed based on the branch predictions.
When a branch is mispredicted, however, the instructions that are pending between instruction cache 410 and finish stage 432 of processor 401 must be flushed. The performance penalty that is incurred when a mispredicted branch results in a system flush is a function of the length of the pipeline. The greater the number of pipeline stages that must be flushed, the greater the branch mispredict performance penalty. Because the layered architecture adds to the processor pipeline and increases the number of instructions that are potentially "in flight" at a given time, the branch mispredict penalty associated with a layered architecture can become a limiting factor in the processor's performance. It would therefore be highly desirable to implement a layered architecture microprocessor that addresses the branch mispredict performance penalty. In addition, it would be further desirable if the implemented solution addressed, at least in part, repetitive occurrences of exception conditions resulting from repeated execution of a piece of code. It would be further desirable if the implemented solution enabled an effectively larger issue queue without sacrificing the ability to search the issue queue for the next instruction to execute.
The problems identified above are in large part addressed by a microprocessor that utilizes instruction groups and a cache facility that is matched to the instruction group format. One embodiment of the invention contemplates a microprocessor and an associated method and data processing system. The microprocessor includes an instruction cracking unit configured to receive a first set of microprocessor instructions. The cracking unit organizes the set of instructions as an instruction group in which each of the instructions shares a common instruction group tag. The processor further includes a basic block cache facility that is organized with the instruction group format and is configured to cache the instruction groups generated by the cracking unit. An execution unit of the processor is suitable for executing the instructions in an instruction group. In one embodiment, when an exception that causes a flush is generated during execution of an instruction in the instruction group, the flush discards only those instructions that have been dispatched from the basic block cache. By flushing only those instructions that have arrived at the basic block cache, the processor spares the instructions pending in the cracking unit pipeline from being flushed. Because fewer instructions are flushed, the exception performance penalty is reduced. In one embodiment, the received instructions are formatted according to a first instruction format and the second set of instructions are formatted according to a second instruction format, wherein the second instruction format is wider than the first instruction format. The basic block cache is suitably configured to store each instruction group in a corresponding entry of the basic block cache. In one embodiment, each entry in the basic block cache includes an entry field indicative of the corresponding basic block cache entry and a pointer predictive of the next instruction group to be executed.
The processor is preferably configured to update a pointer of a cache entry responsive to a mispredicted branch.
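The entry layout and pointer update described above can be illustrated with a minimal software sketch. It is not the disclosed hardware: the names BasicBlockEntry, BasicBlockCache, predict_next and on_mispredict are hypothetical, instructions are modeled as plain strings, and the cache is modeled as an unbounded dictionary with no replacement policy.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BasicBlockEntry:
    """One basic block cache entry holding a single instruction group (hypothetical model)."""
    entry_id: int                      # entry field identifying this cache entry
    group_tag: int                     # tag shared by every instruction in the group
    instructions: List[str]            # instructions in the second (internal) format
    next_entry: Optional[int] = None   # pointer predicting the next group to execute


class BasicBlockCache:
    """Assumed model: an unbounded map from entry id to entry."""

    def __init__(self) -> None:
        self.entries: Dict[int, BasicBlockEntry] = {}

    def store(self, entry: BasicBlockEntry) -> None:
        self.entries[entry.entry_id] = entry

    def predict_next(self, entry_id: int) -> Optional[int]:
        # The stored pointer serves as the prediction of the next group.
        return self.entries[entry_id].next_entry

    def on_mispredict(self, entry_id: int, actual_next: int) -> None:
        # Update the pointer in response to a mispredicted branch.
        self.entries[entry_id].next_entry = actual_next
```

In this sketch, a group that ends in a branch keeps predicting its stored successor until a misprediction overwrites the pointer with the entry that actually followed.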
The invention further contemplates a processor, data processing system and method utilizing instruction history information in conjunction with the basic block cache to improve performance. The processor is suitable for receiving a set of instructions and organizing the set of instructions into an instruction group. The instruction group is then dispatched for execution. Upon executing the instruction group, instruction history information indicative of an exception event associated with the instruction group is recorded. Thereafter, the execution of the instruction group is modified responsive to the instruction history information to prevent the exception event from occurring during a subsequent execution of the instruction group. The processor includes a storage facility such as an instruction cache, an L2 cache or a system memory, a cracking unit, and a basic block cache. The cracking unit is configured to receive a set of instructions from the storage facility and is adapted to organize the set of instructions into an instruction group. The cracking unit may modify the format of the set of instructions from a first instruction format to a second instruction format. The architecture of the basic block cache is suitable for storing the instruction groups. The basic block cache includes an instruction history field corresponding to each basic block cache entry. The instruction history information is indicative of an exception event associated with the instruction group. In the preferred embodiment, each entry in the basic block cache corresponds to a single instruction group generated by the cracking unit. The processor may further include completion table control logic configured to store information in the instruction history field when the instruction group completes.
The instruction history information may be indicative of whether an instruction in the instruction group has a dependency on another instruction, or may be indicative of whether the execution of the instruction group previously resulted in a store forwarding exception. In the latter embodiment, the processor is configured to execute in an in-order mode responsive to detecting that the execution of the instruction group previously resulted in the store forwarding exception.
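The use of the instruction history field can be illustrated with a short sketch under stated assumptions. The names HistoryField, HistoryTable, record_completion and execution_mode are hypothetical, the two boolean flags are an assumed encoding of the history information, and the string labels "in-order" and "out-of-order" merely stand in for the processor's execution modes.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HistoryField:
    """Assumed encoding of the instruction history field for one cache entry."""
    has_dependency: bool = False            # an instruction depends on another instruction
    store_forward_exception: bool = False   # a prior execution hit a store forwarding exception


class HistoryTable:
    """Hypothetical model: one history field per basic block cache entry id."""

    def __init__(self) -> None:
        self._fields: Dict[int, HistoryField] = {}

    def record_completion(self, entry_id: int, *, has_dependency: bool = False,
                          store_forward_exception: bool = False) -> None:
        # Models completion table control logic writing the field when the group completes.
        field = self._fields.setdefault(entry_id, HistoryField())
        field.has_dependency = field.has_dependency or has_dependency
        field.store_forward_exception = (
            field.store_forward_exception or store_forward_exception
        )

    def execution_mode(self, entry_id: int) -> str:
        # Choose in-order execution if this group previously caused the exception.
        field = self._fields.get(entry_id, HistoryField())
        return "in-order" if field.store_forward_exception else "out-of-order"
```

The sketch makes the flags sticky once set, which is a design assumption; the text does not say whether or when history information is cleared.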
The invention still further contemplates a processor, data processing system and an associated method utilizing primary and secondary issue queues. The processor is suitable for dispatching an instruction to an issue unit. The issue unit includes a primary issue queue and a secondary issue queue. The instruction is stored in the primary issue queue if the instruction is currently eligible to issue for execution. The instruction is stored in the secondary issue queue if the instruction is currently ineligible to issue for execution. The processor determines the next instruction to issue from the instructions in the primary issue queue. An instruction may be moved from the primary issue queue to the secondary issue queue if the instruction is dependent upon results from another instruction. In one embodiment, the instruction may be moved from the primary issue queue to the secondary issue queue after issuing the instruction for execution. In this embodiment, the instruction may be maintained in the secondary issue queue for a specified duration. Thereafter, the secondary issue queue entry containing the instruction is deallocated if the instruction has not been rejected. The microprocessor includes an instruction cache, a dispatch unit configured to receive instructions from the instruction cache, and an issue unit configured to receive instructions from the dispatch unit. The issue unit is adapted to allocate dispatched instructions that are currently eligible for execution to a primary issue queue and to allocate dispatched instructions that are not currently eligible for execution to a secondary issue queue.
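The division of labor between the two queues can be sketched as follows. This is an illustrative model only: the class IssueUnit, its method names, the oldest-first selection policy, and the cycle-count representation of the "specified duration" are all assumptions, and the handling of rejected instructions is omitted.

```python
from collections import deque
from typing import Deque, Dict, Optional


class IssueUnit:
    """Hypothetical model of an issue unit with primary and secondary issue queues."""

    def __init__(self, hold_cycles: int = 3) -> None:
        self.primary: Deque[str] = deque()
        # Maps instruction -> remaining hold cycles, or None while it waits to become eligible.
        self.secondary: Dict[str, Optional[int]] = {}
        self.hold_cycles = hold_cycles  # assumed form of the "specified duration"

    def dispatch(self, instr: str, eligible: bool) -> None:
        # Eligible instructions go to the primary queue, ineligible ones to the secondary.
        if eligible:
            self.primary.append(instr)
        else:
            self.secondary[instr] = None

    def defer(self, instr: str) -> None:
        # Move an instruction found to depend on another result out of the primary queue.
        self.primary.remove(instr)
        self.secondary[instr] = None

    def issue_next(self) -> Optional[str]:
        # Only the primary queue is searched; oldest-first is an assumed policy.
        if not self.primary:
            return None
        instr = self.primary.popleft()
        self.secondary[instr] = self.hold_cycles  # hold the issued instruction for a while
        return instr

    def tick(self) -> None:
        # Advance one cycle and deallocate issued entries whose hold time has elapsed.
        for instr in list(self.secondary):
            remaining = self.secondary[instr]
            if remaining is None:
                continue
            if remaining <= 1:
                del self.secondary[instr]
            else:
                self.secondary[instr] = remaining - 1
```

Because issue_next inspects only the primary queue, the search cost in this sketch depends on the number of eligible instructions rather than on the combined capacity of both queues.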