1. Field of the Invention
This invention relates in general to microprocessors, and more particularly, to microprocessor architectures and methods for processing complex or non-RISC instructions within the processor.
2. Relevant Background
In order to improve the overall performance of a computer processor (also called a microprocessor), modern processor architectures are designed to optimally execute a particular subset or "core" set of the entire instruction set supported by the processor. The processor hardware architectures are designed such that the core instructions can be efficiently executed within the processor utilizing a minimum, limited, or optimum number of resources in the processor. These instructions therefore are processed in the most efficient manner within the processor, generally within a minimum number of clock cycles.
Non-core instructions are referred to as "complex" instructions in that executing such an instruction requires more of the processor's resources than does executing a core instruction. To execute a complex instruction, conventional processor designs expand the complex instruction into two or more microinstructions comprised of core instructions. The processor can process the microinstructions in an efficient manner. However, since the complex instruction is broken into multiple microinstructions, the execution time of the complex instruction is penalized. Examples of these complex instructions include the CAS (compare and swap) instruction and the LDD (load double) instruction, which are part of the Scalable Processor Architecture (SPARC) instruction set.
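By way of illustration only, the expansion of a complex instruction into microinstructions can be sketched in Python as a table lookup. The microinstruction names and sequences shown for LDD and CAS are hypothetical assumptions chosen for the example and do not represent the actual expansion used by any SPARC implementation; the helper names `expand` and `issue_slots` are likewise illustrative.

```python
# Hypothetical expansion table. The microinstruction names and sequences are
# illustrative assumptions, not the expansion used by any real SPARC processor.
EXPANSIONS = {
    "LDD": ["LOAD_WORD_EVEN", "LOAD_WORD_ODD"],   # assumed: two word loads
    "CAS": ["LOAD", "COMPARE", "COND_STORE"],     # assumed: three-step sequence
}


def expand(opcode):
    """Return the list of microinstructions for one instruction.

    A core instruction is not in the table and passes through unchanged.
    """
    return list(EXPANSIONS.get(opcode, [opcode]))


def issue_slots(opcode):
    """Number of microinstructions (and hence issue slots) the opcode consumes."""
    return len(expand(opcode))
```

Under these assumptions a core instruction such as an add occupies one slot, while the complex instructions occupy two or three, which is the source of the execution-time penalty described above.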
Complex instructions present a challenge for high frequency execution (i.e., a clock frequency of 250 MHz or more) in a processor because these instructions place greater demands on the execution resources of the processor than do core instructions. In particular, the handling of complex instructions is problematic in a processor utilizing a "wide-issue" of instructions, wherein the processor fetches a plurality of instructions (known as a fetch bundle) from the instruction cache at one time for placement in one or more instruction pipelines. Any instruction in the fetch bundle may be a complex instruction that needs to be expanded into simpler microinstructions. If, for instance, the fetch bundle contains eight complex instructions each expandable into eight microinstructions, then sixty-four microinstructions will be created for processing. During such expansion, it is desirable to maintain the order in which the instructions were originally programmed.
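The worst case described above, and the ordering requirement, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the opcode `CPLX`, its eight microinstructions `u0` through `u7`, and the function `expand_bundle` are hypothetical names introduced for the example, and the sketch models only the bookkeeping, not any hardware mechanism.

```python
def expand_bundle(bundle, expansions):
    """Expand every instruction of a fetch bundle, preserving program order.

    Each result is tagged (slot, step, micro_op): `slot` is the position of
    the original instruction in the bundle and `step` is the position of the
    microinstruction within that instruction's expansion.
    """
    micro_ops = []
    for slot, opcode in enumerate(bundle):
        # Core instructions are absent from the table and expand to themselves.
        for step, micro_op in enumerate(expansions.get(opcode, [opcode])):
            micro_ops.append((slot, step, micro_op))
    return micro_ops


# Hypothetical complex opcode that expands into eight microinstructions.
expansions = {"CPLX": [f"u{i}" for i in range(8)]}

# Worst case from the text: eight complex instructions in one fetch bundle.
worst = expand_bundle(["CPLX"] * 8, expansions)

# A mixed bundle: only the complex instruction grows.
mixed = expand_bundle(["ADD", "CPLX", "SUB"], expansions)
```

Here `worst` holds sixty-four entries and `mixed` holds ten, and in both cases the (slot, step) tags increase monotonically, so the original program order can be recovered after expansion.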
Furthermore, conventional processor hardware architecture designs generally exact a processing penalty on all instructions, whether complex or non-complex, in that each instruction is passed through an expansion stage for creating microinstructions. Typically, these conventional implementations have decode pipeline stages in which the complex instructions are decoded into simpler microinstructions.
What is needed is a mechanism for handling complex instructions in a wide-issue processor which reduces the processing penalty exacted upon core or simple instructions and maintains the order in which instructions are to be executed.