Conventionally, processors are designed to process operations that are typically identified by operation codes (opcodes). In the design of new processors, it is important to be able to process all of a standard set of operations so that existing computer programs based on the standardized codes will operate without the need for translating operations into an entirely new code base. Processor designs may further incorporate the ability to process new operations, but backwards compatibility to older instruction sets is often desirable.
Dedicated pipeline queues have been used in multi-pipeline execution units of microprocessors in order to achieve faster processing speeds. In particular, dedicated queues have been used for execution units having multiple pipelines that are configured to execute different subsets of a set of supported microinstructions. Dedicated queuing has generated various bottlenecks and problems for the scheduling of microinstructions that require both numeric manipulation and retrieval/storage of data.
Likewise, microprocessors are conventionally designed to process microinstructions that are typically identified by opcodes, and the same compatibility considerations apply: a new microprocessor design should process all of a standard set of microinstructions so that existing computer programs based on standardized codes will operate without translating microinstructions into an entirely new code base. Such designs may also support new microinstructions, but backwards compatibility to older instruction sets is often desirable.
Execution of microinstructions/operations is typically performed in an execution unit of a processor core. To increase speed, multi-core processors have been developed. Also, to increase execution throughput, “pipelined” execution of operations within an execution unit of a processor core is used. Cores having multiple execution units for multi-thread processing are also being developed. However, there is a continuing demand for faster processor throughput.
One type of standardized set of operations is the instruction set compatible with prior “x86” architectures, which have enjoyed widespread use in many personal computers. Instruction sets such as the “x86” instruction set include operations requiring numeric manipulation, operations requiring retrieval and/or storage of data, and operations that require both numeric manipulation and retrieval/storage of data. To execute such operations, execution units within processor cores have included two types of pipelines: arithmetic logic pipelines (“EX pipelines”) to execute numeric manipulations and address generation pipelines (“AG pipelines”) to facilitate load and store operations.
To process the operations required by a particular computer program quickly and efficiently, the program's instructions are decoded into operations within the supported set of microinstructions and dispatched to the execution unit for processing. Conventionally, the dispatched information includes an opcode that specifies the operation/microinstruction to be performed, along with associated information such as the address of data to be used by the operation and operand designations.
Dispatched instructions/operations are conventionally queued for a multi-pipeline scheduler of an execution unit. Queuing is conventionally performed with some type of decoding of a microinstruction's opcode so that the scheduler can direct each instruction to the appropriate pipelines with which the scheduler is associated within the execution unit.
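As a rough illustration of opcode-based queuing, the Python sketch below routes dispatched operations into dedicated queues for the EX and AG pipelines described above. The opcode names (`add_mem` in particular), the `DispatchedOp` packet, and the decode table `PIPES_FOR_OPCODE` are hypothetical; a real scheduler decodes much richer opcode fields. Note that an operation needing both numeric manipulation and a memory access lands in both dedicated queues, which hints at the scheduling difficulty attributed to dedicated queuing.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Pipe(Enum):
    EX = 1  # arithmetic logic pipeline: numeric manipulation
    AG = 2  # address generation pipeline: loads and stores


@dataclass
class DispatchedOp:
    """Hypothetical dispatch packet: an opcode plus associated information."""
    opcode: str
    operands: tuple[str, ...] = ()
    address: Optional[int] = None  # address of data used by the operation, if any


# Hypothetical decode table mapping each opcode to the pipelines it needs.
PIPES_FOR_OPCODE = {
    "add": (Pipe.EX,),
    "sub": (Pipe.EX,),
    "load": (Pipe.AG,),
    "store": (Pipe.AG,),
    "add_mem": (Pipe.EX, Pipe.AG),  # needs numeric work and a memory access
}


def route(op: DispatchedOp) -> tuple[Pipe, ...]:
    """Decode the opcode just far enough to choose its pipelines."""
    if op.opcode not in PIPES_FOR_OPCODE:
        raise ValueError(f"unsupported opcode: {op.opcode}")
    return PIPES_FOR_OPCODE[op.opcode]


def enqueue(ops: list[DispatchedOp]) -> dict[Pipe, list[str]]:
    """Append each op's opcode to the dedicated queue of every pipeline it needs."""
    queues: dict[Pipe, list[str]] = {pipe: [] for pipe in Pipe}
    for op in ops:
        for pipe in route(op):
            queues[pipe].append(op.opcode)
    return queues


queues = enqueue([
    DispatchedOp("add", ("r1", "r2")),
    DispatchedOp("load", ("r3",), address=0x1000),
    DispatchedOp("add_mem", ("r1",), address=0x2000),
])
print(queues[Pipe.EX])  # ['add', 'add_mem']
print(queues[Pipe.AG])  # ['load', 'add_mem']
```

The duplicated `add_mem` entry is the point of the example: its two halves sit in separate queues, yet one half must wait for the other.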
In the x86 instruction format, an instruction may have an opcode including one or two opcode bytes, a modify register or memory (“mod r/m”) byte, a scale-index-base (“sib”) byte, displacement bytes, and immediate data bytes. These opcodes are also known as simple opcodes. The opcode specifies the operation to be performed and may also contain a register identifier. The mod r/m byte specifies whether an operand is in a register or in memory. If the operand is in memory, fields in the mod r/m byte specify the addressing mode to be used. Certain encodings of the mod r/m byte indicate that a second byte, the sib byte, follows to fully specify the addressing mode. The sib byte includes a 2-bit scale field, a 3-bit index field, and a 3-bit base field; these fields are used in complex memory addressing modes to specify how the address is computed. The displacement bytes are used in address computation, and the immediate data bytes supply an instruction operand. One or more additional bytes, known as prefix bytes, may appear before the opcode bytes. A prefix byte changes the interpretation of the instruction, adding further complexity. The length of an x86 instruction is therefore variable: the minimum instruction consists of a single opcode byte and is 8 bits long, while a long instruction that includes a prefix byte may be 104 bits (13 bytes) long. Instructions containing more than one prefix byte can be longer still.
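The field layout described above can be made concrete with a few bit operations. The Python sketch below splits a mod r/m byte (2-bit mod, 3-bit reg, 3-bit r/m) and a sib byte (2-bit scale, 3-bit index, 3-bit base) into their fields, applies the usual 32-bit-addressing rule for when a sib byte follows, and computes a simplified base + index × 2^scale + displacement address. The helper names and register values are hypothetical, several special encodings are deliberately ignored, and the byte breakdown of the 104-bit instruction (one prefix, two opcode, mod r/m, sib, four displacement, and four immediate bytes) is one assumed decomposition that matches the stated total, not a breakdown given in the text.

```python
def split_modrm(modrm: int) -> tuple[int, int, int]:
    """Split a mod r/m byte into its 2-bit mod, 3-bit reg, and 3-bit r/m fields."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111


def needs_sib(modrm: int) -> bool:
    """32-bit addressing: a sib byte follows when mod != 0b11 and r/m == 0b100."""
    mod, _reg, rm = split_modrm(modrm)
    return mod != 0b11 and rm == 0b100


def split_sib(sib: int) -> tuple[int, int, int]:
    """Split a sib byte into its 2-bit scale, 3-bit index, and 3-bit base fields."""
    return (sib >> 6) & 0b11, (sib >> 3) & 0b111, sib & 0b111


def sib_address(sib: int, regs: list[int], disp: int = 0) -> int:
    """Effective address = base + index * 2**scale + displacement.

    Simplified: the special encodings (index 0b100 meaning "no index",
    base 0b101 with mod 0b00 meaning "no base") are not handled.
    """
    scale, index, base = split_sib(sib)
    return regs[base] + regs[index] * (1 << scale) + disp


# One assumed field breakdown that adds up to the 104-bit figure.
FIELD_BYTES = {"prefix": 1, "opcode": 2, "modrm": 1, "sib": 1, "disp": 4, "imm": 4}
LONG_INSTRUCTION_BITS = 8 * sum(FIELD_BYTES.values())

regs = [0] * 8
regs[1], regs[3] = 0x10, 0x1000  # hypothetical index register 1 and base register 3
print(split_modrm(0x44), needs_sib(0x44))            # (1, 0, 4) True
print(hex(sib_address(0b10_001_011, regs, disp=8)))  # 0x1048
print(LONG_INSTRUCTION_BITS)                          # 104
```

In the usage lines, the sib byte `0b10_001_011` selects scale 2 (a factor of 4), index register 1, and base register 3, so the address is 0x1000 + 0x10 × 4 + 8 = 0x1048.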
Some of the opcodes passed from the decode (DE) stage to the arithmetic logic (EX) stage are complex operations (complex opcodes), which comprise a load operation combined with a simple opcode, a store operation combined with a simple opcode, or a load-store operation. Processing such complex opcodes in an arithmetic logic pipeline (EX pipeline) design is problematic because one part of a complex opcode must be completed before the other part is processed. In addition, the dependency internal to a complex opcode must be handled properly so that its internal source and destination register numbers are aligned.
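To illustrate the internal dependency just described, the Python sketch below cracks a complex opcode into simple micro-operations linked through a temporary register. The load must complete before the simple operation consumes its result, and in the load-store form the store consumes the simple operation's result. The `MicroOp` type, the temporary register name `t0`, and this cracking scheme are illustrative assumptions, not the method of any particular processor; only the load-op and load-op-store forms are shown.

```python
from dataclasses import dataclass
from typing import Optional

TMP = "t0"  # hypothetical internal temporary register


@dataclass(frozen=True)
class MicroOp:
    kind: str              # "load", "store", or a simple ALU opcode such as "add"
    dest: Optional[str]    # destination register; None for a store
    srcs: tuple[str, ...]  # source registers, in operand order


def crack(op: str, reg: str, addr: str, store: bool) -> list[MicroOp]:
    """Split a complex opcode into dependent simple parts.

    store=False: load-op form,       reg    <- reg op [addr]
    store=True:  load-op-store form, [addr] <- [addr] op reg
    """
    uops = [MicroOp("load", TMP, (addr,))]                # part 1: fetch memory operand
    if store:
        uops.append(MicroOp(op, TMP, (TMP, reg)))         # part 2: operate on loaded value
        uops.append(MicroOp("store", None, (addr, TMP)))  # part 3: write the result back
    else:
        uops.append(MicroOp(op, reg, (reg, TMP)))         # part 2: result goes to reg
    return uops


def deps_aligned(uops: list[MicroOp]) -> bool:
    """True if every read of TMP comes after some micro-op that wrote TMP."""
    written = False
    for u in uops:
        if TMP in u.srcs and not written:
            return False
        if u.dest == TMP:
            written = True
    return True


uops = crack("add", "r1", "r2", store=True)
for u in uops:
    print(u.kind, u.dest, u.srcs)
# load t0 ('r2',)
# add t0 ('t0', 'r1')
# store None ('r2', 't0')
print(deps_aligned(uops), deps_aligned(uops[::-1]))  # True False
```

Reversing the sequence makes the store read `t0` before anything has written it, which is exactly the ordering hazard a pipeline must avoid when handling complex opcodes.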
One skilled in the art would recognize that these problems with processing complex opcodes increase the chip area and power requirements of the scheduler block in the processor while simultaneously decreasing the processing efficiency of the execution unit, since most instructions execute in a single cycle.