Dedicated pipeline queues have been used in multi-pipeline execution (EX) units of processors (e.g., central processing units (CPUs), graphics processing units (GPUs), and the like) to achieve faster processing speeds. In particular, dedicated queues have been used in conjunction with EX units having multiple EX pipelines that are configured to execute different subsets of a set of supported micro-operations (i.e., micro-instructions). Dedicated queuing has caused various bottlenecks and has complicated the scheduling of micro-operations that require both numeric manipulation and retrieval/storage of data.
Processors are conventionally designed to process operations that are typically identified by operation codes (OpCodes), i.e., instruction codes. In the design of new processors, it is important to be able to process all of a standard set of operations so that existing computer programs based on the standardized codes will operate without the need to translate operations into an entirely new code base. Processor designs may further incorporate the ability to process new operations, but backward compatibility with older operation sets is often desirable.
Operations (Ops) represent the actual work to be performed. An operation represents the issuing of operands to implicit (such as add) or explicit (such as divide) functional units. Operations may be moved around by a scheduler queue.
Operands are the arguments to operations (i.e., instructions). Operands may include expressions, registers, or constants.
Execution of micro-operations (uOps) is typically performed in an EX unit of a processor core. To increase speed, multi-core processors have been developed. To facilitate faster execution throughput, “pipeline” execution of operations within an execution unit of a processor core is used. Cores having multiple execution units for multi-thread processing are also being developed. However, there is a continuing demand for faster throughput for processors.
One type of standardized set of operations is the operation set compatible with “x86” chips (e.g., 8086, 286, 386, and the like), which have enjoyed widespread use in many personal computers. Operation sets such as the x86 operation set include operations requiring numeric manipulation, operations requiring retrieval and/or storage of data, and operations that require both numeric manipulation and retrieval/storage of data. To execute such operations, execution units within processor cores have included two types of pipelines: arithmetic logic pipelines (“EX pipelines”) to execute numeric manipulations, and address generation pipelines (“AG pipelines”) to facilitate load and store operations.
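The relationship between the three kinds of operations and the two pipeline types described above can be illustrated with a minimal sketch. The names Pipeline and pipelines_needed are hypothetical and chosen only for illustration; the sketch does not represent the scheduling logic of any actual processor.

```python
from enum import Enum


class Pipeline(Enum):
    """The two pipeline types named in the text."""
    EX = "arithmetic logic pipeline"
    AG = "address generation pipeline"


def pipelines_needed(needs_numeric, needs_memory):
    """Return the set of pipeline types an operation would occupy.

    Hypothetical helper: an operation is described only by whether it
    requires numeric manipulation and/or retrieval/storage of data.
    """
    needed = set()
    if needs_numeric:
        needed.add(Pipeline.EX)   # numeric manipulation
    if needs_memory:
        needed.add(Pipeline.AG)   # load and store operations
    return frozenset(needed)
```

Under these assumptions, an operation that requires both numeric manipulation and retrieval/storage of data occupies both pipeline types, which is the case the text identifies as difficult to schedule with dedicated queues.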
In order to quickly and efficiently process operations as required by a particular computer program, the program commands are decoded into operations within the supported set of micro-operations and dispatched to the EX unit for processing.
A shifter in the EX unit may perform several x86 instructions that require shifting or rotating data in a register or data from memory, e.g., rotate left (ROL), rotate right (ROR), shift left (SHL) (which is identical to shift arithmetic left (SAL)), shift right (SHR), shift arithmetic right (SAR), and the like. These instructions may be 8-bit, 16-bit, 32-bit or 64-bit operations.
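The data results of the shift and rotate instructions listed above can be modeled in software as follows. This is a minimal sketch for illustration only: the function names are hypothetical, flag generation is not modeled, and the shift count is used as given (rotates reduce it modulo the operand width) rather than following the hardware's handling of the count operand.

```python
def _mask(width):
    """Return an all-ones mask for a supported operand width."""
    if width not in (8, 16, 32, 64):
        raise ValueError("width must be 8, 16, 32 or 64")
    return (1 << width) - 1


def rol(value, count, width):
    """Rotate left: bits shifted out of the top re-enter at the bottom."""
    m = _mask(width)
    value &= m
    count %= width
    return ((value << count) | (value >> (width - count))) & m


def ror(value, count, width):
    """Rotate right: bits shifted out of the bottom re-enter at the top."""
    m = _mask(width)
    value &= m
    count %= width
    return ((value >> count) | (value << (width - count))) & m


def shl(value, count, width):
    """Shift left, filling vacated low bits with zeros."""
    return (value << count) & _mask(width)


# SAL produces the same result as SHL.
sal = shl


def shr(value, count, width):
    """Logical shift right, filling vacated high bits with zeros."""
    return (value & _mask(width)) >> count


def sar(value, count, width):
    """Arithmetic shift right, filling vacated high bits with the sign bit."""
    m = _mask(width)
    value &= m
    if value >> (width - 1):      # sign bit set: reinterpret as negative
        value -= 1 << width
    return (value >> count) & m   # Python's >> on negative ints is arithmetic
```

For the 8-bit value 0x81 and a count of 1, shr yields 0x40 while sar yields 0xC0, since the arithmetic shift replicates the sign bit; rol yields 0x03 and ror yields 0xC0.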
A method and apparatus are needed to improve the latency of shift operation execution by shifting or rotating such data and generating results and flags within a single phase or half-cycle, so as to meet high core frequency targets within a limited silicon area.