1. Field of the Invention
This invention relates to computing systems, and more particularly, to efficiently rotating data for multiple modes of a processor.
2. Description of the Relevant Art
The geometric dimensions of devices and metal routes on each generation of processor cores continue to decrease. Superscalar designs increase the density of integrated circuits (ICs) on a die with multiple pipelines, larger caches, and more complex logic. Cross-capacitance effects grow with decreasing geometric dimensions. Cross-capacitance increases the power consumption and noise effects on the chip. The noise effects increase the propagation delays of signals on a chip. Wide buses typically increase noise effects as geometric dimensions decrease and lines are brought closer together.
Ideally, every clock cycle produces useful execution of an instruction for each stage of a pipeline. An integer execution unit (IEU), or an execution core, executes several single-cycle instructions, such as addition, incrementing, subtraction, shifting and rotation. However, one or more of these instructions may become a critical path for the processor as the geometric dimensions decrease and the operational frequency increases.
The rotation of data is typically used for manipulating data fields, for example in data extraction, insertion and alignment. For example, data misalignment occurs in cached processor designs. Typically, when a misalignment is detected, two reads of consecutive cache lines are performed, followed by an alignment operation to obtain the requested data. In addition, a rotate unit within an execution core may be configured to support different operand sizes. In one example, a 64-bit processor maintains compatibility with a legacy instruction set architecture when the 64-bit processor is configured to support 32-bit instructions. In such a case, the processor may be configured to support rotations of both 64-bit and 32-bit operands.
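By way of illustration only, the following Python sketch models a rotate operation on 64-bit and 32-bit operands and shows how a rotation followed by a mask may extract a data field. The names rotl and extract_field, and the choice of a left rotation, are assumptions made for this example and do not describe any particular hardware implementation.

```python
def rotl(value, amount, width):
    """Rotate `value` left by `amount` bits within a `width`-bit operand."""
    mask = (1 << width) - 1
    value &= mask
    amount %= width  # a rotation by the full width leaves the operand unchanged
    return ((value << amount) | (value >> (width - amount))) & mask


def extract_field(word, lsb, length, width=64):
    """Extract `length` bits starting at bit `lsb` (hypothetical helper).

    The field is rotated down to bit 0 and the remaining bits are masked off.
    """
    rotated = rotl(word, (width - lsb) % width, width)
    return rotated & ((1 << length) - 1)


# The same bit pattern rotates differently depending on the operand size.
result_32 = rotl(0x80000001, 1, 32)  # the top bit wraps around to bit 0
result_64 = rotl(0x80000001, 1, 64)  # the top bit moves into bit 32 instead
field = extract_field(0xABCD1234, 16, 16)
```

In this model the operand width determines where the bits wrap around, which is why a rotate unit serving both 64-bit and 32-bit instructions cannot simply treat a 32-bit operand as a zero-extended 64-bit operand.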
One approach for a processor to support both 64-bit and 32-bit rotations is to include both a 32-bit rotator and a 64-bit rotator within the execution core(s). However, this solution consumes on-die real estate by having two rotators and may also add delay by placing a 2:1 mux in the critical path to select the appropriate result. A second approach is to detect a 32-bit rotate and, in response, duplicate the 32-bit rotate data inputs and send them to both the higher order (most significant) 32 bits and the lower order 32 bits of the 64-bit rotator. However, this second solution may increase the data input load and reduce the speed of the rotation.
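By way of illustration only, the two approaches described above may be modeled behaviorally in Python as follows. The function names, the is_32bit mode flag, the reduction of the rotate amount modulo 32 in 32-bit mode, and the zeroing of the upper 32 result bits are all assumptions made for this sketch. The model captures function only and does not capture the area, loading, or timing costs discussed above.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def rotl(value, amount, width):
    """Rotate `value` left by `amount` bits within a `width`-bit operand."""
    mask = (1 << width) - 1
    value &= mask
    amount %= width
    return ((value << amount) | (value >> (width - amount))) & mask


def rotate_two_rotators(operand, amount, is_32bit):
    """First approach: separate 32-bit and 64-bit rotators plus a 2:1 select."""
    result_64 = rotl(operand, amount, 64)
    result_32 = rotl(operand & MASK32, amount, 32)
    # Models the 2:1 mux that sits in the critical path.
    return result_32 if is_32bit else result_64


def rotate_duplicated_input(operand, amount, is_32bit):
    """Second approach: one 64-bit rotator with duplicated 32-bit inputs."""
    if is_32bit:
        low = operand & MASK32
        # Drive the same 32 bits into both halves of the 64-bit rotator.
        operand = (low << 32) | low
        amount %= 32  # assumed: 32-bit mode uses a 5-bit rotate amount
        # The lower half of the 64-bit result equals the 32-bit rotation.
        return rotl(operand, amount, 64) & MASK32
    return rotl(operand, amount, 64)


# Both models agree for every mode and rotate amount in this small sweep.
agree = all(
    rotate_two_rotators(x, k, m) == rotate_duplicated_input(x, k, m)
    for x in (0x80000001, 0xDEADBEEF, 0x0123456789ABCDEF)
    for k in range(64)
    for m in (True, False)
)
```

The duplication works because, with identical upper and lower halves, the bits that wrap out of the top of the lower half are replaced by the same bits arriving from the upper half. The cost noted above is also visible in the model: each 32-bit input bit must drive two rotator inputs.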
In view of the above, efficient methods and mechanisms for rotating data for multiple modes of a processor are desired.