Technical Field
Embodiments described herein generally relate to processors. In particular, embodiments described herein generally relate to utilization of registers in processors.
Background Information
Many processors have Single Instruction, Multiple Data (SIMD) architectures. In SIMD architectures, a packed data instruction, vector instruction, or SIMD instruction may operate on multiple data elements (e.g., multiple pairs of data elements) concurrently (e.g., in parallel). The processor may have parallel execution hardware that, in response to the packed data instruction, performs the multiple operations concurrently.
Multiple data elements may be packed within registers or memory locations as packed data. In packed data, the bits of the registers or other storage locations may be logically divided into a sequence of data elements. For example, a 64-bit wide packed data register may have two packed 32-bit data elements, four packed 16-bit data elements, or eight packed 8-bit data elements.
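By way of illustration only, the logical division of a register into packed data elements may be sketched in software as follows. This is a minimal sketch rather than a description of any particular processor: the function names `unpack_elements` and `pack_elements` are hypothetical, the register contents are modeled as an unsigned Python integer, and element 0 is assumed to occupy the least significant bits.

```python
def unpack_elements(register_value, register_bits, element_bits):
    """Split a register value into unsigned elements.

    Assumption: element 0 occupies the least significant bits.
    """
    if register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide register width")
    mask = (1 << element_bits) - 1
    count = register_bits // element_bits
    return [(register_value >> (i * element_bits)) & mask for i in range(count)]


def pack_elements(elements, element_bits):
    """Combine unsigned elements into one register value (element 0 lowest)."""
    mask = (1 << element_bits) - 1
    value = 0
    for i, element in enumerate(elements):
        value |= (element & mask) << (i * element_bits)
    return value


# The same 64-bit register contents viewed at three element widths.
reg = 0x0123456789ABCDEF
as_32 = unpack_elements(reg, 64, 32)  # two 32-bit elements
as_16 = unpack_elements(reg, 64, 16)  # four 16-bit elements
as_8 = unpack_elements(reg, 64, 8)    # eight 8-bit elements
```

The same 64 bits yield two, four, or eight elements depending only on the element width chosen; the stored bits themselves are unchanged.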
In some processors, there has been a progressive increase over the years in the width of the packed data operands. The increase in the width of the packed data operands generally allows more data elements to be processed concurrently (e.g., in parallel), which generally tends to improve performance. For example, when 128-bit packed data is used, eight 16-bit data elements may be processed concurrently instead of just four 16-bit data elements in the case of 64-bit packed data.
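The relationship between operand width and the number of elements processed per instruction may be sketched as follows. This is a software emulation under stated assumptions, not an actual instruction: `elements_per_operand` and `packed_add` are hypothetical names, each element is treated as unsigned, and each element's sum is assumed to wrap around on overflow rather than saturate or carry into the neighboring element.

```python
def elements_per_operand(operand_bits, element_bits):
    """Number of data elements that fit in one packed operand."""
    return operand_bits // element_bits


def packed_add(a, b, operand_bits, element_bits):
    """Emulate an element-wise add of two packed operands.

    Assumption: each element wraps on overflow and never carries
    into the neighboring element.
    """
    mask = (1 << element_bits) - 1
    result = 0
    for i in range(elements_per_operand(operand_bits, element_bits)):
        shift = i * element_bits
        element_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (element_sum & mask) << shift
    return result


count_64 = elements_per_operand(64, 16)    # 4 elements per operand
count_128 = elements_per_operand(128, 16)  # 8 elements per operand

# Lowest element overflows and wraps to 0; the next element is unaffected.
wrapped = packed_add(0x0001FFFF, 0x00010001, 64, 16)
```

The loop here is sequential only because it is an emulation; the parallel execution hardware described above may perform the per-element operations concurrently.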
However, one possible drawback to such use of wider packed data is a possible corresponding increase in the size of the registers and register files. For example, expanding each register of a set of 64-bit registers so that they are each 128-bit registers will likely approximately double the size of the registers (e.g., the area or footprint occupied by the registers on die). The impact will likely be even larger in implementations where there are more physical registers implemented than architectural registers, since the size of a greater number of registers may be approximately doubled. Another possible drawback to such an increase in the size of the registers and register files is a corresponding increase in the amount of data (e.g., state or context) that needs to be moved to and from the registers on context switches, power mode state saves, and like transitions. For example, for each register, 128 bits in the case of a 128-bit wide register, instead of just 64 bits in the case of a 64-bit wide register, may need to be moved to and from the register.
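The scaling described above may be illustrated with simple arithmetic. The register counts in this sketch (16 architectural registers and 64 physical registers) are hypothetical values chosen only for illustration, the function name `register_file_bytes` is likewise hypothetical, and storage is counted in raw data bits only, ignoring ports, wiring, and other overhead that contribute to actual die area.

```python
def register_file_bytes(register_count, register_bits):
    """Raw data storage, in bytes, for a set of equally wide registers."""
    return register_count * register_bits // 8


# Hypothetical register counts, chosen only for illustration.
ARCHITECTURAL_REGISTERS = 16
PHYSICAL_REGISTERS = 64

# State to move on a context switch, assuming one copy per architectural register.
context_64 = register_file_bytes(ARCHITECTURAL_REGISTERS, 64)    # 128 bytes
context_128 = register_file_bytes(ARCHITECTURAL_REGISTERS, 128)  # 256 bytes

# Storage for the larger set of physical registers.
physical_64 = register_file_bytes(PHYSICAL_REGISTERS, 64)        # 512 bytes
physical_128 = register_file_bytes(PHYSICAL_REGISTERS, 128)      # 1024 bytes

# Both ratios are 2x, but the absolute growth is larger for the physical set.
added_context = context_128 - context_64
added_physical = physical_128 - physical_64
```

Under these assumed counts, doubling the register width adds 128 bytes of context to be saved and restored, and 512 bytes of physical register storage.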