Technical Field
Embodiments described herein generally relate to processors. In particular, embodiments described herein generally relate to accessing data in memory with processors.
Background Information
Many processors have Single Instruction, Multiple Data (SIMD) architectures. In SIMD architectures, a packed data instruction, vector instruction, or SIMD instruction may operate on multiple data elements (e.g., multiple pairs of data elements) concurrently or in parallel. Multiple data elements may be packed within a register or memory location as packed data. In packed data, the bits of the register or other storage location may be logically divided into a sequence of data elements. For example, a 64-bit wide packed data register may have two packed 32-bit data elements, four packed 16-bit data elements, or eight packed 8-bit data elements. The processor may have parallel execution hardware responsive to the packed data instruction to perform the multiple operations concurrently (e.g., in parallel).
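As an illustration only, the logical division of a 64-bit storage location into four packed 16-bit data elements can be sketched in scalar C. The helper names `lane16` and `packed_add16` are hypothetical and do not correspond to any particular instruction set. The loop emulates sequentially what parallel execution hardware would perform concurrently on all elements.

```c
#include <stdint.h>

/* Hypothetical helper: read 16-bit element i (0 = least significant)
 * from a 64-bit packed value. */
static uint16_t lane16(uint64_t packed, unsigned i)
{
    return (uint16_t)(packed >> (16u * i));
}

/* Hypothetical helper: add four pairs of 16-bit elements.
 * Each element wraps around independently; no carry crosses
 * from one element into the next. This is a scalar emulation
 * of what a packed add would do on all elements at once. */
static uint64_t packed_add16(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (unsigned i = 0; i < 4; i++) {
        uint16_t sum = (uint16_t)(lane16(a, i) + lane16(b, i));
        result |= (uint64_t)sum << (16u * i);
    }
    return result;
}
```

In this sketch, an overflow in one element does not disturb its neighbor, which is the property that distinguishes a packed add from an ordinary 64-bit integer add.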
In some processors, there has been a progressive increase over the years in the width of the packed data operands. This increase in width generally allows more data elements to be processed concurrently (e.g., in parallel), which generally helps to improve performance. For example, a 128-bit wide packed data operand may have four packed 32-bit data elements (instead of just two in the case of a 64-bit wide packed data operand), eight packed 16-bit data elements (instead of just four in the case of a 64-bit wide packed data operand), and so on.
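The relationship between operand width and element count is simple division. A minimal sketch, in which the function name `elements_per_operand` is a hypothetical choice for illustration:

```c
#include <stddef.h>

/* Hypothetical helper: number of equally sized data elements that
 * fit in a packed data operand. Assumes element_bits is nonzero
 * and divides operand_bits evenly. */
static size_t elements_per_operand(size_t operand_bits, size_t element_bits)
{
    return operand_bits / element_bits;
}
```

For instance, `elements_per_operand(64, 32)` gives 2 and `elements_per_operand(128, 32)` gives 4, matching the counts given in the text.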
In certain processors, the increase in the width of the packed data operands is accompanied by a corresponding increase in the width of the registers. However, one possible drawback to increasing the width of the registers is an increase in the area or footprint occupied by the registers on die. For example, expanding each register of a set of 64-bit registers into a 128-bit register will likely approximately double the area or footprint occupied by the registers on die. The impact will likely be even larger in implementations where more physical registers are implemented than architectural registers, since a greater number of registers is approximately doubled in size. Another possible drawback to such an increase in the width of the registers is a corresponding increase in the amount of state, context, or other data stored in the registers that needs to be moved to and from the registers (e.g., saved and restored) on context switches, power mode state saves, and like transitions. For example, 128 bits instead of just 64 bits may need to be swapped in and out for each register on context changes.
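The growth in register state can be estimated with simple arithmetic. The sketch below is illustrative only: the function name `register_state_bytes` and the register count of 16 used in the note that follows are assumptions, not figures from any particular processor.

```c
#include <stddef.h>

/* Hypothetical helper: bytes of register state to save or restore
 * for a register set. Assumes register_bits is a multiple of 8. */
static size_t register_state_bytes(size_t num_registers, size_t register_bits)
{
    return num_registers * (register_bits / 8);
}
```

Under the assumed count of 16 registers, `register_state_bytes(16, 64)` is 128 bytes while `register_state_bytes(16, 128)` is 256 bytes, so doubling the register width doubles the state moved on each such transition.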
Processors typically execute instructions to load data (e.g., packed data operands) from memory and to store data (e.g., packed data operands) to memory. For example, a processor may execute a load-from-memory instruction to load or read a packed data operand from the memory into a destination register. The processor may execute a write-to-memory instruction to write or store a packed data operand from a source register to the memory.
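A software model of such a load and store, offered as a sketch only, might look like the following. The type `reg128_t` and the functions `load_packed` and `store_packed` are hypothetical stand-ins for a 128-bit register and the corresponding instructions; they are not an actual instruction set interface.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit packed data register. */
typedef struct {
    uint8_t bytes[16];
} reg128_t;

/* Models a load: copy a 16-byte packed data operand from memory
 * into a destination register. */
static reg128_t load_packed(const void *mem)
{
    reg128_t dst;
    memcpy(dst.bytes, mem, sizeof dst.bytes);
    return dst;
}

/* Models a store: copy a 16-byte packed data operand from a
 * source register to memory. */
static void store_packed(void *mem, reg128_t src)
{
    memcpy(mem, src.bytes, sizeof src.bytes);
}
```

In this model, loading an operand and then storing it elsewhere leaves an identical 16-byte copy at the second location.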