1. Field of the Invention
The present invention relates to the field of data processing and, in particular, to the field of SIMD data processing in which data processing instructions perform a data processing operation in a number of parallel lanes of processing on respective data elements from within a source register so as to generate respective data elements within a destination register.
2. Description of the Prior Art
In the field of data processing, it is known to provide immediates of a limited data size within the instruction. If a larger constant is required, then the prior art approach is to store it in memory and perform a load to move the stored value to a register. A drawback of this approach is that accessing the constant requires a memory access, which is expensive in both time and power. In addition, the need to provide a load instruction in order to load the constant from memory to a specified register adversely impacts code density. Further, there is a limited amount of bandwidth available for performing load operations. Accordingly, each load operation that moves a constant from memory to a register uses a portion of that load bandwidth, which is then not available for other load operations required by the program.
The smaller data size immediates that are provided along with an instruction may be moved into a register and the rest of the register populated with zeros, thereby providing a constant which may be of a more suitable data size for use in certain types of data processing. The actual amount of data provided with the immediate is, however, still very limited.
As an enhancement to the above approach, ARM Limited have provided in their instruction set an instruction encoding which allows an 8-bit immediate value to be specified, together with a further 4 bits identifying a rotation to be applied to the immediate value in order to specify its location within a register (with the remaining bits of the register being filled with a predetermined sequence of ones or zeros). Specifying a rotation along with the immediate value provides further flexibility in the choice of integer constants. However, the range of constants that can be generated in this way is still very limited.
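By way of illustration only, the rotated-immediate encoding described above may be sketched as follows. The sketch assumes a 32-bit register, a rotate-right whose amount is twice the value of the 4-bit field (as in the ARM A32 data-processing immediate form), and zero fill of the remaining bits; the ones-fill case is not modelled. The function names are hypothetical and are not taken from any instruction set documentation.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit destination register


def expand_rotated_immediate(imm8: int, rot4: int) -> int:
    """Expand an 8-bit immediate and a 4-bit rotation field to a 32-bit constant."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")
    if not 0 <= rot4 <= 0xF:
        raise ValueError("rot4 must fit in 4 bits")
    shift = 2 * rot4  # assumption: rotation amount is twice the field value
    # Rotate right within 32 bits; bits shifted out at the bottom re-enter at the top.
    return ((imm8 >> shift) | (imm8 << (32 - shift))) & MASK32


def is_encodable(value: int) -> bool:
    """Return True if some (imm8, rot4) pair expands to the given 32-bit value."""
    value &= MASK32
    for rot4 in range(16):
        shift = 2 * rot4
        # Undo the rotate-right with a rotate-left and check for an 8-bit result.
        undone = ((value << shift) | (value >> (32 - shift))) & MASK32
        if undone <= 0xFF:
            return True
    return False
```

Under these assumptions, a value such as 0xFF000000 can be encoded, whereas a value such as 0x00000101 cannot, because its set bits do not fit within any 8-bit window at an even rotation. This reflects the limited range of constants noted above.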
It is known to provide SIMD (single instruction multiple data) processors in which a data processing operation upon a specified register results in parallel operations being performed upon multiple data elements stored within that register, each of those elements being treated as part of a lane of processing. The processing lanes are isolated from one another to the degree necessary to ensure that the processing within one lane does not inappropriately influence the processing in any of the other lanes. This may have significant advantages, particularly in fields where a large amount of data needs to be processed in the same way, such as video data where the same operations need to be performed on a large number of pixels.
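By way of illustration only, the lane isolation described above may be sketched in software as follows. The sketch assumes a register modelled as a Python integer, a lane-wise addition as the example operation, and a hypothetical function name `simd_add`; it is not a description of any particular processor.

```python
def simd_add(a: int, b: int, lane_bits: int, reg_bits: int = 64) -> int:
    """Add two packed registers lane by lane, with no carry between lanes."""
    if lane_bits <= 0 or reg_bits % lane_bits != 0:
        raise ValueError("reg_bits must be a positive multiple of lane_bits")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, reg_bits, lane_bits):
        lane_a = (a >> shift) & lane_mask
        lane_b = (b >> shift) & lane_mask
        # Truncate the sum to the lane width so a carry cannot reach the next lane.
        result |= ((lane_a + lane_b) & lane_mask) << shift
    return result
```

For example, with two 8-bit lanes in a 16-bit register, adding 0x01FF and 0x0101 lane-wise gives 0x0200, since the overflow from the lower lane is discarded. An ordinary 16-bit addition of the same values gives 0x0300, because the carry propagates into the upper byte.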
The provision of immediates with instructions in SIMD processing presents its own problems, with the immediates often not being of a suitable form for performing certain operations on a plurality of data elements within a plurality of lanes. In such cases it may be appropriate to provide an immediate for each data element, and it is not at all obvious how this can be done unless a suitable data value of large size is stored in memory.
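By way of illustration only, one reading of this mismatch may be sketched as follows: a small immediate that is zero-extended into a register, when that register is viewed as a set of lanes, places a value in the lowest lane only. The sketch assumes a 64-bit register, an 8-bit immediate and 8-bit lanes; the function names are hypothetical.

```python
def zero_extend_immediate(imm: int, imm_bits: int = 8) -> int:
    """Place an immediate in the low bits of a register, with zeros elsewhere."""
    return imm & ((1 << imm_bits) - 1)


def split_lanes(reg: int, lane_bits: int, reg_bits: int = 64) -> list[int]:
    """Return the data elements of a register, lowest lane first."""
    if lane_bits <= 0 or reg_bits % lane_bits != 0:
        raise ValueError("reg_bits must be a positive multiple of lane_bits")
    lane_mask = (1 << lane_bits) - 1
    return [(reg >> shift) & lane_mask for shift in range(0, reg_bits, lane_bits)]


# Only lane 0 receives the constant; the other seven lanes hold zero.
lanes = split_lanes(zero_extend_immediate(0x2A), lane_bits=8)
```

Under these assumptions, a SIMD operation that needs the same constant in every lane cannot use the zero-extended register directly.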