The present invention relates to the field of data processing. In particular, the present invention relates to the interleaved storage of data elements.
In a Single Instruction Multiple Data (SIMD) architecture, a data processing element (also referred to as a consumer) may execute a single instruction on several data elements at once. The data processing element acquires the data elements from storage circuits. Each data element is then inserted into a separate lane of the processing element and a single instruction can then execute on each of the lanes in parallel. Consequently one instruction is executed on many data elements at the same time, thereby providing data parallelisation.
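By way of illustration only, the lane-based execution described above may be modelled in software as follows. This is a minimal sketch rather than a description of any particular hardware. The function name `simd_execute` and the example operation are hypothetical, and the per-lane loop stands in for what a processing element would perform in parallel.

```python
from typing import Callable, List


def simd_execute(op: Callable[[int], int], lanes: List[int]) -> List[int]:
    """Apply a single operation to every lane.

    A real SIMD processing element would process all lanes at once;
    this software model visits them one after another.
    """
    return [op(element) for element in lanes]


# Four data elements, one per lane (values chosen arbitrarily).
lanes = [1, 2, 3, 4]

# One "instruction" (add 10) is applied to all four lanes.
result = simd_execute(lambda x: x + 10, lanes)
```

In this model, a single call corresponds to a single instruction, and the length of `lanes` corresponds to the number of data elements processed by that instruction.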
The processing element may acquire the plurality of data elements from a plurality of general purpose registers which collectively form a SIMD register. In order that the processing element can access the bits that make up the data elements in parallel, it is necessary for each of the general purpose registers to be provided in a separate register bank. If two such registers were provided in the same register bank it would require two accesses to that register bank (or multiple access ports) in order to retrieve the bits stored therein. Since each access to a register bank takes time, latency of the processing element would be increased.
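The effect of register placement on access count may be illustrated with a simple model. The sketch below assumes, purely for illustration, that each register bank has a single access port and can therefore serve one register read per access. The names `access_count`, `separate` and `shared` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def access_count(registers: List[str], bank_of: Dict[str, int]) -> int:
    """Number of sequential accesses needed to read all registers.

    Assumes each bank is single-ported, so registers that share a bank
    must be read one after another, while different banks are read
    in parallel.
    """
    reads_per_bank = Counter(bank_of[r] for r in registers)
    return max(reads_per_bank.values(), default=0)


wanted = ["r0", "r1", "r2", "r3"]

# Every register in its own bank: all can be read in parallel.
separate = {"r0": 0, "r1": 1, "r2": 2, "r3": 3}

# r0 and r1 share bank 0: that bank must be accessed twice.
shared = {"r0": 0, "r1": 0, "r2": 1, "r3": 2}

separate_accesses = access_count(wanted, separate)
shared_accesses = access_count(wanted, shared)
```

Under these assumptions, the arrangement with separate banks requires one access, whereas the arrangement in which two registers share a bank requires two.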
Often, the processing element is narrower than the SIMD register. That is, the processing element may hold fewer bits than the SIMD register is able to store. The processing element may therefore either handle the data elements stored in the SIMD register in batches, or iteratively handle a subset of the bits of each of the data elements stored in the SIMD register. For example, the top 32 bits of every data element may be handled first, followed by the bottom 32 bits of every data element. The technique that is used by the processing element at any particular instant may depend on the operation being carried out and, in particular, on which technique will be most efficient.
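The two approaches may be contrasted using a simple software model. The sketch below assumes, for illustration only, a SIMD register holding four 64-bit data elements and a processing element that accepts 128 bits per pass. The function names `in_batches` and `by_halves` and the example values are hypothetical.

```python
from typing import List

MASK32 = 0xFFFFFFFF  # selects the bottom 32 bits of a value


def in_batches(elements: List[int], lanes: int) -> List[List[int]]:
    """Split whole data elements into groups of `lanes` elements per pass."""
    return [elements[i:i + lanes] for i in range(0, len(elements), lanes)]


def by_halves(elements: List[int]) -> List[List[int]]:
    """Take the top 32 bits of every 64-bit element, then the bottom 32 bits."""
    top = [(e >> 32) & MASK32 for e in elements]
    bottom = [e & MASK32 for e in elements]
    return [top, bottom]


# Four 64-bit data elements (256 bits in total), values chosen arbitrarily.
simd_register = [
    0x1111111122222222,
    0x3333333344444444,
    0x5555555566666666,
    0x7777777788888888,
]

# Pass 1 handles elements 0 and 1; pass 2 handles elements 2 and 3.
batch_passes = in_batches(simd_register, lanes=2)

# Pass 1 handles the top halves of all four elements; pass 2 the bottom halves.
half_passes = by_halves(simd_register)
```

In both cases each pass supplies 128 bits to the processing element. The approaches differ only in which bits of the SIMD register are selected for each pass.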
The registers making up the SIMD register may be accessed individually or in combination to acquire the necessary data for the processing element. It is therefore necessary to provide muxing logic between the banks of registers that make up or provide the SIMD registers. However, both the register banks and the muxing logic consume space and power, which is disadvantageous.