The present technique relates to the field of data processing. More particularly, it relates to the processing of vector instructions.
Some data processing systems support processing of vector instructions for which a source operand or result value of the instruction is a vector comprising multiple elements. By supporting the processing of a number of distinct elements in response to a single instruction, code density can be improved and the overhead of fetching and decoding instructions reduced. An array of data values can be processed more efficiently by loading the data values into respective elements of a vector operand and processing several elements at a time using a single vector instruction.
One type of vector memory access operation that can be performed accesses a plurality of data values in memory at addresses determined from an address vector operand comprising a plurality of address elements. Such operations provide a great deal of flexibility as they allow data values to be accessed from arbitrary memory locations, with the address of each data value being derived from a corresponding address element in the address vector operand. When loading data values from memory into a vector register, such memory access operations are often referred to as gather memory access operations, as they serve to gather data values from a plurality of address locations and store those data values within a vector register. Similarly, when such operations are used to store data values from a vector register into memory, they are often referred to as scatter memory access operations, as they serve to distribute the data values from a vector register to the identified addresses in memory.
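By way of illustration only, the behaviour of gather and scatter operations described above can be sketched in C as scalar loops over the address elements. The names `gather_u32`, `scatter_u32` and `VLEN`, the four-element vector length, and the representation of each address element as an index into a single array are assumptions made for this sketch. They do not correspond to any particular instruction set or implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define VLEN 4 /* hypothetical number of elements per vector */

/* Gather: for each address element idx[i], load the data value found at
 * that location and place it in the corresponding element of dst. */
static void gather_u32(uint32_t dst[VLEN], const uint32_t *mem,
                       const size_t idx[VLEN])
{
    for (size_t i = 0; i < VLEN; i++) {
        dst[i] = mem[idx[i]]; /* one independent load per element */
    }
}

/* Scatter: for each address element idx[i], store the corresponding
 * element of src to that location. */
static void scatter_u32(uint32_t *mem, const size_t idx[VLEN],
                        const uint32_t src[VLEN])
{
    for (size_t i = 0; i < VLEN; i++) {
        mem[idx[i]] = src[i]; /* one independent store per element */
    }
}
```

Because each address element may identify an arbitrary location, each loop iteration in this sketch corresponds to a separate load or store access.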
Because the addresses involved in such gather or scatter operations can be arbitrary, processing such operations typically requires the various access requests to be serialised, such that a series of independent load or store operations is performed. Seeking to do otherwise would come at a significant cost in terms of hardware, and thus area and power, and would require additional processing to be performed in a critical timing path, namely the memory access path.
It would be desirable to provide an improved mechanism for handling gather or scatter operations without such additional hardware cost, and without adversely affecting the timing path to memory.