Embodiments of the present invention relate to data processing and more particularly to processing vector operations, such as vector memory operations.
Certain processors, such as microprocessors, are configured to operate on different types of data. Some processors include support for operations on vector data. Such vector data is typically wider than a scalar operand. For example, vector data may be formed of a plurality of vector elements, each corresponding to a scalar operand. Various instruction set architectures (ISAs) include support for certain vector operations. Some instruction sets include instructions that perform arbitrary-strided and non-strided vector memory accesses. These instructions are commonly referred to as gather (load or memory read) and scatter (store or memory write) instructions. In a gather/scatter instruction, a user provides a set of arbitrary addresses or offsets. Gather and scatter instructions are fundamental tools for a programmer and a vector compiler to produce efficient vector code that deals with one or more levels of memory indirection.
Accordingly, most vector instruction sets offer a flavor of memory access that allows reading or writing a collection of arbitrary memory locations. Typical gather/scatter instructions in a vector ISA are of the form:
        Gather [v1]→v2; and
        Scatter v1→[v2]
where v1 and v2 are vector registers, each of which includes a plurality of base registers. In a gather instruction, the data contained in the source register v1 is used as a set of memory addresses. For each address, a processor capable of executing the instruction would fetch the corresponding data located in memory at the specified address and place it in the corresponding position in the destination register v2.
Scatter instructions perform the reverse operation, where the source register v1 contains arbitrary data and the destination register v2 contains a set of memory addresses. Each data element in v1 is stored in memory at the location indicated by the corresponding address in v2. Some vector instruction sets have a global register that is added to the described addresses to construct a final memory address.
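The gather and scatter semantics described above can be illustrated with a minimal software sketch. It is not a description of any particular ISA or hardware: memory is modeled as a Python list of words, addresses are treated as word indices, the function names `gather` and `scatter` and the optional `base` parameter (standing in for the global register mentioned above) are hypothetical, and the assumption that a later element overwrites an earlier one when two elements of a scatter target the same address is made only for this illustration.

```python
def gather(memory, addresses, base=0):
    """Gather [v1] -> v2: element i of the result is memory[base + addresses[i]]."""
    return [memory[base + a] for a in addresses]


def scatter(memory, data, addresses, base=0):
    """Scatter v1 -> [v2]: store data[i] at memory[base + addresses[i]].

    Elements are written in element order, so for duplicate addresses the
    last write wins (an assumption of this sketch, not of any real ISA).
    """
    if len(data) != len(addresses):
        raise ValueError("data and addresses must have the same length")
    for value, a in zip(data, addresses):
        memory[base + a] = value


# Toy memory of 16 words holding the values 100..115.
memory = list(range(100, 116))

# Gather: v1 holds addresses, v2 receives the loaded data.
v1 = [3, 0, 7, 3]
v2 = gather(memory, v1)

# Scatter: the first vector holds data, the second holds addresses.
scatter(memory, [1, 2, 3, 4], [8, 9, 10, 8])
```

In this sketch the gather reads the same location twice without difficulty, whereas the scatter writes location 8 twice, which is the kind of case where an ordering rule has to be chosen.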
There are two fundamental strategies to implement gather/scatter instructions in hardware. In a first strategy, hardware generates each address in the gather/scatter in sequence and dispatches the memory requests (either reads or writes) in sequence. Such a strategy is somewhat cumbersome and inefficient, and it undermines the purpose of vector operations, which seek to apply a single instruction to multiple data elements simultaneously. A second strategy seeks to perform multiple simultaneous accesses to a closest memory element (e.g., a cache).
However, in performing the simultaneous accesses, conflicts between the data elements and portions of the memory hierarchy are to be avoided. That is, when sending multiple vector elements out to a cache memory, a given portion of the cache memory can only receive a single data element during a cycle. Accordingly, various control schemes are used to avoid such conflicts. These resolution mechanisms, however, are relatively inefficient and are not optimized for either the specific data or the memory elements. Accordingly, a need exists for improved implementation of vector operations and more specifically vector memory operations.
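The conflict constraint described above can be made concrete with a small sketch. It assumes, purely for illustration, that the cache is divided into banks selected by address bits and that each bank accepts one element per cycle; the function name `schedule_by_bank`, the parameters `num_banks` and `bank_width`, and the greedy first-fit policy are all hypothetical and are not the control scheme of any particular processor.

```python
def schedule_by_bank(addresses, num_banks=4, bank_width=8):
    """Greedily assign each vector element to the earliest cycle whose
    target bank is still free.

    Returns a list of cycles; each cycle maps bank -> (element index, address).
    The bank of an address is assumed to be (address // bank_width) % num_banks.
    """
    cycles = []
    for index, address in enumerate(addresses):
        bank = (address // bank_width) % num_banks
        for cycle in cycles:
            if bank not in cycle:
                cycle[bank] = (index, address)
                break
        else:
            # Every existing cycle already uses this bank: open a new cycle.
            cycles.append({bank: (index, address)})
    return cycles


# Four addresses that fall in four different banks: no conflict.
no_conflict = schedule_by_bank([0, 8, 16, 24])

# Three addresses map to bank 0, so the access is spread over several cycles.
conflicting = schedule_by_bank([0, 32, 64, 8])

print(len(no_conflict), len(conflicting))
```

Under these assumptions the first access pattern completes in a single cycle, while the second needs one cycle per element that shares bank 0, which shows how the address pattern alone determines how much of the intended parallelism survives.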