Computer applications executed on a host processor often access stored data in a non-sequential manner, so that the accessed data values are not stored at contiguous memory addresses and memory resources are used inefficiently. For example, an entire contiguous block of data may be loaded into a cache line even though only a single value in the line is to be accessed. Consequently, when the accessed data is not contiguous, cache lines are loaded and evicted more often than when it is contiguous. The problem may be mitigated by ‘gathering’, or packing, data from non-contiguous memory addresses before it is put into the cache. Data returned to memory must then be ‘scattered’, that is, unpacked back to its non-contiguous memory locations. A disadvantage of this approach is that the packing phase itself does not reduce the number of transfers to and from memory. For wider vectors, the number of memory transfers needed to fill a vector can be significant, with very low utilization. In addition, the scope of rearrangement offered by gather and scatter operations is typically limited to bit-vector addresses or addresses generated dynamically by the host processor.
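The gather and scatter operations described above can be illustrated with a minimal software sketch in C. The function names `gather_strided` and `scatter_strided`, the fixed-stride access pattern, and the use of `double` elements are assumptions chosen for illustration only; they are not taken from any particular processor or library. The sketch also shows why the packing phase does not reduce memory traffic: each packed element still costs one access to a distinct, non-contiguous source location.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: gather (pack) n values that lie `stride` elements
 * apart in `src` into the contiguous buffer `packed`.
 * Each iteration reads a different, non-contiguous address, so when the
 * stride spans a cache line or more, every element read touches its own
 * cache line. */
void gather_strided(const double *src, size_t stride, size_t n, double *packed)
{
    assert(stride > 0);
    for (size_t i = 0; i < n; i++) {
        packed[i] = src[i * stride];
    }
}

/* Hypothetical helper: scatter (unpack) n contiguous values from `packed`
 * back to locations `stride` elements apart in `dst`.
 * Elements of `dst` between the strided positions are left unchanged. */
void scatter_strided(const double *packed, size_t stride, size_t n, double *dst)
{
    assert(stride > 0);
    for (size_t i = 0; i < n; i++) {
        dst[i * stride] = packed[i];
    }
}
```

For example, with a 16-element source array holding the values 0 to 15, gathering 4 elements at a stride of 4 yields the packed values 0, 4, 8 and 12. Once packed, the values occupy contiguous storage and can be processed efficiently, but the gather loop itself still performed four separate strided reads, and the scatter loop performs four separate strided writes.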