Data is commonly transferred between memory locations so that it can be processed by a processor. For example, during a streaming application, data may be copied from dynamic random access memory (DRAM) to local or shared memory for processing. However, current techniques for copying data between memory locations are associated with various limitations.
For example, data may first be copied from external memory to a register file, from which it is then transferred to local scratchpad memory within a processor. This two-step path may limit the number of outstanding data transfers and may result in bottlenecking, wasted power, etc. There is thus a need for addressing these and/or other issues associated with the prior art.