A processor typically interacts with a memory subsystem to store and retrieve data. For some applications, it can be useful to copy data stored at one block of memory (the source block) to another block (the destination block). Processors typically do not include special instructions for memory copy operations, so in response to a memory copy operation request the processor executes a set of load and store instructions to copy the data from one block to the other. Each load instruction of the memory copy operation loads a portion of the source block into a register, and each store instruction stores the data held in that register to the destination block. When the data to be copied is not present at a low-level data cache, these load and store operations are time consuming and inefficient.
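The load and store sequence described above can be illustrated with a minimal C sketch. The function name copy_block, the 64-bit word size, and the assumption that the two blocks do not overlap are all hypothetical choices made for illustration; they are not taken from any particular processor or library. Each loop iteration corresponds to one load of a portion of the source block into a register-sized temporary, followed by one store of that value to the destination block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies n bytes from src to dst using explicit
 * load/store pairs. Assumes the two blocks do not overlap. */
void copy_block(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Copy one 64-bit word at a time (word size is an assumption). */
    while (n >= sizeof(uint64_t)) {
        uint64_t reg;                    /* stands in for a register */
        memcpy(&reg, s, sizeof reg);     /* load: source -> register */
        memcpy(d, &reg, sizeof reg);     /* store: register -> destination */
        s += sizeof reg;
        d += sizeof reg;
        n -= sizeof reg;
    }

    /* Copy any remaining bytes one at a time. */
    while (n > 0) {
        unsigned char reg = *s++;        /* load */
        *d++ = reg;                      /* store */
        n--;
    }
}
```

In this sketch every portion of the source block passes through the processor, so a block that is not already in a low-level data cache incurs a cache miss on the load path before the corresponding store can proceed.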