In existing processor architectures, operands are usually loaded from memory into registers using a special ‘load’ instruction and then supplied to the execution unit for the corresponding operation. After execution completes, the result is first stored in a register and then written from that register back to memory using a special ‘store’ instruction. Even in a processor that can obtain operands through direct memory addressing, the limited number of memory ports and the limited memory bandwidth prevent the execution unit from obtaining all of its operands directly from memory, so certain operands must still be loaded from memory into registers. Thus, both memory and registers are used to supply operands to the execution unit.
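The load/execute/store sequence described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not a model of any particular processor: the class `LoadStoreMachine`, its four registers, and the helper `mem_add` are hypothetical names chosen for illustration. It shows that computing a single sum of two memory words takes two ‘load’ instructions, one operation in the execution unit, and one ‘store’ instruction, because the execution unit reads and writes only registers.

```python
from dataclasses import dataclass, field


@dataclass
class LoadStoreMachine:
    """Hypothetical toy machine: the execution unit reads and writes only registers."""
    memory: dict
    registers: list = field(default_factory=lambda: [0] * 4)
    trace: list = field(default_factory=list)

    def load(self, rd: int, addr: int) -> None:
        # 'load' instruction: copy a word from memory into register rd
        self.registers[rd] = self.memory[addr]
        self.trace.append(f"load r{rd}, [{addr}]")

    def add(self, rd: int, rs1: int, rs2: int) -> None:
        # Execution unit: operands come only from registers; result goes to a register
        self.registers[rd] = self.registers[rs1] + self.registers[rs2]
        self.trace.append(f"add r{rd}, r{rs1}, r{rs2}")

    def store(self, rs: int, addr: int) -> None:
        # 'store' instruction: write register rs back to memory
        self.memory[addr] = self.registers[rs]
        self.trace.append(f"store r{rs}, [{addr}]")


def mem_add(machine: LoadStoreMachine, dst: int, src1: int, src2: int) -> None:
    """memory[dst] = memory[src1] + memory[src2], expressed as load/add/store."""
    machine.load(0, src1)
    machine.load(1, src2)
    machine.add(2, 0, 1)
    machine.store(2, dst)


m = LoadStoreMachine(memory={100: 7, 104: 5, 108: 0})
mem_add(m, 108, 100, 104)
print(m.memory[108])  # 12
print(m.trace)        # four instructions for one addition
```

Of the four instructions in the trace, only one (`add`) performs the actual computation; the other three exist solely to move operands and the result between memory and registers.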
In addition, a cache is often provided to hold a copy of part of the memory contents (i.e., operands), so that the processor can access those contents quickly and keep its pipeline operating continuously. However, even when the operands reside in the cache as a mirrored part of memory, some or all of them must still be loaded into registers before the execution unit can use them.
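The point that a cache shortens memory access but does not remove the ‘load’ step can be sketched as follows. This is an illustrative assumption, not a description of any real cache design: `CachedMemory` is a hypothetical word-granular, fully associative cache with first-in-first-out eviction, and `sum_via_registers` is a hypothetical routine in which every operand is loaded into a register before the execution unit adds it. Running the same routine twice turns all cache misses into hits, yet the number of loads stays the same.

```python
class CachedMemory:
    """Hypothetical fully associative cache holding copies of individual memory words."""

    def __init__(self, memory: dict, capacity: int) -> None:
        self.memory = memory
        self.capacity = capacity
        self.lines = {}  # addr -> cached copy of memory[addr]
        self.hits = 0
        self.misses = 0

    def read(self, addr: int) -> int:
        if addr in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            if len(self.lines) >= self.capacity:
                # Evict the oldest inserted entry (FIFO, an assumption for simplicity)
                self.lines.pop(next(iter(self.lines)))
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]


def sum_via_registers(cache: CachedMemory, addrs: list) -> tuple:
    """Sum memory words; each operand passes through a register via a 'load'."""
    acc = 0    # accumulator register
    loads = 0  # number of 'load' instructions issued
    for addr in addrs:
        r1 = cache.read(addr)  # 'load' into register r1, whether it hits or misses
        loads += 1
        acc = acc + r1         # execution unit reads only registers acc and r1
    return acc, loads


memory = {0: 3, 4: 4, 8: 5}
cache = CachedMemory(memory, capacity=4)
first = sum_via_registers(cache, [0, 4, 8])   # three cache misses
second = sum_via_registers(cache, [0, 4, 8])  # three cache hits
print(first, second)             # (12, 3) (12, 3)
print(cache.hits, cache.misses)  # 3 3
```

On the second pass every access hits the cache, but the load count is unchanged: the cache reduces the latency of each ‘load’, not the number of ‘load’ instructions the execution unit depends on.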