Reduced Instruction Set Computer (RISC) processors are well known and widely used. RISC processors are generally designed with instruction sets that facilitate the use of a technique known as pipelining. Pipelining enables a processor to work on different stages of several instructions simultaneously. In so doing, RISC processors exploit the parallelism that exists among the steps needed to process instructions. Exploiting such instruction-level parallelism allows RISC processors to execute more instructions in a given period of time.
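As a rough illustration of the throughput benefit of pipelining, the following sketch compares cycle counts with and without overlap. It assumes an idealized pipeline in which every stage takes one cycle and no stalls or hazards occur; the function names and the five-stage figure are illustrative assumptions, not details of any particular processor.

```python
def unpipelined_cycles(num_instructions, num_stages):
    """Each instruction passes through every stage before the next one starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages):
    """Ideal pipeline: fill the stages once, then complete one instruction per cycle."""
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


# Assumed workload: 100 instructions on a hypothetical 5-stage pipeline.
sequential = unpipelined_cycles(100, 5)
overlapped = pipelined_cycles(100, 5)
print(sequential, overlapped)
```

Under these assumptions the overlapped count approaches one cycle per instruction as the instruction count grows, which is the sense in which more instructions complete in a given period of time.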
Modern Complex Instruction Set Computer (CISC) processors can also benefit from instruction-level parallelism by translating their instructions into micro-operations (i.e., instructions similar to those of a RISC processor). These micro-operations can then be processed in a pipelined fashion to obtain the benefits of pipelined processing.
RISC processors are inherently inefficient at moving data between memory locations. This is because a conventional memory move involves a load word instruction to load data from a source memory location into a general purpose register (GPR) and a subsequent store word instruction to copy the loaded data from the GPR to a destination memory location.
Execution of the load word instruction causes a load/store unit (LSU) to interface with a bus interface unit (BIU) to access data located in the source memory location. The BIU accesses the contents of the source memory location and transfers the contents to the LSU. The LSU stores the transferred contents in a GPR.
After the data is stored in the GPR, the execution unit executes a store word instruction to store the data held in the GPR to a destination memory location. Execution of the store word instruction causes the LSU to move the data in the GPR to the BIU. The BIU then accesses the memory to store the data in the destination memory location, thereby completing a copy of the data from the source memory location to the destination memory location.
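The load word and store word sequence described above can be sketched as a small software model. This is a minimal illustration only: the class names, the 32-entry register file, the choice of register 1 as a scratch register, and the example addresses are all hypothetical, and the model counts memory accesses rather than modeling real timing.

```python
class BusInterfaceUnit:
    """Hypothetical BIU model: the only component that touches memory."""

    def __init__(self, memory):
        self.memory = memory      # maps address -> word value
        self.accesses = 0         # number of memory accesses performed

    def read(self, address):
        self.accesses += 1
        return self.memory[address]

    def write(self, address, value):
        self.accesses += 1
        self.memory[address] = value


class LoadStoreUnit:
    """Hypothetical LSU model: moves words between the BIU and the GPRs."""

    def __init__(self, biu, gprs):
        self.biu = biu
        self.gprs = gprs

    def load_word(self, reg, address):
        # The BIU fetches the source contents; the LSU places them in a GPR.
        self.gprs[reg] = self.biu.read(address)

    def store_word(self, reg, address):
        # The LSU hands the GPR contents to the BIU, which writes memory.
        self.biu.write(address, self.gprs[reg])


def copy_word(lsu, src, dst, scratch_reg=1):
    """Memory-to-memory move expressed as load word followed by store word."""
    lsu.load_word(scratch_reg, src)
    lsu.store_word(scratch_reg, dst)


# Assumed example state: two memory words and a 32-entry register file.
memory = {0x1000: 0xDEADBEEF, 0x2000: 0}
biu = BusInterfaceUnit(memory)
gprs = [0] * 32
lsu = LoadStoreUnit(biu, gprs)
copy_word(lsu, src=0x1000, dst=0x2000)
```

In this model a single word copy issues two instructions, performs two memory accesses, and occupies a GPR with data that is only passing through.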
Executing the load word and store word instructions consumes processing resources, which limits processor throughput. This inefficiency becomes particularly acute for loads and stores having relatively high latency.