All data processors execute instructions that move the results of their internal calculations to the rest of the data processing system of which they are a part, and instructions that move data in the opposite direction. The results of these calculations may be moved to an external memory system for storage and later use, to a CRT for display to a human user, or to a network for transmission to another data processing system. These instructions are often referred to as "store" or "write" instructions. Conversely, data may be moved from an external memory system or from an input/output ("I/O") device such as a keyboard to the data processing system where it is processed. These instructions are often referred to as "load" or "read" instructions.
There is a wide variety of load/store instructions even within a single computer architecture, and this variety eases the burden on the software programmer. The instructions differ from one another in the amount of data transferred, the way the address of the data source or destination is calculated, the format of the data, and so on. One type of load/store instruction is a "load multiple" or "store multiple" instruction. A load multiple instruction copies the contents of a series of sequential memory locations into a series of sequential internal registers over several processor clock cycles; a store multiple instruction performs the reverse transfer. The number of memory locations, the starting memory address, and the first internal register are determined by the instruction format and its operands.
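The following Python sketch models only the architectural effect of these two instructions, not their multi-cycle timing. The register-file size (32), the 4-byte word, the dictionary used as memory, and the function names `load_multiple` and `store_multiple` are hypothetical choices made for illustration, not details of any particular processor.

```python
# Sketch only: the register count, word size, and function names below are
# hypothetical and not taken from any particular architecture.
NUM_REGS = 32    # assumed size of the internal register file
WORD_BYTES = 4   # assumed width of one memory location, in bytes


def load_multiple(regs, memory, first_reg, base_addr, count):
    """Copy `count` sequential memory words into sequential registers,
    starting at register `first_reg` and byte address `base_addr`."""
    if first_reg < 0 or first_reg + count > NUM_REGS:
        raise ValueError("register range exceeds the register file")
    for i in range(count):
        regs[first_reg + i] = memory[base_addr + i * WORD_BYTES]


def store_multiple(regs, memory, first_reg, base_addr, count):
    """Copy `count` sequential registers into sequential memory words."""
    if first_reg < 0 or first_reg + count > NUM_REGS:
        raise ValueError("register range exceeds the register file")
    for i in range(count):
        memory[base_addr + i * WORD_BYTES] = regs[first_reg + i]


# Usage: load four words at 0x100..0x10C into registers 28..31,
# then store them back out to a second buffer starting at 0x200.
regs = [0] * NUM_REGS
memory = {0x100 + i * WORD_BYTES: 10 + i for i in range(4)}
load_multiple(regs, memory, first_reg=28, base_addr=0x100, count=4)
print(regs[28:32])  # [10, 11, 12, 13]
store_multiple(regs, memory, first_reg=28, base_addr=0x200, count=4)
print(memory[0x20C])  # 13
```

The three operands of each call correspond to the quantities the instruction format supplies: the first register, the starting memory address, and the number of locations.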
A load/store multiple instruction is difficult to implement in a pipelined data processor. Certain data processors, particularly reduced instruction set computers ("RISC") and some complex instruction set computers ("CISC"), overlap the processing of successive instructions to increase performance. A load/store multiple instruction often defeats this strategy. For instance, a single load multiple instruction may update every internal register, so every subsequent instruction that reads one of those registers must be delayed until the data it needs has been loaded. Conversely, a single store multiple instruction may output every internal register, so it must itself be delayed until the data in each of those registers is available. Meanwhile, the pending store multiple instruction must be held in some type of internal queue, where it occupies resources that other instructions could otherwise use.
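A minimal Python sketch of the dependency check implied above, assuming a 32-register file, in-order issue, and no forwarding of individual registers while the multiple transfer is in progress. The `Instr` record, the `raw_hazard` helper, and the instruction mnemonics are hypothetical. The sketch detects only read-after-write dependences; it does not model timing or the internal store queue.

```python
from typing import FrozenSet, NamedTuple

NUM_REGS = 32  # assumed register-file size


class Instr(NamedTuple):
    name: str
    reads: FrozenSet[int]   # source registers
    writes: FrozenSet[int]  # destination registers


def raw_hazard(older: Instr, younger: Instr) -> bool:
    """True if `younger` reads a register that `older` writes, so `younger`
    cannot proceed until `older` has produced that data."""
    return bool(older.writes & younger.reads)


# A load multiple starting at r3 writes every register from r3 through r31.
lm = Instr("load multiple r3-r31", frozenset(), frozenset(range(3, NUM_REGS)))
add = Instr("add r1, r2, r4", frozenset({2, 4}), frozenset({1}))
sub = Instr("sub r2, r1, r0", frozenset({0, 1}), frozenset({2}))
stalled = [i.name for i in (add, sub) if raw_hazard(lm, i)]
print(stalled)  # ['add r1, r2, r4']

# Conversely, a store multiple reads r3-r31, so it must wait for any
# earlier instruction that writes one of those registers.
inc = Instr("add r7, r7, 1", frozenset({7}), frozenset({7}))
stm = Instr("store multiple r3-r31", frozenset(range(3, NUM_REGS)), frozenset())
print(raw_hazard(inc, stm))  # True
```

Because a load multiple with a low starting register writes nearly the whole register file, almost any later instruction that reads a register collides with it, which is why such an instruction undermines overlapped execution.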