Data processing typically involves retrieving data from a memory, processing the data, and storing the results of the processing activity back into memory. The hardware architecture supporting this data processing activity generally controls the flow of information and control among individual hardware units of an information processing system. One such hardware unit is a processor or processing engine, which contains arithmetic and logic processing circuits, general-purpose and special-purpose registers, processor control or sequencing logic, and data paths interconnecting these elements. In some implementations, the processor may be configured as a stand-alone central processing unit (CPU) implemented as a custom-designed integrated circuit or in an application-specific integrated circuit (ASIC). The processor has internal registers for use with operations that are defined by a set of instructions. The instructions are typically stored in an instruction memory and specify a set of hardware functions that are available on the processor.
When implementing these functions, the processor generally retrieves “transient” data from a memory that is external to the processor, sequentially or randomly loads portions of the data into its internal registers by executing “load” instructions, processes the data in accordance with the instructions, and then stores the processed data back into the external memory using “store” instructions. In addition to moving the transient data into, and the execution results out of, the internal registers, load and store instructions are also frequently used during the actual processing of the transient data in order to access additional information required to complete the processing activity (e.g., accessing status and command registers). Frequent load/store accesses to an external memory are generally inefficient because the processor can execute instructions substantially faster than its external interface can transfer data. Consequently, the processor often idles while waiting for the accessed data to be loaded into its internal register file.
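The load, process, and store cycle described above, and the idle time it produces, can be illustrated with a minimal Python sketch. This is only an illustrative model under stated assumptions, not a description of any particular processor: the names `process_buffer`, `CycleCounts`, and `utilization`, the 10-cycle external memory latency, the 1-cycle arithmetic cost, and the increment operation itself are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class CycleCounts:
    busy: int = 0      # cycles spent executing on internal registers
    stalled: int = 0   # cycles spent waiting on the external memory


def process_buffer(external_memory, mem_latency=10, alu_cycles=1):
    """Model a load/process/store loop over a buffer of 8-bit words.

    mem_latency and alu_cycles are hypothetical cycle costs; real values
    depend entirely on the processor and its memory interface.
    """
    counts = CycleCounts()
    for addr in range(len(external_memory)):
        # "load": copy one word from external memory into a register
        reg = external_memory[addr]
        counts.stalled += mem_latency

        # process: an arbitrary stand-in operation (increment, 8-bit wrap)
        reg = (reg + 1) & 0xFF
        counts.busy += alu_cycles

        # "store": write the result back to external memory
        external_memory[addr] = reg
        counts.stalled += mem_latency
    return counts


def utilization(counts):
    """Fraction of all cycles in which the processor did useful work."""
    total = counts.busy + counts.stalled
    return counts.busy / total if total else 0.0
```

Under these assumed costs, processing three words takes 3 busy cycles and 60 stalled cycles, so the modeled processor does useful work in roughly 5% of its cycles. The exact figure is an artifact of the chosen numbers; the point is only that the external accesses, not the arithmetic, dominate the total.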
This inefficiency can be particularly limiting in devices that operate within communication systems, since the net effect is to constrain the overall data handling capacity of a device and, unless some data is to be dropped rather than transmitted, the maximum data rate of the network itself.