A computing system includes multiple computing resources, at least some of which communicate with each other based on a network on a chip architecture. The computing resources include processing elements (or engines), memories, and the like. Data processed by a processing element can be stored by the processing element, in part remotely, in a memory of the computing system, and, in part locally, in memory registers of the processing element. Often, the processing element combines the items of processed data stored in the memory with the items of processed data stored in the memory registers and then sends the combined processed data items to another processing element for further processing (e.g., as part of a software pipeline).
This is conventionally accomplished by the processing element by performing the following sequence of operations: a first portion of the processed data to be sent to the other processing element is first retrieved from the memory and then placed into memory registers contiguous with the memory registers already holding a second portion of the processed data to be sent to the other processing element. Upon placement of the retrieved first portion of the processed data in the contiguous registers, the processing element transmits the combined processed data to the other processing element for further processing.