Many systems, especially communication systems, are constructed from a plurality of processing elements that must communicate with each other for proper operation of the system. Different processing elements may include any type of hardware circuit element, software processing device, hardware accelerator module, central processing unit (CPU), etc. For example, data output from several processing units may need to be multiplexed and communicated to different processing elements for further processing. Numerous techniques and methods for transferring data between processing elements are known in the art. Techniques for enabling communication between two processing elements include bidirectional buses, Direct Memory Access (DMA) circuits, shared memories such as dual-port RAMs, first-in first-out (FIFO) based memories, etc. When designing an embedded system, the choice of technique typically takes several factors into consideration, including the required functionality, the cost, the resulting level of system complexity, the desired performance, etc.
Often, however, the processing elements in an embedded system comprise complex circuitry and are adapted to generate multiple output data streams from several data generating units within the system. One of the processing elements, such as the CPU, is frequently responsible for processing these output data streams, either alone or in combination with one or more other processing elements. Typically, the data output by each processing unit must be processed separately.
To transfer data from the multiple units, a multi-streaming interface is required between the data generating processing units and the CPU. Such an interface typically requires multiple instances of the techniques described above, e.g., multiple DMA controllers or multi-channel DMA controllers, multiple shared memories, multiple FIFOs, etc., which complicates system control and increases system cost.
As an example, a prior art shared memory scheme is described below. A block diagram illustrating a prior art example embedded system for transferring data from multiple data generating devices (e.g., a hardware module with multiple data generating units) to a software module such as a CPU using multiple buffers in a shared memory is shown in FIG. 1. The embedded system, generally referenced 10, comprises multiple data generating devices 12, shared memory 14 and a CPU 20. In general, the embedded system is constructed from two main portions, namely a plurality of processing elements coupled to a central processor via a memory buffer. In this prior art example, the multiple data generating devices comprise a hardware processing element incorporating a plurality of hardware based data generating processing units 15, labeled unit 1 through unit N. Each hardware unit generates its own stream of data. The shared memory comprises N buffers 16, labeled buffer 1 through buffer N. The N buffers in the shared memory may be implemented in any suitable manner, e.g., as linear or cyclic buffers. The data output from each hardware unit is written into its associated buffer in the shared memory.
Data is transferred from the plurality of processing units to the CPU via the shared memory. The shared memory communicates with the CPU via common bus 18, which couples the buffers in the shared memory to the CPU. A bus arbiter (not shown) coordinates the transfer of information between the multiple units and the CPU. Each hardware unit writes data to its own cyclic buffer in the shared memory, and the CPU reads each hardware unit's output directly from the associated buffer.
A disadvantage of this prior art data transfer scheme is that it is costly, since it requires multiple buffers in the shared memory along with bus arbitration logic circuitry. In addition, in the case of cyclic buffers, the system control required for proper operation of the embedded system is complicated, since each buffer requires separate control circuitry for maintaining its read and write pointers, etc.
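To make this per-buffer control burden concrete, the prior art arrangement of FIG. 1 can be sketched in C as an array of N independent cyclic buffers, each carrying its own read pointer, write pointer and fill count. This is a minimal, single-threaded illustration under stated assumptions rather than the circuitry of any particular system: the names NUM_UNITS, BUF_WORDS, cyclic_buf_t, unit_write and cpu_read are hypothetical, the sizes are arbitrary, and the bus arbiter is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen for illustration only. */
#define NUM_UNITS 4 /* N data generating hardware units */
#define BUF_WORDS 8 /* capacity of each per-unit cyclic buffer */

/* One cyclic buffer per hardware unit, each with its own control state. */
typedef struct {
    uint32_t data[BUF_WORDS];
    size_t rd;    /* next slot the CPU will read */
    size_t wr;    /* next slot the unit will write */
    size_t count; /* number of words currently held */
} cyclic_buf_t;

/* The shared memory, modeled as N independent buffers (zero-initialized). */
static cyclic_buf_t shared_mem[NUM_UNITS];

/* Hardware unit `unit` writes one word into its own buffer.
   Returns false if that buffer is full; the word is then dropped. */
bool unit_write(size_t unit, uint32_t word)
{
    assert(unit < NUM_UNITS);
    cyclic_buf_t *b = &shared_mem[unit];
    if (b->count == BUF_WORDS)
        return false;
    b->data[b->wr] = word;
    b->wr = (b->wr + 1) % BUF_WORDS; /* wrap the write pointer */
    b->count++;
    return true;
}

/* The CPU reads one word directly from the buffer of `unit`.
   Returns false if that buffer is empty. */
bool cpu_read(size_t unit, uint32_t *word)
{
    assert(unit < NUM_UNITS);
    cyclic_buf_t *b = &shared_mem[unit];
    if (b->count == 0)
        return false;
    *word = b->data[b->rd];
    b->rd = (b->rd + 1) % BUF_WORDS; /* wrap the read pointer */
    b->count--;
    return true;
}
```

Every additional hardware unit adds another rd/wr/count triple that must be maintained and checked, and a real system with concurrent writers and a reader would additionally need synchronization or arbitration around each buffer.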
In the case where the number of active hardware units and the data output rate of each unit are configurable each cycle, additional control circuitry is required, increasing the system control complexity even further.
Thus, there is a need for a data transfer scheme for transferring data between processing elements that is more efficient and less costly than the prior art techniques. Both the data generating (i.e., source) and data receiving (i.e., sink) processing elements may be implemented in hardware, software or a combination of hardware and software. Further, the data transfer scheme should be capable of transferring multiplexed data from multiple data generating processing units to a receiving processing element without requiring complex system control logic circuitry.