The present invention relates generally to integrated circuit memory access schemes, and more particularly to memory access in integrated circuits having multiple hardware accelerators.
FIG. 1 shows a block diagram of a portion of an exemplary conventional SOC (system on chip) integrated circuit (IC) 100 having multiple hardware accelerators. In particular, the SOC 100 has hardware acceleration (HWA) processors 110, 120, and 140.
As shown in FIG. 1, the HWA processor 110 includes a signal processing co-processor 112, tightly coupled memory (TCM) 114, and memory controller 116. The co-processor 112 is a hardware accelerator (e.g., application-specific integrated circuitry (ASIC)) designed to perform one or more particular data-processing algorithms quickly and efficiently. As controlled by the memory controller 116, the co-processor 112 accesses and processes data stored in the TCM memory 114 and stores the resulting processed data back into the TCM memory 114. The HWA processor 120 is analogous to the HWA processor 110.
The HWA processor 140 has three co-processors 142, 144, and 146 configured to perform different data-processing algorithms, such as filtering, fast Fourier transform (FFT) processing, or sequencing, where each co-processor functions as a different hardware accelerator. The HWA processor 140 also has a local SRAM (static random access memory) 148 and controller (bus interface) 150. Similar to the HWA processors 110 and 120, as controlled by the controller 150, each hardware accelerator 142, 144, 146 of the HWA processor 140 accesses and processes data stored in the local memory 148 and stores the resulting processed data back into the local memory 148.
In addition to local and tightly coupled memories, the SOC 100 also has a system memory 160 with twelve banks 162 of independently addressable SRAM memory. The HWA processors 110, 120, and 140 are able to access (i.e., read data from and/or write data to) the memory banks 162 of the system memory 160 simultaneously in a non-blocking manner via a data crossbar switch (i.e., arbiter) 170 and system memory controller 180. In addition to being able to access the system memory 160 via the data crossbar switch 170, the HWA processor 140 is also able to access the system memory 160 via a dedicated bus 152 and the system memory controller 180, bypassing the data crossbar switch 170. The bus 152 enables the HWA processor 140 to access the system memory 160 faster than by using the data crossbar switch 170.
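The simultaneous, non-blocking access to independently addressable banks described above can be illustrated with a minimal sketch. The function name `arbitrate`, the requester labels, and the fixed-priority policy for resolving two requests to the same bank are assumptions made for illustration only; the text does not specify how the data crossbar switch 170 resolves conflicts.

```python
def arbitrate(requests):
    """Model one cycle of a bank-level arbiter (hypothetical policy).

    requests: list of (requester, bank) pairs, listed in priority order.
    Returns (granted, stalled): requests to distinct banks are all granted
    in the same cycle; a later request to an already claimed bank stalls.
    """
    busy_banks = set()
    granted, stalled = [], []
    for requester, bank in requests:
        if bank in busy_banks:
            # Assumed behavior: a bank serves one requester per cycle.
            stalled.append((requester, bank))
        else:
            busy_banks.add(bank)
            granted.append((requester, bank))
    return granted, stalled


# Three HWA processors targeting three different banks (of twelve, 0..11).
granted, stalled = arbitrate([("HWA 110", 0), ("HWA 120", 5), ("HWA 140", 11)])
print(granted, stalled)
```

In this sketch, requests that target different banks proceed together, which is the sense in which the access is non-blocking; only requests that collide on a single bank are deferred.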
Assume, for example, a data-processing routine involving a data-processing algorithm of the hardware accelerator 142 of the HWA processor 140, followed by a data-processing algorithm of the hardware accelerator 144 of the HWA processor 140, followed by a data-processing algorithm of the hardware accelerator 112 of the HWA processor 110. The sequence of events for this exemplary data-processing routine may be as follows:
1) Controller 150 copies a first set of (original) data from memory bank 162 of system memory 160 into local memory 148 via bus 152 and memory controller 180.
2) Hardware accelerator 142 accesses and processes the first set of data stored in local memory 148 and stores the resulting second set of (processed) data back into local memory 148.
3) Hardware accelerator 144 accesses and processes the second set of data stored in local memory 148 and stores the resulting third set of (further processed) data back into local memory 148. (Note that, since hardware accelerators 142 and 144 share the same local memory 148, the second set of data does not have to be stored back into system memory 160. If hardware accelerators 142 and 144 had different local memories, then the controller for hardware accelerator 142 would first copy the second set of data from its local memory into system memory 160, and the controller for hardware accelerator 144 would then copy the second set of data from system memory 160 into its local memory.)
4) Controller 150 copies the third set of data from local memory 148 into memory bank 162 of system memory 160 via bus 152 and memory controller 180.
5) Controller 116 of HWA processor 110 copies the third set of data from memory bank 162 of system memory 160 into TCM memory 114 via data crossbar switch 170 and memory controller 180.
6) Hardware accelerator 112 accesses and processes the third set of data stored in TCM memory 114 and stores the resulting fourth set of (still further processed) data back into TCM memory 114.
7) Controller 116 copies the fourth set of data from TCM memory 114 into memory bank 162 of system memory 160 via data crossbar switch 170 and memory controller 180, such that the fourth set of data is available for further handling by SOC 100.
Because steps (5) and (7) involve the relatively slow data crossbar switch 170, this exemplary data-processing routine is slower than it would be if the SOC 100 included a fast bus between the HWA processor 110 and the system memory controller 180 similar to the bus 152. Furthermore, the time that it takes for some data-processing routines to be performed using the data crossbar switch 170 may be too long for certain applications. However, designing the SOC 100 to have a fast bus between the HWA processor 110 and the system memory controller 180 would take up additional layout area.
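The effect of the crossbar transfers in steps (5) and (7) can be illustrated with a rough sketch that tabulates the path used by each step of the exemplary routine. The path labels, the function name `transfer_cost`, and in particular the relative cost values are hypothetical; the text states only that the data crossbar switch 170 is slower than the bus 152, not by how much. Accelerator processing time is excluded so that only data movement is compared.

```python
# Path taken by each step of the exemplary routine, as described in the text.
# "bus 152" and "crossbar 170" are system-memory transfers; "local" steps
# operate only on local memory 148 or TCM memory 114.
ROUTINE = [
    (1, "controller 150", "bus 152"),
    (2, "accelerator 142", "local"),
    (3, "accelerator 144", "local"),
    (4, "controller 150", "bus 152"),
    (5, "controller 116", "crossbar 170"),
    (6, "accelerator 112", "local"),
    (7, "controller 116", "crossbar 170"),
]

# Hypothetical relative transfer costs in arbitrary units (not from the text).
COST = {"bus 152": 1, "crossbar 170": 4, "local": 0}


def transfer_cost(routine, cost, fast_bus_for_110=False):
    """Sum the data-movement cost of a routine.

    If fast_bus_for_110 is True, the transfers by controller 116 are costed
    as if HWA processor 110 had its own fast bus comparable to bus 152.
    """
    total = 0
    for _step, actor, path in routine:
        if fast_bus_for_110 and actor == "controller 116" and path == "crossbar 170":
            path = "bus 152"  # assumed: same cost as the existing dedicated bus
        total += cost[path]
    return total


crossbar_steps = [step for step, _actor, path in ROUTINE if path == "crossbar 170"]
print(crossbar_steps)
print(transfer_cost(ROUTINE, COST), transfer_cost(ROUTINE, COST, fast_bus_for_110=True))
```

Under these assumed costs, the two crossbar transfers dominate the data-movement total, which is the trade-off the paragraph above weighs against the additional layout area of a second fast bus.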