The continuous increase in transistor density on a single die has enabled the integration of more and more components, such as multiple processors and memories, in a system-on-chip (SoC). Although this integration has significantly improved the intrinsic computational power of SoCs, it has also significantly increased design complexity. Continuously increasing design complexity is exacerbating the well-known design productivity gap. To meet time-to-design and time-to-market deadlines, industry is gradually shifting towards the use of automation tools at a higher level of design abstraction.
Heterogeneous multiprocessor system-on-chip (MPSoC) devices integrate multiple different processors to handle the high performance requirements of applications. An MPSoC primarily consists of communication channels and multiple computational elements, examples of which include general-purpose processors, application-specific processors, and custom hardware accelerators. Hereafter in this document, such computational elements are collectively referred to as processors. A communication channel connects two processors, where a first processor, in this instance operating as a “sender-processor”, sends data and a second processor, in this instance operating as a “receiver-processor”, receives the data. Communication channels in an MPSoC can be implemented using first-in-first-out (FIFO) memory, shared memory, shared cache, etc. Processors can also have private on-chip local memory (LM) used as a scratchpad memory for temporary storage of data. The mapping of communication channels can influence the size of the LM associated with a receiver-processor. The memory configuration used for data communication, including FIFOs, shared memory, shared cache and LMs, contributes significantly to the overall area and performance of an MPSoC. A complex MPSoC can have a large number of communication channels between processors. The design space for memory configuration for inter-processor communication is defined as all the possible combinations of the implementation of communication channels along with the variations of LMs connected to the processors. One combination of the implementation of communication channels, along with a selected size of LM for each of the processors, represents one design point.
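The definition of the design space given above can be illustrated with a minimal sketch that enumerates design points. The three channel implementations are the ones named in the text; the processor names, the channel list, the candidate LM sizes and the function name `enumerate_design_points` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Channel implementations named in the text.
CHANNEL_IMPLS = ("fifo", "shared_memory", "shared_cache")


def enumerate_design_points(channels, processors, lm_sizes):
    """Yield every design point as a pair of dictionaries.

    The first dictionary maps each channel to one implementation; the
    second maps each processor to one local memory (LM) size.
    """
    channel_choices = product(CHANNEL_IMPLS, repeat=len(channels))
    lm_choices = product(lm_sizes, repeat=len(processors))
    for impls, sizes in product(channel_choices, lm_choices):
        yield dict(zip(channels, impls)), dict(zip(processors, sizes))


# Hypothetical three-processor pipeline with two channels
# (sender-processor, receiver-processor).
processors = ["P0", "P1", "P2"]
channels = [("P0", "P1"), ("P1", "P2")]
lm_sizes = (0, 4096)  # assumed candidate LM sizes in bytes

points = list(enumerate_design_points(channels, processors, lm_sizes))
print(len(points))  # 3**2 channel mappings * 2**3 LM selections = 72
```

Even in this small hypothetical case the number of design points grows exponentially with the number of channels and processors, which suggests why exhaustive manual exploration becomes impractical for a complex MPSoC.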
Mapping a complex streaming application onto an MPSoC to achieve performance requirements can be a very time-intensive task. There has been an increased focus on automating the implementation of streaming multimedia applications on MPSoC platforms.
In one known method, the area of a pipelined MPSoC is optimized under a latency or throughput constraint using an integer linear programming (ILP) approach for a multimedia application. The optimization method assumes that data communication between processors is achieved by using queues implementing a FIFO protocol. The queues are sized to be sufficiently large to hold an iteration of a data block, the size of which can vary depending on the application. For example, a data block may include a group of pixels of an input image stream needed by any processor to independently execute the task mapped on the processor. For applications having a large data block size, the queues can significantly increase the area, and consequently the cost, of the MPSoC.
In another method, a design space exploration approach using linear constraints and a pseudo-Boolean solver is proposed for optimization of the topology and communication routing of a system. Communication channels are commonly restricted to be mapped to memory resources. This approach does not consider multiple levels of memory hierarchy involving shared caches. Shared caches are on-chip memories which contain a subset of the contents of the external off-chip memory and provide better performance in comparison to the use of external off-chip memories alone. Not including shared caches may result in a significant increase in the on-chip memory area for a range of applications.
In another method, memory-aware mapping of applications onto MPSoCs is proposed using evolutionary algorithms. Memory resources include private memories and shared memories. A limitation of this approach is that the method maps the application onto a fixed memory platform, which is provided as an input to the method. In addition, the memory platform does not include shared caches. Including shared caches in the design space provides the flexibility to map communication data to off-chip memories and reduce on-chip memory area.
In another method, a memory mapping is automatically determined and generated to provide an optimization of execution of the program on the target device. The memory mapping includes a description of the placement of the uniquely named memory sections to portions of the one or more memory elements of the target device. One limitation of this approach is that the approach optimizes the memory mapping for a fixed memory platform, which is provided as an input to the method.
The memory configuration for inter-processor communication (“MC-IPrC”) can have a significant impact on the area and performance of an MPSoC. There is a need for design automation methods to consider MC-IPrC including FIFOs, shared caches and local memories when mapping streaming applications onto MPSoCs.