In modern multiprocessor systems there is a need for communication between the individual processors, i.e. the processors must be able to exchange both data and instructions (commands) with one another. As shown in FIG. 1, traditional multiprocessor systems having processors P1, P2 use a shared memory 1 for interchanging data or instructions, the shared memory being connected via a bus architecture 2 both to the first processor P1 and to the second processor P2. The interaction of the two processors P1, P2 can be controlled or synchronized by means of interrupts (commands for interrupting the current CPU cycle). A further possibility for controlling the memory access is to provide semaphores, i.e. software-controlled flags that define which of the processors is permitted to write to the shared memory 1. Furthermore, it is known to connect a plurality of processors to one or more shared memories 1 via a switchable connection (crossbar switch).
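By way of illustration only, the semaphore-controlled write access described above may be sketched in C as follows. The sketch is not taken from any particular known system: the type shared_mem_t, the functions sem_try_acquire, sem_release and shared_write, and the size SHARED_WORDS are hypothetical names chosen for this example, and a C11 atomic variable is assumed to stand in for the software-controlled flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SHARED_WORDS 16 /* hypothetical size of the shared memory 1 */

enum { OWNER_NONE = 0, OWNER_P1 = 1, OWNER_P2 = 2 };

typedef struct {
    atomic_int owner;              /* semaphore flag: processor allowed to write */
    unsigned   data[SHARED_WORDS]; /* stands in for the shared memory 1 */
} shared_mem_t;

/* Clear the flag and the memory contents. */
void shared_mem_init(shared_mem_t *m)
{
    assert(m != NULL);
    atomic_init(&m->owner, OWNER_NONE);
    memset(m->data, 0, sizeof m->data);
}

/* Try to take the semaphore; succeeds only if no processor holds it. */
bool sem_try_acquire(shared_mem_t *m, int cpu)
{
    int expected = OWNER_NONE;
    return atomic_compare_exchange_strong(&m->owner, &expected, cpu);
}

/* Release the semaphore; has no effect unless 'cpu' is the current holder. */
void sem_release(shared_mem_t *m, int cpu)
{
    int expected = cpu;
    atomic_compare_exchange_strong(&m->owner, &expected, OWNER_NONE);
}

/* Write one word; refused unless 'cpu' currently holds the semaphore. */
bool shared_write(shared_mem_t *m, int cpu, size_t idx, unsigned value)
{
    if (idx >= SHARED_WORDS || atomic_load(&m->owner) != cpu)
        return false;
    m->data[idx] = value;
    return true;
}
```

In this sketch a processor that fails to acquire the flag has to retry or wait for an interrupt, and every access additionally passes over the bus architecture 2, which is one reason why such schemes tend to be slow when the processors interact frequently.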
The known solutions are not very effective if fast interaction between the processors is required. Although the known measures described above allow data or instructions to be interchanged (the latter amounting to the programming of one processor by the other processor), they are too slow for computation- and data-intensive tasks with real-time requirements, such as occur, for example, in modern communications systems.
In order to accelerate the data processing in the processors P1, P2, it is already known for each of them to be coupled to a fast memory integrated on the chip, a so-called TCM (tightly coupled memory). One example of a processor that can be equipped with a TCM is described in the data sheet “FlexCore® ARM926EJ-S™ 32-bit RISC Processor Cores”, http://lsilogic.com/files/docs/marketing-docs/microprocessors/arm926ej-s_flexcore_db.pdf. The TCM is a DRAM, SRAM or flash memory that can be connected essentially directly to the processor core and can be accessed exclusively by the processor P1, P2 equipped with the respective TCM. Processors P1, P2 that can be equipped with a TCM have an input/output intended specifically for the TCM (a so-called TCM interface) and also a suitable address generating unit for generating the addresses for the TCM. In comparison with processors without a TCM, processors P1, P2 with a TCM have an improved performance for dealing with computation- and data-intensive tasks. In the multiprocessor system shown in FIG. 1, both processors P1, P2 are equipped with a TCM 3. However, for applications requiring a high degree of processor interaction, the real-time behaviour that can be achieved with this solution still remains unsatisfactory.
For some years, complex heterogeneous systems have increasingly been realised on a single chip. These so-called SoC (system-on-chip) realisations contain one or a plurality of embedded programmable components—processor cores for general tasks, DSP cores or cores of application-specific processors—and also further components such as, for example, an analogue front end, on-chip memory, input/output devices and other application-specific integrated circuits.
The starting point for the development of an SoC is the definition of the processes or tasks that have to be dealt with by the SoC. Afterwards, it is necessary to find a suitable software/hardware partitioning. In this case, it is important to enable a high-performance task scheduling and a high-performance intertask communication in order to be able to comply with real-time requirements. At the same time, it is necessary to fulfil the customary requirements (small space requirement, low power consumption).
The document U.S. Pat. No. 6,643,763 B1 describes a multiprocessor system in which a tight connection between two processors is realised via a register pipeline with FIFO (First-In First-Out) buffers.
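Purely for illustration, the general principle of a FIFO coupling between two processors may be sketched in C as follows. The sketch does not reproduce the register pipeline of the cited document. It assumes a software ring buffer with exactly one writing and one reading processor, and the names fifo_t, fifo_push, fifo_pop and FIFO_DEPTH are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_DEPTH 4u /* hypothetical depth; must be a power of two */

typedef struct {
    unsigned    slots[FIFO_DEPTH];
    atomic_uint head; /* count of words read, advanced only by the reader */
    atomic_uint tail; /* count of words written, advanced only by the writer */
} fifo_t;

void fifo_init(fifo_t *f)
{
    assert(f != NULL);
    atomic_init(&f->head, 0u);
    atomic_init(&f->tail, 0u);
}

/* Writer side (e.g. processor P1): returns false if the FIFO is full. */
bool fifo_push(fifo_t *f, unsigned word)
{
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&f->head, memory_order_acquire);
    if (tail - head == FIFO_DEPTH)
        return false;
    f->slots[tail % FIFO_DEPTH] = word;
    /* Publish the word only after it has been stored in its slot. */
    atomic_store_explicit(&f->tail, tail + 1u, memory_order_release);
    return true;
}

/* Reader side (e.g. processor P2): returns false if the FIFO is empty. */
bool fifo_pop(fifo_t *f, unsigned *word)
{
    unsigned head = atomic_load_explicit(&f->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *word = f->slots[head % FIFO_DEPTH];
    /* Free the slot only after its content has been copied out. */
    atomic_store_explicit(&f->head, head + 1u, memory_order_release);
    return true;
}
```

Because each index is advanced by only one side, no semaphore is needed in this sketch; words are delivered strictly in the order in which they were written.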