When a computer system must handle special-purpose processing for application software, it usually employs an additional processor, such as a digital signal processor (DSP) or a floating-point unit (FPU). Embedded multimedia devices, such as mobile phones, use a micro-processing unit (MPU) because of power consumption and heat dissipation considerations, and thus the mathematical computing capability is compromised. To support multimedia applications, a DSP is usually included to handle multimedia compression and decompression. Examples of such development include dual-core system-on-a-chip (SoC) designs with an MPU and a DSP, such as the DM series of TI and the parallel architecture core (PAC) SoC by the SoC Technology Center (STC) of ITRI.
When an MPU and a DSP are on the same platform and work together, the platform can be considered a multiprocessor platform. A multiprocessor platform usually faces the problem of synchronizing access to shared resources; therefore, a mechanism must be provided so that only one processor uses a shared resource at any time. The conventional technique is to use a semaphore mechanism in the shared memory to lock the shared resource. The conventional semaphore suffers from poor efficiency. In addition, to prevent errors caused by multiple processors accessing the semaphore at the same time, the processors lock the bus, which further reduces utilization and efficiency. Another problem is the lack of an efficient mechanism to notify the waiting processor when a shared resource becomes available.
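The semaphore behavior described above can be illustrated with a minimal sketch. It is a toy model, not the implementation of any particular platform: the class name `SharedMemorySemaphore` and its methods are hypothetical, and a `threading.Lock` stands in for the bus lock that makes each read-modify-write of the semaphore word atomic. The `failed_polls` counter shows the cost of having no notification mechanism: the waiting processor can only keep polling.

```python
import threading


class SharedMemorySemaphore:
    """Toy model of a semaphore word kept in shared memory (hypothetical)."""

    def __init__(self):
        self._bus_lock = threading.Lock()  # assumption: stands in for locking the bus
        self.owner = None                  # processor currently holding the resource
        self.failed_polls = 0              # polls wasted while the resource was busy

    def try_acquire(self, processor):
        """Atomically test and set the semaphore; return True on success."""
        with self._bus_lock:               # bus is held for the whole read-modify-write
            if self.owner is None:
                self.owner = processor
                return True
            self.failed_polls += 1
            return False

    def release(self, processor):
        """Free the resource; only the current owner may release it."""
        with self._bus_lock:
            if self.owner != processor:
                raise RuntimeError("only the owner may release the semaphore")
            self.owner = None


# Usage: the MPU holds the resource while the DSP polls without being notified.
sem = SharedMemorySemaphore()
sem.try_acquire("MPU")
for _ in range(3):
    sem.try_acquire("DSP")                 # each failed poll still locks the bus
sem.release("MPU")
dsp_got_it = sem.try_acquire("DSP")        # succeeds only on the next poll
```

Every poll, successful or not, takes the bus lock, which is the source of the reduced utilization noted above.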
Another conventional technique is a mailbox mechanism implemented in hardware. The mailbox mechanism issues an interrupt to a specific processor after the command and data registers are written. Then, the interrupt service routine (ISR) wakes up a specific application software to use the shared resource. The DSP Gateway of Nokia uses such a mechanism, and an inter-process communication (IPC) framework is developed on the OMAP5912 platform so that the processes on the MPU and the tasks on the DSP can communicate with each other through a simple application programming interface (API).
FIG. 1 shows a flowchart of data reading and writing on an IPC framework using a mailbox mechanism in a conventional multiprocessor system. As shown in FIG. 1, a dual-core processor platform includes an MPU and a DSP. When the application software on the MPU issues a command requesting the DSP to process data (shown as 101), the operating system of the MPU and the IPC framework assign the shared resource to the DSP (shown as 102). The MPU_to_DSP mailbox receives the data transmission request 103 from the MPU, and issues an interrupt command 104 to the DSP. The DSP interrupt service routine receives the interrupt command (shown as 105), and the DSP executes the data processing request of the MPU (shown as 106). Time T1 is the time when the DSP finishes the data processing.
When the DSP finishes data processing, the shared resource is assigned to the MPU (shown as 107). Assigning the shared resource from the DSP to the MPU wakes up the application software on the MPU, as described below.
The DSP_to_MPU mailbox receives the data transmission request 108 from the DSP, and issues an interrupt command 109 to the MPU. The MPU interrupt service routine receives the interrupt command 109 (shown as 110). Then, the IPC framework wakes up the application software (shown as 111), and the application software on the MPU starts to use the shared resource and process data (shown as 112). Time T2 is the time when the IPC framework on the MPU side wakes up the application software to start using the shared resource.
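The sequence of FIG. 1 can be traced with a small simulation. This is a sketch under stated assumptions, not the IPC framework itself: the `Mailbox` class, the `run_round_trip` function, and the doubling workload are hypothetical, and an interrupt is modeled as a direct call to the target processor's ISR. The step numbers in the trace are the reference numerals 101 to 112 used above.

```python
class Mailbox:
    """Hypothetical mailbox: a register write raises an interrupt on the target."""

    def __init__(self, name, irq_step, isr, trace):
        self.name = name
        self.irq_step = irq_step   # reference numeral of the interrupt command
        self.isr = isr             # interrupt service routine of the target processor
        self.trace = trace
        self.register = None       # command/data register

    def write(self, command):
        self.register = command
        self.trace.append((self.irq_step, self.name + " mailbox issues interrupt"))
        self.isr()                 # assumption: interrupt modeled as a direct call


def run_round_trip(samples):
    """Replay steps 101-112 of FIG. 1 and return (trace, shared resource)."""
    trace = []
    shared = {"owner": "MPU", "data": list(samples)}

    def mpu_isr():
        trace.append((110, "MPU ISR receives interrupt"))
        trace.append((111, "IPC framework wakes application software"))
        trace.append((112, "application software uses shared resource"))

    dsp_to_mpu = Mailbox("DSP_to_MPU", 109, mpu_isr, trace)

    def dsp_isr():
        trace.append((105, "DSP ISR receives interrupt"))
        shared["data"] = [2 * x for x in shared["data"]]  # placeholder workload
        trace.append((106, "DSP processes data"))         # T1: processing finished
        shared["owner"] = "MPU"
        trace.append((107, "shared resource assigned to MPU"))
        trace.append((108, "DSP writes DSP_to_MPU mailbox"))
        dsp_to_mpu.write("done")

    mpu_to_dsp = Mailbox("MPU_to_DSP", 104, dsp_isr, trace)

    trace.append((101, "application software requests DSP processing"))
    shared["owner"] = "DSP"
    trace.append((102, "shared resource assigned to DSP"))
    trace.append((103, "MPU writes MPU_to_DSP mailbox"))
    mpu_to_dsp.write("process")
    return trace, shared


trace, shared = run_round_trip([1, 2, 3])
for step, description in trace:
    print(step, description)
```

In this model T1 corresponds to step 106 and T2 to step 111; everything between them is overhead added by the mailbox path.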
A data latency problem that degrades performance can be observed in FIG. 1. Ideally, time T1, when the DSP finishes processing data, would equal time T2, when the IPC framework on the MPU wakes up the application software to use the shared resource, so that no data latency is observed. However, in an actual application, the following factors may contribute to the difference between T1 and T2:
1. the time for the DSP to write to the register of the DSP_to_MPU mailbox;
2. the time from finishing the write to the DSP_to_MPU mailbox register to the DSP_to_MPU mailbox issuing the interrupt command;
3. the time from the DSP_to_MPU mailbox issuing the interrupt command to the MPU receiving the interrupt command;
4. the time from the MPU receiving the interrupt command to the ISR of the MPU operating system starting to execute;
5. the time for the ISR of the MPU operating system to execute, plus the IPC framework execution time; and
6. the time from the IPC framework waking up the application software to the application software starting to use the shared resource.
Items 1-3 of the above list are simple register writes and hardware operations, and may require tens of clock cycles to finish. With a 100 MHz system bus, 10 clock cycles take 0.1 µs. Item 4 is defined as interrupt latency, and item 5 is the necessary execution time of the operating system and the IPC framework.
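The arithmetic behind this estimate is a one-line conversion, sketched here on the assumption that the bus figure is a clock frequency of 100 MHz (the reading that reproduces the stated 0.1 us for 10 cycles). The function name `cycles_to_microseconds` is hypothetical.

```python
def cycles_to_microseconds(cycles, bus_mhz):
    """Convert a clock-cycle count to microseconds for a bus clocked at bus_mhz MHz."""
    if bus_mhz <= 0:
        raise ValueError("bus frequency must be positive")
    # One cycle lasts 1 / bus_mhz microseconds, so n cycles last n / bus_mhz.
    return cycles / bus_mhz


ten_cycles_us = cycles_to_microseconds(10, 100)   # the case quoted in the text
fifty_cycles_us = cycles_to_microseconds(50, 100) # "tens of cycles" stays sub-microsecond
```

Even at several tens of cycles, items 1-3 remain well under a microsecond, which is why the remaining items dominate the gap between T1 and T2.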
The duration of item 6 depends on the operating system scheduling. An experiment with the Linux 2.6 operating system shows that after the ISR finishes, a scheduling algorithm evaluates whether to schedule the CPU to another task. In addition, when the application software requests the shared resource from the IPC framework while the DSP has not yet finished processing data, the application software is required to yield the CPU. When the IPC framework informs the application software that it may use the shared resource, the application software must wait until the CPU becomes available to it again. Items 4-6 of the above list are defined as task latency. The task latency depends on the system workload, and recorded timestamps can be used to estimate it.
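One way to estimate task latency from time records is sketched below. The record layout is an assumption made for illustration: each `LatencyRecord` holds four hypothetical timestamps in microseconds that mark the boundaries of items 4, 5, and 6, and the sample values are invented, not measurements.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LatencyRecord:
    """Hypothetical timestamps (microseconds) for one DSP-to-MPU notification."""
    irq_received: int   # MPU receives the interrupt command
    isr_start: int      # ISR of the MPU operating system starts to execute
    wakeup: int         # IPC framework wakes up the application software
    app_start: int      # application software starts using the shared resource

    def breakdown(self):
        """Split the task latency into items 4, 5, and 6 of the list above."""
        return {
            "item4_interrupt_latency": self.isr_start - self.irq_received,
            "item5_isr_and_ipc": self.wakeup - self.isr_start,
            "item6_scheduling": self.app_start - self.wakeup,
        }

    def task_latency(self):
        """Task latency as defined in the text: items 4-6 combined."""
        return self.app_start - self.irq_received


def mean_task_latency(records):
    """Average task latency over a set of records."""
    return mean(r.task_latency() for r in records)


# Invented sample records: a lightly loaded system and a busy one.
records = [
    LatencyRecord(irq_received=0, isr_start=5, wakeup=40, app_start=140),
    LatencyRecord(irq_received=1000, isr_start=1004, wakeup=1040, app_start=1500),
]
average_us = mean_task_latency(records)
```

In the invented data, items 4 and 5 barely change between the two records while item 6 grows several times over, which mirrors the dependence on system workload described above.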
U.S. Pat. No. 6,938,253 discloses a system and method for multiprocessor communication that integrates the semaphore and mailbox mechanisms so that, when the resource is not required to be locked, the mailbox mechanism can be used to inform the specific processor and application software to use the shared resource. This patent emphasizes eliminating the need to lock the resource in order to improve semaphore efficiency in a multiprocessor environment. However, when the system is busy, the task latency problem of the mailbox mechanism remains.
Cirrus Logic, Inc. proposed an IPC framework implemented with a mailbox mechanism. However, the technique does not address improving the performance of IPC.