The present invention relates to the design of flexible data exchange mechanisms for multiprocessor-based embedded applications. More specifically, the present invention provides an architecture for implementing flexible data exchange between nodes of a heterogeneous multiprocessor system and for improving the performance of embedded applications.
Embedded applications perform specialized tasks and are traditionally designed by taking into consideration various parameters such as speed, cost, efficiency and robustness. With an increase in the complexity of embedded applications, low power consumption and flexibility in design have also emerged as parameters which must be taken into consideration while designing an embedded application. The need for higher processing power at lower power consumption levels has driven designers to take multiple approaches in designing complex, high-performance embedded systems. One approach is a hybrid design, where the application may be implemented partly in hardware and partly as one or more software modules running on a microprocessor. An alternative approach is to use multiple processing engines fabricated on a single chip, allowing parallel execution of instructions. The processing engines used in this approach could be similar microprocessors, different kinds of microprocessors or special hardware modules. In any of these multiprocessor scenarios, the data exchange and synchronization mechanisms between the processing engines can greatly influence the performance of the overall system. The processing engines need to exchange raw data, results of processing, status, processing states, etc., during run time. The term ‘data exchange’ is used broadly in all embodiments of the present technique to include all types of communication between the processing engines. Methods and systems employed for data exchange in the prior art are described below.
In one conventional method, any processor, when ready to send data, requests control of the common bus from the arbitrator. The arbitrator, based on any chosen priority scheme, grants control to the requesting processor. The requesting processor sends the data over the bus to the destination processor and then withdraws its request for the bus. This scheme is simple and allows any processor to communicate with any other processor on the bus; however, the destination processor has to be ready to receive the data when the source processor is ready to send it. Otherwise, one of the processors has to wait for the other. Besides, this scheme does not allow two or more processors to send data simultaneously to their respective destinations. The communication is serialized and hence can delay the processing at both the source and destination processors. This is a serious limitation in systems where parallel execution is desired.
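The serialization effect of the common bus can be illustrated with a minimal sketch. It assumes a fixed-priority arbitrator in which the lowest-numbered source processor wins, and it assumes that one word is transferred per bus cycle; the names `Transfer` and `serialize_on_bus` are hypothetical and are not part of any prior-art system described here.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    src: int    # id of the sending processor
    dst: int    # id of the receiving processor
    words: int  # transfer length, assumed to take one bus cycle per word


def serialize_on_bus(requests):
    """Grant the bus to one requester at a time.

    Assumption: fixed priority, lowest source id first. Each granted
    transfer occupies the bus until it completes, so the transfers
    run back to back even when their destinations differ.
    """
    schedule = []
    clock = 0
    for req in sorted(requests, key=lambda r: r.src):
        start = clock
        clock += req.words
        schedule.append((req.src, req.dst, start, clock))
    return schedule


# Two transfers with no processor in common still cannot overlap.
requests = [Transfer(src=2, dst=3, words=4), Transfer(src=0, dst=1, words=4)]
schedule = serialize_on_bus(requests)
total_cycles = schedule[-1][3]
```

In this sketch, processors 0 and 2 address different destinations, yet the second transfer cannot start until the first has released the bus, so the total time is the sum of both transfer lengths rather than the longer of the two.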
In another conventional method, any processor can write data into the shared memory and the destination processor can consume the data at its convenience. This frees the source processor to take up the next job. This scheme requires some type of synchronization so that the destination processor knows when new data is available in the memory. Also, parallel communication in this scheme is again limited by the number of ports the memory can support and by the maximum possible memory access speed.
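The synchronization requirement of the shared memory scheme can be sketched as follows. This is only an illustration under stated assumptions: two threads stand in for the source and destination processors, a single-slot buffer stands in for the shared memory, and a `threading.Event` stands in for whatever notification mechanism (flag, interrupt, semaphore) a real system would use. The class name `Mailbox` is hypothetical.

```python
import threading


class Mailbox:
    """Single-slot shared buffer with a 'new data available' flag."""

    def __init__(self):
        self._data = None
        self._ready = threading.Event()

    def write(self, value):
        # The source stores the data and signals availability, then is
        # free to continue with its next job.
        self._data = value
        self._ready.set()

    def read(self, timeout=None):
        # The destination blocks until the source has signalled new data.
        if not self._ready.wait(timeout):
            raise TimeoutError("no new data in shared memory")
        value = self._data
        self._ready.clear()
        return value


mailbox = Mailbox()
results = []

# The consumer thread plays the role of the destination processor.
consumer = threading.Thread(
    target=lambda: results.append(mailbox.read(timeout=5.0))
)
consumer.start()
mailbox.write([1, 2, 3])  # the main thread plays the source processor
consumer.join()
```

The sketch deliberately omits flow control: a second `write` before the destination has read would overwrite the first value, which is one reason practical systems pair such a flag with a queue or an acknowledgement.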
In yet another conventional method, every processor is connected directly with every other processor through dedicated links. This scheme requires a lot of resources and cannot be scaled up easily to incorporate more processors in the system, although it does achieve a fair amount of parallel execution. A variation of this approach is shown with reference to subsequent figures, where each processor is connected only to its neighboring processors and any communication with a non-neighbor passes through one of the neighboring processors. Paths to non-neighbors are determined using various routing algorithms. This scheme, though conceptually very efficient, involves, in its most generic form, the implementation of highly generic and elaborate routing algorithms, and is mostly used in large distributed networks where the processing engines are general-purpose computers physically located at different places. An improved version of such a scheme, with optimum and efficient utilization of resources and specifically suited for embedded multiprocessing applications, is disclosed in this invention.
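The resource trade-off between the two topologies can be made concrete with a short sketch. It assumes, purely for illustration, that the neighbor-connected processors are arranged in a rectangular grid and that paths to non-neighbors are found with a breadth-first search; neither assumption is taken from the schemes described above, and all function names are hypothetical.

```python
from collections import deque


def full_mesh_links(n):
    """Dedicated links needed when every processor connects to every other."""
    return n * (n - 1) // 2


def grid_links(rows, cols):
    """Links needed when each processor connects only to its grid neighbors."""
    return rows * (cols - 1) + cols * (rows - 1)


def grid_neighbors(node, rows, cols):
    """Yield the ids of the up, down, left and right neighbors of a node."""
    r, c = divmod(node, cols)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr * cols + nc


def route(src, dst, rows, cols):
    """Return a shortest hop-by-hop path from src to dst.

    Assumes both ids lie inside the grid, so dst is always reachable.
    """
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in grid_neighbors(node, rows, cols):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


# Nine processors: fully connected versus a 3 x 3 neighbor grid.
links_full = full_mesh_links(9)
links_grid = grid_links(3, 3)
path = route(0, 8, rows=3, cols=3)
```

For nine processors, the fully connected arrangement needs three times as many links as the grid, but a message between opposite corners of the grid must pass through three intermediate processors, each of which spends time forwarding it.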