Since the beginning of the new millennium, there has been a qualitative change in the evolution of microprocessor and microelectronics architectures. Moore's law, which provides the roadmap for the state of the art in microelectronics etching processes, continues to be borne out: the number of transistors that can be placed on a chip still doubles approximately every 20 to 24 months. However, the additional transistors are now used to increase the number of computation cores rather than the computation power of each core.
Thus, workstation processors already have a significant number of cores per physical processor: between 8 and 16 at the current time. If this trend continues, the number of cores per chip will exceed several tens in the coming years. In fact, a few processors already have several hundred cores per chip, for example the MPPA-256 processor from Kalray, which has 256 computation cores and some twenty additional cores for managing parallelism and inputs/outputs.
Such processors are called many core processors. The distinction between multicore and many core is not reduced solely to the number of cores. The most noteworthy difference lies in the manner in which the processor cores communicate. In a conventional multicore, the cores share a bus for accessing the main memory. In principle, the shared bus allows every core to observe the memory accesses of all the others. It is thus relatively easy to implement memory consistency methods to ensure that all the cores have the same view of the central memory of the system. These principles still apply in commonly available processors at the present time, through bus architectures that add a few nuances to the preceding scheme. Generally, workstation processors with 8 or 16 cores follow an architecture referred to by the acronym NUMA, for Non-Uniform Memory Access: for these processors, memory access times are not uniform across the computation cores, but data consistency remains achievable.
In the context of a many core processor, where the number of computation cores is particularly high, establishing memory consistency at large scale across all the cores is more problematic. Without being impossible, its cost in terms of synchronization counterbalances the saving in execution time generated by the parallelism, whose benefit can thereby be compromised. Indeed, it requires distributed protocols that nevertheless rely on shared elements, an overall unsatisfactory solution, all the more so as memory consistency induces non-negligible electrical consumption. To address this problem, a significant amount of the research on many core execution methods focuses on computation models that do not need memory consistency. Among these, the data flow models are currently a particular subject of study.
The abovementioned hardware trends are driving a profound change in embedded computing and allow for the emergence of new applications, some of which, in fields as varied as high-performance computation or embedded critical real-time systems, entail both strong needs in terms of parallelism and real-time constraints. By way of example, J. Mitola, The software radio architecture, IEEE Communications Magazine, 33(5):26-38, May 1995, describes a software radio application, and M. Heskamp, R. Schiphorst, and K. Slump, Cognitive Radio Communications and Networks, chapter Public safety and cognitive radio, pages 467-488, Elsevier, 2009, describes a cognitive radio system. Each of these applications demands high computation power coupled with low electrical consumption. Both present real-time constraints, and cognitive radio additionally needs to adapt to its environment. Similar constraints are encountered in augmented reality applications, for example those described by S. Feiner, B. MacIntyre, T. Hollerer, and A. Webster, A touring machine: prototyping 3d mobile augmented reality systems for exploring the urban environment, in First International Symposium on Wearable Computers, Digest of Papers, pages 74-81, October 1997, and by Y. Sato, M. Nakamoto, Y. Tamaki, T. Sasama, I. Sakita, Y. Nakajima, M. Monden, and S. Tamura, Image guidance of breast cancer surgery using 3d ultrasound images and augmented reality visualization, 1998, as well as in the autonomous vehicle applications described by A. Kushleyev, D. Mellinger, C. Powers, and V. Kumar, Towards a swarm of agile micro quadrotors, Autonomous Robots, 35(4):287-300, 2013.
Connectivity is another important aspect of current technological trends. More and more mobile appliances, from the smart sensor to the smartphone, can be connected permanently to one another. Each of these mobile appliances can also be connected to remote servers or to embedded systems, such as, for example, the embedded system of a car. L. Atzori, A. Iera, and G. Morabito, The internet of things: A survey, Computer Networks, 54(15):2787-2805, October 2010, describes a swarm of mobile computers creating a global computation platform, highly parallel and mutable. This article lays the groundwork for an omnipresent and distributed computation capacity.
Thus, the applications cited previously can be seen as parts of a cooperative computation, in which each application can use the results of other applications to improve its own. A. A. Faisul, A. R. Ramli, K. Samsudin, and S. Hashim, Energy management in mobile robotics system based on biologically inspired honeybees behavior, in 2011 IEEE International Conference on Computer Applications and Industrial Electronics (ICCAIE), pages 32-35, IEEE, 2011, in the field of energy management in mobile robotics, and Y. Wang, Y. Qi, and Y. Li, Memory-based multiagent coevolution modeling for robust moving object tracking, The Scientific World Journal, 2013, in the field of robust target tracking, describe such applications, which require dynamism, knowledge of the environment and adaptation thereto.
The data flow models, and more particularly the networks of communicating processes, were first devised by Kahn in 1974. They make the communications between tasks explicit, which allows the application to be analyzed offline in order to check properties that are important for safe and deterministic execution.
Many data flow models use the concept of actor introduced by Carl Hewitt, Peter Bishop, and Richard Steiger, A Universal Modular Actor Formalism for Artificial Intelligence, IJCAI, 1973. In this concept and the associated mathematical model called the actor model, actors are the only primitives necessary for concurrent programming. The actors communicate by exchanging messages. In response to a message, an actor can for example perform a local processing operation, create other actors, or send other messages.
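As an illustration only, the behavior described above (actors reacting to messages, possibly sending further messages) can be sketched as follows; the class names and the cooperative scheduler are hypothetical conveniences, not drawn from the cited work:

```python
from collections import deque

class Actor:
    """Minimal actor: a mailbox plus a behaviour invoked once per message."""
    def __init__(self, system, behaviour):
        self.system = system
        self.behaviour = behaviour
        self.mailbox = deque()

    def send(self, message):
        # Message sending is asynchronous: the message is queued and the
        # actor is scheduled for later execution.
        self.mailbox.append(message)
        self.system.schedule(self)

class ActorSystem:
    """Illustrative single-threaded scheduler for the sketch."""
    def __init__(self):
        self.runnable = deque()

    def schedule(self, actor):
        self.runnable.append(actor)

    def run(self):
        # Process scheduled actors until no messages remain.
        while self.runnable:
            actor = self.runnable.popleft()
            if actor.mailbox:
                actor.behaviour(actor, actor.mailbox.popleft())

# Example: a doubling actor forwards its results to a collector actor.
system = ActorSystem()
results = []
collector = Actor(system, lambda self, msg: results.append(msg))
doubler = Actor(system, lambda self, msg: collector.send(2 * msg))
for value in (1, 2, 3):
    doubler.send(value)
system.run()
# results is now [2, 4, 6]
```

In response to a message, an actor here may equally create new actors or send further messages, matching the three reactions listed above.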
The basic KPN (Kahn Process Network) principle is founded on a graph whose nodes are communicating processes and whose links are queues (more commonly called FIFOs, the acronym for First In First Out). The processes can read data on their input channels and write the results of their computations on their output channels. The units of data exchanged are called tokens, and the numbers of tokens produced and consumed are non-negative integers.
The reads on the input channels are blocking, which means that, if a process requires more tokens than are present on one of its inputs, it is blocked until the required number of tokens is present on that input. The processes are allowed to have an internal state and can dynamically change the number of tokens expected on input or produced on output.
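These blocking-read semantics can be sketched with Python threads, using unbounded queues as FIFO channels; the two-process graph and the end-of-stream marker are illustrative conventions, not part of the Kahn model itself:

```python
import threading
import queue

def producer(out_fifo):
    # Write tokens on the output channel.
    for token in range(5):
        out_fifo.put(token)
    out_fifo.put(None)  # end-of-stream marker (illustrative convention)

def accumulator(in_fifo, out_fifo):
    # get() suspends until a token is available, mirroring the
    # blocking-read semantics of a Kahn process.
    total = 0
    while True:
        token = in_fifo.get()
        if token is None:
            out_fifo.put(total)
            return
        total += token

a_to_b = queue.Queue()  # FIFO channel from producer to accumulator
b_out = queue.Queue()   # FIFO channel carrying the final result
threads = [
    threading.Thread(target=producer, args=(a_to_b,)),
    threading.Thread(target=accumulator, args=(a_to_b, b_out)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
result = b_out.get()
print(result)  # deterministic: always 0+1+2+3+4 = 10
```

Whatever the interleaving of the two threads, the blocking reads make the result deterministic, which is the key property of the Kahn model noted below.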
The Kahn model is powerful, deterministic (the same inputs produce the same outputs) and local (therefore without the need for memory consistency mechanisms), but it lacks the properties needed to use it in total confidence in the context of embedded computing. In particular, it is not possible to know in advance whether a Kahn process network is free of blockages or whether it can be executed in bounded memory.
There are a number of restricted variants of the KPN model that allow offline checks of the properties essential for determinism and dependability. Among these, the CSDF (Cyclo-Static Data Flow) model is the most expressive one that retains the possibility of conducting these checks. K. Denolf, M. Bekooij, J. Cockx, D. Verkest, and H. Corporaal, Exploiting the expressiveness of cyclo-static dataflow to model multimedia implementations, EURASIP Journal on Advances in Signal Processing, 2007(1):084078, 2007, describes possible analyses of CSDF graphs. In addition, with information on the execution times of the actors and the latencies induced by the communications, a CSDF graph can be analyzed to determine the latencies on a path of the graph.
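The kind of offline analysis these checks rely on can be illustrated on the single-phase special case of CSDF (plain SDF): solving the balance equations of the graph yields a repetition vector, whose existence is what permits bounded-memory, blockage-free scheduling. The graph and the token rates below are assumptions chosen for illustration:

```python
from fractions import Fraction
from math import lcm

# Edges of a single-phase SDF graph (the special case of CSDF where each
# actor has one phase).  Each edge: (producer, tokens produced per firing,
# consumer, tokens consumed per firing).  Rates are illustrative.
edges = [("A", 2, "B", 3), ("B", 1, "C", 2)]

# Balance equations: rate[producer] * produced == rate[consumer] * consumed.
# Propagate rational firing rates from an arbitrary seed actor.
rates = {"A": Fraction(1)}
changed = True
while changed:
    changed = False
    for prod, p, cons, c in edges:
        if prod in rates and cons not in rates:
            rates[cons] = rates[prod] * p / c
            changed = True
        elif cons in rates and prod not in rates:
            rates[prod] = rates[cons] * c / p
            changed = True

# Scale to the smallest integer repetition vector.
denom_lcm = lcm(*(r.denominator for r in rates.values()))
repetitions = {a: int(r * denom_lcm) for a, r in rates.items()}
print(repetitions)  # {'A': 3, 'B': 2, 'C': 1}
```

With this vector, A firing 3 times produces 6 tokens, exactly the 6 consumed by 2 firings of B, and so on along each edge: every iteration of the schedule returns the FIFOs to their initial state, so memory stays bounded.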
The patent FR 28166730 describes a securing method that makes the real-time execution of a multitask application deterministic. It describes in particular the sequencing of system calls, clocks and interrupt management at the microkernel level, allowing a task to be executed in real time deterministically.
The patent U.S. Pat. No. 5,745,687 describes a data flow management system implemented on a computer. This system notably comprises an organization service intended to identify, out of a set of agents, the agent most appropriate to perform each of the tasks of the data flow graph.
The patent U.S. Pat. No. 6,397,192 describes a method for synchronizing one or more data flow graphs. It notably describes how to pause and start/restart the execution of the data flow graph as a function of the termination or non-termination of tasks.
However, none of the techniques described previously makes it possible to ensure the correct execution of the data flow graph in the case of failure of one of the agents. The risk of failure is no longer negligible with the expansion of distributed computations and network connections. By way of example, an agent of a data flow graph can invoke a computation resource situated on an external network. A connection problem, by its nature unpredictable, can then threaten the determinism of the execution, or of the execution time, of the task performed by the agent. Similarly, when an agent executes a task requiring external elements, for example the reception of GPS signals, it is very difficult to ensure deterministic task execution. In the case of a CSDF graph, the token system blocks the execution of the graph until all the expected data have arrived. The patent FR 28166730 makes it possible to obtain a deterministic response time for a real-time task, but does not broach the issue of the possible failure of certain actors. The patent U.S. Pat. No. 5,745,687 does not deal with this problem, and the patent U.S. Pat. No. 6,397,192 describes a synchronization mechanism in which tasks await the end of execution of other tasks, but does not describe a mechanism that makes it possible to overcome their failure.
Furthermore, when several agents are likely to perform the same task with different result qualities, none of the abovementioned methods makes it possible to select a posteriori the one that has produced the best result. The tasks that can be performed by different agents with a result of variable quality include, for example, geolocation by different means, or image analysis with different result qualities. The patent U.S. Pat. No. 5,745,687 does indeed make it possible to select the most appropriate agent to perform a task, but it is an a priori selection of the agent, and not an a posteriori selection of agents that have already terminated their task and that are capable of quantifying the quality of their result.
Finally, none of the above methods makes it possible to create deterministic graphs guaranteeing the provision of a result both within a limited time and of the best possible quality out of the results supplied by several agents running in parallel.