The semiconductor industry faces a disappointing observation: there are no longer any credible avenues for significantly increasing the performance of processors, at least not individually. Only systems that use a plurality of processors operating in parallel still seem to offer an encouraging avenue for increasing computing power. Indeed, studies conducted in the 1960s showed that the ratio between the computing power and the efficiency of computation systems is potentially much higher for parallel systems than for sequential systems. Now, new applications in fields such as multimedia, communications or real-time processing systems are demanding more and more computing power within controlled levels of power consumption and surface area. Since the processing power of a single core can no longer be increased, the only solution is to multiply the number of cores and to make them operate in parallel, giving rise to a new architectural concept, that of the on-chip parallel system. In the fairly specialized context of processors for embedded systems, this trend toward increasing the number of execution cores on one and the same chip is very marked. In the medium term, it should lead to the introduction or even the standardization of systems with several tens or even hundreds of execution cores. Worth citing among these systems are the on-chip multiprocessor systems, usually designated by the acronym “MPSoC”, standing for “Multi-Processor System on Chip”.
However, parallel systems are much more difficult to program and to debug than sequential systems. These programming and debugging difficulties are exacerbated by the ever increasing complexity of the applications. In embedded applications, these difficulties are further increased by the desire to integrate ever more functionalities and by the ongoing increase in the volumes of data to be processed. For example, cell phones combine telecommunication functions with multimedia, positioning, or even gaming functions. This leads to embedded systems in which intensive computation tasks run alongside control-dominated tasks, with very strong interactions between these various elements of the applications. The synchronization of the processing functions of the different cores, so as to best manage the effective parallelism, is then the critical factor for performance and for the capacity to meet the associated real-time constraints. This is the main difficulty in the effective operation of the parallel architectures of embedded systems. This difficulty has to be considered from three aspects: mastery of the indeterminism, mastery of the communications and mastery of the controls. Once a potential parallelism has been identified, extracted from an application and expressed in a program, it is then essential to be capable of effectively implementing this parallelism in a given hardware architecture. In an MPSoC for example, in order to derive the maximum benefit from the work of extraction of the application parallelism done by the programmer, the numerous processing sequences must be distributed as well as possible among all the resources of the chip, even though these sequences are interlinked by data dependencies or execution control dependencies. Hereinafter, these sequences will be called “execution tasks”: an execution task relates to the execution of a processing function on a processing core. It should be noted that software specialists also call it a “thread”.
In the rest of the present application, we will make no distinction, and the term “task” will refer solely to an execution task. In order to organize the execution of these tasks on an MPSoC and to facilitate the work of the developer, the software support for their execution is structured into purely application parts and other so-called “system” parts, the function of which is to abstract the resources of the underlying hardware. In order to better exploit the parallelism expressed by the tasks and that available in the MPSoC, it is necessary to conduct a study relating on the one hand to the way of choosing the processing functions to be performed on the various cores and on the other hand to the way of making them operate together, i.e. to how to structure the basic software controlling the execution of the tasks on the hardware. Thus, in the same way as the program expresses the potential for parallelism of the application, it is essential to find a means of expressing the potential for parallelism of the architecture by an appropriate control of the tasks at the level of the basic software, generally called the “kernel”. The study should take into account all the situations that may prevent good use of the potential parallelism of the architecture. First, there is the risk of being limited by access to an essential shared resource such as the central memory, a network, a communication bus or a task manager. There is also the risk of not being able to manage the interdependencies between the tasks finely enough, notably when said tasks are dynamic in nature. Finally, there is the risk of not being able to master the indeterminisms of the parallel execution, which makes the debugging of the programs complex and difficult.
One standard way of addressing this problem is a layered software approach which distinguishes, at the very least, the application layer incorporating the tasks to be executed from the kernel, which abstracts the hardware resources and manages the effective execution of the tasks on the machine. The kernel is itself conventionally structured in two parts: one called the “micro-kernel”, which performs all the system functions directly related to the hardware such as management of the registers, of the timers, of the peripheral devices, and so on, and a second which in this document is called the “system layer”, responsible for the inter-task communications and other high-level task control aspects. The study should culminate in a structuring of the kernel which defines the way in which the processing cores are chosen and the way in which they are made to operate in a coordinated and effective manner. This constitutes one of the major challenges currently facing the microelectronics and embedded software industry, and one for which the present invention provides a solution.
An existing solution proposes using the processing cores symmetrically to execute the kernel. It is implemented in the architectures of “Symmetric Multiprocessing” (SMP) type. For example, it may involve having an identical kernel of Linux or Windows (registered trademark) type executed on each of the distinct processing cores. However, a major drawback is that a kernel of Linux or Windows (registered trademark) type cannot be executed on two distinct cores in a really simultaneous manner, at least with respect to the critical functions of the kernel. The parallelism is therefore limited to the non-critical functions of the kernel. And that is one of the technical problems that the present invention proposes to resolve by distributing some of the critical functions of the kernel over a plurality of processing cores.
There are also so-called “partition” solutions, which propose that each processing core be dedicated to mutually unaware activities. Interchanges then take place through the sharing of a memory space. For example, the patent application number WO/2007/038011 entitled “Real-time threading service for partitioned multiprocessor systems” describes how a core can be dedicated to the execution of real-time tasks supplying results for an application executed by virtue of a non-real-time kernel executed on another core. Similarly, there are solutions in which each core executes a kernel that dedicates it to certain types of processing functions (logic computations, intensive computations, taking of input/output interrupts with the network, etc.). A typical example is to have one core dedicated to the computations and the others to the taking of interrupts to serve the peripheral inputs/outputs. In this typical case, the kernel of the computation core makes it possible to perform the computations on asynchronous data originating from the inputs/outputs and made available by the core where the taking of interrupts is managed. An interrupt corresponds to the occurrence of an event external to the program, said event triggering the temporary stoppage of the execution of a current task in order to execute another, higher priority task (this change of execution context is called a switch). The external event may be the advancing of a real or simulated clock, the higher priority task possibly being triggered by a timer. Such is notably the case with the real-time tasks that are time-constrained, or “Time Triggered” (TT): given that a real-time task must be finished before a given time, it must also be started before a given time which depends on the task execution duration. The real-time tasks are triggered by a timer that is physically paced by a quartz crystal, thus forming a real clock.
The external event may also be the completion of a data transfer; the higher priority task is then said to be “Event Triggered” (ET), subject to an input/output interrupt. In this type of relatively conventional design, the interrupt-taking core and the computation core are weakly coupled. This type of solution then resembles architectures with co-processors whose coordination relies on the provision of data and of collections of associated signals. One advantage of this solution is that it offers good responsiveness to fast inputs/outputs, that is to say that it makes it possible to do the associated basic computations without in any way disrupting the scheduling of the processing functions on the computation core. It is effective when the tasks to be executed are independent of one another, that is to say, when they require little or nothing in the way of data interchanges and/or synchronizations. However, when the inputs/outputs require strong synchronizations with the computations or when there are numerous inputs/outputs with different rates, this type of solution is not very effective. And here again is one of the technical problems that the present invention proposes to resolve.
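By way of illustration only, the timing constraint on the time-triggered tasks described above can be sketched as follows. This sketch is not taken from any of the cited solutions: the names TimeTriggeredTask, latest_start and tasks_to_trigger are hypothetical, times are expressed in arbitrary clock ticks, and a task is assumed to run without preemption once started, so that its latest admissible start time is simply its deadline minus its execution duration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeTriggeredTask:
    """Hypothetical model of a time-triggered (TT) task."""
    name: str
    deadline: int   # tick before which the task must be finished
    duration: int   # execution duration of the task, in ticks

    def latest_start(self) -> int:
        # Assumption: no preemption once started, so the task must start
        # no later than (deadline - duration) to finish in time.
        return self.deadline - self.duration


def tasks_to_trigger(tasks, now):
    """Return the tasks whose latest start time has been reached at tick
    `now`, most urgent first (as a timer-driven trigger might select them)."""
    due = [t for t in tasks if t.latest_start() <= now]
    return sorted(due, key=lambda t: t.latest_start())


audio = TimeTriggeredTask("audio", deadline=100, duration=30)
video = TimeTriggeredTask("video", deadline=80, duration=40)
print(audio.latest_start(), video.latest_start())
print([t.name for t in tasks_to_trigger([audio, video], now=50)])
```

In this sketch the start constraint depends only on the deadline and the execution duration, which is the dependency stated above; a real kernel would also have to account for preemption and for the cost of the context switch.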
Another category of existing solutions consists of kernel architectures of “master-slave” type (as presented in the American patent application number US005978838A entitled “Coordination and synchronization of an asymmetric, single-chip, dual multiprocessor”). One core is designated as the “master” core and is responsible for managing all the system calls and all the interrupts, whereas the other cores are designated as “slave” cores and execute only the algorithmic part of the application tasks. Thus, the master core takes complete charge of the control and allocation of the processing functions to be performed by the slave cores: it synchronizes their executions. The advantage is a simplification of the synchronization problems, which are entirely managed by the master processor. The major drawback of this architecture is its strongly centralized nature, the master processor becoming subject to contention problems when a large number of tasks or even of cores is involved: the overall performance levels are then rapidly limited by those of the master processor. Improvements have been proposed, such as that published in Advances in Computer Systems Architecture (vol. 4697) entitled “An effective design of master/slave operating system architecture for multiprocessor embedded systems” by Minyeol Seo et al. This publication aims to address the scheduling problems. The authors propose hierarchically organizing the scheduling policy on each core, without in any way releasing the master core from the processing of all the inputs/outputs or of all the system calls, notably those concerning the inter-task communications. To do this, they define a part of the kernel called the “Hardware Abstraction Layer”, duplicated on each core and managing the communications between cores (“Inter-Processor Communications” or IPC). The IPC mechanism for synchronizing the cores is implemented on the basis of a remote procedure call.
Now, this client-server distributed function call mechanism causes blockages: when a task on a core invokes the sending of a message for the attention of another task, it calls a remote system function on the master core. The calling core is then blocked until this function call returns, preventing any other parallel execution on that core during this time period. Similarly, if the master core is blocked for one reason or another, all the slave cores are also blocked, and vice versa. Such a solution is therefore still too centralized. And here too is one of the technical problems that the present invention proposes to resolve.
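The blocking effect described above can be illustrated by a minimal, purely hypothetical model; it does not reproduce the mechanism of the cited publication. It is assumed here that the master core serves one remote call at a time, in order of arrival, that every call takes the same number of ticks, and that a calling core remains blocked until its own call returns. The function names rpc_return_times and blocked_ticks are illustrative only.

```python
def rpc_return_times(request_times, service_time):
    """Tick at which each remote call returns, for requests taken in
    order of arrival by a master core that serves one call at a time."""
    master_free_at = 0
    returns = []
    for requested_at in sorted(request_times):
        # The call starts when both the request has arrived and the
        # master core has finished serving the previous call.
        started_at = max(requested_at, master_free_at)
        master_free_at = started_at + service_time
        returns.append(master_free_at)
    return returns


def blocked_ticks(request_times, service_time):
    """Number of ticks each calling core spends blocked, in order of arrival."""
    ordered = sorted(request_times)
    returns = rpc_return_times(ordered, service_time)
    return [ret - req for ret, req in zip(returns, ordered)]


# Four slave cores issue a remote call at the same tick; each call
# takes 5 ticks on the master core.
print(rpc_return_times([10, 10, 10, 10], service_time=5))
print(blocked_ticks([10, 10, 10, 10], service_time=5))
```

Under these assumptions the time spent blocked grows with the number of cores calling the master core at the same moment, which is the contention that makes such a solution too centralized.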
Finally, there are also kernel structuring proposals for single-core processors, such as that described in the patent EP 1 337 919 B1, which discloses the organization of a kernel into a micro-kernel and a system layer making it possible to coordinate a deterministic execution of the tasks on a single core. This patent does not indicate how to proceed with an advantageous partitioning of the kernel between different cores of an MPSoC architecture. And that is one of the aims of the present invention.