Multiprocessor systems incorporate increasingly complex and diversified applications which demand ever higher levels of computation and storage performance. These rising performance levels, which are reflected on the one hand by ever-increasing operating frequencies and on the other hand by an increase in the number of processors and memory circuits, result in increasingly high energy consumption.
The parallel processing of an application by a set of computation elements, typically processors, requires this application to be subdivided into a number of processing blocks called tasks. These tasks are executed partly in succession and partly in parallel. The aim of this subdivision is notably to execute individual computation tasks in parallel and thus speed up the computation, the parallel tasks being assigned to a number of processors. The architectures that exploit this type of parallelism, called multiprocessor architectures, offer great operating flexibility and good performance in terms of computation power. In practice, they are capable of executing a number of applications in parallel, as well as applications that have strong task-level parallelism. However, these multiprocessor architectures, which make it possible to execute parallel applications with increasing storage and computation requirements, consume a great deal of energy, and this consumption must be suitably managed.
In a real-time context, these architectures are often dimensioned for the worst case according to the applications supported and the time constraints to be observed. The dimensioning phase consists in characterizing each task by a worst-case execution time (WCET). Once the WCETs of the various tasks have been calculated, the number of processors is set so as to exploit the parallelism and guarantee that the time constraints are observed in the worst case. This number of processors is strongly dependent on the scheduling policy chosen to set the execution priorities of the tasks while observing the dependencies between the tasks. Inactivity intervals of certain processors may occur at certain task synchronization points. These inactivity intervals are due to the variation of the potential rate of parallelism of the application and may be characterized according to the worst-case execution behavior. If these inactivity intervals occurred in the same way during actual execution, idle modes could be determined off-line and applied on-line, during execution, to reduce the energy consumed. Unfortunately, the variation of the actual execution times (AET) of the tasks relative to the worst-case execution times (WCET) alters the order and the times of activation of the tasks. Thus, the off-line prediction of these inactivity intervals becomes difficult. This difficulty limits the exploitation of the opportunities for reducing the energy consumed which occur during execution. The differences between the worst-case behaviors and the actual behaviors become increasingly significant in applications that are data dependent and control intensive. These differences do, however, offer great potential for optimizing the consumption. In practice, the variation of the task execution times compared to the worst-case execution times reveals time excesses, that is, time left unused when a task completes before its WCET.
These time excesses may be exploited to slow down the execution speeds of the subsequent tasks and therefore locally reduce the consumption while observing the real-time constraints. In a global scheduling context, the difficulty lies in distributing the time excesses, which are obtained during execution, so as to effectively reduce the energy consumed without violating the time constraints.
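The way a time excess can slow down a subsequent task may be illustrated by a minimal sketch. It assumes that the WCET of each task is given at the maximum frequency and that execution time scales inversely with frequency. Neither assumption, nor the function name `reclaim_slack`, comes from the documents cited here.

```python
def reclaim_slack(wcet_done, aet_done, wcet_next, f_max):
    """Return (time excess, frequency for the next task).

    Assumption: execution time is inversely proportional to frequency,
    and both WCET values are expressed at f_max.
    """
    if not 0 < aet_done <= wcet_done:
        raise ValueError("expected 0 < AET <= WCET")
    slack = wcet_done - aet_done        # time excess left by the finished task
    budget = wcet_next + slack          # time the next task may now occupy
    freq = f_max * wcet_next / budget   # lowest speed that still fits the budget
    return slack, freq


# A task with WCET 10 finishes after 6; the next task has WCET 8.
slack, freq = reclaim_slack(wcet_done=10.0, aet_done=6.0, wcet_next=8.0, f_max=1.0e9)
print(slack, freq)
```

Even in its worst case, the slowed-down task completes exactly at the instant at which it would have completed had both tasks taken their WCET at full speed, so the time constraint is still observed.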
Solutions dealing with the consumption management problem at resource level are known. These solutions aim to reduce the energy consumed by the computation resources (processors) in an embedded system by relying on so-called DPM (dynamic power management) and/or DVFS (dynamic voltage and frequency scaling) techniques. The DPM techniques consist in exploiting the inactivity intervals by switching to idle modes the resources that are not used for a given time period. The DVFS techniques aim rather to exploit the time excesses and locally or globally lower the frequency and the voltage of certain resources. A first non-optimal variant groups together all the methods that implement only the DPM techniques, relying on the off-line prediction of the inactivity intervals of the resources, as is notably described in document D1 by A. Iranli et al.: “System-level Power Management: An Overview”, University of Southern California, Dept of Electrical Engineering, Los Angeles. However, the variation of the actual task execution times relative to the worst-case execution times WCET alters, on-line, the order and the times of activation of the tasks. Thus, the off-line prediction of the arrival times and of the lengths of these inactivity intervals becomes very difficult. The implementation of the DPM techniques which are based on prediction may yield interesting results in certain cases, where the execution profile of the application is more or less deterministic. In a general context in which a number of applications run with a number of instances whose arrival times are not known, the implementation of these techniques remains very tricky. In practice, with a poor prediction, these techniques may introduce additional latencies that are likely to violate the time constraints.
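The additional latency introduced by a poor prediction may be illustrated by a minimal sketch. The idle modes, their power and wake-up figures, and the wake-up policy (the resource is woken so as to be ready at the predicted end of the interval, or immediately if work arrives earlier) are all hypothetical and are not taken from document D1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdleMode:
    name: str
    power_mw: float    # power drawn while in this mode (hypothetical figure)
    wakeup_ms: float   # time needed to return to the active state


def pick_mode(predicted_idle_ms, modes):
    """Pick the lowest-power mode whose wake-up fits in the predicted interval."""
    fitting = [m for m in modes if m.wakeup_ms <= predicted_idle_ms]
    return min(fitting, key=lambda m: m.power_mw) if fitting else None


def added_latency(mode, predicted_idle_ms, actual_idle_ms):
    """Delay imposed on the next task when it arrives before the predicted time."""
    if mode is None:
        return 0.0  # the resource stayed active, so nothing is delayed
    early_by = predicted_idle_ms - actual_idle_ms
    return max(0.0, min(early_by, mode.wakeup_ms))


modes = [IdleMode("clock_gated", 200.0, 1.0), IdleMode("power_gated", 10.0, 10.0)]
mode = pick_mode(50.0, modes)
print(mode.name, added_latency(mode, 50.0, 45.0), added_latency(mode, 50.0, 60.0))
```

In this sketch, an interval predicted at 50 ms but lasting only 45 ms delays the next task by 5 ms, whereas an interval that lasts longer than predicted causes no delay.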
A second variant groups together all the methods that implement only the DVFS techniques. This variant is notably described in a document D2 by D. Zhu et al.: “Scheduling with Dynamic Voltage/Speed Adjustment Using Slack Reclamation in Multiprocessor Real-Time Systems”, IEEE Transactions on Parallel and Distributed Systems, vol. 14, No. 7, July 2003, and in a document D3 by N. Ventoux: “Contrôle en ligne des systèmes multiprocesseurs hétérogènes embarqués: elaboration et validation d'une architecture”, doctoral thesis defended on 19 Sep. 2006 at University of Rennes 1. These techniques may be implemented off-line as well as on-line. In an off-line approach, the (voltage, frequency) pairs for the various processors may be adjusted globally by calculating a global slowing-down factor (according to the global time constraint or deadline of the application) or locally by calculating a slowing-down factor local to each task (according to its contribution to the critical path of the application). In an on-line approach, these techniques aim to detect the time excesses due to the variations of the actual execution times of the tasks and exploit them so as to reduce the energy consumed while guaranteeing that the time constraints are observed.
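An off-line global slowing-down factor may be sketched as the ratio of the worst-case length of the critical path to the deadline of the application. This is only one plausible reading of the approach and is not the formula of documents D2 or D3. The sketch assumes that enough processors are available for the critical path alone to bound the completion time, and the task graph and function names are hypothetical.

```python
from functools import lru_cache


def critical_path_length(wcet, succs):
    """Longest WCET-weighted path through an acyclic task graph."""
    @lru_cache(maxsize=None)
    def longest_from(task):
        tail = max((longest_from(s) for s in succs.get(task, ())), default=0.0)
        return wcet[task] + tail

    return max(longest_from(t) for t in wcet)


def global_slowdown(wcet, succs, deadline):
    """Fraction of the maximum frequency that still meets the deadline."""
    cp = critical_path_length(wcet, succs)
    if cp > deadline:
        raise ValueError("deadline cannot be met even at full speed")
    return cp / deadline


# Hypothetical graph: A precedes B and C, which both precede D.
wcet = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 2.0}
succs = {"A": ("B", "C"), "B": ("D",), "C": ("D",)}
print(global_slowdown(wcet, succs, deadline=10.0))
```

Here the critical path A, B, D has a worst-case length of 7, so with a deadline of 10 every processor could run at about 70% of its maximum frequency, under the assumption that execution time scales inversely with frequency.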
The difficulty clearly lies in devising an optimal consumption management method which remains compatible with global scheduling. In practice, in a global scheduling context as presented in the abovementioned documents D2 and D3, the authors settle for sub-optimal excess distribution methods in order to observe the time constraints. In the document D2, a time excess obtained during the execution is shared between tasks assigned to different resources so as to enable the tasks to observe their time constraints. In this method, portions of the time excesses may be disregarded in order to observe a task execution order set a priori on the basis of the worst-case execution times WCET and a global scheduling policy which executes the longest task first, Largest Task First (LTF). In the document D3, the method assigns the time excess obtained during the execution to the next task in the precedence graph. This method for distributing excesses according to the data or control dependencies is quite compatible with a global scheduling policy, but it does not make it possible, for example, to exploit all the excesses produced by the various branches of a convergence. In practice, only the smallest excess, out of the excesses produced by the various branches, is exploited to reduce the energy consumed.
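The limitation at a convergence may be illustrated by a minimal sketch. It assumes that the task following the convergence can start only once every branch has finished, and that all branches share the same worst-case finish time. The function name and the figures are hypothetical and are not taken from document D3.

```python
def join_slack(wc_finish, actual_finish):
    """Time excess available to the task that follows a convergence.

    Both arguments map a branch name to the finish time of its last task.
    """
    if wc_finish.keys() != actual_finish.keys():
        raise ValueError("both mappings must describe the same branches")
    wc_start = max(wc_finish.values())          # worst-case release of the next task
    actual_start = max(actual_finish.values())  # actual release: last branch to finish
    return wc_start - actual_start


wc = {"left": 10.0, "right": 10.0}
actual = {"left": 6.0, "right": 9.0}
excess = {branch: wc[branch] - actual[branch] for branch in wc}
usable = join_slack(wc, actual)
print(excess, usable)
```

In this example the left branch produces an excess of 4 and the right branch an excess of 1, but the next task gains only 1, the smallest of the two; the remaining excess of the left branch is lost.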