The invention relates to global voltage/frequency switches in a multiprocessor environment, and more particularly to global voltage/frequency switches in an environment having asynchronously operating processors, one or more of which have a task scheduling mechanism that is dataflow dependent on outputs from one or more of the other processors.
Radio transceivers have stringent time requirements, and are commonly treated as hard-real-time applications whose deadlines cannot be missed, and for which there is no graceful degradation of performance.
In recent years, designers have implemented the physical layer (PHY) of modems on heterogeneous multiprocessors. Due to difficulties in synchronization, tasks between processors are neither statically scheduled, nor are they time-triggered. Instead, they synchronize amongst themselves in a distributed, self-timed fashion: when a task finishes execution, it sends a trigger to another task (potentially running on another processor/accelerator) and that task starts execution.
At the same time, so-called Voltage-Frequency Scaling (VFS) technology is becoming more and more popular. VFS relies on one or more switches that allow one or more processor cores to switch between different voltage/frequency levels. Each of these voltage/frequency levels offers a different trade-off between power consumption and performance. Thus, one can run faster and consume more power, or run slower and consume less power. The switching is not limited to two levels; typically, a number of discrete levels is available.
U.S. Patent Publication 2013/0086402, published on Apr. 4, 2013 by Orlando Pires Dos Reis Moreira and entitled “DFVS-Enabled Multiprocessor” (hereinafter, “Moreira”), which is hereby incorporated herein by reference in its entirety, describes how a computer program designed in accordance with linear programming (LP) technology can be used to generate a static periodic schedule to assign, for each time interval, a different Voltage-Frequency level, in such a way that the temporal requirements of a streaming application described as a dataflow graph annotated with timing are met, while providing optimal (minimal) energy consumption. However, the inventors have recognized that this methodology has two important limitations.
The first is that it relies on the assumption that each processor has an independent VFS switch. In many hardware platforms, the hardware provides only a single, global VFS switch, mainly for cost reasons. To account for this, a heuristic based on (integer) linear programming has been described that provides a static periodic schedule for a global VFS switch that minimizes energy consumption. See Moreira et al., “Throughput-Constrained Voltage and Frequency Scaling for Real-time Heterogeneous Multiprocessors”, SAC '13, Mar. 18-22, 2013, Coimbra, Portugal (ACM 978-1-4503-1656-9/13/03) (hereinafter referred to as “Moreira et al.”), which is hereby incorporated herein by reference in its entirety.
The second limitation is that a static periodic schedule is very difficult to implement at a system level, because it requires very exact global synchronization for the starting and stopping of each task. As mentioned before, most multiprocessor systems have distributed, self-timed synchronization. For the case assumed in the Moreira publication, this was not really a problem. Since each processor had its own VFS switch, a self-timed implementation of the static schedule (i.e., preserving the order, but scrapping the exact activation times) would always work, as each processor could set its own pace of execution and its own power consumption level. The cited Moreira et al. publication, for its part, assumes a shared VFS switch in the global case. But that publication also assumes that it is possible to easily schedule statically across all processors, so no care is given to minimizing power consumption for self-timed schedules.
A problem arises, however, in a processing platform with a global switch. Consider the task graph depicted in FIG. 1. A number of recurring tasks, depicted as nodes in the graph, with activation dependencies per iteration as expressed by the directed edges between them (meaning that the source task must terminate its current iteration before the sink task can execute), are to execute on a multiprocessor, in a self-timed manner (i.e., a task fires immediately when its dependencies are met—this is signaled by inter-task, inter-processor synchronization mechanisms, which can differ from one embodiment to the next). Each task can run only on a specific processor (this may be due to a mapping decision or to the existence of a dedicated/specialized processor for a specific task). In this example, the system includes three processors, denoted A, B, and C, respectively. The name of each task is prefixed by an indication of the processor onto which it is mapped. Thus, task A1 maps to processor A, task B2 maps to processor B and so on. In some instances, it may be advantageous to split up a task into two sub-tasks, with each running at a different VFS level. This is denoted by applying the above-described naming convention to the first-run task, and then appending an apostrophe (') to the same task name for the second-run task. For example, in FIG. 1, Tasks A1 and A1′ represent the same overall task, with a first part slated to be run at a first VFS level, and the second part slated to be run at a second VFS level.
After applying the algorithm described in the above-cited Moreira et al. publication or another algorithm to obtain a similar VFS schedule, a static periodic schedule for all tasks is obtained, along with a static periodic schedule for the switching between different frequency levels for the global VFS switch. The schedule consists of “segments” of time.
FIG. 2 depicts an ideal static schedule per processor for the exemplary graph of FIG. 1, assuming two Voltage-Frequency levels in the switch. Notice that the schedule works under the assumption of worst-case execution times for all tasks, and respects all data dependencies. As shown in the figure, tasks A1 and B1 are run with a low V/F setting while processor C remains idle (due to task C1's data dependency on the output of task B1). At a timed switch to a high V/F setting (transition from Segment 1 to Segment 2), it is assumed that tasks A1 and B1 have completed. Accordingly, processors A and C run their respective tasks A1′ and C1. Processor B is now idle, because its next task, B2, is dependent on completion of both tasks A1′ and B1.
In a third switching period (Segment 3), the circuitry switches to a low V/F setting, and tasks A2, B2, and C1′ are run. At the end of Segment 3, the circuitry switches back to a high V/F setting for Segment 4, and task A3 is run on processor A, while processors B and C, which according to the graph of FIG. 1 have no further tasks scheduled, remain idle.
The timing depicted in FIG. 2 is static and based on a global timer with timeout periods set to values that guarantee that the slowest task can run to completion before expiration of the timer. This means that the timing of each task initiation is dictated by the VFS level switching schedule. Now consider what can happen to the same program depicted in FIG. 1 when the execution of tasks is self-timed, but the switching of VFS levels is kept static periodic. In this example, assume that on one iteration, the execution times of tasks A1′ and C1 are half of their worst-case execution times. The resulting schedule is depicted in the scheduling graph of FIG. 3. Each task is permitted to run as soon as its data dependencies are satisfied, but the scheduling of the VFS switching is static periodic.
Since the execution is self-timed (i.e., as soon as possible with all data dependencies preserved) and the dataflow model of execution is time-monotonic (i.e., faster-than-worst-case finishing of a task activation or firing can only result in faster-than-worst-case starting of any dependent task activations or firings), all dependencies are preserved. Furthermore, since a faster-than-worst-case finishing of a task can only result in a faster-than-worst-case start of a dependent task (assuming that the dependent task's other data dependencies, if any, have also been satisfied), and taking into account that the schedule of the VFS switch can only be slower, the only possibility is that tasks will complete earlier than required, and never later. Thus, all worst-case temporal requirements are honored.
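The time-monotonic property can be illustrated with a minimal Python sketch. It is not taken from the cited publications: the precedence edges for A2, A3, A1′ and C1′ are assumed from the same-processor task order suggested by the task names, all execution times are hypothetical, and a single fixed clock speed is assumed for simplicity. The sketch computes self-timed finish times as the longest path through the task graph and checks that shortening two execution times never delays any task.

```python
# Precedence edges: data dependencies from the text plus ASSUMED
# same-processor ordering (A1 -> A1' -> A2 -> A3, C1 -> C1').
PREDECESSORS = {
    "A1": [], "B1": [],
    "A1'": ["A1"], "C1": ["B1"],
    "A2": ["A1'"], "B2": ["A1'", "B1"], "C1'": ["C1"],
    "A3": ["A2"],
}

# Hypothetical worst-case execution times in arbitrary time units.
WORST_CASE = {"A1": 4, "B1": 4, "A1'": 4, "C1": 4,
              "A2": 6, "B2": 6, "C1'": 6, "A3": 2}


def finish_times(exec_time):
    """Self-timed finish time of each task: start as soon as all
    predecessors have finished, then run for exec_time[task]."""
    finish = {}

    def visit(task):
        if task not in finish:
            start = max((visit(p) for p in PREDECESSORS[task]), default=0)
            finish[task] = start + exec_time[task]
        return finish[task]

    for task in PREDECESSORS:
        visit(task)
    return finish


# One iteration in which A1' and C1 need only half their worst-case time.
actual_times = dict(WORST_CASE, **{"A1'": 2, "C1": 2})

worst_finish = finish_times(WORST_CASE)
actual_finish = finish_times(actual_times)

# Monotonicity: no task finishes later than in the worst-case schedule.
never_later = all(actual_finish[t] <= worst_finish[t] for t in PREDECESSORS)
```

With these assumed numbers every task finishes no later than in the worst-case schedule, which is the behavior the paragraph above relies on.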
The problem is that the circuitry is potentially expending much more power than under the static periodic schedule. This is evident from an inspection of FIG. 3. It had been planned to execute tasks A2, B2 and C1′ in low power mode (see FIG. 2, Segment 3), but due to the self-timed, as-soon-as-possible implementation of the schedule, most of their cycles are executed at the high frequency/voltage of Segment 2 (needlessly, since this happens even before these tasks were required to start), thus spending more power than required for real-time behavior.
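The effect can be reproduced with a small tick-based simulation. The following Python sketch is illustrative only: the cycle counts, segment lengths, frequency ratio and energy-per-cycle values are hypothetical and are not taken from FIGS. 1-3, and the task order on each processor is assumed from the task names. It runs the same iteration (tasks A1′ and C1 at half their worst-case cycle counts) twice under the same static periodic switching schedule: once with each task held back until its planned segment, and once self-timed.

```python
LOW, HIGH = "low", "high"
FREQ = {LOW: 1, HIGH: 2}              # cycles per tick (hypothetical)
ENERGY_PER_CYCLE = {LOW: 1, HIGH: 4}  # arbitrary units (hypothetical)

# Static periodic schedule of the global switch: (duration in ticks, level).
SEGMENTS = [(4, LOW), (4, HIGH), (6, LOW), (2, HIGH)]
SEGMENT_START = [0, 4, 8, 14]
PERIOD = sum(duration for duration, _ in SEGMENTS)

ORDER = {"A": ["A1", "A1'", "A2", "A3"],  # assumed per-processor order
         "B": ["B1", "B2"],
         "C": ["C1", "C1'"]}
DEPS = {"C1": ["B1"], "B2": ["A1'", "B1"]}  # cross-processor dependencies
PLANNED_SEGMENT = {"A1": 0, "B1": 0, "A1'": 1, "C1": 1,
                   "A2": 2, "B2": 2, "C1'": 2, "A3": 3}
WORST_CASE_CYCLES = {"A1": 4, "B1": 4, "A1'": 8, "C1": 8,
                     "A2": 6, "B2": 6, "C1'": 6, "A3": 4}


def level_at(tick):
    """Voltage/frequency level of the global switch at a given tick."""
    tick %= PERIOD
    for duration, level in SEGMENTS:
        if tick < duration:
            return level
        tick -= duration


def simulate(cycles, release):
    """Return the cycles each task executes at each level. A task starts
    once its processor is free, its dependencies are done, and its
    release tick has passed."""
    remaining = dict(cycles)
    done = set()
    running = {proc: None for proc in ORDER}
    next_index = {proc: 0 for proc in ORDER}
    usage = {task: {LOW: 0, HIGH: 0} for task in cycles}
    tick = 0
    while len(done) < len(cycles):
        level = level_at(tick)
        for proc, order in ORDER.items():  # start ready tasks
            if running[proc] is None and next_index[proc] < len(order):
                task = order[next_index[proc]]
                deps_done = all(d in done for d in DEPS.get(task, []))
                if tick >= release[task] and deps_done:
                    running[proc] = task
                    next_index[proc] += 1
        for proc in ORDER:  # advance running tasks by one tick
            task = running[proc]
            if task is None:
                continue
            work = min(remaining[task], FREQ[level])
            remaining[task] -= work
            usage[task][level] += work
            if remaining[task] == 0:
                done.add(task)
                running[proc] = None
        tick += 1
    return usage


def energy(usage):
    return sum(n * ENERGY_PER_CYCLE[lvl]
               for per_task in usage.values() for lvl, n in per_task.items())


# One iteration in which A1' and C1 need half their worst-case cycles.
actual = dict(WORST_CASE_CYCLES, **{"A1'": 4, "C1": 4})

static_release = {t: SEGMENT_START[s] for t, s in PLANNED_SEGMENT.items()}
self_timed_release = {t: 0 for t in actual}

static_usage = simulate(actual, static_release)
self_timed_usage = simulate(actual, self_timed_release)
static_energy = energy(static_usage)
self_timed_energy = energy(self_timed_usage)
```

Under these assumed numbers, the self-timed run executes four of the six cycles of each of A2, B2 and C1′ at the high level, and its total energy is higher than that of the run in which tasks wait for their planned segments, even though task A3 happens to run at the low level in the self-timed case.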
It is therefore desired to address the problem of how to schedule a global V/F switch in a way that reduces energy consumption on a multiprocessor chip, while also guaranteeing satisfaction of all real-time application timing requirements.