Historically, processors for embedded hard real-time markets have typically been characterized by a simple architecture, with short pipelines and in-order execution, in order to ease the computation of the worst-case execution time (WCET). However, a significant percentage of real-time application systems require, or could improve their functionality with, higher performance than these simple architectures can provide. These applications range from soft real-time applications, such as video coding and decoding, to hard real-time applications, such as automotive, avionic and industrial control systems.
Multithreaded (MT) processors can provide the additional performance required by real-time systems. MT processors allow the execution of several tasks at the same time, which leads both to higher performance and to a reduced number of processors required in a given system. Moreover, resource sharing in these processors allows them to have good “performance/cost” and “performance/power consumption” ratios. These characteristics make MT processors a good option for embedded real-time systems.
However, resource sharing also causes the execution time of tasks to become highly variable, as tasks interact in an unknown way with the other hard real-time (HRT) and non-hard real-time (NHRT) tasks. This is the major drawback that impedes the use of MT processors in embedded real-time systems: the execution time of a task on an MT processor becomes highly unpredictable because tasks share resources dynamically at execution time. Under these circumstances, the real-time constraints of embedded systems cannot be ensured.
At the hardware level, the unpredictability introduced by sharing resources has been avoided by providing full isolation in the execution of hard real-time tasks. That is, the hardware allows the execution of an HRT task together with NHRT tasks by providing an interaction-free execution of the HRT task. In this category we find the documents [A. El-Haj-Mahmoud, A. S. AL-Zawawi, A. Anantaraman, and E. Rotenberg. Virtual Multiprocessor: An Analyzable, High-Performance Microarchitecture for Real-Time Computing. Proceedings of the 2005 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES '05), pp. 213-224, September 2005] and [Infineon Technologies AG. Tricore™ 2 Architecture Manual]. In the former document, the authors propose an MT architecture that provides interference-free execution between threads and thereby allows the execution of HRT threads. This proposal requires the architecture of the processor to be changed so that no interaction between threads is allowed. In the latter, the execution of a background NHRT task is allowed only when the foreground HRT task experiences an instruction cache miss. The NHRT task can never delay the HRT task.
In order to achieve full isolation, the architecture proposed in the former document prevents any interference between threads. This requires significant changes at the architecture level: all buffers are duplicated for each thread, functional units are assumed to be fully pipelined so that each one can accept one new operation per cycle, normal caches are not allowed, etc.
In a European patent application (by the same applicant as the present application) entitled “A multithreaded processor and a mechanism and a method for executing one hard real-time task in a multithreaded processor”, a method and a mechanism are proposed to execute a single HRT task together with several non-real-time (NRT) tasks and soft real-time (SRT) tasks. In the proposed solution, the software provides the hardware with the slack time of the HRT task. Every time the HRT task may be delayed by an NHRT task, the hardware subtracts this delay from the remaining slack time. If the remaining slack time reaches a minimum value, the NHRT tasks are stopped, preventing the HRT task from missing its deadline.
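The slack-time accounting described above can be illustrated with a minimal software model. This is only a sketch under stated assumptions, not the mechanism of the cited application: the class name `SlackMonitor`, its fields, and the use of cycles as the unit are hypothetical, and "reaches a minimum value" is modelled here as "less than or equal to a threshold".

```python
from dataclasses import dataclass


@dataclass
class SlackMonitor:
    """Hypothetical model of slack-time accounting for one HRT task."""

    remaining_slack: int       # slack supplied by software (assumed unit: cycles)
    min_slack: int             # assumed threshold at which NHRT tasks are stopped
    nhrt_enabled: bool = True  # whether NHRT tasks may still execute

    def record_delay(self, delay_cycles: int) -> None:
        """Subtract a possible NHRT-induced delay from the remaining slack."""
        if delay_cycles < 0:
            raise ValueError("delay must be non-negative")
        self.remaining_slack -= delay_cycles
        # Once the slack reaches the minimum value, stop the NHRT tasks so
        # that the HRT task cannot be delayed any further.
        if self.remaining_slack <= self.min_slack:
            self.nhrt_enabled = False


monitor = SlackMonitor(remaining_slack=100, min_slack=20)
monitor.record_delay(30)   # slack is now 70, NHRT tasks keep running
monitor.record_delay(50)   # slack is now 20, NHRT tasks are stopped
```

In this model the NHRT tasks stay disabled once stopped; whether and when they could be re-enabled is not stated in the text above and is left out of the sketch.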
As described above, the main source of unpredictability in an MT processor is inter-task conflicts. A task is said to suffer an inter-task conflict in a shared resource when it attempts to use a resource that is being used by another task. In single-thread architectures, in which only one thread runs at a time, threads suffer almost no inter-task interference. A task can suffer inter-task conflicts only when it is scheduled out of the processor and later scheduled back in, for example in the cache, because the task or tasks that ran while it was scheduled out may have evicted part of its data. However, given that tasks do not run at the same time, the number of inter-task conflicts is low.
In a multithreaded processor, however, threads suffer many more inter-task conflicts, as they share processor resources at the same time. The effect that inter-task conflicts have on a given task depends on several factors: the number of shared resources in the architecture, how many threads share said resources at the same time, and the type of resource.
Once a given multithreaded architecture is fixed, the effect of the inter-task conflicts depends mainly on the other tasks with which a given task is co-scheduled. A common methodology to analyze the sensitivity of a task to inter-task conflicts consists of running that task within a set of different ‘stressing’ workloads. These workloads are composed of tasks that stress the different shared resources, so that the task under study is affected.
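As an illustration of the stressing-workload methodology, the following sketch compares the execution time of a task run in isolation with its execution time under each stressing workload. The slowdown ratio used as the sensitivity metric, the function name `slowdown_per_workload`, the workload names, and all cycle counts are assumptions made for this example; the text above does not prescribe a particular metric.

```python
def slowdown_per_workload(isolated_time, co_scheduled_times):
    """Return the slowdown of the task under study for each stressing workload.

    isolated_time: execution time of the task when run alone (assumed: cycles).
    co_scheduled_times: mapping from a hypothetical workload name to the
        execution time of the same task when co-scheduled with that workload.
    """
    if isolated_time <= 0:
        raise ValueError("isolated_time must be positive")
    return {name: t / isolated_time for name, t in co_scheduled_times.items()}


# Hypothetical measurements, for illustration only.
measured = {"cache_stress": 1500, "fetch_stress": 1200, "fpu_stress": 1000}
slowdowns = slowdown_per_workload(1000, measured)

# The workload with the largest slowdown points to the shared resource to
# which the task under study is most sensitive.
most_sensitive = max(slowdowns, key=slowdowns.get)
```

With these made-up numbers, the task would be most sensitive to the workload labelled `cache_stress`.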
This method presents several disadvantages. Firstly, the threads in the stressing workloads have to be carefully designed so that they properly stress a given resource. Secondly, it is hard to obtain a minimum level of stress on all shared resources at the same time. Finally, for every new generation of multithreaded processor in which the shared resources change, a redesign of the stressing workloads is required.