A computer system usually comprises one or more “Commercial Off The Shelf” (COTS) processors (for example, microcontrollers or microprocessors) and software that executes on the processor(s). This software may be created, for example, using a programming language such as ‘C’ or Ada.
In many cases, processors are “embedded” inside larger systems, including cars, aircraft, industrial and agricultural machinery, medical equipment, white and brown goods, and even toys. It is estimated that people in the developed world encounter around 300 such “embedded systems” every day while going about their normal activities.
Other related uses of computer systems include real-time “desktop” applications, such as air-traffic control and traffic management.
When creating such computer systems, developers must choose an appropriate system architecture. One such architecture is a “time-triggered” (TT) architecture. Implementation of a TT architecture will typically involve use of a single interrupt that is linked to the periodic overflow of a timer. This interrupt may drive a task scheduler (a simple form of “operating system”). The scheduler will—in turn—begin the execution of the system tasks (a process sometimes called “releasing” the tasks, “triggering” the tasks or “running” the tasks) at predetermined points in time. The tasks themselves are typically named blocks of program code that perform a particular activity (for example, a task may check to see if a switch has been pressed): tasks are often implemented as functions in programming languages such as ‘C’.
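By way of illustration only, the arrangement described above might be sketched in ‘C’ as follows. This is a minimal sketch under stated assumptions, not the scheduler of any cited reference: all names (tt_task, timer_tick_isr, dispatch_tasks, simulate_ticks and the two example tasks) are hypothetical, and the timer interrupt is simulated by an ordinary function call so that the example can run on a desktop machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical and chosen for illustration only. */

typedef void (*task_fn)(void);

typedef struct {
    task_fn  run;     /* function that implements the task            */
    uint32_t offset;  /* tick at which the task is first released     */
    uint32_t period;  /* ticks between releases (must be non-zero)    */
} tt_task;

static unsigned switch_checks   = 0;
static unsigned display_updates = 0;

/* Example tasks: each is simply a named function. */
static void check_switch(void)   { switch_checks++; }
static void update_display(void) { display_updates++; }

/* The complete schedule is fixed at design time and never altered. */
static const tt_task schedule[] = {
    { check_switch,   0u, 1u },  /* released on every tick               */
    { update_display, 1u, 5u },  /* released on ticks 1, 6, 11, ...      */
};
#define NUM_TASKS (sizeof schedule / sizeof schedule[0])

static volatile uint32_t tick_count = 0;

/* On real hardware this would be the single timer-overflow interrupt. */
void timer_tick_isr(void)
{
    tick_count++;
}

/* Release every task that is due at the given tick. */
void dispatch_tasks(uint32_t tick)
{
    for (size_t i = 0; i < NUM_TASKS; i++) {
        const tt_task *t = &schedule[i];
        assert(t->period > 0u);
        if (tick >= t->offset && (tick - t->offset) % t->period == 0u) {
            t->run();
        }
    }
}

/* Desktop stand-in for the hardware timer: run n consecutive ticks. */
void simulate_ticks(uint32_t n)
{
    for (uint32_t i = 0; i < n; i++) {
        dispatch_tasks(tick_count);
        timer_tick_isr();
    }
}
```

In this sketch the release times of both tasks can be read directly from the schedule table, which is the property that makes such designs comparatively easy to review. On a real processor, the dispatcher would typically be called from the main loop after each timer interrupt, with the processor idling between ticks.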
This type of TT design can offer very predictable system behaviour, making it comparatively easy to test and verify the correct operation of real-time computer systems that are based on such an architecture: this is one reason why TT designs are often used in safety-critical systems, high-integrity systems and in other products where system reliability and/or security are important design considerations.
While it is possible to create a working system with a single task, most TT designs involve multiple tasks. More generally, there may be several different sets of tasks, with each set of tasks matched to a particular system mode. For example, most systems will have at least a “Normal” task schedule and a different (sometimes very simple) “Error” or “Fail Silent” schedule for use when unrecoverable errors have been detected. More commonly, there may be several different “normal” system modes, each with different task schedules: for example, a passenger car may have different task schedules for use when manoeuvring at low speeds and for motorway driving.
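The notion of one task schedule per system mode might be represented as a set of constant tables, one for each mode. The following is a sketch only: the mode names, the task names and the periods are hypothetical values invented for illustration, loosely following the passenger-car example above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical system modes, following the passenger-car example. */
typedef enum {
    MODE_LOW_SPEED = 0,
    MODE_MOTORWAY,
    MODE_FAIL_SILENT,
    MODE_COUNT
} system_mode;

typedef struct {
    const char *name;          /* task name (illustrative only) */
    unsigned    period_ticks;  /* release period in timer ticks */
} task_entry;

typedef struct {
    const task_entry *tasks;
    size_t            count;
} task_schedule;

#define COUNT_OF(a) (sizeof (a) / sizeof (a)[0])

/* Each mode has its own complete schedule, fixed at design time. */
static const task_entry low_speed_tasks[] = {
    { "read_parking_sensors", 10u },
    { "read_steering_angle",   5u },
    { "update_display",       50u },
};

static const task_entry motorway_tasks[] = {
    { "read_radar",      5u },
    { "cruise_control", 10u },
    { "update_display", 50u },
};

/* A very simple schedule for use after an unrecoverable error. */
static const task_entry fail_silent_tasks[] = {
    { "hold_outputs_safe", 1u },
};

/* Indexed by system_mode, in the order the enumerators are declared. */
static const task_schedule schedules[MODE_COUNT] = {
    { low_speed_tasks,   COUNT_OF(low_speed_tasks)   },
    { motorway_tasks,    COUNT_OF(motorway_tasks)    },
    { fail_silent_tasks, COUNT_OF(fail_silent_tasks) },
};

/* Look up the complete, pre-reviewed schedule for a given mode. */
const task_schedule *schedule_for_mode(system_mode mode)
{
    assert((int)mode >= 0 && (int)mode < (int)MODE_COUNT);
    return &schedules[mode];
}
```

Because every table is constant, each mode's schedule can be reviewed in full at design time. The sketch deliberately says nothing about how or when the system moves from one table to another.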
Traditional approaches to changing system modes in TT designs involve mechanisms for adding/removing tasks from the schedule. For example, the TT task scheduler described in Reference 1 is widely used: it provides SCH_Add_Task( ) and SCH_Delete_Task( ) functions that can be called at any time while the scheduler is running. Such mechanisms for changing system modes have the benefit of simplicity, and can work effectively: however, they also open up opportunities for introducing a number of very significant reliability and security problems.
It will be appreciated that TT schedules are static by nature, and a key strength of this development approach is that a complete task schedule can be carefully reviewed at design time in order to confirm that all system requirements have been met. In general, it is extremely difficult to change the system mode in such designs using conventional methods without undermining this static design process. When tasks can be added to or removed from the schedule at “random” times (for example, in response to external system events), the system design becomes dynamic (it is no longer “time triggered”), and it is not generally possible to predict the precise impact that the mode change will have on all tasks in the schedule.
It will be appreciated that, even where the perturbations to the behaviour of a TT system during traditional mode changes are short lived, they may still have significant consequences. TT designs are often chosen for use in systems where security is an important consideration. In such designs, because the task schedule is known explicitly in advance, it is possible to detect even very small changes in behaviour that may result from security breaches (for example, if the system code has been changed by a virus). In circumstances where dynamic changes to a task set are permitted (as in traditional mode changes), the resulting variations in behaviour may mask security-related issues.
Traditional approaches to mode changing in TT systems may also have a negative impact on the processes of error detection and error handling in such designs. One important class of errors relates to execution time. In particular, it is important that each task completes its activities within a pre-determined “worst case execution time” (WCET). When a task exceeds its predicted WCET (a situation that is sometimes referred to as a “task overrun”), this can have a highly detrimental impact on the behaviour and reliability of the whole system, because it breaches assumptions that were made when the system was developed. In some systems, a task overrun may result in fatal consequences for the system users or those in the vicinity of the system. Thus, it is important that task overruns are detected quickly and handled effectively.
It will also be appreciated that where a task completes more quickly than its predicted “Best Case Execution Time” (BCET)—a situation sometimes referred to as a task underrun—this may also be symptomatic of problems with the system.
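The relationship between a measured execution time and the predicted BCET and WCET can be expressed as a simple range check. The sketch below is illustrative only: the type and function names are hypothetical, and times are assumed to be measured in microseconds. It classifies a task after it has finished, so it shows the definitions of overrun and underrun rather than a mechanism for interrupting a task that is still running.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; times are assumed to be in microseconds. */

typedef enum {
    TIMING_OK,
    TIMING_UNDERRUN,  /* finished faster than the predicted BCET */
    TIMING_OVERRUN    /* took longer than the predicted WCET     */
} timing_status;

typedef struct {
    uint32_t bcet_us;  /* predicted best-case execution time  */
    uint32_t wcet_us;  /* predicted worst-case execution time */
} timing_limits;

/* Compare one measured execution time against the task's limits. */
timing_status classify_execution_time(timing_limits limits,
                                      uint32_t measured_us)
{
    assert(limits.bcet_us <= limits.wcet_us);

    if (measured_us > limits.wcet_us) {
        return TIMING_OVERRUN;
    }
    if (measured_us < limits.bcet_us) {
        return TIMING_UNDERRUN;
    }
    return TIMING_OK;
}
```

For a task with an assumed BCET of 100 µs and WCET of 250 µs, a measurement of 300 µs would be classified as an overrun and a measurement of 50 µs as an underrun, while any value from 100 µs to 250 µs inclusive would be accepted.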
One common way of detecting and handling task overruns (and, sometimes, task underruns) is through the use of watchdog timers (WDTs): see, for example, Reference 2. It will be appreciated that, when used in this way, the appropriate WDT settings will vary with the task set. It follows that, if tasks are added or removed dynamically from the task set as part of a mode-change operation, the WDT settings may also need to change. Unfortunately, it may not be possible to make such changes, since most modern microcontrollers are deliberately designed in such a way that alterations to the WDT settings can only be made immediately after a system reset. The intention with such a design feature is to reduce the likelihood that the WDT settings will be corrupted during the system operation: were such corruption to take place, this might mean that the WDT was disabled altogether, or triggered to run at the wrong time. In either event, the consequences could be severe, and preventing inadvertent or malicious changes to the WDT settings during normal operation makes very good sense in many system designs.
Where it is not possible to change the WDT settings during the program run, settings must be chosen that will work with all task combinations, both in all system modes and during the transition between system modes. The inevitable consequence is that the resulting settings will be sub-optimal for some or all system modes. The end result is likely to be that a long timeout period must be employed: this will mean that the WDT mechanism can respond only comparatively slowly when an error occurs. This slow response may have very serious consequences in many systems, and it further undermines one of the key advantages that can be obtained from a TT design (that is, very predictable timing behaviour and, consequently, the potential for a very rapid response if the behaviour is not as expected).
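The cost of a single, fixed WDT setting can be illustrated with a short calculation. The sketch below assumes, purely for illustration, that each mode is characterised by the longest expected interval between WDT refreshes in that mode, and that the shared timeout is the largest such interval plus a safety margin. The function names and all numerical values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: each array entry is the longest expected interval
   (in milliseconds) between WDT refreshes in one system mode. */

/* A single fixed timeout must cover the slowest mode, plus a margin. */
uint32_t shared_wdt_timeout_ms(const uint32_t *mode_intervals_ms,
                               size_t num_modes,
                               uint32_t margin_ms)
{
    assert(mode_intervals_ms != NULL && num_modes > 0u);

    uint32_t worst = 0u;
    for (size_t i = 0; i < num_modes; i++) {
        if (mode_intervals_ms[i] > worst) {
            worst = mode_intervals_ms[i];
        }
    }
    return worst + margin_ms;
}

/* Extra time, beyond a mode's own refresh interval, that can elapse
   before the shared WDT responds to a fault in that mode. */
uint32_t detection_slack_ms(uint32_t shared_timeout_ms,
                            uint32_t mode_interval_ms)
{
    assert(shared_timeout_ms >= mode_interval_ms);
    return shared_timeout_ms - mode_interval_ms;
}
```

With assumed refresh intervals of 2 ms, 10 ms and 50 ms for three modes and a 5 ms margin, the shared timeout would be 55 ms. In the fastest mode, a fault could then go undetected for 53 ms longer than that mode's own 2 ms refresh interval.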
A number of techniques have been proposed previously as a means of addressing problems related to those presented here.
For example, a form of WDT has been proposed that allows changes to the configuration during the program run (see Reference 3). This solution requires specialised hardware (that is, it cannot be applied with most COTS processor platforms). Even where such specialised hardware is available, there remains the possibility that the WDT settings will be corrupted during the system operation (which is why such changes are prohibited in most WDT implementations, as discussed earlier in this document).
As another potential solution to some of the problems raised earlier in this document, software-based “task guardian” mechanisms have been proposed (see Reference 4). Such task guardian mechanisms may provide a way of dealing with task overruns very quickly. However, such software mechanisms can substantially increase the amount of code required to implement the system. This in turn can increase system costs and complicate both the development and maintenance process for the system. In addition, the use of software task guardians still involves the replacement of one task with another while the scheduler is running: this type of dynamic solution may—for the reasons outlined above—cause a number of problems in TT system designs.
Other task-guardian mechanisms, typically involving modified computer hardware plus additional software, have also been proposed as ways of detecting and handling task overruns in embedded computer systems (see, for example, Reference 5 and Reference 6). Such approaches can be effective, but require specialised hardware that will not be available in the majority of systems. In addition, such approaches may, again, involve replacement of one task with another while the scheduler is running, with the consequences discussed previously.
Overall, there is a widespread need to be able to support controlled and predictable mode changes—both under normal conditions and in the event of errors—when developing embedded systems with a TT architecture. The traditional process of changing modes in such designs opens up a number of potential reliability and security loopholes.
It is therefore an objective of embodiments of the present invention to improve the reliability and security of time-triggered computer systems by providing a framework that offers a comprehensive solution to the problem of mode changing, both under normal circumstances and when errors (including task timing errors) are detected.