The present invention relates generally to operating systems and architecture and more particularly to an operating system and run-time architecture for safety critical systems.
Aircraft systems that contain software are subject to functionality restrictions and the verification requirements specified in the RTCA/DO-178B (DO-178B) Standard, “Software Considerations in Airborne Systems and Equipment Certification.” The Federal Aviation Administration (FAA), in conjunction with its worldwide counterparts, recognizes and enforces adherence to this standard. The RTCA/DO-178B standard defines three concepts of interest: the first is “Levels of software criticality,” the second is protection, and the third, which is closely related to the second, is partitioning.
The DO-178B standard defines five software levels of criticality (Levels A, B, C, D, and E), where Level A represents software of the highest criticality and Level E the lowest in terms of the software's function in controlling safety critical functions on the aircraft. Thus the standard provides a method to distinguish high criticality functions and tasks from lower criticality functions and tasks. Safety critical standards from other industries may define this concept similarly.
The DO-178B standard defines partitioning as the separation, in both time and space, of software of differing levels of criticality running on a single CPU. Thus a partitioned design provides both Time Partitioning and Space Partitioning. Time Partitioning is the ability to separate the execution of one task from another task, such that a failure in one task will not impede the execution of the other. Space Partitioning is defined as the separation of space for two partitions, such that one partition cannot corrupt the other partition's memory (space) or access a critical resource. The DO-178B standard defines protection as the protection of one partition from another partition, such that a violation of either time or space in one partition has no effect on any other partition in the system.
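The Space Partitioning property described above can be illustrated with a minimal sketch. The names `Region`, `contains`, `overlaps`, and `check_access` are hypothetical and are not taken from DO-178B or from any operating system; in a real system this kind of check would typically be enforced by hardware memory protection rather than by application-level code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical memory region owned by one partition: [base, base + size)."""
    base: int
    size: int

    def contains(self, addr: int, length: int = 1) -> bool:
        # True only if the whole access [addr, addr + length) lies inside the region.
        return self.base <= addr and addr + length <= self.base + self.size

    def overlaps(self, other: "Region") -> bool:
        # Two half-open intervals overlap if each starts before the other ends.
        return (self.base < other.base + other.size
                and other.base < self.base + self.size)


def check_access(regions: dict, partition: str, addr: int, length: int = 1) -> bool:
    """Allow an access only when it falls entirely inside the caller's own region."""
    return regions[partition].contains(addr, length)


# Illustrative layout (assumed values): two adjacent, non-overlapping partitions.
regions = {"A": Region(0x1000, 0x100), "B": Region(0x1100, 0x100)}
print(regions["A"].overlaps(regions["B"]))   # False
print(check_access(regions, "A", 0x10FF))    # True
print(check_access(regions, "A", 0x1100))    # False
```

In this sketch, partition A may touch the last byte of its own region but is refused any access that reaches into partition B's memory, which is the essence of the separation the standard describes.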
Many task analysis and scheduling techniques exist in real-time preemptive operating systems today. One method of interest is Deadline Monotonic Analysis (DMA) and Scheduling (DMS) (see “Deadline Monotonic Analysis,” by Ken Tindell, Embedded Systems Programming, June 2000, pp. 20-38). Deadline Monotonic Analysis (DMA) is a method of predicting system schedule-ability where the system is a CPU with multiple tasks that are to be executed concurrently. DMA requires that the analyst have the following basic information for every task to be scheduled in the system: 1) Task period, the task cycle or rate of execution. 2) Task deadline, the time by which the task must complete execution, as measured from the start of a task period. 3) The task's worst case execution time (WCET), the worst-case execution path of the task in terms of instructions converted to time. Armed with this basic information, the analyst can use the DMA mathematics or formulas to predict whether the system can be scheduled, i.e., whether all tasks will be able to meet their deadlines in every period under worst case execution scenarios. If the system can be scheduled, then the system can be executed using a runtime compliant Deadline Monotonic Scheduler (DMS).
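The text above does not spell out the DMA formulas. As an illustration only, the following sketch uses the commonly published response-time recurrence for fixed-priority preemptive scheduling, R = C + sum over higher-priority tasks of ceil(R / T_j) * C_j, with priorities assigned by shortest deadline first. It assumes independent tasks, deadlines no greater than periods, integer time units, and no blocking or context-switch overhead. The names `Task`, `response_time`, and `analyze` are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Task(NamedTuple):
    name: str
    period: int    # task period (T)
    deadline: int  # deadline measured from the start of the period (D <= T)
    wcet: int      # worst case execution time (C)


def response_time(task: Task, higher: List[Task]) -> Optional[int]:
    """Iterate the recurrence to a fixed point; None if the deadline is exceeded."""
    r = task.wcet
    while r <= task.deadline:
        # Interference: each higher-priority task can release ceil(r / T) jobs in r.
        nxt = task.wcet + sum(-(-r // h.period) * h.wcet for h in higher)
        if nxt == r:
            return r
        r = nxt
    return None


def analyze(tasks: List[Task]) -> Dict[str, Optional[int]]:
    """Assign priorities by increasing deadline, then bound each response time."""
    ordered = sorted(tasks, key=lambda t: t.deadline)
    return {t.name: response_time(t, ordered[:i]) for i, t in enumerate(ordered)}


# Illustrative task set (assumed values, not from the source).
example = [Task("A", 10, 5, 2), Task("B", 20, 12, 4), Task("C", 40, 30, 10)]
print(analyze(example))  # {'A': 2, 'B': 6, 'C': 18}
```

A task whose entry is `None` cannot be shown to meet its deadline under this analysis, so the task set as a whole would not be considered schedulable.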
Existing Deadline Monotonic Schedulers use a dynamic method for determining individual task execution at runtime. At each timing interval, an evaluation is made at run-time to determine whether the currently executing task is to be preempted by a higher priority task, or whether a new task is due to be started on an idle system. This dynamic method achieves the goals of schedule-ability, but does introduce an element of variability, since the individual preemption instances and task initiation times may vary over successive passes through the schedule. For example, in an existing Deadline Monotonic Scheduler, individual task execution may be “slid” to an earlier execution time if the preceding task finishes early or aborts. Also, the number and placement of preemptions that take place are similarly affected, and so individual tasks may vary anywhere within the bounds defined by their DMS parameters.
Even though the amount of variability in existing Deadline Monotonic Schedulers is bounded by the schedule parameters, it is nevertheless undesirable for certain applications where a higher degree of predictability and repeatability is desired, for example, DO-178B (avionics) and other safety critical applications.
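The variability described above can be reproduced with a small tick-based simulation of a dynamic fixed-priority preemptive scheduler. This is a simplified sketch, not a model of any particular product: the function `simulate`, the task names, and the timing values are all hypothetical, and each job is assumed to finish before its next release.

```python
from typing import Dict, List, Tuple


def simulate(tasks: List[Tuple[str, int, int]],
             actual: Dict[str, int],
             horizon: int) -> Dict[str, List[int]]:
    """Run a dynamic preemptive scheduler and record when each job first runs.

    tasks:   (name, period, priority); a lower number means higher priority.
    actual:  execution time each job really takes on this pass, in ticks.
    """
    remaining: Dict[str, int] = {}
    started: Dict[str, bool] = {}
    starts: Dict[str, List[int]] = {name: [] for name, _, _ in tasks}
    for tick in range(horizon):
        # Release a new job of every task whose period begins at this tick.
        for name, period, _ in tasks:
            if tick % period == 0:
                remaining[name] = actual[name]
                started[name] = False
        # Decision made at run time: pick the highest-priority ready job.
        ready = [(prio, name) for name, _, prio in tasks if remaining.get(name, 0) > 0]
        if not ready:
            continue  # idle tick
        _, name = min(ready)
        if not started[name]:
            starts[name].append(tick)
            started[name] = True
        remaining[name] -= 1
    return starts


tasks = [("hi", 10, 0), ("lo", 20, 1)]
worst = simulate(tasks, {"hi": 4, "lo": 5}, 20)  # "hi" uses its full budget
early = simulate(tasks, {"hi": 1, "lo": 5}, 20)  # "hi" finishes early
print(worst["lo"], early["lo"])  # [4] [1]
```

When the higher-priority task finishes early, the lower-priority task "slides" from tick 4 to tick 1. Both passes meet the same schedule parameters, yet the start time of the lower-priority task is not repeatable from one pass to the next.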
In a partitioned design, tasks inside of one partition communicate data via Application Programming Interfaces (APIs), or APplication/EXecutive (APEX) interfaces as they are called in ARINC 653 compliant designs. The RTCA/DO-178B standard concept of protection requires that partitions be protected from each other such that a violation of either time or space in one partition has no effect on any other partition in the system. This concept of protection applies to the API or APEX interfaces as well.
In ARINC 653 compliant designs, partitions are given access to the APEX interface during the partition's window of execution. During this window, a partition can request or send data to any resource available in the system via calls to the appropriate APEX interface.
In the case of ARINC 653 compliant designs, all partitions have access to all of the APEX interfaces to request or send information. Thus, the standard has no concept of restricted use, protected services, or restricted interfaces.
Many safety critical industries like aviation provide regulatory guidelines for the development of embedded safety critical software. Adherence to safety critical software design standards involves creation of design and verification artifacts that must support and prove the pedigree of the software code and its particular application to the assessed software criticality level.
Adherence to these safety critical standards typically means that designers will spend less than 20% of their time producing the actual code, and greater than 80% producing the required supporting artifacts, and in some cases the time spent producing the code can enter the single digits.
While adherence to these standards is meant to produce error-free embedded software products, the cost associated with the production of these products is high. As a result the producers seek as much reuse as possible. Due to the critical nature of these products in the industries that they serve, the safety critical standards also provide guidance for reuse.
The reuse guides, like those provided by the FAA for avionics designs, typically state that a software configuration item can be reused without additional effort if it has not changed, implying that its artifacts have not changed in addition to the code.
Today, only one standard exists for a partitioned software design in the safety critical world of avionics: the ARINC 653 standard. The ARINC 653 standard supports application partitions that could be reused across multiple applications, since the standard provides a common APEX or user interface to the Operating System functions. Using the APEX interface as specified in the standard, it is possible to write an application that does not change across multiple applications. Such an application would be a candidate for reuse and reduced work scope in its successive applications as defined by safety critical guidelines like those provided by the FAA.
One of the flaws with specifying the user interface, APEX, or APIs as a part of the executable operating system code is that the underlying system hardware, such as an aircraft avionics communications device or protocol and/or other system hardware devices, tends to change from program to program (or aircraft to aircraft).
In addition, most aircraft OEMs change aircraft specifications from aircraft to aircraft. Thus any changes in the user interface, APEX, or APIs will cause changes in the application software or application partitions. Once the software or its artifacts have changed, its chances for reuse via a reduced work scope as provided by industry guidance, like that of the FAA, have evaporated. Architectures which separate the operating system's user interfaces from the hardware device or services interfaces better serve reuse claims.
In summary, existing safety critical operating systems contain many noticeable drawbacks, among them the following:
1) They do not ensure that the individual tasks grouped within a partition will be individually time partitioned.
2) They do not provide the flexibility to space partition multiple tasks of the same criticality either individually or in subgroups.
3) The architecture requires the operating system to provide all Application Programming Interfaces (APIs), or APEX interfaces in the case of ARINC 653, to all partitions.
4) Access to system hardware or CPU resources is provided by the operating system via the API (or APEX in the case of ARINC 653); thus the interface for these resources is controlled by the operating system and could change from platform to platform, limiting the ability to reuse software without change.
5) The architecture and API or APEX interfaces provide no mechanism for exclusive use of critical resources by a partition, the concept of protected resources.
6) The architecture and API or APEX interfaces are open to use by any caller and as such do not provide protection for each partition.
7) Runtime dynamic compliant Deadline Monotonic Schedulers do not limit task execution variability.