The present invention relates to systems and methods for managing computing resources, and more particularly to systems and methods for scheduling and controlling asynchronous tasks to provide deterministic behavior in time-partitioned operating systems.
An Operating System (“OS”) is software that manages computing hardware on behalf of user applications. The OS manages hardware resources such as the computer memory, the Input/Output (I/O) devices including the hard drive and the network interface, and so forth. One of the most important hardware resources managed by the OS is the central processing unit (“CPU”). The OS allocates time on the CPU to each application, one at a time, by means of a scheduling algorithm that selects which application (called a process when executing) will be run on the CPU next. The OS itself must run on the CPU in order to execute the scheduling algorithm, so whenever a process calls the operating system (e.g., through an I/O system call), the OS is invoked and runs on the CPU. It then selects the next process to run. The OS also sets a hardware timer to expire on a periodic basis. When the timer expires, the hardware invokes the OS, interrupting the running process so that the OS can select a new process to run, based on the scheduling algorithm. A computer chip that contains more than one CPU is called a multicore processor; each core is a CPU, and the OS can schedule a different process on each core. A number of scheduling algorithms are available for an OS to use. Some algorithms provide good responsiveness to user input in a Graphical User Interface (GUI), e.g., by providing more CPU time to the application running in the window on the top of the GUI desktop. Some algorithms allocate CPU time so as to improve the likelihood that each process meets any declared deadlines. A Real Time Operating System (RTOS) uses scheduling algorithms that provide strong guarantees for meeting deadlines.
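The timer-driven selection described above can be illustrated with a minimal sketch. Round-robin is used here only as a simple stand-in for "a scheduling algorithm"; the text does not name a particular algorithm, and the function name, process names, and time values are hypothetical.

```python
from collections import deque

def round_robin(burst_ms, quantum_ms):
    """Simulate timer-driven preemption on a single CPU.

    burst_ms maps a (hypothetical) process name to the CPU time it still
    needs; quantum_ms models the period of the hardware timer.
    Returns the list of (process, run_ms) slices in execution order.
    """
    if quantum_ms <= 0:
        raise ValueError("quantum_ms must be positive")
    ready = deque(burst_ms.items())
    timeline = []
    while ready:
        name, remaining = ready.popleft()   # the OS selects the next process
        run = min(quantum_ms, remaining)    # runs until done or timer expiry
        timeline.append((name, run))
        if remaining > run:                 # preempted: back of the ready queue
            ready.append((name, remaining - run))
    return timeline
```

For example, `round_robin({"A": 5, "B": 2}, 2)` interleaves the two processes in 2 ms slices until each has received its full CPU time.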
A partitioned operating environment is a special type of OS that strictly manages all shared hardware resources (such as the CPU, memory, and I/O) so that each application is guaranteed to receive its allocated share of the managed resources during any specified time interval, where this interval is sometimes called the “major time interval”. Each application receives a portion of the time on a time-partitioned resource, such as the CPU, called a partition window. During that window of time, the application has sole access to the resource, to the exclusion of all other applications.
In many time-partitioned operating systems (including, but not limited to, ARINC 653 partitioned operating environments), a repeating major time frame is used to periodically run all applications in the system. Each application is statically scheduled in one or more partition windows within the major time frame. FIG. 1 illustrates three partitions allocated in a major time frame, showing the time allocated to each partition for two consecutive major time frames. It also includes a portion of unused time in each major time frame. As can be seen, each partition receives a deterministic amount of CPU time during each major time frame. The operating system enforces this allocation so that no partition uses more than its allotment and no partition can interfere with other partitions. A partition may contain not only applications but an entire operating system; the basic principle of time partitioning remains the same.
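A static schedule of this kind can be sketched as a fixed table of partition windows that repeats every major time frame. The sketch below is illustrative only: the frame length, window offsets, and partition names are assumed values and are not taken from FIG. 1.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Window:
    partition: str   # hypothetical partition name
    start_ms: int    # offset from the start of the major time frame
    length_ms: int   # duration of the partition window

MAJOR_FRAME_MS = 100  # assumed major time frame duration

# Assumed static schedule: three windows, then 10 ms of unused time.
SCHEDULE = (
    Window("P1", 0, 30),
    Window("P2", 30, 20),
    Window("P3", 50, 40),
)

def owner_at(t_ms: int) -> Optional[str]:
    """Return the partition that owns the CPU at absolute time t_ms,
    or None if t_ms falls in unused time."""
    offset = t_ms % MAJOR_FRAME_MS   # the schedule repeats every frame
    for w in SCHEDULE:
        if w.start_ms <= offset < w.start_ms + w.length_ms:
            return w.partition
    return None
```

Because the table is fixed and the lookup depends only on the offset within the frame, each partition receives the same amount of CPU time in every major time frame. Partition switch overhead is ignored in this simplified model.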
The gap in time between the partitions shown in FIG. 1 represents the overhead time required by the operating system to stop one partition and start another. This is sometimes called the partition switch time. The partition switch time varies between the minimum time (Best-Case Execution Time) and the maximum time (Worst-Case Execution Time) of the operating system task responsible for switching partitions off and on the CPU. This variation is called the jitter, illustrated in FIG. 2. Some time-partitioned operating systems, including many implementations of an ARINC 653 partitioned operating environment, attempt to minimize the jitter so that each partition starts as nearly as possible at the same time within the major time frame on every repetition of the major time frame, so that its period of execution is nearly constant. That is, its period of execution is equal to the major time frame duration, or nearly so. The deviation from this constant period could be as large as the sum of the jitter for all prior partition switches during any particular major time frame.
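The bound stated above can be written out as a short calculation. This is a sketch under the assumption that each partition switch is characterized by a (best-case, worst-case) execution time pair; the function names and microsecond values are hypothetical.

```python
def switch_jitter(bcet_us, wcet_us):
    """Jitter of one partition switch: worst-case minus best-case time."""
    if wcet_us < bcet_us:
        raise ValueError("WCET must not be smaller than BCET")
    return wcet_us - bcet_us

def max_start_deviation(prior_switches):
    """Upper bound on how far a partition's start time can drift within a
    major time frame: the sum of the jitter of all prior partition switches.

    prior_switches is a sequence of (bcet_us, wcet_us) pairs, one per
    partition switch that precedes the partition in the frame.
    """
    return sum(switch_jitter(b, w) for b, w in prior_switches)
```

For instance, two prior switches with assumed times of (5, 8) and (5, 9) microseconds give a bound of 3 + 4 = 7 microseconds on the start-time deviation of the following partition.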
The problem with the standard approach is that time-partitioned operating systems generally do not permit the use of asynchronous tasks such as interrupts. This is because interruption of a partition could interfere with its allocation of time on the CPU or affect its performance in other ways (such as reducing cache hit rates). However, interrupts are the most commonly understood and utilized mechanisms in computer systems for dealing with events that occur asynchronously (such as the arrival of an I/O signal to the computing hardware). Prohibition of interrupt mechanisms for handling I/O forces use of the lower-performance “polling” mechanism, whereby a partition only acts on an I/O event when it is scheduled, which could result in latency of an entire major time frame or more. FIG. 3 illustrates this delay in responding to an input signal because the system cannot handle the I/O until the associated partition (Partition 1 in this example) is scheduled. In essence, the standard approach suffers high latency in I/O response as a consequence of requiring a certain kind of determinism (very small variability in the period of execution for partitions).
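The polling latency described above can be estimated with a simplified model. The sketch assumes that the partition checks for pending I/O only once, at the start of its single partition window in each major time frame; the function name, parameters, and values are hypothetical.

```python
def polling_latency_ms(event_ms, window_start_ms, major_frame_ms):
    """Delay between an I/O event and the next start of the partition
    window that handles it, assuming the partition polls only at the
    start of its window in each major time frame."""
    if major_frame_ms <= 0:
        raise ValueError("major_frame_ms must be positive")
    offset = event_ms % major_frame_ms            # position within the frame
    return (window_start_ms - offset) % major_frame_ms
```

Under these assumptions, an event that arrives 1 ms after the window start in a 100 ms major time frame waits 99 ms, which approaches the full-frame latency noted above; an interrupt-driven design would not be bound to the window start in this way.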