1. Field of the Invention
The invention relates to multi-processor computing. Specifically, the invention relates to apparatus, systems, and methods for automatically minimizing real-time task latency and maximizing non-real time task throughput.
2. Description of the Related Art
Mainstream computer systems are currently moving from conventional single processor architectures, also known as Uniprocessor (UP), to multi-processor architectures. In particular, Symmetric Multi-Processor (SMP) architectures or environments are becoming more widely used in various fields. In an SMP environment, typically, hardware computing resources such as memory, communications bus, I/O devices, and the Operating System (OS) are shared by two or more processors. The multiple processors cooperate as peers each having an equal ability to service tasks assigned by the OS.
A multi-processor environment or SMP system can be implemented with multiple physical processors connected to a common bus. Alternatively, due to advances in processor technology, a single physical processor may be used but treated as multiple logical processors by the OS and all other computer system components. One example of this technology is hyperthreading. As used herein, references to SMP system(s) and multi-processor environment(s) refer to any computer system that includes a plurality of physical and/or logical processors. Similarly, references to “processor” include both physical processors and logical processors.
SMP systems are being used both for general purpose computing such as desktop PCs and for more specialized applications such as embedded computing systems. Operating systems for the general purpose and embedded computing systems are adapting to most efficiently service both general desktop computing needs and specialized embedded computing needs. One example of this adaptation is the ability of a single SMP OS to properly manage both Non-Real Time (NRT) tasks and Real-Time (RT) tasks. As used herein, the term “task” refers to the smallest portion of a software program that the OS can switch between a waiting or blocked state and a running state in which the task executes on a particular processor. Tasks can come from a single software program or a plurality of software programs and typically include portions of the OS.
As used herein, a NRT task refers to software code that does not have explicit time constraints for when computing services are to be provided to the task. Conversely, a RT task is a task that has a predefined maximum threshold for the delay between requesting a service from the computer system and having the request fulfilled. Failure to service the RT task within the threshold can cause serious failure of the task and/or systems managed by the RT task. Furthermore, RT tasks include both hard real time tasks and soft real time tasks. Hard real time tasks require an absolute guarantee of response time below the maximum threshold. Soft real time tasks require a very high probability that the response time is below the maximum threshold, but not an absolute guarantee.
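The difference between a hard and a soft real time requirement can be sketched as a check over a set of measured latencies. This is a minimal illustration only: the function names, the microsecond units, and the per-thousand miss budget for the soft case are assumptions made for the example and do not come from the text above. A latency equal to the threshold is treated as acceptable, which is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: count the samples whose latency exceeds the threshold. */
size_t count_misses(const unsigned long *latency_us, size_t n,
                    unsigned long threshold_us)
{
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        if (latency_us[i] > threshold_us)
            misses++;
    }
    return misses;
}

/* Hard real time: every single response must stay within the threshold. */
bool meets_hard_rt(const unsigned long *latency_us, size_t n,
                   unsigned long threshold_us)
{
    return count_misses(latency_us, n, threshold_us) == 0;
}

/* Soft real time: misses are tolerated up to an assumed budget, expressed
 * here as misses per 1000 samples (max_miss_permille). */
bool meets_soft_rt(const unsigned long *latency_us, size_t n,
                   unsigned long threshold_us,
                   unsigned int max_miss_permille)
{
    assert(n > 0);
    size_t misses = count_misses(latency_us, n, threshold_us);
    return misses * 1000 <= (size_t)max_miss_permille * n;
}
```

With this sketch, one late response is enough to fail the hard check, while the soft check passes as long as the fraction of late responses stays within the chosen budget.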
Where the threshold is set to define a NRT task or a RT task depends on the context. For a mission critical task, such as monitoring respiratory oxygen content in a hospital patient, the threshold could be measured in tens of microseconds. Such a critical task is one example of a hard real time task. For other RT tasks, namely soft real time tasks, the threshold could be measured in minutes, as with a weather station temperature sampling task. Another soft real time task example is a real time video or audio processing task. Failure to meet the maximum threshold for response time may result in detectable “skips” or degradation in quality, but not critical failures such as possible death.
Typically, NRT tasks are tasks that involve user interaction, where long delays in response to user inputs result in a poor user experience but not in loss of data, loss of functionality, or a critical failure. However, because classifying a task depends so much on the context, NRT tasks and RT tasks are typically classified as such for the OS by the software developer. Generally, RT tasks have a very low service threshold and NRT tasks have a comparatively high service threshold.
The delay between when a task requests a service from the OS and when the service is provided is referred to as latency. Typically, the requested service is the assignment of a processor to execute the task. The service threshold defines a maximum tolerable latency for the task. As used herein, the term “latency” or “task latency” refers to the time between when the task requests a service from the OS and when the service is provided. The service may include assignment of a processor for task execution, exclusive access to a resource, and the like. Task latency typically includes other more specific latencies well known to those of skill in the art such as scheduling latency, task switching latency, and the like.
Multiple factors affect task latency. It is well known that modern OSs constantly change the task assigned to a particular processor in order to provide multitasking functionality. Consequently, the number of tasks managed by the OS can lengthen the task latency due to the increased overhead in handling each additional task. However, due to the critical nature of RT tasks, general purpose OSs have been modified to service the task latency requirements of the most demanding RT tasks in order to handle a worst-case scenario and ensure that the worst-case task latency still meets the RT task requirements.
In certain cases, real-time specific OSs (RTOSs) have been developed. Unfortunately, the RTOSs favor the RT tasks over the NRT tasks. Consequently, a NRT task may experience poor responsiveness on an RTOS system. Often, if the RTOS supports NRT tasks, the NRT task is so delayed in responding to user inputs that the user notices a delay in response to a user-initiated action. The responsiveness of a NRT task in an OS is referred to herein as task throughput. Task throughput represents how quickly a task is able to complete a command and provide a response to the user. Task throughput also includes the number of units of work a task can complete in a given time period.
If NRT tasks and RT tasks are run together on the same computer system, the optimizations for servicing RT tasks adversely affect NRT task throughput. In certain cases, the NRT task throughput is affected regardless of whether any RT tasks are running on the system. Similarly, conventional optimizations to improve NRT throughput can adversely affect RT task latency. Consequently, the industry has experienced a trade-off in OSs attempting to service both RT tasks and NRT tasks. Until the present invention, the industry has been unable to satisfactorily minimize RT task latency and maximize NRT task throughput automatically.
Typically, NRT task throughput is sacrificed in favor of RT task latency. Currently, the OS scheduling algorithm is optimized such that RT tasks, if present, are assigned a processor ahead of NRT tasks. Consequently, in a typical Uniprocessor (UP) system, NRT tasks are generally preempted if a RT task becomes runnable. Preemption means the task currently executing on the processor is halted before the task has reached a natural termination point. The task is forced to wait while a higher priority task, such as a RT task, is assigned a processor. However, as noted above, under this approach, as the number of RT tasks increases the NRT task throughput decreases. Examples of these scheduling optimizations include a Priority Level Scheduler (PLS) and Multi-queue Scheduler (MQS).
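The RT-first policy described above can be sketched as a simple selection over a task list. This is an illustrative sketch only and is not the Priority Level Scheduler or the Multi-queue Scheduler; the struct layout and the names pick_next and should_preempt are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum task_class { TASK_NRT, TASK_RT };

/* Hypothetical task descriptor used only for this illustration. */
struct task {
    int id;
    enum task_class cls;
    int runnable;   /* nonzero if the task is ready to run */
};

/* Return the index of the task to run next, or -1 if none is runnable.
 * Any runnable RT task is chosen ahead of every NRT task. */
int pick_next(const struct task *tasks, size_t n)
{
    int first_nrt = -1;
    assert(tasks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!tasks[i].runnable)
            continue;
        if (tasks[i].cls == TASK_RT)
            return (int)i;          /* RT tasks always win */
        if (first_nrt < 0)
            first_nrt = (int)i;     /* remember the first runnable NRT task */
    }
    return first_nrt;
}

/* On a uniprocessor, a running NRT task is preempted as soon as the
 * selection above would pick a RT task instead. */
int should_preempt(const struct task *current,
                   const struct task *tasks, size_t n)
{
    int next = pick_next(tasks, n);
    return next >= 0 && current->cls == TASK_NRT &&
           tasks[next].cls == TASK_RT;
}
```

Under such a policy, every additional runnable RT task is one more occasion on which a NRT task loses the processor, which is the throughput cost noted above.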
The problem of optimizing the OS to minimize RT task latency and maximize NRT throughput is even more difficult in an SMP system. In a UP system, RT tasks can simply be tracked and provided priority over runnable NRT tasks. However, in SMP systems, there is currently no efficient way to determine whether a RT task exists on a processor other than the processor executing a NRT task.
In addition, in an SMP system the concurrent nature of multiple processors (each executing a different task) sharing resources such as data structures, memory, devices, and the like requires that access to the shared resources be controlled. The access to the resources is controlled such that only one processor, together with its currently executing task, is permitted to access the resource at a given time. This process of controlling access is referred to as serialization.
Serialization is particularly desirable to preserve data integrity when multiple tasks/processors can modify data in a shared resource. Preempting a task while writing to a shared resource can corrupt the data. Consequently, serialization should provide exclusive access for the task requesting the shared resource and exclude preemption. If one task has access to the resource, all other tasks are excluded from accessing the resource until the one task has finished. Exclusive access is provided atomically, meaning a task executes a single command to obtain access and is either successful or not; there is no opportunity to preempt the task while requesting the exclusive access.
Generally, serialization of SMP tasks to shared resources is controlled by locks. If a task desires exclusive access to a resource, the task requests the lock. If the lock is not held by any other task, the lock is atomically provided to the requesting task. If the lock is held by another task, the requesting task often enters a loop in which the task continually requests the lock until the lock becomes free. Once a task holds a lock, the task modifies or otherwise uses the shared resource in some manner and then releases the lock. Typically, a lock is implemented using a boolean value, False if the lock is available and True if the lock is being held.
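A lock of the kind just described can be sketched with C11 atomics. This is a minimal user-space illustration under stated assumptions, not the lock implementation of any particular OS; the struct spinlock type and the spin_* function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: false means available, true means held. */
struct spinlock {
    atomic_bool held;
};

void spin_init(struct spinlock *l)
{
    atomic_init(&l->held, false);
}

/* One atomic attempt: succeeds only if the lock was free.
 * atomic_exchange stores true and returns the previous value, so the
 * test and the update happen as a single indivisible step. */
bool spin_trylock(struct spinlock *l)
{
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

/* Loop, re-requesting the lock until it becomes free. */
void spin_lock(struct spinlock *l)
{
    while (!spin_trylock(l))
        ;   /* busy-wait */
}

/* Release the lock so that another task can acquire it. */
void spin_unlock(struct spinlock *l)
{
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

A task would call spin_lock, use the shared resource, and then call spin_unlock. While the lock is held, any other caller of spin_lock stays in the busy-wait loop, which is the source of the delay discussed in connection with FIG. 1.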
FIG. 1 illustrates a conventional multi-processor environment 100 with NRT tasks 102 and RT tasks 104 that share exclusive access to a common resource 106. The environment 100 includes a memory 108, a plurality of processors 110, also referred to as Central Processing Units (CPUs) 110, and a communications bus 112. The memory 108, CPUs 110, and communications bus 112 are well known. The CPUs 110 are identified by subscripts 1, 2, . . . n. Those of skill in the art will recognize various different hardware configurations for a multi-processor environment 100.
The memory 108 includes a set of executable code that includes a multi-processor operating system 114 such as an SMP OS 114 and a data section 116. The SMP OS 114 includes a task manager 118, also referred to as a scheduler 118, and a runqueue 120 associated with each CPU 110. The runqueues 120 include subscripts 1, 2, . . . n corresponding to the associated CPU 110. The data section 116 includes task-specific data as well as data structures shared by the tasks. Certain exclusive access data structures are controlled by locks 122. Different locks 122 are represented using alphabetic identifiers and an arrow 124 to the associated resource 106, such as a data structure.
By way of example, in the current multi-processor environment 100, CPU1 110 is executing a NRT task 102 that has acquired the lock A 122a for resource 106. At substantially the same time, a RT task 104 on CPUn 110 has become runnable and begins to run on CPUn. Furthermore, one of the first instructions executed by RT task 104 is to acquire lock A 122a.
This presents a problem. The RT task latency will be increased because the NRT task 102 is holding lock A 122a, which the RT task 104 needs, and the NRT task 102 is not preemptable. Furthermore, depending on which type of lock 122 the NRT task 102 is holding, interrupts could be disabled. This means that the RT task 104 must wait for the NRT task 102 to release the lock before the RT task 104 can perform its work. In addition, if interrupts are enabled, an interrupt may arrive while the NRT task 102 holds the lock. The interrupt may be long-running such that the interrupt also delays the release of the lock by NRT task 102. The delays caused by the NRT task 102 and/or interrupts are generally unacceptable because the RT task latency maximum threshold can consequently be exceeded.
In a conventional multi-processor environment 100, the solution is to defer to the needs of the RT task 104. One proposed solution is to include multiple preemption points 126 in the code of the NRT task 102. Alternatively, if the NRT task 102 is executing object code in the kernel of the OS, the preemption points 126 are in the kernel. The preemption point 126 is executed indiscriminately. There is currently no way for a NRT task 102 to avoid the preemption point 126 and its associated delay.
A preemption point 126 is a predefined point in the object code at which a developer has determined that the NRT task 102 can voluntarily give up ownership of the CPU1 110. Generally, as part of executing the preemption point 126, the NRT task 102 will also release any locks 122 being held. Typically, the preemption points 126 are in the kernel object code and the NRT task 102 is forced to give up the CPU1 110.
Preemption points 126 ensure that the most time added to the RT task latency is the time between preemption points 126. RT tasks 104 are not delayed by a NRT task 102 holding a lock 122 too long. Preemption points 126 also introduce overhead as the NRT task 102 performs steps to preserve its state, release any locks 122, sleep, and then resume operations after the preemption point 126.
Unfortunately, indiscriminate execution of preemption points 126 incurs this overhead delay even if there are no RT tasks 104 in the environment 100. The overhead delays caused by mandatory preemption points 126 unnecessarily reduce the NRT task throughput.
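The cost of indiscriminate preemption points can be sketched by counting how often a NRT task gives up the processor. This is an illustration only: the struct nrt_stats bookkeeping, the function names, and the fixed spacing between preemption points are assumptions, and the stand-in function merely counts an event where a real kernel would save state, release locks, sleep, and resume.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one NRT task. */
struct nrt_stats {
    unsigned long work_done;     /* units of work completed */
    unsigned long preemptions;   /* times the task gave up the CPU */
};

/* Stand-in for a preemption point. A real kernel would preserve the task
 * state, release held locks, sleep, and later resume; here the event is
 * only counted so that the overhead can be observed. */
void preemption_point(struct nrt_stats *s)
{
    s->preemptions++;
}

/* Process `total` units of work with a preemption point after every
 * `interval` units. The point runs whether or not any RT task exists. */
void run_nrt_work(struct nrt_stats *s, unsigned long total,
                  unsigned long interval)
{
    assert(interval > 0);
    for (unsigned long i = 1; i <= total; i++) {
        s->work_done++;
        if (i % interval == 0)
            preemption_point(s);   /* executed indiscriminately */
    }
}
```

A shorter interval tightens the bound on the time added to RT task latency, but it raises the number of preemption points executed, and that overhead is paid even when no RT task is present.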
From the foregoing discussion, it should be apparent that a need exists for an apparatus, system, and method for automatically minimizing RT task latency and maximizing NRT task throughput in a multi-processor environment. Beneficially, such an apparatus, system, and method would conditionally execute preemption points in response to the presence or absence of a runnable RT task in the multi-processor environment. In addition, the apparatus, system, and method would automatically and optimally handle both RT tasks and NRT tasks, incur minimal processing overhead, and prevent shared resource contention between NRT tasks and RT tasks.
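The idea of conditionally executing a preemption point can be sketched as follows. This is a minimal sketch of the stated need and not a description of any particular apparatus: the global counter rt_runnable_count, the way it would be kept up to date across processors, and the function names are all assumptions made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed system-wide indicator: the number of runnable RT tasks across
 * all processors. How such a count is maintained is not specified here. */
atomic_int rt_runnable_count = 0;

/* Counts how often the costly path was actually taken. */
unsigned long yields_taken = 0;

/* Stand-in for the costly part of a preemption point (saving state,
 * releasing locks, sleeping, resuming). */
void yield_cpu(void)
{
    yields_taken++;
}

/* Conditional preemption point: pay the overhead only when at least one
 * RT task is runnable. Returns true if the CPU was given up. */
bool cond_preemption_point(void)
{
    int n = atomic_load_explicit(&rt_runnable_count, memory_order_acquire);
    assert(n >= 0);
    if (n == 0)
        return false;   /* no RT task: skip the preemption point */
    yield_cpu();
    return true;
}
```

In this sketch, a NRT task running while no RT task is runnable passes through every preemption point at the cost of a single atomic load, so its throughput is reduced only when a RT task actually needs the processor or a held lock.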