Modern operating systems for computing devices typically support parallel programming, enabling concurrent execution of a plurality of applications running different tasks in order to increase device throughput or speed up interface responses. Concurrent execution usually requires synchronization to coordinate asynchronous activities among different tasks. Most operating systems provide synchronization primitives to facilitate synchronization for programming concurrent tasks. However, depending on how such primitives are implemented, the overhead incurred to allow multiple tasks to execute in parallel may differ from one operating system to another.
For example, multiple threads or processes may be synchronized via a synchronization primitive (or synchronizer) such as a mutex (a thread may be the minimum scheduling entity of a process for an application). A typical implementation of a mutex may include an interlock in user-space memory, which may require each thread or process trying to use the mutex (via a primitive such as lock, unlock, try, etc.) to continuously poll the contents of that memory. As a result, valuable processing resources are wasted while a thread or process polls a mutex that is held by another.
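The polling behavior described above can be illustrated with a minimal sketch of a user-space interlock, assuming C11 atomics. The names spin_mutex, SPIN_MUTEX_INIT, spin_mutex_trylock, spin_mutex_lock, and spin_mutex_unlock are hypothetical and chosen for illustration only; they do not correspond to any particular operating system's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space interlock: a single flag in ordinary memory.
 * The kernel and its scheduler never see this flag. */
typedef struct {
    atomic_flag locked; /* clear = free, set = held */
} spin_mutex;

#define SPIN_MUTEX_INIT { ATOMIC_FLAG_INIT }

/* "try" primitive: one attempt, returns true if the lock was acquired. */
static bool spin_mutex_trylock(spin_mutex *m) {
    /* test_and_set returns the previous value; false means it was free. */
    return !atomic_flag_test_and_set_explicit(&m->locked,
                                              memory_order_acquire);
}

/* "lock" primitive: polls the flag until it is acquired.
 * Returns the number of failed polls, i.e. the iterations that did
 * no useful work while another thread held the lock. */
static unsigned long spin_mutex_lock(spin_mutex *m) {
    unsigned long failed_polls = 0;
    while (atomic_flag_test_and_set_explicit(&m->locked,
                                             memory_order_acquire)) {
        failed_polls++; /* processor time spent only on re-reading memory */
    }
    return failed_polls;
}

/* "unlock" primitive: clears the flag so that a polling thread can proceed. */
static void spin_mutex_unlock(spin_mutex *m) {
    atomic_flag_clear_explicit(&m->locked, memory_order_release);
}
```

A caller would declare `spin_mutex m = SPIN_MUTEX_INIT;` and bracket its critical section with `spin_mutex_lock(&m)` and `spin_mutex_unlock(&m)`. In this sketch, every iteration of the loop in spin_mutex_lock occupies a processor without making progress, and nothing in the flag identifies its holder to the scheduler.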
Additionally, an interlock-based synchronization primitive may degrade the responsiveness of real-time threads or processes. For example, a preemptive task scheduler, as commonly adopted in most operating systems, may force a thread or process associated with an interlock to be removed from a processor in response to a preemption event. An interlock implemented in user-space memory may be associated with the thread or process without kernel interactions. As a result, the scheduler does not know that the interlock is held by the particular thread or process when it is preempted. A subsequent real-time thread or process that attempts to interact with the same interlock may have to (e.g. at least temporarily) relinquish its real-time characteristics in order to allow the thread or process that was preempted to release the interlock. It must do so without the scheduler knowing which thread or process needs to be scheduled to release the contended interlock. This is known as an unbounded priority inversion, which must be avoided in real-time systems.
A typical solution is to move the complete implementation of the synchronizer into the kernel, so that the scheduler always knows which thread or process holds the interlock and can prevent preemption for the duration. Although such a solution does not introduce unbounded priority inversions, it adds latency to each synchronization operation, because every operation must make the transition to the appropriate kernel interface.
Therefore, traditional implementations of synchronization primitives for computing devices fail to support real-time characteristics of modern multitasking applications without incurring a latency penalty.