Multiprocessing systems are becoming increasingly important in many computing applications, including general purpose processing systems and embedded control systems. In the design of such multiprocessing systems, an important architectural consideration is scalability. In other words, as more hardware resources are added to a particular implementation, the machine should produce higher performance. Not only do embedded implementations require increased processing power, many also require the seemingly contradictory attribute of low power consumption. In the context of these requirements, particularly for the embedded market, solutions are implemented as “Systems on Chip” or “SoC.” The assignee of the present application, MIPS Technologies, Inc., offers a broad range of solutions for such SoC multiprocessing systems.
In multiprocessing systems, loss in scaling efficiency may be attributed to many different issues, including long memory latencies and waits due to synchronization. The present invention addresses improvements to synchronization among threads in a multithreaded multiprocessing environment, particularly when individual threads may be active on one or more of multiple processors, on a single processor but distributed among multiple thread contexts, or resident in memory (virtualized threads).
Synchronization in a multithreaded system refers to the activities and functions of such a multiplicity of threads that coordinate use of shared system resources (e.g., system memory and interface FIFOs) through variables storing “state” bits for producer/consumer communication and mutual exclusion (MUTEX) tasks. Important considerations for implementing any particular synchronization paradigm include designing and implementing structures and processes that provide for deadlock-free operation while being very efficient in terms of time, system resources, and other performance measurements.
Synchronization of processes using software and hardware protocols is a well-known problem, producing a wide range of solutions appropriate in different circumstances. Fundamentally, synchronization addresses potential issues that may occur when concurrent processes have access to shared data. As an aid in understanding, the following definitions are provided:
    Critical Section: A section of code that reads/writes shared data;
    Race Condition: Potential for interleaved execution of a critical section by multiple threads, resulting in non-deterministic behavior;
    Semaphore: High-level synchronization mechanism to avoid race conditions and to provide for orderly transfer of shared data between threads;
    Mutual Exclusion (MUTEX): Also a synchronization mechanism to avoid race conditions by ensuring exclusive execution of critical sections; a MUTEX is a binary semaphore;
    Deadlock: Permanent blocking of threads; and
    Starvation: Execution with insignificant and unfair progress.
Conventional implementations of a MUTEX include software reservation, spin-locks and operating system based mechanisms. Software reservation includes registration of a thread having an intent to enter a critical section, with the thread waiting until assured that no other thread has registered a similar intention. Spin-locks use memory-interlocked instructions that require special hardware to ensure that a given shared resource may be accessed (e.g., a memory location can be read, modified and written without interruption). Operating system mechanisms for MUTEX include semaphores, monitors, message passing and file locks. Software reservation is available for both uniprocessors and multiprocessors, but with different types of overhead and memory requirements.
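As an illustration of the spin-lock approach described above, the following is a minimal sketch in C11, not an implementation taken from this application. It assumes that the standard `atomic_flag` test-and-set operation from `<stdatomic.h>` stands in for the memory-interlocked instruction; the names `spin_mutex`, `spin_trylock`, `spin_lock`, `spin_unlock` and `spin_demo` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: one flag that is set while the lock is held. */
typedef struct { atomic_flag held; } spin_mutex;
#define SPIN_MUTEX_INIT { ATOMIC_FLAG_INIT }

/* One atomic read-modify-write: set the flag and report whether it was
   previously clear (true means this caller acquired the lock). */
static inline bool spin_trylock(spin_mutex *m) {
    return !atomic_flag_test_and_set_explicit(&m->held, memory_order_acquire);
}

/* Busy-wait until the test-and-set succeeds. */
static inline void spin_lock(spin_mutex *m) {
    while (!spin_trylock(m)) {
        /* spin: processor time is consumed while waiting */
    }
}

/* Clear the flag so that another thread's test-and-set can succeed. */
static inline void spin_unlock(spin_mutex *m) {
    atomic_flag_clear_explicit(&m->held, memory_order_release);
}

/* Single-threaded walk-through of the lock states. */
void spin_demo(void) {
    spin_mutex m = SPIN_MUTEX_INIT;
    spin_lock(&m);               /* lock is free, so this returns at once */
    assert(!spin_trylock(&m));   /* a second attempt fails while held */
    spin_unlock(&m);
    assert(spin_trylock(&m));    /* free again after release */
    spin_unlock(&m);
}
```

In this sketch the waiting thread never sleeps, which keeps the lock simple but makes it suitable only for short critical sections.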
Concurrent processes and concurrent threads often need to share data (maintained either in shared memory or files) and resources. When there is not controlled access to shared data, some processes/threads will obtain an inconsistent view of this data. The action performed by concurrent processes/threads will then depend on the order in which their execution is interleaved.
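The dependence on interleaving order can be made concrete with a small deterministic model. The sketch below does not use real threads; it assumes, purely for illustration, that each of two simulated threads increments a shared counter as two separate steps (a load and a store) and then replays two fixed schedules. All names (`sim_thread`, `step_load`, `step_store`, `run_serialized`, `run_interleaved`, `lost_update_demo`) are hypothetical.

```c
#include <assert.h>

/* Hypothetical model: each simulated thread increments a shared counter in
   two separate steps, a load into a private temporary and a store back. */
typedef struct { int tmp; } sim_thread;

static void step_load(sim_thread *t, const int *shared)  { t->tmp = *shared; }
static void step_store(const sim_thread *t, int *shared) { *shared = t->tmp + 1; }

/* Schedule 1: thread A completes its update before thread B starts. */
int run_serialized(void) {
    int shared = 0;
    sim_thread a, b;
    step_load(&a, &shared);
    step_store(&a, &shared);
    step_load(&b, &shared);
    step_store(&b, &shared);
    return shared;            /* both increments are visible */
}

/* Schedule 2: both threads load before either one stores. */
int run_interleaved(void) {
    int shared = 0;
    sim_thread a, b;
    step_load(&a, &shared);
    step_load(&b, &shared);   /* B reads the value A has not yet updated */
    step_store(&a, &shared);
    step_store(&b, &shared);  /* B overwrites A's update */
    return shared;
}

/* The same two increments give different results under the two schedules. */
void lost_update_demo(void) {
    assert(run_serialized() == 2);
    assert(run_interleaved() == 1);
}
```

The second schedule loses one increment, which is the kind of inconsistent result that uncontrolled access to shared data permits.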
When a process/thread executes code that manipulates shared data (or resource), it is said that the process/thread is in its critical section (for that shared data/resource). Execution of critical sections must be mutually exclusive—at any time only one process/thread is allowed to execute in its critical section (including with multiple CPUs). Each process/thread must therefore be controlled when entering its critical section. The well-known critical section problem is to design a protocol/mechanism that processes/threads use so that their action will not depend on the order in which their execution is interleaved (including the case for multiple processors).
Requirements for valid solutions to the critical section problem include (1) mutual exclusion, (2) progress, and (3) bounded waiting. Progress requires that only processes/threads that are not otherwise occupied outside their critical sections participate in deciding which process/thread will next enter its critical section, and that this selection cannot be postponed indefinitely. Bounded waiting places a bound on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section (otherwise the process suffers from starvation).
Drawbacks of software solutions include: (1) processes/threads that are requesting entry to their critical section are busy waiting (consuming processor time needlessly), and (2) when critical sections are long, it would be more efficient to block the waiting processes than to let them busy wait.
Hardware solutions include interrupt disabling and use of special machine instructions. Interrupt disabling is generally not an acceptable solution in a multiprocessor environment because mutual exclusion is not preserved: disabling interrupts on one processor does not prevent other processors from entering the critical section. Special hardware instructions can be used to provide mutual exclusion but need to be complemented by other mechanisms to satisfy the other two requirements of the critical section problem (and avoid starvation and deadlock). Typically, additional machine instructions are added that perform two actions atomically (indivisibly) on the same resource (e.g., reading and writing a memory location). Advantages of special synchronization-related machine instructions are that they are applicable to any number of processes/threads on either a single processor or multiple processors sharing memory, they are simple and easy to verify, and they can be used to support multiple critical sections. Disadvantages are that busy waiting consumes processor time, that starvation is possible when a process/thread leaves a critical section and more than one process/thread is waiting, and that deadlock is possible.
Operating system solutions include use of semaphores. A semaphore can be an integer variable that is accessed during operation through atomic and mutually exclusive operations. An implementation of a semaphore can avoid busy waiting: when a process/thread has to wait, it is put into a blocked queue of processes/threads waiting for the same event.
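A semaphore of the kind just described, an integer accessed only through mutually exclusive operations and with waiters blocked rather than spinning, can be sketched in user space as follows. This is an illustrative sketch only: it assumes POSIX threads primitives (`pthread_mutex_t` and `pthread_cond_t`) in place of an operating system's internal blocked queue, and the names `blocking_sem`, `bsem_init`, `bsem_wait`, `bsem_trywait` and `bsem_post` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counting semaphore: the count is touched only under the lock. */
typedef struct {
    pthread_mutex_t lock;     /* makes each operation mutually exclusive */
    pthread_cond_t  nonzero;  /* blocked waiters sleep here */
    int             count;    /* number of available units */
} blocking_sem;

void bsem_init(blocking_sem *s, int initial) {
    assert(initial >= 0);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
}

/* Take one unit, sleeping (not spinning) while none is available. */
void bsem_wait(blocking_sem *s) {
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Non-blocking variant: take one unit if available, otherwise return false. */
bool bsem_trywait(blocking_sem *s) {
    bool taken = false;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        taken = true;
    }
    pthread_mutex_unlock(&s->lock);
    return taken;
}

/* Return one unit and wake one blocked waiter, if any. */
void bsem_post(blocking_sem *s) {
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}
```

Initialized with a count of 1, such a semaphore behaves as a MUTEX: `bsem_wait` is called on entry to the critical section and `bsem_post` on exit.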
Details regarding the MIPS processor architecture are provided in D. Sweetman, See MIPS Run, Morgan Kaufmann Publishers, Inc. (1999), which is incorporated by reference in its entirety for all purposes.
What is needed is a simple, efficient mechanism for providing a hardware solution to mutual exclusion in a multithreaded (including multiprocessor) concurrent environment that overcomes the drawbacks of existing solutions, particularly for a processor core using a reduced instruction set computer (RISC) architecture that limits use of additional special purpose instructions for synchronization.