The present invention relates generally to a system and method for implementing mutual exclusion locks.
The advent of operating systems that support multithreaded languages such as Java has vastly improved the utility of computers. Multithreaded environments present special problems, however, because it is possible for threads to compete for the same resources. Such resources include sections of code, data structures, and system peripherals. If resources are left unprotected, many undesirable states are possible. For example, a race condition, in which multithreaded code becomes dependent upon the order of completion of two or more independent activities, may arise. Race conditions are possible in any program where independent threads access the same resources in an unpredictable order. See, e.g., Cohen and Woodring, 1998, WIN32 MULTITHREADED PROGRAMMING, O'Reilly, New York. In general, any failure in a multithreaded environment to ensure that a resource, such as a data structure, is not being operated on by another thread may result in threads seeing data that is in an inconsistent, intermediate state. To prevent such undesirable states, functionality is typically provided in multithreaded environments to ensure that each resource is accessed by, at most, a single thread at any given instant.
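By way of illustration only, the lost-update form of a race condition can be sketched deterministically. In the sketch below the two threads are simulated as explicit read and write steps so that the interleaving is fixed; the names SharedCounter, increment_interleaved, and increment_serialized are hypothetical and do not come from the cited reference.

```python
class SharedCounter:
    """An unprotected shared resource holding a single integer."""

    def __init__(self):
        self.value = 0


def increment_interleaved(counter):
    """Two simulated threads each add 1, but both read before either writes."""
    a_seen = counter.value      # thread A reads 0
    b_seen = counter.value      # thread B reads 0 before A has written
    counter.value = a_seen + 1  # thread A writes 1
    counter.value = b_seen + 1  # thread B also writes 1, overwriting A's update
    return counter.value


def increment_serialized(counter):
    """The same two increments, each read-modify-write finishing before the next starts."""
    for _ in range(2):
        seen = counter.value
        counter.value = seen + 1
    return counter.value
```

In the interleaved ordering one of the two updates is lost, whereas the serialized ordering preserves both; a mutex is one way of forcing the serialized ordering.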
One method that is used to ensure that multiple threads do not simultaneously claim a particular resource is to protect the resource with a mutual exclusion lock ("mutex"). Typically, each resource is protected with a unique mutex. At most one thread may hold a given mutex at any time, so if a programmer arranges to access a resource only while holding a mutex, the programmer is guaranteed that at most one thread accesses the resource at any given time. The process by which a requesting thread verifies that no other thread is holding a mutex is known as synchronization.
Synchronization is accomplished using acquire and release operations. In the acquire operation, the requesting thread claims exclusive ownership of a mutex, or the requesting thread is blocked until exclusive ownership of the mutex is available. In the release operation, the thread that holds the mutex relinquishes ownership of the mutex. These acquire and release operations are normally implemented using hardware-provided atomic machine instructions such as test-and-set, atomic compare-and-swap, or load-locked/store-conditional. Such atomic machine instructions require a large number of central processing unit cycles to execute because they necessarily involve memory ordering operations visible outside a single processor.
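The acquire and release operations described above may be sketched, under stated assumptions, as follows. Python exposes no hardware compare-and-swap, so the hypothetical class AtomicCell models one in software with an internal lock; the class CasMutex and its method names are likewise illustrative and are not taken from any particular system.

```python
import threading
import time


class AtomicCell:
    """Software stand-in for a memory word updated by a hardware compare-and-swap."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()  # models the atomicity the hardware would provide

    def compare_and_swap(self, expected, new):
        """Store `new` only if the cell still holds `expected`; report success."""
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


class CasMutex:
    """Mutex whose owner word is None when free, else the holder's identifier."""

    def __init__(self):
        self._owner = AtomicCell(None)

    def try_acquire(self, thread_id):
        # Claim ownership only if no thread currently holds the mutex.
        return self._owner.compare_and_swap(None, thread_id)

    def acquire(self, thread_id):
        # Wait, yielding the processor between attempts, until ownership is available.
        while not self.try_acquire(thread_id):
            time.sleep(0)

    def release(self, thread_id):
        # Only the current holder may return the owner word to the free state.
        if not self._owner.compare_and_swap(thread_id, None):
            raise RuntimeError("release by a thread that does not hold the mutex")
```

Every acquire and every release in this sketch performs at least one compare-and-swap, which is the cost the remainder of this background is concerned with.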
Because of the expensive nature of atomic machine instructions, techniques have been developed to avoid their use on uniprocessor computers, where each thread runs by itself until the processor receives an interrupt from an external source. See, e.g., Moss and Kohler, Proceedings of European Conference on Object-Oriented Programming, Paris, France, 171-180, 1987; Lecture Notes on Computer Science, ed. Bezivin et al., Springer-Verlag, 276, 1986. Central to these techniques is the requirement that no thread be interrupted while in an atomic sequence. Such a requirement may be implemented directly in the operating system or in other parts of the system. Thus, when the operating system wishes to interrupt the running thread, the operating system will query whether the running thread is in an atomic sequence. If the running thread is in an atomic sequence, the running thread will be either backed up or pushed forward using an interpreter so that it is no longer in the atomic sequence. Once the running thread is outside of an atomic sequence, it can be interrupted. After the running thread has been interrupted, another thread can operate on the mutex. Such prior art techniques work well in a uniprocessor environment because the only way for a requesting thread to start running is for the currently running thread to be interrupted. Thus, all that is required to guarantee atomicity in a uniprocessor environment is that threads not be interrupted during an atomic sequence. However, such techniques do not work in multiprocessor environments because it is possible for more than one thread to run simultaneously.
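The query performed before an interrupt in such uniprocessor techniques can be modeled, purely as an illustrative sketch, by the function below. The function name, the representation of atomic sequences as half-open (start, end) ranges of program counter values, and the roll_forward flag are assumptions made for this example. The sketch also assumes that a sequence has no externally visible effect before its final instruction, so that backing up to the start is harmless, and it only computes where the thread would resume without interpreting any instructions.

```python
def pc_for_safe_interrupt(pc, atomic_sequences, roll_forward=False):
    """Return the program counter at which the running thread may be interrupted.

    `atomic_sequences` is a list of (start, end) pairs; a thread whose program
    counter satisfies start <= pc < end is treated as inside that sequence.
    """
    for start, end in atomic_sequences:
        if start <= pc < end:
            # Inside an atomic sequence: back the thread up to the start, or
            # model pushing it forward past the end, before interrupting it.
            return end if roll_forward else start
    # Outside every atomic sequence: the thread can be interrupted where it is.
    return pc
```

For example, with a single sequence covering program counter values 10 through 14, a thread stopped at 12 would resume at 10 if backed up and at 15 if pushed forward.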
Another prior art technique that has been used to minimize the cost of synchronization is to provide equivalent thread safe and non-thread safe versions of routines, as is done in various UNIX systems, such as Compaq Tru64 UNIX. A careful programmer can control the amount of synchronization in code by selecting thread safe routines only when they are required. The problem with such an approach is that it is difficult to predict in advance whether thread safety will be needed in future versions of code. Furthermore, reliance on the intuition of a programmer to identify what objects should be thread safe and what objects do not need to be made thread safe is time consuming and error prone. A cautious programmer must always assume that thread safety may be required at some time in the future, even if it is not needed now.
Yet another prior art technique used to reduce synchronization costs is the use of runtime systems, such as that in Compaq Tru64 UNIX, that have two different code sequences for mutex operations, one for single-threaded programs and the other for multithreaded programs. The decision of whether to use single-threaded or multithreaded code sequences may be made at link time, load time, or while the program is running. However, this technique does not minimize the cost of atomic synchronization once a commitment to multithreaded code is made because there is no attempt to avoid atomic instructions in the multithreaded versions of the code. Thus, when the multithreaded runtime system is chosen over the single-threaded runtime system, expensive atomic synchronization operations are still required.
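A minimal sketch of such a two-sequence arrangement, with the selection made while the program is running, might read as follows. The names NoOpLock and make_lock are hypothetical and do not describe the actual Tru64 UNIX runtime.

```python
import threading


class NoOpLock:
    """Single-threaded code sequence: no synchronization is performed."""

    def acquire(self):
        return True

    def release(self):
        pass


def make_lock(multithreaded):
    """Select the mutex implementation once, based on whether threads are in use."""
    return threading.Lock() if multithreaded else NoOpLock()
```

Once make_lock(True) has been chosen, every acquire pays the full synchronization cost even when only one thread ever touches the lock, which is the limitation noted above.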
Accordingly, it is an object of the present invention to provide a system and method for reducing the costs of synchronization in a multithreaded environment regardless of the number of processors in the environment and without any requirement that the programmer pick between thread safe and thread unsafe versions of code.
A system and method is provided for controlling a request to acquire a target mutex. The target mutex is capable of designating whether it may be synchronized using a fast nonatomic load/store sequence or expensive atomic hardware instructions. The method by which a target mutex is synchronized, in response to a request by a requesting thread, depends on whether the target mutex has designated the fast nonatomic synchronization sequence or the expensive atomic synchronization sequence. When using the fast nonatomic synchronization sequence, the target mutex can further designate a thread that is currently associated with the mutex.
Each mutex has an "associated" thread, even when the mutex is not held by any thread. The thread associated with a mutex can synchronize with the mutex using the fast nonatomic synchronization sequence, while any other thread must use a more complex procedure for synchronizing with the mutex.
When the target mutex has designated the fast synchronization sequence, a determination is made as to whether the requesting thread is the thread associated with the target mutex. If the requesting thread is the thread that is associated with the target mutex, the request to acquire the target mutex is granted without atomic hardware operations. However, if the requesting thread is not the thread associated with the target mutex, a verification procedure is executed to ensure that no thread is operating on the target mutex.
When the target mutex has designated the expensive atomic synchronization sequence, the mutex is synchronized using atomic machine instructions to ensure that no thread is operating on the target mutex. The request to acquire the target mutex is then complete.
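The three cases just described can be summarized by the following decision sketch. It models only the choice of path, not the load/store sequence, the verification procedure, or the atomic instructions themselves; the names TargetMutex, uses_fast_sequence, and acquisition_path, and the string labels returned, are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class TargetMutex:
    uses_fast_sequence: bool  # True if the mutex designates the fast nonatomic sequence
    associated_thread: str    # identifier of the thread associated with the mutex


def acquisition_path(mutex, requesting_thread):
    """Return which procedure a request to acquire `mutex` would follow."""
    if not mutex.uses_fast_sequence:
        # Expensive sequence designated: synchronize with atomic machine instructions.
        return "atomic"
    if requesting_thread == mutex.associated_thread:
        # Associated thread: request granted without atomic hardware operations.
        return "fast-nonatomic"
    # Any other thread: the verification procedure must run first.
    return "verify"
```

Under these assumptions, a mutex associated with thread "T1" yields "fast-nonatomic" for a request from "T1" and "verify" for a request from any other thread, while a mutex that designates the expensive sequence yields "atomic" for every requester.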
In one embodiment of the present invention, the verification procedure in the fast synchronization sequence may comprise forcing the target mutex to designate the expensive synchronization method and therefore defaulting to synchronization using the expensive technique. In another embodiment of the present invention, the verification procedure involves execution of a heuristic function to determine whether to set the mutex to the expensive synchronization method or to associate the requesting thread with the target mutex. This heuristic function may, for example, use a mutex request counter. The counter is incremented each time the mutex is acquired, and decremented by some constant each time the thread associated with the mutex is changed. When the counter is above a threshold, the requesting thread is associated with the target mutex. When the counter is below the threshold, the expensive synchronization technique is executed.
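One possible reading of the counter-based heuristic is sketched below as a single-threaded model of the bookkeeping only. The class name HeuristicMutex, the default threshold and penalty values, and the path labels are hypothetical. Two further assumptions are made where the description is silent: a counter exactly equal to the threshold is treated as below it, and the steps a real system would need to confirm that the previously associated thread is not operating on the mutex are omitted.

```python
class HeuristicMutex:
    """Models which synchronization path each acquisition request would take."""

    def __init__(self, associated_thread, threshold=4, penalty=8):
        self.associated_thread = associated_thread
        self.fast = True            # mutex initially designates the fast sequence
        self.counter = 0            # mutex request counter
        self.threshold = threshold  # assumed value; not specified in the description
        self.penalty = penalty      # assumed constant subtracted on each reassociation

    def acquire(self, thread):
        if not self.fast:
            path = "atomic"
        elif thread == self.associated_thread:
            path = "fast"
        elif self.counter > self.threshold:
            # Counter above threshold: associate the requesting thread with the mutex.
            self.associated_thread = thread
            self.counter -= self.penalty
            path = "reassociated"
        else:
            # Counter not above threshold: fall back to the expensive technique.
            self.fast = False
            path = "atomic"
        self.counter += 1           # incremented each time the mutex is acquired
        return path
```

With these values, five acquisitions by the associated thread raise the counter to 5, so the next request from a different thread is reassociated and leaves the counter at -2; an immediate request from the original thread then finds the counter below the threshold and switches the mutex to the expensive technique.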