It is common in today's multiprocessing and multi-threaded computing environments for various executable units running on a computer system to share data by reading and writing data structures residing in shared memory. Sharing data in this manner provides an efficient mechanism for threads to communicate information with one another.
A common problem associated with using data structures in shared memory is managing multiple simultaneous requests to access the data structures and ensuring that accesses to the data are atomic. Guaranteeing atomic access is important because it ensures that the data structure is completely updated before another thread attempts to use the data. As an example, consider a data structure that is 32 bytes long. Without atomic access, one thread may have updated 16 bytes of the data structure when a second thread reads the data structure. The reading thread will read a corrupt version of the data structure, because the first 16 bytes will be new data while the last 16 bytes will be old data.
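The 32-byte example above can be illustrated with a small deterministic simulation. This is only a sketch: no real threads are used, the interleaving of writer and reader is written out by hand, and the names `shared`, `torn_snapshot`, and `is_consistent` are hypothetical.

```python
# Deterministic simulation of a non-atomic update of a 32-byte structure.
# The "writer" and "reader" steps are interleaved by hand for illustration.

STRUCT_SIZE = 32
HALF = STRUCT_SIZE // 2

shared = bytearray(b"A" * STRUCT_SIZE)  # old contents: 32 bytes of "A"
new_value = b"B" * STRUCT_SIZE          # contents the writer wants to store

# Step 1: the writer has copied only the first 16 bytes.
shared[:HALF] = new_value[:HALF]

# Step 2: the reader copies the structure while the update is half done.
torn_snapshot = bytes(shared)

# Step 3: the writer finishes the remaining 16 bytes.
shared[HALF:] = new_value[HALF:]


def is_consistent(snapshot: bytes) -> bool:
    """A snapshot is consistent if every byte belongs to the same version."""
    return len(set(snapshot)) == 1


print(is_consistent(torn_snapshot))   # False: 16 new bytes followed by 16 old bytes
print(is_consistent(bytes(shared)))   # True: the completed update
```

The reader's copy mixes two versions even though the writer eventually stores a correct value, which is the corruption that atomic access is meant to prevent.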
Atomic access to a data structure can be guaranteed by the hardware when the data structure meets the size and alignment restrictions imposed by the particular hardware (typically the size of a machine word or floating-point number). Atomic access cannot be guaranteed by the hardware for data structures that do not meet these restrictions. For example, on the Intel IA32 and IA32-compatible architectures, a data structure can be read atomically by the hardware only if it is 64 bits or smaller. In addition, to be read atomically, a 64-bit data structure must be aligned on a 64-bit memory boundary and a 32-bit data structure must be aligned on a 32-bit memory boundary.
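The size and alignment rule described above can be written as a simple predicate. This is a rough sketch, not a statement of what any particular processor guarantees: the function name `hardware_can_read_atomically` is hypothetical, and treating 8-bit and 16-bit objects as requiring natural alignment is a conservative assumption that the text does not state.

```python
# Sizes in bytes that the sketch treats as candidates for an atomic read.
# 8 bytes (64 bits) is the upper limit given in the text; the 1- and 2-byte
# entries are an assumption added for completeness.
ATOMIC_SIZES = (1, 2, 4, 8)


def hardware_can_read_atomically(address: int, size_bytes: int) -> bool:
    """Return True if an object of size_bytes at address meets the rule."""
    if size_bytes not in ATOMIC_SIZES:
        return False                    # e.g. a 32-byte structure is too large
    return address % size_bytes == 0    # must sit on a boundary of its own size


print(hardware_can_read_atomically(0x1000, 8))    # True: 64-bit object, 64-bit boundary
print(hardware_can_read_atomically(0x1004, 8))    # False: 64-bit object, misaligned
print(hardware_can_read_atomically(0x1004, 4))    # True: 32-bit object, 32-bit boundary
print(hardware_can_read_atomically(0x1000, 32))   # False: larger than 64 bits
```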
To allow atomic reads of data structures larger than those supported by the hardware, previous systems have provided software mechanisms that guarantee atomic reads. One such mechanism is a lock. In systems using a lock, a thread that requires access to a shared data structure first acquires a lock on the data structure, typically using a function provided by the operating system. The thread then updates the data structure and, once the update is complete, releases the lock. Other threads that require access to the data structure may also attempt to acquire the lock. If an attempt occurs while another thread holds the lock, the attempt will fail, and the requesting thread will block or wait until the lock becomes available.
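The acquire, update, and release sequence described above might look as follows. This is a minimal sketch that uses Python's `threading.Lock` in place of an operating-system lock function; the names `shared`, `shared_lock`, `write_version`, and `read_snapshot` are hypothetical.

```python
import threading

STRUCT_SIZE = 32
shared = bytearray(STRUCT_SIZE)   # the shared 32-byte structure, initially all zero
shared_lock = threading.Lock()    # lock guarding `shared`
torn_reads = []                   # snapshots that mixed two versions, if any


def write_version(value: int) -> None:
    with shared_lock:                  # acquire; blocks while another thread holds it
        for i in range(STRUCT_SIZE):   # multi-step update, not atomic on its own
            shared[i] = value
    # the lock is released on leaving the with-block


def read_snapshot() -> bytes:
    with shared_lock:                  # readers take the same lock
        return bytes(shared)


def writer() -> None:
    for version in range(1, 201):
        write_version(version)


def reader() -> None:
    for _ in range(200):
        snapshot = read_snapshot()
        if len(set(snapshot)) != 1:    # bytes from two different versions
            torn_reads.append(snapshot)


threads = [threading.Thread(target=writer)]
threads += [threading.Thread(target=reader) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# A non-blocking attempt fails while the lock is already held.
shared_lock.acquire()
second_attempt = shared_lock.acquire(blocking=False)
shared_lock.release()

print(len(torn_reads))   # 0: every snapshot was taken under the lock
print(second_attempt)    # False
```

Because every access takes the same lock, readers also exclude one another here, which is one motivation for distinguishing shared locks from exclusive locks.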
Two types of locks are typically provided: exclusive locks and shared locks. An exclusive lock is used by a thread that is writing a data structure. The writing thread has exclusive access while it holds the lock; no other thread may read from or write to the data structure. A shared lock is typically used by a thread that is reading a data structure. A shared lock allows other threads to read the data structure, but does not allow any thread to write to the data structure while the shared lock is in effect.
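The difference between the two lock types can be sketched with a small shared/exclusive lock built on `threading.Condition`. The class `SharedExclusiveLock` and its method names are hypothetical, and the sketch makes no attempt at fairness (a steady stream of readers could starve a writer); it is meant only to show which combinations of holders are allowed.

```python
import threading


class SharedExclusiveLock:
    """Minimal shared/exclusive lock; illustrative only, with no fairness policy."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0       # number of threads holding the shared lock
        self._writer = False    # True while a thread holds the exclusive lock

    def acquire_shared(self, blocking: bool = True) -> bool:
        with self._cond:
            while self._writer:             # readers wait only for a writer
                if not blocking:
                    return False
                self._cond.wait()
            self._readers += 1
            return True

    def release_shared(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()     # a waiting writer may now proceed

    def acquire_exclusive(self, blocking: bool = True) -> bool:
        with self._cond:
            while self._writer or self._readers > 0:   # wait for everyone
                if not blocking:
                    return False
                self._cond.wait()
            self._writer = True
            return True

    def release_exclusive(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()


lock = SharedExclusiveLock()
first_reader = lock.acquire_shared()                           # True
second_reader = lock.acquire_shared(blocking=False)            # True: readers coexist
writer_during_read = lock.acquire_exclusive(blocking=False)    # False: readers present
lock.release_shared()
lock.release_shared()
writer_after_read = lock.acquire_exclusive(blocking=False)     # True: no holders left
reader_during_write = lock.acquire_shared(blocking=False)      # False: writer present
lock.release_exclusive()
```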
While software locks allow exclusive and atomic access to data structures, locks are expensive in terms of central processing unit (CPU) and memory resources. The locking mechanisms are routines built around simpler data structures that can be atomically updated via hardware or firmware mechanisms. In addition to the overhead of the software used to implement the lock, accesses to the lock data structures can cause pipeline stalls, poor memory-bus utilization, and cache misses. These problems can occur regardless of whether the lock is exclusive or shared, and they are compounded when a large number of threads need to access the same shared data structure.
Therefore, there is a need in the art for a way to atomically access a data structure that is more efficient than the software lock mechanisms used in previous systems.