From the very beginning of the computer industry, there has been a constant demand for improving the performance of systems, either to run applications faster than before or to run applications that can produce results in an acceptable time frame. One method for improving the performance of computer systems is to have the system run processes or portions of a process (e.g., a thread) in parallel with one another on a system having multiple processors.
A queue is a data structure commonly used for thread communication and synchronization. Threads can put a data item into the queue (referred to as enqueuing) and other threads can get a data item from the queue (referred to as dequeuing) in a first-in-first-out (FIFO) manner. In this way, data are communicated between threads and the activities of the involved threads may be coordinated. Multithreaded applications such as streaming computation, packet processing computation, data-flow computation, etc. employ queues extensively. For example, a multithreaded network application may have one thread process the packet header and then pass the packet through a queue to another thread for packet payload processing.
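The packet-processing example above can be sketched as a two-stage pipeline. This is only an illustration, assuming Python's standard `queue.Queue` as the queue; the names `header_stage`, `payload_stage`, `run_pipeline`, and `SENTINEL`, and the dictionary layout of a packet, are hypothetical and do not come from the original text.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker, not part of the source text


def header_stage(packets, out_q):
    """First thread: 'process' each packet header, then enqueue the packet."""
    for pkt in packets:
        pkt = dict(pkt, header_done=True)  # stand-in for real header processing
        out_q.put(pkt)                     # enqueue
    out_q.put(SENTINEL)                    # tell the consumer there is no more data


def payload_stage(in_q, results):
    """Second thread: dequeue packets in arrival order and process the payload."""
    while True:
        pkt = in_q.get()                   # dequeue; blocks until an item is available
        if pkt is SENTINEL:
            break
        results.append((pkt["id"], pkt["payload"].upper()))


def run_pipeline(packets):
    """Run both stages in parallel and return the processed payloads."""
    q = queue.Queue()
    results = []
    producer = threading.Thread(target=header_stage, args=(packets, q))
    consumer = threading.Thread(target=payload_stage, args=(q, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

Because the queue is FIFO and there is a single consumer, the payloads are processed in the same order in which the header stage enqueued them.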
A major potential performance bottleneck of a queue implementation involves providing concurrent access control, i.e., guaranteeing the correctness of concurrent accesses to the queue entries by multiple threads while still maintaining the FIFO property. In existing implementations, the mechanisms used may include: 1) a mutually exclusive or atomic critical section, typically implemented with locks or speculation; 2) atomic instructions that perform multiple memory accesses, such as load-modify-store, or more generally atomic instructions for multiple memory operations, such as atomic compare-exchange or atomic swap; or 3) thread scheduler coordination, such as the task ready queue in a Linux kernel. Unfortunately, each of these mechanisms may introduce significant performance penalties in order to guarantee correct concurrent access to a queue.
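As an illustration of the first mechanism only (a mutually exclusive critical section implemented with a lock), a bounded queue might be sketched as follows. The class name `LockedQueue` and its methods are hypothetical, and the sketch is not taken from any particular existing implementation. Every enqueue and dequeue must acquire the same lock, so concurrent accesses are serialized, which is where the performance penalty mentioned above can arise.

```python
import threading
from collections import deque


class LockedQueue:
    """Hypothetical bounded FIFO queue guarded by a single lock."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        # Both conditions share the one lock, so all accesses are serialized.
        self._not_empty = threading.Condition(self._lock)
        self._not_full = threading.Condition(self._lock)

    def enqueue(self, item):
        with self._not_full:                  # enter the critical section
            while len(self._items) >= self._capacity:
                self._not_full.wait()         # releases the lock while waiting
            self._items.append(item)
            self._not_empty.notify()          # wake one waiting consumer

    def dequeue(self):
        with self._not_empty:                 # enter the same critical section
            while not self._items:
                self._not_empty.wait()
            item = self._items.popleft()      # oldest item first (FIFO)
            self._not_full.notify()           # wake one waiting producer
            return item
```

A producer thread calling `enqueue` and a consumer thread calling `dequeue` never touch the queue entries at the same time, which guarantees correctness at the cost of lock acquisition on every operation.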