A modern computer system has at least a microprocessor and some form of memory. Generally, the processor retrieves data stored in the memory, processes or otherwise uses the retrieved data to obtain a result, and stores the result in the memory.
One type of computer system uses a single processor to perform the operations of the computer system. In such a single processor (or “uniprocessor”) computer system, memory requests arrive at the memory serially. However, as described below with reference to FIG. 1, in a computer system that uses multiple processors, at least partly to increase data throughput through parallel processing (i.e., simultaneous processing by two or more processors), the memory shared by the processors may receive multiple memory requests that overlap in both time and space.
FIG. 1 shows a typical multiprocessor system (100). In FIG. 1, multiple processors (102, 104) share a memory (106) formed of numerous individual memory locations. An important design consideration in shared memory multiprocessor systems is balancing workloads among processors. When a particular processor finds that it lacks the resources to perform a particular thread of work, the processor may obtain the necessary resources from a processor that has such resources available. Such a technique is known as a “work-stealing” technique.
In a typical work-stealing technique such as, for example, that described in “Thread Scheduling for Multiprogrammed Multiprocessors” by N. Arora et al., each process maintains its own pool of ready threads from which the process obtains work resources. If the pool of a particular process becomes empty (due to, for example, heavy work demand on the process), that process becomes a “thief” and steals a thread from the pool of a “victim” process chosen at random as discussed below with reference to FIG. 2.
As shown in FIG. 2, a pool of threads (200) for a process is maintained with a fixed-size double-ended queue (deque (202)), which has a top index that indexes the top thread and a variable bottom index that indexes the deque location below the bottom thread. Further, the deque has an array pointer that points to an active array of the deque. In general, the typical work-stealing technique involves a collection of deque data structures as shown in FIG. 2, where a local process performs pushes and pops on the “bottom” end of its deque and a thief process performs a pop on the “top” end of a victim process's deque. A pop operation is also referred to as a “removal-type operation.”
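The deque of FIG. 2 can be modeled sequentially as a fixed-size array with two indexes. The sketch below is a minimal, single-threaded illustration under stated assumptions: the class and method names (`FixedDeque`, `push_bottom`, `pop_bottom`, `pop_top`) are hypothetical, threads are represented by plain strings, and no synchronization is performed. The top index names the top thread's slot and the bottom index names the free slot just below the bottom thread, so the deque holds `bottom - top` threads.

```python
class FixedDeque:
    """Hypothetical sequential model of the fixed-size deque of FIG. 2.

    top indexes the top thread; bottom indexes the slot just below the
    bottom thread, so the deque currently holds bottom - top threads.
    """

    def __init__(self, capacity):
        self.array = [None] * capacity  # the active array the array pointer refers to
        self.top = 0
        self.bottom = 0

    def size(self):
        return self.bottom - self.top

    def push_bottom(self, thread):
        # Local process only: store the thread in the free slot below the bottom thread.
        if self.bottom == len(self.array):
            raise OverflowError("deque overflow")  # fixed size: no room to grow
        self.array[self.bottom] = thread
        self.bottom += 1

    def pop_bottom(self):
        # Local process only: removal-type operation at the bottom end.
        if self.size() == 0:
            return None
        self.bottom -= 1
        thread = self.array[self.bottom]
        self.array[self.bottom] = None
        return thread

    def pop_top(self):
        # Thief process: removal-type operation at the top end.
        if self.size() == 0:
            return None
        thread = self.array[self.top]
        self.array[self.top] = None
        self.top += 1
        return thread


d = FixedDeque(4)
for name in ("t1", "t2", "t3"):
    d.push_bottom(name)
newest = d.pop_bottom()  # "t3": the local process takes the bottom thread
oldest = d.pop_top()     # "t1": a thief takes the top thread
```

Note that a steal advances `top`, so the slots above it stay unused until the indexes are moved back.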
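Further, those skilled in the art will recognize that, for n processes and a total allocated memory size m, each deque may be allocated at most m/n of that memory. Because a deque cannot grow into space left unused by other deques, a single busy process can overflow its deque even when the system as a whole has memory to spare. Accordingly, designers often have to implement costly mechanisms to manage deque overflow.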
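The arithmetic of the static partition can be made concrete. In this sketch the function names and the demand figures are hypothetical, memory is counted in deque slots, and the m/n share is taken with integer division (an assumption). One process needs more slots than its share while total demand stays well below m.

```python
def per_deque_capacity(m, n):
    # Static partition: each of the n deques gets a fixed share of the m slots
    # (integer division is an assumption; the text states the share as m/n).
    return m // n


def overflowing_processes(demand, m):
    """Return the indexes of processes whose demand exceeds their fixed share.

    demand[i] is the (hypothetical) peak number of threads that process i
    must hold in its deque at one time.
    """
    cap = per_deque_capacity(m, len(demand))
    return [i for i, need in enumerate(demand) if need > cap]


demand = [300, 10, 0, 0]                      # hypothetical peak occupancy per process
print(per_deque_capacity(1024, len(demand)))  # 256
print(sum(demand))                            # 310, well under m = 1024
print(overflowing_processes(demand, 1024))    # [0]
```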
To obtain work, i.e., to obtain a thread, a process pops a ready thread from the bottom of its deque and commences executing that thread. The process continues to execute that thread until the thread either blocks or terminates, at which point the process returns to its deque to obtain another ready thread. During the course of executing a thread, if a new thread is created or a blocked thread is unblocked, the process pushes that thread onto the bottom of its deque. Alternatively, the process may preempt the thread it was executing, push that thread onto the bottom of its deque, and commence executing the newly available ready thread. Those skilled in the art will recognize that as long as the deque of a process is non-empty, the process manipulates its deque in a last-in-first-out (LIFO) manner.
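The local loop above can be sketched with a Python `collections.deque`, using its right end as the "bottom." This is a minimal model under stated assumptions: `run_owner` and the `spawns` table are hypothetical, "executing" a thread just records it and then pushes the threads it creates, and preemption is not modeled. The resulting order shows the LIFO behavior.

```python
from collections import deque


def run_owner(initial_threads, spawns):
    """Hypothetical model of one process working through its own deque.

    spawns maps a thread name to the threads it creates while running;
    those are pushed onto the bottom of the deque. Returns the order in
    which threads were executed.
    """
    dq = deque(initial_threads)  # the right end plays the role of the "bottom"
    order = []
    while dq:                    # deque non-empty: keep working locally
        thread = dq.pop()        # pop a ready thread from the bottom
        order.append(thread)     # "execute" it until it blocks or terminates
        for child in spawns.get(thread, []):
            dq.append(child)     # newly created threads are pushed onto the bottom
    return order                 # deque empty: the process would now become a thief


print(run_owner(["A"], {"A": ["B", "C"], "C": ["D"]}))  # ['A', 'C', 'D', 'B']
```

Thread `C`, pushed after `B`, runs first, and its child `D` runs before the older `B`.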
If a process finds that its deque is empty when the process attempts to obtain work by popping a thread off the bottom of its deque, the process becomes a thief. In this case, the thief process picks a victim at random and attempts to “steal” work, i.e., obtain a thread, by removing the thread at the top of the deque belonging to the victim process. If the deque of the victim process is empty, the thief process picks another victim process and tries to steal work again. The thief process repeatedly attempts to steal work until it finds a victim process whose deque is non-empty, at which point the thief process “reforms” (i.e., ceases to be a thief) and commences work on the stolen thread as discussed above. Those skilled in the art will recognize that because work-stealing takes place at the top of a victim process's deque, work-stealing operates in a first-in-first-out (FIFO) manner.
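The thief's retry loop can be sketched as follows, with each deque's left end serving as its "top." The function `steal`, its parameters, and the `max_attempts` bound are hypothetical; the bound exists only so the sketch terminates when every deque is empty, whereas the text describes the thief retrying until it succeeds. Concurrent access is not modeled.

```python
import random
from collections import deque


def steal(deques, thief_id, rng, max_attempts=1000):
    """Hypothetical thief loop: pick random victims until one has work.

    Each deque's left end is its "top". Returns (victim, thread), or None
    if max_attempts (a bound added only for this sketch) runs out.
    """
    victims = [i for i in range(len(deques)) if i != thief_id]
    for _ in range(max_attempts):
        victim = rng.choice(victims)  # pick a victim at random
        if deques[victim]:            # non-empty: remove the thread at the top
            return victim, deques[victim].popleft()
        # victim's deque is empty: pick another victim and try again
    return None


deques = [deque(), deque(), deque(["t1", "t2", "t3"]), deque()]
rng = random.Random(42)
first = steal(deques, thief_id=0, rng=rng)   # (2, 't1'): the oldest thread
second = steal(deques, thief_id=0, rng=rng)  # (2, 't2'): steals proceed FIFO
```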
When a thief process and a victim process concurrently attempt to obtain the same thread from the victim process's deque, a synchronization operation must be invoked to ensure that only one of them removes that thread. This scenario is detected by examining the gap between the top and bottom indexes: if the indexes are “too close” (for example, when the deque holds only a single thread), a synchronization operation using a known non-blocking primitive such as Compare&Swap or Load-Linked/Store-Conditional may be invoked.
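The check-the-gap-then-synchronize pattern can be sketched in the style of the Arora et al. deque, in which a combined `(tag, top)` word is changed only by Compare&Swap. This is an illustrative model, not a faithful concurrent implementation: Python exposes no Compare&Swap, so the hypothetical `AtomicRef` emulates one with a lock, and memory-ordering concerns of real hardware are ignored. All class, method, and constant names are hypothetical. The owner pays for a CAS only when the bottom index has come down to the top index.

```python
import threading

EMPTY, ABORT = "EMPTY", "ABORT"


class AtomicRef:
    """Stand-in for a Compare&Swap word (assumption: a lock makes it atomic)."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def set(self, value):
        with self._lock:
            self._value = value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


class ABPDeque:
    """Hypothetical sketch: age = (tag, top) changes only via CAS or by the owner."""

    def __init__(self, capacity):
        self.deq = [None] * capacity
        self.bottom = 0                 # written only by the owning process
        self.age = AtomicRef((0, 0))    # (tag, top)

    def push_bottom(self, thread):
        if self.bottom == len(self.deq):
            raise OverflowError("deque overflow")
        self.deq[self.bottom] = thread
        self.bottom += 1

    def pop_top(self):                  # called by a thief
        old = self.age.get()
        tag, top = old
        if self.bottom <= top:
            return EMPTY
        thread = self.deq[top]
        if self.age.compare_and_swap(old, (tag, top + 1)):
            return thread
        return ABORT                    # lost the race for this thread

    def pop_bottom(self):               # called by the owner
        local_bot = self.bottom
        if local_bot == 0:
            return EMPTY
        local_bot -= 1
        self.bottom = local_bot
        thread = self.deq[local_bot]
        old = self.age.get()
        tag, top = old
        if local_bot > top:             # indexes far apart: no CAS needed
            return thread
        # Indexes "too close": at most this thread remains, so race thieves via CAS.
        self.bottom = 0
        new = (tag + 1, 0)              # reset top to 0; bump tag so stale CASes fail
        if local_bot == top and self.age.compare_and_swap(old, new):
            return thread               # owner won the last thread
        self.age.set(new)               # a thief took it: leave the deque empty
        return EMPTY


d = ABPDeque(8)
for name in ("a", "b", "c"):
    d.push_bottom(name)
stolen = d.pop_top()   # "a": the thief's CAS on age succeeds
own = d.pop_bottom()   # "c": bottom is well above top, so no CAS is needed
last = d.pop_bottom()  # "b": the last thread is claimed by CAS and the indexes reset
```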
Due to the fixed-size memory space dedicated to each process in a typical work-stealing technique, applications that use the work-stealing technique (e.g., garbage collection) implement specific blocking mechanisms to handle overflow situations. One approach used to reduce how often overflow must be handled is to reset top and bottom to index the beginning of the deque every time an empty deque condition is detected. Although such a reset operation may reduce the number of times overflow occurs, costly mechanisms to manage overflow are still needed for the cases in which overflow does occur.
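The effect of the reset can be compared in a small sequential model that tracks only the two indexes. The function `count_overflows` and the operation names are hypothetical. Here an empty condition is assumed to be detected when the owner pops and finds the deque empty. The first scenario overflows only without the reset. The second overflows either way because the deque genuinely fills.

```python
def count_overflows(ops, capacity, reset_on_empty):
    """Hypothetical sequential model: count pushes that find no free slot.

    ops holds "push" and "pop" (owner, bottom end) and "steal" (thief, top
    end). With reset_on_empty, an owner pop that finds the deque empty
    moves top and bottom back to index 0.
    """
    top = bottom = 0
    overflows = 0
    for op in ops:
        if op == "push":
            if bottom == capacity:
                overflows += 1      # a real system would invoke its overflow handler here
            else:
                bottom += 1
        elif op == "pop":
            if bottom > top:
                bottom -= 1
            elif reset_on_empty:
                top = bottom = 0    # empty deque detected: reset both indexes
        elif op == "steal":
            if bottom > top:
                top += 1
        else:
            raise ValueError(f"unknown op {op!r}")
    return overflows


drained = ["push", "push", "steal", "steal", "pop", "push", "push", "push"]
print(count_overflows(drained, 4, reset_on_empty=False))  # 1
print(count_overflows(drained, 4, reset_on_empty=True))   # 0
print(count_overflows(["push"] * 5, 4, reset_on_empty=True))  # 1
```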