1. Field of the Invention
The present invention is directed to parallel execution in computer processes. It particularly concerns the manner in which different execution threads terminate their work on parallel operations.
2. Background Information
Modern computer systems provide for various types of concurrent operation. A user of a typical desktop computer, for instance, may be simultaneously employing a word-processor program and an e-mail program together with a calculator program. The user's computer could be using several simultaneously operating processors, each of which could be operating on a different program. More typically, the computer employs only a single main processor, and its operating-system software causes that processor to switch from one program to another rapidly enough that the user cannot usually tell that the different programs are not really executing simultaneously. The different running programs are usually referred to as “processes” in this connection, and the change from one process to another is said to involve a “context switch.” In a context switch one process is interrupted, and the contents of the program counter, call stacks, and various registers are stored, including those used for memory mapping. Then the corresponding values previously stored for a previously interrupted process are loaded, and execution resumes for that process. Processor hardware and operating-system software typically have special provisions for performing such context switches.
A program running as a computer-system process may take advantage of such provisions to provide separate, concurrent “threads” of its own execution. In such a case, the program counter and various register contents are stored and reloaded with a different thread's value, as in the case of a process change, but the memory-mapping values are not changed, so the new thread of execution has access to the same process-specific physical memory as the same process's previous thread.
In some cases, the use of multiple execution threads is merely a matter of programming convenience. For example, compilers for various programming languages, such as the Java programming language, readily provide the “housekeeping” for spawning different threads, so the programmer is not burdened with handling the details of making different threads' execution appear simultaneous. In the case of multiprocessor systems, though, the use of multiple threads has speed advantages. A process can be performed more quickly if the system allocates different threads to different processors when processor capacity is available.
To take advantage of this fact, programmers often identify constituent operations with their programs that particularly lend themselves to parallel execution. When program execution reaches a point where the parallel-execution operation can begin, it starts different execution threads to perform different tasks within that operation.
Now, in some parallel-execution operations the tasks to be performed can be identified only dynamically; that is, some of the tasks can be identified only by performing others of the tasks, so the tasks cannot be divided among the threads optimally at the beginning of the parallel-execution operation. Such parallel-execution operations can occur, for instance, in what has come to be called “garbage collection,” which is the automatic reclamation of dynamically allocated memory. Byte code executed by a Java virtual machine, for instance, often calls for memory to be allocated for data “objects” if certain program branches are taken. Subsequently, a point in the byte-code program's execution can be reached at which there is no further possibility that the data stored in that dynamically allocated memory will be used. Without requiring the programmer to provide specific instructions to do so, the virtual machine executing the byte code automatically identifies such “unreachable” objects and reclaims their memory so that objects allocated thereafter can use it.
The general approach employed by the virtual machine's garbage collector is to identify all objects that are reachable and then reclaim memory that no such reachable object occupies. An object is considered reachable if it is referred to by a reference in a “root set” of locations, such as global variables, registers, or the call stack, that are recognized as being inherently reachable. An object is also reachable if it is referred to by a reference in a reachable object.
So reachable-object identification is a recursive process: the identification of a reachable object can lead to identification of further reachable objects. And, if every reachable object so far identified is thought of as representing a further task, namely, that of identifying any further objects to which it refers, it can be seen that parts of the garbage-collection process include tasks that are only dynamically identifiable. If those tasks are properly programmed, they can be performed in an essentially parallel manner. Specifically, the initial, statically identifiable members of the root set can be divided among a plurality of threads (whose execution will typically be divided among many processors), and those threads can identify reachable objects in parallel.
Now, a point may be reached at which every remaining identified reachable object has already been claimed by a thread for processing. In such a situation, it may initially appear desirable for other threads to proceed to the next operation: all of the parallel-execution operation's tasks have been completed. But allowing them to do so could result in highly suboptimal performance. Since reachable-object identification occurs dynamically, the threads still performing tasks of the parallel-execution operation could end up identifying a large number of further tasks, and they would be left to perform those tasks alone. So it is important to be able to detect the fact that a parallel-execution operation may not be completed, even when none of its tasks is currently available for a thread to perform.