1. Field of the Invention
The present invention is directed to multi-threaded operation in computer systems. It particularly concerns how to allocate tasks among different threads.
2. Background Information
Modern computer systems provide for various types of concurrent operation. A user of a typical desktop computer, for instance, may be simultaneously employing a word-processor program and an e-mail program together with a calculator program. The user's computer could be using several simultaneously operating processors, each of which could be operating on a different program. More typically, the computer employs only a single main processor, and its operating-system software causes that processor to switch from one program to another rapidly enough that the user cannot usually tell that the different programs are not really executing simultaneously. The different running programs are usually referred to as “processes” in this connection, and the change from one process to another is said to involve a “context switch.” In a context switch one process is interrupted, and the contents of the program counter, call stacks, and various registers are stored, including those used for memory mapping. Then the corresponding values previously stored for a previously interrupted process are loaded, and execution resumes for that process. Processor hardware and operating-system software typically have special provisions for performing such context switches.
A program running as a computer-system process may take advantage of such provisions to provide separate, concurrent “threads” of its own execution. In such a case, the program counter and various register contents are stored and reloaded with a different thread's values, as in the case of a process change, but the memory-mapping values are not changed, so the new thread of execution has access to the same process-specific physical memory as the same process's previous thread.
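The shared-memory property just described can be illustrated with a minimal sketch. It uses Python's standard threading module purely for illustration; the names shared and worker are hypothetical and do not come from the system described here.

```python
import threading

# A single list object in the process's memory; every thread sees the same one.
shared = []

def worker(name):
    # Each thread has its own program counter and stack,
    # but it writes into the same process-wide list.
    shared.append(name)

threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All three writes are visible in the one shared list (order may vary).
print(sorted(shared))
```

Separate processes, by contrast, would each have their own memory mapping and would not see one another's list.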
In some cases, the use of multiple execution threads is merely a matter of programming convenience. For example, compilers for various programming languages, such as the Java programming language, readily provide the “housekeeping” for spawning different threads, so the programmer is not burdened with handling the details of making different threads' execution appear simultaneous. In the case of multiprocessor systems, though, the use of multiple threads has speed advantages. A process can be performed more quickly if the system allocates different threads to different processors when processor capacity is available.
To take advantage of this fact, programmers often identify constituent operations within their programs that particularly lend themselves to parallel execution. When program execution reaches a point where the parallel-execution operation can begin, it starts different execution threads to perform different tasks within that operation.
Now, in some parallel-execution operations the tasks to be performed can be identified only dynamically; that is, some of the tasks can be identified only by performing others of the tasks, so the tasks cannot be divided among the threads optimally at the beginning of the parallel-execution operation. Such parallel-execution operations can occur, for instance, in what has come to be called “garbage collection,” which is the automatic reclamation of dynamically allocated memory. Byte code executed by a Java virtual machine, for instance, often calls for memory to be allocated for data “objects” if certain program branches are taken. Subsequently, a point in the byte-code program's execution can be reached at which there is no further possibility that the data stored in that dynamically allocated memory will be used. Without requiring the programmer to provide specific instructions to do so, the virtual machine executing the byte code automatically identifies such “unreachable” objects and reclaims their memory so that objects allocated thereafter can use it.
The general approach employed by the virtual machine's garbage collector is to identify all objects that are reachable and then reclaim memory that no such reachable object occupies. An object is considered reachable if it is referred to by a reference in a “root set” of locations, such as global variables, registers, or the call stack, that are recognized as being inherently reachable. An object is also reachable if it is referred to by a reference in a reachable object.
So reachable-object identification is a recursive process: the identification of a reachable object can lead to identification of further reachable objects. And, if every reachable object so far identified is thought of as representing a further task, namely, that of identifying any further objects to which it refers, it can be seen that parts of the garbage-collection process include tasks that are identifiable only dynamically. If those tasks are properly programmed, they can be performed in an essentially parallel manner. Specifically, the initial, statically identifiable members of the root set can be divided among a plurality of threads (whose execution will typically be divided among many processors), and those threads can identify reachable objects in parallel.
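The way in which scanning one reachable object yields further tasks can be sketched as follows. This is a single-threaded illustration under simplifying assumptions: the heap is modeled as a dictionary from a hypothetical object identifier to the identifiers it references, and the names mark_reachable, refs, and worklist are illustrative only.

```python
def mark_reachable(roots, refs):
    """Return the set of object ids reachable from the root set.

    refs maps each object id to the ids of the objects it references.
    """
    marked = set()
    worklist = list(roots)          # the statically identifiable initial tasks
    while worklist:
        obj = worklist.pop()
        if obj in marked:
            continue                # already identified as reachable
        marked.add(obj)
        # Scanning obj is one task; the references it holds are the
        # dynamically identified tasks that follow from it.
        worklist.extend(refs.get(obj, ()))
    return marked

# Hypothetical heap: "e" and "f" are not referred to from the root "a".
heap = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["a"], "e": ["f"], "f": []}
reachable = mark_reachable(["a"], heap)
garbage = set(heap) - reachable     # memory that could be reclaimed
```

In this example the tasks for "b", "c", and "d" cannot be known until "a" has been scanned, which is why the work cannot be divided up completely in advance.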
Now, each thread could maintain a list of the tasks that it has thus identified dynamically, and it could proceed to perform all tasks that it has thus identified. But much of the advantage of parallel processing may be lost if each thread performs only those tasks that it has itself identified. That is, one thread may encounter a number of objects that have a large number of references, while others may not. This leaves one thread with many more tasks than the others, so for a significant amount of time the other threads will have finished all of their tasks in the parallel-execution operation and will sit idle, while that one thread still has most of its tasks yet to be performed.
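The imbalance can be made concrete with a small simulation. It is a sketch under stated assumptions: the workers are simulated one after another rather than run as real threads, the heap is a hypothetical dictionary of references, and the function name mark_without_sharing is illustrative.

```python
def mark_without_sharing(root_partitions, refs):
    """Simulate workers that scan only the objects they discovered themselves.

    Returns the number of objects scanned by each worker.
    """
    marked = set()
    counts = []
    for roots in root_partitions:       # one iteration per simulated worker
        own_tasks = list(roots)         # this worker's private task list
        scanned = 0
        while own_tasks:
            obj = own_tasks.pop()
            if obj in marked:
                continue
            marked.add(obj)
            scanned += 1
            own_tasks.extend(refs.get(obj, ()))
        counts.append(scanned)
    return counts

# Hypothetical heap: root "r0" refers to 100 objects, the other roots to none.
refs = {"r0": [f"x{i}" for i in range(100)], "r1": [], "r2": [], "r3": []}
counts = mark_without_sharing([["r0"], ["r1"], ["r2"], ["r3"]], refs)
print(counts)   # [101, 1, 1, 1]
```

Although the roots were divided evenly, one worker ends up with 101 of the 104 scanning tasks, so the other three would be idle for nearly the whole operation.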
As a consequence, such parallel-execution operations are usually so arranged that each thread can perform tasks that other threads have identified. Conventionally, though, this has usually meant that access to the queues that contain identifiers of those tasks needs to be made “thread safe.” Thread safety in most cases can be afforded only by atomically performing sets of machine instructions that could otherwise be performed separately. Particularly in the multiprocessor systems in which parallel execution is especially advantageous, performing such “atomic” operations is quite expensive. So the need for thread safety tends to compromise some of a multiprocessor system's advantages.
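The conventional arrangement can be sketched as a single task queue guarded by a lock. This is an illustrative assumption, not a description of any particular collector: Python's threading.Lock stands in for the atomic operations mentioned above, and parallel_mark, state, and locked_ops are hypothetical names. The counter records how often a thread had to enter a synchronized section.

```python
import threading

def parallel_mark(roots, refs, n_threads=4):
    """Mark reachable objects using one lock-protected task queue."""
    lock = threading.Lock()             # guards tasks, marked, and state
    tasks = list(roots)
    marked = set()
    state = {"busy": 0, "locked_ops": 0}

    def worker():
        while True:
            with lock:                  # synchronized: take a task
                state["locked_ops"] += 1
                if tasks:
                    obj = tasks.pop()
                    if obj in marked:
                        continue
                    marked.add(obj)
                    state["busy"] += 1
                elif state["busy"] == 0:
                    return              # queue empty and no one can refill it
                else:
                    continue            # a busy thread may still add tasks
            children = refs.get(obj, ())    # the scan itself needs no lock
            with lock:                  # synchronized: publish new tasks
                state["locked_ops"] += 1
                tasks.extend(children)
                state["busy"] -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return marked, state["locked_ops"]

heap = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["a"], "e": ["f"], "f": []}
marked, locked_ops = parallel_mark(["a"], heap)
```

Any thread can now perform a task that another thread identified, but every scanned object costs at least two synchronized sections, one to take the task and one to publish what it found. That per-task overhead is the expense the paragraph above refers to.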