1. Field of the Invention
The present invention relates to the distribution of code thread instances to respective processors in a multi-processor digital computing system for execution of the code thread instances.
2. Description of the Related Art
With the advent of cache memory, it has become advantageous to couple multiple processors to a shared memory for general-purpose applications. By providing a dedicated cache memory for each processor, each processor can remain busy nearly 100% of the time by accessing its cache memory most of the time and accessing the shared memory only a small percentage of the time. The shared memory can also be used for communication between the processors.
Since the introduction of the Intel PENTIUM (Trademark) microprocessor, the caches and memory management circuitry have been integrated onto commodity processor chips together with special machine instructions to facilitate the construction of multi-processor systems. See, for example, the Intel MultiProcessor Specification, Version 1.4, May 1997. More recently, the cost of these commodity processor chips has dropped relative to the cost of other computer system components, so that general-purpose systems using commodity processors can be expanded at reasonable incremental cost by substituting multiple-processor circuit boards where single-processor circuit boards were previously used. However, the cost and delay of converting software written for the single-processor circuit boards so that it executes efficiently on the multiple-processor circuit boards have hindered this substitution.
For some application software designed for multi-tasking systems, it is relatively easy to convert the software written for single-processor circuit boards for execution on a multi-processor system. In such applications, the software is subdivided into code threads that are executed to perform independent tasks. In response to a user request to execute an application, a descriptor for a code thread for a task of the application is placed on a task queue. At any given time, the task queue may contain tasks for a multiplicity of applications. A task manager in the computer's operating system timeshares processor execution of the tasks on the task queue. The task manager may change the priorities of the tasks on the task queue, and execution of a task may be interrupted in order to execute a higher priority task. So that an interrupted task can be resumed, each task on the task queue has a respective execution context including the processor's register contents and local variable values at the time of interruption. Each task on the task queue also has a particular state, such as not yet executed, undergoing execution, or suspended for further execution. A task may be suspended from execution, for example, when the task is waiting for a call-back from an input-output device signaling completion of an input-output operation, or when the task is a repetitive task and is waiting for its next time of performance.
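The task queue described above can be illustrated with a minimal sketch. It is not taken from any particular operating system: the names TaskDescriptor, TaskQueue, take_next, and preempt are hypothetical, and the convention that a lower number means a higher priority is an assumption made only for this example.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from enum import Enum, auto


class TaskState(Enum):
    NOT_YET_EXECUTED = auto()
    EXECUTING = auto()
    SUSPENDED = auto()  # suspended for further execution


@dataclass
class TaskDescriptor:
    name: str
    priority: int  # assumption: lower number = higher priority
    state: TaskState = TaskState.NOT_YET_EXECUTED
    # Saved register contents and local variable values (hypothetical layout).
    context: dict = field(default_factory=dict)


class TaskQueue:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO among equal priorities

    def put(self, task):
        heapq.heappush(self._heap, (task.priority, next(self._seq), task))

    def take_next(self):
        """Remove and return the highest-priority task, or None if empty."""
        if not self._heap:
            return None
        _, _, task = heapq.heappop(self._heap)
        task.state = TaskState.EXECUTING
        return task

    def preempt(self, task, saved_context):
        """Save the interrupted task's context and put it back on the queue."""
        task.context = dict(saved_context)
        task.state = TaskState.SUSPENDED
        self.put(task)
```

In this sketch, an interrupted task is handed to preempt together with its saved context, and a later take_next returns the same descriptor with that context intact, so that execution could resume where it left off.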
For the execution of applications having independent tasks, it is relatively easy to execute the code threads on a multi-processor system. Each code thread can be executed on any of the processors, and when a processor is finished with a task, the processor can inspect the task queue to find and begin execution of the next task ready for execution. In general, however, there may be dependencies between the code threads of an application. The operating system or task manager itself may have code threads that have dependencies. Moreover, if each processor in the multi-processor system simply begins execution of the next task ready for execution, then some of the capabilities of a multi-processor system cannot be realized, such as the parallel processing of a task by simultaneous execution on all of the processors. Further problems arise if certain hardware or software functions are dedicated to particular processors in the multi-processor system.
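The simple scheme for independent tasks, in which each processor takes the next ready task from a shared queue, can be sketched as follows. This is an illustration only: worker threads stand in for processors, the names run_workers and processor_loop are hypothetical, and the tasks are assumed to have no dependencies on one another. It shows none of the dependency handling or processor-specific assignment whose absence is noted above.

```python
import queue
import threading


def run_workers(tasks, num_processors):
    """Run independent tasks on worker threads that share one task queue.

    tasks is a list of (name, callable) pairs; the return value maps each
    task name to (worker id, result).
    """
    task_queue = queue.Queue()
    for name, fn in tasks:
        task_queue.put((name, fn))

    results = {}
    results_lock = threading.Lock()

    def processor_loop(cpu_id):
        while True:
            try:
                # Inspect the shared queue for the next task ready to run.
                name, fn = task_queue.get_nowait()
            except queue.Empty:
                return  # nothing left to run; this worker goes idle
            value = fn()
            with results_lock:
                results[name] = (cpu_id, value)

    workers = [
        threading.Thread(target=processor_loop, args=(i,))
        for i in range(num_processors)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

Which worker ends up running a given task is not determined in advance here, and each task runs on exactly one worker, which is why this scheme alone cannot spread a single task across all of the processors.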
Dependencies among code threads, and between code threads and functions of particular processors in a multi-processor system, have been dealt with by additional overhead in the task manager. The task manager may provide capabilities for shared and exclusive task locking that attempt to avoid the so-called “spin locks” at the processor level. For tasks that are not conflicting, the task manager may assign a task to a selected one of the processors based on load balancing considerations. For example, the task manager may attempt to determine or monitor a desired or actual level of multi-tasking activity and assign each task to a processor for which the task has an affinity or at least neutrality in terms of relative execution speed. Unfortunately, task manager overhead has a significant impact on execution speed, and a supervisory system may produce results that the programmer might not anticipate. What is desired is a solution providing general applicability, minimal overhead, ease of implementation, and predictable results.