In Symmetric Multi-Processor (SMP) systems, it is desirable for long-running processes to continue to run on the same processor so that their working set of data and text will remain in a single processor's cache. When a process migrates from one processor to another, it must re-load its working set of text from main memory (unless that text is already resident in the new processor's cache) and it must transfer its working set of modified data from the old processor's cache into the new processor's cache. These loading and transfer operations can be very time-consuming, depending on the amount of data to be transferred, the bandwidth and utilization of the processor-memory bus, and the access latency of main memory.
The Operating System's process scheduler is responsible for determining which processes run on which processor at any given point in time. In order to maintain fair access to the system by all processes, the scheduler maintains a separate priority and time-slice length for each process. In general, the higher priority processes are run before the lower priority processes, but they are also given a shorter "time-slice" of the processor in order to maintain fair access to the system. To ensure system-wide fairness for all processes, the scheduler uses the Run-Time-Invariant (RTI) principle, which states that at any given point in time, the processes running on the processors will be the highest priority runnable processes system-wide.
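The RTI principle can be illustrated with a minimal sketch. The `Process` type, the `rti_select` helper, and the convention that a larger number means a higher priority are assumptions made for illustration only; they are not taken from any particular scheduler.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Process:
    pid: int
    priority: int          # assumption: larger value = higher priority
    runnable: bool = True


def rti_select(processes: List[Process], num_processors: int) -> List[Process]:
    """Return the processes a system-wide RTI policy would have running."""
    runnable = [p for p in processes if p.runnable]
    # Highest priority first; ties are broken by pid so the result is deterministic.
    ranked = sorted(runnable, key=lambda p: (-p.priority, p.pid))
    return ranked[:num_processors]


if __name__ == "__main__":
    procs = [
        Process(1, 10),
        Process(2, 50),
        Process(3, 30),
        Process(4, 40, runnable=False),  # blocked, so never selected
    ]
    print([p.pid for p in rti_select(procs, num_processors=2)])
```

With two processors, the sketch selects processes 2 and 3: process 4 has a higher priority than process 3 but is not runnable.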
In order to minimize process migration between processors, the process scheduler can be enhanced to give individual processes "affinity" to a particular processor. One such method of scheduling processes is disclosed in U.S. Pat. No. 5,185,861 to Valencia, issued Feb. 9, 1993, and entitled "Cache Affinity Scheduler". In that patent, an affinity scheduler for a multi-processor computer system is disclosed. The affinity scheduler allocates processors to processes and schedules the processes to run on the basis of priority and processor availability. The scheduler uses the estimated amount of cache context to decide on which run queue a process is to be enqueued. U.S. Pat. No. 5,185,861 is hereby incorporated by reference.
Difficulties can arise when processes are given strict affinity to processors, however. For instance, processes of differing priority levels may become unevenly distributed among processors, whereby some processors may have a group of high priority processes competing for runtime, while other processors have a group of lower priority processes competing for runtime. In this situation, the system-wide RTI principle of executing the highest priority runnable processes will be violated, as not all of the highest priority processes are given preference over lower priority processes. Most implementations of affinity schedulers will tolerate small violations of the RTI principle in order to gain the performance improvements provided by giving processes affinity to processors. Instead of implementing a system-wide RTI policy, these affinity schedulers will implement a per-processor RTI policy and attempt to achieve system-wide fairness by employing a load balancing algorithm.
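The difference between a per-processor RTI policy and a system-wide one can be shown with a small sketch. The function names, the representation of each run queue as a list of priorities, and the convention that a larger number means a higher priority are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: processor id -> priorities of the runnable
# processes that have affinity to that processor (larger = higher priority).
RunQueues = Dict[int, List[int]]


def per_processor_rti(run_queues: RunQueues) -> List[int]:
    """Each processor runs the highest-priority process on its own queue."""
    return sorted((max(q) for q in run_queues.values() if q), reverse=True)


def system_wide_rti(run_queues: RunQueues) -> List[int]:
    """The N highest priorities overall run, where N is the processor count."""
    everything = sorted((p for q in run_queues.values() for p in q), reverse=True)
    return everything[: len(run_queues)]


if __name__ == "__main__":
    queues = {0: [90, 80, 70], 1: [20, 10]}
    print("per-processor:", per_processor_rti(queues))
    print("system-wide:  ", system_wide_rti(queues))
```

In this example the per-processor policy runs the priority-20 process on processor 1 while the priority-80 process waits on processor 0, which is the kind of RTI violation that load balancing is meant to limit.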
In order to ensure a uniform distribution of processes to processors, traditional affinity schedulers utilize three load-balancing techniques:
1. New processes are placed on a global run queue. Since all processors periodically check the global queue, the processors that are more idle will tend to check the global queue more often, achieving load balancing for new processes.
2. Processes that have not run on a given processor for some period of time are "aged" via some tunable metric (such as the number of context switches or clock ticks since the process was last run). When the age of a process exceeds some threshold, the process loses its affinity to that processor, since it is assumed that its working set of text and data has been replaced by those of other processes. In this case, the process is put back on the global queue when it is ready to run again.
3. Dynamic loading problems are detected by periodically examining the run queue lengths of each processor and stealing processes from processors having run queue lengths significantly greater than the system-wide average run queue length.
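The three techniques above can be sketched together as follows. This is a simplified illustration, not the implementation of any particular scheduler: the `Proc` and `AffinityScheduler` names, the `AGE_THRESHOLD` and `STEAL_MARGIN` values, the use of clock ticks as the aging metric, and the rule that a processor drains its own queue before checking the global queue are all assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional

AGE_THRESHOLD = 100  # hypothetical fixed aging threshold, in clock ticks
STEAL_MARGIN = 2     # hypothetical margin for "significantly greater than average"


@dataclass
class Proc:
    pid: int
    last_cpu: Optional[int] = None  # None until the process has run somewhere
    last_ran_tick: int = 0


class AffinityScheduler:
    def __init__(self, num_cpus: int) -> None:
        self.global_queue: Deque[Proc] = deque()
        self.cpu_queues: List[Deque[Proc]] = [deque() for _ in range(num_cpus)]

    def make_runnable(self, proc: Proc, now: int) -> None:
        """Queue a process that has become ready to run."""
        is_new = proc.last_cpu is None                      # technique 1
        is_aged = now - proc.last_ran_tick > AGE_THRESHOLD  # technique 2
        if is_new or is_aged:
            proc.last_cpu = None  # affinity is lost (or never existed)
            self.global_queue.append(proc)
        else:
            self.cpu_queues[proc.last_cpu].append(proc)

    def pick_next(self, cpu: int, now: int) -> Optional[Proc]:
        """Take work for `cpu`: its own queue first, then the global queue."""
        queue = self.cpu_queues[cpu] or self.global_queue
        if not queue:
            return None
        proc = queue.popleft()
        proc.last_cpu, proc.last_ran_tick = cpu, now
        return proc

    def rebalance(self) -> bool:
        """Technique 3: steal one process from an overlong run queue."""
        lengths = [len(q) for q in self.cpu_queues]
        average = sum(lengths) / len(lengths)
        busiest = lengths.index(max(lengths))
        idlest = lengths.index(min(lengths))
        if lengths[busiest] > average + STEAL_MARGIN:
            self.cpu_queues[idlest].append(self.cpu_queues[busiest].pop())
            return True
        return False
```

Note that `AGE_THRESHOLD` is a single constant applied to every process and `rebalance` compares only queue lengths, which mirrors the traditional techniques as described here.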
There are two problems with the load balancing techniques described above. First, the aging threshold discussed in item (2) above is a fixed value and as such cannot take into account that different processes can have working sets of different sizes. Therefore, any reasonably small value will cause a performance degradation for classes of processes with large working sets. Furthermore, a fixed value does not take into account the fact that cache sizes may vary from system to system or even from processor to processor. Second, the queue-length based technique used for dynamic load balancing described in item (3) above does not take into consideration the "weight" of the processes in the run queue, wherein the weight of a process relates to the intensity of the processor use required by the process.
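The second problem can be made concrete with a short sketch. It only illustrates the shortcoming and does not represent any particular remedy. Expressing a process's weight as an integer percentage of processor demand, and the two helper names, are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical representation: processor id -> weights of its queued
# processes, each weight being an assumed percentage of processor demand.
WeightedQueues = Dict[int, List[int]]


def length_imbalance(queues: WeightedQueues) -> int:
    """Imbalance as seen by a queue-length based load balancer."""
    lengths = [len(weights) for weights in queues.values()]
    return max(lengths) - min(lengths)


def weight_imbalance(queues: WeightedQueues) -> int:
    """Imbalance when the weight of each queued process is summed."""
    loads = [sum(weights) for weights in queues.values()]
    return max(loads) - min(loads)


if __name__ == "__main__":
    queues = {0: [90, 80, 90], 1: [10, 5, 10]}
    print("by length:", length_imbalance(queues))
    print("by weight:", weight_imbalance(queues))
```

Both run queues hold three processes, so a length-based balancer sees no imbalance, even though the total demand on processor 0 is far higher than on processor 1.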