The present invention generally relates to process dispatch processing.
In a computer, the operating system scheduler performs time sharing of processor resources among a plurality of processes. The run queue is the basic data structure used by the scheduler in process scheduling. It is a queue that holds the processes that are ready to run on a processor.
Among the scheduling operations of the scheduler, the operation of switching the process that runs on a processor is specifically called process dispatch processing. The component of the operating system in charge of process dispatch processing is specifically called a dispatcher.
The dispatcher is called whenever the process running on a processor is to be switched, and it then performs dispatch processing. The dispatcher inserts the current process into the run queue, removes the process with the highest priority from the run queue, and runs the removed process on the processor.
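The three dispatcher steps can be illustrated with a minimal Python sketch. The sketch makes several assumptions for illustration only. A lower number means a higher priority, as in traditional UNIX. Processes are hypothetical (name, priority) tuples. "Running" a process is modelled by returning it. `RunQueue` and `dispatch` are illustrative names and do not belong to any real operating system.

```python
import heapq
import itertools


class RunQueue:
    """Ready-to-run processes ordered by priority (lower value = higher priority)."""

    def __init__(self):
        self._heap = []                   # entries: (priority, seq, name)
        self._seq = itertools.count()     # FIFO tie-break among equal priorities

    def insert(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def remove_highest(self):
        """Remove and return (name, priority) of the highest-priority process."""
        if not self._heap:
            return None
        priority, _, name = heapq.heappop(self._heap)
        return name, priority

    def __len__(self):
        return len(self._heap)


def dispatch(run_queue, current):
    """One dispatch: requeue `current`, then pick the process to run next.

    `current` is a (name, priority) tuple, or None if the processor was idle.
    Returns the (name, priority) of the process that should now run.
    """
    if current is not None:
        run_queue.insert(*current)        # step 1: put the current process back
    return run_queue.remove_highest()     # steps 2-3: take the best one and run it


rq = RunQueue()
rq.insert("editor", 20)
rq.insert("daemon", 30)
running = dispatch(rq, ("compiler", 25))
print(running)                            # ('editor', 20)
```

The current process is reinserted before the selection is made. If it still has the highest priority, the dispatcher therefore simply selects it again.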
Run queues take one of two forms. One is the local run queue, in which each processor is provided with its own run queue. The other is the shared run queue, in which a single run queue is shared by a plurality of processors. A shared run queue that is shared by all of the processors in a computer is specifically called a global run queue.
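The forms differ only in how processors are mapped onto queues. The sketch below builds that mapping from the number of processors that share each queue. It assumes hypothetical, contiguously numbered processors divided into equal-sized groups, and `run_queue_layout` is an illustrative name.

```python
def run_queue_layout(num_cpus, cpus_per_queue):
    """Map each run queue index to the list of processors that use it.

    cpus_per_queue == 1         -> local run queues (one per processor)
    1 < cpus_per_queue < N      -> shared run queues (e.g. one per socket)
    cpus_per_queue == num_cpus  -> a single global run queue
    Assumes processors are numbered contiguously within each group.
    """
    if not 1 <= cpus_per_queue <= num_cpus:
        raise ValueError("cpus_per_queue must be between 1 and num_cpus")
    layout = {}
    for cpu in range(num_cpus):
        layout.setdefault(cpu // cpus_per_queue, []).append(cpu)
    return layout


print(run_queue_layout(8, 4))   # {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
print(run_queue_layout(8, 8))   # {0: [0, 1, 2, 3, 4, 5, 6, 7]}
```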
In general, operating systems mainly use local run queues. With local run queues, process scheduling operates independently on each processor. There is therefore no contention among the processors, and high scalability with respect to the number of processors can be achieved. Moreover, each process is bound to a single processor, so cache efficiency is also high when each processor has its own cache.
However, in recent years multi-core processors have become standard. In such a processor, a plurality of processor cores is mounted in a single socket. A multi-core processor generally provides a large-capacity cache in each socket, and this cache is shared by the plurality of processors inside that socket. Therefore, for cache efficiency it is sufficient to keep each process within a single socket, rather than binding it to a single processor.
In a computer, load balancing among processors and rate control of resource allocation among processors are also important. When local run queues are used, process scheduling is performed independently on each processor, and the load is not shared among the processors. The scheduler must therefore detect, on its own, load imbalance among the processors and unbalanced resource allocation among virtual computers. It must then make dynamic adjustments by explicitly migrating processes between processors.
However, these adjustments are complex and difficult, and their processing cost is high. Both the complexity and the cost grow as the number of processors increases.
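The following is a deliberately naive balancer for local run queues that illustrates what explicit migration involves. The function name and the load metric, which counts only queue length, are assumptions for illustration. A real scheduler would also have to weigh priorities, cache affinity, resource-allocation rates, and the locking of the queues involved. Those extra factors are the source of the complexity and cost described above.

```python
def balance_by_migration(local_queues, threshold=1):
    """Explicitly migrate processes between per-processor local run queues.

    local_queues[i] is the list of processes queued on processor i.
    Processes move from the longest queue to the shortest one until their
    lengths differ by at most `threshold` (which must be >= 1, otherwise a
    difference of one could bounce back and forth forever).
    Returns the migrations as (process, src_cpu, dst_cpu) tuples.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if not local_queues:
        return []
    migrations = []
    cpus = range(len(local_queues))
    while True:
        busiest = max(cpus, key=lambda i: len(local_queues[i]))
        idlest = min(cpus, key=lambda i: len(local_queues[i]))
        if len(local_queues[busiest]) - len(local_queues[idlest]) <= threshold:
            return migrations
        proc = local_queues[busiest].pop()     # take a process off one processor...
        local_queues[idlest].append(proc)       # ...and move it to another
        migrations.append((proc, busiest, idlest))


queues = [["a", "b", "c", "d"], [], ["e"]]
moves = balance_by_migration(queues)
print(moves)    # [('d', 0, 1), ('c', 0, 1)]
print(queues)   # [['a', 'b'], ['d', 'c'], ['e']]
```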
To address these problems, one shared run queue can be provided per processor socket and shared by the processors in that socket. Cache efficiency then reaches practically the same level as with local run queues. In addition, because the load is shared by the processors inside the socket, the socket can be regarded as one large processor. Load balancing among processors and rate control of resource allocation among processors are then achieved naturally, without explicitly migrating processes between processors.
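A small simulation shows the effect of sharing within a socket. It assumes a hypothetical socket of four processors in which four ready processes were all created on processor 0. On each tick, every processor takes one process from whichever run queue it uses, and no process is ever migrated. Locking of the shared queue is not modelled, and `run_one_tick` is an illustrative name.

```python
from collections import deque


def run_one_tick(queues, cpu_to_queue, num_cpus):
    """Each processor dequeues one ready process from the run queue it uses.

    Returns {cpu: process or None}. No process is moved between queues.
    """
    running = {}
    for cpu in range(num_cpus):
        q = queues[cpu_to_queue(cpu)]
        running[cpu] = q.popleft() if q else None
    return running


CPUS_PER_SOCKET = 4                  # hypothetical socket size
work = ["p0", "p1", "p2", "p3"]      # all created on processor 0

# Local run queues: all the work sits in processor 0's queue.
local = [deque(work), deque(), deque(), deque()]
local_tick = run_one_tick(local, lambda cpu: cpu, CPUS_PER_SOCKET)
print(local_tick)    # {0: 'p0', 1: None, 2: None, 3: None}

# One shared run queue for the socket: any processor in it can take work.
shared = [deque(work)]
shared_tick = run_one_tick(shared, lambda cpu: cpu // CPUS_PER_SOCKET, CPUS_PER_SOCKET)
print(shared_tick)   # {0: 'p0', 1: 'p1', 2: 'p2', 3: 'p3'}
```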
Using a shared run queue can therefore be expected to give cache efficiency practically on a par with that of local run queues, while also simplifying load balancing and/or rate control.
However, a widely known conventional dispatch processing method is disclosed in section 5.4, entitled "Traditional UNIX Scheduling," on pages 117 to 121 of U. Vahalia, UNIX Internals: The New Frontiers, Prentice Hall. With that method, this expected effect cannot be achieved, because deadlock is likely to occur and no adequate solution to it has been provided.
U.S. Pat. No. 6,728,959 discloses a method for reducing the overhead of contention for access to a shared run queue in process dispatch processing. In that method, a shared run queue shared by a plurality of processors is provided together with a local run queue for each processor. Process dispatch processing accesses the local run queue whenever possible, which reduces the number of accesses to the shared run queue.
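The following is a minimal sketch of the local-first idea as summarized above. It assumes that each processor takes work from its own local queue and falls back to the lock-protected shared queue only when the local queue is empty. This is a hypothetical simplification for illustration, not the method claimed in U.S. Pat. No. 6,728,959. The class name and the access counter are illustrative.

```python
import threading
from collections import deque


class TwoLevelRunQueues:
    """Per-processor local queues backed by one lock-protected shared queue.

    In this sketch each local queue is assumed to be touched only by its own
    processor, so it is not locked; only the shared queue takes a lock.
    """

    def __init__(self, num_cpus):
        self.local = [deque() for _ in range(num_cpus)]
        self.shared = deque()
        self.shared_lock = threading.Lock()
        self.shared_accesses = 0    # counts shared-queue lock acquisitions

    def pick_next(self, cpu):
        """Return the next process for `cpu`, preferring its local queue."""
        if self.local[cpu]:
            return self.local[cpu].popleft()
        with self.shared_lock:       # only now is the shared queue accessed
            self.shared_accesses += 1
            return self.shared.popleft() if self.shared else None


rqs = TwoLevelRunQueues(2)
rqs.local[0].extend(["a", "b"])
rqs.shared.append("c")
picks = [rqs.pick_next(0) for _ in range(4)]
print(picks, rqs.shared_accesses)   # ['a', 'b', 'c', None] 2
```

Only the last two picks touch the shared queue. This illustrates how preferring the local queue cuts down shared-queue accesses without changing how dispatch from the shared queue itself works.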
However, U.S. Pat. No. 6,728,959 does not propose a method for directly improving process dispatch processing when using the shared run queue.