1. Field of Invention
The present invention relates generally to scheduling resources on computer systems, and more particularly, to systems and methods for scheduling resources on multiple servers.
2. Background of the Invention
Scheduling the execution of processes and the use of resources in a computer is a fundamental part of operating system design. In conventional computer operating systems, there are various system resources available to processes, such as the processor itself, disk drives, memory, network connections, and so forth. An operating system includes a scheduler that schedules individual processes for execution, and further allocates resources to various ones of the processes. The scheduler may operate in conjunction with a quality of service manager that provides quality of service (QOS) guarantees to individual processes or tasks. A quality of service guarantee typically ensures that a given process will have a certain percent share of some resource's availability. For example, a process may be given a 30% QOS guarantee for the processor, meaning that it will receive at least 30% of the available processor cycles (or some other metric of processor availability). Research in scheduling algorithms has addressed scheduling on both single processor and multi-processor systems. However, conventional scheduling algorithms typically address scheduling efficiency when considering processes individually.
In many instances, it may be necessary to schedule otherwise distinct processes in a coordinated fashion. More particularly, distinct processes may be related to a particular entity for which it is desired to provide quality of service guarantees for all of the entity's processes.
For example, consider a server computer providing services, such as file transfers, database queries, and packet routing, to a number of client computers. Each client computer communicates with the server computer more or less concurrently, each providing an ongoing stream of requests and data in the form of packets that need to be processed by the server. Normally, each individual request of a client would be processed by the server as an individual process, and scheduled accordingly. However, it may be desirable to provide quality of service guarantees with respect to each of the client computers, and not merely with respect to individual processes. This would be desirable to ensure that each client obtains a particular quality of service from the server computer. In particular, this is desirable in order to differentially price the level of service provided to different clients based upon the quality of service they are guaranteed. In this fashion, client computers that are guaranteed a higher quality of service pay a premium price. For example, a company paying for a high QOS guarantee could receive higher priority processing for all of its database queries, even though each individual query is a separate process. Conventionally, providing differential quality of service and pricing to different client computers has required that separate server computers be provided for each client computer. This approach is costly, and thus makes it further desirable to handle multiple clients using a multi-processor system or the like.
A task stream is a set of tasks or processes that are all issued on behalf of the same entity and that all share a common resource allocation or quality of service requirement. Examples of task streams are flows of packets that share a resource allocation, and sets of processes that jointly share a machine's resources such as CPU, memory, or disk bandwidth. For example, all of the packets for a particular client request may constitute a single task stream.
Several task streams may want to use a single resource at the same time. In this case, the task streams are placed in a queue of task streams, awaiting their individual turns. It is commonly desired to provide a task stream with a guaranteed allocation of a system resource. For instance, it may be desired to guarantee that all 'high-priority' processes receive at least 10% of a machine's CPU.
Algorithms to schedule a queue of task streams on a single processor to give each task stream a guaranteed share of processor resources are well known in the art. With multiprocessor systems becoming widespread, it is now feasible to process a queue of task streams using multiple servers in parallel. However, known single processor scheduling algorithms do not extend to the multiprocessor case.
Accordingly, it is desirable to provide a scheduling system and methodology that guarantees a quality of service to various task streams being served by multiple processors (or other resources) in parallel.
The present invention overcomes the limitations of conventional scheduling algorithms by providing and enforcing quality of service guarantees to individual task streams being served by multiple, parallel resources. One embodiment operates in a computer system having a number of servers. A server may be a physical processor, or a server process executing on a processor. Each server manages or controls some system resource, such as CPU, memory, disk, network interface card, and the like. In either case, multiple servers are available to handle tasks that are being received for processing.
Each task is associated with a particular task stream preferably by some form of task stream identification number, process ID, packet attribute, or the like. As noted above, a task stream may be associated with an entity, such as a particular client computer or application, which generates or is related to otherwise independent tasks or processes having a common quality of service requirement. Each task stream has a previously determined quality of service requirement that it will receive for all of its tasks. For example, a task stream may have a 30% quality of service requirement, so that it receives 30% of the available capacity of system resources, such as CPU time. Each task stream may have a different quality of service requirement.
As a task is received into the system, the task stream with which it is associated is determined. Periodically, the task streams are partitioned or allocated among the various servers. The allocation of task streams to each server is done so that the total quality of service requirements of the task streams assigned to each server does not exceed the total availability of the resource that each server manages. For example, the allocation is preferably such that the total quality of service requirements for each server's resource does not exceed 100% of the resource's availability. The allocation of task streams to servers may be made by any type of allocation algorithm, such as first fit or best fit, or the like.
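As an illustrative sketch (not limiting the invention), a first-fit allocation of task streams to servers, treating each quality of service requirement as a percentage of a server's capacity, might proceed as follows; all function and variable names here are hypothetical:

```python
def first_fit_allocate(streams, num_servers, capacity=100):
    """Assign each (stream_id, qos_percent) pair to the first server
    whose total QOS commitment would not exceed `capacity`.

    Returns a mapping of stream_id -> server index."""
    commitments = [0] * num_servers   # current total QOS commitment per server
    assignment = {}
    for stream_id, qos in streams:
        for server in range(num_servers):
            if commitments[server] + qos <= capacity:
                commitments[server] += qos
                assignment[stream_id] = server
                break
        else:
            # No server has enough uncommitted capacity for this stream.
            raise RuntimeError(f"stream {stream_id} cannot be admitted")
    return assignment
```

A best-fit variant would instead scan all servers and choose the one whose remaining capacity most tightly fits the new requirement; either policy preserves the invariant that no server's commitments exceed 100%.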
Each server executes its task streams according to some scheduling order, which may be determined by any uniprocessor scheduling algorithm. Individual tasks may be either runnable or waiting (blocked). If all of the tasks for one of the servers are waiting, then the server is idle. A runnable task from one of the other (busy) servers is then moved to the idle server, where it can be immediately processed. The selection of which task, and from which server, may be made according to various criteria, allowing for optimization. This feature of moving runnable tasks from busy servers to idle servers is desirable because it helps ensure that each task will receive the quality of service guarantee defined for its associated task stream.
Preferably, if a waiting task on the previously idle server becomes runnable, then the task that was moved to this server is moved back to its originating server. The newly runnable task is executed, as it would have been before blocking. The re-transfer of runnable tasks back to their originating servers is desirable because it prevents tasks that are blocked temporarily from being repeatedly delayed by other tasks that are taken up by the server. This helps further ensure that the quality of service requirements are met for those task streams having blocked tasks, without sacrificing the quality of service provided to the task streams of transferred tasks (which suffer no loss in quality of service by being transferred or transferred back).
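The transfer and re-transfer steps above can be sketched in simplified form; this is a minimal illustration under the assumption that each server's runnable queue is a list of task identifiers, and all names are hypothetical:

```python
def move_to_idle(runnable, origin):
    """For each idle server (empty runnable queue), borrow one runnable
    task from the busiest server, recording the originating server in
    `origin` so the task can later be returned."""
    for idle in [s for s, queue in runnable.items() if not queue]:
        busy = max(runnable, key=lambda s: len(runnable[s]))
        if runnable[busy]:
            task = runnable[busy].pop(0)
            runnable[idle].append(task)
            origin[task] = busy

def return_task(task, runnable, origin, current):
    """When a local task on `current` unblocks, send the borrowed task
    back to the server it originally came from."""
    runnable[current].remove(task)
    runnable[origin.pop(task)].append(task)
```

The `origin` mapping plays the role of the current assignment table described below: it is what allows a temporarily borrowed task to be restored to its originating server without any loss of its own quality of service.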
One embodiment of the present invention employs a quality of service manager, an arbitrator, and a number of servers, each of which manages a resource. The quality of service manager is responsible for setting quality of service requirements for individual task streams, and storing this assignment information. The quality of service manager receives incoming tasks, and determines from their process ID, task ID, packet ID, or other data which task streams they are respectively associated with, and hence what quality of service requirement is associated with each task.
When a new task is received by the quality of service manager, it queries the arbitrator to determine if there is sufficient resource capacity to meet the quality of service requirements for the task stream associated with the task. The arbitrator maintains a master quality of service table that identifies each server, the task streams originally assigned to each server, and each server's current total quality of service commitments to the task streams assigned to it. The arbitrator also stores a current assignment table that indicates which tasks are currently assigned to each server; this table is updated any time a task is transferred from one server to another.
If the task stream cannot be allocated to one of the servers without the total quality of service commitment of that server exceeding the total resource capacity of the resource managed by the server, then the task is denied. Otherwise, the arbitrator provides the quality of service manager with the ID of the server to which the task stream is assigned. The arbitrator updates the master quality of service table to show the assignment of the task stream to the server, and that server's new total quality of service commitment. Each server also maintains its own quality of service table that records the task ID and quality of service requirement for its task streams.
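The arbitrator's admission check can be illustrated as follows; this is a sketch only, with hypothetical names, assuming each server's resource capacity is expressed as 100%:

```python
class Arbitrator:
    """Minimal sketch of the admission decision: maintains per-server
    QOS commitments and a master table of stream-to-server assignments.
    A new stream is admitted only if some server can accept its QOS
    requirement without exceeding that server's capacity."""

    def __init__(self, server_ids, capacity=100):
        self.capacity = capacity
        self.commitments = {sid: 0 for sid in server_ids}
        self.master = {}   # stream_id -> assigned server_id

    def admit(self, stream_id, qos):
        for sid, total in self.commitments.items():
            if total + qos <= self.capacity:
                self.commitments[sid] = total + qos
                self.master[stream_id] = sid
                return sid    # ID of the server the stream is assigned to
        return None           # request denied: no server has capacity
```

In this sketch a `None` return corresponds to denying the task; a returned server ID is what the arbitrator would hand back to the quality of service manager.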
In one embodiment, each server maintains its tasks in a task stream queue, which may include a runnable task queue and a blocked task queue. Tasks are initially assigned to the runnable task queue, but when a task blocks, the server moves the task to the blocked queue. When a server detects there are no entries in its runnable queue (it is idle), it informs the arbitrator. The arbitrator selects a runnable task from another server's runnable queue and moves it to the runnable queue of the idle, first server. The arbitrator also updates the current assignment table to indicate the current assignment of the task to the first server. If a task becomes unblocked on the first server, the first server preempts the transferred task, and informs the arbitrator. The arbitrator determines from the master quality of service table the original server from which the task was transferred, and moves the task back to this originating server from the first server.
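The per-server two-queue structure described above can be sketched as follows; the class and method names are illustrative assumptions, not part of the claimed embodiment:

```python
from collections import deque

class Server:
    """Sketch of a server's task queues: tasks start runnable, move to
    the blocked queue when they block, and return to the runnable queue
    when they unblock. An empty runnable queue means the server is idle
    and should inform the arbitrator."""

    def __init__(self):
        self.runnable = deque()   # runnable task queue
        self.blocked = set()      # blocked task queue

    def add(self, task):
        self.runnable.append(task)

    def block(self, task):
        self.runnable.remove(task)
        self.blocked.add(task)

    def unblock(self, task):
        self.blocked.discard(task)
        self.runnable.append(task)

    def is_idle(self):
        return not self.runnable
```

When `is_idle()` becomes true, the server would notify the arbitrator, which then performs the transfer described above.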