The present invention relates generally to computer systems, and more particularly to techniques for providing a desired quality of service (QoS) for an application running in a computer system.
In a typical computer system, multiple applications may contend for the same physical resources, such as central processing unit (CPU), memory, and disk or network bandwidth. An important goal for an operating system in such a computer system is therefore to schedule requests from different applications so that each application and the system as a whole perform well.
The resource management techniques used in conventional time-sharing operating systems often achieve acceptably low response time and high system throughput for many different types of time-sharing workloads. Examples of conventional time-sharing operating systems include Unix, as described in, e.g., M. McKusick et al., xe2x80x9cThe Design and Implementation of the 4.4 BSD Operating System,xe2x80x9d Addison Wesley Pub. Co., Reading, Mass., 1996, and Windows NT, as described in, e.g., H. Custer, xe2x80x9cInside Windows NT,xe2x80x9d Microsoft Press, 1993.
However, several trends make the resource management techniques of these and other conventional time-sharing operating systems increasingly inappropriate. First, many workloads now include real-time applications, such as multimedia. Unlike time-sharing applications, real-time applications generally must have their requests processed within certain performance bounds, e.g., require a certain minimum throughput. In order to support real-time applications correctly under arbitrary system load, the operating system must perform admission control and offer QoS guarantees. In other words, the operating system should admit a request only if the operating system has set aside enough resources to process the request within the specified performance bounds.
Second, even for purely time-sharing workloads, the trend toward distributed client-server architectures increases the importance of fairness, i.e., of preventing certain clients from monopolizing system resources. The fairness of conventional time-sharing systems can often be inadequate. For example, time-sharing systems typically cannot isolate the performance of a World Wide Web (Web) site from that of other Web sites hosted on the same system. If one of the sites becomes very popular, the performance of the other sites may become unacceptably and unfairly poor.
Finally, the above-noted trend toward client-server architectures also makes it necessary to manage resources hierarchically, i.e., recursively allowing each client to grant to its servers part of the client""s resources. For example, Web servers and other user-level servers often need mechanisms for processing client requests with specified QoS and/or fairness bounds. However, time-sharing operating systems usually do not provide such mechanisms.
These and other drawbacks associated with resource management techniques in conventional time-sharing operating systems have led to the recent development of a number of new techniques. For example, J. Bruno, E. Gabber, B. xc3x96zden and A. Silberschatz, xe2x80x9cThe Eclipse Operating System: Providing Quality of Service via Reservation Domains,xe2x80x9d in Proceedings of Annual Tech. Conf., USENIX, June 1998, pp. 235-246, describes Move-to-Rear List Scheduling (MTR-LS), a new CPU scheduling algorithm with demonstrated throughput, delay, and fairness guarantees. MTR-LS is an example of a so-called proportional share scheduler.
Other recently developed proportional share schedulers are described in, e.g., D. Stiliadis and A. Varma, xe2x80x9cFrame-Based Fair Queuing: A New Traffic Scheduling Algorithm for Packet-Switched Networks,xe2x80x9d Tech. Rep. UCSC-CRL-95-39, Univ. Calif. Santa Cruz, July 1995; J. Bennet and H. Zhang, xe2x80x9cWFQ: Worst-Case Fair Weighted Fair Queueing,xe2x80x9d in Proceedings of INFOCOM""96, IEEE, March 1996, pp. 120-128; J. Bennet and H. Zhang, xe2x80x9cHierarchical Packet Fair Queueing Algorithms,xe2x80x9d in Proceedings of SIGCOMM""96, ACM, August 1996; P. Goyal, X. Gao and H. Vin, xe2x80x9cA Hierarchical CPU Scheduler for Multimedia Operating Systems,xe2x80x9d in Proceedings of OSDI""96, USENIX, October 1996, pp. 107-121; and I. Stoica, H. Abdel-Wahab, K. Jeffay, S. Baruah, J. Gehrke and C. G. Plaxton, xe2x80x9cA Proportional Share Resource Allocation Algorithm for Real-Time, Time-Shared Systems,xe2x80x9d in Proceedings of Real Time Systems Symp., IEEE, December 1996.
A major shortcoming of the above-mentioned proportional share schedulers is that they do not prescribe satisfactory solutions to many problems that arise in their adoption in an operating system. First, it is desirable that an operating system provide a uniform application programming interface (API) for all of the system""s schedulers and resources. In the case of proportional share schedulers, this should be a resource reservation API, which allows applications to reserve for exclusive use portions of each resource. However, several of the above-mentioned proportional share schedulers were proposed without an API, since they were not implemented and were evaluated only analytically or in simulations. Other proportional share schedulers were implemented, but used only an API limited to a given scheduler and resource.
Second, it is desirable that the resource reservation API be easy to integrate with the conventional API of existing operating systems and allow resource reservations to be used in conventional interfaces. For example, in Unix-derived systems, a resource reservation API that allows disk or network reservations to be used in conventional read and write calls may advantageously reduce the number of modifications necessary in existing applications for the applications to benefit from proportional share scheduling. However, simply adding resource reservations to conventional objects such as files or sockets does not provide correct sharing semantics. Those objects can be shared by different users. If a user""s resource reservation is simply added to a shared object, other users may inappropriately use the first user""s resource reservation. None of the above-mentioned proportional share schedulers properly define how sharing is handled.
Third, the resource reservation API should define how a parent process running on the operating system can limit the resource reservations used by its children processes. This is necessary for system protection and may be useful also when a server process spawns a child process to handle a given client""s request. The above-mentioned proportional share schedulers do not propose how this would be accomplished.
Finally, a garbage collection mechanism is necessary for resource reservations. Such a mechanism automatically reclaims reserved resources when they no longer are needed. Without such mechanism, a process that terminates abnormally while holding a resource reservation would cause the reserved resource to become permanently unavailable to other processes. None of the above-mentioned proportional share schedulers propose a solution to this problem.
As is apparent from the above, many emerging applications require QoS guarantees from the operating system. Although conventional proportional share schedulers can provide QoS guarantees, the above-identified problems must be solved before such schedulers can be adopted in operating systems.
The invention provides techniques for ensuring a desired quality of service (QoS) for an application running on an operating system. An illustrative embodiment of the invention allows applications to create resource reservations using an application program interface (API) in the form of a hierarchical file system referred to herein as /reserv. The API has the advantage of applying uniformly to multiple proportional share schedulers and resources, e.g., CPU, physical memory, and disk and network bandwidth. The API represents resource reservations by directories under /reserv and includes a separate directory for each independently scheduled physical resource of the computer system. The parent of a resource reservation is either /reserv or another reservation for the same resource. Each resource reservation includes a share file that specifies the minimum amount of resources that the reservation receives from its parent and the weight with which a reservation shares its parent""s resources. A resource reservation is referred to as an internal reservation if it can have children, and is referred to as a queue if it cannot have children.
The invention allows a process to associate a reference to an object with a queue. The queue may be, e.g., a disk or network queue; the reference is possibly private to the process, e.g. a file descriptor; and the object is possibly shared with other processes, e.g., a file or socket. Thus, the invention preserves the protected use of a queue even when the queue is used in requests on shared objects.
In accordance with another aspect of the invention, when a process uses the operating system""s conventional API and an object reference to issue a request, the operating system internally tags the request with the identifier of the queue that is associated with that object reference. Schedulers use such queue identifier to place each request in the corresponding queue. A proportional-share scheduler apportions the respective resource to each queue in proportion to the queue""s share. Advantageously, the invention allows reservations to be used even when the application uses the operating system""s conventional API. Consequently, the invention minimizes the number of modifications that may be necessary in existing applications for them to be able to benefit from proportional-share scheduling.
The invention also includes a mechanism whereby a parent process may limit the resource reservations used by its children processes, and a mechanism for garbage-collecting resource reservations when they are no longer needed.
Advantageously, the invention allows selected applications to isolate their performance and the performance of their corresponding client(s) from CPU, memory, disk, or network interface overloads caused by other applications. Such a capability is becoming increasingly important for real-time, multimedia, Web, and distributed client-server applications.