1. Field of the Invention
This invention relates to the field of resource allocation mechanisms within computers, especially where the resource is system memory, which must be allocated among different subsystems or applications that may need to compete for use of the resource.
2. Description of the Related Art
The world is full of examples of competition for limited resources. Arable land, for example, is one such limited resource, and there are many different ways to allocate it. According to perhaps the simplest allocation scheme, might makes right, and the person or group with the most force gets the most, or the best, land. One of the main reasons for the existence of governments is to lessen the tendency to violent confrontation that the might-makes-right model usually leads to.
One way to allocate a limited resource such as land is to implement a first-to-arrive-wins scheme. Pioneers have often applied this system, as have Oklahomans. Another way is to allocate land using a lottery. The drawback of these schemes is that they reward, respectively, the quick (or cheaters) and the lucky, but do nothing to ensure that the most productive will get all the land they can use, or that the needy but hard-working will get enough land for their survival.
In one system of government, the ruler or ruling class (such as the “Party”) claims all land as his/its own, and allocates use of it to subjects according to some scheme. One allocation scheme is simply that each subject gets the same amount of land. In another static scheme, some subjects are “more equal” than others, and are allocated more land because they are for some reason favored by the ruler(s) (usually because they are the rulers, or their relatives or friends or benefactors).
Besides the tendency towards oppression and corruption, the greatest objection to these static, relatively inflexible, centrally planned allocation systems is that they are inefficient—not everyone needs or can use or even wants the amount of land he is allocated, and others who are more productive and ambitious do not get what they want or need. Greater accuracy can usually be achieved by introducing a feedback mechanism. In most situations involving the distribution of a shared resource, the feedback mechanism is typically some form of price, which is determined by the supply of and demand for the resource. Those who want more of the resource must pay for it, and when supply falls short of demand, the price rises until only those who can afford the resource remain in the bidding.
In the classical laissez-faire system, the government does not impose its goals on the market actors but rather simply enforces the decisions made by the market actors themselves. Of course (and many would say, “unfortunately”), those who control governments very often do have their own agendas. They may impose these agendas either directly or indirectly. Direct imposition usually involves edicts backed up by threats of confiscation, imprisonment, or bullets. Indirect imposition usually involves a tax: those who do not use the shared resource in the way the government wants are penalized by being forced to pay a tax, which in turn influences the actions of everyone interested in using the resource. In effect, a tax is used to alter the cost of and thus demand for the resource in a manner unrelated to supply.
Designers of computer systems face problems of resource allocation that are analogous to those that arise in the field of Economics, with the operating system usually playing the role of the government and various applications acting as the subjects. The analogy is not perfect, however: As has been well known since before Egyptian scribes counted individual sheaves of wheat for purposes of taxation, whenever a ruling system made up of humans controls and decides to allocate a resource, it often chooses to enrich itself and increase its own power at the expense of those who would use the resource productively. Even when the ruling elites believe in egalitarianism (for all but themselves), and set out to take a resource from A in order to give it to B, they usually drain off a large part of the transferred resource for their own benefit and for the cost of “administration.” In clear contrast, an operating system in a computer preferably creates as little overhead and waste as possible and tries to reduce delay and maximize the efficient use of the resource by the clients (applications, users, etc.) themselves.
One resource that frequently must be allocated among different competing applications or users is system memory, which is usually (but not necessarily) volatile, but which is much faster than non-volatile storage devices such as disks. For example, a common task of a server is to decide how to allocate the server's system memory to the many clients that are connected to the server at any given time. There are, accordingly, several different known methods for allocating this resource, most of which can be analogized to the economic methods described above. These methods include, for example, first-come first-served, where memory is allocated to the first clients that request it; static partitioning, where a fixed amount of memory is reserved for each client; and algorithms based on reducing the aggregate amount of swapping, which typically lack any way to express the relative importance of individual clients.
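To make the contrast between the first two known methods concrete, here is a minimal Python sketch that models memory as a simple count of pages. The function names and the page-count model are illustrative assumptions, not taken from any particular system.

```python
from typing import Dict, List, Tuple


def first_come_first_served(total_pages: int,
                            requests: List[Tuple[str, int]]) -> Dict[str, int]:
    """Grant each (client, pages_wanted) request in arrival order until memory runs out."""
    grants: Dict[str, int] = {}
    free = total_pages
    for client, wanted in requests:
        granted = min(wanted, free)  # later arrivals get whatever is left, possibly nothing
        grants[client] = granted
        free -= granted
    return grants


def static_partitioning(total_pages: int, clients: List[str]) -> Dict[str, int]:
    """Reserve a fixed, equal slice for every client, regardless of actual demand."""
    slice_size = total_pages // len(clients)
    return {client: slice_size for client in clients}


requests = [("a", 600), ("b", 600), ("c", 100)]
print(first_come_first_served(1000, requests))     # {'a': 600, 'b': 400, 'c': 0}
print(static_partitioning(1000, ["a", "b", "c"]))  # {'a': 333, 'b': 333, 'c': 333}
```

Neither policy considers how important a client is: under first-come first-served, client c gets nothing merely because it asked last, while under static partitioning clients a and b are capped at 333 pages even though c leaves 233 of its reserved pages unused.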
Memory is the most prevalent shared space-limited resource, but similar problems arise with respect to the allocation of other resources that are restricted as to time instead of, or in addition to, space. For example, CPU time is a resource, as is access to more than one CPU in a multi-processor architecture.
One known method for allocating a resource among competing clients (such as processes) involves the concept of “shares,” which are also referred to in the literature as “weights” or “tickets” and which represent a measure of each client's entitlement to the resource relative to other clients. In a “proportional-share” allocation scheme, a first client with twice as many shares as a second client will thus generally be entitled to be allocated twice as much of the resource as the second client. Shares encapsulate the right to use resources, and the distribution of shares provides explicit control over the relative importance of different clients. Share-based allocation is therefore desirable across a broad spectrum of systems that service clients of varying importance.
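The proportional-share rule can be stated as: a client's entitlement is the total amount of the resource multiplied by the fraction of all outstanding shares that the client holds. The following is a minimal Python sketch of that arithmetic; the function name and the use of fractional page counts are assumptions made for illustration.

```python
from typing import Dict


def proportional_share(total: int, shares: Dict[str, int]) -> Dict[str, float]:
    """Return each client's entitlement: total * (client's shares / sum of all shares)."""
    all_shares = sum(shares.values())
    return {client: total * s / all_shares for client, s in shares.items()}


# Client A holds twice as many shares as client B, so it is entitled to twice as much.
entitlements = proportional_share(900, {"A": 200, "B": 100})
print(entitlements)  # {'A': 600.0, 'B': 300.0}
```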
Several known methods for proportional-share scheduling of various resources, including both randomized and deterministic algorithms for allocating processor time, memory space, access to locks, and I/O bandwidth, are described in “Lottery and Stride Scheduling: Flexible Proportional-Share Resource Management,” Carl A. Waldspurger, Ph.D. Dissertation, Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, September 1995, which also appears as Technical Report MIT/LCS/TR-667. Various aspects of these methods and extensions to them are also described in “An Object-Oriented Framework for Modular Resource Management,” Carl A. Waldspurger and William E. Weihl, Proceedings of the Fifth Workshop on Object-Orientation in Operating Systems (IWOOOS '96), Seattle, Wash., October 1996. According to the methods described in these references, resource rights are encapsulated by abstract, first-class objects called tickets, and active clients consume resources at a rate proportional to the number of tickets that they hold. Tickets can be issued in different amounts and may be transferred between clients. A modular currency abstraction is also introduced to flexibly name, share, and protect sets of tickets. Currencies can be used to isolate or group sets of clients, enabling the modular composition of arbitrary resource management policies. Furthermore, the Waldspurger dissertation introduces and describes a resource revocation mechanism termed “min-funding revocation,” according to which memory is allocated by revoking it from clients that “pay” fewer shares per unit memory and granting it to clients that pay more per unit memory.
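The min-funding revocation rule described above can be sketched in a few lines: when a client needs a page, the page is taken from whichever client currently pays the fewest shares per page it holds. This is a minimal Python illustration under simplifying assumptions (whole pages, a flat list of clients, no currencies); the `Client` class and `reclaim_page` function are hypothetical names, not an API from the cited works.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Client:
    name: str
    shares: int  # shares (tickets) funding this client's memory
    pages: int   # pages currently held

    def price_per_page(self) -> float:
        # A client holding no pages cannot lose one, so treat its price as infinite.
        return self.shares / self.pages if self.pages else float("inf")


def reclaim_page(clients: List[Client], requester: Client) -> Client:
    """Move one page to `requester` from the client paying the fewest shares per page.

    If the requester itself pays the least, it simply reuses one of its own pages,
    so its total does not grow.
    """
    victim = min(clients, key=lambda c: c.price_per_page())
    if victim.pages == 0:
        raise RuntimeError("no pages available to reclaim")
    victim.pages -= 1
    requester.pages += 1
    return victim


a = Client("A", shares=200, pages=100)  # pays 2.0 shares per page
b = Client("B", shares=100, pages=100)  # pays 1.0 shares per page
c = Client("C", shares=300, pages=50)   # pays 6.0 shares per page
victim = reclaim_page([a, b, c], requester=c)
print(victim.name, b.pages, c.pages)  # B 99 51
```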
Extensions to the lottery-scheduling resource management framework that increase its flexibility while providing improved support for simultaneously managing multiple resources (including CPU time, physical memory, and disk bandwidth) are described in “Isolation with Flexibility: A Resource Management Framework for Central Servers,” David G. Sullivan and Margo Seltzer, Computer Science Technical Report TR-13-99, Harvard University, December 1999, which also appeared in USENIX 2000 Technical Conference, San Diego, Calif., June 2000. This paper also identifies a well-known limitation of existing proportional-share memory management techniques: As Sullivan and Seltzer point out, “[e]ffective proportional-share memory management is complicated by the difficulty of determining which processes are actively competing for memory and by the undesirability of a strict partitioning of memory among processes.” Because of these difficulties, the limited solution they then propose gives memory guarantees only to privileged processes that explicitly request them.
There are two separate issues involved in a resource allocation decision in a proportional-share system: first, how much of the resource each client “needs,” and, second, how much of the resource the client is entitled to based on its share allocation. Note that it is completely reasonable to give more resources to a client that has more shares than to another that “needs” them more, as long as both are actively using the allocations that they have been given. The main weakness of the conventional proportional-share methods becomes apparent when a client is unproductively hoarding the resource it has been allocated. For example, the hoarding client may hold memory pages that it has not referenced for a long time and that are therefore idle; these pages could be more productively reallocated to another client. Indeed, in practice, much of the allocated resource may actually remain idle.
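The hoarding problem can be seen in a small Python illustration of a share-only revocation policy. Here the number of idle pages is simply assumed to be known and stored in a hypothetical `idle_pages` field; in a real system, measuring idleness is precisely the difficulty discussed below, and a share-only policy never consults it.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    shares: int
    pages: int
    idle_pages: int  # pages allocated but not recently referenced (assumed known here)

    def shares_per_page(self) -> float:
        return self.shares / self.pages


hoarder = Client("A", shares=200, pages=100, idle_pages=60)
busy = Client("B", shares=100, pages=100, idle_pages=0)

# A share-only policy revokes from whichever client pays fewer shares per page...
victim = min([hoarder, busy], key=Client.shares_per_page)
print(victim.name)  # B
# ...so the busy client loses a page even though A is sitting on 60 idle pages.
print(hoarder.idle_pages)  # 60
```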
In order to eliminate this weakness, the system must be able to determine how much of the resource is being wasted through idleness or inactivity. This task has often been viewed as too difficult or complicated even to attempt, or more of a task than is justified by the potential reduction in idleness. Despite the different advances that have been made in the area of allocation schemes for computer resources, there is thus still the need for an allocation method that increases efficiency in the sense of optimum usage of the limited resource by clients of varying importance. In particular, within a proportional-share framework, what is needed is a way not only to respect the share allocations of the different clients, but also to identify when allocated resource units are idle and to be able to reallocate these units to other clients who will use them more productively. This invention provides such an improved method, as well as a system that implements it.