1. Field of the Invention
The present invention relates to distributed computing networks, and deals more particularly with programmatic techniques for allocating memory among competing services in a distributed computing environment such that maximum benefit can be realized from the memory allocations (e.g., to improve client response time).
2. Description of the Related Art
The popularity of distributed computing networks and network computing has increased tremendously in recent years, due in large part to growing business and consumer use of the public Internet and the subset thereof known as the “World Wide Web” (or simply “Web”). Other types of distributed computing networks, such as corporate intranets and extranets, are also increasingly popular. As solutions providers focus on delivering improved Web-based computing, many of the solutions which are developed are adaptable to other distributed computing environments. Thus, references herein to the Internet and Web are for purposes of illustration and not of limitation.
Whereas the early Internet served primarily as a distributed file system in which human users could request delivery of already-generated static documents, the trend in recent years has been to add more and more dynamic and personalized aspects into the content that is served to requesters. However, many dynamically-generated documents also include static content, such as forms, graphic images, sound files, and other types of embedded objects. (Thus, discussions herein are primarily in terms of already-generated static content, but apply equivalently to static content which is incorporated into dynamically-generated documents or other types of dynamically-generated content.)
The number of objects involved in servicing a content request may range from a single stored object to a relatively large number of objects (often, on the order of tens of objects). (The terms “stored object” and “object” are used interchangeably herein to refer to an object or file which is stored on a storage medium—or which may, in some cases, be distributed across more than one storage medium. It should be noted that references herein to objects are not to be construed as limiting the present invention to the field of object-oriented programming. Furthermore, the term “content” as used herein is intended to be synonymous with one or more objects or files unless the reference context indicates otherwise.)
While some content requests are generated programmatically, many content requests have a human user waiting for a response. Returning responses quickly and efficiently can therefore be critical to user satisfaction and to the overall success of a Web site.
In a Web hosting or service provider environment where a number of services are hosted, the hosted services are in competition for the scarce (i.e., limited) resources that are available, such as central processing unit (“CPU”) time, storage resources, and memory. It is desirable to tune the system so that each hosted service has an appropriate amount of access to those resources, enabling the collection of services as a whole to offer optimal response time to their users. When allocating memory among the services for use as cache space, it is therefore desirable to determine which service(s) will benefit most from the allocation.
As is well known in the art, caching reduces the number of requests that reach the Web servers, thereby improving response time (and also reducing processing load on devices upstream from the cache). When content cannot be served from cache, the content requests come to a Web server. This is commonly referred to as a “cache miss”, whereas finding content that can be served from cache is referred to as a “cache hit”.
A “cache hit ratio” is defined as the number of references to objects in the cache, divided by the total number of references for all objects. (For purposes of the present invention, cache hit ratios are preferably expressed in terms of each particular service offered by the Web hosting environment.) Typical cache replacement algorithms seek to maximize the cache hit ratio (with perhaps some caveats for considering the cost of replacing some cached objects, and balancing this cost against the improvements in the cache hit ratio).
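As an illustration of this definition, the following minimal sketch computes a per-service cache hit ratio from counts of hits and misses. The class name `ServiceCacheStats` and its fields are hypothetical and are not taken from any particular implementation.

```python
from dataclasses import dataclass


@dataclass
class ServiceCacheStats:
    """Hypothetical per-service reference counters (illustrative only)."""

    hits: int = 0    # references satisfied from the cache
    misses: int = 0  # references that had to go to a Web server

    def record(self, served_from_cache: bool) -> None:
        """Count one object reference as a cache hit or a cache miss."""
        if served_from_cache:
            self.hits += 1
        else:
            self.misses += 1

    def hit_ratio(self) -> float:
        """Hits divided by total references; 0.0 if nothing was referenced."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


stats = ServiceCacheStats()
for served_from_cache in (True, True, False, True):
    stats.record(served_from_cache)
print(stats.hit_ratio())  # 3 hits out of 4 references
```

Keeping one such counter per hosted service, rather than one for the cache as a whole, is what allows the ratios of different services to be compared.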
Response time is longer for requests that result in a cache miss, due to the added cost of retrieving the object from storage. If the cache hit ratio for a particular service “S” is low relative to the other hosted services (i.e., a large share of its references are cache misses), it may be desirable to allocate more memory for caching the objects of service S, thereby reducing the response time for servicing S's client requests.
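The comparison described in the preceding paragraph can be sketched as follows. This only illustrates ranking services by hit ratio and is not the allocation technique of the present invention; the function name, service names, and counts are hypothetical.

```python
def lowest_hit_ratio_service(counts):
    """Return the name of the service with the lowest cache hit ratio.

    `counts` maps a service name to a (hits, total_references) pair.
    Services with no references are skipped; returns None if none remain.
    """
    ratios = {
        name: hits / total
        for name, (hits, total) in counts.items()
        if total > 0
    }
    if not ratios:
        return None
    return min(ratios, key=ratios.get)


# Hypothetical counts for three hosted services (assumed values).
example_counts = {
    "A": (900, 1000),  # hit ratio 0.90
    "S": (400, 1000),  # hit ratio 0.40
    "B": (750, 1000),  # hit ratio 0.75
}
print(lowest_hit_ratio_service(example_counts))
```

A low ratio only flags a candidate for additional cache memory; whether that service would in fact benefit most from the allocation is a separate question.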
Accordingly, improved techniques are needed for allocating memory for cache storage space among competing services in a distributed computing environment.