1. Technical Field
This invention generally relates to storage and retrieval of electronic entities. More particularly, the invention relates to the use of multi-tiered caches for storing and retrieving objects, wherein groups of objects may be associated with each other, such as in the storage of multiple versions of Web content on a network transformation proxy.
2. Related Art
There are numerous methods for storing data. One such way is through the use of an associative array. In an associative array, an object that is to be stored is associated with a key. The object is stored in a particular location and the location is identified by the associated key. When it is desired to retrieve the object, it is only necessary to look up the key, which identifies the location of the object.
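By way of illustration only, the key-to-object association described above may be sketched with a Python dictionary. The names `store`, `put` and `get`, and the example URL, are hypothetical and are introduced here purely for illustration.

```python
# Illustrative sketch only: a Python dict serving as an associative array.
# The names `store`, `put` and `get` are hypothetical.
store = {}

def put(key, obj):
    """Store obj at the location identified by key."""
    store[key] = obj

def get(key):
    """Look up the key and return the associated object, or None if absent."""
    return store.get(key)

put("http://example.com/index.html", "<html>home page</html>")
page = get("http://example.com/index.html")
```

Retrieval requires only the key; the caller need not know where the object is physically stored.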
There are various implementations of associative arrays. For example, databases, file systems and caches are associative arrays. Caches, in particular, are of interest here.
Caches are associative arrays that provide local storage of data. “Local,” as used here, is somewhat relative. In the case of caches that are coupled to microprocessors to allow them to operate more quickly and efficiently, “local” may mean that the cache comprises memory manufactured on the same chips as the microprocessors. In the case of caches that are used in Web proxies, however, “local” may mean that the cache is implemented in a disk drive within the proxy housing.
Caching proxies store and retrieve Web content such as Web pages using the URLs associated with the Web pages as the respective keys. One of the problems that may arise in this situation, however, is that there may be a number of different Web pages that have the same URL. For example, the substance of the Web pages may be approximately the same, but they may each be adapted for viewing on a different type of device (e.g., a desktop computer or a Web-enabled cellular phone). The key may therefore need to incorporate additional characteristics of the Web page, such as cookies or the type of browser for which the page is designed, in order to uniquely identify the Web page that has to be retrieved.
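A key that combines the URL with other characteristics may be sketched as follows. This is a minimal illustration under stated assumptions: the `CacheKey` type, its field names and the example values are hypothetical and are not drawn from any particular proxy implementation.

```python
from typing import NamedTuple

# Illustrative sketch only: a composite key that distinguishes several
# versions of content sharing one URL. All names here are hypothetical.
class CacheKey(NamedTuple):
    url: str
    device_type: str   # e.g., "desktop" or "phone"
    cookie: str = ""   # optional additional distinguishing characteristic

flat_cache = {}
flat_cache[CacheKey("http://example.com/news", "desktop")] = "<html>full layout</html>"
flat_cache[CacheKey("http://example.com/news", "phone")] = "<html>compact layout</html>"

# The same URL now yields two distinct entries, one per device type.
phone_page = flat_cache[CacheKey("http://example.com/news", "phone")]
```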
The caching implemented in prior art proxies is typically flat. In other words, there is a single cache with multiple entries. Each cache entry contains a Web page associated with a corresponding key. As noted above, the key may incorporate both the URL and other characteristics that are necessary to uniquely identify the cached content. Thus, if the proxy needs to store 1000 Web pages having different URLs, 1000 cache entries would be required. If the proxy were required to store 10 different versions of each of these Web pages, 10,000 cache entries would be required.
Because the cache is flat, the time and/or the memory required to store and retrieve entries in the cache increases with the number of entries. Depending on the data structure used, lookup time can range from O(n) to O(log(n)), or even to O(1) (constant time). No benefit is derived from the similarity of the entries (i.e., the fact that ten of the entries may simply be different versions of the same Web page).
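The dependence of lookup time on the underlying data structure may be illustrated by the following sketch, which performs the same lookup by linear scan (O(n)), by binary search over sorted keys (O(log(n))), and by hash table (O(1) on average). The function names and the generated example URLs are hypothetical.

```python
import bisect

# Illustrative sketch only: one flat set of entries, three lookup strategies.
keys = sorted(f"http://example.com/page{i}" for i in range(1000))
pairs = [(k, f"content for {k}") for k in keys]  # same order as `keys`
table = dict(pairs)

def lookup_linear(key):
    """O(n): examine entries one by one."""
    for k, v in pairs:
        if k == key:
            return v
    return None

def lookup_sorted(key):
    """O(log n): binary search over the sorted key list."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return pairs[i][1]
    return None

def lookup_hashed(key):
    """O(1) on average: hash table lookup."""
    return table.get(key)
```

In each case the structure treats every entry as unrelated to every other; none of the three strategies exploits the fact that several entries may be versions of the same Web page.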
Further, when a flat caching structure is used to store multiple versions of content, there is no way to handle sets of associated content. For instance, there is no way to store data that is common to all the associated content (e.g., storing HTTP headers or other information that is common to multiple versions of the same Web page). The common information simply has to be stored for each of the separate versions. Similarly, there is no way to handle these sets of associated content as a group. For example, if it is desired to update every version of an obsolete Web page, there is no way to take a single action that affects all of the versions; they have to be individually located in the cache structure and updated.
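Both limitations may be seen in the following sketch of a flat cache holding several versions of each page. It is an illustration under stated assumptions: the tuple key layout, the `COMMON_HEADERS` value and the `invalidate_all_versions` helper are hypothetical.

```python
# Illustrative sketch only: a flat cache keyed by (url, device_type).
COMMON_HEADERS = {"Content-Type": "text/html"}

flat_cache = {}
for page in range(3):
    url = f"http://example.com/page{page}"
    for device in ("desktop", "phone", "tablet"):
        flat_cache[(url, device)] = {
            # The shared headers are copied into every version's entry.
            "headers": dict(COMMON_HEADERS),
            "body": f"{device} version of page {page}",
        }

def invalidate_all_versions(cache, url):
    """Remove every version of `url`; requires scanning all keys."""
    stale = [key for key in cache if key[0] == url]
    for key in stale:
        del cache[key]
    return len(stale)

removed = invalidate_all_versions(flat_cache, "http://example.com/page1")
```

The common headers are duplicated once per version, and removing the versions of a single page requires examining every key in the cache, because the flat structure records no association between them.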
It should be noted that, while multi-tiered storage mechanisms exist for databases, these are distinct from cache structures. Databases are not designed to be used as functional libraries inside of other programs. In database systems, trees and multi-level storage and retrieval structures must be explicitly constructed by database programmers and, because of the effort, expense and overhead of implementing a database system, this technology is not applicable to high performance cache retrieval.