1. Field of the Invention
This invention relates to caches for use in computing environments.
2. Description of Background Information
At its most basic level, a computing environment is made up of (1) a storage area to maintain data and instructions and (2) one or more processing units to execute the instructions and to consume and produce data in accordance with those instructions. One goal in a computing system is to maintain efficient operation by providing the necessary data and instructions to the processing units as quickly as possible. Therefore, the ideal storage area would contain an unlimited amount of fast memory such that any piece of information within the system would be instantly available to the processing units as needed.
However, a certain access delay is associated with any type of storage device, and access delays tend to increase with device capacity. With reference to the year 1999, for example, semiconductor random-access memory (RAM) chips have capacities measured in megabytes and access delays measured in microseconds (10xe2x88x926 s), while disk drives have capacities of gigabytes but access delays measured in milliseconds (10xe2x88x923 s). Moreover, the cost per storage element is inversely related to device capacity and access delay, so that fast storage devices such as semiconductor RAM chips also tend to be more expensive (e.g. in terms of cents per bit) than slower, large-capacity devices such as disk and tape drives. Access delays are also dependent upon the nature of the interface between the processing units and the storage device; for example, information may be obtained much more quickly from an level 1 (CPU) or level 2 (memory) cache over a chip-level bus than from main memory over a board-level local or system bus or from a disk drive over a peripheral bus cable.
In order to create the illusion of an affordable memory that is both large and fast, the storage area of a modern computing system is commonly configured as a memory hierarchy wherein the upper levels are closer to the processor, run faster, and have more limited capacities than the lower levels. FIG. 1 shows a block diagram of a computing system comprising a processor 100 and a memory hierarchy 110 including a cache 120, main memory 130, and secondary memory 140. In such familiar examples as a personal computer or a file server, the cache and main memory are typically implemented in semiconductor RAM memory, while the secondary memory may include slower devices, such as magnetic and/or optical disk drives (e.g. floppy disk, hard disk or Winchester, so-called xe2x80x98floptical,xe2x80x99 phase-change, CD-ROM, and CD-RW drives and the like) and similar mass storage devices that can be either removable, dedicated, or shared over a network. One common configuration is shown in FIG. 1A, wherein processor 100, cache 120, and main memory 130 are components of host 10, which interfaces as a discrete unit with storage device 170 (e.g. a disk drive as indicated above).
Memory hierarchies leverage the principles of locality in order to create the illusion of a large memory that may be accessed rapidly. The principle of locality in time states that if a storage element is accessed, it will probably be accessed again soon. The principle of locality in space states that if a storage element is accessed, nearby storage elements will probably be accessed soon. In order to exploit these localities, each level of a memory hierarchy stores information from one or more lower levels that is local to information that has been accessed in the past and therefore is more likely to be accessed in the near future.
In FIGS. 1 and 1A, cache 120 exploits the principles of locality by storing, in a location quickly accessible to processor 100, information from the levels below which is likely to be accessed soon. A cache may also be applied between any two levels of the memory hierarchy. As shown in FIG. 2, for example, a cache 160 may be added at the interface between the main memory 130 and secondary memory 140. In this way, the average delay encountered when accessing a device within secondary memory 140 (such as a disk drive) may be reduced. In such applications, cache 160 stores information that is local to information previously accessed from secondary memory 140.
Cache 160 may be addressed over a system bus as a part of main memory 130, as in the example system shown in FIG. 2A, where the secondary memory is represented by storage device 170 (e.g. a magnetic and/or optical disk drive) and host 10 communicates with the storage device over a peripheral bus 172 such as a SCSI, IDE, PCMCIA, IEEE 1394, Universal Serial or Fibre Channel bus. In this case, cache 160 may be a dedicated hardware unit collocated with main memory 130 of the host machine or may even be implemented within such memory (i.e. cache 160 being defined at least in part by firmware or software).
In the alternative, cache 160 may be addressed over a peripheral bus as a part of secondary memory 140 as illustrated in FIG. 2B, wherein cache 160 is addressed over peripheral bus 172 as a part of storage device 170. In this case, it is possible for the operations of cache 160 to be completely transparent to host 10.
As shown in FIG. 3, a cache 190 may be implemented in the same manner at the interface between a local memory system 180 (e.g. memory hierarchy 110 in FIG. 1) and a remote storage system 195 (such as a local- or wide-area network or the World Wide Web) in order to reduce the average delay associated with accesses to system 195. In an exemplary application, local system 180 includes a proxy server coupled to a local-area network, and cache 190 stores Web pages which have recently been accessed and are likely to be accessed again. Once cache 190 has verified that it holds the most recent version of a page requested by a user within local system 180, it may supply that page and avoid the necessity of retrieving it again from remote storage system 195.
As shown in FIG. 4, a cache 200 comprises a set of n cache entries 210-1 through 210-n. Each cache entry 210-i (where index i is an integer from 1 to n) has an address portion 220-i, a data portion 230-i, and an auxiliary portion 240-i. Data portion 230-i holds data from a particular area of the underlying level of the memory hierarchy. This cached area, which may comprise one storage address or several adjacent addresses, is uniquely identified within the underlying level by some combination of the index i and the address portion 220-i. In a direct-mapped cache, for example, the address identifying the cached area is a concatenation of the address portion 220-i (also called a tag) and the index i. In a fully associative cache, on the other hand, address portion 210-i itself uniquely identifies the cached area. Many known addressing schemes for caches exist, ranging from fully associative through set associative to direct mapped.
It is desirable to distinguish (1) the potentially random state of a cache entry at power-up from (2) the state of a cache entry which holds valid information. For this purpose, each auxiliary portion 240 may include a validity field. If necessary, all of the validity fields may be cleared during an initialization routine, and the validity field of an auxiliary portion 240-i may be set when information is cached into address portion 220-i and data portion 230-i.
A cache generally has a smaller capacity for information than the lower level it reflects, so that only a portion of the information stored in the lower level will be present in the cache at any one time. When access to a piece of information is requested, the cache is queried in order to determine whether or not the information is currently stored in the cache (a cache hit or miss, respectively). When a cache miss occurs, the information to be read or written may be entered into the cache in case it is needed again in the near future. If the information is to be entered into a cache which is already full, however, then the information in one or more existing cache entries must be discarded in order to make room for the new information.
In a direct mapped cache, each location in the underlying level is mapped to a unique cache entry. When a replacement situation arises in such a cache, the new information being retrieved from or written to each location can only be stored in a particular one of the cache entries, and no replacement decision exists. On the other hand, if the cache includes a degree of associativity, then each location in the underlying level may be mapped to one among a number of cache entries and a decision must be made regarding which of these entries to replace.
It is desirable to retain information that will be accessed again soon and to replace information that will not be accessed soon or at all. However, an optimal cache replacement strategy requires perfect knowledge of future access requests, and such knowledge is generally unavailable. Therefore, most existing cache replacement strategies (also called cache replacement algorithms or policies or eviction policies) select entries for replacement either at random or according to the limited amount of historical information that may be stored within other fields in auxiliary portions 240. Examples of such historical information include a reference field, which is updated when the information is accessed; an age field, which is updated at predetermined intervals; and a dirty field, which indicates whether the information is still coherent to the contents of the cached area. One type of cache replacement strategy that may be applied to the historical information is a first-in-first-out or FIFO scheme, in which cache entries are selected for replacement (among those entries mapping to the location of the data to be stored) according to the order in which they were created. Another type of replacement strategy is a least-recently-used or LRU scheme, in which cache entries are selected for replacement according to the relative ages of their most recent use.
Cache storage has a high cost per bit, and cost-effectiveness limits the practical size of a cache in a given application. Unfortunately, limiting the size of the cache also limits the amount of storage available to hold a history of accesses to different areas of the cached level, thus restricting the amount of information available for prediction and replacement decisions. Because the history of accesses to a particular cached block is lost upon replacement of that block, limits on cache size may thus result in such undesirable consequences as cache thrashing, wherein information in a particular storage location is repeatedly retrieved and replaced because the cache fails to properly interpret the access history of that location. It is therefore desirable to provide a way to retain more information about access history.
An apparatus according to an embodiment of the invention includes a cache and a cache extender. The cache has n cache entries which each contain cache entry address information. The cache extender has m entries, wherein each entry has (1) an extender address portion having extender entry address information and (2) an extender auxiliary portion having historical information. The cache and cache extender are related in that the address information of each of the n cache entries relates to the address information of one among the m extender address portions. The address information of each of the m extender address portions also relates to a storage area of an underlying level of a memory hierarchy, and the historical information of the corresponding extender auxiliary portion relates at least in part to requests for access to locations within that storage area.