A cache is temporary storage that holds copies of frequently accessed data so that the data can be accessed faster than from the originating source, such as a system memory. A cache contains a number of entries, with each entry containing cache data (e.g., within a cacheline) and a tag referencing the data in the cacheline. Usually, there is a one-to-one correspondence between the tag and the cacheline for each entry.
FIG. 1 illustrates a conventional cache implementing a group 101 of entries 108, ranging from entry 0 to entry n. Each entry 108 is composed of a tag 102 and a cacheline 104. Entry (“3”) 100 depicts the one-to-one correspondence between a tag and a cacheline that is common in conventional caching techniques. As shown, tag 112 includes a value of hexadecimal number 9A7E, which references data 114. Further, group 101 includes a number 120 (“N(tags)”) of tags 102 that is the same as the number (“N(cachelines)”) of cachelines 104. Virtually all conventional caches have a tag referencing a corresponding cacheline, regardless of whether a cache uses direct-mapped, set-associative or fully associative look-up methods. While functional, conventional caches comporting with the caching techniques demonstrated in FIG. 1 are not well suited to optimizing, for example, cache reuse and latency tolerance independently of each other.
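By way of a non-limiting illustration, the one-to-one arrangement of FIG. 1 can be sketched as follows. The names `Entry`, `ConventionalCache`, and `lookup`, the direct-mapped indexing, and the dictionary standing in for the originating source are assumptions made for illustration only and do not appear in FIG. 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    """One cache entry: exactly one tag paired with exactly one cacheline."""
    tag: Optional[int] = None          # e.g., 0x9A7E; None means the entry is empty
    cacheline: Optional[bytes] = None  # the cached copy of the data


class ConventionalCache:
    """Direct-mapped sketch in which N(tags) always equals N(cachelines)."""

    def __init__(self, num_entries: int, source: Dict[int, bytes]) -> None:
        self.entries: List[Entry] = [Entry() for _ in range(num_entries)]
        self.source = source       # hypothetical stand-in for system memory
        self.source_accesses = 0   # counts fetches from the originating source

    def lookup(self, address: int) -> bytes:
        index = address % len(self.entries)  # assumed direct-mapped indexing
        tag = address // len(self.entries)
        entry = self.entries[index]
        if entry.tag == tag and entry.cacheline is not None:
            return entry.cacheline           # hit: the tag references its cacheline
        # Miss: fetch from the originating source, then fill tag and cacheline together.
        self.source_accesses += 1
        entry.tag = tag
        entry.cacheline = self.source[address]
        return entry.cacheline
```

Because each `Entry` holds one tag and one cacheline, any increase in the number of cachelines in this sketch necessarily increases the number of tags by the same amount.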
Cache structures are designed with cache reusability and latency tolerance in mind, both of which are operational characteristics. Cache reuse describes the ability of a cache to reuse tags without fetching data from the originating source. For example, data requests in some applications frequently reuse a certain number of tags (and thus cachelines), such as during looping and branching operations. As such, cache reuse relates to a number of tags used to provide a history of past data requests (of the cache) that can satisfy new requests without requiring an access to the originating source. Latency tolerance describes the ability of a cache to operate continuously without stalling (i.e., without waiting for data to return from an originating source after it is requested). The depth of a cache (i.e., the number of cachelines) generally dictates the tolerance of a cache to latency. Presently, cache designers generally tend to optimize both cache reuse and latency tolerance simultaneously in view of the one-to-one correspondence between tags and cachelines. A drawback to this approach is that if cache 100 requires a number of additional cachelines 130, as shown in FIG. 1, to achieve an acceptable level of latency tolerance, then a corresponding number of additional tags 140 (and supporting hardware resources) is also required, regardless of whether fewer tags could satisfactorily support the cache reusability. For example, if a cache designer adds additional cachelines 130 that include two-hundred (200) cachelines, then the designer typically also implements additional tags 140 that include two-hundred (200) tags.
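The coupling described above can be made concrete with a small arithmetic sketch. The function name `coupled_growth`, the base size of 64 entries, and the figure of 32 tags assumed sufficient for cache reuse are hypothetical values chosen only for illustration; only the addition of 200 cachelines comes from the example above.

```python
def coupled_growth(base_entries: int, extra_cachelines: int,
                   tags_needed_for_reuse: int) -> dict:
    """Tag and cacheline counts after deepening a cache with one tag per cacheline."""
    cachelines = base_entries + extra_cachelines
    tags = cachelines  # the one-to-one correspondence forces the counts to match
    return {
        "cachelines": cachelines,
        "tags": tags,
        "extra_tags": tags - base_entries,
        # Tags implemented beyond what cache reuse alone would call for.
        "surplus_tags": max(0, tags - tags_needed_for_reuse),
    }


# Hypothetical base cache of 64 entries, deepened by 200 cachelines for latency tolerance.
result = coupled_growth(base_entries=64, extra_cachelines=200, tags_needed_for_reuse=32)
```

Under these assumed numbers, the 200 additional cachelines bring 200 additional tags with them, even though far fewer tags were assumed adequate for cache reuse.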
In view of the foregoing, it would be desirable to provide an apparatus, a system, a method, a graphics processing unit (“GPU”), a computer device, and a computer medium that minimize the above-mentioned drawbacks, thereby implementing, among other things, enhanced tags to decouple a dependency between tags and cachelines.