A cache is temporary storage that holds copies of data so that the data can be accessed faster than from the originating source, such as system memory. A cache contains a number of cachelines containing units of data, as well as a repository of tags referencing the data in the cachelines. To determine whether there is a “cache hit” or a “cache miss,” a cache memory system performs a look-up operation that matches the addresses constituting a request for data against the contents of the repository of tags. In ordinary cache memory systems, the repository of tags can service only a limited number of tag look-ups per look-up operation.
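As an illustration of the tag look-up just described, the following is a minimal sketch only. It assumes a direct-mapped organization, a 64-byte cacheline, and four cachelines, none of which is specified in the text above; the names `TinyCache`, `LINE_BYTES`, and `NUM_LINES` are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # assumed cacheline size in bytes (not from the text)
NUM_LINES = 4    # assumed number of cachelines (not from the text)


@dataclass
class TinyCache:
    """Hypothetical direct-mapped cache that models only the repository of tags."""

    # One tag slot per cacheline; None marks a cacheline that holds no data yet.
    tags: list = field(default_factory=lambda: [None] * NUM_LINES)

    def _split(self, address):
        # Drop the byte offset, then split the line address into index and tag.
        line_address = address // LINE_BYTES
        return line_address % NUM_LINES, line_address // NUM_LINES

    def fill(self, address):
        # Record the tag for the cacheline that this address maps to.
        index, tag = self._split(address)
        self.tags[index] = tag

    def lookup(self, address):
        # A cache hit occurs when the stored tag matches the requested tag.
        index, tag = self._split(address)
        return self.tags[index] == tag


cache = TinyCache()
cache.fill(0x100)
hit = cache.lookup(0x104)   # same cacheline as 0x100, so the tags match
miss = cache.lookup(0x200)  # same index but a different tag
```

In this sketch each call to `lookup` compares one address against one tag, which corresponds to a single tag look-up within a look-up operation.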
A drawback to this approach is that when the number of addresses requested exceeds the number of tag look-ups available per look-up operation, multiple look-up operations are generally required to determine whether each of the requested data units resides in the cache. To illustrate, consider a request that includes six addresses and a look-up operation that requires one clock cycle to look up three tags. To ascertain whether the data corresponding to the six addresses reside in the cache, conventional cache memory systems distribute the six addresses over two look-up operations (i.e., three addresses per look-up operation), and thus over two clock cycles. But when addresses are distributed over multiple look-up operations, the determination of whether the request hits or misses stalls until all the addresses have been looked up, thereby degrading the performance of the cache memory system as well as the computing device in which the cache memory system operates.
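The arithmetic of this illustration can be sketched as follows. This is only a model of the conventional distribution of addresses, assuming the look-up operations run one after another; the function name `split_into_lookup_operations` and the sample addresses are hypothetical.

```python
def split_into_lookup_operations(addresses, tags_per_operation):
    """Distribute requested addresses over consecutive look-up operations.

    Each returned group holds at most `tags_per_operation` addresses, which
    models the limited number of tag look-ups serviced per look-up operation.
    """
    if tags_per_operation <= 0:
        raise ValueError("tags_per_operation must be positive")
    return [
        addresses[start:start + tags_per_operation]
        for start in range(0, len(addresses), tags_per_operation)
    ]


# Six requested addresses (sample values), three tag look-ups per operation.
request = [0x00, 0x40, 0x80, 0xC0, 0x100, 0x140]
operations = split_into_lookup_operations(request, 3)

# With one clock cycle per look-up operation, the hit/miss determination
# for the whole request is available only after this many cycles.
cycles = len(operations)  # 2
```

The request as a whole cannot be resolved until the last group has been looked up, which is the stall described above.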
In view of the foregoing, it would be desirable to provide an apparatus, a system, a method, a graphics processing unit (“GPU”), a computer device, and a computer medium that minimize the above-mentioned drawbacks and that implement a look-up filter to filter out superfluous look-ups, thereby reducing stalling in look-up operations.