A computer system, in its most essential form, typically comprises a processor, a memory unit, and an I/O device through which the computer system communicates with an end-user. The end-user provides the computer system with a program, typically comprising a set of instructions or codes that direct the processor to perform tasks. Generally, the tasks involve manipulating data that is provided to the computer system by the end-user. Both the data and the codes are stored in the memory unit. The processor reads the codes and the data, manipulates the data according to the program, and then stores the result in the memory unit.
Both processors and memory units have become faster as electronics technology has advanced. However, today's processors still execute instructions much faster than memory units can deliver stored data. This difference in speed, referred to as memory latency, causes inefficiency because the processor remains idle while it waits for the slower memory to make the next piece of data available. Reducing memory latency is of great interest to computer users because it improves the overall performance of the computer system.
One way to reduce memory latency is to utilize a faster intermediate level of memory known as a cache. A general cache consists of ways, sets and lines. A way comprises a plurality of sets, each of which in turn includes a plurality of lines, and a line is a container of a fixed length that stores the data. In a single clock cycle, one look-up and one fetch are performed, fetching one line from the cache.
Generally, a cache stores a tag that identifies the corresponding data. When the processor requests data, a cache controller performs a look-up operation, matching an abbreviated portion of the address of the requested data against one or more tags. If the search results in a match, i.e., a cache hit, then the corresponding data in the cache is sent to the processor. Otherwise, a cache miss occurs, and the data is transferred from main memory. A look-up operation is costly in terms of both power consumption and time. If the length of the data exceeds the size of a cache line, so that the referenced data must be stored in multiple cache lines, then multiple look-ups are necessary to fully cache the data.
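The look-up described above can be sketched in a few lines of Python. This is an illustrative toy model only, not the controller of any particular cache: the class name `TinyCache`, the default sizes, the way the address is split into a tag and a set index, and the first-in-first-out eviction are all assumptions made for the example.

```python
class TinyCache:
    """Toy set-associative cache; every name and size here is hypothetical."""

    def __init__(self, num_sets=4, ways=2, line_size=4):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # Each set holds up to `ways` (tag, line_data) pairs.
        self.sets = [[] for _ in range(num_sets)]
        self.lookups = 0  # counts look-up operations performed

    def _split(self, address):
        # Assumed address layout: the line number is split into a set index
        # (low part) and a tag (high part, the "abbreviated portion").
        line_number = address // self.line_size
        return line_number // self.num_sets, line_number % self.num_sets

    def read(self, address, main_memory):
        """Return (line_data, hit) for the cache line containing `address`."""
        tag, index = self._split(address)
        self.lookups += 1
        for stored_tag, data in self.sets[index]:
            if stored_tag == tag:
                return data, True  # cache hit: serve the line from the cache
        # Cache miss: transfer the whole line from main memory.
        base = (address // self.line_size) * self.line_size
        data = bytes(main_memory[base:base + self.line_size])
        lines = self.sets[index]
        if len(lines) >= self.ways:
            lines.pop(0)  # assumed policy: evict the oldest line in the set
        lines.append((tag, data))
        return data, False


memory = bytes(range(64))
cache = TinyCache()
first = cache.read(8, memory)   # miss: the line is fetched from main memory
second = cache.read(9, memory)  # hit: same line, found by its tag
```

Under these assumptions, reading a 10-byte block that starts at address 0 touches the lines at addresses 0, 4 and 8, so the sketch performs three separate look-ups for a single block.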
There are several possibilities for organizing the cache structure. One possibility is for each cache line to contain an entire block of data. In this approach, the length of the cache line is sufficient to hold the longest possible data block. This approach can cause substantial inefficiencies in memory usage, since the average block of data is shorter than the cache line, leaving part of most lines unused.
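To make the inefficiency concrete, the following sketch computes the fraction of cache storage that actually holds data when every block gets its own full-length line. The function name `line_utilization` and the block lengths used in the example are hypothetical values chosen for illustration; they do not come from any measurement.

```python
def line_utilization(block_lengths, line_size):
    """Fraction of line storage holding data when each block occupies one line."""
    if any(length > line_size for length in block_lengths):
        raise ValueError("line_size must hold the longest block in this scheme")
    used = sum(block_lengths)
    allocated = len(block_lengths) * line_size
    return used / allocated


# Hypothetical workload: four blocks of 10, 4, 2 and 16 bytes, with the line
# sized for the longest block (16 bytes).
example = line_utilization([10, 4, 2, 16], line_size=16)
```

With these assumed lengths, 32 of the 64 allocated bytes hold data, so half of the cache storage is unused.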
Another approach is to divide a block of data and store it in several cache lines, as illustrated in FIG. 1. Data block 120 is 10 bytes long. However, in this example, the size of a cache line is only 4 bytes. Thus, in order to store data block 120, three cache lines are needed. Line 0 stores the first 4 bytes of data, line 1 the next 4 bytes, and line 2 the remaining 2 bytes. Each line has a corresponding tag, but only the first line of the block is looked up, so the tags for line 1 and line 2 are wasted. In addition, the cache lines that contain the continuation of the data block occupy lines that could be used for other blocks; hence the effective associativity of the cache is reduced. If the data block contained 100 bytes instead of 10, then 25 lines would be required to cache the data block, resulting in storage of 24 additional tags that serve little or no use.
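The arithmetic in this example can be checked with a short sketch. The helper names `split_block`, `lines_needed` and `unused_tags` are hypothetical and introduced only for illustration; the block and line sizes are the ones given in the text.

```python
def split_block(block_bytes, line_bytes):
    """Number of data bytes placed in each consecutive cache line."""
    return [min(line_bytes, block_bytes - start)
            for start in range(0, block_bytes, line_bytes)]


def lines_needed(block_bytes, line_bytes):
    """Cache lines required to hold one block (ceiling division)."""
    return (block_bytes + line_bytes - 1) // line_bytes


def unused_tags(block_bytes, line_bytes):
    """Tags stored for continuation lines, which are never looked up."""
    return lines_needed(block_bytes, line_bytes) - 1


layout = split_block(10, 4)      # bytes held by line 0, line 1, line 2
small = (lines_needed(10, 4), unused_tags(10, 4))
large = (lines_needed(100, 4), unused_tags(100, 4))
```

For the 10-byte block the layout is 4, 4 and 2 bytes across three lines with two unused tags, and for the 100-byte block the sketch gives 25 lines and 24 unused tags, matching the figures above.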