1. Field of the Invention
The present invention relates to the field of processors and, more particularly, to a technique for utilizing cache memory.
2. Background of the Related Art
The use of a cache or caches with a processor (whether integrated within the processor chip or external to it) is well known in the computer art. A primary purpose of using caches is to enhance processor performance by reducing data access time. It is generally understood that memory devices closer to the processor operate faster than memory devices farther away on the data path from the processor. However, there is a cost trade-off in utilizing faster memory devices. The faster the data access, the higher the cost to store a bit of data. Accordingly, a cache memory tends to be much smaller in storage capacity than main memory, but is faster in accessing the data.
Current generation high performance computer systems utilize multiple caches, typically arranged as a hierarchy of cache levels. A processor of a computer system maintains cache coherency by updating all the caches simultaneously or by updating the different cache levels at various times. For example, in a write-through cache system, a write operation simultaneously updates the cache and the main memory or the next level cache in the hierarchy. If all of the caches are write-through caches, then all of the caches and the main memory can be updated simultaneously. In a write-back cache system, a write operation updates only the closest cache. The other cache level(s) and the main memory can be updated at a later time, such as when a cache line is evicted (victimized). Accordingly, in a write-back cache system the data in the main memory may temporarily be inconsistent with the data in the cache.
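By way of illustration only, the difference between the two write policies can be sketched as follows. The class name TwoLevelStore, its fields and the single cache level are assumptions made for this example and do not describe any particular implementation.

```python
class TwoLevelStore:
    """Toy model (hypothetical): one cache level in front of a main memory."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.cache = {}      # address -> value held in the cache
        self.dirty = set()   # addresses modified in the cache but not yet in memory
        self.memory = {}     # address -> value held in main memory

    def write(self, addr, value):
        self.cache[addr] = value
        if self.write_back:
            self.dirty.add(addr)        # write-back: memory update is deferred
        else:
            self.memory[addr] = value   # write-through: memory is updated now

    def evict(self, addr):
        # On eviction, a write-back cache copies a dirty line to memory.
        if addr in self.dirty:
            self.memory[addr] = self.cache[addr]
            self.dirty.discard(addr)
        self.cache.pop(addr, None)
```

In this sketch, a write to a write-back store leaves the main memory stale until evict is called for that address, whereas a write-through store updates the main memory on every write.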
The use of write-through and write-back caches is known in the art, along with the various cache line allocation and de-allocation schemes for accessing the cache memories. It is also understood that the caches can be inclusive, partially inclusive or exclusive, as pertaining to data storage in the cache hierarchy. Lower levels in the hierarchy are defined as those levels closer to the processor, and a data request by a processor is typically satisfied by the closest cache level that contains the data. A cache hierarchy is inclusive if the data in a cache at a given level is a subset of the data in a cache at a higher level of the hierarchy. A cache is exclusive if cached data in one level does not exist in any other level. A partially inclusive cache implies that the data at a given cache level is not a full subset of a higher cache level. In practice, most cache systems implement a partially inclusive cache structure.
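By way of illustration only, the three inclusion relationships can be expressed in terms of the sets of line addresses held at two adjacent levels. The function name classify_hierarchy and the representation of each cache as a set of addresses are assumptions made for this example.

```python
def classify_hierarchy(lower, higher):
    """Classify two adjacent cache levels by their contents.

    lower:  set of line addresses held in the level closer to the processor
    higher: set of line addresses held in the next level up
    """
    if lower <= higher:
        return "inclusive"            # every lower-level line is also higher up
    if lower.isdisjoint(higher):
        return "exclusive"            # no line is held at both levels
    return "partially inclusive"      # some, but not all, lines are duplicated
```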
One notable aspect of cache memory is the use of address tags to identify cache lines present in a cache. An address tag is a subset of the actual address. For a given address width, the number of bits in the tag is determined by the number of sets in the cache and by the cache line size. A cache line includes the tag, information (or data) and state (or status) bit(s), which provide certain state (or status) information pertaining to the cached line. For example, state bits are used to identify if the cache line is dirty (has been modified), is shared by other resources, is invalid or is exclusive to one resource.
Since each cache line stores multiple bytes of data, the tag corresponds to the beginning address of the group of data in memory that is now stored in the cache. Accordingly, the cached data is a replication of the data stored in the main memory.
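By way of illustration only, the relationship between an address, its tag, its set and the byte position within a cache line can be sketched as follows. The function name split_address and the default line size and set count are assumptions made for this example, and both parameters are assumed to be powers of two.

```python
def split_address(addr, line_size=64, num_sets=128):
    """Split an address into (tag, set_index, byte_offset).

    Assumes line_size and num_sets are powers of two.
    """
    offset_bits = line_size.bit_length() - 1   # bits selecting a byte in the line
    index_bits = num_sets.bit_length() - 1     # bits selecting a set
    offset = addr & (line_size - 1)
    set_index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)   # remaining upper address bits
    return tag, set_index, offset


print(split_address(0x12345))  # (9, 13, 5)
```

With a 64-byte line and 128 sets, the low 6 bits select the byte, the next 7 bits select the set, and the remaining upper bits form the tag, so a larger line size or a larger number of sets leaves fewer bits for the tag.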
Whenever a read instruction requiring data retrieval from memory is executed, the processor generates an address for accessing that memory location to retrieve the data. This read address is then presented to the cache. A particular set is accessed and tags present in the different ways in the set are compared with the read address. If the compare operation is successful, data is provided to the processor. If the compare operation is unsuccessful, data is retrieved from external memory and loaded into the cache and also forwarded to the processor. If the same data is needed again, then data is retrieved from the cache instead of the main memory.
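By way of illustration only, the read sequence described above can be sketched as a small set-associative lookup. The class name SetAssociativeCache, the representation of main memory as a dictionary keyed by line number, and the oldest-first replacement policy are assumptions made for this example.

```python
class SetAssociativeCache:
    """Toy read path: index a set, compare tags across ways, fill on a miss."""

    def __init__(self, memory, num_sets=4, ways=2, line_size=16):
        self.memory = memory          # dict: line number -> line data
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # Each set is a list of (tag, data) pairs, oldest first.
        self.sets = [[] for _ in range(num_sets)]

    def read(self, addr):
        """Return (data, hit) for the line containing addr."""
        line_no = addr // self.line_size
        set_index = line_no % self.num_sets
        tag = line_no // self.num_sets
        lines = self.sets[set_index]
        for stored_tag, data in lines:      # tag compare across the ways
            if stored_tag == tag:
                return data, True           # hit: data comes from the cache
        # Miss: fetch from memory, load into the cache, forward to the processor.
        data = self.memory[line_no]
        if len(lines) == self.ways:
            lines.pop(0)                    # evict the oldest line (assumed policy)
        lines.append((tag, data))
        return data, False
```

In this sketch, the first read of an address misses and fills the cache, and a subsequent read of any address in the same line is satisfied from the cache.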
Likewise, when a write is executed by the processor, the processor will need to update the cached data, either prior to or simultaneously with the updating of the main memory. In a write operation from the processor, a cache is accessed to determine if the tag for that address is present in the cache, so that the cached data can be loaded into a write buffer. If data had not been cached, then the data is retrieved from the main memory and loaded into the cache(s) and the write buffer, similar to a read operation. Next, the data in the write buffer entry is updated with the store data. Subsequently, the modified data in the write buffer is written to the appropriate location(s) in the cache(s) and/or the main memory. It should be noted that the cache cannot be updated directly with the store data due to significant implementation difficulties.
In order to determine the presence of a particular address tag in a cache corresponding to the address associated with the data which is to be written, tag compares are performed at the cache levels. A tag comparison determines if there is a "hit" or a "miss" at a given cache level. A subtle but important point to be observed is that one tag compare operation is performed during a read, whereas two tag compare operations are performed during a write: one when the cache is accessed to load the cached data into the write buffer, and a second when the modified data in the write buffer is written back to the cache.
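By way of illustration only, the count of tag compare operations for a read and for a conventional write can be sketched as follows. The class name CountingCache, the direct-mapped organization, the counter and the omission of any main memory update are assumptions made for this example.

```python
class CountingCache:
    """Toy direct-mapped cache that counts tag compare operations."""

    def __init__(self, memory, num_sets=8):
        self.memory = memory      # dict: line number -> list of bytes
        self.num_sets = num_sets
        self.lines = {}           # set index -> (tag, list of bytes)
        self.tag_compares = 0

    def _lookup(self, line_no):
        """Perform one tag compare; return (set_index, tag, hit)."""
        set_index = line_no % self.num_sets
        tag = line_no // self.num_sets
        self.tag_compares += 1
        entry = self.lines.get(set_index)
        return set_index, tag, entry is not None and entry[0] == tag

    def read(self, line_no):
        set_index, tag, hit = self._lookup(line_no)          # single compare
        if not hit:
            self.lines[set_index] = (tag, list(self.memory[line_no]))
        return self.lines[set_index][1]

    def write(self, line_no, offset, byte):
        # First compare: locate the line so it can be copied to the write buffer.
        set_index, tag, hit = self._lookup(line_no)
        if not hit:
            self.lines[set_index] = (tag, list(self.memory[line_no]))
        write_buffer = list(self.lines[set_index][1])
        write_buffer[offset] = byte                          # merge the store data
        # Second compare: locate the line again to write the buffer back.
        set_index, tag, hit = self._lookup(line_no)
        if hit:
            self.lines[set_index] = (tag, write_buffer)
```

In this sketch, each call to read increments tag_compares by one and each call to write increments it by two, which is the overhead that the second lookup adds to a write.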
In a typical tag comparison operation, some amount of time is needed to perform the comparison. Typically, at least a full clock cycle is required to read the tags and compare them to the address to determine if a cache line associated with the address is present in the cache. A mechanism that avoids the need for this tag compare operation when updating the cache line would save a significant amount of time, thereby improving processor performance. The present invention provides for such a scheme, in which the tag compare performed during a write operation to modify an existing cache line is avoided in order to improve processor performance. That is, the practice of the present invention reduces the tag comparisons required during a write to one tag compare operation, instead of two.