I. Field of the Invention
This invention relates generally to computer technology, and more particularly, to improving processor performance in a computer system.
II. Background Information
The use of a cache memory with a processor facilitates the reduction of memory access time. The cache memory may be configured, among other things, as an instruction cache, a data cache, or a translation lookaside buffer (a cache that stores recently used page-directory and page-table entries). The fundamental idea of cache organization is that by keeping the most frequently accessed instructions and data in the fast cache memory, the average memory access time will approach the access time of the cache. It is generally understood that memory devices closer to the processor operate faster than memory devices farther away on the data path from the processor. However, there is a cost trade-off in utilizing faster memory devices. The faster the data access, the higher the cost to store a bit of data. Accordingly, a cache memory tends to be much smaller in storage capacity than main memory, but is faster in accessing the data.
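The effect of the hit rate on the average memory access time can be illustrated with a minimal sketch. The two-level model, the function name average_access_time, and the latencies below are assumptions chosen only for illustration; they are not taken from this description.

```python
def average_access_time(hit_rate, cache_time, memory_time):
    """Average access time under an assumed two-level model.

    Every access pays cache_time; a miss additionally pays memory_time.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return cache_time + (1.0 - hit_rate) * memory_time


# Hypothetical latencies in nanoseconds.
CACHE_NS = 2.0
MEMORY_NS = 100.0

for rate in (0.50, 0.90, 0.99):
    avg = average_access_time(rate, CACHE_NS, MEMORY_NS)
    print(f"hit rate {rate:.2f}: {avg:.1f} ns")
```

Under these assumed numbers, the average falls toward the cache latency as the hit rate rises, which is the behavior the paragraph above describes.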
A virtual memory environment allows a large linear address space to be simulated with a small amount of physical memory (e.g., random access memory or read-only memory) and some disk storage. When a process references a logical address in memory, the processor translates the logical address into a linear address and then translates the linear address into a corresponding physical address. The physical address corresponds to a hardware memory location. A linear-to-physical address translation involves memory management hardware translating the linear address to the physical address. The linear-to-physical address translation is time consuming because it uses a memory access (e.g., the memory access may be to a cache or main memory). Waiting for this translation before performing an action (e.g., performing a cache lookup) increases the memory access time.
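The two translation steps can be sketched as follows. This is a simplified model and not the memory management hardware itself: the 4096-byte page size, the single-level page table held in a dictionary, the segment base, and the function names are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size; not specified in this description


def logical_to_linear(segment_base, offset):
    """Form a linear address from an assumed segment base and an offset."""
    return segment_base + offset


def linear_to_physical(linear, page_table):
    """Translate a linear address using a hypothetical single-level page table.

    page_table maps a linear page number to a physical frame number. The
    dictionary lookup stands in for the memory access that makes the
    translation time consuming.
    """
    page, offset = divmod(linear, PAGE_SIZE)
    frame = page_table[page]  # the costly memory access
    return frame * PAGE_SIZE + offset


# Hypothetical mapping: linear page 1 resides in physical frame 7.
page_table = {1: 7}
linear = logical_to_linear(segment_base=0x1000, offset=0x10)
physical = linear_to_physical(linear, page_table)
print(hex(linear), hex(physical))
```

Any action that must wait for linear_to_physical to return, such as a cache lookup keyed by the physical address, inherits the latency of that lookup.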
In order to decrease memory access time, a cache may be organized as a linear-addressed cache where the linear address of the memory request is used for the cache lookup rather than the physical address. The linear-addressed cache forgoes the linear-to-physical address translation before performing the cache lookup. Forgoing the linear-to-physical address translation decreases the memory access time. When using the linear-addressed cache, the linear-to-physical address translation is still performed because the physical address resulting from the translation is used to validate the data accessed in the cache using the linear address (i.e., check to ensure that the correct memory locations are accessed), but this linear-to-physical address translation is performed in parallel with the cache lookup. Performing the linear-to-physical address translation in parallel with the linear-addressed cache lookup improves the memory access time as the translation overhead is minimized due to the overlap with the linear-addressed cache lookup.
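A minimal sketch of such a lookup is given below. The class name LinearAddressedCache, the 64-byte line size, and the dictionary-based storage are hypothetical. The sketch runs the lookup and the translation one after the other; in hardware the two would overlap in time.

```python
LINE_SIZE = 64  # assumed cache line size in bytes


class LinearAddressedCache:
    """Toy cache indexed by linear address and validated by physical address."""

    def __init__(self):
        # linear line number -> (physical line number, data block)
        self.lines = {}

    def fill(self, linear, physical, data):
        """Store a data block under its linear address with its physical tag."""
        self.lines[linear // LINE_SIZE] = (physical // LINE_SIZE, data)

    def lookup(self, linear, translate):
        """Return the cached data block, or None on a miss.

        translate is a caller-supplied function that performs the
        linear-to-physical address translation.
        """
        entry = self.lines.get(linear // LINE_SIZE)  # lookup by linear address
        physical = translate(linear)  # overlaps with the lookup in hardware
        if entry is None:
            return None  # no line is cached for this linear address
        physical_tag, data = entry
        if physical_tag != physical // LINE_SIZE:
            return None  # validation failed: wrong memory location
        return data


cache = LinearAddressedCache()
cache.fill(linear=0x4000, physical=0x10000, data="block-1")

# Hypothetical translation that maps linear 0x4000 onward to physical 0x10000 onward.
hit = cache.lookup(0x4008, lambda la: la - 0x4000 + 0x10000)
print(hit)
```

The physical address is used only to confirm that the line found by the linear address holds the correct memory location, so the translation does not have to complete before the lookup starts.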
More than one process may execute on a processor. Typically, the linear-addressed cache is flushed when the processor switches from executing one process to executing another process. A cache flush occurs when the processor writes the valid and current information from its cache back into main memory. The cache flush diminishes processor performance as the processor may have to wait for completion of writes to the main memory. Moreover, data that was in the cache before the flush and is accessed after the flush has to be brought back into the cache. Therefore, cache flushes are avoided whenever possible in order to increase processor performance.
If a cache flush is not performed whenever a process switch occurs, then the linear-addressed cache may suffer from linear address aliasing. Linear address aliasing occurs when two separate processes running on the processor access the same cache line using linear addresses that map to different physical addresses (e.g., process one accesses linear address A and process two accesses linear address A, but linear address A maps to a different physical address for each process). When linear address aliasing occurs, if the physical address, generated by performing a linear-to-physical address translation of the linear address, does not match the physical address within the tag of the cache line whose tag matches the linear address, then the data block referenced by the linear address is brought into the linear-addressed cache from a storage device at a higher level in the memory hierarchy (e.g., main memory or the hard disk). This memory access (resulting from the linear address aliasing) to the slower storage device at the higher hierarchical level decreases processor performance.
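The cost of linear address aliasing can be illustrated with a small simulation. The function count_misses, the per-process page tables, and the addresses are hypothetical, and the cache is reduced to a dictionary that holds one physical address per linear address.

```python
def count_misses(accesses, page_tables):
    """Count misses in an unflushed linear-addressed cache.

    accesses is a sequence of (process id, linear address) pairs.
    page_tables maps a process id to that process's linear-to-physical map.
    """
    cache = {}  # linear address -> physical address of the cached data block
    misses = 0
    for pid, linear in accesses:
        physical = page_tables[pid][linear]
        if cache.get(linear) != physical:
            # Absent line or physical tag mismatch: refill from slower storage.
            misses += 1
            cache[linear] = physical
    return misses


A = 0x4000  # linear address A, used by both processes
page_tables = {
    1: {A: 0x10000},  # process one maps A to one physical address
    2: {A: 0x20000},  # process two maps A to a different physical address
}

aliased = [(1, A), (2, A), (1, A), (2, A)]
single = [(1, A), (1, A), (1, A), (1, A)]
print(count_misses(aliased, page_tables))  # 4: every access misses
print(count_misses(single, page_tables))   # 1: only the first access misses
```

In this sketch the two processes evict each other's data block on every access, so each access goes to the slower storage device.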
In order to reduce linear address aliasing, a process identifier that is unique to a process can be combined with the linear address to form an adjusted linear address. Because the process identifier is unique to the process, the resulting adjusted linear address provides a high probability of no aliasing. However, treating all accesses to a linear-addressed cache the same by combining the unique process identifier with the linear address can lead to replication of a shared data block (i.e., a data block that is used by two or more processes and whose physical address is in shared memory space). Because cache memory reduces the memory access time, storing only unique data blocks (i.e., shared data blocks are stored only once in the linear-addressed cache memory) decreases the memory access time, as more unique data blocks in the linear-addressed cache result in fewer cache misses. A cache miss increases the memory access time because of the resulting access to a slower storage device at a higher level in the memory hierarchy. Because cache memory is expensive, duplicating shared data blocks in the linear-addressed cache memory is not cost-effective.
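Both effects, reduced aliasing and replication of a shared data block, can be seen in a short sketch. The manner of combining the process identifier with the linear address is not specified above; placing the identifier above an assumed 32-bit linear address is one possible scheme used here only for illustration. The function names and addresses are likewise hypothetical.

```python
LINEAR_BITS = 32  # assumed width of a linear address


def adjusted_linear_address(pid, linear):
    """Combine a process identifier with a linear address (assumed scheme)."""
    return (pid << LINEAR_BITS) | linear


def simulate(accesses, page_tables):
    """Return (miss count, cache contents) for a cache keyed by adjusted address."""
    cache = {}  # adjusted linear address -> physical address of the data block
    misses = 0
    for pid, linear in accesses:
        key = adjusted_linear_address(pid, linear)
        physical = page_tables[pid][linear]
        if cache.get(key) != physical:
            misses += 1
            cache[key] = physical
    return misses, cache


A = 0x4000       # linear address of private data in each process
S = 0x8000       # linear address of a shared data block
SHARED_PHYSICAL = 0x30000
page_tables = {
    1: {A: 0x10000, S: SHARED_PHYSICAL},
    2: {A: 0x20000, S: SHARED_PHYSICAL},
}

accesses = [(1, A), (2, A), (1, A), (2, A), (1, S), (2, S)]
misses, cache = simulate(accesses, page_tables)
copies = sum(1 for physical in cache.values() if physical == SHARED_PHYSICAL)
print(misses, copies)  # 4 misses; the shared block is cached twice
```

The repeated accesses to A now hit because each process has its own cache entry, but the shared data block occupies two entries even though it resides at a single physical address.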
For the foregoing reasons, there is a need to differentiate between shared data blocks and non-shared data blocks and how the shared data blocks and the non-shared data blocks are accessed and stored in a linear-addressed cache that is configured to reduce the problem of linear address aliasing.