1. Field of the Invention
The present invention relates in general to the field of processors, and in particular, to a technique of providing a shared cache structure for temporal and non-temporal instructions.
2. Description of the Related Art
The use of a cache memory with a processor facilitates the reduction of memory access time. The fundamental idea of cache organization is that by keeping the most frequently accessed instructions and data in the fast cache memory, the average memory access time will approach the access time of the cache. To achieve the maximum possible speed of operation, typical processors implement a cache hierarchy, that is, different levels of cache memory. The different levels of cache correspond to different distances from the processor core. The closer the cache is to the processor, the faster the data access. However, the faster the data access, the more costly it is to store data. As a result, the closer the cache level, the faster and smaller the cache.
The performance of cache memory is frequently measured in terms of its hit ratio. When the processor refers to memory and finds the word in cache, it is said to produce a hit. If the word is not found in cache, then it is in main memory and it counts as a miss. If a miss occurs, then an allocation is made at the entry indexed by the access. The access can be for loading data to the processor or storing data from the processor to memory. The cached information is retained by the cache memory until it is no longer needed, made invalid or replaced by other data, in which instances the cache entry is de-allocated.
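The relationship between hit ratio and access time can be made explicit with a simple two-term model. This is an illustrative sketch, not a formula stated in the text: the symbols $h$ (hit ratio), $N_{\text{hit}}$ and $N_{\text{miss}}$ (hit and miss counts), $t_c$ (cache access time), and $t_m$ (main memory access time) are assumptions introduced here.

```latex
h = \frac{N_{\text{hit}}}{N_{\text{hit}} + N_{\text{miss}}},
\qquad
t_{\text{avg}} = h \, t_c + (1 - h) \, t_m
```

Under this model, as $h$ approaches 1 the average access time $t_{\text{avg}}$ approaches the cache access time $t_c$, which is the behavior the cache organization aims for.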
In processors implementing a cache hierarchy, such as the Pentium Pro™ processors which have an L1 and an L2 cache, the faster and smaller L1 cache is located closer to the processor than the L2 cache. When the processor requests cacheable data, for example, a load instruction, the request is first sent to the L1 cache. If the requested data is in the L1 cache, it is provided to the processor. Otherwise, there is an L1 miss and the request is transferred to the L2 cache. Likewise, if there is an L2 cache hit, the data is passed to the L1 cache and the processor core. If there is an L2 cache miss, the request is transferred to main memory. The main memory responds to the L2 cache miss by providing the requested data to the L2 cache, the L1 cache, and to the processor core.
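The lookup order described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the processor's actual implementation: the `Level` class, the `load` function, and the use of a dictionary with unlimited capacity for each level are all hypothetical simplifications.

```python
class Level:
    """One cache level, modelled as a dict from address to data (capacity ignored)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}


def load(addr, l1, l2, main_memory):
    """Return (data, source), searching L1, then L2, then main memory."""
    if addr in l1.lines:
        # L1 hit: the data is provided directly to the processor.
        return l1.lines[addr], "L1"
    if addr in l2.lines:
        # L1 miss, L2 hit: the data is passed to the L1 cache and the core.
        data = l2.lines[addr]
        l1.lines[addr] = data
        return data, "L2"
    # L2 miss: main memory supplies the data to L2, L1, and the core.
    data = main_memory[addr]
    l2.lines[addr] = data
    l1.lines[addr] = data
    return data, "memory"
```

A first load of an address is served from memory and fills both levels, so a repeated load of the same address is then served from L1.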
The type of data that is typically stored in cache includes active portions of programs and data. When the cache is full, it is necessary to replace existing lines of stored data in the cache memory to make room for newly requested lines of data. One such replacement technique involves the use of the least recently used (LRU) algorithm, which replaces the least recently used line of data with the newly requested line. In the Pentium Pro™ processors, since the L2 cache is larger than the L1 cache, the L2 cache typically stores everything in the L1 cache and some additional lines that have been replaced in the L1 cache by the LRU algorithm.
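LRU replacement can be illustrated with a small software model. The sketch below assumes a fully associative cache of fixed capacity and uses Python's `collections.OrderedDict` to track recency; the `LRUCache` class and its `access` method are hypothetical names, and hardware implementations typically approximate LRU with a few status bits per set rather than an ordered list.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used line."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # least recently used entry comes first

    def access(self, tag, data=None):
        """Return True on a hit; on a miss, allocate the line and return False."""
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        self.lines[tag] = data
        return False
```

With a capacity of two, accessing lines A, B, A, and then C evicts B rather than A, because A was touched more recently.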
U.S. patent application Ser. No. 08/767,950 filed Dec. 17, 1996, now U.S. Pat. No. 5,829,025, entitled “Computer System and Method of Allocating Cache Memories in a Multilevel Cache Hierarchy utilizing a Locality Hint within an Instruction” by Milland Mittal, discloses a technique for allocating cache memory through the use of a locality hint associated with an instruction. When a processor accesses memory for transfer of data between the processor and the memory, that access can be allocated to the various levels of cache, or not allocated to cache memory at all, according to the locality hint associated with the instruction. Certain instructions are used infrequently. For example, non-temporal prefetch instructions preload data which the processor does not require immediately, but which are anticipated to be required in the near future. Such data is typically used only once or will not be reused in the immediate future, and is termed “non-temporal data”. Data that is frequently used is termed “temporal data”. For non-temporal data, since the data is used infrequently, optimal performance dictates that the cached application code and data not be overwritten by this infrequently used data. U.S. Pat. No. 5,829,025 solves this problem by providing a buffer, separate from the cache memory, for storing the infrequently used data, such as non-temporal prefetched data. However, the use of an extra, separate buffer is expensive both in terms of cost and space.
Accordingly, there is a need in the technology for providing a shared cache structure for temporal and non-temporal instructions, which eliminates the use of a separate buffer.
A method and system for providing cache memory management are disclosed. The system comprises a main memory, a processor coupled to the main memory, and at least one cache memory coupled to the processor for caching of data. The at least one cache memory has at least two cache ways, each comprising a plurality of sets. Each of the plurality of sets has a bit which indicates whether one of the at least two cache ways contains non-temporal data. The processor accesses data from one of the main memory or the at least one cache memory.
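The cache organization summarized above, with a per-set bit flagging non-temporal data, might be modelled as follows. This is only a sketch under stated assumptions: the `CacheSet` and `SharedCache` classes, the `fill` method, and the `nt_way` bookkeeping field are hypothetical, and the rule used here for clearing the bit is an illustrative choice rather than the policy of the invention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheSet:
    tags: List[Optional[int]]     # one tag slot per cache way
    nt_bit: bool = False          # set when one of the ways holds non-temporal data
    nt_way: Optional[int] = None  # hypothetical bookkeeping: which way that is


class SharedCache:
    """Cache with at least two ways, viewed set by set, shared by both data kinds."""

    def __init__(self, num_sets, num_ways=2):
        if num_ways < 2:
            raise ValueError("at least two cache ways are required")
        self.sets = [CacheSet(tags=[None] * num_ways) for _ in range(num_sets)]

    def fill(self, set_index, way, tag, non_temporal):
        """Store a line in the given way and update the set's non-temporal bit."""
        s = self.sets[set_index]
        s.tags[way] = tag
        if non_temporal:
            s.nt_bit, s.nt_way = True, way
        elif s.nt_way == way:
            # Assumed rule: temporal data replacing the flagged line clears the bit.
            s.nt_bit, s.nt_way = False, None
```

Because the bit lives in the set itself, temporal and non-temporal lines share one structure and no separate buffer is modelled.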