The present invention generally relates to controlling cache entries in a cache memory and, more specifically, to providing an improved cache replacement mechanism and method.
Modern computer systems typically contain several integrated circuits (ICs), including a processor which may be used to process information in the computer system. The information processed by a processor may include computer instructions which are executed by the processor, as well as data which is manipulated by the processor using the computer instructions. The computer instructions and data are typically stored in a main memory in the computer system.
Processors typically process instructions by executing each instruction in a series of small steps. In some cases, to increase the number of instructions being processed by the processor (and therefore increase the speed of the processor), the processor may be pipelined. Pipelining refers to providing separate stages in a processor where each stage performs one or more of the small steps necessary to execute an instruction. In some cases, the pipeline (in addition to other circuitry) may be placed in a portion of the processor referred to as the processor core. Some processors may have multiple processor cores, and in some cases, each processor core may have multiple pipelines. Where a processor core has multiple pipelines, groups of instructions (referred to as issue groups) may be issued to the multiple pipelines and executed by the pipelines in parallel.
As an example of executing instructions in a pipeline, when a first instruction is received, a first pipeline stage may process a small part of the instruction. When the first pipeline stage has finished processing the small part of the instruction, a second pipeline stage may begin processing another small part of the first instruction while the first pipeline stage receives and begins processing a small part of a second instruction. Thus, the processor may process two or more instructions at the same time (in parallel).
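The overlap described above may be illustrated by a minimal sketch. The function name pipeline_schedule and its parameters are hypothetical and serve only this illustration; the sketch assumes an idealized pipeline in which every stage takes exactly one clock cycle and no stalls occur.

```python
def pipeline_schedule(num_instructions, num_stages):
    """Return, for each clock cycle, the instruction index held by each stage.

    Assumes an idealized pipeline: one cycle per stage, no stalls.
    A stage that holds no instruction in a given cycle is reported as None.
    """
    total_cycles = num_instructions + num_stages - 1
    schedule = []
    for cycle in range(total_cycles):
        row = []
        for stage in range(num_stages):
            # Instruction i enters stage 0 in cycle i and advances one stage per cycle.
            instr = cycle - stage
            row.append(instr if 0 <= instr < num_instructions else None)
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(2, 2)):
        print(cycle, row)
```

In the second cycle of this two-instruction, two-stage example, the first stage holds the second instruction while the second stage holds the first instruction, which corresponds to the parallel processing described above.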
To provide for faster access to data and instructions as well as better utilization of the processor, the processor may have several caches. A cache is a memory which is typically smaller than the main memory and is typically manufactured on the same die (i.e., chip) as the processor. Modern processors typically have several levels of caches. The fastest cache, which is located closest to the core of the processor, is referred to as the Level 1 cache (L1 cache). In addition to the L1 cache, the processor typically has a second, larger cache, referred to as the Level 2 cache (L2 cache). In some cases, the processor may have other, additional cache levels (e.g., an L3 cache and an L4 cache).
To provide the processor with enough instructions to fill each stage of the processor's pipeline, the processor may retrieve instructions from the L2 cache in a group containing multiple instructions, referred to as an instruction line (I-line). The retrieved I-line may be placed in the L1 instruction cache (I-cache) where the core of the processor may access instructions in the I-line. Blocks of data (D-lines) to be processed by the processor may similarly be retrieved from the L2 cache and placed in the L1 data cache (D-cache).
The process of retrieving information from higher cache levels and placing the information in lower cache levels may be referred to as fetching, and typically requires a certain amount of time (latency). For instance, if the processor core requests information and the information is not in the L1 cache (referred to as a cache miss), the information may be fetched from the L2 cache. Each cache miss results in additional latency as the next cache/memory level is searched for the requested information. For example, if the requested information is not in the L2 cache, the processor may look for the information in an L3 cache or in main memory.
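The way latency accumulates as successive cache levels are searched may be sketched as follows. The function fetch, the level names, and the latency values are assumptions made only for this illustration; each cache level is modeled as a set of addresses with unlimited capacity, which a real cache does not have.

```python
def fetch(address, levels, memory_latency):
    """Search cache levels in order, starting with the level closest to the core.

    levels: list of (name, set_of_addresses, latency) tuples (hypothetical model).
    Returns (name of the level that supplied the data, accumulated latency).
    """
    total = 0
    missed = []
    source = "memory"
    for name, contents, latency in levels:
        total += latency              # every level searched adds its latency
        if address in contents:
            source = name             # hit at this level
            break
        missed.append(contents)       # cache miss: continue to the next level
    else:
        total += memory_latency       # missed in every cache level
    # Place the fetched information in each level that missed.
    for contents in missed:
        contents.add(address)
    return source, total


if __name__ == "__main__":
    levels = [("L1", {1}, 1), ("L2", {1, 2}, 10)]
    print(fetch(2, levels, 100))  # L1 miss, L2 hit
    print(fetch(2, levels, 100))  # now an L1 hit
```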
The implementation of a cache is normally accomplished through three major portions: directory, arrays and control. The directory contains the address identifiers for the cache line entries, plus other necessary status tags suitable for particular implementations. The cache arrays store the actual data bits, with additional bits for parity checking or for error correction as required in particular implementations. Cache control circuits provide the logic necessary for managing the cache contents and cache accesses. Upon an access to the cache, the directory is accessed or “looked up” to determine whether the requested data line resides in the cache. A cache hit results if the requested line is found in the cache, and a cache miss results otherwise. Upon a cache hit, the data may be accessed from the array if there is no prohibiting condition, e.g., protection violation. Upon a cache miss, the data line is normally fetched from the bulk memory and inserted into the cache first, with the directory updated accordingly, in order to satisfy the access through the cache.
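The interaction of the directory, the arrays and the control logic may be sketched as follows. The class SimpleCache and its members are hypothetical names used only for this illustration, and the sketch assumes a direct-mapped organization (one candidate entry per line address) for brevity; status tags, parity bits and protection checks are omitted.

```python
class SimpleCache:
    """Illustrative direct-mapped cache with a directory, a data array and control logic."""

    def __init__(self, num_lines, bulk_memory):
        self.num_lines = num_lines
        self.directory = [None] * num_lines  # address identifier per entry; None = empty
        self.array = [None] * num_lines      # the data held by each entry
        self.bulk_memory = bulk_memory       # mapping of line address -> data

    def access(self, line_address):
        """Control logic: return (data, hit) for the requested line address."""
        index = line_address % self.num_lines
        tag = line_address // self.num_lines
        # Directory lookup decides between a cache hit and a cache miss.
        if self.directory[index] == tag:
            return self.array[index], True
        # Cache miss: fetch from bulk memory, insert the line, update the directory.
        data = self.bulk_memory[line_address]
        self.array[index] = data
        self.directory[index] = tag
        return data, False


if __name__ == "__main__":
    cache = SimpleCache(4, {0: "a", 4: "b"})
    print(cache.access(0))  # miss on first access
    print(cache.access(0))  # hit on second access
```

Because line addresses 0 and 4 map to the same entry in this four-entry example, accessing one displaces the other, which illustrates why replacement of existing entries becomes necessary.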
Since a cache only has capacity for a limited number of line entries and is relatively small compared with the bulk memory, replacement of existing line entries is often needed. The replacement of cache entries in a set associative cache is normally based on algorithms such as the Least Recently Used (LRU) scheme. That is, when a cache line entry needs to be removed to make room for, i.e., replaced by, a new line, the line entry that was least recently accessed will be selected.
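A conventional LRU selection of this kind may be sketched for a single set of a set associative cache as follows. The class LRUSet, its parameter ways, and the load callback are hypothetical names introduced only for this illustration; the sketch shows the general LRU scheme and is not the replacement mechanism of the present invention.

```python
from collections import OrderedDict


class LRUSet:
    """One set of a set associative cache using least-recently-used replacement."""

    def __init__(self, ways):
        self.ways = ways               # number of line entries the set can hold
        self.lines = OrderedDict()     # tag -> data, least recently used first

    def access(self, tag, load):
        """Return (data, evicted_tag); evicted_tag is None if nothing was replaced."""
        if tag in self.lines:
            # Hit: mark the entry as most recently used.
            self.lines.move_to_end(tag)
            return self.lines[tag], None
        evicted = None
        if len(self.lines) >= self.ways:
            # Set is full: remove the least recently accessed entry.
            evicted, _ = self.lines.popitem(last=False)
        self.lines[tag] = load(tag)    # insert the new line as most recently used
        return self.lines[tag], evicted


if __name__ == "__main__":
    lru_set = LRUSet(ways=2)
    for tag in ["A", "B", "A", "C"]:
        print(tag, lru_set.access(tag, str.lower))
```

In this usage example, entry A is accessed again before entry C arrives, so entry B is the least recently used entry and is the one selected for replacement.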
One of the problems with the LRU scheme of replacing cache entries is determining which line entry is the least recently used cache entry. There are numerous methods to make this determination; however, some inefficiencies still exist. Accordingly, there is a need for improved methods and apparatus for determining when a line entry in a cache memory will no longer be accessed in the near future and therefore is available to be replaced.