As processor speed continues to increase at a faster rate than memory speed, memory access latency has become an increasingly important factor in system performance. A cache is a type of buffer that is smaller and faster than main memory, and is typically disposed between the processor and the main memory. To reduce the effective memory access time, the cache stores a copy of instructions and/or data from the main memory that are likely to be needed next by the processor.
A cache stores instructions copied from the main memory in cache lines. A cache line may store one or more consecutive instructions. Each cache line can have a tag entry that is used to identify the memory address of the copied instructions. In its simplest form, a tag is the minimal portion of the address needed to uniquely identify the copied instructions. Other forms of tags can include encoded addresses.
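As an illustration of the simplest form of tag, the following sketch splits a byte address into a tag, a line index, and an offset within the line. The cache geometry (64-byte lines, 256 lines, direct-mapped placement) and the names `split_address` and `line_base_address` are assumptions chosen only for this example and are not taken from any particular design.

```python
from typing import Tuple

# Hypothetical geometry, chosen only for illustration.
LINE_BYTES = 64    # bytes held by one cache line
NUM_LINES = 256    # number of lines in a direct-mapped cache

OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 6 bits select a byte in the line
INDEX_BITS = NUM_LINES.bit_length() - 1     # 8 bits select the line


def split_address(addr: int) -> Tuple[int, int, int]:
    """Return (tag, index, offset) for a byte address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    # The tag is what remains of the address once the bits already implied
    # by the line's position (index) and the byte position (offset) are removed.
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def line_base_address(tag: int, index: int) -> int:
    """Rebuild the address of the first byte of a line from its tag and index."""
    return (tag << (OFFSET_BITS + INDEX_BITS)) | (index << OFFSET_BITS)
```

Under these assumptions, `split_address(0x12345678)` yields tag `0x48D1`, index `0x59`, and offset `0x38`, and `line_base_address(0x48D1, 0x59)` recovers `0x12345640`, the start of the line containing that address.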
A cache hit occurs when a requested instruction is present in the cache. A cache miss occurs when the requested instruction is not stored in the cache. Typically, when a cache miss occurs, the execution unit of the processor must wait or stall until the requested instruction is retrieved from the main memory before continuing the execution of the program, causing processor performance to degrade. The relative numbers of cache hits and misses, often expressed as a hit rate, can be used as a measure of computer system performance.
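The hit and miss behavior described above can be sketched with a toy model that tracks only tags, not the cached instructions themselves. The class `DirectMappedCache`, its default sizes, the helper `stall_cycles`, and the fixed miss penalty are all hypothetical; a real cache would also model associativity, replacement policy, and valid bits.

```python
class DirectMappedCache:
    """Toy direct-mapped instruction cache that tracks tags only."""

    def __init__(self, num_lines: int = 4, line_bytes: int = 16) -> None:
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines   # None marks an empty line
        self.hits = 0
        self.misses = 0

    def fetch(self, addr: int) -> bool:
        """Look up addr; on a miss, fill the line. Return True on a hit."""
        line_number = addr // self.line_bytes
        index = line_number % self.num_lines
        tag = line_number // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[index] = tag   # models the fill from main memory
        return False

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def stall_cycles(cache: DirectMappedCache, miss_penalty: int) -> int:
    """Cycles spent stalled, assuming every miss costs the same fixed penalty."""
    return cache.misses * miss_penalty


# Fetch sixteen consecutive 4-byte instructions starting at address 0.
cache = DirectMappedCache(num_lines=4, line_bytes=16)
for addr in range(0, 64, 4):
    cache.fetch(addr)
```

With 16-byte lines, the first fetch into each line misses and the next three hit, so this run records 4 misses and 12 hits, a hit rate of 0.75. With an assumed penalty of 100 cycles per miss, `stall_cycles(cache, 100)` returns 400.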
Multi-level cache structures may have two or more independent cache memories, such as L0 and L1 caches (Level 0 cache and Level 1 cache). These cache memories can differ in size and in speed, that is, in memory access latency. Typically, higher level caches (e.g., L1 cache) store more instructions but are slower to access than lower level caches (e.g., L0 cache).
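The trade-off between a small, fast L0 cache and a larger, slower L1 cache can be estimated with the standard average-access-time calculation. In the sketch below, the function name `average_fetch_latency` and all latencies and miss rates are hypothetical; each miss rate is taken to be local to its level (the fraction of accesses reaching that level that miss), and the L0 cache is assumed to be probed first.

```python
def average_fetch_latency(l0_latency: float, l0_miss_rate: float,
                          l1_latency: float, l1_miss_rate: float,
                          memory_latency: float) -> float:
    """Expected cycles per fetch when L0 is probed first, then L1, then memory."""
    # A fetch that reaches L1 always pays the L1 access and, on the fraction
    # that also misses L1, additionally pays the main-memory access.
    l1_cost = l1_latency + l1_miss_rate * memory_latency
    # Every fetch pays the L0 access; only L0 misses pay the L1 cost.
    return l0_latency + l0_miss_rate * l1_cost


# Hypothetical figures: 1-cycle L0 with a 10% miss rate, 4-cycle L1 with a
# 5% miss rate, and a 100-cycle main memory.
two_level = average_fetch_latency(1, 0.10, 4, 0.05, 100)

# The same L1 used alone, with no L0 in front of it.
l1_only = 4 + 0.05 * 100
```

With these assumed figures the two-level arrangement averages about 1.9 cycles per fetch, against about 9 cycles for the L1 cache alone, which is why a lower level cache is worthwhile even though it holds fewer instructions.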
To optimize microprocessor performance, an instruction cache should deliver instructions with the lowest possible latency, and with throughput at least as high as the rate at which an instruction fetch unit can process the instructions. Some prior cache designs attempt to achieve these goals by using multi-ported memories that allow multiple, simultaneous accesses to the memories. For example, the designs may implement the tag array using 3-ported memory array cells, and the data array using 2-ported memory array cells, where each port into a memory array can independently access any piece of data in that memory array. In this way, various types of accesses and cache events such as hits, fills, and snoops can be processed without interfering with other events. However, this approach results in higher design complexity, larger chip area, and greater power consumption.
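The effect of port count on interference between events can be illustrated with a deliberately simplified queueing model. The function `cycles_to_drain` and the arrival pattern are hypothetical; the model treats every access (hit lookup, fill, or snoop) as identical, lets each port serve one access per cycle, and ignores area and power entirely.

```python
from typing import Sequence


def cycles_to_drain(arrivals: Sequence[int], ports: int) -> int:
    """Cycles needed to serve all accesses to one memory array.

    arrivals[i] is the number of new accesses requested in cycle i.
    At most `ports` accesses are served per cycle; the rest wait.
    """
    if ports < 1:
        raise ValueError("ports must be at least 1")
    pending = 0
    cycles = 0
    for count in arrivals:
        pending += count
        pending -= min(pending, ports)   # serve up to `ports` accesses
        cycles += 1
    # Keep cycling until the accesses left waiting have been served.
    while pending > 0:
        pending -= min(pending, ports)
        cycles += 1
    return cycles


# Hypothetical pattern: a lookup, a fill, and a snoop collide in cycle 0.
tag_array_arrivals = [3, 1, 2, 0]
three_ported = cycles_to_drain(tag_array_arrivals, ports=3)
single_ported = cycles_to_drain(tag_array_arrivals, ports=1)
```

For this pattern the 3-ported array finishes in 4 cycles with no access ever waiting, while the single-ported array needs 6 cycles because colliding events must be serialized. The model does not capture the cost side of the trade-off, namely the added complexity, area, and power of the extra ports.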