Caches are common structures in modern processors. The main function of a cache is to hold copies of recently used data, e.g., lines from main memory. These lines are likely to be used again by the processor and are therefore kept available within a few clock cycles; this delay is referred to as the level 1 (L1) cache access latency. An L1 cache is generally the lowest-level cache and is typically formed on the same semiconductor die as a core of the processor. A cache stores copies of lines of main memory, with a line size that is often 64 bytes in modern processors.
A non-blocking cache is the most common cache structure and is used in out-of-order micro-architectures. In this structure the cache is not blocked while handling an L1 cache miss, so it can serve later requests (loads and stores). This behavior is accomplished using dedicated hardware called a fill buffer (FB). Typically, multiple individual fill buffers are present; one of a fill buffer's tasks is to store a line received from main memory before it is inserted into the cache. A fill buffer may contain a copy of a line in any state (the same way a data cache keeps lines). Fill buffers are generally considered an extension of the data cache, in which case the fill buffers are accessed whenever the data cache is accessed.
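The non-blocking behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name NonBlockingCache, the method names, the default of four fill buffers, and the policy of stalling when every fill buffer is busy are all hypothetical and are not taken from the text. Line states and data contents are not modeled.

```python
class NonBlockingCache:
    """Toy model: a miss allocates a fill buffer instead of blocking the cache."""

    def __init__(self, num_fill_buffers=4):  # buffer count is an assumption
        self.lines = set()         # line addresses currently held in the cache
        self.fill_buffers = {}     # line address -> True once the data has arrived
        self.num_fill_buffers = num_fill_buffers

    def access(self, line_addr):
        """Handle a load or store to the given line address."""
        if line_addr in self.lines:
            return "hit"
        if line_addr in self.fill_buffers:
            # The fill buffers are checked alongside the cache itself.
            if self.fill_buffers[line_addr]:
                return "fill_buffer_hit"
            return "miss_pending"
        if len(self.fill_buffers) >= self.num_fill_buffers:
            return "stall"         # assumed policy when no fill buffer is free
        self.fill_buffers[line_addr] = False  # request sent to main memory
        return "miss_allocated"

    def data_arrives(self, line_addr):
        """The line is received from main memory into its fill buffer."""
        if line_addr in self.fill_buffers:
            self.fill_buffers[line_addr] = True

    def retire(self, line_addr):
        """Move the line from its fill buffer into the cache."""
        del self.fill_buffers[line_addr]
        self.lines.add(line_addr)
```

In this model, a request to a different line issued after a miss still returns "hit" while the miss is outstanding, which is the property that distinguishes a non-blocking cache from a blocking one.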
A cache structure is also characterized by its size and set-associativity. The size of a cache is often expressed as the number of bytes that can be stored. The set-associativity of a cache is the partitioning of the cache into sets and ways. For example, a 32 Kbyte, 8-way set-associative cache with a 64-byte line size includes 64 sets (32,768 bytes / (8 ways × 64 bytes) = 64), and each set has 8 ways (e.g., lines or entries) of 64 bytes each.
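The arithmetic in this example can be checked with a short sketch. The function names cache_geometry and split_address are hypothetical, and the split of an address into tag, set index, and byte offset is the conventional scheme for set-associative caches, assumed here rather than stated in the text. Sizes are assumed to be powers of two.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (num_sets, offset_bits, index_bits) for a set-associative cache."""
    num_sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of a power of two
    index_bits = num_sets.bit_length() - 1
    return num_sets, offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set_index, byte_offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    set_index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, set_index, offset


num_sets, offset_bits, index_bits = cache_geometry(32 * 1024, 8, 64)
print(num_sets)                                          # 64
print(split_address(0x12345, offset_bits, index_bits))   # (18, 13, 5)
```

With 64-byte lines and 64 sets, the low 6 address bits select the byte within a line and the next 6 bits select the set; the remaining upper bits form the tag that is compared against each way.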
Typically, an L1 cache is the first-level data cache accessed by load and store operations from a processor. Because an x86 architecture has few general-purpose registers, loads and stores that access the L1 cache are frequent and L1 cache activity is high. Cache size and associativity can affect performance. Any load accessing the cache reads all ways of a set before determining from which way to obtain the data; for timing reasons, all N ways are accessed in parallel on each access. In addition, fill buffers (which are an extension of the cache) are also accessed on every memory access.
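The cost of this lookup scheme can be made concrete with a minimal sketch that counts how many structures are probed per access. The function name lookup, the representation of each way as a stored tag (or None when empty), and the use of four fill buffers in the usage below are assumptions for illustration. The loop over ways stands in for comparisons that hardware would perform in parallel.

```python
LINE_BYTES = 64
NUM_SETS = 64
NUM_WAYS = 8


def lookup(cache_sets, fill_buffers, addr):
    """Probe every way of the selected set and every fill buffer.

    Returns (source, position, probes), where probes is the number of
    tag comparisons performed regardless of where the data is found.
    """
    line_addr = addr // LINE_BYTES
    set_index = line_addr % NUM_SETS
    tag = line_addr // NUM_SETS

    # All N ways of the set are compared, not just the one that hits.
    way_hits = [w is not None and w == tag for w in cache_sets[set_index]]
    # The fill buffers are compared on the same access.
    fb_hits = [fb is not None and fb == line_addr for fb in fill_buffers]
    probes = len(way_hits) + len(fb_hits)

    if any(way_hits):
        return ("cache", way_hits.index(True), probes)
    if any(fb_hits):
        return ("fill_buffer", fb_hits.index(True), probes)
    return ("miss", None, probes)


cache_sets = [[None] * NUM_WAYS for _ in range(NUM_SETS)]
cache_sets[13][3] = 18                 # line containing address 0x12345
fill_buffers = [None] * 4              # four buffers is an assumption
fill_buffers[2] = 0x40000 // LINE_BYTES

print(lookup(cache_sets, fill_buffers, 0x12345))  # ('cache', 3, 12)
print(lookup(cache_sets, fill_buffers, 0x40000))  # ('fill_buffer', 2, 12)
```

Every access in this model performs 12 comparisons (8 ways plus 4 fill buffers) whether it hits or misses, which illustrates why associativity and the number of fill buffers bear on the work done per access.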