Although very high speed processing cores have been developed, one significant problem in computer systems is that these cores often sit idle because the instructions they need to execute cannot be retrieved, or “fetched,” quickly enough from memory. Fundamentally, if all memory in a computer system could be made lightning fast, these processing cores would operate at their intended speed and never sit idle.
If all memory in a computer system were that fast in terms of response time, instructions could be fetched rapidly enough for a processing core to keep executing instructions without any lost time. But memory is expensive, and the faster the memory, the more expensive it is. That is why computer systems combine several types of memory.
A cache memory is a small, very high speed memory that can provide instructions or data to a processing core so quickly that the processing core does not need to wait. Early implementations of cache memory would simply keep a copy of each instruction fetched from the slower main memory, in the hope that the same instruction would be needed again soon. This technique works fairly well when a computer program performs the same functions repeatedly over a large data set. In this scenario, the same instructions, having recently been stored in the cache memory, can be delivered quickly to the processing core each time they are needed to execute the same function on different data.
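The behavior described above can be sketched with a toy simulation. This is only an illustration under stated assumptions: the class `RecentlyUsedCache`, the helper `run_loop`, the capacity of 8 entries, and the least-recently-used eviction rule are all hypothetical choices made for the example, not details taken from any particular cache design.

```python
from collections import OrderedDict


class RecentlyUsedCache:
    """Toy cache that keeps copies of the most recently fetched instructions.

    Hypothetical model: one instruction per address, and the least
    recently used entry is evicted once the cache is full.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> instruction
        self.hits = 0
        self.misses = 0

    def fetch(self, address, main_memory):
        if address in self.lines:
            # Fast path: the instruction was stored by an earlier fetch.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Slow path: go to main memory, then keep a copy for next time.
        self.misses += 1
        instruction = main_memory[address]
        self.lines[address] = instruction
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the oldest entry
        return instruction


# Stand-in for the slower main memory (contents are placeholders).
main_memory = {addr: f"instr_{addr}" for addr in range(64)}


def run_loop(cache, body_length, iterations):
    """Fetch the same short run of instructions over and over."""
    for _ in range(iterations):
        for address in range(body_length):
            cache.fetch(address, main_memory)


loop_cache = RecentlyUsedCache(capacity=8)
run_loop(loop_cache, body_length=4, iterations=100)
print(loop_cache.hits, loop_cache.misses)
```

In this sketch only the first pass through the four-instruction loop body goes to main memory; every later pass is served from the cache, which is the repetitive case where keeping recently used instructions pays off.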
It is not difficult to see that if a computer program prescribes a more linear series of functions, simply storing the most recently executed instructions in a cache memory is not very useful. This is because the most recently executed instructions are not likely to be executed again soon, if ever. Where a linear program must be supported, one way to make a cache memory effective is to pre-fetch the instructions that immediately follow the currently executing instruction. This way, once a processing core finishes executing the current instruction, it is likely that the next-needed instruction will already be in the cache and can be provided as soon as the processing core tries to fetch it.
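A minimal sketch of the contrast drawn above, under simplifying assumptions: the class `PrefetchingCache` and the helper `run_linear` are hypothetical names, the cache is unbounded, instructions occupy consecutive integer addresses, and a pre-fetch is assumed to complete before the core asks for the next instruction.

```python
class PrefetchingCache:
    """Toy instruction cache with optional next-address pre-fetch.

    Hypothetical model: unbounded capacity, one instruction per address,
    and pre-fetches that always finish before the next fetch arrives.
    """

    def __init__(self, main_memory, prefetch_next=False):
        self.main_memory = main_memory
        self.prefetch_next = prefetch_next
        self.lines = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        if address in self.lines:
            self.hits += 1
        else:
            self.misses += 1
            self.lines[address] = self.main_memory[address]
        instruction = self.lines[address]
        if self.prefetch_next:
            # Bring in the instruction just after the current one.
            next_address = address + 1
            if next_address in self.main_memory and next_address not in self.lines:
                self.lines[next_address] = self.main_memory[next_address]
        return instruction


# A linear program: 32 instructions, each executed exactly once.
program = {addr: f"instr_{addr}" for addr in range(32)}


def run_linear(cache):
    """Fetch every instruction once, in address order."""
    for address in range(len(program)):
        cache.fetch(address)
    return cache.hits, cache.misses


without_prefetch = run_linear(PrefetchingCache(program, prefetch_next=False))
with_prefetch = run_linear(PrefetchingCache(program, prefetch_next=True))
print(without_prefetch, with_prefetch)
```

Without pre-fetch, every fetch in this straight-line run misses, because no instruction is ever requested twice. With pre-fetch enabled, only the very first fetch misses in this idealized model.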
Most computer systems use different levels of cache memories. For example, a “Level-0” cache is a cache memory that is located on the same silicon chip as the processing core. A “Level-1” cache is closer to the processing core than a “Level-2” cache, and so on until main memory is reached. A cache memory that is disposed closer to the processing core is usually smaller in capacity than one disposed closer to the main memory. No matter what the overall structure of a cache memory system is, it can only be effective if it contains the instructions that are next needed by the processing core. Otherwise, the processing core is forced to wait until the next instruction is retrieved from main memory.
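The level-by-level lookup can be sketched as follows. Everything specific here is an assumption made for illustration: the names `CacheLevel` and `fetch`, the capacities, the cycle counts, the least-recently-used eviction, and the policy of filling every level on each access are hypothetical and do not describe any particular processor.

```python
from collections import OrderedDict


class CacheLevel:
    """One level of a toy cache hierarchy (hypothetical model)."""

    def __init__(self, name, capacity, latency):
        self.name = name
        self.capacity = capacity
        self.latency = latency  # illustrative cycle count, not a measurement
        self.lines = OrderedDict()

    def contains(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)
            return True
        return False

    def insert(self, address):
        self.lines[address] = True
        self.lines.move_to_end(address)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict least recently used


MAIN_MEMORY_LATENCY = 100  # illustrative value only


def fetch(levels, address):
    """Search the levels nearest the core first.

    Returns (where the address was found, total cycles spent).
    """
    cycles = 0
    for level in levels:
        cycles += level.latency
        if level.contains(address):
            found_in = level.name
            break
    else:
        # No level held the address, so the core waits on main memory.
        cycles += MAIN_MEMORY_LATENCY
        found_in = "main memory"
    # Fill every level so a repeat access hits as close to the core as possible.
    for level in levels:
        level.insert(address)
    return found_in, cycles


levels = [
    CacheLevel("Level-1", capacity=2, latency=1),   # small, near the core
    CacheLevel("Level-2", capacity=8, latency=10),  # larger, farther away
]

first = fetch(levels, 0)   # nothing cached yet
second = fetch(levels, 0)  # now held in the nearest level
fetch(levels, 1)
fetch(levels, 2)           # pushes address 0 out of the small Level-1
third = fetch(levels, 0)   # still held in the larger Level-2
print(first, second, third)
```

The three results show the trade-off described above: the smaller level answers fastest, the larger level catches what the smaller one has evicted, and only an address held by neither level costs a trip to main memory.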
Pre-fetching instructions from main memory is relatively simple, and an appropriate scheme for doing so can be selected based on the type of computer program that is being executed. Repetitive tasks can get decent cache memory performance simply by copying the most recently used instructions into the cache. Linear program paradigms typically use instruction pre-fetch so that next-needed instructions are retrieved from main memory and stored in cache memory before they are requested.
Cache memory can also be useful for the data that a particular program operates on. Some data is compact and is used repeatedly through a complex series of functions. Multiplying a matrix by another matrix is one mathematical function where the same set of data is accessed repeatedly by the processing core. In this case, storing the most recently used data in a cache memory can be very effective.
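To make the matrix example concrete, the sketch below counts how often the operand elements of a small matrix multiplication are found in a toy data cache. The function `matmul_hit_rate`, the 4 x 4 matrix size, the cache capacities, and the least-recently-used eviction rule are hypothetical choices for illustration; writes to the result matrix are ignored.

```python
from collections import OrderedDict


def matmul_hit_rate(n, cache_capacity):
    """Count data-cache hits and misses for an n x n matrix multiply.

    Hypothetical model: one matrix element per cache entry, LRU eviction,
    and only reads of the operands A and B are tracked.
    """
    cache = OrderedDict()
    hits = 0
    misses = 0

    def touch(key):
        nonlocal hits, misses
        if key in cache:
            hits += 1
            cache.move_to_end(key)
        else:
            misses += 1
            cache[key] = True
            if len(cache) > cache_capacity:
                cache.popitem(last=False)  # evict least recently used

    # Standard triple loop: C[i][j] += A[i][k] * B[k][j]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                touch(("A", i, k))
                touch(("B", k, j))
    return hits, misses


# Both 4 x 4 operands (32 elements in total) fit in the cache.
hits, misses = matmul_hit_rate(n=4, cache_capacity=32)

# The same computation with a cache too small to hold the working set.
small_hits, small_misses = matmul_hit_rate(n=4, cache_capacity=4)
print(hits, misses, small_hits, small_misses)
```

When the whole working set fits, each element is brought in once and then reused for the rest of the computation. With the much smaller cache the same elements are evicted before they are needed again, which is why the compactness of the data matters in this scenario.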