The functions of a computer system are achieved by a processor executing instructions. Both the instructions and the data used in the execution of the instructions are stored in memory on the computer system. Accordingly, in order to execute the instructions, the processor must first obtain the instructions and the data from memory. However, the latency, or time delay, to access memory may be unduly long and adversely affect efficient processing.
To minimize latency, the typical computer system uses a memory hierarchy. Specifically, the memory hierarchy divides the storage of the instructions and data into components. For example, the components often include processor cache, main memory, and secondary storage. The higher a component is in the memory hierarchy, the closer the component is to the processor, the fewer instructions and/or the less data the component stores, and the faster the access to the instructions and/or data stored in the component. Conversely, the lower a component is in the memory hierarchy, the farther the component is from the processor, the more instructions and/or data the component can store, and the slower the access to the instructions and/or data stored in the component. For example, the cache, which is generally located on the same chip as the processor, is higher than secondary storage in the memory hierarchy. Accordingly, the cache stores fewer instructions and/or less data and is faster to access than secondary storage.
The typical cache is divided into lines. The lines of the cache are grouped into sets. Each set corresponds to a group of memory addresses, and the groups corresponding to different sets are disjoint. A data element whose memory address is in a group maps to the corresponding set. Accordingly, when a data element is stored in the cache, the data element may be stored in any line of the set mapped to by the memory address of the data element, but only in a line of that set. The data element may be an instruction or data accessed by a program.
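The mapping from a memory address to a set can be illustrated with a minimal sketch. The line size, number of sets, and number of lines per set below are assumed values chosen only for illustration, and the modulo-based indexing is one common scheme rather than a scheme stated in this description.

```python
# Hypothetical cache geometry (assumed values, not taken from the text).
LINE_SIZE = 64   # bytes per cache line
NUM_SETS = 128   # number of sets in the cache
WAYS = 4         # lines per set


def set_index(address: int) -> int:
    """Return the index of the set that a byte address maps to."""
    # Drop the offset within the line, then wrap around the number of sets.
    return (address // LINE_SIZE) % NUM_SETS


def tag(address: int) -> int:
    """Return the tag that distinguishes addresses sharing the same set."""
    return address // (LINE_SIZE * NUM_SETS)


def same_set(address_a: int, address_b: int) -> bool:
    """True if both addresses compete for the lines of one set."""
    return set_index(address_a) == set_index(address_b)
```

Under these assumed values, addresses 0 and 8192 (64 * 128) map to the same set and differ only in their tag, so both could occupy any of the 4 lines of that set but no line of any other set.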
When a processor executes an instruction that requires data, the instruction and data are obtained according to the memory hierarchy. Specifically, the cache is checked first. If the instruction and/or data is not in the cache, then a determination is made as to whether the instruction and/or data is in main memory. If the instruction and/or data is not in the main memory, then a determination is made as to whether the instruction and/or data is in secondary storage. If the instruction and/or data is in secondary storage, then the instruction and/or data is loaded into the main memory, and then loaded into the appropriate cache(s).
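The lookup order described above can be sketched as follows. This is a simplified model, not an actual implementation: each level of the hierarchy is represented by a plain dictionary keyed by address, the function name `load` is hypothetical, and copying a main-memory hit into the cache is an assumption made for completeness.

```python
def load(address, cache, main_memory, secondary_storage):
    """Return (value, level_that_satisfied_the_request) for an address.

    Each level is modeled as a dict from address to value. On a miss,
    the value is copied into every higher level on the way back.
    """
    if address in cache:
        return cache[address], "cache"
    if address in main_memory:
        # Assumption: a main-memory hit also fills the cache.
        cache[address] = main_memory[address]
        return cache[address], "main memory"
    # Raises KeyError if the address is not present at any level.
    value = secondary_storage[address]
    main_memory[address] = value  # loaded into main memory first
    cache[address] = value        # and then into the cache
    return value, "secondary storage"
```

In this sketch, a first call for an address held only in secondary storage reports "secondary storage", and a repeated call for the same address reports "cache".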
In order to further minimize the latency, instructions and/or data may be pre-fetched into the cache. Specifically, the instructions and/or data are obtained from memory and stored into the cache before the instructions and/or data are needed by the processor.
For example, in a scenario in which data element A is stored next to data element B, which is stored next to data element C, a compiler of a program may identify that data elements B and C are generally accessed in the program shortly after data element A is accessed in the program. Accordingly, the compiler may embed data pre-fetch instructions into the program after each instruction that uses data element A. The data pre-fetch instructions instruct the processor to pre-fetch data elements B and C. In general, the compiler cannot be certain that both data elements B and C are needed after every access to data element A. Rather, the pre-fetch of data elements B and C is based on the observation that an access to data elements B and C generally follows an access to data element A. Instead of depending on a compiler to embed pre-fetch instructions in a program, a processor may also implement a hardware pre-fetcher. The hardware pre-fetcher attempts to recognize patterns in the sequence of instructions and/or data accessed by a program and to bring the instructions and/or data into the processor's caches just before they are accessed by the program.
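One way a pre-fetcher might recognize an access pattern is sketched below. The description does not state which patterns a hardware pre-fetcher detects, so this sketch assumes a simple stride detector: if two consecutive accesses are separated by the same non-zero distance, the next address at that distance is predicted. The class name `StridePrefetcher` and its method are hypothetical.

```python
class StridePrefetcher:
    """Toy stride detector (an assumed pattern, for illustration only)."""

    def __init__(self):
        self.last_address = None  # most recently observed address
        self.last_stride = None   # distance between the last two accesses

    def observe(self, address):
        """Record an access; return an address to pre-fetch, or None."""
        prediction = None
        if self.last_address is not None:
            stride = address - self.last_address
            # Predict only when the same non-zero stride repeats.
            if stride != 0 and stride == self.last_stride:
                prediction = address + stride
            self.last_stride = stride
        self.last_address = address
        return prediction
```

For accesses to addresses 100, 108, and 116, the sketch makes no prediction for the first two accesses and predicts address 124 on the third. A prediction of this kind is a guess: if the program's next access breaks the pattern, the pre-fetched element is not used.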
Pre-fetching can add a latency cost to executing a program. As discussed above, the cache is smaller than main memory. Moreover, the storage in the cache typically is less than the amount of storage required by the program. Accordingly, when a new instruction and/or data is pre-fetched, the new instruction/data typically replaces a previously existing instruction/data in the cache. If the pre-fetch is inaccurate (e.g., the new instruction/data is not required at all and/or the previously existing instruction/data replaced by the new instruction/data is required before the new instruction/data is required), then an additional cost is incurred to reload the previously existing instruction/data back into the cache.
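The cost of an inaccurate pre-fetch can be illustrated with a small simulation. The sketch assumes a tiny cache with a least-recently-used replacement policy, which is not specified in this description, and the names `LRUCache` and `count_misses` are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny cache model with least-recently-used replacement (assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first
        self.misses = 0

    def _insert(self, address):
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict least recently used
        self.lines[address] = True

    def access(self, address):
        """Demand access by the program; counts a miss if absent."""
        if address not in self.lines:
            self.misses += 1
        self._insert(address)

    def prefetch(self, address):
        """Bring an element in ahead of time; not counted as a miss."""
        self._insert(address)


def count_misses(trace, prefetches=None, capacity=2):
    """Replay a trace; prefetches maps a trace position to the address
    pre-fetched right after the access at that position."""
    cache = LRUCache(capacity)
    for position, address in enumerate(trace):
        cache.access(address)
        if prefetches and position in prefetches:
            cache.prefetch(prefetches[position])
    return cache.misses
```

With a two-line cache and the trace A, B, A, B, the sketch records 2 misses without pre-fetching. If an unneeded element X is pre-fetched after the first access to B, X replaces A, A then replaces B, and the count rises to 4, because both previously cached elements must be reloaded.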