The present application generally relates to a parallel computing system. More particularly, the present application relates to prefetching data to improve the performance of the parallel computing system.
Prefetching refers to a technique used in a processor to improve processor speed. Traditionally, prefetching places data in a cache memory before the data is needed. Thus, when the data is needed, the data can be provided to the processor more quickly because the data already resides in the cache memory before being requested.
Prefetching data into a cache memory device is a standard method used in processor units to increase performance by reducing the average latency of memory access instructions (e.g., load instructions). Typically, in a parallel computing system (e.g., IBM® Blue Gene®/L or Blue Gene®/P), a prefetch engine (i.e., a hardware module performing the prefetching) prefetches a fixed number of data streams with a fixed depth (i.e., a certain number of instructions or a certain amount of data to be fetched ahead).
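For illustration only, the effect of a fixed prefetch depth on a single sequential data stream can be sketched in software as follows. This is a simplified model and not a description of any particular prefetch engine: the function name `simulate_stream_prefetch`, the 64-byte line size, and the unbounded cache are all assumptions made for the example.

```python
LINE_SIZE = 64  # bytes per cache line (assumed value for this sketch)


def simulate_stream_prefetch(addresses, depth):
    """Count demand misses for an access trace under a fixed-depth
    sequential prefetcher. The cache is modeled as an unbounded set
    of line numbers, so evictions are ignored (a simplification)."""
    cache = set()
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE
        if line not in cache:
            misses += 1          # data was not resident when requested
            cache.add(line)
        # Fetch the next `depth` lines ahead of the current access.
        for ahead in range(1, depth + 1):
            cache.add(line + ahead)
    return misses


# A purely sequential trace touching eight consecutive cache lines.
trace = [i * LINE_SIZE for i in range(8)]
no_prefetch = simulate_stream_prefetch(trace, depth=0)
with_prefetch = simulate_stream_prefetch(trace, depth=2)
```

In this model, a sequential trace misses on every line when the depth is zero and misses only on the first line when the depth is two. A trace with no sequential pattern would gain nothing from this kind of prefetching.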
The present application discloses improving the performance of a parallel computing system, e.g., by prefetching data or instructions according to a list including a sequence of prior cache miss addresses (i.e., addresses that previously caused cache misses).
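For illustration only, the general idea of prefetching according to a list of prior cache miss addresses can be sketched in software as follows. This is a minimal model under stated assumptions and not the disclosed implementation: the function names `record_misses` and `replay_with_list`, the `depth` parameter, the 64-byte line size, and the unbounded cache are all hypothetical choices made for the example.

```python
def record_misses(trace, line_size=64):
    """First pass: return the ordered list of cache lines that missed.
    The cache is an unbounded set of line numbers (a simplification)."""
    cache = set()
    miss_list = []
    for addr in trace:
        line = addr // line_size
        if line not in cache:
            miss_list.append(line)
            cache.add(line)
    return miss_list


def replay_with_list(trace, miss_list, depth, line_size=64):
    """Second pass: start with a cold cache and prefetch from the
    recorded list, staying at most `depth` entries ahead of the
    entries the program has already reached. Returns demand misses."""
    cache = set()
    misses = 0
    issued = 0   # list entries already prefetched
    matched = 0  # list entries the program has already reached
    for addr in trace:
        while issued < len(miss_list) and issued < matched + depth:
            cache.add(miss_list[issued])
            issued += 1
        line = addr // line_size
        if line not in cache:
            misses += 1
            cache.add(line)
        if matched < len(miss_list) and miss_list[matched] == line:
            matched += 1
    return misses


# An irregular trace that a sequential stream prefetcher could not predict.
trace = [0, 4096, 128, 8192, 0, 448]
miss_list = record_misses(trace)
cold = replay_with_list(trace, miss_list, depth=0)
guided = replay_with_list(trace, miss_list, depth=2)
```

In this model, repeating the same irregular trace with the recorded list removes the demand misses that a cold cache would otherwise incur, because each listed line is fetched shortly before the program reaches it.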