FIG. 1 depicts a traditional CPU complex 101 and system memory complex 102 for a multi-core processor 100. The processor or “host” side of the system memory complex 102 includes a memory controller 103 that interfaces with a system memory 104. As is understood in the art, the individual processing cores 101_1 through 101_N of a multi-core processor will snoop their internal caches (not shown) for the program code and data needed by their respective threads. If an item of program code or data desired by a processing core thread is not found in the core's cache, the item may ultimately be fetched from system memory 104 by the memory controller 103.
The processing cores 101_1 through 101_N are interconnected by an interconnection network 105 (e.g., mesh network, front side bus, etc.) that is coupled to a last level cache 106. The last level cache 106 typically caches program code and data for all the cores 101_1 through 101_N of the processor 100 rather than for any particular core. The last level cache 106 is typically the last cache that is snooped for a desired item of program code or data before the item is fetched from system memory 104 through the memory controller 103.
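By way of illustration only, the lookup order described above (core cache, then last level cache, then system memory) can be sketched as follows. The function name `fetch`, the use of dictionaries to stand in for the caches and system memory, and the policy of filling both caches on a miss are assumptions made for the sketch and are not taken from FIG. 1.

```python
def fetch(address, core_cache, last_level_cache, system_memory):
    """Return (value, source) for an address, checking caches before memory.

    All three stores are plain dicts keyed by address (an assumption made
    for illustration; real caches are set-associative hardware structures).
    """
    # 1. The core first snoops its own internal cache.
    if address in core_cache:
        return core_cache[address], "core cache"

    # 2. On a miss, the shared last level cache is snooped.
    if address in last_level_cache:
        value = last_level_cache[address]
        core_cache[address] = value  # assumed fill policy
        return value, "last level cache"

    # 3. Only then is the item fetched from system memory.
    value = system_memory[address]
    last_level_cache[address] = value  # assumed fill policy
    core_cache[address] = value
    return value, "system memory"
```

In this sketch, the first request for an address is served from system memory, and a repeat request from the same core is served from that core's cache.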
Accurately predicting what items of data will be needed in the future, reading the data from system memory 104, and loading the data into the last level cache 106 can greatly improve system performance. Here, the time cost of retrieving the data is reduced to the time cost of obtaining it from the last level cache 106, which can be significantly less than the time cost of retrieving it from system memory 104.
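The benefit can be made concrete with a simple weighted average. The sketch below is a simplified model and not a measurement: the function name and the cycle counts used in the example (40 cycles for the last level cache, 200 cycles for system memory) are hypothetical, and a miss is assumed to cost the system memory latency alone.

```python
def average_access_cycles(llc_hit_rate, llc_cycles, memory_cycles):
    """Average cost of an access that reaches the last level cache.

    llc_hit_rate is the fraction of such accesses found in the last level
    cache; the remainder are assumed to pay memory_cycles instead.
    """
    return llc_hit_rate * llc_cycles + (1.0 - llc_hit_rate) * memory_cycles


# Hypothetical latencies: 40 cycles for a hit, 200 cycles for a miss.
without_prefetch = average_access_cycles(0.5, 40, 200)   # 120.0
with_prefetch = average_access_cycles(0.75, 40, 200)     # 80.0
```

Under these assumed numbers, raising the hit rate from 50% to 75% through accurate prefetching lowers the average cost from 120 to 80 cycles.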
In one approach, referred to as “spatial memory streaming”, copies of the program counter values from the processing cores' various threads are sent through the processor 100 down to the last level cache 106. Prefetcher logic associated with the last level cache 106 studies the program counter values for data access patterns. Any data that is deemed likely to be called upon by a software thread, but is not currently within the last level cache 106, is called up from system memory 104 by the memory controller 103 and stored in the last level cache 106.
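As a rough sketch of how such pattern-based prefetching might operate, the example below records which blocks of a memory region are touched after a first (trigger) access, keyed by the program counter and block offset of that trigger, and replays the recorded pattern when the same trigger is seen in a new region. The class name, the region size of 8 blocks, the explicit `end_region` call, and the assumption that the prefetcher also sees block addresses are all hypothetical simplifications rather than details of the approach described above.

```python
REGION_BLOCKS = 8  # hypothetical number of cache blocks per region


class PatternPrefetcher:
    def __init__(self):
        self.patterns = {}  # (pc, offset) -> set of block offsets seen
        self.active = {}    # region -> ((pc, offset) of trigger, offsets so far)

    def access(self, pc, block):
        """Observe one access; return block numbers to prefetch (may be empty)."""
        region, offset = divmod(block, REGION_BLOCKS)
        if region in self.active:
            # Region already being tracked: just record the touched block.
            self.active[region][1].add(offset)
            return []
        # First access to this region is the trigger for a new recording.
        key = (pc, offset)
        self.active[region] = (key, {offset})
        learned = self.patterns.get(key, set())
        base = region * REGION_BLOCKS
        return [base + o for o in sorted(learned) if o != offset]

    def end_region(self, region):
        """Stop tracking a region and store its pattern for later reuse."""
        key, offsets = self.active.pop(region)
        self.patterns[key] = offsets
```

For example, if the instruction at program counter 0x400 triggers a region in which blocks at offsets 0, 2 and 5 are then touched, a later trigger by the same instruction at offset 0 of the region starting at block 16 would return blocks 18 and 21 as prefetch candidates.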
According to one approach, a large pattern history table structure is coupled with the prefetcher logic to store the data access patterns of the processor's various threads. Apart from the table being very large and consuming large amounts of processor chip space, there can be aliasing problems when trying to access the table. Here, a hash of program counter and offset values is used to access the table. In the case of a large table structure, different program counter and offset values can hash to the same table entry (which corresponds to improper operation of the table itself).
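The aliasing problem can be illustrated with a minimal sketch. The hash function (an exclusive-OR of the two values reduced modulo the table size), the 16-entry table, and the function names are hypothetical choices made only to keep the example small; they are not the hash or table dimensions of any particular design.

```python
TABLE_ENTRIES = 16  # hypothetical; kept tiny so a collision is easy to show


def table_index(pc, offset):
    """Hypothetical hash of program counter and offset into a table index."""
    return (pc ^ offset) % TABLE_ENTRIES


def store_pattern(table, pc, offset, pattern):
    """Write a pattern into the entry selected by the hash."""
    table[table_index(pc, offset)] = pattern


def load_pattern(table, pc, offset):
    """Read whatever pattern currently occupies the selected entry."""
    return table[table_index(pc, offset)]


table = [None] * TABLE_ENTRIES
store_pattern(table, 0x400, 3, "pattern A")
store_pattern(table, 0x410, 3, "pattern B")  # hashes to the same entry

# The lookup for the first pair now returns the second pair's pattern.
aliased = load_pattern(table, 0x400, 3)
```

Here both (0x400, 3) and (0x410, 3) map to index 3, so the second store overwrites the first and a later lookup for the first pair silently returns the wrong pattern. Because the entries in this sketch carry no tag, the lookup has no way to detect the mismatch.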