The present application relates generally to an improved data processing apparatus and method and, more specifically, to mechanisms for shared prefetching to reduce execution skew in multi-threaded systems.
Today, data processing system architecture is moving primarily toward multi-processor architectures in which multiple processors or cores, either on the same or different integrated circuit chips, are provided in a data processing system to provide additional computational power. Subsets of processors/cores typically share some portion of memory, e.g., system memory, and thus can all read from and write to this shared memory. In some architectures, the processors/cores may further have their own local memories as well, such as in the Cell Broadband Engine (CBE) processor available from International Business Machines Corporation of Armonk, N.Y.
Managing memory bandwidth on shared memory multiprocessor data processing systems is an extremely important task. Memory bandwidth is the rate at which data can be read from or written to memory by a processor, or transferred from one memory to another, e.g., from system memory to cache or vice versa. Memory bandwidth, e.g., between caches and/or memory subsystems, is often a critical resource. Moreover, as a data processing system becomes larger, e.g., through the addition of hardware resources having additional processing capabilities, balancing the load between threads executing in the various processors of the data processing system becomes increasingly important.