In data processing systems, efficiency is directly affected by the number of data read requests and by the manner in which requested data sets are provided. In typical systems, each compute element has its own associated cache memory. In alternative systems, two or some other limited number of compute elements may share a cache memory. In both types of systems, however, a system may have tens, hundreds, or thousands of compute elements, with corresponding numbers of cache memories. The cache memories in such a system are not coordinated, in the sense that one cache memory, and the compute elements associated with that cache memory, do not know what is held in the other cache memories. As a result, when a compute element needs a data set that is not held in its own cache memory, it must request and receive the data set from main memory, even if the data set was previously requested by another compute element and is stored in that other compute element's cache memory.
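
The cost of uncoordinated cache memories can be illustrated with a minimal sketch. The class names (MainMemory, ComputeElement), the function demo, and the key "data_set_A" are hypothetical and chosen only for illustration; the sketch assumes one private cache memory per compute element and simply counts how often main memory is read.

```python
class MainMemory:
    """Hypothetical main memory that counts every read request it serves."""

    def __init__(self, data):
        self.data = dict(data)
        self.reads = 0

    def read(self, key):
        self.reads += 1
        return self.data[key]


class ComputeElement:
    """Hypothetical compute element with its own private, uncoordinated cache."""

    def __init__(self, name, main_memory):
        self.name = name
        self.main_memory = main_memory
        self.cache = {}  # visible only to this compute element

    def get(self, key):
        # A hit is possible only in this element's own cache.
        if key in self.cache:
            return self.cache[key]
        # On a miss there is no way to ask the other caches,
        # so the request always goes to main memory.
        value = self.main_memory.read(key)
        self.cache[key] = value
        return value


def demo():
    memory = MainMemory({"data_set_A": [1, 2, 3]})
    elements = [ComputeElement(f"ce{i}", memory) for i in range(4)]
    for element in elements:
        element.get("data_set_A")  # each element misses in its own cache
    elements[0].get("data_set_A")  # repeat request: served from its own cache
    return memory.reads


if __name__ == "__main__":
    print(demo())
```

In this sketch, four compute elements requesting the same data set cause four separate reads from main memory, because no element can learn that another element's cache memory already holds the data set. Only a repeated request by the same compute element avoids main memory.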