Processing systems that employ a shared memory often use a coherence directory (also frequently referred to as a “probe filter”) to help maintain coherency among the caches of the multiple processing units sharing the memory. In some such processing systems, a particular processing unit, or subset of processing units, is memory-bandwidth intensive, and in such instances the memory controller, and thus the coherence directory associated with the memory controller, often is located close to this high-memory-bandwidth processing unit. To illustrate, in a system implementing one or more central processing units (CPUs) on separate die along with a graphics processing unit (GPU) and shared memory integrated in the same package, the coherence directory for the system typically will be integrated near a memory controller on the GPU due to the expected bandwidth-intensive use of the shared memory by the GPU relative to the CPUs. Although this conventional approach improves the memory bandwidth available to the GPU, the CPUs that access the shared memory through the memory controller on the GPU suffer relatively long coherence directory access latencies and thus risk degraded performance.