Various approaches exist for implementing multi-processor systems, including, for example, traditional symmetric multi-processor (SMP) designs and cache coherent non-uniform memory access (ccNUMA) designs. ccNUMA systems generally can scale to higher processor counts because a ccNUMA system is not limited by a single shared resource. In a ccNUMA system, each processor has direct access to all memory, wherever that memory is located in the system. Performance and scalability in ccNUMA and other such systems, however, tend to vary according to where memory is allocated, because accessing global (remote) memory carries a higher performance cost than accessing local memory. This cost is particularly evident in large snoopy-based coherency systems, where snooping overhead can overload the interconnect fabric and degrade performance.