In the field of semiconductor processor chip fabrication, many companies fabricated single-chip processors during the early stages of processor technology. In the last decade or so, as device dimensions have continued to shrink in accordance with Moore's Law, many companies and other entities have designed processor chips with multiple processors on a single layer. However, as the number of processors per chip continues to increase, on-chip communication between processors becomes problematic. For example, as the 2-D size of the processor chip increases to accommodate more processors, the length of the horizontal wiring between the processors increases (to the range of millimeters or centimeters), resulting in cycle delays in the communication between processors and requiring the use of high-powered on-chip drivers along the communication paths between processors. Furthermore, the cycle delay with respect to communication between processors increases as the operating frequency increases, because a given wire delay spans more clock cycles at a higher clock rate.

In a multiprocessor system, each processor core can have one or more private lower-level caches, backed by one or more levels of shared higher-level caches. The access latency for shared data in a multiprocessor system depends on (i) the length of the interconnect that a shared memory access request has to traverse, (ii) the time needed to broadcast the request and receive the responses from all the cores in the multiprocessor system, (iii) the time needed to identify, based on the responses received, the location (a remote cache or main memory) from which the data needs to be fetched, and (iv) the arbitration time for accessing any shared resources, such as directories, shared buses and read/write ports, during the process. Therefore, as the number of processors and shared cache memories per chip continues to increase in a 2-D system, the on-chip communications between processors for shared memory accesses in a shared cache scheme become more problematic.
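
By way of illustration only, the latency factors listed above can be sketched as a simple additive model, together with the conversion of a fixed delay into clock cycles at different operating frequencies. The class name `SharedAccess`, the function `delay_in_cycles`, the assumption that the components simply add, and all numeric values are hypothetical and are not taken from any particular design.

```python
from dataclasses import dataclass


@dataclass
class SharedAccess:
    """Hypothetical latency components of one shared-memory access (nanoseconds)."""
    interconnect_ns: float  # traversing the interconnect between processors
    broadcast_ns: float     # broadcasting the request and collecting all core responses
    locate_ns: float        # identifying where the data resides (remote cache or main memory)
    arbitration_ns: float   # arbitrating for directories, shared buses, read/write ports

    def total_ns(self) -> float:
        # Assumption: the components are serialized, so they simply add.
        return (self.interconnect_ns + self.broadcast_ns
                + self.locate_ns + self.arbitration_ns)


def delay_in_cycles(delay_ns: float, clock_ghz: float) -> float:
    """A fixed delay of delay_ns spans delay_ns * clock_ghz clock cycles."""
    return delay_ns * clock_ghz


# Illustrative values only.
access = SharedAccess(interconnect_ns=2.0, broadcast_ns=4.0,
                      locate_ns=1.0, arbitration_ns=1.0)
print(access.total_ns())                        # 8.0
print(delay_in_cycles(access.total_ns(), 2.0))  # 16.0
print(delay_in_cycles(access.total_ns(), 4.0))  # 32.0
```

In this sketch the same 8 ns access costs 16 cycles at 2 GHz and 32 cycles at 4 GHz, which is the sense in which the cycle delay grows with operating frequency even when the physical wire delay is unchanged.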