1. Field of the Present Invention
The present invention is in the field of data processing systems and, more particularly, distributed memory multiprocessor systems with asymmetric memory latency.
2. History of Related Art
Multiprocessor data processing systems are well known. In perhaps the most widely implemented multiprocessor system, multiple processors share equal access to a common system memory over a shared bus. This type of system is generally referred to as a symmetric multiprocessor (SMP) system to emphasize that the memory latency is substantially independent of which processor makes the access. While symmetric memory latency is a desirable attribute, SMP systems are limited by the finite bandwidth of the shared bus connecting each of the processors to the system memory. This bandwidth bottleneck typically limits the number of processors that can be advantageously attached to the shared bus.
Attempts to overcome the limitations of SMP systems have resulted in distributed memory multiprocessor systems. In one implementation of such a system, each of a set of processors has its own local system memory, and each processor is interconnected to the other processors so that it has “remote” access to the system memory of the other processors. Recently, one implementation of this type of configuration has employed HyperTransport links between processors. HyperTransport is a point-to-point interconnect technology that uses unidirectional, low-voltage differential swing signaling on data and command signals to achieve high data rates. Additional descriptions of HyperTransport are available from the HyperTransport Consortium (hypertransport.org).
As their name implies, point-to-point technologies require a dedicated port on each device of a connected pair. In a multiprocessor configuration, this dedicated port requirement can quickly increase the pin count of the design. The narrowest implementation of HyperTransport, for example, requires 16 pins/link (plus power pins) while the widest (fastest) implementation requires 148 pins/link (plus power pins). Because large pin counts are undesirable, the number of point-to-point links that any processor can accommodate is effectively limited. This limitation can reduce the performance benefits achievable when a design is scaled. Specifically, if the number of point-to-point link ports on each processor in the design is insufficient to connect each processor directly to every other processor, the memory access asymmetry increases because some memory accesses must traverse more than one point-to-point link. As a result, the memory access latency for these indirect memory accesses is higher than for accesses that traverse a single link. If a particular memory-intensive application generates a disproportionate number of indirect memory accesses, the higher latency may result in lower overall performance. It would be desirable to implement a solution to the memory latency problem caused by indirect accesses in a distributed memory multiprocessor system employing point-to-point processor interconnects.
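By way of illustration only, the pin count and indirect access effects described above can be sketched in a few lines of Python. The sketch assumes a hypothetical four-processor system and compares a ring, in which each processor has two link ports, with a fully connected arrangement. The function names, the topologies, and the processor count are assumptions made for this example and are not taken from any particular design. Only the 16 and 148 pins/link figures come from the discussion above.

```python
from collections import deque


def pins_for_full_mesh(num_processors, pins_per_link):
    """Signal pins one processor needs to link directly to every other processor."""
    return (num_processors - 1) * pins_per_link


def ring(n):
    """Hypothetical topology: each processor has two ports, one per ring neighbor."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def full_mesh(n):
    """Hypothetical topology: each processor has a port for every other processor."""
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def hop_counts(links, source):
    """Breadth-first search giving the minimum number of links from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def indirect_fraction(links):
    """Fraction of (processor, remote memory) pairs that need more than one link."""
    n = len(links)
    indirect = 0
    for src in links:
        hops = hop_counts(links, src)
        indirect += sum(1 for h in hops.values() if h > 1)
    return indirect / (n * (n - 1))


if __name__ == "__main__":
    for width in (16, 148):  # narrowest and widest pins/link figures cited above
        print(width, "pins/link ->", pins_for_full_mesh(4, width), "pins per processor")
    print("ring hops from processor 0:", hop_counts(ring(4), 0))
    print("indirect fraction, ring:", indirect_fraction(ring(4)))
    print("indirect fraction, full mesh:", indirect_fraction(full_mesh(4)))
```

Under these assumptions, full connectivity among four processors requires three links per processor, whereas the two-port ring leaves one of every three remote memory targets reachable only through an intermediate processor. The sketch counts links traversed and does not model actual latency, which would also depend on link width, clock rate, and traffic.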