1. Field of the Invention
The present invention is related to multiprocessor computer systems, and more particularly to load balancing in communications within multiprocessor computer systems.
2. Background Information
Often, in multiprocessor computer systems, there are multiple paths for the transfer of data between compute nodes. Given a distribution of packets being exchanged among processors on a network, some network links will typically carry more traffic than other network links. These “hot spots” can become saturated, causing network congestion that slows down the progress of packets traversing the bottlenecked links, and also causing backups in the network that can slow the progress of packets not routing through the bottlenecked links. The result is network performance degradation.
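As a rough illustration of how a hot spot arises, the following sketch counts how many packets cross each directed link of a small 2D mesh. It assumes dimension-order (X-then-Y) routing and a transpose traffic pattern in which node (x, y) sends one packet to node (y, x). The mesh size, routing function, and traffic pattern are illustrative assumptions, not details taken from the text above.

```python
from collections import Counter

Node = tuple[int, int]

def xy_route(src: Node, dst: Node) -> list[Node]:
    """Dimension-order route on a 2D mesh: correct X first, then Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def link_loads(pairs) -> Counter:
    """Count the packets crossing each directed link (a, b)."""
    loads: Counter = Counter()
    for src, dst in pairs:
        path = xy_route(src, dst)
        for a, b in zip(path, path[1:]):
            loads[(a, b)] += 1
    return loads

k = 4  # hypothetical 4 x 4 mesh
transpose = [((x, y), (y, x)) for x in range(k) for y in range(k) if x != y]
loads = link_loads(transpose)
num_links = 2 * 2 * k * (k - 1)  # directed links in a k x k mesh
print("max link load:", max(loads.values()))
print("mean link load:", sum(loads.values()) / num_links)
```

Under these assumptions the busiest links carry three packets while the mean over all 48 directed links is below one, because X-then-Y routing funnels every transpose packet through a node on the diagonal.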
There are two primary techniques that have been used in the past to alleviate the network performance degradation caused by network hot spots: adaptive routing and randomized oblivious routing. Adaptive routing techniques are discussed in Singh, A., Dally, W. J., Gupta, A. K., and Towles, B., “GOAL: a load-balanced adaptive routing algorithm for Torus networks,” Proc. 30th Annual International Symposium on Computer Architecture, June 2003, pp. 194-205. In addition, a comprehensive treatment of interconnection networks is given in William J. Dally and Brian Towles, Principles and Practices of Interconnection Networks, Morgan Kaufmann, 2004.
Adaptive routing allows packets to dynamically choose among multiple allowable paths to reach their destination. Most adaptive routing mechanisms are minimal, meaning that packets choose only among shortest paths: at each routing step, a packet may take only a hop that brings it closer to its destination. Non-minimal adaptive routing algorithms allow packets to take longer paths in order to avoid local congestion. Adaptive routing can be quite effective in reducing the severity of hot spots caused by non-uniform traffic distributions.
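The per-hop decision in minimal adaptive routing can be sketched as follows for a 2D mesh. The congestion metric (a hypothetical `congestion` map from directed link to queue occupancy) and the tie-breaking rule (prefer the X direction) are assumptions made for illustration; an actual router would consult its own local buffer or credit state.

```python
Node = tuple[int, int]

def productive_hops(cur: Node, dst: Node) -> list[Node]:
    """Neighbors of cur that are one hop closer to dst (the minimal choices)."""
    hops = []
    if dst[0] != cur[0]:
        hops.append((cur[0] + (1 if dst[0] > cur[0] else -1), cur[1]))
    if dst[1] != cur[1]:
        hops.append((cur[0], cur[1] + (1 if dst[1] > cur[1] else -1)))
    return hops

def minimal_adaptive_route(src: Node, dst: Node, congestion: dict) -> list[Node]:
    """At each step, take the productive hop whose link is least congested.

    Ties go to the first candidate, i.e. the X direction (an assumption).
    """
    path = [src]
    cur = src
    while cur != dst:
        here = cur
        cur = min(productive_hops(here, dst),
                  key=lambda n: congestion.get((here, n), 0))
        path.append(cur)
    return path

# Hypothetical congestion on the +X link leaving (0, 0).
congestion = {((0, 0), (1, 0)): 5}
path = minimal_adaptive_route((0, 0), (2, 2), congestion)
print(path)
```

Because only productive hops are ever candidates, the packet steers around the congested link while still arriving in exactly the Manhattan distance (four hops here).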
Randomized oblivious routing does not use local congestion information to avoid hot spots, but rather uses randomization to minimize non-uniformities in the traffic. Valiant's algorithm (L. G. Valiant, “A scheme for fast parallel communication,” SIAM Journal on Computing, 11(2):350-361, 1982) and the ROMM algorithm (T. Nesson and S. L. Johnsson, “ROMM routing on mesh and torus networks,” Proc. 7th Annual ACM Symposium on Parallel Algorithms and Architectures, pp. 275-287, 1995) are examples of randomized oblivious routing. In both Valiant's algorithm and the ROMM algorithm, packets are first routed from the source node to a random intermediate node, and then to the destination node.
Valiant's algorithm is non-minimal, choosing any intermediate node in the network. While it does an excellent job of smoothing traffic in the network, it doubles the average traffic load in the network.
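The two-phase structure of Valiant's algorithm can be sketched on a k x k mesh as below, assuming each phase uses dimension-order routing (an illustrative choice, as are the mesh size and the helper names). Enumerating every source, destination, and intermediate node shows that, for uniformly random traffic in this model, the average path length is exactly twice the minimal distance, which is the doubling of average load described above.

```python
import random
from itertools import product

Node = tuple[int, int]

def manhattan(a: Node, b: Node) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def xy_route(src: Node, dst: Node) -> list[Node]:
    """Dimension-order (X then Y) minimal route on a 2D mesh."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def valiant_route(src: Node, dst: Node, k: int, rng: random.Random):
    """Phase 1: route to a random node anywhere in the mesh; phase 2: route to dst."""
    mid = (rng.randrange(k), rng.randrange(k))
    return xy_route(src, mid) + xy_route(mid, dst)[1:], mid

k = 4
nodes = list(product(range(k), repeat=2))
n = len(nodes)
# Exact averages over all (src, dst) pairs and all intermediate nodes.
minimal_avg = sum(manhattan(s, d) for s in nodes for d in nodes) / n**2
valiant_avg = sum(manhattan(s, m) + manhattan(m, d)
                  for s in nodes for d in nodes for m in nodes) / n**3
print(valiant_avg / minimal_avg)
```

The doubling follows because the intermediate node is independent of both endpoints, so each phase has the same expected length as a minimal route between two random nodes.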
The ROMM algorithm is minimal, choosing only intermediate nodes that lie within the bounding box defined by the source and destination nodes. While it does not increase average traffic load in the network, it does not do as effective a job of removing hot spots.
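A simplified two-phase form of this idea is sketched below; it is not necessarily the exact variant in the cited ROMM paper. It assumes a 2D mesh and dimension-order routing in each phase (illustrative choices). Because the intermediate node lies inside the bounding box, every route has exactly the minimal hop count, while different random choices still spread packets across several distinct paths.

```python
import random

Node = tuple[int, int]

def manhattan(a: Node, b: Node) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def xy_route(src: Node, dst: Node) -> list[Node]:
    """Dimension-order (X then Y) minimal route on a 2D mesh."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def romm_route(src: Node, dst: Node, rng: random.Random):
    """Route via a random intermediate node inside the src/dst bounding box."""
    lo_x, hi_x = sorted((src[0], dst[0]))
    lo_y, hi_y = sorted((src[1], dst[1]))
    mid = (rng.randint(lo_x, hi_x), rng.randint(lo_y, hi_y))
    return xy_route(src, mid) + xy_route(mid, dst)[1:], mid

src, dst = (0, 0), (3, 2)
routes = [romm_route(src, dst, random.Random(seed))[0] for seed in range(50)]
# Every route is minimal, yet the routes cover several distinct paths.
print(all(len(p) - 1 == manhattan(src, dst) for p in routes))
print(len({tuple(p) for p in routes}))
```

The path diversity is limited to the bounding box, which is why this approach smooths traffic less effectively than choosing an intermediate node anywhere in the network.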
A significant problem with both adaptive and randomized oblivious routing is that both approaches can reorder packets in the network: two packets sent from the same source to the same destination may be delivered in the opposite order from which they were sent. This is particularly problematic for references to the same address in a shared-memory machine, where references may be sent in program order and must not be reordered in the network. Certain coherence protocols and messaging protocols may also rely upon the ordering of certain packets in the interconnect.
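A toy timing model, not a network simulator, shows how reordering arises when packets to the same destination take different paths. It assumes a fixed delay per hop plus an extra queueing delay on congested links; the delay values, the `Packet` record, and the helper names are hypothetical. In this model, when the earlier packet is routed over a congested link and the later one is not, the later packet overtakes it; when both follow the same path, send order is preserved.

```python
from dataclasses import dataclass

Link = tuple[tuple[int, int], tuple[int, int]]

@dataclass
class Packet:
    seq: int          # order in which the source sent the packet
    send_time: int
    path: list[Link]  # directed links the packet traverses

def arrival_time(pkt: Packet, hop_delay: int, extra_delay: dict) -> int:
    """Toy latency: fixed delay per hop plus any extra queueing delay per link."""
    return pkt.send_time + sum(hop_delay + extra_delay.get(link, 0)
                               for link in pkt.path)

def delivery_order(packets: list[Packet], hop_delay: int = 1,
                   extra_delay=None) -> list[int]:
    """Sequence numbers in the order the destination receives them."""
    extra_delay = extra_delay or {}
    return [p.seq for p in sorted(
        packets, key=lambda p: arrival_time(p, hop_delay, extra_delay))]

# Two minimal paths from (0, 0) to (1, 1); the first link of path_a is congested.
path_a = [((0, 0), (1, 0)), ((1, 0), (1, 1))]
path_b = [((0, 0), (0, 1)), ((0, 1), (1, 1))]
congestion = {((0, 0), (1, 0)): 5}

print(delivery_order([Packet(0, 0, path_a), Packet(1, 1, path_b)],
                     extra_delay=congestion))
print(delivery_order([Packet(0, 0, path_a), Packet(1, 1, path_a)],
                     extra_delay=congestion))
```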
What is needed is a system and method for reducing non-uniform traffic distributions in computer system interconnects that preserves packet ordering when necessary.