1. Field of the Invention
The present invention generally relates to multiprocessor computer systems in which a plurality of processors and a plurality of memory modules are interconnected by means of an interconnection network consisting of two multistage switching networks. More specifically, the invention relates to the routing of data traffic from the processors to the memory modules through the interconnection network.
2. Description of the Prior Art
Buffered multistage interconnections are used for processor-to-memory and processor-to-processor communications in shared-memory multiprocessor systems. These networks typically consist of log_k N stages of switching elements, each switching element having k input and k output terminals. Messages sent through the network are usually in the form of fixed-size packets which represent a memory word or cache line. Buffers are provided in the individual switching chips for queuing the incoming packets in the event of contention.
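As a brief illustration of the stage count given above, the following sketch computes log_k N by repeated multiplication. The function name `stage_count` and the requirement that N be an exact power of k are assumptions made for this example only; they are not taken from the cited networks.

```python
def stage_count(n_ports: int, k: int) -> int:
    """Return log_k(N): the number of stages needed to connect N ports
    using k-input, k-output switching elements.

    Assumption for this sketch: N must be an exact power of k.
    """
    if k < 2 or n_ports < 1:
        raise ValueError("k must be at least 2 and N at least 1")
    stages = 0
    reachable = 1  # ports reachable through `stages` stages
    while reachable < n_ports:
        reachable *= k
        stages += 1
    if reachable != n_ports:
        raise ValueError("N must be an exact power of k")
    return stages


if __name__ == "__main__":
    # 64 ports built from 2x2 elements versus 4x4 elements
    print(stage_count(64, 2), stage_count(64, 4))
```

Integer multiplication is used instead of a floating-point logarithm so that the result is exact for large N.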
The traffic carried by the interconnection network in a shared-memory multiprocessor consists of requests from processing elements to memory modules for storage or retrieval of data. The nature of this traffic can vary from time to time depending on the algorithm being executed by the processing elements and the way the accessed data are distributed among the memory modules. When the traffic pattern is such that all memory modules are accessed with approximately equal frequency, such traffic is referred to as uniform traffic. If a subset of memory modules receive a larger share of traffic than the remaining ones, such traffic is referred to as non-uniform traffic.
Buffered multistage networks perform well under uniform traffic but degrade severely with even slight non-uniformities in the network traffic. When the traffic is highly non-uniform, multistage networks suffer from a phenomenon known as tree-saturation. This phenomenon is studied in an article entitled "Hot-Spot Contention and Combining in Multistage Interconnection Networks" by G. F. Pfister and V. A. Norton, IEEE Transactions on Computers, October 1985, pp. 943-948. Tree-saturation causes severe congestion in the network, substantially increasing the delay through the network for all traffic to memory modules.
Several solutions have been proposed to alleviate the tree-saturation effect produced by non-uniform traffic on a multistage interconnection network. Pfister and Norton suggested combining of messages to reduce the request rate experienced by the hot memory module. Two messages can be combined in a switching element only if they are addressed to the same memory location in the hot memory module and they arrive at the switch almost at the same time.
Another method, described in an article entitled "The NYU Ultracomputer--Designing a MIMD Shared Memory Parallel Computer", by A. Gottlieb et al., IEEE Transactions on Computers, February 1983, pp. 175-189, uses a special network, called a combining network. The combining network attempts to reduce congestion by combining access requests to a memory module en route to the memory, thereby reducing the effective traffic into the memory module. The individual switching elements in a combining network are designed in such a way that two requests to a memory module arriving at a switching element simultaneously are combined into one outgoing request.
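The combining step can be pictured with the following minimal sketch. It assumes that each request is a tuple of (processor, module, location) and that all requests in the list reach one switching element in the same cycle; the function name `combine_requests` and the tuple layout are hypothetical and do not describe the actual switch design in the cited articles, which combine requests pairwise in hardware.

```python
def combine_requests(arrivals):
    """Merge requests that arrive together and target the same memory location.

    arrivals: list of (processor_id, module, location) tuples (assumed format).
    Returns one outgoing request per distinct (module, location), carrying the
    list of processors whose requests were combined into it.
    """
    merged = {}
    for processor_id, module, location in arrivals:
        # Requests combine only when both module and location match.
        merged.setdefault((module, location), []).append(processor_id)
    return [(module, location, pids) for (module, location), pids in merged.items()]


if __name__ == "__main__":
    same_cycle = [(0, 3, 16), (1, 3, 16), (2, 3, 20)]
    for outgoing in combine_requests(same_cycle):
        print(outgoing)
```

In this sketch the two requests to location 16 of module 3 leave as a single request, while the request to location 20 of the same module passes through uncombined.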
Although they reduce congestion from a hot spot, combining networks are complex to design and introduce additional delay owing to the combining function. Moreover, the approach is useful for only one particular type of non-uniform traffic, in which a large fraction of the traffic is directed to a single memory location in a memory module. Combining is ineffective if the traffic to a memory module is not directed to a single location.
An alternative approach to reducing congestion is described in an article entitled "Distributing Hot-Spot Addressing in Large-Scale Multiprocessors", by P.-C. Yew, N.-F. Tzeng and D. H. Lawrie, IEEE Transactions on Computers, April 1987, pp. 388-395. This method is a software technique in which the computational algorithm being run on the multiprocessor is structured so as to distribute the memory accesses among several memory modules. This method is useful only for certain computations and in cases where the behavior of the program being executed on the multiprocessor can be predicted beforehand. This is difficult to achieve in a vast number of programs, where the memory-access behavior is unpredictable and difficult to control.
When two identical multistage switching networks are used for processor-memory interconnection, every request from a processing element to a memory module can be routed along either network. Therefore, some method is needed to determine the path to be used for each request. A simple strategy is to divide the traffic equally between the two networks, perhaps best achieved by routing packets addressed to odd memory addresses to one network and the remaining packets to the second network. This strategy is referred to herein as the balanced strategy. The balanced strategy performs well with uniform traffic. However, a single hot spot causes tree-saturation in one of the networks, affecting all traffic through that network. Multiple hot spots can saturate both networks. In such situations, it is best to confine the non-uniform traffic to one of the networks so that the second network provides a clear path for the rest of the traffic. If the network carrying the hot-spot traffic happens to be a combining network, this also improves the performance of the hot-spot traffic. When no hot spots are present, however, this second strategy leaves one of the networks under-utilized. Therefore, what is needed is a scheme to detect hot spots when they develop and to change the routing strategy to improve performance.
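The two strategies contrasted above can be sketched as follows. The network numbering (0 and 1), the function names, and the assumption that the set of hot memory modules is already known are all illustrative choices made for this example; how hot spots are actually detected is not shown here.

```python
def route_balanced(address: int) -> int:
    """Balanced strategy: odd memory addresses use network 0, the rest network 1.

    Which network receives the odd addresses is an arbitrary choice here.
    """
    return 0 if address % 2 == 1 else 1


def route_hot_to_one(module: int, hot_modules: set) -> int:
    """Second strategy: send all hot-spot traffic to network 0 so that
    network 1 stays clear for the remaining traffic.

    hot_modules is assumed to be supplied by some detection mechanism.
    """
    return 0 if module in hot_modules else 1


if __name__ == "__main__":
    hot = {5}
    for target in (4, 5, 6, 7):
        print(target, route_balanced(target), route_hot_to_one(target, hot))
```

With no hot modules, `route_hot_to_one` sends every request to network 1, which illustrates the under-utilization of the other network noted in the text.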
If the locations of the hot spots are known prior to the run-time of a program, such dynamic detection of hot spots can be avoided. It may be possible to gather this information at compile time in the case of some scientific programs, but it is difficult in general. The presence of a local cache in each processor introduces an additional degree of complexity.
In practice, the traffic non-uniformities caused by memory-access patterns of programs change spatially and temporally during execution. That is, such non-uniformities appear and disappear over time, and their locations change. To exploit the flexibility afforded by the two switching networks fully, the routing method should be changed dynamically during program-execution, depending on the nature of the traffic.
Also known in the prior art are the following U.S. patents, which relate to interconnection networks for multiprocessor systems.
U.S. Pat. No. 4,512,011 to J. S. Turner discloses a packet-switching network comprising duplicated switch arrays to interconnect a number of high-speed digital trunk lines. A trunk controller associated with each trunk line routes packets to the switching network, selecting one of the arrays. If both arrays are available, the controller divides outgoing packets evenly among the arrays by alternating between them. If one of the arrays is unavailable, either because of a fault or because of traffic congestion, the controller routes traffic to the alternate one. A central processor monitors the condition of the switch-arrays and provides feedback to the trunk controllers.
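The array-selection behavior attributed to the trunk controller above might be sketched as follows. The class name `TrunkController`, the boolean availability flags, and the return of `None` when neither array is available are assumptions made for illustration; the cited patent's actual circuitry and feedback signalling are not reproduced here.

```python
class TrunkController:
    """Selects one of two switch arrays for each outgoing packet (sketch)."""

    def __init__(self):
        self.next_array = 0  # array to use next when both are available

    def select(self, available):
        """available: pair of booleans, one per array (assumed to come from
        the central processor's feedback). Returns 0, 1, or None."""
        if available[0] and available[1]:
            choice = self.next_array
            self.next_array = 1 - choice  # alternate to split traffic evenly
            return choice
        if available[0]:
            return 0
        if available[1]:
            return 1
        return None  # assumption: no array usable, caller must hold the packet


if __name__ == "__main__":
    controller = TrunkController()
    print([controller.select((True, True)) for _ in range(4)])
    print(controller.select((False, True)))
```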
While the present invention achieves a similar function in a multiprocessor interconnection network, it does not use a central processor as in U.S. Pat. No. 4,512,011 and instead relies on a distributed feedback mechanism. Further, the present invention does not employ apparatus for monitoring traffic within the network and communicating this information to a central processor. Such functions, as well as the processing required by the central processor, would be prohibitively expensive in large multiprocessor systems. Therefore, the present invention places the monitoring function within the memory modules, and allows a simple design for the interconnection network.
U.S. Pat. No. 4,752,777 to P. A. Franaszek describes a dual-network interconnection system for processor-memory interconnection in a multiprocessor system. One of the networks is a multistage switching network while the other is a crosspoint switch. Traffic from any processing element to any memory module may be routed through either network, as in the present invention. However, no methods are disclosed or suggested for monitoring the system traffic and performing the routing based on the traffic conditions.
U.S. Pat. No. 4,621,359 to R. J. McMillen discloses a load balancing circuit for use within a switching node. This circuit distributes the incoming packets in the switching system evenly among the output ports. This method is applicable only if the system allows any packet to be routed to any output port without regard to its location. Such an assumption cannot be imposed in a multiprocessor system, where an access request issued by a processing element identifies a particular memory module and cannot be routed elsewhere.