The present invention relates generally to packet switching nodes employed in multi-processor and parallel computer systems, and the like, and more particularly to a load balancing circuit arrangement for use in such packet switching nodes which redistributes incoming data packets to its output ports for more efficient processing.
One developing area of computer technology involves the design and development of large-scale, multi-processor-based distributed and parallel computer systems. Typical of these classes of computer systems and architectural approaches are the single instruction stream, multiple data stream (SIMD) computer architecture and the multiple instruction stream, multiple data stream (MIMD) computer architecture.
A SIMD computer typically comprises a control unit, N processors, N memory modules and an interconnection network. The control unit broadcasts instructions to all of the processors, and all active processors execute the same instruction at the same time. Each active processor executes the instruction on data in its own associated memory module. The interconnection network provides a communications facility for the processors and memory modules.
A MIMD computer typically comprises N processors and N memories, and each processor can execute an independent instruction stream. Each of the processors may communicate with any other processor. Similar interconnection networks may be employed in the MIMD computer.
Various interconnection networks may be employed to interconnect processors and memories employed in either type of computer system. These interconnection networks include delta networks, omega networks, indirect binary n-cube networks, flip networks, cube networks and banyan networks, for example.
The above-cited networks are discussed in some detail in the following publications: "LSI implementation of modular interconnection networks for MIMD machines," 1980 Int'l. Conf. Parallel Processing, Aug. 1980, pp. 161-162; "Analysis and simulation of buffered delta networks," IEEE Trans. Computers, Vol. C-30, pp. 273-282, April 1981; "Processor-memory interconnections for multiprocessors," 6th Annual Int'l. Symp. Computer Architecture, April 1979, pp. 168-177; "Design and implementation of the banyan interconnection network in TRAC," AFIPS 1980 Nat'l. Computer Conf., June 1980, pp. 643-653; "The multistage cube: a versatile interconnection network," Computer, Vol. 14, pp. 65-76, Dec. 1981; "The hybrid cube network," Distributed Data Acquisition, Computing and Control Symp., Dec. 1980, pp. 11-22; and "Performance and implementation of 4×4 switching nodes in an interconnection network for PASM," 1981 Int'l. Conf. on Parallel Processing, Aug. 1981, pp. 229-233.
Several types of data switching techniques may be employed to transfer data in SIMD and MIMD computers, and the like, including packet switching, message switching, time-division circuit switching and space-division circuit switching. Packet switching involves sending one or more words of data at a time through the system.
A multiple queue packet switching node is described in a presently copending patent application entitled "Packet Switched Multiple Queue NxM Switch Node and Processing Method," invented by R. J. McMillen, and assigned to the assignee of the present invention. This patent application discloses a packet switching node which processes applied data packets containing routing tag signals indicative of the output port destination to which the data packets are to be applied and transfers these packets to those output ports.
The packet switching node comprises a plurality of input ports and a plurality of output ports. A plurality of queue selectors are individually coupled to corresponding ones of the plurality of input ports. Each of the plurality of queue selectors is adapted to route data packets applied to its input port in accordance with the output port destination of the data packets.
A plurality of queue sets are individually coupled to corresponding ones of the plurality of queue selectors. Each of the queue sets comprises a plurality of queues for storing and forwarding data packets applied thereto as a function of output port destination. A plurality of output arbitrators are individually coupled between corresponding ones of the plurality of output ports and the respective queue of each of the queue sets which stores and forwards data packets whose destination is the corresponding output port. The output arbitrators are adapted to transfer the data packets stored in the queues to the corresponding output port in accordance with a predetermined priority arbitration scheme.
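The multiple queue arrangement described above can be sketched in software as follows. This is only an illustrative model and not the circuit of the referenced application: the class name MultiQueueSwitchNode, the method names accept and arbitrate, and the use of round-robin selection as the "predetermined priority arbitration scheme" are all assumptions introduced for the example, and ports are numbered from zero for convenience.

```python
from collections import deque


class MultiQueueSwitchNode:
    """Toy model of an N x M multiple-queue packet switching node (hypothetical)."""

    def __init__(self, n_inputs, m_outputs):
        self.n_inputs = n_inputs
        self.m_outputs = m_outputs
        # One queue set per input port; each set holds one queue per output port.
        self.queue_sets = [
            [deque() for _ in range(m_outputs)] for _ in range(n_inputs)
        ]
        # Per-output pointer used for round-robin arbitration (assumed scheme).
        self._next_input = [0] * m_outputs

    def accept(self, input_port, routing_tag, payload):
        """Queue selector: store the packet in the queue matching its routing tag."""
        self.queue_sets[input_port][routing_tag].append(payload)

    def arbitrate(self, output_port):
        """Output arbitrator: forward one packet bound for output_port, or None."""
        start = self._next_input[output_port]
        for offset in range(self.n_inputs):
            i = (start + offset) % self.n_inputs
            queue = self.queue_sets[i][output_port]
            if queue:
                # Advance the pointer so the next grant favors a different input.
                self._next_input[output_port] = (i + 1) % self.n_inputs
                return queue.popleft()
        return None


# Two inputs contend for output port 1; a third packet goes to output port 0.
node = MultiQueueSwitchNode(n_inputs=2, m_outputs=2)
node.accept(0, 1, "a")
node.accept(1, 1, "b")
node.accept(0, 0, "c")
forwarded = [node.arbitrate(1), node.arbitrate(1), node.arbitrate(1)]
```

In this sketch each input port owns one queue per output port, so a packet waiting for a busy output does not block packets from the same input that are bound for other outputs.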
Another related presently co-pending patent application is entitled "Packet Switched Multiport Memory N×M Switch Node and Processing Method," invented by R. J. McMillen and A. Rosman, and assigned to the assignee of the present invention. This packet switching node processes applied data packets containing routing tag signals indicative of the output port destination to which the data packets are to be applied. The packet switching node comprises a plurality of input ports and output ports with a multiport memory coupled therebetween. The memory has a predetermined number of memory locations available for storage of data packets applied to each of the input ports. Control logic, coupled to the input and output ports and to the multiport memory, controls the storage of data packets in the memory. The control logic also controls the routing of the data packets to the output ports in accordance with the routing tag signals.
In general, this invention comprises an N×M switch node that accepts data packets at any of N input ports and routes each to any of M output ports. The output selected is determined by the routing tag signal in the packet. The control logic is designed so that the data packets are effectively sorted according to their desired output port destination. Arbitration logic randomly, in a statistical sense, chooses among any data packets that are directed to the same output port. The algorithm implemented by the arbitration logic is designed so that data packets will not wait indefinitely to be routed from the switch node.
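One way that statistically random arbitration can be combined with a bound on waiting time is sketched below. The referenced application does not state its algorithm here, so this is an assumed scheme for illustration only: the function name choose_packet, the "waited" field, and the max_wait threshold are hypothetical.

```python
import random


def choose_packet(contenders, max_wait=4, rng=random):
    """Pick one packet among those directed to the same output port.

    contenders: list of dicts with keys "id" and "waited" (cycles spent waiting).
    Assumed policy: choose at random, unless some packet has already waited
    max_wait cycles or more, in which case the longest-waiting packet wins.
    """
    if not contenders:
        return None
    oldest = max(contenders, key=lambda p: p["waited"])
    if oldest["waited"] >= max_wait:
        return oldest  # aging override keeps any packet from waiting indefinitely
    return rng.choice(contenders)


# Packet 7 has reached the waiting bound, so it is selected deterministically.
waiting = [{"id": 3, "waited": 1}, {"id": 7, "waited": 5}, {"id": 9, "waited": 0}]
winner = choose_packet(waiting)
```

Under this assumed policy the choice is random while all contenders are young, and the aging override supplies the guarantee that no packet is passed over forever.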
However, although both of these nodes improve upon the performance and throughput of conventional packet switching designs, their performance can be improved further. An example will illustrate a typical problem. Assume that processor 1 applies data packets to input port 1. Assume that all output ports of the node are connected to identical execution units. Processor 1 assigns routing tag signals to each of the data packets transmitted thereby. If processor 1 continually assigns routing tag signals corresponding to output port 1, for example, then the remaining output ports are not used, and the execution units connected thereto are not used. This clearly lessens the efficiency of the node.
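The imbalance in this example can be made concrete with a short calculation. The sketch below is illustrative only: the function name output_utilization and the assumption of a four-output node receiving eight packets are introduced for the example and do not come from the text.

```python
from collections import Counter


def output_utilization(routing_tags, m_outputs):
    """Return the fraction of packets sent to each output port (ports 1..M)."""
    counts = Counter(routing_tags)
    total = len(routing_tags)
    return {
        port: (counts.get(port, 0) / total if total else 0.0)
        for port in range(1, m_outputs + 1)
    }


# Processor 1 tags every packet for output port 1 of an assumed 4-output node.
tags = [1] * 8
utilization = output_utilization(tags, m_outputs=4)
idle_ports = [port for port, share in utilization.items() if share == 0.0]
# utilization -> {1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0}; idle_ports -> [2, 3, 4]
```

With every routing tag naming output port 1, three of the four identical execution units in this assumed configuration receive no work at all.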