The landscape of computer network infrastructure consists of a set of trade-offs between scalability, efficiency, throughput, and latency. The advancement of high performance computing (HPC) and data center interconnect fabrics over the past two decades has included two significant developments: (1) large, high-capacity networks based on cascaded electrical packet switches, and (2) optical fiber transmission media, in particular with wavelength division multiplexing (WDM) used to further increase the fiber bandwidth. The move to optical interconnect has been a strategy for dealing with the frequency-dependent losses of electrical cabling as system sizes have grown and signaling rates have increased. WDM increases fiber data bandwidth by using the available transmission spectrum to encode independent data channels on different wavelengths of light on the same fiber.
FIG. 1 illustrates the challenges facing computer networks relying upon electronic switches. The network 2 illustrated in FIG. 1 enables a number of nodes 4 to be interconnected. A “node,” consisting of a sub-system of one or more processing elements and memory elements, resides at a network endpoint in the notional networks discussed here. The network provides connectivity between the network endpoints. Furthermore, the networks described here could even be used as the interconnect fabric of the sub-system. In principle, the networks described here can be used to connect any (including possibly heterogeneous) elements of a computing system, e.g. processing elements with storage. In FIG. 1 the nodes 4 represent a set of computers. A communication path is needed between each node 4 and each remaining node 4. First, second and third levels of switches 6, 8, 10 are cascaded to allow communication between each of the nodes 4 and each remaining node 4. In some instances it is only necessary to traverse a single switch to provide communication between two nodes. For example, node 4a may communicate with node 4b using only switch 6a. This is referred to as a “two-hop” connection, as two links are traversed in the communication. In other instances several switches must be traversed in order to provide communication between two nodes. For example, node 4a may communicate with node 4p by traversing switch 6a, switch 8a, switch 10a, switch 8d, and switch 6h. This connection is referred to as a “six-hop” connection, as six links are traversed. This route is equivalent to the worst case minimal route for the system shown, and therefore the “diameter” of the network is equal to 6. The communication paths, or links, provided in the network 2 are shared by multiple nodes in the system. As a result, communication across the shared links must be arbitrated.
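The hop counts above can be checked with a short sketch. It assumes a simplified three-level tree that is not taken from FIG. 1: sixteen nodes, two nodes per first-level switch, two first-level switches per second-level switch, and a single third-level switch. The element names and the helper functions `build_tree` and `hops` are hypothetical and exist only for illustration.

```python
from collections import deque
from itertools import combinations


def build_tree():
    """Build an assumed three-level switch tree as an adjacency map."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for n in range(16):            # 16 endpoints, 2 per first-level switch
        link(f"node{n}", f"L1-{n // 2}")
    for s in range(8):             # 8 first-level switches, 2 per second-level switch
        link(f"L1-{s}", f"L2-{s // 2}")
    for s in range(4):             # 4 second-level switches under one third-level switch
        link(f"L2-{s}", "L3-0")
    return adj


def hops(adj, src, dst):
    """Return the number of links on a shortest path (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return dist[cur]
        for nxt in adj[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    raise ValueError("no path between endpoints")


adj = build_tree()
endpoints = [f"node{n}" for n in range(16)]
# The diameter is the longest shortest path between any two endpoints.
diameter = max(hops(adj, a, b) for a, b in combinations(endpoints, 2))

print(hops(adj, "node0", "node1"))   # two nodes on the same first-level switch
print(hops(adj, "node0", "node15"))  # path through all three switch levels
print(diameter)
```

Under these assumptions the same-switch pair is two links apart and the farthest pair is six links apart, matching the two-hop and six-hop connections described above.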
As the networking system is scaled to accommodate a greater number of nodes, the network's latency increases and its efficiency suffers.
Latency in the network can be minimized by providing all-to-all connection between the nodes in the system. An all-to-all system allows every node to send a unique message to any other node at any time, unaffected by traffic or congestion in the network. A dedicated, switch-free communication path is provided from each node to every other node in the system. Because no switch is required, and no links are shared, resource arbitration of the communication link is not required. Such an arbitration-free network supports the densest communication pattern that can be imposed on a computing network system. FIG. 2 illustrates a network 12 demonstrating all-to-all connection. As shown in FIG. 2, the networking system includes six nodes 14. An arbitration-free link is provided from each node 14 to every other node 14 in the system. Because a network having all-to-all connection eliminates concerns regarding blocking or the need for arbitration, these all-to-all networks particularly benefit communication-bound parallel HPC applications when used as the inter-node interconnection network within the HPC system. This type of all-to-all connection is not utilized in a network having more than approximately 16 nodes, for example, because the interconnection wiring requirements are difficult to implement on a large scale. Specifically, the number of links in such a network is equal to (N)(N−1) if the links provided are unidirectional, or (N)(N−1)/2 if the links provided are bidirectional. As the number of nodes increases linearly, the number of links in the system increases quadratically, as N². When the number of nodes provided is large (e.g. N>16), the number of wires is impractical due to the cost of the wiring, the weight of the wiring, etc.
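The link-count formulas above can be evaluated directly. The following is a minimal sketch; the function name `link_count` is a hypothetical helper, and the node counts are simply the six-node example of FIG. 2, the approximate 16-node practical limit, and a 512-node system.

```python
def link_count(n, bidirectional=False):
    """Links needed for all-to-all connection among n nodes.

    Unidirectional: n * (n - 1). Bidirectional: n * (n - 1) / 2.
    """
    total = n * (n - 1)
    return total // 2 if bidirectional else total


for n in (6, 16, 512):
    print(n, link_count(n), link_count(n, bidirectional=True))
```

Going from 16 to 512 nodes multiplies the node count by 32 but the unidirectional link count by more than 1000 (240 to 261,632), which illustrates the quadratic growth.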
Another way in which all-to-all communication has been achieved is with an arrayed waveguide grating router (AWGR). An example of an AWGR 16 is illustrated in FIG. 3a. In this example, the AWGR 16 includes five input ports and five output ports (i.e. a port count of 5). Therefore, the size of the AWGR 16, defined as k×k, is 5×5. The number of ports, k, in the AWGR is in principle limited only by fabrication accuracy and inter-wavelength crosstalk. Each input of the AWGR takes data from the transmitters of a node in the network and each output provides data to the receivers of a node in the network. The signals received at the input ports are optical signals modulated on k different wavelengths. The AWGR performs a static permutation, routing the signals received at each input port to the output ports such that each of the k signals received at a single input port is routed to a different one of the five output ports; signals are routed, not replicated or split and fanned out. Thus, the AWGR 16 provides arbitration-free all-to-all connection among N nodes where N≦k (five nodes in FIG. 3a). The AWGR 16 achieves the arbitration-free all-to-all connection by utilizing optical signals having different wavelengths of light. For a contention-free and arbitration-free all-to-all connection, the number of different wavelengths of light needed is equal to the number of nodes in the system. Thus a five node system (N=5) requires a 5×5 sized AWGR (i.e. k=N) and optical signals of five different wavelengths (W=5). Thus, the AWGR 16 is a W=N AWGR.
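The static permutation described above can be sketched in a few lines. The sketch assumes the commonly used cyclic routing rule, in which a signal entering input port i on wavelength index w leaves on output port (i + w) mod k. This rule, the zero-based indexing, and the names `awgr_output`, `routing_table`, and `is_contention_free` are assumptions made for illustration; the actual port and wavelength assignment of the AWGR 16 may differ.

```python
def awgr_output(input_port, wavelength, k):
    """Assumed cyclic routing rule: output = (input + wavelength) mod k."""
    return (input_port + wavelength) % k


def routing_table(k):
    """table[i][w] is the output port reached from input i on wavelength w."""
    return [[awgr_output(i, w, k) for w in range(k)] for i in range(k)]


def is_contention_free(table):
    """Check the two properties needed for arbitration-free all-to-all."""
    k = len(table)
    ports = list(range(k))
    # Each input reaches every output exactly once across its k wavelengths.
    rows_ok = all(sorted(row) == ports for row in table)
    # For a given wavelength, no two inputs land on the same output.
    cols_ok = all(sorted(table[i][w] for i in range(k)) == ports
                  for w in range(k))
    return rows_ok and cols_ok


k = 5
table = routing_table(k)
for i, row in enumerate(table):
    print(f"input {i}: outputs by wavelength index {row}")
print(is_contention_free(table))
```

Each row of the table is a permutation of the output ports, so every input has a dedicated wavelength to every output, and no output ever receives the same wavelength from two inputs.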
FIG. 3b illustrates the well-known wavelength routing properties of AWGR 16. The signals provided to each input port of the AWGR 16 are represented in the rows of the table and the signals provided at each output port of the AWGR 16 are represented in the columns of the table. When a system requires a greater number of nodes, the AWGR 16 must be scaled to accommodate the additional nodes. The size of the AWGR 16 must be increased so that an input port and an output port can be provided for each node. In addition, because the number of wavelengths required is equal to the number of nodes (W=N; i.e., the number of wavelengths scales linearly with the number of nodes), additional optical wavelengths must be provided on each input and output port. Difficulties arise, however, when scaling the AWGR 16 to accommodate these additional ports and wavelengths. Specifically, the optical signal bandwidth is limited (i.e. the wavelengths used for these communication signals generally range from 1310 nm to 1600 nm). An increase in the number of wavelengths (W) routed by the AWGR results in a reduction in the channel spacing, as additional channels must fit within (approximately) the same free spectral range (FSR). With channel spacing reduced, higher precision is required during the AWGR fabrication process. In addition, during use of the AWGR, the reduced channel spacing requires more accurate wavelength registration, i.e., the wavelengths used by the transmitter, the receiver and the AWGR must be very closely matched. As the wavelength spacing is reduced, achieving the required registration accuracy becomes increasingly difficult. This is particularly true when considering temperature fluctuations, which cause the wavelengths to drift.
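The effect of the wavelength count on channel spacing can be illustrated with simple arithmetic. The sketch below assumes, optimistically, that the entire 1310 nm to 1600 nm range is available as a single window and that the W channels are spaced uniformly across it; a practical AWGR would typically have a narrower free spectral range, so the real spacing would be tighter still. The function name `channel_spacing_nm` is a hypothetical helper.

```python
BAND_START_NM = 1310.0  # lower edge of the range cited in the text
BAND_END_NM = 1600.0    # upper edge of the range cited in the text


def channel_spacing_nm(num_wavelengths,
                       start_nm=BAND_START_NM, end_nm=BAND_END_NM):
    """Uniform channel spacing if W channels share the whole window (assumed)."""
    return (end_nm - start_nm) / num_wavelengths


for w in (5, 16, 64, 512):
    print(f"W={w}: about {channel_spacing_nm(w):.3f} nm per channel")
```

Even under these generous assumptions, a 512-wavelength system leaves well under 1 nm per channel, compared with 58 nm per channel for the five-wavelength example.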
Although it is possible in principle to provide a system utilizing a W=N AWGR for a 512 node system, the fabrication and implementation of a 512 port AWGR is not practical. Fabrication of a 512 port AWGR presents difficulty from the standpoint of size and the fabrication of 512 input ports and 512 output ports. In addition, the channel spacing requirements for achieving optical signals having 512 different wavelengths are also challenging. The high-density channel requirements lead to significant increases in coherent (in-band) and incoherent (out-of-band) crosstalk. This crosstalk significantly impairs the performance of the W=N AWGR as an all-to-all interconnection because of its negative impact on bit error rate (BER).
Another difficulty with utilizing a W=N AWGR to provide an all-to-all network is that such a network would require 512² (N²) transceivers. Each of the N transceivers associated with a node must be supplied with a unique wavelength of light onto which it will modulate its data; therefore each node requires N unique wavelengths. Thus, scaling of the W=N AWGR network to 512 nodes for use in a data center network or an HPC network, for example, is unrealistic because the channel spacing required to accommodate 512 different wavelengths of light is not realizable.
Yet another difficulty with utilizing a W=N AWGR network configuration is that it requires the use of N lasers to provide signals having W different wavelengths. Using a laser to generate optical signals generates heat within the system, and the greater the number of lasers utilized, the greater the amount of heat generated. Because wavelength registration is affected by fluctuations in temperature, temperature controls are often imposed on these optical systems and may limit the ability to scale the W=N AWGR network configuration.
Thus, a network is needed that provides arbitration-free, all-to-all connection and that can be scaled to accommodate an N large enough to be relevant to high performance computing and data center networks.