In recent years, clusters of PCs (or Networks of Workstations, NOWs) have come to be considered a cost-effective alternative to small and medium scale parallel computing systems. The performance of clusters is closely tied to advances in interconnection networks. Currently, there are many proposals for NOW interconnects, such as Myrinet, Servernet II, Gigabit Ethernet, InfiniBand, and PCI Express ASI, that make it possible to build high-performance clusters.
As the number of components in the cluster increases, the probability of faults also increases. Moreover, the components (processors, switches, and links) are often operated close to their technology limits, which further increases the probability of a fault. In a large computer network, it is more likely that one or more of the network components are broken at any given time than that all of them are up and running. For some environments, such as high-performance computation and web servers, it is critical to keep the system running even in the presence of faults. Therefore, automatic routing and re-routing become very important.
Clusters are usually built on switch-based networks whose topology is defined by the customer. The network can be laid out using a regular or an irregular topology. Regular topologies, preferably multistage networks, are often used when performance is the primary concern. However, in the presence of switch or link failures, a regular network becomes an irregular one. In fact, most of the interconnects available for building custom-made clusters (Myrinet, Quadrics, PCI Express ASI, Ethernet) allow the use of an irregular topology.
A common property of these networks is that packets may not be dropped in the presence of congestion. Instead, packets are buffered, and flow control mechanisms are used to prevent packet dropping. For this reason, these networks are referred to as lossless networks. In lossless networks, mechanisms for acknowledging and retransmitting packets are not necessary, so lower packet latencies are achieved. The drawback of lossless networks, however, is that they are prone to deadlocks. A deadlock may occur if the routing of packets includes a cyclic dependency, i.e. when a set of three or more nodes is connected by parts of three or more paths. As a simple example, illustrated in FIG. 1, a node A has a packet destined for node B, node B has a packet destined for node C, and node C has a packet destined for node A. The three nodes are thus waiting for each other, and are in deadlock.
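The cyclic dependency described above can be checked mechanically. The following is a minimal sketch, not taken from the source: it assumes that each route is given as a list of node names, that every directed link is one buffered channel, and that a packet holding one channel while requesting the next creates a dependency between the two. The function names and the three example routes (a three-node ring in which each packet passes through one intermediate node) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def channel_dependencies(paths):
    """Map each directed link (channel) to the channels requested right after it."""
    deps = defaultdict(set)
    for path in paths:
        links = list(zip(path, path[1:]))          # consecutive hops, e.g. ("A", "B")
        for held, wanted in zip(links, links[1:]):
            deps[held].add(wanted)                 # packet holds `held`, waits for `wanted`
    return deps


def has_cycle(deps):
    """Depth-first search for a cycle in the channel dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2                   # unvisited, on current DFS path, finished
    colour = defaultdict(int)

    def visit(channel):
        colour[channel] = GREY
        for nxt in deps.get(channel, ()):
            if colour[nxt] == GREY:                # back edge: cyclic dependency found
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[channel] = BLACK
        return False

    return any(colour[ch] == WHITE and visit(ch) for ch in list(deps))


# Hypothetical routes on a ring A -> B -> C -> A (an assumption for illustration).
ring_paths = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
acyclic_paths = [["A", "B", "C"], ["B", "C"]]

print(has_cycle(channel_dependencies(ring_paths)))     # True
print(has_cycle(channel_dependencies(acyclic_paths)))  # False
```

A routing for which this check returns False has no cyclic channel dependency under the stated assumptions; a result of True indicates only that a deadlock is possible, not that one will necessarily occur.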
Avoiding deadlocks may require reducing the efficiency of the routing. One way to avoid deadlocks while maintaining routing efficiency is to divide a physical network into a plurality of virtual layers. This is illustrated schematically in FIG. 2 for the case of three virtual layers. Here, each node is assigned three addresses, and each physical link contains three different channels. The channels connect the nodes using the addresses in such a way that three separate, identical layers are formed. Now, a packet can be sent from node A to node B in layer L1, a packet can be sent from node B to node C in layer L2, and a packet can be sent from node C to node A in layer L3. The deadlock is thereby avoided.
It is noted that, in principle, only two virtual layers are required to resolve a single deadlock. However, three layers are illustrated in FIG. 2 to indicate that a larger number of layers is typically required to avoid a large number of potential deadlocks.
Another distinguishing aspect is the computational cost of the routing algorithms. For instance, some routing algorithms focus on finding the best set of paths with reference to the future traffic balance. As the number of possible routing paths between the same <source, destination> pair usually grows with system size, the computation time needed to find the best set of paths (one for each <source, destination> pair) may be excessive in some critical scenarios (large topologies and real-time systems).
Additionally, because routing algorithms require different resources from the network, they may not all be well suited to every technology. For instance, the InfiniBand specification allows up to 15 virtual channels for routing purposes, but real implementations may not implement virtual channels at all. This means that routing schemes based on virtual layers may not be applicable in such implementations.