1. Technical Field
The present application relates generally to an improved data processing system and method. More specifically, the present application is directed to providing efficient, contention-free, pipelined broadcasting within an interconnection network implementing a constant bisection bandwidth (CBB) network topology.
2. Description of Related Art
To improve network performance, high performance parallel computing environments have been developed which often include one or more cluster systems, with each cluster system connecting many nodes by one or more interconnection networks. Nodes may include one or more processors, one or more I/O devices, memory, and other components. As cluster systems continue to add nodes and other components, the communication latency and bandwidth requirements for communications within the cluster also increase.
To manage and reduce communication latency and bandwidth requirements within a parallel computing environment, one or more switches may be implemented to connect the nodes in an interconnection network. For example, a parallel computing environment may implement a crossbar switch to connect multiple nodes, where the crossbar switch provides full bandwidth and uniform latency between any pair of nodes the crossbar switch connects.
In one example, an interconnection network implements a fully-connected network topology with one or more crossbar switches sufficient to provide a dedicated link between any pair of nodes, such that for “N” number of nodes, a crossbar switch with N×N ports is needed. In implementing a fully-connected network topology, as the number of nodes increases, the number of ports required also increases, and a crossbar switch of N×N ports may become impractical.
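The growth described above can be illustrated with a minimal sketch. The function names are hypothetical, and treating the N×N figure as a simple product of N inputs and N outputs is an assumption made only for illustration; the sketch does not model any particular switch hardware.

```python
def crossbar_size(num_nodes: int) -> int:
    """Size of an N x N crossbar, taken here as N inputs times N outputs (assumption)."""
    return num_nodes * num_nodes


def dedicated_pairs(num_nodes: int) -> int:
    """Number of distinct node pairs that would each need a dedicated link."""
    return num_nodes * (num_nodes - 1) // 2


if __name__ == "__main__":
    # Illustrative cluster sizes only; not taken from the application.
    for n in (8, 64, 512):
        print(n, crossbar_size(n), dedicated_pairs(n))
```

Both quantities grow quadratically with the number of nodes, which is the sense in which a single N×N crossbar may become impractical for large clusters.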
Within an interconnection network, one option for avoiding the requirement of an N×N port crossbar switch to connect nodes is to implement multiple levels of crossbar switches connected hierarchically. A constant bisection bandwidth (CBB) network topology, also known as a “fat tree”, is one example of a network topology that implements multiple levels of crossbar switches connected hierarchically and also reduces the number of switches required to connect N nodes. In one example, the CBB network topology may implement multiple layers of crossbar switches to connect nodes within a cluster system by effectively dividing the group of nodes into two equal subgroups, with each node connected to one switch in a first layer, and the first layer of switches interconnected through a second layer of switches, such that through the second layer of switches there is a shared link between any pair of nodes not sharing a same first layer crossbar switch.
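The two-layer arrangement described above may be sketched as follows. This is a simplified illustration under stated assumptions, not the topology of any particular embodiment: nodes are assumed to be numbered consecutively, each first layer switch is assumed to serve a fixed number of nodes (`nodes_per_switch`, a hypothetical parameter), and the second layer is collapsed into a single label.

```python
def first_layer_switch(node: int, nodes_per_switch: int) -> int:
    """Index of the first layer switch a node attaches to (assumes consecutive numbering)."""
    if nodes_per_switch <= 0:
        raise ValueError("nodes_per_switch must be positive")
    return node // nodes_per_switch


def path(src: int, dst: int, nodes_per_switch: int) -> list[str]:
    """Hops between two nodes in the simplified two-layer model."""
    s = first_layer_switch(src, nodes_per_switch)
    d = first_layer_switch(dst, nodes_per_switch)
    if s == d:
        # Both nodes hang off the same first layer switch: no shared link is used.
        return [f"node{src}", f"L1-switch{s}", f"node{dst}"]
    # Otherwise traffic crosses a shared link through the second layer.
    return [f"node{src}", f"L1-switch{s}", "L2", f"L1-switch{d}", f"node{dst}"]


def uses_shared_link(src: int, dst: int, nodes_per_switch: int) -> bool:
    """True when the pair does not share a first layer switch."""
    return "L2" in path(src, dst, nodes_per_switch)
```

For example, with four nodes per first layer switch, `path(0, 1, 4)` stays within one switch, while `path(0, 5, 4)` passes through the second layer and therefore uses a link that other node pairs may also need.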
While a CBB network topology reduces the number of switches required to connect any pair of nodes, sharing links between multiple nodes introduces the possibility of contentions that may occur when multiple requests to send from multiple nodes simultaneously arrive at a send channel for a switch for a shared link. Managing contentions increases data latency at crossbar switches. Within a parallel computing environment, the effects of increased data latency from contentions and inefficient use of available bandwidth by crossbar switches may increase if a parallel application broadcasts a large amount of data to all the nodes using a pipelined approach to break the data into chunks.
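To make the pipelined approach concrete, the following minimal sketch breaks a message into chunks and forwards them hop by hop. The chain ordering of nodes, the one-chunk-per-link-per-step timing model, and all function names are assumptions for illustration only; the sketch does not model switch contention and is not the broadcast method of the present application.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """Break a message into fixed-size chunks (the last chunk may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def simulate_chain_broadcast(chunks: list[bytes], num_nodes: int):
    """Forward chunks along a chain; node 0 is the root and holds all chunks.

    Each link carries at most one chunk per step. Returns the number of
    steps taken and the chunks held by each node at the end.
    """
    received = [[] for _ in range(num_nodes)]
    received[0] = list(chunks)
    steps = 0
    while any(len(r) < len(chunks) for r in received):
        steps += 1
        # Visit nodes from the far end so a chunk advances only one hop per step.
        for node in range(num_nodes - 1, 0, -1):
            have = len(received[node])
            if have < len(received[node - 1]):
                received[node].append(received[node - 1][have])
    return steps, received
```

In this model, different chunks travel over different links during the same step, so a broadcast of C chunks to P nodes completes in C + P - 2 steps rather than C × (P - 1). That overlap is only realized if each link is free when its chunk arrives, which is why contention at shared links is particularly costly for pipelined broadcasts.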