System architectures that provide a solution by dividing a problem set into a number of smaller parts and processing the collection of parts in parallel at respective nodes have become increasingly common. Such distributed systems have come to represent an increasingly effective means of approaching complex problems due to a number of trends, including but not limited to improvements in parallel processing algorithms, the availability of low-cost power-efficient computing nodes, and the availability of high-bandwidth interconnect fabrics. Distributed system approaches are now used in a number of problem domains, including the processing of complex database queries, machine learning and other artificial intelligence algorithms, and so on.
Overall system performance in such distributed systems may depend upon a number of factors such as the number of processing nodes, each node's performance capability, the scalability of the parallel processing algorithms, the amount of node intercommunication required, and the performance of the interconnect fabric or fabrics to which the nodes are attached. Some applications may require considerable communication between nodes (at least during certain phases of an application), making the interconnect fabric performance an important factor in overall performance. Given this, the architecture of the interconnect fabric may represent a major part of the total system design. Ideally, at least for some applications, distributed systems would employ an all-to-all interconnect such that each node has a dedicated link to every other node in the system. However, in practice, this type of implementation may not scale well as the number of nodes increases, because the number of dedicated links grows roughly with the square of the number of nodes. Power, cost, size, and hardware/software complexity constraints may tend to make the all-to-all interconnect approach infeasible.
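As a rough illustration of the scaling concern, the following minimal sketch counts the dedicated links an all-to-all interconnect would need, assuming exactly one bidirectional link per pair of nodes, which gives n(n-1)/2 links for n nodes. The function names and the sample node counts are hypothetical and chosen only for illustration.

```python
def all_to_all_link_count(num_nodes: int) -> int:
    """Dedicated links in a full mesh, assuming one bidirectional link per node pair."""
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    return num_nodes * (num_nodes - 1) // 2


def ports_per_node(num_nodes: int) -> int:
    """Link endpoints each node must terminate in a full mesh (one per peer)."""
    if num_nodes < 0:
        raise ValueError("num_nodes must be non-negative")
    return max(num_nodes - 1, 0)


if __name__ == "__main__":
    # Illustrative node counts only; not taken from any particular system.
    for n in (8, 64, 1024):
        print(f"nodes={n} links={all_to_all_link_count(n)} ports_per_node={ports_per_node(n)}")
```

Under these assumptions, going from 64 nodes to 1024 nodes raises the link count from 2016 to 523776, which suggests why power, cost, and size constraints may rule out the all-to-all approach at larger scales.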
Consequently, distributed systems containing hundreds or thousands of nodes may often employ a hierarchical interconnect architecture. In such an approach, some number of links in the fabric may have to handle traffic associated with multiple nodes. Such shared links may not be able to sustain the amount of traffic that could potentially be generated if all of the nodes sharing a link operated at their maximum traffic-generating capacity. In a well-balanced distributed system, the probability of overloading a shared link in this manner may typically be low, although such overload situations may nevertheless occur occasionally. Depending on the traffic management algorithms employed in the distributed system, data movement between application-layer components may be slowed substantially during such situations. As a result, overall application performance may be substantially degraded, especially in scenarios in which multiple nodes transmit traffic based on greedy approaches (e.g., approaches which consume as much bandwidth as permitted by the protocols in use).
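The shared-link concern described above can be sketched numerically. The following minimal example, offered only as an illustration, compares the combined peak rate of the nodes behind a shared link with that link's capacity, and checks whether a given set of offered rates would overload it. The function names, the rate units, and the sample values are all assumptions introduced here, not details of any particular fabric.

```python
from typing import Sequence


def oversubscription_ratio(peak_rates: Sequence[float], link_capacity: float) -> float:
    """Ratio of the nodes' combined peak rate to the shared link's capacity."""
    if link_capacity <= 0:
        raise ValueError("link_capacity must be positive")
    return sum(peak_rates) / link_capacity


def is_overloaded(offered_rates: Sequence[float], link_capacity: float) -> bool:
    """True when the traffic currently offered exceeds what the shared link can carry."""
    return sum(offered_rates) > link_capacity


if __name__ == "__main__":
    # Hypothetical values: eight nodes that can each send 10 units share a 40-unit link.
    peaks = [10.0] * 8
    capacity = 40.0
    print(oversubscription_ratio(peaks, capacity))  # ratio above 1 means the link is oversubscribed
    print(is_overloaded([2.0] * 8, capacity))       # typical load: 16 units offered
    print(is_overloaded(peaks, capacity))           # all nodes sending greedily: 80 units offered
```

With these illustrative numbers the link is oversubscribed by a factor of two, yet it is overloaded only when most nodes transmit near their peak rates at the same time, which corresponds to the occasional overload situation and the greedy-transmission scenario described above.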