1. Field of the Invention
This invention relates generally to the field of distributed-memory message-passing parallel computer design and system software, and more particularly, to a method and apparatus for supporting global interrupts and global barrier operations for multiple interconnected processing nodes of computer structures.
2. Discussion of the Prior Art
In the supercomputing arts, massively parallel computing structures interconnecting large numbers of processing nodes are generally architected as very regular structures, such as grids, lattices or toruses.
One particular problem commonly faced on such massively parallel systems is the efficient computation of a collective arithmetic or logical operation involving many nodes.
The three-dimensional torus interconnect structure 10 shown in FIG. 1, which comprises a simple three-dimensional nearest-neighbor interconnect that is “wrapped” at the edges, works well for most types of inter-processor communication. It does not perform as well, however, for collective operations such as reductions, where a single result is computed from operands provided by each of the compute nodes 12.
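By way of illustration only, the following minimal sketch models a reduction over a wrapped three-dimensional torus and counts the nearest-neighbor hops that an operand may have to travel. The 8×8×8 machine size, the operand values, and the function names `torus_hops`, `torus_diameter` and `reduce_sum` are assumptions introduced for this example; they are not taken from FIG. 1 and do not describe the routing used by any particular machine.

```python
from itertools import product


def torus_hops(a, b, dims):
    """Minimum nearest-neighbor hops between nodes a and b on a wrapped torus."""
    # In each dimension a message may travel either way around the ring.
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, dims))


def torus_diameter(dims):
    """Largest hop count from node (0, ..., 0) to any other node.

    A torus looks the same from every node, so this is also the diameter.
    """
    origin = (0,) * len(dims)
    nodes = product(*(range(n) for n in dims))
    return max(torus_hops(origin, node, dims) for node in nodes)


def reduce_sum(operands):
    """A reduction: a single result computed from one operand per node."""
    return sum(operands.values())


dims = (8, 8, 8)  # hypothetical machine size, chosen only for the example
operands = {node: 1 for node in product(*(range(n) for n in dims))}

print(reduce_sum(operands))   # 512, one operand from each of the 512 nodes
print(torus_diameter(dims))   # 12, i.e. 4 hops in each of the 3 dimensions
```

In this toy model, an operand from the farthest node must cross 12 links before it can contribute to the single result, which suggests why a nearest-neighbor interconnect alone is less well suited to reductions than to point-to-point traffic.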
It would thus be highly desirable to provide an ultra-scale supercomputing architecture that comprises a unique interconnection of processing nodes optimized for efficiently and reliably performing many classes of operations, including global arithmetic operations, data distribution, synchronization, and sharing of limited resources.
Moreover, on large parallel machines, it is useful to implement some kind of global notification that signals a certain state to each node participating in a calculation. For example, if an error occurs on a node, that node would signal a global interrupt so that all other nodes know about it and the whole machine can enter an error recovery state. It is further useful to implement a global barrier that prevents operations in participating nodes until a certain status level is attained by all processing nodes.
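The two notions just described can be modeled, purely as a sketch, as a logical OR and a logical AND over per-node flags: a global interrupt is raised as soon as any node signals, while a global barrier is released only when every node has reached the required state. The four-node machine and the names `global_interrupt` and `global_barrier_released` are hypothetical and serve only to illustrate the semantics; they say nothing about how such signals would be carried in hardware.

```python
def global_interrupt(error_flags):
    """Logical OR: asserted as soon as any single node signals an error."""
    return any(error_flags)


def global_barrier_released(ready_flags):
    """Logical AND: released only once every node has reached the required state."""
    return all(ready_flags)


num_nodes = 4  # hypothetical node count, chosen only for the example
errors = [False] * num_nodes
ready = [False] * num_nodes

# Three of the four nodes arrive at the barrier; it must stay closed.
ready[0] = ready[1] = ready[2] = True
print(global_barrier_released(ready))  # False

# The last node arrives; all nodes may now proceed.
ready[3] = True
print(global_barrier_released(ready))  # True

# A single node reports an error; every node should learn of it.
errors[2] = True
print(global_interrupt(errors))  # True
```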
It would thus be further desirable to provide a global interrupt and barrier network having very low latency, so that a whole computing structure of interconnected processing elements may return to synchronous operation quickly. The normal message passing of high-speed networks such as an interconnected torus is not fully suited for this purpose because of its longer latency.