1. Field of the Invention
This invention relates generally to the field of computer processors. More particularly, the invention relates to a collective communication apparatus and method for parallel systems.
2. Description of the Related Art
Collective operations are common and critical in parallel applications. Examples of collectives include, but are not limited to, reductions, all-reductions (reduce-2-all), broadcasts, barriers, and parallel prefix operations. When collectives are implemented purely in software, they suffer from significant slowdowns, serialization, and inefficient data movement, and they require a large number of instructions to execute. When collectives are implemented solely in hardware, the formation, creation, and disbanding of collectives are substantially limited to what the hardware designer allows.
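By way of illustration only, the semantics of several of the collectives named above can be modeled in a few lines of sequential Python. This is a minimal sketch that assumes one value per participant held in a list. The function names (reduce_to_root, all_reduce, broadcast, prefix_scan) are hypothetical and chosen for this example. The sketch shows only what result each participant ends up with, not how any particular software or hardware implementation moves the data.

```python
from functools import reduce as fold
from itertools import accumulate
import operator


def reduce_to_root(values, op=operator.add):
    """Reduction: combine one value per participant into a single result.

    In a real system only the root participant would hold this result.
    """
    return fold(op, values)


def all_reduce(values, op=operator.add):
    """All-reduction: every participant ends up with the combined result."""
    total = fold(op, values)
    return [total] * len(values)


def broadcast(values, root=0):
    """Broadcast: every participant receives the root participant's value."""
    return [values[root]] * len(values)


def prefix_scan(values, op=operator.add):
    """Inclusive parallel prefix: participant i receives op over values[0..i]."""
    return list(accumulate(values, op))


if __name__ == "__main__":
    per_core = [3, 1, 4, 1]  # one illustrative value per participant
    print(reduce_to_root(per_core))
    print(all_reduce(per_core))
    print(broadcast(per_core, root=2))
    print(prefix_scan(per_core))
```

A barrier is omitted from the sketch because it exchanges no data; it only guarantees that no participant proceeds until all participants have arrived.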
Collective operations require many small messages and often require barriers after the participating cores are set up and before the first operation can take place. These barriers can limit overall application performance. Such protocols can be so onerous that some software is written to avoid these operations altogether, even changing algorithms to do so. However, collectives represent an abstraction that is natural to the programmer and that can provide an efficient means of communicating among a pool of processors.
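To make the message count concrete, the following is a minimal sketch of one common software scheme, a pairwise tree reduction, simulated sequentially in Python. The scheme and the function name tree_reduce are assumptions made for this illustration and are not taken from any particular implementation. Each combining step stands in for one small message between two participants, and each pass of the outer loop stands in for a round after which a software implementation would typically need to synchronize.

```python
import operator


def tree_reduce(values, op=operator.add):
    """Simulate a pairwise tree reduction over one value per participant.

    Returns (result, messages, rounds), where `messages` counts the
    point-to-point transfers and `rounds` counts the combining rounds.
    """
    if not values:
        raise ValueError("at least one participant is required")
    vals = list(values)
    n = len(vals)
    messages = 0
    rounds = 0
    stride = 1
    while stride < n:
        # In each round, participant i receives from participant i + stride.
        for i in range(0, n, 2 * stride):
            partner = i + stride
            if partner < n:
                vals[i] = op(vals[i], vals[partner])
                messages += 1  # one small message per pair
        rounds += 1
        stride *= 2
    return vals[0], messages, rounds


if __name__ == "__main__":
    result, messages, rounds = tree_reduce(range(1, 9))
    print(result, messages, rounds)
```

Under these assumptions, a reduction over P participants uses P - 1 single-value messages spread over roughly log2(P) rounds, so the cost is dominated by per-message overhead and round-to-round synchronization rather than by the amount of data moved.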
Another aspect of collectives is that, even with hardware support, they are slow compared to local computation and introduce delays due to load imbalance and communication latency. As such, avoiding explicit waits and barriers across the collective is key to efficiency.