Two of the most important types of data movement for a parallel processing system are the scatter exchange and the gather exchange. A scatter exchange moves data from one or more user-specified locations in a single direction, away from the starting location(s), so that dispersed computational elements in the system gain access to the data. This is analogous to scattering seeds across a field to achieve the widest possible dispersal. A gather exchange is the opposite: data is gathered from the dispersed computational elements and sent back to one or more user-defined locations.
The two standard methods of performing a scatter exchange are the true broadcast and the tree broadcast. The true broadcast transmits the data simultaneously from some central location to all computational elements in a group. The problem with a true broadcast is that, in order to provide reliable data transmission, it must use an error-correcting code of sufficient length given the channel fault rate (the simplest being a single-dimensional checksum, which is vulnerable to double-bit errors). Alternatively, if no error detection or correction is employed, data transmission errors will inevitably occur at some point.
Error correction cannot guarantee that the transmitted data is correct, only that it is ‘statistically’ correct. It also requires that additional data be transmitted, which effectively degrades the performance of the communication channel. For example, a Reed-Solomon ECC adds about 8 percent overhead to a code while being able to handle up to about a 4 percent data error rate (e.g., the standard 188 data bytes transmitted plus 16 redundant bytes).
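The Reed-Solomon figures quoted above can be checked with a short calculation. The sketch below relies on the general property that a Reed-Solomon code with a given number of redundant symbols can correct up to half that many symbol errors per codeword. The function name `rs_budget` and its dictionary keys are hypothetical and are introduced only for this illustration.

```python
def rs_budget(data_bytes: int, parity_bytes: int) -> dict:
    """Summarize the overhead and correction capacity of one Reed-Solomon codeword.

    Illustrative helper (not from the source text): a code with `parity_bytes`
    redundant symbols can correct up to parity_bytes // 2 symbol errors.
    """
    codeword_bytes = data_bytes + parity_bytes
    correctable_bytes = parity_bytes // 2
    return {
        "codeword_bytes": codeword_bytes,
        # Redundant bytes relative to the payload they protect.
        "overhead_vs_data": parity_bytes / data_bytes,
        "correctable_bytes": correctable_bytes,
        # Fraction of the transmitted codeword that may be corrupted and still repaired.
        "correctable_fraction": correctable_bytes / codeword_bytes,
    }


budget = rs_budget(188, 16)
for key, value in budget.items():
    print(key, value)
```

For 188 data bytes and 16 redundant bytes, the codeword is 204 bytes, the overhead is 16/188 (roughly 8.5 percent), and up to 8 corrupted bytes (roughly 3.9 percent of the codeword) can be corrected, which is consistent with the rounded figures in the text.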
True broadcasts cannot use a faster method that employs a bi-directional communication channel, in which the data moves first from the sender to the receiver and then from the receiver back to the sender (which ensures that the data is correct). This is because there are multiple receivers and only one sender, which greatly increases the safe data transmission time and thus eliminates the advantage of broadcasting the data.
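The bi-directional check described above can be illustrated for a single sender and a single receiver. The following is a toy sketch under stated assumptions, not a description of any actual channel protocol: `noisy_channel`, `echo_verified_send`, the one-bit-flip error model, and the retry limit are all hypothetical and were chosen only to make the idea concrete.

```python
import random


def noisy_channel(payload: bytes, flip_probability: float, rng: random.Random) -> bytes:
    """Hypothetical channel model: with the given probability, flip one random bit."""
    if payload and rng.random() < flip_probability:
        corrupted = bytearray(payload)
        index = rng.randrange(len(corrupted))
        corrupted[index] ^= 1 << rng.randrange(8)
        return bytes(corrupted)
    return payload


def echo_verified_send(payload: bytes, flip_probability: float,
                       rng: random.Random, max_attempts: int = 10):
    """Send, have the receiver echo the data back, and retry until the echo matches.

    Returns (data held by the receiver, number of attempts used).
    """
    for attempt in range(1, max_attempts + 1):
        received = noisy_channel(payload, flip_probability, rng)  # sender -> receiver
        echoed = noisy_channel(received, flip_probability, rng)   # receiver -> sender
        # In this toy model, two errors that cancel exactly would also pass this check.
        if echoed == payload:
            return received, attempt
    raise RuntimeError("echo never matched within max_attempts")


rng = random.Random(0)
data, attempts = echo_verified_send(b"scatter payload", 0.0, rng)
print(data, attempts)
```

In this model each pair-wise transfer costs one round trip. With one sender and many receivers, the sender would have to collect and compare an echo from every receiver, which is the added cost the paragraph above refers to.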
Because of these issues, modern parallel computer systems typically use a tree broadcast, in which data is sent from one computational element to another using a binomial tree arrangement of computational elements. This tree arrangement allows a series of pair-wise exchanges rather than a single broadcast, making safe data transmission possible. Instead of taking (dataset size)/(transmission rate) time, as is the case with a true broadcast, a tree broadcast takes [(dataset size)*log2(number of computational elements)]/(transmission rate) time.
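The round structure of a binomial-tree broadcast, and the resulting time estimate, can be sketched as follows. This is a minimal illustration rather than the implementation of any particular system. The function names, the choice of element 0 as the root, the rounding of the logarithm up to a whole number of rounds, and the `rate` parameter (taken here to mean channel throughput in bytes per second) are assumptions introduced for the example.

```python
def binomial_broadcast_schedule(num_elements: int) -> list:
    """Return, for each round, the (sender, receiver) pairs of a broadcast from element 0.

    In every round, each element that already holds the data sends it to one
    element that does not, so the number of holders doubles per round.
    """
    if num_elements < 1:
        raise ValueError("num_elements must be at least 1")
    rounds = []
    holders = 1  # elements 0 .. holders-1 currently hold the data
    while holders < num_elements:
        pairs = [(s, s + holders) for s in range(holders) if s + holders < num_elements]
        rounds.append(pairs)
        holders = min(2 * holders, num_elements)
    return rounds


def true_broadcast_time(dataset_size: float, rate: float) -> float:
    """One transmission reaches every element at once."""
    return dataset_size / rate


def tree_broadcast_time(dataset_size: float, rate: float, num_elements: int) -> float:
    """One full transmission per round; (n - 1).bit_length() equals ceil(log2(n)) for n >= 1."""
    num_rounds = (num_elements - 1).bit_length()
    return dataset_size * num_rounds / rate


for round_number, pairs in enumerate(binomial_broadcast_schedule(8), start=1):
    print(round_number, pairs)
print(true_broadcast_time(1000, 100), tree_broadcast_time(1000, 100, 8))
```

With 8 elements the schedule has 3 rounds (1, 2, and then 4 simultaneous pair-wise exchanges), so under this model the tree broadcast takes three times as long as a single true broadcast of the same dataset.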
The approach described herein, which uses a mathematical forest (multiple parallel trees) consisting of binomial (or other function) trees, has the advantage of a safe broadcast with a minimum performance that is twice that of the industry-standard tree broadcast. Gather exchanges always use a tree broadcast model, but in reverse. The present approach is equally advantageous for a gather exchange.