Provisional patent application No. 60/271,124, titled “A Novel Massively Parallel SuperComputer,” describes a computer composed of many computing nodes and a smaller number of I/O nodes. These nodes are connected by several networks; in particular, they are interconnected by both a torus network and a dual-functional tree network. The torus network may be used in a number of ways to improve the efficiency of the computer.
To elaborate, on a machine that has a sufficiently large number of nodes connected by a network with the connectivity of an M-dimensional torus, the usual way to perform a global operation is by means of shift and operate. For example, to do a global sum (MPI_SUM) over all nodes, each node first computes its own local partial sum. Each node then sends this local sum to its plus neighbor along one dimension and adds the number it receives from its minus neighbor to its own sum. Next, it passes the number it received from its minus neighbor on to its plus neighbor, and again adds the number it receives to its own sum. Repeating this second step until N−1 shift-and-add steps have been performed in total (where N is the number of nodes along that dimension), and then repeating the whole sequence over each remaining dimension in turn, yields the desired result on all nodes. However, for floating point numbers, each node ends up with a slightly different result because of roundoff: the floating point additions are performed in a different order at each node, and floating point addition is not associative. This causes a problem if some global decision depends on the value of the global sum. In many cases, the problem is avoided by picking a special node that first gathers data from all the other nodes, performs the whole computation, and then broadcasts the sum to all nodes. However, when the number of nodes is sufficiently large, this method is slower than the shift and operate method.
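The shift-and-operate procedure can be simulated in a few lines of Python. This is a minimal sketch, assuming synchronous steps, that node i's plus neighbor is (i + 1) mod N, and that node values are held in plain Python lists and dicts; the function names `ring_shift_sum`, `torus_shift_sum`, and `root_gather_sum` are hypothetical. With integer inputs every node ends with the same total, whereas with floating point inputs the nodes can disagree because each one adds the values in a different order. The root-gather alternative gives every node the same value, at the cost of serializing the work on one node.

```python
import itertools


def ring_shift_sum(values):
    """Shift-and-add global sum around one ring of N nodes.

    values[i] is node i's local partial sum; node i's plus neighbor is
    (i + 1) % N. Returns the final sum held by each node.
    """
    n = len(values)
    sums = list(values)       # each node's running sum
    in_flight = list(values)  # value each node sends to its plus neighbor next
    for _ in range(n - 1):    # N-1 shift-and-add steps in total
        # Every node simultaneously receives what its minus neighbor sent.
        in_flight = [in_flight[(i - 1) % n] for i in range(n)]
        sums = [sums[i] + in_flight[i] for i in range(n)]
    return sums


def torus_shift_sum(local, shape):
    """Apply ring_shift_sum along each torus dimension, one at a time.

    local maps a coordinate tuple to that node's value; shape gives the
    number of nodes along each dimension.
    """
    cur = dict(local)
    for d, n in enumerate(shape):
        nxt = {}
        for start in cur:
            if start[d] != 0:  # visit each ring once, from its coordinate-0 node
                continue
            ring = [start[:d] + (k,) + start[d + 1:] for k in range(n)]
            for coord, s in zip(ring, ring_shift_sum([cur[c] for c in ring])):
                nxt[coord] = s
        cur = nxt
    return cur


def root_gather_sum(values):
    """Alternative: one node sums everything in a fixed order and broadcasts."""
    total = 0.0
    for v in values:
        total += v
    return [total] * len(values)  # every node receives the identical result


if __name__ == "__main__":
    vals = [1e16, 1.0, -1e16, 1.0]
    print(ring_shift_sum(vals))   # nodes disagree: each adds in a different order
    print(root_gather_sum(vals))  # all nodes hold the same value
    coords = itertools.product(range(2), range(3))
    print(torus_shift_sum({c: c[0] * 3 + c[1] for c in coords}, (2, 3)))
```

The magnitudes in `vals` are chosen so that adding 1.0 to 1e16 is lost to rounding in some summation orders but not others, which makes the order dependence visible on a four-node ring.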
In addition, as indicated above, in the computer disclosed in provisional patent application No. 60/271,124, the nodes are also connected by a dual-functional tree network that supports integer combining operations, such as integer sums, maximums (max), and minimums (min). The existence of a global combining network opens up possibilities for efficiently implementing global arithmetic operations over this network, for example, adding up floating point numbers from each of the computing nodes and broadcasting the sum to all participating nodes. On a regular parallel supercomputer, these kinds of operations are usually done over the network that carries the normal message-passing traffic, and such global operations usually incur high latency.
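To illustrate why an integer combining network is attractive for global arithmetic, the following Python sketch models a binary-tree reduction using only integer sum and integer max, and layers a floating point sum on top of it by converting each value to a fixed-point integer at a common scale. The names `tree_combine` and `float_sum_via_integer_tree` and the `frac_bits` parameter are hypothetical, and the fixed-point conversion is a generic technique chosen for illustration, not necessarily the method of the application. Because integer addition is exact and associative, the result is bit-identical regardless of the order or tree shape in which values are combined.

```python
import math
import operator


def tree_combine(values, op):
    """Pairwise (binary-tree) reduction of a non-empty list with an integer op.

    A software model of a combining tree that applies `op` at each level.
    """
    level = list(values)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd leftover element passes up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def float_sum_via_integer_tree(values, frac_bits=40):
    """Hypothetical two-pass floating point sum built only on integer combines.

    Pass 1: an integer max of the binary exponents gives a common scale.
    Pass 2: each value is rounded to a fixed-point integer at that scale,
    and the integers are summed exactly, so the combining order does not
    affect the result.
    """
    max_exp = tree_combine([math.frexp(v)[1] for v in values], max)
    scale = frac_bits - max_exp
    # Round each value to the nearest multiple of 2**-scale (this step is lossy).
    fixed = [round(math.ldexp(v, scale)) for v in values]
    total = tree_combine(fixed, operator.add)
    return math.ldexp(total, -scale)


if __name__ == "__main__":
    data = [0.1, 0.2, 0.3, -0.25, 1.5]
    print(tree_combine([3, 1, 4, 1, 5], max), tree_combine([3, 1, 4, 1, 5], min))
    print(float_sum_via_integer_tree(data) == float_sum_via_integer_tree(data[::-1]))
```

Python integers are unbounded, whereas a hardware combining network has a fixed integer width, so in practice `frac_bits` and the number of participating nodes would have to be chosen so that the fixed-point sum cannot overflow.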