1. Field of the Invention
This invention relates to the field of distributed-memory, message-passing parallel computer design and system software, as applied, for example, to computation in the life sciences. More specifically, it relates to system software libraries that support the global communications used by many parallel applications.
2. Background Art
Provisional patent application No. 60/271,124, titled “A Massively Parallel SuperComputer,” describes a computer comprising many computing nodes and a smaller number of I/O nodes. These nodes are connected by several networks, in particular a dual-functional tree network that supports integer combining operations, such as integer maximums and minimums.
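As a rough illustration of what an integer combining operation over a tree computes, the following is a minimal software model, assuming a binary tree in which each interior stage combines its two children's values and passes the result upward. The function name `tree_combine` is hypothetical; the actual network performs this combining in hardware, and its topology and arity are not specified here.

```python
from typing import Callable, List


def tree_combine(values: List[int], op: Callable[[int, int], int]) -> int:
    """Model a combining tree: pair up node values level by level until one remains.

    Hypothetical helper for illustration only; `op` is the combining
    operation applied at each tree stage (e.g. max or min).
    """
    if not values:
        raise ValueError("need at least one node value")
    level = list(values)
    while len(level) > 1:
        next_level = []
        # Combine adjacent pairs, as a binary tree stage would.
        for i in range(0, len(level) - 1, 2):
            next_level.append(op(level[i], level[i + 1]))
        # With an odd count, the last value passes up unchanged.
        if len(level) % 2:
            next_level.append(level[-1])
        level = next_level
    return level[0]


# Each "node" contributes one integer; the root sees the global result.
node_values = [3, 9, -2, 7, 5]
print(tree_combine(node_values, max))
print(tree_combine(node_values, min))
```

Because maximum and minimum are associative and commutative, the result does not depend on how the tree pairs the nodes, which is what makes them suitable for in-network combining.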
In parallel supercomputers, an enormous amount of data is sent from one processor to another in the form of messages. These messages are protected from corruption by standard techniques such as parity checking and cyclic redundancy checking (CRC): the sending processor includes the parity or CRC of the data in the message, and the receiving processor recomputes it from the received data and compares the two values. However, if the logic that performs this comparison fails, or if double-bit errors escape parity detection, some message errors may propagate through the machine undetected.

In the machine described in the above-mentioned provisional patent application No. 60/271,124, there are approximately 64,000 processors with about 1,000,000 links, each link sending 256 bytes per microsecond. Even with a bit error rate as low as 1 in 10^15, which is extremely low, there would be on the order of one error per second. What is needed is a simple means to ensure that no error has occurred in data transmission between sender and receiver, without additional computation.
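The checksum exchange and the error-rate estimate above can be sketched as follows. This is a minimal illustration, assuming CRC-32 as computed by Python's `zlib.crc32` and a 4-byte big-endian CRC appended to each payload; the source does not specify the polynomial or the framing, and the names `send`, `receive`, and `expected_errors_per_second` are hypothetical.

```python
import zlib


def send(payload: bytes) -> bytes:
    """Sender side: append a CRC-32 of the payload (assumed 4-byte big-endian framing)."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")


def receive(frame: bytes) -> bytes:
    """Receiver side: recompute the CRC-32 and compare it with the transmitted value."""
    payload = frame[:-4]
    sent_crc = int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != sent_crc:
        raise ValueError("CRC mismatch: message corrupted in transit")
    return payload


def expected_errors_per_second(links: int, bytes_per_us: int, bit_error_rate: float) -> float:
    """Aggregate bit errors per second across all links at a given bit error rate."""
    bits_per_second = links * bytes_per_us * 8 * 1_000_000  # 8 bits/byte, 10^6 us/s
    return bits_per_second * bit_error_rate


# A clean frame passes; flipping a single bit is caught by the comparison.
frame = send(b"hello")
print(receive(frame))
corrupted = bytearray(frame)
corrupted[0] ^= 0x01
try:
    receive(bytes(corrupted))
except ValueError as err:
    print(err)

# Figures quoted in the text: 10^6 links, 256 bytes/us each, BER of 1 in 10^15.
print(expected_errors_per_second(1_000_000, 256, 1e-15))
```

With the quoted figures the estimate evaluates to roughly 2 errors per second, consistent with the order-of-magnitude claim. Note that this sketch assumes the comparison itself is performed correctly; a fault in that checking logic, which is the residual risk described above, is exactly what it cannot model.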