1. Field of the Invention
This invention generally relates to the field of error recovery and, more particularly, to a procedure that is well suited for error recovery across long communication lines. Even more specifically, the invention relates to an error recovery protocol that is particularly well adapted for use with massively parallel computers used for various applications such as, for example, applications in the field of life sciences.
2. Background Art
FIG. 1 illustrates a pair of communication nodes A and B, each of which has a sender and a receiver. The two nodes are connected by two cables, and each cable connects a sender/receiver pair. The cables may be long in the sense that a bit of data takes many clock cycles to traverse the cable. This type of hardware is encountered in many applications, most notably in massively parallel supercomputers.
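For illustration only, the notion of a "long" cable can be sketched as a fixed-length delay line in which each symbol needs several clock cycles to emerge at the far end. The names Cable, tick, simulate, and latency_cycles are hypothetical and are not taken from the described hardware; the sketch models one direction of the link and assumes one symbol enters the cable per clock cycle.

```python
from collections import deque


class Cable:
    """Hypothetical model of one direction of the link as a delay line."""

    def __init__(self, latency_cycles):
        # An idle cable is represented by None in every in-flight slot.
        self.pipe = deque([None] * latency_cycles)

    def tick(self, symbol_in):
        """Advance one clock cycle: one symbol enters, the oldest leaves."""
        self.pipe.append(symbol_in)
        return self.pipe.popleft()


def simulate(latency_cycles, payload):
    """Send payload over the cable and record what the receiver sees
    on every clock cycle, including the cycles needed to drain the cable."""
    cable = Cable(latency_cycles)
    seen_by_receiver = []
    for symbol in list(payload) + [None] * latency_cycles:
        seen_by_receiver.append(cable.tick(symbol))
    return seen_by_receiver


if __name__ == "__main__":
    # With a latency of 3 cycles, the first payload bit reaches the
    # receiver only on the fourth clock cycle.
    print(simulate(3, [1, 0, 1]))
```

In this model, latency_cycles symbols are always in flight, so by the time a receiver observes a problem the sender has already placed further data on the cable.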
As with any communication channel, errors can occur during communication. Assuming that the receivers have the capability of detecting such errors, a protocol is needed to ensure that both nodes recover from the error correctly and resume communication without any data loss. If there are no extra sideband cables to communicate recovery signals, this is a difficult task, since the recovery signals must travel over the original cables and are therefore themselves exposed to errors. Although error recovery methods that solve this problem exist, they have two disadvantages: they do not put the system of two nodes into a known state, and they depend on time-out and specific data sequence methods.