Networked systems transmit data between distributed nodes via some form of network connection. Unfortunately, these network connections can be unreliable and sometimes introduce errors to the transmitted data. Network connections typically comprise physical links which form data channels. Link failures constitute a significant portion of data transmission errors in networked systems and are typically of three types: (1) transient, (2) intermittent, and (3) permanent. Transient failures are due to various effects such as noise, alpha particle hits, and vibration. Intermittent and permanent failures may be caused by one or more faulty wires in the link, a loose link connection, or some other physical link failure.
In order to overcome these link failures, networked systems typically utilize some type of error correcting code (ECC) technique. ECC techniques can correct certain errors that occur during data transmission over a network connection. Many types of ECC techniques are available, and one commonly employed ECC technique is a single error correction code utilizing a Hamming code.
In one traditional single error correcting code technique utilizing a Hamming code, a sending node employs a first logical function to calculate parity bits from the data bits and appends the parity bits to the data bits prior to transmission. The data bits and their associated parity bits are then transmitted in a predetermined configuration to a receiving node via the network connection. The receiving node then employs the first logical function to calculate the parity bits from the received data bits and compares the calculated parity bits to the received parity bits. A mismatch between any calculated parity bit and its corresponding received parity bit indicates a bit error, and the pattern of mismatches allows the receiving node to identify and correct the single bit that is in error.
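The encode, recompute, and compare steps described above can be illustrated with a minimal sketch. It assumes a Hamming(7,4) code with parity bits at positions 1, 2, and 4 of the transmitted word; the source does not specify a block size or bit layout, and the function names `encode` and `decode` are hypothetical.

```python
def encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Assumed layout (positions 1..7): p1 p2 d1 p3 d2 d3 d4
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(received):
    """Return (data_bits, error_position); error_position is 0 if nothing mismatched."""
    word = list(received)
    p1, p2, d1, p3, d2, d3, d4 = word
    # Recompute each parity bit with the same function and compare it
    # with the received parity bit; 1 marks a mismatch.
    s1 = p1 ^ d1 ^ d2 ^ d4
    s2 = p2 ^ d1 ^ d3 ^ d4
    s3 = p3 ^ d2 ^ d3 ^ d4
    # The pattern of mismatches is the 1-based position of the flipped bit.
    position = s1 + 2 * s2 + 4 * s3
    if position:
        word[position - 1] ^= 1  # correct the single bit in error
    return [word[2], word[4], word[5], word[6]], position


sent = encode([1, 0, 1, 1])   # [0, 1, 1, 0, 0, 1, 1]
corrupted = list(sent)
corrupted[4] ^= 1             # flip position 5 in transit
print(decode(corrupted))      # ([1, 0, 1, 1], 5)
```

In this layout the mismatch pattern, read as a binary number, directly names the position of the flipped bit, which is why a single error anywhere in the seven bits can be corrected.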
Unfortunately, as its name implies, this single error correction code technique can correct a transmitted data block only when exactly one of its bits is in error. While the technique can detect the presence of multi-bit errors, it cannot correct them. Thus, where two or more links of a network connection fail, a networked system utilizing a single error correcting code scheme will typically be able to detect the error but not correct it, which typically results in a system malfunction.
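The limitation can be made concrete with a short sketch. It assumes the Hamming(7,4) code is extended with one overall parity bit, a common arrangement that allows double-bit errors to be detected; the source does not state how multi-bit errors are detected, and the names `encode_secded` and `check_secded` are hypothetical.

```python
def encode_secded(data):
    """Encode 4 data bits into 8 bits: Hamming(7,4) plus an overall parity bit."""
    d1, d2, d3, d4 = data
    word = [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]
    overall = 0
    for bit in word:
        overall ^= bit
    return word + [overall]


def check_secded(received):
    """Return (status, data_bits); status is 'ok', 'corrected', or 'uncorrectable'."""
    word = list(received[:7])
    syndrome = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= position  # XOR of the positions of all set bits
    overall = 0
    for bit in received:
        overall ^= bit            # 0 when the parity of all 8 bits is even
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single error and correct it.
        if syndrome:
            word[syndrome - 1] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: detected only.
        status = "uncorrectable"
    return status, [word[2], word[4], word[5], word[6]]


sent = encode_secded([1, 0, 1, 1])
two_errors = list(sent)
two_errors[2] ^= 1   # first faulty link
two_errors[5] ^= 1   # second faulty link
print(check_secded(two_errors)[0])   # uncorrectable
```

Under these assumptions the receiver can tell that the block is corrupted, but the mismatch pattern no longer identifies which bits flipped, so the original data cannot be recovered from the received word.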
Many networked systems would benefit from a transmission scheme that provides continued reliable data transmission in the presence of multiple-bit link failures.