An important feature of network communication is data integrity. In Ethernet, for example, integrity is protected by a 32-bit Cyclic Redundancy Check (CRC32) field that is added to each Ethernet MAC (Media Access Control) frame. The CRC is guaranteed to detect many types of errors, including up to 3 bit errors in a normal-size MAC frame and bursts of consecutive errors up to 32 bits long. Other combinations of errors may pass the CRC32 check with a small probability (about 2^-32 for randomly distributed errors).
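The detection guarantees described above can be checked empirically. The following is a minimal sketch, assuming that Python's zlib.crc32 (which uses the same generator polynomial as the Ethernet CRC32) is an adequate stand-in for the MAC frame check; the helper names, the 1500-byte random payload, and the trial counts are illustrative choices, and the CRC is computed over the payload only rather than over a complete MAC frame.

```python
import random
import zlib


def flip_bits(frame: bytes, bit_positions) -> bytes:
    """Return a copy of frame with the given bit positions inverted."""
    buf = bytearray(frame)
    for pos in bit_positions:
        # Bits are indexed LSB-first within each byte, matching the order
        # in which the reflected CRC32 algorithm consumes them.
        buf[pos // 8] ^= 1 << (pos % 8)
    return bytes(buf)


def crc_detects(frame: bytes, bit_positions) -> bool:
    """True if the CRC32 of the corrupted frame differs from the original."""
    return zlib.crc32(flip_bits(frame, bit_positions)) != zlib.crc32(frame)


rng = random.Random(0)  # fixed seed so the run is repeatable
frame = bytes(rng.randrange(256) for _ in range(1500))  # hypothetical payload
total_bits = len(frame) * 8

# Random patterns of 1, 2, or 3 bit errors anywhere in the frame.
few_bit_errors_detected = all(
    crc_detects(frame, rng.sample(range(total_bits), k))
    for k in (1, 2, 3)
    for _ in range(200)
)


def random_burst(max_len: int = 32):
    """Pick a random non-empty set of errored bits inside a 32-bit window."""
    start = rng.randrange(total_bits - max_len)
    count = rng.randint(1, max_len)
    return rng.sample(range(start, start + max_len), count)


bursts_detected = all(crc_detects(frame, random_burst()) for _ in range(200))

print("1-3 bit errors always detected:", few_bit_errors_detected)
print("bursts up to 32 bits always detected:", bursts_detected)
```

Random trials like these only illustrate the guarantee; they do not prove it. Error patterns outside these classes are the ones that can slip through with a probability of about 2^-32.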
If multiple errors occur on an Ethernet link, a corrupted MAC frame could pass the CRC32 check; this event is called false packet acceptance, and ideally it should never occur. For example, the data for a MAC frame could be received with multiple errors that, by random chance, produce the same CRC32 value as a MAC frame with no errors. In practice, communication errors cannot be totally prevented; the goal is for false packet acceptance to be so rare that the expected time until one occurs (mean time to false packet acceptance, or MTTFPA) exceeds the age of the universe (AOU, about 13 billion years).
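As a rough illustration of how MTTFPA relates to the quantities above, the sketch below treats false packet acceptance as a rare event whose rate is the product of the frame rate, the probability that a frame arrives with an error pattern outside the guaranteed-detection classes of CRC32, and the roughly 2^-32 chance that such a pattern passes the check. The function name and the numeric inputs are hypothetical and chosen only for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AOU_YEARS = 13e9  # age of the universe, about 13 billion years


def mttfpa_years(frames_per_second: float,
                 p_frame_at_risk: float,
                 p_crc_miss: float = 2.0 ** -32) -> float:
    """Mean time to false packet acceptance, in years.

    p_frame_at_risk is the (assumed) probability that a frame carries an
    error pattern that CRC32 is not guaranteed to detect.
    """
    false_accepts_per_second = frames_per_second * p_frame_at_risk * p_crc_miss
    return 1.0 / false_accepts_per_second / SECONDS_PER_YEAR


# Illustrative inputs only: 10 million frames per second on the link.
for p in (1e-17, 1e-12):
    years = mttfpa_years(1e7, p)
    verdict = "exceeds" if years > AOU_YEARS else "falls short of"
    print(f"p_frame_at_risk={p:g}: MTTFPA {verdict} the AOU")
```

The point of the model is that MTTFPA is inversely proportional to how often at-risk frames reach the MAC, so anything that raises that rate by several orders of magnitude can move a link from the safe side of the AOU to the unsafe side.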
Several physical layer (PHY) types for Ethernet over backplanes, optics, and copper cables at data rates of 10 Gb/s and above are defined in various clauses of the IEEE 802.3 standard. The bit error ratio (BER) required for these PHYs is typically 1e-12. With this BER, if errors are uncorrelated with each other, the probability that enough errors occur to prevent CRC32 from detecting them is low enough to ensure MTTFPA>AOU. If errors occur at a much higher rate (BER>>1e-12), then MTTFPA may not be as large as desired. To prevent this condition from persisting for too long, there is a mechanism called the BER monitor, which is sensitive to errors at known locations (sync headers). If errors occur at random times, some of them will eventually fall on the sync headers. Detecting sync header errors that are too frequent (a condition called hi_ber) causes a receiver fault condition, which in some cases triggers a disconnection of the communication link, or optionally may cause transmission of data over the link to be temporarily paused.
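The BER monitor idea can be sketched as a counter of invalid sync headers over fixed observation windows. The sketch below assumes 2-bit sync headers in which only the values 01 and 10 are valid, as in 64b/66b block coding; the window length and threshold are hypothetical parameters and are not taken from any specific clause of the standard.

```python
def sync_header_valid(header: int) -> bool:
    """A 2-bit sync header is valid only if its two bits differ (01 or 10)."""
    return header in (0b01, 0b10)


def hi_ber_per_window(headers, window_blocks: int, threshold: int):
    """Return one hi_ber flag per complete window of received blocks.

    headers       : sequence of received 2-bit sync header values
    window_blocks : number of blocks per observation window (assumed)
    threshold     : invalid-header count that asserts hi_ber (assumed)
    """
    flags = []
    for start in range(0, len(headers) - window_blocks + 1, window_blocks):
        window = headers[start:start + window_blocks]
        invalid = sum(1 for h in window if not sync_header_valid(h))
        flags.append(invalid >= threshold)
    return flags


# Usage: a clean window followed by a window with 20 corrupted headers.
received = [0b01] * 100 + [0b11] * 20 + [0b10] * 80
print(hi_ber_per_window(received, window_blocks=100, threshold=16))
```

Because the monitor samples only the sync header positions, it detects a high error rate statistically rather than per frame, which is why it bounds how long a bad condition can last instead of preventing individual corrupted frames.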
The 802.3bj task force defines 100 Gb/s Ethernet over backplanes and copper cables. This work includes strong forward error correction (FEC) using a Reed-Solomon (RS) code, which enables operation at a lower signal-to-noise ratio (SNR) than unprotected data encoding. This code (denoted RS-FEC) operates over 10-bit blocks (called “symbols”) and can correct several symbol errors in a codeword that carries 514 data symbols; for two PHY types (100GBASE-KR4 and 100GBASE-CR4), up to 7 symbol errors are correctable, and for a third type (100GBASE-KP4), up to 15 symbol errors are correctable.
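The correction limits quoted above follow from the general Reed-Solomon property that a code with n total symbols and k data symbols corrects up to (n - k) / 2 symbol errors. The sketch below assumes total codeword lengths of 528 and 544 symbols, which are not stated in the text; they correspond to the commonly cited RS(528,514) and RS(544,514) parameters. It also shows why the symbol, not the bit, is the unit that matters: several bit errors inside one 10-bit symbol count as a single symbol error.

```python
SYMBOL_BITS = 10  # RS-FEC symbols are 10 bits wide


def correctable_symbols(n: int, k: int) -> int:
    """Maximum number of symbol errors an RS(n, k) code can correct."""
    return (n - k) // 2


def symbol_error_count(bit_error_positions) -> int:
    """Number of distinct symbols hit by the given bit error positions."""
    return len({pos // SYMBOL_BITS for pos in bit_error_positions})


# Codeword lengths below are assumptions, not taken from the text above.
print(correctable_symbols(528, 514))  # KR4/CR4 case
print(correctable_symbols(544, 514))  # KP4 case

# Three bit errors inside symbol 0 consume one unit of correction budget,
# while two adjacent bit errors that straddle a symbol boundary consume two.
print(symbol_error_count([0, 3, 9]))
print(symbol_error_count([9, 10]))
```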
If link quality is not high enough, a codeword can contain more symbol errors than the code can correct; such errors cannot be corrected. If the erroneous data is passed to the MAC, the CRC32 is not guaranteed to detect the errors, since there are too many of them. To prevent this from happening, the FEC decoder is required to mark uncorrectable codewords in a way that causes the MAC to ignore them; this marking is done by corrupting the sync headers in the data output of the RS-FEC sublayer.
One problem with this approach is that uncorrectable error marking requires the decoder to identify that a codeword is uncorrectable. This can be implemented with a low gate count (but with high latency) or with low latency (but with a high gate count); both cannot be achieved together. Thus, requiring uncorrectable error marking imposes a tradeoff that must be made at design time.
It would be advantageous if this requirement could be removed; however, when errors are not marked, the problem of MTTFPA arises. Due to the nature of the error correction code, small changes in SNR, which have a small effect on the BER before error correction (PMD BER), have a large effect on the uncorrectable codeword ratio (UCR) and thus on MTTFPA. In fact, a difference of 1 dB in SNR can change the MTTFPA from >AOU to a few thousand years, which is unacceptable. Thus, it is difficult to assess whether a given link is safe.
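The steep dependence of UCR on the pre-correction error rate can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not a link budget: symbol errors are taken to be independent with probability p_sym, the codeword length of 528 symbols with 7 correctable errors is assumed, and the example probabilities are arbitrary. The mapping from SNR to p_sym depends on the modulation and channel and is not modeled here.

```python
from math import comb


def uncorrectable_codeword_ratio(p_sym: float, n: int = 528, t: int = 7) -> float:
    """Probability that a codeword of n symbols has more than t symbol errors,
    assuming independent symbol errors with probability p_sym."""
    return sum(
        comb(n, i) * p_sym ** i * (1.0 - p_sym) ** (n - i)
        for i in range(t + 1, n + 1)
    )


# Illustrative values only: doubling the symbol error probability.
ucr_low = uncorrectable_codeword_ratio(1e-4)
ucr_high = uncorrectable_codeword_ratio(2e-4)
ratio = ucr_high / ucr_low

print(f"UCR at p_sym=1e-4: {ucr_low:.3e}")
print(f"UCR at p_sym=2e-4: {ucr_high:.3e}")
print(f"UCR grows by a factor of about {ratio:.0f}")
```

In this model the dominant term scales as p_sym raised to the power t + 1, so a factor of 2 in the raw symbol error probability changes the UCR by more than two orders of magnitude. Since MTTFPA is inversely proportional to the rate of unmarked uncorrectable codewords, it shifts by the same factor.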