1. Field of the Invention
Embodiments of the invention relate to fault tolerant data computing. Specifically, embodiments of the invention relate to parity bit generation in a data-packing device.
2. Background
Demand for high-performance server-class chipsets has increased dramatically. The critical data handled by such chipsets typically has low tolerance for errors; in some scenarios, even a single bit error in the data may wreak havoc. For example, financial data stored on the servers of a financial institution may be extremely sensitive to errors, as any bit error in a dollar amount may cause great financial loss to the institution and its clients.
Electronic devices are generally susceptible to single event upsets (SEUs). SEUs are caused by impinging alpha particles, which temporarily invert one or more data bits in memory cells or logic. The error is not permanent in that the underlying hardware is normally not harmed. However, any uncorrected bit error may propagate along the data computation and transmission path, corrupting the entire data sequence. Thus, protecting critical data against SEUs is crucial in noisy environments such as large arrays of processors, server farms, telecom sites, and rooms that are not lead-lined.
Most systems include fault tolerant logic to protect stored data against SEUs. The fault tolerant logic typically adopts an industry-standard scheme to implement an error checking and correction code (ECC). The ECC encodes an error protected unit (EPU) and records the encoded data in redundant bits associated with the EPU. An EPU may be any unit of bits; for example, a word, a double-word (Dword), or a quad-word (Qword). The redundant bits allow error detection or data recovery when an error occurs in the associated EPU. The number of bit errors detectable or correctable by a given ECC depends on the number of redundant bits and the size of the EPU. For example, an industry-standard ECC using 8 redundant bits for each 64-bit Qword may detect a double-bit error or correct a single-bit error. Before an EPU is sent to a requesting device, error checking and correcting logic in the memory controller checks the integrity of the data and corrects any correctable error. If an uncorrectable error is detected, the error checking and correcting logic marks the EPU as corrupted before sending it to the requesting device.
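The behavior described above, 8 redundant bits protecting a 64-bit Qword, can be illustrated with a textbook extended Hamming code. The following is a minimal sketch under stated assumptions, not the industry-standard scheme the text refers to: the bit layout, the names `encode` and `decode`, and the status strings are all hypothetical choices made for illustration, and real memory controllers may use a different code construction.

```python
# Sketch of a (72,64) extended Hamming code: 7 Hamming check bits plus
# 1 overall parity bit. Layout and names are illustrative assumptions.
CODE_LEN = 72                                  # positions 0..71
CHECK_POSITIONS = [1, 2, 4, 8, 16, 32, 64]     # power-of-two positions
# Data occupies the 64 positions in 1..71 that are not powers of two.
DATA_POSITIONS = [p for p in range(1, CODE_LEN) if p & (p - 1) != 0]


def _syndrome(bits):
    """XOR together the positions (1..71) of all set bits."""
    s = 0
    for pos in range(1, CODE_LEN):
        if bits[pos]:
            s ^= pos
    return s


def encode(qword):
    """Return a 72-element bit list protecting the 64-bit value `qword`."""
    bits = [0] * CODE_LEN
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (qword >> i) & 1
    s = _syndrome(bits)
    for c in CHECK_POSITIONS:
        bits[c] = 1 if s & c else 0            # forces the syndrome to zero
    bits[0] = sum(bits) & 1                    # overall even parity
    return bits


def decode(bits):
    """Return (status, qword); status is 'ok', 'corrected' or 'uncorrectable'."""
    bits = list(bits)
    s = _syndrome(bits)
    overall = sum(bits) & 1
    if s == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and s < CODE_LEN:
        bits[s] ^= 1                           # single-bit error at position s
        status = "corrected"
    else:
        status = "uncorrectable"               # e.g. a double-bit error
    qword = sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, qword


if __name__ == "__main__":
    word = 0x0123456789ABCDEF                  # arbitrary example value
    cw = encode(word)
    cw[10] ^= 1                                # one upset: recoverable
    print(decode(cw))
    cw[33] ^= 1                                # second upset: detect only
    print(decode(cw)[0])
```

In this sketch a single flipped bit is located by the syndrome and repaired, while two flipped bits leave the overall parity even with a nonzero syndrome, so the unit can only be marked as corrupted.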
SEUs may also occur on the transmission wires and logic that carry data from one device to another. However, sending an ECC-protected EPU throughout the transmission path can be costly, as the redundant bits occupy non-negligible bandwidth. Thus, a parity bit may be used outside of the memory to substitute for the multiple ECC bits. A parity bit may be used to detect the presence of a single bit error in a data unit of any length, for example, a byte, a word, a Dword, or a Qword. However, a parity bit is unable to correct a single bit error because it cannot locate the error. Further, a parity bit is unable to detect the presence of a double-bit error in an EPU because the effects of the two bit errors on the parity cancel each other. A double-bit error typically occurs when two consecutive SEUs each upset one bit. Unless the two bit errors occur in the same data bit, which is statistically almost impossible, both bit errors will be masked by a valid parity bit. Thus, the data in the EPU will be mistakenly treated as good and the errors will propagate to the downstream logic.
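The masking effect described above can be shown with a short sketch. It assumes even parity over a 64-bit Qword; the function names `parity_bit` and `passes_parity_check` and the example value are hypothetical and chosen only for illustration.

```python
def parity_bit(value: int) -> int:
    """Return the even-parity bit: 1 if `value` has an odd number of 1 bits."""
    return bin(value).count("1") & 1


def passes_parity_check(value: int, stored_parity: int) -> bool:
    """True when the recomputed parity matches the parity sent with the data."""
    return parity_bit(value) == stored_parity


if __name__ == "__main__":
    qword = 0x0123456789ABCDEF                # arbitrary 64-bit example value
    parity = parity_bit(qword)                # generated before transmission

    single_upset = qword ^ (1 << 5)           # one SEU flips bit 5
    double_upset = single_upset ^ (1 << 40)   # a second SEU flips bit 40

    print(passes_parity_check(qword, parity))
    print(passes_parity_check(single_upset, parity))
    print(passes_parity_check(double_upset, parity))
```

The single upset changes the count of 1 bits by one and fails the check. The second upset changes it by one again, restoring the original parity, so the doubly corrupted value passes the check even though it differs from the original data.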