This invention relates to transferring data across computer, communications, or storage device buses, and particularly to protecting the data by means of a nested error correcting code (ECC) scheme.
In the past, it was very common for computer systems to use wide parallel buses with many bits, or bitlanes, in a parallel configuration. These buses would deliver a dataword from a source to a receiver in one transfer. For example, a commonly used bus would deliver 64 databits to its destination every transfer cycle. Such buses could be found on-chip, on-module, and on-board. Also in the past, it was very common for communications systems to use a narrow, single-wire bus with only one bitlane per bus. These buses would deliver their dataword from a single source to one or more receivers over many transfer cycles; that is, one bit after another would be sent down the bitlane until the entire payload or dataword was delivered.
To ensure that data arrives safely at the receiver, some form of error checking or correction may be employed on the bus. In high-reliability computers, the parallel buses are typically protected with an ECC. In high-reliability communications links, cyclic redundancy checking (CRC) is often employed. Generally speaking, ECC is used to provide “real-time” correction of bad databits, and CRC is used to provide “real-time” detection of bad databits. In the ECC scheme, the ECC logic adjusts the data received by the receiver so that “good” data is passed downstream. In the CRC scheme, the data source must resend the bad dataword when the CRC signals that bad data was received. In such systems, ECC tends to be more effective when errors are permanent (e.g., hard errors), and CRC tends to be more effective when errors are transient (e.g., soft errors).
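The difference between correcting and merely detecting can be made concrete with a small SEC/DED code. The following is a minimal sketch, assuming a textbook extended Hamming(8,4) code protecting a 4-bit nibble; it is not the code of this invention, and the names encode and decode are illustrative only. A single flipped bit is corrected in place, while two flipped bits are only flagged, which is the point at which a CRC-style scheme would instead request a resend.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits as an 8-bit extended Hamming (SEC/DED) codeword.

    Index 0 holds the overall parity bit; indices 1..7 form a classic
    Hamming(7,4) code with check bits at positions 1, 2 and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2            # makes total parity even
    return code


def decode(received: List[int]) -> Tuple[str, Optional[int]]:
    """Correct any single-bit error; detect (but do not correct) any double-bit error."""
    c = list(received)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos                # XOR of the positions holding a 1
    parity_error = sum(c) % 2 == 1         # an odd number of bits flipped
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        c[syndrome] ^= 1                   # single error; syndrome 0 means the parity bit itself
        status = "corrected"
    else:
        return "uncorrectable", None       # even number of flips with a nonzero syndrome
    return status, c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
```

For example, decode(encode(0b1011)) returns ("ok", 11); flipping any one of the eight bits still yields ("corrected", 11), while flipping any two bits yields ("uncorrectable", None).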
As electronic systems evolve, the traditional boundaries between computers and communication systems are blurring. Data is often transferred along a parallel, high-speed bus over several transfer cycles. This scheme provides very high bandwidth, but it also makes it necessary to deal with both hard and soft errors. Hard errors occur when the physical medium experiences a fault, such as a burned-out driver. Soft errors occur when noise, skew, or jitter flips a bit on a single bitlane. It would be desirable to have a fault-tolerant, high-speed parallel bus that is resilient to both hard and soft errors.
The industry is moving toward applying a CRC across the multiple bitlanes of a high-speed parallel bus and signaling for a retry whenever an error is detected. These schemes have strong error detection, which is effective for soft errors, but they cannot correct an error, which makes them less useful for hard errors. In systems where hard-error protection is necessary, an extension to the CRC has been proposed that includes a spare bitlane in the bus, so that when a hard error is encountered, the bus reconfigures itself to replace the failing bitlane with the presumably good spare bitlane. Another alternative that protects against both hard and soft errors is a symbol-protecting bus ECC structure in which the symbols are defined along the bitlanes, rather than across the word as in the traditional structure. This has been described in United States Patent Publication No. US20060107175A1, of common assignment herewith, filed Oct. 29, 2004, entitled: “System, Method and Storage Medium for Providing Fault Detection and Correction in a Memory Subsystem.”
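The CRC-with-retry approach and its spare-bitlane extension can be sketched as a small simulation. This is a simplified model under stated assumptions, not the proposed industry scheme: an 8-lane bus with one spare lane, one byte per transfer cycle, a CRC-32 from Python's zlib module appended to each frame, and a hypothetical locate_bad_lane diagnostic standing in for whatever lane test a real bus would run, since the CRC only reports that an error occurred, not where. All names (send, transfer, locate_bad_lane, LANES, SPARE) are illustrative.

```python
import zlib
from typing import Callable, List, Optional, Tuple

LANES = 8          # hypothetical number of logical data lanes
SPARE = LANES      # hypothetical physical index of the single spare lane

# corrupt(physical_lane, cycle, attempt, bit) -> bit actually seen by the receiver
Corrupt = Callable[[int, int, int, int], int]


def send(frame: bytes, lane_map: List[int], corrupt: Corrupt, attempt: int) -> bytes:
    """Drive one byte per transfer cycle; logical lane l carries bit l of each byte."""
    out = bytearray()
    for cycle, byte in enumerate(frame):
        rx = 0
        for logical in range(LANES):
            bit = (byte >> logical) & 1
            rx |= corrupt(lane_map[logical], cycle, attempt, bit) << logical
        out.append(rx)
    return bytes(out)


def locate_bad_lane(lane_map: List[int], corrupt: Corrupt, attempt: int) -> Optional[int]:
    """Hypothetical diagnostic: send all-0s then all-1s, report the first mismatching lane."""
    pattern = bytes([0x00, 0xFF])
    rx = send(pattern, lane_map, corrupt, attempt)
    diff = (rx[0] ^ pattern[0]) | (rx[1] ^ pattern[1])
    for logical in range(LANES):
        if (diff >> logical) & 1:
            return logical
    return None


def transfer(payload: bytes, corrupt: Corrupt,
             max_retries: int = 3) -> Tuple[bytes, int, List[int]]:
    """CRC-checked transfer with retry; after max_retries consecutive CRC failures,
    treat the fault as hard and steer the failing lane onto the spare."""
    frame = payload + zlib.crc32(payload).to_bytes(4, "little")
    lane_map = list(range(LANES))  # logical lane -> physical lane
    attempt = 0
    while True:
        for _ in range(max_retries):
            attempt += 1
            rx = send(frame, lane_map, corrupt, attempt)
            body, crc = rx[:-4], int.from_bytes(rx[-4:], "little")
            if zlib.crc32(body) == crc:          # detection only; no correction
                return body, attempt, lane_map
        bad = locate_bad_lane(lane_map, corrupt, attempt)
        if bad is None or SPARE in lane_map:
            raise IOError("uncorrectable: failing lane not found or spare already used")
        lane_map[bad] = SPARE                    # reconfigure around the hard fault


if __name__ == "__main__":
    soft = lambda lane, cycle, attempt, bit: bit ^ 1 if (lane, cycle, attempt) == (3, 0, 1) else bit
    hard = lambda lane, cycle, attempt, bit: 0 if lane == 5 else bit  # stuck-at-0 lane
    print(transfer(b"hello bus", soft))
    print(transfer(b"hello bus", hard))
```

With the transient flip, the payload arrives intact on the second attempt; with the stuck-at-0 lane, all three retries fail until lane 5 is steered onto the spare, illustrating why retry alone is weak against hard errors.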
Finally, while the previously disclosed base structure provides advantages over the CRC/spare approach, it is often not obvious how to create an ECC that meets the needs of the system, and doing so is a non-trivial task. One increasingly common need arises when one ECC word is sent across a bus that uses a second, different, nested ECC scheme for protection on the bus. For example, data stored in memory may be best served by a Single Error Correcting (SEC) and Double Error Detecting (DED) code, often shortened to “SEC/DED.” However, if this ECC word is sent across a high-speed parallel bus in two transfers, a different code is required to protect against bitlane failures, because a single failed bitlane can corrupt two bits of the word, one in each transfer, which SEC/DED can detect but not correct. Thus, for the bus transfer, a single 2-bit-symbol error correcting and double 2-bit-symbol error detecting (S2EC/D2ED) code is appropriate, where the symbols are aligned along the bitlanes. However, the construction of such a nested code is neither obvious nor trivial, especially for the 2-bit-symbol case. It would be desirable to have a scheme to generate such nested, 2-bit-symbol codes that maintains and/or reuses part of the original SEC/DED code.
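The effect of aligning 2-bit symbols along the bitlanes can be sketched as follows. This minimal model assumes a hypothetical 72-bit SEC/DED word (64 data bits plus 8 check bits) sent in two transfers over 36 bitlanes, with bit i driven on lane i mod 36 during transfer i div 36; the word size, lane count, bit ordering, and all function names are illustrative assumptions, not taken from this disclosure. A stuck bitlane can flip up to two bits of the word, but never more than one 2-bit symbol.

```python
from typing import List, Tuple

WORD_BITS = 72                     # hypothetical SEC/DED word: 64 data + 8 check bits
TRANSFERS = 2
LANES = WORD_BITS // TRANSFERS     # 36 bitlanes, each carrying one 2-bit symbol


def lane_of(bit: int) -> Tuple[int, int]:
    """Return (lane, transfer) for word bit `bit` under the assumed ordering."""
    transfer, lane = divmod(bit, LANES)
    return lane, transfer


def symbols(word: List[int]) -> List[Tuple[int, int]]:
    """Group the word into 2-bit symbols along bitlanes: symbol l = (bit l, bit l + LANES)."""
    return [(word[l], word[l + LANES]) for l in range(LANES)]


def fail_lane(word: List[int], lane: int, stuck: int) -> List[int]:
    """Model a hard fault: every bit driven on `lane` is received as `stuck`."""
    rx = list(word)
    for t in range(TRANSFERS):
        rx[t * LANES + lane] = stuck
    return rx


def error_counts(sent: List[int], received: List[int]) -> Tuple[int, int]:
    """Count (bit errors, 2-bit symbol errors) between two words."""
    bit_errors = sum(a != b for a, b in zip(sent, received))
    symbol_errors = sum(a != b for a, b in zip(symbols(sent), symbols(received)))
    return bit_errors, symbol_errors
```

For an all-ones word, error_counts(w, fail_lane(w, 7, 0)) gives two bit errors, which a SEC/DED decoder can only flag, but a single symbol error, which an S2EC/D2ED code over these lane-aligned symbols is designed to correct.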