The technical field is computer and networking systems that implement error correcting code schemes.
Modern computer systems use various interconnection mechanisms to allow communication between the components of the computer system. In a multi-computer system, central processing units or the interconnect chipsets may communicate with one another through various defined transactions such as a fetch request, a data return, and a snoop request, for example. Transactions may be sent in each interconnect using a protocol format defined by the specification for that interconnect. Such a transaction may include one or more packets. Different transactions may need different numbers of packets. For example, the number of packets required to send a fetch request may be less than the number of packets required to send a cache line data return. A packet is the basic unit of data transmission and includes a number of cycles of data transfer in the interconnect structure.
Most interconnect structures provide a form of error detection and/or correction. An error correcting code (ECC) and associated circuit gives the computer system the ability to tolerate various anticipated errors and to provide a high degree of reliability during data transmission. One approach to implementing an ECC is to provide the ECC at the packet level such that each packet is independently protected by the underlying ECC for anticipated failures.
Error correction codes have been developed that both detect and correct certain errors. One well known class of ECC algorithm is the “Hamming codes,” which are widely used for error detection and correction in digital communications and data storage systems. The SEC-DED (single error correcting, double error detecting) Hamming code is capable of detecting double bit errors and correcting single bit errors. A detailed description of the Hamming codes is found in Shu Lin et al., “Error Control Coding, Fundamentals and Applications,” Chapter 3 (1982). Another well known ECC algorithm is the “Reed-Solomon code” widely used for error correction in the compact disk industry. A detailed description of this ECC algorithm is found in Hove et al., “Error Correction and Concealment in the Compact Disk System,” Philips Technical Review, Vol. 40, No. 6, pp. 166-172 (1980). The Reed-Solomon code is able to correct multiple errors per word. Other conventional ECC algorithms include the b-adjacent error correction code described in D. C. Bossen, “B-Adjacent Error Correction,” IBM J. Res. Develop., pp. 402-408 (July 1970), and the odd weight column codes described in M. Y. Hsiao, “A Class of Optimal Minimal Odd Weight Column SEC-DED Codes,” IBM J. Res. Develop., pp. 395-400 (July 1970). The Hsiao codes, like the Hamming codes, are capable of detecting double bit errors and correcting single bit errors. The Hsiao codes use the same number of check bits as the Hamming codes (e.g., 8 check bits for 64 bits of data), but are superior in that hardware implementation is simplified and speed of error detection is improved.
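The SEC-DED behavior described above can be illustrated with a toy extended Hamming code over 4 data bits. This is a minimal sketch for illustration only, not the circuit of any particular interconnect: the function names `encode` and `decode`, the 8-bit codeword layout, and the status strings are assumptions introduced here, and a practical 64-bit code would use 8 check bits rather than the 4 shown.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8                      # index 0 holds the overall parity bit
    # Positions 1..7 use the classic Hamming(7,4) layout:
    # check bits at positions 1, 2, 4 and data bits at 3, 5, 6, 7.
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2         # makes the whole codeword even parity
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'double'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    overall = sum(code) % 2             # 1 if an odd number of bits flipped
    if overall == 1:
        # Treated as a single bit error: the syndrome is the failing position
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even number of flips but a nonzero syndrome: uncorrectable double error.
        return "double", None
    else:
        status = "ok"
    d = [code[3], code[5], code[6], code[7]]
    return status, sum(bit << i for i, bit in enumerate(d))
```

Flipping any one bit of a codeword returned by `encode` yields `'corrected'` with the original value, while flipping any two distinct bits yields `'double'`.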
Use of an ECC imposes an overhead on each transaction. The extra overhead required to implement the ECC reduces bandwidth available for data transmission and other functions.
A method and an apparatus are used to maximize available transmission bandwidth by using multiple error correcting code (ECC) schemes. A transaction between components in an interconnected computer system may involve the transmission of header information in a header packet. One or more separate data packets may then be used to transmit other information, depending on the particular transaction and the interconnection bus width. For example, a cache line data return transaction may involve transmission of 64 bytes of cache line data (i.e., 512 data bits), and the transmission bus may be 76 bits wide. Using a multiple ECC scheme, the header packet may be protected using a standard SEC-DED code of eight ECC bits. The data packets may be combined and protected by a single ECC of eleven bits, rather than by a separate ECC in each packet, thus significantly reducing the ECC overhead and improving available data bandwidth.
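The bit counts in this example are consistent with the standard Hamming bound for a SEC-DED code: the smallest r with 2^r >= k + r + 1 for k data bits, plus one overall parity bit. The sketch below applies that bound to the figures given above. It is a rough calculation under stated assumptions: it treats the eleven-bit code as SEC-DED, assumes each packet is a single 76-bit transfer carrying only data and ECC bits, and uses hypothetical names (`sec_ded_check_bits`, `BUS_WIDTH`, and so on) that do not come from the text.

```python
import math


def sec_ded_check_bits(data_bits):
    """Check bits needed by a SEC-DED Hamming code over `data_bits` data bits."""
    r = 1
    while 2 ** r < data_bits + r + 1:   # Hamming bound for single error correction
        r += 1
    return r + 1                        # one extra bit for double error detection


BUS_WIDTH = 76          # bits per packet (assumed: one transfer per packet)
CACHE_LINE_BITS = 512   # 64 bytes of cache line data

# Scheme A: one ECC over all the data packets combined.
combined_ecc = sec_ded_check_bits(CACHE_LINE_BITS)
combined_packets = math.ceil((CACHE_LINE_BITS + combined_ecc) / BUS_WIDTH)

# Scheme B (for comparison): every packet carries its own 8-bit SEC-DED code.
PER_PACKET_ECC = 8
payload_bits = BUS_WIDTH - PER_PACKET_ECC
per_packet_packets = math.ceil(CACHE_LINE_BITS / payload_bits)
per_packet_ecc_total = per_packet_packets * PER_PACKET_ECC

print("combined ECC bits:", combined_ecc, "packets:", combined_packets)
print("per-packet ECC bits:", per_packet_ecc_total, "packets:", per_packet_packets)
```

Under these assumptions the combined scheme spends 11 check bits on the cache line where independent per-packet protection would spend 64, which is the overhead reduction the summary describes.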
To reduce data latency, parity bits may be distributed with each of the data packets, with the remaining ECC bits included in the last data packet. Distributing the parity bits in this way allows early detection of single bit errors in a specific data packet, without waiting for the full ECC to arrive, and thus reduces latency. In an alternative embodiment, the remaining ECC bits may be placed anywhere in the transaction.
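The early-detection idea can be sketched as follows. This is an illustrative model only: it attaches one even-parity bit to each data packet and checks packets as they arrive, and it does not model the remaining ECC bits or how the parity bits participate in the combined code. The helper names (`split_with_parity`, `first_bad_packet`) and the 64-bit payload size are assumptions chosen for the example.

```python
def parity(bits):
    """Even-parity value of a list of 0/1 bits."""
    return sum(bits) % 2


def split_with_parity(data_bits, payload_size):
    """Split data into packets and append one parity bit to each packet."""
    packets = []
    for start in range(0, len(data_bits), payload_size):
        chunk = data_bits[start:start + payload_size]
        packets.append(chunk + [parity(chunk)])
    return packets


def first_bad_packet(packets):
    """Check packets in arrival order; return the index of the first parity failure."""
    for index, packet in enumerate(packets):
        if parity(packet) != 0:         # a good packet has even parity overall
            return index                # error flagged before later packets arrive
    return None


# Usage: 512 data bits in 64-bit payloads, with one bit corrupted in packet 2.
data = [(i * 7) % 3 % 2 for i in range(512)]
packets = split_with_parity(data, 64)
packets[2][5] ^= 1
print(first_bad_packet(packets))
```

A receiver built along these lines can flag the failing packet as soon as it is received, while correction still relies on the full set of ECC bits.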