Whenever data is transmitted from one agent (e.g., a processor, memory, or input/output device) to another agent, there is a possibility that the data will be exposed to transmission errors. These transmission errors can be caused by spurious signals, such as power fluctuations or electromagnetic fields, affecting the medium used to transmit the data. Additionally, because data can become corrupted in a variety of ways, whenever one agent transmits data to another agent there is the possibility that the data is at least partially corrupted before it is even transmitted.
Prior art approaches to error detection and correction have ranged from simple parity checking to specialized error detection and correction circuitry. Simple parity checking usually involves either the sending agent or the receiving agent comparing a single parity bit with the sum of all the other bits. For instance, if the sum is an odd number and the parity bit is a 1, or if the sum is an even number and the parity bit is a 0, then parity checks and no error is reported. However, if the sum is an even number and the parity bit is a 1, or if the sum is an odd number and the parity bit is a 0, then parity does not check and an error is reported.
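The parity rule just described can be expressed as a short sketch. It assumes the parity bit travels alongside the data bits and follows the even-parity convention above (total count of 1s, parity bit included, is even); the function names are hypothetical.

```python
from typing import List


def parity_bit(data_bits: List[int]) -> int:
    """Parity bit that makes the total count of 1s (data plus parity) even."""
    return sum(data_bits) % 2


def parity_checks(data_bits: List[int], parity: int) -> bool:
    """Apply the rule above: an odd sum with parity 1, or an even sum with
    parity 0, checks; any other combination is reported as an error."""
    return sum(data_bits) % 2 == parity


word = [1, 0, 1, 1]
p = parity_bit(word)                   # three 1s, so the parity bit is 1
print(parity_checks(word, p))          # True: parity checks
print(parity_checks([1, 0, 0, 1], p))  # False: one bit flipped in transit
```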
Parity checking, while relatively easy to implement, has at least three disadvantages. First, every portion of data transmitted must be checked to be sure the parity is correct. Second, if a parity error is detected, the only option is to have the data retransmitted in the hope that the error occurred in the transmission itself. If, however, the error was due to the sending agent holding corrupted data, mere retransmission will not avoid another parity error. In that case, parity errors will continue to be found until either the sending agent stops trying to send the data or the receiving agent stops requesting it. The third disadvantage of parity checking also arises when a parity error is detected. Even though the sending agent and the receiving agent may both know that there was an error in the data already sent, the error may not have been brought to the receiving agent's attention in time to prevent it from processing the good data sent before the bad data. In that case, when the data is resent, the good data will be processed again. This repetitive cycle of sending data, processing it, and learning that it is bad will continue, as stated above, until either the sending agent stops trying to send the data or the receiving agent stops requesting it.
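The difference between a transmission error and corrupted source data can be sketched as a retry loop. This assumes the parity bit was stored with the data before the sender's copy was corrupted, and it uses an explicit attempt limit to stand in for one agent giving up; the senders and function names are hypothetical.

```python
from typing import Callable, List, Tuple

Word = Tuple[List[int], int]  # (data bits, stored parity bit)


def receive_with_retries(send: Callable[[], Word], max_attempts: int) -> Tuple[bool, int]:
    """Keep requesting the data until parity checks or the attempt limit is hit.

    Returns (parity_checked, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        data_bits, parity = send()
        if sum(data_bits) % 2 == parity:
            return True, attempt
    return False, max_attempts


def corrupted_source() -> Word:
    # Hypothetical sender whose stored copy was corrupted after its parity bit
    # was recorded: every retransmission repeats the same bad word.
    original = [1, 0, 1, 1]
    stored = [1, 0, 0, 1]
    return stored, sum(original) % 2


def make_transient_source() -> Callable[[], Word]:
    # Hypothetical sender whose first transmission is hit by a wire error.
    calls = 0

    def send() -> Word:
        nonlocal calls
        calls += 1
        data = [1, 0, 1, 1]
        parity = sum(data) % 2
        if calls == 1:
            data = [0, 0, 1, 1]  # bit flipped in transit on the first attempt only
        return data, parity

    return send


print(receive_with_retries(corrupted_source, max_attempts=5))         # (False, 5)
print(receive_with_retries(make_transient_source(), max_attempts=5))  # (True, 2)
```

Retransmission clears the transient fault on the second attempt, while the corrupted source fails every attempt until the limit ends the loop.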
Specialized error correction and detection circuitry (ECC) is often used to avoid either sending bad data or relying on bad data. ECC circuitry can be utilized in several ways. One way is known as the correct always mode. Referring to FIG. 18, a simplified correct always circuit is shown whereby data from the sending agent is transmitted on signal line 1801 to correction circuitry 1803 (usually contained within the controller interface of the sending agent). Correction circuitry 1803 performs correction steps on the data and transmits the corrected data on signal line 1805 to bus 1807, which carries the data to the receiving agent. The correct always approach effectively assumes that the data is bad even when it is not, as is evident from the fact that the data is always passed through correction even when it may be perfectly valid. Thus, the correct always approach introduces correction latency into every data transmission. This latency may not be acceptable in today's high speed transmissions of large amounts of data.
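The text does not say which code the ECC circuitry uses. The sketch below assumes a Hamming(7,4) single-error-correcting code purely for illustration (real controllers use wider codes over full data words) and shows the correct always idea: every word is run through correction, whether or not it contains an error. All function names are hypothetical.

```python
from typing import List


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode data bits [d1, d2, d3, d4] as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # covers positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def syndrome(c: List[int]) -> int:
    """XOR of the 1-based positions of all set bits: 0 means no error seen,
    otherwise it names the position of a single flipped bit."""
    s = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            s ^= pos
    return s


def correct_always(c: List[int]) -> List[int]:
    """Correct always path: the syndrome is decoded into a flip mask that is
    applied unconditionally. For a valid word the mask is all zeros, but the
    correction work (and its latency) is still paid on every word."""
    s = syndrome(c)
    mask = [1 if pos == s else 0 for pos in range(1, 8)]
    fixed = [b ^ m for b, m in zip(c, mask)]
    return [fixed[2], fixed[4], fixed[5], fixed[6]]


code = hamming74_encode([1, 0, 1, 1])
print(correct_always(code))  # [1, 0, 1, 1]
code[4] ^= 1                 # flip position 5 on the way in
print(correct_always(code))  # [1, 0, 1, 1]
```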
Another way to use ECC circuitry is known as the detect and correct mode. Referring to FIG. 20, in the detect and correct ECC approach the sending agent transmits data on signal line 2001 to the ECC circuitry (again, usually found in the controller interface of the sending agent). The ECC circuitry passes the incoming data on signal line 2001 to three signal lines 2002, 2003, and 2004. The detect and correct approach makes the opposite assumption from the correct always approach: it assumes that the data is always valid and transmits the data (input on signal line 2001 and carried on signal line 2003) to output multiplexor (MUX) 2011. Output MUX 2011 transmits the data on signal line 2013 to bus 2015, which carries the data to the receiving agent.
While the data is being carried on signal line 2003 of FIG. 20 to output MUX 2011, the data is also being carried on signal line 2002 to error detection logic 2005. Error detection logic 2005 continuously checks the data for errors. If error detection logic 2005 detects an error in the data being transmitted, it sends out an error flag on signal line 2008.
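The detection-only role of error detection logic 2005 can be sketched with the same illustrative Hamming(7,4) code assumed earlier (the text does not specify a code). Detection leaves the words untouched and only reports which ones would raise the error flag; the function names are hypothetical.

```python
from typing import List


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode data bits [d1, d2, d3, d4] as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]


def syndrome(c: List[int]) -> int:
    """XOR of the 1-based positions of set bits; nonzero means an error."""
    s = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            s ^= pos
    return s


def detect_errors(burst: List[List[int]]) -> List[int]:
    """Detection-only path: return the indices of words whose syndrome is
    nonzero (the words that would raise the error flag). No word is altered."""
    return [i for i, c in enumerate(burst) if syndrome(c) != 0]


burst = [hamming74_encode([(n >> k) & 1 for k in range(4)]) for n in (3, 5, 9, 12)]
burst[2][6] ^= 1            # corrupt one bit of the third word
print(detect_errors(burst))  # [2]
```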
Error correction logic 2007 continually tracks the last portion of data transmitted by the sending agent through the controller interface. In practice, the controller interface of the sending agent usually contains a buffer that holds the data being transmitted. The buffer may be used in a flyby mode, in which case the controller interface places the data in the buffer at the same time it transmits the data across bus 2015 to the receiving agent. Alternatively, the buffer may be used as a short term holding place, keeping the data until the controller interface is ready to transmit it from the buffer across bus 2015 to the receiving agent.
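The two buffer modes, together with tracking the last portion transmitted, can be modeled in a few lines. This is a hypothetical software model: `ControllerBuffer`, its fields, and the list standing in for bus 2015 are illustrative names, not structures taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ControllerBuffer:
    """Hypothetical model of the controller interface buffer."""
    flyby: bool                                    # True: send as data is buffered
    words: List[int] = field(default_factory=list)
    last_sent: Optional[int] = None                # index of last word put on the bus
    bus: List[int] = field(default_factory=list)   # stands in for bus 2015

    def accept(self, word: int) -> None:
        """Place a word in the buffer; in flyby mode it goes out immediately."""
        self.words.append(word)
        if self.flyby:
            self._send(len(self.words) - 1)

    def flush(self) -> None:
        """Holding mode: send every buffered word not yet transmitted."""
        start = 0 if self.last_sent is None else self.last_sent + 1
        for i in range(start, len(self.words)):
            self._send(i)

    def _send(self, i: int) -> None:
        self.bus.append(self.words[i])
        self.last_sent = i  # the position correction logic would track


held = ControllerBuffer(flyby=False)
for w in (10, 20, 30):
    held.accept(w)
print(held.bus, held.last_sent)  # [] None
held.flush()
print(held.bus, held.last_sent)  # [10, 20, 30] 2
```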
Error correction logic 2007, upon receiving the error flag from signal line 2010 (which is itself connected to signal line 2008), attempts to correct the bad data. If the data is bad because of only a single bit error, error correction logic 2007 may be able to correct the error on the fly and immediately send the corrected data to the receiving agent through output MUX 2011. This is accomplished by having output MUX 2011 switch its input (upon receiving the error flag on signal line 2009, which is connected to signal line 2008) from the straight through data of signal line 2003 to the corrected data on signal line 2006. Note that on-the-fly correction is more likely to succeed if the transmission to the receiving agent is asynchronous, since this may allow more time to correct the bad data before the receiving agent has actually received it.
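The MUX selection can be sketched in software, again assuming the illustrative Hamming(7,4) code (the text names no code). The straight through and corrected inputs mirror signal lines 2003 and 2006, and the error flag drives the select; the function names are hypothetical.

```python
from typing import List, Tuple


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode data bits [d1, d2, d3, d4] as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]


def syndrome(c: List[int]) -> int:
    """XOR of the 1-based positions of set bits; nonzero means an error."""
    s = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            s ^= pos
    return s


def mux(select: bool, straight: List[int], corrected: List[int]) -> List[int]:
    """Output MUX: pass straight through data unless the error flag is set."""
    return corrected if select else straight


def detect_and_correct(c: List[int]) -> Tuple[List[int], bool]:
    s = syndrome(c)
    error_flag = s != 0                    # analogue of the flag on line 2008
    straight = [c[2], c[4], c[5], c[6]]    # analogue of line 2003
    fixed = list(c)
    if error_flag:
        fixed[s - 1] ^= 1                  # single-bit correction
    corrected = [fixed[2], fixed[4], fixed[5], fixed[6]]  # analogue of line 2006
    return mux(error_flag, straight, corrected), error_flag


word = hamming74_encode([0, 1, 1, 0])
print(detect_and_correct(word))  # ([0, 1, 1, 0], False)
word[2] ^= 1                     # single-bit error in a data position
print(detect_and_correct(word))  # ([0, 1, 1, 0], True)
```

With this code, a double-bit error still produces a nonzero syndrome, but it points at a third bit, so the "corrected" output is wrong. This is consistent with the text limiting on-the-fly correction to single bit errors.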
However, if the error detected by error detection logic 2005 is more serious than a single bit error, or if the transmission is synchronous, then simple correction and retransmission on the fly is not usually possible in the prior art. Instead, error correction logic 2007, upon receiving the error flag on signal line 2010, attempts to correct the bad data and updates the sending agent with the corrected data in what is commonly known as a "scrubbing" operation. The receiving agent, having also received the error flag, then requests retransmission of the data.
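A minimal sketch of the scrub-then-retransmit sequence follows, assuming the illustrative Hamming(7,4) code and modeling the sending agent's storage as a dictionary of codewords. Each word leaves before its check completes, as in a synchronous transfer. `read_burst` and the memory layout are hypothetical.

```python
from typing import Dict, List, Tuple


def hamming74_encode(d: List[int]) -> List[int]:
    """Encode data bits [d1, d2, d3, d4] as [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    return [d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d1, d2 ^ d3 ^ d4, d2, d3, d4]


def syndrome(c: List[int]) -> int:
    """XOR of the 1-based positions of set bits; nonzero means an error."""
    s = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            s ^= pos
    return s


def read_burst(memory: Dict[int, List[int]], addrs: List[int]) -> Tuple[List[List[int]], bool]:
    """Send each word as stored, then check it. Any bad word is corrected and
    written back (scrubbed); the returned flag means a retransmission is needed."""
    sent: List[List[int]] = []
    retransmit = False
    for a in addrs:
        c = memory[a]
        sent.append([c[2], c[4], c[5], c[6]])  # word leaves before the check finishes
        s = syndrome(c)
        if s:
            fixed = list(c)
            fixed[s - 1] ^= 1
            memory[a] = fixed   # scrub: corrected codeword written back to the source
            retransmit = True   # error flag seen by the receiving agent
    return sent, retransmit


memory = {0: hamming74_encode([1, 0, 1, 1]), 1: hamming74_encode([0, 1, 1, 0])}
memory[1][2] ^= 1                # corrupt a data bit at the source
print(read_burst(memory, [0, 1]))  # ([[1, 0, 1, 1], [1, 1, 1, 0]], True)
print(read_burst(memory, [0, 1]))  # ([[1, 0, 1, 1], [0, 1, 1, 0]], False)
```

The second call re-reads the whole burst, including the word that was already good.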
Retransmission requires an additional bus transaction to request retransmission of the data. Retransmission also requires that the sending agent resend all of the data originally requested, not just the corrected bad data and any data that would have followed the bad data had it not been bad. Further, between the time the receiving agent began receiving the original data and the time it received the error flag, the receiving agent may have completed a considerable amount of processing on the data. With a retransmission of all of the data originally requested, all of this processing would have to be repeated. While this may be acceptable with single word transfers (because reprocessing a single word may be an acceptable performance degradation compared with a more elaborate error detection and correction methodology), modern burst transmissions of numerous words usually cannot afford to have the receiving agent reprocess large amounts of data. Additionally, with modern burst transmissions of large amounts of data and multiple agents vying for bus access and system resources, the bus and system overhead of issuing retransmission requests and resending large amounts of good data because of an occasional portion of bad data is not acceptable.
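The retransmission overhead described above can be made concrete with simple arithmetic. The sketch compares resending the whole burst against resending only the corrected bad word and the words after it, and counts the words the receiver must process again. The burst length, bad-word index, and processed count in the example are illustrative values, not figures from the text, and `retransmission_cost` is a hypothetical helper.

```python
def retransmission_cost(burst_len: int, bad_index: int, processed_before_flag: int) -> dict:
    """Words resent and reprocessed when the whole burst is retransmitted.

    bad_index is the 0-based position of the bad word; processed_before_flag
    is how many words the receiver had processed when the error flag arrived.
    """
    if not 0 <= bad_index < burst_len:
        raise ValueError("bad_index must fall inside the burst")
    full = burst_len                  # every originally requested word
    from_bad = burst_len - bad_index  # bad word plus the words that followed it
    return {
        "full": full,
        "from_bad": from_bad,
        "good_words_resent": full - from_bad,
        "reprocessed": min(processed_before_flag, burst_len),
    }


print(retransmission_cost(burst_len=16, bad_index=10, processed_before_flag=12))
# {'full': 16, 'from_bad': 6, 'good_words_resent': 10, 'reprocessed': 12}
```

For a single word transfer the same function gives `good_words_resent == 0`, which matches the text's point that retransmission is tolerable there but costly for long bursts.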