A variety of factors including faulty components and inadequate design tolerances may result in errors in the data being processed by a computer. These errors also commonly occur during data transmission due to "noise" in the communication channel. As a result of these errors, one or more bits, which may be represented as X, which are to be transmitted within the system, are corrupted so as to be received as /X (i.e. the logical complement of the value of X). In order to protect a computer system against such errors, the data bits may be coded via error correcting code ("ECC") in such a way that the errors may be detected and possibly corrected by special ECC logic circuits. A typical ECC implementation appends a number of check bits to each data word. The appended check bits are used by the ECC logic circuits to detect errors within the data word.
The simplest and most common form of error control is implemented through the use of the parity bit. The single parity bit is appended to the data word and assigned to be either a 0 or a 1, so as to make the number of 1's in the data word, including the parity bit, even in the case of even parity codes, or odd in the case of odd parity codes.
Prior to the transmission of the data word in a computer system, often upon the initial storage of the data word, the value of the parity bit is computed at the source point and appended to the data word. Upon receipt of the transmitted data word, logic at the destination point recalculates the parity bit and compares it to the received, previously appended parity bit. If the recalculated and received parity bits are not equal, an error has been detected, most commonly a single bit error. Specifically, this means that a single bit (or, more generally, an odd number of bits) has transitioned from its original value, for example from 1 to 0 or from 0 to 1. If the received and recalculated parity bits are equal, then it can be concluded that such a single bit error did not occur; however, multiple bit errors cannot be ruled out. For example, if a data bit changes from a 0 to a 1 and another data bit changes from a 1 to a 0 (i.e. a double bit error), the parity of the data word will not change and the error will be undetected. Thus, use of the parity bit provides single error detection; however, it fails to detect any error affecting an even number of bits, and it provides no information on the location of the erroneous bit(s).
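The parity check described above can be illustrated with a minimal sketch. The function names `parity_bit` and `check`, the 8-bit example word, and the choice of which bits to flip are assumptions made purely for illustration; they are not taken from any particular ECC product.

```python
def parity_bit(word: int, odd: bool = False) -> int:
    """Return the bit that makes the total number of 1s (word plus parity) even, or odd."""
    bit = bin(word).count("1") & 1   # 1 if the word holds an odd number of 1s
    return bit ^ 1 if odd else bit


def check(word: int, received_parity: int, odd: bool = False) -> bool:
    """Recalculate parity at the destination and compare with the received parity bit."""
    return parity_bit(word, odd) == received_parity


data = 0b10110010                 # hypothetical data word containing four 1s
p = parity_bit(data)              # even parity bit computed at the source

single = data ^ 0b00000100        # one bit flipped (0 -> 1)
double = data ^ 0b00000110        # two bits flipped (one 0 -> 1, one 1 -> 0)

print(check(data, p))             # True: no error
print(check(single, p))           # False: single bit error detected
print(check(double, p))           # True: double bit error goes undetected
```

The last line shows the limitation noted above: because the two flips cancel, the recalculated parity still matches and the corrupted word is accepted.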
By appending additional parity bits to the data word, each corresponding to a subset of data bits within the data word, the parity bit concept may be easily expanded to provide the detection of multiple bit errors or to determine the location of single or multiple bit errors. Once a data bit error is located it is a simple matter to cause a logic circuit to correct the located erroneous bit, thereby providing single error correction ("SEC"). Many single error correction codes have the ability to detect double errors and are thus termed single error correcting double error detecting codes ("SEC-DED").
Multiple error detection schemes rely on appending additional check bits to the data word. The most well-known SEC-DED ECC is the so-called Hamming code, which appends a series of check bits to the data word as it is stored in memory. Upon a read operation, the retrieved check bits are compared against recalculated check bits to detect, locate and correct a single bit error. By adding more check bits and appropriately overlapping the subsets of data bits represented thereby, other error correcting codes have been devised for providing three bit error detection and two bit error correction, and, via the further addition of check bits, codes can be formulated to detect and correct any number of data bit errors.
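As a hedged illustration of how overlapping parity subsets locate an error, the following sketch encodes a 4-bit value with a Hamming(7,4) code extended by one overall parity bit, giving SEC-DED behavior. The bit layout (overall parity at index 0, check bits at positions 1, 2 and 4), the function names, and the 4-bit word size are assumptions chosen for brevity; practical memory systems use wider words and different layouts.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into 8 bits: c[0] is overall parity, c[1..7] is Hamming(7,4)."""
    c = [0] * 8
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]      # covers positions whose index has bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]      # covers positions whose index has bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]      # covers positions whose index has bit 2 set
    c[0] = sum(c[1:]) & 1          # overall parity enables double error detection
    return c


def decode(received: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    c = list(received)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos        # XOR of set positions is 0 for a valid codeword
    overall_error = sum(c) & 1     # 1 if an odd number of bits flipped
    if overall_error:
        c[syndrome] ^= 1           # syndrome names the bad position (0 = parity bit)
        status = "corrected"
    elif syndrome:
        return "uncorrectable", None   # even number of flips with nonzero syndrome
    else:
        status = "ok"
    data = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return status, data


word = encode(0b1011)
one_error = list(word)
one_error[5] ^= 1                  # corrupt a single bit
two_errors = list(one_error)
two_errors[2] ^= 1                 # corrupt a second bit

print(decode(word))                # ('ok', 11)
print(decode(one_error))           # ('corrected', 11)
print(decode(two_errors))          # ('uncorrectable', None)
```

A single flipped bit yields a syndrome equal to its position, so the decoder can invert it; two flipped bits leave the overall parity intact while producing a nonzero syndrome, which the decoder reports as detected but uncorrectable.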
Robust error detection and correction systems have long been mandatory features on most large scale systems. Recently, the widespread adoption of the networked model of computing systems has heralded the emergence of a new role for the small to mid-sized PC heretofore intended for desktop applications; that of a network server. Concomitant with the adoption of this new computing model arose the need to provide greater assurances that the data being accessed by clients of these small to mid-sized servers was as accurate as the data on their larger system server counterparts. As a consequence, the industry began to provide error correction solutions for inclusion or retrofit into this new class of servers.
Manufacturers of these new ECC systems have encountered difficulty in demonstrating the benefits of this enhanced protection to prospective customers. Specifically, it has been determined that it is desirable to present some type of visual or other human perceptible signal to the observer indicative of the error detection and correction functions being undertaken by the ECC product.
Typically, an error detection or correction operation in an ECC system occurs at the same time that the erroneous data is needed. Utilizing current technology, a typical error event is therefore completed in approximately 50 nanoseconds, and as semiconductor technology advances, the time required for such an operation is continually growing shorter. Event recording systems that provide notification of these ECC events operate at commensurate speeds, with similarly fast hardware logic recording the error event so as to permit that record to be examined later at human speeds, i.e. on the order of seconds rather than nanoseconds. Clearly, if a visible error indicator element such as a light emitting diode ("LED") or display circuit were activated only for the duration of the memory cycle in which the error occurred, the human eye could never detect it. Consequently, the aforementioned event recording systems accomplish the objective of allowing an observer to visually perceive a single error event.
For example, U.S. Pat. No. 5,068,851 to Brucker et al. discloses a fault recording system wherein the occurrence of a fault is stored by non-volatile memory and presented to the observer by means of a visible indicator.
The simple latching or recording of a single error event, however, does not afford the observer the ability to appreciate, via a visual indication, the occurrence of plural temporally sequenced error events unless the system were continually manually reset or a large number of error indicators were employed, each latching a single error event as it occurred. Both of these alternatives are impractical and possibly cost prohibitive. Without the practical ability to visually or otherwise demonstrate the operation of an ECC system over time as it detects and corrects a plurality of errors and signifies uncorrectable errors in data from a memory subsystem, it is difficult to accurately convey to an observer the function being performed by the system.
Accordingly, a need exists for the provision of a human sensorially significant ("HSS") indication of error detection and/or correction and the detection of uncorrectable errors by an ECC system automatically on a recurring basis without requiring an excessive number of error indicators.