The present invention relates to a control device that uses an ECC (Error Check and Correct) memory having a function of automatically correcting errors.
In a control device of an electric power generation plant or the like, an ECC memory is used as the main memory (main storage unit) that stores control data which a CPU (central processing unit) reads and writes directly. When data is stored, the ECC memory adds an error-correcting code (ECC), which is separate from the original data, and stores the two together, thereby enabling errors to be corrected or detected.
In general, if the error-correcting code corresponding to data n is m, a unique error-correcting code generation function m=f(n) is found such that n combined with m, written (n & m), forms a code word, and any two code words are separated by at least a certain Hamming distance. The ECC memory is realized by storing the code word {n & f(n)} in the memory instead of n alone.
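The relation m=f(n) described above can be illustrated with a minimal sketch. The code below assumes a toy 4-bit data word protected by an extended Hamming code with 4 check bits; this particular code, and the names f, encode, hamming_distance, CODE_WORDS, and MIN_DISTANCE, are illustrative assumptions and are not taken from any actual ECC memory.

```python
from itertools import combinations

DATA_BITS = 4   # assumed toy data width
CHECK_BITS = 4  # assumed number of check bits


def f(n: int) -> int:
    """Hypothetical generation function m = f(n) for 4-bit data n."""
    d = [(n >> i) & 1 for i in range(DATA_BITS)]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    # Overall parity over the data bits and the three check bits above.
    p0 = d[0] ^ d[1] ^ d[2] ^ d[3] ^ p1 ^ p2 ^ p3
    return p1 | (p2 << 1) | (p3 << 2) | (p0 << 3)


def encode(n: int) -> int:
    """Form the code word {n & f(n)}; '&' in the text means concatenation,
    which is done here with a shift and a bitwise OR."""
    return (n << CHECK_BITS) | f(n)


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# All code words of the toy code and the smallest distance between any two.
CODE_WORDS = [encode(n) for n in range(1 << DATA_BITS)]
MIN_DISTANCE = min(hamming_distance(a, b) for a, b in combinations(CODE_WORDS, 2))

if __name__ == "__main__":
    print(f"minimum Hamming distance: {MIN_DISTANCE}")
```

For this assumed code the minimum distance between code words is 4, which is the case discussed in the following paragraphs.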
In general, as the number of errors increases, the situation changes from one in which errors can be corrected, to one in which errors can be detected but not corrected, and eventually to one in which errors can be neither detected nor corrected. For example, if a code word of a code with a Hamming distance of 4 contains a 1-bit error, the stored value is no longer a code word, because it has moved a distance of 1 away from the original, error-free code word. Since the value is not a code word, it can be determined that the value contains an error. Moreover, there is only one code word that is closest to the value, and that code word can be identified as the original, error-free code word. Therefore, using that code word as the correct value can be regarded as making a correction.
If a code word of a code with a Hamming distance of 4 contains a 2-bit error, the stored value is similarly not a code word. The error can therefore be detected, but it cannot be corrected because there could be two or more code words that are closest to the value. If the code word contains a 3-bit error, the stored value is again not a code word. The error can therefore be detected, but an attempted correction may produce a wrong result because the code word closest to the value is not necessarily the original, error-free code word. If the code word contains a 4-bit error, the stored value can itself be a valid code word even though it contains an error, so the error may go undetected. That is, for a code with a Hamming distance of 4, a 1-bit error can be corrected and a 2-bit error can be detected. Accordingly, the ability of the ECC memory to correct (and detect) errors depends on the Hamming distance of the generated code words.
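The four cases above can be checked exhaustively on a small example. The following sketch assumes a toy extended Hamming code (4 data bits, 4 check bits, Hamming distance 4) and a nearest-code-word decoder; the code, the decoder, and the names encode, decode, and outcomes are illustrative assumptions, not the circuit of any particular ECC memory.

```python
from itertools import combinations


def encode(n: int) -> int:
    """Assumed toy code: 4 data bits followed by 4 check bits."""
    d = [(n >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    p0 = d[0] ^ d[1] ^ d[2] ^ d[3] ^ p1 ^ p2 ^ p3
    return (n << 4) | p1 | (p2 << 1) | (p3 << 2) | (p0 << 3)


# Map every valid code word back to its data value.
CODE_WORDS = {encode(n): n for n in range(16)}


def distance(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def decode(word: int):
    """Nearest-code-word decoding; returns (status, data)."""
    if word in CODE_WORDS:
        return "ok", CODE_WORDS[word]
    best = min(distance(word, c) for c in CODE_WORDS)
    nearest = [c for c in CODE_WORDS if distance(word, c) == best]
    if len(nearest) == 1:
        return "corrected", CODE_WORDS[nearest[0]]
    return "uncorrectable", None  # tie: the error is detected only


def outcomes(num_errors: int) -> set:
    """Outcomes seen over every data value and every error pattern."""
    seen = set()
    for n in range(16):
        for bits in combinations(range(8), num_errors):
            word = encode(n)
            for b in bits:
                word ^= 1 << b  # flip one stored bit
            status, data = decode(word)
            if status == "ok":
                seen.add("no_error" if data == n else "undetected")
            elif status == "corrected":
                seen.add("corrected" if data == n else "miscorrected")
            else:
                seen.add("detected")
    return seen


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, sorted(outcomes(k)))
```

For this assumed code, every 1-bit error is corrected and every 2-bit error is detected without being corrected. Every 3-bit error is miscorrected by the nearest-code-word rule, and a 4-bit error is either detected or passes as a valid code word.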
A device using the ECC memory can read corrected data, owing to the error correction function that operates when data is read out, provided that the number of errors is within the correctable range. However, the error stored in the memory itself usually remains uncorrected.
There is a memory control circuit that, in order to improve the reliability of the memory, corrects an error promptly after the error is detected in read data and writes the corrected data back (see Japanese Patent Application Laid-open Publication No. 6-52065 (Patent Document 1), for example). That is, if an error stored in the memory is a soft error (temporary error), the error can be corrected by storing the data again (or overwriting the data). Storing the data again means that the correct data and a new error-correcting code generated from the correct data are stored. As a result, the stored data is, in effect, corrected. Owing to circuits such as the one disclosed in Patent Document 1, to the simple updating of data that is routinely carried out during operation of the control device, and the like, soft errors are unlikely to keep accumulating in a control device using the ECC memory.
On the other hand, there is clearly a problem in that errors caused by permanent defects remain uncorrected and accumulate. If operation continues under such circumstances, the number of errors stored in the memory could exceed the correction capability of the ECC, and the errors could become uncorrectable without any warning. Current ECC memories have this characteristic.
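The difference between a soft error, which disappears when data is stored again, and a permanent defect, which does not, can be sketched with a simple model. The model below assumes that a permanent defect behaves as a bit stuck at a fixed level; the class Cell and the helper functions are hypothetical and serve only as an illustration.

```python
class Cell:
    """One memory word; permanent defects are modelled as stuck-at bits."""

    def __init__(self, width: int = 8):
        self.width = width
        self.stored = 0
        self.stuck_mask = 0   # 1 = this bit is permanently defective
        self.stuck_value = 0  # level forced onto the defective bits

    def write(self, value: int) -> None:
        value &= (1 << self.width) - 1
        # Healthy bits take the new value; stuck bits keep their forced level.
        self.stored = (value & ~self.stuck_mask) | (self.stuck_value & self.stuck_mask)

    def read(self) -> int:
        return self.stored

    def inject_soft_error(self, bit: int) -> None:
        """Temporary upset: flip the stored bit once (stuck bits cannot flip)."""
        if not (self.stuck_mask >> bit) & 1:
            self.stored ^= 1 << bit

    def inject_permanent_defect(self, bit: int, level: int) -> None:
        """Permanent defect: the bit reads as 'level' from now on."""
        self.stuck_mask |= 1 << bit
        self.stuck_value = (self.stuck_value & ~(1 << bit)) | (level << bit)
        self.stored = (self.stored & ~(1 << bit)) | (level << bit)


def error_bits(cell: Cell, expected: int) -> int:
    """Number of stored bits that differ from the expected value."""
    return bin(cell.read() ^ expected).count("1")


def errors_after_rewrite(cell: Cell, expected: int) -> int:
    """Store the correct data again, then count the remaining wrong bits."""
    cell.write(expected)
    return error_bits(cell, expected)


if __name__ == "__main__":
    data = 0b10110010

    soft = Cell()
    soft.write(data)
    soft.inject_soft_error(0)
    print("soft:", error_bits(soft, data), "->", errors_after_rewrite(soft, data))

    hard = Cell()
    hard.write(data)
    hard.inject_permanent_defect(1, 0)  # bit 1 of data is 1, so this is an error
    print("hard:", error_bits(hard, data), "->", errors_after_rewrite(hard, data))
```

In this model the soft error is gone after the data is stored again, whereas the stuck bit still differs from the expected value. This corresponds to the accumulation of errors caused by permanent defects described above.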
The problem here is that it is impossible to predict when errors will become uncorrectable. If that time were predictable, the control device could be stopped for maintenance at an appropriate time, for example, when the plant controlled by the control device is out of operation. If it is not predictable, however, an unplanned, immediate shutdown of the plant or the like becomes necessary, and the plant or the like can be seriously damaged.
That is, the ECC memory merely extends the life of the device. Even if the time when errors become uncorrectable is not predictable, the following procedure may be possible for a control device whose ECC memory corrects data when the data is read out and issues a notice that an error is present in the memory: the device may be stopped for maintenance at an appropriate time each time the notice is issued.
However, if the above error is a soft error, in many cases the error stored in the memory will disappear after the data is updated or the like, as described above. It is therefore extremely inefficient to stop the control device each time the notice is issued, and in practice such an operation is not common. Moreover, it is difficult to determine whether a given notice is associated with a soft error or with a permanent defect.
Accordingly, what is extremely important is the function to reliably detect that the error stored in the memory is attributable to permanent defects. The reason is that if it is possible to detect the existence of permanent defects, the control device can be stopped at an appropriate timing for maintenance before errors abruptly turn into uncorrectable errors and the plant and the like are seriously damaged.