This invention relates to a data processing system with an error processing device and an error processing method. More particularly, it relates to an error processing system and method in a data processing system in which, when an error occurs in a main memory, an alternate memory is used in place of the main memory.
A data processing system such as a computer comprises a main memory for storing data and a data processing unit. The data processing unit reads data from the main memory by accessing the main memory. By using the read data, the data processing unit executes data processing. In order to execute accurate data processing, the data to be supplied from the main memory to the data processing unit must be accurate. However, there may be hard errors or soft errors in the main memory. The hard errors occur due to a permanent destruction of a part of the main memory. The soft errors occur, particularly in a semiconductor memory element, due to alpha particles or internal voltage variations within a memory cell. In order to avoid the soft errors and hard errors, there is generally provided an error correcting circuit in the data processing system. One example of the error correcting circuit is the well known SEC-DED (Single Error Correct-Double Error Detect) circuit. By means of the SEC-DED circuit, a one bit error in a bit line or a word line in the main memory can be corrected by rewriting correct data but two or more bit errors in a bit line or word line in the main memory cannot be corrected. When two or more bit errors are detected, the data processing system changes to a system down status. Similarly, in the other error correcting circuits, correctable errors are also corrected, but uncorrectable errors are only detected in order to cause a system down status. A soft error is an accidental inversion of the polarity of data stored in a memory cell of the main memory. Therefore, the soft error can be corrected by rewriting correct data into the memory cell.
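The SEC-DED behavior described above can be illustrated with a minimal sketch. It assumes an extended Hamming (8,4) code, which is one common way to obtain single-error correction with double-error detection; the text does not say which code the circuit uses, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0: overall parity; indices 1, 2, 4: Hamming parity bits
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        # Parity bit p covers every position whose index has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    code[0] = sum(code[1:]) % 2  # makes the parity of the whole word even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of the positions holding a 1 is 0 for a valid word
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: assume exactly one bit flipped; the syndrome is its index
        # (a syndrome of 0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even overall parity with a nonzero syndrome: two bits flipped, detect only.
        return "uncorrectable", None
    nibble = code[3] | code[5] << 1 | code[6] << 2 | code[7] << 3
    return status, nibble
```

In this sketch a single flipped bit is repaired and the original data is returned, while any two flipped bits are reported as uncorrectable, which corresponds to the case in which the system changes to a system down status.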
However, in recent years, the memory capacity of the main memory has been increased by, for example, decreasing the size of each memory cell in a semiconductor memory element. Because of the decreased memory cell size, the possibility of soft errors occurring due to alpha particles is increased. These alpha particles, which are radiated from package materials, temporarily turn off dynamic nodes of a semiconductor memory cell. Therefore, in spite of the functioning of the error correcting circuit, the possibility of an uncorrectable error occurring is increased because of the increase in the frequency of soft errors. As a result, the possibility of the system down status occurring is increased.
In order to decrease the possibility of the system down status occurring, errors may be removed before the data processing unit accesses the main memory. For this purpose, the data processing system may further comprise an error processing unit and an alternate memory. The error processing unit can access the main memory when the data processing unit is not accessing the main memory. The correctness of data stored in the main memory is checked by the error processing unit. If there are one or more errors in the stored data, the errors are analyzed by the error processing unit. As a result of the analysis, if the errors are uncorrectable, correct data is stored in the alternate memory. Then, the correct data stored in the alternate memory is used in the data processing after the end of operation by the error processing unit. By means of this modified system, correctable errors are removed before the data processing unit accesses the main memory. Therefore, the possibility of a system down status occurring is decreased. However, a system down status still occurs when uncorrectable errors, such as two or more bit errors when the SEC-DED circuit is used, occur during the time the data processing unit accesses the main memory. When, for example, the SEC-DED circuit is used as the error correcting circuit, these uncorrectable errors may occur during data processing not only when two or more hard bit errors occur, but also when two or more soft bit errors occur, when a one-bit hard error plus one or more soft bit errors occur, or when a burst error plus one soft error occurs. A burst error means hard errors in a plurality of memory cells in one or more chips in the main memory.
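The error combinations listed above all amount to two or more wrong bits in the same word. A minimal sketch of that counting rule follows, under an assumed fault model in which each word carries a set of hard-error bit positions and a set of soft-error bit positions; the function name `word_status` and the `correctable` parameter are hypothetical and do not come from the text.

```python
def word_status(hard_bits, soft_bits, correctable=1):
    """Classify one memory word by the number of bit positions that are wrong.

    hard_bits and soft_bits are sets of bit positions (assumed fault model).
    correctable=1 models SEC-DED, which corrects at most one bit per word.
    """
    bad = len(set(hard_bits) | set(soft_bits))
    if bad == 0:
        return "ok"
    return "correctable" if bad <= correctable else "uncorrectable"


# A lone hard error stays within the correcting capability of SEC-DED ...
print(word_status({3}, set()))
# ... until a soft error lands in the same word during data processing.
print(word_status({3}, {5}))
```

Under this model a word that already holds one hard error has no margin left, so a single later soft error in that word is enough to produce an uncorrectable error.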