This invention relates broadly to the field of dynamic semiconductor memories and particularly to a circuit for improving the mean time between failures for large size dynamic semiconductor memories.
The reliability of a dynamic semiconductor memory is known to be a function of the failure rate of individual random access memory cells, the density of the memory cells on a chip and the quality of the chip. Failures which occur on such chips are classified as "hard" and "soft" failures where "hard" failures comprise a permanent malfunction while "soft" failures are intermittent failures. "Hard" failures in dynamic semiconductor chips frequently take the form of failures of a single cell, a bit line, a word line or other physical portion of the chip. "Soft" failures, however, are most frequently caused by radiation such as alpha particle radiation due to the radioactive decay of trace amounts of uranium or thorium in the packaging materials used for the chips. Such "soft" failures are usually of the type where only a single bit is affected.
Numerous approaches are used to enhance the reliability of dynamic semiconductor memories. To deal with "hard" failures, maintenance is performed on a regular schedule to check for and replace chips which show high error rates or complete failure.
However, statistical analysis has shown that single cell "soft" errors occur far more frequently than any other type of error. In order to reduce single cell "soft" failures, manufacturers have introduced coatings to isolate the semiconductor from the radioactive traces in the surrounding package. This has helped a great deal, although it has not eliminated the problem.
Still another approach to the "soft" error problem is to utilize an error checking and correcting scheme. This approach involves the use of a code stored along with the data. When an error is detected on readout from memory, the code and the data are run through an error correcting circuit to correct the error. The corrected data is then written back into memory and transmitted elsewhere in the system coupled to the memory. However, this approach is not effective when double or other multiple bit errors have occurred: while double errors and some multiple errors are detected by the error detecting and correcting scheme, they are generally not correctable.
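The behavior of such a scheme can be illustrated with a small software model. The sketch below assumes an extended Hamming code over 4 data bits (8 stored bits in all), chosen only because it is compact; the text does not specify which code is used, and the names encode, decode and flip are hypothetical. It shows a single-bit error being corrected on readout and a double-bit error being detected but not corrected.

```python
def encode(nibble):
    """Return an 8-bit codeword (list of bits) for a 4-bit data value.

    Index 0 holds an overall parity bit; indices 1..7 form a Hamming(7,4)
    word with check bits at positions 1, 2 and 4. (Assumed layout.)
    """
    c = [0] * 8
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]   # covers positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]   # covers positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]   # covers positions 4, 5, 6, 7
    c[0] = sum(c[1:]) & 1       # overall parity over positions 1..7
    return c


def decode(codeword):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    c = list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos     # XOR of the positions of all set bits
    overall = sum(c) & 1        # 1 means an odd number of bits flipped
    if overall:
        # Single error: the syndrome names the failing position
        # (syndrome 0 means the overall parity bit itself failed).
        c[syndrome] ^= 1
        status = "corrected"
    elif syndrome:
        # Even number of flips with a nonzero syndrome: detected only.
        status = "uncorrectable"
    else:
        status = "ok"
    data = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return data, status


def flip(codeword, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    out = list(codeword)
    for p in positions:
        out[p] ^= 1
    return out


if __name__ == "__main__":
    word = encode(0b1011)
    print(decode(word))              # no error
    print(decode(flip(word, 6)))     # one bit flipped: data recovered
    print(decode(flip(word, 2, 6)))  # two bits flipped: flagged, data not trusted
```

In this model the data returned alongside an "uncorrectable" status should be treated as invalid, which mirrors the limitation described above.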
The above-mentioned technique for correcting single bit "soft" errors detects and corrects such errors only on readout of data from the location where an error has occurred. This approach, however, cannot discover an error at the time it occurs and cannot assure that a second "soft" error does not occur at the same location in the memory before that location is read. This is especially true for memory locations which are infrequently read.
It is therefore a primary object of the present invention to provide a means to check all locations in a dynamic semiconductor memory and correct single "soft" failures before they become uncorrectable double errors.
It is a further object of the present invention to provide a circuit to check all locations of a dynamic semiconductor memory and correct any "soft" errors detected, the circuit being operative without modification regardless of the size of the memory it is designed to check.
It is a further object of the invention to provide a circuit which, while only minimally interfering with normal system operation, corrects "soft" failures before they become "hard" or double failures which may not be detectable.
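The effect sought by these objects can be pictured with a rough software model, offered only as a sketch and not as the circuit itself. It assumes an 8-bit extended Hamming code over 4 data bits and a memory modeled as a Python list; the names encode, decode, scrub and run are hypothetical. A sweep over every location rewrites any word holding a single error, so that a later second error at the same location is again a correctable single error.

```python
def encode(nibble):
    """Pack 4 data bits into an 8-bit word: bit 0 is overall parity,
    bits 1, 2, 4 are Hamming check bits, bits 3, 5, 6, 7 are data."""
    d = [(nibble >> i) & 1 for i in range(4)]
    bits = [0, d[0] ^ d[1] ^ d[3], d[0] ^ d[2] ^ d[3], d[0],
            d[1] ^ d[2] ^ d[3], d[1], d[2], d[3]]
    bits[0] = sum(bits) & 1
    return sum(b << i for i, b in enumerate(bits))


def decode(word):
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 8):
        if (word >> pos) & 1:
            syndrome ^= pos
    if bin(word).count("1") & 1:      # odd parity: treat as a single-bit error
        word ^= 1 << syndrome
        status = "corrected"
    elif syndrome:                    # even parity, nonzero syndrome
        status = "uncorrectable"
    else:
        status = "ok"
    data = sum(((word >> p) & 1) << i for i, p in enumerate((3, 5, 6, 7)))
    return data, status


def scrub(memory):
    """Visit every location once and rewrite words with a correctable error.

    The loop depends only on len(memory), so it is unchanged for any size.
    Returns (number corrected, addresses found uncorrectable).
    """
    corrected, uncorrectable = 0, []
    for addr in range(len(memory)):
        data, status = decode(memory[addr])
        if status == "corrected":
            memory[addr] = encode(data)
            corrected += 1
        elif status == "uncorrectable":
            uncorrectable.append(addr)
    return corrected, uncorrectable


def run(scrub_between_hits):
    """Apply two soft errors to address 5, optionally scrubbing in between."""
    memory = [encode(a % 16) for a in range(32)]
    memory[5] ^= 1 << 2               # first soft error
    if scrub_between_hits:
        scrub(memory)
    memory[5] ^= 1 << 6               # second soft error, same location
    return decode(memory[5])


if __name__ == "__main__":
    print(run(scrub_between_hits=True))   # second error is still correctable
    print(run(scrub_between_hits=False))  # errors accumulate into a double error
```

In a real system the sweep would be interleaved with normal accesses; this model ignores timing and contention entirely.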