Ensuring the integrity of data processed by a data processing system such as a computer or like electronic device is critical for the reliable operation of such a system. Data integrity is of particular concern, for example, in fault tolerant applications such as servers, databases, scientific computers, and the like, where any errors whatsoever could jeopardize the accuracy of complex operations and/or cause system crashes that affect large numbers of users.
Data integrity issues are a concern, for example, for many solid state memory arrays such as those used as the main working storage repository for a data processing system. Solid state memory arrays are typically implemented using multiple integrated circuit memory devices such as static or dynamic random access memory (SRAM or DRAM) devices, and are controlled via memory controllers typically disposed on separate integrated circuit devices and coupled thereto via a memory bus. Solid state memory arrays may also be used in embedded applications, e.g., as cache memories or buffers on logic circuitry such as a processor chip.
A significant amount of effort has been directed toward detecting and correcting errors in memory devices during power up of a data processing system, as well as during the normal operation of such a system. It is desirable, for example, to enable a data processing system to detect and correct errors automatically whenever possible, without requiring a system administrator or other user to manually perform any repairs. It is also desirable for any such corrections to be performed in such a fashion that the system remains up and running. Such capabilities, however, are often expensive and available only on complex, high performance data processing systems. Furthermore, many types of errors leave a conventional system unable to do anything other than “crash” and require a physical repair before normal device operation can be restored.
Conventional error detection and correction mechanisms for solid state memory devices typically rely on parity bits or checksums to detect inconsistencies in data as it is retrieved from memory. Furthermore, through the use of Error Correcting Codes (ECCs) or other correction algorithms, it is possible to correct some errors, ranging, e.g., from single-bit errors up to single-device errors, and recreate the proper data.
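By way of illustration only, the difference between detecting an error with a parity bit and correcting one with an ECC may be sketched as follows. The sketch assumes a textbook Hamming(7,4) code, which corrects a single flipped bit in a 7-bit codeword; the function names are hypothetical, and the code is not intended to represent the ECC used by any particular memory controller.

```python
def parity_bit(word):
    """Even-parity bit for an integer word: 1 if the count of 1-bits is odd."""
    return bin(word).count("1") & 1


def hamming74_encode(nibble):
    """Encode 4 data bits into a Hamming(7,4) codeword.

    The codeword is a list indexed by bit position 1..7 (index 0 is unused).
    Positions 1, 2 and 4 hold check bits; positions 3, 5, 6 and 7 hold data.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    return code


def hamming74_decode(code):
    """Return (nibble, syndrome) for a codeword retrieved from memory.

    The syndrome is 0 for a clean codeword; otherwise it is the position
    (1..7) of the single flipped bit, which is corrected before decoding.
    """
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    fixed = list(code)
    if syndrome:
        fixed[syndrome] ^= 1  # correct the single-bit error
    data_bits = [fixed[3], fixed[5], fixed[6], fixed[7]]
    nibble = sum(bit << i for i, bit in enumerate(data_bits))
    return nibble, syndrome
```

In this sketch a parity mismatch only reveals that some bit changed, whereas a nonzero syndrome identifies which bit changed, allowing the proper data to be recreated. In both cases the check occurs when the data is retrieved.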
In addition, some conventional correction mechanisms for solid state arrays may be capable of disabling defective devices or utilizing redundant capacity within a memory system to isolate errors and permit continued operation of a data processing system. For example, steering may be used to effectively swap out a defective memory device with a spare memory device.
Despite the advances made in terms of error detection and correction, however, one significant limitation of the aforementioned techniques is that such techniques are not configured to directly verify, immediately after a store or write operation, whether correct data is stored in a memory device as a result of that operation. Put another way, conventional techniques have typically relied upon error correction and detection mechanisms that operate in connection with retrieval of data from a memory storage device, rather than in connection with the storage of data in the device.
Verification of write or store operations, which is referred to hereinafter as write verification, has conventionally been performed via a brute force method: issuing a read or fetch operation immediately after each write or store operation, and comparing the retrieved data to the data intended to be written to the memory storage device by the write or store operation. By doing so, however, each write or store operation effectively requires two operations to be issued and processed by the memory architecture, and thus can have a significant adverse impact on performance, in terms of both processing and communication bandwidth in a system.
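The cost of the brute force approach may be illustrated with a minimal sketch. The simulated memory below, including its stuck-bit fault model and its operation counter, is a hypothetical construct introduced solely for illustration and does not represent any actual memory device or controller interface.

```python
class SimulatedMemory:
    """Toy memory array that counts every operation issued to it.

    `stuck_at_zero` maps an address to a mask of bits that always store 0,
    modeling a defective cell (an assumption made for this sketch).
    """

    def __init__(self, size, stuck_at_zero=None):
        self.cells = [0] * size
        self.stuck_at_zero = stuck_at_zero or {}
        self.operations = 0

    def write(self, addr, value):
        self.operations += 1
        self.cells[addr] = value & ~self.stuck_at_zero.get(addr, 0)

    def read(self, addr):
        self.operations += 1
        return self.cells[addr]


def verified_write(memory, addr, value):
    """Write, then immediately read back and compare against the intended data."""
    memory.write(addr, value)
    return memory.read(addr) == value


memory = SimulatedMemory(size=4, stuck_at_zero={2: 0x01})
good = verified_write(memory, 0, 0xAB)  # healthy location
bad = verified_write(memory, 2, 0xAB)   # bit 0 of address 2 is stuck at 0
```

Each call to `verified_write` issues two operations to the memory, so the two verified writes above consume four operations in total, which reflects the doubling of traffic described above.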
One solution that may be utilized to potentially reduce the adverse impact of write verification is to perform what is referred to as “memory scrubbing,” where a background process periodically reads each location in a memory array and utilizes ECC circuitry to detect and (if possible) correct any errors in the array. The background process may be configured to issue read operations only during periods of inactivity such that the impact on memory bandwidth is minimized. However, memory scrubbing still requires a read operation to be directed to each location in a memory array, and may have limited verification capability when a system is under a constant, heavy workload and thus has few (if any) periods of inactivity. Furthermore, memory scrubbing cannot, by itself, verify that a write operation was completed successfully or was directed to the intended location, given the possibility of, e.g., interface errors or address/command integrity errors.
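A scrubbing pass and its limitations may be sketched as follows. For brevity the sketch assumes a simple parity check in place of ECC circuitry, so errors are detected but not corrected; the `is_idle` callback and the list-based memory model are likewise hypothetical.

```python
def parity_bit(word):
    """Even-parity bit for an integer word."""
    return bin(word).count("1") & 1


def scrub_pass(memory, is_idle):
    """Scan a memory array for parity mismatches while the system is idle.

    `memory` is a list of (data, stored_parity) pairs and `is_idle` is a
    callable returning True when a background read may be issued.
    Returns (addresses_with_errors, locations_scanned).
    """
    errors = []
    scanned = 0
    for addr, (data, stored_parity) in enumerate(memory):
        if not is_idle():
            break  # system is busy: defer the remainder of the pass
        scanned += 1  # one read operation per location
        if parity_bit(data) != stored_parity:
            errors.append(addr)
    return errors, scanned


def store(memory, addr, data):
    """Store a word together with its freshly computed parity bit."""
    memory[addr] = (data, parity_bit(data))


# A bit flips in address 2 after the data and its parity were stored.
memory = [(w, parity_bit(w)) for w in (0x00, 0x0F, 0x33, 0x55)]
memory[2] = (0x32, memory[2][1])
found_when_idle = scrub_pass(memory, is_idle=lambda: True)
found_when_busy = scrub_pass(memory, is_idle=lambda: False)

# A write intended for address 3 lands at address 1 due to an address error.
misdirected = [(w, parity_bit(w)) for w in (0x00, 0x0F, 0x33, 0x55)]
store(misdirected, 1, 0x7E)
found_after_misdirect = scrub_pass(misdirected, is_idle=lambda: True)
```

In this sketch the idle pass reads all four locations and flags address 2, the pass under load scans nothing, and the misdirected write goes unnoticed because the word stored at the wrong address is internally consistent.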
Therefore, a significant need continues to exist in the art for a manner of performing write verification of a solid state memory array with reduced impact on system performance.