1. Field of the Invention
The present invention generally relates to data processing techniques and, in particular, to a system and method for dynamically moving a checksum from one memory location to another memory location without introducing data errors.
2. Related Art
Large computer systems (e.g., servers) often employ a plurality of memory units to provide enough instruction and data memory for various applications. Each memory unit has a large number of memory locations of one or more bits where data can be stored, and each memory location is associated with and identified by a particular memory address, referred to hereafter as a “memory unit address.” When an instruction that stores data is executed, a bus address defined by the instruction is used to obtain a memory unit address, which identifies the memory location where the data is actually to be stored. In this regard, a mapper is often employed that maps or translates the bus address into a memory unit address having a different value than the bus address. There are various advantages associated with utilizing bus addresses that are mapped into different memory unit addresses.
For example, many computer applications are programmed such that the bus addresses are used consecutively. In other words, one of the bus addresses is selected as the bus address to be first used to store data. When a new bus address is to be utilized for the storage of data, the new bus address is obtained by incrementing the previously used bus address.
If consecutive bus addresses are mapped to memory unit addresses in the same memory unit, then inefficiencies may occur. In this regard, a finite amount of time is required to store data to and retrieve data from a memory unit. If two consecutive data stores occur to the same memory unit, then the second data store may have to wait until the first data store is complete. However, if the two consecutive data stores occur in different memory units, then the second data store may commence before the first data store is complete. To minimize memory latency and maximize memory bandwidth, consecutive bus addresses should access as many memory units as possible. This can also be described as maximizing the memory interleave.
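The effect of interleaving on store timing can be illustrated with a toy model. The following is only a sketch under assumed conditions that do not come from the description above: every store occupies its memory unit for a fixed number of cycles (`store_latency`), and at most one store is issued per cycle. The function name `total_time` is hypothetical.

```python
def total_time(unit_sequence, store_latency=4):
    """Return the cycle at which the last store in the sequence completes.

    unit_sequence lists, in program order, the memory unit each store targets.
    Assumed model: a unit is busy for store_latency cycles per store, and a
    store cannot start until its unit is free and the previous store has issued.
    """
    busy_until = {}   # memory unit -> cycle at which it becomes free
    clock = 0         # earliest cycle at which the next store may be issued
    finish = 0
    for unit in unit_sequence:
        start = max(clock, busy_until.get(unit, 0))
        busy_until[unit] = start + store_latency
        finish = max(finish, busy_until[unit])
        clock = start + 1
    return finish


same_unit = total_time([0, 0, 0, 0])      # four stores to one memory unit
interleaved = total_time([0, 1, 2, 3])    # four stores to four memory units
```

Under these assumptions, the stores directed to a single memory unit are serialized, while the interleaved stores overlap and complete sooner.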
As a result, the aforementioned mapper is often designed to map the bus addresses to the memory unit addresses such that each consecutive bus address is translated into a memory unit address in a different memory unit. For example, a bus address having a first value is mapped to a memory unit address identifying a location in a first memory unit, and the bus address having the next higher value is mapped to a memory unit address identifying a location in a second memory unit. Therefore, it is likely that two consecutive data stores from a single computer application do not occur in the same memory unit. In other words, it is likely that consecutive data stores from a computer application are interleaved across the memory units.
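One simple way to obtain such a mapping is a modulo interleave. The following is a minimal sketch, not the mapper of any particular system; the function name `map_bus_address`, the unit count of four, and the use of a modulo scheme are assumptions made for illustration.

```python
NUM_UNITS = 4  # assumed number of memory units


def map_bus_address(bus_address, num_units=NUM_UNITS):
    """Translate a bus address into (memory unit index, memory unit address).

    Assumed scheme: the low-order part of the bus address selects the unit,
    so consecutive bus addresses land in different memory units.
    """
    if bus_address < 0:
        raise ValueError("bus address must be non-negative")
    return bus_address % num_units, bus_address // num_units


# Map six consecutive bus addresses.
mapped = [map_bus_address(address) for address in range(6)]
```

With this scheme, bus addresses 0 through 3 fall in memory units 0 through 3, and bus address 4 wraps around to the next location of memory unit 0.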
Backup systems are often employed to enable the recovery of data in the event of a failure of one of the memory units. For example, U.S. Pat. No. 4,849,978, which is incorporated herein by reference, describes a checksum backup system that may be used to recover the data of a failed memory unit. To back up data stored within the memory units of a typical computer system, one of the memory units in the computer system is designated as a checksum memory unit. Each location in the checksum memory unit is initialized to zero and is correlated with locations in the other non-checksum memory units. Each data value being stored in a location of one of the non-checksum memory units is exclusively ored with the data value previously stored in that location. In other words, the data value being stored via a data store operation is exclusively ored with the data value being overwritten via the same data store operation. The result of the exclusive or operation is then exclusively ored with the value, referred to as the “checksum,” in the correlated address of the checksum memory unit. The result of the foregoing exclusive or operation is then stored in the foregoing address of the checksum memory unit as a new checksum value.
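The checksum update performed on each data store can be sketched as follows. This is an illustrative model only: memory units are represented as Python lists, the correlated locations are assumed to be the locations at the same offset in every unit, and the non-checksum units are assumed to start at zero so that each checksum equals the exclusive or of its correlated data values. The name `store` is hypothetical.

```python
def store(units, checksums, unit, offset, new_value):
    """Store new_value and update the correlated checksum."""
    old_value = units[unit][offset]
    # Exclusive or of the value being stored with the value being overwritten.
    delta = old_value ^ new_value
    # Fold the difference into the checksum at the correlated location.
    checksums[offset] ^= delta
    units[unit][offset] = new_value


# Three non-checksum memory units of four locations each (assumed sizes).
units = [[0] * 4 for _ in range(3)]
checksums = [0] * 4  # checksum memory unit, initialized to zero

store(units, checksums, 0, 2, 0b1010)
store(units, checksums, 1, 2, 0b0110)
store(units, checksums, 0, 2, 0b0001)  # overwrites the first value
```

After any sequence of such stores, each checksum equals the exclusive or of the data values currently held at the correlated locations, which is the property that makes recovery possible.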
When a memory unit fails, the data value stored in a location of the failed memory unit can be recovered by exclusively oring the checksum in the correlated location of the checksum memory unit with each of the values in the other memory units that are stored in locations also correlated with the location of the checksum. The process of maintaining a checksum and/or recovering a lost data value based on the checksum is generally well known in the art.
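The recovery step can be sketched in a few lines. The values below are arbitrary illustrative data, and the function name `recover` is hypothetical.

```python
from functools import reduce
from operator import xor


def recover(checksum, surviving_values):
    """Recover a lost value from the checksum and the surviving correlated values."""
    return reduce(xor, surviving_values, checksum)


# Correlated values held in three non-checksum memory units (illustrative data).
values = [0x3C, 0xA5, 0x0F]
checksum = reduce(xor, values, 0)

# Suppose the memory unit holding values[1] fails.
recovered = recover(checksum, [values[0], values[2]])
```

The recovery works because exclusive or is its own inverse: oring the surviving values into the checksum cancels their contributions and leaves only the lost value.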
There are situations when it is desirable to move the data values, including the checksum values, stored in one or more locations of one or more of the memory units to other locations in one or more of the memory units. For example, it may be desirable to remove one of the memory units when the memory unit is performing unreliably. To prevent the loss of data that may be stored in the memory unit to be removed, the computer system employing the memory unit is often shut down before removing the memory unit. Once the memory unit has been removed, the computer system is rebooted. The shutting down and rebooting of the computer system is an obviously undesirable consequence of removing the memory unit, since the computer system is unable to run any applications until the reboot is completed.
Some techniques have been developed that allow a memory unit to be removed from the computer system without shutting down the computer system. For example, the processor's virtual memory mapping system may be used to re-map the physical addresses. This results in the temporary halting of applications and the copying of data from the memory unit being removed to a disk or some other data storage device until the removed memory unit is replaced by a new memory unit. The primary reason for halting the execution of applications is to prevent attempts to update the values being moved so that data errors are prevented. All threads in a multi-threaded application, as well as the I/O system, should always have a consistent view of a memory location.
Once the removed memory unit has been replaced, the aforementioned data copied from the removed memory unit is then written to the new memory unit. Then, execution of applications is resumed. While the foregoing techniques lessen the amount of time that the computer system is unable to run applications, there is still a finite amount of time in which the computer system is unable to run an application.
A checksum can be moved from one memory unit to a different memory unit by disabling checksum protection, obtaining a consistent copy of all of the data values that are to be backed up by the checksum, exclusively oring these data values, and storing the result of the exclusive or operation in the new memory unit. However, the foregoing methodology has the disadvantage of running the computer system without checksum protection for a significant time period and of consuming significant memory bandwidth, since the foregoing methodology must be performed for each checksum being moved.
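The recomputation approach can be sketched as follows, which also makes its cost visible. The sketch assumes that a consistent copy of the data has already been obtained and that correlated locations share the same offset in every unit; the name `rebuild_checksums` and the sample data are hypothetical.

```python
def rebuild_checksums(data_units):
    """Recompute every checksum from a consistent copy of the data values."""
    rebuilt = []
    # zip(*data_units) groups the correlated locations across all units.
    for correlated_values in zip(*data_units):
        checksum = 0
        for value in correlated_values:
            checksum ^= value
        rebuilt.append(checksum)
    return rebuilt


# Consistent copy of three non-checksum memory units (illustrative data).
data_units = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
new_checksum_unit = rebuild_checksums(data_units)
```

Every data value in every non-checksum memory unit is read once, which is the source of the memory bandwidth cost, and no checksum is valid until its recomputation completes.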
Thus, a heretofore unaddressed need exists in the industry for providing a system and method for moving data values, particularly checksum values, to different memory locations of a computer system without requiring the computer system to halt execution of applications.