1. Field of the Invention
The present invention generally relates to data processing techniques and, in particular, to a system and method for efficiently building a checksum of a checksum set without restricting accessibility of the non-checksum values within the checksum set.
2. Related Art
Large computer systems (e.g., servers) often employ a plurality of memory units to provide enough instruction and data memory for various applications. Each memory unit has a large number of memory locations of one or more bits where data can be stored, and each memory location is associated with and identified by a particular memory address, referred to hereafter as a "memory unit address." When an instruction that stores data is executed, a bus address defined by the instruction is used to obtain a memory unit address, which identifies the memory location where the data is actually to be stored. In this regard, a mapper is often employed that maps or translates the bus address into a memory unit address having a different value than the bus address. There are various advantages associated with utilizing bus addresses that are mapped into different memory unit addresses.
For example, many computer applications are programmed such that the bus addresses are used consecutively. In other words, one of the bus addresses is selected as the bus address to be first used to store data. When a new bus address is to be utilized for the storage of data, the new bus address is obtained by incrementing the previously used bus address.
If consecutive bus addresses are mapped to memory unit addresses in the same memory unit, then inefficiencies may occur. In this regard, a finite amount of time is required to store and retrieve data from a memory unit. If two consecutive data stores occur to the same memory unit, then the second data store may have to wait until the first data store is complete before the second data store may occur. However, if the two consecutive data stores occur in different memory units, then the second data store may commence before the first data store is complete. To minimize memory latency and maximize memory bandwidth, consecutive bus addresses should access as many memory units as possible. This can also be described as maximizing the memory interleave.
As a result, the aforementioned mapper is often designed to map the bus addresses to the memory unit addresses such that each consecutive bus address is translated into a memory unit address in a different memory unit. For example, a bus address having a first value is mapped to a memory unit address identifying a location in a first memory unit, and the bus address having the next highest value is mapped to a memory unit address identifying a location in a second memory unit. Therefore, it is likely that two consecutive data stores from a single computer application do not occur in the same memory unit. In other words, it is likely that consecutive data stores from a computer application are interleaved across the memory units.
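The interleaved mapping described above can be illustrated with a minimal sketch. The modulo-based scheme and the name `map_bus_address` are assumptions chosen for illustration; the text does not specify how the mapper actually computes a memory unit address.

```python
def map_bus_address(bus_address: int, num_units: int) -> tuple[int, int]:
    """Map a bus address to (memory unit index, memory unit address).

    Assumed scheme: consecutive bus addresses rotate through the memory
    units, so back-to-back stores land in different units.
    """
    unit_index = bus_address % num_units      # which memory unit
    unit_address = bus_address // num_units   # location within that unit
    return unit_index, unit_address


# Consecutive bus addresses are spread across four hypothetical memory units.
for bus_address in range(6):
    print(bus_address, map_bus_address(bus_address, 4))
```

Under this assumed scheme, any two consecutive bus addresses map to different memory units, so the second store need not wait for the first to complete.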
Backup systems are often employed to enable the recovery of data in the event of a failure of one of the memory units. For example, U.S. Pat. No. 4,849,978, which is incorporated herein by reference, describes a checksum backup system that may be used to recover the data of a failed memory unit. To back up data stored within the memory units of a typical computer system, one of the memory units in the computer system is designated as a checksum memory unit. Each location in the checksum memory unit is correlated with locations in the other non-checksum memory units. During operation, a checksum value is maintained in each memory location of the checksum memory unit according to techniques that will be described in more detail hereinbelow. Each checksum value may be utilized to recover any of the non-checksum data values stored in any of the memory locations correlated with the checksum memory location that is storing the checksum value. The checksum value stored in a checksum memory location and each of the non-checksum values stored in a location correlated with the checksum memory location shall be collectively referred to herein as a "checksum set."
Each location in the checksum memory unit is initialized to zero. Each data value being stored in a location of one of the non-checksum memory units is exclusively ored with the data value previously stored in the location of the one non-checksum memory unit. In other words, the data value being stored via a data store operation is exclusively ored with the data value being overwritten via the same data store operation. The result of the exclusive or operation is then exclusively ored with the value, referred to as the "checksum," in the correlated address of the checksum memory unit. The result of the foregoing exclusive or operation is then stored in the foregoing address of the checksum memory unit as a new checksum value.
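The checksum maintenance just described can be sketched as follows. Modeling each memory unit as a Python list and the function name `store` are assumptions made for illustration only, and the non-checksum locations are assumed to start at zero so that the zero-initialized checksum is consistent from the outset.

```python
def store(units, checksum_unit, unit, location, new_value):
    """Store new_value and fold the change into the correlated checksum."""
    old_value = units[unit][location]
    units[unit][location] = new_value
    # XOR of the overwritten and new values, then XOR into the checksum.
    checksum_unit[location] ^= old_value ^ new_value


# Three hypothetical non-checksum units with one location each, all zero.
units = [[0], [0], [0]]
checksum_unit = [0]  # each checksum location is initialized to zero

store(units, checksum_unit, 0, 0, 3)
store(units, checksum_unit, 1, 0, 10)
store(units, checksum_unit, 2, 0, 7)
store(units, checksum_unit, 1, 0, 12)  # overwrite an earlier value
```

After any sequence of such stores, the checksum location holds the exclusive or of the current values in all correlated locations.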
When a memory unit fails, the data value stored in a location of the failed memory unit can be recovered by exclusively oring the checksum in the correlated location of the checksum memory unit with each of the values in the other memory units that are stored in locations also correlated with the location of the checksum. The process of maintaining a checksum and of recovering a lost data value based on the checksum is generally well known in the art.
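As a worked illustration of the recovery step, the following sketch recomputes a lost value from the checksum and the surviving values. The list-based memory model and the name `recover` are hypothetical and introduced only for this example.

```python
def recover(units, checksum_unit, failed_unit, location):
    """Recover the value lost at `location` of the failed memory unit."""
    value = checksum_unit[location]
    for index, unit in enumerate(units):
        if index != failed_unit:
            # XOR out every surviving value of the checksum set.
            value ^= unit[location]
    return value


units = [[3], [10], [7]]
checksum_unit = [3 ^ 10 ^ 7]

units[1][0] = None  # simulate the loss of unit 1's data
recovered = recover(units, checksum_unit, failed_unit=1, location=0)
```

Because each surviving value cancels itself out of the checksum under exclusive or, the remainder is the lost value.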
During a recovery of a lost data value in a checksum set, many computer systems replace the checksum of the checksum set with the recovered data value. Since the checksum set no longer includes a checksum, the data values within the checksum set cannot be recovered in the event of another memory unit failure unless additional steps are taken to back up the data of the checksum set. An example of additional steps that may be taken to back up the checksum set includes installing an additional memory unit and storing a checksum of the checksum set in the additional memory unit.
However, building a checksum can be complicated if the computer system is allowed to continue data stores to the checksum set during the checksum building. In this regard, data stores to the memory locations of the checksum set may change the values of the checksum set while the checksum is being built. If care is not taken to ensure that the checksum is appropriately updated to account for such updates to the checksum set, it is possible for the checksum to be inconsistent with the non-checksum values of the checksum set. Thus, to prevent errors in the checksum build process, most computer systems prohibit data writes to any memory location storing a non-checksum data value of the checksum set once the checksum build process is initiated. When the checksum build process is completed, data writes to the memory locations of the checksum set are again enabled. However, the inability of the computer system to service write requests to the checksum set during the checksum build process reduces the overall efficiency of the computer system.
Thus, a heretofore unaddressed need exists in the industry for providing a system and method for building a checksum for a checksum set within a computer system without requiring the computer system to temporarily stop servicing write requests that overwrite the data values of the checksum set.
The present invention overcomes the inadequacies and deficiencies of the prior art as discussed hereinbefore. Generally, the present invention provides a system and method for efficiently building a checksum of various data values that are stored in different memory units of a computer system. During the checksum build process, data stores to the memory locations storing the various data values are enabled, thereby enabling the checksum to be built without significantly impacting the performance of the computer system.
In architecture, the checksum building system of the present invention utilizes a plurality of memory units, a plurality of memory controllers, and an indicator. Each of the memory units has a plurality of memory locations for storing data, and each of the memory controllers is configured to access memory locations within a respective one of the memory units. One of the memory controllers is configured to build a checksum in one of the memory locations, and the indicator indicates which of the other memory controllers are enabled for updating the one memory location of the checksum.
In building the checksum, the one memory controller may be configured to perform the following steps: setting the indicator to indicate that each of the other memory controllers is disabled from updating the one memory location; transmitting, subsequent to the setting step, read-for-rebuild requests to each of the other memory controllers; receiving rebuild values that have been retrieved from the memory units in response to the read-for-rebuild requests; updating the one memory location with each of the rebuild values; and changing, for each rebuild value received by the one memory controller, the indicator to indicate that the transmitting memory controller is enabled for updating the one memory location.
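The build steps above can be sketched as a small single-threaded simulation. The sketch rests on several assumptions not stated in the text: the checksum location starts at zero, the indicator is modeled as a set of enabled controllers, and a checksum update arriving from a disabled controller is simply discarded because the later read-for-rebuild will pick up the newly stored value. The class and method names are hypothetical.

```python
class ChecksumBuilder:
    """Builds one checksum location while data stores remain enabled."""

    def __init__(self, units, location):
        self.units = units                  # non-checksum units (lists)
        self.location = location
        self.checksum = 0                   # assumed zero at build start
        self.enabled = set()                # indicator: all disabled
        self.pending = list(range(len(units)))

    def store(self, unit, new_value):
        """A data store issued while the build is in progress."""
        old_value = self.units[unit][self.location]
        self.units[unit][self.location] = new_value
        if unit in self.enabled:
            # The rebuild value was already folded in, so apply the update.
            self.checksum ^= old_value ^ new_value
        # Otherwise the update is dropped (assumption): the pending
        # read-for-rebuild will return the new value.

    def rebuild_step(self):
        """Service one read-for-rebuild; return True if more remain."""
        unit = self.pending.pop(0)
        self.checksum ^= self.units[unit][self.location]  # rebuild value
        self.enabled.add(unit)              # enable that controller
        return bool(self.pending)


units = [[3], [10], [7]]
builder = ChecksumBuilder(units, location=0)
builder.store(2, 8)       # unit 2 still disabled: no checksum update
builder.rebuild_step()    # folds in unit 0 and enables it
builder.store(0, 1)       # unit 0 enabled: update is applied
while builder.rebuild_step():
    pass
```

In this simulation the stores are never blocked, yet the finished checksum equals the exclusive or of the values currently held in the checksum set.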
Other features and advantages of the present invention will become apparent to one skilled in the art upon examination of the following detailed description, when read in conjunction with the accompanying drawings. It is intended that all such features and advantages be included herein within the scope of the present invention and protected by the claims.