This invention relates in general to memory accesses in a multi-node, multi-processor, cache-coherent non-uniform access system, and relates in particular to a system and method for performing a write purge operation in such a system.
The Direct Memory Access (DMA) write operations in the standard Scalable Coherent Interface (SCI) specification completely overwrite the cache lines. If the DMA devices are updating only some of the bytes in a line, the information held in memory for the other bytes is lost in the complete overwrite.
The DMA write operation also relies on software to maintain cache coherency. If two devices are writing the same line, the second device can believe that it has finished purging all caches that contain the old data, even though lines are still being purged because of the first device. The second device can then allow other software to read stale data, and the use of stale data by the software will cause program errors.
Other prior art methods for writing partial lines rely on reading the line into a local cache before the specified bytes are updated. This results in other desirable data being swapped out of the cache because of conflicts with the stored read data. In the end, this results in poor performance of other processes that are currently running because of the extra memory operations and the latency associated with refetching affected data.
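The eviction effect described above can be illustrated with a minimal sketch. It assumes a direct-mapped cache, which the source does not specify; the class name DirectMappedCache and its load method are hypothetical and chosen only for illustration.

```python
class DirectMappedCache:
    """Toy direct-mapped cache (an assumption; the source names no cache organization)."""

    def __init__(self, num_sets):
        self.num_sets = num_sets
        self.lines = {}  # set index -> address of the line currently held there

    def load(self, addr):
        """Bring line `addr` into the cache; return the evicted line address, or None."""
        idx = addr % self.num_sets
        resident = self.lines.get(idx)
        if resident == addr:
            return None  # already cached, nothing is displaced
        self.lines[idx] = addr
        return resident  # None if the slot was empty, else the displaced line


cache = DirectMappedCache(num_sets=4)
cache.load(1)            # line 1 holds data another process still wants
evicted = cache.load(5)  # reading line 5 only to update a few of its bytes
```

In this sketch, reading line 5 for a partial update displaces line 1 because both map to the same slot, so the other process must later refetch line 1.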
Therefore, there is a need in the art for a method and system that has a write command that does not allow the use of stale data by the software.
In addition, there is a need in the art for a method and system that does not require reading memory lines into a local cache before updating the memory bytes.
These and other objects and features are achieved in a system that follows the same general flow as the DMA write described in the SCI specification. However, the inventive system and method do not detach the cache sharing list from memory. Instead, the node issuing the write purge joins the sharing list. This prevents a write purge from another node from believing it has finished its operation while memory lines are still encached. If no sharing list exists, a mask supplied by the command is used to merge the new data into memory. The system and method track down stale data in remote caches and merge it into the memory line using a mask instead of discarding it.
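The masked merge can be sketched as follows. This is an illustration only: the 64-byte line size, the one-bit-per-byte mask encoding, and the function name merge_with_mask are assumptions that do not come from the source or from the SCI specification.

```python
LINE_BYTES = 64  # assumed line size; the source does not state one


def merge_with_mask(old_line: bytes, new_data: bytes, mask: int) -> bytes:
    """Merge new_data into old_line under a byte mask.

    Bit i of `mask` set   -> byte i is taken from new_data.
    Bit i of `mask` clear -> byte i of old_line is preserved.
    (This one-bit-per-byte encoding is an assumption.)
    """
    if len(old_line) != LINE_BYTES or len(new_data) != LINE_BYTES:
        raise ValueError("both buffers must be exactly one line long")
    return bytes(
        new_data[i] if (mask >> i) & 1 else old_line[i]
        for i in range(LINE_BYTES)
    )


old = bytes(range(LINE_BYTES))    # existing contents of the memory line
new = bytes([0xFF] * LINE_BYTES)  # write buffer; only bytes 0..3 are meaningful
merged = merge_with_mask(old, new, mask=0b1111)
```

Here bytes 0 through 3 are replaced and bytes 4 through 63 keep their previous values, whereas a complete overwrite would have replaced all 64 bytes.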
One technical advantage of the present invention is to issue a write purge command that joins the new node to the sharing list, while maintaining the connection between the memory and the sharing list.
Another technical advantage of the present invention is to have the new node issue the purging command to each node in the sharing list, while maintaining the connection between the memory and the sharing list.
A further technical advantage of the present invention is to have the new node issue the collapsing command to separate the sharing list from the memory after the purging command has been issued to each node. The collapsing command completes the destruction of the sharing list.
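The join, purge, and collapse steps named above can be sketched as a toy model. It assumes the sharing list can be represented as a simple Python list with the newest node at the head; the names MemoryLine and write_purge are hypothetical, and the model omits the data transfer and the actual SCI transactions.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryLine:
    # Node IDs on the sharing list, head first. An empty list means
    # memory is not attached to any sharing list.
    sharing_list: list = field(default_factory=list)


def write_purge(line, new_node, cached):
    """Toy model: join the list, purge every other sharer, then collapse.

    `cached` is the set of node IDs currently holding a copy of the line.
    Returns a log of (step, node, memory_still_attached) tuples.
    """
    log = []
    # Join: the new node becomes head while memory stays attached to the list.
    line.sharing_list.insert(0, new_node)
    log.append(("join", new_node, bool(line.sharing_list)))
    # Purge: invalidate each remaining sharer; memory is still attached throughout.
    while len(line.sharing_list) > 1:
        victim = line.sharing_list.pop(1)
        cached.discard(victim)
        log.append(("purge", victim, bool(line.sharing_list)))
    # Collapse: only now is the sharing list separated from memory.
    line.sharing_list.clear()
    log.append(("collapse", new_node, bool(line.sharing_list)))
    return log


line = MemoryLine(["B", "C"])
cached = {"B", "C"}
log = write_purge(line, "A", cached)
```

In this model the third field of each log entry stays true until the collapse step, which reflects the point of the design: any other node inspecting the line during the purge still sees a sharing list attached to memory and so cannot conclude that the line is no longer encached.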
A further technical advantage of the present invention is to use a write mask with the write purge command.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and the specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.