1. Technical Field
The present invention relates generally to computer data cache schemes, and more particularly to a method and apparatus for simultaneously processing a series of data writes from a standard peripheral component interconnect (PCI) device in a system that has multiple data processors and utilizes non-uniform memory access.
2. Description of the Related Art
In computer system designs utilizing more than one processor operating simultaneously in a coordinated manner, data handling from peripheral component interconnect (PCI) devices is controlled so that transactions are processed only one at a time, or in strict order when multiple data output requests are received from one of the PCI devices, regardless of how many such devices the system contains. In a multiprocessor system that uses non-uniform memory access, where system memory may be distributed across multiple memory controllers in a single system, this may limit performance.
A PCI device, such as a hard disk controller, may issue a write command. Any multiprocessor address control system will send an “invalidate” indication for the data line to be written to all caching agents or processors. One past method of handling such invalidates is for a controller to wait until it receives acknowledgments that the data invalidate has been received and then make that data line available for writing. The controller then sends an invalidate of a flag line for the data line that was just made available for writing. In the prior art, many such controllers wait to receive acknowledgments from all memory sources before proceeding, and only then accept the data from the PCI device attempting to write to memory. After such a device writes to the memory management device, that device makes the flag line available. Usually, controllers found in the prior art post write commands only in the same order in which the invalidate commands are issued on a particular PCI bus.
All of this slows the system and therefore reduces performance, both because of component latency and because the system's ability to process multiple data lines while waiting for invalidate indicators from other system processors is not fully utilized.
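The latency cost of the strictly ordered handling described above can be illustrated with a small timing model. The following Python sketch is illustrative only and is not taken from any actual controller. The names `Write`, `serialized_completion_times`, and `overlapped_completion_times`, the cycle counts, and the single combined acknowledgment latency per caching agent (the data line and flag line invalidates are folded into one wait) are all assumptions made for the example. The overlapped variant is a hypothetical point of comparison in which invalidates for several lines are outstanding at once while writes still commit in order.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Write:
    line: str                  # hypothetical label for the data line being written
    ack_latencies: List[int]   # assumed cycles until each caching agent acknowledges


def serialized_completion_times(writes: List[Write], write_cost: int = 1) -> List[Tuple[str, int]]:
    """Strictly ordered handling: the invalidate for a write is not issued
    until the previous write has fully completed."""
    clock = 0
    done = []
    for w in writes:
        clock += max(w.ack_latencies)  # wait for the slowest acknowledgment
        clock += write_cost            # accept the data and commit the write
        done.append((w.line, clock))
    return done


def overlapped_completion_times(writes: List[Write], write_cost: int = 1) -> List[Tuple[str, int]]:
    """Hypothetical comparison: all invalidates are issued at time 0,
    and writes commit in order as their acknowledgments arrive."""
    clock = 0
    done = []
    for w in writes:
        ready = max(w.ack_latencies)   # all acknowledgments for this line are in
        clock = max(clock, ready) + write_cost
        done.append((w.line, clock))
    return done


# Three writes from one PCI device, two caching agents (made-up latencies).
writes = [Write("A", [4, 6]), Write("B", [3, 5]), Write("C", [7, 2])]
serial = serialized_completion_times(writes)
overlap = overlapped_completion_times(writes)
print(serial)   # [('A', 7), ('B', 13), ('C', 21)]
print(overlap)  # [('A', 7), ('B', 8), ('C', 9)]
```

Under these assumed latencies, the serialized model finishes the third write at cycle 21, whereas overlapping the acknowledgment waits would finish it at cycle 9. The difference is the time during which the system could otherwise have been processing additional data lines.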