Modern memory systems, e.g., multi-channel Double Data Rate (DDR) memory such as DDR/DDR1 or DDR2 memory architectures, have a significant performance advantage if several memory write accesses can be executed in an arbitrary order. This advantage is greater where the memory accesses also have to be handled by a cache coherency protocol. This protocol keeps the logical view of the memory content coherent in the presence of caches. Before new data is stored to memory, the cache coherency protocol is checked, for each cache line, to determine whether a recently changed version of the corresponding data is present in any cache. If so, the previously modified data is written to memory first. Therefore, the memory write operation can have a different duration for each cache line.
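The coherency check described above can be illustrated with a toy model. The following is a minimal sketch only, not an actual coherency protocol: the function name write_line, the dictionary-based caches, and the cost values are all hypothetical and chosen purely to show why the duration of a write can differ from one cache line to the next.

```python
def write_line(memory, caches, addr, new_data, base_cost=1, writeback_cost=2):
    """Store new_data at addr and return a modeled duration for the write.

    memory : dict mapping address -> data (stands in for main memory)
    caches : list of dicts mapping address -> {"data": ..., "dirty": bool}
    The cost units are arbitrary and assumed for illustration.
    """
    cost = base_cost
    for cache in caches:
        entry = cache.pop(addr, None)      # drop any cached copy of this line
        if entry is not None and entry["dirty"]:
            memory[addr] = entry["data"]   # modified data is written back first
            cost += writeback_cost
    memory[addr] = new_data                # then the new data is stored
    return cost


memory = {}
caches = [{0x40: {"data": "old", "dirty": True}}, {}]

# The line at 0x40 has a modified copy in a cache; the line at 0x80 does not.
durations = [
    write_line(memory, caches, 0x40, "new-a"),
    write_line(memory, caches, 0x80, "new-b"),
]
```

In this model the first write takes longer than the second because a write-back precedes it, which is the variation in duration noted above.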
Permitting memory write accesses to be carried out in arbitrary order would result in a higher average memory throughput. A complication, however, is that any application program or electronic data system that depends on data written by an IO device into memory has to rely on the sequence in which data becomes visible to it. For example, using the InfiniBand protocol, after reception of a data item (e.g., a write access), a “Completion Queue Element” is written to memory; this write signals to an application program, or to an electronic data system, the availability of newly written data.
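The visibility problem can be shown with a small deterministic simulation. This is a sketch under simplifying assumptions: memory is a dictionary, the consumer polls after every write, and the names consumer_sees_stale, "payload" and "cqe" are hypothetical stand-ins rather than InfiniBand data structures.

```python
def consumer_sees_stale(write_order):
    """Apply writes in the given order; poll like a consumer after each one.

    Returns True if the completion flag ever becomes visible while the
    payload is still missing, i.e. the consumer would read stale data.
    """
    memory = {"payload": None, "cqe": 0}
    for target, value in write_order:
        memory[target] = value
        if memory["cqe"] == 1 and memory["payload"] is None:
            return True
    return False


in_order = [("payload", "data"), ("cqe", 1)]    # completion written last
reordered = [("cqe", 1), ("payload", "data")]   # completion overtakes payload

safe = consumer_sees_stale(in_order)
unsafe = consumer_sees_stale(reordered)
```

Only the reordered sequence exposes the consumer to a completion notification that arrives before the data it announces.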
Traditional Ethernet network interfaces are implemented using a buffer descriptor that is written after the received data frame is in memory, thereby signaling the driver, or an executable application program that embodies the driver, that a write has occurred. The PCI-Express standard (where PCI stands for Peripheral Component Interconnect) defines, for a given operation, two modes to express ordering relationships: an access can be either ordered, or marked “relaxed ordered,” with respect to other accesses with the same identifier. The ordering is applied within each Traffic Class (TC). Eight (8) different TCs are available for use in a system implementing PCI Express.
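The two ordering modes and the role of the Traffic Class can be sketched as a simple predicate. This is a deliberately simplified model and not the full PCI-Express ordering rule set: it considers only write accesses, it ignores the identifier mentioned above, and the names PostedWrite and may_pass are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostedWrite:
    traffic_class: int   # 0..7, one of the eight Traffic Classes
    relaxed: bool        # True if the access is marked "relaxed ordered"


def may_pass(later, earlier):
    """Return True if `later` is allowed to complete before `earlier`."""
    if later.traffic_class != earlier.traffic_class:
        # Ordering is applied within a Traffic Class only, so this model
        # places no constraint on writes in different classes.
        return True
    # Within one class, only a relaxed-ordered write may overtake.
    return later.relaxed


payload = PostedWrite(traffic_class=0, relaxed=True)
completion = PostedWrite(traffic_class=0, relaxed=False)

completion_can_overtake = may_pass(completion, payload)
payload_can_overtake = may_pass(payload, completion)
```

In this model an ordered completion write cannot overtake the payload write issued before it in the same Traffic Class, whereas a relaxed-ordered payload write may overtake an earlier write.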
In known memory systems operating in accordance with the PCI-Express protocol, where an Input/Output (IO) device generates a high number of small requests, each resulting in two memory write operations (one for the payload data, another for the completion notification), the ordering scheme is slow because the write completion notification must be written with relaxed ordering switched off, i.e., in ordered mode. The ordering scheme is therefore sequentially dependent, with no parallelism.
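The cost of this sequential dependence can be estimated with a rough timing model. The sketch below rests on assumptions that are not taken from any specification: durations are in arbitrary units, each request is a (payload, completion) pair, and the second function describes a purely hypothetical comparison case in which requests are spread over independent memory channels while each completion still follows its own payload.

```python
import heapq


def fully_ordered_time(requests):
    """Every write waits for the previous one: durations simply add up."""
    return sum(payload + completion for payload, completion in requests)


def per_request_ordered_time(requests, channels):
    """Hypothetical comparison case, assumed for illustration only.

    Each request (payload, then its completion) goes to the currently
    least-loaded channel; different requests may overlap in time.
    """
    loads = [0] * channels
    heapq.heapify(loads)
    for payload, completion in requests:
        start = heapq.heappop(loads)              # earliest free channel
        heapq.heappush(loads, start + payload + completion)
    return max(loads)


# Four small requests; the first payload is slow, e.g. due to a write-back.
requests = [(3, 1), (1, 1), (1, 1), (1, 1)]

serial = fully_ordered_time(requests)
overlapped = per_request_ordered_time(requests, channels=2)
```

With these assumed numbers the fully ordered case takes noticeably longer than the overlapped one, which illustrates the lack of parallelism described above.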