Under a coherent memory architecture, all components in a computing system are assured access to the same data values. Memory coherency ensures that data being moved into or out of memory does not appear to have different values when accessed by processors or other components that share memory resources. Under the simplest approach, there is only one copy of any particular piece of data in the entire system at any time, and only one component may access that data at a time. While this scheme guarantees memory coherency, it does not permit memory caching, which is common in modern processor architectures. Since memory caching involves making at least one copy of data stored in system memory and then allowing that copy to be modified outside of system memory, there needs to be a mechanism to ensure that only a valid version of a given piece of data may be accessed. This problem is easily solved for a single-processor system by using one of several well-known memory and cache coherency schemes that are managed by the processor and/or memory controller.
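The need for such a mechanism can be illustrated with a minimal sketch. The `NaiveCache` class and `demo_stale_read` function below are hypothetical names introduced only for illustration; they model a cache that modifies its copy without any coherency scheme, so that a second component reading system memory directly observes an outdated value.

```python
class NaiveCache:
    """Toy cache with no coherency scheme (illustrative assumption only)."""

    def __init__(self, memory):
        self.memory = memory   # shared system memory, modeled as a dict
        self.copies = {}       # locally cached copies, keyed by address

    def read(self, addr):
        # Copy the data out of system memory on first access.
        if addr not in self.copies:
            self.copies[addr] = self.memory[addr]
        return self.copies[addr]

    def write(self, addr, value):
        # The copy is modified outside of system memory; memory is now stale.
        self.copies[addr] = value


def demo_stale_read():
    memory = {0x40: 10}
    cache = NaiveCache(memory)
    cache.read(0x40)                    # processor caches the value
    cache.write(0x40, 99)               # processor modifies its cached copy
    seen_by_processor = cache.read(0x40)
    seen_by_other = memory[0x40]        # another component reads memory directly
    return seen_by_processor, seen_by_other
```

In this sketch the two observers disagree about the value at the same address, which is exactly the situation a coherency scheme is meant to rule out.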
The memory coherency problem becomes more complex in multiprocessor architectures that share a common memory space. Since each processor has its own cache, there needs to be a mechanism to ensure that only coherent atomic memory transactions may be performed, and that there is only one valid copy of a given piece of data at a time. For symmetric agents, such as Intel® 32-bit architecture (IA-32) processors, a bus read-for-ownership transaction is employed to access a memory location; this transaction invalidates all cache lines corresponding to that memory location in other processor caches. This allows the symmetric agent to perform an atomic operation on that memory location, while preventing other symmetric agents from accessing the data until it is written back into its original location in the shared memory and marked as accessible. The IA-64 architecture (e.g., Intel® Itanium® processor) adds to this the concept of guaranteeing cache line ownership. By asserting the OWN# signal during a transaction, an Itanium® processor may instruct the memory controller to ignore memory updates due to an implicit write-back in response to the bus read-for-ownership. In doing so, the Itanium® processor informs the memory controller that the memory controller does not need to write the dirty data back to memory: the processor guarantees that it will claim the dirty data, modify it as needed, and write the data back to memory at some later time. In an Itanium-based system, only the processors (i.e., symmetric agents) have the ability to assert OWN# and claim ownership of a cache line.
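The read-for-ownership behavior described above can be sketched as a toy software model. This is not the actual bus protocol: the `Line`, `Cache`, and `Bus` classes and the `assert_own` parameter are hypothetical names assumed for illustration, with `assert_own=True` standing in for a requester that asserts OWN# during the transaction.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    value: int
    dirty: bool = False      # True when the copy is newer than system memory


@dataclass
class Cache:
    name: str
    lines: dict = field(default_factory=dict)   # address -> Line

    def write(self, addr, value):
        # Only valid once this cache owns the line; the copy becomes dirty.
        self.lines[addr] = Line(value, dirty=True)


class Bus:
    """Toy shared bus plus memory controller (illustrative assumption)."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def read_for_ownership(self, requester, addr, assert_own=False):
        value = self.memory[addr]
        takes_dirty = False
        for cache in self.caches:
            if cache is requester:
                continue
            line = cache.lines.pop(addr, None)   # invalidate the other copy
            if line is not None and line.dirty:
                value = line.value               # implicit write-back data
                if assert_own:
                    takes_dirty = True           # memory update is skipped
                else:
                    self.memory[addr] = value    # controller updates memory
        requester.lines[addr] = Line(value, dirty=takes_dirty)
        return value

    def write_back(self, cache, addr):
        # The owner writes the line back to memory at some later time.
        line = cache.lines[addr]
        self.memory[addr] = line.value
        line.dirty = False
```

A possible usage: attach two caches to a `Bus`, have the first take ownership of an address and write to it, then have the second call `read_for_ownership(..., assert_own=True)`. In this model the second cache receives the dirty value, the first cache's copy is invalidated, and system memory remains unchanged until the second cache calls `write_back`.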
In recent years, increasingly high-performance networking equipment has become available. For example, it is now common for business networks and even some home networks to employ 1 Gigabit per second Ethernet (1 GbE) connections. Even higher data rates are envisioned for the future. To support such high data rates, dedicated input/output (I/O) agents have been introduced. By handling network communication operations that would typically be performed by a communications software stack (e.g., TCP/IP protocols) running on a processor, these I/O agents enable much of the communication workload for a computer system to be off-loaded from the processor, freeing it up to perform other tasks. In addition, next-generation I/O agents will integrate specialized network protocols and security acceleration in dedicated off-load units.
Many modern computer architectures do not provide a mechanism for an I/O agent to guarantee the atomicity of a transaction within a shared, coherent memory space. This limits the flexibility of architectures that employ I/O agents, requiring either that memory segmentation be employed (e.g., the processors and I/O agents access separate memory spaces) or that access to shared memory resources be routed through the processors at some level. Systems employing these I/O agents would benefit significantly from the ability of I/O agents to perform atomic operations in coherent shared memory spaces in a manner similar to that supported by today's processors.