1. Field of the Invention
The present invention generally relates to offloading transactional memory accesses to a globally coherent memory system that is shared between compute nodes. More specifically, the accesses are offloaded from the processors in a compute node to a transactional memory agent that resides near the compute node.
2. Description of the Related Art
Current multiprocessor memory systems typically contain a globally coherent memory controller. The purpose of the globally coherent memory controller is to maintain coherency of the memory image across the cache hierarchy of all processors in the system. Today, multiprocessor systems frequently maintain multiple copies of pieces of data in separate locations. If one processor were to overwrite one copy of the data and not the other copies, the system would lose coherency. Thus, to maintain coherency of data throughout the multiprocessor system, the state of data residing in memory must be managed throughout the entire multiprocessor system. This is referred to as managing the coherence of a globally shared memory.
Shared memory is memory that is typically accessed by one or more processors, and that may be accessed via a common address space shared with a plurality of other processors in a computer system. In some computer architectures, shared memory is managed such that all copies of its data that may be resident in the cache hierarchies of one or more processors across the global domain of a multiprocessor computer system are kept consistent via a global coherence protocol enforced by a globally coherent memory controller.
Globally coherent memory controller functions may be distributed across one or more nodes in a multiprocessor computer system. Each individual node containing physical memory associated with a globally shared memory system may perform the globally coherent memory controller functions associated with the data physically local to that node. In particular, a globally coherent memory controller tracks the location or locations, and the state, of individual pieces of its local data.
The globally coherent memory controller also associates a state with each individual piece of data. Commonly, coherently shared data states include at least a “shared state” and an “exclusive state”. Data that is in the shared state can be read but not written. There may be one or more copies of a particular piece of data in the shared state that are cached in one or more of the processors in a globally coherent shared memory system. Data that is in the exclusive state has one and only one owner. Conventionally, the owner of data in the exclusive state is a processor. Data that is in the exclusive state can be read or written only by the owner of that data.
Data in the shared state is therefore referred to as “read only” data, and data in the exclusive state is referred to as “writeable” data. Processors containing a copy of shared memory in a shared state are commonly referred to as sharers.
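The access rules for the two states described above can be summarized in a short Python sketch. The names `State`, `can_read`, and `can_write` are hypothetical and are introduced here only for illustration; the sketch assumes that a processor must hold a cached copy of the data in order to read it.

```python
from enum import Enum


class State(Enum):
    SHARED = "shared"        # "read only"; may be cached by many sharers
    EXCLUSIVE = "exclusive"  # "writeable"; one and only one owner


def can_read(state, requester, holders):
    """Return True if `requester` may read data cached by `holders`."""
    # Assumption: in either state, only a processor holding a copy reads it.
    # In the exclusive state the single holder is the owner.
    return requester in holders


def can_write(state, requester, holders):
    """Return True if `requester` may write the data."""
    # Writes require the exclusive state and sole ownership.
    return state is State.EXCLUSIVE and holders == {requester}
```

Under these assumptions, a sharer can read but not write, and only the owner of exclusive data can do both.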
Conventionally, when a processor wishes to write data that is currently in the shared state, that processor must first become the owner of the data. Before the data can be written, it must be transitioned to the exclusive state. The globally coherent memory controller administers the transition of data from the shared state to the exclusive state.
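The transition from the shared state to the exclusive state might be modeled as follows. This is a minimal sketch, not a description of any particular controller: `DirectoryEntry` and its methods are hypothetical names, and the sketch assumes the controller keeps a per-datum record of which processors hold a copy.

```python
class DirectoryEntry:
    """Hypothetical per-datum record kept by a coherent memory controller."""

    def __init__(self):
        self.state = "shared"
        self.holders = set()  # processors currently caching the data

    def add_sharer(self, cpu):
        """Record that `cpu` has obtained a read-only copy."""
        if self.state != "shared":
            raise RuntimeError("data is exclusively owned")
        self.holders.add(cpu)

    def upgrade(self, cpu):
        """Make `cpu` the sole owner; return the sharers that lose their copy."""
        invalidated = sorted(self.holders - {cpu})
        self.holders = {cpu}
        self.state = "exclusive"
        return invalidated


entry = DirectoryEntry()
for cpu in ("cpu0", "cpu1", "cpu2"):
    entry.add_sharer(cpu)

# cpu1 wants to write, so every other sharer must give up its copy first.
invalidated = entry.upgrade("cpu1")
```

In this sketch the write by `cpu1` is permitted only after `upgrade` has removed the other sharers, which mirrors the requirement that the writer first become the owner.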
The globally coherent memory controller enforces a coherence protocol, frequently using “snoops” or “probes”. A snoop or probe is a query used by the globally coherent memory controller to check or change the status or state of data contained in shared memory. Examples of probes sent from the globally coherent memory controller include querying a processor as to whether it has a copy of a particular piece of data, querying a processor running software as to whether it has modified writeable data, commanding a processor to give up or delete a piece of data, and changing the state of a particular piece of data in a processor's cache hierarchy. Probes are part of a global coherence protocol that maintains coherence of the global memory. A global coherence protocol includes rules that govern when certain data may be shared, and when certain data must be exclusive.
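The four example probes listed above can be illustrated with a small Python sketch of a cache responding to them. The `Probe` and `Cache` names, and the representation of a cached line as a dictionary with `state` and `dirty` fields, are assumptions made for illustration only; real protocols define their own message sets.

```python
from enum import Enum, auto


class Probe(Enum):
    HAS_COPY = auto()     # does the cache hold a copy of this data?
    IS_MODIFIED = auto()  # has writeable data been modified?
    INVALIDATE = auto()   # give up (delete) the data
    DOWNGRADE = auto()    # change the data's state to shared


class Cache:
    """Hypothetical processor cache: address -> {"state", "dirty"}."""

    def __init__(self):
        self.lines = {}

    def handle_probe(self, probe, addr):
        """Apply `probe` to `addr`; return True if the line was present/affected."""
        line = self.lines.get(addr)
        if probe is Probe.HAS_COPY:
            return line is not None
        if probe is Probe.IS_MODIFIED:
            return line is not None and line["dirty"]
        if probe is Probe.INVALIDATE:
            return self.lines.pop(addr, None) is not None
        if probe is Probe.DOWNGRADE:
            if line is None:
                return False
            line["state"] = "shared"
            # Simplification: assumes modified data was written back first.
            line["dirty"] = False
            return True
        raise ValueError(f"unknown probe: {probe}")
```

The first two probes only check status, while the last two change it, matching the "check or change" distinction drawn above.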
Current transactional memory based communications require a processor to manage its own transactional requests for data being read from or written to shared memory. In conventional transactional memory implementations, when errors or failures occur in a transactional memory data request, the processor must handle the failure, which may require the processor to abort execution of the software running at the time of the failure.
The hardware addressing capabilities of a particular processor are another limitation of conventional transactional memory implementations. A particular commodity processor's hardware addressing capabilities may be limited by the number of address bits that the processor has.
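The relationship between address bits and addressable memory is simple arithmetic: a processor with n address bits can directly address at most 2^n bytes, assuming byte addressing. The sketch below uses hypothetical helper names, and the bit widths and memory sizes in the usage example are illustrative values only, not figures taken from any particular processor.

```python
def addressable_bytes(address_bits):
    """Maximum number of byte addresses reachable with `address_bits` bits."""
    return 1 << address_bits


def fits(address_bits, memory_bytes):
    """Return True if `memory_bytes` of memory can be addressed directly."""
    return memory_bytes <= addressable_bytes(address_bits)


TIB = 1024 ** 4  # one tebibyte, in bytes

# Illustrative only: 40 address bits cover exactly 1 TiB,
# so a 2 TiB globally shared memory would not be fully addressable.
one_tib_ok = fits(40, 1 * TIB)
two_tib_ok = fits(40, 2 * TIB)
```

Each additional address bit doubles the reachable memory, so a shared memory larger than 2^n bytes cannot be addressed directly by such a processor.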
New methods that overcome the limitations of processors performing their own transactional memory transactions are needed in order to support emerging ‘big data’ applications.