Multiple-node computing systems are a common way to improve application execution performance. Each node typically includes one or more processors, and execution of an application may be distributed over the nodes of such a computing system. The computing system may include memory that is shared among all the nodes. For example, in a non-uniform memory access (NUMA) computing system, each node has its own local memory, which all the other nodes can also access as remote memory.
Because the nodes may have caches that hold copies of the contents of the computing system's memory, cache coherency, or consistency, typically has to be maintained for the computing system to operate properly. Cache coherency is the process that ensures the contents of memory cached by any given node accurately reflect what is actually stored in that memory. For example, a node may currently and accurately cache the contents of a memory address as the value A. If another node then writes the value B to that same memory address, the copy cached at the first node has to be invalidated, so that this node does not believe that the memory address still stores the value A when in fact it now stores the value B.
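The A/B example above can be sketched as a toy write-invalidate model. This is a minimal illustration under assumed simplifications, not the mechanism of any particular system: memory is a single Python dict shared by all nodes, each node's cache is a dict, writes go straight through to memory, and the names `WriteInvalidateSystem`, `read`, and `write` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Node:
    """A node with a private cache mapping memory address -> cached value."""
    name: str
    cache: Dict[int, int] = field(default_factory=dict)


class WriteInvalidateSystem:
    """Toy write-invalidate coherency model (hypothetical, for illustration only).

    Assumptions: one shared memory dict, write-through writes, and no
    concurrency; a write simply drops the line from every other node's cache.
    """

    def __init__(self, node_names: Iterable[str]) -> None:
        self.memory: Dict[int, int] = {}
        self.nodes = {name: Node(name) for name in node_names}

    def read(self, node_name: str, addr: int) -> int:
        node = self.nodes[node_name]
        if addr not in node.cache:
            # Miss, or the copy was invalidated: refetch the current value.
            node.cache[addr] = self.memory.get(addr, 0)
        return node.cache[addr]

    def write(self, node_name: str, addr: int, value: int) -> None:
        # Invalidate every other node's copy so no node keeps a stale value.
        for other in self.nodes.values():
            if other.name != node_name:
                other.cache.pop(addr, None)
        self.memory[addr] = value
        self.nodes[node_name].cache[addr] = value


if __name__ == "__main__":
    A, B = 1, 2
    system = WriteInvalidateSystem(["node0", "node1"])
    system.write("node1", 0x40, A)
    print(system.read("node0", 0x40))  # node0 now caches A -> 1
    system.write("node1", 0x40, B)     # node0's copy of 0x40 is invalidated
    print(system.read("node0", 0x40))  # node0 refetches B -> 2
```

Without the invalidation loop in `write`, the second read at `node0` would return the stale value A from its cache.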
Maintaining cache coherency, however, can degrade the performance of multiple-node computing systems. When a read-related memory access request is made at a node, the node has to ensure that, if the memory address in question is currently cached, the cached contents of that memory address are valid. Likewise, when a write-related memory access request is made at a node, the computing system has to ensure that the memory address in question is invalidated at every other node that is currently caching its contents. Memory access requests, as used herein, therefore encompass both read-related and write-related requests.
In particular, the so-called modified-exclusive-shared-invalid (MESI) coherency protocol, also known as the Illinois protocol, can hamper the performance of a multiple-node computing system. Under the MESI coherency protocol, every read-related request to a cached memory address that is marked invalid has to be broadcast to all the other nodes of the computing system. This performance penalty has been the motivation for introducing directory-based coherency protocols, notably in cache-coherent NUMA (CC-NUMA) computing systems.
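The broadcast penalty can be sketched by tracking the MESI state of a single cache line at each node. This is a simplified, hypothetical snooping model assumed for illustration: each broadcast is counted as one snoop message per other node, data transfers and write-backs are not counted, and the names `MesiLine` and `messages` are invented here.

```python
from enum import Enum
from typing import List


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class MesiLine:
    """MESI state of one cache line at each of n nodes (hypothetical snooping model)."""

    def __init__(self, n_nodes: int) -> None:
        self.state: List[State] = [State.INVALID] * n_nodes
        self.messages = 0  # snoop messages; data replies and write-backs not counted

    def _broadcast(self, requester: int) -> List[int]:
        # A broadcast reaches every node except the requester.
        others = [i for i in range(len(self.state)) if i != requester]
        self.messages += len(others)
        return others

    def read(self, node: int) -> None:
        if self.state[node] is not State.INVALID:
            return  # M, E or S: read hit, no coherency traffic
        # Line is marked invalid: the read must be broadcast to all other nodes.
        had_copy = False
        for i in self._broadcast(node):
            if self.state[i] is not State.INVALID:
                # A Modified copy would be written back here; every holder ends up Shared.
                self.state[i] = State.SHARED
                had_copy = True
        self.state[node] = State.SHARED if had_copy else State.EXCLUSIVE

    def write(self, node: int) -> None:
        if self.state[node] is State.MODIFIED:
            return  # already the only, dirty copy
        if self.state[node] is State.EXCLUSIVE:
            self.state[node] = State.MODIFIED  # silent upgrade, no broadcast
            return
        # Shared or Invalid: every other copy must be invalidated by broadcast.
        for i in self._broadcast(node):
            self.state[i] = State.INVALID
        self.state[node] = State.MODIFIED


if __name__ == "__main__":
    line = MesiLine(4)
    line.read(0)   # I -> E, broadcast to 3 nodes
    line.read(1)   # I -> S (node 0 drops to S), broadcast to 3 nodes
    line.write(1)  # S -> M, node 0 invalidated, broadcast to 3 nodes
    line.read(0)   # I -> S again, broadcast to 3 nodes
    print(line.messages)  # 12
```

In this model every miss costs n - 1 snoop messages, regardless of how many nodes actually hold the line.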
However, although directory-based coherency protocols improve the performance of multiple-node computing systems, they require additional hardware, which increases the cost of these systems. As a result, directory-based CC-NUMA computing systems, for instance, are typically suitable only for commercial applications in which hardware cost is not a significant constraint. For other applications, such as embedded systems, the hardware cost of employing a directory-based coherency protocol can be prohibitive, so in effect such systems usually incur a performance penalty to maintain cache coherency.
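The trade-off can be sketched with a full-map directory, one common directory organization assumed here for illustration (the text does not specify one). The home node records, for each line, which nodes hold a copy, so a miss sends a request to the directory plus messages only to nodes that actually hold the line; the price is directory storage of one presence bit per node per memory line. The class `DirectoryLine` and the function `full_map_directory_bits` are hypothetical, and data replies and write-backs are not counted.

```python
from typing import Optional, Set


class DirectoryLine:
    """Full-map directory entry for one cache line (hypothetical model)."""

    def __init__(self) -> None:
        self.sharers: Set[int] = set()    # nodes holding a clean copy
        self.owner: Optional[int] = None  # node holding a modified copy, if any
        self.messages = 0                 # requests, forwards and invalidations

    def read(self, node: int) -> None:
        if node in self.sharers or node == self.owner:
            return  # read hit, no coherency traffic
        self.messages += 1  # request to the home directory
        if self.owner is not None:
            self.messages += 1  # forward to the owner, which supplies the data
            self.sharers.add(self.owner)
            self.owner = None
        self.sharers.add(node)

    def write(self, node: int) -> None:
        if node == self.owner:
            return  # write hit on a modified line
        self.messages += 1  # request to the home directory
        holders = set(self.sharers)
        if self.owner is not None:
            holders.add(self.owner)
        holders.discard(node)
        self.messages += len(holders)  # invalidate only the nodes that hold a copy
        self.sharers = set()
        self.owner = node


def full_map_directory_bits(memory_bytes: int, line_bytes: int, n_nodes: int) -> int:
    """Presence-bit storage for a full-map directory: one bit per node per line."""
    return (memory_bytes // line_bytes) * n_nodes


if __name__ == "__main__":
    line = DirectoryLine()
    for op, node in [("read", 0), ("read", 1), ("write", 1), ("read", 0)]:
        getattr(line, op)(node)
    print(line.messages)  # 6
    bits = full_map_directory_bits(4 * 2**30, 64, 16)
    print(bits // 8 // 2**20, "MiB")  # 128 MiB of extra directory storage
```

Under this model, coherency traffic grows with the number of actual sharers rather than with the total number of nodes, while directory storage grows with both memory size and node count: 4 GiB of memory with 64-byte lines and 16 nodes needs 2^30 presence bits, or 128 MiB, which illustrates the added hardware cost.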
For these and other reasons, there is a need for the present invention.