Computer systems are now being built from multiple nodes that communicate over an interconnection network, and in some of these systems each node contains multiple processors. One system requirement is that memory remain coherent across the nodes even though each node is itself a multiprocessor system.
In a typical symmetric multiprocessor (SMP) system, only the processors make requests to memory; the memories do not make requests back to the processors. In a multi-node system, however, other nodes must make requests to a local memory to obtain information held there. The data desired by a remote processor may already be checked out to a local cache. In that case, the local memory must ask the local processor to copy the information back to memory so that it can be used at the requesting node. The problem is to design a system whose interconnect mechanism allows memories to make requests to processors as well as processors to make requests to memory.
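The copy-back situation described above can be illustrated with a minimal sketch. It is not a description of any particular hardware; the names `Cache`, `Memory`, `check_out`, `copy_back`, and `remote_read` are hypothetical and chosen only for this illustration, and a single owner per line is assumed.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    # address -> value currently held by the local processor's cache
    lines: dict = field(default_factory=dict)

    def copy_back(self, addr):
        """Give up the line and return its current value to memory."""
        return self.lines.pop(addr)


@dataclass
class Memory:
    data: dict = field(default_factory=dict)
    # address -> cache that currently has the line checked out (assumed single owner)
    owner: dict = field(default_factory=dict)

    def check_out(self, addr, cache):
        """Processor-to-memory request: hand the line to a local cache."""
        cache.lines[addr] = self.data[addr]
        self.owner[addr] = cache

    def remote_read(self, addr):
        """Request from another node: recall the line first if it is checked out."""
        owner = self.owner.pop(addr, None)
        if owner is not None:
            # Memory-to-processor request: ask the owner to copy the data back.
            self.data[addr] = owner.copy_back(addr)
        return self.data[addr]


mem = Memory(data={0x40: 1})
cpu_cache = Cache()
mem.check_out(0x40, cpu_cache)
cpu_cache.lines[0x40] = 2  # the local processor modifies its cached copy
value_seen_remotely = mem.remote_read(0x40)
```

In this sketch the remote reader receives the modified value because memory recalls the line before answering, which is the memory-to-processor request path the paragraph calls for.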
A coherent memory system, for purposes of this discussion, is a memory system in which, when one processor accesses memory (either a read or a write), all other processors in the system that access the same data obtain the most up-to-date copy of that data at all times. In such a system, each processor always obtains its data from its own local cache. Thus, if the data is not in that cache, it is placed there by the node serving the memory that contains the desired information.
Noncoherent memory operations are those in which data moves directly between memory and the processor and never passes through a cache.
One problem arises when a processor accesses memory on its own node. Such an access is fairly quick (low latency) because the memory is local. If, on the other hand, the processor accesses another node's memory, the access time is fairly long (high latency). The problem is to avoid increasing the latency of the shorter local accesses by making them wait behind accesses to memories at remote nodes.
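One possible way to keep local accesses from queuing behind remote ones is to hold the two kinds of request in separate queues. The sketch below is only an illustration of that idea and is not taken from the text; the class `RequestArbiter`, its methods, and the strict local-first policy are all assumptions, and a real design would also need to keep remote requests from being starved.

```python
from collections import deque


class RequestArbiter:
    """Hypothetical arbiter with one queue per latency class."""

    def __init__(self):
        self.local = deque()   # short-latency requests to this node's memory
        self.remote = deque()  # long-latency requests to other nodes' memories

    def submit(self, request, is_remote):
        (self.remote if is_remote else self.local).append(request)

    def next_request(self):
        # Assumed policy: serve local requests first so they never wait
        # behind a remote access that was submitted earlier.
        if self.local:
            return self.local.popleft()
        if self.remote:
            return self.remote.popleft()
        return None


arb = RequestArbiter()
arb.submit("remote-A", is_remote=True)
arb.submit("local-B", is_remote=False)
order = [arb.next_request(), arb.next_request()]
```

With a single shared queue, `local-B` would have been served only after `remote-A`; with separate queues it is served first.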
Another problem exists in such systems when multiple processors within a node have accessed the same piece of memory for read access, and a processor at one of the nodes now wants that data for write access. The requesting processor must be able to inform all of the other processors efficiently that they must invalidate their copies, so that the processor requesting write access has sole possession of the data. If this is not accomplished quickly, the latency of the write access becomes very large, slowing the entire system.
Compounding the latency problem, in a weakly ordered consistency model a processor must be certain that the other processors have completed their invalidation of the data before it can actually access that data in its cache for writing. Thus, the processor making the write access must send invalidates to all of the other processors that have read access and receive their responses before it performs the write operation. This must be accomplished efficiently so that the write access does not take an inordinate length of time.
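The invalidate-then-write sequence described above can be sketched as follows. This is a simplified, single-threaded illustration rather than an actual protocol; the names `Sharer` and `write_with_invalidation` are hypothetical, and acknowledgements are modeled as plain return values rather than network messages.

```python
class Sharer:
    """Hypothetical processor that holds a read-only copy of the line."""

    def __init__(self, name):
        self.name = name
        self.valid = True

    def invalidate(self):
        self.valid = False
        return self.name  # the acknowledgement sent back to the writer


def write_with_invalidation(cache_line, sharers, new_value):
    # Track which acknowledgements are still outstanding.
    pending = {s.name for s in sharers}
    # Send an invalidate to every processor that has read access.
    acks = [s.invalidate() for s in sharers]
    # Collect the responses.
    for ack in acks:
        pending.discard(ack)
    # The write may proceed only once every invalidate has been acknowledged.
    if pending:
        raise RuntimeError(f"missing acknowledgements from {sorted(pending)}")
    cache_line["value"] = new_value
    return cache_line


readers = [Sharer("P1"), Sharer("P2"), Sharer("P3")]
line = write_with_invalidation({"value": 10}, readers, 11)
```

The write latency in such a scheme is bounded below by the slowest acknowledgement, which is why the text stresses that the invalidation round trip must be efficient.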
In addition, the system must be able to guarantee that there is no situation where the system will deadlock because the resource that is required to complete an operation on a first processor is being consumed by a second processor (or another transaction) and that second processor is waiting on resources that the first processor is holding. This condition is a circular deadlock that must be avoided.
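The circular condition described above is a cycle in a "waits-for" relation among processors or transactions. As a hedged illustration only, the sketch below checks such a relation for a cycle with a depth-first search; the function name `has_circular_wait` and the dictionary representation are assumptions made for this example, and a hardware design would normally prevent such cycles by construction rather than detect them at run time.

```python
def has_circular_wait(waits_for):
    """Return True if the waits-for relation contains a cycle.

    waits_for maps each agent to the agents whose resources it is waiting on.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(node):
        state[node] = GRAY
        for nxt in waits_for.get(node, ()):
            s = state.get(nxt, WHITE)
            if s == GRAY:
                return True  # reached an agent already on the current path
            if s == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in list(waits_for))


# The first processor waits on the second, which waits on the first.
deadlocked = has_circular_wait({"P1": ["P2"], "P2": ["P1"]})
# The second processor is not waiting on anything, so the chain can drain.
safe = has_circular_wait({"P1": ["P2"], "P2": []})
```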
A goal for the interconnect between nodes is to provide sufficient bandwidth so that the interconnect is not the limiting performance factor when executing a program. Due to the memory bandwidth requirements of today's processors, this goal will rarely be met. Therefore, an objective of any design is to provide as much bandwidth as possible for the nodal interconnect without violating other constraints (cost, space, power).