Large-scale shared memory multi-processor computer systems typically have a large number of processing nodes (e.g., each with one or more microprocessors and local memory) that cooperate to perform a common task. Such systems often use some type of synchronization construct (e.g., barrier variables or spin locks) to ensure that all executing threads maintain certain program invariants. For example, such computer systems may have some number of nodes that cooperate to multiply a large matrix. To do this in a rapid and efficient manner, such computer systems typically divide the task into discrete parts, each of which is executed by one of the nodes. All of the nodes are synchronized (e.g., by means of barrier variables), however, so that they concurrently execute their corresponding steps of the task. Accordingly, such computer systems do not permit any node to begin executing a subsequent step until all of the other nodes have completed their corresponding prior step.
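The barrier behavior described above can be illustrated with a minimal sketch, in which threads stand in for processing nodes. The function name `run_two_step_task`, the row-interleaved division of work, and the choice of column sums as the second step are assumptions made for illustration only; they are not drawn from any particular system.

```python
import threading


def run_two_step_task(a, b, num_workers):
    """Step 1: C = A x B with rows split across workers. Step 2: column sums of C."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    col_sums = [0] * p
    # One barrier shared by all workers (threads stand in for nodes here).
    barrier = threading.Barrier(num_workers)

    def worker(rank):
        # Step 1: each worker computes only its own rows of C.
        for i in range(rank, n, num_workers):
            for j in range(p):
                c[i][j] = sum(a[i][k] * b[k][j] for k in range(m))
        # No worker may begin step 2 until every worker has finished step 1.
        barrier.wait()
        # Step 2: each worker sums its own columns, reading rows written by others.
        for j in range(rank, p, num_workers):
            col_sums[j] = sum(c[i][j] for i in range(n))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return c, col_sums
```

Without the call to `barrier.wait()`, a worker could begin summing a column before another worker had written its rows of the product, which is the kind of invariant violation the synchronization construct is intended to prevent.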
To maintain synchronization among nodes, many such computer systems use a specialized variable known in the art as a “synchronization variable.” Specifically, each time a node accesses the memory of some other node (referred to as the “home node”) or its own memory (in which case the accessing node also is the home node), the synchronization variable of the home node changes in a predetermined manner (e.g., the synchronization variable may be incremented). Some time thereafter, the home node transmits the changed synchronization variable to the remote nodes, either automatically or in response to requests from the remote nodes. Upon receipt, each remote node determines if the changed synchronization variable satisfies some test condition (e.g., each remote node determines if the synchronization variable equals a predetermined test variable). If the test condition is satisfied, then all remote nodes can continue to the next step of the task. Conversely, if the test condition is not satisfied, then the remote nodes must wait until they subsequently receive a changed synchronization variable that satisfies the test condition. To receive the changed synchronization variable, however, the remote nodes continue to poll the home node.
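The polling scheme just described can be modeled with a small, single-threaded simulation. This is a sketch under stated assumptions rather than a description of any actual system: the names `HomeNode`, `access`, `poll`, and `simulate_polling` are hypothetical, the test condition is taken to be equality with the number of nodes, and each waiting node is assumed to poll only once per arrival (real systems generally poll far more often).

```python
class HomeNode:
    """Hypothetical home node that owns one synchronization variable."""

    def __init__(self):
        self.sync_var = 0
        self.requests_served = 0

    def access(self):
        # Each access changes the variable in a predetermined manner (here, +1).
        self.sync_var += 1

    def poll(self):
        # A remote node requests the current value; the home node must serve it.
        self.requests_served += 1
        return self.sync_var


def simulate_polling(num_nodes):
    """Return (released, polls_served) after all nodes have reached the barrier."""
    home = HomeNode()
    test_value = num_nodes  # assumed test condition: sync_var == num_nodes
    waiting = 0
    released = False
    for _ in range(num_nodes):
        # One more node finishes its step and accesses the home node.
        home.access()
        waiting += 1
        # Every node already waiting polls once and applies the test condition.
        results = [home.poll() == test_value for _ in range(waiting)]
        released = all(results)
    return released, home.requests_served
```

Under these assumptions, `simulate_polling(4)` serves 1 + 2 + 3 + 4 = 10 poll requests before the test condition is satisfied, and `simulate_polling(16)` serves 136, so even this optimistic model grows roughly with the square of the number of nodes.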
Undesirably, these repeated multidirectional transmissions and corresponding coherence operations can create a network hotspot at the home node because, among other reasons, the rate at which requests arrive at the home node typically is much higher than the rate at which the home node can service them. Compounding this problem, the total number of repeated transmissions and remote node requests increases as the number of nodes in a large-scale shared memory multi-processor computer system increases. Such repeated transmissions and requests thus can congest data transmission paths, consequently degrading system performance.