Barrier synchronization is a key primitive of parallel programming. It can be applied either between cores that share a cache or a memory, or between clusters of cores, where each cluster has its own local memory and the clusters are connected by a network.
A hardware synchronization barrier is mentioned in [Benoît Dupont de Dinechin, Pierre Guironnet de Massas, G. Lager et al. “A Distributed Run-Time Environment for the Kalray MPPA-256 Integrated Manycore Processor”, International Conference on Computational Science (ICCS), Volume 18, pages 1654 to 1663, Barcelona, Spain, 2013, Elsevier].
According to this article, each core or cluster of cores has mailboxes that can be configured in a synchronization mode. In this mode, the payload of an incoming message is bitwise OR-ed with the previous content of the mailbox, and a master core is notified only if the new content has all bits set to 1.
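The synchronization mode described above can be illustrated with a small software simulation. The following Python sketch is not taken from the cited article: the names `SyncMailbox`, `receive`, and `run_barrier`, the assignment of one bit per participating core, and the configurable mailbox width are all assumptions made for illustration. It shows only the accumulate-by-OR behavior and the all-ones notification condition.

```python
from dataclasses import dataclass


@dataclass
class SyncMailbox:
    """Hypothetical model of a mailbox in synchronization mode."""
    width: int        # number of bits in the mailbox (assumed: one per participant)
    content: int = 0  # accumulated content, initially all zeros

    def receive(self, payload: int) -> bool:
        """OR the payload into the mailbox.

        Returns True when the new content has all bits set to 1,
        i.e. when the master core would be notified.
        """
        all_ones = (1 << self.width) - 1
        self.content |= payload & all_ones
        return self.content == all_ones


def run_barrier(num_cores: int, arrival_order: list[int]) -> list[bool]:
    """Simulate cores arriving at the barrier in the given order.

    Each core sends a message whose payload has only its own bit set
    (an assumed encoding). The result lists, per message, whether the
    master would be notified.
    """
    mailbox = SyncMailbox(width=num_cores)
    return [mailbox.receive(1 << core) for core in arrival_order]


if __name__ == "__main__":
    # Four cores arrive out of order; only the last arrival completes the mask.
    print(run_barrier(4, [2, 0, 3, 1]))
```

Under this assumed encoding, a repeated message from the same core leaves the mailbox content unchanged, so the notification condition depends only on which cores have arrived and not on how many messages were received.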