The implementation of complete multiprocessor systems, and, in particular, Symmetrical Multiprocessing (SMP) systems, in a single monolithic device has grown in popularity in recent years, fueled by the increasing density of VLSI devices and the emergence of increasingly complex computation tasks, such as those required for real-time machine vision. In some multiprocessor systems, memory resources are shared by a plurality of processors. Such sharing, however, may create memory coherency issues and performance bottlenecks.
In U.S. Pat. No. 7,529,799, whose disclosure is incorporated herein by reference, the inventors present a distributed system structure for a large SMP system, using a bus-based cache-coherence protocol. The distributed system structure contains an address switch, multiple memory subsystems, and multiple master devices (processors, I/O agents, or coherent memory adapters), organized into a set of nodes supported by a node controller. The node controller receives transactions from a master device, communicates with a master device as another master device or as a slave device, and queues transactions received from a master device. Since the achievement of coherency is distributed in time and space, the node controller helps to maintain cache coherency. In addition, a transaction tag format for a standard bus protocol is expanded to ensure that unique transaction tags are maintained throughout the system. A sideband signal is used for intervention and Reruns to preserve transaction tags at the node controller in certain circumstances.
In U.S. Pat. No. 7,237,071, whose disclosure is incorporated herein by reference, an SMP system having a parallel multiprocessing architecture composed of identical processors and including a single program memory is presented. Program access arbitration logic supplies an instruction to a single requesting central processing unit at a time. Shared memory access arbitration logic can supply data from separate simultaneously accessible memory banks or arbitrate among central processing units for access. The system may simulate an atomic read/modify/write instruction as follows: after a read access to one of a predetermined set of addresses in the shared memory, access to that address by any other central processing unit is prohibited for a predetermined number of memory cycles.
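By way of illustration only, the read-triggered lockout described above may be modeled in software roughly as follows. This is a minimal sketch and not the implementation of the cited patent: the class name LockoutArbiter, its methods, the cycle counter, and the choice to let the reading processor keep access during the lockout window are all assumptions introduced here.

```python
class LockoutArbiter:
    """Toy model of a simulated atomic read/modify/write (hypothetical names)."""

    def __init__(self, lock_addresses, lock_cycles):
        self.lock_addresses = set(lock_addresses)  # predetermined set of addresses
        self.lock_cycles = lock_cycles             # predetermined lockout length
        self.memory = {}                           # address -> stored value
        self.locks = {}                            # address -> (owner_cpu, expiry_cycle)
        self.cycle = 0                             # current memory cycle

    def tick(self, n=1):
        """Advance the model by n memory cycles."""
        self.cycle += n

    def _blocked(self, cpu, addr):
        """Return True if another CPU holds an unexpired lockout on addr."""
        lock = self.locks.get(addr)
        if lock is None:
            return False
        owner, expiry = lock
        if self.cycle >= expiry:
            del self.locks[addr]  # lockout window has elapsed
            return False
        return owner != cpu

    def read(self, cpu, addr):
        """Read addr; returns None if access is currently prohibited."""
        if self._blocked(cpu, addr):
            return None
        if addr in self.lock_addresses:
            # Assumption: a read of a designated address starts (or restarts)
            # the lockout window for every other CPU.
            self.locks[addr] = (cpu, self.cycle + self.lock_cycles)
        return self.memory.get(addr, 0)

    def write(self, cpu, addr, value):
        """Write addr; returns False if access is currently prohibited."""
        if self._blocked(cpu, addr):
            return False
        self.memory[addr] = value
        return True
```

In this sketch, a processor that reads a designated address and writes the modified value back within the lockout window completes its read/modify/write without interference, while accesses to that address by other processors are refused until the window expires.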