The system of FIG. 1 is a prototypical prior art symmetric multiprocessor (SMP) system 100. This traditional approach provides uniform access to memory 130 over a shared system bus 110. Each processor 120 has an associated cache and cache controller. The caches are individually managed according to a common cache coherency protocol to ensure that all software is well behaved. The caches continually monitor (snoop) the shared system bus 110, watching for cache updates and other system transactions. Transactions are often decomposed into different component stages, controlled by different system bus signals, such that different stages of multiple transactions may be overlapped in time to permit greater throughput. Nevertheless, for each stage, successive transactions make sequential use of the shared bus. The serial availability of the bus ensures that transactions are performed in a well-defined order. Without strong transaction ordering, cache coherency protocols fail and system and application software will not be well behaved.
A first problem with the above-described traditional SMP system is that the serial availability of the bus limits the scalability of the SMP system. As more processors are added, eventually system performance is limited by the saturation of the shared system bus.
A second problem of traditional SMP systems is that multiple cycles are required to process each transaction. This is partially attributable to the use of multi-point tri-state busing of lightly pipelined transactions.
A third problem arises in existing SMP systems that use pipelined bus structures. Difficulties may arise from permitting an initiator to perform locked operations. Normally, a simple priority scheme (such as a rotating priority) is used to permit all initiators to generate transactions on an equal access basis. Locked operations permit a transaction initiator to make a number of successive transactions without surrendering the bus to other initiators in the short term. This is necessary to implement semaphores used to prevent race and deadlock conditions. Unfortunately, interactions between such locked operations and simple bus priority schemes may result in an initiator being starved for access for excessive periods.
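The interaction between a rotating priority and locked operations can be illustrated with a small simulation. The following is only a toy sketch under stated assumptions, not a model of any particular bus: every initiator requests the bus on every cycle, and one initiator (the hypothetical `locker`) keeps the bus for `lock_len` additional cycles each time it wins arbitration. The names `rotating_arbiter`, `simulate`, `lock_len`, and `locker` are illustrative and do not come from the system described above.

```python
from collections import Counter

def rotating_arbiter(requests, last_winner, n):
    """Grant the first requesting initiator that follows last_winner in rotation."""
    for offset in range(1, n + 1):
        candidate = (last_winner + offset) % n
        if candidate in requests:
            return candidate
    return None  # no initiator is requesting

def simulate(n, cycles, lock_len=0, locker=None):
    """Count grants per initiator when all n initiators request every cycle.

    Assumption: `locker` holds the bus for lock_len extra cycles after each win.
    """
    grants = Counter()
    last = n - 1   # so that initiator 0 wins the first arbitration
    hold = 0       # remaining cycles of the current locked sequence
    for _ in range(cycles):
        if hold > 0:
            winner = last          # bus is locked; no arbitration takes place
            hold -= 1
        else:
            winner = rotating_arbiter(set(range(n)), last, n)
            if winner == locker:
                hold = lock_len    # begin a locked sequence
        grants[winner] += 1
        last = winner
    return grants

fair = simulate(4, 40)                          # plain rotating priority
locked = simulate(4, 40, lock_len=7, locker=0)  # initiator 0 uses locked operations
print(dict(fair))
print(dict(locked))
```

Under these assumptions the plain rotation grants each of the four initiators 10 of the 40 cycles, whereas with locking initiator 0 receives 31 grants and each other initiator receives only 3, waiting 10 cycles between its grants.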
What is needed is an SMP system architecture that provides greater scalability by permitting concurrent use of multiple buses, while still providing a system serialization point to maintain strong transaction ordering and cache coherency. What is also needed is an SMP architecture that further provides increased transaction throughput. Additionally, an SMP architecture is needed that enables locked operations while preventing initiator starvation.
A preferred embodiment of a symmetric multiprocessor system includes a switched fabric (switch matrix) for data transfers that provides multiple concurrent buses that enable greatly increased bandwidth between processors and shared memory. A Transaction Controller, Transaction Bus, and Transaction Status Bus are used for serialization, centralized cache control, and highly pipelined address transfers. The shared Transaction Controller serializes transaction requests from Initiator devices that can include CPU/Cache modules and Peripheral Bus modules. The Transaction Bus of an illustrative embodiment is implemented using segmented buses, distributed muxes, and point-to-point wiring, and supports transaction processing at a rate of one transaction per clock cycle. The Transaction Controller monitors the Transaction Bus, maintains a set of duplicate cache-tags for all CPU/Cache modules, maps addresses to Target devices, performs centralized cache control for all CPU/Cache modules, filters unnecessary Cache transactions, and routes necessary transactions to Target devices over the Transaction Status Bus. The Transaction Status Bus includes both bus-based and point-to-point control of the target devices. A modified rotating priority scheme is used to provide starvation-free support for Locked buses and memory resources via backoff operations. Speculative memory operations are supported to further enhance performance.
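The role of the duplicate cache-tags in filtering unnecessary cache transactions can be sketched in software. The following is a minimal illustration under loud assumptions, not the implementation of the Transaction Controller: it tracks only whether a line is present in each cache (no coherency states, no evictions, unbounded tag storage), assumes a 64-byte line, and uses hypothetical names (`TransactionControllerModel`, `handle`, `LINE_BYTES`). It shows a single serialization point that records transaction order and forwards a snoop only to those caches whose duplicate tags indicate they hold the line.

```python
LINE_BYTES = 64  # assumed cache-line size, for illustration only

class TransactionControllerModel:
    """Toy model: one serialization point plus duplicate tags per CPU cache."""

    def __init__(self, num_cpus):
        self.num_cpus = num_cpus
        # One set of line numbers per CPU, mirroring that cache's tags.
        self.dup_tags = [set() for _ in range(num_cpus)]
        self.order = []  # global order in which transactions were accepted

    def handle(self, initiator, op, addr):
        """Serialize one transaction; return the CPUs that must be snooped."""
        line = addr // LINE_BYTES
        self.order.append((initiator, op, line))
        # Only caches whose duplicate tags show the line need to see it.
        holders = [c for c in range(self.num_cpus)
                   if c != initiator and line in self.dup_tags[c]]
        if op == "write":
            # Simplification: a write invalidates every other copy.
            for c in holders:
                self.dup_tags[c].discard(line)
        # After a read or a write, the initiator's cache holds the line.
        self.dup_tags[initiator].add(line)
        return holders

tc = TransactionControllerModel(4)
results = [
    tc.handle(0, "read", 0x1000),   # no other cache holds the line
    tc.handle(1, "read", 0x1010),   # same 64-byte line as 0x1000
    tc.handle(2, "write", 0x1000),  # both sharers must be invalidated
    tc.handle(3, "read", 0x2000),   # unrelated line, nothing to snoop
    tc.handle(0, "read", 0x1000),   # only the writer now holds the line
]
print(results)
```

In this sketch the fourth transaction is filtered entirely, since no duplicate tag matches, which is the kind of unnecessary cache traffic the centralized tags are meant to suppress.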