The aim of parallel processing is to use a number of processing elements that communicate and cooperate to solve a problem. In a highly parallel processing system, hundreds of processing elements are available, and a problem is spread over many of them. Not all of the processing elements need be used to run a single problem, and the system can be configured to execute multiple problems simultaneously. By contrast, in a low parallel processing system, tens of processing elements are used to solve an entire problem.
Symmetric multiprocessing (SMP) is one such type of low parallel processing system. An SMP system is characterized by "symmetric" processors that each have an equal share of, and equal access to, the system resources, including memory and I/O. The processors are managed by a single operating system that provides an application program with a single view of the entire system.
FIG. 1 illustrates one such shared memory SMP 100. There is shown a number of symmetric processors 102A-102N interconnected by a bus 104. A main memory 106 is provided that is connected to the bus 104 and shared by each of the processors 102. In addition, I/O devices 108 are connected to the bus 104 and are accessible by each processor 102 and the main memory 106. Each of the components of the system 100 is synchronized to a common system clock 110.
In order to reduce the traffic to the main memory 106, each processor 102 has a local cache memory 112 that can contain shared data. Since the data in each processor's cache 112 can be shared by each processor 102, the problem then becomes one of cache coherency. In most SMP systems, a snoopy bus protocol is used to maintain cache coherency. In a snoopy bus protocol, a memory access transaction, such as a read or write, is broadcast to all the processors 102 connected to the bus 104. Each processor 102 monitors or "snoops" the bus 104 for a memory access transaction that pertains to a cache line that is associated with the processor's cache 112. When the processor 102 finds such a transaction, it takes appropriate action to ensure that each cache line is coherent within the system 100.
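The snooping behavior described above can be illustrated with a minimal sketch. It assumes a write-invalidate, write-through policy, which is only one possible form of "appropriate action"; the text does not specify the policy used. The names `Bus`, `Cache`, `broadcast`, and `snoop` are hypothetical and chosen for illustration only.

```python
class Bus:
    """Shared bus (cf. bus 104): every transaction is visible to all caches."""

    def __init__(self, memory):
        self.memory = memory      # shared main memory: address -> value
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def broadcast(self, kind, addr, source):
        # Broadcast the transaction to every cache except the originator.
        for cache in self.caches:
            if cache is not source:
                cache.snoop(kind, addr)


class Cache:
    """Per-processor cache (cf. cache 112) that snoops the shared bus."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}           # address -> locally cached value
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch over the bus
            self.bus.broadcast("read", addr, self)
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Announce the write so other caches can react, then update
        # the local copy and main memory (write-through, an assumption).
        self.bus.broadcast("write", addr, self)
        self.lines[addr] = value
        self.bus.memory[addr] = value

    def snoop(self, kind, addr):
        # Act only on transactions that touch a line held locally.
        if kind == "write" and addr in self.lines:
            del self.lines[addr]              # invalidate the stale copy


memory = {0x10: 1}
bus = Bus(memory)
cache_a, cache_b = Cache(bus), Cache(bus)
cache_a.read(0x10)
cache_b.read(0x10)                # both caches now hold the line
cache_a.write(0x10, 2)            # cache_b snoops the write and invalidates
value_seen_by_b = cache_b.read(0x10)   # miss, refetched from main memory
```

In this sketch the second processor never observes the stale value, because the write broadcast on the bus removed its copy before the next read.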
There are several disadvantages with this type of SMP system. The primary disadvantage is the use of the bus as the interconnect structure. Although the use of the bus provides cache coherency, it is a limiting factor for improving the system's throughput. First, the use of the bus constrains the number of transactions that can be processed simultaneously. The same bus is used to process both memory and I/O transactions initiated by each processor. As such, only one transaction can be processed at a time.
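The serializing effect of a single bus can be illustrated with a toy model. The sketch below assumes that each transaction occupies the bus for exactly one cycle and that the bus is granted in round-robin order; both are simplifying assumptions made for illustration, and the function name `completion_cycles` is hypothetical.

```python
def completion_cycles(pending):
    """Return, per processor, the bus cycle on which its last transaction ends.

    pending[i] is the number of transactions processor i wants to issue.
    Only one transaction is serviced per cycle (single shared bus).
    """
    remaining = list(pending)
    done_at = [0] * len(pending)
    cycle = 0
    while any(remaining):
        # Round-robin arbitration (an assumption): one grant per cycle.
        for i in range(len(remaining)):
            if remaining[i]:
                cycle += 1
                remaining[i] -= 1
                if remaining[i] == 0:
                    done_at[i] = cycle
    return done_at


alone = completion_cycles([2])            # one processor, two transactions
shared = completion_cycles([2, 2, 2, 2])  # four processors contending
```

Under these assumptions a processor with two transactions finishes on cycle 2 when it has the bus to itself, but no earlier than cycle 5 when three other processors contend for the same bus.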
Second, contention among the processors for the bus in order to access main memory increases the overhead in servicing a memory access transaction. Various approaches have been tried to overcome this limitation, such as increasing the width of the bus, running the bus at a higher clock speed, and increasing the size of the caches. However, each of these approaches greatly increases the expense and complexity of the system.
Another limitation with the use of the bus is the well-known transmission line effects associated with buses. These transmission line effects are attributable to the complicated electrical phenomena present in the connections made to each device coupled to the bus. These transmission line effects limit the speed at which the bus operates, thereby reducing the system's throughput.
Accordingly, there exists a need for an SMP system that overcomes these shortcomings.