As is known in the art, multiprocessor computer systems generally include a plurality of processing systems coupled to a common, shared bus for communication with a shared memory device. An I/O subsystem may also be coupled to the bus for communication with the memory device.
Each of the processing systems generally includes a cache memory for temporary storage of data from the shared memory device. The cache allows the processor to process an instruction stream and modify data independently of the activities of the other processors. A problem arises, however, when a cache in one processing system needs to use data that is currently being modified in the cache of a different processing system. In such an event, some arbitration protocol must be implemented to ensure that the data used by the respective caches remains coherent.
Because all of the processors are coupled to the memory device via a shared memory bus, each of the processors "sees" the data that is provided to or received from the memory device in the same order. Therefore, each processor can regulate the contents of its internal cache to ensure that it contains the most recently updated data. Multiprocessor systems have typically used the shared bus arrangement because it provides a straightforward mechanism for maintaining cache coherency among caches in a plurality of different processing systems.
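By way of illustration only, the bus-ordered coherency described above can be sketched as follows. This is a minimal sketch, assuming a write-through, write-invalidate policy, which the description above does not specify; the names SharedBus, SnoopingCache, snoop_write and the address 0x40 are hypothetical and chosen only for the example.

```python
class SharedBus:
    """Hypothetical shared bus: every transaction is seen by all caches in one global order."""

    def __init__(self):
        self.memory = {}   # shared memory device, address -> value
        self.caches = []   # caches attached to (snooping) the bus
        self.log = []      # the single order in which transactions occurred

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, addr):
        self.log.append(("read", addr))
        return self.memory.get(addr, 0)  # unwritten locations read as 0

    def write(self, source, addr, value):
        self.log.append(("write", addr, value))
        self.memory[addr] = value
        # Every other cache observes the write and drops its stale copy.
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr)


class SnoopingCache:
    """Hypothetical per-processor cache (assumed write-through, write-invalidate)."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # locally cached address -> value
        bus.attach(self)

    def load(self, addr):
        if addr not in self.lines:            # miss: fetch over the shared bus
            self.lines[addr] = self.bus.read(addr)
        return self.lines[addr]

    def store(self, addr, value):
        self.lines[addr] = value
        self.bus.write(self, addr, value)     # write-through makes the store visible

    def snoop_write(self, addr):
        self.lines.pop(addr, None)            # invalidate the local copy, if any


bus = SharedBus()
cpu0, cpu1 = SnoopingCache(bus), SnoopingCache(bus)
cpu1.load(0x40)            # cpu1 caches the initial value
cpu0.store(0x40, 7)        # cpu1 snoops the write and invalidates its copy
result = cpu1.load(0x40)   # miss, so the updated value is fetched from memory
```

Because both caches observe the same sequence of bus transactions, the second processor cannot keep using its stale copy once the first processor's write has appeared on the bus.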
However, there are some limitations to the shared bus arrangement. The performance of a multiprocessor system is generally a function of the cycle time of the shared bus. In order to increase the performance, the cycle time of the shared bus must be decreased. However, in order to decrease the cycle time of the bus, the number of processors coupled to the bus and the length of the bus must be decreased. Thus it is difficult to provide a shared bus with a desired cycle time that is capable of supporting all of the processors which are required in a multiprocessor system.
With present day technology, it is difficult to build busses with a cycle time shorter than about 10 nanoseconds. With processor cycle times decreasing into the 2-3 nanosecond range, the performance of multiprocessor systems is therefore constrained by the performance of the shared bus. It would be desirable to provide a multiprocessor architecture which would be able to utilize the increasing performance provided by present day processors.
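The mismatch between bus and processor cycle times noted above can be made concrete with simple arithmetic. The following is a rough sketch using only the figures given above (a bus cycle of about 10 nanoseconds and processor cycles of 2 to 3 nanoseconds); the function name is hypothetical, and the calculation does not model cache hits, bus arbitration, or contention.

```python
def processor_cycles_per_bus_cycle(bus_cycle_ns: float, cpu_cycle_ns: float) -> float:
    """Number of processor cycles that elapse during a single bus cycle."""
    return bus_cycle_ns / cpu_cycle_ns


BUS_CYCLE_NS = 10.0  # approximate bus cycle time quoted above

# Evaluate both ends of the 2-3 nanosecond processor cycle range.
ratios = {
    cpu_ns: processor_cycles_per_bus_cycle(BUS_CYCLE_NS, cpu_ns)
    for cpu_ns in (2.0, 3.0)
}

for cpu_ns, ratio in ratios.items():
    print(f"{cpu_ns:.0f} ns processor cycle: {ratio:.1f} processor cycles per bus cycle")
```

With these figures, each bus cycle spans roughly three to five processor cycles, so a processor that must wait on the bus idles for several of its own cycles per bus cycle.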