The demand for high performance computers has required system designers to seek new computer architectures which maximize the performance of the available hardware and software. One such approach to high performance computing has been massively parallel processing systems employing up to tens of thousands of processors simultaneously working together to solve complex problems. One particular means of implementing massively parallel processing, which is rapidly gaining acceptance, involves architectures using clusters of processing nodes, each composed of one or more standard microprocessors and distributed memories. The nodes themselves are interconnected by various networks such that all nodes can, among other things, communicate with each other, share operating system services and share input/output devices. While such architectures have substantial advantages, the limitations of the available hardware have caused difficulties in their actual implementation.
The communications bandwidth of the state-of-the-art microprocessors becoming available is beginning to exceed the bandwidth of the available standard network interconnects. Further, even though new interconnect networks have recently been developed, the state of the art of the available components and interconnection media necessary for implementing these new networks remains a limitation on bandwidth. While it may be possible to develop a completely new standard for interconnection networks, such an effort would not be cost effective, and the resulting standard would not immediately be available for wide usage.
Another consideration in the design of high performance computing systems is the organization of cache memory. Cache memory requires particular consideration in architectures such as those discussed above, where clusters of processing nodes are being used. In these cases, a cache coherency scheme must be provided which is not only operable within the processing nodes but also compatible with the interconnection networks. Because of the limitations of the currently available language and compiler technology, such a cache coherency scheme is preferably implemented in hardware rather than software. Any hardware implementation, however, must effectively use the available components, be organized for efficient data flow, operate in accordance with the required interfaces between the microprocessors and the interconnection network, and provide for increased bandwidth.
Thus, the need has arisen for an improved processing system architecture for implementation of massively parallel processing which overcomes the disadvantages of currently available massively parallel processing schemes. In particular, the improved architecture should include an interconnection network scheme which provides increased bandwidth without resorting to the creation of a new networking standard. Further, such an improved architecture should efficiently provide for coherent cache memory using hardware.