The present invention relates to computer systems having multiprocessor architectures and, more particularly, to a distributed parallel messaging unit for high throughput networks.
To achieve high performance computing, multiple individual processors have been interconnected to form a multiprocessor computer system capable of parallel processing. Multiple processors can be placed on a single chip, or several chips, each containing one or more processors, can be interconnected to form single- or multi-dimensional computing networks in a multiprocessor computer system, such as the massively parallel supercomputing system described in co-pending U.S. Patent Publication No. 2009/0006808 A1 corresponding to U.S. patent application Ser. No. 11/768,905, the whole contents and disclosure of which are incorporated by reference as if fully set forth herein.
Some processors in a multiprocessor computer system, such as a massively parallel supercomputing system, typically implement some form of direct memory access (DMA) functionality that facilitates communication of messages within and among network nodes. Each message includes packets containing a payload, e.g., data or information, that is transferred to and from a memory system, e.g., a memory system shared among one or more processing elements.
Generally, a uni- or multi-processor system communicates with a single DMA engine to initialize data transfer between the memory system and a network device (or other I/O device). However, with increasing bandwidth requirements (and an increased number of processors on a chip), a single DMA engine cannot keep up with the volume of message communication operations required for high performance compute and I/O collective operations.
Further known in the art are multi-channel DMAs that provide multiple channels from one source to one destination in a time-multiplexed manner (such as described in U.S. Pat. No. 6,738,881) and with scheduled ports.
In a highly optimized high-bandwidth system, it is desirable to provide for alternate system architectures, such as star or point-to-point implementations.
It would thus be desirable to provide, in a multiprocessor system, a distributed parallel messaging unit for configuring high throughput networks that implement, for example, such alternate system architectures.