1. Field of the Invention
The present invention relates generally to class network routing, and more particularly to class network routing that implements class routing in a network, such as a computer network comprising a plurality of parallel compute processors at its nodes, and that allows a compute processor to broadcast a message to one or more other compute processors in the computer network, such as the processors in a column or a row. Normally this type of operation requires a separate message to be sent to each processor. With class network routing pursuant to the invention, a single message suffices, which generally reduces both the total number of messages in the network and the latency of a multicast.
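The saving described above can be illustrated with a minimal sketch. The following is not the patented implementation, but a simulation of a "deposit and forward" multicast on a one-dimensional ring (such as a row of a torus), in which each switch that receives the packet deposits a copy for its local node and forwards the same packet onward, so the sender injects only one message; the function names and parameters are illustrative assumptions.

```python
# Illustrative sketch of deposit-and-forward class routing on a ring.
# Each hop both delivers a local copy and forwards the packet, so a
# single injected message reaches every node in the row.

def class_multicast(ring_size, sender, hops):
    """Return (messages_injected, nodes_reached) for one class-routed packet."""
    reached = {sender}
    node = sender
    for _ in range(hops):
        node = (node + 1) % ring_size  # forward along the ring
        reached.add(node)              # deposit a local copy
    return 1, reached

def unicast_broadcast(ring_size, sender):
    """Per-node unicast: one separate message injected per receiver."""
    receivers = [n for n in range(ring_size) if n != sender]
    return len(receivers), set(receivers) | {sender}

# Broadcasting to a row of 8 nodes:
m1, nodes1 = class_multicast(8, sender=0, hops=7)
m2, nodes2 = unicast_broadcast(8, sender=0)
print(m1, len(nodes1))  # 1 injected message reaches all 8 nodes
print(m2, len(nodes2))  # 7 injected messages reach the same 8 nodes
```

Both schemes reach all nodes of the row, but the class-routed multicast injects one message where per-node unicast injects seven, which is the reduction in message count and multicast latency referred to above.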
The present invention relates to the field of message-passing data networks, for example, a network as used in a distributed-memory message-passing, parallel computer, as applied for example to computation in the field of life sciences.
The present invention also uses the class function on a torus computer network to perform dense matrix calculations. By using the hardware-implemented class function on the torus computer network, it is possible to perform high-performance dense matrix calculations.
The present invention also relates to the field of distributed-memory, message-passing parallel computer design and system software, as applied for example to computation in the field of life sciences. More specifically, it relates to the field of high-performance linear algebra software for distributed-memory parallel supercomputers.
2. Discussion of the Prior Art
A large class of important computations can be performed by massively parallel computer systems. Such systems consist of many compute nodes, each of which typically consists of one or more CPUs, memory, and one or more network interfaces that connect it with other nodes.
The computer described in related U.S. provisional application Ser. No. 60/271,124, filed Feb. 24, 2001, for A Massively Parallel Supercomputer, leverages system-on-a-chip (SOC) technology to create a scalable, cost-efficient computing system with high throughput. SOC technology has made it feasible to build an entire multiprocessor node on a single chip using libraries of embedded components, including CPU cores with integrated, first-level caches. Such packaging greatly reduces the component count of a node, allowing for the creation of a reliable, large-scale machine.
A message-passing data network serves to pass messages between nodes of a network, each of which can perform local operations independently of other nodes. Nodes can act in concert by passing messages between them over the network. An example of such a network is a distributed-memory parallel computer wherein each of its nodes has one or more processors that operate on local memory. An application using multiple nodes of such a computer coordinates the actions of the multiple nodes by passing messages between them. The words switch and router are used interchangeably throughout this specification.
A message-passing data network consists of switches and links, wherein a link merely passes data between two switches. A switch routes incoming data from a node or link to another node or link. A switch may be connected to an arbitrary number of nodes and links. Depending on the locations of the two nodes in the network, a message between them may need to traverse several switches and links.
Prior art networks efficiently support some types of message passing, but not all types. For example, some networks efficiently support unicast message passing to a single receiving node, but not multicast message passing to an arbitrary number of receiving nodes. Efficient support of multicast message passing is required in various situations, such as in numerical algorithms executed on a distributed-memory parallel computer, and it is a requirement of the applications disclosed herein for dense matrix inversion using class functions.
Many user applications need to invert very large N by N (N×N) dense matrices, where N is greater than several thousand. Dense matrices are matrices most of whose entries are non-zero. Typically, inversion of such matrices can only be done using large distributed-memory parallel supercomputers. Algorithms that perform dense matrix inversion are well known and can be generalized for use on distributed-memory parallel supercomputers. In that case, a large amount of inter-processor communication is required, which can slow down the application considerably.
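The scale of this inter-processor communication can be sketched with a simple count. The following is an illustrative model, not the disclosed method: it assumes a P×P process grid and a blocked dense matrix algorithm (in the style of SUMMA-like multiplication underlying many inversion routines) in which every step broadcasts one block along each process row and one along each process column; the function names and message-count model are assumptions for illustration.

```python
# Illustrative count of messages injected by row/column broadcasts in a
# blocked dense matrix algorithm on a P x P process grid. With per-node
# unicast, a broadcast along a row of P processes injects P - 1 messages;
# with a hardware class-routed multicast it injects one.

def broadcasts_per_run(grid_p):
    # Assumed model: grid_p steps, each broadcasting one block along
    # every process row and one block along every process column.
    steps = grid_p
    row_bcasts = steps * grid_p
    col_bcasts = steps * grid_p
    return row_bcasts + col_bcasts

def injected_messages(grid_p, multicast):
    per_bcast = 1 if multicast else grid_p - 1
    return broadcasts_per_run(grid_p) * per_bcast

p = 8
print(injected_messages(p, multicast=False))  # 896 unicast messages
print(injected_messages(p, multicast=True))   # 128 class-routed messages
```

Under this model, hardware multicast reduces the injected message count by a factor of P - 1, which is the motivation for applying the class function to dense matrix calculations on the torus network.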