Computer networks, e.g., local area networks (LANs), wide area networks (WANs), and others, abound and are increasing in number and variety. However, most installations are designed and used mainly for client-server applications, i.e., applications wherein multiple computers/workstations (clients) share the resources (e.g., application programs and data files) of the server, but otherwise operate independently. In such installations, various software systems provide the necessary message passing functions that allow the client stations and the server station of a network to communicate with each other.
Quite recently, a few networks have been organized into clusters of nodes that cooperate with each other so as to execute a single application problem in a parallel-distributed mode with no client-server relation between nodes. Many computing platforms have been designed to perform parallel-distributed computation. The essential idea is to distribute parts of an application problem to a group of processor elements and to organize these individual processors to run in parallel, independently, except at certain synchronization points where they must communicate their partial results to each other before continuing their computation tasks. Various schemes for carrying out this communication have also been proposed. Some so-called supercomputers work this way, using special processor elements connected by special network hardware. However, with the advent of computer networking hardware-software systems, an alternative to the traditional supercomputer is available: commodity computers (e.g., personal computers with standard operating systems) connected in clusters by commodity network hardware. Cluster networks of simple design have been tested by the instant Applicant and others. Results indicate that, for many compute-intensive applications, clusters have the potential to provide a computing platform which greatly speeds up the execution of the application, and which does so at moderate cost compared to present alternatives for use with such applications (e.g., supercomputers).
The simplest cluster network architecture consists of a single ethernet segment (e.g., a single cable or hub) and many network nodes (e.g., personal computers) connected to the segment through standard interface hardware. Communication is provided by a message passing software system interfacing with standard message passing protocols (e.g., TCP/IP), and through them with software drivers for standard network interface cards (NICs). For many application problems, this simple architecture and the available message passing software systems do not provide efficient speedup: the speedup obtained is insufficient, or its cost is too high, or both. Programming the mode of parallel-distributed execution noted above requires an efficient message passing software system which provides a suite of commonly used message passing operations, such as node-to-node send and receive, and collective multi-node message passing operations, such as broadcast and all-gather. A standard repertoire of message passing operations for cluster programming is proposed in the document known as MPI (Message Passing Interface Forum, MPI: A Message Passing Interface Standard, Computer Science Department, Technical Report CS-94-230, University of Tennessee, Knoxville, Tenn., 1994), which is hereby incorporated by reference as though fully set forth herein. While several message passing software systems exist that do provide a suite of message passing operations which may be used in writing cluster application programs (e.g., ROCC95 software developed by the instant inventor, wherein "ROCC" stands for Reduced Overhead Cluster Communication), these software systems are based on standard, widely-used networking protocols (e.g., TCP/IP). Since some standard networking protocol is available in most computers/workstations as part of their operating system, these message passing software systems use this protocol layer as their interface with the network.
Therefore, their design and implementation (i.e., algorithms and program code) are not part of any integrated hardware-software network system for cluster computing and, thus, are inefficient at best, if not inoperable, for most compute-intensive programming applications.
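By way of illustration only, the intended result of the two collective operations named above, broadcast and all-gather, may be sketched as follows. The sketch models each node as one entry in a list and performs no network communication; the function names and sample values are hypothetical and are not taken from MPI, ROCC95, or any other message passing software system.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def broadcast(node_values: Sequence[T], root: int) -> List[T]:
    """Return the per-node state after a broadcast from node `root`.

    Every node ends up holding the value that the root node held.
    """
    return [node_values[root] for _ in node_values]


def all_gather(node_values: Sequence[T]) -> List[List[T]]:
    """Return the per-node state after an all-gather.

    Every node ends up holding the values of all nodes, in node order.
    """
    gathered = list(node_values)
    return [list(gathered) for _ in node_values]


# Hypothetical partial results, one per node of a four-node cluster.
partial_results = [10, 20, 30, 40]

after_broadcast = broadcast(partial_results, root=2)
after_all_gather = all_gather(partial_results)
```

In an actual cluster, each of these operations would be carried out by messages passed over the network, and the time those messages take is the communication overhead at issue.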
To obtain efficient speedup on a cluster, it must be possible for the application programmer to easily match his/her program to the cluster. Conversely, specific cluster hardware-software should match many applications to provide for economy of scale. As well, cluster design should allow the network architecture to be reconfigurable and scalable in size so as to match new and ever larger applications. Heretofore, such flexibility in design and application of cluster networks has not been realizable or practical.
The main hardware-software integration problem in designing a cluster's connectivity, i.e., its network architecture, and a matching message passing software system, is how to reduce communication overhead to a point which allows sufficient and efficient (i.e., cost-effective) speedup of many applications by execution in a parallel mode. Since the parallel mode involves both calculation on many nodes and communication of partial results between nodes, the reduction of communication overhead time relative to calculation time has been the subject of much research and development in the field of parallel computation. However, real cost-effective reduction of communication overhead has remained an open, unsolved problem until now.
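The relation between communication overhead time and calculation time may be illustrated by a simple model, offered here only as a sketch under stated assumptions. It assumes that the calculation divides evenly among the nodes and that the communication overhead is a single fixed time added to the parallel run; the function names and figures are hypothetical and do not represent measured results.

```python
def speedup(calc_time: float, nodes: int, comm_time: float) -> float:
    """Speedup of a parallel run over a single-node run.

    Assumes the calculation divides evenly among `nodes` and that
    `comm_time` is the total communication overhead added to the run.
    """
    parallel_time = calc_time / nodes + comm_time
    return calc_time / parallel_time


def efficiency(calc_time: float, nodes: int, comm_time: float) -> float:
    """Speedup per node; 1.0 means every node is fully used."""
    return speedup(calc_time, nodes, comm_time) / nodes


# Hypothetical figures: 100 time units of calculation on 10 nodes.
no_overhead = speedup(100.0, 10, 0.0)     # 10.0
with_overhead = speedup(100.0, 10, 10.0)  # 5.0
```

Under these assumptions, an overhead equal to each node's share of the calculation halves the speedup, which is why overhead must be reduced relative to calculation time for a cluster to be cost-effective.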