1. Field of the Invention
The present invention relates to multiprocessor systems, such as High Performance Computing (HPC) systems with a high degree of inter-processor communication.
2. Description of the Related Information
Communication between software entities (applications) on different host computers is frequently carried in packets over standard transmission protocols, such as TCP. Many application programs may be running concurrently on each computer, and methods have been developed to allow such programs to communicate independently. The operating system in each computer, specifically the part of the operating system referred to as the “operating system kernel” or “kernel”, has the task of managing the processes under which the application programs run. The kernel also provides the communications services for the entire computer, in that it mediates between the application programs and the hardware such as Ethernet interfaces or customized I/O interfaces that provide the circuitry for receiving and sending data packets. An example of an operating system so structured is Linux.
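As a minimal illustration of the kernel-mediated path described above, the following Python sketch sends a message between two TCP endpoints using only the standard `socket` module. It is an assumption for illustration, not part of the invention: to stay self-contained it runs both endpoints on one host over the loopback interface, whereas the text describes different hosts. Every `sendall()`, `shutdown()` and `recv()` call is a system call serviced by the kernel, which performs the TCP protocol processing on behalf of the application. The function name `tcp_send_receive` is hypothetical.

```python
import socket


def tcp_send_receive(payload: bytes) -> bytes:
    """Send payload from a client socket to a server socket over TCP on
    the loopback interface and return the bytes the server received.

    Each socket call below traps into the kernel, which handles the TCP
    protocol processing and the interaction with the network stack."""
    # Port 0 asks the kernel to pick a free ephemeral port.
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        # The kernel completes the handshake via the listen backlog,
        # so connect() returns before accept() is called.
        with socket.create_connection(("127.0.0.1", port)) as client:
            conn, _addr = server.accept()
            with conn:
                client.sendall(payload)
                client.shutdown(socket.SHUT_WR)  # signal end of data
                received = bytearray()
                while True:
                    chunk = conn.recv(4096)
                    if not chunk:  # sender closed its write side
                        break
                    received.extend(chunk)
    return bytes(received)


if __name__ == "__main__":
    print(tcp_send_receive(b"hello from process A"))
```

Because the sketch is single-threaded, it is only suitable for small payloads that fit in the kernel's socket buffers; a real application would receive on a separate process or thread.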
In a system such as a massively parallel multiprocessor system, or “supercomputer”, that contains a large number of computing modules, a very large number of communication paths may be required to carry data from the memory of one computing module to the memory or the CPU of another computing module. A common example of a distributed application in which such data communication occurs is the computation of certain mathematical algorithms, such as matrix multiplication. A full mesh interconnection of N computing modules would require N×(N−1) independent data communication paths to allow every computing module to communicate directly with each of the other computing modules.
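The quadratic growth of a full mesh can be checked with simple arithmetic. The sketch below counts each direction separately, giving N×(N−1) paths, and also shows the N×(N−1)/2 count that results if one bidirectional link served both directions of a pair. The function names are hypothetical.

```python
def full_mesh_paths(n: int) -> int:
    """Directed paths in a full mesh of n modules: each module needs a
    path to each of the other n - 1 modules, so n * (n - 1) in total."""
    if n < 0:
        raise ValueError("number of modules must be non-negative")
    return n * (n - 1)


def full_mesh_links(n: int) -> int:
    """Bidirectional links if one link carries both directions of a
    module pair: n * (n - 1) / 2 (n * (n - 1) is always even)."""
    return full_mesh_paths(n) // 2


if __name__ == "__main__":
    for n in (2, 16, 1024, 65536):
        print(f"N={n:>6}: {full_mesh_paths(n):>13,} paths, "
              f"{full_mesh_links(n):>13,} links")
```

At N = 1024 modules a full mesh already needs 1,047,552 directed paths, which is why large systems cannot rely on direct point-to-point wiring between every pair of modules.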
State-of-the-art HPC systems are multiprocessor systems with a high degree of inter-processor communication. Such systems are designed to provide the capability to run distributed applications. A distributed application may be designed using the Message Passing Interface (MPI) library for inter-process communication. Another method of programming an HPC system or supercomputer is based on the UPC (Unified Parallel C) programming language, which provides programmers with the capability to write a single program that will run on the multiple CPUs of the system while using the memory units of the CPUs as a shared distributed memory. Both the MPI standard, published as “MPI: A Message-Passing Interface Standard” (November 2003, © 1993, 1994, 1995, University of Tennessee, Knoxville, Tenn.), and the UPC programming language specification (published by the UPC Consortium, May 2005) are hereby incorporated by reference in their entireties.
In either case, the communication path from a process running on one computer to a process running on another computer must necessarily traverse a physical interconnect network as well as the software/hardware interface in each computer. Modern computer operating systems such as Linux are multitasking and process-oriented, and include a kernel that schedules processes (e.g., application processes) to run and that provides the interface to the hardware input/output (I/O) devices.
The overhead, in terms of both processing power and latency, associated with inter-process communication based on standard protocols is a major performance bottleneck in HPC systems. This overhead includes the CPU cycles consumed by context switching between application processes and the corresponding memory accesses. Commonly assigned U.S. patent applications “High Performance Memory Based Communications Interface”, Ser. No. 60/736,004, filed on Nov. 12, 2005, and “Methods And Systems For Scalable Interconnect”, Ser. No. 60/736,106, filed on Nov. 12, 2005, disclose data communications protocols that may be advantageously used to reduce latency. The goal of high performance computing is to apply the combined CPU instruction cycles of many CPUs, measured in Teraflops or Petaflops, to solving a computational problem. Inter-processor communication is a necessary evil: any CPU cycles spent while a CPU is waiting for data to arrive are cycles that are not available for problem solving.
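The cost of waiting can be made concrete with simple arithmetic: idle cycles equal the wait time multiplied by the clock frequency, and the share of wall-clock time lost is the wait time divided by the sum of compute and wait time. The sketch below assumes hypothetical figures (a 3 GHz clock, 5 µs of waiting per 50 µs of computation) chosen only to illustrate the arithmetic; they are not measurements of any system, and the function names are hypothetical.

```python
def idle_cycles(wait_seconds: float, clock_hz: float) -> float:
    """CPU cycles that elapse while the CPU waits for data to arrive."""
    return wait_seconds * clock_hz


def wasted_fraction(compute_seconds: float, wait_seconds: float) -> float:
    """Fraction of wall-clock time spent waiting rather than computing."""
    total = compute_seconds + wait_seconds
    return wait_seconds / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical placeholder figures for illustration only.
    clock_hz = 3.0e9   # 3 GHz core
    wait_s = 5e-6      # 5 microseconds spent waiting per message
    compute_s = 50e-6  # 50 microseconds of useful work per message
    print(f"idle cycles per message: {idle_cycles(wait_s, clock_hz):,.0f}")
    print(f"fraction of time lost:   {wasted_fraction(compute_s, wait_s):.1%}")
```

With these placeholder values, each message idles the CPU for about 15,000 cycles, roughly 9% of the time budget, and that fraction grows as communication becomes more frequent relative to computation.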
The latency from an application process running on one CPU to an application process on another CPU is the sum of the hardware delay, the communications protocol processing in the kernels of both CPUs, and the interaction between each kernel and its I/O hardware. In order to achieve very high performance in a distributed multiprocessor system, any reduction in this latency is believed to be valuable and worthwhile.
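The additive latency model above can be expressed as a small data structure. In this sketch, splitting the protocol processing and kernel/I/O interaction into separate sender and receiver terms is an assumption made for clarity, and the class name `LatencyBudget`, its field names, and the example values are hypothetical placeholders rather than measured figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    """One-way application-to-application latency, split into the
    components named in the text. Units are whatever the caller uses
    consistently (e.g. microseconds)."""
    hardware_delay: float      # interconnect and interface transit time
    sender_protocol: float     # protocol processing in the sending kernel
    receiver_protocol: float   # protocol processing in the receiving kernel
    sender_kernel_io: float    # sending kernel <-> I/O hardware interaction
    receiver_kernel_io: float  # receiving kernel <-> I/O hardware interaction

    def total(self) -> float:
        """End-to-end latency is the sum of all components."""
        return (self.hardware_delay
                + self.sender_protocol + self.receiver_protocol
                + self.sender_kernel_io + self.receiver_kernel_io)

    def software_overhead(self) -> float:
        """Portion of the latency spent in kernel software paths."""
        return self.total() - self.hardware_delay

    def share(self, component: str) -> float:
        """Fraction of total latency contributed by one named field."""
        total = self.total()
        if total == 0:
            raise ValueError("total latency is zero")
        return getattr(self, component) / total


if __name__ == "__main__":
    # Placeholder values for illustration only, not measurements.
    budget = LatencyBudget(hardware_delay=1.0, sender_protocol=2.0,
                           receiver_protocol=2.0, sender_kernel_io=0.5,
                           receiver_kernel_io=0.5)
    print(f"total latency:     {budget.total():.1f} us")
    print(f"software overhead: {budget.software_overhead():.1f} us")
```

Under this model, any reduction in a kernel-side term lowers the end-to-end latency one-for-one, even when the hardware delay is fixed.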