High performance computing (“HPC”) systems perform complex or data-intensive calculations using a large number of computing nodes. For example, some HPC systems may be used to multiply matrices that have thousands or even hundreds of thousands of rows and form outer products of vectors having hundreds of thousands of elements. HPC software developers break up such problems into smaller problems that may be executed with relative independence. For example, in a matrix multiplication C=A*B, calculating the value of an element in the matrix C requires as input only a single row of the matrix A and a single column of the matrix B. Thus, the overall multiplication can be divided into a number of independent sub-problems, each of which may be solved by a different node in the HPC system. Such systems use a parallel programming paradigm such as a message-passing interface (MPI).
MPI is a standardized and portable message-passing system that defines the syntax and core library routines that can be used by hardware vendors to create and control a distributed computing system.
Purchasers of an HPC system desire to obtain the best performance vs. cost. Thus, testing of system components (cores, co-processors, processors, switches and interconnect) with a known application to see how the components affect performance is desirable using analytical tools for the HPC system. Given the complexity and size of such systems, testing often occurs through simulation or testing of selected groupings of components. For example, a real HPC system may exist that has 100 nodes with 800 cores with an Infiniband switching fabric. A potential purchaser may wish to know what the impact would be if the switching fabric was switched to a different switching fabric (e.g. fibre channel, PCI express, serial ATA, etc.). A change to a slower switching fabric may result in latency in the transmission of instructions. Thus, the potential purchaser may wish to know whether the selection of a slower fabric would affect the performance of their application.
On way to perform such a test would be to replace the switching fabric in an existing HPC system. However, this is prohibitively expensive. Thus, there is a need for testing an HPC system without having to replace the components of the system.