1. Technical Field
This disclosure generally relates to massively parallel computing systems, and more specifically relates to parallel debugging of software executing on a large number of nodes of a massively parallel computer system by gathering differences in the data held in the memory of the nodes.
2. Background Art
Supercomputers continue to be developed to tackle sophisticated computing jobs. These computers are particularly useful to scientists for high performance computing (HPC) applications including life sciences, financial modeling, hydrodynamics, quantum chemistry, molecular dynamics, astronomy and space research, and climate modeling. Supercomputer developers have focused on massively parallel computer structures to meet this need for increasingly complex computing. One such massively parallel computer being developed by International Business Machines Corporation (IBM) is the Blue Gene system. The Blue Gene system is a scalable system with 65,536 or more compute nodes. Each node consists of a single ASIC (application specific integrated circuit) and memory. Each node typically has 512 megabytes of local memory. The full computer is housed in 64 racks or cabinets with 32 node boards in each. Each node board has 32 processors and the associated memory for each processor. As used herein, a massively parallel computer system is a system with more than about 10,000 processor nodes.
The Blue Gene supercomputer's 65,536 computational nodes and 1024 I/O processors are arranged into both a logical tree network and a logical 3-dimensional torus network. Blue Gene can be described as a compute node core with an I/O node surface. Each I/O node handles the input and output functions of 64 compute nodes. The I/O nodes are connected to the compute nodes through the tree network and also have functional wide area network capabilities through their built-in Gigabit Ethernet network.
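The node counts given above follow from the packaging figures. The short Python sketch below is illustrative only: it reproduces the arithmetic from the numbers stated in this background, and the constant and function names are hypothetical, chosen for this example.

```python
# Packaging figures taken from the description above.
RACKS = 64                       # racks or cabinets in the full computer
NODE_BOARDS_PER_RACK = 32        # node boards in each rack
NODES_PER_BOARD = 32             # processors (compute nodes) on each node board
COMPUTE_NODES_PER_IO_NODE = 64   # compute nodes served by one I/O node


def compute_node_count(racks=RACKS, boards=NODE_BOARDS_PER_RACK,
                       nodes=NODES_PER_BOARD):
    """Total compute nodes: racks x node boards per rack x nodes per board."""
    return racks * boards * nodes


def io_node_count(compute_nodes, ratio=COMPUTE_NODES_PER_IO_NODE):
    """I/O nodes needed when each one serves `ratio` compute nodes."""
    return compute_nodes // ratio


if __name__ == "__main__":
    total = compute_node_count()
    print(total, io_node_count(total))
```

With the stated figures, 64 x 32 x 32 gives 65,536 compute nodes, and dividing by 64 compute nodes per I/O node gives the 1024 I/O nodes mentioned above.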
On a massively parallel computer system like Blue Gene, debugging the complex software and hardware has been a monumental task. Prior art systems for parallel debugging are effective for a few thousand nodes, but do not scale to the number of nodes in massively parallel systems. The typical prior art debugging system requires sending a great deal of data from the compute nodes to a front end node for processing. Sending this data to the front end node is inefficient and may overwhelm both the front end node resources and the network used for transferring the data.
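As a rough illustration of the scale involved, the Python sketch below estimates an upper bound: the aggregate memory of the system and the time needed to move all of it to a front end node. It rests on assumptions that are not part of this disclosure: that 512 megabytes means 512 x 2^20 bytes, that every node sends its entire memory, and that the front end node receives data over a single 1 gigabit per second link with no protocol overhead. The names are hypothetical.

```python
NODES = 65_536                          # compute nodes, from the description above
MEMORY_PER_NODE_BYTES = 512 * 2**20     # 512 megabytes, read as MiB (assumption)
LINK_BYTES_PER_SECOND = 1e9 / 8         # assumed single 1 Gb/s link to the front end


def aggregate_memory_bytes(nodes=NODES, per_node=MEMORY_PER_NODE_BYTES):
    """Total local memory across all compute nodes, in bytes."""
    return nodes * per_node


def transfer_hours(total_bytes, link=LINK_BYTES_PER_SECOND):
    """Hours to move total_bytes over one link at `link` bytes per second."""
    return total_bytes / link / 3600


if __name__ == "__main__":
    total = aggregate_memory_bytes()
    print(total / 2**40, "TiB")         # aggregate memory in tebibytes
    print(round(transfer_hours(total), 1), "hours")
```

Under these assumptions the aggregate memory is 32 TiB, and moving all of it over one such link would take roughly 78 hours. Actual prior art debuggers need not transfer full memory images, so this is a worst-case figure meant only to show why centralizing the data scales poorly.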