Many high-performance computing systems may use a message passing system to coordinate communications between processors operating in parallel. The message passing system may send data and status information between different processors as those processors perform their work.
Applications that use a message passing system often perform complex calculations in parallel. Such applications may include finite element analysis, computational fluid dynamics, complex visual rendering, or other computationally expensive operations. In some cases, an application may work on a single problem for many hours or even days. When a failure occurs due to a hardware or software issue, many such applications may be unable to recover and may instead have to be restarted.