Cluster computing typically uses a group of linked computers that work together so that they can be treated as a single cluster computer. The computing components of a cluster computer are commonly connected to each other through interconnects such as a fast local bus, a fast local area network, or a blade network, or even across the Internet. Clusters are usually deployed to improve performance and availability over a single computer alone, while typically being more cost-effective than a single computer of comparable speed or availability.
A cluster computer can receive jobs from a user. Each job may be divided into a number of tasks, and the processes for those tasks are assigned across a plurality of compute nodes. A compute node may be a single computer, a server, or a processor that can accept many processes from a job. Alternatively, each computer or server may execute only one process from the job. However, when a job on a cluster processing system crashes or terminates unexpectedly, the user may receive information that the overall job has failed but no other information about why the job failed.
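The following is a minimal, hypothetical sketch of the situation just described. It is not taken from any particular cluster scheduler. It divides a job into tasks, divides each task into processes, and assigns those processes to compute nodes. Round-robin assignment is an assumption made for illustration. The names `divide_and_assign` and `overall_status` are invented here. The sketch then shows how collapsing every process outcome into one job-level status hides which node and process actually failed.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import cycle


class Status(Enum):
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Process:
    task_id: int
    proc_id: int
    node: str                      # compute node the process was assigned to
    status: Status = Status.COMPLETED


@dataclass
class Job:
    job_id: str
    processes: list = field(default_factory=list)


def divide_and_assign(job_id, num_tasks, procs_per_task, nodes):
    """Hypothetical helper: split a job into tasks, split each task into
    processes, and assign the processes to nodes round-robin (an assumed policy)."""
    node_cycle = cycle(nodes)
    job = Job(job_id)
    for t in range(num_tasks):
        for p in range(procs_per_task):
            job.processes.append(Process(t, p, next(node_cycle)))
    return job


def overall_status(job):
    """Hypothetical helper: reduce all per-process outcomes to one job-level
    status. Node and process details are discarded, so this single value is
    all the user sees in the scenario described above."""
    if any(proc.status is Status.FAILED for proc in job.processes):
        return Status.FAILED
    return Status.COMPLETED


if __name__ == "__main__":
    job = divide_and_assign("job-1", num_tasks=2, procs_per_task=3,
                            nodes=["node-a", "node-b"])
    job.processes[4].status = Status.FAILED   # simulate one process crashing
    print(overall_status(job).value)          # prints "failed"
```

In this sketch, the only information that reaches the user is "failed." The fact that process 1 of task 1 on `node-a` was the one that crashed is still present in `job.processes`, but the aggregation step never reports it.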