Clusters of commodity computers are increasingly the platform of choice for running computationally intensive jobs in a variety of industries. Computations such as wind-tunnel simulations, gene and protein analysis, drug discovery, and many others are run on commodity computers with increasingly successful results. A typical cluster configuration, as shown in FIG. 1, may comprise a collection of compute servers 110a . . . n, connected by a fast commodity network (typically 100 Mbps or 1 Gbps Ethernet), and a smaller number of machines acting as storage servers 120. Users who want to use the system submit jobs through one or more gateway machines 130, which are responsible for providing an interface between the users 140 and the cluster network 100, scheduling work on the cluster machines 110, 120, and returning the results of the jobs to the users.
The roles of machines 110, 120 in such a cluster need not be exclusive, and membership of machines in the cluster may be transient or persistent. Most of the work done for such clusters to date has focused on solving a number of important problems, such as discovery of idle resources; management of job priorities; dealing with faults and the transient nature of compute servers; and automatic configuration of a smaller cluster out of a pool of resources based on an end user's description of their computational needs.