A computing environment can include multiple nodes in communication with one another via a network. The nodes are typically used to execute jobs, such as data-processing jobs. But a node can fail, completely or partially, for a variety of reasons, such as a filesystem error or a kernel error. And a node failure can lead to other problems. For example, each node in the computing environment typically has certain computing resources allocated to it. If the node fails, these computing resources typically cannot be released until the node is rebooted.