A distributed computing system may be composed of many computing nodes. Depending on the architecture of the distributed system, each of these nodes may comprise numerous hardware components, such as memory and storage devices, any of which has the potential to fail. In addition, the software components operating on the computing nodes may be subject to various faults, crashes, and performance degradations. In some cases, a fault may cause the immediate failure of a computing node, while in other cases a fault may portend an approaching failure. In yet other cases, a fault may be safely ignored. Accordingly, management of a distributed computing system may involve investigating and responding to numerous faults and conditions. A delayed response to these faults and conditions may have a variety of detrimental effects, such as increasing the risk of a serious failure or increasing the length of time a computing node is unavailable.