Storage cluster systems may have multiple nodes that cooperate to provide fault-tolerant storage, in part by maintaining identical copies of stored client data within separate storage devices that are each normally under the control of separate nodes. A feature of such storage cluster systems may be the ability to arrange for one node to continuously monitor another node and remain prepared to take over for it in the event that the other node malfunctions or otherwise fails to function normally.
A node that takes over for another node assumes the performance of various functions previously performed by that other node, including storing client data within and/or retrieving client data from one or more storage devices that were under the control of the other node. Alternatively or additionally, the node that takes over may assume communications with one or more client devices from which requests to store and/or retrieve client data may be received, and to which responses may be required within a predetermined maximum period of time.
For one node to take over for another node, and thereby perform the functions of that other node in addition to its own, the node that takes over must have sufficient “headroom” in its processing, memory, network bandwidth and/or other resource(s) to successfully take on those additional functions. Approaches to determining whether a node has such headroom available are usually oversimplified to the extent of yielding inaccurate and/or otherwise unusable results.
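By way of illustration only, the notion of headroom may be sketched as a per-resource comparison between the spare capacity of the node that would take over and the current usage of the other node. The sketch below is not drawn from any particular storage cluster system; the names NodeLoad, headroom and can_take_over, the resource names, and the units are all hypothetical and chosen solely for this example.

```python
from dataclasses import dataclass

# Hypothetical resource names; units are arbitrary but must match
# between capacity and usage for a given resource.
RESOURCES = ("cpu", "memory", "network")


@dataclass
class NodeLoad:
    capacity: dict  # resource name -> total capacity of the node
    usage: dict     # resource name -> current usage by the node's own functions


def headroom(node):
    """Return the unused capacity of a node for each resource."""
    return {r: node.capacity[r] - node.usage[r] for r in RESOURCES}


def can_take_over(survivor, partner):
    """Return True if the survivor's spare capacity covers the partner's
    current usage for every resource (a static, point-in-time check)."""
    spare = headroom(survivor)
    return all(spare[r] >= partner.usage[r] for r in RESOURCES)


capacity = {"cpu": 100, "memory": 64, "network": 10}
node_a = NodeLoad(capacity, {"cpu": 40, "memory": 20, "network": 3})
node_b = NodeLoad(capacity, {"cpu": 50, "memory": 30, "network": 4})
node_c = NodeLoad(capacity, {"cpu": 70, "memory": 30, "network": 4})

a_covers_b = can_take_over(node_a, node_b)  # spare cpu 60 >= 50, and so on
a_covers_c = can_take_over(node_a, node_c)  # spare cpu 60 < 70
```

A static comparison of this kind considers only a single snapshot of usage and does not account for, among other things, variation in load over time or the predetermined maximum period of time within which responses to client devices may be required. It is offered as an example of a simple check, not as an adequate approach.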