To meet growing data storage and application requirements, server pools are increasing in size. Accordingly, data center hosts are required to manage a large number of servers with varying failure rates in a manner that provides a high quality of service for users. Monitoring systems are often used to determine server health by periodically pinging the servers to determine whether each server is up or down. If a server is determined to be down based on the monitoring, a mitigation action can be taken, such as rerouting network traffic away from the down server.
For example, receipt of a negative acknowledgment message in response to a ping message sent to a server, such as a connection denied message, a route unavailable message, or a message indicating a network condition that makes it impossible to reach the server, can cause the monitoring system to consider the server to be down. In another example, the lack of a response from a server to a ping message within a timeout period can also cause the monitoring system to consider the server to be down.
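The binary up/down decision described above can be illustrated with a minimal Python sketch. It is not taken from any particular monitoring system: the names `tcp_probe` and `check_server` are hypothetical, and using a TCP connection attempt as the "ping" is an assumption made only for illustration.

```python
import socket
from typing import Callable, Tuple


def tcp_probe(host: str, port: int, timeout_s: float) -> None:
    """Hypothetical ping: open and close a TCP connection; raises on failure."""
    with socket.create_connection((host, port), timeout=timeout_s):
        pass


def check_server(probe: Callable[[], None]) -> Tuple[bool, str]:
    """Run one probe and return (is_up, reason) as a binary decision."""
    try:
        probe()
    except (socket.timeout, TimeoutError):
        # No response within the timeout period.
        return False, "timeout"
    except ConnectionRefusedError:
        # Negative acknowledgment: the connection was denied.
        return False, "connection denied"
    except OSError:
        # Other network conditions, e.g. no route to the server.
        return False, "unreachable"
    return True, "ok"
```

A caller might run `check_server(lambda: tcp_probe("10.0.0.5", 80, 2.0))` on a fixed interval, where the address, port, and timeout are placeholder values. Note that every failure mode collapses into the same "down" outcome.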
Unfortunately, current monitoring systems can make only a binary decision as to whether a server is currently up or down, which may yield misleading information if a server is oscillating between states. Additionally, monitoring only for server failure does not provide any insight into the quality of service currently being provided by a server or how close a server may be to reaching its capacity. Accordingly, prior failure monitoring techniques are not robust and do not provide sufficient information to make early and effective decisions with respect to the management of a server pool.