Monitoring servers in an enterprise environment can be a daunting task due to the sheer volume of servers deployed throughout the enterprise. For example, it is not uncommon for a large enterprise to deploy hundreds of thousands of servers, all of which require some level of monitoring. In addition, the servers may differ in their Operating Systems (OSs), in the role they serve within the enterprise (e.g., test/non-production, production and the like) and in the category of applications they host. Monitoring a server entails tracking the performance of key/core parameters associated with the server to identify problems that are critical, or soon to be critical, and, in response to identifying such problems, notifying the proper personnel so that corrective action can be taken.
Currently, most monitoring of servers is done on an individualized basis, meaning that the entity responsible for the server, or the entity responsible for the applications running on the server, not only defines which parameters should be monitored, but also defines the threshold values for monitoring (i.e., the limits that prompt actions, such as alerts or the like). Such individualized decisions on which parameters should be monitored, and at what threshold/level, tend to be arbitrary and subjective.
For example, if a responsible entity sets an allowable Central Processing Unit (CPU) usage threshold of 80% for a specified server and, after a period of time (e.g., six months or the like), the computational demands on the CPU increase, numerous false alerts are likely to be generated whenever CPU usage exceeds the subjective 80% threshold. However, if the responsible entity addresses the false alerts by arbitrarily raising the threshold to 95%, so that fewer alerts are generated, underlying problems (e.g., an application using a high volume of CPU cycles) may go undetected.
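The trade-off in this example can be illustrated with a minimal sketch. The CPU readings below are hypothetical values invented for illustration, and the helper name count_alerts is an assumption rather than part of any monitoring product; the sketch only counts how many readings cross a fixed, manually chosen threshold.

```python
from typing import List


def count_alerts(cpu_samples: List[float], threshold: float) -> int:
    """Count readings whose CPU usage (percent) exceeds a fixed threshold."""
    return sum(1 for usage in cpu_samples if usage > threshold)


# Hypothetical CPU usage readings (percent) when the threshold was first set.
baseline = [55, 60, 72, 65, 81, 58, 63, 70]

# Hypothetical readings for the same server after demand has grown.
later = [82, 85, 88, 84, 97, 83, 86, 90]

alerts_baseline_80 = count_alerts(baseline, 80.0)  # occasional alert
alerts_later_80 = count_alerts(later, 80.0)        # nearly every reading alerts
alerts_later_95 = count_alerts(later, 95.0)        # alerts drop sharply

print(alerts_baseline_80, alerts_later_80, alerts_later_95)
```

With these assumed readings, the 80% threshold alerts on every later reading, while the 95% threshold alerts on only one, so a sustained load in the 82% to 90% range would pass without notice under the raised threshold.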
Therefore, a need exists to develop systems, apparatus, computer program products, methods and the like that provide a unitary means of consistently monitoring all of the servers within an enterprise environment. The desired approach should eliminate the individualized approach to monitoring servers, in which each server has its own specified server parameters being monitored and its own threshold values for those parameters. Moreover, the desired systems, apparatus, computer program products, and methods should provide a meaningful and optimal way to monitor servers, such that critical problems in a server are identified in a timely manner and appropriate corrective action can ensue. In this regard, the desired approach should result in a self-evolving means that identifies optimal threshold values for crucial/core performance parameters.