The present invention relates to cloud computing, and more specifically to an on-demand cloud monitoring mechanism for quickly identifying the root cause of cloud operation problems.
Monitoring is necessary in cloud environments to ensure that service level agreements (SLAs) with the cloud customer are met. With more and more microservices and decentralized applications running in clouds within the cloud environment, the root cause of a failure across an entire application or service cluster is hard to identify through simple single-tier monitoring. However, increasing the amount of monitoring increases the resources consumed, and it remains difficult to identify the root cause when a cloud operation failure occurs. The primary task of monitoring in a cloud environment is to find potential problems in the system and to provide data for analysis.
Currently, within cloud environments, the monitoring scope can be manually adjusted to aid in determining the root cause of a failure within the environment. However, manually adjusting the monitoring scope lags behind events and is error-prone in large-scale IT systems. The response as to why the failure occurred is slow, and an administrator still does not know which service host, running which application in the cloud environment, had the failure.
Another solution to determining the root cause of a failure within a cloud environment is to adjust the monitoring granularity based on a situation, which can be triggered by monitored key performance indicators (KPIs). This solution does not change the monitoring scope; it only changes the monitoring granularity (e.g., Level 2→Level 3) within the same component.
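For illustration only, the KPI-triggered granularity adjustment described above might be sketched as follows. The function name, the threshold comparison, the component names, and the numeric level scheme are hypothetical assumptions and are not taken from any particular monitoring product. The sketch shows that the set of monitored components (the scope) stays the same while the level of one component is raised.

```python
from typing import Dict


def adjust_granularity(
    levels: Dict[str, int],
    component: str,
    kpi_value: float,
    kpi_threshold: float,
    max_level: int = 3,
) -> Dict[str, int]:
    """Return a copy of `levels` with the granularity of `component` raised
    by one step when its KPI exceeds the threshold (hypothetical rule).

    The keys of the result always equal the keys of the input, so the
    monitoring scope is unchanged; only the level of one component moves.
    """
    updated = dict(levels)  # copy so the caller's mapping is not mutated
    if kpi_value > kpi_threshold:
        updated[component] = min(levels[component] + 1, max_level)
    return updated


# Example: two monitored components, both at Level 2 (assumed names).
before = {"web": 2, "db": 2}
after = adjust_granularity(before, "web", kpi_value=0.95, kpi_threshold=0.90)
```

In this sketch, the "web" component moves from Level 2 to Level 3 while "db" is untouched and no component is added to or removed from monitoring, which reflects the limitation noted above.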