In general, cloud service providers maintain operational resources to meet service level agreements (SLAs) with customers. The providers continuously monitor the performance metrics of the cloud services they provide to ensure the services' conformance to the SLAs. However, because available tools may lack the capability to predict or detect impending SLA violations, the operational resources may be unable to avert the violations. Additionally, because the tools may lack the capability to diagnose the root causes of SLA violations, the operational resources may take longer to resolve such violations when they do occur. As a result, the customer experience may be adversely affected.
Furthermore, such SLAs might require that data be analyzed systematically and that actionable information in the data be acted upon proactively, both to avoid SLA violations and to determine whether the agreement is being satisfied. Complying with the service level agreements and other requirements can be very burdensome, and can grow more burdensome with the passage of time.
To obtain the capabilities mentioned above, what is needed are techniques that represent the system using high-level state models that are easily updated based on low-level events of the system and system measurements. With regard to obtaining metrics on low-level events, one can instrument the application programs underlying the system to collect exact measurements of the events. In such an approach, however, the instrumentation itself can affect the measurements. This problem can be more pronounced when the execution time of the instrumentation code around a method dominates the execution time of the method itself (e.g., if the invocation count of the method is high).
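The effect of instrumentation on measurements can be illustrated with a minimal Python sketch. It is an illustration only, not the instrumentation approach of any particular system: the names `cheap_method`, `instrumented`, `total_runtime_ns`, and `CALLS` are hypothetical, and the method body is assumed to be trivially short so that the timing probe around it is likely to dominate. The sketch times the same high-invocation-count loop with and without a per-call probe and reports the ratio, which varies by machine and run.

```python
import time


def cheap_method(x):
    """Hypothetical application method whose own work is very small."""
    return x + 1


def instrumented(func, log):
    """Wrap func so that each call appends its elapsed nanoseconds to log."""
    def wrapper(*args, **kwargs):
        start = time.perf_counter_ns()
        try:
            return func(*args, **kwargs)
        finally:
            # The probe itself (two clock reads plus a list append) costs time
            # that is added to the overall runtime of the application.
            log.append(time.perf_counter_ns() - start)
    return wrapper


def total_runtime_ns(func, calls):
    """Measure the wall-clock time of invoking func `calls` times."""
    start = time.perf_counter_ns()
    for i in range(calls):
        func(i)
    return time.perf_counter_ns() - start


CALLS = 200_000  # assumed high invocation count
samples = []

plain_ns = total_runtime_ns(cheap_method, CALLS)
probed_ns = total_runtime_ns(instrumented(cheap_method, samples), CALLS)

print(f"uninstrumented total: {plain_ns} ns")
print(f"instrumented total:   {probed_ns} ns")
print(f"ratio (instrumented / uninstrumented): {probed_ns / plain_ns:.2f}")
```

On typical hardware the instrumented loop is expected to take noticeably longer than the uninstrumented one, because the probe costs more than the method body it surrounds. The per-call samples collected this way can therefore be poor estimates of the method's uninstrumented execution time.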