In an IT solution, the basic means of articulating business-level objectives for a service application between a service provider and a service consumer is a service level agreement (SLA). The SLA provides the parameters, objectives, and acceptable thresholds related to performance, security, availability, business continuity, and response time requirements. The objectives are typically defined explicitly in service level objective (SLO) clauses. Whenever the service provider fails to meet these objectives, it usually incurs a penalty for noncompliance.
Today, business goals and IT performance parameters are not considered and optimized concurrently by IT performance management tools; that work is left to human experts. That is, human experts are needed to set the IT-level SLOs in an SLA so as to optimize business-level objectives, including SLOs that cover the response time of an IT solution. Typically, response time SLO clauses define (i) target average response times of service for transactions; (ii) the means for sampling average response time to verify compliance with target values; (iii) the sampling frequency; (iv) the penalty terms; and (v) the compliance evaluation period.
Generally, if the target response times are met, no penalty is incurred by the service provider. Currently, a simple percentile analysis is used to identify an acceptable response time for a given IT solution. An empirical cumulative distribution function Fn may be computed from a sample of historical response times of sufficiently large size n. The inverse of this function (i.e., the quantile function) may then be applied to the level of compliance specified in the response time SLO clause to yield the needed target response time.
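By way of illustration only, the percentile analysis described above may be sketched as follows. The function name `target_response_time`, the sample values, and the 90% compliance level are hypothetical and chosen solely for the example; the sketch assumes that the target is taken as the smallest observed response time t for which Fn(t) is at least the required level of compliance.

```python
import math


def target_response_time(samples, compliance_level):
    """Return the smallest observed response time t with Fn(t) >= compliance_level.

    samples: historical response times (hypothetical data, any consistent unit).
    compliance_level: fraction of transactions that must meet the target, in (0, 1].
    """
    if not samples:
        raise ValueError("at least one historical sample is required")
    if not 0.0 < compliance_level <= 1.0:
        raise ValueError("compliance_level must be in (0, 1]")

    ordered = sorted(samples)
    n = len(ordered)
    # Fn reaches k/n at the k-th smallest sample, so we need the smallest k
    # with k/n >= compliance_level. The small tolerance guards against
    # floating-point noise in compliance_level * n.
    k = max(1, math.ceil(compliance_level * n - 1e-9))
    return ordered[k - 1]


# Hypothetical response times in milliseconds.
history_ms = [120, 80, 200, 95, 150, 110, 300, 90, 100, 130]
target_ms = target_response_time(history_ms, 0.9)  # 200
```

With this choice, nine of the ten historical samples (90%) fall at or below the resulting 200 ms target, and one sample (300 ms) would count as a breach.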
Unfortunately, when target response time thresholds are set via percentile analysis, the SLO may be suboptimal from other business perspectives even when the total number of breaches detected during an evaluation period is within the allowed total number of breaches (i.e., within the “Breach Budget”). Specifically, the same transactions executed at different times (i.e., usage windows) during a business day may carry different financial gains or losses for the service provider, which simple percentile analysis does not take into account.
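The shortcoming may be illustrated with a minimal sketch. The usage window names ("peak", "off_peak"), the per-breach costs, the 200 ms target, and the breach budget of three are all hypothetical assumptions introduced for the example and are not taken from any particular SLA.

```python
from collections import Counter


def count_breaches_by_window(observations, target):
    """Count breaches per usage window.

    observations: iterable of (window_name, response_time) pairs.
    target: target response time; an observation above it is a breach.
    """
    breaches = Counter()
    for window, response_time in observations:
        if response_time > target:
            breaches[window] += 1
    return breaches


def business_cost(breaches, cost_per_breach):
    """Weight each breach by the (assumed) financial loss of its usage window."""
    return sum(count * cost_per_breach[window] for window, count in breaches.items())


TARGET_MS = 200          # hypothetical target response time
BREACH_BUDGET = 3        # hypothetical allowed number of breaches per period
COST_PER_BREACH = {"peak": 50.0, "off_peak": 5.0}  # hypothetical losses

# Two evaluation periods with the same number of breaches in different windows.
period_a = [("off_peak", 250), ("off_peak", 310), ("off_peak", 205),
            ("peak", 120), ("peak", 180)]
period_b = [("peak", 250), ("peak", 310), ("peak", 205),
            ("off_peak", 120), ("off_peak", 180)]

breaches_a = count_breaches_by_window(period_a, TARGET_MS)
breaches_b = count_breaches_by_window(period_b, TARGET_MS)

# Both periods stay within the breach budget ...
within_budget_a = sum(breaches_a.values()) <= BREACH_BUDGET  # True
within_budget_b = sum(breaches_b.values()) <= BREACH_BUDGET  # True

# ... yet their business impact differs by an order of magnitude.
cost_a = business_cost(breaches_a, COST_PER_BREACH)  # 15.0
cost_b = business_cost(breaches_b, COST_PER_BREACH)  # 150.0
```

Under these assumed figures, a simple percentile analysis treats the two periods as equally compliant, while the window-weighted cost distinguishes them.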
Thus, optimization methods and systems are needed that overcome the aforementioned shortcomings by observing and evaluating the response time distribution across multiple usage windows of varying business importance.