1. Technical Field
The present invention is generally directed to allocation of computer system resources based on service level agreements (SLAs). More specifically, the present invention is directed to an apparatus and method for allocating computer system resources based on predictions of whether an SLA will be breached and the associated costs.
2. Description of Related Art
In today's information technology (IT) based marketplace, in which information services are provided by computer resource suppliers and consumed by information services consumers, requirements for service level guarantees have created a demand for accountability that transcends enterprise and service provider environments. Inside enterprise organizations, service commitments are needed to justify very large expenditures on IT infrastructure. Service providers must prove the value of the services being delivered, particularly because these services are often obtained at a premium price; failure to deliver may determine whether the service provider succeeds or fails. For both, service level agreements (SLAs) define the terms for measuring service accountability. Service Level Management (SLM) enables the definition, measurement and reporting of SLA compliance.
SLAs can apply to almost any service imaginable. Historical examples for IT include the outsourcing of wide area network (WAN) managed services or remote local area network (LAN) services. Other cases may involve the combination of technologies for a given business service such as network and servers that support an overriding quality of service (QoS) for an enterprise financial application. Trends show that application and transaction-oriented SLAs are on the rise.
Increasingly, IT managers and service providers are seeking flexible, standards-based SLM tools to measure adherence to SLAs. The challenges of delivering IT services center on ensuring end-to-end availability and performance across a diverse technological infrastructure, with the goal of maintaining and improving end-user satisfaction. One solution for measuring adherence to SLAs is IBM Tivoli's Service Level Advisor™ (hereafter, Service Level Advisor).
The Service Level Advisor provides a number of SLM functions that aid in simplifying the building, managing and reporting of SLAs. Service Level Advisor provides automatic discovery of service-level resources, automatic evaluation of service-level agreements, and trend analysis capabilities. Service Level Advisor uses system management information, stored in a data warehouse, on service-level metrics and available components. This information is automatically available for selection as part of an SLA. In addition, Service Level Advisor automatically compares the terms of the SLAs (such as metrics, thresholds, business schedules, etc.) with monitored data from IT environments and generates alerts when any of the terms of an SLA are violated.
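As a rough illustration of comparing SLA terms with monitored data, the following Python sketch checks each sample against a per-metric threshold that is enforced only within a daily business schedule. The `SlaTerm` class, the `evaluate` function, and the metric names are hypothetical and are not taken from Service Level Advisor; the sketch assumes a higher metric value is worse unless `higher_is_worse` is set to false.

```python
from dataclasses import dataclass
from datetime import datetime, time

@dataclass
class SlaTerm:
    """Hypothetical SLA term: a metric, a threshold, and a daily business
    schedule [start, end) during which the term is enforced."""
    metric: str
    threshold: float
    start: time
    end: time
    higher_is_worse: bool = True  # True for response time; False for availability

def evaluate(terms, samples):
    """Return an alert tuple (timestamp, metric, value, threshold) for each
    monitored sample that violates an applicable SLA term.

    `samples` is an iterable of (datetime, metric_name, value) tuples.
    """
    alerts = []
    for ts, metric, value in samples:
        for term in terms:
            if term.metric != metric:
                continue
            # Ignore samples that fall outside the term's business schedule.
            if not (term.start <= ts.time() < term.end):
                continue
            if term.higher_is_worse:
                violated = value > term.threshold
            else:
                violated = value < term.threshold
            if violated:
                alerts.append((ts, metric, value, term.threshold))
    return alerts

terms = [
    SlaTerm("response_ms", 500.0, time(8), time(18)),
    SlaTerm("availability_pct", 99.5, time(0), time(23, 59, 59), higher_is_worse=False),
]
samples = [
    (datetime(2024, 1, 8, 9, 0), "response_ms", 620.0),      # in schedule, violates
    (datetime(2024, 1, 8, 20, 0), "response_ms", 900.0),     # outside schedule, ignored
    (datetime(2024, 1, 8, 9, 0), "availability_pct", 99.9),  # meets the term
]
alerts = evaluate(terms, samples)  # one alert, for the 09:00 response-time sample
```

Restricting each term to its business schedule keeps off-hours samples from raising alerts that the SLA does not cover.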
Of particular note, the Service Level Advisor uses a trend-analysis algorithm to proactively maintain service levels. The Service Level Advisor uses a linear-based algorithm and an exponential stress detection algorithm that give advance warning of SLA breaches and facilitate the fixing, optimizing and protecting of IT service elements.
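The internals of Service Level Advisor's linear-based algorithm are not described here, so the following is only a generic sketch of linear trend prediction: it fits an ordinary least-squares line to recent samples of a metric and extrapolates to the time at which the fitted line reaches the SLA threshold. The function name `predict_breach` is hypothetical, the exponential stress detection algorithm is not modeled, and the sketch assumes higher metric values are worse.

```python
def predict_breach(times, values, threshold):
    """Fit values = intercept + slope * t by ordinary least squares and return
    the time at which the fitted line reaches `threshold`.

    Assumes higher values are worse (e.g. response time), so a flat or
    falling trend returns None (no predicted breach).
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    if slope <= 0:
        return None
    # May lie at or before the last sample if the threshold is already reached.
    return (threshold - intercept) / slope

# Hourly response-time samples (ms) rising by 50 ms per hour.
print(predict_breach([0, 1, 2, 3], [100, 150, 200, 250], 500))  # 8.0
```

Here the fitted line is 100 + 50t, so a 500 ms threshold is predicted to be crossed at hour 8, five hours after the last sample, which is the kind of advance warning described above.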
FIG. 1 is an exemplary diagram of the architecture for the Service Level Advisor. The Service Level Advisor is an SLM solution composed of multiple, fully integrated components. At the core of the solution is the data warehouse 110, where all the data from external sources is stored. The data warehouse 110 collects data from multiple sources, such as IBM Tivoli's Security and Storage Tools™, Business Systems Manager™, IBM Tivoli Monitoring for Transaction Performance™ (formerly Tivoli Web Services Manager™ and Tivoli Application Performance Manager™), IBM Tivoli Enterprise Console™, databases, log files, and mainframe applications, as well as custom and third-party application data. Once the data from these various sources is collected in the data warehouse 110, the data is aggregated and correlated by the data warehouse manager 115.
Data specific to defined service level offerings (SLOs) within the Service Level Advisor are rolled from the data warehouse 110 into the SLM Datamart 120. The data rollups consist of delta data so that the impact to the infrastructure from the data transfer is kept to a minimum. The SLM Datamart 120 contains measurement data of how monitored components are doing in comparison to SLOs. The SLM Datamart 120 also contains summary data as a result of service level evaluations. It is the data in the SLM Datamart 120 that is used for Service Level Advisor reporting and trend analysis.
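The delta rollup can be pictured with a high-water mark: each rollup transfers only warehouse rows added since the previous rollup, and only those belonging to a defined SLO. The sketch below assumes, purely for illustration, that warehouse rows carry a monotonically increasing `id`; the `rollup_delta` function and the row layout are hypothetical and do not describe the actual data warehouse 110 or SLM Datamart 120 schemas.

```python
def rollup_delta(warehouse_rows, datamart_rows, last_mark, slo_metrics):
    """Append to `datamart_rows` only the warehouse rows added since the
    previous rollup (id > last_mark) whose metric belongs to a defined SLO.
    Returns the new high-water mark to persist for the next rollup.
    """
    new_mark = last_mark
    for row in warehouse_rows:
        if row["id"] <= last_mark:
            continue  # already transferred by an earlier rollup
        if row["metric"] in slo_metrics:
            datamart_rows.append(row)
        new_mark = max(new_mark, row["id"])
    return new_mark

warehouse = [
    {"id": 1, "metric": "cpu", "value": 0.40},
    {"id": 2, "metric": "disk", "value": 0.90},  # not part of any SLO
    {"id": 3, "metric": "cpu", "value": 0.70},
]
datamart = []
mark = rollup_delta(warehouse, datamart, 0, {"cpu"})      # copies ids 1 and 3
warehouse.append({"id": 4, "metric": "cpu", "value": 0.80})
mark = rollup_delta(warehouse, datamart, mark, {"cpu"})   # copies only id 4
```

Because the second rollup skips every row at or below the stored mark, the volume transferred grows with the new data only, not with the total size of the warehouse.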
The SLM database 130 stores the definitions of the data sources. For example, IBM Tivoli Distributed Monitoring™ is a data source, and within this application the user has visibility into disk, CPU, memory, network, and process monitors. This information becomes visible to the user through the SLM database 130 and is also available as service threshold triggers in the Service Level Advisor. New data source definitions are rolled up to the SLM database 130 from the data warehouse 110 at a user-specified interval. It is these components within the SLM database 130 that are used in building an order, i.e., an SLA.
The server components that make up the Service Level Advisor solution are the administrative server 140, the SLM server 150, and the reports server 160. These components can reside on a single system or can be distributed. The administrative server 140 facilitates the definition and maintenance of offerings and orders (i.e., SLOs and SLAs), while the SLM server 150 provides the core service monitoring functionality. That is, the SLM server 150 determines when service breaches occur, or predicts when they will occur through its trend analysis capabilities, and notifies the user.
The reports server 160 within the Service Level Advisor enables viewing of the data within the SLM Datamart 120. These reports can be pulled up and viewed in an ad-hoc fashion, or they can be scheduled. Examples of reports that may be viewed using the reports server 160 include aggregate and summary views of data targeted at the Executive, Customer, and Operations levels, reports showing trends toward violations, actual SLA violations, results achieved, and SLA components.
Typically, computer resources are allocated to service consumers as the resources are required, within the confines of the SLAs and if such resources are available. That is, SLM systems such as Service Level Advisor monitor the demand for computer resources, the current allocation of computer resources, the various metrics representative of the level of service being provided, and the SLA with the service consumer. If the demand is high enough that the current allocation of computer resources does not provide the level of service required by the SLA, as determined from the monitored metrics, and there are available resources that may be allocated, these computer resources are then allocated to the service consumer so that the minimum level of service agreed upon in the SLA is met.
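A minimal sketch of this conventional allocation step, under the simplifying assumption (not stated in the text) that each resource unit serves a fixed amount of demand and that the SLA is met whenever allocated capacity covers demand. The `allocate` function and its parameters are hypothetical.

```python
import math

def allocate(demand, capacity_per_unit, allocated, free_pool):
    """Grant resource units to one consumer so that its allocation covers
    current demand, limited by what remains in the shared pool.

    Returns (new_allocation, remaining_pool, sla_met).
    """
    needed = math.ceil(demand / capacity_per_unit)  # units required to meet the SLA
    shortfall = max(0, needed - allocated)
    grant = min(shortfall, free_pool)  # cannot grant more than the pool holds
    return allocated + grant, free_pool - grant, allocated + grant >= needed

print(allocate(950, 100, 6, 5))  # (10, 1, True)
print(allocate(950, 100, 6, 2))  # (8, 0, False)
```

The second call shows the limitation discussed next: when the free pool runs out, the consumer remains short of what its SLA requires.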
However, the pool of available computer resources is finite and may not be sufficient to meet all the demands of the various service consumers being serviced. For example, if a cluster of servers hosts a plurality of popular web sites, it may not be possible, with the finite resources available, to supply the bandwidth, processor usage, and the like needed to achieve the minimum level of service under each web site's SLA. This will lead to breaches of SLAs, with the service provider having to pay a penalty to the service consumer for not having met the minimum requirements of the SLA or, worse, losing the service consumer's business.
The penalties for breaching an SLA may be significant, depending upon the particular service consumer and the agreement reached with that consumer. It would be beneficial to minimize the losses incurred when limited available resources cause SLAs to be breached. Therefore, it would be advantageous to have an improved method, apparatus, and computer instructions for weighing the penalties of breaching SLAs to determine a lowest-cost alternative for resource allocation.
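To make the idea of weighing breach penalties concrete, the sketch below enumerates which consumers to satisfy fully from a finite pool and picks the combination with the smallest total penalty. It assumes, for illustration only, that each SLA is either fully met or breached with a fixed penalty. The `lowest_penalty_plan` function and the site names are hypothetical, and this generic exhaustive search is not the method of the present invention.

```python
from itertools import combinations

def lowest_penalty_plan(pool, consumers):
    """`consumers` maps name -> (units_needed, penalty_if_breached).

    Returns (set of consumers to satisfy fully, total penalty of the rest),
    trying every subset that fits within `pool` units.
    """
    names = list(consumers)
    # Baseline: satisfy nobody and pay every penalty.
    best = (frozenset(), sum(p for _, p in consumers.values()))
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            units = sum(consumers[n][0] for n in subset)
            if units > pool:
                continue  # this plan needs more resources than exist
            penalty = sum(consumers[n][1] for n in names if n not in subset)
            if penalty < best[1]:
                best = (frozenset(subset), penalty)
    return best

sites = {"site_a": (6, 5000), "site_b": (5, 3000), "site_c": (4, 2500)}
plan, cost = lowest_penalty_plan(10, sites)  # satisfy site_a and site_c, pay 3000
```

Exhaustive search is only practical for a handful of consumers; with all-or-nothing penalties the problem is a 0/1 knapsack, so larger instances would call for dynamic programming or a heuristic.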