The present invention relates to computer networks, and particularly to the assignment of infrastructure resources in the context of internets and intranets.
Organizations of all sizes are using the World Wide Web ("Web") for commerce and to improve productivity, market share, and internal/external processes. Web sites have become a mission-critical necessity in today's business environment. Under such mission-critical conditions, unpredictable service levels will result in loss of revenue and market leadership. To avoid this costly impact, web sites must be highly available and dependable.
The Web is comprised of host computers that are contacted by one or more client computers via browsers. A high-level protocol (i.e. at the application layer of the layered OSI network model), the Hypertext Transfer Protocol (HTTP), is used to open a connection to an indicated server, send a request, receive a reply, and then display the contents of the reply to the user. In response to requests from client computers, host computers transmit data in the form of web pages. The servicing of these requests is typically performed on a "first come, first served" basis. Host computer resources are committed to servicing clients' requests, and as a result there is a finite limit on the number of client requests that host computers can simultaneously handle. That is, when host computers receive many requests from many clients, they may be unable to service the requests in a timely fashion due to the depletion of host computer resources. Further disadvantageously, under heavy loads host computers will stop accepting new requests entirely until resources are freed up, in some cases leaving clients seeking access to heavily used sites unable to gain access.
It is known that capacity-planning techniques, such as over-provisioning computer resources to meet and exceed projected peak demands, can alleviate this problem. However, this only increases the capacity of host computers to accept more connections and postpones the above-mentioned behavior. Due to the "first come, first served" nature of the HTTP protocol, the web site will admit the next request on the queue and commit host computer resources to it. When the deployment is a corporate site, it is entirely conceivable that under peak load this policy of treating every request uniformly will apply the corporate resource, such as the web site, inappropriately and indiscriminately, resulting in non-optimal use of the resource in the revenue-generating process. On sites where e-commerce is conducted, this can translate into a real loss of revenue.
In known networks, class of service (CoS) is honored in the network infrastructure, i.e. at the physical/hardware level, and is implemented as a means of determining the network bandwidth apportioned to flows, based on and commensurate with pre-negotiated policies. Unfortunately, this known CoS policy terminates at the network layer, below the application layer, in the layered OSI network model. This means that end-to-end CoS is not presently available in known network configurations, because once the flow reaches the application layer (web layer or HTTP layer), there are no metrics by which to implement and honor CoS. Consequently, the notion of an end-to-end CoS policy in the context of a policy-enabled network, such as an internet or intranet, breaks down, and any negotiated differentiated service to the client is not universally available. Network-based CoS does not deal with back-end server resource allocation, and it does not facilitate differentiation of service on the back-end server. Thus, even with network-level CoS/QoS, all users will uniformly experience degradation of service as a result of back-end server utilization conditions (e.g. bottlenecks, overloads).
Site resource and performance management implementations are known for the intended purpose of improving host availability and dependability in the context of internets and intranets. Known "load balancing" implementations reallocate requests from overloaded servers to available servers. While such implementations monitor server loads and help to avoid disconnects due to overloading, they are typically based on a fixed or static algorithm that effects the reallocation. Known load balancing techniques do not provide enhanced quality of service (QoS) as a function of client characteristics; rather, QoS is enhanced only by reallocation of load. Load balancing implementations known in the art treat all requests equally, and merely re-direct traffic with little regard for the client generating the request, what the client is requesting, or the nature of the transaction. In an ISP environment, existing QoS solutions do not permit the provision of differentiated services to each virtual site: high-end clients are treated the same as low-end clients. Known load balancing approaches have no mechanism to distinguish or prioritize users, transactions, or requests. Thus, even with load balancing implemented, there is still a first come, first served policy; all requests are treated equally and there is no provisioning of resources based on a user or request.
One known implementation that purportedly provides enhanced QoS as a function of client characteristics is HP WebQoS, available from Hewlett-Packard Company, Palo Alto, Calif. HP WebQoS enhances web performance, capacity, and availability only in the HP-UX operating environment. It permits site managers to prioritize web site service levels, allowing higher service quality to be allocated as a function of certain, limited, client characteristics. HP WebQoS monitors service levels and adjusts workload scheduling and processing rates based on policies configured by the administrator. The HP WebQoS technology prioritizes access as a function of client characteristics by scheduling HTTP requests into three different priority queues. The three queues are static and not further extensible, and control the resources allocated to servers and applications. The implementation disadvantageously depends on a particular operating environment and relies on a proprietary controller (i.e. the Cisco LocalDirector) to effect its functionality.
The HP WebQoS architecture is illustrated in FIG. 1, and comprises essentially four components: a request controller 10, a service resource controller 12, a LocalDirector (LD) controller 14 to manage the proprietary Cisco LocalDirector, and a management system 16. The request controller 10 classifies requests into high, medium, or low priority based on a configured policy. The three priority levels are used to determine admission priority and performance level. Classification into the three priority levels can be done as a function of: source IP address; destination IP address; URL; port number; hostname; and IP type-of-service. Initial requests are classified, and thereafter the session is marked so that all subsequent requests associated with that session continue to be classified at the same priority.
The request controller 10 controls admission decisions for new sessions, and admits, redirects, defers, or rejects new sessions based on the configured policy. Configurable admission policies are based on the user or service class and on various performance or resource conditions, such as CPU utilization or the length of the queues handling the high-, medium-, or low-priority HTTP requests. Admitted requests are queued into the high-, medium-, and low-priority queues, which are serviced based on the configured policy, resulting in a variation in performance at each level.
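The policy-based classification and session marking described above can be sketched as follows. This is an illustrative sketch only; the rule format, field names, and default-to-low behavior are assumptions, not part of the HP WebQoS product.

```python
# Hypothetical sketch of priority classification with per-session stickiness.
# Rule predicates and request fields are illustrative assumptions.

HIGH, MEDIUM, LOW = "high", "medium", "low"

# Each policy rule maps a matching predicate to a priority level; the
# first matching rule wins.
POLICY = [
    (lambda req: req["source_ip"].startswith("10.1."), HIGH),
    (lambda req: req["url"].startswith("/checkout"), HIGH),
    (lambda req: req["port"] == 443, MEDIUM),
]

_session_priority = {}  # session id -> priority; a marked session keeps its class

def classify(req):
    """Return the priority for a request; subsequent requests in the same
    session are classified at the same priority."""
    sid = req.get("session_id")
    if sid in _session_priority:
        return _session_priority[sid]
    priority = LOW  # assumed default when no rule matches
    for rule, level in POLICY:
        if rule(req):
            priority = level
            break
    if sid is not None:
        _session_priority[sid] = priority
    return priority
```

A request matching a high-priority rule marks its session, so later requests in that session stay high priority even if they would not match any rule on their own.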
The service resource controller 12 manages hardware resource allocation. Greater resources are allocated per unit of workload for higher-priority services. The service resource controller 12 controls allocation as a function of percent CPU utilization and percent disk I/O utilization.
The LD Controller 14 runs on web servers and manages the proprietary Cisco LocalDirector. The LD Controller 14 dynamically manages server weights by setting the Cisco LocalDirector's relative weights for each server in a cluster using SNMP_Set. Initial weights for each server are generated using tested throughput results. The LD Controller then dynamically adjusts the LocalDirector weightings to match each server's actual capacity during operation. The LD Controller is loaded on each server using the management system.
The management system 16 in HP WebQoS provides a GUI for creating, editing, and deleting Service Level Objectives (SLOs). The SLOs are the capacity and performance objectives configured per user or service class. Admission of requests, resource allocation, and priority queue servicing are configured using the management system to issue directives to the request controller 10 and the service resource controller 12.
Disadvantageously, the HP WebQoS architecture is highly platform dependent, requiring a particular operating environment, HP-UX, and it relies on a proprietary controller, the Cisco LocalDirector, to effect its functionality, limiting its applicability to systems that include those proprietary components. The implementation is focused on applying policy for differentiated services by categorizing users into queues. Flash crowds, i.e. unusually high periodic traffic, requesting services may end up being categorized into one or another of the queues, leading to queue imbalance. As a result, traffic may be rejected in one queue despite the availability of server resources to service those requests through another queue. Further, in a web farm configuration, HP WebQoS does not permit implementation of class management at the back-end server; only one load balancing mechanism is used for all back-end servers. The concept of categorizing back-end servers into various priority classes does not exist: the prioritization is limited to the (maximum) three front-end queues. Under flash-crowd conditions, low-priority users will be denied access to the site(s). This may result in long-term business consequences, e.g. loss of revenue, and it will result in a negative quality experience for users. The notion of provisioning and reserving resources does not exist in the HP WebQoS implementation.
Further disadvantageously, HP WebQoS does not allow prioritization of traffic or site resources based on a virtual site in an ISP environment. Still further, dependence on the three front-end priority queues significantly limits the allocation of services among classes by limiting the number of classes of requests to the number of available queues, and the three priority levels determine only admission priority and performance level. Accordingly, HP WebQoS imposes a significant limitation on the differentiation of service(s) allocated to admitted requests. Only a limited number of client characteristics can be used to classify requests with HP WebQoS, severely limiting the classification of traffic. In conjunction with the limited number of classification queues, the limited classification characteristics in HP WebQoS make the administration of complex policies and rules for classification of requests virtually impossible, significantly limit client service differentiation, and do not facilitate differentiation based on adaptive modeling of client behavior.
The present invention provides a method and apparatus for robustly enhanced class of service (CoS) at the application layer (e.g. the HTTP protocol layer) that permits highly flexible privilege-based access and enables implementation of complex policies and rules for classification and differentiation of services. Differentiation at the application layer facilitates categorization of traffic to permit flexible design and implementation of multiple class of service levels.
According to the invention, a front-end processor or routing host, e.g. in the form of a TCP router, is configured to receive all client requests for sites and virtual sites implemented on a plurality of service hosts or back-end servers (or servers). A monitoring processor incorporating an Adaptive Policy Engine (APE), in communication with the router (and with agents installed on the back-end servers), dynamically monitors the workload and availability of the servers so that requests can be sent to the most appropriate and optimal server. Incoming traffic is first processed to assign a class based on user-defined policies. The APE monitors the incoming traffic to the routing host; traffic is measured to each hosted site and, further, to each class of a hosted site. The APE has a rules-based engine that correlates this information and uses it to produce a dynamic, real-time balancing scheme for each hosted site. The APE or policy engine, in conjunction with the router, then intelligently distributes incoming traffic to the most available and/or efficient server within each class or "cluster," using one or more of a plurality of selectable load distribution algorithms for the class/cluster, including: weighted percentage load balancing; round robin load balancing; CPU availability load balancing; probabilistic load balancing; and least connections load balancing. Thus each back-end server in communication with the router is subject to a selectable one of a plurality of load balancing algorithms, so that traffic is routed to the plurality of back-end servers, as a function of class, in a manner that maintains consistent response times and service level commitments even as traffic and processing loads increase.
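The per-cluster selection of a load distribution algorithm can be sketched as follows. This is a minimal illustration of the dispatch idea only: the server record fields, cluster dictionary layout, and the particular three algorithms shown are assumptions (the invention names additional algorithms, e.g. CPU availability and probabilistic balancing, omitted here for brevity).

```python
# Illustrative per-cluster selectable load distribution. Each cluster is
# configured with one algorithm; the router dispatches through a table.
import random

def round_robin(servers, state):
    # Cycle through the servers in order, remembering position in `state`.
    idx = state["rr"] = (state.get("rr", -1) + 1) % len(servers)
    return servers[idx]

def least_connections(servers, state):
    # Pick the server currently holding the fewest open connections.
    return min(servers, key=lambda s: s["open_connections"])

def weighted_percentage(servers, state):
    # Pick a server at random in proportion to its configured weight.
    total = sum(s["weight"] for s in servers)
    r = random.uniform(0, total)
    for s in servers:
        r -= s["weight"]
        if r <= 0:
            return s
    return servers[-1]

ALGORITHMS = {
    "round_robin": round_robin,
    "least_connections": least_connections,
    "weighted_percentage": weighted_percentage,
}

_router_state = {}  # per-cluster algorithm state

def route(cluster):
    """Pick a back-end server using the algorithm configured for the cluster."""
    algo = ALGORITHMS[cluster["algorithm"]]
    st = _router_state.setdefault(cluster["name"], {})
    return algo(cluster["servers"], st)
```

The dispatch-table design lets each cluster carry its own algorithm name in configuration, matching the described per-class selectability.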
Intelligent agents deployed on each of the back-end servers monitor several server attributes/parameters and report back to the policy engine at the router. The server attributes (or service level metrics) reported to the router include: response time by user; URL; request; transaction type; content type; application type; service/protocol type; domain of origin; file size; online/offline status; total hits per second; CPU utilization (i.e. number of processors and percent utilization); number of processes; total open connections; disk space (i.e. disk size in bytes, bytes used, percent used, percent free); response times of back-end servers; URL/content availability; server and virtual site availability; application availability; and memory utilization (i.e. total memory, memory used, free memory). A subset of these attributes is monitored by the intelligent agents and reported back to the policy engine for each virtual site and for each web farm.
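A minimal sketch of such an agent's reporting step is given below. It is illustrative only: the JSON wire format, port number, and static metric values are assumptions (the description specifies only that parameters reach the policy engine, e.g. via a UDP packet), and a real agent would read live operating system counters.

```python
# Hypothetical back-end agent reporting a subset of the service-level
# metrics above to the policy engine over UDP.
import json
import socket
import time

def collect_metrics():
    # Placeholder values; a real agent would sample OS performance counters.
    return {
        "timestamp": time.time(),
        "cpu_percent": 37.5,
        "open_connections": 112,
        "memory": {"total": 8 * 2**30, "used": 5 * 2**30},
        "disk": {"size": 100 * 2**30, "used_percent": 62.0},
        "online": True,
    }

def report(host="127.0.0.1", port=9999):
    """Serialize the current metrics and send them in a single UDP datagram."""
    payload = json.dumps(collect_metrics()).encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
    return payload  # returned so callers can inspect what was sent
```

UDP fits the described periodic, fire-and-forget reporting: a lost sample is simply superseded by the next one.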
These parameters reported to the router are available via a Management Information Base (MIB) kept by the policy engine on the router. The information is also made available by the policy engine to application layer programs (i.e. on an NT platform via performance monitoring registers, or "perfmon registers"). The policy engine, in conjunction with the router, uses the information/parameters to determine the configuration of the class and cluster(s), as well as in making load distribution decisions. The policy engine repackages some of the information into a Simple Network Management Protocol (SNMP) MIB, which provides access to these important real-time site performance metrics via an industry-standard SNMP MIB browser.
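The repackaging idea can be illustrated with a toy flattening of nested metrics into OID-style dotted keys that a browser could walk. This is not a real SNMP stack, and the enterprise OID prefix is a made-up placeholder; an actual implementation would register a proper MIB module with an SNMP agent.

```python
# Toy illustration of repackaging metrics under OID-like dotted keys.
MIB_PREFIX = "1.3.6.1.4.1.99999"  # hypothetical enterprise OID, placeholder only

def build_mib(metrics):
    """Flatten a nested metrics dict into a table keyed by OID-style strings.

    Dict keys are assigned sub-identifiers 1..n in sorted order, so the
    resulting table can be walked in lexical OID order like a MIB subtree.
    """
    table = {}
    def walk(node, path):
        if isinstance(node, dict):
            for i, key in enumerate(sorted(node), start=1):
                walk(node[key], path + [str(i)])
        else:
            table[".".join([MIB_PREFIX] + path)] = node
    walk(metrics, [])
    return table
```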
Class of service (CoS) involves the classification of incoming requests by the policy engine into classes based on the source IP address, destination IP address, port number, URL, service or protocol, virtual site, transaction or request, or authenticated user. Back-end server sites are clustered into user-definable virtual cluster groups. Each cluster group can be managed/designated with a particular class of service. Based on its class, a connection/request is directed to one of the clusters. The specific machine selected depends upon the load balancing algorithm defined for the cluster or class, implemented as a function of the parameters reported to the policy engine for making load balancing decisions. An adaptive balancing module balances the number of service hosts in a cluster dynamically and guarantees optimal use of the resources by moving unused resources to service requests as needed. Based on the information/parameters received via a UDP packet and on the service level commitments, the composition of the clusters can be changed dynamically by the adaptive module so that service level metrics fall within committed levels.
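The adaptive recomposition of clusters can be sketched as follows. The policy in this sketch, borrowing an idle server from the least-loaded compliant cluster when another cluster breaches its committed response time, is an assumption chosen to illustrate the idea; the field names and thresholds are likewise illustrative.

```python
# Hedged sketch of adaptive cluster rebalancing: when a cluster's measured
# response time exceeds its service level commitment, move a server from
# a cluster that is comfortably within its commitment.

def rebalance(clusters):
    """clusters: list of dicts with 'servers' (list), 'response_ms' (measured),
    and 'committed_ms' (service level commitment). Mutates and returns them."""
    over = [c for c in clusters if c["response_ms"] > c["committed_ms"]]
    under = [c for c in clusters
             if c["response_ms"] <= c["committed_ms"] and len(c["servers"]) > 1]
    for needy in over:
        if not under:
            break  # no donor cluster can spare a server
        # Donate from the cluster with the most headroom relative to its SLA.
        donor = min(under, key=lambda c: c["response_ms"] / c["committed_ms"])
        needy["servers"].append(donor["servers"].pop())
        if len(donor["servers"]) <= 1:
            under.remove(donor)  # never strip a cluster to zero servers
    return clusters
```

Run periodically against the agent-reported metrics, such a loop keeps moving otherwise unused resources toward clusters whose service level metrics drift outside committed levels.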
In further accord with the invention, failover capability for the router provides service with minimum disruption in the event of a hardware or software failure by providing an optional redundant (secondary) warm router, which will assume the functions of the failed (primary) router. The primary and secondary routers are configured as symmetric peers, each having a single Network Interface Card (NIC) and Internet Protocol (IP) address; however, the IP address of the primary is the only published address.
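The warm-standby behavior can be sketched with a simple heartbeat timeout. The heartbeat mechanism and the timeout value are assumptions for illustration; the description states only that the secondary assumes the failed primary's functions, which in this sketch is represented by a comment where the takeover of the published IP address would occur.

```python
# Illustrative warm-standby logic: the secondary router promotes itself
# when the primary's heartbeat goes silent for longer than `timeout`.
import time

class WarmStandby:
    def __init__(self, timeout=3.0):
        self.timeout = timeout               # seconds of silence tolerated
        self.last_heartbeat = time.monotonic()
        self.active = False                  # secondary starts passive

    def heartbeat(self):
        """Called whenever a heartbeat arrives from the primary."""
        self.last_heartbeat = time.monotonic()

    def check(self, now=None):
        """Promote this router if the primary has gone silent; returns
        whether this router is now the active one."""
        now = time.monotonic() if now is None else now
        if not self.active and now - self.last_heartbeat > self.timeout:
            self.active = True  # would claim the published IP address here
        return self.active
```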
In still further accord with the invention, a control center implementing a central control Graphical User Interface (GUI) allows administrative users, e.g. Internet Service Providers ("ISPs") and IT/IS administrative personnel for e-commerce merchants, corporate organizations, and customer service organizations, to interact with the system. The control center provides a means for starting and stopping all services, for configuring the services, and for editing existing services.
Features of the invention include selectively grouping servers/hosts ("service hosts") into clusters; recognizing and categorizing incoming traffic or requests based upon domain of origin, transaction, URL, service or protocol, source or destination IP address, virtual site, or authenticated user name; and then directing client requests to a specific cluster to provide differentiated service. Assigning more resources to a cluster that supports higher-end requests guarantees that more resources are available to that class and that it is given priority over other classes. Further, load balancing among service hosts in a cluster avoids imbalance of load among the members of a cluster.
A high level of service that is responsive to the individual needs of the site's users can be provided. Service providers can determine how many service hosts they will assign to a hosted site based upon tiered service level metrics. Assigning service hosts to clusters provides another means of providing differentiated services. Service providers can provide users with access levels and content appropriate to their subscribed class of service or Service Level Agreement (SLA). ISPs can enter into SLAs with virtual web site hosting customers whereby they will be able to guarantee response times, error rates, and access to site resources, and to generate quantifiable periodic reports to measure the SLA metrics. Based on the SLA metrics, corrective actions can be taken so that response times, open connections, and the percentage of content and server related errors fall within acceptable levels.
Routing by class ensures that users are directed to web servers and content commensurate with their service levels, enabling sites to meet users' expectations. ISPs can guarantee network uptime and also assure site and content availability and response times commensurate with SLAs. Clustering service hosts into groups based on service level metrics, in conjunction with network-level CoS protocols, guarantees the delivery of the end-to-end policy metrics of a policy-enabled network.