1. Field
The disclosure relates generally to an improved data processing system, and more specifically to a computer implemented method, apparatus, and computer program product for dynamically altering the delivery of web content to end users based on current and projected server load.
2. Description of the Related Art
From its very beginnings as a collection of hyperlinked text documents, the World Wide Web (WWW) has grown steadily more complex as it has evolved to bring end users increasingly rich web experiences. Such experiences now include embedded programs and large binary objects as well. This evolution has mostly preserved backward compatibility, such that new features are added without removing support for the old features. Therefore, while the modern methods of delivering a web experience are computationally more expensive, the older methods are still valid and may be leveraged to fulfill the core mission of a web application. By creating several variants of the web content with increasing degrees of richness, an appropriate level of web experience may be delivered to an end user. Providing content variants of differing richness is similar to the way in which multilingual versions of content are developed and used.
Content negotiation is a mechanism defined in the HyperText Transfer Protocol (HTTP) specification that enables a web server to serve different versions of a document (or more generally, a resource) under the same Uniform Resource Identifier (URI), so that a client agent can specify which version best fits the client's capabilities. Each of these different content versions is called a ‘variant’. Content negotiation helps determine what form content should take, given the characteristics and preferences set on both the server and client side. Thus, the same source data may be rendered in various ways, based on different access scenarios and available equipment. A classic use of the content negotiation mechanism is serving an image in multiple image formats, such as GIF or PNG format. If a user's browser does not understand one format (e.g., PNG), the browser can still display the other (e.g., GIF) version. Additionally, a document or resource may be available in several different representations. For example, the resource might be available in different languages or different media types, or a combination of the two. One way of selecting the most appropriate content to serve the user is to provide the user with an index page and allow the user to manually select the particular content variant to be delivered. However, it is often possible for the server to automatically choose the web content variant to be delivered. Automatic content variant selection by the server can be implemented because browsers can send, as part of each content request, information about the variants they prefer. For example, a browser may indicate that it would like to see information in French, if possible, otherwise English is acceptable. Browsers may indicate their variant preferences in request headers such as Accept and Accept-Language, as detailed in RFC 2295, Transparent Content Negotiation in HTTP.
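The French-then-English example above can be illustrated with a minimal sketch of server-side variant selection from an Accept-Language header. This is not the mechanism of any particular web server; the function names `parse_accept_language` and `choose_variant`, the file names, and the simple primary-language fallback are assumptions made for illustration only.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (tag, q) pairs, best first.

    Entries without an explicit q parameter default to q=1.0. Ties keep
    the order in which the browser listed them.
    """
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        if not tag:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag, q, position))
    prefs.sort(key=lambda p: (-p[1], p[2]))
    return [(tag, q) for tag, q, _ in prefs]


def choose_variant(header, available, default):
    """Return the variant for the most preferred language that is available.

    `available` maps language tags to variant names (hypothetical file names
    here). Falls back to `default` when nothing acceptable matches.
    """
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue  # q=0 means the client explicitly rejects this language
        if tag == "*":
            return default
        if tag in available:
            return available[tag]
        primary = tag.split("-")[0]  # e.g. "fr-ca" falls back to "fr"
        if primary in available:
            return available[primary]
    return default


variants = {"en": "index.en.html", "fr": "index.fr.html"}
print(choose_variant("fr, en;q=0.8", variants, "index.en.html"))
```

With both variants present the French document is chosen; if the French variant were missing, the same header would select the English one.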
Transparent content negotiation is an extensible negotiation mechanism, layered on top of HTTP, for automatically and efficiently retrieving the best variant of content when a GET or HEAD request is made (i.e., when the URL is accessed). Transparent content negotiation is called ‘transparent’ because it makes all variants which exist inside the source server visible to outside parties. Extensions to transparent content negotiation are detailed in RFC 2506, Media Feature Tag Registration Procedure. Content negotiation, being a dialogue to produce an agreement on a course of action, requires participation from all parties. While transparent content negotiation may be used by browsers to specify the type of web experience to provide their end users, browsers are only aware of their individual capabilities and know nothing about the overall web content usage patterns or adoption levels. Browsers are therefore unable to furnish any actionable information about such patterns to the content servers, and so are not in a position to participate in this type of content negotiation.
The flash crowd phenomenon, in which real-life crowds gather suddenly, spontaneously, and unpredictably, is well understood and has also been observed on the World Wide Web, where its effect is that a web site becomes unable to serve resources to users at the desired level and sometimes even crashes. This flash crowd effect often occurs when a relatively unpopular web site catches the attention of a large number of people and receives an unexpected surge in traffic. Typically, less robust websites cannot cope with the sudden surge in traffic and quickly become unavailable. However, there are also relatively large websites that must contend with flash crowds on a regular basis, and that at times lack capacity and suffer the same inability to serve resources at the desired level. These may be sites that by their nature provide ephemeral or event-driven content, and often include vendor, sports, news, and weather sites.
People familiar with the art of delivering web content will find it evident that it is more computationally expensive to deliver a complex, dynamic, and feature-rich web experience than a simpler one based mostly on static elements or solely on text elements. To maintain a consistent web experience for end users that request a website, organizations that expect flash crowds on their websites usually design their infrastructure for the expected peak. This is an expensive proposition that leads to under-utilized resources when interest is low. In addition, web content delivery failure may occur if the assessment of the expected peak is too low. Some organizations choose to lessen the impact of fluctuating interest by varying the amount of computational resources that are available as interest ebbs and flows, reducing the resources available to workloads that are considered to be less critical. However, if the computational requirements of delivering the web content can be dynamically altered to match the amount of interest a website is receiving at any given time, organizations would extract better value from their infrastructure investment. Additionally, organizations would benefit from the value created in their user base by the ability to maintain a satisfactory end user experience at all times for all workloads. Organizations do already vary the computational requirement of delivering web content through content shedding. Content shedding is the process of temporarily removing the more expensive or heavy content from the website, especially when heavy traffic overloads the system and hits on the site must be reduced immediately. However, this is a slow and manual process that is often performed too late, so that it becomes restorative in nature.
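The idea of matching the computational cost of delivery to the current level of interest, in contrast to slow manual content shedding, can be sketched as follows. This is only an illustrative assumption and not the method of the disclosure: the variant names, the `select_richness` function, the use of a single utilization figure between 0 and 1, and the threshold values are all hypothetical.

```python
# Hypothetical richness variants, ordered from cheapest to most expensive.
VARIANTS = ["text-only", "static", "rich"]


def select_richness(load, thresholds=(0.6, 0.85)):
    """Pick a content variant from a server utilization figure in [0, 1].

    The thresholds are arbitrary illustrative values: below the first the
    richest variant is served, between the two a mostly static variant,
    and at or above the second only the text variant.
    """
    low, high = thresholds
    if load >= high:
        return VARIANTS[0]
    if load >= low:
        return VARIANTS[1]
    return VARIANTS[2]


for load in (0.30, 0.70, 0.90):
    print(load, select_richness(load))
```

A real deployment would presumably derive the load figure from measured and projected demand and would avoid flapping between variants near a threshold; neither concern is addressed in this sketch.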