The World Wide Web (WWW) is the Internet's multimedia information retrieval system. In the Web environment, client machines communicate with Web servers using the Hypertext Transfer Protocol (HTTP). The Web servers provide users with access to resources, which may be any type of content that can be stored in a file and presented to a user, such as program files, media files, text, graphics, images, sound, video, etc. Page description languages, such as Hypertext Markup Language (HTML) or eXtensible Markup Language (XML), are often used to describe web pages. HTML provides basic document formatting and allows the developer to specify connections known as hyperlinks to other servers and files. XML is used to represent the content and structure of data without describing how to display the information, with a stylesheet or schema provided for applying visual formatting to the XML document.
On the Internet, a network path to a server is identified by a resource address called a Uniform Resource Locator (URL), which has a special syntax for defining a network connection. Application programs called web browsers, which run on client computer systems, enable users to access resources by specifying a link via the URL and to navigate between different HTML/XML pages. Most browsers natively support a variety of formats in addition to HTML, such as the JPEG™, PNG and GIF™ image formats, and can be extended to support more through the use of plug-ins. The type of content contained in an HTTP message is identified by an HTTP content type. Together, these features allow web page designers to embed images, animations, video, sound, and more into a web page, or to make them accessible through the web page.
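By way of illustration only, the following minimal Python sketch splits a URL into the parts a browser uses to reach a server, and guesses a content type from the file extension. The function name describe_url and the example.com address are hypothetical. In practice the content type is stated by the server in the HTTP response, so the extension-based guess shown here is an assumption made purely for the example.

```python
from urllib.parse import urlsplit
import mimetypes

# Default ports for the schemes handled in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}

def describe_url(url):
    """Split a URL into the parts a browser needs to locate a resource."""
    parts = urlsplit(url)
    # Use the explicit port if the URL gives one, else the scheme default.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    # Guess a content type from the file extension (illustrative only;
    # a real client relies on the Content-Type header sent by the server).
    guessed_type, _encoding = mimetypes.guess_type(parts.path)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": port,
        "path": parts.path or "/",
        "guessed_content_type": guessed_type,
    }

info = describe_url("http://www.example.com/images/logo.png")
```

Here info holds the scheme, host name, port, path and guessed content type for the hypothetical address.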
HTTP is a request/response protocol between clients and servers. An HTTP client, such as a web browser, typically initiates a request by establishing a Transmission Control Protocol (TCP) connection to a particular port on a remote host (port 80 by default). An HTTP server listening on that port waits for the client to send a request string, such as “GET / HTTP/1.1” (which would request the default page of that web server), followed by an email-like MIME (Multipurpose Internet Mail Extensions) message that has a number of informational header strings that describe aspects of the request, followed by an optional body of arbitrary data.
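The request format described above can be sketched as follows. This is a minimal illustration rather than a complete client: the function name build_get_request and the host www.example.com are hypothetical, and the bytes are only assembled, not sent over a TCP connection. The Host header is included because HTTP/1.1 requires it.

```python
def build_get_request(host, path="/", version="HTTP/1.1", extra_headers=None):
    """Assemble the bytes of a simple GET request with no body."""
    # Host is mandatory in HTTP/1.1; Connection: close is a choice made
    # for this sketch so that the server ends the exchange after one reply.
    headers = {"Host": host, "Connection": "close"}
    if extra_headers:
        headers.update(extra_headers)
    # Request line first, then one "Name: value" line per header.
    lines = [f"GET {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line (CRLF CRLF) marks the end of the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request_bytes = build_get_request("www.example.com")
```

The resulting bytes begin with the request line “GET / HTTP/1.1”, followed by the header lines and the terminating blank line.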
A web browser normally performs many requests to get a single web page. The web server first delivers the HTML file for the page. From this the web browser makes a list of the resources that are embedded in the page and which it needs to request as well. Once this list is complete, the web browser issues more requests, one for each resource of the page it needs (images, inline MIDI, etc.). If, for example, you direct your browser to get www.ibm.com/uk/, the browser will assume the prefix ‘http://’, connect to port 80 of the server that has the Domain Name System (DNS) name www.ibm.com, and issue the command: GET /uk/ HTTP/1.0. By convention, a path name that ends in a directory name rather than a simple file name refers to a file in that directory called index.html. The web server thus sends the resource file called index.html, and the browser starts rendering index.html as it arrives and starts building an index of any embedded files it needs. The browser then issues GET commands for each other file it needs and renders them into the page as it receives them.
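The step in which the browser builds its list of embedded resources can be sketched as follows. This is a simplified illustration using Python's standard HTML parser, not a description of how any particular browser works. The names SRC_TAGS, EmbeddedResourceCollector and list_embedded_resources are hypothetical, as is the example page, and the set of tags treated as embedding a resource is an assumption that is far from exhaustive.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags whose "src" attribute names a resource fetched as part of the page.
# Illustrative assumption only; real browsers recognise many more cases.
SRC_TAGS = {"img", "script", "embed", "audio", "video", "iframe"}

class EmbeddedResourceCollector(HTMLParser):
    """Collect absolute URLs of resources embedded in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        ref = None
        if tag in SRC_TAGS:
            ref = attr_map.get("src")
        elif tag == "link" and "stylesheet" in (attr_map.get("rel") or "").lower():
            ref = attr_map.get("href")
        if ref:
            # Resolve relative references against the page's own URL.
            absolute = urljoin(self.base_url, ref)
            if absolute not in self.resources:
                self.resources.append(absolute)

def list_embedded_resources(html_text, base_url):
    collector = EmbeddedResourceCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.resources

page = (
    '<html><head><link rel="stylesheet" href="style.css"></head><body>'
    '<img src="a.png"><a href="other.html">more</a>'
    '<img src="/b.gif"><embed src="tune.mid"></body></html>'
)
needed = list_embedded_resources(page, "http://www.example.com/uk/")
```

Ordinary hyperlinks, such as the anchor to other.html, are deliberately left out of the list, since they are only fetched if the user follows them.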
In HTTP version 1.0, the client requests one resource per HTTP request. If a web page contains a plurality of images, say eight images, then the browser will issue a total of nine requests to obtain the entire contents of the page, that is, the HTML plus the images. Typically, browsers make several requests concurrently to reduce the overall delay to the user. Later versions of HTTP, such as HTTP/1.1, use persistent connections that remain open over a series of request-reply exchanges.
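The arithmetic above can be expressed as a short sketch. It is a deliberately simplified model: it assumes that, without persistent connections, each request uses its own TCP connection, and that with persistent connections the browser reuses a small fixed pool. The function names and the parallel parameter are hypothetical.

```python
def count_requests(num_embedded):
    """One request for the HTML file plus one per embedded resource."""
    return 1 + num_embedded

def count_connections(num_requests, persistent, parallel=1):
    """Estimate the TCP connections opened to fetch a page.

    Simplified model: without persistent connections every request opens
    its own connection; with them, a pool of at most `parallel`
    connections is reused for all requests.
    """
    if not persistent:
        return num_requests
    return min(parallel, num_requests)

requests_needed = count_requests(8)  # page with eight images
one_per_request = count_connections(requests_needed, persistent=False)
with_reuse = count_connections(requests_needed, persistent=True, parallel=2)
```

For the eight-image page, the model gives nine requests in both cases, but nine connections without reuse against two when a pool of two persistent connections is shared.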
The speed of download of a web page by a web browser is dependent, amongst other factors, upon the load on the web server. This load can vary greatly and depends in turn on factors such as the demand at any one time for the pages that it hosts. Certain types of web sites are particularly affected by peaks in demand, such as those that display news items. Such a web site may normally have a moderate number of users, but when a new story breaks, or a particular story hosted on the web server is included as a link on a major news site, the web server may suddenly experience a load potentially hundreds of times what it is used to. This interest is often directed at a specific item, such as a block of text or an image on the page, but every page download request will result in a number of additional features that make up the web page being downloaded, in addition to those specific items. In the normal course of operation of the web server, a website provider/developer would like these additional items to be seen by users, but in a critical load situation they take up extra bandwidth. This can lead to many users not receiving the part of the page that they wish to look at, but receiving other parts of the page instead, and to others receiving error messages that the server is currently unavailable.
Another factor affecting load on a web server is the time of day. Typically, 9 am on a Monday morning will see considerable load on a company's intranet server compared to the average over a given week. During such a time of load it is unhelpful and frustrating for employees to wait a long time for the essential part of a page to load whilst, for example, old items are downloaded.
As more and more people start interacting, researching and shopping on-line, the problem of peak loads and effective content delivery will become more pronounced. The present invention aims to address these problems.