Ready access to a multitude of World Wide Web (WWW) sites over the Internet has resulted in an astonishing growth in the amount of traffic and is likely to cause increasing delays in the transfer of data between servers and the client computers accessing them. This problem is particularly troubling to those using relatively slow modems to access the Web. High speed access is often not readily available when the Web is being accessed from a personal digital assistant or other small, portable computing device. Furthermore, even though higher speed modems are available for use by conventional computers, many slower modems remain in use. Clearly, a technique for increasing the efficiency with which HyperText Markup Language (HTML) documents are transferred over the Internet (and over other networks) by slower speed devices would improve the effective data transfer rate for those devices and, by reducing the total amount of traffic, would benefit all users of the network.
To better appreciate how data transfer efficiency might be improved on the Internet, it is helpful to understand the nature of the Web page documents being transferred. HTML is used for writing hypertext documents. More formally, these documents are Standard Generalized Markup Language (SGML) documents that conform to a particular Document Type Definition (DTD). An HTML document includes a hierarchical set of markup elements. Most elements have a start tag, followed by content, followed by an end tag. The content is a combination of text and nested markup elements. Tags indicate how the document is structured and how it is to be displayed, and also specify the destinations and labels for hypertext links. There are tags for markup elements such as titles and headers, text attributes such as bold and italic, lists, paragraph boundaries, links to other documents or to other parts of the same document, and in-line graphic images. HTML also enables a document to include images that are stored as separate files. When the user views the HTML document, any included image is displayed as part of the document, at the point where the image element occurs in the document.
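The hierarchical structure described above can be illustrated with a minimal sketch, here in Python using the standard html.parser module. The sample document, the OutlineParser class, and the outline function are hypothetical names introduced only for illustration. The sketch records each start tag together with its nesting depth, assuming well-formed markup and treating a small set of elements (such as the image element) as having no end tag.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Records every start tag together with its nesting depth."""

    # Elements assumed (for this sketch) to have no end tag and no content.
    VOID = {"img", "br", "hr", "meta", "link", "input"}

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        if tag not in self.VOID:
            self.depth += 1  # following content is nested inside this element

    def handle_endtag(self, tag):
        if tag not in self.VOID:
            self.depth -= 1  # the element is closed; return to the parent level


def outline(html):
    """Return a list of (depth, tag) pairs for the given HTML text."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hypothetical sample page: a title, a header, a paragraph with bold text
# and a hypertext link, and an in-line image stored as a separate file.
SAMPLE = (
    "<html><head><title>News</title></head>"
    "<body><h1>Today</h1>"
    "<p>Some <b>bold</b> text and <a href='other.html'>a link</a>.</p>"
    "<img src='photo.gif'></body></html>"
)

if __name__ == "__main__":
    for depth, tag in outline(SAMPLE):
        print("  " * depth + tag)
```

Running the sketch prints an indented outline in which the title is nested inside the head, and the bold and link elements are nested inside the paragraph.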
Although HTML files may include graphics, the text in such files is more likely to change in a dynamic fashion. One solution to the problem of transfer delays is to store or "cache" HTML files that were previously transferred from the server computers, so that the next time the user connects to a site recently visited, the HTML file defining a web page for that Uniform Resource Locator (URL) site can be loaded into the client computer display from the cache stored on the client hard drive instead of being transferred over the Internet. The user perceives a much faster loading of a page from a cache, and less data needs to be transferred over the Internet. However, under conventional procedures, if any change in the text content of an HTML file has occurred since the user last connected to a URL site, the cached file on the client hard drive will normally be discarded, and the entire revised HTML file will be transferred from the server to the client computer. The user must wait for the transfer to complete before all of the content of the HTML file is fully visible. The server and the browser software used by the client typically cooperate to detect whether it will be necessary to transfer a new HTML file from the server to the client computer or whether a cached HTML file stored on the client computer can instead be used. Currently, there is no provision for splitting up an HTML file so that only the portion that has changed since the file was last cached is transferred from the server to the client.
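The conventional all-or-nothing caching behavior described above can be sketched as follows. This is a simplified in-memory model, not the behavior of any particular browser or server: the Server and ConventionalCache classes, the integer version counter used to detect changes, and the byte counter are all assumptions made for illustration, and no network access is involved.

```python
class Server:
    """Hypothetical server holding one (version, body) pair per URL."""

    def __init__(self):
        self.files = {}

    def publish(self, url, body):
        # Every publication bumps the version, even for a one-word change.
        version = self.files.get(url, (0, ""))[0] + 1
        self.files[url] = (version, body)

    def version(self, url):
        return self.files[url][0]

    def fetch(self, url):
        return self.files[url]


class ConventionalCache:
    """Client cache that reuses a file only if it is entirely unchanged."""

    def __init__(self, server):
        self.server = server
        self.entries = {}           # url -> (version, body)
        self.bytes_transferred = 0  # body bytes fetched from the server

    def load(self, url):
        current = self.server.version(url)  # small validation exchange
        cached = self.entries.get(url)
        if cached is not None and cached[0] == current:
            return cached[1]                # served from the cache
        # Any change at all discards the cached copy and transfers the
        # entire revised file.
        version, body = self.server.fetch(url)
        self.bytes_transferred += len(body.encode("utf-8"))
        self.entries[url] = (version, body)
        return body


if __name__ == "__main__":
    url = "http://example.com/index.html"
    server = Server()
    server.publish(url, "<html><body><p>Monday</p></body></html>")
    cache = ConventionalCache(server)
    cache.load(url)   # first visit: full transfer
    cache.load(url)   # unchanged: loaded from the cache
    server.publish(url, "<html><body><p>Friday</p></body></html>")
    cache.load(url)   # one word changed: full transfer again
    print(cache.bytes_transferred)
```

In this model a change to a single word costs as much as the first visit, which is the inefficiency noted at the end of the preceding paragraph.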
To minimize the amount of data that must be transferred over the Internet in order to display an HTML file, it would be desirable to provide a mechanism for dividing an HTML file into a plurality of units, so that only those units that have changed must be transferred to synchronize the cached HTML file data on the client computer with that stored on the server. Furthermore, in those cases where existing web browser or web site server software cannot be modified, this mechanism should operate in a manner that is transparent to the server and browser software. By avoiding the need to modify the server and web browser software, it should be possible to achieve the desired result independently of the particular server and web browser programs used to access the HTML files. However, as new browser and web site server software is developed, it would be preferable to integrate this approach into the respective software so that it operates more efficiently. By providing a method for transferring primarily the changed portions of an HTML file, a substantial improvement in the data transfer rate can be achieved when previously accessed web sites are accessed again. The result will improve the apparent speed at which the HTML file for each such site is displayed on the client computer and greatly reduce the amount of data transferred over the network. Further, it will be apparent that this technique can be used when transferring an HTML file over almost any type of network, including local area networks, wide area networks, and intranets, between any two points that are connected, such as between two sites on a wide area network.
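One way the unit-based idea outlined above might be realized is sketched below. The text does not specify how units are defined or compared, so the choices here are assumptions made purely for illustration: each line of the file is treated as one unit, units are identified by a SHA-256 digest, and the function names (split_units, plan_update, apply_update) are hypothetical.

```python
import hashlib


def split_units(html):
    """Assumption: one unit per line of the HTML file."""
    return html.splitlines(keepends=True)


def digest(unit):
    """Identify a unit by the SHA-256 digest of its text."""
    return hashlib.sha256(unit.encode("utf-8")).hexdigest()


def plan_update(server_html, client_store):
    """Server side: list the digests of the current file in order, and
    collect only the units that the client does not already hold."""
    units = split_units(server_html)
    digests = [digest(u) for u in units]
    missing = {d: u for d, u in zip(digests, units) if d not in client_store}
    return digests, missing


def apply_update(digests, missing, client_store):
    """Client side: add the newly transferred units to the cache and
    reassemble the complete file from cached units."""
    client_store.update(missing)
    return "".join(client_store[d] for d in digests)


if __name__ == "__main__":
    old = "<h1>News</h1>\n<p>Monday</p>\n<p>Footer</p>\n"
    new = "<h1>News</h1>\n<p>Tuesday</p>\n<p>Footer</p>\n"
    # The client cached every unit of the old file on its previous visit.
    store = {digest(u): u for u in split_units(old)}
    digests, missing = plan_update(new, store)
    print(len(missing), "of", len(digests), "units transferred")
    assert apply_update(digests, missing, store) == new
```

In this sketch the list of digests must still be sent on every visit, so the saving depends on the units being large relative to their digests. A practical design would need to weigh unit size against that overhead.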