The worldwide packet data communications network now commonly referred to as the “Internet” has experienced extraordinary growth and acceptance. The Internet provides access to hundreds of millions of electronic documents, making it the largest single source of information in the world. As used herein, the term “electronic document” refers to any type of data or information in electronic form. Examples of electronic documents include, without limitation, text documents and web pages. In addition to providing access to vast amounts of information, the Internet provides a medium for a plethora of exciting and useful services such as electronic mail, user-to-user chat services and even the ability to conduct telephone calls, commonly referred to as “voice over IP.”
Arguably, one of the most valuable uses of the Internet is the ability for users to view and download enormous amounts of “content” from the Internet. In the context of the Internet, the term “content” broadly refers to almost any type of information or data. Common examples of Internet content include, without limitation, information about products and services offered by merchants, news and financial data. On the Internet, content is commonly provided to users in the form of web pages that are downloaded to users' personal computers and viewed using a web browser.
FIG. 1 is a block diagram of a conventional arrangement 100 for providing Internet content to a user. A user 102 uses a tool such as a web browser to connect to an access provider 104, sometimes referred to as an Internet Service Provider (ISP), via a communications link 106. An example of access provider 104 is the Internet dial-up service provided by America Online. Communications link 106 may be any medium that allows data to be exchanged between user 102 and access provider 104. Examples of communications link 106 include, without limitation, a dial-up connection, a cable modem connection, a Digital Subscriber Line (DSL) connection and a wireless connection.
Access provider 104 is communicatively coupled to the Internet 108 via a communications link 110. Communications link 110 may be any medium that allows data to be exchanged between access provider 104 and Internet 108 and is typically a broadband connection that allows for relatively large amounts of data to be exchanged between access provider 104 and Internet 108, since access provider 104 may provide Internet access to a large number of users.
Content providers 112, 114, 116 are communicatively coupled to Internet 108 via communications links 118, 120, 122, respectively, and provide content to user 102. Typically, user 102 views web pages hosted by access provider 104 and requests particular information by selecting icons or links to information that user 102 desires to see.
Two conventional approaches for providing content from content providers 112, 114, 116 to user 102 are the “retrieval approach” and the “cache approach.” According to the retrieval approach, user 102 requests content from access provider 104. Access provider 104 in turn requests the content from content providers 112, 114, 116 over communications link 110, Internet 108 and communications links 118, 120, 122. Content providers 112, 114, 116 provide the content to access provider 104 over communications links 118, 120, 122, Internet 108 and communications link 110. Access provider 104 then provides the content to user 102.
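By way of illustration only, the retrieval approach can be sketched in a few lines of Python. The class name RetrievalAccessProvider, the origin callable standing in for content providers 112, 114, 116, and the origin_requests counter are hypothetical names chosen for this sketch and do not come from any particular access provider implementation.

```python
from typing import Callable

# Hypothetical stand-in for a content provider: given a content key,
# it returns whatever content the provider currently holds for that key.
Origin = Callable[[str], str]


class RetrievalAccessProvider:
    """Illustrative access provider that stores nothing locally."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.origin_requests = 0  # round trips made to the content provider

    def request(self, key: str) -> str:
        # Every user request triggers a full round trip to the content
        # provider, so the returned content is always the provider's
        # current version.
        self.origin_requests += 1
        return self.origin(key)
```

In this sketch the number of round trips equals the number of user requests, which corresponds to both the freshness and the cost of the retrieval approach.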
The primary benefit afforded by the retrieval approach is that the content provided to user 102 is generally the most recent content available from content providers 112, 114, 116 since the content is retrieved directly from content providers 112, 114, 116. The “freshness” aspect of the retrieval approach is particularly desirable to content providers who want users to always access the most recent content. One disadvantage of the retrieval approach is that a full “roundtrip” is required from access provider 104 to content providers 112, 114, 116 and back to access provider 104 to retrieve data. Thus, the time required for access provider 104 to provide the content to user 102 is adversely affected by data transmission latencies and failures in Internet 108 and communications links 110, 118, 120, 122, and the response time of content providers 112, 114, 116. Another disadvantage of the retrieval approach is that content providers 112, 114, 116 may become overloaded when a large number of content requests are received in a short time interval.
According to the cache approach, the first time that user 102 requests content from access provider 104, the content is retrieved and provided to user 102 in the same manner as the retrieval approach just described. In addition, the content is stored on a local storage medium, such as a cache, of access provider 104. Thereafter, when any user connected to the Internet through access provider 104 requests the content, the content is provided to the user from the cache of access provider 104, without being retrieved from content providers 112, 114, 116.
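The cache approach just described can be sketched, again for illustration only, as follows. The class name CachingAccessProvider, the origin callable standing in for a content provider, and the origin_requests counter are assumptions made for this sketch. No refresh logic is included, so a cached copy is served indefinitely once stored.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a content provider.
Origin = Callable[[str], str]


class CachingAccessProvider:
    """Illustrative access provider that keeps a local copy of content."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.cache: Dict[str, str] = {}
        self.origin_requests = 0  # round trips made to the content provider

    def request(self, key: str) -> str:
        # Later requests are served from the local cache and never
        # reach the content provider.
        if key in self.cache:
            return self.cache[key]
        # The first request for a key pays the full round trip.
        self.origin_requests += 1
        content = self.origin(key)
        self.cache[key] = content
        return content
```

Because only the first request reaches the content provider in this sketch, the content provider neither sees subsequent accesses nor has any opportunity to supply a newer version.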
Content maintained locally by access provider 104 is updated or refreshed from content providers 112, 114, 116 based upon a combination of subsequent user requests for the content and a particular heuristic or algorithm used by the access provider to determine when to refresh content. For example, suppose that content provider 112 generates a particular electronic document. When user 102 first requests the particular electronic document, access provider 104 retrieves the particular electronic document from content provider 112, provides the particular electronic document to user 102 and stores the particular electronic document in the cache of access provider 104. Sometime later, user 102 requests the same particular electronic document. In response to the request from user 102, access provider 104 applies a particular heuristic to determine whether the copy of the particular electronic document maintained in the cache of access provider 104 should be provided to user 102, or whether a new copy of the particular electronic document should be retrieved from content provider 112. For example, access provider 104 may determine whether the cached copy of the particular electronic document is sufficiently new. If the copy of the content stored in the cache of access provider 104 is deemed to be sufficiently new, based upon the heuristic, then the copy of content stored in the cache of access provider 104 is provided to user 102. If, however, based upon the heuristic, the copy of the content stored in the cache of access provider 104 is too old, then a new copy of the content is retrieved from content provider 112.
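One simple heuristic of the kind described above is a maximum-age test, sketched below under stated assumptions. The names FreshnessCheckingCache, CacheEntry and max_age_seconds, and the injectable clock parameter, are hypothetical and chosen for this illustration. An actual access provider may use a different or more elaborate heuristic.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CacheEntry:
    content: str
    fetched_at: float  # clock reading when the copy was retrieved


class FreshnessCheckingCache:
    """Illustrative cache that serves a copy only while it is 'sufficiently new'."""

    def __init__(
        self,
        origin: Callable[[str], str],       # hypothetical content provider
        max_age_seconds: float,             # assumed freshness threshold
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.origin = origin
        self.max_age_seconds = max_age_seconds
        self.clock = clock
        self.entries: Dict[str, CacheEntry] = {}

    def request(self, key: str) -> str:
        now = self.clock()
        entry = self.entries.get(key)
        # Heuristic: the cached copy is deemed sufficiently new if its
        # age does not exceed the threshold.
        if entry is not None and now - entry.fetched_at <= self.max_age_seconds:
            return entry.content
        # Otherwise a new copy is retrieved and replaces the old one.
        content = self.origin(key)
        self.entries[key] = CacheEntry(content, now)
        return content
```

Note that in this sketch a copy that is within the threshold is served even if the content provider has since changed the content, and a copy that is past the threshold is retrieved again even if nothing has changed.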
One of the benefits afforded by the cache approach is that content can generally be provided to user 102 from access provider 104 much faster than from content providers 112, 114, 116. Thus, the time required to provide content to user 102 is not adversely affected by data transmission latencies in Internet 108 and communications links 110, 118, 120, 122 or the response time of content providers 112, 114, 116. The cache approach also reduces the amount of loading on content providers 112, 114, 116.
Despite the performance advantage provided by the cache approach compared to the retrieval approach, the cache approach has several drawbacks. First, the first requestor of content must incur the performance penalty associated with retrieving content from content providers 112, 114, 116.
Second, there is no guarantee that content will be maintained indefinitely in the cache of access provider 104. As a practical consideration, access providers have only a finite amount of cache storage and therefore cannot maintain all content indefinitely. This problem is particularly acute on the Internet, where the amount of available content is growing at an extraordinary rate. Because of limited storage space, access providers typically employ an algorithm, such as a least-recently-used algorithm, to select which content in their caches should be overwritten with new content. Once content is replaced, the next requestor must wait for the content to be retrieved from the appropriate one of content providers 112, 114, 116. In addition, replacement algorithms generally do not know whether a particular version of content is the most recent version of the content. Thus, content re-retrieved from content providers 112, 114, 116 may not be any different from the content that was previously replaced, resulting in wasted communications bandwidth and unnecessary load on content providers 112, 114, 116.
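A least-recently-used replacement policy of the kind mentioned above can be sketched as follows. The class name LRUCache and its capacity parameter, counted here in number of items rather than bytes, are assumptions made for this illustration.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Illustrative fixed-capacity cache that evicts the least recently used item."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Ordered from least recently used (front) to most recently used (back).
        self.items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self.items:
            return None  # miss: the caller must retrieve from the content provider
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key: str, content: str) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = content
        if len(self.items) > self.capacity:
            # Evict the least recently used item. The policy has no
            # knowledge of whether the evicted copy was still current.
            self.items.popitem(last=False)
```

The eviction step in this sketch considers only recency of use, which is why content that is still current can be discarded and later retrieved again unchanged.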
Third, content that is maintained in cache may not be the most recent version of content from content providers 112, 114, 116 and may therefore be “stale.” Because heuristics and refresh algorithms are imperfect, some current content is unavoidably deleted and some old content is not refreshed. Thus, the accuracy or coherence of access provider caches is adversely affected by limitations in the particular heuristic or refresh algorithm employed.
The cache approach effectively transfers the control of when users see new content from the content providers to the access providers. In addition to not having control over when new content will be made available to users, content providers 112, 114, 116 have no way of obtaining statistics about access to their content by user 102, such as which of their content is accessed and when it is accessed. Being aware of access statistics for their content is very important to content providers because it allows them to better manage their content.
Given the need to provide content to users and the limitations in prior approaches, an approach for managing content that does not suffer from limitations of conventional approaches is highly desirable.
There is a need for an approach for providing content to users that provides greater control to content providers over which content is made available to users and allows the most recent content to be provided to users.
There is a further need for an approach for providing content to users that provides to content providers increased visibility into how and when users access their content.