1. Technical Field
The present invention relates in general to data processing systems and in particular to data distribution mechanisms for data processing systems. Still more particularly, the present invention relates to a mechanism for distributing information from the Internet to a large number of data processing systems.
2. Description of the Related Art
The Internet has become a cultural fixture as a source of both information and entertainment. Many businesses are creating Web sites as an integral part of their marketing efforts, informing consumers of the products or services offered by the business or providing other information seeking to engender brand loyalty. Many federal, state, and local government agencies are also employing Internet sites for informational purposes, particularly agencies which must interact with virtually all segments of society such as the Internal Revenue Service and secretaries of state. Operating costs may be reduced by providing informational guides and/or searchable databases of public records online.
Currently, the most commonly employed method of transferring data over the Internet is to employ the World Wide Web environment, also called simply "the Web." Other Internet resources exist for transferring information, such as File Transfer Protocol (FTP) and Gopher, but have not achieved the popularity of the Web. In the Web environment, servers and clients effect data transactions using the Hypertext Transfer Protocol (HTTP), a known protocol for handling transfer of various data files (e.g., text, still graphic images, audio, motion video, etc.). Information is formatted for presentation to a user by a standard page description language, the Hypertext Markup Language (HTML). In addition to basic presentation formatting, HTML allows developers to specify "links" to other Web resources, each identified by a Uniform Resource Locator (URL). A URL is an identifier with a special syntax that defines a communications path to specific information. Each logical block of information accessible to a client, called a "page," is identified by a URL.
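By way of illustration only, the components of a URL that together define a communications path may be separated as in the following minimal sketch. The function name `describe_url` and the example address are hypothetical and are not drawn from any particular system; the parsing relies on the standard Python `urllib.parse` module.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts that identify a communications path."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,    # protocol used for the transfer, e.g. "http"
        "host": parts.hostname,    # server holding the requested page
        "port": parts.port,        # None when the URL names no explicit port
        "path": parts.path or "/", # location of the page on that server
    }


# Hypothetical example address, used for illustration only.
info = describe_url("http://www.example.com/products/index.html")
```

In this sketch the scheme selects the protocol (here HTTP), the host names the server, and the path names the page on that server.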
Retrieval of information on the Web is generally accomplished with an HTML-compatible "browser"--a program capable of submitting a request for information identified by a URL--at the client machine. The request is submitted to a server connected to the client and may be handled by a series of servers to effect retrieval of the requested information. The information is provided to the client formatted according to HTML.
The largest segment of the consuming public does not currently have access to these Web resources. Such consumers are typically either unable or unmotivated to acquire both the requisite hardware and software and the necessary computer skills for taking advantage of these resources. While most computers currently being sold come preloaded with Internet access facilities, including Web browsers, a substantial number of households do not have personal computers. There is a need for low cost data processing systems which are simple to operate, allowing users without computer skills the opportunity to access the Internet. This need is being addressed, to some extent, by "set-top" systems such as "WebTV." These systems allow a television to be rapidly switched between providing conventional television viewing, either broadcast or cable, and providing a user interface for Internet access. The user's television thus becomes part of a Web appliance.
In designing a low cost, simple data processing system for a Web appliance, however, it is necessary to presume that the target user is unsophisticated and/or inexperienced. Therefore, the operation of the data processing system must be both simple and intuitive, requiring little or no technical sophistication on the part of the user. In this regard, many of the features of conventional Web browsers must be adapted to be transparent to the user when implemented in a Web appliance.
One feature of Web browsers which would be particularly advantageous to implement in connection with Web appliances is off-line browsing, the process of viewing Web pages cached in a local memory, such as a hard drive, without connection to the Web site from which those pages originate. Large traffic demands on specific Web sites can make access to such sites difficult. Off-line browsing allows information at the site to be retrieved during off-peak periods, without contemporaneous user interaction at the client, for subsequent off-line viewing by the user. The pages are typically retrieved from the originating Web site by off-peak retrieval, that is, retrieval during periods when traffic to the site is at a minimum.
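The off-line browsing behavior described above may be sketched, under stated assumptions, as a local cache that is filled during an off-peak period and later read without any connection to the originating site. The class `OfflineCache`, the stand-in retrieval function `fake_fetch`, and the example addresses are hypothetical; an in-memory dictionary stands in for the hard drive, and no actual network transfer is performed.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple


class OfflineCache:
    """Holds pages retrieved earlier so they can be viewed without a connection."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # retrieval function, assumed to be supplied by the caller
        self._pages: Dict[str, Tuple[float, str]] = {}  # url -> (retrieval time, content)

    def prefetch(self, urls: Iterable[str]) -> None:
        """Retrieve each page once, e.g. during an off-peak period, and store it locally."""
        for url in urls:
            self._pages[url] = (time.time(), self._fetch(url))

    def view(self, url: str) -> Optional[str]:
        """Return the stored copy of a page, or None if it was never retrieved."""
        entry = self._pages.get(url)
        return entry[1] if entry else None


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP retrieval; returns placeholder HTML for illustration."""
    return f"<html><body>Content of {url}</body></html>"


cache = OfflineCache(fake_fetch)
cache.prefetch(["http://www.example.com/news.html"])         # off-peak retrieval
page = cache.view("http://www.example.com/news.html")        # later off-line viewing
missing = cache.view("http://www.example.com/other.html")    # never retrieved
```

Passing the retrieval function in as a parameter keeps the sketch independent of any particular transfer mechanism.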
Typically, a scheduling utility allows a user to retrieve specific Web pages for storage on the user's hard drive and later viewing. While an off-line browser may provide benefits to an individual user, it cannot support optimization of communications between a group of clients and the Web. Individual clients, each employing off-peak information retrieval, may still tax communications resources when connected to the same server or group of servers. Such a situation will particularly arise where substantial numbers of Web appliances access the Internet through a single service provider. In addition to the practical constraints on off-peak information retrieval which complicate off-line browsing in such environments, it is anticipated that service providers will limit the time allotted for off-peak information retrieval for off-line browsing.
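The aggregate load that independent off-peak retrieval places on a single service provider may be estimated with simple arithmetic, as in the following sketch. The function names and all numeric values (number of clients, pages per client, page size, and length of the off-peak window) are hypothetical and chosen only for illustration.

```python
def aggregate_bytes(num_clients: int, pages_per_client: int, page_size_bytes: int) -> int:
    """Total bytes transferred when every client retrieves its own copy of each page."""
    return num_clients * pages_per_client * page_size_bytes


def required_rate_bps(total_bytes: int, window_seconds: int) -> float:
    """Average bit rate needed to complete all transfers within the allotted window."""
    return total_bytes * 8 / window_seconds


# Hypothetical figures: 10,000 clients, 20 pages each, 50,000 bytes per page,
# and a four-hour off-peak window.
total = aggregate_bytes(10_000, 20, 50_000)   # 10,000,000,000 bytes in total
rate = required_rate_bps(total, 4 * 3600)     # average bits per second over the window
```

Because the total grows in direct proportion to the number of clients, shortening the allotted window or adding clients raises the required rate accordingly.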
It would be desirable, therefore, to provide an automatic and more efficient feature for downloading information from popular Internet sites to specific groups of users. Use of off-peak information retrieval by multiple users, even if staggered, creates bottlenecks between the server and the Internet and requires additional resources to satisfy the bandwidth requirements. It is further desirable, therefore, to provide a mechanism for eliminating the bandwidth requirements imposed by such retrieval. It would also be advantageous for the mechanism to minimize transfer time both from the source and to individual users, and to require minimal resources at the server.