This invention relates to data communications networks and, in particular, to methods and apparatus for accessing and retrieving information from a database, documents, or files maintained by a network server. The methods and apparatus of this invention are particularly useful for downloading pages from the World Wide Web (WWW).
An internet user may typically employ more than one device to access the WWW. For example, in an office environment a user may have access to a high performance data processor or workstation, as well as a high speed data connection, that provides access to the internet, while in other locations, such as in the home or while traveling, the user may have access only to a lower performance data processor, and a slower connection to the internet. In addition, the slower speed connection may require connect charges billed on a per minute basis.
The widespread availability of WWW phones, Personal Digital Assistants (PDAs), and Windows CE-based machines with internet connectivity is expected to soon provide internet access capability to larger portions of the earth's population, thereby making efficient techniques to access WWW pages (web pages) even more desirable.
For many users an internet connection made at home may be lower in cost than a connection made while traveling, since the user may have internet service at home that allows unlimited access for a flat fee by calling a local access number.
Accessing the internet from smaller devices (e.g., WWW phones and PDAs) may be more expensive because of higher connection charges via cellular phone or non-local numbers, or due to access through hotel telecommunication facilities. In addition, the amount of local memory, disk resources, and battery power may be limited on the smaller devices. Also, when users connect with the smaller devices they are likely to be traveling, and have less time to wait for large web pages to download. Given the interactive nature of browsing, it is often difficult to return to the same page in a later browsing session, making it attractive to download the web page immediately, instead of postponing the download for a later browsing session. Finally, the smaller devices may not support all aspects of a standard known as the Hypertext Markup Language (HTML), necessitating that some documents be viewed on more powerful, fully HTML-compliant devices. For example, a portable data processor may not support a Postscript(trademark) viewer.
At present there exist several techniques that are known to the inventor for indicating specific web pages to be downloaded at a later time. These techniques download the requested pages to the same (requesting) machine at a later time, for example at night when phone rates are lower and internet traffic is reduced.
There also exist so-called push technology schemes, such as one known as Pointcast(trademark), that periodically download information from certain sites to a given data processor. A user can schedule, for example, news, stock, and/or weather information to be downloaded at specific times or at specific intervals. However, these techniques also download the requested information to the requesting data processor.
Other techniques, such as one known as Webwhacker(trademark), enable a user to make a local copy of a web site, and allow the user to specify a number of links (i.e., Hyperlinks) to follow and download. However, the local copy is created on the same data processor on which the copying is scheduled.
A technology available from the assignee of this patent application, referred to as ARTour WebExpress(trademark), allows a user to browse the web more asynchronously than is possible with current browsers. For example, using conventional WWW browsers such as Netscape Navigator(trademark) 3.0 or Internet Explorer(trademark) 3.0 the user can scroll a current page while a next page is being downloaded, thereby providing a degree of asynchronous access. The WebExpress(trademark) technique takes this one level further by allowing the user to continue to specify links (Hyperlinks) to fetch while previously specified pages are being fetched. These requests are queued in a local buffer and the pages are fetched in a sequential manner. When the requested pages are available on the local machine, the user is made aware of it by a suitable signaling mechanism.
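The queued, sequential fetching described above can be illustrated with a minimal Python sketch. This is not the WebExpress implementation; the class `LinkQueue`, the `fetch` and `notify` callbacks, and the example URLs are hypothetical names introduced only for illustration.

```python
from collections import deque


class LinkQueue:
    """Hypothetical local buffer of requested links, fetched one at a time."""

    def __init__(self, fetch, notify):
        self._pending = deque()   # links requested but not yet fetched
        self._fetch = fetch       # callable that retrieves a page for a URL
        self._notify = notify     # callable that signals the user
        self.pages = {}           # pages now available on the local machine

    def request(self, url):
        # The user may keep adding links while earlier ones are pending.
        self._pending.append(url)

    def run(self):
        # Fetch queued links sequentially, signaling as each page arrives.
        while self._pending:
            url = self._pending.popleft()
            self.pages[url] = self._fetch(url)
            self._notify(url)


ready = []
queue = LinkQueue(fetch=lambda url: "content of " + url, notify=ready.append)
queue.request("http://example.com/a")
queue.request("http://example.com/b")
queue.run()
```

In this sketch the pages are fetched in the order requested, and the `ready` list stands in for the signaling mechanism.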
A proxy server is a World Wide Web server that acts as the sole web server for an entire domain, or for those client computers that are placed behind a firewall (i.e., a logical block between the clients and the rest of the internet). The proxy server typically resides at the firewall and intercepts all web requests originating from clients within the firewall. If a given web page request is not in the proxy server's access control list, the request is processed normally and the retrieved web page is sent back to the requesting client. If, however, the requested web page or web site is on the control list, the client instead receives a message indicating that the URL is not accessible or is not valid.
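The access control check described above may be sketched as follows. The list contents, the function `handle_client_request`, and the stand-in `fake_fetch` are assumptions made for illustration, and matching on the host name is only one possible way to compare a request against the list.

```python
from urllib.parse import urlsplit

# Hypothetical access control list of hosts that clients may not reach.
ACCESS_CONTROL_LIST = {"blocked.example.com"}

BLOCKED_MESSAGE = "The URL is not accessible or is not valid."


def handle_client_request(url, fetch):
    """Return the page for url, or a refusal message if its host is listed."""
    host = urlsplit(url).hostname
    if host in ACCESS_CONTROL_LIST:
        return BLOCKED_MESSAGE
    # Not on the list: process the request normally.
    return fetch(url)


def fake_fetch(url):
    # Stand-in for a real retrieval from the outside internet.
    return "<html>page at " + url + "</html>"


allowed = handle_client_request("http://www.example.com/index.html", fake_fetch)
refused = handle_client_request("http://blocked.example.com/index.html", fake_fetch)
```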
A proxy server can improve a network's performance by functioning as a caching server. Using its cached web pages, the proxy server will serve already-accessed web pages to requesting clients without requiring outside access to the internet. For example, consider a case of an environment where n client computers access the same web page, wherein each client computer outputs the address (URL) of the web page to be accessed. Without the use of the proxy server, n separate requests for the web page are initiated, and n separate copies of that same web page are retrieved and returned to the client computers.
Using a proxy server, however, the same n web page requests are handled more efficiently. Only the first request to reach the proxy server actually causes that web page to be retrieved from the WWW server, and only if that web page is not already stored in the proxy server's cache. When retrieved, the web page is sent back to the requesting client computer, and is also cached on the proxy server's hard disk. The remaining n-1 clients that request that same web page are then served instead from the proxy server's cache, thus avoiding unnecessary duplicated requests and delays.
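The caching behavior for n clients requesting the same page can be shown with a short sketch. The class `CachingProxy`, the in-memory dictionary standing in for the proxy's hard disk, and the `fake_origin` function are hypothetical; expiry and validation of cached pages are deliberately left out.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy caching proxy: fetches each URL from the origin at most once."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}  # stands in for the proxy's disk
        self.origin_fetches = 0             # counts real retrievals

    def get(self, url: str) -> bytes:
        if url not in self._cache:
            # Cache miss: retrieve from the WWW server and keep a copy.
            self.origin_fetches += 1
            self._cache[url] = self._fetch(url)
        # Cache hit (or freshly stored page): serve from the cache.
        return self._cache[url]


def fake_origin(url: str) -> bytes:
    # Stand-in for the WWW server.
    return ("<html>" + url + "</html>").encode("ascii")


proxy = CachingProxy(fake_origin)
n = 5
pages = [proxy.get("http://www.example.com/index.html") for _ in range(n)]
```

With n = 5 simulated clients, only the first call reaches the origin; the remaining n-1 are served from the cache.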
However, none of the existing techniques that are known to the inventor enable web pages and other data to be downloaded to another machine, preferably a more powerful machine, over a different link, preferably a higher speed link.
It is a first object and advantage of this invention to provide an improved method and apparatus for downloading information from a server that overcomes the foregoing and other problems.
It is a second object and advantage of this invention to provide a method and system for selectively identifying links (i.e., Hyperlinks) that are to be downloaded to a second data processor for subsequent retrieval by a user of a first data processor.
The foregoing and other problems are overcome and the objects of the invention are realized by methods and apparatus in accordance with embodiments of this invention.
A method is disclosed for downloading data, such as a web page, over a network. The method includes the steps of (a) initiating a data (e.g., web page) download request with a requesting entity having a first network address, the requesting entity being connected to the network; (b) fulfilling the web page download request with a web page source entity having a second network address; (c) transmitting a requested web page to a destination entity having a third network address; and (d) receiving and storing the requested web page in the destination entity for subsequent use by a user of the requesting entity. The step of receiving and storing may include a step of transmitting a web page download acknowledgement message from the destination entity to the requesting (initiating) entity for indicating a receipt of a requested web page.
One advantage of the use of the teaching of this invention is that a low performance data processor may selectively specify one or more web pages to be downloaded to a higher performance processor over a higher bandwidth communication link, and may also specify desired postprocessing to be performed on retrieved web pages prior to a user of the first data processor accessing the stored web pages.
In one embodiment the step of initiating includes steps of generating a web page download command and transmitting the web page download command to the destination entity. This may be accomplished over the network, or over another network, such as an intranet, that connects the initiating and destination entities. In this case the step of fulfilling includes initial steps of formulating, in response to receiving the web page download command at the destination entity, a network web page request message and transmitting the network web page request message from the destination entity to the web page source entity. A confirmation message may be sent to the initiating entity to confirm the receipt of the web page download command.
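The message flow of this embodiment can be sketched in Python with in-memory objects standing in for the three entities. All class and method names, the message labels, and the example URL are hypothetical; real entities would exchange these messages over a network rather than through direct method calls.

```python
class SourceEntity:
    """Stand-in for the web page source entity."""

    def __init__(self, pages):
        self.pages = pages

    def handle_request(self, url):
        return self.pages[url]


class InitiatingEntity:
    """Stand-in for the initiating entity; records messages it receives."""

    def __init__(self):
        self.messages = []

    def notify(self, kind, url):
        self.messages.append((kind, url))


class DestinationEntity:
    """Receives download commands and pulls pages from the source."""

    def __init__(self, source):
        self.source = source
        self.store = {}

    def handle_download_command(self, url, initiator):
        # Optional confirmation that the command was received.
        initiator.notify("command-confirmed", url)
        # The destination formulates the page request and sends it itself.
        page = self.source.handle_request(url)
        self.store[url] = page
        # Acknowledge that the requested page has been received and stored.
        initiator.notify("download-acknowledged", url)


url = "http://www.example.com/report.html"
source = SourceEntity({url: "<html>report</html>"})
destination = DestinationEntity(source)
initiator = InitiatingEntity()
destination.handle_download_command(url, initiator)
```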
In another embodiment of this invention the step of initiating includes steps of generating the network web page request message that includes the third network address and transmitting the network web page request message from the initiating entity to the web page source entity. In this case the web page source entity transmits the requested web page(s) to the destination entity at the third network address.
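A minimal sketch of this second flow follows, in which the request message itself carries the third network address. The names `PageRequest`, `directory`, and `deliver` are assumptions made for illustration. Note that standard HTTP returns a response on the same connection as the request, so delivering to a different address presumably relies on an extended protocol rather than conventional HTTP alone.

```python
from dataclasses import dataclass


@dataclass
class PageRequest:
    url: str
    destination_address: str  # the third network address


class DestinationEntity:
    def __init__(self):
        self.store = {}

    def deliver(self, url, page):
        # Receive and store the page for later retrieval by the user.
        self.store[url] = page


class SourceEntity:
    def __init__(self, pages, directory):
        self.pages = pages
        self.directory = directory  # hypothetical address -> entity lookup

    def handle(self, request):
        page = self.pages[request.url]
        # Send the page to the address named in the request,
        # not back to the entity that sent the request.
        self.directory[request.destination_address].deliver(request.url, page)


destination = DestinationEntity()
directory = {"203.0.113.20": destination}
source = SourceEntity({"http://www.example.com/index.html": "<html>index</html>"}, directory)
source.handle(PageRequest("http://www.example.com/index.html", "203.0.113.20"))
```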
The step of initiating includes a preliminary step of responding to a signal from a user through a user interface, such as by redefining mouse clicks when interacting with a web browser, such that the signal indicates that a specified web page is to be downloaded to and stored in the destination entity, as opposed to being fetched and displayed in a conventional manner. In this embodiment the step of responding includes a step of prompting the user to enter information for specifying at least one parameter related to downloading the web page, and/or includes a step of retrieving at least one user default parameter related to downloading the web page.
In a preferred embodiment the web page download command sent by the initiating entity includes a plurality of fields, including fields intended to specify: the first, second and third network addresses; at least one user download preference; and at least one postprocessing operation to be performed on a received web page. The at least one user download preference includes at least one of: a number of web page levels to download; whether to download graphical data; a number of permissible retries to download a web page; and an interval between the retries. The at least one postprocessing operation includes at least one of: whether to decompress a received web page; whether to virus scan a received web page; and whether to print a received web page.
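One possible in-memory representation of such a download command is sketched below. The field names, default values, and example addresses are assumptions; the text specifies only which kinds of information the fields carry, not their names, encodings, or wire format.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DownloadPreferences:
    levels: int = 1                # number of web page levels to download
    include_graphics: bool = True  # whether to download graphical data
    max_retries: int = 3           # permissible retries (assumed default)
    retry_interval_s: int = 60     # interval between retries (assumed default)


@dataclass
class Postprocessing:
    decompress: bool = False
    virus_scan: bool = False
    print_page: bool = False


@dataclass
class DownloadCommand:
    first_address: str   # initiating entity
    second_address: str  # web page source entity
    third_address: str   # destination entity
    preferences: DownloadPreferences = field(default_factory=DownloadPreferences)
    postprocessing: Postprocessing = field(default_factory=Postprocessing)

    def requested_postprocessing(self) -> List[str]:
        # Names of the postprocessing operations that are switched on.
        flags = asdict(self.postprocessing)
        return [name for name, enabled in flags.items() if enabled]


command = DownloadCommand(
    first_address="198.51.100.7",
    second_address="www.example.com",
    third_address="203.0.113.20",
    postprocessing=Postprocessing(virus_scan=True, print_page=True),
)
```

Grouping the preferences and postprocessing flags into their own records keeps user defaults separate from the addressing fields, which is one way the default parameters mentioned earlier could be supplied.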
The step of receiving and storing the requested web page in the destination entity includes a step of writing data into an index entry associated with the received web page. The index entry is comprised of a plurality of fields, including fields intended to specify: the first and second network addresses, and a link summary of the web page. The index entry fields further specify at least one of: a time that the web page was downloaded; a number of bytes that were downloaded; a time that the web page download command was received by the destination entity; a number of retries that were required, if any, to download the web page; and an error report.
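The index entry can be sketched as a simple record built when a page is stored. The field names are hypothetical, and the "link summary" is assumed here to be the list of hyperlink targets found in the page; the text does not define its exact content.

```python
from dataclasses import dataclass
from datetime import datetime
from html.parser import HTMLParser
from typing import List, Optional


class _LinkCollector(HTMLParser):
    """Collects href values of anchor tags (assumed form of a link summary)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


@dataclass
class IndexEntry:
    first_address: str
    second_address: str
    link_summary: List[str]
    downloaded_at: datetime
    bytes_downloaded: int
    command_received_at: datetime
    retries: int
    error_report: Optional[str]


def build_index_entry(first_address, second_address, page, command_received_at,
                      retries=0, error_report=None):
    collector = _LinkCollector()
    collector.feed(page.decode("utf-8", errors="replace"))
    return IndexEntry(first_address, second_address, collector.links,
                      datetime.now(), len(page), command_received_at,
                      retries, error_report)


page = b'<html><a href="/news.html">News</a> <a href="/sports.html">Sports</a></html>'
entry = build_index_entry("198.51.100.7", "www.example.com", page,
                          command_received_at=datetime(2000, 1, 1, 9, 0), retries=1)
```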
In a preferred embodiment of this invention the method includes a capability to transmit a cancellation message from the initiating machine to the destination machine. In response to receiving a cancellation message the destination machine either terminates an ongoing web page download or deletes an already downloaded and stored web page, together with the index information associated with the stored web page.
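The two outcomes of a cancellation message can be sketched as follows. The class `DestinationMachine`, its attributes, and the returned status strings are hypothetical, and the "unknown" case for a page that is neither in progress nor stored is an added assumption not discussed in the text.

```python
class DestinationMachine:
    """Toy destination machine tracking active downloads and stored pages."""

    def __init__(self):
        self.in_progress = set()  # URLs currently being downloaded
        self.store = {}           # URL -> stored page
        self.index = {}           # URL -> index information

    def handle_cancellation(self, url):
        if url in self.in_progress:
            # Terminate the ongoing download.
            self.in_progress.discard(url)
            return "terminated"
        if url in self.store:
            # Delete the stored page and its index information.
            del self.store[url]
            self.index.pop(url, None)
            return "deleted"
        return "unknown"


machine = DestinationMachine()
machine.in_progress.add("http://www.example.com/big.html")
machine.store["http://www.example.com/old.html"] = "<html>old</html>"
machine.index["http://www.example.com/old.html"] = {"bytes": 16}

first = machine.handle_cancellation("http://www.example.com/big.html")
second = machine.handle_cancellation("http://www.example.com/old.html")
third = machine.handle_cancellation("http://www.example.com/none.html")
```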
In a preferred, but not limiting, embodiment, the network includes the internet, and the web page source entity is a WWW server compliant with conventional and/or extended HTTP protocols.