1. Field of the Invention
This invention relates to bookmarking pages at a client, and more specifically to enabling the client to get a next nearest page when the bookmarked page is not available from the Server.
2. Description of the Related Art
The Internet, initially referred to as a collection of "interconnected networks", is a set of computer networks, possibly dissimilar, joined together by means of gateways that handle data transfer and the conversion of messages from the sending network to the protocols used by the receiving network. When capitalized, the term "Internet" refers to the collection of networks and gateways that use the TCP/IP suite of protocols.
Currently, the most commonly employed method of transferring data over the Internet is to employ the World Wide Web environment, referred to herein as "the Web". Other Internet resources exist for transferring information, such as File Transfer Protocol (FTP) and Gopher, but have not achieved the popularity of the Web. In the Web environment, servers and clients effect data transfer using the Hypertext Transfer Protocol (HTTP), a known protocol for handling the transfer of various data files (e.g., text, still graphic images, audio, motion video, etc.).
A bookmark in a Web client, e.g., the Web browser IE5 or Netscape Navigator, is the address of a Web site that a user stores for later use. More specifically, a user clicks on this address to reach the Web file instead of typing the full address. Unfortunately, the Web is a dynamic environment where Web server sites are often updated and their pages rearranged. Consequently, if a user clicks on a previously stored bookmark that is no longer valid, an error code is returned from the Web Server. Typically, the error code returned as defined by the HTTP Protocol for such a situation is "404: Not Found" as explained in the excerpt below from the HTTP standard:
404 Not Found
The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent. The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address. This status code is commonly used when the server does not wish to reveal exactly why a request has been refused, or when no other response is applicable.
Some Web sites are friendlier, and return additional information. For example, when a reference is made to a non-existing page on www.ibm.com, the following is returned:
Our Apologies
The document you have requested does not exist on this system.
Please check the URL and try again or use our search function in the menu bar to find the information you are looking for. If you believe you have received this message in error, please use the Contact link on this page to report this error.
404 multifail
Some Web browsers have a feature whereby, when a URL is not found (be it from a bookmark or from some text typed in by a user), they search for something that is close to the URL. For example, IE4 finds http://www.microsoft.com when "microsoft" is requested. Generally, such results are not very beneficial. Typically, the results are only helpful for finding the server name and results from categories kept in portal sites. For example, requesting "altavista", or a bookmark with a URL "http://www.altavista.com/page13.html", on IE4 gives:
AltaVista - web and newsgroup search engine. - http://www.altavista.com/
AltaVista Email - http://altavista.iname.com/
AltaVista Translation Service - translate web pages or text between English and German, French, Portuguese, Spanish, and Italian. - http://babelfish.altavista.digital.com/
AltaVista Australia - mirror site providing Australian, New Zealand, and Pacific Rim users with faster access. - http://altavista.senet.com.au/
AltaVista Clinton Impeachment Trial Video - search video of President Clinton's impeachment trial by word or phrase. - http://video.altavista.com/impeach/
Related Yahoo Categories
Business and Economy > Companies > Internet Services > Search and Navigation > AltaVista
Regional > U.S. States > Virginia > Cities > Altavista
Computers and Internet > Software > Reviews > Titles > System Utilities > Utilities > File > AltaVista Search My Computer
Business and Economy > Companies > Computers > Hardware > Systems > Manufacturers > Digital Equipment Corporation > Divisions > AltaVista Software
Since the search described above is performed only after the page is not found, the search takes time. The actual amount of time depends upon the quality of the search engine. Beyond the time the search takes, the bigger problem is that the search typically returns results that are not usable.
As such, it is a problem for users when a Web page cannot be found. Users want to be able to reach some Web page that can be considered to be the next closest piece of information to the unfound Web page.
It should be noted that the HTTP protocol does provide an error code to indicate that a link has moved to a new URI. The HTTP protocol states:
301 Moved Permanently
The requested resource has been assigned a new permanent URI and any future references to this resource SHOULD use one of the returned URIs. Clients with link editing capabilities ought to automatically re-link references to the Request-URI to one or more of the new references returned by the server, where possible. This response is cacheable unless indicated otherwise. The new permanent URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s). If the 301 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.
Although it would be very beneficial to receive a new URI reference as provided for above, the problem is that very few Web Server maintainers use this facility. There are also situations where the requested resource will just "go away" and not have a new URI reference.
When a user bookmarks a page on a Web browser, this indicates that the reader has an interest in that page. Hence, even if that page is not found on the Web server, when a user clicks on a bookmark it would be desirable for the Web client itself to go to a "close enough" or "next nearest" page. The Web client should be able to go to another page that is close enough to the desired page independently of whether or not the Web Server maintainer implements code 301 from the HTTP Protocol. In addition, it would be desirable if the user could go to the "close enough" page without searching for it, even if a search were at all possible. A "close" or "proximate" page is usually good enough for the user in many situations.
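By way of illustration only, the status codes discussed above could be mapped to client actions roughly as follows. This is a minimal sketch, not part of the HTTP standard or of any browser; the function name `bookmark_action` and the action strings it returns are hypothetical and chosen purely for this example.

```python
from typing import Optional


def bookmark_action(status: int, location: Optional[str] = None) -> str:
    """Return a hypothetical client action for an HTTP status code.

    The action names are illustrative assumptions, not defined by HTTP.
    """
    if status == 301 and location:
        # The server supplied a new permanent URI: the bookmark can be re-linked.
        return "relink:" + location
    if status in (404, 410):
        # Not Found or Gone: no forwarding address is available,
        # so the client must find a substitute page by other means.
        return "fallback"
    if 200 <= status < 300:
        # The bookmarked page is still available.
        return "show"
    # Any other status is left to ordinary error handling.
    return "error"
```

A client that receives 301 with a Location field can simply update the bookmark; the 404 and 410 cases are the ones for which no help comes from the server.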
The system, method and program of the invention store other close pages along with the bookmark for a page. This is done in the background, in the spare cycles of the client, by creating a hyperlinking site-map of the server. On clicking a bookmark, if the page is not found, another close page is obtained by the client. This technique requires that the basic structure of the Web Server remain approximately intact. If the technique does fail, the client goes to the root document or home page of the Web server.
More specifically, the system, method, and program of the invention use web crawling techniques to create a site map which indicates the hyperlink structure of the site containing the desired page. Chains of hyperlinks from the desired page back to the root or home page are then stored along with the bookmark at the time of bookmarking.
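As a hedged illustration of the chain described above, the sketch below assumes the site map has already been crawled into a dictionary that maps each page to the pages it links to. The function name `chain_to_root`, the dictionary representation, and the example paths are assumptions made for this example. It also simplifies by keeping only one shortest chain, whereas the text above speaks of chains in the plural.

```python
from collections import deque
from typing import Dict, List, Optional


def chain_to_root(site_map: Dict[str, List[str]], root: str, page: str) -> Optional[List[str]]:
    """Return one hyperlink chain from `page` back to `root`, or None.

    `site_map` maps each page to the pages it links to (an assumed
    representation of the crawled site map).
    """
    parent: Dict[str, Optional[str]] = {root: None}
    queue = deque([root])
    # Breadth-first search from the root, recording how each page was reached.
    while queue:
        current = queue.popleft()
        if current == page:
            break
        for target in site_map.get(current, []):
            if target not in parent:
                parent[target] = current
                queue.append(target)
    if page not in parent:
        return None  # the page is not reachable from the root
    # Walk the recorded parents from the page back to the root.
    chain: List[str] = []
    node: Optional[str] = page
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain


# Hypothetical site structure, for illustration only.
example_map = {
    "/": ["/products", "/about"],
    "/products": ["/products/widgets"],
    "/products/widgets": ["/products/widgets/page13.html"],
}
example_chain = chain_to_root(example_map, "/", "/products/widgets/page13.html")
```

The resulting list, ordered from the bookmarked page back to the root, is what would be stored alongside the bookmark in this sketch.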
If a page disappears from the site, the links in the chain are followed back from that page to the root page until a next available page is found. This next available page is presented to the user when the bookmark for the desired page is selected.
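The fallback step can be sketched as follows, under the assumption that the stored chain is a list ordered from the bookmarked page back to the root. The name `next_available_page` and the `is_available` callback are hypothetical; in a real client the callback would issue an HTTP request, whereas here it is left abstract so that the sketch stays self-contained.

```python
from typing import Callable, Sequence


def next_available_page(chain: Sequence[str], is_available: Callable[[str], bool]) -> str:
    """Return the first available page in `chain`.

    `chain` is ordered from the bookmarked page back to the root page.
    If no page in the chain is available, the root (last entry) is returned,
    reflecting the fallback to the home page.
    """
    if not chain:
        raise ValueError("chain must contain at least the root page")
    for page in chain:
        if is_available(page):
            return page
    return chain[-1]


# Hypothetical example: the bookmarked page and its parent have disappeared.
stored_chain = ["/products/widgets/page13.html", "/products/widgets", "/products", "/"]
live_pages = {"/", "/products", "/about"}
nearest = next_available_page(stored_chain, lambda p: p in live_pages)
```

In this example the client would present "/products", the closest surviving page on the stored chain, without performing any search.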