The World Wide Web (“web”) is a system of server computers on the Internet that support the standards defining both the structure of a web page and the protocol for passing information between client and server computers. Web pages are created using a markup language derived from the Standard Generalized Markup Language (“SGML”), such as HyperText Markup Language (“HTML”) or Extensible Markup Language (“XML”), to structure the presentation of the text, graphics, audio, and video content of a web page. The textual content of a web page includes hypertext links embedded in the document text, which allow the reader to click a link and quickly access another, related resource on the web. In addition, a software development environment and a programming language such as JavaScript or Java may be used to create and modify programs called from the HTML code of the web page. A web page author first creates or modifies a web page and then publishes the web page on a web site to make it accessible to web users.
The web and HTML make it relatively easy for a web page author to create and update a web page. This not only promotes the proliferation of information on the web, but also increases the risk that a hypertext link in a web page may be altered improperly.
Web pages are frequently set up and designed in an eclectic manner. Often, insufficient provision is made for embedded links or hotspots that point to target web pages which no longer exist, or which have been moved and are now reachable at a new Uniform Resource Locator (“URL”). This can lead to chaotic web browsing, as the user wastes time going down blind alleys.
In addition, a web page author cannot guarantee that a web resource referenced by the web page is correct and still accessible via the hypertext link. A web page that contains out-of-date links is useless to the web page user and forces the user to continue examining other links in the search result set, perform a new search, or abandon the search altogether. To a user of the web, the web page content and the accuracy of the embedded hypertext links determine the reliability of both the web page and the hosting web site.
Proper management of a web site demands periodic testing of every web page associated with the site, following every link on each page to test the validity and reliability of the link. The responsibility for this testing falls upon a web site manager. The web site manager typically determines the frequency of the link testing (e.g., once a month), but relies upon either the web page author, or someone hired by the author, to update the content, examine the hypertext links, and correct any errors. Since this testing requires a considerable amount of time, the cost of ensuring that a web site's links are up to date increases in proportion to the number of links on the web site. As the number of accessible web sites continues to increase, the number of both accessible and inaccessible web pages will likely increase as well. In addition, the manual nature of the link checking process described above makes it highly prone to error.
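The link testing described above can be illustrated with a minimal sketch, offered here only as one possible approach and not as a method taken from this document. It assumes the page is available as an HTML string, that only `<a href>` links with an http or https scheme are tested, and that a link counts as broken when it is unreachable or answers with a status of 400 or above. The names `LinkCollector`, `extract_links`, `http_status`, and `find_broken_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the absolute URL of every <a href> element in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                absolute = urljoin(self.base_url, value)
                # Assumption: only http(s) links are tested (mailto: etc. are skipped).
                if urlparse(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the testable links of a page, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (URLError, OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, malformed URL


def find_broken_links(html, base_url, status_of=http_status):
    """Return (url, status) pairs for links that are unreachable or >= 400."""
    broken = []
    for url in extract_links(html, base_url):
        status = status_of(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

The `status_of` parameter allows the network check to be replaced, for example by a dictionary lookup when trying the sketch offline. Even when automated in this way, the number of requests still grows with the number of links on the site, which is the cost concern raised above.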
Web site management software exists that can detect a change in hypertext links embedded in a web page and notify the author of such a change, as disclosed, for example, in U.S. 2004/0205076.
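As a rough illustration of what detecting a change in embedded hypertext links might involve, the following sketch compares the links found in two versions of the same page. It is an assumption-laden example and does not reproduce the method of the cited publication: the names `HrefCollector`, `link_set`, `diff_links`, and `report_changes` are hypothetical, and notification is reduced to calling a caller-supplied function.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the distinct href values of the <a> elements in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.add(href)


def link_set(html):
    """Return the set of link targets embedded in an HTML string."""
    collector = HrefCollector()
    collector.feed(html)
    return collector.hrefs


def diff_links(old_html, new_html):
    """Return the links added to and removed from a page between two versions."""
    old, new = link_set(old_html), link_set(new_html)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


def report_changes(old_html, new_html, notify=print):
    """Call notify (a stand-in for notifying the author) only if links changed."""
    changes = diff_links(old_html, new_html)
    if changes["added"] or changes["removed"]:
        notify(changes)
    return changes
```

A sketch of this kind only reports that links differ between versions; correcting the affected links is still left to whoever receives the notification.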
However, such web site management software still leaves the author with the task of subsequently updating the modified hypertext links, which limits the speed, growth, and efficiency of the web.