The rapid growth of the web has been noted and tracked extensively. Recent studies, however, have documented the converse phenomenon: web pages often have short half-lives, and thus the web exhibits rapid decay as well. Consequently, page creators face an increasingly burdensome task of keeping links up to date, and many fall behind. In addition to individual pages, collections of pages or even entire neighborhoods on the web exhibit significant decay, rendering them less effective as information resources. Such neighborhoods are readily recognized by frustrated searchers, who seek a way out of them and back to more up-to-date sections of the web.
On Nov. 2, 2003, the Associated Press reported that the “Internet [is] littered with abandoned sites.” [20] The story was picked up by many news outlets, from CNN in the USA to the Straits Times in Singapore. The article further states that “[d]espite the Internet's ability to deliver information quickly and frequently, the World Wide Web is littered with deadwood—sites abandoned and woefully out of date.”
Of course, this is not news to most net-denizens, and speed of delivery has nothing to do with the quality of content, but there is no denying that the increase in the number of outdated sites has made finding reliable information on the web even more difficult and frustrating. Part of the problem is one of perception: the immediacy and flexibility of the web create the expectation that its content is up to date. In a library, by contrast, no one expects every book to be current; however, it is clear that a book does not change once published, and its publication date is fairly easy to find.
While there have been substantial efforts to map and understand the growth of the web, there have been fewer investigations of its death and decay. Determining whether a URL is dead or alive is quite easy, at least to a first approximation, and, in fact, it is known that web pages disappear at a rate of 0.25% to 0.5% per week. However, determining whether a web page has been abandoned is much more difficult.
Thus, those skilled in the art desire a method for assessing the decay status, or “staleness,” of a web page. In addition, those skilled in the art desire methods for assessing the staleness of a web page such that the resulting assessment can be used to rank web pages. Further, those skilled in the art desire methods and apparatus for use in web maintenance activities; methods and apparatus that accurately assess the staleness of web pages are particularly useful in managing such activities.