The present invention relates to systems, computer-implemented methods and computer program products for the archival and recovery of network-based information in a dynamic and changing environment.
It is increasingly common to cite or otherwise refer to web pages as source material in formal writings, such as scholarly works, as well as in informal writings, such as emails, presentations, and other communications. In this regard, a citation to a web page may be provided to acknowledge previous works, identify background or contextual information, direct readers to authoritative materials, or otherwise provide additional content.
However, the Internet, and in particular the World Wide Web (WWW), is dynamic in nature. As such, web pages may change over time, typically without notice. Accordingly, a link to a reference on the Internet may become broken such that the content is no longer available. Moreover, the link may remain valid while the content it points to changes or is moved elsewhere. Thus, dependability issues must be considered with regard to the use of links to web pages in electronic documents.
Certain websites provide a notification service to alert subscribers to changes made to the content on the website. However, such services are typically implemented so that users can see updated information in a timely fashion. If the user wants the older material, such a subscription is inadequate because the older, desired material has already been changed. The Internet also hosts various archival sites that are intended to archive and preserve older web pages. However, such sites cannot be depended upon to accurately preserve the specific content desired by a user. For example, there may be a time lag between the removal of desired web page content and its availability at the archival site. Moreover, the archival site may not store the appropriate version of a web page of interest, or all of the necessary corresponding links associated with it.
A user may also make a private copy of one or more web page references. However, this can become administratively cumbersome, time-consuming and unreliable, and the largely manual form of content management required can result in inconsistent storage of such content.