The present invention relates to the field of information retrieval, and in particular to the field of providing document change information of networked documents using document monitoring agents.
Changes to such documents are of interest to users. Such changes can take many forms: substantive content changes, cosmetic or syntactic changes, and disappearance of the document. Administrators of data stores containing document references also face the problem that the same document can be referenced by multiple references.
Some databases are equipped with tools, using triggers, that help users deal with these problems by notifying users when referenced items of interest change in some way or are removed. On the World Wide Web, hereafter referred to as the Web, there are also several services available that help users monitor Web pages based on their Uniform Resource Locator (URL) address.
Generally, these services, which are called document monitoring agents, notify users when URLs they have registered with the service have changed in some way. Users can request to be alerted at a chosen interval, for example daily or weekly. Current monitoring services save either (1) a reference copy of the document, which is updated periodically (e.g. daily); (2) a summary of the change; or (3) a complete version history for the document.
Saving a complete version history allows the service to highlight to the user all changes since a given date by computing the difference between the current version and a previous version (for example, the version last viewed by the user). This is a powerful feature, but very costly. On the other hand, saving only one reference version means that a user must view the changes each time she or he is notified, or else miss them.
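The difference computation described above can be illustrated with a minimal sketch. It assumes that two stored versions of a document are available as plain text; the function name `changes_since` and the sample page content are hypothetical and are not taken from any particular monitoring service. The line-by-line comparison uses Python's standard `difflib` module as one possible technique.

```python
import difflib


def changes_since(previous: str, current: str) -> list[str]:
    """Return the lines removed from or added to a document.

    `previous` is an earlier stored version (for example, the version
    last viewed by the user) and `current` is the latest version.
    Both names are illustrative assumptions.
    """
    diff = difflib.ndiff(previous.splitlines(), current.splitlines())
    # ndiff prefixes removed lines with "- " and added lines with "+ ";
    # unchanged lines ("  ") and hint lines ("? ") are dropped.
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical page content, for illustration only.
last_viewed = "Price: 10\nContact: sales@example.com"
latest = "Price: 12\nContact: sales@example.com"
for change in changes_since(last_viewed, latest):
    print(change)
```

A service that keeps only one reference version can run such a comparison only against that single copy, which is why a user who skips a notification loses the corresponding changes.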
Evaluating the nature of the change and its importance for the user is a difficult task to automate and is therefore a weakness of such change monitoring systems. The agent notifications, while perhaps including a great deal of data concerning the change, may be irrelevant to the user, and in the long run, the high noise-to-signal ratio may cause the user more annoyance than aid.
Saving a revision history containing text/visual summaries of changes from version to version is a good compromise. For example, Webspector™ from Illumix is such an application and can provide a list of changes of retrieved documents. An example of a retrieved documents list is shown in FIG. 1. For each document, a revision history (report) can be shown. Furthermore, each modified page can be checked. By default, the program highlights text that has changed within each page; alternatively, it allows a user to enter keywords and indicates any changes on a page by highlighting the keyword.
Since a user does not always want to be notified of every single change, Webspector further provides a possibility to limit downloads so as to reduce the possibility of being notified of a page change due to a rotating advertisement. For example, if the user enters a size parameter of ‘400’ in a Size Threshold field, Webspector will not recognize the page as having been modified when its size differs from that of the previous version by less than 400 bytes. Further, to avoid excessive clutter, it is possible to specify that Webspector only keep the latest version of a particular page.
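The size threshold rule can be expressed as a short sketch. This is not Webspector's actual implementation; the function name `is_modified` and its parameters are hypothetical, and the treatment of a difference of exactly 400 bytes as a modification is an assumption, since the description above only covers differences of less than 400 bytes.

```python
def is_modified(previous_size: int, current_size: int,
                size_threshold: int = 400) -> bool:
    """Apply a size threshold to decide whether a page counts as modified.

    Sizes are in bytes. A page that grew or shrank by less than
    `size_threshold` bytes is not reported as modified. Treating a
    difference equal to the threshold as a modification is an assumption.
    """
    return abs(current_size - previous_size) >= size_threshold


# A 250-byte change (for example, a rotating advertisement) is ignored.
print(is_modified(10_000, 10_250))
# A 500-byte reduction is reported.
print(is_modified(10_000, 9_500))
```

Under such a rule, a substantive edit that happens to leave the page size nearly unchanged would also go unreported, so the threshold reduces noise only at the cost of possibly missing some changes.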
However, many of the changes detected are spurious from the user's point of view. Thus, a non-trivial problem faced by URL monitoring systems is how to maintain a revision history for monitored URLs that tracks only changes significant to subscribers and filters out automatically detected changes that are of no interest.
In addition, within a given workgroup, work community, or organization, it is likely that the existence of substantive changes in a document will be relevant to a number of people, not just one. In this case, the work of evaluating the nature of the change is likely to be done not once but many times, as there is currently no good way to share this work.