The following relates generally to methods, apparatus and articles of manufacture therefor, for annotating documents, and subsequently sharing such annotations and searching annotated document collections.
Web-based services that enable social tagging of web pages, such as Yahoo's MyWeb and del.icio.us, are available on the Internet today. Such web-based services allow users to tag web documents (such as web pages) of interest, for sharing or later recall, by bookmarking a web document and attaching a set of freely chosen tags (or keywords) to it. Also, users may elect to share their bookmarks or tags with other users, who may subsequently search and browse the shared bookmarks or tags.
In addition to allowing users to discover bookmarked pages via tags defined and shared by other users, data from social tagging can also be used to enhance document search. Social tagging systems, however, are limited in that they do not account for the nature of the content of tagged web pages (e.g., that the content of web pages may be dynamic, or that the content of one web page may be similar to that of another web page). For example, unless a user reviews and updates the tags that the user previously applied to web documents and shared with other users, each user-specified tag associated with a URL (uniform resource locator) remains the same, even when a sub-document element of the underlying content of the web page pointed to by the URL changes such that the tag less accurately reflects, or no longer reflects, the reason why it was applied to the document.
Further, available social tagging systems do not account for the similarity between published web documents. For example, different web sites may publish the same or a very similar news story. Because available social tagging systems do not account for the similarity between published content in different web documents, they are not adapted to propagate tags to similar content. Such propagation of tagged information would advantageously simplify a user's effort to tag similarly published content. Also, available social tagging systems are not integrated within a web browser (or reader). Instead, available social tagging systems require users to access a web page that is independent of the web document that is being read. Such lack of interoperability encumbers the user's ability to refer to the content of a web document at the same time a tag is created or reviewed for the web document.
Accordingly, there continues to be a need for systems and methods for supporting in situ tagging of sub-document elements (such as paragraphs of web documents) and the sharing of such tagged data (or more generally annotated data). A solution for tagging sub-document elements of web documents that is integral to web browsers would advantageously reduce the amount of cognitive and interaction overhead that is required to annotate web pages. Further, by providing an integral solution that facilitates social tagging of web pages, users would advantageously be more likely to collaborate and share tagged data. Also, by propagating tags to web pages with similar content and accounting for the dynamic nature of web pages, the integrity of the association between tags and sub-document elements of a web page would be maintained.
Further, there continues to be a need for improved systems and methods for searching collections of documents that have been tagged (or more generally annotated) through collaborative tagging (or more generally collaborative annotation). It would therefore be advantageous to provide improved systems and methods for searching tag-based collections of documents to increase the accuracy and/or precision of searching such document collections.