The World Wide Web (WWW), often referred to as the Web, is a fast-growing network that involves a vast quantity of data and numerous types of services aimed at accessing, organizing, and distributing that data. In particular, there are millions of documents on the Web and many on-line search services that enable users to find documents of interest to them.
Furthermore, documents on the Web are connected by hyperlinks, created by the documents' authors, that let users browse from document to document on their own by following the links that interest them.
The large quantity of Web data and the fast rate of Web expansion have direct implications for how services on the Web can approach the problem of processing Web data.
Collecting and processing all, or even most, Web documents while keeping the collected information about them up to date is often not feasible. Indeed, the available processing power and network bandwidth are not yet up to the task. There is, however, a more fundamental reason: because the data is distributed, the services do not control changes to the documents; the authors of Web documents can change them at any time, as needed. That is why, among other reasons, search engines do not deliver the document text in response to a user's query. At best, a search engine delivers the title and some kind of summary of a document, created by the search engine from the version of the document that was available when the document was collected and indexed. The search engine points the user to the URL, i.e., the location of the document on the Web at the time it was collected. It is then up to the user to follow the link and retrieve the document text, which may or may not be the same as the text the search engine processed and summarized.
This lack of control over the content of Web documents calls for new approaches to providing some of the basic features commonly offered by traditional document management systems. Such features include: marking the query terms in the document text to help the user identify the portions of the text that discuss the desired topic, assess the document's relevance to the topic, etc.; summarizing the document text by extracting the most salient sentences or query-specific portions of the text; analyzing the text to identify and extract entities that may be of particular interest to the user, e.g., person names, company names, and locations, or relations among these entities; and creating various visual representations of the document to help with browsing through it, assessing its relevance, etc.
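As a minimal illustration of the first feature, query-term marking can be applied to whatever text is fetched when the user actually opens the document, rather than to the copy a search engine indexed earlier. The sketch below assumes plain-text input, whitespace-separated query terms, and case-insensitive whole-word matching; the function name `mark_query_terms` and the use of HTML `<mark>` tags are illustrative choices, not part of the original description.

```python
import html
import re


def mark_query_terms(text: str, query: str) -> str:
    """Return HTML in which each whole-word, case-insensitive occurrence of a
    query term is wrapped in <mark> tags (hypothetical helper)."""
    # Longest terms first so that overlapping alternatives prefer the longer match.
    terms = sorted({t for t in query.split() if t}, key=len, reverse=True)
    if not terms:
        return html.escape(text)
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    out, pos = [], 0
    for m in pattern.finditer(text):
        # Escape the unmatched text so the only markup in the result is <mark>.
        out.append(html.escape(text[pos:m.start()]))
        out.append("<mark>" + html.escape(m.group(0)) + "</mark>")
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

For example, `mark_query_terms("Search engines index the Web.", "web index")` yields `Search engines <mark>index</mark> the <mark>Web</mark>.`. Because the marking runs on the current text, it stays correct even when the document has changed since it was indexed.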
Since Web documents are frequently accessed by browsing, that is, by following the hyperlinks they contain, the same type of document management support is needed when users browse among and within Web documents.
Furthermore, since the types and quality of services on the Web vary, users often need to explore which of them can best handle a particular request for information. For example, a user who consults several search engines to find certain types of documents often has to retype the query in the search window of each engine. There is a need for a facility that can assist the user in specifying the user's information need and that creates various representations of that need suitable for interfacing with various Web services.
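One way to picture such a facility is a single, engine-neutral description of the information need that is translated into a query URL for each service, so the user does not retype the query. The sketch below is hypothetical throughout: the `InformationNeed` fields, the engine names, the `example.com`/`example.org` URLs, and each engine's operator syntax are assumptions made for illustration, not descriptions of real search services.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class InformationNeed:
    """Hypothetical engine-neutral description of what the user is looking for."""
    required: list[str]
    optional: list[str] = field(default_factory=list)
    excluded: list[str] = field(default_factory=list)


# Hypothetical engine descriptions: base URL, name of the query parameter, and
# the prefix each engine is assumed to use for required and excluded terms.
ENGINES = {
    "engine_a": {"base": "https://search.example.com/find", "param": "q",
                 "must": "+", "not": "-"},
    "engine_b": {"base": "https://www.example.org/search", "param": "query",
                 "must": "", "not": "NOT "},
}


def to_query_url(need: InformationNeed, engine: str) -> str:
    """Render the need in one engine's (assumed) syntax and build its URL."""
    spec = ENGINES[engine]
    parts = [spec["must"] + t for t in need.required]
    parts += need.optional
    parts += [spec["not"] + t for t in need.excluded]
    # urlencode percent-encodes the query string ('+' -> %2B, space -> '+').
    return spec["base"] + "?" + urlencode({spec["param"]: " ".join(parts)})


def all_query_urls(need: InformationNeed) -> dict[str, str]:
    """One representation of the need per configured engine."""
    return {name: to_query_url(need, name) for name in ENGINES}
```

The design choice here is to keep the user's need in a structured form and push engine-specific syntax into a small per-engine table; adding another service then means adding a table entry rather than re-entering the query.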
In summary, there is a need to provide the user with facilities for obtaining better information about the relevance of documents pointed to by various services on the Web or accessed by browsing Web documents. There is a further need to provide such information based on the current versions of the documents. There is still a further need to identify such relevance in a consistent manner regardless of how the document is accessed (through a Web service, by browsing, or by a combination of the two). There is yet a further need to provide a rich representation of the user's information need.