With the ever-growing diversity and number of information sources, each day brings more information to sift through. Our ability to identify and aggregate only the most relevant pieces of information from diverse sources has not kept pace with the dramatic increase in the sheer volume of available information.
In addition, information may be very time-sensitive, in which case prompt and systematic review of large volumes of information may be very difficult to achieve. Meanwhile, both the importance and the value of this information drop steeply if it cannot be mined in a timely way. Accordingly, there is a need for locating relevant information within a volume of information in a prompt, effective manner.
The task of targeting and aggregating small but specific chunks of information from within documents and across diverse sources presents the following challenges. First, the particular information of interest within a document is often surrounded by pages of dense prose, much of which may or may not be of interest. Second, the most relevant information may need to be extracted from across diverse sources and presented collectively so that it can be effectively examined. Third, the daunting problem of how much context to include for each extracted piece of information must be solved.
Digital reading systems have been developed that permit users to access diverse documents electronically by loading them into a digital workspace, where users can interact with them. However, merely making documents available in electronic form helps only somewhat and brings its own potential for “information overload”. It does not enable the user to rapidly “home in” on the topic of interest, and it simultaneously creates the potential for “false leads” that waste time in follow-up.
In addition, there are particular difficulties when the user already has a collection of documents that may be deemed “equally” relevant—as by the mere return of results from a search—but is faced with time constraints, with seemingly repetitive information which in fact conceals subtle and important differences, and with the sheer volume of information.
In an effort to help address the issue of content comparison, document page retrieval systems have been developed which provide both keyword search and a zoom-in interface. In this approach, the process begins with a continuous tree-map visualization of a document collection. As the user types a search keyword, the view can be limited to only those documents that match the query. The user can then zoom in on a document and begin reading the page where the match is found, and can then click on another matching document, zoom in on it, and read it. Such a concept is shown in “UC: A Fluid Interface for Personal Digital Libraries” and “A Document Corpus Browser for In-Depth Reading”, which relate to interface systems.
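The keyword-filtering step of such a page retrieval approach can be illustrated with a minimal sketch. It is not taken from the cited systems: the `Document` class and the `first_matching_page` and `filter_collection` functions are hypothetical names, and case-insensitive substring matching is assumed only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Document:
    # Hypothetical container: a title plus one text string per page.
    title: str
    pages: List[str]


def first_matching_page(doc: Document, keyword: str) -> Optional[int]:
    """Return the index of the first page containing the keyword, or None."""
    needle = keyword.lower()
    for index, text in enumerate(doc.pages):
        if needle in text.lower():
            return index
    return None


def filter_collection(docs: List[Document], keyword: str) -> List[Tuple[Document, int]]:
    """Limit the view to matching documents, each paired with the page to zoom in on."""
    matches = []
    for doc in docs:
        page = first_matching_page(doc, keyword)
        if page is not None:
            matches.append((doc, page))
    return matches
```

In this sketch, the unit handed back to the reader is always a whole page index, regardless of how little of that page actually concerns the keyword.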
While this approach provides some means for the comparison of content page-to-page, the method for aggregation of content is fixed (to pages) and inflexible (units of content cannot be altered). The method is fixed in that the user is presented with an entire page from a given source, even though the interest might be a single sentence on that page. The method is inflexible in that there is no mechanism to include or exclude relevant context as appropriate. The user has to jump from one page to another and cannot juxtapose only the relevant pieces of information from multiple sources. Also, relevant pieces of information may be embedded in text that may not be relevant to cross-document comparability.