Reading is a unique and essential human activity that furthers our collective knowledge and history. It is shaped by the complexity of the information environment in which it occurs: an overabundance of information affects both which material is selected for reading and the depth at which it is studied.
One of the major advantages of electronic text is that it is much easier to search for keywords in electronic text than in ordinary text on paper. Arguably, the advent of web search engines, which enable search over massive amounts of electronic text, is the most revolutionary development in information access since the invention of the paper book.
The time and resources available for understanding written text are shrinking as people's lives become ever busier. These changes in their environment have directly affected the way people interact with written text. Increasingly, reading occurs online, in web logs (“blogs”) and elsewhere on the Internet, and less so on paper. Moreover, readers tend to skim quickly for relevant information nuggets instead of analyzing a piece of text for deep meaning.
Skimming also occurs in re-reading activities, where the goal is to recall specific facts surrounding a topic. Bookmarks and highlighters were invented precisely to help achieve this goal. These fundamental shifts in reading patterns have motivated researchers to examine possibilities for enhancing modern-day reading activities. For all these skimming activities, readers need effective ways to direct their attention toward the most relevant passages within text.
Unfortunately, current reading and browsing interfaces have deficiencies. For example, current search technology typically allows only exact keyword matches. Once the search is performed, a list of search results is displayed to the user, who then selects from this list. Since only exact keyword matches are given, a user searching for the keyword “tennis” will only find articles that explicitly mention “tennis.” Articles that are highly relevant to tennis but contain few mentions of “tennis” will be ranked low or may be missed completely.
There is a large body of work in text processing and information retrieval, including search-related summarization, much of it based upon latent semantic analysis (LSA) and similar techniques. In summarization, key sentences are identified in a document and used as a summary of that document.
One related technique is conceptual search, also known as associative search, i.e., finding documents that refer to concepts described by a given set of terms. Typically, conceptual searches are performed by first applying keyword expansion techniques from information retrieval systems to find related conceptual keywords, and then using these conceptual keywords to perform a search. As a result, matches on the related conceptual keywords are also included in the results of this search process.
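The two-step process described above (expand the keywords, then search with the expanded set) can be sketched as follows. This is a minimal illustration, not a description of any particular system: the `RELATED_TERMS` table, the sample documents, and the function names are all hypothetical, and a real system would derive related keywords from a thesaurus or from corpus statistics rather than a hand-built table.

```python
import re

# Hypothetical hand-built table of related conceptual keywords (an assumption
# for illustration only).
RELATED_TERMS = {
    "tennis": {"wimbledon", "racket", "serve"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(keywords, related=RELATED_TERMS):
    """Return the query keywords plus any related conceptual keywords."""
    expanded = set()
    for kw in keywords:
        kw = kw.lower()
        expanded.add(kw)
        expanded |= related.get(kw, set())
    return expanded


def search(documents, terms):
    """Return ids of documents containing at least one of the terms."""
    terms = set(terms)
    return [doc_id for doc_id, text in documents.items()
            if terms & set(tokenize(text))]


# Hypothetical sample documents.
DOCS = {
    "a": "The tennis final drew a record crowd.",
    "b": "She won Wimbledon with a powerful serve.",
    "c": "The stock market fell sharply on Monday.",
}

exact = search(DOCS, {"tennis"})                     # exact keyword match only
conceptual = search(DOCS, expand_query(["tennis"]))  # keyword expansion first
print(exact, conceptual)
```

In this sketch the exact search returns only document "a", while the expanded search also returns document "b", which is about tennis but never uses the word.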
The result lists generated by search engines highlight exact keyword matches but do not highlight relevant passages. It is desirable to develop a method and system to direct user attention to sentences or portions of the document that are most relevant to the concepts described by the user's keywords, whether or not these sections explicitly include the user-specified keywords.
In traditional text search systems such as Google, search terms occurring in the retrieved documents are highlighted to give the user feedback. However, conceptually similar keywords, which could be computed by techniques such as LSA, are not highlighted. LSA is the basis of a variety of document analysis and search techniques. One aspect of using LSA for text-based searches is that it can locate a document that is highly relevant to the specified search terms and yet does not actually contain those terms. In other words, LSA can be used to model semantic similarity between documents and passages. It is desirable to develop a system capable of highlighting the most relevant search results regardless of whether, and how frequently, the search terms themselves appear in the results.
Another potential approach to modeling semantic similarity between words and documents is the cognitive model called spreading activation, which models human memory retrieval. Spreading activation has been studied extensively for both information retrieval and modeling human semantic memory, and has also been shown to intelligently model user behavior in browsing a web site. Cognitive psychology research has shown that spreading activation simulates how humans retrieve memory chunks in the brain. It can therefore be used to simulate and predict the degree of similarity between two memory chunks.
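As a rough illustration of the idea, spreading activation can be sketched as activation flowing outward from source nodes over a weighted association graph, weakening at each hop. The formulation below is one simple variant chosen for clarity; the `decay` and `steps` parameters, the example graph, and its weights are assumptions made for this sketch and are not taken from any specific published model.

```python
def spread_activation(graph, sources, decay=0.5, steps=2):
    """Spread activation from source nodes over a weighted association graph.

    graph maps node -> {neighbor: weight}. At each step, every node in the
    current frontier passes (activation * weight * decay) to each neighbor,
    and the received activation accumulates on the receiving node.
    """
    activation = {node: 1.0 for node in sources}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = {}
        for node, level in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                passed = level * weight * decay
                next_frontier[neighbor] = next_frontier.get(neighbor, 0.0) + passed
        for node, level in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + level
        frontier = next_frontier
    return activation


# Hypothetical association graph; weights are illustrative only.
GRAPH = {
    "tennis": {"racket": 1.0, "wimbledon": 0.5},
    "racket": {"tennis": 1.0, "strings": 0.5},
    "wimbledon": {"tennis": 0.5, "london": 1.0},
}

levels = spread_activation(GRAPH, ["tennis"])
for node, level in sorted(levels.items(), key=lambda item: -item[1]):
    print(node, level)
```

Nodes that end up with higher activation are treated as more closely associated with the source concept; here "racket" receives more activation than "london", which is reachable only through an intermediate node.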
Word co-occurrence models the relatedness of concepts and the semantic network of a body of text. Word co-occurrence has been used in statistical language processing, and a co-occurrence model is constructed by measuring how often conceptual keywords occur near each other in the text.
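A minimal sketch of such a measurement is to count how often each pair of words appears within a fixed number of words of each other. The window size, the simple alphabetic tokenizer, and the function name below are assumptions made for illustration; practical systems typically also filter stop words and normalize the raw counts.

```python
import re
from collections import Counter


def cooccurrence_counts(text, window=2):
    """Count how often each unordered word pair occurs within `window` words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        # Look ahead only, so each pair of positions is counted once.
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                counts[tuple(sorted((word, other)))] += 1
    return counts


counts = cooccurrence_counts("tennis racket tennis ball", window=2)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs are stored as alphabetically sorted tuples so that ("racket", "tennis") and ("tennis", "racket") are treated as the same pair. The resulting counts could serve as edge weights in a semantic network of the text.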