1. Field of Invention
This invention is directed to an electronic document reading and skimming system. In particular, this invention is directed to a system that permits a person to rapidly and accurately skim a document to determine its relevance. More specifically, this invention is directed to an electronic document reading and skimming system that varies emphasis attributes to present terms in a document in accordance with the degree to which the terms represent the content of the document.
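The idea of varying emphasis "in accordance with the degree" to which a term represents the content can be made concrete with a small sketch. The following is a hypothetical illustration only, not the method claimed here. It assumes that each term already has a numeric weight (for example an IDF score), and it assumes font size as the emphasis attribute. The names `emphasis_levels` and `font_sizes` are invented for the example. Weights are normalized and binned into a fixed number of graded levels, so that emphasis is not simply on or off.

```python
def emphasis_levels(weights, font_sizes=(9, 11, 14, 18)):
    """Hypothetical sketch: map each term's weight to a graded font size.

    Higher-weighted terms receive larger sizes; font size is only an
    assumed stand-in for any emphasis attribute.
    """
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all weights are equal
    k = len(font_sizes)
    sizes = {}
    for term, w in weights.items():
        # Normalize the weight to [0, 1], then bin it into k equal-width levels.
        level = min(int((w - lo) / span * k), k - 1)
        sizes[term] = font_sizes[level]
    return sizes


sizes = emphasis_levels({"the": 0.0, "system": 0.4, "skimming": 1.1})
print(sizes)
```

Here "the" receives the smallest size (9), "system" an intermediate size (11), and "skimming" the largest (18). Equal-width binning is just one simple choice. Quantile binning or a continuous mapping would serve equally well as an illustration.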
2. Description of Related Art
Before deciding to devote a significant amount of time to reading a document, a reader tends to skim through it to judge whether the entire document is worth reading. Readers skim quickly to find terms in the text that give them a general idea of the overall content of the document. Skimming does not involve reading the entire document. Rather, it conventionally involves focusing on and reading only certain words in the text. This technique is unreliable because the reader must assume that the portions of the text actually read indicate the content of the entire document. Those portions may or may not reflect that content. When they do not, skimming gives an inaccurate overview of the document and may mislead the reader.
Conventional electronic document reading support systems have focused on supporting the reading of documents rather than the skimming of documents. One electronic reading support technique is Rapid Serial Visual Presentation (RSVP). RSVP displays the text one word at a time, rapidly presenting each successive word in the same screen location. Because RSVP displays every word in the text of the document, it requires the reader to read all of the text. RSVP therefore supports reading rather than skimming. Moreover, RSVP makes no distinction among words according to how well they reflect the content of the document.
Some systems analyze the degree to which each word of a text reflects the overall content of a document. Some of these systems rely on inverse document frequency (IDF) calculations. IDF is a statistical technique that measures the ability of words to discriminate among the documents in a collection. Although IDF is well known, it is usually used only to determine document similarity. IDF is described in "Introduction to Modern Information Retrieval", G. Salton et al., McGraw-Hill, 1983, incorporated herein by reference in its entirety. IDF is used to identify potential hypertext links in a dynamic hypertext application in the system described in "What the Query Told the Link: The Integration of Hypertext and Information Retrieval", G. Golovchinsky, Proceedings of Hypertext '97, April 1997, Southampton, U.K., ACM Press, incorporated herein by reference in its entirety. In that work, however, a link was either present or absent, and no intermediate gradation was available. In addition, the user interface was designed to support interactive browsing rather than skimming.
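To illustrate how IDF measures a word's ability to discriminate among documents, here is a minimal sketch of the common textbook form idf(t) = log(N / df(t)). In this formula, N is the number of documents in the collection and df(t) is the number of documents that contain term t. The text above does not specify which IDF variant is meant, so this form is an assumption. Tokenization is also assumed to be a simple whitespace split, and the function name and sample collection are hypothetical.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Return idf(t) = log(N / df(t)) for every term in a tokenized collection."""
    n_docs = len(documents)
    # df(t): number of documents that contain term t at least once
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


collection = [
    "the skimming system emphasizes terms".split(),
    "the retrieval system ranks documents".split(),
    "the reader skims the document".split(),
]
idf = inverse_document_frequency(collection)
for term in ("the", "system", "skimming"):
    print(term, round(idf[term], 3))
```

In this sketch, a word that occurs in every document ("the") receives a weight of zero because it does not discriminate at all. A word confined to a single document ("skimming") receives the highest weight. Because the result is a continuous score rather than a yes/no decision, it could in principle support graded rather than binary treatment of terms.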
Some text summarization techniques emphasize important passages visually. Such techniques are described, for example, in "Automatic Analysis, Theme Generation, and Summarization of Machine-Readable Text", G. Salton et al., Science, 264(3), pp. 1421-1426, June 1994; "A Trainable Document Summarizer", J. Kupiec et al., Proceedings of SIGIR '95, July 1995, Pittsburgh, Pa., ACM Press; and "Variable Length On-Line Document Presentation", N. O'Donnell, Proceedings of the Sixth European Workshop on Natural Language Generation, March 1997, Duisburg, Germany, incorporated herein by reference in their entireties. However, these systems provide only summaries of the document. Users of these systems cannot reach the full document without additional, cognitively expensive interface operations. What is needed is an electronic document reading and skimming system that lets the user skim a document quickly for interesting terms while also giving the user immediate access to the full text of the document.
Conventional information retrieval interfaces highlight the terms that caused a document to be retrieved. Examples of such systems are described in "SuperBook: An Automatic Tool for Information Exploration--Hypertext?", J. R. Remde et al., Proceedings of Hypertext '87, November 1987, Chapel Hill, N.C., ACM Press and "Queries? Links? Is There A Difference?", G. Golovchinsky, Proceedings of CHI '97, March 1997, Atlanta, Ga., ACM Press, incorporated herein by reference in their entireties. These systems highlight the search terms to indicate why a document was retrieved. They typically present result lists ranked according to how frequently the search terms occur within the individual documents. However, the highlighted search terms do not necessarily reflect the content of the entire retrieved document.
A useful skimming tool would highlight or emphasize the terms that reflect the content of the entire document as well as of its individual portions. Such a tool would permit the user to skim the document rapidly while reading only its most characteristic words. Thus, a tool is needed that supports skimming by highlighting or emphasizing the terms that reflect the overall content of a document.