1. Field of Invention
This invention relates to systems and methods for organizing a collection of electronic text passages.
2. Description of Related Art
Document retrieval systems, such as World-Wide Web search engines, typically produce a set of result documents in response to a user's query. These search results are organized as a linear list of documents, typically ranked according to their degree of match with the query. The documents are typically displayed by title and, in some cases, are accompanied by a short extract from the beginning of the document or by an excerpted summary obtained from the document. The user navigates by viewing the list of titles and/or the extracted text and accessing the documents one after another in an arbitrary order. Words in the extracted text that correspond to words used in the query may be highlighted to help the user review the document.
U.S. Pat. No. 5,708,825 discloses a system that uses automatically-identified terms to navigate or index document content, without requiring a query to be supplied by a user. This system automatically produces term-based indices. The indexed terms are presented as an alphabetically ordered list.
U.S. Pat. Nos. 5,519,608 and 5,696,962 describe document retrieval systems in which a user inputs a query in natural language, and in which terms are produced that are responsive to the query. The terms are called "answer hypotheses" because they are chosen as possible answers when specific questions are input.
The World-Wide Web search engine Excite produces words or terms as an aid to the user in formulating a new query. In this system, search results are presented traditionally, as simple ranked lists of document titles, each with attendant summary information intended to be representative of the document as a whole.
The Hyper-Index Browser Prototype generates a "hyper-index" from the search results for a query, allows navigation by terms created from the search results, and also uses those terms for query expansion. It appears that all result terms shown to the user contain words that were part of the query. It further appears that every term presented to the user must include all of the query terms.
U.S. Pat. Nos. 4,972,349 and 5,062,074 describe methods that recursively segment a document collection into separate, non-overlapping groups of whole documents. Each new group is determined by the most frequently occurring word in the current group and is labeled by that word. Applying this method recursively yields a hierarchical, or "tree", description that is organized according to the maximum frequency count of a word.
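The prior-art partitioning described above can be approximated by the following minimal Python sketch. It is an illustrative reading, not the method claimed in the cited patents. It assumes whitespace tokenization, takes a word's "frequency" to be the number of documents in the current group that contain it, breaks ties alphabetically, and skips words already used as labels higher in the tree; the function name `segment` and the `max_depth` limit are hypothetical.

```python
from collections import Counter


def segment(docs, used=frozenset(), depth=0, max_depth=3):
    """Recursively partition whole documents into non-overlapping labeled groups.

    Assumptions (not from the cited patents): whitespace tokens, frequency =
    number of documents in the current group containing the word, alphabetical
    tie-breaking, and labels already used higher up are skipped.
    """
    node = {"label": None, "docs": list(docs), "children": [], "leftover": []}
    remaining = list(docs)
    while remaining and depth < max_depth:
        counts = Counter(w for d in remaining for w in set(d.lower().split())
                         if w not in used)
        if not counts:
            break
        # Most frequent unused word; ties resolved alphabetically for determinism.
        label = min(counts, key=lambda w: (-counts[w], w))
        # Every document containing the label leaves the pool, so groups never overlap.
        group = [d for d in remaining if label in d.lower().split()]
        remaining = [d for d in remaining if label not in d.lower().split()]
        child = segment(group, used | {label}, depth + 1, max_depth)
        child["label"] = label
        node["children"].append(child)
    node["leftover"] = remaining  # documents not pushed further down the tree
    return node


tree = segment(["apple pie", "apple tart", "banana split"])
```

Because each document is removed from the pool once its group is formed, its content can only be reached through the single label that captured it, which is the limitation the later paragraphs contrast with.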
This invention provides systems and methods for organizing text content of one or more text passages, such as text passages obtained in response to a search query, and/or other text passages, not obtained in response to a search query, using an organization based on concept terms obtained from the one or more text passages.
This invention separately provides methods and/or systems for organizing text content of at least one text passage, which may or may not have been obtained in response to a search query.
A hierarchical structure is used to organize the documents in a way that informs the user about co-occurrence relationships among terms that represent concepts, indicating the relative degree of co-occurrence and context of discussion of the terms within the search results.
In various exemplary embodiments, a plurality of terms are automatically selected from the at least one text passage, and at least some of the selected terms are organized into a hierarchy according to co-occurrence relationships among those terms. The hierarchy is then displayed.
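One way to picture term selection followed by co-occurrence nesting is the minimal Python sketch below. The selection rule (the k terms found in the most text units), the stop list, whitespace tokenization, and the `max_depth` cutoff are all assumptions made for illustration. The helper names `tokens`, `select_terms`, and `build` are hypothetical. A term is nested under a parent only when it appears in at least one text unit that also contains every term on the path to that parent.

```python
from collections import Counter

# Hypothetical minimal stop list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokens(text):
    """Lower-case whitespace tokenization (assumed; no stemming or phrase detection)."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def select_terms(units, k):
    """Assumed selection rule: the k terms found in the most text units."""
    df = Counter(t for u in units for t in u)
    return sorted(df, key=lambda t: (-df[t], t))[:k]


def build(units, terms, path=(), max_depth=2):
    """Nest each term under the path terms it co-occurs with.

    units: token sets for the text units that contain every term on `path`.
    A node's count is the number of those units that also contain its term.
    """
    if len(path) >= max_depth:
        return []
    nodes = []
    for term in terms:
        if term in path:
            continue
        hits = [u for u in units if term in u]
        if hits:  # keep only terms that co-occur with the whole path
            nodes.append({"term": term, "count": len(hits),
                          "children": build(hits, terms, path + (term,), max_depth)})
    nodes.sort(key=lambda n: -n["count"])  # stronger co-occurrence first (stable sort)
    return nodes


texts = ["query expansion uses terms", "query terms",
         "search engine results", "search engine query"]
units = [tokens(t) for t in texts]
tree = build(units, select_terms(units, k=4))
```

In this sketch the same term may appear under several parents (for example, "search" sits both at the top level and under "query"), so the resulting tree reflects co-occurrence rather than a partition of the text.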
Before a final hierarchy is displayed, one or more candidate hierarchies may be generated, each having a respective candidate term placed in its most dominant position. The one or more candidate hierarchies can be evaluated, and a final hierarchy for display can be selected based on the evaluation.
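The generate-then-evaluate step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the evaluation used by the invention. Each candidate is a one-level hierarchy with a different term in the dominant position. The hypothetical `score` function prefers candidates whose root covers more text units and, on a tie, those with more co-occurring child terms. Text units are assumed to be pre-tokenized into sets of terms.

```python
def candidate(units, root, terms):
    """Build a one-level candidate hierarchy with `root` in the dominant position."""
    hits = [u for u in units if root in u]
    children = {t: sum(t in u for u in hits) for t in terms if t != root}
    return {"root": root, "covered": len(hits),
            "children": {t: c for t, c in children.items() if c > 0}}


def score(h):
    """Hypothetical evaluation: more covered units first, then more co-occurring children."""
    return (h["covered"], len(h["children"]))


def choose(units, terms):
    """Generate one candidate per term and keep the best-scoring one."""
    return max((candidate(units, t, terms) for t in terms), key=score)


units = [{"query", "expansion", "terms"}, {"query", "terms"},
         {"search", "engine", "results"}, {"search", "engine", "query"}]
best = choose(units, ["query", "terms", "search", "engine"])
```

Any other evaluation criterion (for example, balance between branches) could be substituted for `score` without changing the overall generate-and-select structure.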
Selectable elements may be associated with at least one term of a hierarchy such that, when the selectable element is selected, a text passage associated with the term is displayed. In some exemplary embodiments, the display space required to indicate the content of many documents is reduced. This allows a user to view more results in a given display frame of a display device.
In some exemplary embodiments, terms are used that expose terminology contained in search results. This improves user feedback and provides the user with at least a preliminary indication of the content of the results, beyond the terminology used in a search query.
In some exemplary embodiments, organization continues until the text has been broken into the smallest possible concepts. This provides a finer level of description.
In the systems and methods according to this invention, document content can be summarized with or without a query supplied by a user. Furthermore, the internal content of documents, rather than entire documents, can be organized. This allows a finer level of description.
Additionally, terms can be organized according to their co-occurrence with other terms in a document or group of documents. This allows a finer level of description than when words or terms are organized only by their individual maximum frequency in a given group of documents.
Furthermore, in the systems and methods according to this invention, rather than relying on a single frequently occurring word to label a group of different documents, a label term is used to label the text units containing that term. The relation between a label term and a text unit containing it is therefore clearer than in the above-described prior method, which uses a single label to characterize a group of whole documents.
Additionally, according to this invention, text units from a document may be referred to from arbitrary places in the tree. For example, the text units reached from a selectable element associated with a particular term may freely mix the content of several different documents. This provides a more useful organization than in the above-described prior methods in which, once a document is assigned to a label, that document's content cannot be referred to by any parts of the tree that are not dominated by the label. Furthermore, according to this invention, document content need not be segmented into non-overlapping groups. Rather, overlapping tree relationships can be built on the same content.
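The overlapping references can be illustrated with a minimal Python sketch that indexes text units by the label terms they contain. The sketch assumes that a text unit is a string within a document, that labels are matched as whole lower-cased words, and that a unit is identified by a `(doc_id, unit_no)` pair. The name `label_index` is hypothetical. A unit containing several labels is listed under each of them, and one label may collect units from different documents.

```python
from collections import defaultdict


def label_index(docs, labels):
    """Map each label term to every (doc_id, unit_no) whose text unit contains it.

    A unit containing several labels is listed under each one, so tree nodes
    can overlap and a single node can mix units from different documents.
    """
    index = defaultdict(list)
    for doc_id, units in docs.items():
        for unit_no, unit in enumerate(units):
            words = set(unit.lower().split())
            for label in labels:
                if label in words:
                    index[label].append((doc_id, unit_no))
    return dict(index)


docs = {"d1": ["query terms help", "ranking lists"],
        "d2": ["query expansion", "terms and concepts"]}
idx = label_index(docs, ["query", "terms"])
```

Here unit 0 of `d1` is reachable from both "query" and "terms", and each label gathers units from both documents, which a partition into non-overlapping groups of whole documents could not express.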
These and other features and advantages of this invention are described in or are apparent from the following detailed description of exemplary embodiments.