1. Field of the Invention
This invention relates to apparatus for enabling a user to extract salient information from a text.
2. Description of the Related Art
U.S. Pat. No. 6,167,368 describes a method and a system for identifying significant topics in a document. In this method, the document text is tagged with part-of-speech tags (POS tags) using a publicly available part-of-speech tagger, so that each word in the text is associated with a tag representing its part of speech. The tagged text is then parsed by a parser that extracts noun phrases. After duplicate entries and pronouns are discarded, the head of each noun phrase is detected and each newly detected head is assigned a group number. The noun phrases are then clustered into groups by head in accordance with the assigned group numbers, and the clusters are ranked according to the frequency of occurrence of the heads.
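By way of illustration only, the tagging, extraction, grouping and ranking steps described above might be sketched as follows. This is not the implementation of U.S. Pat. No. 6,167,368. It assumes the text has already been tagged with Penn Treebank style tags, uses a deliberately naive noun-phrase extractor (maximal runs of adjectives and nouns) in place of a parser, treats the last noun of a phrase as its head, and counts frequency as the number of distinct phrases sharing a head. The names `extract_noun_phrases` and `cluster_by_head` are hypothetical.

```python
from collections import defaultdict

# Assumed Penn Treebank style tags; the cited patent does not fix a tag set here.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
NP_TAGS = NOUN_TAGS | {"JJ"}


def extract_noun_phrases(tagged_sentence):
    """Return maximal runs of adjective/noun tokens that end in a noun.

    Pronouns (tag PRP) never enter a run, so they are discarded implicitly.
    """
    phrases, run = [], []
    # A sentinel token flushes a run that reaches the end of the sentence.
    for word, tag in list(tagged_sentence) + [("", "END")]:
        if tag in NP_TAGS:
            run.append((word.lower(), tag))
            continue
        # Trim trailing adjectives so that the run ends on a noun.
        while run and run[-1][1] not in NOUN_TAGS:
            run.pop()
        if run:
            phrases.append(tuple(w for w, _ in run))
        run = []
    return phrases


def cluster_by_head(tagged_sentences):
    """Group distinct noun phrases by head and rank the groups by size."""
    group_of_head = {}            # head -> group number, in order of first detection
    clusters = defaultdict(list)  # group number -> phrases in that group
    seen = set()                  # used to discard duplicate phrases
    for sentence in tagged_sentences:
        for phrase in extract_noun_phrases(sentence):
            if phrase in seen:
                continue
            seen.add(phrase)
            head = phrase[-1]     # assumption: the head is the last noun
            group = group_of_head.setdefault(head, len(group_of_head) + 1)
            clusters[group].append(phrase)
    head_of_group = {g: h for h, g in group_of_head.items()}
    # Largest group first; ties fall back to the order of first detection.
    ranked = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(head_of_group[g], phrases) for g, phrases in ranked]
```

Given two tagged sentences containing "red car", "police car", "fast car" and "station", the sketch would place the three "car" phrases in one group ranked ahead of the single "station" phrase.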
After the grouping and ranking, the method disclosed in U.S. Pat. No. 6,167,368 provides one of two output modes. In the first output mode, the groups of noun phrases are output in order of frequency, and each phrase in a group is listed together with a sentence number and a sequential phrase identifier. In the second output mode, the method simply outputs a list of the head nouns, with the frequency of occurrence of each head noun indicated in brackets after the head noun.
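The two output modes can be illustrated with a short sketch. The exact layout used by U.S. Pat. No. 6,167,368 is not reproduced here: the line formats, the use of round brackets, the numbering of phrases in output order, and the function names `format_phrase_groups` and `format_head_list` are all assumptions made for illustration. The input is assumed to be a mapping from each head noun to a list of (phrase, sentence number) pairs, with frequency taken as the number of phrases under each head.

```python
def ranked_groups(groups):
    """Order (head, phrases) pairs by descending phrase count.

    Python's sort is stable, so heads with equal counts keep their input order.
    """
    return sorted(groups.items(), key=lambda item: -len(item[1]))


def format_phrase_groups(groups):
    """First mode: every phrase with its sentence number and a running identifier."""
    lines = []
    phrase_id = 0
    for head, phrases in ranked_groups(groups):
        lines.append(f"{head}:")
        for phrase, sentence_no in phrases:
            phrase_id += 1  # assumption: identifiers are assigned in output order
            lines.append(f"  {phrase} [sentence {sentence_no}, phrase {phrase_id}]")
    return lines


def format_head_list(groups):
    """Second mode: head nouns only, each followed by its frequency in brackets."""
    return [f"{head} ({len(phrases)})" for head, phrases in ranked_groups(groups)]


if __name__ == "__main__":
    sample = {
        "station": [("station", 2)],
        "car": [("red car", 1), ("police car", 1), ("fast car", 2)],
    }
    print("\n".join(format_phrase_groups(sample)))
    print("\n".join(format_head_list(sample)))
```

Both functions share the same ranking helper, so the two modes always list the groups in the same order and differ only in how much detail is shown for each group.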