The present invention relates to a sentence classification device and method and, more particularly, to a sentence classification device and method which classify documents in accordance with the contents of the respective documents and visualize/output the classification result.
In today's highly information-oriented society, advances in information processing and communication technologies have provided an environment in which an enormous amount of computerized information can be easily acquired. Because the information acquired in such an environment is itself enormous in volume, desired information needs to be comprehended efficiently and accurately.
As a technique of analyzing the contents of information, a technique of classifying documents constituting each piece of information in accordance with the contents of the documents has been studied.
As a technique of classifying documents, there has been proposed a technique of preparing labels indicating the contents of classifications in advance, analyzing the contents of the respective documents according to a predetermined algorithm, and classifying the respective documents for each prepared label (for example, Masaaki Nagata, “Text Classification—Learning Theory Sample Fair”, Johoshori, Volume 42, first issue, January 2001).
According to such a technique, when documents are to be classified, labels indicating the contents of classifications are prepared, and the labels are accurately assigned to the respective documents by using various kinds of learning algorithms, thereby classifying the respective documents for each label.
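As a rough illustration of the label-based classification described above, the following minimal sketch assumes a multinomial naive Bayes learner with add-one smoothing as the learning algorithm. The cited reference does not prescribe this particular algorithm, and the labels, training documents, and function names (`tokenize`, `train`, `classify`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Split a document into lowercase word tokens (assumed whitespace-delimited)."""
    return text.lower().split()


def train(labeled_docs):
    """Count documents and words per prepared label.

    labeled_docs is a list of (label, text) pairs; the labels are prepared in advance.
    """
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in labeled_docs:
        doc_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def classify(model, text):
    """Return the prepared label with the highest log-probability for the document."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior of the label plus smoothed log likelihood of each known word.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labels and training documents, for illustration only.
TRAINING = [
    ("sports", "the team won the match"),
    ("sports", "the player scored a goal"),
    ("finance", "the bank raised interest rates"),
    ("finance", "the stock market fell sharply"),
]
MODEL = train(TRAINING)
```

Calling `classify(MODEL, "interest rates fell")` assigns the document to one of the prepared labels. The result of such a classifier is only a label per document; it says nothing about how the documents relate to one another.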
According to such a sentence classification technique, in order to automatically display the result of classifying the respective documents on a screen, a technique of structurally visualizing the relationships between the respective documents is required. As a conventional technique of visualizing document classification results, there has been provided a technique of obtaining the degrees of relevance between the elements of two document sets, each element consisting of a plurality of documents, and displaying the degree of relevance between any two elements at the intersection of those elements (see, for example, Japanese Patent Laid-Open No. 2003-345811). In addition, there has been proposed a technique of visualizing keywords extracted on the basis of the co-occurrence of words (see, for example, Yukio Ohsawa et al., “KeyGraph: Automatic Indexing by Segmenting and Unifying Co-occurrence Graphs”, THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, Vol. J82-D1, No. 2, pp. 391-400, 1999, and Masami Hara et al., “Keyword Extraction Using Word Co-occurrences and Partial Word Matching”, IPSJ SIG Technical Report, NL106, p. 16, 1995).
According to this conventional technique, however, since the relationships between words contained in documents are analyzed and visualized as a network (graph), neither the importance of each of a plurality of sentences contained in the documents nor the relationship between sentences can be automatically visualized.
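The limitation noted above can be seen in a minimal sketch of a word co-occurrence network. This is a simplified stand-in and not the algorithm of either cited reference: it assumes that two words co-occur when they appear in the same sentence, and that an edge is kept when the pair co-occurs at least `min_count` times. The function name `cooccurrence_edges`, the threshold, and the sample sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentences, min_count=2):
    """Build weighted word-to-word edges from sentence-level co-occurrence.

    Returns a dict mapping an alphabetically ordered word pair to the number
    of sentences in which both words appear, keeping pairs seen at least
    min_count times.
    """
    counts = Counter()
    for sentence in sentences:
        # Each distinct word is counted once per sentence.
        words = sorted(set(sentence.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Hypothetical input sentences, for illustration only.
SENTENCES = [
    "graph layout shows word links",
    "word links form a graph",
    "sentences are not nodes",
]
EDGES = cooccurrence_edges(SENTENCES)
```

Every node in the resulting graph is a word and every edge joins two words. The sentences themselves are consumed while counting and do not appear in the output, so neither their importance nor their mutual relationship is available for display.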