1. Field of the Invention
The present invention relates to a text mining server and a text mining program for automatically extracting characteristic words.
2. Background Art
Text mining is an effective means of obtaining significant information from a large volume of document information. One effective text mining method extracts characteristic words and lists them. In this method, words are extracted from the documents corresponding to input document IDs and weighted, and the words with high weights are then listed as characteristic words. The weighting can be realized, for example, by using tf (Term Frequency) and idf (Inverse Document Frequency). In the tf-idf method, when N represents the total number of documents, T(W) represents the number of documents that include a word W, and F(W, Q) represents the frequency of appearance of the word W in a document Q, the level of importance of the word W in the document Q is defined as “F(W, Q)*log[N/T(W)]”. F(W, Q) corresponds to tf, and log[N/T(W)] corresponds to idf.
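The weighting formula above can be illustrated with a short Python sketch. This is an illustration only, not the method of any particular system: the function name `tf_idf`, the small example corpus, the representation of each document as an already-tokenized list of words, and the use of the natural logarithm (the text does not specify a base) are all assumptions.

```python
import math
from collections import Counter

def tf_idf(word: str, doc: list[str], corpus: list[list[str]]) -> float:
    """Importance of `word` in `doc`, computed as F(W, Q) * log(N / T(W)).

    Assumption: natural logarithm; each document is a list of words.
    """
    f = Counter(doc)[word]                   # F(W, Q): frequency of W in Q
    n = len(corpus)                          # N: total number of documents
    t = sum(1 for d in corpus if word in d)  # T(W): documents containing W
    if t == 0:                               # word absent from the corpus
        return 0.0
    return f * math.log(n / t)

# Hypothetical three-document corpus for illustration.
corpus = [
    ["server", "mining", "text"],
    ["text", "word", "word"],
    ["server", "client"],
]
```

For example, "word" appears twice in the second document and in no other, so its weight there is 2 * log(3/1); "text" appears once but occurs in two of the three documents, so its weight is only log(3/2).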
The flow of text mining in which characteristic words are listed is as follows. Document IDs are transmitted from a client computer to a server computer. Using a characteristic word extraction program, the server computer extracts characteristic words from the documents identified by the received document IDs and obtains a characteristic word list. The characteristic word list is transmitted to the client computer, which receives and displays these mining results, and the mining process ends. Prior art relating to such text mining includes the following Patent Document 1.
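The server-side part of this flow might be sketched as follows, leaving out the network transfer between client and server. Everything named here is hypothetical: the in-memory document store `DOCUMENTS`, the function `characteristic_words`, the `top_k` cutoff, and in particular the choice to sum each word's tf-idf weight over all requested documents. The text above does not say how weights from several documents are combined.

```python
import math
from collections import Counter

# Hypothetical document store: document ID -> already-extracted words.
DOCUMENTS = {
    "D1": ["server", "text", "mining", "mining"],
    "D2": ["text", "client", "word"],
    "D3": ["server", "client", "word", "word"],
}

def characteristic_words(doc_ids, store, top_k=3):
    """Return up to top_k characteristic words for the given document IDs.

    Assumption: a word's score is the sum of F(W, Q) * log(N / T(W))
    over the requested documents Q; ties are broken alphabetically.
    """
    n = len(store)                     # N: total number of documents
    doc_freq = Counter()               # T(W) for every word W
    for words in store.values():
        doc_freq.update(set(words))

    scores = Counter()
    for doc_id in doc_ids:
        for word, f in Counter(store[doc_id]).items():
            scores[word] += f * math.log(n / doc_freq[word])

    # Highest weight first; alphabetical order among equal weights.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]
```

A call such as `characteristic_words(["D1"], DOCUMENTS)` ranks "mining" first, because it occurs twice in D1 and in no other document, while "server" and "text" each occur once and are shared with another document.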
Patent Document 1: JP Patent Publication (Kokai) No. 2004-152035 A