<Definition of Terms>
The terms used in the present specification and claims are defined as follows.
‘characteristic word’: A collective term for a set of one or more word units, such as single words or phrases, extracted by a text mining technique.
‘characteristic measure’: A measure indicating the extent to which each characteristic word is characteristic of the category in question. It is also termed a ‘score’ in the field of text mining.
‘correcting’: Estimating a correct value from an error-containing result using a confidence measure and statistical information.
<Text Mining Technique>
There is a text mining technique that extracts, from a large quantity of texts such as questionnaires or business reports, words or phrases that occur frequently or that are meaningful under a statistical criterion, in order to analyze useful information such as tendencies in the texts.
Non-Patent Document 1 shows an example of this sort of text mining technique. In the text mining technique of Non-Patent Document 1, a number of texts classified in advance into two or more categories are supplied as input.
Once the input texts are provided, the number of occurrences of each characteristic word (see the definition given above) in the texts of each category is counted and, from the counts, a characteristic measure (see the definition given above) of each characteristic word is calculated for each category.
As regards the characteristic measure, there are techniques of: (i) directly using the number of occurrences of each characteristic word as the characteristic measure; and (ii) modifying the number of occurrences of each characteristic word in each category using a statistical criterion such as the mutual information quantity.
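The counting step and the two kinds of characteristic measure mentioned above may be illustrated by the following minimal sketch. It is not the method of Non-Patent Document 1. The function names `count_occurrences` and `pmi`, the toy input, and the choice of pointwise mutual information as the statistical criterion are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict


def count_occurrences(docs):
    """Count word occurrences per category.

    docs: iterable of (category, list_of_words) pairs (hypothetical input format).
    Returns {category: Counter(word -> number of occurrences)}.
    """
    counts = defaultdict(Counter)
    for category, words in docs:
        counts[category].update(words)
    return counts


def pmi(counts, word, category):
    """Pointwise mutual information log2(P(w, c) / (P(w) * P(c))).

    Used here as one possible statistical criterion (an assumption);
    probabilities are estimated from the raw occurrence counts.
    """
    total = sum(sum(c.values()) for c in counts.values())
    n_wc = counts[category][word]          # occurrences of word in category
    if n_wc == 0:
        return float("-inf")               # word never occurs in this category
    n_w = sum(c[word] for c in counts.values())   # occurrences of word overall
    n_c = sum(counts[category].values())          # all word occurrences in category
    return math.log2(n_wc * total / (n_w * n_c))


# Toy data, invented purely for illustration.
docs = [
    ("A", ["affordable", "reliable", "affordable"]),
    ("B", ["luxury", "reliable"]),
]
counts = count_occurrences(docs)

raw_measure = counts["A"]["affordable"]        # technique (i): raw count
pmi_measure = pmi(counts, "affordable", "A")   # technique (ii): modified count
```

In this toy input, ‘reliable’ occurs in both categories and therefore receives a lower pointwise mutual information in category A than ‘affordable’, which occurs only in category A.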
Non-Patent Document 1 uses a statistical quantity ‘ESC’, disclosed in Non-Patent Document 2, as the characteristic measure.
The text mining technique finds which characteristic words have a high characteristic measure in a given category, and the result is used for marketing or business analyses. For example, suppose that questionnaires about cars are sorted by car manufacturer, that text mining is performed on the texts describing the impression held of each manufacturer, and that the characteristic word ‘for ordinary people’ has the highest characteristic measure in the category of a company A. This indicates that many respondents strongly associate the impression ‘for ordinary people’ with the brand image of the company A. If, on the other hand, the characteristic measure of the characteristic word ‘for ordinary people’ is low and that of the characteristic word ‘high class’ is high in the category of a company B, this indicates that the brand image of the company B is that of a maker of ‘high class’ cars rather than of cars for ordinary people.
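The search for the characteristic word having the highest characteristic measure in each category may be sketched as follows. The function name `top_characteristic_words` and the numerical scores are hypothetical and are invented solely to mirror the car questionnaire example; they are not results from any actual analysis.

```python
def top_characteristic_words(measures, k=1):
    """Return the k words with the highest characteristic measure per category.

    measures: {category: {word: characteristic measure}} (hypothetical format).
    """
    return {
        category: sorted(scores, key=scores.get, reverse=True)[:k]
        for category, scores in measures.items()
    }


# Hypothetical characteristic measures, invented purely for illustration.
measures = {
    "company A": {"for ordinary people": 0.9, "high class": 0.1},
    "company B": {"for ordinary people": 0.2, "high class": 0.8},
}
top = top_characteristic_words(measures)
```

With these assumed scores, `top` maps company A to ‘for ordinary people’ and company B to ‘high class’, matching the brand images described in the example.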
Patent Document 1 discloses an apparatus that captures customers' subjective information by a text mining technique for use in more accurate data analyses. The apparatus is configured to find statistical information, such as the frequency of occurrence of a given noun phrase in a sub-document or its distribution over the entire set of questionnaires. Patent Document 2 discloses a document processing apparatus intended to improve the accuracy of extracting important words from electronic documents that may contain errors. In this apparatus, the frequency of occurrence of a word is adopted as evaluation information, and this evaluation information is used to represent an importance measure. An importance measure correction unit calculates a value for correcting the importance measure in connection with the similarity measure and the importance measure of control words. However, the inventions described in Patent Documents 1 and 2 differ from the present invention, as will be described in detail, in processing manner, configuration, operation and meritorious effects.
Non-Patent Document 1:
K. Yamanishi and H. Li, “Mining open answers in questionnaire data”, IEEE Intelligent Systems, Sep./Oct. 2002, pp. 58-63
Non-Patent Document 2:
K. Yamanishi, “A Decision-Theoretic Extension of Stochastic Complexity and its Applications to Learning”, IEEE Trans. Information Theory, vol. 44, No. 4, July 1998, pp. 1424-1439
Non-Patent Document 3:
Frank Wessel et al., “Confidence Measures for Large Vocabulary Continuous Speech Recognition”, IEEE Trans. Speech and Audio Processing, vol. 9, No. 3, March 2001, pp. 288-298
Patent Document 1:
JP Patent Kokai Publication No. JP2004-164079A
Patent Document 2:
JP Patent Kokai Publication No. JP2005-173950A