The present invention relates to an apparatus and a method for supporting acquisition of information. In particular, the present invention relates to an apparatus and a method for supporting acquisition of information from a document including a plurality of words.
In recent years, with increases in the capacities of storage media and the like, large amounts of document data (hereinafter referred to simply as “documents”) are being accumulated in computer systems. Accordingly, various kinds of technology for obtaining characteristic concepts, novel opinions, and the like have been proposed in order to utilize such large amounts of accumulated documents effectively.
One such technology operates as follows: a clustering control unit executes document clustering processing or word clustering processing with respect to a document set consisting of documents designated by a user from among a plurality of documents; a category classification method setting unit sets, in a category storage unit, methods of classifying the categories into which subsets of the document set generated by the clustering processing are classified; and an automatic classification control unit determines a classification destination candidate category set and a classification target document set in accordance with a user operation and, based on the classification method set in the category storage unit for each category of the classification destination candidate category set, controls rule-based automatic classification processing and case-based automatic classification processing with respect to the classification destination candidate category set and the classification target document set.
Another known method and system capture useful knowledge by extracting a concept having a unique characteristic from a large amount of data including documents, in which: a concept extracting apparatus extracts concepts by category from data including document data; and a characteristic concept extracting apparatus extracts characteristic concepts from among the extracted concepts and, with respect to concepts in separate categories, extracts, from among concepts that belong to the same category, a concept for which the proportion occupied by the concept among concepts belonging to a corresponding other category exceeds a preset value.
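By way of illustration only, one possible reading of the proportion test described above may be sketched in Python as follows. The record layout (a mapping from a category name to a set of concepts), the function name `characteristic_concepts`, the sample data, and the threshold value are all assumptions made for this sketch and are not taken from the known method itself.

```python
from collections import defaultdict


def characteristic_concepts(records, category, other_category, threshold):
    """Return (concept, other_concept, proportion) triples above the threshold.

    Assumed layout: each record maps a category name to a set of concepts.
    The proportion is the share of records containing `other_concept`
    that also contain `concept`.
    """
    other_totals = defaultdict(int)  # records containing each other-category concept
    pair_counts = defaultdict(int)   # records containing both concepts
    for record in records:
        for other in record.get(other_category, set()):
            other_totals[other] += 1
            for concept in record.get(category, set()):
                pair_counts[(concept, other)] += 1

    results = []
    for (concept, other), count in pair_counts.items():
        proportion = count / other_totals[other]
        if proportion > threshold:  # the "preset value" of the description
            results.append((concept, other, proportion))
    return sorted(results)


# Hypothetical sample data: concepts already extracted by category.
records = [
    {"product": {"printer"}, "issue": {"jam"}},
    {"product": {"printer"}, "issue": {"jam"}},
    {"product": {"scanner"}, "issue": {"jam"}},
    {"product": {"scanner"}, "issue": {"noise"}},
]
found = characteristic_concepts(records, "issue", "product", threshold=0.6)
```

In this sample, "jam" occurs in every record that mentions "printer" but in only half of the records that mention "scanner", so only the pair of "jam" and "printer" exceeds the assumed preset value of 0.6.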
Another known technology utilizes a document processing apparatus that includes: a characterization unit for determining characteristic data for each of a plurality of items of document data; a clustering unit for clustering, based on the characteristic data determined by the characterization unit, the plurality of items of document data into a plurality of clusters that are each a set of similar items of document data; and an extraction unit for extracting, from the plurality of clusters obtained by the clustering unit, a cluster that does not reach a predetermined level of similarity.
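The three units described above may be illustrated, under stated assumptions, by the following Python sketch. Word counts stand in for the characteristic data, a simple greedy grouping stands in for the clustering, and the "level of similarity" is read as the cosine similarity between a cluster and the document collection as a whole. None of these choices, nor the function names and threshold values, is taken from the known apparatus.

```python
import math
from collections import Counter


def characterize(text):
    """Characterization: a bag-of-words vector (an assumed feature choice)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(vectors, join_level):
    """Greedy clustering: join the most similar cluster or start a new one."""
    clusters = []
    for i, vec in enumerate(vectors):
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= join_level:
            best["members"].append(i)
            best["centroid"].update(vec)  # summed counts act as the centroid
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return clusters


def extract_dissimilar(clusters, vectors, level):
    """Extraction: clusters whose similarity to the whole collection is below level."""
    overall = Counter()
    for vec in vectors:
        overall.update(vec)
    return [c["members"] for c in clusters if cosine(c["centroid"], overall) < level]


# Hypothetical sample documents.
docs = [
    "paper jam in printer",
    "printer paper jam again",
    "battery drains fast",
    "printer jam paper tray",
]
vectors = [characterize(d) for d in docs]
clusters = cluster(vectors, join_level=0.5)
outliers = extract_dissimilar(clusters, vectors, level=0.5)
```

With these sample documents, the three printer-related documents form one cluster and the battery-related document forms a second cluster, which is the only one extracted as falling short of the assumed similarity level.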
Another known technology pre-associates a plurality of data sets with each other; obtains, for each data item included in the data sets, a specificity index that correlates with how far the data item is in value from other data items and with how low the frequency of the data item is; compares the specificity index with a predetermined reference index; selects a plurality of data items based on the comparison results; and performs data mining utilizing the selected data items.
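As a minimal sketch of such a specificity index, the following Python example multiplies a standardized distance from the mean by a rarity term, so that the index grows both as a value lies farther from the other values and as it occurs less often. This particular formula, the function names, the sample readings, and the reference index of 1.0 are assumptions for illustration; the known technology does not specify them here, and the pre-association of data sets is not modeled.

```python
from collections import Counter
from statistics import mean, pstdev


def specificity_indices(values):
    """Return one assumed specificity index per data item.

    index = (|value - mean| / standard deviation) * (1 - relative frequency)
    """
    if not values:
        return []
    n = len(values)
    counts = Counter(values)
    mu = mean(values)
    sigma = pstdev(values)
    indices = []
    for v in values:
        distance = abs(v - mu) / sigma if sigma else 0.0  # far from other items
        rarity = 1.0 - counts[v] / n                      # low frequency
        indices.append(distance * rarity)
    return indices


def select_specific(values, reference_index):
    """Select data items whose index exceeds the predetermined reference index."""
    return [
        v
        for v, index in zip(values, specificity_indices(values))
        if index > reference_index
    ]


# Hypothetical sample data items.
readings = [10, 10, 10, 11, 11, 10, 42]
selected = select_specific(readings, reference_index=1.0)
```

Here only the value 42 is both far from the other readings and infrequent, so it is the only item selected for subsequent data mining under the assumed reference index.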