The present application relates generally to an improved data processing apparatus and method and, more specifically, to mechanisms for statistical analysis of documents with respect to facets.
Text mining is a technology for acquiring knowledge from a large amount of unstructured text data in documents without necessarily reading the entire content of the documents. A text mining system may analyze the unstructured text data and extract facets, which are sets of words or phrases representing features of the documents. Further, the text mining system may narrow down the documents with queries (e.g., queries in a natural language sentence search or queries in a facet search), and perform various statistical analyses of the current documents (i.e., the narrowed-down documents) with respect to the facets.
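By way of illustration only, the flow described above (extracting facets, narrowing down the documents with a facet query, and counting facet values over the current documents) may be sketched in Python as follows. The dictionary-based extractor, the facet names, the sample documents, and the function names `extract_facets`, `facet_search`, and `facet_counts` are all hypothetical assumptions made for this sketch and are not taken from any particular text mining system.

```python
from collections import Counter

# Hypothetical facet dictionary: facet name -> words or phrases that signal it.
FACET_TERMS = {
    "component": ["battery", "screen", "charger"],
    "problem": ["overheat", "crack", "drain"],
}

def extract_facets(text):
    """Return the set of (facet, term) pairs whose term occurs in the text."""
    lowered = text.lower()
    return {
        (facet, term)
        for facet, terms in FACET_TERMS.items()
        for term in terms
        if term in lowered
    }

def facet_search(docs, facet, term):
    """Narrow down the documents to those carrying the given facet value."""
    return [doc for doc in docs if (facet, term) in extract_facets(doc)]

def facet_counts(docs, facet):
    """Count, per facet value, how many documents contain it."""
    counts = Counter()
    for doc in docs:
        for found_facet, term in extract_facets(doc):
            if found_facet == facet:
                counts[term] += 1
    return counts

# Toy corpus (assumed data, for illustration only).
DOCS = [
    "The battery tends to overheat while charging.",
    "Screen has a crack after one week.",
    "Battery drain is severe and the charger gets hot.",
    "The charger stopped working.",
]

# Narrow down to the "current documents", then analyze them by facet.
current_docs = facet_search(DOCS, "component", "battery")
problem_counts = facet_counts(current_docs, "problem")
```

In this sketch, `current_docs` holds the narrowed-down documents and `problem_counts` is one simple statistic over them; an actual system may extract facets by linguistic analysis rather than substring matching.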
To acquire significant results from the text mining, a single analysis process is insufficient, and two analysis processes need to be executed. The two analysis processes may include a first analysis process of narrowing down the documents to interesting documents and identifying words specific to the interesting documents, and a second analysis process of identifying the cause of the appearance of those words.
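By way of illustration only, the first analysis process may be pictured as scoring each word by how much more often it appears in the interesting documents than in the whole collection. The Python sketch below uses a simple document-frequency ratio (commonly called lift) as that score. The choice of measure, the rough tokenizer, the `min_lift` threshold, the sample documents, and the names `specific_words` and `is_interesting` are assumptions made for this sketch, not a measure prescribed by the present application.

```python
from collections import Counter

def tokenize(text):
    """Very rough tokenizer: split on whitespace, strip punctuation, lowercase."""
    return {word.strip(".,;:!?").lower() for word in text.split()} - {""}

def document_frequency(docs):
    """Count, for each word, the number of documents it occurs in."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

def specific_words(all_docs, is_interesting, min_lift=1.5):
    """Return (word, lift) pairs sorted by descending lift.

    lift = (share of interesting docs containing the word)
           / (share of all docs containing the word)
    """
    interesting = [doc for doc in all_docs if is_interesting(doc)]
    if not interesting:
        return []
    df_all = document_frequency(all_docs)
    df_sub = document_frequency(interesting)
    scored = []
    for word, sub_count in df_sub.items():
        lift = (sub_count / len(interesting)) / (df_all[word] / len(all_docs))
        if lift >= min_lift:
            scored.append((word, lift))
    # Highest lift first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Toy corpus (assumed data, for illustration only).
DOCS = [
    "battery overheat while charging",
    "battery overheat after update",
    "screen crack after drop",
    "screen flicker after update",
]

# Narrow down to documents mentioning "battery", then find words specific to them.
result = specific_words(DOCS, lambda doc: "battery" in doc)
```

Words spread evenly across the collection, such as "after" in this toy data, fall below the threshold and are dropped, while words concentrated in the interesting documents, such as "overheat", are retained.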
However, conventional approaches assume that only the first analysis process is executed, and consequently a user is unlikely to acquire significant results from the text mining.