In recent years, there has been demand for investigating the phenomena and incidents that occur in a period of interest (an "attention period"), for purposes such as marketing, trend surveys, and the monitoring of unusual situations in call-center telephone records. In such an investigation, a collection of documents concerning the object that a user wants to analyze (hereinafter referred to as “analysis object documents”) is first gathered. Then, based on the contents described in the analysis object documents and on the domain that is the subject of analysis, the user investigates what kinds of phenomena and incidents have arisen in the attention period.
A known technology that meets this need compares the tendency of documents in the attention period with the tendency of documents in a preceding past period, based on a collection of analysis object documents (a time-series document analysis technology; see, for example, non-patent document 1). Specifically, the time-series document analysis technology disclosed in non-patent document 1 extracts feature expressions that have seldom appeared in the past period but appear characteristically in the attention period, and performs its analysis based on those feature expressions. The feature expressions obtained in this way (keywords, for example) are expected to indicate the phenomena, incidents, and the like that occurred in the attention period within the content and domain covered by the analysis object documents.
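The comparison between a past period and an attention period can be illustrated with a small sketch. The statistic actually used in non-patent document 1 is not stated here, so the following assumes one common choice: a Pearson chi-square test on document frequencies, keeping only expressions that are relatively more frequent in the attention period. The function names, and the representation of each document as a set of already-extracted linguistic expressions, are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 contingency table [[a, b], [c, d]].

    a: attention-period documents containing the expression
    b: attention-period documents not containing it
    c: past-period documents containing it
    d: past-period documents not containing it
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def extract_feature_expressions(
    past_docs: List[Set[str]],
    attention_docs: List[Set[str]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Rank expressions that are over-represented in the attention period.

    Each document is assumed (hypothetically) to be a set of linguistic
    expressions already extracted from its text, so counts are document frequencies.
    """
    n_past, n_attn = len(past_docs), len(attention_docs)
    past_df = Counter(e for doc in past_docs for e in doc)
    attn_df = Counter(e for doc in attention_docs for e in doc)
    scored = []
    for expr, a in attn_df.items():
        c = past_df.get(expr, 0)
        # Compare a/n_attn with c/n_past by cross-multiplying (no division by zero).
        if a * n_past <= c * n_attn:
            continue  # not relatively more frequent in the attention period
        scored.append((expr, chi_square_2x2(a, n_attn - a, c, n_past - c)))
    scored.sort(key=lambda t: (-t[1], t[0]))
    return scored[:top_k]
```

The direction check matters: chi-square alone is symmetric and would also flag expressions that became rarer, whereas the text is concerned only with expressions that seldom appeared before and now appear characteristically.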
For example, suppose that a user wants to find out what topics have arisen each month in blogs that mention “health food A”. In this case, a collection of blog posts containing a description of “health food A” is first acquired from the blog population as the collection of analysis object documents. The acquired posts are then classified by month according to their posting dates, and the appearance tendencies of their descriptive contents in the previous month and the current month are compared statistically. As a result, the user can learn, for example, that feature expressions such as “herbal medicine”, “classification”, and “Northern Europe→new development” appeared far more often in November 2009 than in the previous month. Using such feature expressions as clues, the user can efficiently grasp the changes that occurred during the attention period in the domain being analyzed.
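The three steps of this example (select posts mentioning the target, classify them by month, compare the current month with the previous one) can be sketched as follows. This is a minimal illustration under stated assumptions: each post is assumed to carry its posting date and a set of already-extracted expressions, and the comparison uses an add-one-smoothed ratio of document rates with a hypothetical threshold `min_ratio`; neither the smoothing nor the threshold comes from the text.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List, NamedTuple, Set, Tuple


class BlogPost(NamedTuple):
    posted: date
    expressions: Set[str]  # expressions already extracted from the post (assumed)


def month_key(d: date) -> str:
    return f"{d.year:04d}-{d.month:02d}"


def previous_month(key: str) -> str:
    year, month = map(int, key.split("-"))
    if month == 1:
        return f"{year - 1:04d}-12"
    return f"{year:04d}-{month - 1:02d}"


def monthly_feature_expressions(
    posts: List[BlogPost], target: str, month: str, min_ratio: float = 2.0
) -> List[Tuple[str, float]]:
    # Step 1: keep only posts that mention the target (the analysis object documents).
    selected = [p for p in posts if target in p.expressions]

    # Step 2: classify the selected posts by posting month.
    by_month: Dict[str, List[BlogPost]] = defaultdict(list)
    for p in selected:
        by_month[month_key(p.posted)].append(p)
    this_docs = by_month.get(month, [])
    last_docs = by_month.get(previous_month(month), [])

    # Document frequency of each expression in each of the two months.
    this_df = Counter(e for p in this_docs for e in p.expressions)
    last_df = Counter(e for p in last_docs for e in p.expressions)

    # Step 3: compare add-one-smoothed document rates between the two months.
    results = []
    for expr, count in this_df.items():
        if expr == target:
            continue  # every selected post contains the target itself
        this_rate = (count + 1) / (len(this_docs) + 2)
        last_rate = (last_df.get(expr, 0) + 1) / (len(last_docs) + 2)
        ratio = this_rate / last_rate
        if ratio >= min_ratio:
            results.append((expr, ratio))
    results.sort(key=lambda t: (-t[1], t[0]))
    return results
```

Smoothing keeps expressions that never appeared in the previous month from producing an infinite ratio, while still ranking them highly when they appear in many of the current month's posts.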
Here, the terms used in the present specification are defined. A “feature expression” in the present specification means a linguistic expression that appears characteristically in a document collection that is the object of attention. Whether an expression “appears characteristically” is determined from information such as the statistical deviation of its appearance within the document collection and the document structure of each document, for example whether it occurs in the document title or at the beginning of the document. Techniques for finding such characteristically appearing linguistic expressions are known to those skilled in the art as text-mining and document summarization technologies.
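One way to combine the two kinds of information named above (statistical deviation across the collection, and position within a document's structure) is a TF-IDF-style score in which occurrences in the title or first sentence are weighted more heavily. The sketch below is only an assumption of how such a combination might look: the weights `TITLE_WEIGHT`, `LEAD_WEIGHT`, and `BODY_WEIGHT`, the smoothed IDF formula, and all function names are hypothetical and are not taken from the specification.

```python
import math
from collections import Counter
from typing import List, Sequence, Set, Tuple

# Hypothetical weights: occurrences in the title or the first sentence count more.
TITLE_WEIGHT = 3.0
LEAD_WEIGHT = 2.0
BODY_WEIGHT = 1.0


def structure_weighted_counts(
    title: Sequence[str], sentences: Sequence[Sequence[str]]
) -> Counter:
    """Count expressions in one document, weighting each by where it occurs."""
    counts: Counter = Counter()
    for expr in title:
        counts[expr] += TITLE_WEIGHT
    for i, sentence in enumerate(sentences):
        weight = LEAD_WEIGHT if i == 0 else BODY_WEIGHT
        for expr in sentence:
            counts[expr] += weight
    return counts


def characteristic_expressions(
    title: Sequence[str],
    sentences: Sequence[Sequence[str]],
    collection: List[Set[str]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score = structure-weighted frequency x smoothed inverse document frequency."""
    n = len(collection)
    df = Counter(e for doc in collection for e in doc)
    scores = {}
    for expr, wtf in structure_weighted_counts(title, sentences).items():
        # Expressions that are rare across the collection get a larger factor.
        idf = math.log((n + 1) / (df.get(expr, 0) + 1)) + 1.0
        scores[expr] = wtf * idf
    return sorted(scores.items(), key=lambda t: (-t[1], t[0]))[:top_k]
```

The `+ 1` terms in the IDF keep the score finite for expressions absent from the collection and keep it positive for expressions that appear in every document.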
A linguistic expression means a chunk of one or more words cut out of a text as a processing unit, such as a “word” or a “phrase”, when the text is analyzed using natural language processing technology. A linguistic expression may also be obtained by modifying the expressions that appear in the text, for example by synonym processing or by a transformation that converts a conjugated form into its end-form (base form). In addition, a linguistic expression may consist of a plurality of words together with information specifying the relation between them, such as a dependency relation (for example, “school”→“go”) or a sub-tree of a syntactic-analysis result.
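The normalizations and the dependency-relation form described above can be illustrated with a small sketch. It assumes hypothetical toy lookup tables (`BASE_FORMS`, `SYNONYMS`) in place of a real morphological analyzer and thesaurus, uses English words for readability, and assumes that a dependency parser has already produced, for each token, the index of its head (or -1 for the root).

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical toy tables standing in for a morphological analyzer and a thesaurus.
BASE_FORMS = {"went": "go", "goes": "go", "going": "go", "schools": "school", "kids": "kid"}
SYNONYMS = {"kid": "child", "youngster": "child"}


def normalize(word: str) -> str:
    """Map a surface word to its base form, then to a canonical synonym."""
    w = word.lower()
    w = BASE_FORMS.get(w, w)
    return SYNONYMS.get(w, w)


@dataclass(frozen=True)
class DependencyExpression:
    """A two-word linguistic expression: a dependent word and the word it modifies."""
    dependent: str
    head: str

    def __str__(self) -> str:
        return f"{self.dependent}→{self.head}"


def dependency_expressions(
    tokens: Sequence[str], heads: Sequence[int]
) -> List[DependencyExpression]:
    """heads[i] is the index of token i's head, or -1 for the root (assumed parser output)."""
    return [
        DependencyExpression(normalize(tokens[i]), normalize(tokens[h]))
        for i, h in enumerate(heads)
        if h >= 0
    ]
```

For the tokens `["Kids", "went", "to", "schools"]` with heads `[1, -1, 3, 1]`, this yields the normalized pairs child→go, to→school, and school→go, so that “schools”→“went” in one text and “school”→“goes” in another count as the same expression. Making the dataclass frozen lets these pairs be hashed and counted like single words.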