A synonymous expression dictionary is one of the language resources necessary for conducting an accurate search in response to a query with a complex syntactic structure, such as a natural language sentence. Synonymous expressions typically need to be organized for each field of documents to be searched. Organizing them manually requires experts with specialized knowledge of the field, and securing such experts for a long period incurs significant labor cost. Hence, there is demand for a technique of automatically organizing the synonymous expression dictionary.
Consider, in particular, automatic extraction of synonymous expressions of binary relations, each of which is represented by a combination of a nominal and a predicate. For example, “dengen o ireru (activate the power)” and “dengen suitchi o tonyu suru (turn on the power switch)” are synonymous expressions of such binary relations. A predicate included in an input binary relation is hereafter referred to as an “input predicate”, and a nominal included in the input binary relation as an “input nominal”.
As a technique of extracting synonymous expressions of binary relations, there is a method in which surrounding contexts of binary relations are collected from a document set as feature values and binary relations having similar feature values are extracted as synonymous expressions, as described in Non Patent Literature (NPL) 1. The surrounding contexts used here include a predicate modified by the input predicate and a nominal, other than the input nominal, that is in a case relation to the input predicate in the document set. For instance, from a sentence “daigaku o shuseki de sotsugyo shi kaisha ni shushoku suru (graduate from the university with top honors and enter the company)”, “shuseki de (with top honors)” and “shushoku suru (enter)” are acquired as feature values of a binary relation “daigaku o sotsugyo suru (graduate from the university)”.
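The context-based approach described above can be illustrated with a minimal sketch. The clause records, the function names `context_features` and `cosine`, and the use of cosine similarity are assumptions made for illustration only; the text above states only that binary relations having similar feature values are extracted, without naming a similarity measure. The second and third clauses are hypothetical toy data, not taken from NPL 1.

```python
from collections import Counter
from math import sqrt

# Hypothetical pre-parsed clauses. Each record holds a predicate, its
# case-marked nominals, and the predicate that it modifies (or None).
clauses = [
    {"pred": "sotsugyo suru", "args": {"o": "daigaku", "de": "shuseki"},
     "modifies": "shushoku suru"},
    {"pred": "deru", "args": {"o": "gakko", "de": "shuseki"},
     "modifies": "shushoku suru"},
    {"pred": "ireru", "args": {"o": "dengen"}, "modifies": None},
]


def context_features(clauses, nominal, case, pred):
    """Count the surrounding contexts of the binary relation (nominal, case, pred)."""
    feats = Counter()
    for clause in clauses:
        if clause["pred"] != pred or clause["args"].get(case) != nominal:
            continue  # this clause does not contain the input binary relation
        for other_case, other_nominal in clause["args"].items():
            if other_case == case:
                continue  # skip the input nominal itself
            feats[("nominal", f"{other_nominal} {other_case}")] += 1
        if clause["modifies"]:
            feats[("predicate", clause["modifies"])] += 1
    return feats


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (an assumed measure)."""
    dot = sum(a[key] * b[key] for key in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


graduate = context_features(clauses, "daigaku", "o", "sotsugyo suru")
leave_school = context_features(clauses, "gakko", "o", "deru")
power_on = context_features(clauses, "dengen", "o", "ireru")

print(graduate)
print(cosine(graduate, leave_school), cosine(graduate, power_on))
```

In this sketch, the first clause yields “shuseki de” and “shushoku suru” as feature values of “daigaku o sotsugyo suru”, matching the example in the text. A pair of binary relations would then be treated as a candidate synonymous expression when its similarity exceeds some threshold, the choice of which is likewise left open here.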
As another technique of extracting synonymous expressions of binary relations, there is a method in which the input predicate pair and the input nominal pair are each separately assessed as to whether or not the pair is in a synonymous relation and, in the case where both pairs are assessed as synonymous, the binary relations are extracted as synonymous expressions. This can be done simply by applying the technique of extracting synonymous expressions of nominals described in NPL 2. NPL 2 describes a technique of collecting, as a feature value of each input nominal, a distribution of occurrence frequencies of predicates that are in binary relations to the input nominal in a document set, and extracting input nominals having similar feature values as synonymous expressions.
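The separate assessment of the predicate pair and the nominal pair can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of NPL 2: the toy relation list, the function names, the cosine measure, and the threshold of 0.5 are all hypothetical. It is also assumed that predicates are profiled symmetrically, by the distribution of nominals with which they occur, since the text says only that the nominal technique is applied.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy corpus of binary relations as (nominal, predicate) pairs.
relations = [
    ("dengen", "ireru"), ("dengen", "tonyu suru"), ("dengen", "kiru"),
    ("dengen suitchi", "ireru"), ("dengen suitchi", "tonyu suru"),
    ("dengen suitchi", "kiru"),
    ("daigaku", "sotsugyo suru"), ("kaisha", "shushoku suru"),
]


def build_profiles(relations):
    """Collect co-occurrence frequency distributions for nominals and predicates."""
    by_nominal = defaultdict(Counter)    # nominal -> predicate frequencies
    by_predicate = defaultdict(Counter)  # predicate -> nominal frequencies (assumed)
    for nominal, predicate in relations:
        by_nominal[nominal][predicate] += 1
        by_predicate[predicate][nominal] += 1
    return by_nominal, by_predicate


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (an assumed measure)."""
    dot = sum(a[key] * b[key] for key in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def is_synonymous(rel_a, rel_b, by_nominal, by_predicate, threshold=0.5):
    """Return True only if both the nominal pair and the predicate pair are similar."""
    (nominal_a, pred_a), (nominal_b, pred_b) = rel_a, rel_b
    empty = Counter()
    nominal_ok = cosine(by_nominal.get(nominal_a, empty),
                        by_nominal.get(nominal_b, empty)) >= threshold
    predicate_ok = cosine(by_predicate.get(pred_a, empty),
                          by_predicate.get(pred_b, empty)) >= threshold
    return nominal_ok and predicate_ok


by_nominal, by_predicate = build_profiles(relations)
print(is_synonymous(("dengen", "ireru"), ("dengen suitchi", "tonyu suru"),
                    by_nominal, by_predicate))
print(is_synonymous(("dengen", "ireru"), ("daigaku", "sotsugyo suru"),
                    by_nominal, by_predicate))
```

Because each pair is judged in isolation, the decision for a binary relation in this sketch depends only on the overall distributions of its nominal and its predicate, not on how the two behave together.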