In natural language processing, contextual analysis such as anaphora resolution, coreference resolution, and dialog processing is an important task for correctly understanding a document. It is known that procedural knowledge, such as Schank's notion of scripts and Fillmore's notion of frames, is effective in contextual analysis. Procedural knowledge is knowledge about which procedure follows a given series of procedures. A script model is a model that reproduces such procedural knowledge on a computer.
Conventionally, a technique has been developed in which a sequence of pairs of a predicate and a case that are associated with each other (hereinafter, such a pair is called an “event slot”) is acquired from an arbitrary group of documents, case example data is produced from the event slot sequence, and a script model is constructed by machine learning using the case example data as training data.
The event slot sequence is composed of event slots. Each event slot is a combination of a predicate having a shared argument and the type of case of that shared argument. In the event slot sequence, the event slots are arranged in the order in which their predicates appear. The event slots, which are the elements of the event slot sequence, come in a great many types. Constructing a highly accurate script model through adequate learning therefore requires a correspondingly large amount of learning data, and acquiring a large amount of highly reliable learning data is very costly. If too little learning data is collected, the resulting shortage may leave the constructed script model with low accuracy.
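For illustration only, the event slot sequence and the case example data described above might be sketched as follows. The `PredicateMention` input format, the function names, the toy predicates, and the framing of a case example as a (preceding event slots, next event slot) pair are all assumptions made for this sketch and are not taken from the conventional technique itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PredicateMention:
    """One predicate occurrence that takes the shared argument (hypothetical input format)."""
    position: int   # order of appearance of the predicate in the document
    predicate: str  # predicate lemma, e.g. "arrest"
    case: str       # type of case filled by the shared argument, e.g. "obj"


# An event slot: (predicate, type of case of the shared argument).
EventSlot = Tuple[str, str]
CaseExample = Tuple[Tuple[EventSlot, ...], EventSlot]


def build_event_slot_sequence(mentions: List[PredicateMention]) -> List[EventSlot]:
    """Arrange event slots in the order in which their predicates appear."""
    ordered = sorted(mentions, key=lambda m: m.position)
    return [(m.predicate, m.case) for m in ordered]


def make_case_examples(sequence: List[EventSlot]) -> List[CaseExample]:
    """Pair each non-empty prefix of the sequence with the event slot that follows it.

    This (history, next event slot) framing is an assumption for illustration.
    """
    return [(tuple(sequence[:i]), sequence[i]) for i in range(1, len(sequence))]


# Toy document in which "the suspect" is the shared argument of three predicates.
mentions = [
    PredicateMention(position=2, predicate="charge", case="obj"),
    PredicateMention(position=0, predicate="flee", case="subj"),
    PredicateMention(position=1, predicate="arrest", case="obj"),
]

sequence = build_event_slot_sequence(mentions)
examples = make_case_examples(sequence)
```

In this sketch every distinct (predicate, case type) pair is a separate event slot, so the number of possible event slots grows with both the predicate vocabulary and the number of case types. This is one way to see why a large amount of learning data is needed.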