With the widespread use of Internet search engines, language processing techniques have advanced to the point where fairly high-quality services are available for searching “written information.” In contrast, techniques for inferring “unwritten but useful information” as hypotheses are still insufficient. This type of technique may be described as a “technique to enable a system to consider,” and it is expected to be a core technique for information services of the next generation. Though such techniques have long been studied as a kind of “artificial intelligence” in both the private and public sectors, a practically or commercially usable level has yet to be achieved.
Sentences and phrases are the basic elements of language processing techniques. (In the present specification, the language to be processed is Japanese, and a “phrase” refers to a noun and a predicate connected by a postpositional particle. When a language other than Japanese is to be processed, a unit equivalent to the Japanese “phrase” is processed.) A sentence or phrase describes an event or action in natural language. Causality can sometimes be found between sentences or phrases. For example, what is expressed by the phrase “tabako wo suu (smoke a cigarette)” is found to be a cause of what is expressed by the phrase “hai-gan ni kakaru (suffer from lung cancer).” A semantically contradictory relation can also be found between sentences or phrases, such as the relation between “seihin wo siyou suru (use a product)” and “seihin wo haki suru (scrap a product).”
Such relations between sentences or phrases can readily be recognized by humans. Recognizing them by automated language processing, however, is a challenge. If information services related to language, information analysis, and language processing are to be advanced and well integrated with higher functions such as inference, highly accurate recognition of the above-described relations between sentences or phrases is considered essential. To date, however, no technique has succeeded in finding the above-described relations with high accuracy across the full spectrum of language expressions.
Non-Patent Literatures 1 to 7 listed below describe prior art related to elements and components of such a technique.
(A) A Scheme for Automatically Recognizing Causality Between Phrases
Non-Patent Literature 1 describes a technique for obtaining unknown causality through machine learning from a huge number of manually prepared causality examples. For Japanese, examples include automatic recognition of relations between phrases using, as clues, occurrences in texts of conjunctions such as “tame (since)” and “node (hence)” that explicitly express causality (Non-Patent Literature 2).
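The idea of using explicit conjunctions as clues can be illustrated with a minimal sketch. It is not the method of Non-Patent Literature 2. It assumes romanized, whitespace-separated input and a hypothetical two-item cue list, and it simply splits a sentence around the first cue found, treating the part before the cue as the cause and the part after it as the effect. The function name `extract_causal_pairs` and the sample sentence are constructed for illustration only.

```python
import re

# Hypothetical cue list: the two romanized connectives mentioned in the text.
CAUSAL_CUES = ("tame", "node")


def extract_causal_pairs(sentence):
    """Return (cause, effect) candidates split around an explicit causal cue.

    Assumption: the cue appears as a standalone token, with the cause
    before it and the effect after it.
    """
    pairs = []
    for cue in CAUSAL_CUES:
        pattern = rf"^(.+?)\s+{re.escape(cue)}\s+(.+)$"
        match = re.match(pattern, sentence.strip())
        if match:
            pairs.append((match.group(1).strip(), match.group(2).strip()))
    return pairs


# Constructed example sentence built from the phrases used in the text.
candidates = extract_causal_pairs("tabako wo suu node hai-gan ni kakaru")
```

A surface split of this kind only yields candidates. Deciding whether a candidate pair actually expresses causality would still require a learned model or further filtering.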
(B) A Scheme for Automatically Recognizing Contradiction Between Phrases
A scheme using a manually prepared dictionary such as WordNet is available (Non-Patent Literature 3).
(C) A Scheme for Classifying Verbs
Researchers have been trying to classify verbs based on whether a verb in a noun-verb combination describes an event that enhances or stimulates a function, effect, or the like of the object indicated by the noun, or to automatically acquire verbs having such a nature (for example, Non-Patent Literatures 4 and 5).
(D) A Scheme for Generating Hypothesis Based on Language
A technique of generating a hypothesis in relation to a specific semantic relation, e.g. causality, between words has been known (Non-Patent Literature 6). By way of example, if a causal connection between “cholesterol” and “arterial sclerosis” and a causal connection between “arterial sclerosis” and “cerebral infarction” are recorded in a database, these causal connections are combined and a new hypothesis that “cholesterol” is a cause of “cerebral infarction” is inferred.
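The combination of two recorded causal connections into a new hypothesis can be sketched as follows. This is a minimal illustration, assuming the database is simply a set of (cause, effect) word pairs. The function name `generate_hypotheses` is hypothetical and is not taken from Non-Patent Literature 6.

```python
def generate_hypotheses(known):
    """Chain recorded (cause, effect) pairs into new candidate pairs.

    For every recorded a -> b and b -> c, propose a -> c unless it is
    already recorded or would relate a word to itself.
    """
    effects_of = {}
    for cause, effect in known:
        effects_of.setdefault(cause, set()).add(effect)

    hypotheses = set()
    for cause, effect in known:
        for downstream in effects_of.get(effect, ()):
            if downstream != cause and (cause, downstream) not in known:
                hypotheses.add((cause, downstream))
    return hypotheses


# The two causal connections given as the example in the text.
database = {
    ("cholesterol", "arterial sclerosis"),
    ("arterial sclerosis", "cerebral infarction"),
}
new_hypotheses = generate_hypotheses(database)
```

With the two example connections, the only proposed pair is (“cholesterol”, “cerebral infarction”). The sketch performs a single chaining step; longer chains would require applying it repeatedly.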
(E) A Scheme for Automatically Recognizing Synonyms and Entailments Between Phrases
Conventionally, for recognizing synonymity or entailment between words such as verbs, or between patterns such as “A causes B,” a technique has been known in which the probability distribution of words occurring near a word of interest, or occurring in the slots occupied by variables such as A and B in a pattern, is calculated, and the statistical similarity among such distributions (referred to as “distributional similarity”) is utilized (Non-Patent Literature 7). For example, the pattern “A causes B” is recognized as substantially synonymous with “A is a cause of B.” According to this technique, such synonymity is acquired by finding the frequencies of occurrence of nouns such as “dioxin” and “cancer” that appear in the slots A and B, and then utilizing the similarity of their occurrence probability distributions.
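Distributional similarity between patterns can be sketched as follows. The sketch assumes that each pattern is represented by counts of the (A, B) noun pairs observed in its slots, and it uses cosine similarity as one possible similarity measure. The text does not state which measure Non-Patent Literature 7 uses, and all counts below are invented for illustration.

```python
from collections import Counter
from math import sqrt


def to_distribution(counts):
    """Normalize raw slot-filler counts into occurrence probabilities."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse probability distributions."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Invented counts of (A, B) noun pairs seen in each pattern's slots.
slot_counts = {
    "A causes B": Counter({("dioxin", "cancer"): 3, ("smoking", "lung cancer"): 2}),
    "A is a cause of B": Counter({("dioxin", "cancer"): 2, ("smoking", "lung cancer"): 1}),
    "A prevents B": Counter({("vaccine", "influenza"): 4}),
}
dists = {pattern: to_distribution(c) for pattern, c in slot_counts.items()}

sim_synonymous = cosine_similarity(dists["A causes B"], dists["A is a cause of B"])
sim_unrelated = cosine_similarity(dists["A causes B"], dists["A prevents B"])
```

In this toy data the two patterns that share slot fillers score close to 1, while the pattern with disjoint slot fillers scores 0, which mirrors how “A causes B” and “A is a cause of B” would be judged substantially synonymous.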