In natural language text, anaphoric phenomena involving (zero-)anaphors frequently occur. Consider example text 30 in FIG. 1. Example text 30 consists of first and second sentences. The second sentence includes a referring expression (pronoun) 42 ("it"). Here, the referring expression refers back to the expression 40 ("date of new year in MON calendar") in the first sentence. Such a process of identifying the word to which a referring expression refers back is called "anaphora resolution." On the other hand, consider another example, text 60 in FIG. 2. Example text 60 also consists of first and second sentences. In the second sentence, the subject of the verb phrase ("have self-diagnosis function") is omitted. Here, the omitted-subject portion 76 corresponds to the words 72 ("new exchangers") in the first sentence. Likewise, the subject of the verb phrase ("plan to install 200 systems") is omitted. The omitted portion 74 corresponds to the words 70 ("Company N") in the first sentence. Such a process of detecting zero anaphors and the like and identifying their antecedents is called "zero anaphora resolution." In the following, anaphora resolution and zero anaphora resolution will be collectively referred to as "(zero-)anaphora resolution."
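The two phenomena above can be represented with a common data structure: an anaphoric link pairs an anaphor (an overt pronoun, or an omitted argument with no surface form) with its antecedent. The following is a minimal illustrative sketch; the `Mention` and `AnaphoricLink` classes are hypothetical and not part of any technique described in this specification.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    """A span of text that refers to some entity."""
    text: str      # surface form; empty string for a zero anaphor
    sentence: int  # 0-based sentence index


@dataclass
class AnaphoricLink:
    anaphor: Mention     # pronoun, or an omitted (zero) argument
    antecedent: Mention  # expression the anaphor refers back to

    @property
    def is_zero(self) -> bool:
        # A zero anaphor has no surface form in the text.
        return self.anaphor.text == ""


# Overt anaphora, as in FIG. 1: "it" refers back to an expression
# in the first sentence.
overt = AnaphoricLink(
    anaphor=Mention("it", sentence=1),
    antecedent=Mention("date of new year", sentence=0),
)

# Zero anaphora, as in FIG. 2: the omitted subject of
# "have self-diagnosis function" corresponds to "new exchangers".
zero = AnaphoricLink(
    anaphor=Mention("", sentence=1),
    antecedent=Mention("new exchangers", sentence=0),
)

print(overt.is_zero)  # False
print(zero.is_zero)   # True
```

Under this representation, anaphora resolution and zero anaphora resolution differ only in whether the anaphor has a surface form, which is why the two tasks can be treated uniformly as "(zero-)anaphora resolution."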
In the field of so-called artificial intelligence, natural language processing is indispensable for realizing communication with humans. Machine translation and question answering are major problems in natural language processing, and (zero-)anaphora resolution is an essential technology for both. (Zero-)anaphora resolution, however, has not yet reached a technical level sufficient for practical use, though the degree of development varies among languages.
There are various reasons why it is difficult to improve the performance of (zero-)anaphora resolution. One of the main reasons is that general knowledge is necessary for such resolution. To introduce general knowledge, however, it is necessary to take into account human judgments regarding anaphora resolution, and a resolution algorithm reflecting such factors is difficult to design by hand. Eventually, it becomes necessary to prepare a large number of human judgments as training data, and to build a resolver that performs (zero-)anaphora resolution through statistical learning. It has been known, however, that the cost of preparing training data for such a resolver is prohibitively high. This leads to an insufficient amount of training data and, in turn, to insufficient performance of (zero-)anaphora resolution.
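A resolver trained through statistical learning from human judgments can be sketched as follows. The sketch casts each judgment as a choice among antecedent candidates and trains a simple perceptron-style scorer; the feature names and data representation are illustrative assumptions, not the method of any cited literature.

```python
def features(anaphor, candidate):
    """Hypothetical features for an (anaphor, antecedent-candidate) pair."""
    return {
        "same_sentence": float(anaphor["sent"] == candidate["sent"]),
        "sentence_distance": float(anaphor["sent"] - candidate["sent"]),
        "candidate_is_subject": float(candidate["role"] == "subject"),
    }


def score(weights, feats):
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def train(examples, epochs=10, lr=0.1):
    """Perceptron-style training from human-annotated judgments."""
    weights = {}
    for _ in range(epochs):
        for anaphor, candidates, gold in examples:
            # Predict the highest-scoring candidate under current weights.
            predicted = max(candidates,
                            key=lambda c: score(weights, features(anaphor, c)))
            if predicted is not gold:
                # Move weights toward the human-annotated antecedent
                # and away from the wrongly predicted one.
                for name, value in features(anaphor, gold).items():
                    weights[name] = weights.get(name, 0.0) + lr * value
                for name, value in features(anaphor, predicted).items():
                    weights[name] = weights.get(name, 0.0) - lr * value
    return weights


# One toy judgment: the anaphor in sentence 1 refers to the subject
# of sentence 0, not to a noun within its own sentence.
anaphor = {"sent": 1}
distractor = {"sent": 1, "role": "object"}
gold = {"sent": 0, "role": "subject"}
weights = train([(anaphor, [distractor, gold], gold)])

best = max([distractor, gold], key=lambda c: score(weights, features(anaphor, c)))
print(best is gold)  # True
```

The sketch makes the cost problem concrete: every training example is one human judgment, so performance is bounded by how many such judgments can be collected.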
In order to enhance the performance of natural language processing and to make artificial intelligence more intelligent, it is necessary to solve this problem related to (zero-)anaphora resolution.
Non-Patent Literature 1 listed below describes a technique of creating training data for (zero-)anaphora resolvers. According to this technique, the training data is created in the following manner. A human reads the text from the beginning and manually detects pronouns and zero anaphors. Machine assistance in this process is limited to, for example, listing antecedent candidates in advance.
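The machine-assistance step mentioned above, listing antecedent candidates in advance, can be sketched as a simple filter over the mentions preceding an anaphor. The mention representation and the sentence-window heuristic below are assumptions for illustration, not the procedure of Non-Patent Literature 1.

```python
def list_antecedent_candidates(mentions, anaphor, window=2):
    """List noun phrases in the anaphor's sentence or the preceding
    `window` sentences as antecedent candidates.

    Each mention is a dict like {"text": ..., "sent": ..., "pos": ...};
    this representation is a hypothetical one chosen for the sketch.
    """
    return [m for m in mentions
            if m["pos"] == "NP"
            and m["sent"] <= anaphor["sent"]
            and anaphor["sent"] - m["sent"] <= window]


mentions = [
    {"text": "Company N", "sent": 0, "pos": "NP"},
    {"text": "new exchangers", "sent": 0, "pos": "NP"},
    {"text": "install", "sent": 1, "pos": "VP"},
]
anaphor = {"sent": 1}
candidates = list_antecedent_candidates(mentions, anaphor)
print([m["text"] for m in candidates])  # ['Company N', 'new exchangers']
```

Even with such assistance, the human annotator still performs the hard parts: detecting the (zero-)anaphors and selecting the correct antecedent from the listed candidates.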
Non-Patent Literature 2 listed below discloses a (zero-)anaphora resolution algorithm that automatically performs (zero-)anaphora resolution in accordance with a predetermined procedure. The technique disclosed in this article utilizes information output from morphological analysis, syntactic/dependency parsing (hereinafter simply referred to as "dependency parsing") and the like, as well as external resources such as dictionaries. The dictionary used here is a collection of selectional restrictions, such as "objects of the verb 'eat' are 'foods.'" The technique disclosed in Non-Patent Literature 2 additionally uses pieces of information obtained from the text to identify the antecedent of a given (zero-)anaphor.
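A selectional-restriction dictionary of the kind described above can be used to prune antecedent candidates: a candidate is kept only if it satisfies the semantic class the verb requires for that argument position. The dictionary entries and semantic classes below are illustrative assumptions, not the contents of the dictionary used in Non-Patent Literature 2.

```python
# Hypothetical selectional-restriction dictionary:
# (verb, argument role) -> required semantic class.
SELECTIONAL_RESTRICTIONS = {
    ("eat", "object"): "food",
}

# Hypothetical mapping from words to semantic classes.
SEMANTIC_CLASS = {
    "apple": "food",
    "company": "organization",
}


def satisfies(verb, role, candidate):
    """True if the candidate meets the verb's restriction for this role
    (or if no restriction is registered for the pair)."""
    required = SELECTIONAL_RESTRICTIONS.get((verb, role))
    return required is None or SEMANTIC_CLASS.get(candidate) == required


def filter_candidates(verb, role, candidates):
    """Keep only antecedent candidates allowed by the restrictions."""
    return [c for c in candidates if satisfies(verb, role, c)]


kept = filter_candidates("eat", "object", ["apple", "company"])
print(kept)  # ['apple']
```

In this way, restrictions such as "objects of 'eat' are 'foods'" rule out semantically implausible antecedents before any further scoring is applied.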