To determine the kind of a given phrase (for example, whether it is a person's name or a place name), one can check whether the phrase is included in any of several dictionaries prepared for the individual kinds (for example, a person's name dictionary or a place name dictionary). For example, if the phrase is included in the person's name dictionary, the kind of the phrase is known to be a person's name. Determining the kind of a phrase in this way therefore requires a dictionary for each kind.
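The lookup described above can be sketched as follows. This is a minimal illustration, not a method taken from any cited document: the dictionary contents and the names `DICTIONARIES` and `kinds_of` are assumptions made for the example.

```python
# Hypothetical dictionaries: each one is a list (here, a set) of phrases
# of a single kind. The entries are made up for illustration.
DICTIONARIES = {
    "person name": {"John Smith", "Hanako Yamada"},
    "place name": {"Tokyo", "New York"},
}


def kinds_of(phrase, dictionaries=DICTIONARIES):
    """Return, in sorted order, every kind whose dictionary contains the phrase."""
    return sorted(
        kind for kind, entries in dictionaries.items() if phrase in entries
    )


if __name__ == "__main__":
    print(kinds_of("Tokyo"))   # the phrase is found in the place name dictionary
    print(kinds_of("Osaka"))   # the phrase is in no dictionary, so no kind is known
```

A phrase that is absent from every dictionary yields an empty result, which illustrates why the kind cannot be determined without a dictionary for that kind.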
Here, a phrase means a single word or a unit composed of a plurality of words. Examples of a unit composed of a plurality of words include a multi-word phrase, a proverb, and an idiomatic phrase. A proper noun, such as a person's name or a place name, is also included in the concept of the phrase. A dictionary is assumed to be a list of phrases of the same kind.
One method for creating such a dictionary is for a person to read a large number of documents of various types, classify a large number of phrases according to their kinds, and register the phrases in the dictionary. This method can produce a highly reliable dictionary in which phrases of the same kind are collected. However, since the work is done manually, it places a heavy burden on the person who creates the dictionary.
In addition, a dictionary creation method is disclosed in Non-patent Document 1. In this method, patterns are automatically created from a group of documents of the same format, the words located between the patterns are extracted, and the extracted words are registered in the dictionary. Here, "documents of the same format" are documents in which the phrases to be extracted appear adjacent to the same pattern. A "pattern" is a character string that distinguishes phrases to be included in the dictionary (target phrases) from phrases that are not to be included in the dictionary. The patterns include a pattern located immediately before the phrase to be extracted (hereinafter referred to as a forward pattern) and a pattern located immediately after the phrase to be extracted (hereinafter referred to as a backward pattern).
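The role of the forward and backward patterns can be illustrated with a short sketch. It assumes that both patterns are already known and are plain character strings; the function name `extract_between` and the sample document are hypothetical, and the code is not the procedure of Non-patent Document 1.

```python
import re


def extract_between(document, forward, backward):
    """Return every string located between a forward and a backward pattern.

    Both patterns are treated as literal character strings, so they are
    escaped before being placed in the regular expression. The non-greedy
    group keeps each match as short as possible.
    """
    regex = re.escape(forward) + r"(.+?)" + re.escape(backward)
    return re.findall(regex, document, flags=re.DOTALL)


if __name__ == "__main__":
    # Hypothetical document in which company names sit in table cells.
    doc = "<tr><td>Acme Corp</td></tr><tr><td>Globex Inc</td></tr>"
    print(extract_between(doc, "<td>", "</td>"))
```

In this sketch, `<td>` plays the role of the forward pattern and `</td>` the role of the backward pattern.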
An example in which a dictionary of company names is created using the dictionary creation method disclosed in Non-patent Document 1 will now be described. First, a person collects a group of documents of the same format in which company names are listed in the form of a table. Next, the person selects several documents from the document group and creates a list of the company names included in the selected documents. Next, an information processing device, operating in accordance with a program, automatically specifies a forward pattern and a backward pattern of the company names that appear in the selected documents, and extracts the words (in this example, company names) located between the forward pattern and the backward pattern. Finally, the extracted words are registered in the dictionary. Thus, in the method disclosed in Non-patent Document 1, the person inputs to the information processing device the documents selected as samples and a list of all the target words appearing in those documents, and the information processing device then creates the dictionary automatically.
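The procedure above can be sketched end to end under strong simplifying assumptions. The sketch below is not the algorithm of Non-patent Document 1: it simply takes the longest character string shared by the text before every listed phrase as the forward pattern, and the longest string shared by the text after every listed phrase as the backward pattern. All function names and sample documents are hypothetical.

```python
import os
import re


def common_prefix(strings):
    """Longest string with which every input string starts."""
    return os.path.commonprefix(list(strings))


def common_suffix(strings):
    """Longest string with which every input string ends."""
    reversed_strings = [s[::-1] for s in strings]
    return os.path.commonprefix(reversed_strings)[::-1]


def induce_patterns(samples):
    """Derive (forward, backward) from sample documents and their phrase lists.

    samples: list of (document, phrases) pairs, where phrases is the
    person-made list of target phrases appearing in that document.
    """
    befores, afters = [], []
    for document, phrases in samples:
        for phrase in phrases:
            start = document.find(phrase)  # first occurrence only
            if start < 0:
                continue  # listed phrase not found in this document
            befores.append(document[:start])
            afters.append(document[start + len(phrase):])
    if not befores:
        raise ValueError("no listed phrase was found in the sample documents")
    forward, backward = common_suffix(befores), common_prefix(afters)
    if not forward or not backward:
        raise ValueError("the samples share no common surrounding pattern")
    return forward, backward


def build_dictionary(samples, documents):
    """Extract the phrases located between the induced patterns."""
    forward, backward = induce_patterns(samples)
    regex = re.escape(forward) + r"(.+?)" + re.escape(backward)
    dictionary = []
    for document in documents:
        for phrase in re.findall(regex, document, flags=re.DOTALL):
            if phrase not in dictionary:
                dictionary.append(phrase)
    return dictionary


if __name__ == "__main__":
    sample = ("<li><b>Acme Corp</b></li><li><b>Globex Inc</b></li>",
              ["Acme Corp", "Globex Inc"])
    others = ["<li><b>Initech</b></li><li><b>Umbrella</b></li>"]
    print(induce_patterns([sample]))
    print(build_dictionary([sample], others))
```

The sketch only works when the sample documents and the remaining documents really do share the same format, which corresponds to the premise stated above that the document group has the same format.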
In addition, Patent Document 1 discloses a method in which, for two words a and b, a score function is defined on the character strings xay and xby obtained by attaching character strings x and y before and after each of the words, and the relevance between the two words is determined.
[Non-patent Document 1] Nicholas Kushmerick, "Wrapper induction: Efficiency and expressiveness", Artificial Intelligence 118 (2000), 2008, p. 15 to 68
[Patent Document 1] JP-A-2003-256447 (paragraphs 0029 to 0032)