There are devices that recognize input voice using text included in a file or a web page. One example is a device that calculates a similarity between a voice signal representing the input voice and either a word included in the text or a character string in which words are connected. When the calculated similarity exceeds a threshold value, the device determines that the word or character string corresponds to the voice signal.
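The threshold-based matching described above can be illustrated with a minimal sketch. It assumes the voice signal has already been reduced to a rough text hypothesis, and it uses Python's `difflib.SequenceMatcher` string similarity as a stand-in for an acoustic similarity measure. The function name `match_candidates` and the threshold of 0.8 are hypothetical and do not come from the related art.

```python
from difflib import SequenceMatcher


def match_candidates(observed, candidates, threshold=0.8):
    """Return (candidate, score) pairs whose similarity exceeds the threshold.

    `observed` is a text stand-in for the voice signal (an assumption);
    a real device would compare acoustic features instead.
    """
    matches = []
    for candidate in candidates:
        score = SequenceMatcher(None, observed, candidate).ratio()
        if score > threshold:
            matches.append((candidate, score))
    # Best match first.
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Candidate words and character strings taken from a hypothetical text.
candidates = ["weather forecast", "stock prices"]
for candidate, score in match_candidates("weather forcast", candidates):
    print(candidate, round(score, 3))
```

Only candidates scoring above the threshold are returned, so an input that resembles nothing in the text yields an empty result rather than a forced match.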
Another example of a device that recognizes input voice using text is a device that generates all connection patterns of the words included in the text and registers the generated connection patterns in a dictionary used for voice recognition, thereby generating the dictionary. The device that generates a dictionary recognizes the voice by comparing the connection patterns registered in the dictionary with the voice signal representing the input voice. When the number of words included in the text is n, the device that generates a dictionary generates as many connection patterns as the sum of the integers from 1 to n, that is, n(n+1)/2.
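The pattern count can be checked with a short sketch. It assumes that a "connection pattern" is a run of consecutive words from the text, which is one reading consistent with the stated count. The function name `connection_patterns` and the sample words are hypothetical.

```python
def connection_patterns(words):
    """List every run of consecutive words, joined into one string.

    Assumption: a connection pattern is a contiguous run of words.
    For n words there are n runs starting at the first word, n - 1 at the
    second, and so on, giving 1 + 2 + ... + n patterns in total.
    """
    n = len(words)
    return [" ".join(words[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


words = ["check", "tomorrow", "weather", "forecast"]
patterns = connection_patterns(words)
print(len(patterns))  # 10, which is 1 + 2 + 3 + 4
```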
Yet another example of a device that recognizes input voice using text is a device that re-trains an N-gram language model. For a language model trained from a corpus, the device that re-trains a language model increases the probability of word strings formed by connecting words in the text. To do so, the device generates as many patterns as the N-th power of the number of words present in the text and uses the generated patterns to increase the probability of the connected words in the language model.
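The growth in the number of patterns can be sketched as follows. The sketch assumes that the patterns are all ordered N-word combinations drawn from the distinct words of the text, and it only enumerates and counts them. It does not model how the probabilities are increased. The names `ngram_patterns` and `pattern_count` are hypothetical.

```python
from itertools import product


def ngram_patterns(words, n_order):
    """Enumerate every ordered N-word combination of the distinct words.

    Assumption: "number of words present in the text" means distinct words.
    """
    vocabulary = sorted(set(words))
    return list(product(vocabulary, repeat=n_order))


def pattern_count(num_words, n_order):
    """Number of patterns without enumerating them: num_words ** n_order."""
    return num_words ** n_order


words = ["check", "tomorrow", "weather"]
print(len(ngram_patterns(words, 2)))  # 9, which is 3 ** 2
print(pattern_count(100, 3))          # 1000000 patterns for 100 words, N = 3
```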
Patent Document 1: Japanese Laid-open Patent Publication No. 2002-41081
Patent Document 2: Japanese Laid-open Patent Publication No. 2002-342323
However, the devices according to the related art do not recognize voice with high precision. For example, when a voice that is not registered in the dictionary as a connection pattern is input to the above-mentioned device that generates a dictionary, the precision of the recognition result is low. This is because the connection patterns registered in the dictionary include combinations of words that are adjacent in the text, but do not include combinations of words that appear in the text without being adjacent to each other.
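The gap described above can be illustrated with a small check. It assumes the dictionary covers exactly the runs of adjacent words in the text. The function name `is_registered` and the sample words are hypothetical.

```python
def is_registered(query, words):
    """Return True if `query` occurs in `words` as a run of adjacent words.

    Assumption: the dictionary contains exactly such runs, so this check
    mirrors a dictionary lookup.
    """
    k = len(query)
    if k == 0:
        return False
    return any(words[i:i + k] == query for i in range(len(words) - k + 1))


text_words = ["check", "tomorrow", "weather", "forecast"]
print(is_registered(["tomorrow", "weather"], text_words))  # True: adjacent
print(is_registered(["check", "forecast"], text_words))    # False: not adjacent
```

Both queries use only words from the text, yet the second has no matching connection pattern, which is the situation in which recognition precision drops.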
Further, in the above-mentioned device that re-trains the language model, the number of generated patterns equals the N-th power of the number of words present in the text, so the amount of pattern information to be generated is large.