A speech recognition dictionary compilation assisting system of the related art will be described below. As shown in FIG. 6, the speech recognition dictionary compilation assisting system comprises text analysis means 201, frequency of occurrence counting means 202, updating means 203, background dictionary storing means 204, speech recognition dictionary storing means 205, and language model storing means 206.
The speech recognition dictionary compilation assisting system having such a constitution as described above operates in the following manner.
The text analysis means 201 receives, from the outside, text data containing the vocabulary that is subject to speech recognition, and applies morphological analysis to the text data using a dictionary stored in the background dictionary storing means 204. It thereby divides the text data into a sequence of words, assigns a pronunciation character string to each word, attaches a tag indicating the part of speech as required, and sends the resulting data to the frequency of occurrence counting means 202. The frequency of occurrence counting means 202 receives the sequence of words from the text analysis means 201, counts the number of times each word appears, and sends the result to the updating means 203. The updating means 203 calculates the occurrence probability of each word from the frequency of occurrence received from the frequency of occurrence counting means 202, compares the calculated probability with the occurrence probability of the word stored in the language model storing means 206, and corrects the stored occurrence probability so that it approaches the value calculated from the text data. For those words in the text data whose occurrence probabilities are higher than a certain level, the updating means 203 also checks whether they are already entered in the speech recognition dictionary stored in the speech recognition dictionary storing means 205. Words that have not been entered are regarded as unknown words, and these words and their occurrence probabilities are stored in the speech recognition dictionary storing means 205 and in the language model storing means 206.
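The counting and updating steps described above can be illustrated with a minimal sketch. It assumes that the text has already been divided into words, and it uses simple linear interpolation as one possible way of bringing a stored probability closer to the calculated one; the description does not specify the correction rule. The function name `update_language_model` and the parameters `rate` and `threshold` are hypothetical, and pronunciation and part-of-speech data are omitted.

```python
from collections import Counter

def update_language_model(words, lm_probs, dictionary, rate=0.5, threshold=0.01):
    """Sketch of the counting and updating steps (names and rule are assumptions).

    words      -- word sequence produced by morphological analysis
    lm_probs   -- dict mapping word -> stored occurrence probability
    dictionary -- set of words already in the speech recognition dictionary
    Returns the list of words newly registered as unknown words.
    """
    counts = Counter(words)            # frequency of occurrence of each word
    total = sum(counts.values())
    added = []
    for word, count in counts.items():
        observed = count / total       # probability calculated from the text
        if word in lm_probs:
            # Assumed correction rule: move the stored value part of the way
            # toward the observed value. No renormalization is performed here.
            lm_probs[word] += rate * (observed - lm_probs[word])
        if observed > threshold and word not in dictionary:
            # A sufficiently frequent word missing from the dictionary is
            # treated as an unknown word and stored with its probability.
            dictionary.add(word)
            lm_probs[word] = observed
            added.append(word)
    return added
```

With `rate=1.0` the stored probability is simply replaced by the calculated one; smaller values make the stored model change more gradually.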
It is usual practice for the frequency of occurrence counting means 202 to count the frequency with which strings of two or three consecutive words appear, in addition to counting the frequency of individual words. It is also usual practice to provide the updating means 203 or the like with an interface for correcting the boundaries between words and for manually inputting the pronunciation, for cases in which words are divided incorrectly or a wrong pronunciation is assigned to a word during the morphological analysis by the text analysis means 201 (refer to Patent Document 1).
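Counting strings of two or three consecutive words alongside individual words can be sketched as follows. This is only an illustration under the assumption that the input is an already segmented word list; the function name `count_ngrams` and the parameter `max_n` are hypothetical.

```python
from collections import Counter

def count_ngrams(words, max_n=3):
    """Count single words and strings of up to max_n consecutive words.

    Returns a dict mapping n -> Counter of n-word tuples.
    """
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        # Slide a window of n words across the sequence.
        for i in range(len(words) - n + 1):
            counts[n][tuple(words[i:i + n])] += 1
    return counts
```

Keys are tuples of words, so `counts[2][("speech", "recognition")]` would give the number of times that two-word string appears in the input.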
Another example of the speech recognition dictionary compilation assisting system of the related art is described in Patent Document 1. FIG. 7 shows the system of Patent Document 1, redrawn to allow comparison with FIG. 6. The speech recognition dictionary compilation assisting system comprises character string comparison means 301, unknown word extracting means 302, updating means 303, speech recognition dictionary storing means 305, and language model storing means 306, and is characterized in that it detects unknown words by using the results of correcting recognition errors rather than by applying a statistical technique.
The speech recognition dictionary compilation assisting system having such a constitution as described above operates in the following manner.
That is, the character string comparison means 301 receives two inputs from the outside. The first is recognition result text data, obtained by recognizing the speech to be recognized with speech recognition means (not shown) whose constituent elements include the speech recognition dictionary stored in the speech recognition dictionary storing means 305 and the language model stored in the language model storing means 306. The second is corrected text data, obtained by manually correcting the recognition errors included in the recognition result text data. At each position where the two differ, that is, at each position where a recognition error has occurred, the character string comparison means 301 extracts a word or a string of words in a form that includes the recognition error, and sends the result of extraction to the unknown word extracting means 302. The unknown word extracting means 302 checks each of the words or strings of words received from the character string comparison means 301 to see whether it is included in the speech recognition dictionary stored in the speech recognition dictionary storing means 305 and, if not, enters the word or string of words as a new word in the speech recognition dictionary storing means 305. The unknown word extracting means 302 also stores the new word and its occurrence probability in the language model storing means 306.
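The comparison and extraction steps described above can be illustrated with a minimal sketch based on Python's standard `difflib` module. It is not the method of Patent Document 1 itself: it assumes both texts are already divided into words, it takes the corrected-side words at each differing position as the candidates, and it checks each word individually rather than as a string of words. The function name `extract_unknown_words` is hypothetical.

```python
from difflib import SequenceMatcher

def extract_unknown_words(recognized, corrected, dictionary):
    """Return corrected-side words at error positions that are not in the dictionary.

    recognized -- word list output by speech recognition
    corrected  -- word list after manual correction of recognition errors
    dictionary -- set of words in the speech recognition dictionary
    """
    matcher = SequenceMatcher(a=recognized, b=corrected, autojunk=False)
    new_words = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" mark positions where the corrected text
        # contains words that the recognizer did not output.
        if tag in ("replace", "insert"):
            for word in corrected[j1:j2]:
                if word not in dictionary and word not in new_words:
                    new_words.append(word)
    return new_words
```

Words that the corrector merely deleted produce no candidates, because nothing on the corrected side could be missing from the dictionary at those positions.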
Patent Documents 2 to 4 describe other methods for extracting unknown words and entering them in the speech recognition dictionary. Patent Document 2 discloses an unknown word entering apparatus that extracts words by applying morphological analysis to a text file that includes unknown words, and enters each word not found in the speech recognition dictionary after assigning a pronunciation and a part of speech to it by referring to a background dictionary. Patent Documents 3 and 4 disclose unknown word entering apparatuses that have functions to estimate the part of speech and pronunciation of an unknown word and automatically enter the unknown word in the dictionary.
Patent Document 5 discloses a method of counting the frequency of occurrence of words in pages collected from World Wide Web sites, and updating the order of priority used in selecting among words of the same pronunciation entered in the speech recognition dictionary.
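The idea of reordering words of the same pronunciation by their frequency in collected pages can be sketched as follows. This is an illustration only and does not reproduce the method of Patent Document 5; the function name `rank_homophones`, the data layout, and the tie-breaking behavior are all assumptions.

```python
from collections import Counter

def rank_homophones(dictionary, page_words):
    """Reorder each group of same-pronunciation words by observed frequency.

    dictionary -- dict mapping pronunciation -> list of words in priority order
    page_words -- words found in the collected pages
    Returns a new dict with each list sorted by descending frequency.
    """
    counts = Counter(page_words)
    # sorted() is stable, so words with equal counts keep their existing order.
    return {
        pron: sorted(words, key=lambda w: -counts[w])
        for pron, words in dictionary.items()
    }
```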
Patent Document 6 discloses an acoustic model managing server and a language model managing server that send the vocal utterance models (an acoustic model and a language model) used in collation with input speech data to a speech recognition apparatus, the acoustic model managing server and the language model managing server having a function to periodically update the acoustic model and the language model.
Patent Document 7 is also cited as a background technology of the present invention. While Patent Document 7 relates to a speech recognition apparatus, it also describes a method of generating a phoneme string from an unknown word that is not included in a background dictionary (morphological analysis dictionary).
[Patent Document 1] Japanese Patent Kokai Publication No. JP-P2002-229585A
[Patent Document 2] Japanese Patent Kokai Publication No. JP-P2003-316376A
[Patent Document 3] Japanese Patent Kokai Publication No. JP-P2004-265440A
[Patent Document 4] Japanese Patent Kokai Publication No. JP-P2002-014693A
[Patent Document 5] Japanese Patent Kokai Publication No. JP-P2005-099741A
[Patent Document 6] Japanese Patent Kokai Publication No. JP-P2002-091477A
[Patent Document 7] Japanese Patent Kokai Publication No. JP-P2004-294542A