A language model learning method using a conventional technique will be described.
In the conventional language model learning method, the language model is expressed as an N-gram model, as described on pp. 57-62 of Non-Patent Document 1, for example. In the N-gram model, the appearance probability of a word string consisting of N words is approximated by the probability that the N-th word appears after a history consisting of the preceding (N−1) words. Here, a word string may consist of one or more words, or of character strings shorter than a word. The N-gram model can be computed by maximum likelihood estimation when a learning corpus, that is, a large volume of text data, is available.
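The approximation and its maximum likelihood estimate can be written out as follows. This is the standard textbook formulation rather than a formula taken from Non-Patent Document 1; the symbols w_i (the i-th word) and C(·) (the number of times a word string appears in the learning corpus) are notation assumed here for illustration.

```latex
% N-gram approximation: the history is truncated to the preceding N-1 words
P(w_i \mid w_1, \ldots, w_{i-1}) \approx P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})

% Maximum likelihood estimate from corpus counts
P(w_i \mid w_{i-N+1}, \ldots, w_{i-1})
  = \frac{C(w_{i-N+1}, \ldots, w_{i-1}, w_i)}{C(w_{i-N+1}, \ldots, w_{i-1})}
```

For N = 2 and the word string “of the”, the estimate reduces to C(of, the)/C(of).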
FIG. 6 shows the structure of a language model learning system formed with such a conventional technique. As shown in FIG. 6, the conventional language model learning system includes a text data storage device 107, a word string number counting device 105, a language model parameter updating device 301, and a language model storage device 110.
The word string number counting device 105 extracts all word strings consisting of N words from the text data, that is, the learning corpus stored in the text data storage device 107, and counts the number of appearances of each distinct word string. For example, for the word string “of the”, in which the two words “of” and “the” are linked, the word string number counting device 105 counts how many times “of the” appears in the text data.
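The counting step can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the word string number counting device 105: the function name count_ngrams, the whitespace tokenization, and the sample sentence are all assumptions made for the example.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    # Slide a window of width n over the token list; each window is one word string.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Toy learning corpus, tokenized on whitespace (an assumption for illustration).
corpus = "the top of the hill and the end of a road of the town".split()

bigram_counts = count_ngrams(corpus, 2)
print(bigram_counts[("of", "the")])  # number of appearances of "of the": 2
```

A Counter keyed by tuples keeps one entry per distinct word string, which matches the per-type counts described above.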
The language model parameter updating device 301 computes the appearance probability of each word string by dividing the number of appearances of the target word string by the total number of word strings. For example, the appearance probability of the word string “of the” is the value obtained by dividing the number of appearances of “of the” by the total number of two-word chains. In speech recognition, a conditional probability is used in the decoding process. Let “P(the|of)” be the probability that “the” appears after “of”, and let “P(of, the)” be the joint probability that the word string “of the” appears. The conditional probability can then be computed as “P(the|of)=P(of, the)/P(of)” by the definition of conditional probability, which underlies Bayes' theorem. Note that “P(of)” denotes the probability that the word “of” appears.
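The probability computation can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the language model parameter updating device 301: the helper names ngram_counts and relative_frequencies, the whitespace tokenization, and the toy corpus are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def relative_frequencies(counts):
    """Divide each count by the total number of word strings of that length."""
    total = sum(counts.values())
    return {ngram: c / total for ngram, c in counts.items()}


# Toy learning corpus: 14 words, hence 13 two-word chains.
tokens = "the top of the hill and the end of a road of the town".split()

p_unigram = relative_frequencies(ngram_counts(tokens, 1))
p_bigram = relative_frequencies(ngram_counts(tokens, 2))

p_joint = p_bigram[("of", "the")]   # P(of, the) = 2/13
p_of = p_unigram[("of",)]           # P(of) = 3/14
p_the_given_of = p_joint / p_of     # P(the|of) = P(of, the) / P(of)

print(p_joint, p_of, p_the_given_of)
```

Because the corpus contains one fewer two-word chain than single words, this ratio (28/39 here) differs slightly from the direct count ratio C(of, the)/C(of) = 2/3; the gap becomes negligible for a large learning corpus.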
Non-Patent Document 1: “Language and Computation 4: Probabilistic Language Model”, Kenji KITA, University of Tokyo Press, 1999