1. Field of the Invention
The present invention relates to a voice recognition method and a word prediction method for use with voice recognition, and more particularly to a method of predicting words using a structural language model to perform voice recognition.
2. Brief Description of the Prior Art
In voice recognition, a language model that exploits linguistic information to predict words and the like is employed. A typical statistical language model in common use today is the n-gram model. The n-gram model predicts words successively from the beginning to the end of a sentence: the probability of each sequence of n words is calculated (learned) beforehand, and the score (likelihood) of an utterance actually spoken is calculated from those probabilities.
Accordingly, with the n-gram model, to predict a certain word, the n−1 words preceding it are referred to, and the word is predicted statistically. However, the value of n, that is, the reference range, is fixed irrespective of the words referred to.
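The n-gram estimation and prediction described above can be sketched as follows. This is an illustrative minimal implementation only; the function names are hypothetical, and the relative-frequency (maximum-likelihood) estimate is shown without the smoothing a practical recognizer would use.

```python
from collections import defaultdict

def train_ngram(corpus, n=3):
    """Count each n-word sequence and its (n-1)-word context in a corpus.

    corpus is a list of sentences, each a list of word tokens; sentence
    boundaries are padded with <s> and </s> markers.
    """
    counts = defaultdict(int)          # occurrences of (context + word)
    context_counts = defaultdict(int)  # occurrences of the context alone
    for sentence in corpus:
        tokens = ["<s>"] * (n - 1) + sentence + ["</s>"]
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            counts[context + (tokens[i],)] += 1
            context_counts[context] += 1
    return counts, context_counts

def ngram_prob(word, context, counts, context_counts):
    """P(word | n-1 preceding words) as a relative frequency (no smoothing)."""
    if context_counts[context] == 0:
        return 0.0
    return counts[context + (word,)] / context_counts[context]
```

Note that the context length n−1 is the same for every prediction, which is exactly the fixed reference range discussed above.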
In contrast, a variable memory length Markov model has been proposed as a model that makes the reference range variable for a linear history. It is an extension of the n-gram model, whose reference range is fixed.
In the variable memory length Markov model, the reference history is lengthened selectively, only when the prediction precision is expected to improve. For instance, when the word immediately preceding the word to be predicted is "this", the words before "this" are not distinguished, as in a word 2-gram model; when the immediately preceding word is "of", the word before "of" is also distinguished, as in a word 3-gram model. Further, depending on the two immediately preceding words, the three immediately preceding words may be distinguished, as in a 4-gram model.
Generally, when an n-gram model and a variable memory length Markov model requiring the same amount of storage are compared, the variable memory length Markov model has the higher predictive power. Likewise, when an n-gram model and a variable memory length Markov model estimated from the same learning corpus are compared, the variable memory length Markov model has the higher predictive power.
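The selective lengthening of the history can be sketched as a longest-suffix lookup: distributions are stored only for those contexts (of varying length) that improve prediction, and at prediction time the longest stored suffix of the history is used. The data layout and function name below are illustrative assumptions, not part of any cited model's implementation.

```python
def predict_variable(history, suffix_models):
    """Return the next-word distribution for the longest stored suffix.

    history is the list of preceding words; suffix_models maps tuples of
    preceding words (of variable length, including the empty tuple as a
    fallback) to next-word probability distributions.
    """
    for k in range(len(history), -1, -1):
        suffix = tuple(history[-k:]) if k else ()
        if suffix in suffix_models:
            return suffix_models[suffix]
    raise KeyError("no distribution stored, not even for the empty context")
```

With this scheme, a history ending in "this" matches only the one-word context (2-gram behavior), while a history ending in "of" can match a stored two-word context such as ("kind", "of") (3-gram behavior), mirroring the example above.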
Meanwhile, in the processing of spoken language (spoken language understanding), the estimation of syntactic structure is important in addition to the word prediction performed in voice recognition. In the n-gram model and the variable memory length Markov model, however, a sentence is regarded as a word string without structure. For the purpose of estimating syntactic structure, therefore, several structural language models have been proposed. Examples of structural language models are described in the documents below.
Document 1: Ciprian Chelba and Frederick Jelinek, Exploiting Syntactic Structure for Language Modeling, In Proceedings of the 17th International Conference on Computational Linguistics, pages 225-231, 1998
Document 2: Shinsuke Mori, Masafumi Nishimura, Nobuyasu Itoh, Shiho Ogino, and Hideo Watanabe, A stochastic parser based on a structural word prediction model, In Proceedings of the 18th International Conference on Computational Linguistics, pages 558-564, 2000
In these structural language models, as in the n-gram model and the like, words are predicted in succession from the beginning to the end of a sentence. However, the sentence is not treated as a simple word string but is represented as a tree having words at its leaves. Accordingly, in predicting each word, the history referred to is not a word string but a partial parse tree covering the words from the beginning of the sentence up to the word immediately preceding the word to be predicted.
In the above Document 1, a method of predicting words from a tree-structured history is disclosed in which the next word is predicted from the two rightmost head words in the history (the Chelba & Jelinek model). Also, in Document 2, another method is disclosed in which a word is predicted based on the words related to the word to be predicted.
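In the spirit of the Document 1 approach, the prediction step can be sketched as conditioning on the two rightmost exposed head words of the partial parse. The flat list-of-heads representation, the sentence-start padding, and the function name are simplifying assumptions for illustration; the actual model also interleaves parser moves that build the tree.

```python
def predict_from_heads(exposed_heads, head_bigram_model):
    """Predict a next-word distribution from the two rightmost head words.

    exposed_heads lists, left to right, the head word of each completed
    subtree in the partial parse so far; head_bigram_model maps a pair
    (second-rightmost head, rightmost head) to a next-word distribution.
    Missing pairs fall back to an empty distribution here for simplicity.
    """
    h2, h1 = (["<s>", "<s>"] + exposed_heads)[-2:]
    return head_bigram_model.get((h2, h1), {})
```

Because the conditioning heads come from the partial parse tree rather than from adjacent surface words, a syntactically relevant word far back in the sentence can still influence the prediction, which is the point of the structural approach.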