This application claims priority from Japanese Application No. 11-243311, filed on Aug. 30, 1999, which is hereby incorporated by reference as if fully set forth herein.
The present invention generally relates to a speech recognition apparatus and method for recognizing an utterance and transforming it into sentences, and in particular, for automatically inserting a "," or a "." when preparing text data.
Methods that use statistical processing to automatically insert punctuation marks during speech recognition are well known. A method for automatically inserting a "," or a "." into text data obtained through speech recognition is disclosed in, for example, "Word-based Approach To Large-vocabulary Continuous Speech Recognition For Japanese," Nishimura, et al., Information Processing Institute Thesis, Vol. 40, No. 4, Apr. 1999, and Japanese Unexamined Patent Publications No. Hei 10-301930 and No. Hei 7-191690. In addition, the estimation of the N-gram model used for speech recognition is disclosed on page 15 of IBM Via Voice, Practice Edition (issued by Info Creates Publishing Department on Sep. 30, 1998).
With this method, however, a language model for the prediction of punctuation marks and a special pronunciation dictionary must be prepared as part of a recognition task. Specifically, in order for punctuation marks to be inserted automatically, a large memory area of several tens of MB (e.g., 60 MB or greater) must be reserved for the automatic punctuation insertion processing, in addition to the memory area that is required for other, unrelated tasks.
Furthermore, while a common speech recognition apparatus (a dictation system) assumes that a user can select the punctuation insertion function as needed, in practice the user must restart the program to make such a selection, so switching to this function takes an extended period of time.
A need has thus been recognized in connection with improving upon the deficiencies presented by the current practices.
The present invention broadly contemplates a speech recognition apparatus and a method therefor, and in particular, for employing both a general-purpose vocabulary/language model and a specialized vocabulary/language model to insert a symbol such as a "," or a "." at an appropriate location in a sentence.
In accordance with one aspect of the invention, a speech recognition apparatus is provided which comprises: a transformer for transforming sequences of phonemes extracted from an utterance into one or more word sequences, and for assigning to the word sequences appearance probabilities, in accordance with which the word sequences are originally represented by the phoneme sequences; a renewer for renewing the appearance probability assigned to each of the word sequences by employing a renewal value represented by a language model corresponding to each of the word sequences; and a speech recognizer for selecting the word sequence having the highest appearance probability, by screening all the word sequences assigned appearance probabilities, according to which the word sequences are originally represented by the phoneme sequences, wherein the renewer calculates the renewal value using a first language model which is employed when each of the word sequences always includes a specific symbol as a word and a second language model which is employed in other situations to renew the appearance probabilities based on the renewal value.
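The transformer, renewer, and speech recognizer described above can be pictured as a rescoring pipeline over candidate word sequences. The following is a minimal sketch under stated assumptions, not the claimed implementation: the names `Candidate`, `renew`, and `recognize` are hypothetical, the renewal value is supplied as an arbitrary function standing in for the language models, and renewing by multiplication is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Candidate:
    """A word sequence produced by the transformer (hypothetical type)."""
    words: List[str]
    probability: float  # appearance probability assigned by the transformer


def renew(candidates: Sequence[Candidate],
          renewal_value: Callable[[Sequence[str]], float]) -> List[Candidate]:
    """Renew each appearance probability with a language-model renewal value.

    Assumption: the renewal is applied by multiplication.
    """
    return [Candidate(c.words, c.probability * renewal_value(c.words))
            for c in candidates]


def recognize(candidates: Sequence[Candidate]) -> Candidate:
    """Select the word sequence with the highest appearance probability."""
    return max(candidates, key=lambda c: c.probability)
```

In this sketch a candidate that the transformer ranks lower can still be selected if the renewal value favors it, which is the effect the renewer is meant to have on candidates containing a comma or a period.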
In another aspect of the invention, the first language model is demonstratively reliable, in that when each of the word sequences includes a specific symbol as a word, each of the word sequences was originally represented by one of the phoneme sequences, and the second language model is also demonstratively reliable, in that when other situations are encountered, each of the word sequences was originally represented by one of the phoneme sequences. Further, the renewer calculates the renewal value based on the first and the second language models, and employs the renewal value to renew the appearance probability assigned to each of the sequences, so as to reflect the reliability demonstrated by the word sequences having been represented originally by the phoneme sequences.
In another aspect of the invention, the first language model is demonstratively reliable in that when each of the word sequences includes a specific symbol as a word, each of the word sequences includes one or more words in the order for each of the word sequences. The second language model is demonstratively reliable in that in other situations the pertinent word sequence includes one or more words arranged in the same order as that of each of the word sequences. The renewer calculates the renewal value based on the first and the second language models, and employs the renewal value to renew the appearance probability assigned to each of the sequences, so as to demonstrate the reliability for each of the word sequences that was originally represented by the phoneme sequences.
In another aspect of the invention, in order to include the symbol in the speech recognition results that are obtained, the phoneme sequence/word sequence transformer transforms the phoneme sequences into one or more word sequences, or in the other situation, transforms the phoneme sequences into one or more word sequences each of which includes words other than the symbol, and provides the appearance probability. In order to include the symbol in the results obtained by the speech recognition process, the renewer employs the first and the second language models to renew the appearance probability assigned to each of the word sequences, or in the other situation, employs only the second language model to renew the appearance probability.
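The two operating situations described above can be illustrated as a switch that changes which candidates and which language models are in play. This is a hedged sketch only: `SYMBOLS`, `allowed_candidates`, and `active_models` are hypothetical names, and filtering an existing candidate list is an assumption made for brevity (the transformer described above would simply not produce symbol-bearing sequences in the second situation).

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

LanguageModel = Callable[[Sequence[str]], float]

# The symbols treated as words; a comma and a period are assumed here.
SYMBOLS: FrozenSet[str] = frozenset({",", "."})


def allowed_candidates(candidates: Sequence[List[str]],
                       insert_symbols: bool) -> List[List[str]]:
    """Keep every candidate when symbols are wanted in the result;
    otherwise keep only word sequences that contain no symbol."""
    if insert_symbols:
        return list(candidates)
    return [c for c in candidates if not SYMBOLS.intersection(c)]


def active_models(first_lm: LanguageModel,
                  second_lm: LanguageModel,
                  insert_symbols: bool) -> Tuple[LanguageModel, ...]:
    """Both models when symbols are wanted; only the second model otherwise."""
    return (first_lm, second_lm) if insert_symbols else (second_lm,)
```

Because the choice is made per call rather than when the models are loaded, a switch of this kind would not by itself require restarting a program.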
In another aspect of the invention, the first and the second language models are N-gram models, and the renewer employs, as the renewal value, a weighted average calculated for the first and the second language models.
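A weighted average of two N-gram models can be sketched as linear interpolation. The example below is illustrative and rests on assumptions: bigrams (N = 2) are used for brevity, the models are plain dictionaries of hypothetical probabilities, and the names `interpolated_prob` and `renewal_value`, the weight parameter, the start token `<s>`, and the `floor` value for unseen bigrams are not taken from the source.

```python
from typing import Dict, Sequence, Tuple

# A bigram model maps (previous word, word) to a conditional probability.
BigramModel = Dict[Tuple[str, str], float]


def interpolated_prob(prev: str, word: str,
                      first_lm: BigramModel, second_lm: BigramModel,
                      weight: float, floor: float = 1e-6) -> float:
    """Weighted average of the two models' bigram probabilities.

    `weight` is applied to the first model and (1 - weight) to the second.
    Unseen bigrams fall back to `floor` (an assumption, not real smoothing).
    """
    p_first = first_lm.get((prev, word), floor)
    p_second = second_lm.get((prev, word), floor)
    return weight * p_first + (1.0 - weight) * p_second


def renewal_value(words: Sequence[str],
                  first_lm: BigramModel, second_lm: BigramModel,
                  weight: float) -> float:
    """Product of interpolated bigram probabilities over a word sequence."""
    value = 1.0
    prev = "<s>"  # hypothetical sentence-start token
    for word in words:
        value *= interpolated_prob(prev, word, first_lm, second_lm, weight)
        prev = word
    return value
```

Setting `weight` to 0 in this sketch reduces the renewal value to the second model alone, which corresponds to the situation in which no symbol is to be inserted.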
In another aspect of the invention, the symbol is a comma or a period.
In accordance with another aspect of the invention, a speech recognition method is provided which comprises: a transformation step of transforming sequences of phonemes extracted from an utterance into one or more word sequences, and of assigning to the word sequences appearance probabilities, in accordance with which the word sequences are originally represented by the phoneme sequences; a renewing step of renewing the appearance probability assigned to each of the word sequences by employing a renewal value represented by a language model corresponding to each of the word sequences; and a speech recognition step of selecting the word sequence having the highest appearance probability, by screening all the word sequences assigned appearance probabilities, according to which the word sequences are originally represented by the phoneme sequences, wherein, at the renewal step, the renewal value is calculated using a first language model which is employed when each of the word sequences always includes a specific symbol as a word and a second language model which is employed in other situations to renew the appearance probabilities based on the renewal value.
In another aspect of the instant invention, there is provided a recording medium for storing a program that controls a computer to perform: a transformation step of transforming sequences of phonemes extracted from an utterance into one or more word sequences, and of assigning to the word sequences appearance probabilities, in accordance with which the word sequences are originally represented by the phoneme sequences; a renewing step of renewing the appearance probability assigned to each of the word sequences by employing a renewal value represented by a language model corresponding to each of the word sequences; and a speech recognition step of selecting the word sequence having the highest appearance probability, by screening all the word sequences assigned appearance probabilities, according to which the word sequences are originally represented by the phoneme sequences, wherein, at the renewing step, the renewal value is calculated using a first language model which is employed when each of the word sequences always includes a specific symbol as a word and a second language model which is employed in other situations to renew the appearance probabilities based on the renewal value.
For a better understanding of the present invention, together with other and further features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying drawings, and the scope of the invention that will be pointed out in the appended claims.