1. Field of the Invention
The present invention relates to a continuous speech recognizing apparatus for recognizing continuous speech, and particularly to a continuous speech recognizing apparatus for carrying out speech recognition using a probabilistic language model and to a recording medium.
2. Description of the Related Art
As one of conventional continuous speech recognizing apparatuses for carrying out speech recognition using a probabilistic language model, an apparatus for recognizing continuous speech using a multiple-pass decoder is known. The system narrows the list of word candidates for speech to be recognized by carrying out time synchronous search in a first pass circuit using a simple model. Subsequently, it determines the word candidate in the list obtained in the first pass in a second pass circuit using a complex model after completing the speech (Imai, et al., Technical Report of Information Processing Society of Japan, SLP-23-11 (October 1998)). The inventors of the present application also proposed a continuous speech recognizing apparatus for carrying out time synchronous Viterbi beam search using a bigram in the first pass (Imai, et al., Proceedings of Autumn Meeting of the Acoustical Society of Japan, 3-1-12 (September 1998)).
The continuous speech recognizing apparatus carries out word-dependent N-best search of a tree structure phoneme network (see R. Schwarz, et al., ICASSP-91, pp. 701-704 (May 1991)).
It obtains N-best sentences by recursively tracing back a word lattice composed of the end time of each word candidate, its score and a pointer to the immediately preceding word (see R. Schwarz, et al., ICASSP-91, pp. 701-704 (May 1991)). Then, it determines a maximum likelihood word string as a recognition result by rescoring the N-best sentences using a trigram.
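The trigram rescoring step described above can be illustrated with a minimal sketch. It is not the implementation of the cited apparatus: the function names, the dictionary-based trigram table, the probability floor for unseen trigrams, and the language model weight are all assumptions made for illustration.

```python
import math

def trigram_logprob(words, trigram, floor=1e-6):
    """Sum log P(w_i | w_{i-2}, w_{i-1}) over a sentence.

    `trigram` is a hypothetical dict mapping (w1, w2, w3) to a probability;
    unseen trigrams fall back to `floor` (an assumed, crude smoothing).
    """
    padded = ["<s>", "<s>"] + list(words) + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        key = (padded[i - 2], padded[i - 1], padded[i])
        total += math.log(trigram.get(key, floor))
    return total

def rescore(nbest, trigram, lm_weight=1.0):
    """Pick the maximum likelihood word string from N-best sentences.

    `nbest` is a list of (words, first_pass_log_score) pairs.
    """
    best = max(nbest, key=lambda h: h[1] + lm_weight * trigram_logprob(h[0], trigram))
    return best[0]

if __name__ == "__main__":
    nbest = [
        (["the", "whether", "is", "fine"], -10.0),
        (["the", "weather", "is", "fine"], -10.5),
    ]
    trigram = {("<s>", "the", "weather"): 0.5, ("the", "weather", "is"): 0.5}
    print(rescore(nbest, trigram))
```

In this toy example the second sentence has the lower first-pass score, but the trigram term outweighs the difference, so it is selected.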
When the one-pass processing of continuous speech is executed in such a multiple-pass continuous speech recognizing apparatus, the speech recognition candidates for the word string at the final position at the current time tend to differ from the word string obtained for the corresponding position at the next time. As a result, the speech recognition candidates of a sentence are unstable until the speech input of the sentence has been completed, and hence the second pass circuit cannot determine the speech recognition result until then. This causes a large time lag (delay) between the instant at which the speech is input and the output of the speech recognition result from the continuous speech recognizing apparatus.
Such a time lag presents a problem when producing real-time subtitles by recognizing speech broadcast in news programs.
Therefore, an object of the present invention is to provide a continuous speech recognizing apparatus and a recording medium capable of reducing, in a multiple-pass speech recognizing apparatus, the time lag between the input of speech and the output of the speech recognition result.
In the first aspect of the present invention, there is provided a continuous speech recognizing apparatus that obtains from input continuous speech a plurality of speech recognition candidates of a word string using a simple probabilistic language model in a first pass processor, and that determines a speech recognition result of the plurality of speech recognition candidates using a complex probabilistic language model in a second pass processor, wherein
the first pass processor obtains word strings of the plurality of speech recognition candidates of the continuous speech at fixed time intervals from an input start time, and
the second pass processor comprises:
word string selecting means for selecting, using the complex probabilistic language model, a maximum likelihood word string from among the word strings of the plurality of speech recognition candidates obtained at the fixed time intervals, and
speech recognition result determining means for detecting a stable portion in word strings detected at each of the fixed time intervals, and for successively determining a word string of the stable portion as the speech recognition result.
Here, the speech recognition result determining means may comprise:
a comparator for comparing a first word string with a second word string, the first word string consisting of a word string currently detected by the word string selecting means with the exception of a final portion of the word string, and the second word string consisting of speech recognition candidates previously obtained by the word string selecting means; and
a determining section for determining, when the comparator makes a decision that a same word string as the second word string is contained in the first word string, the second word string as the speech recognition result.
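The comparator and determining section described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `StablePrefixDetector`, the `tail` parameter (the number of words treated as the final portion), and the reading of "contained in" as a prefix match are hypothetical choices, not details given in the text.

```python
class StablePrefixDetector:
    """Detects the stable portion of successive 1-best word strings."""

    def __init__(self, tail=1):
        self.tail = tail        # assumed size of the "final portion" to ignore
        self.previous = []      # word string selected at the previous interval
        self.committed = 0      # number of words already output as results

    def update(self, current):
        """Feed the word string selected at this interval.

        Returns the words newly determined as the speech recognition result.
        """
        # First word string: current string without its final portion.
        first = current[:max(0, len(current) - self.tail)]
        # Second word string: the string obtained at the previous interval.
        second = self.previous
        newly = []
        # Assumption: "contained in" is interpreted as "is a prefix of".
        if len(second) > self.committed and first[:len(second)] == second:
            newly = second[self.committed:]
            self.committed = len(second)
        self.previous = list(current)
        return newly

if __name__ == "__main__":
    detector = StablePrefixDetector(tail=1)
    for hyp in (["the"], ["the", "weather", "is"],
                ["the", "weather", "is", "fine", "today"]):
        print(detector.update(hyp))
```

Because only the newly stable words are returned at each interval, the results can be output successively while speech is still being input.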
The first pass processor may obtain the plurality of speech recognition candidates by tracing back a word lattice beginning from a phoneme with a maximum score as of now when a plurality of speech recognition candidates of a word string are obtained by using the simple probabilistic language model.
Trace back timing of the word lattice may be made variable.
The first pass processor may trace back the word lattice beginning from a plurality of currently active phonemes.
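Tracing back the word lattice from the currently active phonemes can be sketched as below. The data layout is an assumption: `LatticeEntry` mirrors the lattice fields mentioned in the related art (word, end time, score, pointer to the preceding word), and the active phonemes are represented as hypothetical `(score, entry)` pairs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class LatticeEntry:
    word: str
    end_time: int
    score: float
    prev: Optional["LatticeEntry"] = None  # pointer to the preceding word

def trace_back(entry: Optional[LatticeEntry]) -> List[str]:
    """Follow the back pointers and return the word string in spoken order."""
    words = []
    while entry is not None:
        words.append(entry.word)
        entry = entry.prev
    return words[::-1]

def candidates_now(active: List[Tuple[float, LatticeEntry]], n: int = 1) -> List[List[str]]:
    """Trace back from the n best-scoring currently active phonemes.

    With n=1 this starts from the phoneme with the maximum score as of now;
    with n>1 it starts from a plurality of active phonemes.
    """
    ranked = sorted(active, key=lambda pair: pair[0], reverse=True)[:n]
    return [trace_back(entry) for _, entry in ranked]

if __name__ == "__main__":
    the = LatticeEntry("the", 10, -5.0)
    weather = LatticeEntry("weather", 25, -12.0, the)
    whether = LatticeEntry("whether", 25, -13.0, the)
    active = [(-13.5, whether), (-12.4, weather)]
    print(candidates_now(active, n=2))
```

Calling `candidates_now` at different moments, rather than only at sentence end, corresponds to making the trace back timing variable.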
In the second aspect of the present invention, there is provided a recording medium having a computer executable program code means for obtaining from input continuous speech a plurality of speech recognition candidates of a word string using a simple probabilistic language model in a first pass, and for determining a speech recognition result of the plurality of speech recognition candidates using a complex probabilistic language model in a second pass,
wherein
the first pass comprises a step of obtaining, beginning from an input start time, word strings of the plurality of speech recognition candidates of the continuous speech at fixed time intervals, and
the second pass comprises:
a word string selecting step of selecting, using the complex probabilistic language model, a maximum likelihood word string from among the word strings of the plurality of speech recognition candidates obtained at the fixed time intervals, and
a speech recognition result determining step of detecting a stable portion in word strings detected at each of the fixed time intervals, and of successively determining a word string of the stable portion as the speech recognition result.
Here, the speech recognition result determining step may comprise:
a comparing step of comparing a first word string with a second word string, the first word string consisting of a word string currently detected in the word string selecting step with the exception of a final portion of the word string, and the second word string consisting of speech recognition candidates previously obtained in the word string selecting step; and
a determining step of determining, when the comparing step makes a decision that a same word string as the second word string is contained in the first word string, the second word string as the speech recognition result.
The first pass may obtain the plurality of speech recognition candidates by tracing back a word lattice beginning from a phoneme with a maximum score as of now when a plurality of speech recognition candidates of a word string are obtained by using the simple probabilistic language model.
Trace back timing of the word lattice may be made variable.
The first pass may trace back the word lattice beginning from a plurality of currently active phonemes.
The present invention detects a stable portion of a maximum likelihood word string (1-best word string in the following embodiments) obtained by carrying out the two-pass processing successively, and makes it a partial speech recognition result. This makes it possible to successively determine the speech recognition result while continuous speech is being input. In addition, this makes it possible to reduce the time lag between the speech input and the subtitle output to a minimum while maintaining the recognition accuracy of the speech, even when generating subtitles automatically by recognizing speech of television news.
The above and other objects, effects, features and advantages of the present invention will become more apparent from the following description of embodiments thereof taken in conjunction with the accompanying drawings.