The invention relates to a speech recognition device in accordance with the introductory part of claim 1 and also a speech recognition method in accordance with the introductory part of claim 6.
Such a speech recognition device and such a speech recognition method are known from U.S. Pat. No. 5,031,113. The known speech recognition device includes receiving means, which are formed by a microphone and an audio memory for receiving and storing a speech communication uttered by a speaker in a dictation.
The known speech recognition device further includes speech coefficient memory means in which a speech coefficient indicator is stored that is necessary for the execution of the speech recognition method. Such a speech coefficient indicator contains context information, speech model data and phoneme reference information. The context information contains all the words that can be recognized by the speech recognition device, the speech model data contains probabilities for the sequence of words of the context information in voice information, and the phoneme reference information contains information on how a word portion (phoneme) is pronounced by a speaker.
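By way of illustration only, the three parts of the speech coefficient indicator could be held in a structure such as the following minimal sketch. The class name, the field types, the bigram form of the speech model data and the list-of-floats form of the phoneme reference information are all assumptions made for this example and are not taken from the known device.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class SpeechCoefficientIndicator:
    """Hypothetical container for the three parts named in the text."""

    # Context information: all words the device can recognize.
    context_information: Set[str] = field(default_factory=set)
    # Speech model data: probability of one word following another
    # (a bigram table is an assumption made for this sketch).
    speech_model_data: Dict[Tuple[str, str], float] = field(default_factory=dict)
    # Phoneme reference information: per-phoneme reference features
    # (a plain list of floats is an assumption made for this sketch).
    phoneme_reference_information: Dict[str, List[float]] = field(default_factory=dict)

    def can_recognize(self, word: str) -> bool:
        """True if the word is part of the context information."""
        return word in self.context_information

    def sequence_probability(self, previous: str, current: str) -> float:
        """Stored probability of `current` following `previous`, else 0.0."""
        return self.speech_model_data.get((previous, current), 0.0)


indicator = SpeechCoefficientIndicator(
    context_information={"dear", "sir"},
    speech_model_data={("dear", "sir"): 0.4},
    phoneme_reference_information={"d": [0.1, 0.7]},
)
```

In such a layout, the context information and the speech model data can be updated from text alone, whereas the phoneme reference information additionally requires the stored voice information.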
The known speech recognition device further includes speech recognition means which are arranged, during the execution of the speech recognition method, for recognizing text information that corresponds to the received voice information by evaluating the speech coefficient indicator stored in the speech coefficient memory means and for delivering this text information as recognized text information. The recognized text information is displayed by a monitor.
A text processing program and a keyboard form correction means by which recognized text information displayed by the monitor is corrected and shown again on the monitor as corrected text information. Typically, a user on the one hand replaces words that were erroneously recognized during the speech recognition process with the words actually spoken and, on the other hand, also makes other corrections. Such other corrections may be, for example, the insertion of a standard text portion such as an address, the insertion of text portions forgotten during the dictation, or the replacement of a text part of the recognized text information with text information entered by means of the keyboard.
The known speech recognition device includes adjusting means for adjusting the speech coefficient indicator more closely to the speaker and the language, so that words that were previously recognized erroneously are recognized correctly in subsequent speech recognition operations. For adjusting the context information and the speech model data, the corrected text information is evaluated; for adjusting the phoneme reference information, the voice information stored in the audio memory is evaluated as well. The result is an adjusted speech coefficient indicator, which is stored in the speech coefficient memory means.
In the known speech recognition device and with the known speech recognition method it has proved to be disadvantageous that the corrected text information used for the adjustment of the speech coefficient indicator also contains text parts that have no connection at all with the voice information. When such text parts are used for adjusting the speech coefficient indicator, it may happen that the speech coefficient indicator, after the adjustment, is adapted worse rather than better to the speaker and the language.
It is an object of the invention to provide a speech recognition device and a speech recognition method in which only corrected text information that has sufficient connection with the received voice information is used for adjusting the speech coefficient indicator. This object is achieved with a speech recognition device as defined in the introductory part of claim 1 by the measures of the characterizing part of claim 1, and with a speech recognition method as defined in the introductory part of claim 6 by the measures of the characterizing part of claim 6.
As a result, prior to the adjustment of the speech coefficient indicator, a test is made as to whether the corrected text information contains text words that were heavily corrected or newly inserted, and such text words are not used for adjusting the speech coefficient indicator. Advantageously, the recognition rate of the speech recognition device and of the speech recognition method is therefore improved considerably after each adjustment of the speech coefficient indicator.
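The test described above can be pictured with the following minimal sketch, which is given by way of example only. It aligns the recognized text with the corrected text and keeps only those corrected text words that were left unchanged or were corrected lightly. The function name, the use of Python's `difflib` for the alignment and the similarity threshold of 0.5 are assumptions made for this illustration and are not the measures defined in the claims.

```python
from difflib import SequenceMatcher


def words_usable_for_adjustment(recognized, corrected, min_similarity=0.5):
    """Return the corrected words that still correspond to recognized words.

    Newly inserted words and heavily corrected words are left out. The
    alignment via difflib and the threshold are assumptions of this sketch.
    """
    usable = []
    matcher = SequenceMatcher(a=recognized, b=corrected, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged words keep their connection with the voice information.
            usable.extend(corrected[j1:j2])
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            # One-for-one replacements: keep only lightly corrected words.
            for old, new in zip(recognized[i1:i2], corrected[j1:j2]):
                if SequenceMatcher(a=old, b=new).ratio() >= min_similarity:
                    usable.append(new)
        # "insert" spans (new text) and uneven replacements are skipped;
        # "delete" spans contribute no corrected words.
    return usable


recognized = "please send the male today".split()
corrected = "kindly send the mail today to 12 Main Street".split()
usable = words_usable_for_adjustment(recognized, corrected)
```

In this example the lightly corrected word "mail" is retained, whereas the heavily corrected word "kindly" and the inserted address are excluded from the adjustment.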
According to the measures of claim 2 and claim 7, text words of the recognized text information that have sufficient connection with the received voice information or the recognized text information, respectively, are concatenated into sequences of text words. The sequence of text words of the recognized text information that has the largest aggregate correspondence value is used for the adjustment. This offers the advantage that a text word located among text words having a large correspondence indicator is also used for the adjustment, and thus the recognition rate of the speech recognition device and of the speech recognition method is further improved with each adjustment of the speech coefficient indicator.
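One possible reading of this selection is sketched below for illustration. Text words whose correspondence value reaches a threshold are joined into contiguous sequences, and the sequence with the largest summed value is returned. The function name, the integer correspondence values, the threshold and the use of a plain sum as the aggregate are assumptions made for this example; the claims may define the correspondence value and the concatenation differently.

```python
def best_sequence(words, correspondence_values, threshold):
    """Return (sequence, aggregate) for the run with the largest summed value.

    A run is a maximal stretch of consecutive words whose correspondence
    value is at least `threshold` (an assumption of this sketch).
    """
    sequences = []
    current = []
    for word, value in zip(words, correspondence_values):
        if value >= threshold:
            current.append((word, value))
        elif current:
            # A word below the threshold closes the current sequence.
            sequences.append(current)
            current = []
    if current:
        sequences.append(current)
    if not sequences:
        return [], 0
    best = max(sequences, key=lambda seq: sum(v for _, v in seq))
    return [w for w, _ in best], sum(v for _, v in best)


words = ["dear", "sir", "xyz", "thank", "you", "very", "much"]
values = [9, 8, 1, 7, 9, 8, 9]
sequence, aggregate = best_sequence(words, values, threshold=5)
```

Here the word "xyz" splits the text into two sequences, and the second one is selected because its aggregate value (33) exceeds that of the first (17).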
According to the measures of claims 3, 4 and 5, all the information contained in the speech coefficient indicator is adjusted very well.