1. Technical Field
The invention generally relates to the field of computer-assisted or computer-based speech recognition, and more specifically, to a method and system for improving recognition quality of a speech recognition system.
2. Description of the Related Art
Conventional speech recognition systems (SRSs), in a very simplified view, can include a database of word pronunciations linked with word spellings. Other supplementary mechanisms can be used to exploit relevant features of a language and the context of an utterance, which can make a transcription more robust. Such elaborate mechanisms, however, will not prevent an SRS from failing to accurately recognize a spoken word when the database of words does not contain the word, or when a speaker's pronunciation of the word does not agree with the pronunciation entry in the database. Therefore, collecting and extending vocabularies is of prime importance for the improvement of SRSs.
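By way of illustration only, the simplified view above can be sketched as a lookup table that maps pronunciations to spellings. The phoneme symbols, the entries, and the names LEXICON, recognize, and add_entry are hypothetical and are not taken from any particular SRS; a real system would use acoustic and language models rather than exact matching.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical toy database: pronunciation (tuple of phoneme symbols) -> spelling.
LEXICON: Dict[Tuple[str, ...], str] = {
    ("HH", "AH", "L", "OW"): "hello",
    ("W", "ER", "L", "D"): "world",
}


def recognize(phonemes: Sequence[str]) -> Optional[str]:
    """Return the spelling linked to a pronunciation, or None if there is no entry."""
    return LEXICON.get(tuple(phonemes))


def add_entry(phonemes: Sequence[str], spelling: str) -> None:
    """Extend the vocabulary with a new pronunciation/spelling pair."""
    LEXICON[tuple(phonemes)] = spelling
```

In this sketch, a word that is absent from the table and a known word spoken with a different pronunciation both yield no result, which mirrors the two failure cases described above. Extending the table with add_entry is the only remedy.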
Presently, vocabularies for SRSs are based on the analysis of large corpora of written documents. For languages where the correspondence between written and spoken language is not bijective, pronunciations have to be entered manually. This is a laborious and costly procedure.
U.S. Pat. No. 6,064,957 discloses a mechanism for improving speech recognition through text-based linguistic post-processing. Text data generated from an SRS and a corresponding true transcript of the speech recognition text data are collected and aligned by means of a text aligner. From the differences in alignment, a plurality of correction rules are generated by means of a rule generator coupled to the text aligner. The correction rules are then applied by a rule administrator to new text data generated from the SRS. The mechanism performs only a text-to-text alignment, and thus does not take the particular pronunciation of the spoken text into account. Accordingly, it needs the aforementioned rule administrator to apply the rules to new text data. The mechanism therefore cannot be executed fully automatically.
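The general idea of text-to-text alignment followed by rule generation can be sketched as follows. This is a minimal illustration that assumes word-level alignment with Python's standard difflib module; it is not the mechanism of the cited patent, and the function names derive_rules and apply_rules are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict


def derive_rules(recognized: str, reference: str) -> Dict[str, str]:
    """Align two transcripts word by word and collect substitution rules."""
    rec, ref = recognized.split(), reference.split()
    rules: Dict[str, str] = {}
    matcher = SequenceMatcher(a=rec, b=ref, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only mismatched spans become rules; insertions and deletions are ignored here.
        if tag == "replace":
            rules[" ".join(rec[i1:i2])] = " ".join(ref[j1:j2])
    return rules


def apply_rules(text: str, rules: Dict[str, str]) -> str:
    """Apply each rule as a plain substring replacement (a deliberate simplification)."""
    for wrong, right in rules.items():
        text = text.replace(wrong, right)
    return text
```

For example, aligning the recognized text "please wreck a nice beach today" with the true transcript "please recognize speech today" yields the single rule "wreck a nice beach" to "recognize speech". Because the sketch operates on text alone, it has no access to how the words were actually pronounced, which is the limitation noted above.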
U.S. Pat. No. 6,078,885 discloses a technique which provides for verbal dictionary updates by end-users of the SRS. In particular, a user can revise the phonetic transcription of words in a phonetic dictionary, or add transcriptions for words not present in the dictionary. The method determines the phonetic transcription based on the word's spelling and the recorded preferred pronunciation, and updates the dictionary accordingly. Recognition performance is improved through the use of the updated dictionary.
The above-discussed techniques, however, share the disadvantage of not being able to update a speech recognition vocabulary from large-scale bodies of text with minimal technical effort and time. Accordingly, these techniques are not fully automated.