Conventional speech recognition systems commonly insert spurious short speech events (e.g., words, phones) within an otherwise correctly recognized word or phone sequence. This may occur in a variety of contexts, for example: large vocabulary speech recognition, where short grammatical words are often mistakenly inserted; and baseform generation, which is used to automatically generate phonetic baseforms, i.e., sequence(s) of phones representing the pronunciation(s) of words.
Conventional acoustics-only baseform generation systems, where the phonetic baseforms are generated from acoustic data only, are especially prone to spurious phone insertion, as the acoustic properties of certain phones, such as fricatives and plosives, can easily be confused with those of non-speech segments (e.g., short pauses).
As is known, conventional acoustics-only baseform generation systems generate a phonetic baseform by retrieving the sequence of phones that has the highest likelihood, where the likelihood of a sequence of phones is computed by accumulating the scores of the subphone units in the hypothesized sequence. The score of a subphone unit is a combination of an acoustic score (e.g., measuring the acoustic match between the acoustic observation and the acoustic model of the subphone unit) and a transition score (e.g., measuring the transition match between the previous subphone unit and the current subphone unit).
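By way of illustration only, the scoring scheme described above may be sketched as follows. The sketch assumes that scores are log-domain values, that the acoustic and transition scores are combined by simple addition, and that the highest-scoring sequence is retrieved with a Viterbi-style search. None of these choices is specified by the cited systems, and the names best_unit_sequence and collapse_repeats are hypothetical.

```python
import math

def best_unit_sequence(acoustic, transition, units):
    """Return (score, frame_path) for the highest-scoring unit sequence.

    acoustic:   list over frames of {unit: acoustic log-score} (assumed form)
    transition: {(previous_unit, current_unit): transition log-score};
                missing pairs are treated as disallowed (-inf)
    """
    # First frame: only the acoustic score contributes.
    score = {u: acoustic[0][u] for u in units}
    backpointers = []
    for frame in acoustic[1:]:
        new_score, pointers = {}, {}
        for cur in units:
            # Pick the predecessor giving the best accumulated score.
            prev = max(
                units,
                key=lambda p: score[p] + transition.get((p, cur), -math.inf),
            )
            # Unit score = acoustic score + transition score (assumed additive).
            new_score[cur] = (
                score[prev] + transition.get((prev, cur), -math.inf) + frame[cur]
            )
            pointers[cur] = prev
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final unit.
    last = max(units, key=lambda u: score[u])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path

def collapse_repeats(path):
    """Merge consecutive repeated units into a single unit sequence."""
    sequence = []
    for unit in path:
        if not sequence or sequence[-1] != unit:
            sequence.append(unit)
    return sequence

if __name__ == "__main__":
    # Toy data, invented for illustration: two units over three frames.
    units = ["a", "b"]
    acoustic = [
        {"a": -1.0, "b": -3.0},
        {"a": -1.0, "b": -3.0},
        {"a": -4.0, "b": -0.5},
    ]
    transition = {
        ("a", "a"): -0.5, ("a", "b"): -1.0,
        ("b", "b"): -0.5, ("b", "a"): -5.0,
    }
    total, path = best_unit_sequence(acoustic, transition, units)
    print(total, collapse_repeats(path))
```

In such a scheme, a unit that matches only a single frame can still enter the best path whenever its acoustic score outweighs the transition scores incurred on entering and leaving it, which is one way a spurious short event may arise.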
Examples of existing acoustics-only baseform generation systems include those described in R. C. Rose et al., “Speech Recognition Using Automatically Derived Baseforms,” ICASSP 1997; and B. Ramabhadran et al., “Acoustics-Only Based Automatic Phonetic Baseform Generation,” ICASSP 1998, the disclosures of which are incorporated by reference herein. An example of an existing acoustics-only baseform generation system that filters the generated baseforms by using a set of phonological rules is described in B. Ramabhadran et al., “Phonological Rules for Enhancing Acoustic Enrollment of Unknown Words,” ICSLP 98, the disclosure of which is incorporated by reference herein.