Acoustic models that can evaluate fillers, disfluencies, and non-speech sounds together with phonetic units are known. A speech recognition system that uses such an acoustic model can remove fillers, disfluencies, and non-speech sounds during recognition, which improves recognition accuracy.
To recognize fillers, disfluencies, and non-speech sounds correctly, however, fragments that include them have to be registered in advance as words in a search model that functions as a recognition dictionary. In conventional speech recognition systems, registering such fragments as words in the search model is therefore very costly.
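The registration burden described above can be illustrated with a toy sketch. Everything in it is hypothetical: the `LEXICON` contents, the category labels, and the function `strip_registered_fragments` are illustrative assumptions, not part of any particular recognizer. It also simplifies by letting unregistered fragments appear in the hypothesis, whereas a real decoder would more likely misrecognize them as some registered word.

```python
# Hypothetical toy recognition dictionary (search model).
# Every fragment to be removed must be registered in advance.
LEXICON = {
    "hello": "word",
    "world": "word",
    "uh": "filler",
    "um": "filler",
    "<cough>": "non_speech",
}

# Categories that the recognizer is allowed to drop from its output.
REMOVABLE = {"filler", "disfluency", "non_speech"}


def strip_registered_fragments(tokens):
    """Drop tokens whose dictionary entry marks them as removable.

    Tokens missing from LEXICON are kept, because nothing identifies
    them as fillers, disfluencies, or non-speech sounds.
    """
    return [t for t in tokens if LEXICON.get(t) not in REMOVABLE]


# "erm" (a filler) and "wor-" (a disfluency) were never registered.
hypothesis = ["uh", "hello", "erm", "wor-", "world", "<cough>"]
cleaned = strip_registered_fragments(hypothesis)
print(cleaned)
```

In this sketch only the fragments that someone enumerated ahead of time are removed; the unregistered "erm" and "wor-" survive. Covering them would mean adding one entry per fragment, which is the cost the passage refers to.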