In recent years, speech recognition of spoken language has been studied actively. Spoken language is difficult to recognize with high precision for various reasons, such as acoustic ambiguity caused by sloppy articulation and the diversity of word order. To improve the recognition precision for spoken language, techniques that exploit phenomena observed in spoken language have been proposed. One example is the technique that focuses on speaking rate, described in Non Patent Literature 1 cited below.
Unlike mechanical read-aloud speech or isolated-word speech, human spoken language is rarely uttered at a constant speaking rate; the rate fluctuates widely within an utterance. In particular, when the speaking rate is fast, the movement of the mouth cannot easily keep up with the speech, and the voice becomes deformed. Such deformation is considered to contribute substantially to the degradation of recognition precision.
Non Patent Literature 1 describes a technique that uses either dedicated acoustic models trained only on speech having a fast speaking rate, or a dictionary in which speaking deformations are registered. That is, the technique described in Non Patent Literature 1 attempts to improve recognition performance by using models dedicated to a particular speaking rate.
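As a rough illustration of the idea of rate-dedicated models, the following Python sketch estimates a speaking rate and selects a recognizer configuration accordingly. It is not taken from Non Patent Literature 1: the names `RecognizerConfig`, `speaking_rate`, and `select_config`, the model and dictionary labels, the unit of rate (linguistic units such as morae or phonemes per second), and the threshold value are all assumptions made for illustration. Bundling the acoustic model and the dictionary into one configuration is likewise an illustrative choice, since the two are described above as alternatives.

```python
from dataclasses import dataclass

# Hypothetical threshold in units (e.g. morae or phonemes) per second.
# The value is arbitrary and not taken from the source.
FAST_RATE_THRESHOLD = 8.0


@dataclass(frozen=True)
class RecognizerConfig:
    """Hypothetical bundle of resources used by a recognizer."""
    acoustic_model: str
    dictionary: str


# Hypothetical resource names, for illustration only.
NORMAL = RecognizerConfig("am_normal", "dict_standard")
FAST = RecognizerConfig("am_fast", "dict_with_deformations")


def speaking_rate(num_units: int, duration_sec: float) -> float:
    """Return the number of linguistic units uttered per second."""
    if duration_sec <= 0:
        raise ValueError("duration_sec must be positive")
    return num_units / duration_sec


def select_config(rate: float,
                  threshold: float = FAST_RATE_THRESHOLD) -> RecognizerConfig:
    """Pick the fast-speech resources when the rate reaches the threshold."""
    return FAST if rate >= threshold else NORMAL
```

For example, an utterance of 30 units spoken in 3 seconds has a rate of 10 units per second, so under the assumed threshold `select_config(speaking_rate(30, 3.0))` returns the fast-speech configuration.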