A widely employed conventional voice recognition method models acoustic characteristics with hidden Markov models (HMMs) and collates them with a voice characteristic vector sequence that expresses the voice characteristics in each frame having a certain temporal width, as described in Yukinori Takubo et al., 2004, “Science of languages 2, Voice”, Iwanami Shoten (hereinafter referred to as Takubo et al.). In this voice recognition method, acoustic characteristics are modeled by an HMM for each of a plurality of categories to be recognized. Each HMM is collated with the voice characteristic vector sequence to find the HMM having the highest output probability for that sequence, and the category assigned to that HMM is output as the recognition result.
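The collation described above can be illustrated with a minimal sketch. It is not taken from Takubo et al.; it assumes left-to-right HMMs with diagonal-covariance Gaussian output distributions, and it approximates the output probability by the best-path (Viterbi) log score. The names `log_gauss`, `viterbi_score`, `recognize`, and the dictionary layout of each model are hypothetical.

```python
import math

NEG_INF = float("-inf")

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def viterbi_score(frames, hmm):
    """Best-path log output probability of `frames` under one HMM.

    Assumption: the path starts in state 0 and must end in the last state.
    """
    means, variances, log_trans = hmm["means"], hmm["vars"], hmm["log_trans"]
    n = len(means)
    score = [NEG_INF] * n
    score[0] = log_gauss(frames[0], means[0], variances[0])
    for x in frames[1:]:
        new = [NEG_INF] * n
        for j in range(n):
            # Best predecessor score for state j at this frame.
            best = max(score[i] + log_trans[i][j] for i in range(n))
            if best > NEG_INF:
                new[j] = best + log_gauss(x, means[j], variances[j])
        score = new
    return score[-1]

def recognize(frames, models):
    """Return the category whose HMM gives the highest score for `frames`."""
    return max(models, key=lambda c: viterbi_score(frames, models[c]))
```

In this sketch, every state of every category HMM is evaluated at every frame, so the number of output probability calculations grows with the product of the number of frames, states, and categories.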
Conventional methods of efficiently reducing the number of calculations of the output probability include a method based on a beam search (see, for example, Masaki Ida and Seiichi Nakagawa (1996), “Comparison between a beam search method and A* searching method in voice recognition”, The Institute of Electronics, Information and Communication Engineers, Technical Report of “Voice” SP96-12, hereinafter referred to as Ida and Nakagawa) and a method based on standard frames (see, for example, Japanese Issued Patent No. 3251480).
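How a beam search reduces the number of output probability calculations can be sketched as follows. This is a generic illustration of score-based beam pruning in a frame-synchronous Viterbi search, not the specific procedure of Ida and Nakagawa, and it does not attempt to model the standard-frame method. The function `beam_viterbi`, the callback `log_emit`, and the parameter `beam` are hypothetical, and every state is assumed to have at least one outgoing transition.

```python
import math

NEG_INF = float("-inf")

def beam_viterbi(frames, log_trans, log_emit, beam):
    """Frame-synchronous Viterbi search with beam pruning.

    log_emit(j, x) is a hypothetical callback returning the log output
    probability of frame x in state j. Returns the surviving state scores
    after the last frame and the number of log_emit calls made.
    """
    n = len(log_trans)
    active = {0: log_emit(0, frames[0])}  # assumption: start in state 0
    calls = 1
    for x in frames[1:]:
        # Propagate scores only from states that survived pruning.
        cand = {}
        for i, s in active.items():
            for j in range(n):
                t = log_trans[i][j]
                if t == NEG_INF:
                    continue  # transition not allowed
                if s + t > cand.get(j, NEG_INF):
                    cand[j] = s + t
        # Output probabilities are calculated only for reachable states.
        scored = {}
        for j, s in cand.items():
            scored[j] = s + log_emit(j, x)
            calls += 1
        # Keep states whose score lies within `beam` of the best score.
        best = max(scored.values())
        active = {j: s for j, s in scored.items() if s >= best - beam}
    return active, calls
```

A narrower `beam` leaves fewer active states per frame, so fewer output probabilities are calculated, at the risk of pruning the path that would eventually have scored best.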
However, simply combining the method based on the beam search with the method based on the standard frames does not efficiently reduce the number of calculations of the output probability.