The present invention relates to a speech recognition method and apparatus for recognizing unknown input speech and, more particularly, to a large vocabulary speech recognition method and apparatus which permit recognition of a large number of words.
For large vocabulary speech recognition, a method based on triphone HMMs (Hidden Markov Models) is widely used. Specifically, this method uses "triphone units" as recognition units, each of which is prepared for a phoneme in a word (or sentence) in combination with the phonemes adjacent to it, that is, the preceding and succeeding phonemes. The "triphone HMM" is detailed in "Fundamentals of Speech Recognition, Part I, Part II, NTT Advanced Technology Co., Ltd, ISBN-4-900886-01-7" or "Fundamentals of Speech Recognition, Prentice Hall, ISBN-0-13-055157-2".
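By way of illustration only, the decomposition of a phoneme sequence into triphone units may be sketched as follows. The function name `to_triphones`, the boundary symbol `sil`, and the example phoneme sequence are assumptions introduced for this sketch and do not appear in the cited references.

```python
from typing import List, Tuple

# Hypothetical boundary symbol used to pad the start and end of a sequence.
SIL = "sil"


def to_triphones(phonemes: List[str], boundary: str = SIL) -> List[Tuple[str, str, str]]:
    """Return one (left, center, right) triple for each phoneme in the sequence."""
    # Pad both ends so the first and last phonemes also have two neighbors.
    padded = [boundary] + list(phonemes) + [boundary]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


# Illustrative three-phoneme sequence (an assumption, not taken from the source).
triphones = to_triphones(["k", "ae", "t"])
```

In this sketch each phoneme yields exactly one triphone unit, so the same center phoneme occurring between different neighbors is treated as a different recognition unit.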
In speech recognition based on triphone HMMs, however, as many different HMMs as the cube of the number of different phonemes are involved, and it is difficult to estimate all the triphone HMMs accurately. To reduce the number of different triphone HMMs, top-down or bottom-up clustering or the like is adopted, as detailed in the references noted above. Where the number of HMMs is reduced, however, it is no longer possible to guarantee that the resulting HMMs are the best fit. In addition, a problem arises in that such clustering has to rely on knowledge concerning phonemes that is itself unreliable.
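The growth in the number of models can be illustrated with a short calculation. The function name `triphone_model_count` and the inventory size of 40 phonemes are assumptions chosen for this sketch; the actual number of phonemes depends on the language and the phoneme set adopted.

```python
def triphone_model_count(num_phonemes: int) -> int:
    """Upper bound on distinct triphones: every (left, center, right) combination."""
    if num_phonemes < 0:
        raise ValueError("num_phonemes must be non-negative")
    return num_phonemes ** 3


# Hypothetical inventory of 40 phonemes (an assumption, not from the source).
count = triphone_model_count(40)  # 40 * 40 * 40 = 64000
```

Under this assumed inventory, 64000 separate HMMs would have to be trained, which indicates why clustering is used to reduce the number of models.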