1. Field of the Invention
The present invention relates to model generation apparatus and methods. Embodiments of the present invention concern the generation of models for use in pattern recognition. In particular, embodiments of the present invention are applicable to speech recognition.
2. Description of Related Art
Speech recognition is a process by which an unknown speech utterance is identified. Several different types of speech recognition system are currently available, and they can be categorised in several ways. For example, some systems are speaker dependent, whereas others are speaker independent. Some systems operate with a large vocabulary (>10,000 words), while others operate only with a limited vocabulary (<1,000 words). Some systems can recognise only isolated words, whereas others can recognise phrases comprising a series of connected words.
Hidden Markov models (HMMs) are typically used as the acoustic models in speech recognition systems. An HMM consists of a number of states, each of which is associated with a probability density function. Transitions between the states are also associated with transition parameters.
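To make this structure concrete, the following is a minimal Python sketch of an HMM whose states each carry a probability density function and whose transitions carry probabilities. For brevity it assumes one-dimensional Gaussian state densities (practical acoustic models usually use multivariate or mixture densities over feature vectors), and the class and method names are hypothetical. The forward algorithm shown scores an observation sequence by summing over all state paths, rescaling at each frame to avoid numerical underflow.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GaussianState:
    """HMM state whose emission pdf is a 1-D Gaussian (a simplifying assumption)."""
    mean: float
    var: float

    def log_pdf(self, x: float) -> float:
        # log N(x; mean, var)
        return -0.5 * (math.log(2.0 * math.pi * self.var) + (x - self.mean) ** 2 / self.var)


@dataclass
class HMM:
    states: List[GaussianState]
    trans: List[List[float]]  # trans[i][j] = P(state j at t+1 | state i at t)
    initial: List[float]      # initial[i] = P(state i at t = 0)

    def log_likelihood(self, obs: List[float]) -> float:
        """Scaled forward algorithm: returns log p(obs | model)."""
        n = len(self.states)
        log_lik = 0.0
        alpha: List[float] = []
        for t, x in enumerate(obs):
            b = [math.exp(s.log_pdf(x)) for s in self.states]  # state emission densities
            if t == 0:
                alpha = [self.initial[j] * b[j] for j in range(n)]
            else:
                alpha = [sum(alpha[i] * self.trans[i][j] for i in range(n)) * b[j]
                         for j in range(n)]
            scale = sum(alpha)  # density of frame t given the earlier frames
            log_lik += math.log(scale)
            alpha = [a / scale for a in alpha]  # renormalise to avoid underflow
        return log_lik


# A hypothetical three-state left-to-right model (no backward transitions).
model = HMM(
    states=[GaussianState(0.0, 1.0), GaussianState(2.0, 1.0), GaussianState(4.0, 1.0)],
    trans=[[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
    initial=[1.0, 0.0, 0.0],
)
score = model.log_likelihood([0.1, 0.3, 2.1, 1.8, 3.9, 4.2])
```

The number of states is fixed by the length of `states` when the model is built; nothing in the scoring procedure changes it.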
Methods such as the Baum-Welch algorithm, described in “Fundamentals of Speech Recognition”, Rabiner & Hwang Juang, PTR Prentice Hall, ISBN 0-13-015157-2, which is hereby incorporated by reference, are often used to estimate the parameter values of hidden Markov models from training utterances. However, the Baum-Welch algorithm requires the initial structure of a model, including its number of states, to be fixed before training can begin.
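The dependence on a fixed structure is visible in a minimal sketch of one Baum-Welch re-estimation step. For brevity this Python sketch assumes discrete observation symbols and a single training sequence (speech models normally use continuous densities and many training utterances), and the function names are hypothetical. The number of states n is set by the shapes of the initial `pi`, `A` and `B`; each step re-estimates their values but never adds or removes a state.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def forward_backward(pi: List[float], A: Matrix, B: Matrix, obs: List[int]):
    """Scaled forward and backward passes for a discrete-output HMM."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    c = [0.0] * T  # c[t] = P(o_t | o_0..o_{t-1}); sum of logs = log-likelihood
    for t in range(T):
        for j in range(n):
            prev = pi[j] if t == 0 else sum(alpha[t - 1][i] * A[i][j] for i in range(n))
            alpha[t][j] = prev * B[j][obs[t]]
        c[t] = sum(alpha[t])
        alpha[t] = [a / c[t] for a in alpha[t]]
    beta = [[1.0] * n for _ in range(T)]  # last row stays 1.0
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / c[t + 1]
    return alpha, beta, c


def baum_welch_step(pi: List[float], A: Matrix, B: Matrix,
                    obs: List[int]) -> Tuple[List[float], Matrix, Matrix, float]:
    """One EM re-estimation. Returns (pi, A, B, log-likelihood of the input model)."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta, c = forward_backward(pi, A, B, obs)
    # gamma[t][i] = P(state i at time t | obs)
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    new_pi = gamma[0][:]
    new_A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        denom = sum(gamma[t][i] for t in range(T - 1))
        for j in range(n):
            # expected number of i -> j transitions (sum over t of xi_t(i, j))
            num = sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / c[t + 1]
                      for t in range(T - 1))
            new_A[i][j] = num / denom if denom > 0 else A[i][j]
    new_B = [[0.0] * m for _ in range(n)]
    for j in range(n):
        denom = sum(gamma[t][j] for t in range(T))
        for k in range(m):
            num = sum(gamma[t][j] for t in range(T) if obs[t] == k)
            new_B[j][k] = num / denom if denom > 0 else B[j][k]
    return new_pi, new_A, new_B, sum(math.log(x) for x in c)
```

Iterating `baum_welch_step` from, say, a three-state left-to-right model gives a non-decreasing log-likelihood, but the result is still a three-state model, and transition probabilities that start at zero (such as backward transitions) stay at zero. The choice of the number of states therefore has to be made before this procedure runs.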
In speaker dependent (SD) speech recognition, an end user is able to create a model for any word or phrase. In such a system, the lengths of the words or phrases to be modelled are therefore not known in advance, and the required number of states must be estimated.
In U.S. Pat. No. 5,895,448 a system is described in which the required number of states is estimated from the length of the word or phrase being modelled. Such an approach, however, results in models having an inappropriate number of states when a word or phrase is acoustically more or less complex than expected.
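The weakness of a duration-only estimate can be illustrated with a deliberately simple rule. The Python sketch below is hypothetical and is not taken from U.S. Pat. No. 5,895,448: it assumes one state per `frames_per_state` feature frames, with both parameter values chosen arbitrarily. Because only the number of frames is inspected, an acoustically uniform utterance and an acoustically varied one of the same duration receive the same number of states.

```python
from typing import Sequence


def states_from_length(frames: Sequence[float], frames_per_state: int = 5,
                       min_states: int = 1) -> int:
    """Hypothetical duration-only rule: one state per `frames_per_state` frames.

    Only len(frames) is used; the feature values themselves are never examined.
    """
    return max(min_states, len(frames) // frames_per_state)


# Two hypothetical 40-frame utterances (one scalar feature per frame).
steady = [1.0] * 40                          # a single sustained sound
varied = [float(i % 4) for i in range(40)]   # rapidly changing sounds
n_steady = states_from_length(steady)        # 8
n_varied = states_from_length(varied)        # 8
```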
There is therefore a need for an apparatus and method that can determine an appropriate number of states to include in word or phrase models. There is further a need for model generation systems that enable models to be generated simply and efficiently.