1. Field of Invention
The invention relates to automatic speech recognition using Markov processes on multidimensional curves.
2. Description of Related Art
Variations in speaking rate currently present a serious challenge for automatic speech recognition (ASR). It is widely observed, for example, that fast speech is more prone to recognition errors than slow speech.
A related effect, occurring at the phoneme level, is that consonants are more frequently misinterpreted than vowels. Consonants have short-lived, non-stationary acoustic signatures, whereas vowels have longer-lived, stationary ones. Thus, at the phoneme level, the error rate for recognition of consonants may be significantly increased as a consequence of locally fast speech.
A method and apparatus for speech recognition using Markov processes on curves are presented. Input speech utterances are received and each utterance is represented as a multidimensional curve. Based on initial model estimates, the curve is split into acoustic segments representing different components. The segments are used to create a new statistical model for the curve. The process may be iterated to produce a more precise statistical model for recognition.
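By way of illustration only, the segment-and-re-estimate loop described above may be sketched as follows. The sketch is not the claimed estimation procedure. It assumes, purely for illustration, that the statistical model is one mean vector per state, that segmentation is a left-to-right alignment minimizing squared Euclidean distance, and that the utterance has at least as many frames as states. The names `segment`, `reestimate`, and `train` are hypothetical.

```python
def sq_dist(x, m):
    """Squared Euclidean distance between a frame and a state mean."""
    return sum((a - b) ** 2 for a, b in zip(x, m))

def segment(frames, means):
    """Split the curve into left-to-right segments, one per state.

    Dynamic programming over (frame, state); each step either stays in
    the current state or advances to the next one. Assumes
    len(frames) >= len(means).
    """
    n, k = len(frames), len(means)
    inf = float("inf")
    cost = [[inf] * k for _ in range(n)]
    back = [[0] * k for _ in range(n)]
    cost[0][0] = sq_dist(frames[0], means[0])
    for t in range(1, n):
        for s in range(k):
            stay = cost[t - 1][s]
            move = cost[t - 1][s - 1] if s > 0 else inf
            if move < stay:
                best, back[t][s] = move, s - 1
            else:
                best, back[t][s] = stay, s
            if best < inf:
                cost[t][s] = best + sq_dist(frames[t], means[s])
    # Trace back from the final state at the final frame.
    path = [k - 1]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

def reestimate(frames, labels, k):
    """Build a new model: the mean of the frames assigned to each state."""
    dim = len(frames[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, s in zip(frames, labels):
        counts[s] += 1
        for i, v in enumerate(x):
            sums[s][i] += v
    return [tuple(v / counts[s] for v in sums[s]) for s in range(k)]

def train(frames, means, max_iter=10):
    """Alternate segmentation and re-estimation until the segments settle."""
    labels = None
    for _ in range(max_iter):
        new_labels = segment(frames, means)
        if new_labels == labels:
            break
        labels = new_labels
        means = reestimate(frames, labels, len(means))
    return labels, means

# Toy one-dimensional "curve" and rough initial model estimates (made up).
frames = [(0.0,), (0.2,), (0.1,), (5.0,), (5.2,), (9.9,), (10.1,)]
initial_means = [(1.0,), (4.0,), (8.0,)]
labels, means = train(frames, initial_means)
```

In this toy case the first pass already separates the three clusters of frames, and the second pass confirms that the segmentation no longer changes, at which point the loop stops.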
As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.
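The role of arc length may be illustrated with a minimal sketch. It assumes an unweighted Euclidean metric between consecutive feature vectors, which is a simplification of the weighted arc lengths referred to above, and it models slower speech as the same curve sampled more densely. The helper names `arc_lengths`, `total_arc_length`, and `resample` are hypothetical.

```python
import math

def arc_lengths(frames):
    """Euclidean length of each step between consecutive feature vectors."""
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]

def total_arc_length(frames):
    """Length of the whole curve traced by the feature vectors."""
    return sum(arc_lengths(frames))

def resample(frames, factor):
    """Mimic slower speech: the same curve, sampled `factor` times as densely."""
    out = []
    for a, b in zip(frames, frames[1:]):
        for k in range(factor):
            t = k / factor
            out.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    out.append(frames[-1])
    return out

# Made-up two-dimensional curve: a fast transition, a stationary stretch
# (repeated vector), then another fast transition.
curve = [(0.0, 0.0), (3.0, 4.0), (3.0, 4.0), (6.0, 8.0)]
slow = resample(curve, 4)

steps = arc_lengths(curve)
fast_length = total_arc_length(curve)
slow_length = total_arc_length(slow)
```

Under these assumptions the slowly spoken version has more frames but the same total arc length, so a score accumulated per unit of arc length is unaffected by the change in rate. The stationary stretch contributes zero length, while the rapid transitions carry all of it, which is the sense in which short-lived, non-stationary sounds receive more importance.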
These and other features and advantages of this invention are described in or are apparent from the following detailed description of the preferred embodiments.