Field of the Invention
The invention is directed to hidden Markov models for speech recognition systems that are suitable for use in a number of languages because they exploit the acoustic and phonetic similarities between the different languages.
Description of the Prior Art
A major problem in speech recognition is that new acoustic-phonetic models must be trained for every language into which the speech recognition technology is to be introduced, in order to adapt the system to that country's language. Standard speech recognition systems usually employ hidden Markov models to model the language-specific sounds. Acoustic word models, which are recognized by a search process during speech recognition, are then compiled from these statistically modelled sound models. Very extensive speech data banks are required to train these sound models, and collecting and editing them is an extremely costly and time-consuming process. This is a disadvantage when transferring speech recognition technology from one language to another, since producing a new speech data bank makes the product more expensive on the one hand and, on the other hand, delays its market introduction.
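To make concrete how word models can be compiled from sound models, the following is a minimal sketch, assuming each sound is a left-to-right hidden Markov model described only by per-state self-loop probabilities. The class `PhonemeHMM`, the function `compile_word_model` and the non-emitting "exit" state are hypothetical illustrations, not the structure of any particular recognizer; emission densities are omitted so that only the concatenation of topologies is shown.

```python
from dataclasses import dataclass


@dataclass
class PhonemeHMM:
    """Hypothetical left-to-right sound model (topology only, no emissions)."""
    name: str
    self_loops: list  # self_loops[i] = P(stay in state i); 1 - p advances


def compile_word_model(phonemes):
    """Concatenate sound HMMs into one word HMM.

    Returns (labels, trans): trans[i][j] is the probability of moving from
    state i to state j. The last state of each sound feeds the first state
    of the next sound; an extra non-emitting 'exit' state ends the word.
    """
    labels, loops = [], []
    for ph in phonemes:
        for k, p in enumerate(ph.self_loops):
            labels.append(f"{ph.name}_{k}")
            loops.append(p)
    n = len(labels)
    trans = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i, p in enumerate(loops):
        trans[i][i] = p            # self-loop within the current state
        trans[i][i + 1] = 1.0 - p  # advance (possibly into the next sound)
    trans[n][n] = 1.0              # exit state is absorbing
    return labels + ["exit"], trans


# Usage: a word model built from two hypothetical three-state sound models.
labels, trans = compile_word_model(
    [PhonemeHMM("j", [0.6, 0.5, 0.4]), PhonemeHMM("a", [0.7, 0.7, 0.7])]
)
print(labels)  # ['j_0', 'j_1', 'j_2', 'a_0', 'a_1', 'a_2', 'exit']
```

Because every word model is assembled from the same sound inventory, only the sound models need training data; this is why the cost of a new language is dominated by the speech data bank needed to train them.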
Standard commercially available speech recognition systems employ exclusively language-specific models. To transfer these systems to a new language, extensive speech data banks are collected and edited. The sound models for the new language are then trained from scratch on the collected speech data.
To reduce the outlay and the time delay in transferring speech recognition systems to different languages, it should therefore be examined whether individual sound models are suitable for use in several languages. The article by Dalsgaard et al., "Identification of Mono- and Poly-phonemes using acoustic-phonetic Features derived by a self-organising Neural Network," Proc. ICSLP '92, pages 547-550, discloses approaches for producing multilingual sound models and using them for speech recognition in the respective languages. It also introduces the terms "polyphoneme" and "monophoneme". Polyphonemes are sounds whose sound-formation properties are similar enough across several languages for them to be treated as equivalent.
Monophonemes, by contrast, are sounds that exhibit language-specific properties. So that new speech data banks do not have to be produced every time for such development work and investigations, standard resources are already available, as described in Andersen et al., "Data-driven Identification of Poly- and Mono-phonemes for four European Languages," Proc. EUROSPEECH '93, pages 759-762 (1993); Hieronymus, "ASCII Phonetic Symbols for the World's Languages: Worldbet," preprint (1993); and Cole et al., "The OGI Multi-language Telephone Speech Corpus," Proc. ICSLP '92, pages 895-898 (1992). The aforementioned article by Andersen et al. from Proc. EUROSPEECH '93 discloses the use of particular phonemes, and of hidden Markov sound models of these phonemes, for multilingual speech recognition.
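The polyphoneme/monophoneme distinction can be illustrated with a minimal sketch, assuming each language's sound is summarized by a single mean feature vector and that sounds sharing a phonetic symbol are compared by Euclidean distance against a fixed threshold. The function `classify_phonemes`, the mean-vector representation and the threshold rule are all hypothetical simplifications chosen for illustration; they are not the data-driven procedures of Dalsgaard et al. or Andersen et al.

```python
import math
from itertools import combinations


def classify_phonemes(models, threshold):
    """Split sounds into polyphonemes and monophonemes (illustrative rule).

    models: {language: {symbol: mean_vector}} (hypothetical representation).
    A symbol becomes a polyphoneme if it occurs in more than one language and
    every pair of its per-language vectors lies within `threshold`;
    otherwise each (language, symbol) pair stays a language-specific
    monophoneme.
    Returns (polyphonemes: {symbol: sorted languages}, sorted monophonemes).
    """
    by_symbol = {}
    for lang, sounds in models.items():
        for sym, vec in sounds.items():
            by_symbol.setdefault(sym, {})[lang] = vec

    poly, mono = {}, []
    for sym, per_lang in by_symbol.items():
        langs = sorted(per_lang)
        shared = len(langs) > 1 and all(
            math.dist(per_lang[a], per_lang[b]) <= threshold
            for a, b in combinations(langs, 2)
        )
        if shared:
            poly[sym] = langs  # one model can serve all these languages
        else:
            mono.extend((lang, sym) for lang in langs)
    return poly, sorted(mono)


# Usage with made-up two-dimensional vectors.
models = {
    "de": {"a": [1.0, 2.0], "r": [0.0, 0.0], "x": [5.0, 5.0]},
    "en": {"a": [1.1, 2.0], "r": [3.0, 0.0], "th": [3.0, 1.0]},
    "it": {"a": [0.9, 2.1]},
}
poly, mono = classify_phonemes(models, threshold=0.5)
print(poly)  # {'a': ['de', 'en', 'it']}
print(mono)  # [('de', 'r'), ('de', 'x'), ('en', 'r'), ('en', 't' + 'h')]
```

Requiring every pair to fall within the threshold (rather than chaining nearest neighbours) keeps a shared model from drifting between languages whose realizations of a sound differ markedly, as the "r" example shows.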