The invention relates to a method for creating a speech database for a target vocabulary for training a speech recognition system.
Speech recognition systems exist for a wide variety of applications. For example, automatic dictation systems use speech recognition systems which can recognize a very extensive vocabulary but are usually embodied in a user-specific fashion, that is to say they can be used only by a single user who has trained the speech recognition system to his personal pronunciation. Automatic switching systems in telephone equipment, on the other hand, also use speech recognition systems. These speech recognition systems require a significantly smaller vocabulary because, in telephone switching systems, only a small number of different words are spoken, for example in order to set up a connection to a telephone subscriber.
For speech recognition systems which are independent of speakers, a target vocabulary (application vocabulary) is conventionally defined. Training texts which predominantly contain words from this target vocabulary are then composed. These training texts are spoken by speakers and recorded by a microphone. Such a training text is usually spoken by 100 to 5000 speakers. The spoken texts are thus present as electrical speech signals. The texts to be spoken are also converted into their phonetic description. This phonetic description and the corresponding speech signals are fed to the speech recognition system during its training phase. In this way, the speech recognition system learns the target vocabulary. As the target vocabulary has been spoken by a large number of speakers, the speech recognition system is independent of any particular individual speaker.
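The conventional procedure described above can be illustrated by a minimal sketch in Python. It is not taken from any particular speech recognition system: the lexicon `LEXICON`, its phoneme symbols, the `TrainingSample` record and the function names are all hypothetical and serve only to show how the phonetic description of a training text might be paired with the speech signals recorded from each speaker.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical pronunciation lexicon; the phoneme symbols are illustrative only.
LEXICON: Dict[str, List[str]] = {
    "call": ["k", "O:", "l"],
    "home": ["h", "@U", "m"],
}


@dataclass
class TrainingSample:
    speaker_id: int          # which speaker produced the recording
    text: str                # the training text that was spoken
    phonemes: List[str]      # phonetic description of the text
    signal: Sequence[float]  # recorded speech signal (assumed: sampled amplitudes)


def phonetic_description(text: str) -> List[str]:
    """Convert a training text into a flat phoneme sequence by lexicon lookup."""
    phonemes: List[str] = []
    for word in text.lower().split():
        # A word missing from the lexicon raises KeyError.
        phonemes.extend(LEXICON[word])
    return phonemes


def build_training_set(
    text: str, recordings: Dict[int, Sequence[float]]
) -> List[TrainingSample]:
    """Pair every speaker's recording of `text` with its phonetic description."""
    phonemes = phonetic_description(text)
    return [
        TrainingSample(speaker_id, text, list(phonemes), signal)
        for speaker_id, signal in sorted(recordings.items())
    ]
```

In such a sketch, the resulting list of samples would be what is fed to the speech recognition system during the training phase; covering the same text with recordings from many speakers is what makes the trained system independent of an individual speaker.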
Creating a speaker-independent speech database for a special application with a predetermined target vocabulary, which requires the vocabulary to be spoken by a plurality of speakers, generally takes between two and six months. The generation of such application-specific speech databases is the greatest cost factor when an existing speech recognition system is adapted to a specific application. There is therefore a considerable need for a method with which a speech database for training a speaker-independent speech recognition system can be generated cost-effectively and quickly.